# Feature Engineering

In [1]:
from modules.FeatureBuilder import *

### Why don't we simply fit the inertial signals to the model?

Although the inertial signals showed relatively different trends between the walking and stationary activity groups, we observed that most of the signals overlap for activities within the same group. Hence, it's not clear whether they would be helpful on their own for modelling the activity detection problem. The total acceleration signals are a partial exception: they showed considerable distinction even between activities of the same group. Still, the total acceleration signals alone aren't enough to capture all the information about the event.

### What can we do?

Most of the signals show similar behaviour (periodic patterns) over a period of time. It's important to find when and at what frequency these patterns occur. Thus, by transforming the signals between the time and frequency domains, we try to decompose them into their periodic components and then find significant peaks in the resulting spectra. We'll use these concepts:
    
   1. **Fast Fourier Transform (FFT)** - Used for finding the frequencies of the periodic components
   2. **Power Spectral Density (PSD)** - Shows how the signal's power is distributed over frequency; its peaks mark the frequencies that carry the most power
   3. **Autocorrelation (aCORR)** - Calculates the correlation of a signal with a lagged copy of itself

For the implementation of these methods, see *'modules.SignalCalculations'*.
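
As a rough illustration of the three transformations listed above, here is a minimal sketch using NumPy and SciPy. It is not the code in *'modules.SignalCalculations'*, which is not shown here: the function names `fft_spectrum`, `psd_spectrum` and `autocorrelation`, the 50 Hz sampling rate, and the choice of Welch's method for the PSD are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import welch


def fft_spectrum(signal, fs):
    """One-sided FFT amplitude spectrum: returns (frequencies in Hz, amplitudes)."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    amps = 2.0 / n * np.abs(np.fft.rfft(signal))
    return freqs, amps


def psd_spectrum(signal, fs):
    """Power spectral density estimated with Welch's method over a single segment."""
    freqs, power = welch(signal, fs=fs, nperseg=len(signal))
    return freqs, power


def autocorrelation(signal, fs):
    """Autocorrelation for non-negative lags, scaled so that lag 0 equals 1.

    Assumes the signal is not constant (otherwise the scaling divides by zero).
    """
    centered = signal - np.mean(signal)
    full = np.correlate(centered, centered, mode="full")
    acorr = full[len(full) // 2:]      # keep lags 0 .. n-1
    acorr = acorr / acorr[0]
    lags = np.arange(len(acorr)) / fs  # lag expressed in seconds
    return lags, acorr


# Hypothetical example: a 5 Hz sine sampled at an assumed rate of 50 Hz.
fs = 50.0
t = np.arange(128) / fs
x = np.sin(2 * np.pi * 5.0 * t)

freqs, amps = fft_spectrum(x, fs)
print("dominant FFT frequency (Hz):", freqs[np.argmax(amps)])
```

Each function returns a pair of arrays: the domain (frequency in Hz or lag in seconds) and the transformed values, which is the shape needed later for locating peaks.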

### What is next?

Now we have extra signals on which to conduct more detailed feature engineering, which may convey better information. Our total feature set will consist of features calculated on these newly created signals and features calculated on the original inertial signals.


### Main features

main_features: these are the features that will be calculated on the original inertial signals.

In [10]:
featreBuilder = FeatureBuilder(n_peaks=2)
featreBuilder.init_features()

In [11]:
featreBuilder.get_main_features(return_values=False)

['mean',
 'std',
 'mad',
 'max',
 'min',
 'iqr',
 'entropy',
 'correlation-1',
 'correlation-2']

#### Description of main features

* 'mean': mean value
* 'std': standard deviation
* 'mad': median absolute deviation
* 'max': largest value in the array
* 'min': smallest value in the array
* 'iqr': interquartile range
* 'entropy': entropy of the signal
* 'correlation-1': correlation with one of the other two axes
* 'correlation-2': correlation with the remaining axis

We have 3 different signals, and each signal is represented on 3 axes. So if the given signal is on the x-axis, then *'correlation-1'* and *'correlation-2'* are the correlations between x-y and x-z, respectively. The same applies to the other axes.
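
The main features can be sketched for a single axis as follows. This is an illustrative example rather than the `FeatureBuilder` implementation: the function name `main_features`, the histogram-based entropy estimate, and the number of bins are assumptions, since the notebook does not show how these values are computed.

```python
import numpy as np
from scipy.stats import entropy


def main_features(signal, other_1, other_2, bins=10):
    """Summary statistics of one axis plus its correlation with the other two axes.

    `other_1` and `other_2` are the remaining two axes of the same signal,
    e.g. y and z when `signal` is the x-axis.
    """
    # Assumption: entropy is estimated from a histogram of the signal values.
    counts, _ = np.histogram(signal, bins=bins)
    q75, q25 = np.percentile(signal, [75, 25])
    return {
        "mean": np.mean(signal),
        "std": np.std(signal),
        "mad": np.median(np.abs(signal - np.median(signal))),
        "max": np.max(signal),
        "min": np.min(signal),
        "iqr": q75 - q25,
        "entropy": entropy(counts),
        "correlation-1": np.corrcoef(signal, other_1)[0, 1],
        "correlation-2": np.corrcoef(signal, other_2)[0, 1],
    }


# Hypothetical toy axes: y follows x exactly, z mirrors it.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x
z = -x
features = main_features(x, y, z)
print(features["correlation-1"], features["correlation-2"])
```

For the y-axis the call would be `main_features(y, x, z)` under one possible ordering of the remaining axes; the actual ordering used by `FeatureBuilder` is not stated in the notebook.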


### Domain features

domain_features: these are the features that will be calculated from the signal transformations.

In [12]:
featreBuilder.get_domain_features(return_values=False)

['FFT-min',
 'FFT-max',
 'FFT-mean',
 'FFT-peak-values-0',
 'FFT-peak-values-1',
 'FFT-peak-domains-0',
 'FFT-peak-domains-1',
 'PSD-min',
 'PSD-max',
 'PSD-mean',
 'PSD-peak-values-0',
 'PSD-peak-values-1',
 'PSD-peak-domains-0',
 'PSD-peak-domains-1',
 'aCORR-min',
 'aCORR-max',
 'aCORR-mean',
 'aCORR-peak-values-0',
 'aCORR-peak-values-1',
 'aCORR-peak-domains-0',
 'aCORR-peak-domains-1']

#### Description of domain features

For each of FFT, PSD, and aCORR we have the same structure:

* 'min': smallest of the selected peaks
* 'max': largest of the selected peaks
* 'mean': mean of the selected peaks
* 'peak-values': {}: e.g. for the first 2 selected peaks, it will look like {'1':0, '2':0}
* 'peak-domains': {}: e.g. for the first 2 selected peaks, it will look like {'1':0, '2':0}


For each transformation, we'll look at the first n peaks in the signal. We're interested not only in the amplitude of these peaks, but also in where/when they occur in the time/frequency domain, because a shift in a peak's location can also help discriminate between periodic patterns. Thus we'll take not only the amplitudes of the first n peaks of each transformation, but also their positions in the time/frequency domain.

So *'peak-values'* stores the amplitudes of the first n peaks, and *'peak-domains'* stores the frequency or time at which each of these peaks occurs. In the example above n_peaks=2, so two features are generated for each peak-related entry.
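
The peak-based features described above can be sketched like this. It is a minimal example under stated assumptions, not the `FeatureBuilder` code: the function name `peak_features` is hypothetical, peaks are located with `scipy.signal.find_peaks` using its default settings, and missing peaks are filled with 0 when fewer than `n_peaks` are found.

```python
import numpy as np
from scipy.signal import find_peaks


def peak_features(domain, values, n_peaks=2, prefix="FFT"):
    """Features from the first `n_peaks` local maxima of a transformed signal.

    `domain` holds the frequency (or lag) of each sample in `values`.
    """
    idx, _ = find_peaks(values)
    idx = idx[:n_peaks]                 # first n peaks in order of occurrence
    peak_vals = values[idx]
    peak_doms = domain[idx]
    found = len(idx)

    feats = {
        f"{prefix}-min": float(peak_vals.min()) if found else 0.0,
        f"{prefix}-max": float(peak_vals.max()) if found else 0.0,
        f"{prefix}-mean": float(peak_vals.mean()) if found else 0.0,
    }
    # Assumption: slots for peaks that were not found are filled with 0.
    for i in range(n_peaks):
        feats[f"{prefix}-peak-values-{i}"] = float(peak_vals[i]) if i < found else 0.0
    for i in range(n_peaks):
        feats[f"{prefix}-peak-domains-{i}"] = float(peak_doms[i]) if i < found else 0.0
    return feats


# Hypothetical spectrum with local maxima at 1 Hz, 3 Hz and 5 Hz.
freqs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
amps = np.array([0.0, 3.0, 0.0, 1.0, 0.0, 5.0, 0.0])
print(peak_features(freqs, amps, n_peaks=2, prefix="FFT"))
```

Because only the first two peaks are kept, the larger peak at 5 Hz is ignored in this toy example. Selecting the n most prominent peaks instead would be a different design choice.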

## Feature Generation

Here comes the fun part, finally! In this part, we will carry out feature engineering on the train/test inertial signals.