# Feature Calculator Settings

By default, all feature calculators are used when you call `extract_features`.
There could be multiple reasons why you do not want that:
* you are only interested in a certain feature (or features)
* you want to save time during extraction
* you have already run the feature selection and know which features are relevant

For more information on these settings, please have a look at [the documentation](http://tsfresh.readthedocs.io/en/latest/text/feature_extraction_settings.html).

In [1]:
from tsfresh.feature_extraction import extract_features
from tsfresh.feature_extraction import settings

import numpy as np
import pandas as pd

## Construct a time series container

For testing, we construct a time series container that includes two sensor time series, "temperature" and "pressure", for two devices, "a" and "b".

In [2]:
df = pd.DataFrame({"id": ["a", "a", "b", "b"], "temperature": [1,2,3,1], "pressure": [-1, 2, -1, 7]})
df

Unnamed: 0,id,temperature,pressure
0,a,1,-1
1,a,2,2
2,b,3,-1
3,b,1,7


## The `default_fc_parameters`

Which features `tsfresh` calculates is controlled by a dictionary called `fc_parameters`.
It maps feature calculator names (= keys) to their parameters (= values).
Every key in the dictionary will be looked up as a function in `tsfresh.feature_extraction.feature_calculators` and be used to extract features.
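
The name-based lookup described above can be sketched in a few lines. This is only an illustration of the idea, not the `tsfresh` implementation: it uses Python's standard `statistics` module as a stand-in for `tsfresh.feature_extraction.feature_calculators`, and the helper `compute_features` is hypothetical.

```python
import statistics

# Stand-in for an fc_parameters dictionary: keys are function names,
# values are parameters (None means "this calculator takes no parameters").
fc_parameters = {"mean": None, "median": None}


def compute_features(values, fc_parameters, module=statistics):
    """Look up each key as a function in `module` and apply it to `values`."""
    features = {}
    for name, params in fc_parameters.items():
        func = getattr(module, name)  # raises AttributeError for unknown names
        if params is None:
            features[name] = func(values)
        else:
            features[name] = func(values, **params)
    return features


result = compute_features([1, 2, 3, 1], fc_parameters)
print(result)
```

A key that does not name a function in the module fails at lookup time, which is why the dictionary keys have to match the calculator names exactly.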

`tsfresh` comes with some predefined `fc_parameters` dictionaries:

In [3]:
settings.ComprehensiveFCParameters, settings.EfficientFCParameters, settings.MinimalFCParameters

(tsfresh.feature_extraction.settings.ComprehensiveFCParameters,
 tsfresh.feature_extraction.settings.EfficientFCParameters,
 tsfresh.feature_extraction.settings.MinimalFCParameters)

For example, to only calculate a very minimal set of features:

In [4]:
settings_minimal = settings.MinimalFCParameters() 
settings_minimal

{'sum_values': None, 'median': None, 'mean': None, 'length': None, 'standard_deviation': None, 'variance': None, 'root_mean_square': None, 'maximum': None, 'absolute_maximum': None, 'minimum': None}

Each key stands for one of the feature calculators. 
The values are the parameters. If a feature calculator has no parameters, `None` is used as the value (and as these feature calculators are very simple, none of them has parameters).

This dictionary can be passed to `extract_features`, resulting in a few basic time series features being calculated:

In [5]:
X_tsfresh = extract_features(df, column_id="id", default_fc_parameters=settings_minimal)
X_tsfresh.head()

Feature Extraction: 100%|██████████| 4/4 [00:02<00:00,  1.93it/s]


Unnamed: 0,temperature__sum_values,temperature__median,temperature__mean,temperature__length,temperature__standard_deviation,temperature__variance,temperature__root_mean_square,temperature__maximum,temperature__absolute_maximum,temperature__minimum,pressure__sum_values,pressure__median,pressure__mean,pressure__length,pressure__standard_deviation,pressure__variance,pressure__root_mean_square,pressure__maximum,pressure__absolute_maximum,pressure__minimum
a,3.0,1.5,1.5,2.0,0.5,0.25,1.581139,2.0,2.0,1.0,1.0,0.5,0.5,2.0,1.5,2.25,1.581139,2.0,2.0,-1.0
b,4.0,2.0,2.0,2.0,1.0,1.0,2.236068,3.0,3.0,1.0,6.0,3.0,3.0,2.0,4.0,16.0,5.0,7.0,7.0,-1.0


By passing `settings_minimal` as the value of the `default_fc_parameters` parameter, those settings are used for all kinds of time series.
In this case, the `settings_minimal` dictionary is used for both the "temperature" and the "pressure" time series.

Please note how the columns in the resulting dataframe depend on both the settings and the kinds of time series in the data.
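
The column names in the output above follow the pattern `<kind>__<feature>`, so the set of columns is the product of the kinds and the features in the settings. A small sketch that reproduces the names by hand (the variable names are illustrative; the feature list is copied from the `MinimalFCParameters` output shown earlier):

```python
from itertools import product

kinds = ["temperature", "pressure"]
feature_names = [
    "sum_values", "median", "mean", "length", "standard_deviation",
    "variance", "root_mean_square", "maximum", "absolute_maximum", "minimum",
]

# One column per (kind, feature) pair, joined by a double underscore.
columns = [f"{kind}__{feature}" for kind, feature in product(kinds, feature_names)]

print(len(columns))
print(columns[:3])
```

With two kinds and ten parameter-free calculators this yields the 20 columns seen in the result.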

Now, let's say we want to remove the `length` feature and prevent it from being calculated. We just delete it from the dictionary.

In [6]:
del settings_minimal["length"]
settings_minimal

{'sum_values': None, 'median': None, 'mean': None, 'standard_deviation': None, 'variance': None, 'root_mean_square': None, 'maximum': None, 'absolute_maximum': None, 'minimum': None}

Now, if we extract features with this reduced dictionary, the `length` feature will not be calculated:

In [7]:
X_tsfresh = extract_features(df, column_id="id", default_fc_parameters=settings_minimal)
X_tsfresh.head()

Feature Extraction: 100%|██████████| 4/4 [00:02<00:00,  1.95it/s]


Unnamed: 0,temperature__sum_values,temperature__median,temperature__mean,temperature__standard_deviation,temperature__variance,temperature__root_mean_square,temperature__maximum,temperature__absolute_maximum,temperature__minimum,pressure__sum_values,pressure__median,pressure__mean,pressure__standard_deviation,pressure__variance,pressure__root_mean_square,pressure__maximum,pressure__absolute_maximum,pressure__minimum
a,3.0,1.5,1.5,0.5,0.25,1.581139,2.0,2.0,1.0,1.0,0.5,0.5,1.5,2.25,1.581139,2.0,2.0,-1.0
b,4.0,2.0,2.0,1.0,1.0,2.236068,3.0,3.0,1.0,6.0,3.0,3.0,4.0,16.0,5.0,7.0,7.0,-1.0


## The `kind_to_fc_parameters`

Now, let's say we do not want to calculate the same features for both kinds of time series. Instead, there should be a different set of features for each kind.

To do that, we can use the `kind_to_fc_parameters` parameter, which lets us specify which `fc_parameters` we want to use for which kind of time series:

In [8]:
fc_parameters_pressure = {"length": None, 
                          "sum_values": None}

fc_parameters_temperature = {"maximum": None, 
                             "minimum": None}

kind_to_fc_parameters = {
    "temperature": fc_parameters_temperature,
    "pressure": fc_parameters_pressure
}

print(kind_to_fc_parameters)

{'temperature': {'maximum': None, 'minimum': None}, 'pressure': {'length': None, 'sum_values': None}}


So, in this case, for the "temperature" sensor both `maximum` and `minimum` are calculated.
For the "pressure" signal, the `length` and `sum_values` features are extracted instead.

In [9]:
X_tsfresh = extract_features(df, column_id="id", kind_to_fc_parameters=kind_to_fc_parameters)
X_tsfresh.head()

Feature Extraction: 100%|██████████| 4/4 [00:02<00:00,  1.88it/s]


Unnamed: 0,temperature__maximum,temperature__minimum,pressure__length,pressure__sum_values
a,2.0,1.0,2.0,1.0
b,3.0,1.0,2.0,6.0


### Extracting from data

After applying a feature selection algorithm to drop irrelevant feature columns, you know which features are relevant and which are not.
You can use this information to extract only the relevant features in the first place.

The provided `from_columns` function can be used to infer a settings dictionary from the dataframe containing the features.
This dictionary can then, for example, be stored and reused in the next feature extraction.

In [10]:
# Assuming `X_tsfresh` contains only our relevant features
relevant_settings = settings.from_columns(X_tsfresh)
relevant_settings

{'temperature': {'maximum': None, 'minimum': None},
 'pressure': {'length': None, 'sum_values': None}}
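
To make the idea concrete, here is a simplified sketch of how such a dictionary can be recovered from column names. It is not the `tsfresh` implementation: the helper `settings_from_column_names` is hypothetical and only handles parameter-free features named `<kind>__<feature>`, whereas `from_columns` also handles calculators with parameters.

```python
def settings_from_column_names(columns):
    """Build a kind -> {feature: None} mapping from '<kind>__<feature>' names."""
    inferred = {}
    for column in columns:
        # Split only at the first double underscore: kind on the left,
        # feature calculator name on the right.
        kind, feature = column.split("__", 1)
        inferred.setdefault(kind, {})[feature] = None
    return inferred


columns = [
    "temperature__maximum",
    "temperature__minimum",
    "pressure__length",
    "pressure__sum_values",
]
inferred_settings = settings_from_column_names(columns)
print(inferred_settings)
```

The result has the same shape as a `kind_to_fc_parameters` dictionary, which is why it can be passed straight back into the next extraction.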

## More complex dictionaries

We provide `fc_parameters` dictionaries with larger sets of features.

The `EfficientFCParameters` contain features and parameters that should be calculated quite fast:

In [11]:
settings_efficient = settings.EfficientFCParameters()
settings_efficient

{'variance_larger_than_standard_deviation': None, 'has_duplicate_max': None, 'has_duplicate_min': None, 'has_duplicate': None, 'sum_values': None, 'abs_energy': None, 'mean_abs_change': None, 'mean_change': None, 'mean_second_derivative_central': None, 'median': None, 'mean': None, 'length': None, 'standard_deviation': None, 'variation_coefficient': None, 'variance': None, 'skewness': None, 'kurtosis': None, 'root_mean_square': None, 'absolute_sum_of_changes': None, 'longest_strike_below_mean': None, 'longest_strike_above_mean': None, 'count_above_mean': None, 'count_below_mean': None, 'last_location_of_maximum': None, 'first_location_of_maximum': None, 'last_location_of_minimum': None, 'first_location_of_minimum': None, 'percentage_of_reoccurring_values_to_all_values': None, 'percentage_of_reoccurring_datapoints_to_all_datapoints': None, 'sum_of_reoccurring_values': None, 'sum_of_reoccurring_data_points': None, 'ratio_value_number_to_time_series_length': None, 'maximum': None, 'absolu

The `ComprehensiveFCParameters` is the largest set of features. It takes the longest to calculate:

In [12]:
settings_comprehensive = settings.ComprehensiveFCParameters()
settings_comprehensive

{'variance_larger_than_standard_deviation': None, 'has_duplicate_max': None, 'has_duplicate_min': None, 'has_duplicate': None, 'sum_values': None, 'abs_energy': None, 'mean_abs_change': None, 'mean_change': None, 'mean_second_derivative_central': None, 'median': None, 'mean': None, 'length': None, 'standard_deviation': None, 'variation_coefficient': None, 'variance': None, 'skewness': None, 'kurtosis': None, 'root_mean_square': None, 'absolute_sum_of_changes': None, 'longest_strike_below_mean': None, 'longest_strike_above_mean': None, 'count_above_mean': None, 'count_below_mean': None, 'last_location_of_maximum': None, 'first_location_of_maximum': None, 'last_location_of_minimum': None, 'first_location_of_minimum': None, 'percentage_of_reoccurring_values_to_all_values': None, 'percentage_of_reoccurring_datapoints_to_all_datapoints': None, 'sum_of_reoccurring_values': None, 'sum_of_reoccurring_data_points': None, 'ratio_value_number_to_time_series_length': None, 'sample_entropy': None, 

### Feature Calculator Parameters

More complex feature calculators have parameters that you can use to tune the extracted features.
The predefined settings (such as `ComprehensiveFCParameters`) already contain default values for these parameters.

For your own projects, however, you might want or need to tune them.

In detail, each value in an `fc_parameters` dictionary is either `None` or a list of parameter dictionaries.
When calculating the features, each entry in the list is used to calculate one feature.
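
The "one feature per list entry" rule can be sketched as follows. The helper `expand` is hypothetical, and the `__<parameter>_<value>` suffix used to tell the resulting columns apart is an assumption about the naming scheme, extending the `<kind>__<feature>` pattern visible in the outputs above.

```python
def expand(kind, fc_parameters):
    """List one feature name per calculator and per parameter dictionary."""
    names = []
    for feature, param_list in fc_parameters.items():
        if param_list is None:
            # Parameter-free calculator: exactly one feature.
            names.append(f"{kind}__{feature}")
            continue
        for params in param_list:
            # Assumed suffix format: parameter name and value joined by "_".
            suffix = "__".join(
                f"{key}_{value}" for key, value in sorted(params.items())
            )
            names.append(f"{kind}__{feature}__{suffix}")
    return names


fc_parameters = {
    "maximum": None,
    "large_standard_deviation": [{"r": 0.25}, {"r": 0.5}],
}
expanded = expand("pressure", fc_parameters)
print(expanded)
```

Here one parameter-free calculator and one calculator with two parameter sets give three features in total.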

For example, let's have a look at the feature `large_standard_deviation`, which depends on a single parameter called `r` (it basically defines how large "large" is).
The `ComprehensiveFCParameters` contains several default values for `r`.
Each of them will be used to calculate a single feature:

In [None]:
settings_comprehensive['large_standard_deviation']

If you use these settings in feature extraction, that would trigger the calculation of 19 different `large_standard_deviation` features, one for each value of `r` from `r=0.05` up to `r=0.95` in steps of 0.05.

In [None]:
settings_tmp = {'large_standard_deviation': settings_comprehensive['large_standard_deviation']}

X_tsfresh = extract_features(df, column_id="id", default_fc_parameters=settings_tmp)
X_tsfresh.columns

If you now want to change the parameters for a specific feature calculator, all you need to do is to change the dictionary values.
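
As a rough sketch of what changing the values means, the example below replaces the default list with two hand-picked values of `r` and evaluates them on the "pressure" series of device "b". The function is a plain-Python re-implementation written for illustration, assuming the feature is true when the standard deviation exceeds `r` times the range of the series; it is not imported from `tsfresh`, and the names `custom_settings` and `results` are hypothetical.

```python
import statistics


def large_standard_deviation(values, r):
    """Assumed definition: is the std larger than r times (max - min)?"""
    return statistics.pstdev(values) > r * (max(values) - min(values))


# Custom settings: only two values of r instead of the default list.
custom_settings = {"large_standard_deviation": [{"r": 0.25}, {"r": 0.75}]}

pressure_b = [-1, 7]  # "pressure" values of device "b" from the container above

results = {
    f"r_{params['r']}": large_standard_deviation(pressure_b, **params)
    for params in custom_settings["large_standard_deviation"]
}
print(results)
```

A dictionary shaped like `custom_settings` is what you would pass as `default_fc_parameters` (or as one entry of `kind_to_fc_parameters`) to `extract_features`.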