**Example of Extracting Features from dataframes with Datetime indices:**
- Assuming that time-varying measurements are taken at regular intervals is sufficient in many situations. However, for a large number of tasks it is important to take into account when a measurement is made. One example is healthcare, where the interval between measurements of vital signs contains crucial information.

- tsfresh now supports calculator functions that use the index of the time series container to calculate features. The only requirement for these functions is that the index of the input dataframe is of type pd.DatetimeIndex. These functions are contained in the new class TimeBasedFCParameters.

In [1]:
import pandas as pd
from tsfresh.feature_extraction import extract_features
# Time Based FC Parameters - contains all functions that use the datetime index of the timeseries container.

from tsfresh.feature_extraction.settings import TimeBasedFCParameters
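
Since the time-based calculators require a pd.DatetimeIndex, it can help to check the index type before extracting features. The following is a minimal sketch using only pandas; the small dataframe `raw` is hypothetical and only illustrates timestamps that were read in as plain strings.

```python
import pandas as pd

# Hypothetical raw data whose timestamps were read in as plain strings.
raw = pd.DataFrame(
    {"id": ["a", "a"], "value": [1, 2], "kind": ["temperature", "temperature"]},
    index=["2019-03-01 10:04:00", "2019-03-01 10:50:00"],
)
is_datetime_before = isinstance(raw.index, pd.DatetimeIndex)

# Convert the string index so that it becomes a pd.DatetimeIndex.
raw.index = pd.to_datetime(raw.index)
is_datetime_after = isinstance(raw.index, pd.DatetimeIndex)

print(is_datetime_before, is_datetime_after)
```

After the conversion, the index satisfies the requirement stated above.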

Build a time series container with datetime indices:
- Build a dataframe with a datetime index. The format must include a value column and a kind column, since each measurement has its own timestamp, i.e. measurements are not assumed to be simultaneous.

In [2]:
df = pd.DataFrame({"id": ["a", "a", "a", "a", "b", "b", "b", "b"], 
                   "value": [1, 2, 3, 1, 3, 1, 0, 8],
                   "kind": ["temperature", "temperature", "pressure", "pressure",
                            "temperature", "temperature", "pressure", "pressure"]},
                   index=pd.DatetimeIndex(
                       ['2019-03-01 10:04:00', '2019-03-01 10:50:00', '2019-03-02 00:00:00', '2019-03-02 09:04:59',
                        '2019-03-02 23:54:12', '2019-03-03 08:13:04', '2019-03-04 08:00:00', '2019-03-04 08:01:00']
                   ))
df = df.sort_index()
df

Unnamed: 0,id,value,kind
2019-03-01 10:04:00,a,1,temperature
2019-03-01 10:50:00,a,2,temperature
2019-03-02 00:00:00,a,3,pressure
2019-03-02 09:04:59,a,1,pressure
2019-03-02 23:54:12,b,3,temperature
2019-03-03 08:13:04,b,1,temperature
2019-03-04 08:00:00,b,0,pressure
2019-03-04 08:01:00,b,8,pressure


Right now, TimeBasedFCParameters contains only linear_trend_timewise, which calculates a linear trend but uses the time difference between measurements, in hours, as the independent variable of the regression.

In [3]:
settings_time = TimeBasedFCParameters()
settings_time

{'linear_trend_timewise': [{'attr': 'pvalue'}, {'attr': 'rvalue'}, {'attr': 'intercept'}, {'attr': 'slope'}, {'attr': 'stderr'}]}
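
To make the idea of regressing on elapsed hours concrete, here is a rough sketch that fits a line to the two temperature readings of id "a" from the dataframe above. It uses `numpy.polyfit` as a stand-in for the regression and is not meant to mirror how tsfresh implements the feature internally; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Temperature readings for id "a", copied from the example dataframe.
series = pd.Series(
    [1, 2],
    index=pd.DatetimeIndex(["2019-03-01 10:04:00", "2019-03-01 10:50:00"]),
)

# Elapsed time since the first measurement, expressed in hours.
elapsed = series.index - series.index[0]
hours = np.asarray(elapsed.total_seconds()) / 3600.0

# Least-squares line through (hours, value); polyfit returns [slope, intercept].
slope, intercept = np.polyfit(hours, series.to_numpy(dtype=float), deg=1)
print(slope, intercept)
```

The two readings are 46 minutes apart, so the slope should come out close to 60/46 ≈ 1.304 per hour with an intercept close to 1.0, in line with the temperature slope and intercept that tsfresh reports for id "a".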

In [4]:
from tsfresh.feature_extraction.settings import ComprehensiveFCParameters

settings_compreh = ComprehensiveFCParameters()

In [5]:
'linear_trend_timewise' in settings_compreh.keys()

True

In [6]:
settings_compreh['linear_trend_timewise']

[{'attr': 'pvalue'},
 {'attr': 'rvalue'},
 {'attr': 'intercept'},
 {'attr': 'slope'},
 {'attr': 'stderr'}]

In [7]:
X_tsfresh = extract_features(
    df,
    column_id="id",
    column_value="value",
    column_kind="kind",
    default_fc_parameters=settings_time,
)
X_tsfresh.head()

Feature Extraction: 100%|██████████| 4/4 [00:02<00:00,  1.85it/s]


Unnamed: 0,"pressure__linear_trend_timewise__attr_""pvalue""","pressure__linear_trend_timewise__attr_""rvalue""","pressure__linear_trend_timewise__attr_""intercept""","pressure__linear_trend_timewise__attr_""slope""","pressure__linear_trend_timewise__attr_""stderr""","temperature__linear_trend_timewise__attr_""pvalue""","temperature__linear_trend_timewise__attr_""rvalue""","temperature__linear_trend_timewise__attr_""intercept""","temperature__linear_trend_timewise__attr_""slope""","temperature__linear_trend_timewise__attr_""stderr"""
a,0.0,-1.0,3.0,-0.22019,0.0,0.0,1.0,1.0,1.304348,0.0
b,0.0,1.0,0.0,480.0,0.0,0.0,-1.0,3.0,-0.240545,0.0


The output looks just like the usual tsfresh output. If we compare it with the 'regular' linear_trend feature calculator, we can see that the intercept, p and R values are the same, as we'd expect.