# Feature extraction with the tsfresh transformer

In this tutorial, we show how to use aeon with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, so that the resulting feature table can be passed to any scikit-learn estimator.

## Preliminaries
tsfresh must be installed to run this notebook. If it is not installed yet, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh
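
Before running the remaining cells, it can help to confirm that tsfresh is importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `is_installed` is ours and is not part of aeon or tsfresh.

```python
from importlib.util import find_spec


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return find_spec(package_name) is not None


if not is_installed("tsfresh"):
    print("tsfresh not found; install it with: pip install --upgrade tsfresh")
```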

In [2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

from aeon.datasets import load_arrow_head, load_basic_motions
from aeon.transformations.collection.feature_based import TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/aeon-toolkit/aeon/blob/main/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head()
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)


In [4]:
X_train.head()

Unnamed: 0,dim_0
69,0 -1.7998 1 -1.7987 2 -1.7942 3 ...
103,0 -1.8091 1 -1.8067 2 -1.7866 3 ...
34,0 -2.0417 1 -2.0572 2 -2.0522 3 ...
14,0 -2.1888 1 -2.1855 2 -2.1765 3 ...
121,0 -1.9586 1 -1.9371 2 -1.8798 3 ...


In [5]:
# multi-class classification task (three classes)
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# "efficient" selects tsfresh's feature set without the most expensive calculators
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.05s/it]


Unnamed: 0,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_0__fourier_entropy__bins_2,dim_0__fourier_entropy__bins_3,dim_0__fourier_entropy__bins_5,dim_0__fourier_entropy__bins_10,dim_0__fourier_entropy__bins_100,dim_0__permutation_entropy__dimension_3__tau_1,dim_0__permutation_entropy__dimension_4__tau_1,dim_0__permutation_entropy__dimension_5__tau_1,dim_0__permutation_entropy__dimension_6__tau_1,dim_0__permutation_entropy__dimension_7__tau_1
0,0.0,0.0,0.0,1.0,-8e-05,249.998516,0.052357,-1e-06,-5e-06,-0.024066,...,0.046288,0.092513,0.092513,0.092513,0.250609,1.323194,1.819631,2.183824,2.46322,2.707387
1,0.0,0.0,1.0,1.0,-0.000525,250.000756,0.049118,0.0,-6e-06,-0.031622,...,0.046288,0.046288,0.092513,0.092513,0.184769,1.213529,1.668744,2.081159,2.418614,2.707518
2,0.0,0.0,0.0,1.0,-3.4e-05,249.998998,0.069971,8.4e-05,2.5e-05,0.01888,...,0.08151,0.092513,0.092513,0.138673,0.311663,1.116706,1.545256,1.889777,2.155644,2.374722
3,0.0,0.0,0.0,1.0,0.000202,249.999702,0.067601,-2e-06,-1e-05,0.38477,...,0.046288,0.092513,0.092513,0.204643,0.414263,1.323315,1.91533,2.406197,2.794719,3.117007
4,0.0,0.0,0.0,1.0,-0.000146,249.998674,0.050355,-4e-06,-4.6e-05,-0.045353,...,0.046288,0.092513,0.092513,0.092513,0.230801,1.173933,1.628543,2.003443,2.303091,2.559695
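
To make the column names above more concrete, here is a rough sketch that recomputes four of the listed features (`sum_values`, `abs_energy`, `mean_abs_change`, `mean_change`) with plain NumPy, following their usual definitions. The helper `manual_features` and the toy series are our own illustration, not part of aeon or tsfresh. In the table above, `sum_values` is close to 0 and `abs_energy` is close to 250, which is what one would expect if each series has been z-normalised.

```python
import numpy as np


def manual_features(X: np.ndarray) -> dict:
    """Compute a few tsfresh-style summary features per channel.

    X is assumed to have shape (n_cases, n_channels, n_timepoints).
    """
    features = {}
    for channel in range(X.shape[1]):
        series = X[:, channel, :]
        steps = np.diff(series, axis=1)  # differences between consecutive points
        prefix = f"dim_{channel}__"
        features[prefix + "sum_values"] = series.sum(axis=1)
        features[prefix + "abs_energy"] = (series**2).sum(axis=1)
        features[prefix + "mean_abs_change"] = np.abs(steps).mean(axis=1)
        features[prefix + "mean_change"] = steps.mean(axis=1)
    return features


# Toy collection: one case, one channel, four time points (made-up values).
X_toy = np.array([[[1.0, 2.0, 4.0, 7.0]]])
for name, values in manual_features(X_toy).items():
    print(name, values)
```

The real extractor computes many more features per channel; this sketch only mirrors the naming pattern `dim_<channel>__<feature>`.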


## Using tsfresh with aeon

In [7]:
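# chain feature extraction and a scikit-learn classifier into one estimator
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
# mean accuracy on the held-out test split
classifier.score(X_test, y_test)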

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:11<00:00,  2.21s/it]
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.45it/s]


0.8490566037735849
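
The score above comes from a single random split (`train_test_split` is called without a `random_state`), so it will vary between runs. One way to get a more stable estimate is cross-validation. The sketch below shows the idea on synthetic data with a cheap stand-in feature extractor so that it runs without tsfresh; `summary_features`, `X_demo` and `y_demo` are hypothetical names introduced here. In practice, the `FunctionTransformer` step would presumably be swapped for the `TSFreshFeatureExtractor` used above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def summary_features(X):
    """Stand-in extractor: mean, std, min and max of every channel."""
    X = np.asarray(X)
    stats = [X.mean(axis=2), X.std(axis=2), X.min(axis=2), X.max(axis=2)]
    return np.concatenate(stats, axis=1)  # shape (n_cases, 4 * n_channels)


# Synthetic two-class collection of shape (n_cases, n_channels, n_timepoints);
# class "1" is shifted upwards so the classes are easy to separate.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 1, 50))
y_demo = np.repeat(["0", "1"], 20)
X_demo[y_demo == "1"] += 2.0

pipe = make_pipeline(
    FunctionTransformer(summary_features),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
scores = cross_val_score(pipe, X_demo, y_demo, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because the transformer sits inside the pipeline, features are recomputed within each fold, which keeps the held-out cases out of the fitting step.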

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions()
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6, 100) (60,) (20, 6, 100) (20,)
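
The shape printed above follows aeon's `(n_cases, n_channels, n_timepoints)` layout for collections of equal-length series: 60 training cases, 6 channels, 100 time points each. As a small illustration, the sketch below indexes a placeholder array of the same shape; `X_demo` is a made-up stand-in filled with zeros, not the BasicMotions data.

```python
import numpy as np

# Placeholder collection with the same shape as the training split above.
X_demo = np.zeros((60, 6, 100))

n_cases, n_channels, n_timepoints = X_demo.shape
first_case = X_demo[0]      # all channels of one case
one_channel = X_demo[0, 2]  # a single channel of that case
print(n_cases, n_channels, n_timepoints)
print(first_case.shape, one_channel.shape)
```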


In [9]:
# first case of the multivariate input data: 6 channels x 100 time points
X_train[0]

array([[-3.9606000e-01, -3.9606000e-01, -2.6802200e-01,  3.5369800e-01,
         1.9509000e-02,  4.6675000e-02,  1.4739420e+00,  1.1125200e+00,
         7.8866500e-01,  1.3313740e+00,  2.1749090e+00,  2.1749090e+00,
         5.0339940e+00,  5.0339940e+00,  5.3231220e+00,  4.4124500e+00,
         1.9780440e+00,  2.3526440e+00,  6.2446770e+00,  1.9513924e+01,
         6.7642000e+00, -1.7881320e+00,  1.0757590e+00,  1.2332840e+00,
        -4.5358600e-01, -4.5358600e-01, -3.3842900e-01, -6.4972600e-01,
        -8.9578600e-01, -4.9298800e-01, -3.8119500e-01, -1.0570280e+00,
        -1.5661520e+00, -3.0054600e-01,  1.0001540e+00,  1.0001540e+00,
         2.9969570e+00,  1.9172228e+01,  6.2364740e+00,  6.2364740e+00,
         2.0156710e+00, -1.6590800e+00, -1.6590800e+00, -2.6860770e+00,
        -2.5598480e+00, -2.9207950e+00, -4.9697600e-01,  2.0360800e-01,
         1.8681700e-01,  2.1576900e-01,  3.2038200e-01, -5.3663300e-01,
        -1.0022830e+00, -1.7833800e-01,  4.4786190e+00,  3.40265

In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

Unnamed: 0,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_5__fourier_entropy__bins_5,dim_5__fourier_entropy__bins_10,dim_5__fourier_entropy__bins_100,dim_5__permutation_entropy__dimension_3__tau_1,dim_5__permutation_entropy__dimension_4__tau_1,dim_5__permutation_entropy__dimension_5__tau_1,dim_5__permutation_entropy__dimension_6__tau_1,dim_5__permutation_entropy__dimension_7__tau_1,dim_5__query_similarity_count__query_None__threshold_0.0,dim_5__mean_n_absolute_max__number_of_maxima_7
0,1.0,0.0,0.0,1.0,188.420744,2898.605554,2.760254,0.003944,0.0,0.25631,...,1.021704,1.616277,3.328862,1.491726,2.411503,3.303501,3.900955,4.167081,0.0,10.817097
1,1.0,0.0,0.0,1.0,307.637735,5948.915061,4.407277,0.17775,0.084208,0.722776,...,1.197552,1.78495,3.313297,1.724168,2.931721,3.927239,4.382343,4.499051,0.0,11.957401
2,1.0,1.0,0.0,1.0,335.029358,5296.984407,3.394429,0.001664,0.0,0.858254,...,0.80654,1.329515,2.997721,1.563712,2.55774,3.505548,4.037344,4.334874,0.0,10.854765
3,1.0,0.0,0.0,1.0,354.190525,5467.09735,3.464491,0.016783,0.0,1.083522,...,0.827677,1.441878,3.255335,1.621399,2.69462,3.630462,4.180166,4.384684,0.0,11.958162
4,1.0,0.0,0.0,1.0,97.414175,380.452882,1.180756,0.004439,-0.003649,0.45889,...,0.288342,0.288342,0.733435,1.361921,2.056203,2.707323,3.275544,3.775574,0.0,3.471516
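
For multivariate input, each feature is computed per channel, and the channel appears as the prefix of the column name (`dim_0` through `dim_5` above). A minimal sketch of how such names could be split back into channel and feature is shown below. The list holds a handful of names copied from the table, and `split_column` is a helper of our own; with the real output one would pass `list(Xt.columns)` instead.

```python
from collections import Counter

# A few column names copied from the table above.
columns = [
    "dim_0__sum_values",
    "dim_0__abs_energy",
    "dim_0__median",
    "dim_5__fourier_entropy__bins_5",
    "dim_5__permutation_entropy__dimension_3__tau_1",
    "dim_5__mean_n_absolute_max__number_of_maxima_7",
]


def split_column(name):
    """Split 'dim_0__abs_energy' into ('dim_0', 'abs_energy')."""
    channel, feature = name.split("__", 1)
    return channel, feature


# Count how many feature columns belong to each channel.
per_channel = Counter(split_column(name)[0] for name in columns)
print(per_channel)
```

Splitting only on the first `__` matters because feature parameters are themselves joined with double underscores, as in `fourier_entropy__bins_5`.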
