# Feature extraction with tsfresh transformer

[tsfresh](https://tsfresh.readthedocs.io) is a tool for extracting summary features
from a collection of time series. It is an unsupervised transformation, and as such
can easily be used as a pipeline stage in classification, clustering and regression
in conjunction with a scikit-learn compatible estimator.

## Preliminaries
tsfresh must be installed before running this notebook. If you have not installed it yet, uncomment the line in the cell below:

In [1]:
# !pip install --upgrade tsfresh

In [17]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

from aeon.datasets import load_arrow_head, load_basic_motions
from aeon.transformations.collection.feature_based import (
    TSFreshFeatureExtractor,
    TSFreshRelevantFeatureExtractor,
)

## Example data set

We use the ArrowHead data from the [UCR TSC archive](https://timeseriesclassification.com)
as an example dataset. See the
[dataset notebook](https://github.com/aeon-toolkit/aeon/blob/main/examples/datasets/provided_data.ipynb)
for more details.

In [23]:
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(36, 1, 251) (36,) (175, 1, 251) (175,)


In [24]:
X_train[0]

array([[-1.9630089 , -1.9578249 , -1.9561449 , -1.9382889 , -1.8966569 ,
        -1.8698569 , -1.8387049 , -1.8122888 , -1.7364328 , -1.6733288 ,
        -1.6230727 , -1.5858727 , -1.5438407 , -1.4567846 , -1.3787206 ,
        -1.2924965 , -1.2169605 , -1.1089764 , -0.96868834, -0.83160026,
        -0.76030422, -0.59963213, -0.46625605, -0.30638396, -0.22684791,
        -0.08975983,  0.04137625,  0.23203876,  0.38728525,  0.41471247,
         0.51567412,  0.62614779,  0.72741025,  0.75345186,  0.78001988,
         0.83840391,  0.88817034,  0.91981996,  0.93344237,  0.9834616 ,
         1.04958   ,  1.1308921 ,  1.1898697 ,  1.2635882 ,  1.2976586 ,
         1.4139322 ,  1.4014314 ,  1.4443339 ,  1.4868475 ,  1.4448603 ,
         1.4448603 ,  1.4635131 ,  1.4635131 ,  1.4424827 ,  1.4822811 ,
         1.5221659 ,  1.5411515 ,  1.5181995 ,  1.4952875 ,  1.4739563 ,
         1.4479355 ,  1.3584794 ,  1.2685802 ,  1.2195033 ,  1.1558585 ,
         1.0848617 ,  0.97762959,  0.94645038,  0.9

## Using tsfresh to extract features

There are two versions of TSFresh feature extractors wrapped in aeon. The
first is the unsupervised `TSFreshFeatureExtractor`, which by default extracts
777 features per channel (the univariate ArrowHead output below has 777 columns).
See the documentation for parameter configuration.

In [25]:
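t = TSFreshFeatureExtractor()
Xt = t.fit_transform(X_train)  # fit on the training cases and transform them
Xt2 = t.transform(X_test)  # reuse the fitted transformer on the test cases
Xt.shape  # last expression in the cell, so this is the value displayed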

(36, 777)
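
To make the layout of that output concrete: each row of `Xt` holds one value per feature for one series. The sketch below is not tsfresh and not part of aeon. It hand-computes four simple summary statistics for a toy collection using only the standard library, and the function name `summarise` and the data values are illustrative assumptions.

```python
from statistics import mean, pstdev


def summarise(series):
    """Return a small, fixed-length feature vector for one univariate series."""
    return [mean(series), pstdev(series), min(series), max(series)]


# Toy collection: three univariate series of equal length (made-up values).
collection = [
    [0.0, 1.0, 2.0, 3.0],
    [5.0, 5.0, 5.0, 5.0],
    [-1.0, 1.0, -1.0, 1.0],
]

# One row per series, one column per feature: the same layout as Xt.
features = [summarise(s) for s in collection]
print(len(features), len(features[0]))
```

tsfresh follows the same pattern with a much larger catalogue of feature calculators, which is where the 777 columns come from.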

The second is `TSFreshRelevantFeatureExtractor` which uses `y` to select the most
relevant features.

In [26]:
t = TSFreshRelevantFeatureExtractor()
t.fit(X_train, y_train)
Xt = t.transform(X_test)
Xt.shape

(175, 147)
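
The idea of using `y` to filter features can be sketched without tsfresh. The snippet below is a deliberately simplified stand-in, not the procedure tsfresh applies (tsfresh bases its selection on statistical significance tests; see its documentation). It scores each feature column by the gap between class means relative to the overall spread and keeps columns above a threshold. The helper `separation_score`, the threshold `0.5` and the toy data are all assumptions for illustration.

```python
from statistics import mean, pstdev


def separation_score(column, labels):
    """Gap between the largest and smallest class mean, divided by overall spread."""
    spread = pstdev(column)
    if spread == 0:
        return 0.0  # a constant feature cannot separate the classes
    class_means = [
        mean(v for v, lab in zip(column, labels) if lab == c)
        for c in sorted(set(labels))
    ]
    return (max(class_means) - min(class_means)) / spread


# Toy feature table: 4 cases x 3 features (made-up values).
X_feat = [
    [0.1, 5.0, 1.0],
    [0.2, 5.0, 9.0],
    [0.9, 5.0, 1.1],
    [1.0, 5.0, 9.1],
]
y = ["a", "a", "b", "b"]

columns = list(zip(*X_feat))  # transpose rows into feature columns
scores = [separation_score(col, y) for col in columns]
keep = [j for j, s in enumerate(scores) if s > 0.5]  # threshold is arbitrary
X_selected = [[row[j] for j in keep] for row in X_feat]
print(keep)
```

As with `TSFreshRelevantFeatureExtractor`, the set of retained columns is decided from the labelled training data and then applied unchanged to new cases, which is why the transformed test set above has fewer columns than the full extraction.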

## Using tsfresh with aeon estimators

You can use the tsfresh transformer with any scikit-learn compatible estimator.


In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

0.8490566037735849

For convenience and consistency of use, aeon also provides a ready-made TSFresh
classifier, regressor and clusterer.

In [29]:
from aeon.classification.feature_based import TSFreshClassifier
from aeon.clustering.feature_based import TSFreshClusterer

cls = TSFreshClassifier()
clst = TSFreshClusterer()

cls.fit(X_train, y_train)
cls.score(X_test, y_test)
clst.fit(X_train)
print(cls.predict(X_test))
print(clst.predict(X_test))

['0' '2' '0' '0' '0' '0' '2' '2' '0' '0' '0' '0' '0' '2' '2' '2' '0' '2'
 '0' '0' '1' '0' '0' '2' '1' '2' '0' '0' '0' '2' '2' '0' '2' '1' '0' '0'
 '2' '0' '0' '0' '0' '2' '0' '2' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0'
 '0' '1' '0' '2' '0' '0' '0' '0' '2' '0' '0' '0' '0' '0' '0' '2' '1' '0'
 '0' '1' '1' '1' '1' '1' '1' '2' '1' '1' '2' '1' '1' '2' '0' '0' '1' '1'
 '1' '0' '1' '1' '1' '2' '2' '2' '1' '1' '2' '1' '1' '1' '1' '1' '2' '2'
 '1' '1' '1' '0' '2' '1' '2' '1' '2' '0' '1' '1' '1' '1' '2' '2' '2' '2'
 '1' '1' '1' '2' '2' '2' '1' '0' '2' '2' '2' '2' '2' '2' '1' '2' '2' '2'
 '2' '2' '2' '2' '2' '0' '2' '2' '2' '1' '2' '2' '2' '2' '2' '2' '2' '0'
 '2' '2' '2' '2' '2' '2' '1' '2' '2' '2' '2' '2' '2']
[7 6 0 0 5 0 0 4 1 6 4 7 0 0 4 5 5 2 2 5 2 0 4 2 3 2 7 6 2 0 0 6 4 5 0 0 7
 2 2 1 6 5 5 0 2 4 0 2 0 2 2 5 0 2 6 6 0 6 0 6 0 5 6 0 6 4 0 7 0 7 2 0 2 6
 2 0 2 7 0 2 2 6 2 6 0 7 2 0 2 5 0 2 0 2 2 0 5 3 0 2 0 6 5 5 0 0 2 6 5 7 0
 2 5 0 0 5 6 2 2 5 1 5 2 5 3 2 4 7 2 0 0 3 2 2 2 6 7 0 2 2 0 0 2

By default, the `TSFreshClassifier` uses the supervised
`TSFreshRelevantFeatureExtractor` and the scikit-learn `RandomForestClassifier`.
You can change this through the constructor:

In [None]:
from aeon.classification.sklearn import RotationForestClassifier

# Switch off the relevance filter and swap in a small rotation forest.
cls = TSFreshClassifier(
    relevant_feature_extractor=False, estimator=RotationForestClassifier(n_estimators=5)
)
cls.fit(X_train, y_train)
cls.score(X_test, y_test)

By default, the `TSFreshClusterer` uses the unsupervised `TSFreshFeatureExtractor`
and the `sklearn` clusterer `KMeans` with default parameters (which fits 8 clusters).
You can also configure this through the constructor.

In [30]:
from sklearn.cluster import KMeans

clst = TSFreshClusterer(estimator=KMeans(n_clusters=3))
clst.fit(X_train)
print(clst.predict(X_test))

[0 1 0 0 0 0 0 1 2 1 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 1 1 0 0 1 1 0 0 0 0
 1 1 2 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 1 1
 1 0 1 0 0 1 1 1 1 1 0 2 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 1 0 0 0 0 1 1 0 0 0
 1 0 0 0 0 1 1 1 0 2 0 1 0 1 1 1 0 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 0
 1 0 0 0 1 0 1 1 1 1 1 0 1 0 1 0 0 1 1 1 1 0 1 1 1 0 0]


The `TSFreshRegressor` uses the supervised
`TSFreshRelevantFeatureExtractor` and the scikit-learn `RandomForestRegressor`.

In [None]:
from aeon.datasets import load_covid_3month
from aeon.regression.feature_based import TSFreshRegressor

X, y = load_covid_3month()  # regression problem with a continuous target
reg = TSFreshRegressor()
reg.fit(X, y)

## TSFresh with multivariate time series data

All three estimators can be used with multivariate time series. The estimators
calculate the features on each channel independently then concatenate the results.
The full transform creates `777*n_channels` features.
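
The per-channel extraction and concatenation described here can be sketched with a toy extractor. This is a minimal illustration using only the standard library, not aeon's implementation. The helpers `summarise` and `transform_case`, the two features chosen and the data values are assumptions.

```python
from statistics import mean, pstdev


def summarise(series):
    """Feature vector for a single channel (two features, purely illustrative)."""
    return [mean(series), pstdev(series)]


def transform_case(case):
    """Extract features per channel, then concatenate them into one flat row."""
    row = []
    for channel in case:
        row.extend(summarise(channel))
    return row


# Toy collection shaped (n_cases=2, n_channels=3, n_timepoints=4).
X_toy = [
    [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0], [4.0, 2.0, 4.0, 2.0]],
    [[3.0, 2.0, 1.0, 0.0], [2.0, 2.0, 2.0, 2.0], [0.0, 6.0, 0.0, 6.0]],
]

Xt_toy = [transform_case(case) for case in X_toy]
n_features_per_channel = len(summarise(X_toy[0][0]))
n_channels = len(X_toy[0])

# Row width is n_features_per_channel * n_channels.
print(len(Xt_toy), len(Xt_toy[0]))
```

The same arithmetic applied to the six-channel BasicMotions data gives 777 * 6 = 4662 columns.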

In [31]:
X_train, y_train = load_basic_motions(split="train")
X_test, y_test = load_basic_motions(split="test")
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(40, 6, 100) (40,) (40, 6, 100) (40,)


In [33]:
tsfresh = TSFreshFeatureExtractor()
X = tsfresh.fit_transform(X_train, y_train)
X.shape

(40, 4662)

In [34]:
cls = TSFreshClassifier()
clst = TSFreshClusterer(estimator=KMeans(n_clusters=4))
cls.fit(X_train, y_train)
cls.score(X_test, y_test)
clst.fit(X_train)
print(cls.predict(X_test))

['standing' 'standing' 'standing' 'standing' 'standing' 'standing'
 'standing' 'standing' 'standing' 'standing' 'running' 'running' 'running'
 'running' 'running' 'running' 'running' 'running' 'running' 'running'
 'walking' 'walking' 'walking' 'walking' 'walking' 'walking' 'walking'
 'walking' 'walking' 'walking' 'badminton' 'badminton' 'badminton'
 'badminton' 'badminton' 'badminton' 'badminton' 'badminton' 'badminton'
 'badminton']
