# Transforming time series with aeon

Transformers are objects that transform data from one representation to another. `aeon`
contains time-series-specific transformers, which can be used in
pipelines in conjunction with other estimators.
Note: in deep learning, the term "transformer" refers to a specific family of neural
network architectures; that is not its meaning here. `aeon` transformers follow the `scikit-learn` design: they
have `fit` and `transform` methods, plus a `fit_transform` method that combines the two.
Some transformers also have an `inverse_transform` method that reverses the transformation.
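To make the `fit`/`transform`/`inverse_transform` pattern concrete, here is a minimal sketch of a toy transformer that standardises a series. The class `StandardiseSketch` is hypothetical and written with plain numpy; it is not an `aeon` class and skips the input checking and data conversion that `BaseTransformer` performs.

```python
import numpy as np


class StandardiseSketch:
    """Hypothetical toy transformer following the fit/transform design."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # learn state from the data; a trailing underscore marks fitted attributes
        self.mean_ = X.mean()
        self.std_ = X.std()  # assumes a non-constant series (std > 0)
        return self

    def transform(self, X, y=None):
        # apply the learned state without refitting
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_

    def fit_transform(self, X, y=None):
        # convenience method: fit, then transform the same data
        return self.fit(X, y).transform(X, y)

    def inverse_transform(self, X, y=None):
        # undo the transformation using the same fitted state
        return np.asarray(X, dtype=float) * self.std_ + self.mean_


series = np.array([112.0, 118.0, 132.0, 129.0, 121.0])
scaler = StandardiseSketch()
z = scaler.fit_transform(series)
restored = scaler.inverse_transform(z)  # recovers the original values
```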

`aeon` distinguishes different types of transformer, depending on the input type accepted
by the `fit` and `transform` methods. The main distinction is whether all series types
(e.g. single time series, collections of time series, hierarchical time series) are accepted
and implicitly converted, or whether only a single input type (such as collections) is accepted.

## Transformers

General transformers (in the packages `aeon/transformations`,
`aeon/transformations/bootstrap` and `aeon/transformations/hierarchical`) aim to
accept all input types, and will attempt to restructure the data or broadcast to multiple transformer
objects if necessary to fit the input data to the data structure used by the transformer. For example,
if the class accepts a single series but is given a collection of series, a separate instance of the
transformer is applied independently to each series. All transformers extend the base class
`BaseTransformer`. General transformations are mostly used for single series tasks such as forecasting
and annotation, and work best with `pd.Series` or `pd.DataFrame` input. Other valid data types are
accepted but are likely to be converted to another format internally; see the
[data structures](../datasets/data_structures.ipynb) notebook for guidance on how best to store
data with aeon.
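The broadcasting behaviour described above can be sketched as follows: when a transformer built for a single series receives a collection, an independent copy is fitted to each series. The class `DemeanSketch` and the helper `broadcast_fit_transform` are hypothetical, and `copy.deepcopy` is used here to create the per-series instances; `aeon`'s internal mechanism may differ.

```python
import copy

import numpy as np


class DemeanSketch:
    """Hypothetical single-series transformer: subtracts the series mean."""

    def fit(self, X, y=None):
        self.mean_ = float(np.mean(X))
        return self

    def transform(self, X, y=None):
        return np.asarray(X, dtype=float) - self.mean_


def broadcast_fit_transform(transformer, collection):
    """Fit an independent copy of `transformer` to each series in `collection`."""
    fitted, outputs = [], []
    for series in collection:
        # a fresh, unfitted copy per series, so no state is shared between series
        clone = copy.deepcopy(transformer)
        outputs.append(clone.fit(series).transform(series))
        fitted.append(clone)
    return fitted, outputs


# series in a collection may even have different lengths
collection = [np.array([1.0, 2.0, 3.0]), np.array([10.0, 20.0, 30.0, 40.0])]
fitted, outputs = broadcast_fit_transform(DemeanSketch(), collection)
```

Each fitted copy keeps its own `mean_`, which is what "applied independently to each series" means in practice.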

Transformers differ in terms of whether they convert time series into different time series
(series-to-series transformation), or whether they convert series into feature vector(s)
(series-to-vector transformation).

To illustrate the difference, we compare two single-series transformers with different output:

* the Box-Cox transformer `BoxCoxTransformer`, a series-to-series transformer using the
[Box Cox power transform](https://en.wikipedia.org/wiki/Power_transform#Box%E2%80%93Cox_transformation).
* the summary transformer `SummaryTransformer`, a series-to-vector transformer that
finds summary statistics such as the mean and standard deviation of each series.
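The difference in output shape can be sketched with plain numpy. The functions below are simplified stand-ins, not the `aeon` implementations: `box_cox` applies the Box-Cox transform for a fixed λ, $y^{(\lambda)} = (y^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $\log y$ for $\lambda = 0$, and returns a series of the same length, while `summarise` returns a fixed-length vector whatever the series length. The choice of statistics in `summarise` is illustrative only.

```python
import numpy as np


def box_cox(y, lam):
    """Series-to-series: Box-Cox transform of a strictly positive series."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam


def summarise(y):
    """Series-to-vector: a few summary statistics (illustrative choice)."""
    y = np.asarray(y, dtype=float)
    return np.array([y.mean(), y.std(), y.min(), y.max()])


short_series = np.array([1.0, 2.0, 4.0])
long_series = np.arange(1.0, 101.0)

# output length follows the input length
print(box_cox(short_series, 0).shape, box_cox(long_series, 0.5).shape)  # (3,) (100,)
# output length is fixed by the number of features
print(summarise(short_series).shape, summarise(long_series).shape)  # (4,) (4,)
```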


In [29]:
import warnings

warnings.filterwarnings("ignore")

In [30]:
from aeon.datasets import load_airline
from aeon.transformations.boxcox import BoxCoxTransformer
from aeon.transformations.summarize import SummaryTransformer
from aeon.visualisation import plot_series

boxcox_trans = BoxCoxTransformer()
summary_trans = SummaryTransformer()

# airline is a single time series stored in a pd.Series
airline = load_airline()

type(airline)

In [31]:
airline[:5]

In [32]:
plot_series(airline)

In [33]:
# this produces a pandas Series containing the transformed series
airline_bc = boxcox_trans.fit_transform(airline)
type(airline_bc)

In [34]:
airline_bc[:5]

In [35]:
plot_series(airline_bc)

In [36]:
# this produces a pandas.DataFrame containing the feature vector for our single series
airline_summary = summary_trans.fit_transform(airline)
type(airline_summary)

In [37]:
airline_summary

You can list all series-to-series and series-to-vector transformers by filtering on
the `output_data_type` tag. Please consult the API reference for details on each transformer.

In [38]:
from aeon.registry import all_estimators

all_estimators(
    "transformer",
    exclude_estimator_types="collection-transformer",
    filter_tags={
        "output_data_type": ["Series", "Collection"],
    },
    as_dataframe=True,
)

In [39]:
all_estimators(
    "transformer",
    exclude_estimator_types="collection-transformer",
    filter_tags={
        "output_data_type": "Tabular",
    },
    as_dataframe=True,
)

If your series is split into training and test data, you should call `fit` and
`transform` separately. For example, `BoxCoxTransformer` estimates the Box-Cox power
parameter λ (lambda) from the training data; with `method="mle"` it does so by maximum likelihood:

In [40]:
from aeon.forecasting.model_selection import temporal_train_test_split

train, test = temporal_train_test_split(airline)
boxcox = BoxCoxTransformer(method="mle")
test[:5]

Once the transformer is fitted on the training data, apply it to the test data with `transform`, which reuses the learned λ instead of re-estimating it:

In [41]:
# fit the transformer on the training data
boxcox.fit(train)
# apply to test data
test_new = boxcox.transform(test)
test_new[:5]
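The point of separating `fit` and `transform` is that λ is estimated once, from the training data only. A minimal numpy sketch, assuming λ is chosen by maximising the Box-Cox profile log-likelihood over a fixed grid; this is a simplification of the `method="mle"` option, the functions `boxcox_llf` and `fit_lambda` are hypothetical, and `aeon`'s optimiser may differ. The data are synthetic log-normal values, not the airline series.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive series for a fixed lambda."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam


def boxcox_llf(lam, y):
    """Box-Cox profile log-likelihood of a strictly positive series."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    return (lam - 1.0) * np.log(y).sum() - n / 2.0 * np.log(box_cox(y, lam).var())


def fit_lambda(y, grid=None):
    """Return the grid value of lambda with the highest log-likelihood."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 401)
    scores = [boxcox_llf(lam, y) for lam in grid]
    return float(grid[int(np.argmax(scores))])


rng = np.random.default_rng(0)
series = np.exp(rng.normal(5.0, 0.5, size=144))  # positive, right-skewed values
train_series, test_series = series[:108], series[108:]

lam = fit_lambda(train_series)  # "fit": estimate lambda on training data only
test_transformed = box_cox(test_series, lam)  # "transform": reuse that lambda
```

Re-estimating λ on the test data would leak information from the test set and make the train and test transforms inconsistent.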

Fitted model components of transformers can be found with the `get_fitted_params()`
method:

In [42]:
# returns a dict of the parameters learned during fit, such as the Box-Cox lambda
boxcox.get_fitted_params()
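Under the scikit-learn convention that `aeon` follows, attributes learned during `fit` end in a trailing underscore, while constructor parameters do not. The sketch below collects such attributes into a dict; the class `LambdaSketch` and the function `collect_fitted_params` are hypothetical and only approximate what `get_fitted_params` reports for simple estimators.

```python
class LambdaSketch:
    """Hypothetical transformer with one hyperparameter and one fitted attribute."""

    def __init__(self, method="mle"):
        self.method = method  # hyperparameter, set at construction

    def fit(self, X, y=None):
        self.lambda_ = 0.15  # placeholder for a value estimated from X
        return self


def collect_fitted_params(estimator):
    """Return attributes whose names end (but do not start) with an underscore."""
    return {
        name: value
        for name, value in vars(estimator).items()
        if name.endswith("_") and not name.startswith("_")
    }


est = LambdaSketch().fit([1.0, 2.0, 3.0])
print(collect_fitted_params(est))  # {'lambda_': 0.15}
```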

### Pipelines with transformers

Estimators for tasks such as forecasting are not compatible with `sklearn` pipelines, because their `fit` and `predict` methods
are used differently and require different inputs. They can still be combined with transformers into `aeon` pipelines. The easiest
way to do this is with `make_pipeline`. Pipelines can be a combination of
`BaseTransformer` objects, or a combination of `BaseTransformer` objects and an estimator such
as a forecaster, classifier or regressor:

In [43]:
from aeon.forecasting.naive import NaiveForecaster
from aeon.pipeline import make_pipeline
from aeon.transformations.difference import Differencer

pipe = make_pipeline(Differencer(), NaiveForecaster(strategy="last", sp=12))

# this constructs a TransformedTargetForecaster, which is also a forecaster
pipe

In [44]:
# this is a forecaster with the same interface as NaiveForecaster
# first applies differencer, then naive forecaster, then inverts differencing
pipe.fit(airline, fh=list(range(1, 13)))
forecast = pipe.predict()
print(forecast)

In [45]:
plot_series(airline, forecast)
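The comments in the forecasting cell describe three steps: difference, forecast, invert. A minimal numpy sketch of those steps, assuming first-order (lag 1) differencing and a seasonal last-value forecast with horizon at most `sp`; the function `difference_naive_forecast` is hypothetical and omits the index handling that `aeon` performs. The test series is synthetic.

```python
import numpy as np


def difference_naive_forecast(y, horizon, sp=12):
    """Differencing + seasonal last-value forecast + inverse differencing."""
    y = np.asarray(y, dtype=float)
    if horizon > sp:
        raise ValueError("this sketch only supports horizon <= sp")
    # 1. transform: first differences, d[t] = y[t + 1] - y[t]
    d = np.diff(y)
    # 2. forecast: repeat the last observed season of differences
    d_forecast = d[len(d) - sp : len(d) - sp + horizon]
    # 3. inverse transform: accumulate forecast differences from the last value
    return y[-1] + np.cumsum(d_forecast)


# a series with a linear trend plus a period-12 pattern
t = np.arange(48)
y = 100 + 2.0 * t + 10 * np.sin(2 * np.pi * t / 12)
sketch_forecast = difference_naive_forecast(y, horizon=12, sp=12)
```

For this synthetic series the differences are exactly periodic, so the sketch reproduces the true continuation; differencing removes the trend that a seasonal naive forecast alone could not follow.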

In [46]:
from aeon.transformations.summarize import SummaryTransformer

trans_pipe = make_pipeline(Differencer(), SummaryTransformer())

trans_pipe

In [47]:
trans_pipe.fit_transform(airline)
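A pipeline of transformers applies each step's `fit_transform` in turn, feeding the output of one step into the next. A minimal sketch with two hypothetical classes, `DiffStep` and `SummaryStep`, standing in for `Differencer` and `SummaryTransformer` (the real classes have more options and return pandas objects):

```python
import numpy as np


class DiffStep:
    """Hypothetical stand-in for a differencing transformer."""

    def fit_transform(self, X, y=None):
        return np.diff(np.asarray(X, dtype=float))


class SummaryStep:
    """Hypothetical stand-in for a summary-statistics transformer."""

    def fit_transform(self, X, y=None):
        X = np.asarray(X, dtype=float)
        return np.array([X.mean(), X.std(), X.min(), X.max()])


def chain_fit_transform(steps, X, y=None):
    """Apply each step's fit_transform to the previous step's output."""
    for step in steps:
        X = step.fit_transform(X, y)
    return X


features = chain_fit_transform([DiffStep(), SummaryStep()], [1.0, 3.0, 6.0, 10.0])
```

Because the last step is series-to-vector, the pipeline as a whole is series-to-vector.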

## Collection Transformers

Collection transformers inherit from `BaseCollectionTransformer`, itself a subclass
of `BaseTransformer`. Collection transformers differ from the other transformers in
`aeon` in that they only accept collections of series, and they are less likely to
transform each series independently. A `BaseCollectionTransformer` works best with the same
data structures used by clusterers, regressors and classifiers: a 3D numpy array of shape
`(n_cases, n_channels, n_timepoints)` for equal length series, or a list of `n_cases` 2D numpy
arrays of shape `(n_channels, n_timepoints)` for unequal length series.
Like before, other valid collection input types can be used.
See the [data storage notebook](../datasets/data_structures.ipynb) for more
details.


In [48]:
from aeon.datasets import load_arrow_head, load_basic_motions, load_covid_3month

# univariate classification
arrows, arrows_labels = load_arrow_head()
# multivariate classification
motions, motions_labels = load_basic_motions()
# univariate regression
covid, covid_response = load_covid_3month()

print("Arrows shape (n_cases, n_channels, n_timepoints) = ", arrows.shape)
print("Motions shape (n_cases, n_channels, n_timepoints) = ", motions.shape)
print("Covid shape (n_cases, n_channels, n_timepoints) = ", covid.shape)
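Besides loading datasets, both collection formats can be built directly with numpy. The sizes and lengths below are arbitrary illustrative values filled with random numbers:

```python
import numpy as np

rng = np.random.default_rng(42)

# equal length: one 3D array of shape (n_cases, n_channels, n_timepoints)
n_cases, n_channels, n_timepoints = 5, 3, 100
equal_length = rng.normal(size=(n_cases, n_channels, n_timepoints))

# unequal length: a list of n_cases 2D arrays, each (n_channels, n_timepoints_i)
lengths = [80, 95, 100, 60, 120]
unequal_length = [rng.normal(size=(n_channels, m)) for m in lengths]

print(equal_length.shape)  # (5, 3, 100)
print([x.shape for x in unequal_length])  # [(3, 80), (3, 95), (3, 100), (3, 60), (3, 120)]
```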

Collection transformers can also be series-to-series or series-to-vector. Most collection transformers
map a collection of $n$ series to a collection of $n$ series or $n$
vectors. For example, `Catch22` transforms each channel of each series into 22 summary features.


In [49]:
from aeon.transformations.collection.feature_based import Catch22

c22 = Catch22()
t = c22.fit_transform(arrows)
t.shape
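The shape arithmetic for a transform that computes $f$ features on each channel independently can be sketched as follows: a collection of shape `(n_cases, n_channels, n_timepoints)` maps to `(n_cases, n_channels * f)`. The three features used here (mean, standard deviation, maximum) are a hypothetical stand-in for the 22 `Catch22` features, and the channel-major column order is an assumption of this sketch.

```python
import numpy as np


def per_channel_features(X):
    """Map (n_cases, n_channels, n_timepoints) to (n_cases, n_channels * 3)."""
    X = np.asarray(X, dtype=float)
    # compute each feature along the time axis: result (n_cases, n_channels, 3)
    feats = np.stack([X.mean(axis=2), X.std(axis=2), X.max(axis=2)], axis=2)
    # flatten channel and feature axes into one feature vector per case
    return feats.reshape(X.shape[0], -1)


rng = np.random.default_rng(0)
univariate = rng.normal(size=(10, 1, 50))
multivariate = rng.normal(size=(10, 6, 50))

print(per_channel_features(univariate).shape)  # (10, 3)
print(per_channel_features(multivariate).shape)  # (10, 18)
```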

Series-to-series transformers transform each series into a different series, which may
have a different number of channels and/or a different length. For example,
`ElbowClassPairwise` performs supervised channel selection to reduce
dimensionality. In the example below, it selects the best two channels from BasicMotions.

In [50]:
from aeon.transformations.collection import ElbowClassPairwise

ecp = ElbowClassPairwise()
t2 = ecp.fit_transform(motions, motions_labels)
t2.shape
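Supervised channel selection can be illustrated with a deliberately simple scoring rule. This is not the `ElbowClassPairwise` algorithm: the hypothetical `select_channels` below scores each channel by the largest Euclidean distance between class-mean series over all pairs of classes and keeps a fixed number `k`, whereas `ElbowClassPairwise`, as its name suggests, combines class-pairwise distances with an elbow rule to decide how many channels to keep. The data are synthetic.

```python
from itertools import combinations

import numpy as np


def select_channels(X, y, k=2):
    """Return indices of the k channels whose class means are furthest apart."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    # class-mean series for every class: shape (n_classes, n_channels, n_timepoints)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    scores = np.zeros(X.shape[1])
    for i, j in combinations(range(len(classes)), 2):
        # Euclidean distance between two class means, computed per channel
        dist = np.linalg.norm(centroids[i] - centroids[j], axis=1)
        scores = np.maximum(scores, dist)
    # highest-scoring channels first
    return np.argsort(scores)[::-1][:k]


rng = np.random.default_rng(1)
X = rng.normal(scale=0.1, size=(40, 6, 30))
y = np.repeat(["a", "b"], 20)
X[y == "b", 4, :] += 1.0  # make channel 4 strongly informative
X[y == "b", 1, :] += 0.5  # channel 1 is informative, but less so
selected = select_channels(X, y, k=2)
X_reduced = X[:, selected, :]  # a series-to-series output with fewer channels
```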

Series-to-vector collection transformers return array-like objects of shape `(n_cases, n_features)`, so
they can be used with sklearn classifiers or regressors directly or in a pipeline. The following two cells are equivalent.

In [51]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

arrows_train, arrows_test, y_train, y_test = train_test_split(
    arrows, arrows_labels, test_size=0.33
)

c22 = Catch22()
c22_train = c22.fit_transform(arrows_train, y_train)

lr = LogisticRegression()
lr.fit(c22_train, y_train)

# transform only needs the series; test labels are not passed to the transformer
c22_test = c22.transform(arrows_test)
preds = lr.predict(c22_test)
accuracy_score(y_test, preds)

In [52]:
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[("catch22", c22), ("logistic", lr)])
pipe.fit(arrows_train, y_train)
preds = pipe.predict(arrows_test)
accuracy_score(y_test, preds)

Series-to-series collection transformers can be used in an sklearn pipeline with an
`aeon` classifier or regressor:

In [53]:
from sklearn.metrics import mean_squared_error

from aeon.regression.distance_based import KNeighborsTimeSeriesRegressor

knn = KNeighborsTimeSeriesRegressor(distance="euclidean")
pipe = Pipeline(steps=[("ECP", ecp), ("knn", knn)])
covid_train, covid_test, y_train, y_test = train_test_split(
    covid, covid_response, test_size=0.75
)
pipe.fit(covid_train, y_train)

In [54]:
preds = pipe.predict(covid_test)
mean_squared_error(y_test, preds)
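The regressor at the end of this pipeline predicts from the nearest training series. A minimal numpy sketch of k-nearest-neighbour regression with Euclidean distance on 3D collections: the function `knn_regress` is hypothetical, averages the targets of the `k` nearest neighbours with equal weight, and ignores the other distance functions that `KNeighborsTimeSeriesRegressor` supports. The toy data are made up for illustration.

```python
import numpy as np


def knn_regress(X_train, y_train, X_test, k=1):
    """Predict each test case as the mean target of its k nearest training cases."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # flatten (n_channels, n_timepoints) so each case is one vector
    a = X_train.reshape(len(X_train), -1)
    b = X_test.reshape(len(X_test), -1)
    # pairwise Euclidean distances, shape (n_test, n_train)
    dists = np.linalg.norm(b[:, None, :] - a[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


X_train = np.array([[[0.0, 0.0, 0.0]], [[1.0, 1.0, 1.0]], [[5.0, 5.0, 5.0]]])
y_train = np.array([0.0, 1.0, 5.0])
X_test = np.array([[[0.9, 1.1, 1.0]], [[4.0, 4.0, 4.0]]])

print(knn_regress(X_train, y_train, X_test, k=1))  # [1. 5.]
print(knn_regress(X_train, y_train, X_test, k=2))
```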

### Wrapping as a general transformer

Collection transformers can be wrapped with `CollectionToSeriesWrapper` so that they have the same functionality as a `BaseTransformer`, for example accepting a single series.

In [55]:
from aeon.transformations.collection import CollectionToSeriesWrapper

c22 = Catch22()
wrapper = CollectionToSeriesWrapper(c22)  # wrap transformer to accept single series

wrapper.fit_transform(airline)
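The idea behind wrapping can be sketched as follows: a single series is reshaped into a collection containing one case, passed to the collection transformer, and the result is returned. The classes `MeanPerChannel` and `SeriesWrapperSketch` are hypothetical; `CollectionToSeriesWrapper` also handles `aeon`'s input conversion and output types, which this sketch omits.

```python
import numpy as np


class MeanPerChannel:
    """Hypothetical collection transformer: mean of each channel of each case."""

    def fit_transform(self, X, y=None):
        X = np.asarray(X, dtype=float)  # (n_cases, n_channels, n_timepoints)
        return X.mean(axis=2)  # (n_cases, n_channels)


class SeriesWrapperSketch:
    """Let a collection transformer accept a single univariate series."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit_transform(self, series, y=None):
        series = np.asarray(series, dtype=float)
        # a single series becomes a collection of one case with one channel
        collection = series.reshape(1, 1, -1)
        return self.transformer.fit_transform(collection, y)


wrapped = SeriesWrapperSketch(MeanPerChannel())
out = wrapped.fit_transform([112.0, 118.0, 132.0, 129.0])
print(out.shape)  # (1, 1)
```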