# BaseSeriesEstimator

The ``BaseSeriesEstimator`` class is a base class for estimators that take a single
series as input rather than a collection of time series (see
``BaseCollectionEstimator``). This notebook describes the major design issues to bear
in mind when using any class that inherits from ``BaseSeriesEstimator``. To use any
base estimator, all you need to understand is the meaning of ``axis`` and the
capability tags.

``BaseSeriesEstimator`` handles the preprocessing required for a single series before
it is used in a method such as ``fit``. Base classes apply this preprocessing by
calling the protected method ``_preprocess_series``. The key steps to note are:
1. Input data type should be a ``np.ndarray``, a ``pd.Series`` or a ``pd.DataFrame``.
2. The input data will be transformed into the type required by the estimator as
determined by the tag ``X_inner_type``.
3. If the estimator can only work with univariate time series
(``capability:multivariate`` set to False), then the input data will be converted to a
1D numpy array or a pandas Series.
4. If the estimator can handle multivariate time series, as determined
by the tag ``capability:multivariate``, then the input data will be stored in either a
2D numpy array or a pandas DataFrame.
5. If the data is multivariate, then the ``axis`` variable of the estimator controls
how it is interpreted. If ``axis == 0``, then each column is a time series and
each row is a time point, i.e. the shape of the data is ``(n_timepoints, n_channels)``.
If ``axis == 1``, then each row is a time series, i.e. the shape of the data
is ``(n_channels, n_timepoints)``.
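
To make the two layouts in step 5 concrete, the following sketch builds the same small multivariate series in both orientations. It uses plain NumPy only and does not call aeon; the variable names are illustrative assumptions.

```python
import numpy as np

n_timepoints, n_channels = 5, 3

# axis == 0 layout: rows are time points, columns are channels
X_axis0 = np.arange(n_timepoints * n_channels, dtype=float).reshape(
    n_timepoints, n_channels
)

# axis == 1 layout: rows are channels, columns are time points
X_axis1 = X_axis0.T

# The first channel is a column in one layout and a row in the other
first_channel_axis0 = X_axis0[:, 0]
first_channel_axis1 = X_axis1[0, :]

print(X_axis0.shape)  # (5, 3)
print(X_axis1.shape)  # (3, 5)
print(np.array_equal(first_channel_axis0, first_channel_axis1))  # True
```

Switching between the two conventions is a transpose, which is why the ``axis`` argument is enough to describe how a 2D input should be read.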

We demonstrate this with calls to private methods. This is purely to aid understanding;
private methods should not be called in practice.

In [28]:
# Univariate examples
import numpy as np
import pandas as pd
import pytest

from aeon.base import BaseSeriesEstimator

bs = BaseSeriesEstimator()
# By default, "capability:multivariate" is False, axis is 0 and
# X_inner_type is np.ndarray
d1 = np.random.random(size=(100))
# With this config, the output should always be an np.ndarray
# shape (100,)
d2 = bs._preprocess_series(d1, axis=0)
print(
    "1. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
# 2D numpy arrays of shape (m, 1) or (1, m) are converted to a 1D numpy array
# if multivariate is False
d1 = np.random.random(size=(1, 100))
d2 = bs._preprocess_series(d1, axis=0)
print(
    "2. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
d1 = pd.Series(np.random.random(size=(100)))
d2 = bs._preprocess_series(d1, axis=0)
print(
    "3. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
# Axis is irrelevant for univariate data
d2 = bs._preprocess_series(d1, axis=1)
print(
    "4. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
d1 = pd.DataFrame(np.random.random(size=(100, 1)))
d2 = bs._preprocess_series(d1, axis=0)
print(
    "5. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)

# Passing a multivariate array will raise an error
with pytest.raises(ValueError, match=r"Multivariate data not supported"):
    bs._check_X(np.random.random(size=(4, 100)))

1. Input shape =  (100,)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
2. Input shape =  (1, 100)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
3. Input shape =  (100,)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
4. Input shape =  (100,)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
5. Input shape =  (100, 1)  output type =  <class 'numpy.ndarray'>  output shape =  (100,)
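
The conversions in cases 2 and 5 can be pictured with plain NumPy. The sketch below is not aeon's implementation: ``to_1d_array`` is a hypothetical helper, written under the assumption that a single-channel 2D input is simply flattened, and it only illustrates the shapes reported above.

```python
import numpy as np
import pandas as pd


def to_1d_array(X):
    """Hypothetical helper: flatten a single-channel input to a 1D numpy array."""
    arr = np.asarray(X)
    if arr.ndim == 2 and min(arr.shape) == 1:
        # Shapes (m, 1) and (1, m) both hold one channel, so flatten them
        arr = arr.reshape(-1)
    if arr.ndim != 1:
        raise ValueError("Expected a univariate series")
    return arr


print(to_1d_array(np.zeros((1, 100))).shape)  # (100,)
print(to_1d_array(pd.Series(np.zeros(100))).shape)  # (100,)
print(to_1d_array(pd.DataFrame(np.zeros((100, 1)))).shape)  # (100,)
```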


In [25]:
# Multivariate examples
# Set tags
bs.set_tags(**{"capability:multivariate": True})
d1 = np.random.random(size=(4, 100))
# Axis 0 means each column is a time series (rows are time points)
d2 = bs._preprocess_series(d1, axis=0)
print(
    "1. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
# Axis 1 means each row is a time series (columns are time points)
d2 = bs._preprocess_series(d1, axis=1)
print(
    "2. Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    " output shape = ",
    d2.shape,
)
d1 = pd.DataFrame(d1)
d2 = bs._preprocess_series(d1, axis=1)
print(
    "2. Input type =",
    type(d1),
    "Input shape = ",
    d1.shape,
    " output type = ",
    type(d2),
    "output shape = ",
    d2.shape,
)

1. Input shape =  (4, 100)  output type =  <class 'numpy.ndarray'>  output shape =  (4, 100)
2. Input shape =  (4, 100)  output type =  <class 'numpy.ndarray'>  output shape =  (100, 4)
2. Input type = <class 'pandas.core.frame.DataFrame'> Input shape =  (4, 100)  output type =  <class 'numpy.ndarray'> output shape =  (100, 4)


If implementing a new estimator that extends ``BaseSeriesEstimator``, just set
``axis`` to the layout you want to work with by passing it to the
``BaseSeriesEstimator`` constructor. If your estimator can handle
multivariate series, set the ``capability:multivariate`` tag to True.
The data will then always be passed to your estimator in
``(n_channels, n_timepoints)`` if axis is 1, or ``(n_timepoints, n_channels)``
if axis is 0, either in a numpy array or a pandas DataFrame, depending on the
``X_inner_type`` tag. If a univariate series is passed, it will be passed in
``(1, n_timepoints)`` if axis is 1, or ``(n_timepoints, 1)`` if axis is 0.
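
As a rough illustration of what this guarantee means for an implementer, the sketch below reorients an input into the layout an estimator has asked for. ``orient_series`` is a hypothetical helper, not part of aeon, and it assumes the layout definitions given earlier: axis 0 is ``(n_timepoints, n_channels)`` and axis 1 is ``(n_channels, n_timepoints)``.

```python
import numpy as np


def orient_series(X, input_axis, estimator_axis):
    """Hypothetical helper: return X as a 2D array in the estimator's layout.

    Axis 0 means (n_timepoints, n_channels).
    Axis 1 means (n_channels, n_timepoints).
    """
    arr = np.asarray(X)
    if arr.ndim == 1:
        # A univariate series becomes one column (axis 0) or one row (axis 1)
        return arr.reshape(-1, 1) if estimator_axis == 0 else arr.reshape(1, -1)
    if arr.ndim != 2:
        raise ValueError("Expected a 1D or 2D input")
    # Transpose only when the caller's layout differs from the estimator's
    return arr if input_axis == estimator_axis else arr.T


X = np.zeros((4, 100))  # 4 channels, 100 time points, so the input axis is 1
print(orient_series(X, input_axis=1, estimator_axis=0).shape)  # (100, 4)
print(orient_series(X, input_axis=1, estimator_axis=1).shape)  # (4, 100)
print(orient_series(np.zeros(100), input_axis=0, estimator_axis=1).shape)  # (1, 100)
```

Because the base class takes care of this reorientation, the estimator's own logic can be written against a single fixed layout.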