# Overview of the base class structure

`aeon` uses a core inheritance hierarchy of classes across the toolkit, with specialised subclasses in each module. The basic class hierarchy is shown in the following diagram.

<img src="img/aeon_uml_simple.drawio.png" alt="Basic class hierarchy">


## Scikit-learn `BaseEstimator` and aeon `BaseAeonEstimator`

To make sense of this, we break it down from the top.
Everything inherits from sklearn's `BaseEstimator`, which mainly handles getting and setting parameters through the `get_params` and `set_params` methods. These methods are used when estimators interact with other classes such as [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV), and are also used in aeon's `ComposableEstimatorMixin`, which we'll talk about later.
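
To make the parameter mechanism concrete, here is a minimal sketch of the idea in plain Python. `MiniBaseEstimator`, `ToyForest` and `candidates` are hypothetical names written for this illustration, not sklearn or aeon code. The sketch assumes only that constructor arguments are stored unchanged as attributes of the same name, which is the convention that makes `get_params` and `set_params` work.

```python
import inspect


class MiniBaseEstimator:
    """Hypothetical, heavily simplified stand-in for sklearn's BaseEstimator."""

    def get_params(self):
        # Parameter names are read from the constructor signature.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(
                    f"Invalid parameter {name!r} for {type(self).__name__}"
                )
            setattr(self, name, value)
        return self


class ToyForest(MiniBaseEstimator):
    """Hypothetical estimator: __init__ only stores its arguments."""

    def __init__(self, n_estimators=200, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth


def candidates(estimator, param_name, values):
    """Build one re-configured copy per value, as a parameter search might."""
    for value in values:
        copy = type(estimator)(**estimator.get_params())
        yield copy.set_params(**{param_name: value})


forest = ToyForest(max_depth=3)
print(forest.get_params())
print([c.n_estimators for c in candidates(forest, "n_estimators", [10, 50])])
```

Because every candidate is rebuilt from `get_params()`, a search utility can vary one setting without knowing anything else about the estimator.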

Then we have aeon's `BaseAeonEstimator` class. This class handles the following for all aeon estimators:
- management of tags: setting, getting, interaction with sklearn's tags, etc.
- cloning and resetting of the estimator
- creation of test instances using test parameters specified by each estimator. For example, this is used to define fast-running estimators (e.g. a forest classifier with only 2 trees) for the CI/CD pipelines.
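
The three responsibilities listed above can be sketched in a few lines of plain Python. This is an illustration only: `MiniAeonEstimator`, `ToyForestClassifier`, `_test_params` and `create_test_instance` are hypothetical names, and the real `BaseAeonEstimator` is more involved. The tag names are used here as assumed examples.

```python
import inspect


class MiniAeonEstimator:
    """Hypothetical sketch: tags, cloning/resetting, and test instances."""

    _tags = {"capability:multivariate": False, "capability:unequal_length": False}
    _test_params = {}  # cheap settings a subclass can provide for CI runs

    def __init__(self):
        self._dynamic_tags = {}

    def get_params(self):
        signature = inspect.signature(type(self).__init__)
        return {n: getattr(self, n) for n in signature.parameters if n != "self"}

    def get_tag(self, name):
        # Class-level tags are merged from base class to subclass,
        # then overridden by tags set on this instance.
        merged = {}
        for klass in reversed(type(self).__mro__):
            merged.update(vars(klass).get("_tags", {}))
        merged.update(self._dynamic_tags)
        return merged[name]

    def set_tags(self, **tags):
        self._dynamic_tags.update(tags)
        return self

    def clone(self):
        # A new, unfitted estimator with the same constructor parameters.
        return type(self)(**self.get_params())

    def reset(self):
        # Drop everything learned during fit, keep the constructor parameters.
        params = self.get_params()
        self.__dict__.clear()
        self.__init__(**params)
        return self

    @classmethod
    def create_test_instance(cls):
        return cls(**cls._test_params)


class ToyForestClassifier(MiniAeonEstimator):
    _tags = {"capability:multivariate": True}
    _test_params = {"n_estimators": 2}  # fast to run in a test suite

    def __init__(self, n_estimators=200):
        self.n_estimators = n_estimators
        super().__init__()

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        return self


fast = ToyForestClassifier.create_test_instance()
fast.fit([[0.0, 1.0], [1.0, 0.0]], ["a", "b"])
fresh = fast.clone()
print(fast.n_estimators, fast.get_tag("capability:multivariate"))
print(hasattr(fast, "classes_"), hasattr(fresh, "classes_"))
```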

#### A word on aeon's estimator tag system
Tags in aeon serve several purposes, such as displaying estimator capabilities in the documentation and selecting which tests to run based on each estimator's capabilities. You can check [all existing tags in aeon](https://github.com/aeon-toolkit/aeon/blob/main/aeon/utils/tags/_tags.py) and the [developer documentation on the testing framework](https://www.aeon-toolkit.org/en/stable/developer_guide/testing.html#) to learn more about how we use tags.

## `BaseCollectionEstimator` and `BaseSeriesEstimator`

We distinguish between two types of inputs for aeon estimators, series and collections:
- Series represent a single time series in a 2D format `(n_channels, n_timepoints)`. Some estimators can also use a 1D format `(n_timepoints)` when they don't support multivariate series. Series estimators also have an `axis` parameter, which allows the input shape to be transposed so that the 2D format becomes `(n_timepoints, n_channels)` instead.
- Collections represent a set of time series in a 3D format `(n_samples, n_channels, n_timepoints)`. Again, this can sometimes be represented in a 2D format `(n_samples, n_timepoints)` for univariate estimators. Preferably, this should be avoided to prevent ambiguity about the meaning of the axes and confusion with 2D single series.
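
The shapes described above can be illustrated with nested lists. This is a sketch for intuition only; in practice the inputs would typically be NumPy arrays, and `shape` and `transpose` are hypothetical helpers written for this example.

```python
def shape(x):
    """Return the shape of a regular nested list, e.g. (2, 4)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def transpose(series_2d):
    """Swap the two axes of a 2D series."""
    return [list(column) for column in zip(*series_2d)]


# A single series with 2 channels and 4 time points: (n_channels, n_timepoints)
series = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

# The same series with the axes swapped: (n_timepoints, n_channels)
series_transposed = transpose(series)

# A univariate series in 1D format: (n_timepoints,)
univariate_series = [1.0, 2.0, 3.0, 4.0]

# A collection of 3 series: (n_samples, n_channels, n_timepoints)
collection = [series, series, series]

# A 2D array is ambiguous on its own: 3 channels of one series,
# or 3 univariate series of a collection?
ambiguous = [univariate_series, univariate_series, univariate_series]

print(shape(series), shape(series_transposed), shape(univariate_series))
print(shape(collection), shape(ambiguous))
```

The last case shows why the 3D format is preferred for collections: nothing in a `(3, 4)` array says whether the first axis counts samples or channels.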

For example, going back to the base class diagram, `BaseClassifier` inherits from `BaseCollectionEstimator`. This means that during `fit` and `predict`, all estimators inheriting from `BaseClassifier` take time series collections as input.


## Collection base estimators

The `BaseCollectionEstimator` defines methods to check the shape of the input, extract metadata (e.g. whether the collection is multivariate) and check compatibility of the input against the tags of the estimator. For example, consider the following:

```python
from aeon.classification.dictionary_based import TemporalDictionaryEnsemble
from aeon.testing.data_generation import make_example_3d_numpy_list

# TDE does not support unequal length collections
X_unequal, y_unequal = make_example_3d_numpy_list()
try:
    TemporalDictionaryEnsemble().fit(X_unequal, y_unequal)
except ValueError as e:
    print(e)
```

```
Data seen by instance of TemporalDictionaryEnsemble has unequal length series, but TemporalDictionaryEnsemble cannot handle these characteristics.
```


What happens here is that `TemporalDictionaryEnsemble` inherits from `BaseClassifier`, which itself inherits from `BaseCollectionEstimator`. During `fit` and `predict`, `BaseClassifier` calls `_preprocess_collection`, a function defined in `BaseCollectionEstimator`. This function extracts the input metadata (whether it is multivariate, of unequal length, etc.) and compares it against the tags of `TemporalDictionaryEnsemble`. These state that the estimator does not support unequal length collections, and hence an exception is raised.
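
The check described above can be sketched as two small functions. This is not aeon's implementation: `extract_metadata` and `check_capabilities` are hypothetical helpers, the collection is a plain list of series (each a list of channels), and the tag names and error wording are assumptions loosely modelled on the message shown earlier.

```python
def extract_metadata(X):
    """Collect the characteristics of a collection.

    X is a list of series; each series is a list of channels (lists of floats).
    """
    lengths = {len(series[0]) for series in X}
    return {
        "multivariate": any(len(series) > 1 for series in X),
        "unequal_length": len(lengths) > 1,
    }


def check_capabilities(metadata, tags, name):
    """Raise if the data has a characteristic the estimator's tags do not allow."""
    problems = [
        key.replace("_", " ")
        for key, present in metadata.items()
        if present and not tags.get(f"capability:{key}", False)
    ]
    if problems:
        raise ValueError(
            f"Data seen by instance of {name} has {', '.join(problems)} series, "
            f"but {name} cannot handle these characteristics."
        )


# Assumed tags for an estimator that handles multivariate but not unequal length
toy_tags = {"capability:multivariate": True, "capability:unequal_length": False}

X_equal = [[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]]
X_unequal = [[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0, 7.0, 8.0]]]

check_capabilities(extract_metadata(X_equal), toy_tags, "ToyClassifier")  # passes

try:
    check_capabilities(extract_metadata(X_unequal), toy_tags, "ToyClassifier")
    message = None
except ValueError as e:
    message = str(e)
print(message)
```

Keeping the check in the shared base class means each estimator only declares what it supports through its tags, and never repeats the validation logic.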

### `BaseClassifier` (aeon.classification)

This is the base class for all classifiers. It uses the standard `fit`, `predict` and `predict_proba` structure from `sklearn`. `fit` and `predict` call the abstract methods `_fit` and `_predict`, which are implemented in the subclass to define the classification algorithm. All of the common format checking and conversion is done using final methods defined in `BaseCollectionEstimator`.
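
The split between public `fit`/`predict` and abstract `_fit`/`_predict` is the template method pattern. The following is a minimal sketch of that pattern, not aeon's code: `MiniBaseClassifier`, `MajorityClassifier` and `_check_X` are hypothetical, and the shared checks are reduced to a single emptiness test.

```python
from abc import ABC, abstractmethod
from collections import Counter


class MiniBaseClassifier(ABC):
    """Hypothetical base class: shared checks live here, algorithms in subclasses."""

    def fit(self, X, y):
        X = self._check_X(X)
        if len(X) != len(y):
            raise ValueError("X and y must contain the same number of cases")
        self.classes_ = sorted(set(y))
        self._fit(X, y)
        self.is_fitted_ = True
        return self

    def predict(self, X):
        if not getattr(self, "is_fitted_", False):
            raise RuntimeError("Call fit before predict")
        return self._predict(self._check_X(X))

    def _check_X(self, X):
        # Stand-in for the shared format checking and conversion.
        if len(X) == 0:
            raise ValueError("X must contain at least one series")
        return X

    @abstractmethod
    def _fit(self, X, y):
        """Learn from a validated collection."""

    @abstractmethod
    def _predict(self, X):
        """Predict one label per series in a validated collection."""


class MajorityClassifier(MiniBaseClassifier):
    """Toy algorithm: always predict the most frequent training label."""

    def _fit(self, X, y):
        self.majority_ = Counter(y).most_common(1)[0][0]

    def _predict(self, X):
        return [self.majority_ for _ in X]


X_train = [[[0.0, 1.0]], [[1.0, 0.0]], [[0.5, 0.5]]]
clf = MajorityClassifier().fit(X_train, ["a", "b", "a"])
predictions = clf.predict([[[0.2, 0.8]], [[0.9, 0.1]]])
print(clf.classes_, predictions)
```

A subclass author only writes the algorithm; input validation and bookkeeping run the same way for every classifier.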

### `BaseRegressor` (aeon.regression)

`BaseRegressor` has the same structure as `BaseClassifier`, although it has no `predict_proba` method. The tests on `y` are also different.

### `BaseClusterer` (aeon.clustering)

`BaseClusterer` also has `fit` and `predict`, but does not take input y. It does include `predict_proba`.

### `BaseCollectionTransformer` (aeon.transformations.collection)

The `BaseCollectionTransformer` was introduced to differentiate transformers that work on a single series from those that work on collections. Part of the motivation was to work around a lot of legacy code in `BaseTransformer` that performs a large number of conversion checks that are unnecessary for collections. Rather than `fit` and `predict`, it implements `fit`, `transform` and `fit_transform`.
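
As a rough sketch of the `fit`, `transform` and `fit_transform` interface, the toy transformer below learns a scale during `fit` and applies it in `transform`. `MiniCollectionTransformer` and `MaxAbsScale` are hypothetical names, and the `_fit`/`_transform` hooks are an assumption made by analogy with the classifier structure above.

```python
from abc import ABC, abstractmethod


class MiniCollectionTransformer(ABC):
    """Hypothetical base class for transformers that work on collections."""

    def fit(self, X, y=None):
        self._fit(X, y)
        return self

    def transform(self, X):
        return self._transform(X)

    def fit_transform(self, X, y=None):
        # Convenience: learn from X, then transform the same X.
        return self.fit(X, y).transform(X)

    @abstractmethod
    def _fit(self, X, y=None):
        """Learn any state needed by the transform."""

    @abstractmethod
    def _transform(self, X):
        """Return a transformed collection."""


class MaxAbsScale(MiniCollectionTransformer):
    """Toy transformer: divide by the largest absolute value seen in fit."""

    def _fit(self, X, y=None):
        self.scale_ = max(
            abs(v) for series in X for channel in series for v in channel
        )

    def _transform(self, X):
        return [
            [[v / self.scale_ for v in channel] for channel in series]
            for series in X
        ]


X = [[[1.0, -2.0]], [[4.0, 2.0]]]  # (n_samples, n_channels, n_timepoints)
X_scaled = MaxAbsScale().fit_transform(X)
print(X_scaled)
```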

### `BaseCollectionAnomalyDetector`


## Series base estimators
### `BaseForecaster`
### `BaseSegmenter`
### `BaseSeriesTransformer`
### `BaseSeriesAnomalyDetector`
