# Storing and Loading data in `aeon`

Getting data into the correct data structure is fundamental. This notebook
describes the data structures used in `aeon` and links to more complex use cases.
`aeon` models two types of abstract data: single series and collections of series.

A single time series can be univariate (each observation is a single value) or
multivariate (each observation is a vector). We say that the length of the vector
(its dimension) is the number of channels, which in code we denote `n_channels`.
The length of the series is called the number of timepoints, or `n_timepoints` in
code. We generally store a single series in a 2D numpy array with shape
``(n_channels, n_timepoints)``. Series estimators should also work with a univariate
series stored as a 1D numpy array, which they convert to 2D internally.
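
As a minimal sketch of these shapes, the snippet below builds a multivariate and a
univariate series with plain numpy. The variable names and sizes are illustrative
assumptions, and the `reshape` call only mirrors the 1D to 2D conversion described
above; it is not `aeon`'s internal code.

```python
import numpy as np

# Illustrative sizes only (assumptions, not taken from any aeon dataset).
n_channels, n_timepoints = 3, 100
rng = np.random.default_rng(0)

# Multivariate series: one row per channel, one column per time point.
multivariate = rng.normal(size=(n_channels, n_timepoints))

# Univariate series stored as a 1D array of length n_timepoints.
univariate_1d = np.arange(n_timepoints, dtype=float)

# The equivalent 2D form has a single channel.
univariate_2d = univariate_1d.reshape(1, -1)

print(multivariate.shape)   # (3, 100)
print(univariate_1d.shape)  # (100,)
print(univariate_2d.shape)  # (1, 100)
```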

A collection is a group of time series, ideally assumed to be independent of
each other. Each series is also called a case or instance, and a collection contains
a number of cases, denoted ``n_cases``. A collection of equal length time series is
internally stored in a 3D numpy array of shape
``(n_cases, n_channels, n_timepoints)``. Collection estimators will also work with a
univariate collection of shape ``(n_cases, n_timepoints)``, which is converted
internally to ``(n_cases, 1, n_timepoints)``. Like ``scikit-learn``, we refer to a
collection of cases as ``X``. Supervised learners (e.g. classifiers and regressors)
require a target variable for training.
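
The collection shapes can be illustrated with plain numpy. This is a sketch under
stated assumptions: the sizes and the labels are made up, the name `y` for the target
simply follows the ``scikit-learn`` convention, and the added axis only imitates the
2D to 3D conversion described above.

```python
import numpy as np

# Illustrative sizes only (assumptions).
n_cases, n_timepoints = 10, 50

# Univariate collection stored as a 2D array: one row per case.
X_2d = np.zeros((n_cases, n_timepoints))

# Insert a channel axis of length 1 to get (n_cases, 1, n_timepoints).
X_3d = X_2d[:, np.newaxis, :]

# Hypothetical target variable: one class label per case.
y = np.array([0, 1] * (n_cases // 2))

print(X_2d.shape)  # (10, 50)
print(X_3d.shape)  # (10, 1, 50)
print(y.shape)     # (10,)
```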

## Why this shape?

We get asked this a lot. Packages like TensorFlow assume
``(n_cases, n_timepoints, n_channels)`` rather than
``(n_cases, n_channels, n_timepoints)``. tl;dr: it's a
decision we made early on because many estimators iterate over channels, and we are
not changing it now. It's simple to convert between the two by swapping the last two
axes.
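
A minimal sketch of the conversion, using only numpy and made-up sizes. Note that
the axes are swapped with `np.transpose` or `np.swapaxes`; calling `np.reshape` with
the target shape would reorder the values incorrectly.

```python
import numpy as np

# Small collection in aeon layout: (n_cases, n_channels, n_timepoints).
X_aeon = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Channels-last layout: (n_cases, n_timepoints, n_channels).
X_channels_last = np.transpose(X_aeon, (0, 2, 1))

# Swapping the last two axes again recovers the aeon layout.
X_back = np.swapaxes(X_channels_last, 1, 2)

print(X_aeon.shape)           # (2, 3, 4)
print(X_channels_last.shape)  # (2, 4, 3)
print(X_back.shape)           # (2, 3, 4)
```

Both calls return views rather than copies. If a downstream library needs contiguous
memory, wrap the result in `np.ascontiguousarray`.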

If the series in your collection have unequal lengths, we store them in a list of 2D
numpy arrays. See the [unequal length collections notebook](provided_unequal.ipynb).
`aeon` does not currently support single series with unequal length channels.
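
As a sketch of the list-of-arrays layout, the snippet below builds a small unequal
length collection with numpy. The lengths and values are illustrative assumptions;
every series has the same number of channels but its own number of timepoints.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sizes only (assumptions).
n_channels = 2
lengths = [30, 45, 38]

# One 2D array of shape (n_channels, n_timepoints) per case.
X_unequal = [rng.normal(size=(n_channels, n)) for n in lengths]

for i, series in enumerate(X_unequal):
    print(i, series.shape)
# 0 (2, 30)
# 1 (2, 45)
# 2 (2, 38)
```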

`aeon` ships with a range of datasets used in examples and testing. The [provided
datasets notebook](provided_data.ipynb) describes all these datasets.

`aeon` provides functions to load data directly from text files in several formats.
The [data loading notebook](data_loading.ipynb) describes the formats of our
supported files and how to load them into aeon data structures.

You can load data directly from the [Time Series Machine Learning
archive](https://timeseriesclassification.com/) and the
[Monash time series forecasting](https://forecastingdata.org/)
sites. More details are in the [load from web notebook](load_data_from_web.ipynb).
