# Data conversions in aeon

We recommend you follow the data storage conventions described in the [data storage notebook](examples/datasets/data_storage.ipynb), which can be summarised as follows. Use `pd.Series` or `pd.DataFrame` for forecasting. For classification, clustering and regression, use a 3D numpy array of shape `(n_cases, n_channels, n_timepoints)` if your collection of time series is equal length, or a list of length `n_cases` of 2D numpy arrays, each of shape `(n_channels, n_timepoints)`, if it is not. All the [data loaders](examples/datasets/data_loading.ipynb) use this format.
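As an illustration, the snippet below builds toy data in each recommended format using only numpy and pandas. The array sizes and series lengths are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

# Forecasting: a single univariate series stored as a pd.Series
y = pd.Series(np.arange(10, dtype=float))

# Equal-length collection: 3D array of shape (n_cases, n_channels, n_timepoints)
X_equal = np.random.random(size=(5, 2, 50))

# Unequal-length collection: list of n_cases 2D arrays, each (n_channels, n_timepoints_i)
X_unequal = [np.random.random(size=(2, length)) for length in (30, 45, 50)]

print(y.shape)  # (10,)
print(X_equal.shape)  # (5, 2, 50)
print([x.shape for x in X_unequal])  # [(2, 30), (2, 45), (2, 50)]
```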

However, `aeon` also provides a range of converters in the `datatypes` package. These are grouped into converters for single series and converters for collections of series.

# Series Converters

Single time series can be stored in the following data structures:

- `pd.Series`: a univariate time series
- `pd.DataFrame`: a univariate or multivariate time series
- `np.ndarray`: a 2D numpy array of shape `(n_timepoints, n_channels)`
- `xr.DataArray`: a univariate or multivariate time series
- `dask_series`: a Dask DataFrame holding a univariate or multivariate time series
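To make the list concrete, here is one two-channel series held as a `np.ndarray` and as a `pd.DataFrame`, plus its first channel as a univariate `pd.Series`. The column names are hypothetical, and `xr.DataArray` and Dask are left out so that the sketch only needs numpy and pandas.

```python
import numpy as np
import pandas as pd

values = np.random.random(size=(100, 2))  # 100 time points, 2 channels

# np.ndarray: shape (n_timepoints, n_channels)
arr = values

# pd.DataFrame: rows are time points, columns are channels (hypothetical names)
df = pd.DataFrame(values, columns=["channel_0", "channel_1"])

# pd.Series: univariate only, so keep a single channel
s = pd.Series(values[:, 0], name="channel_0")

print(arr.shape, df.shape, s.shape)  # (100, 2) (100, 2) (100,)
```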

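NOTE: the 2D numpy array representation is not consistent with that used in collections. A single series is stored as `(n_timepoints, n_channels)`, whereas each case in a collection is stored as `(n_channels, n_timepoints)`. This unfortunate difference is a result of legacy design and norms in different research fields. We recommend not using numpy arrays for forecasting.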
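The difference amounts to swapping the two axes. A minimal sketch of moving one series between the two layouts with plain numpy (no `aeon` converter is used or assumed):

```python
import numpy as np

# A single multivariate series in the series layout: (n_timepoints, n_channels)
series = np.random.random(size=(100, 3))

# The same data as a one-case collection: (n_cases, n_channels, n_timepoints)
collection = series.T[np.newaxis, :, :]

print(series.shape)  # (100, 3)
print(collection.shape)  # (1, 3, 100)

# Round trip back to the series layout
assert np.array_equal(collection[0].T, series)
```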

Conversion to and from these data structures is fairly straightforward. The converters in `aeon` are part of the legacy code base and are not likely to be maintained. The `convert` function wraps all of them, but we also show what happens under the hood.

```python
import numpy as np

from aeon.datatypes import convert

numpyarray = np.random.random(size=(100, 1))
series = convert(numpyarray, from_type="np.ndarray", to_type="xr.DataArray")
type(series)  # xarray.core.dataarray.DataArray
```

All the actual converter functions for series are in the module `aeon.datatypes._series._convert`. We stress that this is legacy code: `aeon` takes the view that the user should be responsible for getting the data into the best format for the estimators.
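If you take on that responsibility directly, most series conversions are one-liners in pandas and numpy. This is a sketch using standard pandas calls, not a re-implementation of the legacy converters; details such as column names or index types may differ from what the `aeon` converters produce.

```python
import numpy as np
import pandas as pd

values = np.random.random(size=(100, 1))  # (n_timepoints, n_channels)

# np.ndarray -> pd.DataFrame: one column per channel, default RangeIndex for time
frame = pd.DataFrame(values)

# pd.DataFrame -> np.ndarray
roundtrip = frame.to_numpy()

# Univariate case: single-column pd.DataFrame -> pd.Series
univariate = frame.iloc[:, 0]
```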

```python
from aeon.datatypes._series._convert import (
    convert_mvs_to_dask_as_series,
    convert_Mvs_to_xrdatarray_as_Series,
    convert_np_to_MvS_as_Series,
)

pd_dataframe = convert_np_to_MvS_as_Series(numpyarray)
type(pd_dataframe)  # pandas.core.frame.DataFrame
```

```python
dask_dataframe = convert_mvs_to_dask_as_series(pd_dataframe)
type(dask_dataframe)  # dask.dataframe.core.DataFrame
```

```python
xrarray = convert_Mvs_to_xrdatarray_as_Series(pd_dataframe)
type(xrarray)  # xarray.core.dataarray.DataArray
```

# Collections Converters

Previously, collections of time series were called panels (a term from econometrics, not machine learning), and there are still references to panels in the code. Collections can be stored as follows:

- `numpy3D`: 3D `np.array` of shape `(n_instances, n_channels, n_timepoints)`
- `np-list`: list of `n_cases` 2D `np.array`, each of shape `(n_channels, series_length)`
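When a collection is equal length, moving between these two formats needs only numpy. A minimal sketch, not the `aeon` converter; note that `np.stack` fails if the cases have different lengths, which is exactly when `np-list` is needed.

```python
import numpy as np

# numpy3D: (n_cases, n_channels, n_timepoints)
X3d = np.random.random(size=(4, 2, 20))

# numpy3D -> np-list: one 2D array of shape (n_channels, n_timepoints) per case
X_list = list(X3d)

# np-list -> numpy3D: only possible when all cases have the same length
X_back = np.stack(X_list)
```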


The full register of collection types (`MTYPE_REGISTER_PANEL`) in the legacy `datatypes` code is:

```python
MTYPE_REGISTER_PANEL = [
    (
        "nested_univ",
        "Panel",
        "pd.DataFrame with one column per channel, pd.Series in cells",
    ),
    (
        "numpy3D",
        "Panel",
        "3D np.array of format (n_instances, n_channels, n_timepoints)",
    ),
    (
        "numpyflat",
        "Panel",
        "2D np.array of format (n_instances, n_columns*n_timepoints)",
    ),
    ("pd-multiindex", "Panel", "pd.DataFrame with multi-index (instances, timepoints)"),
    ("pd-wide", "Panel", "pd.DataFrame in wide format, cols = (instance*timepoints)"),
    (
        "pd-long",
        "Panel",
        "pd.DataFrame in long format, cols = (index, time_index, column)",
    ),
    ("df-list", "Panel", "list of pd.DataFrame"),
    (
        "dask_panel",
        "Panel",
        "dask frame with one instance and one time index, as per dask_to_pd convention",
    ),
    (
        "np-list",
        "Panel",
        "list of n_cases, each case a 2D np.array of shape (n_channels, series_length)",
    ),
]
```
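To see how some of these layouts relate, the sketch below builds `numpyflat`-style and `pd-multiindex`-style objects from a `numpy3D` array by hand. The channel-by-channel flattening order, the index level names and the `var_` column names are assumptions made for illustration; they are not guaranteed to match what the `aeon` converters produce.

```python
import numpy as np
import pandas as pd

X3d = np.random.random(size=(3, 2, 5))  # (n_cases, n_channels, n_timepoints)
n_cases, n_channels, n_timepoints = X3d.shape

# numpyflat sketch (assumed ordering): each case's channels concatenated end to end
X_flat = X3d.reshape(n_cases, n_channels * n_timepoints)

# pd-multiindex sketch: rows indexed by (instance, timepoint), one column per channel
index = pd.MultiIndex.from_product(
    [range(n_cases), range(n_timepoints)], names=["instances", "timepoints"]
)
X_multi = pd.DataFrame(
    # (n_cases, n_timepoints, n_channels) -> one row per (case, timepoint) pair
    X3d.transpose(0, 2, 1).reshape(n_cases * n_timepoints, n_channels),
    index=index,
    columns=[f"var_{c}" for c in range(n_channels)],  # hypothetical column names
)
```

With this ordering, `X_flat[i, c * n_timepoints + t]` and `X_multi.loc[(i, t), f"var_{c}"]` both hold `X3d[i, c, t]`.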
