# Time Series

## Check Seasonality automatically with `darts`

Seasonality describes a pattern that repeats regularly over time.

Identifying and understanding the seasonality in time series can boost the performance of your model.

But you don't have to find the seasonality effect and period by yourself.

Instead, you can use `check_seasonality()` from `darts` in Python.

It checks whether the time series is seasonal and also returns the period, which is inferred from the Auto-correlation Function (ACF).

In the example below, it returns a seasonal period of 12, since the Air Passengers dataset has a monthly frequency.

In [None]:
!pip install darts

In [None]:
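from darts.utils.statistics import check_seasonality
from darts.datasets import AirPassengersDataset

# Load the monthly Air Passengers series
ts = AirPassengersDataset().load()

# Returns a (bool, period) tuple; the period is inferred from the ACF
is_seasonal, period = check_seasonality(ts)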
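
To see what "inferred from the Auto-correlation Function" can mean in practice, here is a rough, dependency-free sketch. It is not how `darts` implements `check_seasonality()`. The helpers `autocorrelation` and `guess_period`, the synthetic series, and the significance bound are all assumptions made for illustration. The idea is to take the first lag at which the ACF has a significant local peak.

```python
import math
import random


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag))
    return cov / var


def guess_period(values, max_lag=24):
    """Return the first lag whose ACF is a significant local maximum, else None."""
    n = len(values)
    acf = [autocorrelation(values, lag) for lag in range(max_lag + 2)]
    threshold = 1.96 / math.sqrt(n)  # rough 95% bound for white noise (an assumption)
    for lag in range(1, max_lag + 1):
        is_local_peak = acf[lag - 1] < acf[lag] > acf[lag + 1]
        if is_local_peak and acf[lag] > threshold:
            return lag
    return None


# Synthetic "monthly" series: a 12-step cycle plus a little noise
random.seed(0)
monthly = [
    10 + 3 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 0.3)
    for t in range(144)
]

period = guess_period(monthly)
print("estimated period:", period)
```

On real data you would usually remove the trend first, because a strong trend makes the ACF decay slowly and can hide the seasonal peaks.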

## Cross-validation for Time Series Data with `TimeSeriesSplit`

How to do Cross-Validation with Time Series?

Standard K-Fold Cross-Validation is a poor fit for time series.

With K-Fold, you partition the data into k folds, and then train and evaluate the model k times, each time using a different fold as the test set and the rest of the data as the training set.

This causes problems for time series, because the model is trained on data from both before and after the test data, so information from the future leaks into training.

This can result in overfitting or biased estimates of model performance.

Instead, use `TimeSeriesSplit` from scikit-learn.

`TimeSeriesSplit` ensures that the model is only trained on the past values and tested on future data.

This gives you a more accurate and less biased assessment of the model's performance.

In [None]:
from sklearn.model_selection import TimeSeriesSplit, cross_validate
from sklearn.ensemble import GradientBoostingRegressor

X, y = ...  # features and target, with rows in chronological order
model = GradientBoostingRegressor()

ts_cv = TimeSeriesSplit(n_splits=3)

scores = cross_validate(model, X, y, cv=ts_cv, scoring='neg_mean_squared_error')
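
To make "trained on the past, tested on the future" concrete, here is a small sketch of an expanding-window split in plain Python. The helper `expanding_window_splits` is hypothetical and only mimics the pattern. The fold sizes are an assumption and may differ from what `TimeSeriesSplit` produces for your data.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train, test) index lists where every test block follows its training data."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        train = list(range(start))                    # everything before the test block
        test = list(range(start, start + test_size))  # the next block in time
        yield train, test


splits = list(expanding_window_splits(n_samples=12, n_splits=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each fold trains on a longer history than the previous one, and no test index ever comes before a training index.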

## More Cross-Validation with `tscv`

Standard K-Fold Cross-Validation trains the model on data from both before and after the test data, which can result in overfitting or biased estimates of model performance.

Another way to avoid this is the `tscv` package for Python.

`tscv` offers three classes that split your data while leaving a gap between the training and test sets:

- `GapLeavePOut`
- `GapKFold`
- `GapRollForward`

This gives you a more accurate and less biased assessment of the model’s performance.

In [None]:
!pip install tscv

In [None]:
from tscv import GapRollForward
cv = GapRollForward(min_train_size=3, gap_size=1, max_test_size=2)
for train, test in cv.split(range(10)):
    print("train:", train, "test:", test)
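
To show what the gap does, here is a plain-Python sketch of a roll-forward split that skips a few observations between training and test data. The helper `roll_forward_with_gap` is hypothetical and is not `tscv`'s implementation, so its folds may not match the output of `GapRollForward` exactly.

```python
def roll_forward_with_gap(n_samples, min_train_size, gap_size, test_size):
    """Yield (train, test) index lists with `gap_size` skipped observations between them."""
    train_end = min_train_size
    while train_end + gap_size + test_size <= n_samples:
        test_start = train_end + gap_size
        train = list(range(train_end))
        test = list(range(test_start, test_start + test_size))
        yield train, test
        train_end += test_size  # grow the training window before the next fold


folds = list(roll_forward_with_gap(n_samples=10, min_train_size=3, gap_size=1, test_size=2))
for train, test in folds:
    print("train:", train, "test:", test)
```

The skipped observations are used for neither training nor testing, which helps when neighbouring points are strongly correlated.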