### Tutorial: train/test splitting with the s2spy `traintest` module

For cross-validation, data that has been resampled with the s2spy `time` module is split into training and test groups.

We start by importing the required libraries and generating an example `AdventCalendar` along with example data.

In [1]:
import s2spy.time
import s2spy.traintest
import pandas as pd
import numpy as np

In [2]:
calendar = s2spy.time.AdventCalendar(anchor_date=(10, 15), freq="180d")

time_index = pd.date_range("20151020", "20211001", freq="60d")
test_data = np.random.random(len(time_index))
df = pd.DataFrame(test_data, index=time_index, columns=["data1"])
ds = df.to_xarray().rename({"index": "time"})

We first need to resample the data using the calendar:

In [3]:
df = calendar.resample(df)
df.keys()

Index(['anchor_year', 'i_interval', 'interval', 'data1', 'target'], dtype='object')

#### Example of the `KFold` method

All splitter classes from scikit-learn are supported; a list is available here:

https://scikit-learn.org/stable/modules/classes.html#splitter-classes

In [4]:
from sklearn.model_selection import KFold
splitter = KFold(n_splits=3)
df = s2spy.traintest.split_groups(splitter, df)
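
To make the idea of splitting by group concrete, here is a minimal sketch that labels whole anchor years as train or test using only pandas and scikit-learn. It is not the s2spy implementation: the helper `label_splits`, the toy DataFrame, and the choice of `anchor_year` as the grouping column are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Toy stand-in for a resampled DataFrame: two intervals per anchor year.
toy = pd.DataFrame({
    "anchor_year": np.repeat(np.arange(2016, 2021), 2),
    "i_interval": np.tile([0, 1], 5),
    "data1": np.linspace(0.0, 0.9, 10),
})


def label_splits(frame, splitter, group_col="anchor_year"):
    """Hypothetical helper: add one 'split_<i>' column per fold.

    The splitter is applied to the unique group values, so all rows
    of one group always receive the same label.
    """
    out = frame.copy()
    groups = np.sort(out[group_col].unique())
    for i, (_, test_idx) in enumerate(splitter.split(groups)):
        is_test = np.isin(out[group_col].to_numpy(), groups[test_idx])
        out[f"split_{i}"] = np.where(is_test, "test", "train")
    return out


labelled = label_splits(toy, KFold(n_splits=3))
print(labelled.drop_duplicates("anchor_year"))
```

Splitting on the group values rather than on individual rows keeps the target and precursor intervals of one anchor year together in the same split.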

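`split_groups` adds one `split_<i>` column per fold, labelling each row as `train` or `test`. To select the data from all training groups of fold 0: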

In [5]:
training_data_split_0 = df.where(df.split_0 == "train")
training_data_split_0.dropna()

|   | anchor_year | i_interval | interval | data1 | target | split_0 | split_1 | split_2 |
|---|---|---|---|---|---|---|---|---|
| 4 | 2018 | 0.0 | (2018-04-18, 2018-10-15] | 0.240913 | True | train | test | train |
| 5 | 2018 | 1.0 | (2017-10-20, 2018-04-18] | 0.669087 | False | train | test | train |
| 6 | 2019 | 0.0 | (2019-04-18, 2019-10-15] | 0.414739 | True | train | test | train |
| 7 | 2019 | 1.0 | (2018-10-20, 2019-04-18] | 0.287408 | False | train | test | train |
| 8 | 2020 | 0.0 | (2020-04-18, 2020-10-15] | 0.670571 | True | train | train | test |
| 9 | 2020 | 1.0 | (2019-10-21, 2020-04-18] | 0.556028 | False | train | train | test |
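
The same selection can be repeated for every fold. The sketch below loops over the `split_<i>` columns and collects a train and a test subset per fold with boolean indexing. The DataFrame is a hand-written stand-in with illustrative values, and the `folds` dictionary is a hypothetical container, not something s2spy returns.

```python
import pandas as pd

# Hand-written stand-in for a labelled DataFrame (one row per anchor year).
labelled = pd.DataFrame({
    "anchor_year": [2016, 2017, 2018, 2019, 2020],
    "data1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "split_0": ["test", "test", "train", "train", "train"],
    "split_1": ["train", "train", "test", "test", "train"],
    "split_2": ["train", "train", "train", "train", "test"],
})

# Collect (train, test) subsets for each fold.
split_cols = [col for col in labelled.columns if col.startswith("split_")]
folds = {}
for col in split_cols:
    train = labelled[labelled[col] == "train"]
    test = labelled[labelled[col] == "test"]
    folds[col] = (train, test)
    print(col, "train:", train["anchor_year"].tolist(),
          "test:", test["anchor_year"].tolist())
```

Boolean indexing keeps the original column dtypes. Combining `where` with `dropna` first fills the unselected rows with NaN, which is why `i_interval` appears as a float in the output above, and it would also drop any selected row that contains a genuine missing value.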


#### `xarray` example

In [6]:
ds = calendar.resample(ds)
ds

Here we choose the `ShuffleSplit` method:

In [7]:
from sklearn.model_selection import ShuffleSplit

splitter = ShuffleSplit(n_splits=3)
s2spy.traintest.split_groups(splitter, ds)
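
Unlike `KFold`, `ShuffleSplit` draws every split independently at random, so a group may land in the test set of several splits or of none, and the result changes between runs unless `random_state` is set. The sketch below shows this on a plain array of anchor years using scikit-learn only. The years and the `test_size=2` setting are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Illustrative anchor years standing in for the groups of a resampled dataset.
anchor_years = np.arange(2016, 2021)

# random_state makes the shuffling repeatable; test_size=2 holds out two years per split.
splitter = ShuffleSplit(n_splits=3, test_size=2, random_state=0)

test_years = []
for i, (train_idx, test_idx) in enumerate(splitter.split(anchor_years)):
    # Train and test indices never overlap within a single split.
    assert set(train_idx).isdisjoint(test_idx)
    test_years.append(sorted(anchor_years[test_idx].tolist()))
    print(f"split_{i}: test years {test_years[-1]}")
```

A splitter configured this way should be usable in place of `ShuffleSplit(n_splits=3)` in the cell above, assuming `split_groups` uses the splitter object as passed.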