# A Toy Example for the Time Series Split
<div class="alert alert-block alert-info">
<b>Content:</b> This notebook demonstrates the time series split on a very small toy example. We

* look at a short time series,
* try to predict the next element of the series, and
* use a window length of 2.
</div>

In [2]:
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

In [10]:
tscv = TimeSeriesSplit(n_splits=2)
X = np.arange(100, 110)
y = np.arange(200, 210)
for train_idx, test_idx in tscv.split(X):
    print("TRAIN:", train_idx, "TEST:", test_idx)
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    print("X_train:", X_train, "X_test:", X_test)
    print("y_train:", y_train, "y_test:", y_test)
    print()

TRAIN: [0 1 2 3] TEST: [4 5 6]
X_train: [100 101 102 103] X_test: [104 105 106]
y_train: [200 201 202 203] y_test: [204 205 206]

TRAIN: [0 1 2 3 4 5 6] TEST: [7 8 9]
X_train: [100 101 102 103 104 105 106] X_test: [107 108 109]
y_train: [200 201 202 203 204 205 206] y_test: [207 208 209]
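
The fold boundaries above follow from simple arithmetic. As a sketch, assuming the behavior documented for recent scikit-learn versions (when `test_size` is not given it defaults to `n_samples // (n_splits + 1)`, and each training set contains everything before its test block), the indices can be recomputed by hand and compared with `TimeSeriesSplit`. The helper name `manual_time_series_split` is hypothetical and only serves as an illustration; it ignores the `gap` and `max_train_size` options and does no input validation.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def manual_time_series_split(n_samples, n_splits, test_size=None):
    """Hypothetical helper: recompute expanding-window folds by hand."""
    if test_size is None:
        # Assumed default, as documented for TimeSeriesSplit
        test_size = n_samples // (n_splits + 1)
    indices = np.arange(n_samples)
    # The test blocks are packed against the end of the series
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        # Train on everything before the test block, test on the block itself
        yield indices[:test_start], indices[test_start:test_start + test_size]


X_demo = np.arange(100, 110)
manual_folds = list(manual_time_series_split(len(X_demo), n_splits=2))
sklearn_folds = list(TimeSeriesSplit(n_splits=2).split(X_demo))

for (m_train, m_test), (s_train, s_test) in zip(manual_folds, sklearn_folds):
    # Both computations should agree fold by fold
    assert np.array_equal(m_train, s_train) and np.array_equal(m_test, s_test)
    print("TRAIN:", m_train, "TEST:", m_test)
```

With 10 samples and 2 splits this gives a test size of 10 // 3 = 3, so the test blocks start at indices 4 and 7, which matches the output above.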



In [11]:
# Exhaust the generator so that train_idx and test_idx hold the indices of the last fold
for train_idx, test_idx in tscv.split(X):
    pass

X[train_idx], y[train_idx]

(array([100, 101, 102, 103, 104, 105, 106]),
 array([200, 201, 202, 203, 204, 205, 206]))

In [12]:
X[test_idx], y[test_idx]

(array([107, 108, 109]), array([207, 208, 209]))

In [3]:
def print_walk_forward_folds(ts_split, X, y):
    """Print the training and test data of every fold produced by ts_split."""
    # Use the splitter passed as argument, not the global tscv
    for i, (train_index, test_index) in enumerate(ts_split.split(X)):
        print(f"\nFold {i}:")
        print("Train: X=", np.array2string(X[train_index]).replace('\n', ','), "\n       Y=", y[train_index])
        print("Test: X=", np.array2string(X[test_index]).replace('\n', ','), "\n      Y=", y[test_index])

## Default Split
Let the time series be
$1, 2, 3,\ldots$

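Let's create time windows of size $2$ in $X$ and the corresponding targets in $Y$: each row of $X$ holds two consecutive elements of the series, and $Y$ holds the element that follows them.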

In [4]:
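X1 = np.arange(1, 11)  # first element of each window: 1, ..., 10
X2 = np.arange(2, 12)  # second element of each window: 2, ..., 11
Y = np.arange(3, 13)   # target: the element following each window
X = np.transpose(np.array([X1, X2]))  # one window per row
X, Y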

(array([[ 1,  2],
        [ 2,  3],
        [ 3,  4],
        [ 4,  5],
        [ 5,  6],
        [ 6,  7],
        [ 7,  8],
        [ 8,  9],
        [ 9, 10],
        [10, 11]]),
 array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12]))
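
The two shifted `np.arange` calls hard-code a window length of 2. Below is a minimal sketch of the same construction for an arbitrary window length, assuming NumPy 1.20 or newer (for `sliding_window_view`). The function name `make_windows` and its parameters are illustrative and not part of the notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, window_length):
    """Illustrative helper: one window per row in X, the next element as target in y."""
    series = np.asarray(series)
    windows = sliding_window_view(series, window_length)
    # The last window has no following element, so it is dropped;
    # copy() turns the read-only view into an ordinary array.
    X_win = windows[:-1].copy()
    y_win = series[window_length:]
    return X_win, y_win


series = np.arange(1, 13)  # 1, 2, ..., 12
X_win, y_win = make_windows(series, window_length=2)
print(X_win[:3], y_win[:3])
```

For the series 1 to 12 and a window length of 2, this should reproduce the `X` and `Y` arrays shown above.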

In [5]:
tscv = TimeSeriesSplit()
# default is 5 splits
print_walk_forward_folds(tscv, X, Y)


Fold 0:
Train: X= [[1 2], [2 3], [3 4], [4 5], [5 6]] 
       Y= [3 4 5 6 7]
Test: X= [[6 7]] 
      Y= [8]

Fold 1:
Train: X= [[1 2], [2 3], [3 4], [4 5], [5 6], [6 7]] 
       Y= [3 4 5 6 7 8]
Test: X= [[7 8]] 
      Y= [9]

Fold 2:
Train: X= [[1 2], [2 3], [3 4], [4 5], [5 6], [6 7], [7 8]] 
       Y= [3 4 5 6 7 8 9]
Test: X= [[8 9]] 
      Y= [10]

Fold 3:
Train: X= [[1 2], [2 3], [3 4], [4 5], [5 6], [6 7], [7 8], [8 9]] 
       Y= [ 3  4  5  6  7  8  9 10]
Test: X= [[ 9 10]] 
      Y= [11]

Fold 4:
Train: X= [[ 1  2], [ 2  3], [ 3  4], [ 4  5], [ 5  6], [ 6  7], [ 7  8], [ 8  9], [ 9 10]] 
       Y= [ 3  4  5  6  7  8  9 10 11]
Test: X= [[10 11]] 
      Y= [12]


## Parametrizing the Split

In [20]:
# Fix test_size to 3 with 12 samples
tscv = TimeSeriesSplit(n_splits=2, test_size=3)
print_walk_forward_folds(tscv, X, Y)


Fold 0:
Train: X= [0 1 2 3 4 5] 
       Y= [3 4 5 6 7 8]
Test: X= [6 7 8] 
      Y= [ 9 10 11]

Fold 1:
Train: X= [0 1 2 3 4 5 6 7 8] 
       Y= [ 3  4  5  6  7  8  9 10 11]


IndexError: index 10 is out of bounds for axis 0 with size 10
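
The traceback is consistent with `X` holding 12 samples while `Y` still holds 10: with `n_splits=2` and `test_size=3`, the last test fold covers indices 9 to 11, which `Y` cannot provide. This is an inference from the printed output, not something the notebook states. The sketch below reproduces the situation with the hypothetical arrays `X12` and `Y10` and adds a simple length check.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X12 = np.arange(12)     # hypothetical 12-sample feature array
Y10 = np.arange(3, 13)  # 10 targets, as in the earlier cell

tscv_fixed = TimeSeriesSplit(n_splits=2, test_size=3)
folds = [(train.tolist(), test.tolist()) for train, test in tscv_fixed.split(X12)]
print(folds[-1][1])  # test indices of the last fold

# split() is only given X here, so a shorter y goes unnoticed until it is indexed.
lengths_match = len(X12) == len(Y10)
if not lengths_match:
    print(f"Length mismatch: X has {len(X12)} samples, y has {len(Y10)}")
```

Checking the lengths before calling a helper such as `print_walk_forward_folds` catches the mismatch before any fold is printed.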

In [37]:
X=np.arange(24*7*100)

In [38]:
tscv = TimeSeriesSplit(n_splits=10, test_size=24)
print_walk_forward_folds(tscv, X, X+1)


Fold 0:
Train: X= [    0     1     2 ... 16557 16558 16559] 
       Y= [    1     2     3 ... 16558 16559 16560]
Test: X= [16560 16561 16562 16563 16564 16565 16566 16567 16568 16569 16570 16571, 16572 16573 16574 16575 16576 16577 16578 16579 16580 16581 16582 16583] 
      Y= [16561 16562 16563 16564 16565 16566 16567 16568 16569 16570 16571 16572
 16573 16574 16575 16576 16577 16578 16579 16580 16581 16582 16583 16584]

Fold 1:
Train: X= [    0     1     2 ... 16581 16582 16583] 
       Y= [    1     2     3 ... 16582 16583 16584]
Test: X= [16584 16585 16586 16587 16588 16589 16590 16591 16592 16593 16594 16595, 16596 16597 16598 16599 16600 16601 16602 16603 16604 16605 16606 16607] 
      Y= [16585 16586 16587 16588 16589 16590 16591 16592 16593 16594 16595 16596
 16597 16598 16599 16600 16601 16602 16603 16604 16605 16606 16607 16608]

Fold 2:
Train: X= [    0     1     2 ... 16605 16606 16607] 
       Y= [    1     2     3 ... 16606 16607 16608]
Test: X= [16608 16609 16610 1661