Sequential data:
- ordered in time or space
- the order of the data points encodes dependencies between them
- Examples of sequential data:
  1. Time series
  2. Text
  3. Audio waves

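Train-test split
- No random splitting for time series
- Look-ahead bias: with a random split, the model is trained on data from the future of the points it is tested on

Solution: split by time, so that every training point comes before every test point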
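
A minimal sketch of a time-ordered split, on a hypothetical 10-step toy series. The helper name `chronological_split` and the 70% training fraction are assumptions made for illustration only:

```python
import numpy as np

def chronological_split(n_points, train_frac=0.7):
    """Return index arrays for a time-ordered train/test split."""
    split = round(train_frac * n_points)
    idx = np.arange(n_points)
    return idx[:split], idx[split:]

train_idx, test_idx = chronological_split(10)

# Every training step precedes every test step, so nothing from the
# future is visible during training.
assert train_idx.max() < test_idx.min()

# A random split of the same indices usually interleaves past and future.
rng = np.random.default_rng(0)
shuffled = rng.permutation(10)
rand_train, rand_test = shuffled[:7], shuffled[7:]
print("random split mixes past and future:", rand_train.max() > rand_test.min())
```

If the printed flag is `True`, at least one training point lies later in time than some test point, which is the look-ahead situation described above.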

**Creating sequences**

- Sequence length = number of data points in one training example
- 24*4 = 96 -> consider the last 24 hours (4 data points per hour)
- Predict the single next data point
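
The arithmetic behind these bullets can be sketched as follows. The 15-minute sampling interval is an assumption inferred from the factor 4, and `num_examples` is a hypothetical helper, not part of any library:

```python
# Assumption: one reading every 15 minutes, i.e. 4 readings per hour.
points_per_hour = 4
hours_of_history = 24
seq_length = hours_of_history * points_per_hour  # 96 inputs per example

def num_examples(n_points, seq_length):
    """Number of (input window, next value) pairs a series of n_points yields."""
    # Each example needs seq_length inputs plus one target right after them.
    return max(n_points - seq_length, 0)

print(seq_length)
print(num_examples(350, 10))
```

A series of N points therefore gives N - seq_length training examples, because the last window that still has a target starts at position N - seq_length - 1.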

In [59]:
# Creating sequences in python

import numpy as np
import torch
from torch.utils.data import TensorDataset

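def create_sequences(df, seq_length):
    """Build (input window, next value) pairs from the 'value' column."""
    # Select the column by name: a positional index breaks if an extra
    # column (e.g. a saved index) shifts the column order.
    values = df["value"].to_numpy()
    xs, ys = [], []
    for i in range(len(values) - seq_length):
        xs.append(values[i:i + seq_length])  # seq_length consecutive inputs
        ys.append(values[i + seq_length])    # the value right after the window
    return np.array(xs), np.array(ys)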
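
A possible vectorized alternative to the loop, sketched with NumPy's `sliding_window_view` (available from NumPy 1.20). The function name `create_sequences_vectorized` is hypothetical, and it takes a plain 1-D array of values rather than a DataFrame:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_sequences_vectorized(values, seq_length):
    """Return (windows, next values) for a 1-D array, without a Python loop."""
    values = np.asarray(values)
    # All windows of length seq_length: shape (N - seq_length + 1, seq_length).
    windows = sliding_window_view(values, seq_length)
    xs = windows[:-1]         # the last window has no following target
    ys = values[seq_length:]  # the value right after each remaining window
    # Copy so the results do not share memory with the input array.
    return xs.copy(), ys.copy()

series = np.arange(6, dtype=float)  # toy series: 0, 1, 2, 3, 4, 5
X, y = create_sequences_vectorized(series, 3)
print(X.shape, y.shape)
```

On the toy series the first window is `[0, 1, 2]` with target `3`, which matches what the loop version would produce for the same values.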


In [50]:
import pandas as pd
df = pd.read_csv("./../../synthetic_time_series.csv")
df.head()

Unnamed: 0,time,value
0,0,0.248357
1,1,-0.019153
2,2,0.423678
3,3,0.910953
4,4,0.081593

In [49]:
# Remove duplicate rows and rows with missing values, then count the rows left
df.drop_duplicates(inplace=True)
df.dropna(inplace=True)
df.value_counts().sum()

500


In [51]:
seq_length = 10
N = len(df)  # total number of time steps
SPLIT = round(0.7 * N)
SPLIT

350

In [56]:
train_data, test_data = df[:SPLIT], df[SPLIT:]
train_data.head(), test_data.head()

(   time     value
 0     0  0.248357
 1     1 -0.019153
 2     2  0.423678
 3     3  0.910953
 4     4  0.081593,
      time     value
 350   350 -0.820172
 351   351 -0.225761
 352   352 -0.520015
 353   353 -1.011847
 354   354 -0.922091)

In [57]:
# Creating training examples

X_train, y_train = create_sequences(train_data, seq_length)
print(X_train.shape, y_train.shape)

(340, 10) (340,)


In [60]:
# TensorDataset
dataset_train = TensorDataset(
    torch.from_numpy(X_train).float(),
    torch.from_numpy(y_train).float(),
)

dataset_train

<torch.utils.data.dataset.TensorDataset at 0x1d599bf2530>
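
A minimal sketch of how such a dataset could be consumed. The placeholder arrays only mimic the shapes of `X_train` and `y_train` above, and `batch_size=32` is an arbitrary choice for illustration:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data with the same shapes as X_train (340, 10) and y_train (340,).
X_demo = np.zeros((340, 10), dtype=np.float32)
y_demo = np.zeros(340, dtype=np.float32)

dataset_demo = TensorDataset(torch.from_numpy(X_demo), torch.from_numpy(y_demo))

# Indexing returns one (input window, target) pair.
x0, y0 = dataset_demo[0]
print(x0.shape, y0.shape)

# The DataLoader groups examples into batches. Shuffling here only reorders
# windows inside the training split; the train/test boundary stays by time.
loader = DataLoader(dataset_demo, batch_size=32, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)
```

Each batch holds 32 windows of 10 values with one target per window, except possibly the last batch, which is smaller when the dataset size is not a multiple of the batch size.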