## Supervised Learning in a Univariate Series

Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem:

```python
import numpy as np
import pandas as pd
```

```python
data1 = pd.DataFrame({'time':[1, 2, 3, 4, 5], 'measure':[100, 110, 108, 115, 120]})
data1
```

|   | time | measure |
|---|------|---------|
| 0 | 1 | 100 |
| 1 | 2 | 110 |
| 2 | 3 | 108 |
| 3 | 4 | 115 |
| 4 | 5 | 120 |


We can use the value at the previous time step as the input variable for the next time step:

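```python
# Input: the series delayed by one step (the first value has no predecessor, so NaN)
X = np.append(np.nan, data1.measure)
# Label: the series as-is, padded with NaN so both arrays have the same length
y = np.append(data1.measure, np.nan)
```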

```python
restructured = pd.DataFrame({'X':X, 'y':y})
restructured
```

|   | X | y |
|---|---|---|
| 0 | NaN | 100.0 |
| 1 | 100.0 | 110.0 |
| 2 | 110.0 | 108.0 |
| 3 | 108.0 | 115.0 |
| 4 | 115.0 | 120.0 |
| 5 | 120.0 | NaN |


Dropping the first and last rows, which contain missing values, gives us all of the inputs and labels needed to apply machine learning to this dataset:

```python
restructured = restructured.dropna()
restructured
```

|   | X | y |
|---|---|---|
| 1 | 100.0 | 110.0 |
| 2 | 110.0 | 108.0 |
| 3 | 108.0 | 115.0 |
| 4 | 115.0 | 120.0 |
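The same lag-1 framing can also be built with pandas' `Series.shift`, which moves values down by a given number of rows and fills the gap with `NaN`. This is an alternative to the `np.append` approach above, not the original method. Note that `shift` keeps the original five rows, so the trailing `(120.0, NaN)` row never appears. After `dropna()`, the result is the same four pairs.

```python
import pandas as pd

data1 = pd.DataFrame({'time': [1, 2, 3, 4, 5], 'measure': [100, 110, 108, 115, 120]})

# shift(1) moves every value down one row, so row t holds the value from t-1
framed = pd.DataFrame({'X': data1['measure'].shift(1), 'y': data1['measure']})

# The first row has no previous value (NaN) and is dropped
framed = framed.dropna()
print(framed)
```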


This transformation gives us the minimum needed to fit a regression on these values and forecast future ones. Because only a single variable is observed, this is a univariate time series problem. Multivariate time series analysis, where several variables are observed at each time step, is considerably more complicated, and many classical methods do not perform as well on it.
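Here is a minimal sketch of that regression. It assumes ordinary least squares with an intercept; the text does not name a model. NumPy's `np.linalg.lstsq` fits the four `(X, y)` pairs from the table above and produces a one-step-ahead forecast from the last observed value, `120`. With only four points this is illustrative, not a meaningful model.

```python
import numpy as np

# Lag-1 pairs from the restructured table: input X (value at t-1) -> label y (value at t)
X = np.array([100.0, 110.0, 108.0, 115.0])
y = np.array([110.0, 108.0, 115.0, 120.0])

# Design matrix with an intercept column, solved by ordinary least squares
A = np.column_stack([np.ones_like(X), X])
(intercept, slope), *_ = np.linalg.lstsq(A, y, rcond=None)

# One-step-ahead forecast from the last observed value
forecast = intercept + slope * 120.0
print(round(forecast, 2))
```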

Consider the following dataset:

```python
data2 = pd.DataFrame({'time':[1, 2, 3, 4, 5],
        'measure1':[0.2, 0.5, 0.7, 0.4, 1.0],
        'measure2':[88, 89, 87, 88, 90]})
data2
```

|   | time | measure1 | measure2 |
|---|------|----------|----------|
| 0 | 1 | 0.2 | 88 |
| 1 | 2 | 0.5 | 89 |
| 2 | 3 | 0.7 | 87 |
| 3 | 4 | 0.4 | 88 |
| 4 | 5 | 1.0 | 90 |


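A similar transformation frames this multivariate series as a supervised learning problem. The inputs are both measures at the previous time step (`X1`, `X2`) plus `measure1` at the current time step (`X3`). The label `y` is `measure2` at the current time step: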

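```python
# Inputs: both measures at the previous step (X1, X2) and measure1 at the current step (X3)
X1 = np.append(np.nan, data2.measure1)
X2 = np.append(np.nan, data2.measure2)
X3 = np.append(data2.measure1, np.nan)
# Label: measure2 at the current step
y = np.append(data2.measure2, np.nan)
multi_restructured = pd.DataFrame({'X1':X1, 'X2':X2, 'X3':X3, 'y':y})
multi_restructured
```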

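|   | X1 | X2 | X3 | y |
|---|----|----|----|---|
| 0 | NaN | NaN | 0.2 | 88.0 |
| 1 | 0.2 | 88.0 | 0.5 | 89.0 |
| 2 | 0.5 | 89.0 | 0.7 | 87.0 |
| 3 | 0.7 | 87.0 | 0.4 | 88.0 |
| 4 | 0.4 | 88.0 | 1.0 | 90.0 |
| 5 | 1.0 | 90.0 | NaN | NaN |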


Removing the first and last rows, which contain missing values, gives us a fully labeled dataset ready for supervised learning:

```python
multi_restructured = multi_restructured.dropna()
multi_restructured
```

|   | X1 | X2 | X3 | y |
|---|----|----|----|---|
| 1 | 0.2 | 88.0 | 0.5 | 89.0 |
| 2 | 0.5 | 89.0 | 0.7 | 87.0 |
| 3 | 0.7 | 87.0 | 0.4 | 88.0 |
| 4 | 0.4 | 88.0 | 1.0 | 90.0 |
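The same multivariate frame can also be built with `shift`, as an alternative to `np.append`. This construction makes one caveat visible: `X3` is `measure1` at the *same* time step as the label. The framing therefore assumes `measure1` at time t is already known when `measure2` at time t is predicted.

```python
import pandas as pd

data2 = pd.DataFrame({'time': [1, 2, 3, 4, 5],
                      'measure1': [0.2, 0.5, 0.7, 0.4, 1.0],
                      'measure2': [88, 89, 87, 88, 90]})

multi = pd.DataFrame({
    'X1': data2['measure1'].shift(1),  # measure1 at t-1
    'X2': data2['measure2'].shift(1),  # measure2 at t-1
    'X3': data2['measure1'],           # measure1 at t (contemporaneous input)
    'y': data2['measure2'],            # label: measure2 at t
}).dropna()                            # drops row 0, which has no t-1 values
print(multi)
```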


What if we wanted to predict both `measure1` and `measure2` for the next time step?

```python
# Inputs: both measures at the previous step
X1 = np.append(np.nan, data2.measure1)
X2 = np.append(np.nan, data2.measure2)
# Labels: both measures at the current step
y1 = np.append(data2.measure1, np.nan)
y2 = np.append(data2.measure2, np.nan)
multistep_restructured = pd.DataFrame({'X1':X1, 'X2':X2, 'y1':y1, 'y2':y2})
multistep_restructured
```

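|   | X1 | X2 | y1 | y2 |
|---|----|----|----|----|
| 0 | NaN | NaN | 0.2 | 88.0 |
| 1 | 0.2 | 88.0 | 0.5 | 89.0 |
| 2 | 0.5 | 89.0 | 0.7 | 87.0 |
| 3 | 0.7 | 87.0 | 0.4 | 88.0 |
| 4 | 0.4 | 88.0 | 1.0 | 90.0 |
| 5 | 1.0 | 90.0 | NaN | NaN |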
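Once the incomplete first and last rows are dropped, both labels can be fitted at once. This minimal sketch assumes a linear model solved by ordinary least squares, which the text does not specify. Passing a two-column target to `np.linalg.lstsq` solves one independent regression per column. The fitted coefficients then forecast both measures for the step after the last observation. The values are copied from rows 1 to 4 of the table above.

```python
import numpy as np

# Rows 1-4 of the table above: inputs at t-1, labels at t
X1 = np.array([0.2, 0.5, 0.7, 0.4])
X2 = np.array([88.0, 89.0, 87.0, 88.0])
Y = np.array([[0.5, 89.0],
              [0.7, 87.0],
              [0.4, 88.0],
              [1.0, 90.0]])  # columns: y1 (measure1), y2 (measure2)

# Design matrix: intercept, previous measure1, previous measure2
A = np.column_stack([np.ones_like(X1), X1, X2])

# A 2-D target solves one least-squares problem per column; coef has shape (3, 2)
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)

# Forecast both measures one step past the last observation (measure1=1.0, measure2=90)
last = np.array([1.0, 1.0, 90.0])  # [intercept term, measure1, measure2]
next_step = last @ coef
print(next_step)  # [measure1 forecast, measure2 forecast]
```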


### Sliding Window With Multiple Steps

- **One-step Forecast**: This is where the next time step (t+1) is predicted.
- **Multi-step Forecast**: This is where two or more future time steps are to be predicted.
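Both framings can be produced by one sliding-window helper. `series_to_supervised` is a hypothetical name for this sketch, not part of NumPy or pandas. `n_in` sets how many past values form the input window, and `n_out` sets how many future values form the target. Incomplete windows are skipped instead of padded with `NaN`.

```python
import numpy as np

def series_to_supervised(values, n_in=1, n_out=1):
    """Hypothetical helper: frame a 1-D series as (inputs, targets) windows.

    Each row of X holds n_in consecutive values; the matching row of Y holds
    the n_out values that follow. Windows that would run off the end of the
    series are skipped rather than padded with NaN.
    """
    values = np.asarray(values, dtype=float)
    n_rows = len(values) - n_in - n_out + 1
    if n_rows < 1:
        raise ValueError("series too short for the requested window sizes")
    X = np.array([values[i:i + n_in] for i in range(n_rows)])
    Y = np.array([values[i + n_in:i + n_in + n_out] for i in range(n_rows)])
    return X, Y

measure = [100, 110, 108, 115, 120]
X, Y = series_to_supervised(measure, n_in=1, n_out=1)    # one-step forecast
X2, Y2 = series_to_supervised(measure, n_in=1, n_out=2)  # two-step forecast
print(X2.ravel(), Y2)
```

With `n_in=1, n_out=2`, `Y2` holds the rows `[110, 108]`, `[108, 115]` and `[115, 120]`. These are the same label pairs that remain in the two-step table once its incomplete rows are dropped.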

Consider the same univariate time series from the first example:

```python
data1
```

|   | time | measure |
|---|------|---------|
| 0 | 1 | 100 |
| 1 | 2 | 110 |
| 2 | 3 | 108 |
| 3 | 4 | 115 |
| 4 | 5 | 120 |


We can frame this as a two-step forecasting dataset with a window width of one:

```python
# Input: the value one step earlier
X1 = np.append(np.nan, data1.measure)
# Labels: the current value (y1) and the value one step ahead (y2)
y1 = np.append(data1.measure, np.nan)
y2 = np.append(data1.measure[1:], [np.nan, np.nan])
two_step = pd.DataFrame({'X1':X1, 'y1':y1, 'y2':y2})
two_step
```

|   | X1 | y1 | y2 |
|---|----|----|----|
| 0 | NaN | 100.0 | 110.0 |
| 1 | 100.0 | 110.0 | 108.0 |
| 2 | 110.0 | 108.0 | 115.0 |
| 3 | 108.0 | 115.0 | 120.0 |
| 4 | 115.0 | 120.0 | NaN |
| 5 | 120.0 | NaN | NaN |


The first row and the last two rows contain missing values and cannot be used, but with enough observations we can still train a model for multi-step prediction.

```python
two_step.dropna()
```

|   | X1 | y1 | y2 |
|---|----|----|----|
| 1 | 100.0 | 110.0 | 108.0 |
| 2 | 110.0 | 108.0 | 115.0 |
| 3 | 108.0 | 115.0 | 120.0 |
