## 6.2 Time Series to Supervised 

A time series must be transformed into samples with input and output components. 
For a univariate time series problem where we are interested in one-step predictions, the observations at prior time steps, so-called lag observations, are used as input and the output is the observation at the current time step. 

X, &emsp;&emsp;&emsp;    y <br>
[1, 2, 3], [4] <br>
[2, 3, 4], [5] <br>
[3, 4, 5], [6] <br>

The __split_sequence()__ function below implements this behavior. It splits a given univariate sequence into multiple samples, where each sample has a specified number of input time steps and the output is a single time step.

In [3]:
from numpy import array 
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = [], []
    for i in range(len(sequence)):
        # find the end of this pattern 
        end_idx = i + n_steps 
        # check if we are beyond the sequence
        if end_idx > len(sequence) - 1:
            break 
            
        # gather input and output parts of the pattern 
        seq_x, seq_y = sequence[i:end_idx], sequence[end_idx]
        X.append(seq_x)
        y.append(seq_y)
        
    return array(X), array(y)

# define univariate time series 
series = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(series.shape)
# transform to a supervised learning problem 
X, y = split_sequence(series, 3)
print(X.shape, y.shape)
for i in range(len(X)):
    print(X[i], y[i])

(10,)
(7, 3) (7,)
[1 2 3] 4
[2 3 4] 5
[3 4 5] 6
[4 5 6] 7
[5 6 7] 8
[6 7 8] 9
[7 8 9] 10


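In the output above, each printed row is one sample and each of its three lag values is one feature:
- Feature: A column in a dataset, such as a lag observation for a time series dataset.
- Sample: A row in a dataset, such as an input and output sequence for a time series dataset.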
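
As a cross-check on the row and column view of samples and features, the same transform can be sketched without an explicit loop. This is only an illustrative alternative, not the approach used in the rest of this tutorial: the helper name split_sequence_windowed is hypothetical, and it assumes NumPy 1.20 or later, where sliding_window_view is available.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_sequence_windowed(sequence, n_steps):
    """Vectorized sketch of split_sequence (hypothetical helper name)."""
    sequence = np.asarray(sequence)
    # every window of n_steps + 1 consecutive values: the inputs plus the target
    windows = sliding_window_view(sequence, n_steps + 1)
    # the first n_steps columns are the lag features, the last column is the output
    X, y = windows[:, :-1], windows[:, -1]
    # copy so the results are ordinary arrays rather than views of the series
    return X.copy(), y.copy()

series = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
X, y = split_sequence_windowed(series, 3)
n_samples, n_features = X.shape
print(n_samples, n_features)  # prints: 7 3
print(X[0], y[0])             # prints: [1 2 3] 4
```

The number of rows in X is the number of samples, and the number of columns is the number of lag features per sample.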

## 6.3 3D Data Preparation Basics

The input layer for CNN and LSTM models is specified by the input_shape argument on the first hidden layer of the network. The input to every CNN and LSTM layer must be three-dimensional. The three dimensions of this input are:
- Samples. One sequence is one sample. A batch is comprised of one or more samples.
- Time Steps. One time step is one point of observation in the sample. One sample is comprised of multiple time steps.
- Features. One feature is one observation at a time step. One time step is comprised of one or more features.

This expected three-dimensional structure of input data is often summarized using the array shape notation of: [samples, timesteps, features].
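
To make the notation concrete, here is a small sketch of a hand-written array with more than one feature per time step. The values are made up purely for illustration; only the shape and the index order matter.

```python
import numpy as np

# hypothetical data: 2 samples, 3 time steps per sample, 2 features per time step
X = np.array([
    [[10, 0.1], [20, 0.2], [30, 0.3]],  # sample 0
    [[40, 0.4], [50, 0.5], [60, 0.6]],  # sample 1
])
print(X.shape)     # prints: (2, 3, 2)

# the index order is [sample, time step, feature]
print(X[1, 0, 0])  # first feature at the first time step of the second sample: 40.0
print(X[0].shape)  # prints: (3, 2), one sample is a [timesteps, features] block
```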

In [4]:
# define univariate time series
series = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(series.shape)
# transform to a supervised learning problem
X, y = split_sequence(series, 3)
print(X.shape, y.shape)
# transform input from [samples, features] to [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape)

(10,)
(7, 3) (7,)
(7, 3, 1)
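
The reshape above is one of several equivalent ways to add the trailing feature dimension. The sketch below compares three of them on a stand-in array (X_2d is a hypothetical placeholder with the same shape as X above) and then reads off the per-sample shape. In Keras, the input_shape argument typically takes this per-sample shape, (timesteps, features), without the samples dimension.

```python
import numpy as np

# stand-in for the [samples, timesteps] array produced above: 7 samples, 3 steps
X_2d = np.arange(21).reshape(7, 3)

# three equivalent ways to append a trailing feature axis of size 1
X_a = X_2d.reshape((X_2d.shape[0], X_2d.shape[1], 1))
X_b = X_2d[..., np.newaxis]
X_c = np.expand_dims(X_2d, axis=-1)
print(X_a.shape, X_b.shape, X_c.shape)  # prints: (7, 3, 1) (7, 3, 1) (7, 3, 1)

# per-sample shape (timesteps, features); the samples axis is left out
n_steps, n_features = X_a.shape[1:]
print((n_steps, n_features))  # prints: (3, 1)
```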


## 6.4 Data Preparation Example

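Consider a univariate time series of 5,000 time steps stored in two columns, time and value, as generated in the code below. There are a few problems in preparing it for an LSTM: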
- Data Shape. LSTMs expect 3D input, and it can be challenging to get your head around this the first time.
- Sequence Length. As a rule of thumb, LSTMs tend to perform poorly on sequences longer than about 200-400 time steps, so the data will need to be split into subsamples.

The LSTM needs data in the format [samples, timesteps, features]. Splitting the 5,000 time steps into subsequences of 200 gives 25 samples, 200 time steps per sample, and 1 feature. We can use the reshape() function to add one additional dimension for our single feature and use the existing columns as time steps instead.

In [7]:
# example of creating a 3d array of subsequences 
from numpy import array 

# define the dataset 
data = []
n = 5000 

for i in range(n):
    data.append([i+1, (i+1)*10])
data = array(data)

# drop time 
data = data[:,1]
# split into samples (e.g. 5000/200 = 25)
samples = []
length = 200 
# step over the 5,000 in jumps of 200 
for i in range(0, n, length):
    # grab from i to i + 200 
    sample = data[i:i+length]
    samples.append(sample)
    
# convert list of arrays into 2d array
data = array(samples)
# reshape into [samples, timesteps, features]
data = data.reshape((len(samples), length, 1))
print(data.shape)

(25, 200, 1)
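
The loop above relies on the series length being an exact multiple of 200. As a rough sketch of a more general version, the helper below (to_subsequences is a hypothetical name) does the split with a single reshape. It assumes that simply dropping any trailing values that do not fill a whole sample is acceptable; padding the last sample would be another option.

```python
import numpy as np

def to_subsequences(values, length):
    """Split a 1D series into [samples, timesteps, 1]; hypothetical helper."""
    values = np.asarray(values)
    n_samples = len(values) // length
    # assumption: drop the trailing remainder that does not fill a whole sample
    trimmed = values[:n_samples * length]
    return trimmed.reshape((n_samples, length, 1))

values = np.arange(1, 5001) * 10  # same values as the example above
print(to_subsequences(values, 200).shape)         # prints: (25, 200, 1)
print(to_subsequences(values[:4950], 200).shape)  # prints: (24, 200, 1), last 150 values dropped
```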


## 6.6 Further Reading
This section provides more resources on the topic if you are looking to go deeper.
- numpy.reshape API. https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html
- Keras Recurrent Layers API. https://keras.io/layers/recurrent/
- Keras Convolutional Layers API. https://keras.io/layers/convolutional/

## 6.7 Summary
In this tutorial, you discovered exactly how to transform a time series dataset into a three-dimensional structure ready for fitting a CNN or LSTM model.
Specifically, you learned:
- How to transform a time series dataset into a two-dimensional supervised learning format.
- How to transform a two-dimensional time series dataset into a three-dimensional structure suitable for CNNs and LSTMs.
- How to step through a worked example of splitting a very long time series into subsequences ready for training a CNN or LSTM model.