# 05 Data Preparation for Deep Learning
It is important to first transform your data into a suitable format before any time series analysis can be done. This lab guides you from the basics of transforming raw time series data into a structure suitable for a supervised learning task, through to transforming time series data into the three-dimensional structure that is fed into convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. By the end of this lab, you will be able to:

1. transform a time series dataset into a two-dimensional supervised learning format, and
2. transform a two-dimensional time series dataset into a three-dimensional structure.

First, let us import the libraries needed for this lab.

```python
# importing required libraries or modules for this lab
from numpy import array
```

## Transforming Time Series Data for Supervised Learning Task

Below is an example of a function that transforms a univariate time series into a structure suitable for supervised learning.

Suppose we have a univariate time series. It has a one-dimensional structure, so it cannot be used for supervised learning as-is: there is no clear distinction between features (inputs) and labels (outputs).

```python
# define univariate time series
series = array([1,2,3,4,5,6,7,8,9,10])
print(series.shape)
```

```
(10,)
```


```python
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    """Return inputs X (windows of n_steps values) and outputs y (the value after each window)."""
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
```
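
The loop in `split_sequence` slides a window of `n_steps + 1` values along the series and splits each window into inputs and a label. The same windowing can be sketched without an explicit Python loop, assuming NumPy 1.20 or later (which provides `numpy.lib.stride_tricks.sliding_window_view`). The function name `split_sequence_vectorized` is hypothetical and only used for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_sequence_vectorized(sequence, n_steps):
    """Hypothetical loop-free variant of split_sequence.

    Each window of length n_steps + 1 holds n_steps inputs followed by
    the value to predict.
    """
    sequence = np.asarray(sequence)
    # shape: (len(sequence) - n_steps, n_steps + 1)
    windows = sliding_window_view(sequence, n_steps + 1)
    # sliding_window_view returns a read-only view, so copy the slices
    X = windows[:, :-1].copy()  # first n_steps values of each window are the features
    y = windows[:, -1].copy()   # last value of each window is the label
    return X, y


series = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
X, y = split_sequence_vectorized(series, 3)
print(X.shape, y.shape)  # (7, 3) (7,)
```

One behavioural difference: when the series is shorter than `n_steps + 1`, the loop version returns empty arrays, whereas `sliding_window_view` raises a `ValueError`.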

```python
# calling the function to transform time series into features and labels
x, y = split_sequence(series, 3)
print(f"Features are now in the shape of {x.shape} while labels are now in the shape of {y.shape}")

# printing out each sample
for i in range(len(x)):
    print(x[i], y[i])
```

```
Features are now in the shape of (7, 3) while labels are now in the shape of (7,)
[1 2 3] 4
[2 3 4] 5
[3 4 5] 6
[4 5 6] 7
[5 6 7] 8
[6 7 8] 9
[7 8 9] 10
```


"Running the example first prints the shape of the time series, in this case 10 time steps of observations. Next, the series is split into input and output components for a supervised learning problem. We can see that for the chosen representation that we have 7 samples for the input and output and 3 input features. The shape of the output is 7 samples represented as (7,) indicating that the array is a single column. It could also be represented as a two-dimensional array with 7 rows and 1 column [7, 1]. Finally, the input and output aspects of each sample are printed, showing the expected breakdown of the problem."
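
The two representations of the output mentioned above hold the same values; only the shape differs. A minimal NumPy sketch of converting the `(7,)` label array into a 7-row, 1-column array, with the label values copied from the printed output above (`y_col` is an illustrative name):

```python
import numpy as np

# labels as produced by split_sequence for the 10-step example
y = np.array([4, 5, 6, 7, 8, 9, 10])
print(y.shape)  # (7,)

# the same labels as a single column: 7 rows, 1 column
y_col = y.reshape((y.shape[0], 1))
print(y_col.shape)  # (7, 1)
```

Whether the column form is needed depends on the model or library that consumes the labels.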

## Preparing 3-Dimensional Data
"Preparing time series data for CNNs and LSTMs requires one additional step beyond transforming the data into a supervised learning problem."
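
That additional step is a reshape of the two-dimensional input `[samples, timesteps]` into the three-dimensional layout `[samples, timesteps, features]` conventionally expected by 1-D convolutional and LSTM layers in Keras. A minimal sketch, assuming the 7-by-3 feature array from the example above and a univariate series (one feature per time step). The reshape does not reorder any values; it only adds a feature axis of length 1:

```python
import numpy as np

# 2-D features from the supervised-learning step: [samples, timesteps]
X = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5],
              [4, 5, 6],
              [5, 6, 7],
              [6, 7, 8],
              [7, 8, 9]])
print(X.shape)  # (7, 3)

# a univariate series has a single feature per time step
n_features = 1

# add the feature axis: [samples, timesteps, features]
X_3d = X.reshape((X.shape[0], X.shape[1], n_features))
print(X_3d.shape)  # (7, 3, 1)
```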

"The input layer for CNN and LSTM models is specified by the input shape argument on the first hidden layer of the network. This too can make things confusing for beginners as intuitively we may expect the first layer defined in the model be the input layer, not the first hidden layer. For example, below is an example of a network with one hidden LSTM layer and one Dense output layer."
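
For a network like the one described above, the shape given to the first hidden layer can be read off the prepared data: it is the shape of a single sample, `[timesteps, features]`, which is the three-dimensional array's shape with the samples axis dropped. The sketch below assumes the same 10-step series and a window of 3 steps, and rebuilds the windows with a list comprehension instead of calling `split_sequence` so that it runs on its own:

```python
import numpy as np

series = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
n_steps = 3

# supervised samples built with the same windowing as split_sequence
X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
X_3d = X.reshape((X.shape[0], X.shape[1], 1))  # [samples, timesteps, features]

# the input shape for the first hidden layer omits the samples axis,
# because it describes a single sample rather than the whole dataset
input_shape = X_3d.shape[1:]
print(X_3d.shape, input_shape)  # (7, 3, 1) (3, 1)
```

In Keras, a tuple like `input_shape` here is what would be passed as the `input_shape` argument of the first `LSTM` layer. The number of samples is not part of it, so the same model definition works regardless of how many samples are used for training.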

Source: Deep Learning for Time Series Forecasting, Jason Brownlee