### Time Series Forecasting with Long Short-Term Memory


In this notebook I will investigate three kinds of time series forecasting:
- univariate time series forecasting
- multivariate time series forecasting
- multi-step time series forecasting

This notebook is based on the following tutorial by Jason Brownlee:
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

#### Data Preparation

The data needs some preparation for time series modeling.

Let's start with a simple univariate sequence:

In [68]:
row_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]

In [69]:
import numpy as np


# split the data
def split_sequence(seq, steps):
    """Split a univariate sequence into input/output samples.

    input: seq, the input sequence as a list; steps, the number of time steps per sample
    output: the inputs X and the outputs y of the time series as NumPy arrays"""
    X, y = [], []
    for i in range(len(seq)):
        # find the end of this pattern
        end_ix = i + steps
        # check if we are beyond the seq
        if end_ix > len(seq) - 1:
            break
        # gather input and output parts
        seq_x, seq_y = seq[i:end_ix], seq[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

In [70]:
#split the data
steps = 3
X, y = split_sequence(row_seq, steps=steps)

for i in range(len(X)):
    print(X[i], y[i])

[10 20 30] 40
[20 30 40] 50
[30 40 50] 60
[40 50 60] 70
[50 60 70] 80
[60 70 80] 90
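
As a cross-check on the loop above, the same windows can be built without an explicit loop. The sketch below assumes NumPy 1.20 or later, where `sliding_window_view` is available; `split_sequence_vectorized`, `X_vec` and `y_vec` are hypothetical names that do not appear in the original tutorial.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_sequence_vectorized(seq, steps):
    """Hypothetical loop-free variant of split_sequence (assumes NumPy >= 1.20)."""
    seq = np.asarray(seq)
    # Every window of length steps + 1: the first `steps` values are the
    # input, the last value is the target.
    windows = sliding_window_view(seq, steps + 1)
    return windows[:, :-1].copy(), windows[:, -1].copy()


X_vec, y_vec = split_sequence_vectorized([10, 20, 30, 40, 50, 60, 70, 80, 90], steps=3)
print(X_vec.shape, y_vec.shape)  # (6, 3) (6,)
```

The copies are made so the returned arrays do not share memory with the read-only window view.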


The model expects input shaped as [samples, timesteps, features], but our current format is [samples, timesteps]. So let's reshape our input:

In [71]:
#reshape the input to fit [samples, timesteps, features] format
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
print(X)

[[[10]
  [20]
  [30]]

 [[20]
  [30]
  [40]]

 [[30]
  [40]
  [50]]

 [[40]
  [50]
  [60]]

 [[50]
  [60]
  [70]]

 [[60]
  [70]
  [80]]]


First we will use a Vanilla LSTM, which has a single hidden layer of LSTM units.

The model is fit using the Adam optimizer and optimized for mean squared error.
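
For reference, mean squared error is the average of the squared differences between the targets and the predictions. Here is a minimal NumPy sketch; the prediction values are invented for illustration and are not outputs of the model in this notebook.

```python
import numpy as np

# Illustrative values only: the predictions are invented for this example.
y_true = np.array([40.0, 50.0, 60.0])
y_hat = np.array([41.0, 49.0, 62.0])

# Mean of the squared residuals: (1 + 1 + 4) / 3
mse = np.mean((y_true - y_hat) ** 2)
print(mse)  # 2.0
```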

In [72]:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
#define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(steps, n_features) ))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
#fit model
model.fit(X, y, epochs=200, verbose=0)


<tensorflow.python.keras.callbacks.History at 0x1e6e04bec88>

In [73]:
#predict
x_input = np.array([70, 80, 90])
x_input = x_input.reshape((1, steps, n_features))
y_pred = model.predict(x_input, verbose=0)
print('y_pred',y_pred)

y_pred [[102.079956]]


#### Stacked LSTM

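We can stack multiple hidden LSTM layers. An LSTM layer requires 3D input (as before), but by default it outputs a 2D array holding only the result for the final time step. Setting return_sequences=True on a layer makes it output one value per time step, which gives the next LSTM layer the 3D input it needs.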
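
To make the effect of return_sequences concrete, the sketch below runs a toy recurrent layer in plain NumPy and compares the two output shapes. It is an assumption-laden illustration: the cell is a simple tanh recurrence rather than a real LSTM, and `run_layer`, `W_x` and `W_h` are hypothetical names. Only the shapes are meant to carry over to Keras.

```python
import numpy as np

rng = np.random.default_rng(0)
steps, n_features, units = 3, 1, 50

# Toy recurrent cell (a plain tanh RNN, not a real LSTM), used only to
# illustrate the shapes involved.
W_x = rng.normal(size=(n_features, units)) * 0.1
W_h = rng.normal(size=(units, units)) * 0.1


def run_layer(x, return_sequences):
    """Run the toy cell over x, shaped [samples, timesteps, features]."""
    h = np.zeros((x.shape[0], units))
    states = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h)
        states.append(h)
    # Full sequence of hidden states (3D) or only the final state (2D).
    return np.stack(states, axis=1) if return_sequences else h


x = np.array([[[0.1], [0.2], [0.3]]])  # one sample, three time steps
seq_out = run_layer(x, return_sequences=True)
last_out = run_layer(x, return_sequences=False)
print(seq_out.shape)   # (1, 3, 50)
print(last_out.shape)  # (1, 50)
```

The 3D result is what a second recurrent layer can consume; the 2D result is what a Dense output layer expects.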

In [74]:
#define model
model_stacked = Sequential()
model_stacked.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(steps, n_features)))
model_stacked.add(LSTM(50, activation='relu'))
model_stacked.add(Dense(1))
model_stacked.compile(optimizer='adam', loss='mse')
#fit model
model_stacked.fit(X, y, epochs=200, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x1e6e0853fd0>

In [75]:
#predict
y_pred = model_stacked.predict(x_input, verbose=0)
print(y_pred)

[[104.289505]]


#### Bidirectional LSTM

A Bidirectional LSTM allows the model to learn the input sequence both forward and backward.
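
The idea can be sketched in plain NumPy: one recurrent pass reads the sequence in its original order, a second pass with separate weights reads it reversed, and the two final states are joined. This is an illustration under stated assumptions: the cell is a toy tanh recurrence rather than a real LSTM, `make_cell` and `last_state` are hypothetical helpers, and the concatenation mirrors the default merge mode of the Keras Bidirectional wrapper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, units = 1, 50


def make_cell():
    """Weights for a toy tanh recurrent cell (a stand-in, not a real LSTM)."""
    return (rng.normal(size=(n_features, units)) * 0.1,
            rng.normal(size=(units, units)) * 0.1)


def last_state(x, cell):
    """Run the toy cell over x ([samples, timesteps, features]); return the final state."""
    W_x, W_h = cell
    h = np.zeros((x.shape[0], units))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h)
    return h


forward_cell, backward_cell = make_cell(), make_cell()
x = np.array([[[0.1], [0.2], [0.3]]])

h_forward = last_state(x, forward_cell)                # reads 0.1, 0.2, 0.3
h_backward = last_state(x[:, ::-1, :], backward_cell)  # reads 0.3, 0.2, 0.1
merged = np.concatenate([h_forward, h_backward], axis=-1)
print(merged.shape)  # (1, 100)
```

With 50 units per direction, the layer after the wrapper therefore sees 100 features per sample.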

In [76]:
from keras.layers import Bidirectional
#define model
model_bi = Sequential()
model_bi.add(Bidirectional(LSTM(50, activation='relu'),  input_shape=(steps, n_features)))
model_bi.add(Dense(1))
model_bi.compile(optimizer='adam', loss='mse')
#fit model
model_bi.fit(X, y, epochs=200, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x1e6d9505978>

In [77]:
y_pred = model_bi.predict(x_input, verbose=0)
print(y_pred)

[[101.10093]]


#### CNN LSTM

A convolutional neural network (CNN) can be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series data.

A CNN model can be used in a hybrid model with an LSTM backend where the CNN is used to interpret subsequences of input that together are provided as a sequence to an LSTM model to interpret. This hybrid model is called a CNN-LSTM.

The first step is to split the input sequences into subsequences that can be processed by the CNN model. For example, we can first split our univariate time series data into input/output samples with four steps as input and one as output. Each sample can then be split into two sub-samples, each with two time steps. The CNN can interpret each subsequence of two time steps and provide a time series of interpretations of the subsequences to the LSTM model to process as input.

We can parameterize this and define the number of subsequences as n_seq and the number of time steps per subsequence as n_steps. The input data can then be reshaped to have the required structure:

[samples, subsequences, timesteps, features]

In [78]:
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(row_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))

In [79]:
print(X[:3])

[[[[10]
   [20]]

  [[30]
   [40]]]


 [[[20]
   [30]]

  [[40]
   [50]]]


 [[[30]
   [40]]

  [[50]
   [60]]]]


We want to reuse the same CNN model when reading in each sub-sequence of data separately.

This can be achieved by wrapping the entire CNN model in a TimeDistributed wrapper that will apply the entire model once per input, in this case, once per input subsequence.

The CNN model first has a convolutional layer for reading across the subsequence that requires a number of filters and a kernel size to be specified. The number of filters is the number of reads or interpretations of the input sequence. The kernel size is the number of time steps included in each ‘read’ operation of the input sequence.

The convolution layer is followed by a max pooling layer that distills the filter maps down to half their size, keeping the most salient features. These structures are then flattened down to a single one-dimensional vector to be used as a single input time step to the LSTM layer.
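
The size of the vector handed to the LSTM for each subsequence can be worked out by hand. The helper below is a hypothetical sketch of that arithmetic. It assumes 'valid' padding and stride 1 for the convolution and a pooling stride equal to the pool size, which I believe are the Keras defaults for Conv1D and MaxPooling1D.

```python
def cnn_block_output_size(n_steps, kernel_size, filters, pool_size):
    """Flattened feature count per subsequence (hypothetical helper).

    Assumes 'valid' padding and stride 1 for the convolution, and a
    pooling stride equal to pool_size.
    """
    conv_len = n_steps - kernel_size + 1  # positions the kernel can occupy
    pooled_len = conv_len // pool_size    # pooling keeps one value per window
    return pooled_len * filters           # Flatten: length * filters


# Two time steps per subsequence, kernel_size=1, 64 filters, pool_size=2
print(cnn_block_output_size(n_steps=2, kernel_size=1, filters=64, pool_size=2))  # 64
```

With these settings each two-step subsequence is reduced to a 64-value vector, so the LSTM receives a sequence of two such vectors per sample.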

In [80]:
# univariate cnn lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import TimeDistributed
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
 
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
 
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))


In [81]:
# define model
model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x1e6e3dd8128>

In [82]:

# demonstrate prediction
x_input = array([60, 70, 80, 90])
x_input = x_input.reshape((1, n_seq, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[100.55077]]


#### ConvLSTM

A type of LSTM related to the CNN-LSTM is the ConvLSTM, where the convolutional reading of input is built directly into each LSTM unit.

The ConvLSTM was developed for reading two-dimensional spatial-temporal data, but can be adapted for use with univariate time series forecasting.

The layer expects input as a sequence of two-dimensional images, therefore the shape of input data must be:

[samples, timesteps, rows, columns, features]

For our purposes, we can split each sample into subsequences where timesteps will become the number of subsequences, or n_seq, and columns will be the number of time steps for each subsequence, or n_steps. The number of rows is fixed at 1 as we are working with one-dimensional data.

We can now reshape the prepared samples into the required structure.

In [83]:
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))


In [84]:
#print(X)

We can define the ConvLSTM as a single layer in terms of the number of filters and a two-dimensional kernel size in terms of (rows, columns). As we are working with a one-dimensional series, the number of rows is always fixed to 1 in the kernel.

The output of the model must then be flattened before it can be interpreted and a prediction made.
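
The size of the flattened output follows from the kernel and input dimensions. The helper below is a hypothetical sketch of that arithmetic. It assumes 'valid' padding, stride 1, and that the layer returns only the output for the final time step, which I believe are the Keras defaults for ConvLSTM2D.

```python
def convlstm_flat_size(rows, cols, kernel_size, filters):
    """Flattened size after ConvLSTM2D + Flatten (hypothetical helper).

    Assumes 'valid' padding, stride 1, and that only the final time
    step's output is returned.
    """
    k_rows, k_cols = kernel_size
    out_rows = rows - k_rows + 1
    out_cols = cols - k_cols + 1
    return out_rows * out_cols * filters


# rows=1, columns=2 time steps per subsequence, kernel (1, 2), 64 filters
print(convlstm_flat_size(rows=1, cols=2, kernel_size=(1, 2), filters=64))  # 64
```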

In [85]:
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import ConvLSTM2D
# define model
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))
model.add(Flatten())
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)
# demonstrate prediction
x_input = array([60, 70, 80, 90])
x_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[102.83079]]


In [86]:
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import ConvLSTM2D

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))
# define model
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))
model.add(Flatten())
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)
# demonstrate prediction
x_input = array([60, 70, 80, 90])
x_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[104.07025]]
