# Develop LSTM Models for Time Series Forecasting
A time series is a sequence of observations taken sequentially in time. Predictions are made for new data whose actual values may not be known until some future date.
How we work with such data depends on the goal: understanding a dataset is called time series analysis, while making predictions is called time series forecasting.

In descriptive modeling, or time series analysis, a time series is modeled to determine its components in terms of seasonal patterns, trends, relation to external factors, and the like. … In contrast, time series forecasting uses the information in a time series (perhaps with additional information) to forecast future values of that series

### Time Series Analysis
Time series analysis involves developing models that best capture or describe an observed time series in order to understand the underlying causes. This field of study seeks the “why” behind a time series dataset.

This often involves making assumptions about the form of the data and decomposing the time series into constituent components.

### Time Series Forecasting
Making predictions about the future is called extrapolation in the classical statistical handling of time series data.

More modern fields focus on the topic and refer to it as time series forecasting.

Forecasting involves taking models fit on historical data and using them to predict future observations.

Descriptive models can borrow from the future (for example, to smooth or remove noise) because they only seek to best describe the data.

An important distinction in forecasting is that the future is completely unavailable and must only be estimated from what has already happened.
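
One practical consequence is that a forecasting model should be evaluated on observations that come strictly after the data it was fit on. The following is a minimal sketch of such a split, assuming a simple holdout evaluation; the helper name `chronological_split` is hypothetical and not part of any library.

```python
def chronological_split(series, test_size):
    """Split a series into past (train) and future (test) without shuffling."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    split = len(series) - test_size
    # everything before the split point is history, everything after is "future"
    return series[:split], series[split:]

history = [10, 20, 30, 40, 50, 60, 70, 80, 90]
train, test = chronological_split(history, test_size=3)
print(train)
print(test)
```

Shuffling the observations before splitting would leak future values into the training data, which is why the order is preserved here.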

- The purpose of time series analysis is generally twofold: to understand or model the stochastic mechanism that gives rise to an observed series and to predict or forecast the future values of a series based on the history of that series

### Components of Time Series
Time series analysis provides a body of techniques to better understand a dataset.

Perhaps the most useful of these is the decomposition of a time series into 4 constituent parts:

1. Level. The baseline value for the series if it were a straight line.
2. Trend. The optional and often linear increasing or decreasing behavior of the series over time.
3. Seasonality. The optional repeating patterns or cycles of behavior over time.
4. Noise. The optional variability in the observations that cannot be explained by the model.

All time series have a level, most have noise, and the trend and seasonality are optional.
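
To make the components concrete, here is a small sketch that builds a series from a level, a linear trend, and a repeating seasonal pattern, then estimates the level-plus-trend part with a centered moving average. It assumes an additive combination and leaves noise out so the result is exact; the seasonal pattern and the helper names are made up for illustration.

```python
def build_series(n, level, slope, seasonal):
    """Additive series: level + linear trend + repeating seasonal pattern (no noise)."""
    period = len(seasonal)
    return [level + slope * t + seasonal[t % period] for t in range(n)]

def centered_moving_average(series, window):
    """Centered moving average for an odd window; the edges are dropped."""
    half = window // 2
    return [sum(series[t - half:t + half + 1]) / window
            for t in range(half, len(series) - half)]

seasonal = [2.0, -1.0, -1.0]  # hypothetical pattern with period 3 that sums to zero
series = build_series(12, level=10.0, slope=0.5, seasonal=seasonal)

# averaging over exactly one seasonal period cancels the seasonal component
trend_est = centered_moving_average(series, window=len(seasonal))
print(trend_est[:3])
```

Because the window spans one full seasonal period, the seasonal terms cancel and each averaged value equals the level plus the trend at the center of the window.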

- The main features of many time series are trends and seasonal variations … another important feature of most time series is that observations close together in time tend to be correlated (serially dependent)

### Concerns of Forecasting
When forecasting, it is important to understand your goal.

Use the Socratic method and ask lots of questions to help zoom in on the specifics of your predictive modeling problem. For example:

1. How much data do you have available and are you able to gather it all together? More data is often more helpful, offering greater opportunity for exploratory data analysis, model testing and tuning, and model fidelity.
2. What is the time horizon of predictions that is required? Short, medium or long term? Shorter time horizons are often easier to predict with higher confidence.
3. Can forecasts be updated frequently over time or must they be made once and remain static? Updating forecasts as new information becomes available often results in more accurate predictions.
4. At what temporal frequency are forecasts required? Often forecasts can be made at lower or higher frequencies, allowing you to harness down-sampling and up-sampling of data, which in turn can offer benefits while modeling.

Time series data often requires cleaning, scaling, and even transformation.

For example:

- Frequency. Perhaps data is provided at a frequency that is too high to model or is unevenly spaced through time requiring resampling for use in some models.
- Outliers. Perhaps there are corrupt or extreme outlier values that need to be identified and handled.
- Missing. Perhaps there are gaps or missing data that need to be interpolated or imputed.
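
As a rough illustration of two of these steps, the sketch below fills gaps by linear interpolation and then down-samples by averaging blocks of observations. It is plain Python with hypothetical helper names and made-up data; in practice a library such as pandas would usually handle this.

```python
def interpolate_missing(values):
    """Fill None gaps linearly between the nearest known neighbours.

    Gaps at the very start or end have no neighbour on one side and stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):
            filled[left + k] = filled[left] + (filled[right] - filled[left]) * k / gap
    return filled

def downsample_mean(values, factor):
    """Average each non-overlapping block of `factor` observations."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values) - factor + 1, factor)]

hourly = [10, None, 30, 40, None, None, 70, 80]  # made-up data with gaps
clean = interpolate_missing(hourly)
coarse = downsample_mean(clean, 2)
print(clean)
print(coarse)
```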

Often time series problems are real-time, continually providing new opportunities for prediction. This adds an honesty to time series forecasting that quickly flushes out bad assumptions, errors in modeling and all the other ways that we may be able to fool ourselves.

### Examples of Time Series Forecasting
There is almost an endless supply of time series forecasting problems.

Below are 10 examples from a range of industries to make the notions of time series analysis and forecasting more concrete.

- Forecasting the corn yield in tons by state each year.
- Forecasting whether an EEG trace in seconds indicates a patient is having a seizure or not.
- Forecasting the closing price of a stock each day.
- Forecasting the birth rate at all hospitals in a city each year.
- Forecasting product sales in units sold each day for a store.
- Forecasting the number of passengers through a train station each day.
- Forecasting unemployment for a state each quarter.
- Forecasting utilization demand on a server each hour.
- Forecasting the size of the rabbit population in a state each breeding season.
- Forecasting the average price of gasoline in a city each day.

I expect that you will be able to relate one or more of these examples to your own time series forecasting problems that you would like to address.

# LSTM models for Time Series Forecasting
### How to develop a suite of LSTM models for a range of standard time series forecasting problems.
- How to develop LSTM models for univariate time series forecasting.
- How to develop LSTM models for multivariate time series forecasting.
- How to develop LSTM models for multi-step time series forecasting.

There are four parts of this tutorial:

1. Univariate LSTM Models

    1.1. Data Preparation

    1.2. Vanilla LSTM
    
    1.3. Stacked LSTM
    
    1.4. Bidirectional LSTM
    
    1.5. CNN LSTM
    
    1.6. ConvLSTM

2. Multivariate LSTM Models
    
    2.1. Multiple Input Series.
    
    2.2. Multiple Parallel Series.

3. Multi-Step LSTM Models

    3.1. Data Preparation
    
    3.2. Vector Output Model
    
    3.3. Encoder-Decoder Model

4. Multivariate Multi-Step LSTM Models

    4.1. Multiple Input Multi-Step Output.
    
    4.2. Multiple Parallel Input and Multi-Step Output.
    
# 1. Univariate LSTM Models
This section demonstrates the LSTM model for univariate time series forecasting.
The LSTM model will learn a function that maps a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple examples from which the LSTM can learn.

### 1.1 Data preparation
A given univariate sequence:

    [10, 20, 30, 40, 50, 60, 70, 80, 90]
    
We can divide the sequence into multiple input/output patterns called samples, where three time steps are used as input and one time step is used as output for the one-step prediction that is being learned.

    X,				y
    10, 20, 30		40
    20, 30, 40		50
    30, 40, 50		60
    ...

In [1]:
# univariate data preparation
from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
	X, y = list(), list()
	for i in range(len(sequence)):
		# find the end of this pattern
		end_ix = i + n_steps
		# check if we are beyond the sequence
		if end_ix > len(sequence)-1:
			break
		# gather input and output parts of the pattern
		seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
		X.append(seq_x)
		y.append(seq_y)
	return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
	print(X[i], y[i])

[10 20 30] 40
[20 30 40] 50
[30 40 50] 60
[40 50 60] 70
[50 60 70] 80
[60 70 80] 90


### 1.2 Vanilla LSTM
A Vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units, and an output layer used to make a prediction.

We can define a Vanilla LSTM for univariate time series forecasting as follows.

Key in the definition is the shape of the input; that is what the model expects as input for each sample in terms of the number of time steps and the number of features.

We are working with a univariate series, so the number of features is one, for one variable.

The number of time steps as input is the number we chose when preparing our dataset as an argument to the split_sequence() function.

The shape of the input for each sample is specified in the input_shape argument on the definition of the first hidden layer.

We almost always have multiple samples, therefore, the model will expect the input component of training data to have the dimensions or shape:

    [samples, timesteps, features]

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import matplotlib.pyplot as plt


In [3]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
for val in X:
    print(val)

[[10]
 [20]
 [30]]
[[20]
 [30]
 [40]]
[[30]
 [40]
 [50]]
[[40]
 [50]
 [60]]
[[50]
 [60]
 [70]]
[[60]
 [70]
 [80]]


In [4]:
## define vanilla model
# create Sequential model
model = Sequential()
# define 50 LSTM units in hidden layer, output layer predicts a single numerical value
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)

# demonstrate prediction
x_input = array([70, 80, 90]) # test set
x_input = x_input.reshape((1, n_steps, n_features))
print(x_input)

yhat = model.predict(x_input, verbose=0)
print(yhat)

## result: model will predict a next value in the sequence

[[[70]
  [80]
  [90]]]
[[102.874565]]


- Expected value is 100

### 1.3. Stacked LSTM
Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model.

An LSTM layer requires a three-dimensional input, but by default it produces a two-dimensional output: a single interpretation taken from the end of the sequence.

We can address this by having the LSTM output a value for each time step in the input data by setting the return_sequences=True argument on the layer. This gives a 3D output from one hidden LSTM layer that can be used as input to the next.
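
The shape difference can be illustrated without Keras. The sketch below is a toy recurrent layer in NumPy: a plain tanh recurrence with random weights and no gates, so it is not an LSTM and not the Keras implementation. The function name `toy_recurrent_layer` is hypothetical; only the output shapes are the point.

```python
import numpy as np

def toy_recurrent_layer(x, units, return_sequences=False, seed=0):
    """Simplified tanh recurrence (no LSTM gates); illustrates output shapes only."""
    rng = np.random.default_rng(seed)
    samples, timesteps, features = x.shape
    w_in = rng.normal(size=(features, units))
    w_rec = rng.normal(size=(units, units))
    h = np.zeros((samples, units))
    states = []
    for t in range(timesteps):
        # update the hidden state from the current input and the previous state
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        states.append(h)
    if return_sequences:
        return np.stack(states, axis=1)  # [samples, timesteps, units]
    return h                             # [samples, units]

x = np.zeros((6, 3, 1))  # [samples, timesteps, features]
last_only = toy_recurrent_layer(x, units=50)
all_steps = toy_recurrent_layer(x, units=50, return_sequences=True)
print(last_only.shape)
print(all_steps.shape)
```

The 3D result keeps one hidden state per time step, which is the shape a following recurrent layer needs as input.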

In [5]:
del model # delete model parameter out of RAM
# redefine model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
# compile
model.compile(optimizer='adam', loss='mse')
# fit 
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
# re-using x_input
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[103.7662]]


### 1.4. Bidirectional LSTM
On some sequence prediction problems, it can be beneficial to allow the LSTM model to learn the input sequence both forward and backwards and concatenate both interpretations.

This is called a Bidirectional LSTM.

We can implement a Bidirectional LSTM for univariate time series forecasting by wrapping the first hidden layer in a wrapper layer called Bidirectional.

An example of defining a Bidirectional LSTM to read input both forward and backward is as follows.

In [6]:
from tensorflow.keras.layers import Bidirectional


del model

model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)
# re-using x_input
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[101.63932]]


### 1.5. CNN LSTM
A convolutional neural network, or CNN for short, is a type of neural network developed for working with two-dimensional image data.

The CNN can be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series data.

A CNN model can be used in a hybrid model with an LSTM backend where the CNN is used to interpret subsequences of input that together are provided as a sequence to an LSTM model to interpret. This hybrid model is called a CNN-LSTM.

The first step is to split the input sequences into subsequences that can be processed by the CNN model. For example, we can first split our univariate time series data into input/output samples with four steps as input and one as output. Each sample can then be split into two sub-samples, each with two time steps. The CNN can interpret each subsequence of two time steps and provide a time series of interpretations of the subsequences to the LSTM model to process as input.

We can parameterize this and define the number of subsequences as n_seq and the number of time steps per subsequence as n_steps. The input data can then be reshaped to have the required structure:

    [samples, subsequences, timesteps, features]

In [7]:
# re-define input/output by choosing a number of time steps
n_steps = 4
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timestep] into [samples, subsequences, timestep, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
display(X.shape, X[0])

(5, 2, 2, 1)

array([[[10],
        [20]],

       [[30],
        [40]]])

- Reuse the same CNN model when reading in each sub-sequence of data separately, by wrapping the entire CNN model in the TimeDistributed wrapper that will apply the entire model once per input.
- The CNN first has a convolutional layer for reading across the subsequence, which requires a number of filters and a kernel size to be specified. The number of filters is the number of reads or interpretations of the input sequence. The kernel size is the number of time steps included in each ‘read’ operation of the input sequence.

- The convolution layer is followed by a max pooling layer that distills the filter maps down to 1/2 of their size that includes the most salient features. These structures are then flattened down to a single one-dimensional vector to be used as a single input time step to the LSTM layer.
- Next, we can define the LSTM part of the model that interprets the CNN model’s read of the input sequence and makes a prediction.
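
The steps above can be traced for a single sample with a NumPy sketch. It uses random weights and hypothetical helpers (`conv1d_relu`, `max_pool1d`) rather than the Keras layers, and is only meant to show how each subsequence becomes one flattened vector, that is, one time step for the LSTM.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution + ReLU. x: [timesteps, features]; kernels: [kernel_size, features, filters]."""
    kernel_size, _, filters = kernels.shape
    out_len = x.shape[0] - kernel_size + 1
    out = np.empty((out_len, filters))
    for t in range(out_len):
        window = x[t:t + kernel_size]  # [kernel_size, features]
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)

def max_pool1d(x, pool_size):
    """Non-overlapping max pooling along the time axis."""
    n = x.shape[0] // pool_size
    return x[:n * pool_size].reshape(n, pool_size, x.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
kernels = rng.normal(size=(1, 1, 64))  # kernel_size=1, 1 feature, 64 filters
bias = np.zeros(64)

# one sample shaped [subsequences, timesteps, features]
sample = np.array([10, 20, 30, 40], dtype=float).reshape(2, 2, 1)

# the same weights are applied to every subsequence, as TimeDistributed would do
lstm_input = np.stack([
    max_pool1d(conv1d_relu(sub, kernels, bias), pool_size=2).reshape(-1)
    for sub in sample
])
print(lstm_input.shape)
```

Each of the two subsequences is reduced to a 64-value vector, so the LSTM sees a sequence of two time steps with 64 features each.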

#### CNN-LSTM model for univariate time series forecasting

In [8]:
from tensorflow.keras.layers import Flatten, TimeDistributed, Conv1D, MaxPooling1D

del model
# define the mixed CNN-LSTM model
model = Sequential()
# specify the CNN part of the model, starting with a Conv1D layer
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
# max pooling layer distills the filter maps down to 1/2 of size
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
# then flatten the pooled feature maps into a single one-dimensional vector, used as one input time step for the LSTM
model.add(TimeDistributed(Flatten()))
# next, define the LSTM that interprets the CNN's read of the input sequence and makes the prediction
model.add(LSTM(50, activation='relu'))
model.add(Dense(1)) # output
# compile model
model.compile(optimizer='adam', loss='mse')
# fit training data to model
model.fit(X, y, epochs=500, verbose=0)

# prediction on test set
testX = array([60, 70, 80, 90])
testX = testX.reshape((1, n_seq, n_steps, n_features))
yhat= model.predict(testX, verbose=0)
print(yhat)
## expected value 100

[[101.72918]]


### 1.6. ConvLSTM
A type of LSTM related to the CNN-LSTM is the ConvLSTM, where the convolutional reading of input is built directly into each LSTM unit.

The ConvLSTM was developed for reading two-dimensional spatial-temporal data, but can be adapted for use with univariate time series forecasting.

The layer expects input as a sequence of two-dimensional images, therefore the shape of input data must be:

    [samples, timesteps, rows, columns, features]

In [9]:
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))
X.shape

(5, 2, 1, 2, 1)

We can define the ConvLSTM as a single layer in terms of the number of filters and a two-dimensional kernel size in terms of (rows, columns). As we are working with a one-dimensional series, the number of rows is always fixed to 1 in the kernel.

The output of the model must then be flattened before it can be interpreted and a prediction made.

In [10]:
from tensorflow.keras.layers import ConvLSTM2D

del model
# define model
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))
model.add(Flatten())
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)
# demonstrate prediction
testX = array([60, 70, 80, 90])
testX = testX.reshape((1, n_seq, 1, n_steps, n_features))
yhat = model.predict(testX, verbose=0)
print(yhat)

[[103.65026]]


# 2. Multivariate LSTM Models
Multivariate time series data means data where there is more than one observation for each time step.

There are two main models for the multivariate time series data:
1. Multiple Input Series
2. Multiple Parallel Series

## 2.1 Multiple Input Series
A problem may have two or more parallel input time series and an output time series that is dependent on the input time series.

The input time series are parallel because each series has an observation at the same time steps.

- We can reshape the two input series and the output series as a single dataset where each row is a time step, and each column is a separate time series. This is a standard way of storing parallel time series in a CSV file.

In [11]:
import numpy as np

# define and reshape input sequences and output sequence.
def define_seq(seq1, seq2):
    seq1 = np.array(seq1)
    seq2 = np.array(seq2)
    output = np.array([seq1[i] + seq2[i] for i in range(len(seq1))])
    # reshape to [rows, columns]
    seq1 = seq1.reshape((len(seq1), 1)) # multi row, 1 column
    seq2 = seq2.reshape((len(seq2), 1))
    output = output.reshape((len(output), 1))
    # stack the three columns side by side into one [rows, 3] array and return it
    return np.hstack((seq1, seq2, output))

# define input sequence
in_seq1 = [10, 20, 30, 40, 50, 60, 70, 80, 90]
in_seq2 = [15, 25, 35, 45, 55, 65, 75, 85, 95]

dataset = define_seq(in_seq1, in_seq2)
dataset

array([[ 10,  15,  25],
       [ 20,  25,  45],
       [ 30,  35,  65],
       [ 40,  45,  85],
       [ 50,  55, 105],
       [ 60,  65, 125],
       [ 70,  75, 145],
       [ 80,  85, 165],
       [ 90,  95, 185]])

As with the univariate time series, we must structure these data into samples with input and output elements.

An LSTM model needs sufficient context to learn a mapping from an input sequence to an output value. LSTMs can support parallel input time series as separate variables or features. Therefore, we need to split the data into samples maintaining the order of observations across the two input sequences.

If we choose three input time steps, the first sample has the input:
    
    10, 15
    20, 25
    30, 35
    
and the output value 65.

We can see that, in transforming the time series into input/output samples to train the model, we will have to discard some values from the output time series where we do not have values in the input time series at prior time steps. In turn, the choice of the number of input time steps will have an important effect on how much of the training data is used.
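
The effect of the number of input time steps on the amount of usable training data is simple arithmetic. The sketch below assumes the alignment used in this section, where the output is taken at the last input time step; the helper name `n_samples` is hypothetical.

```python
def n_samples(n_rows, n_steps):
    """Number of input/output samples when the output aligns with the last input step."""
    return max(n_rows - n_steps + 1, 0)

# the dataset above has 9 rows
for n_steps in (2, 3, 5):
    print(n_steps, n_samples(9, n_steps))
```

With 9 rows, three input time steps leave 7 samples, and every extra input step costs one more sample.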

In [12]:
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences):
            break
        seqX = sequences[i:end_ix, :-1]
        seqy = sequences[end_ix - 1, -1]
        X.append(seqX)
        y.append(seqy)
    return np.array(X), np.array(y)

n_steps = 3
X, y = split_sequences(dataset, n_steps)
print(X.shape, y.shape)
print(X[0], y[0])

(7, 3, 2) (7,)
[[10 15]
 [20 25]
 [30 35]] 65


In [13]:
## We will use a Vanilla LSTM where the number of time steps and parallel series (features) are specified 
## for the input layer via the input_shape argument. Any of the LSTM models could be used, such as Vanilla, 
## Stacked, Bidirectional, CNN-LSTM, or ConvLSTM.

# define model
def vanilla_model(X, y, n_steps):
    n_features = X.shape[2]
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=500, verbose=0)
    return model

In [14]:
# predict test set
n_features = X.shape[2]
input_x = np.array([[80, 85], [90, 95], [100, 105]])
input_x = input_x.reshape((1, n_steps, n_features))

model = vanilla_model(X, y, n_steps)
yhat = model.predict(input_x, verbose=0)
print(yhat)

## the result is quite good; the expected value is 205

[[205.62956]]


## 2.2 Multiple Parallel Series
An alternate time series problem is the case where there are multiple parallel time series and a value must be predicted for each.

We may want to predict the value for each of the three time series for the next time step.

This might be referred to as multivariate forecasting. For example, given the input:

    10, 15, 25
    20, 25, 45
    30, 35, 65
    
Output:
    
    40, 45, 85

In [15]:
# re-define the split_sequences
def split_parallel_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences) - 1:
            break
        # gather input and output parts of pattern
        seq_x = sequences[i:end_ix, :]
        seq_y = sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

# process dataset
n_steps = 3
X, y = split_parallel_sequences(dataset, n_steps)
n_features = X.shape[2]
print(X.shape, y.shape)

(6, 3, 3) (6, 3)


In [16]:
## We will use a Stacked LSTM where the number of time steps and parallel series (features) are specified for 
## the input layer via the input_shape argument. The number of parallel series is also used in the specification 
## of the number of values to predict by the model in the output layer. (can apply any models)

# using the Stacked LSTM model
def stacked_model(X, y, n_steps, n_features):
    model = Sequential()
    model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
    model.add(LSTM(100, activation='relu'))
    model.add(Dense(n_features))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=500, verbose=0)
    return model

In [17]:
# predict on testset
input_x = np.array([[70,75,145], [80,85,165], [90,95,185]])
input_x = input_x.reshape((1, n_steps, n_features))
model = stacked_model(X, y, n_steps, n_features)
yhat = model.predict(input_x, verbose=0)
print(yhat)

# expected value 100, 105, 205

[[100.60504 106.18641 206.44127]]


# 3. Multi-Step LSTM models
A time series forecasting problem that requires a prediction of multiple time steps into the future can be referred to as multi-step time series forecasting.

Specifically, these are problems where the forecast horizon or interval is more than one time step.

There are two main types of LSTM models that can be used for multi-step forecasting:

1. Vector Output Model
2. Encoder-Decoder Model

In [18]:
## prepare data for input seq [10, 20, 30] and output seq [40, 50]
def split_multi_steps(sequences, n_in, n_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_in
        out_end_ix = end_ix + n_out
        if out_end_ix > len(sequences):
            break
        seq_x, seq_y = sequences[i:end_ix], sequences[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

# split into samples
n_steps_in, n_steps_out = 3, 2
X, y = split_multi_steps(raw_seq, n_steps_in, n_steps_out) # use raw_seq 
print(X[0], y[0])

[10 20 30] [40 50]


## 3.1. Vector Output Model
Like other types of neural network models, the LSTM can output a vector directly that can be interpreted as a multi-step forecast.

This approach was seen in the previous section, where one time step of each output time series was forecast as a vector.

As with the LSTMs for univariate data in a prior section, the prepared samples must first be reshaped. The LSTM expects data to have a three-dimensional structure of [samples, timesteps, features], and in this case, we only have one feature so the reshape is straightforward.

With the number of input and output steps specified in the n_steps_in and n_steps_out variables, we can define a multi-step time-series forecasting model.

Any of the presented LSTM model types could be used, such as Vanilla, Stacked, Bidirectional, CNN-LSTM, or ConvLSTM. Below defines a Stacked LSTM for multi-step forecasting.

In [19]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))

# define model, Stacked model type.
def vector_output_model(X, y, n_steps_in, n_steps_out, n_features):
    model = Sequential()
    model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(100, activation='relu'))
    model.add(Dense(n_steps_out))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=500, verbose=0)
    return model

In [20]:
input_x = np.array([70, 80, 90])
input_x = input_x.reshape((1, n_steps_in, n_features))
# result test set
model = vector_output_model(X, y, n_steps_in, n_steps_out, n_features)
yhat = model.predict(input_x, verbose=0)
print(yhat)

[[103.028175 114.33643 ]]


## 3.2 Encoder-Decoder Model
A model specifically developed for forecasting variable length output sequences is called the Encoder-Decoder LSTM.

The model was designed for prediction problems where there are both input and output sequences, so-called sequence-to-sequence, or seq2seq problems, such as translating text from one language to another.

This model can be used for multi-step time series forecasting.

As its name suggests, the model is comprised of two sub-models: the encoder and the decoder.

The encoder is a model responsible for reading and interpreting the input sequence. The output of the encoder is a fixed length vector that represents the model’s interpretation of the sequence. The encoder is traditionally a Vanilla LSTM model, although other encoder models can be used such as Stacked, Bidirectional, and CNN models.

The decoder uses the output of the encoder as an input.
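
One common way to hand the encoder's fixed-length vector to the decoder is to repeat it once per output time step, so the decoder receives a sequence of the required length. The NumPy sketch below mimics that repetition; the helper name `repeat_vector` and the example values are made up, and this is not the Keras implementation.

```python
import numpy as np

def repeat_vector(encoded, n_steps_out):
    """Repeat a [samples, units] encoding to [samples, n_steps_out, units]."""
    return np.repeat(encoded[:, np.newaxis, :], n_steps_out, axis=1)

encoded = np.arange(6, dtype=float).reshape(2, 3)  # 2 samples, 3 units (made-up values)
decoder_input = repeat_vector(encoded, n_steps_out=2)
print(decoder_input.shape)
```

Every output time step sees the same encoded vector; the decoder's own recurrent state is what lets it produce a different value at each step.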

In [21]:
from tensorflow.keras.layers import RepeatVector, TimeDistributed

print(X.shape, y.shape)

(5, 3, 1) (5, 2)


In [22]:
# univariate multi-step encoder-decoder lstm example
# the output y must also be reshaped to [samples, timesteps, features]
y = y.reshape((y.shape[0], y.shape[1], n_features))
print(y.shape, y[0])

(5, 2, 1) [[40]
 [50]]


In [23]:
# define encoder-decoder model
def encoder_decoder_model(X, y, n_steps_in, n_steps_out, n_features):
    model = Sequential()
    # encoder, reading and interpreting the input seq, the output of encoder is a fixed length vector
    model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))
    # repeating fixed length output of encoder with required time step in the output seq
    model.add(RepeatVector(n_steps_out))
    # LSTM decoder, returns one output per time step of the output sequence
    model.add(LSTM(100, activation='relu', return_sequences=True))
    # use the same output layer for prediction in the output seq 
    model.add(TimeDistributed(Dense(1)))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=500, verbose=0)
    return model

In [24]:
# demonstrate prediction
x_input = np.array([70, 80, 90])
x_input = x_input.reshape((1, n_steps_in, n_features))
model = encoder_decoder_model(X, y, n_steps_in, n_steps_out, n_features)
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[[102.2284 ]
  [112.68889]]]


# 4. Multivariate Multi-step LSTM models
It is possible to mix and match the different types of LSTM models presented so far for the different problems. This too applies to time series forecasting problems that involve multivariate and multi-step forecasting, but it may be a little more challenging.

1. Multiple Input Multi-Step Output.
2. Multiple Parallel Input and Multi-Step Output.

## 4.1. Multiple Input Multi-step Output

In [25]:
## define the split_seq multivariate time series for multiple Input
def split_multiple_input_seq(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_ix = end_ix + n_steps_out - 1
        if out_ix > len(sequences):
            break
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

## setup input samples
n_steps_in, n_steps_out = 3, 2
X, y = split_multiple_input_seq(dataset, n_steps_in, n_steps_out)
print(X.shape, y.shape)

(6, 3, 2) (6, 2)


- We have 6 samples
- The input portion of the samples is 3D, with 3 time steps and 2 variables for 2 input time series
- The output portion is 2D: 2 time steps for each of the 6 samples

In [26]:
## define model
n_features = X.shape[2]
# reuse the vector-output approach from section 3.1, here with a Stacked LSTM
def multivariate_model(X, y, n_steps_in, n_steps_out, n_features):
    model = Sequential()
    model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(100, activation='relu'))
    model.add(Dense(n_steps_out))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=500, verbose=0)
    return model

In [27]:
x_input = np.array([[70, 75], [80, 85], [90, 95]])
x_input = x_input.reshape((1, n_steps_in, n_features))
model = multivariate_model(X, y, n_steps_in, n_steps_out, n_features)
yhat = model.predict(x_input, verbose=0)
print(yhat)

## expected value 185, 205

[[185.3084  206.19203]]


## 4.2. Multiple Parallel Input and Multi-step Output
A problem with parallel time series may require the prediction of multiple time steps of each time series.

We may use the last three time steps from each of the three time series as input to the model and predict the next two time steps of each of the three time series as output.

Input

    10, 15, 25
    20, 25, 45
    30, 35, 65

Output

    40, 45, 85
    50, 55, 105
    
We can see that both the input (X) and output (Y) elements of the dataset are three dimensional for the number of samples, time steps, and variables or parallel time series respectively.

We can use either the Vector Output or Encoder-Decoder LSTM to model this problem. 

In [28]:
## multivariate multi-step encoder-decoder lstm example
# split data
def split_parallel_input_seq(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_ix = end_ix + n_steps_out
        if out_ix > len(sequences):
            break
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

n_steps_in, n_steps_out = 3, 2
X, y = split_parallel_input_seq(dataset, n_steps_in, n_steps_out)
print(X.shape, y.shape)

(5, 3, 3) (5, 2, 3)


In [29]:
# encoder-decoder model
def multiple_parallel_model(X, y, n_steps_in, n_steps_out, n_features):
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=500, verbose=0)
    return model

In [30]:
n_features = X.shape[2]
x_input = np.array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])
x_input = x_input.reshape((1, n_steps_in, n_features))
model = multiple_parallel_model(X, y, n_steps_in, n_steps_out, n_features)
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[[ 90.13544   95.713745 185.54388 ]
  [100.09547  106.224075 206.01292 ]]]
