In [6]:
import warnings
warnings.filterwarnings('ignore')

# Time series forecasting

This summer I did an internship at a bank in the City of London. Time series forecasting was a big deal there, which is why I decided to give it a shot in this notebook. I should also mention that during the internship I met a former machine learning PhD student from Imperial College who introduced me to a few things about machine learning for time series.

First, we will consider univariate time series.

## Data preparation


In [2]:
# We need to split the time series into multiple input/output samples for the LSTM model
from numpy import array

def split_sequence(seq, n_steps):
    X,y = [],[]
    # each row of X contains n_steps consecutive elements; the corresponding entry of y is the element that follows them
    for i in range(len(seq)):
        # index of the element that follows the window
        end_ix = i + n_steps
        # stop once the window would run past the end of the sequence
        if end_ix > len(seq)-1:
            break
        # We then build X and y
        seq_x , seq_y = seq[i:end_ix], seq[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

In [3]:
# Test the function:
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
	print(X[i], y[i])

[10 20 30] 40
[20 30 40] 50
[30 40 50] 60
[40 50 60] 70
[50 60 70] 80
[60 70 80] 90
[70 80 90] 100
[ 80  90 100] 110
[ 90 100 110] 120
[100 110 120] 130
[110 120 130] 140
[120 130 140] 150
[130 140 150] 160
[140 150 160] 170
[150 160 170] 180
[160 170 180] 190
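
As a quick sanity check on the output above: a sequence of N values cut into windows of `n_steps` inputs plus one target should yield N - n_steps samples. The sketch below rebuilds the windows with plain Python lists, purely for illustration; `windows` is a name introduced here and is not used elsewhere in the notebook.

```python
# Rebuild the (input window, next value) pairs with plain lists.
raw_seq = list(range(10, 200, 10))  # 10, 20, ..., 190 (19 values)
n_steps = 3

windows = [
    (raw_seq[i:i + n_steps], raw_seq[i + n_steps])
    for i in range(len(raw_seq) - n_steps)
]

print(len(windows))   # 19 - 3 = 16 samples
print(windows[0])     # first sample
print(windows[-1])    # last sample
```

The count matches the 16 rows printed by `split_sequence`.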


## Vanilla LSTM
Let's develop an LSTM model that has a single hidden layer of LSTM units and an output layer used to make the prediction.

Because we are using a univariate time series, there is only one feature per time step. Thus, the output of split_sequence (which will be the input of the LSTM model) has the shape:
<br/>
    ```[samples, timesteps]```

In [4]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
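
To see what this reshape does, here is a minimal sketch on a small toy array. `X_demo` and `X_3d` are hypothetical names introduced for the illustration, not the notebook's `X`. The reshape only adds a trailing feature axis of size 1; the values are untouched.

```python
import numpy as np

# Hypothetical toy input: 2 samples of 3 time steps each
X_demo = np.array([[10, 20, 30],
                   [20, 30, 40]])

# Add a trailing axis of size 1: one feature per time step
X_3d = X_demo.reshape((X_demo.shape[0], X_demo.shape[1], 1))

print(X_demo.shape, X_3d.shape)

# Dropping the feature axis again recovers the original array
same_values = bool(np.array_equal(X_3d[:, :, 0], X_demo))
print(same_values)
```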

We will fit the model with Adam, a variant of stochastic gradient descent. <br/>
The loss function is the mean squared error.
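
For reference, the mean squared error is simply the average of the squared differences between targets and predictions. The function below is a plain-Python sketch written for this explanation (it is not the Keras implementation), and the example numbers are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions: errors of 1 and 2
example_loss = mean_squared_error([40, 50], [41, 48])
print(example_loss)  # (1**2 + 2**2) / 2 = 2.5
```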

In [7]:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

In [9]:
# fit model
model.fit(X, y, epochs=200, verbose=0)

<keras.callbacks.History at 0x7f5721355610>

Now that the model is fit, we can use it to make predictions

In [10]:
# demonstrate prediction
x_input = array([170, 180, 190]) # we expect a value close to 200
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[200.39282]]


## Stacked LSTM
Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model.

In [11]:
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
# return_sequences=True: return the full output sequence (one output per time step) rather than only the last output, so that the next LSTM layer receives a 3D input
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
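
The role of `return_sequences` is easiest to see through output shapes. The helper below is hypothetical (it is not part of Keras); it only mirrors the shape rule of an LSTM layer: `(batch, timesteps, units)` when `return_sequences=True`, and `(batch, units)` otherwise.

```python
def lstm_output_shape(input_shape, units, return_sequences=False):
    """Hypothetical helper: shape produced by an LSTM layer for a 3D input shape."""
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)
    return (batch, units)

# None stands for the batch dimension; 3 time steps and 1 feature as above
first = lstm_output_shape((None, 3, 1), units=50, return_sequences=True)
second = lstm_output_shape(first, units=50)
print(first)   # still 3D, so it can feed another LSTM layer
print(second)  # 2D, ready for the Dense layer
```

Without `return_sequences=True` on the first layer, the second LSTM would receive a 2D input and could not unpack it into time steps.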

In [12]:
model.fit(X, y, epochs=200, verbose=0)

<keras.callbacks.History at 0x7f57182b86d0>

In [13]:
# prediction of the Stacked LSTM

x_input = array([170, 180, 190]) # we expect a value close to 200
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[200.53333]]


## Bidirectional LSTM
Here the LSTM model will learn the input sequence both forward and backward

To do this for univariate time series, we can wrap the first hidden layer in a wrapper layer called Bidirectional
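
To give an intuition for "forward and backward", here is a toy sketch. The recurrence `toy_rnn` is a deliberately simplistic stand-in invented for this illustration (it is not an LSTM): it is run once over the sequence and once over the reversed sequence, and the two results are placed side by side, which is what Keras's `Bidirectional` does by default with the two directions' outputs (concatenation).

```python
def toy_rnn(seq, decay=0.5):
    """Toy recurrence (not an LSTM): the state mixes the past state and the new input."""
    state = 0.0
    for x in seq:
        state = decay * state + x
    return state

def toy_bidirectional(seq):
    """Run the toy recurrence in both directions and concatenate the results."""
    forward = toy_rnn(seq)          # reads 10, 20, 30
    backward = toy_rnn(seq[::-1])   # reads 30, 20, 10
    return [forward, backward]

result = toy_bidirectional([10, 20, 30])
print(result)
```

Because the recurrence weights recent inputs more heavily, the two directions summarise the same window differently, which is the extra information the wrapper provides.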

In [14]:
# define model
from keras.layers import Bidirectional
 
model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

We can now check how well this model predicts.

In [15]:
# fit model
model.fit(X, y, epochs=200, verbose=0)

<keras.callbacks.History at 0x7f57082f13d0>

In [16]:
# demonstrate prediction
x_input = array([170, 180, 190])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[200.28166]]


## CNN-LSTM
A CNN can be effective at extracting features from a one-dimensional sequence such as a univariate time series.

Here we will try a hybrid system. The idea is the following: a CNN interprets the subsequences (subdivisions of the rows of X), and its interpretations are then provided together, as a sequence, to an LSTM model. This hybrid model is called CNN-LSTM.

First, we have to split the input sequences into subsequences that the CNN model can process. <br/>
If we take a row of X (using 4 time steps), we have:<br/>
```[30,40,50,60]```


We split it to get 2 subsequences:<br/>
```[30,40]``` and ```[50,60]```<br/>


The CNN will then interpret 2 subsequences of 2 time steps each and provide a time series of interpretations of the subsequences to the LSTM model.
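
The split described above can be sketched with a single NumPy reshape. The names `row`, `n_sub_steps` and `subsequences` are introduced here for the illustration only.

```python
import numpy as np

row = np.array([30, 40, 50, 60])   # one input window of 4 time steps
n_seq, n_sub_steps = 2, 2          # 2 subsequences of 2 time steps each

# Consecutive values stay together: the first half and the second half of the window
subsequences = row.reshape((n_seq, n_sub_steps))
print(subsequences.tolist())
```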

In [17]:
# number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
# from here on, n_steps is the number of time steps per subsequence (4 = n_seq * n_steps)
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))

In order to use the same CNN model on each subsequence (taken separately), we can wrap the entire CNN model in a TimeDistributed wrapper that applies the model once per input subsequence.

The CNN model:


First, a convolutional layer reads across each subsequence.<br/>
Number of filters: the number of reads/interpretations of the input sequence.<br/>
Kernel size: the number of time steps included in each 'read' operation of the input sequence.


Then a max pooling layer keeps the most salient features produced by the filters.


Finally, the structure is flattened to a 1D vector to be used by the LSTM layer.
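
To make these three steps concrete, the sketch below traces the length of one subsequence through the layers, using the values of the model defined in this section (64 filters, kernel size 1, pool size 2). It assumes the Keras defaults: `padding='valid'` and stride 1 for `Conv1D`, and a pooling stride equal to `pool_size` for `MaxPooling1D`. The two helper functions are hypothetical and only do the arithmetic.

```python
def conv1d_length(n_steps, kernel_size):
    """Output length of a 'valid' convolution with stride 1."""
    return n_steps - kernel_size + 1

def maxpool1d_length(length, pool_size):
    """Output length of 'valid' max pooling with stride equal to pool_size."""
    return length // pool_size

n_steps, filters, kernel_size, pool_size = 2, 64, 1, 2

after_conv = conv1d_length(n_steps, kernel_size)      # time steps left after the convolution
after_pool = maxpool1d_length(after_conv, pool_size)  # time steps left after pooling
flattened = after_pool * filters                      # vector size handed to the LSTM per subsequence
print(after_conv, after_pool, flattened)
```

Under these assumptions, each subsequence reaches the LSTM as a vector of 64 values, and the LSTM sees a sequence of 2 such vectors per sample.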


In [18]:
from keras.layers import Flatten
from keras.layers import TimeDistributed
# import from keras.layers directly; the keras.layers.convolutional path is not available in recent Keras versions
from keras.layers import Conv1D
from keras.layers import MaxPooling1D

model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
#then the LSTM model
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')




In [19]:
# fit model
model.fit(X, y, epochs=500, verbose=0)

<keras.callbacks.History at 0x7f56ef5befd0>

In [20]:
# test the model
x_input = array([160, 170, 180, 190])
x_input = x_input.reshape((1, n_seq, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[201.68326]]


## ConvLSTM


This code is inspired by [this Medium introduction to ConvLSTM](https://medium.com/neuronio/an-introduction-to-convlstm-55c9025563a7)


ConvLSTM is related to the CNN-LSTM. In ConvLSTM, a convolutional reading of the input is built directly into each LSTM unit. It was originally developed to deal with 2D spatio-temporal data, so we have to adapt it.

The layer takes as an input a sequence of 2D images. The shape of data must be:


```[samples, timesteps, rows, columns, features]```

For what we want to do, we can split each row of X into subsequences: timesteps becomes the number of subsequences and columns becomes the number of time steps in each subsequence. Since we are dealing with 1D data, the number of rows is 1.
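
As an illustration of this layout, the sketch below reshapes a single 4-step window into the 5D shape and reads the two subsequences back. The variable names are introduced here for the example; only the shape convention comes from the text above.

```python
import numpy as np

row = np.array([30, 40, 50, 60])        # one window of 4 time steps
n_seq, n_rows, n_cols, n_feat = 2, 1, 2, 1

# [samples, timesteps, rows, columns, features] for a single sample
sample = row.reshape((1, n_seq, n_rows, n_cols, n_feat))
print(sample.shape)

# Each "time step" of the ConvLSTM is one subsequence laid out as a 1 x 2 image
print(sample[0, 0, 0, :, 0].tolist())   # first subsequence
print(sample[0, 1, 0, :, 0].tolist())   # second subsequence
```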

In [21]:
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
# from here on, n_steps is the number of time steps per subsequence (4 = n_seq * n_steps)
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))

Then we define the ConvLSTM layer as a single layer, specified by the number of filters and a two-dimensional kernel size given as (number_rows, number_columns). Since our data has a single row, the kernel has 1 row.

In [22]:
from keras.layers import ConvLSTM2D

model = Sequential()
model.add(ConvLSTM2D(filters = 64, kernel_size = (1,2), activation = 'relu', input_shape=(n_seq, 1, n_steps, n_features)))
model.add(Flatten())
model.add(Dense(1))
model.compile(optimizer = 'adam', loss='mse')

In [23]:
# fit model
model.fit(X, y, epochs = 500, verbose = 0)

<keras.callbacks.History at 0x7f56ee8b9d90>

In [24]:
# demonstrate prediction
x_input = array([160, 170, 180, 190])
x_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[200.82155]]
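
To compare the five univariate models at a glance, the sketch below collects the predictions printed above and measures their distance from the expected value of 200. Keep in mind that each number comes from a single training run (and the last two models used 4-step inputs and 500 epochs), so the ranking is only indicative.

```python
expected = 200.0

# Predictions copied from the outputs printed in the sections above
predictions = {
    "Vanilla LSTM": 200.39282,
    "Stacked LSTM": 200.53333,
    "Bidirectional LSTM": 200.28166,
    "CNN-LSTM": 201.68326,
    "ConvLSTM": 200.82155,
}

abs_errors = {name: abs(value - expected) for name, value in predictions.items()}

# Print the models from smallest to largest absolute error
for name, err in sorted(abs_errors.items(), key=lambda item: item[1]):
    print(f"{name}: absolute error {err:.3f}")
```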


# Multivariate LSTM models
Here, there is more than one observation for each time step.

## Multiple input series


Here, we want to use multiple input time series to predict one output time series that depends on the input ones.

In [38]:
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

#define input sequences
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])

In [39]:
# reshape data
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
print(dataset)

[[ 10  15  25]
 [ 20  25  45]
 [ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]
 [ 90  95 185]]


We have to structure the data into samples with input and output elements. Each input sample holds n_steps consecutive rows of the parallel input series, and its output is the value of the output series at the last of those time steps.

In [40]:
# split a multivariate sequence into samples
# similar to split_sequence, but the first n_steps - 1 values of the output series are discarded
# because no complete input window ends on them

def split_sequences(sequences, n_steps):
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # build the input and output
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix -1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

In [41]:
# test it on our dataset

n_steps = 3

X, y = split_sequences(dataset, n_steps)
print(X.shape, y.shape)

for i in range(len(X)):
    print(X[i], y[i])

n_features = X.shape[2]

(7, 3, 2) (7,)
[[10 15]
 [20 25]
 [30 35]] 65
[[20 25]
 [30 35]
 [40 45]] 85
[[30 35]
 [40 45]
 [50 55]] 105
[[40 45]
 [50 55]
 [60 65]] 125
[[50 55]
 [60 65]
 [70 75]] 145
[[60 65]
 [70 75]
 [80 85]] 165
[[70 75]
 [80 85]
 [90 95]] 185


In [42]:
# define model
model = Sequential()
model.add(LSTM(50, activation = 'relu', input_shape = (n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer = 'adam', loss = 'mse')

In [44]:
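# demonstrate prediction
# caution: no model.fit call is shown for this model, which would explain why the output below is far from the expected value
x_input = array([[80, 85], [90, 95], [100, 105]])
# we expect 205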
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[-7.677895]]


In [37]:
# complete multivariate LSTM example, including the fit step
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
 
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
	X, y = list(), list()
	for i in range(len(sequences)):
		# find the end of this pattern
		end_ix = i + n_steps
		# check if we are beyond the dataset
		if end_ix > len(sequences):
			break
		# gather input and output parts of the pattern
		seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
		X.append(seq_x)
		y.append(seq_y)
	return array(X), array(y)
 
# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)
# the dataset knows the number of features, e.g. 2
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([[80, 85], [90, 95], [100, 105]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[206.05481]]
