# LSTM Forecasting

## Prepare the data

First, we read the dataset with pandas and inspect its last few rows to see which columns are available.

In [5]:
import pandas as pd
dataset = pd.read_csv('full_dataset.csv')
dataset.tail()

Unnamed: 0,date,price_eurusd,open_eurusd,high_eurusd,low_eurusd,change_eurusd,bid,ask,price_sp500,open_sp500,high_sp500,low_sp500,change_sp500
9937,6/27/2022,1.06,1.06,1.06,1.05,0.26,695.55,689.88,3900.11,3920.76,3927.72,3889.66,-0.3
9938,6/28/2022,1.05,1.06,1.06,1.05,-0.61,693.59,685.75,3821.55,3913.0,3945.86,3820.14,-2.01
9939,6/29/2022,1.04,1.05,1.05,1.04,-0.75,692.96,686.23,3818.83,3825.09,3836.5,3799.02,-0.07
9940,6/30/2022,1.05,1.04,1.05,1.04,0.41,692.25,684.76,3785.38,3785.99,3818.99,3738.67,-0.88
9941,7/1/2022,1.04,1.05,1.05,1.04,-0.52,693.69,685.26,3825.33,3781.0,3829.82,3752.1,1.06


Next, we choose the columns the model will use. In this case, `price_eurusd` and `price_sp500` are the inputs and `bid` is the target. `.values` returns each column as a NumPy array.

In [6]:
import numpy as np

in_seq1 = dataset["price_eurusd"].values
in_seq2 =  dataset["price_sp500"].values
out_seq = dataset["bid"].values

print(in_seq1.shape, in_seq2.shape, out_seq.shape)
print(in_seq1)

(9942,) (9942,) (9942,)
[0.98 0.99 0.99 ... 1.04 1.05 1.04]


We need to combine these arrays into a single array. First, we reshape each one into a matrix with a single column.

In [7]:
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
print(in_seq1.shape, in_seq2.shape, out_seq.shape)
print(in_seq1)

(9942, 1) (9942, 1) (9942, 1)
[[0.98]
 [0.99]
 [0.99]
 ...
 [1.04]
 [1.05]
 [1.04]]


Then we stack the matrices horizontally into a single matrix. Each row is a time step and each column is a separate time series.

In [8]:
dataset = np.hstack((in_seq1, in_seq2, out_seq))
print(dataset.shape)
print(dataset)

(9942, 3)
[[9.80000e-01 1.41300e+02 4.54500e+01]
 [9.90000e-01 1.42000e+02 4.54500e+01]
 [9.90000e-01 1.45300e+02 4.54500e+01]
 ...
 [1.04000e+00 3.81883e+03 6.92960e+02]
 [1.05000e+00 3.78538e+03 6.92250e+02]
 [1.04000e+00 3.82533e+03 6.93690e+02]]
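
The reshape-then-stack steps above can also be collapsed into one call. The sketch below uses made-up toy values, not the real dataset, to show that `np.column_stack` gives the same matrix as `np.hstack` on the reshaped columns; the variable names are hypothetical.

```python
import numpy as np

# Toy 1-D series (made-up values) standing in for the three columns
a = np.array([0.98, 0.99, 1.04])
b = np.array([141.3, 142.0, 3825.33])
c = np.array([45.45, 45.45, 693.69])

# Route used above: reshape each series to (n, 1), then stack horizontally
stacked = np.hstack((
    a.reshape((len(a), 1)),
    b.reshape((len(b), 1)),
    c.reshape((len(c), 1)),
))

# np.column_stack treats each 1-D array as a column, so no reshape is needed
combined = np.column_stack((a, b, c))

print(combined.shape)                     # (3, 3)
print(np.array_equal(stacked, combined))  # True
```

Either route works; the explicit reshape makes the intermediate shapes easier to inspect while learning.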


In [9]:
# Split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

In [10]:
# choose a number of time steps
n_steps = 6
# convert into input/output
X, y = split_sequences(dataset, n_steps)
print(X.shape)
# read the number of input features from X (2 in this case)
n_features = X.shape[2]

(9937, 6, 2)
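
To see what `split_sequences` produces, it can help to run it on a tiny array. The sketch below repeats the same windowing logic so that it runs on its own, and uses made-up numbers: two input columns and one target column. Each sample is a window of `n_steps` consecutive rows of the input columns, and its target is the last column's value at the final row of that window. A series of length `n` therefore gives `n - n_steps + 1` samples, which matches 9942 - 6 + 1 = 9937 above.

```python
import numpy as np

def split_sequences(sequences, n_steps):
    """Same windowing logic as the function above, repeated so this runs standalone."""
    X, y = [], []
    for i in range(len(sequences) - n_steps + 1):
        end_ix = i + n_steps
        X.append(sequences[i:end_ix, :-1])   # input columns for the window
        y.append(sequences[end_ix - 1, -1])  # target at the window's last row
    return np.array(X), np.array(y)

# Made-up data: two input columns and one target column, 5 time steps
toy = np.array([
    [10, 15, 25],
    [20, 25, 45],
    [30, 35, 65],
    [40, 45, 85],
    [50, 55, 105],
])

X_toy, y_toy = split_sequences(toy, n_steps=3)
print(X_toy.shape, y_toy.shape)  # (3, 3, 2) (3,)
print(X_toy[0])                  # rows 0..2 of the two input columns
print(y_toy)                     # [ 65  85 105]
```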


## LSTM Model

In [11]:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

def build_lstm_model(steps, features):
    # define model: one LSTM layer with 50 units, then a single-output dense layer
    model = Sequential()
    model.add(LSTM(50, activation='relu', input_shape=(steps, features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    return model

Then we train the model for 200 epochs, which can take a while.

In [12]:
# fit model
model = build_lstm_model(n_steps, n_features)
model.fit(X, y, epochs=200, verbose=0)

<keras.callbacks.History at 0x281f7da8970>
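
The call to `model.fit(X, y, ...)` above trains on every sample, so no data is held back to check the fit. One possible way to hold some back, sketched here with NumPy only, is to split the samples in time order. The helper name `chronological_split` and the 80/20 ratio are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.8):
    """Split samples in time order: the first part for training, the rest for testing."""
    split_ix = int(len(X) * train_fraction)
    return X[:split_ix], X[split_ix:], y[:split_ix], y[split_ix:]

# Made-up arrays with the same layout as X and y above: (samples, n_steps, n_features)
X_demo = np.arange(10 * 6 * 2, dtype=float).reshape((10, 6, 2))
y_demo = np.arange(10, dtype=float)

X_train, X_test, y_train, y_test = chronological_split(X_demo, y_demo)
print(X_train.shape, X_test.shape)  # (8, 6, 2) (2, 6, 2)
```

With such a split, the model would be fit on `X_train, y_train` and checked on `X_test, y_test`. Keeping the time order avoids scoring the model on samples older than the ones it was trained on.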

## Prediction

In [13]:
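# six consecutive (price_eurusd, price_sp500) time steps; the last five are the rows shown by dataset.tail()
x_input = np.array([[1.06, 3911.74], [1.06, 3900.11], [1.05, 3821.55],
                    [1.04, 3818.83], [1.05, 3785.38], [1.04, 3825.33]])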
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[569.56226]]
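
Instead of typing the input window by hand, the same kind of input can be sliced from the end of the stacked array. The sketch below assumes an array laid out like `dataset` above, with the input columns first and the target last. It uses made-up values, and `data` and `x_latest` are hypothetical names.

```python
import numpy as np

n_steps, n_features = 6, 2

# Made-up stand-in for the stacked `dataset` array: two input columns, then the target
data = np.column_stack((
    np.linspace(1.00, 1.07, 8),      # hypothetical price_eurusd
    np.linspace(3700.0, 3840.0, 8),  # hypothetical price_sp500
    np.linspace(680.0, 694.0, 8),    # hypothetical bid
))

# Last n_steps rows, all columns except the target, with a leading batch axis
x_latest = data[-n_steps:, :-1].reshape((1, n_steps, n_features))
print(x_latest.shape)  # (1, 6, 2)
```

The resulting array has the `(1, n_steps, n_features)` shape that `model.predict` receives above.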
