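LSTM (long short-term memory) is a type of recurrent neural network (RNN) that can capture patterns in sequential data. Its main benefit is that it can learn and retain information over long sequences. This comes from the gated cell state and needs no special setting in Keras. The `stateful=True` argument of the Keras `LSTM` layer is a different feature: it carries the final state of each sample in one batch over as the initial state of the corresponding sample in the next batch.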
LSTM includes three important gates: the input gate, the forget gate and the output gate. The interaction among these three gates gives LSTM the ability to solve the problem of long-term dependencies, which general RNNs cannot learn. "The learning speed of the previous hidden layers is slower than the deeper hidden layers. This phenomenon may even lead to a decrease of accuracy rate as hidden layers increase [25]. However, the smart design of the memory cell in LSTM can effectively solve the problem of gradient vanishing in backpropagation and can learn the input sequence with longer time steps. Hence, LSTM is commonly used for solving applications related to time serial issues."
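The gate interaction described above can be sketched as a single LSTM time step in NumPy. This is a minimal illustration under stated assumptions, not Keras's implementation: the function name `lstm_step`, the stacked weight layout and the gate order (input, forget, candidate, output) are choices made for the sketch. The key line is the additive update `c_t = f * c_prev + i * g`: when the forget gate is close to 1 and the input gate close to 0, the cell state is carried forward almost unchanged, which is what lets information survive across many time steps.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative sketch).

    x_t: (n_features,); h_prev, c_prev: (n_units,)
    W: (4 * n_units, n_features); U: (4 * n_units, n_units); b: (4 * n_units,)
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:n])           # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])      # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * n:3 * n])  # candidate values that could be written
    o = sigmoid(z[3 * n:])       # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g     # additive cell-state update
    h_t = o * np.tanh(c_t)       # hidden state passed to the next step
    return h_t, c_t


# Run one 24-step window of 7 features through 50 units with random weights
rng = np.random.default_rng(0)
n_features, n_units, window_size = 7, 50, 24
W = rng.normal(scale=0.1, size=(4 * n_units, n_features))
U = rng.normal(scale=0.1, size=(4 * n_units, n_units))
b = np.zeros(4 * n_units)
h, c = np.zeros(n_units), np.zeros(n_units)
for x_t in rng.random((window_size, n_features)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)  # (50,)
```

The random weights only demonstrate the shapes; in a trained network `W`, `U` and `b` are learned.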

- LSTMs are a type of recurrent network, and as such are designed to take sequence data as input, unlike other models where lag observations must be presented as input features.
- LSTMs directly support multiple parallel input sequences for multivariate inputs, unlike other models where multivariate inputs are presented in a flat structure.
- Like other neural networks, LSTMs are able to map input data directly to an output vector that may represent multiple output time steps.
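The first two points are easiest to see in terms of array shapes. The sketch below uses a small made-up series (100 time steps, 3 parallel features, 4 lags; all values hypothetical) to build both representations: a flat one with the lag observations concatenated into a single feature vector, as an MLP would need, and a 3D one of shape (samples, time steps, features), which is what a Keras LSTM layer expects.

```python
import numpy as np

# Hypothetical multivariate series: 100 time steps, 3 parallel features
series = np.arange(300, dtype=np.float32).reshape(100, 3)
n_lags = 4
n_samples = len(series) - n_lags

# Flat layout: the 4 lags x 3 features are concatenated into one vector per sample
X_flat = np.stack([series[i:i + n_lags].ravel() for i in range(n_samples)])

# Sequence layout: each sample keeps its (time steps, features) structure
X_seq = np.stack([series[i:i + n_lags] for i in range(n_samples)])

print(X_flat.shape)  # (96, 12)
print(X_seq.shape)   # (96, 4, 3)
```

Both arrays hold the same numbers (`X_seq.reshape(n_samples, -1)` recovers `X_flat`); the difference is that the sequence layout tells the model which values belong to which time step and which feature.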

- A popular approach has been to combine CNNs with LSTMs, where the CNN acts as an encoder that learns features from subsequences of the input data, and these features are provided as time steps to an LSTM. This architecture is called a CNN-LSTM.
- A powerful variation on the CNN-LSTM architecture is the ConvLSTM, which performs the convolutional reading of input subsequences directly within the LSTM's units. This approach has proven very effective for time series classification and can be adapted for use in multi-step time series forecasting.
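Both hybrid architectures read each input window as a sequence of shorter subsequences, so the main data-preparation step is a reshape. The sketch below assumes the 24-step, 7-feature windows built later in this notebook are split into 4 subsequences of 6 hours each; the split is an arbitrary choice for illustration, and `split_into_subsequences` is a hypothetical helper. The 4D layout suits a CNN-LSTM that applies a `Conv1D` to each subsequence via `TimeDistributed`; the 5D layout matches what Keras's `ConvLSTM2D` expects for `channels_last` data, (samples, time, rows, columns, channels), using a single row.

```python
import numpy as np


def split_into_subsequences(X, n_seq):
    """Split windows of shape (samples, time steps, features) into n_seq equal subsequences."""
    n_samples, n_steps, n_features = X.shape
    if n_steps % n_seq != 0:
        raise ValueError("window length must be divisible by n_seq")
    sub_len = n_steps // n_seq
    # CNN-LSTM layout: (samples, subsequences, steps per subsequence, features)
    X_cnn_lstm = X.reshape(n_samples, n_seq, sub_len, n_features)
    # ConvLSTM2D layout: (samples, time, rows=1, columns=steps per subsequence, channels=features)
    X_conv_lstm = X.reshape(n_samples, n_seq, 1, sub_len, n_features)
    return X_cnn_lstm, X_conv_lstm


# Dummy data with the same window shape as X_train: 8 samples, 24 steps, 7 features
X = np.random.default_rng(0).random((8, 24, 7)).astype(np.float32)
X_cnn_lstm, X_conv_lstm = split_into_subsequences(X, n_seq=4)
print(X_cnn_lstm.shape)   # (8, 4, 6, 7)
print(X_conv_lstm.shape)  # (8, 4, 1, 6, 7)
```

Because the reshape keeps row-major order, subsequence `k` contains hours `6k` to `6k + 5` of each window.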

In [69]:
import numpy as np
import pandas as pd
import pickle 
import sklearn 

In [70]:
with open('../data/train_data.pickle', 'rb') as f:
    train_data = pickle.load(f)

In [71]:
with open('../data/test_data.pickle', 'rb') as f:
    test_data = pickle.load(f)

In [72]:
#def evaluate_forecasts(actual, predicted):
train_data.head()

date,pollution,dew,temp,press,wnd_dir,wnd_spd,snow,rain
2010-01-02 00:00:00,0.129779,0.352941,0.245902,0.527273,0.333333,0.00229,0.0,0.0
2010-01-02 01:00:00,0.148893,0.367647,0.245902,0.527273,0.333333,0.003811,0.0,0.0
2010-01-02 02:00:00,0.15996,0.426471,0.229508,0.545455,0.333333,0.005332,0.0,0.0
2010-01-02 03:00:00,0.182093,0.485294,0.229508,0.563636,0.333333,0.008391,0.037037,0.0
2010-01-02 04:00:00,0.138833,0.485294,0.229508,0.563636,0.333333,0.009912,0.074074,0.0


In [73]:
test_data.head()

date,pollution,dew,temp,press,wnd_dir,wnd_spd,snow,rain
2014-12-18 00:00:00,0.181087,0.397059,0.213115,0.709091,0.0,0.00229,0.0,0.0
2014-12-18 01:00:00,0.171026,0.397059,0.196721,0.709091,0.666667,0.000752,0.0,0.0
2014-12-18 02:00:00,0.160966,0.397059,0.196721,0.709091,0.666667,0.003811,0.0,0.0
2014-12-18 03:00:00,0.146881,0.382353,0.163934,0.727273,0.666667,0.00687,0.0,0.0
2014-12-18 04:00:00,0.125755,0.382353,0.180328,0.709091,0.666667,0.012219,0.0,0.0


## Moving Window Sequences

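The LSTM takes sequences of inputs. The pollution values can either be included in the inputs (as lagged values) or left out. `generate_sequence` below leaves them out: each input window holds the seven other columns for `window_size` consecutive hours, and the target is the pollution value in the hour immediately after the window.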
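Keeping the lagged pollution values as an extra input feature can be sketched as a variant with a switch; `make_windows` and its `include_target` parameter are hypothetical names, and the tiny DataFrame is made-up data. Including the target is not a form of leakage here, because each window only contains pollution values up to the hour before the one being predicted.

```python
import numpy as np
import pandas as pd


def make_windows(df, window_size, target="pollution", include_target=False):
    """Hypothetical sliding-window builder that can keep lagged target values as inputs.

    Returns X of shape (len(df) - window_size, window_size, n_inputs) and
    y of shape (len(df) - window_size, 1).
    """
    inputs = df if include_target else df.drop(columns=[target])
    values = inputs.to_numpy(dtype=np.float32)
    targets = df[target].to_numpy(dtype=np.float32)
    n_samples = len(df) - window_size
    # Window i covers rows i .. i + window_size - 1
    X = np.stack([values[i:i + window_size] for i in range(n_samples)])
    # Its target is the value in row i + window_size, i.e. the hour after the window
    y = targets[window_size:].reshape(-1, 1)
    return X, y


# Made-up data: 10 hourly rows, the target plus two weather features
df = pd.DataFrame({
    "pollution": np.arange(10.0),
    "dew": np.arange(10.0) * 2,
    "temp": np.arange(10.0) * 3,
})
X_without, y = make_windows(df, window_size=3)
X_with, _ = make_windows(df, window_size=3, include_target=True)
print(X_without.shape, X_with.shape, y.shape)  # (7, 3, 2) (7, 3, 3) (7, 1)
```

Slicing the NumPy array once, instead of calling `df.iloc` for every window, also avoids repeated pandas indexing on the roughly 43,000 training windows.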

In [74]:
def generate_sequence(df, N, window_size):
    '''Build sliding-window samples for a Keras LSTM.

    Returns
        X: float32 array of shape (N - window_size, window_size, n_features),
           where n_features is the number of columns other than 'pollution'
        y: float32 array of shape (N - window_size, 1); Keras expects 2D targets
           of shape (samples, output_size)
    '''
    # Each input sample is a window of window_size consecutive rows, without the target column
    X_sequences = [df.iloc[i:i+window_size].drop(columns=['pollution']).values for i in range(N - window_size)]
    # The target for each window is the pollution value in the row immediately after it
    Y_values = [df.iloc[i+window_size]['pollution'] for i in range(N - window_size)]

    return np.array(X_sequences).astype(np.float32), np.array(Y_values).astype(np.float32).reshape(-1, 1)



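The practical limit on window length is related to the vanishing gradient problem. The LSTM's gated cell state mitigates it but does not eliminate it, so gradients backpropagated through many time steps can still shrink, which limits how well the model learns dependencies far back in the sequence. The window used here is 24 steps, i.e. one day of hourly readings.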
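A toy calculation illustrates the effect. For a scalar RNN `h_t = tanh(w * h_{t-1} + x_t)`, the gradient of the last hidden state with respect to the initial one is a product of one factor `w * (1 - h_t**2)` per step, so with `|w| < 1` it shrinks at least geometrically with the window length. Along the LSTM's direct cell-state path the corresponding factor is the forget gate `f_t`, which can stay close to 1. The weight 0.9, the forget-gate value 0.99 and the random inputs are arbitrary assumptions, and the cell-state calculation ignores the indirect paths through the gates, so this is an illustration rather than a full gradient computation.

```python
import numpy as np


def rnn_gradient(w, xs):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t), with h_0 = 0."""
    h, grad = 0.0, 1.0
    for x in xs:
        h = np.tanh(w * h + x)
        grad *= w * (1.0 - h ** 2)  # chain rule through one time step
    return grad


def cell_state_gradient(forget_gates):
    """d c_T / d c_0 along the direct path c_t = f_t * c_{t-1} + i_t * g_t (gate paths ignored)."""
    return float(np.prod(forget_gates))


rng = np.random.default_rng(0)
for T in (6, 24, 96):
    xs = rng.normal(scale=0.5, size=T)
    rnn = rnn_gradient(0.9, xs)
    lstm = cell_state_gradient(np.full(T, 0.99))
    print(f"T={T:3d}  RNN: {rnn:.2e}  LSTM cell path: {lstm:.3f}")
```

The RNN factor is bounded by `0.9 ** T` (about 0.08 for T = 24), while `0.99 ** 24` is still about 0.79.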

In [75]:
window_size = 24
N= len(train_data)
X_train, y_train = generate_sequence(train_data,N, window_size)
print(X_train.shape, y_train.shape)

M=len(test_data)
X_test, y_test = generate_sequence(test_data,M,window_size)
print(X_test.shape,y_test.shape)


(43440, 24, 7) (43440, 1)
(312, 24, 7) (312, 1)


In [76]:
X_train.shape[1]

24

In [77]:
X_train.shape[2]

7

## Standard LSTM


In [92]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.metrics import RootMeanSquaredError
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint



In [97]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]

MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(50))
#MV_LSTM.add(Dropout(0.2))  # optional: randomly zeroes 20% of the LSTM outputs during training to reduce overfitting

MV_LSTM.add(Dense(1))

# Compile the model: Adam optimizer, mean squared error loss, RMSE reported as a metric
MV_LSTM.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=[RootMeanSquaredError()])


MV_LSTM.summary()
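The `summary()` output is not reproduced here, but its parameter counts can be checked by hand. A Keras `LSTM` layer with `units` cells and `input_dim` input features has four gate blocks, each with an input kernel, a recurrent kernel and a bias, for `4 * (units * (input_dim + units) + units)` weights; a `Dense` layer has `units * (input_dim + 1)`. The sketch applies these formulas (assuming the default `use_bias=True`) to the three models in this notebook; the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 gate blocks, each: input kernel + recurrent kernel + bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # weight matrix + bias
    return units * (input_dim + 1)


n_features = 7  # X_train.shape[2]

# Standard LSTM: LSTM(50) -> Dense(1)
standard = lstm_params(n_features, 50) + dense_params(50, 1)
# Stacked: LSTM(32) -> Dropout -> LSTM(16) -> Dense(1); Dropout has no weights
stacked = lstm_params(n_features, 32) + lstm_params(32, 16) + dense_params(16, 1)
# Stacked with an extra Dense(5, relu) before the output layer
stacked_dense = (lstm_params(n_features, 32) + lstm_params(32, 16)
                 + dense_params(16, 5) + dense_params(5, 1))

print(standard, stacked, stacked_dense)  # 11651 8273 8347
```

The second LSTM's input dimension is 32, not 7, because `return_sequences=True` makes the first LSTM pass its full (24, 32) output sequence on.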

In [98]:
# fit model
history = MV_LSTM.fit(X_train, y_train, epochs=150)

Epoch 1/150
[1m1358/1358[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 8ms/step - loss: 0.0064 - root_mean_squared_error: 0.0802
Epoch 2/150
[1m1358/1358[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 8ms/step - loss: 0.0050 - root_mean_squared_error: 0.0708
Epoch 3/150
[1m1358/1358[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 8ms/step - loss: 0.0047 - root_mean_squared_error: 0.0688
Epoch 4/150
[1m1358/1358[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 8ms/step - loss: 0.0048 - root_mean_squared_error: 0.0696
Epoch 5/150
[1m1358/1358[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 8ms/step - loss: 0.0047 - root_mean_squared_error: 0.0684
Epoch 6/150
[1m1358/1358[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 8ms/step - loss: 0.0045 - root_mean_squared_error: 0.0674
Epoch 7/150
[1m1358/1358[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 8ms/step - loss: 0.0044 - root_mean_squared_error: 0.0663
Epoch 8/150
[1m1358/1358[

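MV_LSTM.predict(X_test): This generates predictions for X_test, resulting in a 2D array with shape (n, 1), where n is the number of test samples. (Prediction is a method of the model; the history object returned by fit() only records the per-epoch metrics.)
.flatten(): Converts this (n, 1) array to a 1D array with shape (n,), so it can be used directly as a DataFrame column alongside the flattened targets.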
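The commented-out `evaluate_forecasts` stub near the top suggests a test-set evaluation was planned. One possible version is sketched below; the choice of RMSE and MAE is an assumption, and the numbers it returns are in the scaled units of the data (the values appear to lie between 0 and 1), not in the original pollution units. Flattening both arrays first matters: subtracting an (n,) array from an (n, 1) array does not fail but broadcasts to an (n, n) matrix, which would silently give a wrong error.

```python
import numpy as np


def evaluate_forecasts(actual, predicted):
    """Return (RMSE, MAE) between two arrays holding the same number of values."""
    actual = np.asarray(actual, dtype=np.float64).flatten()
    predicted = np.asarray(predicted, dtype=np.float64).flatten()
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must contain the same number of values")
    errors = predicted - actual
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    mae = float(np.mean(np.abs(errors)))
    return rmse, mae


# Why flattening matters: an (n, 1) column minus an (n,) vector broadcasts to (n, n)
print((np.zeros((3, 1)) - np.zeros(3)).shape)  # (3, 3)

# Toy check: errors of 3 and 4 give RMSE sqrt(12.5) and MAE 3.5
print(evaluate_forecasts([[0.0], [0.0]], [3.0, 4.0]))
```

In this notebook it would be called as `evaluate_forecasts(y_test, MV_LSTM.predict(X_test))`.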

In [103]:
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()


[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 15ms/step


,Model Predictions,Actual
0,0.426779,0.275654
1,0.411589,0.254527
2,0.378857,0.249497
3,0.359858,0.220322
4,0.341136,0.200201


## Making it better

In [114]:
from tensorflow.keras.layers import BatchNormalization


In [115]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
#MV_LSTM.add(BatchNormalization()) #Normalize outputs
#MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

# Compile the model: Adam optimizer, mean squared error loss, RMSE reported as a metric
MV_LSTM.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=[RootMeanSquaredError()])


MV_LSTM.summary()
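`Dropout(0.2)` is only active during training: on each forward pass every value of the previous layer's output is zeroed with probability 0.2, and Keras scales the surviving values by 1 / (1 - 0.2) so the expected activation is unchanged; at inference time the layer passes its input through untouched. The NumPy function below is a minimal sketch of that "inverted dropout" behaviour, not Keras's code; `inverted_dropout` is a hypothetical name.

```python
import numpy as np


def inverted_dropout(x, rate, rng, training=True):
    """Zero each value with probability `rate` during training and rescale the rest."""
    if not training or rate == 0.0:
        return x  # inference: identity
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones((1000, 24, 32))  # same (time steps, units) shape as the first LSTM's output
dropped = inverted_dropout(activations, 0.2, rng)
print(round(float((dropped == 0).mean()), 2))  # fraction zeroed, close to 0.2
print(round(float(dropped.mean()), 2))          # mean stays close to 1.0
```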


In [117]:
test2 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32,shuffle=False)

Epoch 1/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m19s[0m 13ms/step - loss: 0.0088 - root_mean_squared_error: 0.0938 - val_loss: 0.0114 - val_root_mean_squared_error: 0.1069
Epoch 2/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 12ms/step - loss: 0.0071 - root_mean_squared_error: 0.0840 - val_loss: 0.0118 - val_root_mean_squared_error: 0.1087
Epoch 3/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 12ms/step - loss: 0.0067 - root_mean_squared_error: 0.0819 - val_loss: 0.0107 - val_root_mean_squared_error: 0.1036
Epoch 4/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 12ms/step - loss: 0.0062 - root_mean_squared_error: 0.0785 - val_loss: 0.0098 - val_root_mean_squared_error: 0.0990
Epoch 5/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 12ms/step - loss: 0.0059 - root_mean_squared_error: 0.0769 - val_loss: 0.0081 - val_root_mean_squared_error: 0.0900
Epoch 6/150
[1

In [118]:
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()

[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 54ms/step


,Model Predictions,Actual
0,0.325278,0.275654
1,0.305614,0.254527
2,0.267541,0.249497
3,0.28113,0.220322
4,0.375007,0.200201


In [124]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
#MV_LSTM.add(BatchNormalization()) #Normalize outputs
MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

# Compile the model: Adam optimizer, mean squared error loss, RMSE reported as a metric
MV_LSTM.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=[RootMeanSquaredError()])


MV_LSTM.summary()


In [125]:
test2 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32,shuffle=False)

Epoch 1/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 14ms/step - loss: 0.0093 - root_mean_squared_error: 0.0960 - val_loss: 0.0098 - val_root_mean_squared_error: 0.0991
Epoch 2/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 14ms/step - loss: 0.0072 - root_mean_squared_error: 0.0849 - val_loss: 0.0097 - val_root_mean_squared_error: 0.0983
Epoch 3/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m19s[0m 16ms/step - loss: 0.0072 - root_mean_squared_error: 0.0844 - val_loss: 0.0097 - val_root_mean_squared_error: 0.0984
Epoch 4/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m19s[0m 15ms/step - loss: 0.0066 - root_mean_squared_error: 0.0812 - val_loss: 0.0090 - val_root_mean_squared_error: 0.0949
Epoch 5/150
[1m1222/1222[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 14ms/step - loss: 0.0065 - root_mean_squared_error: 0.0804 - val_loss: 0.0079 - val_root_mean_squared_error: 0.0891
Epoch 6/150
[1

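Note: the execution counter indicates that this cell (In [123]) ran before In [124] and In [125] above, so the output below was presumably produced by the model in memory at that time, which may not be the one defined and trained in those cells.

In [123]: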
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()

[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 64ms/step


,Model Predictions,Actual
0,0.189849,0.275654
1,0.165256,0.254527
2,0.154786,0.249497
3,0.175161,0.220322
4,0.183971,0.200201
