An LSTM is a type of RNN that can capture patterns in sequential data. Its main benefit is that it can learn and retain information over long sequences. This is separate from the `stateful` argument of the Keras LSTM layer: setting `stateful=True` only carries the layer's state over from one batch to the next instead of resetting it.
An LSTM includes three important gates: the input gate, the forget gate and the output gate. The interaction among these three gates gives the LSTM the ability to handle long-term dependencies,
which general RNNs cannot learn. "The learning speed of the previous hidden layers is slower than the deeper
hidden layers. This phenomenon may even lead to a decrease of accuracy rate as hidden layers
increase [25]. However, the smart design of the memory cell in LSTM can effectively solve the problem
of gradient vanishing in backpropagation and can learn the input sequence with longer time steps.
Hence, LSTM is commonly used for solving applications related to time serial issues. "
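
For reference, the three gates mentioned above are commonly written as follows. This is the standard textbook formulation rather than something taken from the quoted source, and the notation is assumed here: $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $W$, $U$, $b$ are weights and biases, $\sigma$ is the logistic sigmoid and $\odot$ is the element-wise product.

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Because $c_t$ is updated additively, gradients can pass through the cell state over many time steps as long as the forget gate $f_t$ stays close to 1.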

- LSTMs are a type of recurrent network, and as such are designed to take sequence data as input, unlike other models where lag observations must be presented as input features.
- LSTMs directly support multiple parallel input sequences for multivariate inputs, unlike other models where multivariate inputs are presented in a flat structure.
- Like other neural networks, LSTMs are able to map input data directly to an output vector that may represent multiple output time steps.

- A popular approach has been to combine CNNs with LSTMs, where the CNN acts as an encoder that learns features from sub-sequences of input data, which are then provided as time steps to an LSTM. This architecture is called a CNN-LSTM.
- A powerful variation on the CNN-LSTM architecture is the ConvLSTM, which uses the convolutional reading of input subsequences directly within an LSTM’s units. This approach has proven very effective for time series classification and can be adapted for use in multi-step time series forecasting.

In [3]:
import numpy as np
import pandas as pd
import pickle 
import sklearn 

In [4]:
with open('../data/train_data.pickle', 'rb') as f:
    train_data = pickle.load(f)

In [5]:
with open('../data/test_data.pickle', 'rb') as f:
    test_data = pickle.load(f)

In [6]:
#def evaluate_forecasts(actual, predicted):
train_data.head()

date,pollution,dew,temp,press,wnd_dir,wnd_spd,snow,rain
2010-01-02 00:00:00,0.129779,0.352941,0.245902,0.527273,0.333333,0.00229,0.0,0.0
2010-01-02 01:00:00,0.148893,0.367647,0.245902,0.527273,0.333333,0.003811,0.0,0.0
2010-01-02 02:00:00,0.15996,0.426471,0.229508,0.545455,0.333333,0.005332,0.0,0.0
2010-01-02 03:00:00,0.182093,0.485294,0.229508,0.563636,0.333333,0.008391,0.037037,0.0
2010-01-02 04:00:00,0.138833,0.485294,0.229508,0.563636,0.333333,0.009912,0.074074,0.0


In [7]:
test_data.head(-5)

date,pollution,dew,temp,press,wnd_dir,wnd_spd,snow,rain
2014-12-18 00:00:00,0.181087,0.397059,0.213115,0.709091,0.000000,0.002290,0.0,0.0
2014-12-18 01:00:00,0.171026,0.397059,0.196721,0.709091,0.666667,0.000752,0.0,0.0
2014-12-18 02:00:00,0.160966,0.397059,0.196721,0.709091,0.666667,0.003811,0.0,0.0
2014-12-18 03:00:00,0.146881,0.382353,0.163934,0.727273,0.666667,0.006870,0.0,0.0
2014-12-18 04:00:00,0.125755,0.382353,0.180328,0.709091,0.666667,0.012219,0.0,0.0
...,...,...,...,...,...,...,...,...
2014-12-31 14:00:00,0.009054,0.191176,0.327869,0.745455,0.666667,0.334547,0.0,0.0
2014-12-31 15:00:00,0.011066,0.205882,0.327869,0.745455,0.666667,0.349825,0.0,0.0
2014-12-31 16:00:00,0.008048,0.250000,0.311475,0.745455,0.666667,0.365103,0.0,0.0
2014-12-31 17:00:00,0.009054,0.264706,0.295082,0.763636,0.666667,0.377322,0.0,0.0


## Moving Window (Not CV)

The LSTM takes sequences of inputs. The pollution values can either be included (as lagged values) in the input or left out. 

In [16]:
def generate_sequence(df, N, window_size):
    '''Build sliding-window inputs and one-step-ahead targets.

        The model expects the target labels to have two
        dimensions with shape (batch_size, output_size).
        - batch_size is the number of samples (windows) in a batch
        - output_size is the number of target values per sample'''

    # Generate overlapping sequences of length window_size
    X_sequences = [df.iloc[i:i+window_size].values for i in range(N - window_size)]
    # Pair each sequence with the pollution value that immediately follows it
    Y_values = [df.iloc[i+window_size]['pollution'] for i in range(N - window_size)]

    return np.array(X_sequences).astype(np.float32), np.array(Y_values).astype(np.float32).reshape(-1,1)



The limit on sequence size is related to the vanishing gradient problem. It can limit how well an LSTM learns dependencies far back in the sequence, especially if the model isn’t deep enough to capture long-term patterns.
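
As a rough illustration of why longer windows are harder to learn from, the toy sketch below multiplies a gradient by the same factor at every time step. The factor 0.9 and the helper name `backprop_scale` are assumptions chosen for illustration; a real LSTM has gate-dependent factors that differ from step to step.

```python
# Toy illustration: a gradient that is scaled by the same factor at every
# time step shrinks (or grows) geometrically with the sequence length.
def backprop_scale(factor, n_steps):
    """Return the cumulative scaling after n_steps multiplications."""
    scale = 1.0
    for _ in range(n_steps):
        scale *= factor
    return scale

# Hypothetical per-step factor of 0.9 over windows of different lengths
for n_steps in (12, 24, 100):
    print(n_steps, backprop_scale(0.9, n_steps))
```

With a factor below 1 the contribution of early time steps fades quickly as the window grows, while a factor of exactly 1 would preserve it.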

In [29]:
window_size = 12
N= len(train_data)
X_train, y_train = generate_sequence(train_data,N, window_size)
print(X_train.shape, y_train.shape)

M=len(test_data)
X_test, y_test = generate_sequence(test_data,M,window_size)
print(X_test.shape,y_test.shape)




(43452, 12, 8) (43452, 1)
(324, 12, 8) (324, 1)


Note: the input to the LSTM must be 3D, with shape [samples, timesteps, features].
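
The windowing and the resulting 3D shape can be checked on a small array. This is a minimal sketch using NumPy only; `make_windows`, its `include_target` flag and the toy data are hypothetical names introduced here, and the target is assumed to sit in column 0 as `pollution` does in the data above.

```python
import numpy as np

def make_windows(values, window_size, target_col=0, include_target=True):
    """Slice a 2D array (time, features) into overlapping windows.

    Returns X with shape (samples, window_size, n_features) and y with
    shape (samples, 1), where y is the target column one step after each window.
    """
    values = np.asarray(values, dtype=np.float32)
    n_samples = len(values) - window_size
    X = np.stack([values[i:i + window_size] for i in range(n_samples)])
    y = values[window_size:, target_col].reshape(-1, 1)
    if not include_target:
        # Drop the target column so the inputs hold only the other features
        X = np.delete(X, target_col, axis=2)
    return X, y

toy = np.arange(30).reshape(10, 3)   # 10 time steps, 3 features
X, y = make_windows(toy, window_size=4)
print(X.shape, y.shape)

# Same windows, but without the lagged target values in the input
X_no_target, _ = make_windows(toy, window_size=4, include_target=False)
print(X_no_target.shape)
```

The `include_target` switch corresponds to the choice mentioned earlier of either including lagged pollution values in the input or leaving them out.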

In [42]:
X_train.shape[1]

24

In [43]:
X_train.shape[2]

8

## Standard LSTM


In [10]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.metrics import RootMeanSquaredError
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint





In [11]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]

MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(50))
#MV_LSTM.add(Dropout(0.2))  # would reduce overfitting by randomly dropping 20% of the LSTM outputs during training

MV_LSTM.add(Dense(1))

#Compile the model
loss = 'mse'  # optimizer and metric are passed directly to compile() below

MV_LSTM.compile(optimizer=Adam(learning_rate = 0.001), loss=loss, metrics = [RootMeanSquaredError()])


MV_LSTM.summary()

2024-11-13 20:07:47.077571: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2
2024-11-13 20:07:47.077644: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 8.00 GB
2024-11-13 20:07:47.077667: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 2.67 GB
2024-11-13 20:07:47.078024: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-11-13 20:07:47.078048: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


In [98]:
# fit model
history = MV_LSTM.fit(X_train, y_train, epochs=150, verbose=1)

Epoch 1/150
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 14s 8ms/step - loss: 0.0064 - root_mean_squared_error: 0.0802
Epoch 2/150
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 10s 8ms/step - loss: 0.0050 - root_mean_squared_error: 0.0708
Epoch 3/150
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 11s 8ms/step - loss: 0.0047 - root_mean_squared_error: 0.0688
Epoch 4/150
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 11s 8ms/step - loss: 0.0048 - root_mean_squared_error: 0.0696
Epoch 5/150
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 11s 8ms/step - loss: 0.0047 - root_mean_squared_error: 0.0684
Epoch 6/150
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 10s 8ms/step - loss: 0.0045 - root_mean_squared_error: 0.0674
Epoch 7/150
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 10s 8ms/step - loss: 0.0044 - root_mean_squared_error: 0.0663
Epoch 8/150
1358/1358

`MV_LSTM.predict(X_test)`: generates predictions for `X_test`, resulting in a 2D array with shape (n, 1), where n is the number of test samples. Note that `predict` is called on the model, not on the `history` object returned by `fit`.
`.flatten()`: converts this (n, 1) array to a 1D array with shape (n,), which is easier to compare against the actual values.
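
A small NumPy sketch of why the flattening matters. The numbers are hypothetical values standing in for model output; the point is that subtracting a (n,) array from a (n, 1) array broadcasts to an (n, n) matrix instead of giving element-wise errors.

```python
import numpy as np

# Hypothetical values standing in for model output, shape (n, 1)
preds_2d = np.array([[0.43], [0.41], [0.38]])
actual_1d = np.array([0.28, 0.25, 0.25])

preds_1d = preds_2d.flatten()
print(preds_2d.shape, preds_1d.shape)

# Without flattening, broadcasting silently produces an (n, n) matrix
wrong_errors = preds_2d - actual_1d
# With flattening, the errors are element-wise as intended
errors = preds_1d - actual_1d
print(wrong_errors.shape, errors.shape)
```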

In [30]:
from numpy import concatenate
from sklearn.metrics import mean_squared_error
from math import sqrt

In [103]:
test_predictions = MV_LSTM.predict(X_test).flatten()

# Invert scaling for the forecast. This assumes `scaler` is the scaler fitted
# during preprocessing on all 8 columns; it is not defined in this notebook, so
# the step is left commented out. X_test stays 3D because later cells reuse it.
# X_test_last = X_test[:, -1, :]  # features at the last time step of each window
# inv_test_predictions = concatenate((test_predictions.reshape(-1, 1), X_test_last[:, 1:]), axis=1)
# inv_test_predictions = scaler.inverse_transform(inv_test_predictions)
# inv_test_predictions = inv_test_predictions[:, 0]  # Extract the pollution column

actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()


10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step


Unnamed: 0,Model Predictions,Actual
0,0.426779,0.275654
1,0.411589,0.254527
2,0.378857,0.249497
3,0.359858,0.220322
4,0.341136,0.200201
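
The inverse-scaling step in the cell above pads the predictions with the remaining feature columns so that the array has the width the scaler was fitted on. The sketch below shows the idea with plain NumPy, assuming min-max scaling to [0, 1] and pollution in column 0. The minima, maxima, three-column layout and the helper `inverse_min_max` are hypothetical; in practice the values would come from the fitted scaler (for scikit-learn's `MinMaxScaler`, its `data_min_` and `data_max_` attributes).

```python
import numpy as np

# Hypothetical per-column minima and maxima (pollution first)
data_min = np.array([0.0, -40.0, -19.0])
data_max = np.array([994.0, 28.0, 42.0])

def inverse_min_max(scaled, data_min, data_max):
    """Undo min-max scaling to [0, 1], column by column."""
    return scaled * (data_max - data_min) + data_min

scaled_preds = np.array([0.10, 0.25, 0.50])                       # scaled pollution forecasts
other_features = np.array([[0.5, 0.2], [0.5, 0.2], [0.6, 0.3]])   # remaining scaled columns

# Pad the predictions with the other columns so the array has the expected width
padded = np.concatenate((scaled_preds.reshape(-1, 1), other_features), axis=1)
pollution = inverse_min_max(padded, data_min, data_max)[:, 0]
print(pollution)
```

Only the first column of the result is kept, since the padded columns exist solely to match the expected shape.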


## Making it better

In [18]:
from tensorflow.keras.layers import BatchNormalization


In [115]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
#MV_LSTM.add(BatchNormalization()) #Normalize outputs
#MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

#Compile the model
loss = 'mse'  # optimizer and metric are passed directly to compile() below

MV_LSTM.compile(optimizer=Adam(learning_rate = 0.001), loss=loss, metrics = [RootMeanSquaredError()])


MV_LSTM.summary()


In [117]:
test2 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32,shuffle=False)

Epoch 1/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 19s 13ms/step - loss: 0.0088 - root_mean_squared_error: 0.0938 - val_loss: 0.0114 - val_root_mean_squared_error: 0.1069
Epoch 2/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - loss: 0.0071 - root_mean_squared_error: 0.0840 - val_loss: 0.0118 - val_root_mean_squared_error: 0.1087
Epoch 3/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - loss: 0.0067 - root_mean_squared_error: 0.0819 - val_loss: 0.0107 - val_root_mean_squared_error: 0.1036
Epoch 4/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - loss: 0.0062 - root_mean_squared_error: 0.0785 - val_loss: 0.0098 - val_root_mean_squared_error: 0.0990
Epoch 5/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - loss: 0.0059 - root_mean_squared_error: 0.0769 - val_loss: 0.0081 - val_root_mean_squared_error: 0.0900
Epoch 6/150

In [118]:
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()

10/10 ━━━━━━━━━━━━━━━━━━━━ 1s 54ms/step


Unnamed: 0,Model Predictions,Actual
0,0.325278,0.275654
1,0.305614,0.254527
2,0.267541,0.249497
3,0.28113,0.220322
4,0.375007,0.200201


In [15]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
#MV_LSTM.add(BatchNormalization()) #Normalize outputs
MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

#Compile the model
loss = 'mse'  # optimizer and metric are passed directly to compile() below

MV_LSTM.compile(optimizer=Adam(learning_rate = 0.001), loss=loss, metrics = [RootMeanSquaredError()])


MV_LSTM.summary()


2024-11-13 12:59:43.652977: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2
2024-11-13 12:59:43.653032: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 8.00 GB
2024-11-13 12:59:43.653045: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 2.67 GB
2024-11-13 12:59:43.653452: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-11-13 12:59:43.653477: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


In [16]:
test2 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32,shuffle=False)

Epoch 1/150


2024-11-13 12:59:45.422867: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.


1222/1222 ━━━━━━━━━━━━━━━━━━━━ 23s 14ms/step - loss: 0.0082 - root_mean_squared_error: 0.0904 - val_loss: 0.0098 - val_root_mean_squared_error: 0.0992
Epoch 2/150
1174/1222 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - loss: 0.0072 - root_mean_squared_error: 0.0849

KeyboardInterrupt: 

In [123]:
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()

10/10 ━━━━━━━━━━━━━━━━━━━━ 2s 64ms/step


Unnamed: 0,Model Predictions,Actual
0,0.189849,0.275654
1,0.165256,0.254527
2,0.154786,0.249497
3,0.175161,0.220322
4,0.183971,0.200201


In [19]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
MV_LSTM.add(BatchNormalization()) #Normalize outputs
MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

#Compile the model
loss = 'mse'  # optimizer and metric are passed directly to compile() below

MV_LSTM.compile(optimizer=Adam(learning_rate = 0.001), loss=loss, metrics = [RootMeanSquaredError()])


MV_LSTM.summary()

In [20]:
test3 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32,shuffle=False)

Epoch 1/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 21s 16ms/step - loss: 0.0259 - root_mean_squared_error: 0.1511 - val_loss: 0.0107 - val_root_mean_squared_error: 0.1034
Epoch 2/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 18s 15ms/step - loss: 0.0080 - root_mean_squared_error: 0.0895 - val_loss: 0.0140 - val_root_mean_squared_error: 0.1184
Epoch 3/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 19s 15ms/step - loss: 0.0078 - root_mean_squared_error: 0.0880 - val_loss: 0.0147 - val_root_mean_squared_error: 0.1211
Epoch 4/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 19s 16ms/step - loss: 0.0076 - root_mean_squared_error: 0.0872 - val_loss: 0.0308 - val_root_mean_squared_error: 0.1756
Epoch 5/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 18s 15ms/step - loss: 0.0074 - root_mean_squared_error: 0.0858 - val_loss: 0.0509 - val_root_mean_squared_error: 0.2256
Epoch 6/150

In [21]:
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()

10/10 ━━━━━━━━━━━━━━━━━━━━ 1s 43ms/step


Unnamed: 0,Model Predictions,Actual
0,0.184844,0.275654
1,0.16395,0.254527
2,0.170126,0.249497
3,0.157723,0.220322
4,0.119257,0.200201


In [81]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
MV_LSTM.add(BatchNormalization()) #Normalize outputs
#MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

#Compile the model
loss = 'mse'  # optimizer and metric are passed directly to compile() below

MV_LSTM.compile(optimizer=Adam(learning_rate = 0.001), loss=loss, metrics = [RootMeanSquaredError()])


MV_LSTM.summary()

In [82]:
test4 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32,shuffle=False)

Epoch 1/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 20s 14ms/step - loss: 0.0259 - root_mean_squared_error: 0.1481 - val_loss: 0.0068 - val_root_mean_squared_error: 0.0828
Epoch 2/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 17s 14ms/step - loss: 0.0073 - root_mean_squared_error: 0.0854 - val_loss: 0.0044 - val_root_mean_squared_error: 0.0662
Epoch 3/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 16s 13ms/step - loss: 0.0071 - root_mean_squared_error: 0.0841 - val_loss: 0.0031 - val_root_mean_squared_error: 0.0561
Epoch 4/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 16s 13ms/step - loss: 0.0068 - root_mean_squared_error: 0.0823 - val_loss: 0.0028 - val_root_mean_squared_error: 0.0530
Epoch 5/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 17s 14ms/step - loss: 0.0066 - root_mean_squared_error: 0.0814 - val_loss: 0.0012 - val_root_mean_squared_error: 0.0341
Epoch 6/150

In [83]:
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()

11/11 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step


Unnamed: 0,Model Predictions,Actual
0,0.104851,0.132797
1,0.109981,0.133803
2,0.112735,0.142857
3,0.124358,0.163984
4,0.139212,0.167002


- Try without the final Dense(activation='relu') layer
- Try early stopping
- Try a different batch size, although this might cause overfitting
- First make some nice graphs
- Then try different techniques entirely


In [25]:
from tensorflow.keras.callbacks import EarlyStopping


In [75]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
MV_LSTM.add(BatchNormalization()) #Normalize outputs
#MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

#Compile the model
loss = 'mse'  # optimizer and metric are passed directly to compile() below

MV_LSTM.compile(optimizer=Adam(learning_rate = 0.001), loss=loss, metrics = [RootMeanSquaredError()])

# Early stopping to avoid overfitting: stop once val_loss has not improved for
# 15 epochs and restore the weights from the best epoch
early_stopping = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)

MV_LSTM.summary()

In [76]:
test5 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32, callbacks=[early_stopping], shuffle=False)

Epoch 1/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 20s 14ms/step - loss: 0.0236 - root_mean_squared_error: 0.1441 - val_loss: 0.0046 - val_root_mean_squared_error: 0.0675
Epoch 2/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 17s 14ms/step - loss: 0.0076 - root_mean_squared_error: 0.0870 - val_loss: 0.0027 - val_root_mean_squared_error: 0.0522
Epoch 3/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 16s 13ms/step - loss: 0.0070 - root_mean_squared_error: 0.0837 - val_loss: 0.0017 - val_root_mean_squared_error: 0.0418
Epoch 4/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 16s 13ms/step - loss: 0.0068 - root_mean_squared_error: 0.0821 - val_loss: 0.0012 - val_root_mean_squared_error: 0.0339
Epoch 5/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 16s 13ms/step - loss: 0.0066 - root_mean_squared_error: 0.0814 - val_loss: 0.0021 - val_root_mean_squared_error: 0.0460
Epoch 6/150

In [77]:
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()



test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results

11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 45ms/step


Unnamed: 0,Model Predictions,Actual
0,0.089810,0.132797
1,0.091292,0.133803
2,0.094786,0.142857
3,0.104015,0.163984
4,0.119621,0.167002
...,...,...
319,-0.004038,0.008048
320,-0.006890,0.010060
321,-0.004901,0.010060
322,-0.004957,0.008048


Early stopping seems to give somewhat worse results, but it is computationally more efficient.
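
The patience rule behind early stopping can be sketched in plain Python. This is a simplified illustration, not the Keras implementation; the function name `early_stopping_epoch` is hypothetical, the first five validation losses are taken from the training log above, and the last two are made up to show the stop being triggered with a small patience.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once `patience` consecutive epochs pass without the
    validation loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ran through every epoch
    return len(val_losses) - 1, best_epoch

# First five values come from the log above; the last two are hypothetical
val_losses = [0.0046, 0.0027, 0.0017, 0.0012, 0.0021, 0.0019, 0.0020]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

With `restore_best_weights=True`, the weights kept at the end are those from the best epoch rather than the epoch at which training stopped.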

- Try specifying `activation='linear'` in the last Dense layer to see if it changes anything (linear is already the default for a Keras `Dense` layer, so no change is expected)

print(actual_predictions)

Try without batch normalization one last time

In [12]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
#MV_LSTM.add(BatchNormalization()) #Normalize outputs
#MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

#Compile the model
loss = 'mse'  # optimizer and metric are passed directly to compile() below

MV_LSTM.compile(optimizer=Adam(learning_rate = 0.001), loss=loss, metrics = [RootMeanSquaredError()])


MV_LSTM.summary()

In [13]:
test4 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32,shuffle=False)

Epoch 1/150


2024-11-13 20:08:04.633368: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.


1222/1222 ━━━━━━━━━━━━━━━━━━━━ 17s 13ms/step - loss: 0.0051 - root_mean_squared_error: 0.0705 - val_loss: 0.0012 - val_root_mean_squared_error: 0.0354
Epoch 2/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - loss: 0.0016 - root_mean_squared_error: 0.0404 - val_loss: 6.6907e-04 - val_root_mean_squared_error: 0.0259
Epoch 3/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 13s 11ms/step - loss: 0.0012 - root_mean_squared_error: 0.0346 - val_loss: 5.1519e-04 - val_root_mean_squared_error: 0.0227
Epoch 4/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 13s 11ms/step - loss: 0.0011 - root_mean_squared_error: 0.0330 - val_loss: 4.7824e-04 - val_root_mean_squared_error: 0.0219
Epoch 5/150
1222/1222 ━━━━━━━━━━━━━━━━━━━━ 14s 12ms/step - loss: 0.0010 - root_mean_squared_error: 0.0320 - val_loss: 4.7706e-04 - val_root_mean_squared_error: 0.0218
Epoch 6/150

In [14]:
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()

10/10 ━━━━━━━━━━━━━━━━━━━━ 1s 23ms/step


Unnamed: 0,Model Predictions,Actual
0,0.255373,0.275654
1,0.272095,0.254527
2,0.2391,0.249497
3,0.235627,0.220322
4,0.197035,0.200201


In [18]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
MV_LSTM.add(Dropout(0.2))
MV_LSTM.add(Dense(8, activation='tanh'))

#MV_LSTM.add(BatchNormalization()) #Normalize outputs
#MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

#Compile the model
loss = 'mse'  # optimizer and metric are passed directly to compile() below

MV_LSTM.compile(optimizer=Adam(learning_rate = 0.001), loss=loss, metrics = [RootMeanSquaredError()])


MV_LSTM.summary()

In [19]:
test4 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32,shuffle=False)

Epoch 1/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 21s 14ms/step - loss: 0.0065 - root_mean_squared_error: 0.0785 - val_loss: 0.0016 - val_root_mean_squared_error: 0.0398
Epoch 2/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 15s 13ms/step - loss: 0.0022 - root_mean_squared_error: 0.0469 - val_loss: 9.8265e-04 - val_root_mean_squared_error: 0.0313
Epoch 3/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 17s 14ms/step - loss: 0.0016 - root_mean_squared_error: 0.0398 - val_loss: 6.6711e-04 - val_root_mean_squared_error: 0.0258
Epoch 4/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 16s 13ms/step - loss: 0.0013 - root_mean_squared_error: 0.0362 - val_loss: 5.9540e-04 - val_root_mean_squared_error: 0.0244
Epoch 5/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 16s 13ms/step - loss: 0.0012 - root_mean_squared_error: 0.0346 - val_loss: 5.6808e-04 - val_root_mean_squared_error: 0.0238

In [27]:
test_predictions = MV_LSTM.predict(X_test).flatten()
actual_predictions = y_test.flatten()
test_predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': test_predictions,
        'Actual':actual_predictions})
test_results.head()

11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step


Unnamed: 0,Model Predictions,Actual
0,0.125473,0.132797
1,0.131391,0.133803
2,0.131834,0.142857
3,0.149834,0.163984
4,0.170915,0.167002


In [28]:
test_rmse = np.sqrt(np.mean((test_predictions - y_test.flatten()) ** 2))

print(f"Test RMSE: {test_rmse}")

Test RMSE: 0.02848050370812416


This seems to be the best model so far: it takes the least time to train and gives the best accuracy.
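
For reference, the RMSE calculation can be wrapped in a small helper. This is a minimal sketch using NumPy only; the function name `rmse` is introduced here, and the sample values are the first five rows of the results table above, so the printed number covers only those rows and is in scaled units.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between two equally sized arrays."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# First five rows of the results table above (scaled values)
predicted = [0.125473, 0.131391, 0.131834, 0.149834, 0.170915]
actual = [0.132797, 0.133803, 0.142857, 0.163984, 0.167002]
print(rmse(predicted, actual))
```

Calling `.ravel()` on both inputs avoids the shape mismatch that arises when one array is (n, 1) and the other is (n,).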

## Walk Forward CV

In [21]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
n_outputs = y_train.shape[1]
MV_LSTM = Sequential()
MV_LSTM.add(Input(shape =(n_steps, n_features)))
MV_LSTM.add(LSTM(32,return_sequences=True))
MV_LSTM.add(Dropout(0.2)) #Prevent overfitting
MV_LSTM.add(LSTM(16, return_sequences=False))
#MV_LSTM.add(BatchNormalization()) #Normalize outputs
#MV_LSTM.add(Dense(5, activation='relu')) # Small intermediate Dense layer
MV_LSTM.add(Dense(n_outputs)) #Dense output layer with 1 unit (regression problem)

#Compile the model
loss = 'mse'  # optimizer and metric are passed directly to compile() below

MV_LSTM.compile(optimizer=Adam(learning_rate = 0.001), loss=loss, metrics = [RootMeanSquaredError()])


MV_LSTM.summary()

In [22]:
test4 = MV_LSTM.fit(X_train, y_train, epochs=150, validation_split=0.1, batch_size=32,shuffle=False)

Epoch 1/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 16s 12ms/step - loss: 0.0044 - root_mean_squared_error: 0.0654 - val_loss: 0.0012 - val_root_mean_squared_error: 0.0347
Epoch 2/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 16s 13ms/step - loss: 0.0016 - root_mean_squared_error: 0.0398 - val_loss: 6.1493e-04 - val_root_mean_squared_error: 0.0248
Epoch 3/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - loss: 0.0012 - root_mean_squared_error: 0.0340 - val_loss: 4.8659e-04 - val_root_mean_squared_error: 0.0221
Epoch 4/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 18s 14ms/step - loss: 0.0011 - root_mean_squared_error: 0.0325 - val_loss: 4.8017e-04 - val_root_mean_squared_error: 0.0219
Epoch 5/150
1223/1223 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - loss: 0.0010 - root_mean_squared_error: 0.0317 - val_loss: 4.6805e-04 - val_root_mean_squared_error: 0.0216

In [24]:

# Walk-forward validation
history = [x for x in X_train]  # Initialize history with the training sequences
predictions = []

for i in range(len(X_test)):
    # Prepare the current input for prediction
    current_input = np.array(history[-window_size])  # NOTE: selects the single window stored window_size entries back, not the most recent window (history[-1])
    current_input = current_input.reshape((1, current_input.shape[0], current_input.shape[1]))
    # Make a prediction
    yhat = MV_LSTM.predict(current_input, verbose=0)
    predictions.append(yhat[0])

    # Append the actual test input to history for the next step (walk-forward)
    history.append(X_test[i])

# Evaluate predictions
predictions = np.array(predictions).flatten()
actual_predictions = y_test.flatten()
predictions.shape, actual_predictions.shape

test_results = pd.DataFrame(data={
        'Model Predictions': predictions,
        'Actual':actual_predictions})


test_rmse = np.sqrt(np.mean((predictions - y_test.flatten()) ** 2))

print(f"Walk-forward validation RMSE: {test_rmse}")

Walk-forward validation RMSE: 0.1112561747431755


In [26]:
test_results.head(10)

Unnamed: 0,Model Predictions,Actual
0,0.121074,0.132797
1,0.133369,0.133803
2,0.155517,0.142857
3,0.099975,0.163984
4,0.056622,0.167002
5,0.055747,0.190141
6,0.076357,0.203219
7,0.096494,0.229376
8,0.103343,0.241449
9,0.195563,0.236419


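Results seem to get progressively worse. One possible reason is that each prediction is made from a window stored `window_size` entries back in the history rather than from the most recent window, so the input lags the target.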
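
A minimal sketch of a walk-forward loop in which every step uses the newest available window. To keep it self-contained it uses a hypothetical stand-in predictor (`persistence`, which repeats the last observed target value) and a toy series instead of the trained LSTM and the pollution data; the names `walk_forward` and `persistence` are introduced here for illustration.

```python
import numpy as np

def walk_forward(predict_fn, X_test, y_test):
    """Predict one step at a time, always from the newest available window."""
    history = []
    predictions = []
    for i in range(len(X_test)):
        history.append(X_test[i])                      # window ending just before y_test[i]
        current_input = history[-1][np.newaxis, ...]   # shape (1, timesteps, features)
        predictions.append(float(predict_fn(current_input)[0]))
    predictions = np.array(predictions)
    rmse = float(np.sqrt(np.mean((predictions - y_test.ravel()) ** 2)))
    return predictions, rmse

def persistence(batch):
    """Hypothetical stand-in model: repeat the last observed target value."""
    return batch[:, -1, 0]

# Toy series: one feature (the target) that rises by 1 each step
series = np.arange(20, dtype=float).reshape(-1, 1)
window = 4
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

preds, rmse = walk_forward(persistence, X, y)
print(preds.shape, rmse)
```

To use the trained network instead, `predict_fn` could be something like `lambda x: MV_LSTM.predict(x, verbose=0).ravel()`. Since the model is not refit inside the loop, this should give the same predictions as a single `predict` call on `X_test`.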

## Multivariate Multi-Step LSTM