LSTM (long short-term memory) is a type of RNN that captures patterns in sequential data. Its main benefit is that it can learn and retain information over long sequences. This ability comes from the LSTM cell itself; in Keras, the `stateful=True` argument of the LSTM layer is a separate option that carries the final state of each batch over as the initial state of the next batch.
An LSTM includes three important gates: the input gate, the forget gate and the output gate. The interaction among these three gates gives the LSTM the ability to handle long-term dependencies, which general RNNs struggle to learn. "The learning speed of the previous hidden layers is slower than the deeper hidden layers. This phenomenon may even lead to a decrease of accuracy rate as hidden layers increase [25]. However, the smart design of the memory cell in LSTM can effectively solve the problem of gradient vanishing in backpropagation and can learn the input sequence with longer time steps. Hence, LSTM is commonly used for solving applications related to time serial issues."
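
To make the role of the three gates concrete, here is a minimal NumPy sketch of a single LSTM time step. It follows the standard textbook formulation rather than any particular library's internals; the function name `lstm_step`, the weight names `W`, `U`, `b`, and the gate ordering are assumptions chosen for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*H, D), U: (4*H, H), b: (4*H,), with the gates stacked as
    [input, forget, candidate, output] (an ordering assumed for this sketch).
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g       # additive update of the memory cell
    h_t = o * np.tanh(c_t)         # new hidden state
    return h_t, c_t

# Run the cell over a random 24-step sequence with 8 features (illustrative sizes)
rng = np.random.default_rng(0)
D, H = 8, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(24, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The additive update `c_t = f * c_prev + i * g` is the part usually credited with easing the vanishing gradient problem: when the forget gate stays close to 1, the cell state passes through many steps largely unchanged.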

- LSTMs are a type of recurrent network, and as such are designed to take sequence data as input, unlike other models where lag observations must be presented as input features.
- LSTMs directly support multiple parallel input sequences for multivariate inputs, unlike other models where multivariate inputs are presented in a flat structure.
- Like other neural networks, LSTMs are able to map input data directly to an output vector that may represent multiple output time steps.
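
The difference between the sequence layout and the flat layout mentioned above can be illustrated with array shapes. This is a small sketch on random stand-in data; the sizes (5 samples, 24 time steps, 8 features) are assumptions chosen to resemble the data used later in this notebook.

```python
import numpy as np

n_samples, n_steps, n_features = 5, 24, 8

# Sequence layout an LSTM consumes: (samples, time steps, features)
X_seq = np.random.default_rng(0).random((n_samples, n_steps, n_features))

# Flat layout many non-recurrent models expect: one row of lag features per sample
X_flat = X_seq.reshape(n_samples, n_steps * n_features)

print(X_seq.shape)   # (5, 24, 8)
print(X_flat.shape)  # (5, 192)
```

In the flat layout the first 8 columns hold the features of the first time step, the next 8 those of the second, and so on, so the model has to discover the temporal ordering on its own.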

- A popular approach has been to combine CNNs with LSTMs, where the CNN acts as an encoder that learns features from sub-sequences of the input data, which are then provided as time steps to an LSTM. This architecture is called a CNN-LSTM.
- A powerful variation on the CNN-LSTM architecture is the ConvLSTM, which performs the convolutional reading of input sub-sequences directly within the LSTM's units. This approach has proven very effective for time series classification and can be adapted for use in multi-step time series forecasting.
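
For a CNN-LSTM, each input window first has to be split into sub-sequences that the CNN reads one at a time. A minimal sketch of that reshaping step, assuming a 24-step window divided into 4 sub-sequences of 6 steps (both numbers are illustrative choices, not taken from this notebook):

```python
import numpy as np

n_samples, n_steps, n_features = 5, 24, 8
X = np.arange(n_samples * n_steps * n_features, dtype=float).reshape(
    n_samples, n_steps, n_features
)

n_subseq, sub_len = 4, 6  # assumed split: 4 sub-sequences of 6 steps each
assert n_subseq * sub_len == n_steps

# (samples, sub-sequences, steps per sub-sequence, features)
X_sub = X.reshape(n_samples, n_subseq, sub_len, n_features)
print(X_sub.shape)  # (5, 4, 6, 8)
```

In Keras, a model of this kind is commonly built by wrapping the convolutional layers in `TimeDistributed` so that the same CNN is applied to every sub-sequence before the LSTM reads the resulting feature vectors.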

In [3]:
import numpy as np
import pandas as pd
import pickle 
import sklearn 

In [31]:
with open('../data/train_data.pickle', 'rb') as f:
    train_data = pickle.load(f)

In [32]:
with open('../data/test_data.pickle', 'rb') as f:
    test_data = pickle.load(f)

In [33]:
#def evaluate_forecasts(actual, predicted):
train_data.head()

date,pollution,dew,temp,press,wnd_dir,wnd_spd,snow,rain
2010-01-02 00:00:00,0.129779,0.352941,0.245902,0.527273,0.333333,0.00229,0.0,0.0
2010-01-02 01:00:00,0.148893,0.367647,0.245902,0.527273,0.333333,0.003811,0.0,0.0
2010-01-02 02:00:00,0.15996,0.426471,0.229508,0.545455,0.333333,0.005332,0.0,0.0
2010-01-02 03:00:00,0.182093,0.485294,0.229508,0.563636,0.333333,0.008391,0.037037,0.0
2010-01-02 04:00:00,0.138833,0.485294,0.229508,0.563636,0.333333,0.009912,0.074074,0.0


In [34]:
test_data.head()

date,pollution,dew,temp,press,wnd_dir,wnd_spd,snow,rain
2014-12-18 00:00:00,0.181087,0.397059,0.213115,0.709091,0.0,0.00229,0.0,0.0
2014-12-18 01:00:00,0.171026,0.397059,0.196721,0.709091,0.666667,0.000752,0.0,0.0
2014-12-18 02:00:00,0.160966,0.397059,0.196721,0.709091,0.666667,0.003811,0.0,0.0
2014-12-18 03:00:00,0.146881,0.382353,0.163934,0.727273,0.666667,0.00687,0.0,0.0
2014-12-18 04:00:00,0.125755,0.382353,0.180328,0.709091,0.666667,0.012219,0.0,0.0


## Moving Window CV

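The LSTM takes fixed-length sequences as input, so we slice each series into overlapping windows of `window_size` consecutive hourly rows. The target for each window is the pollution value of the row immediately after it.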

In [35]:
def generate_sequence(df, N, window_size):
    # Build overlapping input windows of `window_size` consecutive rows (all columns)
    X_sequences = [df.iloc[i:i + window_size].values for i in range(N - window_size)]
    # The target for each window is the pollution value of the row right after it
    Y_values = [df.iloc[i + window_size]['pollution'] for i in range(N - window_size)]

    return np.array(X_sequences), np.array(Y_values)
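
`generate_sequence` calls `df.iloc` once per window, which can be slow on tens of thousands of rows. Below is a minimal sketch of an equivalent vectorized version, assuming NumPy 1.20 or later for `sliding_window_view`. The name `generate_sequence_fast` and its `target_col` parameter are hypothetical, introduced here only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def generate_sequence_fast(values, target_col, window_size):
    """values: 2-D array (rows, features); target_col: column index of the target."""
    # sliding_window_view yields (rows - window_size + 1, features, window_size);
    # transpose to (windows, window_size, features)
    windows = sliding_window_view(values, window_size, axis=0).transpose(0, 2, 1)
    X = windows[:-1]                       # drop the last window: no row follows it
    y = values[window_size:, target_col]   # value right after each remaining window
    return X, y

# Toy data: 10 rows, 2 features
values = np.arange(20, dtype=float).reshape(10, 2)
X, y = generate_sequence_fast(values, target_col=0, window_size=3)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
```

With the notebook's data this could be called as `generate_sequence_fast(train_data.values, train_data.columns.get_loc('pollution'), window_size)`. The returned `X` is a read-only view of the input, so take a `.copy()` if it needs to be modified.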



The practical limit on the window size is related to the vanishing gradient problem. Although LSTMs mitigate it, very long windows can still limit how well the model learns dependencies far back in the sequence, especially if the model does not have enough capacity to capture long-term patterns.
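
A rough piece of arithmetic shows why long windows are harder to learn from. The sketch below assumes, purely for illustration, that every time step scales the backpropagated gradient by the same constant factor of 0.9. A real LSTM has no such fixed factor, and the helper name `gradient_scale` is hypothetical.

```python
def gradient_scale(factor, steps):
    """Gradient magnitude left after `steps` steps, each scaling it by `factor`."""
    return factor ** steps

# Compare a one-day, three-day and one-week window of hourly data
for steps in (24, 72, 168):
    print(steps, gradient_scale(0.9, steps))
```

Under this assumption the signal reaching the start of a 24-step window is already below a tenth of its original size, and it is several orders of magnitude smaller for a one-week window.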

In [36]:
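# Use the previous 24 hourly observations to predict the next pollution value
window_size = 24
N = len(train_data)
X_train, y_train = generate_sequence(train_data, N, window_size)
print(X_train.shape, y_train.shape)

# Build the test windows the same way
M = len(test_data)
X_test, y_test = generate_sequence(test_data, M, window_size)
print(X_test.shape, y_test.shape)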


(43440, 24, 8) (43440,)
(312, 24, 8) (312,)
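
The section heading refers to moving-window cross-validation, while the cells above only build the input windows. One way such splits could be generated over the 43440 training windows is sketched below. The function name `moving_window_splits` and the sizes (one year of hourly windows for training, 30 days for validation) are assumptions for illustration, not settings taken from this notebook.

```python
import numpy as np

def moving_window_splits(n_samples, train_size, val_size, step):
    """Yield (train_idx, val_idx) pairs; both blocks slide forward by `step`."""
    start = 0
    while start + train_size + val_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        val_idx = np.arange(start + train_size, start + train_size + val_size)
        yield train_idx, val_idx
        start += step

# Assumed sizes: 8760 hourly windows (one year) to train, 720 (30 days) to validate
splits = list(moving_window_splits(n_samples=43440, train_size=8760,
                                   val_size=720, step=8760))
for train_idx, val_idx in splits:
    print(train_idx[0], train_idx[-1], val_idx[0], val_idx[-1])
```

Each validation block lies strictly after its training block, so the model is never evaluated on data that precedes what it was trained on. Adjacent windows share most of their input rows, so leaving a gap of at least `window_size` between the two blocks may be worth considering.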
