# Recurrent Neural Networks
### Introduction
- RNNs are a type of deep learning model with a built-in feedback mechanism.
- The output of a particular layer can be **fed back** as input when predicting the next output.

![image](https://user-images.githubusercontent.com/43855029/132912049-167cf37e-66a0-4b54-8024-183ab7785398.png)

[source](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

- A detailed look at the RNN loop when it is unrolled:

![image](https://user-images.githubusercontent.com/43855029/132911838-0ce7eb99-fd60-44c7-b554-d176fdb45f8b.png)
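The unrolled loop can be sketched in a few lines of NumPy. This is only an illustrative sketch of a simple RNN, not the code Keras runs internally. The function name `rnn_forward`, the weight names, and the sizes are all assumptions made for this example.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """Run a simple RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_h.shape[0])            # initial hidden state
    states = []
    for x_t in x_seq:                     # one iteration per time step (the unrolled loop)
        # the previous hidden state h is fed back in together with the new input x_t
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_features, n_hidden, n_steps = 3, 4, 5   # illustrative sizes only
W_x = rng.normal(scale=0.5, size=(n_hidden, n_features))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
x_seq = rng.normal(size=(n_steps, n_features))

hidden_states = rnn_forward(x_seq, W_x, W_h, b)
print(hidden_states.shape)                # (5, 4): one hidden vector per time step
```

The same weights `W_x`, `W_h` and `b` are reused at every time step; only the hidden state changes.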


### Types of RNN

![image](https://user-images.githubusercontent.com/43855029/132903689-398ef108-660d-47ba-ae46-b783f203e307.png)

### Applications
- RNNs are specifically designed for sequential problems such as **Weather forecast, Stock forecast, Image captioning, Natural Language Processing, Speech/Voice Recognition**

### Some Disadvantages of RNN: 
- Computationally expensive, with large memory requirements
- RNNs are sensitive to changes in parameters and prone to the **Exploding Gradient** or **Vanishing Gradient** problem
- To mitigate the gradient problem of RNNs, the Long Short-Term Memory (LSTM) architecture was proposed.
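A rough way to see why gradients vanish or explode: in a scalar, linear toy recurrence `h_t = w * h_(t-1)`, the derivative of the last state with respect to the first is `w` raised to the number of steps. The sketch below uses that toy recurrence (an assumption for illustration, not a real RNN) and a hypothetical helper `gradient_factor`.

```python
def gradient_factor(w, n_steps):
    """d h_T / d h_0 for the toy recurrence h_t = w * h_(t-1)."""
    return w ** n_steps

n_steps = 120  # e.g. one input window of 120 hourly steps

for w in (0.9, 1.0, 1.1):
    factor = gradient_factor(w, n_steps)
    print(f"w = {w}: gradient factor after {n_steps} steps = {factor:.3e}")
```

A weight slightly below 1 shrinks the gradient towards zero, and a weight slightly above 1 blows it up. This is the behavior the gated design of LSTM is meant to reduce.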

In this short workshop, we only cover LSTM for time series forecasting.

# Long Short-Term Memory model - LSTM
### Introduction
- LSTMs are a special kind of RNN capable of learning long-term dependencies. Remembering information for long periods is their default behavior.
- They were introduced by Hochreiter & Schmidhuber (1997) and were later refined and popularized by many others.
- LSTMs are explicitly designed to avoid the long-term dependency problem.

### Comparison between traditional RNN and LSTM

![image](https://user-images.githubusercontent.com/43855029/132913273-1b7d4765-a8f2-4f2d-b3b9-6910d5d15807.png)
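To make the comparison concrete, one LSTM step can be sketched in NumPy. This follows the standard LSTM formulation without peephole connections. The stacking order of the gates inside `W`, `U` and `b`, and all names and sizes, are assumptions chosen for this example; they do not reflect how Keras stores its weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters of the forget, input,
    candidate and output parts (in that order, by assumption)."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b          # shape (4 * n,)
    f = sigmoid(z[0:n])                   # forget gate: how much of c_prev to keep
    i = sigmoid(z[n:2 * n])               # input gate: how much new information to write
    g = np.tanh(z[2 * n:3 * n])           # candidate cell values
    o = sigmoid(z[3 * n:4 * n])           # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g              # cell state, updated additively
    h_t = o * np.tanh(c_t)                # hidden state passed to the next step
    return h_t, c_t

rng = np.random.default_rng(0)
n_features, n_hidden = 3, 4               # illustrative sizes only
W = rng.normal(scale=0.5, size=(4 * n_hidden, n_features))
U = rng.normal(scale=0.5, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_features)):   # run 5 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

Unlike the simple RNN, the cell state `c_t` is carried forward through an additive update controlled by the gates, which is what lets information persist over many steps.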

### Step-by-step walkthrough of LSTM:
[Link](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

# Hands-on exercise on application of LSTM in temperature forecast
Here, we use the Keras LSTM layer to forecast temperature at the Jena site (Germany), given temperature and other climate variables.
The tutorial follows the example on the [keras website](https://keras.io/examples/timeseries/timeseries_weather_forecasting/), rewritten in a simpler way for easier understanding.

### Climate Data
- Download the data from [this link](https://drive.google.com/file/d/1vFs6uHrg24nmpOYuTKrtyW177Ik_gGUy/view?usp=sharing) and upload it to the folder that contains this notebook
- Single station: Jena, Germany
- The data consists of 14 climate variables recorded every 10 minutes
- Time span of 8 years: 01/01/2009 - 12/31/2016
- Data description:


![image](https://user-images.githubusercontent.com/43855029/132914704-b2c7ee79-0c99-482a-abfd-cc4575dcfe1b.png)

- Input variables: all 14 climate variables including Temperature
- Output or target variable: Temperature at later date

### Objective
- Using data from the previous 5 days, forecast the temperature 12 hours ahead

#### Loading library:

In [1]:
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Bidirectional

import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

#### Loading Jena climate station data:


In [2]:
df = pd.read_csv("jena_climate_2009_2016.csv")


#### Check for any missing value


In [3]:
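# isnull() and isna() are aliases in pandas, so the first two outputs are identical
print(df.isnull().sum())
print(df.isna().sum())
# Column-wise minimum: reveals sentinel values such as -9999
print(df.min())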

Date Time          0
p (mbar)           0
T (degC)           0
Tpot (K)           0
Tdew (degC)        0
rh (%)             0
VPmax (mbar)       0
VPact (mbar)       0
VPdef (mbar)       0
sh (g/kg)          0
H2OC (mmol/mol)    0
rho (g/m**3)       0
wv (m/s)           0
max. wv (m/s)      0
wd (deg)           0
dtype: int64
Date Time          0
p (mbar)           0
T (degC)           0
Tpot (K)           0
Tdew (degC)        0
rh (%)             0
VPmax (mbar)       0
VPact (mbar)       0
VPdef (mbar)       0
sh (g/kg)          0
H2OC (mmol/mol)    0
rho (g/m**3)       0
wv (m/s)           0
max. wv (m/s)      0
wd (deg)           0
dtype: int64
Date Time          01.01.2009 00:10:00
p (mbar)                         913.6
T (degC)                        -23.01
Tpot (K)                         250.6
Tdew (degC)                     -25.01
rh (%)                           12.95
VPmax (mbar)                      0.95
VPact (mbar)                      0.79
VPdef (mbar)                    

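There are missing values for wv and max. wv, denoted by -9999. Such values would need to be converted to NaN before use. In this tutorial we avoid the issue by selecting only predictors that have no missing values.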
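For reference, converting the -9999 sentinel to NaN could look like the sketch below. The miniature DataFrame `toy` and its values are made up for illustration; only the column names are taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the two wind columns; the values are invented
toy = pd.DataFrame({
    "wv (m/s)": [1.03, -9999.0, 0.72],
    "max. wv (m/s)": [1.75, -9999.0, 1.50],
})

cleaned = toy.replace(-9999.0, np.nan)    # mark the sentinel value as missing
print(cleaned.isna().sum())               # the missing values are now counted
print(cleaned.min())                      # NaN is skipped, so the minimum is realistic again
```

After the replacement, `isna()` reports the gaps that `df.isnull().sum()` could not see while they were still stored as -9999.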


In [4]:
# Select only atmospheric pressure (p), temperature (T) and relative humidity (rh), which have no missing values, as input predictors
selected_col = [0,1,2,5] 
df = df.iloc[:,selected_col].set_index(["Date Time"])


#### Data partitioning

- Data was collected every 10 minutes, i.e. 6 times an hour. The input data is therefore subsampled to hourly values with the **sampling_rate** argument: **step=6**
- Historical data from the past 5 days: 5 x 24 x 6 = **720 data points**
- Forecast horizon of 12 hours: 12 x 6 = **72 data points**
- The data is partitioned into **70% training** and **30% testing** in chronological order
- For the neural network, the following hyperparameters are pre-selected:
   - **Learning rate** = 0.001
   - **Batch size** = 256 (the number of samples passed through the network in one training step)
   - **Epoch** = 10 (the number of times the entire training dataset passes through the network)

In [5]:
split_fraction = 0.7
train_split = int(split_fraction * int(df.shape[0]))

step = 6 
past = 720
future = 72

learning_rate = 0.001
batch_size = 256
epochs = 10

Because the input variables have different ranges, they need to be rescaled. Here we use **min-max normalization** to map every variable to the range [0, 1]
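As a quick illustration of what the `MinMaxScaler` in the next cell computes for each column, here is the same formula written by hand for a single made-up series. The helper names `minmax_scale` and `minmax_inverse` are hypothetical, not part of scikit-learn.

```python
import numpy as np

def minmax_scale(values):
    """Map values linearly to [0, 1]; also return the min and max that were used."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

def minmax_inverse(scaled, lo, hi):
    """Undo minmax_scale using the stored min and max."""
    return scaled * (hi - lo) + lo

temps = np.array([-5.0, 0.0, 10.0, 15.0])   # made-up temperatures in degC
scaled, lo, hi = minmax_scale(temps)
print(scaled)                               # values 0, 0.25, 0.75 and 1
print(minmax_inverse(scaled, lo, hi))       # back to the original temperatures
```

The inverse step is what later allows predictions to be reported in degC again.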


In [6]:
from sklearn.preprocessing import MinMaxScaler
scale = MinMaxScaler(feature_range=(0,1))
scaled_features = pd.DataFrame(scale.fit_transform(df))
scaled_features.columns = df.columns
scaled_features.index = df.index

# The split into training and testing data must preserve the time order
train_data = scaled_features[0:train_split]
test_data =  scaled_features[train_split:]

train_data.head()

| Date Time | p (mbar) | T (degC) | rh (%) |
|---|---|---|---|
| 01.01.2009 00:10:00 | 0.814939 | 0.248632 | 0.923033 |
| 01.01.2009 00:20:00 | 0.81543 | 0.242163 | 0.924182 |
| 01.01.2009 00:30:00 | 0.815037 | 0.240504 | 0.929925 |
| 01.01.2009 00:40:00 | 0.81484 | 0.243822 | 0.933372 |
| 01.01.2009 00:50:00 | 0.81484 | 0.244485 | 0.932223 |


In [7]:
scaled_features.head()

| Date Time | p (mbar) | T (degC) | rh (%) |
|---|---|---|---|
| 01.01.2009 00:10:00 | 0.814939 | 0.248632 | 0.923033 |
| 01.01.2009 00:20:00 | 0.81543 | 0.242163 | 0.924182 |
| 01.01.2009 00:30:00 | 0.815037 | 0.240504 | 0.929925 |
| 01.01.2009 00:40:00 | 0.81484 | 0.243822 | 0.933372 |
| 01.01.2009 00:50:00 | 0.81484 | 0.244485 | 0.932223 |


#### Selecting input/output for training/testing dataset:

![image](https://user-images.githubusercontent.com/43855029/133829483-8190525b-d278-497c-af5c-73f5a372855d.png)

##### Training

The above graph can be interpreted as follows:
- The entire dataset is split into training and testing sets based on the value of train_split
- X_train contains all input values [p (mbar), T (degC), rh (%)] and y_train contains [T (degC)]. The same applies to X_test and y_test.
- past = 720 means 720 time steps at 10-minute intervals, which is equivalent to 5 days. Inputs from the past 5 days are used to predict the output 12 hours ahead.
- future = 72 means the target lies 72 time steps (12 hours) after the end of the input window. The values in between are skipped, so the model forecasts a single temperature value 12 hours ahead.
- 1 batch:  Input shape: (256, 120, 3); Target shape: (256,)
- 1 batch contains 256 samples (the batch size set above) of input X and target y. Each sample contains 120 time steps of X [p (mbar), T (degC), rh (%)] (120 = 720/6, after converting from 10-minute to hourly intervals) and 1 value of y [T (degC)].
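The pairing of input windows and targets described above can be sketched with plain NumPy on a tiny series. This is a simplified stand-in for the Keras utility used later, written only to show which rows end up together; the helper `make_windows` and the small values of `past`, `step` and `future` are assumptions for illustration.

```python
import numpy as np

def make_windows(features, targets, sequence_length, sampling_rate):
    """Window i uses rows i, i + rate, ..., i + (sequence_length - 1) * rate
    of `features` and is paired with targets[i]."""
    span = (sequence_length - 1) * sampling_rate
    n_windows = min(len(features) - span, len(targets))
    X = np.stack([features[i : i + span + 1 : sampling_rate] for i in range(n_windows)])
    y = targets[:n_windows]
    return X, y

# Toy settings: 6 past rows sampled every 2nd row, target 3 rows after the window
past, step, future = 6, 2, 3
series = np.arange(20)                      # row index doubles as the value
features = series[:, None]                  # shape (20, 1): a single input variable
targets = series[past + future:]            # targets are shifted by past + future rows

X, y = make_windows(features, targets, sequence_length=past // step, sampling_rate=step)
print(X.shape)                              # (11, 3, 1): samples, time steps, variables
print(X[0, :, 0], y[0])                     # rows 0, 2, 4 are paired with row 9
```

With the tutorial's values (past = 720, step = 6, future = 72), the first window would use rows 0, 6, ..., 714 and be paired with the temperature at row 792.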

In [8]:
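# Targets start past + future steps after the first input row
start_ytrain = past + future
end_ytrain = train_split + start_ytrain

x_train = train_data
y_train = scaled_features[start_ytrain:end_ytrain]["T (degC)"]

# Number of hourly time steps in one input window (720 / 6 = 120)
sequence_length = int(past/step)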

##### Testing


In [9]:
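# Test targets start right after the last training target
start_ytest = end_ytrain
# Despite its name, end_ytest marks the last input row that still has a target
end_ytest = len(test_data) - past - future

x_test = test_data.iloc[:end_ytest,:]
y_test = scaled_features.iloc[start_ytest:]["T (degC)"]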


In [10]:
y_test.shape

(125374,)

Keras (with TensorFlow version 2.3 and above) has a built-in function that prepares time series data for modeling, given a batch size and the length of the historical window.



#### Using Keras to split training/testing data into batches:
Here, we use the Keras time series preprocessing utility to split the training/testing data into batches:

##### Training


In [12]:
dataset_train = keras.preprocessing.timeseries_dataset_from_array(
    x_train,
    y_train,
    sequence_length=sequence_length,
    sampling_rate = step,
    batch_size=batch_size,
)

In [28]:
for batch in dataset_train.take(1):
    inputs, targets = batch
    print("Input shape:", inputs.numpy().shape)
    print("Target shape:", targets.numpy().shape)


Input shape: (256, 120, 3)
Target shape: (256,)


##### Testing


In [14]:
dataset_test = keras.preprocessing.timeseries_dataset_from_array(
    x_test,
    y_test,
    sequence_length=sequence_length,
    sampling_rate=step,
    batch_size=batch_size
)

In [27]:
for batch in dataset_test.take(2):
    inputs_test, targets_test = batch
    print("Input shape:", inputs_test.numpy().shape)
    print("Target shape:", targets_test.numpy().shape)


Input shape: (256, 120, 3)
Target shape: (256,)
Input shape: (256, 120, 3)
Target shape: (256,)


#### Build a deep learning model with an LSTM layer:

In [None]:
inputs = keras.layers.Input(shape=(inputs.shape[1], inputs.shape[2]))
lstm_out = keras.layers.LSTM(32, activation="relu")(inputs)
outputs = keras.layers.Dense(1)(lstm_out)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate), loss="mse")
model.summary()    

In [None]:
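# Re-compile to also track mean absolute error; accuracy is not meaningful for regression
model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate), loss="mse", metrics=["mae"])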
model.summary()    

#### Create a callback object for TensorBoard visualization

In [None]:
import tensorflow
tb_callback = tensorflow.keras.callbacks.TensorBoard(log_dir="logs_LSTM/", histogram_freq=1)

#### Train the LSTM model and validate with the testing data set:


In [None]:
history = model.fit(
    dataset_train,
    epochs=epochs,
    validation_data=dataset_test,
    callbacks=[tb_callback]
)

#### Visualize the training & testing loss over the 10 epochs


In [None]:
def visualize_loss(history, title):
    loss = history.history["loss"]
    val_loss = history.history["val_loss"]
    epochs = range(len(loss))
    plt.figure()
    plt.plot(epochs, loss, "b", label="Training loss")
    plt.plot(epochs, val_loss, "r", label="Validation loss")
    plt.title(title)
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()

visualize_loss(history, "Training and Validation Loss")

#### Save & Load the LSTM trained model

Save LSTM model:

In [None]:
model.save('LSTM_Jena.keras')


Load LSTM model


In [None]:
model = keras.models.load_model('LSTM_Jena.keras')



#### Prediction
Modifying the given [code](https://keras.io/examples/timeseries/timeseries_weather_forecasting/) to make predictions for 5 samples from the test set (the first sample of each of the first 5 batches):

First, we need a scaler that maps T (degC) back to its original scale


In [None]:
# Fit a separate scaler on the original T (degC) column so predictions can be rescaled back
scaleT = MinMaxScaler(feature_range=(0,1))
scaleT.fit(df[["T (degC)"]])

Apply plotting:


In [None]:
def show_plot(plot_data, delta, title):
    labels = ["History", "True Future", "Model Prediction"]
    marker = [".-", "rx", "go"]
    time_steps = list(range(-(plot_data[0].shape[0]), 0))
    if delta:
        future = delta
    else:
        future = 0

    plt.title(title)
    for i, val in enumerate(plot_data):
        if i:
            plt.plot(future, plot_data[i], marker[i], markersize=10, label=labels[i])
        else:
            plt.plot(time_steps, plot_data[i].flatten(), marker[i], label=labels[i])
    plt.legend()
    plt.xlim([time_steps[0], (future + 5) * 2])
    plt.xlabel("Time-Step")
    plt.ylabel("T (degC)")
    plt.show()
    return


for x, y in dataset_test.take(5):
    show_plot(
        # Column 1 of x is T (degC); reshape to 2D because the scaler expects (n_samples, 1)
        [scaleT.inverse_transform(x[0][:, 1].numpy().reshape(-1, 1)),
         scaleT.inverse_transform(y[0].numpy().reshape(-1, 1)),
         scaleT.inverse_transform(model.predict(x)[0].reshape(-1, 1))],
        12,
        "Single Step Prediction",
    )
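The library cell imports `mean_squared_error` and `r2_score`, but the notebook does not use them. As a hedged sketch of how forecast quality could be summarized, the same quantities can be written out with NumPy. The arrays below are made-up numbers, not model output, and the helper names `rmse` and `r2` are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the data."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([2.0, 4.0, 6.0, 8.0])     # made-up observed temperatures (degC)
y_pred = np.array([2.5, 3.5, 6.5, 7.5])     # made-up predictions (degC)

print("RMSE:", rmse(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
```

In the notebook, `y_true` and `y_pred` would be the rescaled test targets and the rescaled output of `model.predict`, collected over the test batches.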

In [None]:
## Using all input data


In [None]:
for x, y in dataset_test.take(10):
    print(x.shape)
    print(y.shape)
   
    

In [None]:
x_train.shape