# RNNs

In this notebook you will learn how to build Recurrent Neural Networks (RNNs) for time series forecasting and sequence classification.

## Imports

In [None]:
%matplotlib inline

In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import sklearn
import sys
import tensorflow as tf
from tensorflow import keras
import time

In [None]:
print("python", sys.version)
for module in mpl, np, pd, sklearn, tf, keras:
    print(module.__name__, module.__version__)

In [None]:
assert sys.version_info >= (3, 5) # Python ≥3.5 required
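# Compare numeric version parts: a plain string comparison would reject e.g. "10.0"
assert tuple(int(v) for v in tf.__version__.split(".")[:2]) >= (2, 0) # TensorFlow ≥2.0 required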

# Exercise 1 – Time series forecasting

## 1.1) Load the data

Let's start with a simple univariate time series: the daily minimum temperatures in Melbourne from 1981 to 1990 ([source](https://datamarket.com/data/set/2324/daily-minimum-temperatures-in-melbourne-australia-1981-1990)).

In [None]:
temps = pd.read_csv("datasets/daily-minimum-temperatures-in-me.csv",
                    parse_dates=[0], index_col=0)

In [None]:
temps.info()

In [None]:
temps.head()

In [None]:
temps.plot(figsize=(10,5))
plt.show()

## 1.2) Prepare the data

A few dates are missing, for example December 31st, 1984:

In [None]:
temps.loc["1984-12-29":"1985-01-02"]

Let's ensure there's one row per day, filling missing values with the previous valid value:

In [None]:
temps = temps.asfreq("1D", method="ffill")
temps.loc["1984-12-29":"1985-01-02"]

Alternatively, we could have called `temps.asfreq("1D")` without the `method` argument and then filled the resulting gaps using `temps.interpolate()`.
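
To make the difference between the two options concrete, here is a minimal sketch on a made-up three-row series (the dates and values are illustrative assumptions, not taken from the Melbourne data):

```python
import pandas as pd

# Toy series with one missing day (2020-01-03); the values are made up.
idx = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"])
toy = pd.DataFrame({"temp": [10.0, 12.0, 16.0]}, index=idx)

filled = toy.asfreq("1D", method="ffill")      # repeats the previous valid value
interpolated = toy.asfreq("1D").interpolate()  # linear between the two neighbors

missing_day = pd.Timestamp("2020-01-03")
print(filled.loc[missing_day, "temp"])        # 12.0
print(interpolated.loc[missing_day, "temp"])  # 14.0
```

Forward filling never looks at future values, while interpolation does, which may or may not matter depending on how the filled series is used.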

## 1.3) Add the shifted columns

Next, let's create a function to add lag columns:

In [None]:
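def add_lags(series, times):
    # `series` is a DataFrame; each offset in `times` adds one shifted copy of it
    cols = []
    column_index = []
    for lag in times:  # named `lag` so it does not shadow the imported `time` module
        # shift(-lag) puts the value observed at t+lag on the row for day t
        cols.append(series.shift(-lag))
        lag_fmt = "t+{lag}" if lag > 0 else "t{lag}" if lag < 0 else "t"
        column_index += [(lag_fmt.format(lag=lag), col_name)
                        for col_name in series.columns]
    df = pd.concat(cols, axis=1)
    df.columns = pd.MultiIndex.from_tuples(column_index)
    return df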

We will try to predict the temperature in 5 days (t+5) using the temperatures from the last 30 days (t-29 to t):

In [None]:
X = add_lags(temps, times=range(-30+1,1)).iloc[30:-5]
y = add_lags(temps, times=[5]).iloc[30:-5]

In [None]:
X.head()

In [None]:
y.head()

Note: you may want to use `keras.preprocessing.sequence.TimeseriesGenerator` or `tf.data.Dataset.window()` instead.
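
As a rough illustration of the `tf.data.Dataset.window()` route, the sketch below builds (inputs, target) pairs from a toy series. The window sizes and the toy data are assumptions chosen to keep the output small. The notebook itself would use 30 input steps and a horizon of 5.

```python
import tensorflow as tf

series = tf.range(10, dtype=tf.float32)  # toy stand-in for the temperatures
n_steps, horizon = 3, 2                  # 3 input steps, target 2 steps ahead
window_size = n_steps + horizon

dataset = tf.data.Dataset.from_tensor_slices(series)
# Each window is itself a small dataset, so flatten it into a single tensor
dataset = dataset.window(window_size, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda w: w.batch(window_size))
# Inputs: the first n_steps values (with a feature axis); target: the last value
dataset = dataset.map(lambda w: (w[:n_steps, tf.newaxis], w[-1:]))

windows = list(dataset.as_numpy_iterator())
print(len(windows))     # 6
print(windows[0][1])    # [4.]
```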

## 1.4) Split the dataset

Split this dataset into three periods: training (1981-1986), validation (1987-1988) and testing (1989-1990).

In [None]:
#X_train, y_train = ...
#X_valid, y_valid = ...
#X_test, y_test = ...
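
One possible way to do the split is label-based slicing on the date index, which includes both endpoints when given year strings. The sketch below uses toy stand-ins for `X` and `y` (an assumption, so that it runs on its own). With the real data the same `.loc[...]` slices should apply.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for X and y: one row per day from 1981 through 1990.
dates = pd.date_range("1981-01-01", "1990-12-31", freq="D")
X = pd.DataFrame({"t": np.arange(len(dates), dtype=float)}, index=dates)
y = X + 5

# Partial-string slicing on a DatetimeIndex is inclusive on both ends
X_train, y_train = X.loc["1981":"1986"], y.loc["1981":"1986"]
X_valid, y_valid = X.loc["1987":"1988"], y.loc["1987":"1988"]
X_test, y_test = X.loc["1989":"1990"], y.loc["1989":"1990"]

print(len(X_train), len(X_valid), len(X_test))  # 2191 731 730
```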

## 1.5) Reshape the inputs for the RNN

Keras and TensorFlow expect a 3D NumPy array for any sequence. Its shape should be (number of instances, number of time steps, number of features per time step). Since this is a univariate time series, the last dimension is 1. Reshape the input features to get 3D arrays:

In [None]:
#X_train_3D = ...
#X_valid_3D = ...
#X_test_3D = ...
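
A minimal sketch of the reshaping step, using a small made-up 2D array. With a DataFrame such as `X_train`, the same idea would presumably be `X_train.values[..., np.newaxis]`.

```python
import numpy as np

# Toy 2D input: 4 instances, 3 time steps each
X_2d = np.arange(12, dtype=np.float32).reshape(4, 3)

# Add a trailing axis for the single feature per time step
X_3d = X_2d[..., np.newaxis]

print(X_3d.shape)  # (4, 3, 1)
```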

## 1.6) Build some baseline models

Build some baseline models (at least one) and evaluate them on the validation set, using the Mean Absolute Error (MAE). For example:

* a naive model, that just predicts the last known value.
* an EMA model that predicts an exponential moving average of the last few days (start with a span of 2 days, then try to find the best span).
* a linear model.

Optional: plot the predictions.
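
The first two baselines could be sketched as follows. The helper names (`mae`, `naive_forecast`, `ema_forecast`) and the synthetic series are assumptions made for this illustration. The inputs are 2D arrays with one row per instance and one column per time step.

```python
import numpy as np
import pandas as pd

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def naive_forecast(X):
    # Predict the last known value of each input window
    return X[:, -1]

def ema_forecast(X, span=2):
    # ewm() runs down the rows, so put time on the row axis first
    return pd.DataFrame(X.T).ewm(span=span).mean().iloc[-1].to_numpy()

# Synthetic daily series: yearly sine wave plus noise (made-up data)
rng = np.random.default_rng(42)
series = 10 + 5 * np.sin(np.arange(400) * 2 * np.pi / 365) + rng.normal(scale=2, size=400)
X_demo = np.stack([series[i:i + 30] for i in range(len(series) - 34)])
y_demo = series[34:]  # value 5 days after the end of each window

print("naive MAE:", mae(y_demo, naive_forecast(X_demo)))
print("EMA MAE:  ", mae(y_demo, ema_forecast(X_demo, span=2)))
```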

## 1.7) Build a simple RNN

Using Keras, build a simple 2-layer RNN with 100 neurons per layer, plus a dense layer with a single neuron. Train the model for 200 epochs with a batch size of 200, using Stochastic Gradient Descent with a learning rate of 0.005. Make sure to print the validation loss during training.

Hints:

* Create a `Sequential` model.
* Add two `SimpleRNN` layers, with 100 units each. The first should return sequences but not the second. Indeed, in a Seq2Vec model, the last RNN layer should not return sequences. The first layer should specify the input shape (i.e., the shape of a single input sequence).
* Use the MSE as the loss.
* Call the model's `compile()` method, passing it an `SGD` instance with `learning_rate=0.005` (older Keras versions also accept the deprecated name `lr`).
* Call the model's `fit()` method, with the inputs and targets, number of epochs, batch size and validation data.

In [None]:
#model1 = ...
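
One way to follow the hints is sketched below. It trains on random stand-in data for only two epochs so that it runs quickly. That data, and the names `X_demo`, `y_demo`, `X_val_demo` and `y_val_demo`, are assumptions. The exercise itself would use the 3D arrays from 1.5 with `epochs=200` and `batch_size=200`.

```python
import numpy as np
from tensorflow import keras

model1 = keras.models.Sequential([
    # First layer declares the shape of one input sequence: 30 steps, 1 feature
    keras.layers.SimpleRNN(100, return_sequences=True, input_shape=[30, 1]),
    keras.layers.SimpleRNN(100),   # Seq2Vec: last RNN layer returns a vector
    keras.layers.Dense(1),
])
model1.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=0.005))

# Random stand-in data (NOT the temperature data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(64, 30, 1)).astype("float32")
y_demo = rng.normal(size=(64, 1)).astype("float32")
X_val_demo = rng.normal(size=(16, 30, 1)).astype("float32")
y_val_demo = rng.normal(size=(16, 1)).astype("float32")

# Passing validation_data makes Keras report val_loss after every epoch
history = model1.fit(X_demo, y_demo, epochs=2, batch_size=32,
                     validation_data=(X_val_demo, y_val_demo), verbose=0)
print(sorted(history.history.keys()))
```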

## 1.8) Plot the history

Recall that you can simply use `pd.DataFrame(history.history).plot()`.

## 1.9) Evaluate the model

Evaluate your RNN on the validation set, using the MAE. Try training your model again using the Huber loss and see if you get better performance.

In [None]:
def huber_loss(y_true, y_pred, max_grad=1.):
    err = tf.abs(y_true - y_pred, name='abs')
    mg = tf.constant(max_grad, name='max_grad')
    lin = mg * (err - 0.5 * mg)
    quad = 0.5 * err * err
    return tf.where(err < mg, quad, lin)
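
Written out, with $e = y_\text{true} - y_\text{pred}$ and $\delta$ standing for `max_grad`, the function above computes the following piecewise loss for each element:

$$
L_\delta(e) =
\begin{cases}
\frac{1}{2} e^2 & \text{if } |e| < \delta \\
\delta \left( |e| - \frac{1}{2} \delta \right) & \text{otherwise}
\end{cases}
$$

Both branches equal $\frac{1}{2}\delta^2$ at $|e| = \delta$, so the loss is continuous, and the magnitude of its gradient never exceeds $\delta$. This makes it less sensitive to large errors than the MSE.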

## 1.10) Plot the predictions

Make predictions on the validation set and plot them. Compare them to the targets and the baseline predictions.

# Exercise 2 – Forecasting the shifted sequence (Seq2Seq)

Now let's predict temperatures for 30 days (from t-24 to t+5) instead of just one.

## 2.1) Define the 3D targets for training, validation and testing

In [None]:
#Y_train_3D = ...
#Y_valid_3D = ...
#Y_test_3D = ...
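
The idea is that each target sequence is the input sequence shifted 5 days into the future. With the notebook's helper this could presumably be obtained with `add_lags(temps, times=range(-24, 6))`. The sketch below shows the same alignment with plain NumPy on a toy series (an assumption, so the values are easy to check by eye).

```python
import numpy as np

series = np.arange(100, dtype=np.float32)  # toy series: value == day number
n_steps, horizon = 30, 5
n_instances = len(series) - n_steps - horizon + 1

# Inputs cover days t-29..t; targets cover days t-24..t+5
X_3D = np.stack([series[i:i + n_steps]
                 for i in range(n_instances)])[..., np.newaxis]
Y_3D = np.stack([series[i + horizon:i + horizon + n_steps]
                 for i in range(n_instances)])[..., np.newaxis]

print(X_3D.shape, Y_3D.shape)         # (66, 30, 1) (66, 30, 1)
print(X_3D[0, -1, 0], Y_3D[0, -1, 0])  # 29.0 34.0
```

The last time step of each target, `Y_3D[:, -1]`, is the t+5 value used in Exercise 1.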

## 2.2) Define an `mae_last_step()` function

For the final evaluation, we only want to look at the final time step (t+5). Create an `mae_last_step()` function that computes the MAE based on the final time step.
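
A minimal sketch of such a function, assuming the targets and predictions both have shape (batch size, time steps, 1):

```python
import tensorflow as tf

def mae_last_step(Y_true, Y_pred):
    # Keep only the final time step of every sequence, then average the errors
    return tf.reduce_mean(tf.abs(Y_pred[:, -1] - Y_true[:, -1]))

# Tiny hand-made example: 2 sequences of 2 time steps each
Y_true = tf.constant([[[1.0], [2.0]], [[3.0], [4.0]]])
Y_pred = tf.constant([[[9.0], [2.5]], [[9.0], [3.0]]])
print(float(mae_last_step(Y_true, Y_pred)))  # 0.75
```

The large errors on the first time step are ignored, so only the errors of 0.5 and 1.0 on the last step contribute.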

## 2.3) Build a Seq2Seq model

Build a Seq2Seq model and compile it, using the Huber Loss, and using the last step MAE as the metric. Use SGD with a learning rate of 0.01. Hint: the layers are the same as earlier, except that the last RNN layer also has `return_sequences=True`, so that the model outputs one prediction per time step.
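
In a Seq2Seq model every RNN layer returns its full output sequence, and the final `Dense` layer is applied independently at each time step. A possible sketch follows. It uses the built-in `keras.losses.Huber()` in place of the hand-written loss (an assumption) and leaves out the metric, which would be passed as `metrics=[mae_last_step]` once that function from 2.2 is defined.

```python
from tensorflow import keras

model2 = keras.models.Sequential([
    keras.layers.SimpleRNN(100, return_sequences=True, input_shape=[30, 1]),
    keras.layers.SimpleRNN(100, return_sequences=True),  # keep every time step
    keras.layers.Dense(1),  # applied to each time step separately
])
model2.compile(loss=keras.losses.Huber(),
               optimizer=keras.optimizers.SGD(learning_rate=0.01))

print(model2.output_shape)  # (None, 30, 1)
```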

## 2.4) Train the model

Fit the model as earlier (but with the 3D targets). Again, evaluate the model and plot the predictions.

# Exercise 3 – LSTM and GRU

## 3.1) Build, train and evaluate a Seq2Seq LSTM

Train the same model as earlier but using `LSTM` or `GRU` instead of `SimpleRNN`. You can also try reducing the learning rate when the validation loss reaches a plateau, using the `ReduceLROnPlateau` callback.
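
A sketch of what this could look like with `LSTM` layers. The `factor` and `patience` values are assumptions to be tuned, and the `fit()` call is left as a comment because it relies on the arrays built in the earlier exercises.

```python
from tensorflow import keras

model3 = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=[30, 1]),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.Dense(1),
])
model3.compile(loss=keras.losses.Huber(),
               optimizer=keras.optimizers.SGD(learning_rate=0.01))

# Halve the learning rate whenever val_loss has not improved for 10 epochs
lr_scheduler = keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=10)

# history = model3.fit(X_train_3D, Y_train_3D, epochs=200, batch_size=200,
#                      validation_data=(X_valid_3D, Y_valid_3D),
#                      callbacks=[lr_scheduler])
```

Swapping `keras.layers.LSTM` for `keras.layers.GRU` should require no other change.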

## 3.2) Add $\ell_2$ regularization

Add $\ell_2$ regularization to your RNN, using the layers' `kernel_regularizer` and `recurrent_regularizer` arguments, and the `l2()` function in `keras.regularizers`. Tip: use the `partial()` function in the `functools` module to avoid repeating the same arguments again and again.
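
The `partial()` tip could be applied as follows. The name `RegularizedLSTM` and the regularization factor `1e-4` are assumptions made for this sketch.

```python
from functools import partial
from tensorflow import keras

# A pre-configured LSTM "constructor": the shared arguments are written once
RegularizedLSTM = partial(keras.layers.LSTM, 100, return_sequences=True,
                          kernel_regularizer=keras.regularizers.l2(1e-4),
                          recurrent_regularizer=keras.regularizers.l2(1e-4))

model4 = keras.models.Sequential([
    RegularizedLSTM(input_shape=[30, 1]),  # extra keyword arguments still work
    RegularizedLSTM(),
    keras.layers.Dense(1),
])
model4.compile(loss=keras.losses.Huber(),
               optimizer=keras.optimizers.SGD(learning_rate=0.01))
```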

# Exercise 4 – Preprocessing with 1D-ConvNets

At the beginning of your sequential model, add a `Conv1D` layer with 32 kernels of size 5, followed by a `MaxPool1D` layer with pool size 5 and strides 2. Train and evaluate the model.
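
Note that these two layers shorten the sequence. The sketch below assumes the Keras default `padding="valid"`: the convolution turns 30 steps into 30 - 5 + 1 = 26, and the pooling layer then leaves floor((26 - 5) / 2) + 1 = 11 steps. The LSTM size is carried over from the earlier exercises as an assumption.

```python
from tensorflow import keras

model5 = keras.models.Sequential([
    keras.layers.Conv1D(32, kernel_size=5, input_shape=[30, 1]),
    keras.layers.MaxPool1D(pool_size=5, strides=2),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.Dense(1),
])

print(model5.output_shape)  # (None, 11, 1)
```

Since the output now has 11 time steps instead of 30, the 3D targets would have to be cropped or subsampled to match, or the last RNN layer could return only its final output.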

# Exercise 5 – Sequence classification

Let's load the IMDB movie reviews for binary sentiment analysis (positive review or negative review). We only want the 10,000 most common words:

In [None]:
num_words = 10000
(X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data(num_words=num_words)

Let's also get the word index (word to word id):

In [None]:
word_index = keras.datasets.imdb.get_word_index()

And let's create a reverse index (word id to word). The word ids in the dataset are offset by 3 relative to the word index, which leaves ids 0 to 3 free for special tokens:

In [None]:
reverse_index = {word_id + 3: word for word, word_id in word_index.items()}
reverse_index[0] = "<pad>" # padding
reverse_index[1] = "<sos>" # start of sequence
reverse_index[2] = "<oov>" # out-of-vocabulary
reverse_index[3] = "<unk>" # unknown

Let's write a little function to decode reviews:

In [None]:
def decode_review(word_ids):
    return " ".join([reverse_index.get(word_id, "<err>") for word_id in word_ids])

Let's look at a review:

In [None]:
decode_review(X_train[0])

It seems very positive. Let's look at the target (0=negative review, 1=positive review):

In [None]:
y_train[0]

And another review:

In [None]:
decode_review(X_train[1])

Very negative! Let's check the target:

In [None]:
y_train[1]

## 5.1) Train a baseline model

Train and evaluate a baseline model using Scikit-Learn. You will need to create a pipeline with a `CountVectorizer`, a `TfidfTransformer` and an `SGDClassifier`. The `CountVectorizer` transformer expects text as input, so let's create a text version of the training set and test set:

In [None]:
X_train_text = [decode_review(words_ids) for words_ids in X_train]
X_test_text = [decode_review(words_ids) for words_ids in X_test]

In [None]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDClassifier
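
The pipeline could be assembled along these lines. The step names, the `random_state` and the tiny made-up corpus are assumptions that keep the sketch self-contained. With the real data you would fit on `X_train_text` and `y_train` and score on `X_test_text` and `y_test`.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

baseline = Pipeline([
    ("vect", CountVectorizer()),    # text -> word counts
    ("tfidf", TfidfTransformer()),  # word counts -> TF-IDF weights
    ("clf", SGDClassifier(random_state=42)),
])

# Tiny made-up corpus, only to show the calls
toy_text = ["great movie loved it", "wonderful acting great plot",
            "terrible movie hated it", "awful acting boring plot"]
toy_labels = [1, 1, 0, 0]

baseline.fit(toy_text, toy_labels)
toy_preds = baseline.predict(toy_text)
print(baseline.score(toy_text, toy_labels))  # accuracy on the toy corpus
```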

## 5.2) Create a sequence classifier

Create a sequence classifier using Keras:
* Use `keras.preprocessing.sequence.pad_sequences()` to preprocess `X_train`: this will create a 2D array of 25,000 rows (one per review) and `maxlen=500` columns. Reviews longer than 500 words will be cropped, while reviews shorter than 500 words will be padded with zeros.
* The first layer in your model should be an `Embedding` layer, with `input_dim=num_words` and `output_dim=10`. The model will gradually learn to represent each of the 10,000 words as a 10-dimensional vector. So the next layer will receive 3D batches of shape (batch size, 500, 10).
* Add one or more LSTM layers with 32 neurons each.
* The output layer should be a Dense layer with a sigmoid activation function, since this is a binary classification problem.
* When compiling the model, you should use the `binary_crossentropy` loss.
* Fit the model for 10 epochs, using a batch size of 128 and `validation_split=0.2`.
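
The steps above can be sketched as follows, using two made-up "reviews" in place of `X_train`. The `adam` optimizer and the accuracy metric are assumptions, since the exercise does not specify them. Note that `pad_sequences()` pads and crops at the beginning of each sequence by default.

```python
import numpy as np
from tensorflow import keras

num_words, maxlen = 10000, 500

# Two made-up reviews (lists of word ids), standing in for X_train
toy_reviews = [[1, 14, 22, 16], [1, 4, 7]]
X_padded = keras.preprocessing.sequence.pad_sequences(toy_reviews, maxlen=maxlen)
print(X_padded.shape)  # (2, 500)

model = keras.models.Sequential([
    keras.layers.Embedding(input_dim=num_words, output_dim=10),
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="sigmoid"),  # probability of "positive"
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# With the real data:
# history = model.fit(X_train_padded, y_train, epochs=10, batch_size=128,
#                     validation_split=0.2)
probas = model.predict(X_padded, verbose=0)
print(probas.shape)  # (2, 1)
```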

# Exercise 6 – Bidirectional RNN

Update the previous sequence classification model to use a bidirectional LSTM. For this, you just need to wrap the LSTM layer in a `Bidirectional` layer. If the model overfits, try adding a dropout layer.
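
A possible version of the bidirectional model is sketched below. The dropout rate of 0.5, the `adam` optimizer and the dummy input are assumptions made for illustration.

```python
import numpy as np
from tensorflow import keras

bidi_model = keras.models.Sequential([
    keras.layers.Embedding(input_dim=10000, output_dim=10),
    # Runs one LSTM forward and one backward, then concatenates their outputs
    keras.layers.Bidirectional(keras.layers.LSTM(32)),
    keras.layers.Dropout(0.5),  # only active during training
    keras.layers.Dense(1, activation="sigmoid"),
])
bidi_model.compile(loss="binary_crossentropy", optimizer="adam",
                   metrics=["accuracy"])

# Dummy batch of 2 padded reviews with 20 word ids each
dummy_batch = np.zeros((2, 20), dtype="int32")
bidi_probas = bidi_model.predict(dummy_batch, verbose=0)
print(bidi_probas.shape)  # (2, 1)
```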