# Preparations

In [None]:
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
import os
import sys
import math
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, LSTM, GRU
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns

A utility module, `rnnutils.py`, is also available if you want to save time coding. Calls to its functions are commented out, so you can choose between using them and writing your own solution. Either way, make sure the file is located in the current directory so that it can be imported.

In [None]:
import rnnutils

# Lab session: predicting airline passengers

## Aims

In this lab you will try out different RNN models on the Box & Jenkins monthly airline passengers dataset. You will download the data and prepare it for the later analyses. To help you along the way, some of the steps have been prepared in advance, but in most cases your task is to complete the missing code. Don't hesitate to change parameter settings and experiment with the model architectures.

# Session 1: Vanilla RNN

## Download data

Start by downloading the data and loading it into a pandas dataframe:

In [None]:
!wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv --no-check-certificate

In [None]:
df = pd.read_csv('airline-passengers.csv')
df = df.rename(columns={'Month': 'time','Passengers': 'passengers'})
df['time'] = pd.to_datetime(df['time'], format='%Y-%m')
df['year'] = pd.DatetimeIndex(df['time']).year
df['month'] = pd.DatetimeIndex(df['time']).month
df.head()

Plot the data for overview:

In [None]:
plt.plot(df.time, df.passengers)

## Create training and test data


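Partition the data into training and test sets. The split is chronological: the first two thirds of the series are used for training and the remainder for testing. The values are also rescaled to the range 0 to 1.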

In [None]:
train_fraction = 2/3
# Reshape data for MinMaxScaler
data = np.array(df['passengers'].values.astype('float32')).reshape(-1, 1)
split = int(len(data) * train_fraction)
# Rescale the data
scaler = MinMaxScaler(feature_range=(0, 1))
data = scaler.fit_transform(data).flatten()
train = data[:split]
test = data[split:]
# The above code is available in rnnutils.make_train_test and called as follows:
# train, test, scaler = rnnutils.make_train_test(data)
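
Note that the cell above fits the scaler on the whole series, so the value range of the test period influences the scaling of the training data. If you would rather keep the test data fully unseen, one option is to fit the scaler on the training portion only. The following is a minimal sketch of that variant on a made-up series; the names `raw`, `split_demo`, `scaler_demo`, `train_demo` and `test_demo` are placeholders and not part of the lab code or of `rnnutils`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Stand-in series; in the notebook this would be the passengers column.
raw = np.arange(1, 13, dtype='float32').reshape(-1, 1)
split_demo = int(len(raw) * 2 / 3)

# Fit the scaler on the training portion only, then apply it to both parts.
scaler_demo = MinMaxScaler(feature_range=(0, 1))
train_demo = scaler_demo.fit_transform(raw[:split_demo]).flatten()
test_demo = scaler_demo.transform(raw[split_demo:]).flatten()
```

With this variant, test values that exceed the training maximum are scaled to values above 1, which is expected for a series with an upward trend.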

## Transform data to input - output pairs


Now that we have training and test sets, we need to convert them to input-output (X/Y) pairs. The general idea is to take a time slice (e.g. 12 consecutive data points) as the input vector and use the value that follows it as the known output.

In [None]:
time_steps = 12
# trainX, trainY, trainX_indices, trainY_indices = rnnutils.make_xy(train, time_steps)
# testX, testY, testX_indices, testY_indices = rnnutils.make_xy(test, time_steps) 
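
If you prefer to write this step yourself, the windowing idea can be sketched as follows. This is only one possible approach: it assumes overlapping windows that advance one step at a time, and `make_windows` is a hypothetical helper that is not claimed to match what `rnnutils.make_xy` does (it does not return index arrays, for instance).

```python
import numpy as np

def make_windows(series, time_steps):
    """Hypothetical helper: slice a 1-D series into (X, y) pairs.

    Each row of X holds `time_steps` consecutive values and the matching
    entry of y is the value that immediately follows that window.
    """
    series = np.asarray(series)
    n = len(series) - time_steps
    if n <= 0:
        raise ValueError("series must be longer than time_steps")
    X = np.array([series[i:i + time_steps] for i in range(n)])
    y = series[time_steps:]
    return X, y

# Small demonstration on a made-up series 0, 1, ..., 9.
demo = np.arange(10, dtype='float32')
X_demo, y_demo = make_windows(demo, time_steps=3)
```

A series of length 10 with windows of length 3 gives 7 pairs; the first input is `[0, 1, 2]` with target `3`.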

## Define the model


Complete the model below to include a SimpleRNN layer and a Dense output layer.

In [None]:
# model = Sequential()
# Add layers here
#
#
# model.compile(loss='mean_squared_error', optimizer='adam')
# model.summary()
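
When filling in the layers, keep in mind that Keras recurrent layers expect 3-D input of shape (samples, time steps, features). The sketch below shows how a 2-D array of windows could be given the extra feature axis for a univariate series. It assumes your X arrays are 2-D; if your own code or `rnnutils.make_xy` already returns 3-D arrays, this step is not needed. The array `X_2d` is filled with placeholder zeros.

```python
import numpy as np

# Placeholder input: 84 windows of 12 time steps each.
X_2d = np.zeros((84, 12), dtype='float32')      # (samples, time_steps)

# Add a trailing feature axis of size 1 for a univariate series.
X_3d = X_2d.reshape(X_2d.shape[0], X_2d.shape[1], 1)

# Shape of one sample, i.e. what the first recurrent layer sees per example.
input_shape = X_3d.shape[1:]
```

Here `input_shape` is `(12, 1)`: 12 time steps with one feature per step.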

Once you are happy with the configuration, fit the model and evaluate it.

In [None]:
# history = model.fit(trainX, trainY, ...)
# Ytrainpred = model.predict(trainX)
# Ytestpred = model.predict(testX)
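
One way to evaluate the predictions is the root mean squared error (RMSE) in the original units, i.e. number of passengers, which requires undoing the min-max scaling first. The sketch below assumes a fitted `MinMaxScaler` like the one created earlier; `rmse_original_units` and the toy values are hypothetical and only serve as illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

def rmse_original_units(scaler, y_true_scaled, y_pred_scaled):
    """Hypothetical helper: RMSE after mapping scaled values back."""
    # inverse_transform expects 2-D input with one column per feature.
    y_true = scaler.inverse_transform(np.asarray(y_true_scaled).reshape(-1, 1))
    y_pred = scaler.inverse_transform(np.asarray(y_pred_scaled).reshape(-1, 1))
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

# Toy check: a scaler fitted on the range 100 to 200.
toy_scaler = MinMaxScaler(feature_range=(0, 1)).fit(np.array([[100.0], [200.0]]))
toy_rmse = rmse_original_units(toy_scaler, [0.0, 1.0], [0.1, 0.9])
```

In the toy check, each prediction is off by 0.1 in scaled units, which corresponds to 10 in the original units, so `toy_rmse` is 10 up to floating point error.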

You can use the utility plotting functions in `rnnutils` to plot the training history and the predictions.

In [None]:
# pred_data = {'train': (Ytrainpred, train, trainY_indices),
#              'test': (Ytestpred, test, testY_indices)}
# rnnutils.plot_pred(...)
# rnnutils.plot_history(...)

# Session 2: LSTM (and optionally GRU) 

Building on session 1, analyse the data set using LSTM layers. Here is a tentative model setup to get you started.

In [None]:
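# model = Sequential()
# When stacking LSTM layers, every layer except the last must return sequences
# model.add(LSTM(..., return_sequences=True, input_shape=(..., ...)))
# model.add(LSTM(...))
# model.add(Dense(1))
# model.compile(loss='mean_squared_error', optimizer='adam')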