# Introduction to Artificial Neural Networks with Keras
This notebook takes a look at how Recurrent Neural Networks (RNNs) are structured and how they can
be built using Keras and TensorFlow.

## Index

[RNNs](#RNNs)

[Forecasting a Time Series](#Forecasting-a-Time-Series)

[Baseline Metrics](#Baseline-Metrics)

[Implementing a Simple RNN](#Implementing-a-Simple-RNN)

[Deep RNNs](#Deep-RNNs)

[Forecasting Several Time Steps Ahead](#Forecasting-Several-Time-Steps-Ahead)

[Long Short-Term Memory](#Long-Short-Term-Memory)



## RNNs
Recurrent Neural Networks (RNNs) are a class of nets that can be used to predict the future.
They can analyze stock prices and tell us when to buy or sell. In autonomous vehicle systems,
they can anticipate car trajectories and help avoid accidents.

More generally, they can work on sequences of arbitrary lengths, rather than on fixed-size inputs.
They take sentences, documents or audio samples as input, making them extremely useful for
natural language processing applications such as automatic translation or speech-to-text.

### Forecasting a Time Series
When data is in a sequence of one or more values per time step, the data is said to be a *time
series*.

Examples of time series include the number of active users per hour on a website, the daily
temperature, or a company's financial health measured quarterly using multiple metrics.

In the first two examples there is a single value per time step, so these are *univariate* time
series, while in the financial example there are multiple values per time step (e.g., the company's
revenue, debt, etc.), so it is a *multivariate* time series.
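The difference shows up directly in the shape of the data. The sketch below uses made-up numbers (the arrays `active_users`, `revenue` and `debt` are hypothetical and only serve as illustration) to show that a univariate series has one value per time step while a multivariate series has several.

```python
import numpy as np

# Hypothetical data; the numbers are made up for illustration only.
# Univariate: active users per hour over 6 hours, one value per time step.
active_users = np.array([120, 135, 160, 158, 170, 190], dtype=np.float32)
univariate = active_users[:, np.newaxis]           # shape [time steps, 1]

# Multivariate: quarterly revenue and debt over 4 quarters,
# two values per time step.
revenue = np.array([1.0, 1.2, 1.1, 1.4], dtype=np.float32)
debt = np.array([0.5, 0.4, 0.6, 0.3], dtype=np.float32)
multivariate = np.stack([revenue, debt], axis=-1)  # shape [time steps, 2]

print(univariate.shape)    # (6, 1)
print(multivariate.shape)  # (4, 2)
```

The last axis is the number of values per time step; a batch of such series simply adds a leading batch axis.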

A typical task is to predict future values, which is called *forecasting*. Another task is to
fill in the blanks: to predict (or rather "postdict") missing values from the past. This is
called *imputation*.


In [1]:
# Import modules
import tensorflow as tf
from tensorflow import keras
import numpy as np

The function below creates as many time series as requested (```batch_size```), each of length
```n_steps```, with just one value per time step (univariate).

It returns a NumPy array of shape [batch_size, time steps, 1], where each series is the
sum of two sine waves of fixed amplitudes but random frequencies and phases, plus noise.

In [2]:
# Generate time series
def generate_time_series(batch_size, n_steps):
    # Random frequencies and phase offsets, one set per series
    freq1, freq2, offset1, offset2 = np.random.rand(4, batch_size, 1)
    time = np.linspace(0, 1, n_steps)
    series = 0.5 * np.sin((time - offset1) * (freq1 * 10 + 10))   # wave 1
    series += 0.2 * np.sin((time - offset2) * (freq2 * 20 + 20))  # wave 2
    series += 0.1 * (np.random.rand(batch_size, n_steps) - 0.5)   # noise

    return series[..., np.newaxis].astype(np.float32)

In [3]:
# Create time series and train/validation/test split
n_steps = 50
series = generate_time_series(10000, n_steps + 1)
X_train, y_train = series[:7000, :n_steps], series[:7000, -1]
X_valid, y_valid = series[7000:9000, :n_steps], series[7000:9000, -1]
X_test, y_test = series[9000:, :n_steps], series[9000:, -1]

### Baseline Metrics
Before starting any ML project, it is a good idea to establish baseline metrics. The simplest
approach is to predict the last value in the series. This is called *naive forecasting* and is
sometimes surprisingly difficult to outperform.

Another approach is to use a fully connected network. Since it expects a flat list of features
for each input, we need to add a ```Flatten``` layer. A simple Linear Regression model can be
used, so that each prediction is a linear combination of the values in the time series.

In [4]:
# MSE
y_pred = X_valid[:, -1]
np.mean(keras.losses.mean_squared_error(y_valid, y_pred))

0.020646073

In [5]:
# Fit model
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[50, 1]),
    keras.layers.Dense(1)
])

model.compile(loss='mean_squared_error', optimizer='Adam')

model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))

Train on 7000 samples, validate on 2000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x1e2621549c8>

In [6]:
# Evaluate model
model.evaluate(X_valid, y_valid)



0.0037212533466517927

### Implementing a Simple RNN
A simple RNN can be built using the Sequential API. It contains a single layer and a single
neuron. There is no need to specify the length of the input sequences, since a recurrent neural
network can process any number of time steps.

By default, the ```SimpleRNN``` layer uses the hyperbolic tangent activation function.
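As a rough sketch of what a single recurrent neuron computes, the code below assumes the usual update rule, where the new state is the tanh of a weighted sum of the current input, the previous state and a bias. The function name `simple_rnn_neuron` and the hand-picked weights are hypothetical; this is not Keras's implementation.

```python
import numpy as np

def simple_rnn_neuron(x, w_x, w_h, b):
    """Run one recurrent neuron over a univariate sequence.

    x: array of shape [n_steps]; returns the final hidden state (a scalar).
    """
    h = 0.0  # initial hidden state
    for x_t in x:
        # The new state mixes the current input with the previous state.
        h = np.tanh(w_x * x_t + w_h * h + b)
    return h

# Hypothetical weights chosen by hand, not trained.
w_x, w_h, b = 0.8, 0.5, 0.1

short_seq = np.array([0.1, 0.2, 0.3])
long_seq = np.sin(np.linspace(0, 3, 50))

# The same three parameters handle sequences of any length.
print(simple_rnn_neuron(short_seq, w_x, w_h, b))
print(simple_rnn_neuron(long_seq, w_x, w_h, b))
```

Because the loop reuses the same weights at every step, nothing in the neuron depends on the sequence length, which is why the input shape can leave the time dimension as ```None```.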

In [7]:
# Fit model
model = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=[None, 1])
])

model.compile(loss="mean_squared_error", optimizer="Adam")

model.fit(X_train, y_train, epochs=20)

Train on 7000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x1e255e26388>

In [8]:
# Evaluate model
model.evaluate(X_valid, y_valid)



0.01166666191443801

This simple RNN achieves a worse score than the linear model. Note that the linear model has 51
parameters (one parameter per input plus a bias term), whereas the RNN uses just three parameters
(one parameter per input, one per hidden state dimension and the bias term).
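These counts follow from simple arithmetic. A small sketch, with hypothetical helper names `dense_params` and `simple_rnn_params`:

```python
def dense_params(n_inputs, units):
    # One weight per input per unit, plus one bias per unit.
    return n_inputs * units + units

def simple_rnn_params(n_inputs, units):
    # Input weights + recurrent (state-to-state) weights + biases.
    return n_inputs * units + units * units + units

print(dense_params(50, 1))       # 51  (linear model on 50 time steps)
print(simple_rnn_params(1, 1))   # 3   (single recurrent neuron)
print(simple_rnn_params(1, 20))  # 440 (a 20-unit recurrent layer)
```

The recurrent weight matrix grows with the square of the number of units, so wider recurrent layers gain capacity quickly.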

Deeper RNNs are needed to improve the performance.

### Deep RNNs
Implementing a deep RNN with ```tf.keras``` is simple: stack the recurrent layers.


Make sure to set ```return_sequences=True``` for all recurrent layers (except the last one, if
you only care about the last output). If not, a layer will output a 2D array (containing only the
output of the last time step) instead of a 3D array (containing the outputs of all the time
steps), and the next recurrent layer will not receive the sequence input it expects.
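The shape difference can be illustrated with a NumPy stand-in for a recurrent layer. This is only a sketch: `simple_rnn_layer` is a hypothetical function with random, untrained weights, and it mimics the shapes rather than reproducing the Keras implementation.

```python
import numpy as np

def simple_rnn_layer(x, W_x, W_h, b, return_sequences=False):
    """x: [batch, n_steps, n_inputs] -> [batch, n_steps, units] or [batch, units]."""
    batch, n_steps, _ = x.shape
    units = W_h.shape[0]
    h = np.zeros((batch, units))
    outputs = []
    for t in range(n_steps):
        h = np.tanh(x[:, t] @ W_x + h @ W_h + b)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # 3D: outputs of all time steps
    return h                              # 2D: output of the last time step

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 50, 1))
W_x = rng.normal(size=(1, 20))
W_h = rng.normal(size=(20, 20))
b = np.zeros(20)

all_steps = simple_rnn_layer(x, W_x, W_h, b, return_sequences=True)
last_step = simple_rnn_layer(x, W_x, W_h, b, return_sequences=False)
print(all_steps.shape)  # (4, 50, 20)
print(last_step.shape)  # (4, 20)
```

Only the 3D output still has a time axis, so only it can be fed to another recurrent layer.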

The model below reaches an MSE of about 0.0019 on the validation set. This beats the linear model.

In [9]:
# Fit model
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.SimpleRNN(1)
])

model.compile(loss="mean_squared_error", optimizer="Adam")

model.fit(X_train, y_train, epochs=20)

Train on 7000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x1e2fd9d60c8>

In [10]:
# Evaluate model
model.evaluate(X_valid, y_valid)



0.001915568302385509

The last layer is not ideal. It must have a single unit because we want to forecast a univariate
time series, which means we must have a single output value per time step. However, having a
single unit means that the hidden state is just a single number. In addition, the
```SimpleRNN``` layer uses a tanh activation function by default, so the predicted values must
lie between -1 and 1.

For these reasons it might be better to replace the output layer with a ```Dense``` layer. It
would run slightly faster, the accuracy would be roughly the same, and it would allow us to
choose any output activation function.
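The range restriction is easy to see numerically. In this small sketch the pre-activation values in `z` are hypothetical; the point is only that tanh squashes everything into the open interval (-1, 1), while a unit with no activation passes values through unchanged.

```python
import numpy as np

# Hypothetical pre-activation values, some far outside [-1, 1].
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

tanh_out = np.tanh(z)  # what a tanh output unit would emit
linear_out = z         # what a unit with no activation would emit

print(tanh_out)
print(linear_out)
```

For the series used here the restriction is harmless, since the sum of the two sine waves plus noise stays within that range, but it would matter for targets on a larger scale.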

In [13]:
# Fit model
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20),
    keras.layers.Dense(1)
])

model.compile(loss="mean_squared_error", optimizer="Adam")

model.fit(X_train, y_train, epochs=20)

Train on 7000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x1e307eb7648>

In [14]:
# Evaluate model
model.evaluate(X_valid, y_valid)



0.0019033951871097088

### Forecasting Several Time Steps Ahead
An RNN can be trained to predict all 10 next values at once. A sequence-to-vector model will
still be used, but it will output 10 values instead of 1. First, however, the targets need to be
changed to vectors containing the next 10 values.

In [16]:
# Create time series with 10 extra steps to use as targets
series = generate_time_series(10000, n_steps + 10)
X_train, y_train = series[:7000, :n_steps], series[:7000, -10:, 0]
X_valid, y_valid = series[7000:9000, :n_steps], series[7000:9000, -10:, 0]
X_test, y_test = series[9000:, :n_steps], series[9000:, -10:, 0]

In [17]:
X_new, y_new = series[:, :n_steps], series[:, n_steps:]

In [23]:
# Fit model
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20),
    keras.layers.Dense(10)
])

model.compile(loss="mean_squared_error", optimizer="Adam")

model.fit(X_train, y_train)

Train on 7000 samples


<tensorflow.python.keras.callbacks.History at 0x1e30b24c908>

In [25]:
# Predict
y_pred = model.predict(X_new)
y_pred

array([[-2.6105353e-01, -1.9034575e-01, -4.4157591e-01, ...,
        -1.6589804e-01,  1.2603959e-01,  1.8965770e-02],
       [-4.6449639e-03,  2.6571440e-02,  1.4873151e-01, ...,
         3.4333071e-01,  3.2440677e-01,  3.3738253e-01],
       [ 8.2862608e-02,  2.4327156e-01,  3.0574772e-01, ...,
         3.0127183e-01,  3.3435893e-01,  2.6876193e-01],
       ...,
       [-3.4906992e-01, -1.3679799e-01, -3.7060240e-01, ...,
        -3.0094889e-01,  5.2491896e-02, -2.0654613e-01],
       [ 1.6430587e-01,  1.3133971e-01,  2.6064666e-04, ...,
        -4.1307354e-01, -4.8359442e-01, -3.7437671e-01],
       [ 7.0051378e-01,  4.5148227e-01,  3.6372069e-01, ...,
        -3.5080183e-01, -3.3569303e-01, -4.8144358e-01]], dtype=float32)

This model works well. It can be improved, however: instead of training the model to forecast the
next 10 values only at the very last time step, it can be trained to forecast the next 10 values
at each time step.

We can turn this sequence-to-vector RNN into a sequence-to-sequence RNN. The advantage here is
that the loss will contain a term for the output of the RNN at each and every time step, not just
 the output of the last time step. Many more error gradients will flow through the model, not
 only flowing through time but also from the output of each time step. This will stabilize and
 speed up training.
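The target layout this requires can be sketched on a toy series. The sizes below (`n_steps = 4` and a horizon of 2) are hypothetical and chosen only so the result is easy to read; the series values equal their time index.

```python
import numpy as np

n_steps, horizon = 4, 2  # toy sizes, for illustration only

# One toy series whose value equals its time index: 0, 1, 2, 3, 4, 5.
series = np.arange(n_steps + horizon, dtype=np.float32).reshape(1, -1, 1)

X = series[:, :n_steps]  # inputs: time steps 0..3
Y = np.empty((1, n_steps, horizon), dtype=np.float32)
for step_ahead in range(1, horizon + 1):
    # Column k holds the value (k + 1) steps after each input time step.
    Y[:, :, step_ahead - 1] = series[:, step_ahead:step_ahead + n_steps, 0]

print(X[0, :, 0])  # [0. 1. 2. 3.]
print(Y[0])
# [[1. 2.]
#  [2. 3.]
#  [3. 4.]
#  [4. 5.]]
```

Each row of the target is the forecast expected at that input time step, so the targets overlap heavily with the inputs and with each other.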

In [27]:
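# Build sequence-to-sequence targets: at each time step, the next 10 values
y = np.empty((10000, n_steps, 10))
for step_ahead in range(1, 10 + 1):
    y[:, :, step_ahead - 1] = series[:, step_ahead:step_ahead + n_steps, 0]

# Split once the targets are fully built
y_train = y[:7000]
y_valid = y[7000:9000]
y_test = y[9000:]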

The ```TimeDistributed``` layer can be used to wrap a ```Dense``` layer and apply it to every
time step of its input sequence.
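Conceptually, applying one dense layer at every time step is the same as treating each time step as a separate instance. The NumPy sketch below checks this equivalence with random, untrained weights; the variable names are hypothetical and it illustrates the idea only, not how Keras implements the wrapper.

```python
import numpy as np

rng = np.random.default_rng(42)
batch, n_steps, features, units = 3, 5, 20, 10

x = rng.normal(size=(batch, n_steps, features))
W = rng.normal(size=(features, units))
b = rng.normal(size=units)

# Option 1: loop over time steps and apply the same weights at each one.
per_step = np.stack([x[:, t] @ W + b for t in range(n_steps)], axis=1)

# Option 2: treat every time step as its own instance, then restore the shape.
flat = x.reshape(-1, features) @ W + b
reshaped = flat.reshape(batch, n_steps, units)

print(per_step.shape)                   # (3, 5, 10)
print(np.allclose(per_step, reshaped))  # True
```

Either way, the weights are shared across time steps, so the number of parameters does not depend on the sequence length.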

In [29]:
# Create model
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])

In [30]:
# Create evaluation metric
def last_time_step_mse(y_true, y_pred):
    return keras.metrics.mean_squared_error(y_true[:, -1], y_pred[:, -1])

In [33]:
# Fit model
optimizer = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss="mse", optimizer=optimizer, metrics=[last_time_step_mse])

model.fit(X_train, y_train)

Train on 7000 samples


<tensorflow.python.keras.callbacks.History at 0x1e30e3f8988>

In [34]:
# Evaluate model
model.evaluate(X_valid, y_valid)



[0.0332684488594532, 0.019032883]

### Long Short-Term Memory
Due to the transformations that the data goes through when traversing an RNN, some information
is lost at each time step. After a while, the RNN's state contains virtually no trace of the first
inputs. *Long Short-Term Memory* (LSTM) cells address this problem: they are able to detect
long-term dependencies in the data.

The ```LSTM``` layer uses an optimized implementation when running on a GPU, which makes it
efficient to train.
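For intuition, here is a minimal NumPy sketch of one time step of a standard LSTM cell. The function names `sigmoid` and `lstm_step`, the weight layout and the random weights are all assumptions made for illustration; the Keras layer's internal implementation may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x_t: [batch, n_inputs]; h_prev, c_prev: [batch, units]
    W: [n_inputs, 4 * units], U: [units, 4 * units], b: [4 * units]
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate values
    c = f * c_prev + i * g  # long-term state: keep some old, add some new
    h = o * np.tanh(c)      # short-term state, also the output at this step
    return h, c

# Hypothetical sizes and random, untrained weights.
rng = np.random.default_rng(0)
batch, n_steps, n_inputs, units = 2, 50, 1, 20
W = rng.normal(scale=0.1, size=(n_inputs, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

x = rng.normal(size=(batch, n_steps, n_inputs))
h = np.zeros((batch, units))
c = np.zeros((batch, units))
for t in range(n_steps):
    h, c = lstm_step(x[:, t], h, c, W, U, b)

print(h.shape, c.shape)  # (2, 20) (2, 20)
```

The long-term state `c` is only scaled by the forget gate and added to, rather than being pushed through a full transformation at every step, which is what lets information from early inputs survive longer.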


In [37]:
# LSTM
model = keras.models.Sequential([
    keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.LSTM(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])

model.compile(loss="mean_squared_error", optimizer="Adam")

model.fit(X_train, y_train, epochs=20)

Train on 7000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x1e314f96f88>

In [38]:
# Evaluate model
model.evaluate(X_valid, y_valid)



0.017572412215173243