# Recurrent neural networks

(This example is based on a lecture from EE-559 – Deep learning by Francois Fleuret,
an [excellent resource](https://documents.epfl.ch/users/f/fl/fleuret/www/dlc/).)

Why do we need something more complicated than the simple RNN from the slides?

This notebook will demonstrate that a simple RNN does not learn this task as quickly as an LSTM.

A recurrent model maintains a recurrent state that is updated at each time
step. Given a sequence $x$ and an initial recurrent state $h_0$, the model
computes a sequence of recurrent states:
$$
h_t = \Phi(x_t, h_{t-1}), \quad \text{with } t = 1, 2, 3, \dots
$$
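
To make the recurrence concrete, here is a minimal NumPy sketch of one possible choice of $\Phi$: a tanh layer applied to the current input and the previous state. The names `phi`, `run_recurrence`, `W_x`, `W_h` and `b`, as well as the sizes, are assumptions made for illustration and are not taken from the lecture.

```python
import numpy as np


def phi(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)


def run_recurrence(xs, h0, W_x, W_h, b):
    """Apply phi along the sequence and return the states h_1, ..., h_T."""
    states = []
    h = h0
    for x_t in xs:
        h = phi(x_t, h, W_x, W_h, b)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
n_inputs, n_hidden, n_steps = 10, 4, 6  # sizes chosen only for illustration
W_x = rng.normal(scale=0.5, size=(n_hidden, n_inputs))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

xs = rng.normal(size=(n_steps, n_inputs))
states = run_recurrence(xs, np.zeros(n_hidden), W_x, W_h, b)
print(states.shape)  # one state of size n_hidden per time step
```

The same weights are reused at every time step; only the state `h` changes as the sequence is consumed.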

We will use recurrent neural networks to decide whether a sequence is "mirrored" or not.
Here a sequence counts as mirrored when its second half repeats its first half, as in the examples below.

| sequence | label |
|----------|-------|
| (1,2, 1,2) | 1 |
| (3,4, 5,6) | 0 |
| (7, 7)     | 1 |
| (6,4,2, 6,4,2) | 1 |

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

In [2]:
from keras.utils import to_categorical


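# define a function that can tell if a sequence x is mirrored
# or not, i.e. whether its second half repeats its first half
def is_mirrored(x):
    # compute the half length first: `-x.shape[0]//2` rounds the wrong
    # way for odd lengths and gives halves of different sizes
    half = x.shape[0] // 2
    return np.allclose(x[:half], x[x.shape[0] - half:])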


def generate_data(n_samples=100, max_length=10):
    """Generate sequences that are mirrored or not.
    
    It should return approximately `n_samples` samples
    with a roughly equal split between the two classes
    
    `max_length` sets the maximum length a HALF sequence can
    have. For each sequence a half length is picked at random
    between 1 and `max_length`. This means the total length of
    a sequence is at most 2*max_length.
    """
    pass

Using TensorFlow backend.
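
If you get stuck on `generate_data`, the following is one possible sketch, not the reference solution. It assumes that symbol 0 is reserved for padding, that real symbols are the integers 1 to 9, that duplicates are removed by collecting sequences in a `Counter` (which would explain getting fewer samples than requested), and that label column 1 means "mirrored". All of these choices, and the name `generate_data_sketch`, are assumptions; they were picked only to be consistent with the `(n, 20, 10)` input shape this notebook reports.

```python
from collections import Counter

import numpy as np


def generate_data_sketch(n_samples=100, max_length=10, n_symbols=10, seed=0):
    """Return one-hot inputs X, one-hot labels y and a Counter of sequences."""
    rng = np.random.default_rng(seed)
    counter = Counter()
    for _ in range(n_samples):
        half_len = rng.integers(1, max_length + 1)
        first = rng.integers(1, n_symbols, size=half_len)  # 0 is kept for padding
        if rng.random() < 0.5:
            second = first.copy()  # force a mirrored sequence
        else:
            second = rng.integers(1, n_symbols, size=half_len)
        seq = tuple(int(s) for s in np.concatenate([first, second]))
        counter[seq] += 1  # duplicates collapse into a single key

    sequences = list(counter)
    total_len = 2 * max_length
    X = np.zeros((len(sequences), total_len, n_symbols), dtype=np.float32)
    y = np.zeros((len(sequences), 2), dtype=np.float32)
    for i, seq in enumerate(sequences):
        padded = np.zeros(total_len, dtype=int)
        padded[:len(seq)] = seq  # pad at the end with symbol 0
        X[i, np.arange(total_len), padded] = 1.0
        half = len(seq) // 2
        label = int(seq[:half] == seq[half:])  # a random draw can be mirrored too
        y[i, label] = 1.0
    return X, y, counter


X_demo, y_demo, counter_demo = generate_data_sketch(1000)
print(X_demo.shape, y_demo.shape, len(counter_demo))
```

Computing the label from the finished sequence, rather than from the branch that produced it, keeps the labels correct even when a randomly drawn second half happens to equal the first.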


## Generate data

Generate a dataset with a lot of entries. It is a good idea to get everything
running with a small dataset, and then increase it. This way you don't spend
too much time waiting for errors that only occur after training.

Split your data into a training and testing set.

In [3]:
X, y, counter = generate_data(30000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

In [4]:
# check we get very roughly 30000 samples
# I got about 25000 samples when asking for 30k
len(counter)

25326

In [5]:
# what shape should the data have? Depends on max_length
# and how many different symbols there are
X_train.shape

(20260, 20, 10)

In [6]:
# X_train should be one-hot encoded
X_train[:2]

array([[[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]],

In [7]:
# check that the dataset is roughly balanced
Counter(y_train[:,0])

Counter({1.0: 11358, 0.0: 8902})
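
The counts above also give a useful reference point: a classifier that always predicts the more frequent class is right for about 56% of the training samples. A short sketch of that arithmetic, with the counts copied from the output above:

```python
from collections import Counter

# Class counts copied from the output of the cell above.
counts = Counter({1.0: 11358, 0.0: 8902})

total = sum(counts.values())
majority_class, majority_count = counts.most_common(1)[0]
baseline = majority_count / total
print(f"always predicting {majority_class} is right {baseline:.1%} of the time")
```

Any trained model should be compared against this number rather than against 50%.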

In [8]:
from keras.layers import SimpleRNN, Input
from keras.layers import LSTM
from keras.layers import Activation, Dense
from keras.models import Model

In [9]:
# make sure you understand what all this does
# feel free to experiment with some settings

def make_model(lstm=False):
    """Construct a simple recurrent network.
    
    Uses either a `SimpleRNN` or an `LSTM` depending
    on the value of `lstm`.
    """
    x = Input(shape=X_train.shape[1:])
    if lstm:
        # ... your code for a LSTM layer here ...
        # use the relu activation
    else:
        # ... your code for a SimpleRNN here ...
        # use the relu activation
    h = Dense(2)(h)
    out = Activation('softmax')(h)
    model = Model(inputs=x, outputs=out)
    model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [10]:
rnn = make_model()
lstm = make_model(lstm=True)

In [None]:
# Before training, predict roughly what accuracy each of the
# untrained networks should reach. Then score the untrained
# models to check your guess. Was your prediction correct?

In [11]:
rnn.predict(X_test[:5]), y_test[:5]

(array([[0.47061643, 0.5293836 ],
        [0.27798405, 0.722016  ],
        [0.37072337, 0.62927663],
        [0.5167797 , 0.4832203 ],
        [0.4999961 , 0.50000393]], dtype=float32), array([[1., 0.],
        [1., 0.],
        [1., 0.],
        [1., 0.],
        [1., 0.]], dtype=float32))

In [None]:
# Train both networks for 30 epochs. Check whether you should
# train them for more or fewer epochs.

In [13]:
rnn_history = rnn.fit(X_train, y_train, epochs=30,
                      validation_split=0.2, verbose=0)

In [14]:
lstm_history = lstm.fit(X_train, y_train, epochs=30,
                        validation_split=0.2, verbose=0)
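
To compare how quickly the two networks learn, it helps to plot the validation accuracy stored in the two history objects. The helper below is a sketch: the name `plot_histories` is an assumption, and because the key under which Keras stores accuracy depends on the version (`val_accuracy` or `val_acc`), both spellings are tried.

```python
import matplotlib.pyplot as plt


def plot_histories(histories, metric="val_accuracy"):
    """Plot one curve per model from Keras-style history dictionaries.

    `histories` maps a label to a dict such as `rnn_history.history`.
    """
    fig, ax = plt.subplots()
    for label, history in histories.items():
        # fall back to the older "acc" spelling if the key is missing
        key = metric if metric in history else metric.replace("accuracy", "acc")
        values = history[key]
        ax.plot(range(1, len(values) + 1), values, label=label)
    ax.set_xlabel("epoch")
    ax.set_ylabel(metric)
    ax.legend()
    return ax


# Usage with the objects from the cells above:
# plot_histories({"SimpleRNN": rnn_history.history, "LSTM": lstm_history.history})
```

If a curve is still rising at the last epoch, training for longer is worth trying.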

In [None]:
# Create a helper function that takes a Python iterable of
# sequences as input, applies each model's `predict()` method
# to it, and prints a human-friendly version of the result.
# Do the models work?
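
One way to approach this helper is sketched below. It assumes that sequences are padded at the end with symbol 0 up to `2 * max_length` steps and that output column 1 is the "mirrored" class; both must match whatever your `generate_data` does. The names `encode_sequence` and `describe_predictions` are made up for this sketch.

```python
import numpy as np


def encode_sequence(seq, max_length=10, n_symbols=10):
    """One-hot encode `seq`, padding with symbol 0 up to 2 * max_length steps."""
    padded = np.zeros(2 * max_length, dtype=int)
    padded[:len(seq)] = seq
    return np.eye(n_symbols, dtype=np.float32)[padded]


def describe_predictions(models, sequences, max_length=10, n_symbols=10):
    """Print and return each model's verdict for every sequence.

    `models` maps a name to any object with a Keras-style `predict()`.
    Assumes output column 1 is the "mirrored" class.
    """
    sequences = [list(seq) for seq in sequences]
    X = np.stack([encode_sequence(s, max_length, n_symbols) for s in sequences])
    results = {}
    for name, model in models.items():
        p_mirrored = np.asarray(model.predict(X))[:, 1]
        results[name] = p_mirrored
        for seq, p in zip(sequences, p_mirrored):
            verdict = "mirrored" if p > 0.5 else "not mirrored"
            print(f"{name}: {seq} -> {verdict} (p={p:.2f})")
    return results


# Usage with the trained networks:
# describe_predictions({"rnn": rnn, "lstm": lstm}, [(1, 2, 1, 2), (3, 4, 5, 6)])
```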

In [None]:
# Modify the model construction function so that it
# can use a GRU layer as well. How does the GRU
# compare to the LSTM and Simple RNN?
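
A possible shape for such a constructor is sketched below. It selects the recurrent layer class by name instead of using a boolean flag. The function name `make_recurrent_model`, the default `input_shape` and the choice of 32 units are assumptions for illustration, not the settings used to produce the outputs in this notebook.

```python
from keras.layers import GRU, LSTM, SimpleRNN
from keras.layers import Activation, Dense, Input
from keras.models import Model

# Map a short name to the Keras layer class that implements it.
RECURRENT_LAYERS = {"rnn": SimpleRNN, "lstm": LSTM, "gru": GRU}


def make_recurrent_model(kind="rnn", input_shape=(20, 10), units=32):
    """Build a small two-class classifier around one recurrent layer."""
    layer_cls = RECURRENT_LAYERS[kind]
    x = Input(shape=input_shape)
    # units=32 is an arbitrary choice; the layer returns only the last state
    h = layer_cls(units, activation="relu")(x)
    h = Dense(2)(h)
    out = Activation("softmax")(h)
    model = Model(inputs=x, outputs=out)
    model.compile("adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model


# Usage: gru = make_recurrent_model("gru", input_shape=X_train.shape[1:])
```

Keeping everything except the recurrent layer identical makes the comparison between the three architectures fair.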

In [None]:
# Experiment with longer sequences.
# When do things stop working?
# What if you increase or decrease the number of allowed symbols
# that can appear in a sequence? Right now they are just integers; how about characters?
# How does the accuracy on the test set behave
# as a function of sequence length?
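
For the last question, accuracy can be grouped by sequence length once predictions are available. The sketch below recovers the length of each sequence from the one-hot input by counting non-padding steps. It assumes symbol 0 is used only for padding, and the name `accuracy_by_length` is hypothetical.

```python
import numpy as np


def accuracy_by_length(X, y_true, y_pred, padding_symbol=0):
    """Return {sequence length: accuracy} for one-hot inputs and labels.

    Assumes X has shape (n, steps, symbols) and that `padding_symbol`
    never occurs inside a real sequence.
    """
    lengths = (X.argmax(axis=2) != padding_symbol).sum(axis=1)
    correct = y_true.argmax(axis=1) == y_pred.argmax(axis=1)
    return {int(n): float(correct[lengths == n].mean()) for n in np.unique(lengths)}


# Usage with a trained network:
# accuracy_by_length(X_test, y_test, rnn.predict(X_test))
```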

In [18]:
rnn.predict(X_test[:5])

array([[8.9403522e-01, 1.0596478e-01],
       [9.6727699e-01, 3.2722980e-02],
       [9.1677642e-01, 8.3223537e-02],
       [9.9965203e-01, 3.4801522e-04],
       [9.9946386e-01, 5.3611689e-04]], dtype=float32)

In [19]:
y_test[:5]

array([[1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.]], dtype=float32)

In [20]:
rnn.evaluate(X_test, y_test)



[0.23871791721025665, 0.8969601263324122]

## Bonus

Can you create a dataset of correctly spelt words and words with typos in them?

Can your RNNs learn to classify words as typos? What about words they've never seen?
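
As a starting point for the bonus, the sketch below builds a tiny labelled dataset by replacing one letter in each correctly spelt word. The function names and the four-word list are placeholders; a real experiment needs a proper dictionary and more kinds of typos, such as swapped or missing letters.

```python
import random
import string


def make_typo(word, rng):
    """Return `word` with one letter replaced by a different random letter."""
    position = rng.randrange(len(word))
    choices = [c for c in string.ascii_lowercase if c != word[position]]
    return word[:position] + rng.choice(choices) + word[position + 1:]


def make_typo_dataset(words, seed=0):
    """Pair each correct word (label 0) with one corrupted copy (label 1)."""
    rng = random.Random(seed)
    data = []
    for word in words:
        data.append((word, 0))
        data.append((make_typo(word, rng), 1))
    return data


# A tiny illustrative word list, not a real dictionary.
words = ["network", "gradient", "sequence", "memory"]
for text, label in make_typo_dataset(words):
    print(label, text)
```

Note that a single substitution can occasionally turn one valid word into another, so with a real dictionary the corrupted words should be checked against it before being labelled as typos.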