# Intro to Recurrent Neural Networks
### Starter Code
* **PyData Bristol - 5th Meetup:** https://www.meetup.com/PyData-Bristol/events/255667468/
* **Event URL:** https://www.eventbrite.co.uk/e/intro-to-recurrent-neural-networks-tickets-52401888459
* **Date:** Tue 13th November 2018
* **Instructor:** John Sandall
* **Contact:** john@coefficient.ai / @john_sandall

---

In [1]:
%%capture
!pip install seaborn numpy pandas matplotlib scikit-learn

In [2]:
# Imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pathlib import Path
import seaborn as sns
from sklearn import datasets, ensemble, linear_model, model_selection, neighbors, metrics, preprocessing, neural_network
import warnings

%matplotlib inline
warnings.filterwarnings('ignore')
np.random.seed(0)

## Lab: Build A Recurrent Neural Network

Let's build a basic RNN using just numpy. We won't train it for now; instead we'll just get a feel for how it works. Our input data has 20 samples, each with two features, observed at two time points (t=0 and t=1).

In [3]:
n_features = 2
n_samples = 20

In [4]:
# Create our input data. Here's X at t=0
X0 = np.random.randint(low=-10, high=10, size=(n_samples, n_features))
X0

array([[  2,   5],
       [-10,  -7],
       [ -7,  -3],
       [ -1,   9],
       [  8,  -6],
       [ -4,   2],
       [ -9,  -4],
       [ -3,   4],
       [  7,  -5],
       [  3,  -2],
       [ -1,   9],
       [  6,   9],
       [ -5,   5],
       [  5, -10],
       [  8,  -7],
       [  7,   9],
       [  9,   9],
       [  4,  -3],
       [-10,  -9],
       [ -1, -10]])

In [5]:
# Similarly here's X at t=1
X1 = np.random.randint(low=-10, high=10, size=(n_samples, n_features))

Let's also create the weight matrices `Wx` (connecting X to neurons) and `Wy` (connecting output y at t-1 to neurons at time t).

In [6]:
n_neurons = 3

# Connects 2-features to 3-neurons
Wx = np.random.randint(low=-5, high=5, size=(n_features, n_neurons))
Wx

array([[-1, -2,  2],
       [ 0,  0, -5]])

In [7]:
# Connects 3-neuron output at time t-1 to 3-neurons at time t (the recurrent weights)
Wy = np.random.randint(low=-5, high=5, size=(n_neurons, n_neurons))
Wy

array([[-4,  0,  4],
       [-2, -5,  0],
       [-5, -4, -3]])

In [8]:
# We'll also need the bias
b = np.ones(n_neurons)
b

array([1., 1., 1.])

> #### Exercise: Calculate Y0!
> 
> **Tips**:
> - Remember `Y0 = activation(X0*Wx + b)` and `Y1 = activation(X1*Wx + Y0*Wy + b)`
> - You'll need `np.matmul()` to multiply two matrices.
> - You'll need `np.heaviside(some_vector, 0)` for your activation function.
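
The same recurrence extends to any number of time steps. Here's a minimal sketch, assuming a ReLU activation (the same idea as `m * np.heaviside(m, 0)`) and a zero "previous output" at t=0. The helper name `rnn_forward` and the toy weights are made up for illustration; they aren't the lab's `Wx` and `Wy`.

```python
import numpy as np

def relu(m):
    # Keep positive values, zero out everything else
    return np.maximum(m, 0)

def rnn_forward(inputs, Wx, Wy, b):
    """Run a basic RNN over a list of (n_samples, n_features) arrays, one per time step."""
    outputs = []
    # There is no previous output at t=0, so start from zeros
    Y_prev = np.zeros((inputs[0].shape[0], Wy.shape[0]))
    for X_t in inputs:
        Y_prev = relu(X_t @ Wx + Y_prev @ Wy + b)
        outputs.append(Y_prev)
    return outputs

# Toy values (illustrative only): 1 sample, 2 features, 2 neurons
Wx_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
Wy_toy = np.array([[0.5, 0.0], [0.0, -1.0]])
b_toy = np.zeros(2)
steps = [np.array([[2.0, 3.0]]), np.array([[1.0, 1.0]])]

Y0_toy, Y1_toy = rnn_forward(steps, Wx_toy, Wy_toy, b_toy)
print(Y0_toy)  # [[2. 3.]]
print(Y1_toy)  # [[2. 0.]]
```

Note how the second neuron is switched off at t=1: its negative recurrent weight feeds the earlier output of 3 back in as -3.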

In [9]:
def activation(m):
    return m*np.heaviside(m, 0)

In [10]:
Y0 = activation(np.matmul(X0,Wx) + b)
Y0

array([[-0., -0., -0.],
       [11., 21., 16.],
       [ 8., 15.,  2.],
       [ 2.,  3., -0.],
       [-0., -0., 47.],
       [ 5.,  9., -0.],
       [10., 19.,  3.],
       [ 4.,  7., -0.],
       [-0., -0., 40.],
       [-0., -0., 17.],
       [ 2.,  3., -0.],
       [-0., -0., -0.],
       [ 6., 11., -0.],
       [-0., -0., 61.],
       [-0., -0., 52.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., 24.],
       [11., 21., 26.],
       [ 2.,  3., 49.]])

In [11]:
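# Note: this cell reuses X0 as the input at t=1, which is what produced the output below.
# With the t=1 data, the first term would be np.matmul(X1, Wx).
Y1 = activation(np.matmul(X0,Wx) + np.matmul(Y0,Wy) + b)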
Y1

array([[-0., -0., -0.],
       [-0., -0., 12.],
       [-0., -0., 28.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0.,  3.],
       [-0., -0., 34.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.],
       [-0., -0., -0.]])

In [12]:
np.argmax(Y1,axis=1)

array([0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

## Lab: Build A Recurrent Neural Network using Keras

Let's work through a simple example now using Keras.

In [13]:
%%capture
!pip install keras tensorflow

In [14]:
# Imports
from keras.layers import SimpleRNN, Dense, TimeDistributed
from keras.models import Sequential

Using TensorFlow backend.


In [15]:
# Check if Keras is using GPU version of TensorFlow
from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9936168441113400842
]


Let's now look at 5 time steps, with:
- input X has 20 samples and two features
- output y is of length 3 (we have three neurons).
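
To see where those shapes come from, here's a rough numpy sketch of a forward pass over 5 time steps. It assumes a `tanh` activation (the Keras `SimpleRNN` default) and a zero initial state. The weights are random and untrained, and the names `W_in`, `W_rec` and `X_demo` are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_steps, n_features, n_units = 20, 5, 2, 3

X_demo = rng.integers(-10, 10, size=(n_samples, n_steps, n_features))
W_in = rng.normal(size=(n_features, n_units))  # input weights (random, untrained)
W_rec = rng.normal(size=(n_units, n_units))    # recurrent weights
bias = np.zeros(n_units)

h = np.zeros((n_samples, n_units))             # initial state
outputs = []
for t in range(n_steps):
    # Each step mixes the current input with the previous step's output
    h = np.tanh(X_demo[:, t, :] @ W_in + h @ W_rec + bias)
    outputs.append(h)

Y_demo = np.stack(outputs, axis=1)             # one output vector per time step
print(Y_demo.shape)  # (20, 5, 3)
```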

In [16]:
# Input format shape for Keras is (sample size, number of time steps, features)
n_steps = 5

X = np.random.randint(low=-10, high=10, size=(n_samples, n_steps, n_features))
X.shape

(20, 5, 2)

In [17]:
y = np.random.randint(low=-10, high=10, size=(n_samples, n_steps, n_neurons))
y.shape

(20, 5, 3)

> #### Exercise: Define a simple `Sequential` RNN model using Keras
> - The model should contain one layer (`SimpleRNN` with 3 units and `return_sequences=True`)
> - Assign it to a variable called `model`
> - Use the Keras documentation if you get stuck!
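
As a sanity check on what this layer has to learn, we can count its weights by hand. This is a small sketch, assuming the layer holds one input weight matrix, one recurrent weight matrix and one bias per unit; the helper name `simple_rnn_param_count` is made up here. Once the model is built, `model.summary()` should report a comparable figure.

```python
def simple_rnn_param_count(n_features, n_units):
    # input weights + recurrent weights + one bias per unit
    return n_features * n_units + n_units * n_units + n_units

print(simple_rnn_param_count(n_features=2, n_units=3))  # 18
```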

In [25]:
# Define your model here...
model = Sequential()
model.add(SimpleRNN(3, return_sequences=True))

> #### Exercise: Compile & fit the model
> - Use MSE loss and `rmsprop` optimizer.
> - Fit it to X and y, using 10 epochs and batch size of 32.
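
For reference, MSE is just the average squared difference between targets and predictions. Below is a minimal numpy sketch; the `mse` helper and the toy values are illustrative and are not Keras internals. Bear in mind that our random `y` ranges from -10 to 9 while a `tanh` output stays within [-1, 1], so don't expect the loss to get close to zero here.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared element-wise differences
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

y_true_toy = [[1.0, 0.0], [0.0, 2.0]]
y_pred_toy = [[0.0, 0.0], [0.0, 0.0]]
print(mse(y_true_toy, y_pred_toy))  # 1.25
```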

In [29]:
model.compile(optimizer='rmsprop', loss='MSE')
model.fit(X, y, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x11ef20090>

Let's try it out! We'll generate some new data `X_new` in the same shape as X.

In [30]:
# We'll have one sample, so we want it to have shape (1, 5, 2)
X.shape

(20, 5, 2)

In [31]:
# This has shape (1, 5, 2)
X_new = np.array([
    [[1, 0],  # t = 0 (two features)
     [0, 1],  # t = 1
     [0, 1],  # t = 2
     [0, 1],  # t = 3
     [0, 1],  # t = 4
    ]
])
X_new.shape

(1, 5, 2)

In [32]:
# Our RNN is able to predict some outcomes y of length 3, for each time step.
model.predict(X_new)

array([[[ 0.65920615,  0.23279989,  0.794165  ],
        [-0.06249426,  0.76769173,  0.5828201 ],
        [-0.7510873 ,  0.6860359 ,  0.5734208 ],
        [-0.8786577 ,  0.3345276 ,  0.38988632],
        [-0.8524856 ,  0.24765082,  0.03467968]]], dtype=float32)

> #### Exercise: Predict single value outputs for y (instead of vectors of length 3)
> - Within your `Sequential` model, add a fully connected `Dense()` layer after the `SimpleRNN`, with a single output unit (`units=1`)
> - Compile as before
> - Fit to the new y provided
> - Predict for `X_new` again, confirming that your outputs are a single time series of 5 numbers.
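
Why does a `Dense` layer with one unit give one number per time step? Applied to a 3D input, it acts on the last axis independently at every time step. Here's a numpy sketch of that idea, with made-up weights `W_dense` and `b_dense` and a stand-in array in place of the real RNN output:

```python
import numpy as np

rnn_out = np.arange(15, dtype=float).reshape(1, 5, 3)  # stand-in for SimpleRNN output
W_dense = np.array([[1.0], [0.0], [-1.0]])             # (3 inputs, 1 unit), made-up weights
b_dense = np.array([0.5])

# The same weights are applied to the 3-vector at every time step
dense_out = rnn_out @ W_dense + b_dense
print(dense_out.shape)  # (1, 5, 1)
```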

In [33]:
# We want a newly shaped y to predict, containing 20 samples over 5 time steps, but otherwise scalar output.
y = np.random.randint(low=-10, high=10, size=(n_samples, n_steps, 1))
y.shape

(20, 5, 1)

In [53]:
# Define your model here...
model = Sequential()
model.add(SimpleRNN(3, return_sequences=True))
model.add(Dense(units=1))  # one output per time step (`output_dim` is the old Keras 1 name for `units`)
model.compile(optimizer='rmsprop', loss='MSE')
model.fit(X, y, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x120b7b910>

In [54]:
model.predict(X_new)

array([[[-0.5718623 ],
        [ 0.9151196 ],
        [-0.45066446],
        [-0.12483628],
        [-0.3880046 ]]], dtype=float32)

> #### Exercise: Train a more fully fledged RNN on a simple sequence pattern.
> - We'll construct an X input with `1` at t=0 and `0` otherwise.
> - Our `y` output just has a simple pattern.
> - The RNN should be able to learn the relationship between the X pattern, and the corresponding y pattern.
> - Re-use your code from before, i.e. a Sequential model containing a SimpleRNN (this time with 50 units), plus a Dense layer with 1 unit and `sigmoid` activation.
> - Compile as before, and fit to `x_train` and `y_train` using 10 epochs.
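
The `sigmoid` activation suits this exercise because our targets (1, 0.8, 0.6, 0) all lie between 0 and 1. A quick sketch of the function itself, in plain Python for illustration:

```python
import math

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3))
```

Because the output never quite reaches 0 or 1, predictions for those targets will be close to them rather than exact.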

In [55]:
# These are our sequences. The RNN should learn to predict the
# 0.8 and 0.6 correctly because it can remember the 1 in the inputs.
x_seed = [1, 0, 0, 0, 0, 0]
y_seed = [1, 0.8, 0.6, 0, 0, 0]

In [62]:
# Let's create 1000 identical samples.
n_samples = 1000

x_train = np.array([[x_seed] * n_samples]).reshape(n_samples, len(x_seed), 1)
print(x_train.shape)
y_train = np.array([[y_seed] * n_samples]).reshape(n_samples, len(y_seed), 1)
print(y_train.shape)

(1000, 6, 1)
(1000, 6, 1)


In [63]:
# Define your model here...
model = Sequential()
model.add(SimpleRNN(50, return_sequences=True))
model.add(Dense(units=1, activation='sigmoid'))

In [64]:
# Compile...
model.compile(optimizer='rmsprop', loss='MSE')

In [65]:
# Fit...
model.fit(x_train, y_train, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x12241d550>

In [66]:
# Let's predict for this x_new
x_new = np.array([[[1],[0],[0],[0],[0],[0]]])
x_new

array([[[1],
        [0],
        [0],
        [0],
        [0],
        [0]]])

In [67]:
model.predict(x_new)

array([[[0.97347236],
        [0.80541945],
        [0.63428426],
        [0.00627161],
        [0.00542594],
        [0.00383898]]], dtype=float32)

## Lab: LSTMs and GRUs

In [68]:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM, GRU

> #### Exercise: Try using the LSTM and GRU units from Keras on the previous example. Does it appear to perform any better?
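
Before comparing results, it helps to know how much bigger these units are. Here's a rough sketch, assuming the textbook formulations where a GRU has 3 sets of weights and an LSTM has 4, each the size of a simple RNN's. The helper `recurrent_param_count` is made up for this illustration. Some GRU variants (such as Keras's `reset_after=True` option) carry an extra recurrent bias and so have slightly more.

```python
def recurrent_param_count(n_features, n_units, n_gates):
    # Each gate has input weights, recurrent weights and a bias
    per_gate = n_features * n_units + n_units * n_units + n_units
    return n_gates * per_gate

for name, gates in [("SimpleRNN", 1), ("GRU", 3), ("LSTM", 4)]:
    print(name, recurrent_param_count(n_features=1, n_units=50, n_gates=gates))
# SimpleRNN 2600
# GRU 7800
# LSTM 10400
```

On a pattern as short as ours (6 time steps), that extra capacity may not buy much.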

In [71]:
# Define your model here...
model = Sequential()
model.add(LSTM(50, return_sequences=True))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='MSE')
model.fit(x_train, y_train, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x122959650>

In [72]:
# Define your model here...
model = Sequential()
model.add(GRU(50, return_sequences=True))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='MSE')
model.fit(x_train, y_train, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x124607150>

> #### Exercise: Try adding some additional components from the example provided [on the Keras docs here](https://keras.io/getting-started/sequential-model-guide/), such as Dropout. How does this improve things?
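
As a reminder of what Dropout does during training, here's a minimal numpy sketch of "inverted" dropout: randomly zero a fraction of the activations and scale up the survivors so the expected value is unchanged. The `dropout` helper is illustrative only and is not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng):
    # Keep each value with probability (1 - rate), then rescale the survivors
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones((1000, 50))
dropped = dropout(a, rate=0.5, rng=rng)

print(np.unique(dropped))        # only 0.0 and 2.0 appear
print(round(dropped.mean(), 2))  # close to 1.0
```

Since our training set is 1000 copies of the same sequence, there's no held-out data to show a regularisation benefit, so treat any change in training loss with caution.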

In [75]:
# Define your model here...
model = Sequential()
model.add(LSTM(50, return_sequences=True))
model.add(Dropout(0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='MSE')
model.fit(x_train, y_train, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x12518dad0>

> #### Suggested "homework" exercise: Work through the Keras "text generation example" code: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py
> 
> Try applying this to your own text dataset!

In [77]:
from keras.utils import get_file
path = get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt


In [84]:
import io
with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

('corpus length:', 600893)


In [86]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

('total chars:', 57)


In [88]:
# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

('nb sequences:', 200285)
Vectorization...
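
To make the windowing step concrete, here's the same loop on a tiny string. The names `toy_text`, `toy_maxlen` and `toy_step` are made up for this sketch; each window of characters is paired with the character that follows it.

```python
toy_text = "hello world"
toy_maxlen, toy_step = 4, 3

windows, targets = [], []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    windows.append(toy_text[i: i + toy_maxlen])  # input: toy_maxlen characters
    targets.append(toy_text[i + toy_maxlen])     # label: the next character

for w, t in zip(windows, targets):
    print(repr(w), "->", repr(t))
# 'hell' -> 'o'
# 'lo w' -> 'o'
# 'worl' -> 'd'
```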


In [89]:
from keras.optimizers import RMSprop

# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Build model...


In [90]:
import random
import sys


def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def on_epoch_end(epoch, _):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
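
The `temperature` (called "diversity" in the loop) controls how adventurous the sampling is. Here's a small sketch of just the rescaling step from `sample`, applied to a made-up three-way distribution; `apply_temperature` is a hypothetical helper name for this illustration.

```python
import numpy as np

def apply_temperature(preds, temperature):
    # Same rescaling as in sample(): log, divide by temperature, re-normalise
    logits = np.log(np.asarray(preds, dtype="float64")) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

probs = [0.6, 0.3, 0.1]
for temp in (0.2, 1.0, 1.2):
    print(temp, np.round(apply_temperature(probs, temp), 3))
```

Low temperatures sharpen the distribution towards the most likely character, a temperature of 1.0 leaves it unchanged, and higher temperatures flatten it so rarer characters are picked more often.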

In [None]:
from keras.callbacks import LambdaCallback
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=60,
          callbacks=[print_callback])

Epoch 1/60