# Recurrent Neural Networks in Python

**Jessica Cervi**

## Activity Overview 

A recurrent neural network (RNN) structure enables learning of sequence-to-sequence mappings. For example, in speech recognition the input might be a sequence of sounds or phonemes, and the model learns to output a sequence of words or sentences. 

In this activity, we'll first look at how to assemble a simplified version of an RNN using the familiar MNIST digits dataset. We will then train the network and measure its accuracy to see how often it makes correct predictions.


This activity is designed to help you apply the machine learning algorithms you have learned using the packages in `Python`. `Python` concepts, instructions, and starter code are embedded within this Jupyter Notebook to help guide you as you progress through the activity. Remember to run the code of each code cell prior to submitting the assignment. Upon completing the activity, we encourage you to compare your work against the solution file to perform a self-assessment.

## Index:

#### Week 3:  Recurrent Neural Networks

- [Part 1](#part1) - Recurrent Neural Networks
- [Part 2](#part2) - Setting up the Problem
- [Part 3](#part3) - Creating the Model
- [Part 4](#part4) - Compiling the Model 
- [Part 5](#part5) - Training the Model 
- [Part 6](#part6) - RNN Language Model

[Back to top](#Index:) 

<a id='part1'></a>

## Recurrent Neural Networks

Similarly to what we did for DNNs, it’s helpful to understand at least some of the basics before getting to the implementation. At a high level, a recurrent neural network (RNN) processes sequences - whether daily stock prices, sentences, or sensor measurements - one element at a time while retaining a memory (called a state) of what has come previously in the sequence.


Recurrent means that the output at the current time step becomes the input to the next time step. At each element of the sequence, the model considers not just the current input, but what it remembers about the preceding elements.


<img src="images/recurrent.png" alt="Drawing" style="width: 400px;"/>


Put simply, in an RNN the input data is passed into a cell, and the cell's activation output is both emitted and fed back into the same cell as an additional input at the next step.
This works, but it raises new questions: How should we weigh incoming new data? How should we handle the recurring data? If we're not careful, the initial signal could dominate the entire model.
Approaches in both training and the RNN architecture itself have been developed to combat this. 
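
To make the feedback loop concrete, here is a minimal NumPy sketch of a simple recurrent cell. It is an illustration only, not the Keras implementation: the function name `rnn_forward`, the weight names, the `tanh` activation, and the layer sizes are all assumptions chosen for this example.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple recurrent cell over a sequence and return every state."""
    state = np.zeros(W_h.shape[0])          # the memory starts empty
    states = []
    for x_t in inputs:                      # one sequence element at a time
        # the new state mixes the current input with the previous state
        state = np.tanh(W_x @ x_t + W_h @ state + b)
        states.append(state)
    return np.stack(states)

rng = np.random.default_rng(0)
seq_len, n_features, n_units = 5, 3, 4      # illustrative sizes
inputs = rng.normal(size=(seq_len, n_features))
W_x = rng.normal(scale=0.5, size=(n_units, n_features))
W_h = rng.normal(scale=0.5, size=(n_units, n_units))
b = np.zeros(n_units)

states = rnn_forward(inputs, W_x, W_h, b)
print(states.shape)  # one state vector per time step
```

Each row of `states` depends on every earlier input through the carried `state`, which is the memory described above.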

### Long Short-Term Memory Structure

The long short-term memory (LSTM) structure is an RNN whose cell has its own internal neural network structure. It adds learned “gating” functions to tell the network which portions of the past state, inputs, or outputs are most relevant to achieving the learning goal. This enables the LSTM to learn the dynamics at multiple time scales (e.g., perhaps there are monthly periodic effects, in addition to more recent daily effects, that are important in predicting an output).

The idea here is that learned gate functions determine what to forget from the previous cells, what to add from the new input data, what to output to new cells, and what to pass on to the next layer.
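
The gating idea can be sketched in NumPy as a single LSTM time step. This is a simplified illustration under stated assumptions, not the Keras `LSTM` layer: the helper names `sigmoid` and `lstm_step`, the gate ordering inside the stacked weight matrices, and the sizes are hypothetical choices for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the four gate blocks stacked."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b        # all four pre-activations at once
    f = sigmoid(z[0 * n:1 * n])         # forget gate: what to drop from the old cell state
    i = sigmoid(z[1 * n:2 * n])         # input gate: how much new information to admit
    g = np.tanh(z[2 * n:3 * n])         # candidate values built from the new input
    o = sigmoid(z[3 * n:4 * n])         # output gate: what to expose to the next layer
    c_t = f * c_prev + i * g            # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)              # updated hidden state (the cell's output)
    return h_t, c_t

rng = np.random.default_rng(0)
n_features, n_units = 3, 4              # illustrative sizes
W = rng.normal(scale=0.5, size=(4 * n_units, n_features))
U = rng.normal(scale=0.5, size=(4 * n_units, n_units))
b = np.zeros(4 * n_units)

h = np.zeros(n_units)
c = np.zeros(n_units)
for x_t in rng.normal(size=(6, n_features)):   # a sequence of 6 inputs
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

Because the forget and input gates are learned, the cell can keep some information over many steps while overwriting other parts quickly.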

<img src="images/lstm.png" alt="Drawing" style="width: 750px;"/>

[Back to top](#Index:) 

<a id='part2'></a>

## Setting up the Problem

Now let's apply an RNN to something simple before moving on to a more realistic use case. We're going to use an RNN to classify the digits in the [MNIST](https://keras.io/api/datasets/mnist/) dataset: it is a simple dataset, each image can be read as a sequence of rows, and it is relatively easy to see what input the model expects.
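
As a rough sketch of what reading an image as a sequence means, the snippet below uses a random 28x28 array as a stand-in for one MNIST digit (the real data is loaded later in this activity). The variable names are illustrative assumptions; the point is only that each of the 28 rows acts as one time step with 28 features.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for one MNIST digit: 28 rows x 28 columns, pixel values in 0..255
image = rng.integers(0, 256, size=(28, 28))

timesteps, features = image.shape     # rows are time steps, columns are features
first_row = image[0]                  # what the RNN would see at the first step

# recurrent layers expect input shaped (samples, timesteps, features)
batch = image[np.newaxis, ...]
print(timesteps, features, batch.shape)
```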

Run the code cell below for the imports:

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM

Run the code cell below to import our data.

Fill in the ellipses with `x_train` and `y_test`.

In [None]:
# mnist is a dataset of 28x28 images of handwritten 
# digits and their labels
mnist = tf.keras.datasets.mnist  

(..., y_train),(x_test,...) = mnist.load_data() 


Now let’s take a look at one of the images in our dataset to see what we're working with. We will plot the first image in our dataset using `matplotlib`.

In [None]:
import matplotlib.pyplot as plt
#plot the first image in the dataset
plt.imshow(x_train[0])

### Preparing the Data

As before, we can inspect the shape of the dataset and of an individual sample.

Run the code cell below to normalize the pixel values to the range from 0 to 1 and print both shapes:

In [None]:
x_train = x_train/255.0
x_test = x_test/255.0

print(x_train.shape)
print(x_train[0].shape)

[Back to top](#Index:) 

<a id='part3'></a>

## Creating the Model

Now we are ready to build our model.

The model type that we will be using is Sequential. The `Keras` class [`Sequential`](https://keras.io/api/models/sequential/) is the easiest way to build a model in Keras. It allows you to build a model layer by layer.

Run the code cell below to load the imports for this section.

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM

This should all be straightforward, similar to what we've seen previously. In this case, rather than Conv layers, we're using [`LSTM`](https://keras.io/api/layers/recurrent_layers/lstm/) as a layer type. 

The other new thing is `return_sequences` in the model structure below.
This flag matters when the next layer is also recurrent. If it is, set `return_sequences=True` so the layer returns its output at every time step. If you're **not going** to another recurrent-type layer, leave it at its default of `False`, and the layer returns only its final output. In our model below, the first LSTM layer performs a sequence-to-sequence mapping; the later layers then do additional feature extraction and classification, like in our CNN model.
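
The effect of the flag on output shapes can be sketched with a toy recurrent layer in NumPy. This is not the Keras `LSTM` layer: `simple_recurrent_layer`, its `tanh` cell, and the unit count of 8 are hypothetical, and the sketch mirrors only the shape behaviour of `return_sequences`.

```python
import numpy as np

def simple_recurrent_layer(inputs, W_x, W_h, return_sequences=False):
    """Toy recurrent layer that mimics the shape behaviour of the flag."""
    state = np.zeros(W_h.shape[0])
    all_states = []
    for x_t in inputs:
        state = np.tanh(W_x @ x_t + W_h @ state)
        all_states.append(state)
    # True: one output per time step; False: only the final state
    return np.stack(all_states) if return_sequences else state

rng = np.random.default_rng(1)
inputs = rng.normal(size=(28, 28))              # 28 time steps, 28 features
W_x1 = rng.normal(scale=0.1, size=(8, 28))      # first layer: 28 features -> 8 units
W_h1 = rng.normal(scale=0.1, size=(8, 8))
W_x2 = rng.normal(scale=0.1, size=(8, 8))       # second layer: 8 features -> 8 units
W_h2 = rng.normal(scale=0.1, size=(8, 8))

# the first layer must return the full sequence so the second has steps to iterate over
seq_out = simple_recurrent_layer(inputs, W_x1, W_h1, return_sequences=True)
last_out = simple_recurrent_layer(seq_out, W_x2, W_h2)
print(seq_out.shape, last_out.shape)
```

The second layer returns a single vector, which is the shape a following `Dense` layer expects.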

In [None]:
model = Sequential()

model.add(LSTM(128, input_shape=(x_train.shape[1:]), activation='relu', return_sequences=True))
#model.add(LSTM(128, activation='relu', return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(128, activation='relu'))
model.add(Dropout(0.1))

model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(10, activation='softmax'))

[Back to top](#Index:) 

<a id='part4'></a>

## Compiling the Model

Next, we need to compile our model. Compiling the model takes three parameters: optimizer, loss, and metrics.

As before, the optimizer controls the learning rate. We will be using `adam` as our optimizer, as it is generally a good optimizer to use for many cases. The adam optimizer adjusts the learning rate throughout training. The learning rate determines how fast the optimal weights for the model are calculated. A smaller learning rate may lead to more accurate weights (up to a certain point), but as we saw, the time it takes to compute the weights will be longer.

We will use `sparse_categorical_crossentropy` for our loss function. Cross-entropy is the most common choice for classification, and the sparse variant accepts the integer labels (0–9) that MNIST provides. A lower score indicates that the model is performing better.
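
The MNIST labels are integers rather than one-hot vectors, which is why the sparse variant of the loss is the one passed to `compile` below. The following NumPy sketch, with made-up probabilities for a three-class problem, suggests that the two variants compute the same quantity and differ only in the label format they accept.

```python
import numpy as np

# made-up softmax outputs for two samples and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])             # integer class labels (the "sparse" format)
one_hot = np.eye(3)[labels]           # the same labels, one-hot encoded

# sparse form: pick out the probability assigned to the true class
sparse_loss = -np.log(probs[np.arange(len(labels)), labels]).mean()
# dense form: sum over classes, weighted by the one-hot target
dense_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(sparse_loss, dense_loss)
```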

To make things even easier to interpret, we will use the `accuracy` metric to see the accuracy score on the validation set when we train the model.

In the code cell below, the argument `loss` is set to `'sparse_categorical_crossentropy'`.

In [None]:
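# `learning_rate` replaces the deprecated `lr` argument name.
# If your Keras version rejects `decay`, drop that argument.
opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)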

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'],
)

[Back to top](#Index:) 

<a id='part5'></a>

## Training the Model


Now, we will train our model. We will train the model in a similar way as we did for autoencoders.

For our validation data, we will use the test set provided to us in our dataset, which we loaded as `x_test` and `y_test`.
The number of epochs is the number of times the model will cycle through the data. The more epochs we run, the more the model will improve, up to a certain point. After that point, the model will stop improving during each epoch. For efficiency, set the number of epochs to `3` in our model.

Run the code cell below. **Note: this cell may take a few minutes to run.**

In [None]:
model.fit(x_train, y_train, epochs=..., validation_data=(x_test, y_test))

**Question**

What is the accuracy of our model after three epochs? Round your answer to two decimal digits.

**CLICK ON THIS CELL TO TYPE YOUR ANSWER**

### Making Predictions

If you want to see the actual predictions that our model has made for the test data, we can use the [`predict`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) function. 

For each input image, `predict` returns an array of 10 numbers. Again, as we saw in the DNN activity, these numbers are the probabilities that the input image represents each digit (0–9). The array index with the highest probability is the model's prediction. 
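
As a small illustration with made-up numbers (not actual model output), the snippet below shows how a row of ten probabilities is turned into a predicted digit with `np.argmax`. The array `y_pred_demo` is a hypothetical stand-in for what `predict` returns.

```python
import numpy as np

# two made-up probability rows, one per image; each row sums to 1
y_pred_demo = np.array([
    [0.01, 0.02, 0.05, 0.02, 0.10, 0.03, 0.02, 0.70, 0.03, 0.02],
    [0.05, 0.05, 0.60, 0.05, 0.05, 0.05, 0.05, 0.05, 0.03, 0.02],
])

# the index of the largest probability in each row is the predicted digit
predicted_digits = np.argmax(y_pred_demo, axis=1)
print(predicted_digits)  # [7 2]
```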

In [None]:
#predict images in the test set, and show first four
y_pred = model.predict(x_test)
y_pred_class = np.argmax(y_pred, axis=1)
print(y_pred[:4])

**Question**

What is the predicted digit for the second image in the output above?

**CLICK ON THIS CELL TO TYPE YOUR ANSWER**


Finally, let’s compare this with the actual results.

In [None]:
print("predicted results:", y_pred_class[:4])
print("actual results:   ", y_test[:4])

So we see that an RNN that treats each MNIST image as a "sequence" is also able to extract relationships and do a reasonably good job at classification.
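
Comparing a handful of predictions by eye only goes so far. A minimal sketch of the same comparison over a whole array is shown below, using made-up class labels in place of `y_pred_class` and `y_test`; the names `pred_demo` and `true_demo` are hypothetical.

```python
import numpy as np

# made-up predicted and actual digit labels for eight images
pred_demo = np.array([7, 2, 1, 0, 4, 1, 4, 9])
true_demo = np.array([7, 2, 1, 0, 4, 1, 4, 8])

matches = pred_demo == true_demo          # element-wise comparison
accuracy = matches.mean()                 # fraction of correct predictions
wrong_idx = np.flatnonzero(~matches)      # positions of the misclassified images

print(accuracy, wrong_idx)
```

The indices in `wrong_idx` could then be used to plot the misclassified images and see where the model struggles.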

[Back to top](#Index:) 

<a id='part6'></a>

## RNN Language Model

RNNs are more typically used to map sequences to sequences (e.g., translate a sentence in one language to another language), or to map a sequence to a prediction for the next item in the sequence. In the example below, we'll use the same RNN structure as above, but this time train it on sequences of words from a corpus of texts.

This example is optional -- but you might find it interesting as a final example for this week's content!

Reference: the code below is inspired by the example at https://towardsdatascience.com/recurrent-neural-networks-by-example-in-python-ffd204f99470

In [None]:
from tensorflow.keras.preprocessing.text import Tokenizer, text_to_word_sequence, one_hot

In [None]:
texts = []
books = ['alice_in_wonderland.txt', 'pride.txt', 'tale.txt', 'hamlet.txt', 'macbeth.txt']
# Only the first book is loaded here; widen the slice to include more texts
for book in books[:1]:
    with open('data/'+book, 'r') as f:
        texts.append(text_to_word_sequence(f.read()))

# Tokenizer encodes word strings into integer identifiers for each unique word
tokenizer = Tokenizer(num_words=1000) #limit to this number of most frequent words
tokenizer.fit_on_texts(texts)
corpus = tokenizer.texts_to_sequences(texts)

We'll build data from `SEQLEN` consecutive words, followed by the next word as the "label" that we want to learn to predict.
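
A toy version of this windowing, with a six-token sequence and a window length of 3, may make the layout easier to see. The helper `make_windows` is a hypothetical name used only for this sketch; it follows the same loop as the cell below.

```python
def make_windows(tokens, seq_len, step=1):
    """Split a token list into fixed-length inputs and next-token labels."""
    inputs, labels = [], []
    for i in range(0, len(tokens) - seq_len, step):
        inputs.append(tokens[i:i + seq_len])   # seq_len consecutive tokens
        labels.append(tokens[i + seq_len])     # the token that follows them
    return inputs, labels

tokens = [5, 8, 2, 9, 4, 7]
windows, next_words = make_windows(tokens, seq_len=3)
for window, label in zip(windows, next_words):
    print(window, "->", label)
# [5, 8, 2] -> 9
# [8, 2, 9] -> 4
# [2, 9, 4] -> 7
```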

In [None]:
SEQLEN = 10
STEP = 1

input_words = []
label_words = []
for words in corpus:
    for i in range(0, len(words) - SEQLEN, STEP):
        input_words.append(words[i:i + SEQLEN])
        label_words.append(words[i + SEQLEN])

# Looking at our integer encoded data...
for i in range(5):
    print(input_words[i], label_words[i])
num_words = max(label_words) + 1
print("number of unique words:", num_words)

In [None]:
# One-hot encoding word inputs and labels
# (np.bool was removed in NumPy 1.24; the built-in bool is equivalent)
X = np.zeros((len(input_words), SEQLEN, num_words), dtype=bool)
y = np.zeros((len(input_words), num_words), dtype=bool)
for i, start_words in enumerate(input_words):
    for j, w in enumerate(start_words):
        X[i, j, w] = 1
    y[i, label_words[i]] = 1
print("X:", X.shape, "and y:", y.shape)

In [None]:
# Our deep RNN model structure:
model = Sequential()

model.add(LSTM(128, input_shape=(SEQLEN, num_words), activation='relu', return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(128, activation='relu'))
model.add(Dropout(0.1))

model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(num_words, activation='softmax'))

In [None]:
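# `learning_rate` replaces the deprecated `lr` argument name.
# If your Keras version rejects `decay`, drop that argument.
opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)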

model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'],
)

Run the code cell below. **Note: this cell may take several minutes to run.**

In [None]:
# For now, we are training with a modest number of epochs, as an example. However, RNNs
# are known for needing a LOT of training data and training time. So if you have the time,
# you can increase this (e.g., to 100 or more epochs) to get much better results!
model.fit(X, y, epochs=50)

In [None]:
model.save('corpus_rnn_model') 
#model = tf.keras.models.load_model('corpus_rnn_model')

In [None]:
#len(input_words)
X.shape

In [None]:
# Given a list of start_words (which must be of
# length SEQLEN and encoded as integer tokens),
# generate and print an additional num_pred words after that.
def generate_words(start_words, num_pred=10):
    print(tokenizer.sequences_to_texts([start_words])[0],end=' | ')
    for p in range(num_pred):
        Xtest = np.zeros((1, SEQLEN, num_words))
        for i, w in enumerate(start_words):
            Xtest[0, i, w] = 1
        ypred = model.predict(Xtest, verbose=0)[0].argmax()
        print(tokenizer.sequences_to_texts([[ypred]])[0],end=' ')
        start_words = start_words[1:] + [ypred]
    print("\n----")

In [None]:
# Look at some examples, with start words taken from our text    
for example in range(10):
    start_idx = np.random.randint(len(input_words))
    start_words = input_words[start_idx]
    generate_words(start_words)

In [None]:
# And now some starter sentences of our own. Note that
# these must consist only of words in the tokenizer's vocabulary:
# the tokenizer drops unknown words, which would leave the
# sentence shorter than SEQLEN.
# Only the last assignment below takes effect; reorder to try the others.
start_text = "alice ran from the queen with a mushroom soup pot"
start_text = "funny red hare over there said the faster weeping turtle"
start_text = "funny red hare over there said a faster weeping turtle"
start_text = "funny red hare over here said a faster weeping turtle"

start_words = tokenizer.texts_to_sequences([start_text])[0]
if len(start_words) != SEQLEN:
    print(tokenizer.sequences_to_texts([start_words])[0], "=>", len(start_words), "but need", SEQLEN)
else:
    generate_words(start_words)

In [None]:
# if you want to see the vocabulary of allowed words...
all_words = tokenizer.sequences_to_texts([list(range(num_words))])
#print(all_words)