# Tutorial VI: Recurrent Neural Networks

<p>
Bern Winter School on Machine Learning, 28.01-01.02 2019<br>
Mykhailo Vladymyrov
</p>

This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

In this session we will see what an RNN is. We will use one to predict and generate text sequences, but the same approach can be applied to any sequential data.


(Largely adapted from `https://github.com/roatienza/Deep-Learning-Experiments`)

So far we have looked at data that is available all at once. In many cases the data is sequential (weather, speech, sensor signals, etc.).
RNNs are specifically designed for such tasks.

![rnn.png](https://scits-training.unibe.ch/data/figures/rnn.png)
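
As a minimal sketch of the idea in the figure, assuming a plain (Elman-style) RNN rather than the LSTM cells used later in this tutorial: the same weights are applied at every time step, and the hidden state carries information forward. All names and sizes below (`rnn_forward`, `W_x`, `W_h`, hidden size 4) are illustrative assumptions, not part of the tutorial code.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Run a plain RNN over a sequence and return all hidden states.

    xs  : array of shape (T, n_in), one input vector per time step
    W_x : (n_in, n_hid) input-to-hidden weights
    W_h : (n_hid, n_hid) hidden-to-hidden weights
    b   : (n_hid,) bias
    """
    h = np.zeros(W_h.shape[0])          # initial hidden state
    states = []
    for x_t in xs:                      # the same weights are reused at every step
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_in, n_hid, T = 1, 4, 7                # illustrative sizes
W_x = rng.normal(scale=0.5, size=(n_in, n_hid))
W_h = rng.normal(scale=0.5, size=(n_hid, n_hid))
b = np.zeros(n_hid)

xs = rng.normal(size=(T, n_in))
hidden_states = rnn_forward(xs, W_x, W_h, b)
print(hidden_states.shape)   # (7, 4)
```

Each row of `hidden_states` depends on the current input and on all earlier inputs through the previous hidden state.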

## Download the data and helper files

In [1]:
! wget http://scits-training.unibe.ch/data/tut_files/t6.tgz
! tar -xvzf t6.tgz

--2019-02-01 09:20:53--  http://scits-training.unibe.ch/data/tut_files/t6.tgz
Resolving scits-training.unibe.ch (scits-training.unibe.ch)... 130.92.251.56
Connecting to scits-training.unibe.ch (scits-training.unibe.ch)|130.92.251.56|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6896 (6.7K) [application/octet-stream]
Saving to: 't6.tgz'


2019-02-01 09:20:53 (305 MB/s) - 't6.tgz' saved [6896/6896]

RNN/
RNN/belling_the_cat.txt
RNN/rnn.txt
utils/
utils/gr_disp.py
utils/inception.py
utils/__init__.py
tar: A lone zero block at 42


## 1. Load necessary libraries

In [2]:
import sys

import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipyd
import tensorflow as tf
import collections
import time

# We'll tell matplotlib to inline any drawn figures like so:
%matplotlib inline
plt.style.use('ggplot')
from utils import gr_disp

from IPython.core.display import HTML
HTML("""<style> .rendered_html code { 
    padding: 2px 5px;
    color: #0000aa;
    background-color: #cccccc;
} </style>""")

## 2. Load the text data

In [3]:
def read_data(fname):
    with open(fname) as f:
        content = f.readlines()
    content = [x.strip() for x in content]
    content = [word for i in range(len(content)) for word in content[i].split()]
    content = np.array(content)
    return content
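
`read_data` splits only on whitespace, so punctuation becomes a separate token only if the file already has spaces around it (the output below suggests `rnn.txt` is prepared this way). A small sketch of the same logic on in-memory text; `tokenize_lines` and the sample lines are made up for illustration:

```python
def tokenize_lines(lines):
    """Whitespace tokenization, equivalent to the list logic in read_data."""
    return [word for line in lines for word in line.strip().split()]

sample = ["recurrent neural networks , or rnns ,\n",
          "process sequential data .\n"]
tokens = tokenize_lines(sample)
print(tokens)
# ['recurrent', 'neural', 'networks', ',', 'or', 'rnns', ',', 'process', 'sequential', 'data', '.']

# Without the surrounding spaces, punctuation stays glued to the word:
glued = tokenize_lines(["networks, or rnns."])
print(glued)
# ['networks,', 'or', 'rnns.']
```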

In [4]:
training_file = 'RNN/rnn.txt'

In [5]:
training_data = read_data(training_file)

In [18]:
print(training_data[:300])

['recurrent' 'neural' 'networks' ',' 'or' 'rnns' '(' 'rumelhart' 'et' 'al'
 '.' ',' '1986a' ')' ',' 'are' 'a' 'family' 'of' 'neural' 'networks' 'for'
 'processing' 'sequential' 'data' '.' 'much' 'as' 'a' 'convolutional'
 'network' 'is' 'a' 'neural' 'network' 'that' 'is' 'specialized' 'for'
 'processing' 'a' 'grid' 'of' 'values' 'such' 'as' 'an' 'image' ',' 'a'
 'recurrent' 'neural' 'network' 'is' 'a' 'neural' 'network' 'that' 'is'
 'specialized' 'for' 'processing' 'a' 'sequence' 'of' 'values' 'x' '(' '1'
 ')' ',' '.' '.' '.' ',' 'x' '(' 'tau' ')' '.' 'just' 'as' 'convolutional'
 'networks' 'can' 'readily' 'scale' 'to' 'images' 'with' 'large' 'width'
 'and' 'height' ',' 'and' 'some' 'convolutional' 'networks' 'can'
 'process' 'images' 'of' 'variable' 'size' ',' 'recurrent' 'networks'
 'can' 'scale' 'to' 'much' 'longer' 'sequences' 'than' 'would' 'be'
 'practical' 'for' 'networks' 'without' 'sequence' '-' 'based'
 'specialization' '.' 'most' 'recurrent' 'networks' 'can' 'also' 'process'


## 3. Build dataset
We will assign an id to each word and build two dictionaries: word->id and id->word.
The most frequent words get the lowest ids.

In [7]:
def build_dataset(words):
    count = collections.Counter(words).most_common()
    dictionary = dict()
    for word, _ in count:
        dictionary[word] = len(dictionary)
    reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
    return dictionary, reverse_dictionary
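
To see the id assignment on a toy input (a sketch; the word list is invented): `Counter.most_common()` sorts by descending count, so the most frequent word gets id 0. Words with equal counts keep the order in which they were first seen (Python 3.7+).

```python
import collections

def build_dataset(words):
    # Same logic as the cell above, repeated so this snippet runs on its own.
    count = collections.Counter(words).most_common()
    dictionary = {word: idx for idx, (word, _) in enumerate(count)}
    reverse_dictionary = {idx: word for word, idx in dictionary.items()}
    return dictionary, reverse_dictionary

toy_words = "the cat saw the dog and the dog ran".split()
word_to_id, id_to_word = build_dataset(toy_words)
print(word_to_id)
# {'the': 0, 'dog': 1, 'cat': 2, 'saw': 3, 'and': 4, 'ran': 5}

ids = [word_to_id[w] for w in toy_words]
print(ids)                                   # [0, 2, 3, 0, 1, 4, 0, 1, 5]
print(" ".join(id_to_word[i] for i in ids))  # round trip back to the text
```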

In [8]:
dictionary, reverse_dictionary = build_dataset(training_data)
vocab_size = len(dictionary)

In [9]:
print(dictionary)

{'feature': 242, 'examples': 144, 'related': 243, 'models': 371, 'exposition': 482, 'grid': 245, 'practice': 246, 'to': 7, 'than': 58, 'major': 248, 'position': 145, 'form': 249, 'represented': 146, 'can': 26, 'learns': 251, '2012': 252, 'general': 147, 'has': 82, 'nepal': 103, 'recover': 253, 'specified': 148, 'rewrite': 254, 'what': 317, 'mapping': 256, 'point': 257, 'any': 149, 'take': 376, 'applied': 105, 'manifests': 258, 'diagram': 199, 'learned': 260, 'classical': 261, 'passage': 262, 'out': 263, 'figure': 71, 'represent': 107, 'repeated': 150, 'operation': 151, 'image': 264, 'regardless': 250, 'involving': 194, 'computations': 265, '1989': 266, 'possible': 79, 'advantage': 267, 'hidden': 152, 'recurrent': 18, 'refers': 173, 'unfolding': 68, 'simplify': 270, 'requires': 272, 'fully': 273, 'readily': 274, 'state': 27, 'in': 8, 'selectively': 275, 'connected': 276, 'like': 277, 'typically': 108, 'share': 109, 'idea': 80, 'all': 69, 'specialization': 278, 'same': 55, 'extract': 280

The whole text then becomes a sequence of word ids:

In [10]:
print([dictionary[w] for w in training_data])

[18, 25, 22, 2, 96, 91, 4, 365, 121, 125, 1, 2, 458, 5, 2, 302, 6, 428, 3, 25, 22, 17, 138, 337, 115, 1, 106, 16, 6, 84, 20, 10, 6, 25, 20, 15, 10, 175, 17, 138, 6, 245, 3, 113, 45, 16, 51, 264, 2, 6, 18, 25, 20, 10, 6, 25, 20, 15, 10, 175, 17, 138, 6, 14, 3, 113, 13, 4, 12, 5, 2, 1, 1, 1, 2, 13, 4, 67, 5, 1, 475, 16, 84, 22, 26, 274, 192, 7, 135, 31, 330, 314, 24, 289, 2, 24, 216, 84, 22, 26, 134, 135, 3, 53, 112, 2, 18, 22, 26, 192, 7, 106, 391, 142, 58, 75, 29, 334, 17, 22, 238, 14, 23, 282, 278, 1, 154, 18, 22, 26, 180, 134, 142, 3, 53, 48, 1, 7, 153, 78, 348, 22, 7, 18, 22, 2, 19, 131, 7, 376, 267, 3, 66, 3, 0, 293, 346, 347, 8, 206, 128, 24, 126, 371, 3, 294, 83, 52, 42, 41, 35, 170, 3, 6, 44, 1, 127, 52, 244, 36, 79, 7, 469, 24, 474, 0, 44, 7, 144, 3, 35, 320, 4, 35, 72, 2, 419, 5, 24, 195, 41, 308, 1, 88, 19, 411, 81, 42, 17, 28, 129, 3, 0, 11, 118, 2, 19, 471, 61, 195, 7, 14, 72, 61, 432, 259, 86, 2, 453, 109, 126, 349, 41, 35, 14, 72, 24, 41, 35, 228, 8, 11, 1, 45, 52, 10, 41
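
The training loop later in this tutorial turns this id sequence into samples of `n_input` consecutive ids plus the id that follows as the target, then jumps ahead by `n_input + 1`. A minimal sketch of that windowing, using the first ten ids printed above and a hypothetical helper `make_sample`:

```python
def make_sample(ids, offset, n_input):
    """Return (inputs, target): n_input ids starting at offset and the next id."""
    inputs = ids[offset:offset + n_input]
    target = ids[offset + n_input]
    return inputs, target

ids = [18, 25, 22, 2, 96, 91, 4, 365, 121, 125]
n_input = 3   # small window for readability

for offset in range(0, len(ids) - n_input, n_input + 1):
    inputs, target = make_sample(ids, offset, n_input)
    print(inputs, "->", target)
# [18, 25, 22] -> 2
# [96, 91, 4] -> 365
```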

## 4. Build model

In [19]:
# Parameters
learning_rate = 0.001
training_iters = 250000
display_step = 1000
n_input = 7

# number of units in each stacked LSTM layer
n_hidden = [1024, 512]

def RNN(x, n_vocab, n_hid):
    x = tf.unstack(x, n_input, 1)

    basic_cells = [tf.nn.rnn_cell.LSTMCell(n) for n in n_hid]
    rnn_cell = tf.nn.rnn_cell.MultiRNNCell(basic_cells)
    
    # generate prediction
    outputs, states = tf.nn.static_rnn(rnn_cell, x, dtype=tf.float32)

    # there are n_input outputs but
    # we only want the last output
    last_output = outputs[-1]
    
    w = tf.Variable(tf.random_normal([n_hid[-1], n_vocab]))
    b = tf.Variable(tf.random_normal([n_vocab]))
    y = tf.matmul(last_output, w) + b
    return y

                    
g = tf.Graph()
with g.as_default():
    # tf Graph input
    x = tf.placeholder("float", [None, n_input, 1])
    y = tf.placeholder("float", [None, vocab_size])
    
    pred = RNN(x, vocab_size, n_hidden)

    # Loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred, labels=y))
    optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate).minimize(cost)

    # Model evaluation
    correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
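
For reference, a NumPy sketch of what the loss line computes for a single sample: a softmax over the logits, then the negative log-probability of the target word. With uniform predictions the loss equals `ln(vocab)`. The vocabulary size of 500 below is an arbitrary illustration value, not the actual size of this dataset, and `softmax_cross_entropy` is a hypothetical helper.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot):
    """Cross-entropy between a one-hot label and softmax(logits), for one sample."""
    shifted = logits - np.max(logits)                      # for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))  # log-softmax
    return -np.sum(onehot * log_probs)

vocab = 500                       # illustrative value only
target = 42
onehot = np.zeros(vocab)
onehot[target] = 1.0

uniform_logits = np.zeros(vocab)  # a model that knows nothing
uniform_loss = softmax_cross_entropy(uniform_logits, onehot)
print(uniform_loss, np.log(vocab))   # both about 6.2146

confident_logits = np.zeros(vocab)
confident_logits[target] = 10.0   # a model that strongly prefers the right word
confident_loss = softmax_cross_entropy(confident_logits, onehot)
print(confident_loss)                # close to 0
```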

In [12]:
gr_disp.show(g.as_graph_def())

## 5. Run!

In [20]:
with tf.Session(graph=g) as session:
    session.run(tf.global_variables_initializer())
    step = 0
    offset = np.random.randint(0,n_input+1)
    end_offset = n_input + 1
    acc_total = 0
    loss_total = 0

    start_time = time.time()
    while step < training_iters:
        # Generate a minibatch. Add some randomness on selection process.
        if offset > (len(training_data)-end_offset):
            offset = np.random.randint(0, n_input+1)

        symbols_in_keys = [ [dictionary[ str(training_data[i])]] for i in range(offset, offset+n_input) ]
        symbols_in_keys = np.reshape(np.array(symbols_in_keys), [-1, n_input, 1])

        symbols_out_onehot = np.zeros([vocab_size], dtype=float)
        symbols_out_onehot[dictionary[str(training_data[offset+n_input])]] = 1.0
        symbols_out_onehot = np.reshape(symbols_out_onehot,[1,-1])

        _, acc, loss, onehot_pred = session.run([optimizer, accuracy, cost, pred], \
                                                feed_dict={x: symbols_in_keys, y: symbols_out_onehot})
        loss_total += loss
        acc_total += acc
        if (step+1) % display_step == 0:
            print("Iter= " + str(step+1) + ", Average Loss= " + \
                  "{:.6f}".format(loss_total/display_step) + ", Average Accuracy= " + \
                  "{:.2f}%".format(100*acc_total/display_step))
            acc_total = 0
            loss_total = 0
            symbols_in = [training_data[i] for i in range(offset, offset + n_input)]
            symbols_out = training_data[offset + n_input]
            # NumPy argmax: avoids adding a new TF op to the graph at every display step
            symbols_out_pred = reverse_dictionary[int(np.argmax(onehot_pred, axis=1)[0])]
            print("%s - [%s] vs [%s]" % (symbols_in,symbols_out,symbols_out_pred))
        step += 1
        offset += (n_input+1)
    print("Optimization Finished!")
    print("Elapsed time: ", time.time() - start_time)

    
    for itr in range(100):
        prompt = "%s words: " % n_input
        sentence = input(prompt)
        sentence = sentence.strip()
        words = sentence.split(' ')
        if len(words) != n_input:
            continue
        try:
            symbols_in_keys = [dictionary[str(words[i])] for i in range(len(words))]
            for i in range(128):
                keys = np.reshape(np.array(symbols_in_keys), [-1, n_input, 1])
                onehot_pred = session.run(pred, feed_dict={x: keys})
                # NumPy argmax: avoids adding a new TF op to the graph for every generated word
                onehot_pred_index = int(np.argmax(onehot_pred, axis=1)[0])
                sentence = "%s %s" % (sentence,reverse_dictionary[onehot_pred_index])
                symbols_in_keys = symbols_in_keys[1:]
                symbols_in_keys.append(onehot_pred_index)
            print(sentence)
        except KeyError:
            print("Word not in dictionary")

Iter= 1000, Average Loss= 7.881673, Average Accuracy= 2.30%
['state', 'at', 'time', 't', 'to', 'the', 'state'] - [at] vs [the]
Iter= 2000, Average Loss= 5.757039, Average Accuracy= 4.20%
['a', 'recurrent', 'neural', 'network', '.', 'many', 'recurrent'] - [neural] vs [)]
Iter= 3000, Average Loss= 5.622638, Average Accuracy= 5.20%
['recurrent', 'neural', 'networks', '.', 'we', 'then', 'describe'] - [many] vs [of]
Iter= 4000, Average Loss= 5.437735, Average Accuracy= 4.80%
['sixth', 'word', 'or', 'in', 'the', 'second', 'word'] - [of] vs [the]
Iter= 5000, Average Loss= 5.433819, Average Accuracy= 4.90%
['state', 'to', 'another', 'state', ',', 'rather', 'than'] - [specified] vs [t]
Iter= 6000, Average Loss= 5.479347, Average Accuracy= 4.80%
['chapter', '14', ')', '.', 'equation', '10', '.'] - [5] vs [the]
Iter= 7000, Average Loss= 5.444962, Average Accuracy= 4.80%
['acyclic', 'computational', 'graph', '.', 'the', 'unfolded', 'computational'] - [graph] vs [)]
Iter= 8000, Average Loss= 5.4953

Iter= 61000, Average Loss= 4.955246, Average Accuracy= 25.10%
['length', '.', 'a', 'traditional', 'fully', 'connected', 'feedforward'] - [network] vs [in]
Iter= 62000, Average Loss= 2.539262, Average Accuracy= 50.30%
['f', 'with', 'the', 'same', 'parameters', 'at', 'every'] - [time] vs [time]
Iter= 63000, Average Loss= 3.089235, Average Accuracy= 45.40%
['model', ',', 'such', 'as', 'a', 'biological', 'neural'] - [network] vs [structure]
Iter= 64000, Average Loss= 2.071311, Average Accuracy= 66.60%
['(', 't', ')', '=', 'f', '(', 's'] - [(] vs [(]
Iter= 65000, Average Loss= 3.532812, Average Accuracy= 45.40%
['have', 'connections', 'that', 'go', 'backward', 'in', 'time'] - [,] vs [physical]
Iter= 66000, Average Loss= 1.653713, Average Accuracy= 62.50%
['sequence', '.', 'for', 'example', ',', 'consider', 'the'] - [two] vs [two]
Iter= 67000, Average Loss= 2.558411, Average Accuracy= 55.50%
['1', ')', ')', 'as', 'input', 'and', 'produces'] - [the] vs [the]
Iter= 68000, Average Loss= 2.72346

KeyboardInterrupt: 
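
The generation loop above feeds the last `n_input` ids to the model, appends the predicted id, and slides the window by one. A framework-free sketch of that loop, where `predict_next` is a hypothetical stand-in for `session.run(pred, ...)` followed by an argmax (here it simply cycles through a tiny invented vocabulary):

```python
from collections import deque

id_to_word = {0: "the", 1: "cat", 2: "sat", 3: "."}   # invented toy vocabulary

def predict_next(window):
    """Hypothetical stand-in for the trained model: returns (last id + 1) mod 4."""
    return (window[-1] + 1) % len(id_to_word)

def generate(seed_ids, n_words):
    window = deque(seed_ids, maxlen=len(seed_ids))  # fixed-size sliding window
    generated = []
    for _ in range(n_words):
        next_id = predict_next(window)
        generated.append(next_id)
        window.append(next_id)                      # also drops the oldest id
    return generated

seed = [0, 1, 2]
new_ids = generate(seed, 5)
print(" ".join(id_to_word[i] for i in seed + new_ids))
# the cat sat . the cat sat .
```

Because every prediction is fed back as input, one early mistake changes all later words; this is worth keeping in mind when reading generated text from a partly trained model.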

## 6. Exercise
* Run with 5-7 input words (`n_input`) instead of 3.
* Increase the number of training iterations, since convergence will take much longer (and training itself will be slower, too!).

## 7. Further reading

[Illustrated Guide to Recurrent Neural Networks](https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9)

[Illustrated Guide to LSTM’s and GRU’s: A step by step explanation](https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21)