## Recurrent Neural Network

MLPs are used for many problems, e.g. pattern recognition, classification, and function approximation. However, they have no memory of previous inputs: each input is processed independently, so they cannot emulate the associative characteristics of human memory. For that purpose, we need a __recurrent neural network__ (RNN).

![](../figures/rnn-unrolled.png)

Image from [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Chris Olah.

An RNN has a feedback loop that feeds its state from one time step back in as an input to the next. This feedback loop enables the network to retain information across time steps (refer to the image above). For instance, suppose it is given the following input:

    Michael C. was born in Paris, France. He is married, and has three children. He received a M.S. in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Superieure in 1987, and ... His mother tongue is ???
    
If the neural network were tasked to predict the next value, i.e. the answer to the 'mother tongue' sentence, an RNN is a natural choice of architecture, because its feedback loop is what stores information from earlier in the sequence. However, as in the given example, there is a lot of unrelated information between the relevant context and the point where the answer is needed. The context that gives the answer is that Michael C. was born in Paris, France, but many words separate that statement from the mother-tongue sentence.

This problem is called the _long-term dependency problem_. A plain RNN struggles to retain information over that many time steps. A common solution is the [Long Short Term Memory (LSTM)](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) network (refer to the image below). In a nutshell, an LSTM uses gates to decide which information to keep for the next time step and which to discard.

![](../figures/rnn-lstm.png)

Image from [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Chris Olah.
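The long-term dependency problem described above can be illustrated with a small NumPy sketch. It is not part of the model trained later: the function names, the sizes, and the weights (in particular the deliberately contractive recurrent matrix `0.5 * I`) are assumptions chosen only to make the effect visible. Two sequences that differ only at the first time step are fed through the same plain RNN, and the distance between the two hidden states is recorded after every step.

```python
import numpy as np


def rnn_step(h_prev, x_t, w_h, w_x, b):
    """One step of a plain RNN: combine the previous state with the current input."""
    return np.tanh(h_prev @ w_h + x_t @ w_x + b)


def first_input_influence(seq_len=30, input_size=4, cell_size=8, seed=0):
    """Track how far apart two hidden states are when only the first input differs."""
    rng = np.random.default_rng(seed)
    # Assumed weights: the recurrent matrix is deliberately contractive
    w_h = 0.5 * np.eye(cell_size)
    w_x = rng.normal(size=(input_size, cell_size))
    b = np.zeros(cell_size)

    seq_a = rng.normal(size=(seq_len, input_size))
    seq_b = seq_a.copy()
    seq_b[0] += 1.0  # the two sequences differ only at the first time step

    h_a = np.zeros(cell_size)
    h_b = np.zeros(cell_size)
    gaps = []
    for x_a, x_b in zip(seq_a, seq_b):
        h_a = rnn_step(h_a, x_a, w_h, w_x, b)
        h_b = rnn_step(h_b, x_b, w_h, w_x, b)
        gaps.append(float(np.linalg.norm(h_a - h_b)))
    return gaps


gaps = first_input_influence()
print('gap after step 1  : {:.3e}'.format(gaps[0]))
print('gap after step 30 : {:.3e}'.format(gaps[-1]))
```

With these assumed weights the gap shrinks by at least half at every step, so after 30 steps the hidden state carries almost no trace of the first input. Trained RNNs are not this simple, but the sketch shows why information from early in a sequence can be hard to retain.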

A related model is the Gated Recurrent Unit (GRU), a simplified gated architecture that uses fewer gates and parameters than the LSTM and is therefore computationally cheaper. In this session, we are going to implement a GRU for MNIST classification.
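Before turning to the TensorFlow implementation, it may help to see what a single GRU step computes. The following is a minimal NumPy sketch of the standard GRU equations, not the library's actual implementation: the helper names (`gru_step`, `init_params`), the parameter dictionary, and the random weights are all hypothetical. It assumes the convention in which the update gate `z` weights the old state; some references swap the roles of `z` and `1 - z`.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, p):
    """One GRU step for a batch: x_t is [batch, input], h_prev is [batch, cell]."""
    xh = np.concatenate([x_t, h_prev], axis=1)
    z = sigmoid(xh @ p['w_z'] + p['b_z'])  # update gate: how much of the old state to keep
    r = sigmoid(xh @ p['w_r'] + p['b_r'])  # reset gate: how much old state feeds the candidate
    xrh = np.concatenate([x_t, r * h_prev], axis=1)
    h_cand = np.tanh(xrh @ p['w_h'] + p['b_h'])  # candidate new state
    return z * h_prev + (1.0 - z) * h_cand


def init_params(input_size, cell_size, seed=0):
    """Hypothetical random parameters, for illustration only."""
    rng = np.random.default_rng(seed)
    shape = (input_size + cell_size, cell_size)
    params = {}
    for gate in ('z', 'r', 'h'):
        params['w_' + gate] = 0.1 * rng.normal(size=shape)
        params['b_' + gate] = np.zeros(cell_size)
    return params


# Feed 28 random "image rows" of 28 values each through a 32-unit cell
params = init_params(input_size=28, cell_size=32)
rows = np.random.default_rng(1).normal(size=(28, 2, 28))  # [time, batch, input]
h = np.zeros((2, 32))
for x_t in rows:
    h = gru_step(x_t, h, params)
print(h.shape)
```

The update gate is what lets the cell hold on to information: where `z` is close to 1 the old state passes through almost unchanged, and where it is close to 0 the state is overwritten by the candidate.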

In [1]:
# Load the TensorFlow library
import tensorflow as tf

# Load the NumPy library
import numpy as np

# Load the input reader
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST data
mnist = input_data.read_data_sets('/home/darth/MNIST_data', one_hot=True)

Extracting /home/darth/MNIST_data/train-images-idx3-ubyte.gz
Extracting /home/darth/MNIST_data/train-labels-idx1-ubyte.gz
Extracting /home/darth/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting /home/darth/MNIST_data/t10k-labels-idx1-ubyte.gz


In [2]:
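# define the hyper-parameters
BATCH_SIZE = 128      # number of images per training step
CELL_SIZE = 32        # number of units in the GRU cell
CHUNK_SIZE = 28       # pixels per image row (input size at each time step)
HM_EPOCHS = 10        # number of passes over the training set
LEARNING_RATE = 0.01
NUM_CHUNKS = 28       # rows per image (number of time steps)
NUM_CLASSES = 10      # digits 0-9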

# input placeholders
x_input = tf.placeholder(dtype=tf.float32, shape=[None, NUM_CHUNKS, CHUNK_SIZE])
y_input = tf.placeholder(dtype=tf.float32, shape=[None, NUM_CLASSES])
initial_state = tf.placeholder(dtype=tf.float32, shape=[None, CELL_SIZE])

# define the model
cell = tf.contrib.rnn.GRUCell(CELL_SIZE)
output, states = tf.nn.dynamic_rnn(cell, x_input, initial_state=initial_state, dtype=tf.float32)

# define the weights and biases
xav_init = tf.contrib.layers.xavier_initializer
weight = tf.get_variable('weights', shape=[CELL_SIZE, NUM_CLASSES], initializer=xav_init())
bias = tf.get_variable('biases', initializer=tf.constant(0.1, shape=[NUM_CLASSES]))

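# the output of the RNN, transposed to time-major: [NUM_CHUNKS, batch, CELL_SIZE]
final_state = tf.transpose(output, [1, 0, 2])
# keep only the output at the last time step: [batch, CELL_SIZE]
last = tf.gather(final_state, int(final_state.get_shape()[0]) - 1)

# predicted value (unnormalized class scores, i.e. logits)
output = tf.matmul(last, weight) + bias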
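The cell above treats every 28x28 image as a sequence of 28 rows and classifies from the RNN output at the last row. The shape handling can be checked in isolation with NumPy. This is only a sketch: the arrays below are stand-ins (an assumed batch of 4 and random values in place of real images and GRU outputs), but the reshape and the transpose-then-select logic mirror the TensorFlow code.

```python
import numpy as np

NUM_CHUNKS, CHUNK_SIZE, CELL_SIZE = 28, 28, 32
batch = 4  # assumed small batch, for illustration only

# Stand-in for a batch of flattened MNIST images: [batch, 784]
flat_images = np.arange(batch * 784, dtype=np.float32).reshape(batch, 784)

# Each image becomes 28 time steps of 28 pixels: [batch, NUM_CHUNKS, CHUNK_SIZE]
sequences = flat_images.reshape((batch, NUM_CHUNKS, CHUNK_SIZE))
first_row = sequences[0, 0]  # the first 28 pixels of the first image

# Stand-in for the RNN outputs at every time step: [batch, NUM_CHUNKS, CELL_SIZE]
rnn_outputs = np.random.default_rng(0).normal(size=(batch, NUM_CHUNKS, CELL_SIZE))

# Same steps as the TensorFlow code: go time-major, then pick the last step
time_major = np.transpose(rnn_outputs, [1, 0, 2])
last_via_gather = time_major[NUM_CHUNKS - 1]

# Equivalent shortcut: slice the last time step directly
last_via_slice = rnn_outputs[:, -1, :]

print(sequences.shape, time_major.shape, last_via_gather.shape)
print(np.array_equal(last_via_gather, last_via_slice))
```

The transpose followed by a gather of the last index selects the same values as slicing `[:, -1, :]` on the batch-major tensor, so either form yields one `CELL_SIZE`-sized vector per image.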

Now, let us define the cost function and the training step.

In [3]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y_input))
train_step = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(cross_entropy)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Define the measurement of the model's accuracy.

In [4]:
correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_input, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float16))
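The cost and accuracy definitions above can be reproduced by hand on a tiny batch. The NumPy sketch below is an illustration rather than TensorFlow's implementation: the function names are hypothetical, and the batch uses 3 classes and made-up logits instead of the 10 MNIST classes. It computes the softmax cross-entropy per example and the fraction of examples whose highest-scoring class matches the one-hot label.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and one-hot labels."""
    # Subtract the row maximum first for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


def accuracy_from_logits(logits, labels):
    """Fraction of rows where the predicted class equals the labelled class."""
    return float(np.mean(np.argmax(logits, axis=1) == np.argmax(labels, axis=1)))


# Made-up logits for 4 examples and 3 classes
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.0, 1.5, 0.3],
                   [0.0, 0.0, 0.0]])
labels = np.eye(3)[[0, 2, 0, 1]]  # one-hot labels for classes 0, 2, 0, 1

losses = softmax_cross_entropy(logits, labels)
print('mean loss :', losses.mean())
print('accuracy  :', accuracy_from_logits(logits, labels))  # 0.5
```

The first two examples are classified correctly and the last two are not, hence an accuracy of 0.5. The last row has equal logits, so its loss is `log(3)`, the cross-entropy of a uniform guess over three classes.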

Initialize the `initial_state` of the network.

In [5]:
# initial GRU state fed to the network; replaced after each epoch by the state the network returns
current_state = np.zeros([BATCH_SIZE, CELL_SIZE])

# variables initializer
init_op = tf.global_variables_initializer()

Start training the RNN.

In [6]:
with tf.Session() as sess:
    sess.run(init_op)
    
    for epoch in range(HM_EPOCHS):
        epoch_loss = 0
        for _ in range(mnist.train.num_examples // BATCH_SIZE):
            # load the input data by batch
            batch_x, batch_y = mnist.train.next_batch(BATCH_SIZE)
            
            # reshape each flat 784-pixel image into 28 rows of 28 pixels
            batch_x = batch_x.reshape((BATCH_SIZE, NUM_CHUNKS, CHUNK_SIZE))
            
            # create input dictionary
            feed_dict = {x_input: batch_x, y_input: batch_y, initial_state: current_state}
            
            # run one training step; the loss and accuracy kept are those of the latest batch
            _, epoch_loss, train_accuracy, next_state = sess.run([train_step, cross_entropy, accuracy, states], feed_dict=feed_dict)
            
            
        # display the loss and accuracy of the last batch of this epoch
        print('Epoch : {} completed out of {}, loss : {} accuracy {}'.format(epoch + 1, HM_EPOCHS, epoch_loss, train_accuracy))
        
        # carry the final GRU state of the last batch over to the next epoch
        current_state = next_state
    
    # load the test data
    x_test = mnist.test.images.reshape((-1, NUM_CHUNKS, CHUNK_SIZE))
    y_test = mnist.test.labels
    
    # evaluate with a fresh zero state, not the state carried over from training
    # one state row per test example (the MNIST test set has 10000 examples)
    test_accuracy = sess.run(accuracy, feed_dict={x_input: x_test, y_input: y_test, initial_state: np.zeros([x_test.shape[0], CELL_SIZE])})
    
    # Display test accuracy
    print('Test Accuracy : {}'.format(test_accuracy))

Epoch : 1 completed out of 10, loss : 0.12615236639976501 accuracy 0.9609375
Epoch : 2 completed out of 10, loss : 0.07255779206752777 accuracy 0.9765625
Epoch : 3 completed out of 10, loss : 0.10057511180639267 accuracy 0.9609375
Epoch : 4 completed out of 10, loss : 0.02002027817070484 accuracy 1.0
Epoch : 5 completed out of 10, loss : 0.1277269870042801 accuracy 0.96875
Epoch : 6 completed out of 10, loss : 0.08655179291963577 accuracy 0.96875
Epoch : 7 completed out of 10, loss : 0.14416192471981049 accuracy 0.9609375
Epoch : 8 completed out of 10, loss : 0.04098448529839516 accuracy 0.9921875
Epoch : 9 completed out of 10, loss : 0.0689292699098587 accuracy 0.984375
Epoch : 10 completed out of 10, loss : 0.035995617508888245 accuracy 0.9921875
Test Accuracy : 0.98046875


With just 10 epochs, the GRU (RNN) model reached a test accuracy of about 98.05% on MNIST classification. Further training may improve this accuracy.