# Recurrent Neural Network Example

Build a recurrent neural network (LSTM) with TensorFlow.

- Author: Aymeric Damien
- Project: https://github.com/aymericdamien/TensorFlow-Examples/

## RNN Overview

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" alt="nn" style="width: 600px;"/>

References:
- [Long Short Term Memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf), Sepp Hochreiter & Jurgen Schmidhuber, Neural Computation 9(8): 1735-1780, 1997.

## MNIST Dataset Overview

This example uses the MNIST handwritten digits. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with pixel values from 0 to 1. For simplicity, each image has been flattened and converted to a 1-D numpy array of 784 features (28*28).

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

To classify images using a recurrent neural network, we treat each image row as one timestep in a sequence. Because an MNIST image is 28*28 pixels, every sample becomes a single sequence of 28 timesteps, each carrying 28 pixel values.

More info: http://yann.lecun.com/exdb/mnist/
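The row-as-timestep view can be sketched with NumPy alone. The snippet below is only an illustration: it uses a synthetic 784-element vector in place of a real MNIST image, and the names `flat_image`, `sequence` and `rows` are hypothetical, not part of the notebook.

```python
import numpy as np

# Synthetic stand-in for one flattened MNIST image (784 values in [0, 1]).
flat_image = np.linspace(0.0, 1.0, 784, dtype=np.float32)

# Restore the 2-D layout: 28 timesteps (image rows) of 28 features (pixels per row).
sequence = flat_image.reshape(28, 28)

# Split along the time axis so there is one array per timestep.
rows = [sequence[t] for t in range(sequence.shape[0])]

print(sequence.shape)            # (28, 28)
print(len(rows), rows[0].shape)  # 28 (28,)
```

Timestep `t` holds pixels `28*t` through `28*t + 27` of the flattened image, so the network reads the digit from top to bottom, one row at a time.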

In [1]:
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import rnn

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# os.path is used below to build the checkpoint and log paths
import os

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [2]:
base_model_path = "Models"
log_dir = os.path.join(base_model_path, "logdir")
model_name = "recurrent_network"

In [3]:
# Training Parameters
learning_rate = 0.001
training_steps = 10000
batch_size = 128
display_step = 200

# Network Parameters
num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # timesteps
num_hidden = 128 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)
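As a rough sanity check on these settings, the arithmetic below estimates the number of trainable parameters and the number of training samples processed. It assumes a standard single-layer LSTM with four gates and no peepholes or projection layer; the names `lstm_params`, `output_params` and `samples_seen` are introduced here for illustration only.

```python
# Values copied from the parameter cell above.
num_input, num_hidden, num_classes = 28, 128, 10
training_steps, batch_size = 10000, 128

# One LSTM layer: four gates, each with weights over [input, hidden] plus a bias.
lstm_params = 4 * ((num_input + num_hidden) * num_hidden + num_hidden)

# Output projection from the last hidden state to the class logits.
output_params = num_hidden * num_classes + num_classes

# Total samples drawn over the whole run (with repetition across epochs).
samples_seen = training_steps * batch_size

print(lstm_params)                  # 80384
print(output_params)                # 1290
print(lstm_params + output_params)  # 81674
print(samples_seen)                 # 1280000
```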

In [55]:
RNN_GRAPH = tf.Graph()
with RNN_GRAPH.as_default():
    
    # tf Graph input
    with tf.variable_scope("Input"): 
        X = tf.placeholder("float", [None, timesteps, num_input], name="input_data")
        Y = tf.placeholder("float", [None, num_classes], name="input_labels") 
    
    def RNN(x):
        with tf.variable_scope("RNN"):
            with tf.variable_scope("Output_Weights"):
                # Define weights and biases
                weights = tf.Variable(tf.random_normal([num_hidden, num_classes]), name="Output_Weight_Matrix")
                
                # weights summary:
                weights_summary = tf.summary.histogram("RNN_output_weights", weights)
                
                biases = tf.Variable(tf.random_normal([num_classes]), name="Output_biases")
                
            # Prepare data shape to match `rnn` function requirements
            # Current data input shape: (batch_size, timesteps, num_input)
            # Required shape: 'timesteps' tensors list of shape (batch_size, num_input)

            # Unstack to get a list of 'timesteps' tensors of shape (batch_size, num_input)
            x = tf.unstack(x, timesteps, 1)

            # Get lstm cell output
            outputs, states = rnn.static_rnn(rnn.BasicLSTMCell(num_hidden, forget_bias=1.0), x, dtype=tf.float32)
            
            # Linear activation, using rnn inner loop last output
            return tf.matmul(outputs[-1], weights) + biases
    
    logits = RNN(X)
    prediction = tf.nn.softmax(logits, name="prediction")

    # Define loss and optimizer
    with tf.variable_scope("Loss"):
        loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
        
        loss_summary = tf.summary.scalar("Loss", loss_op)
    
    train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss_op, name="train_op")

    # Evaluate model: fraction of samples whose most probable class matches the label
    with tf.variable_scope("Accuracy_calculation"):
        correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1), name="correct_predictions")
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name="accuracy")
        
        acc_summary = tf.summary.scalar("Accuracy", accuracy)
    
    all_summaries = tf.summary.merge_all()
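
To make the data flow through `RNN(x)` easier to follow, here is a minimal NumPy sketch of the same forward pass: step through the timesteps, keep only the last hidden state, then apply the linear output layer. It follows the standard LSTM equations with a forget-gate bias of 1.0. The gate ordering, the random weights and the helper names `sigmoid` and `lstm_last_output` are assumptions for illustration, not the TensorFlow implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_output(x, kernel, bias, forget_bias=1.0):
    """Run a single-layer LSTM over x of shape (batch, timesteps, num_input)
    and return the hidden state after the final timestep."""
    batch = x.shape[0]
    num_hidden = bias.shape[0] // 4
    h = np.zeros((batch, num_hidden))  # hidden state
    c = np.zeros((batch, num_hidden))  # cell state
    for t in range(x.shape[1]):
        # One joint linear map of [input, previous hidden] to four gate pre-activations.
        gates = np.dot(np.concatenate([x[:, t, :], h], axis=1), kernel) + bias
        i, j, f, o = np.split(gates, 4, axis=1)
        # Forget part of the old cell state, then add the gated candidate values.
        c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
        h = np.tanh(c) * sigmoid(o)
    return h

rng = np.random.default_rng(0)
num_input, timesteps, num_hidden, num_classes = 28, 28, 128, 10

# Randomly initialised stand-ins for the trained variables.
kernel = rng.normal(scale=0.1, size=(num_input + num_hidden, 4 * num_hidden))
bias = np.zeros(4 * num_hidden)
weights = rng.normal(size=(num_hidden, num_classes))
biases = rng.normal(size=num_classes)

x = rng.random((2, timesteps, num_input))  # two synthetic "images"
logits = np.dot(lstm_last_output(x, kernel, bias), weights) + biases
print(logits.shape)  # (2, 10)
```

Only the final hidden state reaches the output layer, which mirrors the use of `outputs[-1]` in the graph above.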

In [57]:
# Start training
with tf.Session(graph=RNN_GRAPH) as sess:
    # define the paths required
    log_path = os.path.join(log_dir, model_name)
    model_path = os.path.join(base_model_path, model_name)
    
    # create the tensorboard writer and perform summary generation
    tensorboard_writer = tf.summary.FileWriter(log_path, graph=sess.graph, filename_suffix=".bot")
    
    # create the saver object for saving and restoring the model
    saver = tf.train.Saver(max_to_keep=2)

    if os.path.isfile(os.path.join(model_path, "checkpoint")):
        # restore the model
        saver.restore(sess, tf.train.latest_checkpoint(model_path))
        
    else: 
        # run the global variable initializer
        sess.run(tf.global_variables_initializer())

    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss, accuracy and the summaries
            loss, acc, summ = sess.run([loss_op, accuracy, all_summaries], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            # dump the collected summaries:
            tensorboard_writer.add_summary(summ, global_step=step)
            
            # save the model so far:
            saver.save(sess, os.path.join(model_path, "model"), global_step=step)
            
            print("Step {}, Minibatch Loss= {:.4f}, Training Accuracy= {:.3f}".format(
                step, loss, acc))

    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))

Step 1, Minibatch Loss= 2.6426, Training Accuracy= 0.086
Step 200, Minibatch Loss= 0.1613, Training Accuracy= 0.945
Step 400, Minibatch Loss= 0.1476, Training Accuracy= 0.938
Step 600, Minibatch Loss= 0.0911, Training Accuracy= 0.969
Step 800, Minibatch Loss= 0.1308, Training Accuracy= 0.953
Step 1000, Minibatch Loss= 0.0803, Training Accuracy= 0.969
Step 1200, Minibatch Loss= 0.1046, Training Accuracy= 0.977
Step 1400, Minibatch Loss= 0.1270, Training Accuracy= 0.961
Step 1600, Minibatch Loss= 0.0197, Training Accuracy= 1.000
Step 1800, Minibatch Loss= 0.0389, Training Accuracy= 0.969
Step 2000, Minibatch Loss= 0.0297, Training Accuracy= 0.984
Step 2200, Minibatch Loss= 0.0452, Training Accuracy= 0.977
Step 2400, Minibatch Loss= 0.0874, Training Accuracy= 0.984
Step 2600, Minibatch Loss= 0.0823, Training Accuracy= 0.984
Step 2800, Minibatch Loss= 0.0314, Training Accuracy= 0.984
Step 3000, Minibatch Loss= 0.0077, Training Accuracy= 1.000
Step 3200, Minibatch Loss= 0.0238, Training Acc