# Simple RNN Models for Sequential MNIST with TensorFlow

Adapted by Sungwon Kim from the work of [Ceshine Lee](https://medium.com/the-artificial-impostor/notes-understanding-tensorflow-part-2-f7e5ece849f5), [Aymeric Damien](https://github.com/aymericdamien/TensorFlow-Examples/), and [Sungjoon](https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/rnn_mnist_simple.ipynb)


## RNN Overview

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" alt="nn" style="width: 600px;"/>

References:
- [Long Short Term Memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf), Sepp Hochreiter & Jurgen Schmidhuber, Neural Computation 9(8): 1735-1780, 1997.

## MNIST Dataset Overview

This example uses the MNIST dataset of handwritten digits. It contains 60,000 training examples and 10,000 test examples. `read_data_sets` holds out 5,000 of the training images as a validation set, so `mnist.train` contains 55,000. The digits have been size-normalized and centered in a fixed-size 28x28-pixel image, with pixel values scaled to the range 0 to 1. For simplicity, each image has been flattened into a 1-D NumPy array of 784 features (28*28).

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

To classify images with a recurrent neural network, we treat each image as a sequence of rows. Each of the 28 rows is one timestep, and its input is that row's 28 pixel values. Every sample therefore becomes a sequence of 28 timesteps with 28 features each.
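
Below is a minimal NumPy sketch of this row-by-row view. It uses a random stand-in batch instead of real MNIST data. Reshaping each flat 784-feature vector to `(28, 28)` turns image row `t` into the input at timestep `t`.

```python
import numpy as np

# Random stand-in for a batch of two flattened MNIST images (784 values in [0, 1)).
batch = np.random.rand(2, 784).astype(np.float32)

rows, cols = 28, 28
# Row-by-row view: timestep t is image row t, a vector of 28 pixel values.
seq = batch.reshape(-1, rows, cols)
print(seq.shape)  # (2, 28, 28)

# Row t of the sequence is exactly pixels [28*t, 28*t + 28) of the flat vector.
t = 5
assert np.array_equal(seq[0, t], batch[0, 28 * t:28 * t + 28])
```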

More info: http://yann.lecun.com/exdb/mnist/

## Imports & Load MNIST

In [4]:
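from pathlib import Path
import random
import time

# This notebook targets TensorFlow 1.x: tf.contrib and
# tensorflow.examples.tutorials were removed in TensorFlow 2.x.
import tensorflow as tf
from tensorflow.contrib import rnn, cudnn_rnn
from tensorflow.keras import layers


import numpy as np

# Import MNIST data (downloaded to data/MNIST/ on first run); labels are one-hot encoded
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data/MNIST/", one_hot=True)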

Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting data/MNIST/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting data/MNIST/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting data/MNIST/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/MNIST/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [5]:
# ONLY for students using Google Colab
!pip install matplotlib



In [6]:
import matplotlib.pyplot as plt

## Row-by-row Sequential MNIST

How an RNN model processes MNIST row by row (figure from [Sungjoon's notebook](https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/rnn_mnist_simple.ipynb)):
![](https://raw.githubusercontent.com/sjchoi86/Tensorflow-101/582cc1d946f0ecbce078e493b8ccb1d7b40684df/notebooks/images/etc/rnn_mnist_look.jpg)

## Configuration of Neural Network

In [7]:
# Training Parameters
learning_rate = 0.001
training_steps = 3000
batch_size = 128
display_step = 200

# Network Parameters
num_input = 28 # MNIST data input: one image row of 28 pixels per timestep
timesteps = 28 # timesteps (the number of rows)
num_hidden = 64 # number of hidden units in each LSTM layer
num_classes = 10 # MNIST total classes (0-9 digits)
num_layers = 1 # number of stacked LSTM layers

## TensorFlow Graph

### Placeholders

In [8]:
# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

### RNN

In [11]:
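# 1. Static RNN version
# Note: re-running this cell without tf.reset_default_graph() (and re-creating
# the placeholders) raises "Variable ... already exists".
# Unstack to get a list of 'timesteps' tensors of shape (batch_size, num_input)
input_x = tf.unstack(X, timesteps, 1)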

print(len(input_x), input_x[0].shape)

assert num_layers > 0
# Define a lstm cell with tensorflow
if num_layers == 1:
    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)
else:
    cells = [rnn.BasicLSTMCell(num_hidden, forget_bias=1.0) for _ in range(num_layers)]
    lstm_cell = rnn.MultiRNNCell(cells)

# Get lstm cell output
outputs, states = rnn.static_rnn(lstm_cell, input_x, dtype=tf.float32)

print(len(outputs), outputs[0].shape)

28 (?, 28)




In [19]:
# For study: inspect the shapes of the RNN outputs and final states
print(len(outputs), outputs[0].shape)
print(len(states), states[0].shape, states[1].shape)
print(outputs[-1], num_classes)

28 (?, 64)
2 (?, 64) (?, 64)
Tensor("rnn/basic_lstm_cell/Mul_83:0", shape=(?, 64), dtype=float32) 10
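
These shapes come from how an LSTM cell unrolls. At every timestep it emits its hidden state `h`. Its final state is the pair `(c, h)` (TensorFlow's `LSTMStateTuple`), so `outputs[-1]` holds the same values as the final `h`. The NumPy sketch below shows this with random weights and a batch of 4. The `lstm_step` helper and its gate layout are illustrative assumptions, not TensorFlow's internal implementation of `BasicLSTMCell`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, c, h, W, b, forget_bias=1.0):
    """Hypothetical single LSTM step; W maps [x, h] to four stacked gate pre-activations."""
    z = np.concatenate([x, h], axis=1) @ W + b
    i, j, f, o = np.split(z, 4, axis=1)  # input gate, candidate, forget gate, output gate
    c_new = sigmoid(f + forget_bias) * c + sigmoid(i) * np.tanh(j)
    h_new = sigmoid(o) * np.tanh(c_new)
    return c_new, h_new

rng = np.random.default_rng(0)
n_batch, n_steps, n_in, n_hidden = 4, 28, 28, 64
x_seq = rng.random((n_batch, n_steps, n_in))
W = rng.normal(scale=0.1, size=(n_in + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)

c = np.zeros((n_batch, n_hidden))
h = np.zeros((n_batch, n_hidden))
h_outputs = []
for t in range(n_steps):          # one step per image row, like static_rnn
    c, h = lstm_step(x_seq[:, t, :], c, h, W, b)
    h_outputs.append(h)           # the output at each step is the hidden state h_t

final_state = (c, h)              # analogous to LSTMStateTuple(c, h)
print(len(h_outputs), h_outputs[0].shape)                        # 28 (4, 64)
print(len(final_state), final_state[0].shape, final_state[1].shape)  # 2 (4, 64) (4, 64)
assert np.array_equal(h_outputs[-1], final_state[1])
```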


### Fully-connected layer

In [20]:
# Linear layer (no activation) mapping the RNN output at the last timestep to class logits
logits = tf.layers.dense(outputs[-1], num_classes)

Instructions for updating:
Use keras.layers.dense instead.


### Define Loss, Optimizer, and Evaluation Metrics

In [0]:
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=Y))
optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model (this model has no dropout, so the same logits serve for training and testing)
prediction = tf.nn.softmax(logits)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
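
The loss and accuracy ops come down to a few lines of arithmetic. This NumPy sketch runs them on two hand-picked examples, not taken from MNIST. The loss is the batch mean of `-sum_k y_k log softmax(logits)_k`, which is what `softmax_cross_entropy_with_logits_v2` followed by `reduce_mean` computes. Softmax preserves ordering, so the argmax of `prediction` picks the same class as the argmax of `logits`. The helper names are hypothetical.

```python
import numpy as np

def np_softmax(z):
    # Subtracting the row max improves numerical stability without changing the result.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def np_cross_entropy(z, y):
    # Batch mean of -sum_k y_k * log softmax(z)_k, computed via a stable log-softmax.
    shifted = z - z.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(np.mean(-(y * log_probs).sum(axis=1)))

def np_accuracy(z, y):
    # Fraction of rows whose predicted class matches the one-hot label.
    pred = np.argmax(np_softmax(z), axis=1)
    return float(np.mean(pred == np.argmax(y, axis=1)))

demo_logits = np.array([[2.0, 0.5, -1.0],
                        [0.1, 0.2, 3.0]])
demo_labels = np.array([[1.0, 0.0, 0.0],   # one-hot, like Y
                        [0.0, 1.0, 0.0]])
print(np_accuracy(demo_logits, demo_labels))  # 0.5
print(np_cross_entropy(demo_logits, demo_labels))
```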

## Train RNN


In [22]:
# For study: check the reshape from flat vectors to 28 rows of 28 pixels
batch_x, batch_y = mnist.train.next_batch(batch_size)
print(batch_x.shape)
batch_x = batch_x.reshape(batch_size, 28, num_input)
print(batch_x.shape)

(128, 784)
(128, 28, 28)


In [0]:
# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(tf.global_variables_initializer())

    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 rows (sequences) of 28 pixels each
        batch_x = batch_x.reshape((batch_size, 28, num_input))
        # Keep only the first `timesteps` rows (all of them when timesteps == 28)
        batch_x = batch_x[:, :timesteps, :]
        if step == 1:
            plt.imshow(batch_x[0]);
            plt.show()
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, 28, num_input))
    test_data = test_data[:, :timesteps, :]
    
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))
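
The test above uses only the first 128 of the 10,000 test images, so the reported accuracy is a rough estimate. The sketch below shows one way to score the whole test set in batches. `batched_accuracy` and its `predict_fn` argument are hypothetical. In the notebook, `predict_fn` could wrap a `sess.run` call on `tf.argmax(logits, 1)`, with inputs reshaped to `(-1, timesteps, num_input)`.

```python
import numpy as np

def batched_accuracy(predict_fn, images, labels, batch_size=1000):
    """Accuracy over a whole dataset, accumulated batch by batch.

    predict_fn is a hypothetical callable mapping a batch of inputs
    to predicted class indices.
    """
    correct = 0
    for start in range(0, len(images), batch_size):
        x = images[start:start + batch_size]
        y = labels[start:start + batch_size]
        # Count correct predictions instead of averaging per-batch accuracies,
        # so a smaller final batch is weighted correctly.
        correct += int(np.sum(predict_fn(x) == np.argmax(y, axis=1)))
    return correct / len(images)

# Toy check with a stand-in "model" that always predicts class 0.
demo_labels = np.eye(10)[[0, 0, 1, 2, 0]]   # one-hot labels, like mnist.test.labels
demo_images = np.zeros((5, 28, 28))
always_zero = lambda x: np.zeros(len(x), dtype=int)
print(batched_accuracy(always_zero, demo_images, demo_labels, batch_size=2))  # 0.6
```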

## LSTM using dynamic RNN

## Configuration of Neural Networks

In [0]:
# Training Parameters
learning_rate = 0.001 #0.02
training_steps = 5000
batch_size = 128
display_step = 200

# Network Parameters
num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # max timesteps
num_hidden = 64 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)
num_layers = 2 # the number of LSTM's layers

## TensorFlow Graph

In [0]:
tf.reset_default_graph()

### Placeholders

In [0]:
# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

### RNN



In [0]:
"""
# previous static rnn version
# Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
input_x = tf.unstack(X, timesteps, 1)

assert num_layers > 0
# Define a lstm cell with tensorflow
if num_layers == 1:
    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)
else:
    cells = [rnn.BasicLSTMCell(num_hidden, forget_bias=1.0) for _ in range(num_layers)]
    lstm_cell = rnn.MultiRNNCell(cells)

# Get lstm cell output
outputs, states = rnn.static_rnn(lstm_cell, input_x, dtype=tf.float32)
"""

assert num_layers > 0
if num_layers == 1:
    # Define an LSTM cell with TensorFlow
    lstm_cell = rnn.LSTMBlockCell(num_hidden, forget_bias=1.0)
else:
    cells = [rnn.LSTMBlockCell(num_hidden, forget_bias=1.0) for _ in range(num_layers)]
    lstm_cell = rnn.MultiRNNCell(cells)


# Get LSTM cell output
# time_major=True  --> inputs: [timesteps, batch_size, num_input], outputs: [timesteps, batch_size, num_hidden]
# time_major=False --> inputs: [batch_size, timesteps, num_input], outputs: [batch_size, timesteps, num_hidden]
outputs, states = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=X, time_major=False, dtype=tf.float32)
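
This notebook uses three output layouts that hold the same numbers in different arrangements. The NumPy sketch below uses random stand-in outputs, with sizes chosen only for illustration. It shows how the last timestep is selected in each layout: `outputs[:, -1, :]` for batch-major `dynamic_rnn` output, `outputs[-1]` for time-major output, and `outputs[-1]` again for the Python list that `static_rnn` returns.

```python
import numpy as np

n_batch, n_steps, n_hidden = 3, 28, 64
rng = np.random.default_rng(1)

# Batch-major outputs, the layout dynamic_rnn returns with time_major=False.
outputs_bm = rng.random((n_batch, n_steps, n_hidden))
# The same values in time-major layout (time_major=True, or cuDNN RNN layers).
outputs_tm = np.transpose(outputs_bm, (1, 0, 2))
# static_rnn layout: a Python list with one (batch, hidden) array per timestep.
outputs_list = [outputs_bm[:, t, :] for t in range(n_steps)]

print(outputs_tm.shape)  # (28, 3, 64)
# "Last timestep" is indexed differently in each layout but selects the same data.
last_step = outputs_bm[:, -1, :]
assert np.array_equal(last_step, outputs_tm[-1, :, :])
assert np.array_equal(last_step, outputs_list[-1])
```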

### Fully-connected layer

In [0]:
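# Linear layer (no activation) on the RNN output at the last timestep
# outputs : [batch_size, timesteps, num_hidden]
# logits : [batch_size, num_classes]
# Note: batch_normalization defaults to training=False, so it normalizes with its
# moving mean/variance (initialized to 0 and 1) and acts only as a learned affine map.
# True batch normalization needs training=True plus running tf.GraphKeys.UPDATE_OPS.
logits = tf.layers.dense(tf.layers.batch_normalization(outputs[:, -1, :]),
                         num_classes,
                         kernel_initializer=tf.orthogonal_initializer())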
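
`tf.layers.batch_normalization` defaults to `training=False`. In that mode it normalizes with its stored moving mean and variance, which start at 0 and 1, instead of with the statistics of the current batch. The NumPy sketch below contrasts the two modes. It uses synthetic activations and the layer's default `epsilon=1e-3`; the `bn_sketch` helper is hypothetical.

```python
import numpy as np

def bn_sketch(a, gamma, beta, moving_mean, moving_var, training, eps=1e-3):
    """Hypothetical batch normalization over axis 0 (the batch axis)."""
    if training:
        mean, var = a.mean(axis=0), a.var(axis=0)   # statistics of this batch
    else:
        mean, var = moving_mean, moving_var         # stored running statistics
    return gamma * (a - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(2)
acts = rng.normal(loc=5.0, scale=3.0, size=(128, 64))   # synthetic last-step RNN outputs
gamma, beta = np.ones(64), np.zeros(64)
moving_mean, moving_var = np.zeros(64), np.ones(64)     # freshly initialized statistics

y_train = bn_sketch(acts, gamma, beta, moving_mean, moving_var, training=True)
y_infer = bn_sketch(acts, gamma, beta, moving_mean, moving_var, training=False)

print(np.allclose(y_train.mean(axis=0), 0.0))  # True: each feature was centered
# With untouched moving statistics, inference mode only rescales by 1/sqrt(1 + eps).
assert np.allclose(y_infer, acts / np.sqrt(1 + 1e-3))
```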

### Define Loss, Optimizer, and Evaluation Metrics

In [0]:
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=Y))
optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model (this model has no dropout, so the same logits serve for training and testing)
prediction = tf.nn.softmax(logits)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
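
`tf.train.RMSPropOptimizer` divides each parameter's step by a running root-mean-square of that parameter's recent gradients. The sketch below follows the non-centered update in the TensorFlow 1.x documentation, with the defaults `decay=0.9`, `momentum=0.0` and `epsilon=1e-10`. Starting the running mean square at zero is an assumption of this sketch, and TensorFlow's own slot initialization may differ. Under that assumption, both parameters move by about `lr / sqrt(0.1)` on the first step even though their gradients differ by a factor of 5.

```python
import numpy as np

def rmsprop_step(var, grad, ms, mom, lr=0.001, decay=0.9, momentum=0.0, eps=1e-10):
    """One non-centered RMSProp update, following the formula in the
    TensorFlow 1.x tf.train.RMSPropOptimizer documentation."""
    ms = decay * ms + (1 - decay) * grad ** 2             # running mean of squared gradients
    mom = momentum * mom + lr * grad / np.sqrt(ms + eps)  # scaled step (plus optional momentum)
    return var - mom, ms, mom

params = np.array([1.0, -2.0])
demo_grad = np.array([0.5, -0.1])   # gradients differing in size by a factor of 5
ms0 = np.zeros(2)                   # assumption: running mean square starts at zero
mom0 = np.zeros(2)
params, ms1, mom1 = rmsprop_step(params, demo_grad, ms0, mom0, lr=0.01)
print(mom1)  # both steps are about 0.01 / sqrt(0.1) in magnitude
```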

## Train RNN


In [0]:
# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(tf.global_variables_initializer())
    start_time = time.time()
    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))

        if step == 1:
            plt.imshow(batch_x[0]);
            plt.show()
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                               Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                "{:.4f}".format(loss) + ", Training Accuracy= " + \
                "{:.3f}".format(acc) + ", {} steps Elapsed time = {:.2f}".format(display_step, time.time() - start_time))
            start_time = time.time()

    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))

    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))

## Pixel-by-pixel Sequential MNIST

Treat every image as a sequence of 784 timesteps, with a single pixel (one input feature) per timestep.
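
The flattened MNIST vectors are in row-major order, so timestep `t` of the pixel sequence is the pixel at row `t // 28`, column `t % 28`. The model has to carry information across up to 784 steps instead of 28. The NumPy sketch below checks this mapping on a stand-in image whose values equal their flat indices. `timestep_to_pixel` is a hypothetical helper.

```python
import numpy as np

image = np.arange(28 * 28).reshape(28, 28)  # stand-in image whose values are their flat indices
flat = image.reshape(-1)                     # row-major flattening, as in the 784-feature vectors
pixel_seq = flat.reshape(784, 1)             # pixel-by-pixel: 784 timesteps, 1 input each

def timestep_to_pixel(t, width=28):
    """Map a timestep of the pixel sequence back to (row, column) in the image."""
    return divmod(t, width)

row, col = timestep_to_pixel(100)
print(row, col)  # 3 16
assert pixel_seq[100, 0] == image[row, col]
```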

## Configuration of Neural Networks

In [0]:
# Training Parameters
learning_rate = 0.002
training_steps = 5000
batch_size = 32
display_step = 200

# Network Parameters
num_input = 1 # one pixel per timestep
timesteps = 784 # 28*28 pixels per image
num_hidden = 128 # number of hidden units in each recurrent layer
num_classes = 10 # MNIST total classes (0-9 digits)
num_layers = 1 # number of stacked recurrent layers
use_cudnn = True # True: cuDNN GRU (requires an NVIDIA GPU); False: LSTMBlockCell with dynamic_rnn

## TensorFlow Graph

In [0]:
tf.reset_default_graph()

### Placeholder

In [0]:
# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

### RNN

In [0]:
if use_cudnn:
    # CudnnGRU expects time-major input [timesteps, batch_size, num_input],
    # so X is transposed from [batch_size, timesteps, num_input] first.
    # outputs: [timesteps, batch_size, num_hidden]
    gru = cudnn_rnn.CudnnGRU(num_layers, num_hidden, kernel_initializer=tf.orthogonal_initializer())
    outputs, _ = gru(tf.transpose(X, (1, 0, 2)))

else:
    assert num_layers > 0
    if num_layers == 1:
        # Define an LSTM cell with TensorFlow
        lstm_cell = rnn.LSTMBlockCell(num_hidden, forget_bias=1.0)
    else:
        cells = [rnn.LSTMBlockCell(num_hidden, forget_bias=1.0) for _ in range(num_layers)]
        lstm_cell = rnn.MultiRNNCell(cells)

    # Get LSTM cell output
    # time_major=True  --> inputs: [timesteps, batch_size, num_input], outputs: [timesteps, batch_size, num_hidden]
    # time_major=False --> inputs: [batch_size, timesteps, num_input], outputs: [batch_size, timesteps, num_hidden]
    outputs, states = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=X, time_major=False, dtype=tf.float32)

### Fully-connected layer

In [0]:
# Linear layer (no activation) on the RNN output at the last timestep
# outputs : [timesteps, batch_size, num_hidden] (cuDNN) or [batch_size, timesteps, num_hidden] (dynamic_rnn)
# logits : [batch_size, num_classes]
# Note: batch_normalization defaults to training=False (moving mean 0, variance 1),
# so here it acts as a learned affine map rather than true batch normalization.
if use_cudnn:
    logits = tf.layers.dense(tf.layers.batch_normalization(outputs[-1, :, :]),
                             num_classes,
                             kernel_initializer=tf.orthogonal_initializer())
else:
    logits = tf.layers.dense(tf.layers.batch_normalization(outputs[:, -1, :]),
                             num_classes,
                             kernel_initializer=tf.orthogonal_initializer())

### Define Loss, Optimizer, and Evaluation Metrics

In [0]:
with tf.variable_scope('optimizer'):
    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=Y))
    # Gradient clipping to avoid exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss_op, tvars), 1.)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_op = optimizer.apply_gradients(zip(grads, tvars))

with tf.variable_scope('evaluate'):
    # Evaluate model (this model has no dropout, so the same logits serve for training and testing)
    prediction = tf.nn.softmax(logits)
    correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
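
Gradient clipping by global norm treats all gradients as one long vector. If that vector's Euclidean norm exceeds `clip_norm` (1.0 in the cell above), every gradient is scaled by the same factor `clip_norm / global_norm`, so the update direction stays the same. Below is a NumPy sketch of the rule stated in the `tf.clip_by_global_norm` documentation; the helper name is hypothetical.

```python
import numpy as np

def clip_by_global_norm_np(grads, clip_norm):
    """NumPy version of the rule documented for tf.clip_by_global_norm:
    every gradient is multiplied by clip_norm / max(global_norm, clip_norm)."""
    global_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# Two gradient tensors whose combined norm is sqrt(3^2 + 4^2) = 5.
demo_grads = [np.array([3.0, 0.0]), np.array([[4.0]])]
clipped, norm = clip_by_global_norm_np(demo_grads, clip_norm=1.0)
print(norm)  # 5.0
# Both tensors are scaled by the same factor 1/5, so the direction is preserved.
print(clipped[0], clipped[1])
```

Gradients whose global norm is already at or below `clip_norm` pass through unchanged.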

## Train RNN


In [0]:
# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(tf.global_variables_initializer())
    start_time = time.time()
    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))

        if step == 1:
            plt.imshow(batch_x[0].reshape(28, 28));
            plt.show()
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                "{:.4f}".format(loss) + ", Training Accuracy= " + \
                "{:.3f}".format(acc) + ", {} steps Elapsed time = {:.2f}".format(display_step, time.time() - start_time))
            start_time = time.time()

    print("Optimization Finished!")

    
    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))


    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))