# RNN for MNIST with TensorFlow and Keras
---

# Overview Notes

In problems involving ordered sequences of data, such as **time series forecasting** and **natural language processing**, context is valuable for predicting the output. The context for such problems can be determined by ingesting the whole sequence, not just the last data point. Thus, the previous output becomes part of the current input, and when this is repeated, the last output is a function of all the previous inputs along with the last input.

**Recurrent Neural Network (RNN)** is a specialized neural network architecture for machine learning problems that involve sequential data. The sequential data could be a sequence of observations over a period of time, as in time series data, or a sequence of characters, words, and sentences, as in textual data.

One of the assumptions for the *standard* neural network is that the input data is arranged in a way that *one input has no dependency on another*. However, for time series data and textual data, this assumption **does not hold true**, since the values appearing later in the sequence are often influenced by the values that appeared before.

To handle such dependencies, RNN extends the standard neural network in the following ways:
* RNN adds the ability to use the output of one layer as an input to the same or a previous layer, by adding *loops or cycles* in the computation graph.
* RNN adds **a memory unit** to store previous inputs and outputs that can be used in the current computation.
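
The two extensions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the TensorFlow or Keras implementation: the function name `simple_rnn_forward`, the weight names `W_x`, `W_h`, `b`, and the choice of `tanh` as the activation are assumptions made for the example.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a simple recurrent cell over one sequence.

    inputs: array of shape (timesteps, n_features)
    W_x:    input-to-hidden weights, shape (n_features, n_hidden)
    W_h:    hidden-to-hidden weights (the "loop"), shape (n_hidden, n_hidden)
    b:      bias, shape (n_hidden,)
    Returns the hidden state after every time step, shape (timesteps, n_hidden).
    """
    n_hidden = W_h.shape[0]
    h = np.zeros(n_hidden)  # the "memory", empty before the first step
    states = []
    for x_t in inputs:
        # The previous state h is fed back in alongside the current input.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
timesteps, n_features, n_hidden = 5, 3, 4
x = rng.normal(size=(timesteps, n_features))
W_x = rng.normal(size=(n_features, n_hidden))
W_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

states = simple_rnn_forward(x, W_x, W_h, b)
print(states.shape)  # (5, 4)
```

Because `h` is carried from one iteration to the next, the last row of `states` depends on every input in the sequence, not only on the last one.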

In this chapter, we cover the following topics to learn about RNN:
* Simple Recurrent Neural Networks
* RNN variants
    * Long Short-Term Memory networks (LSTM)
    * Gated Recurrent Unit networks (GRU)
* TensorFlow for RNN
* Keras for RNN
* RNN in Keras for MNIST data

#### **\[📚\] For detailed discussion and computational graphs of RNN, LSTM, GRU, see Book, Chapter 6.**

### Workflow:
The basic workflow for creating RNN models in the low-level TensorFlow library is almost the same as for an MLP:
1. First create the input and output placeholders of shape `(None, #TimeSteps, #Features)` or `(BatchSize, #TimeSteps, #Features)`
2. From the input placeholder, create a list of length `#TimeSteps`, containing tensors of shape `(None, #Features)` or `(BatchSize, #Features)`
3. Create a cell of the desired RNN type from the `tf.nn.rnn_cell` module
4. Use the cell and the input tensor list created previously to create a static or dynamic RNN
5. Create the output weights and bias variables, and define the loss and optimizer functions
6. For the required number of epochs, train the model using the loss and optimizer functions

This **basic workflow** is demonstrated with example code in the **next chapter**.
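
Step 2 of the workflow is the one most likely to cause shape confusion, so here is a small NumPy sketch of it. It only illustrates the reshaping; in a TF1 graph the corresponding operation would be `tf.unstack(x, axis=1)` on the input placeholder. The array `batch` and its sizes are made up for the example.

```python
import numpy as np

batch_size, n_timesteps, n_features = 4, 28, 28

# Stand-in for a batch fed to the input placeholder: (batch, timesteps, features).
batch = np.zeros((batch_size, n_timesteps, n_features), dtype=np.float32)

# Move the time axis to the front, then split it into a Python list.
# Each element has shape (batch, features) and holds one time step for every sample.
steps = list(np.transpose(batch, (1, 0, 2)))

print(len(steps))      # 28
print(steps[0].shape)  # (4, 28)
```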

#### **\[📚\] For details on building-block classes available in TF1 for various RNNs, see Book, Chapter 6.**

In general, there are **3 areas of interest**:
* TensorFlow **RNN Cell** Classes: 
    * `tf.nn.rnn_cell`
    * `tf.contrib.rnn` (in the deprecated `contrib` module, which was removed in TensorFlow 2.x)
* TensorFlow **RNN Model Construction** Classes:
    * The **static** RNN classes add unrolled cells for time steps **at graph-construction (compile) time**,
    * while the **dynamic** RNN classes add unrolled cells for time steps **at run time**.
    * These are found at:
        * `tf.nn.static_rnn`
        * `tf.nn.static_state_saving_rnn`
        * `tf.nn.static_bidirectional_rnn`
        * `tf.nn.dynamic_rnn`
        * `tf.nn.bidirectional_dynamic_rnn`
        * `tf.nn.raw_rnn`
        * `tf.contrib.rnn.stack_bidirectional_dynamic_rnn`
* TensorFlow RNN Cell **Wrapper** Classes:
    * These **wrap other *cell* classes**.
    * Found at:
        * `tf.contrib.rnn.LSTMBlockWrapper`
        * `tf.contrib.rnn.DropoutWrapper`
        * `tf.contrib.rnn.EmbeddingWrapper`
        * `tf.contrib.rnn.InputProjectionWrapper`
        * `tf.contrib.rnn.OutputProjectionWrapper`
        * `tf.contrib.rnn.DeviceWrapper`
        * `tf.contrib.rnn.ResidualWrapper`
        
### Keras for RNN

* Keras offers both *functional* and *sequential* APIs for creating recurrent networks.
* To build the RNN model, you add layers from the `keras.layers.recurrent` module (in `tf.keras`, import them from `tensorflow.keras.layers` instead, as the implementation below does).
* Keras provides the following kinds of recurrent layers:
    * SimpleRNN
    * LSTM
    * GRU


**Stateful Models (Keras)**

Keras recurrent layers also support RNN models that **save state *between the batches***. 

You can create a stateful RNN, LSTM, or GRU model by passing `stateful=True` to the layer. For stateful models, the batch size specified for the inputs *has to be a fixed value*. In stateful models, the final hidden state computed for each sample in a batch is *reused as the initial state for the sample at the same position in the next batch*.

If you want to reset the memory at some point during training, call `model.reset_states()` or `layer.reset_states()`.
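
What "saving state between batches" means can be shown with a conceptual NumPy sketch. It does not use Keras and does not reproduce its internals; `run_batch`, the weight names, and the sizes are hypothetical, chosen only to contrast a stateless pass with a stateful one.

```python
import numpy as np

def run_batch(batch, h0, W_x, W_h):
    """Run a simple tanh RNN over a batch, starting from state h0.

    batch: (batch_size, timesteps, n_features); h0: (batch_size, n_hidden).
    Returns the final hidden state, shape (batch_size, n_hidden).
    """
    h = h0
    for t in range(batch.shape[1]):
        h = np.tanh(batch[:, t, :] @ W_x + h @ W_h)
    return h

rng = np.random.default_rng(0)
batch_size, timesteps, n_features, n_hidden = 2, 3, 4, 5
W_x = rng.normal(size=(n_features, n_hidden))
W_h = rng.normal(size=(n_hidden, n_hidden))
batch_1 = rng.normal(size=(batch_size, timesteps, n_features))
batch_2 = rng.normal(size=(batch_size, timesteps, n_features))
zeros = np.zeros((batch_size, n_hidden))

# Stateless: every batch starts from a zero state.
h_stateless = run_batch(batch_2, zeros, W_x, W_h)

# Stateful: batch 2 starts from the state left behind by batch 1,
# so sample i of batch 2 continues sample i of batch 1.
h_after_1 = run_batch(batch_1, zeros, W_x, W_h)
h_stateful = run_batch(batch_2, h_after_1, W_x, W_h)

# Equivalent to processing the two batches as one longer sequence.
h_long = run_batch(np.concatenate([batch_1, batch_2], axis=1), zeros, W_x, W_h)
print(np.allclose(h_stateful, h_long))  # True
```

The saved state has shape `(batch_size, n_hidden)`, which is why a stateful model needs a fixed batch size: the state from one batch must line up sample by sample with the next. Resetting the state corresponds to going back to `zeros`.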

---
# Implementation

Although RNN is mostly used for sequence data, it can also be used for image data.

We know that images have at least two dimensions: height and width. Now think of one of the dimensions as time steps and the other as features.

For MNIST, the image size is 28 x 28 pixels, thus **we can think of an MNIST image as having 28 time steps with 28 features in each timestep**.

In [2]:
import os

import numpy as np
np.random.seed(123)
print("NumPy:{}".format(np.__version__))

import tensorflow as tf
tf.set_random_seed(123)
print("TensorFlow:{}".format(tf.__version__))

import tensorflow.keras as keras
print("Keras:{}".format(keras.__version__))

NumPy:1.18.5
TensorFlow:1.15.5
Keras:2.2.4-tf


In [3]:
DATASETSLIB_HOME = '../datasetslib'

import sys
if DATASETSLIB_HOME not in sys.path:
    sys.path.append(DATASETSLIB_HOME)

%reload_ext autoreload
%autoreload 2

import datasetslib

datasetslib.datasets_root = os.path.join('../datasets')

## Get the MNIST data

In [4]:
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets(
    os.path.join(datasetslib.datasets_root, 'mnist'), 
    one_hot=True
)

X_train = mnist.train.images
X_test = mnist.test.images
Y_train = mnist.train.labels
Y_test = mnist.test.labels

n_classes = 10

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ../datasets/mnist/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ../datasets/mnist/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting ../datasets/mnist/t10k-images-idx3-ubyte.gz
Extracting ../datasets/mnist/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [7]:
print("type(X_train):", type(X_train))
print("X_train.shape:", X_train.shape)
print("type(Y_train):", type(Y_train))
print("Y_train.shape:", Y_train.shape)

type(X_train): <class 'numpy.ndarray'>
X_train.shape: (55000, 784)
type(Y_train): <class 'numpy.ndarray'>
Y_train.shape: (55000, 10)


## Preprocess for RNN

In [8]:
X_train = X_train.reshape(-1, 28, 28)
X_test = X_test.reshape(-1, 28, 28)
print("X_train.shape:", X_train.shape)

X_train.shape: (55000, 28, 28)


## RNN With `Keras` for MNIST Data

In [11]:
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.layers import SimpleRNN    # Note the importing of SimpleRNN. ⚠️ `keras.layers.recurrent` doesn't exist anymore, just `keras.layers` now.
from tensorflow.keras.optimizers import RMSprop  # Note the use of RMSprop optimizer

In [12]:
tf.reset_default_graph()
keras.backend.clear_session()

In [14]:
# Create and fit the SimpleRNN model.
model = Sequential()
model.add(SimpleRNN(units=16, activation='relu', input_shape=(28,28)))
# ^ Note:
# From https://keras.io/api/layers/recurrent_layers/simple_rnn/#simplernn-layer
# inputs: A 3D tensor, with shape [batch, timesteps, feature].
model.add(Dense(n_classes))
# ^ With the default return_sequences=False, SimpleRNN returns only the output
#   of the final time step, so Dense is applied to that (None, 16) tensor.
model.add(Activation('softmax'))

Instructions for updating:
If using Keras pass *_constraint arguments to layers.


In [15]:
model.compile(
    loss='categorical_crossentropy',
    optimizer=RMSprop(lr=0.01),
    metrics=['accuracy']
)

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn (SimpleRNN)       (None, 16)                720       
_________________________________________________________________
dense (Dense)                (None, 10)                170       
_________________________________________________________________
activation (Activation)      (None, 10)                0         
=================================================================
Total params: 890
Trainable params: 890
Non-trainable params: 0
_________________________________________________________________
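
The parameter counts in the summary can be checked by hand. The sketch below assumes the usual layout of a simple recurrent layer (an input kernel, a recurrent kernel, and a bias) and of a dense layer (a weight matrix and a bias); the helper names are hypothetical.

```python
def simple_rnn_params(n_features, units):
    # input-to-hidden kernel + hidden-to-hidden (recurrent) kernel + bias
    return n_features * units + units * units + units

def dense_params(n_inputs, n_outputs):
    # weight matrix + bias
    return n_inputs * n_outputs + n_outputs

rnn = simple_rnn_params(n_features=28, units=16)
dense = dense_params(n_inputs=16, n_outputs=10)
print(rnn, dense, rnn + dense)  # 720 170 890
```

The number of time steps (28) does not appear in the count, because the same weights are reused at every time step.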


In [16]:
model.fit(
    X_train, 
    Y_train,
    batch_size=100,
    epochs=20
)

score = model.evaluate(X_test, Y_test)
print('\nTest loss:', score[0])
print('Test accuracy:', score[1])

Train on 55000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

Test loss: 0.5015630922555924
Test accuracy: 0.8495
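
For reference, the two numbers reported by `model.evaluate` correspond to the `categorical_crossentropy` loss and the `accuracy` metric passed to `model.compile`. The following NumPy sketch shows how such values are computed from one-hot labels and predicted probabilities. Keras computes them internally; the helper functions and the three sample rows here are made up for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

# Three made-up samples over three classes (illustrative values only).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]])

print("accuracy:", accuracy(y_true, y_prob))
print("loss:", categorical_crossentropy(y_true, y_prob))
```

In this toy data the third sample is misclassified, so it lowers the accuracy and contributes the largest term to the loss.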
