<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#MNIST-classification-with-a-Long-Short-Term-Memory-(LSTM)-network" data-toc-modified-id="MNIST-classification-with-a-Long-Short-Term-Memory-(LSTM)-network-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>MNIST classification with a Long Short Term Memory (LSTM) network</a></span></li><li><span><a href="#Purpose" data-toc-modified-id="Purpose-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Purpose</a></span></li><li><span><a href="#Load-libraries-and-data" data-toc-modified-id="Load-libraries-and-data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Load libraries and data</a></span></li><li><span><a href="#Init-vars" data-toc-modified-id="Init-vars-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Init vars</a></span></li><li><span><a href="#Build-the-computational-graph" data-toc-modified-id="Build-the-computational-graph-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Build the computational graph</a></span><ul class="toc-item"><li><span><a href="#Static-vs-Dynamic-TensorFlow-RNNs" data-toc-modified-id="Static-vs-Dynamic-TensorFlow-RNNs-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Static vs Dynamic TensorFlow RNNs</a></span></li><li><span><a href="#LSTM-v1" data-toc-modified-id="LSTM-v1-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>LSTM v1</a></span></li><li><span><a href="#LSTM-v2" data-toc-modified-id="LSTM-v2-5.3"><span class="toc-item-num">5.3&nbsp;&nbsp;</span>LSTM v2</a></span></li><li><span><a href="#LSTM-v3" data-toc-modified-id="LSTM-v3-5.4"><span class="toc-item-num">5.4&nbsp;&nbsp;</span>LSTM v3</a></span></li></ul></li><li><span><a href="#Summary" data-toc-modified-id="Summary-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Summary</a></span></li></ul></div>

<h1>MNIST classification with a Long Short Term Memory (LSTM) network</h1>

<img style="float: left; margin-right: 15px; width: 30%; height: 30%;" src="images/mnist-image.png" />

# Purpose

The purpose of this write-up is to create a classification model for the MNIST data set using a Long Short Term Memory (LSTM) network written in TensorFlow.

Goals include:
* Build and train three different LSTM predictive classification models with differing architectures
* Collect metrics and graph each model's performance in TensorBoard 
* Make predictions with the trained model on the test data set and examine accuracy

Dataset source:  [The MNIST Database](http://yann.lecun.com/exdb/mnist/)

# Load libraries and data

In [4]:
import os
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn

from tensorflow.examples.tutorials.mnist import input_data

from functools import partial

In [5]:
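def resetGraph(seed = 10):
    # Declare the accuracy trackers global so the reset is visible outside this function
    global accTrain, accValidation, accTest
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)
    accTrain = accValidation = accTest = None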

In [6]:
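def cleanLogs():
    # shutil.rmtree works on every platform; 'rm -rf' is not available in the Windows shell
    import shutil
    shutil.rmtree('./logs/mnistLSTM/', ignore_errors = True)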

# Init vars

The LSTM expects inputs of shape `[samples, timeSteps, features]`, and we have tens of thousands of MNIST images of size 28 x 28 pixels.

One way to think of this is that a complete image is made up of 28 rows of 28 pixels each.  If we step through the rows one by one and stack them up, the image becomes more and more complete as time goes by.  So our units of "time" are the rows stacking together to create a complete image, and the number of features is the number of pixels in the image row at that step in time (i.e. 28).  This gives us:

* samples     = number of observations (i.e. number of images in the mini batch)
* timeSteps   = number of rows we need to step through/stack up to make a complete image (i.e. 28)
* features    = the number of pixels in each row we are stepping through (i.e. also 28)
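As a minimal NumPy sketch of this layout, a mini batch of flattened 784-pixel images can be regrouped into `[samples, timeSteps, features]`. The array `flatBatch` is a hypothetical stand-in filled with running integers rather than real MNIST pixels, so the row ordering is easy to see:

```python
import numpy as np

samples, timeSteps, features = 2, 28, 28

# Hypothetical stand-in for a mini batch of flattened images:
# pixel k of image n simply holds the value n * 784 + k
flatBatch = np.arange(samples * timeSteps * features).reshape(samples, -1)

# Each 784-vector becomes 28 rows (time steps) of 28 pixels (features)
batch = flatBatch.reshape(samples, timeSteps, features)

print(batch.shape)     # (2, 28, 28)
print(batch[0, 1, 0])  # 28 -> the second row starts at flat pixel index 28
```

Row `t` of an image therefore holds flat pixels `28 * t` through `28 * t + 27`, which is the slice the LSTM sees at time step `t`.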

Additionally, we only care about the final output of the LSTM network which should give us the prediction of which numeral the image represents.  Other LSTM networks do care about the outputs of each LSTM cell (translating each word in a sentence for example), but that doesn't apply in our case.
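To make "the final output" concrete, here is a simplified NumPy sketch of an LSTM forward pass. It is not TensorFlow's implementation (for instance it omits the forget bias), and the function name `lstmForward`, the random weights, and the small sizes are assumptions chosen for illustration. It shows that the output at the last time step is the same tensor as the final hidden state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstmForward(x, W, b, units):
    """Run one LSTM layer over x of shape [samples, timeSteps, features].

    Returns the hidden output at every time step and the final (c, h) state.
    """
    samples, timeSteps, _ = x.shape
    h = np.zeros((samples, units))
    c = np.zeros((samples, units))
    outputs = []
    for t in range(timeSteps):
        # One matrix multiply produces all four gate pre-activations
        z = np.concatenate([x[:, t, :], h], axis=1) @ W + b
        i, j, f, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)   # update the cell state
        h = sigmoid(o) * np.tanh(c)                    # hidden output for this step
        outputs.append(h)
    return np.stack(outputs, axis=1), (c, h)

rng = np.random.RandomState(10)
samples, timeSteps, features, units = 4, 28, 28, 8
x = rng.rand(samples, timeSteps, features)
W = rng.randn(features + units, 4 * units) * 0.1
b = np.zeros(4 * units)

outputs, (c, h) = lstmForward(x, W, b, units)
print(outputs.shape)                       # (4, 28, 8)
print(np.allclose(outputs[:, -1, :], h))   # True
```

Only `outputs[:, -1, :]` (equivalently `h`) would be passed on to the classification layer; the 27 earlier outputs are discarded.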

Having said this we can continue with initializing the various variables we'll need:

In [9]:
# Setup vars for the MNIST data set
timeSteps = 28
features = 28

lstmUnits = 128
lr = 0.001
epochs = 10
samples = 50

classes = 10

# Allow results to be reproduced
seed = 10

# Notice we are pulling in the labels as one hot encodings!
mninst = input_data.read_data_sets("./datasets/mnist", one_hot = True)

# For use when we create the LSTM network below
testShape = mninst.test.images.shape

# Note the one hot encoding on the label:
print("\n", "Example label: ", mninst.test.labels[0])

Extracting ./datasets/mnist\train-images-idx3-ubyte.gz
Extracting ./datasets/mnist\train-labels-idx1-ubyte.gz
Extracting ./datasets/mnist\t10k-images-idx3-ubyte.gz
Extracting ./datasets/mnist\t10k-labels-idx1-ubyte.gz

 Example label:  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
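The one hot label printed above can be mapped back to its digit by finding the position of the 1. A small plain-Python sketch follows; the helper name `oneHot` is hypothetical and not part of the MNIST loader:

```python
def oneHot(digit, classes=10):
    """Encode a digit as a one hot list of length `classes`."""
    return [1.0 if k == digit else 0.0 for k in range(classes)]

label = [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]

# The index of the largest entry is the digit the label stands for
digit = label.index(max(label))

print(digit)                   # 7
print(oneHot(digit) == label)  # True
```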


# Build the computational graph

## Static vs Dynamic TensorFlow RNNs

> Dynamic RNN's allow for variable sequence lengths. You might have an input shape (batch_size, max_sequence_length), but this will allow you to run the RNN for the correct number of time steps on those sequences that are shorter than max_sequence_length.
 
> In contrast, there are static RNNs, which expect to run the entire fixed RNN length. There are cases where you might prefer to do this, such as if you are padding your inputs to max_sequence_length anyway.

> In short, dynamic_rnn is usually what you want for variable length sequential data. It has a sequence_length parameter, and it is your friend.  [Source](https://stackoverflow.com/questions/43100981/what-is-a-dynamic-rnn-in-tensorflow)
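The effect of a `sequence_length` argument can be sketched with a toy recurrence in NumPy. This is an illustration of the idea only, not TensorFlow code: the function `runToyRnn` and its running-sum "cell" are assumptions made for the example. Once a sequence has ended, its state is carried through unchanged and its outputs are zeroed:

```python
import numpy as np

def runToyRnn(x, sequenceLength=None):
    """Toy recurrence (state = state + input) over x of shape [batch, maxTime, features]."""
    batch, maxTime, features = x.shape
    if sequenceLength is None:
        # Static behaviour: every sequence runs for the full length
        sequenceLength = np.full(batch, maxTime)
    state = np.zeros((batch, features))
    outputs = np.zeros_like(x)
    for t in range(maxTime):
        active = (t < sequenceLength)[:, None]              # sequences still running
        newState = state + x[:, t, :]
        state = np.where(active, newState, state)           # finished sequences keep their state
        outputs[:, t, :] = np.where(active, newState, 0.0)  # and emit zeros
    return outputs, state

x = np.ones((2, 4, 1))
_, fullState = runToyRnn(x)
_, maskedState = runToyRnn(x, sequenceLength=np.array([4, 2]))

print(fullState.ravel())    # [4. 4.]
print(maskedState.ravel())  # [4. 2.]
```

Every MNIST image has exactly 28 rows, so for this data set the two approaches process the same number of steps and differ mainly in how the input is passed in.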


## LSTM v1

* Utilize tf.contrib.rnn.BasicLSTMCell
* Utilize tf.contrib.rnn.static_rnn
* Manual weight and bias definitions with tf.random_normal for initialization
* Track training and validation loss and accuracy in TensorBoard
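Because the static RNN takes a list with one tensor per time step, the input has to be split along the time axis first. A NumPy sketch of what that unstacking does, using a tiny hypothetical array in place of a real mini batch:

```python
import numpy as np

# Tiny stand-in for x: 2 samples, 3 time steps, 4 features
x = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Split along axis 1 into one [samples, features] array per time step
inputs = [x[:, t, :] for t in range(x.shape[1])]

print(len(inputs))       # 3
print(inputs[0].shape)   # (2, 4)
print(inputs[-1])        # the last time step of every sample
```

For MNIST the same operation yields a list of 28 arrays, each of shape `[samples, 28]`.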

In [84]:
# Reset the TF CG
resetGraph()

# Clean away any old log files
cleanLogs()

# Set the seed
tf.set_random_seed(seed)

# Set the TB logdir - We want two log dirs since we are going to be plotting two values on the same plot
logDirTrain = './logs/mnistLSTM/runOne/train'
logDirValidation = './logs/mnistLSTM/runOne/validation'


# Create place holders
x = tf.placeholder(tf.float32, shape = [None, timeSteps, features], name = 'x')
# Give 2nd dimension arg to shape since we are using one hot encodings
y = tf.placeholder(tf.int64, shape = [None, classes], name = 'y')

# Create weights and bias tensors
with tf.name_scope("weightBias"):
    w = tf.Variable(tf.random_normal([lstmUnits, classes]))
    b = tf.Variable(tf.random_normal([classes]))


# Add the LSTM cells
with tf.name_scope("LSTM"):
    
    # Later in the code we'll make a call to tf.contrib.rnn.static_rnn
    # tf.contrib.rnn.static_rnn expects a length T list of inputs, each a Tensor of shape [batch_size, input_size]
    # So we need to convert our input of shape [samples, timeSteps, features] into a list of
    # timeSteps tensors, each of shape [samples, features]
    #
    # https://www.tensorflow.org/versions/r1.1/api_docs/python/tf/contrib/rnn/static_rnn
    
    # https://www.tensorflow.org/api_docs/python/tf/unstack
    inputs = tf.unstack(x, num = timeSteps, axis = 1)
    
    # Create the basic LSTM cell
    # It does not allow cell clipping, a projection layer, and does not use peep-hole connections: it is the basic baseline.
    # https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell
    cell = tf.contrib.rnn.BasicLSTMCell(lstmUnits)
    
    # Add the cell to the RNN
    # https://www.tensorflow.org/versions/r1.1/api_docs/python/tf/contrib/rnn/static_rnn
    output, state = tf.contrib.rnn.static_rnn(cell, inputs, dtype = tf.float32)
    
    # We only care about the final output which should be the model's prediction
    yH = tf.matmul(output[-1], w) + b
    
# Add loss function
with tf.name_scope("loss"):
    # We don't use "tf.nn.sparse_softmax_cross_entropy_with_logits" here since we have one hot encodings
    entropy = tf.nn.softmax_cross_entropy_with_logits(logits = yH, labels = y)
    loss = tf.reduce_mean(entropy, name = "loss")
    # Capture loss
    tf.summary.scalar("loss", loss)
    
with tf.name_scope("optimizer"):
    opt = tf.train.AdamOptimizer(learning_rate = lr).minimize(loss)
    
# Eval the model's accuracy
with tf.name_scope("eval"):
    # We don't use "tf.nn.in_top_k(yH, y, 1)" here since we aren't using "tf.nn.sparse_softmax_cross_entropy_with_logits"
    correct = tf.equal(tf.argmax(yH, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    # Capture accuracy
    tf.summary.scalar("accuracy", accuracy)

init = tf.global_variables_initializer()
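The loss and accuracy computations in the graph above can be mirrored in NumPy on a handful of made-up logits. This is a sketch of the arithmetic only; the function name `softmaxCrossEntropy` and the example values are assumptions, not output from the model:

```python
import numpy as np

def softmaxCrossEntropy(logits, labels):
    """Per-example cross entropy between softmax(logits) and one hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    logProbs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * logProbs).sum(axis=1)

# Made-up logits for 3 examples and 3 classes
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.5, 1.0, 0.3]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

loss = softmaxCrossEntropy(logits, labels).mean()

# A prediction is correct when the largest logit sits at the label's hot index
correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
accuracy = correct.astype(np.float32).mean()

print(correct)   # [ True  True False]
print(loss, accuracy)
```

Comparing the two `argmax` results is what lets the accuracy calculation work directly on one hot labels.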

In [85]:
# Execute the TF CG
counter = 0

with tf.Session() as sess:
    init.run()
    
    # Create the TB writer and init
    trainWriter = tf.summary.FileWriter(logDirTrain, sess.graph)
    validationWriter = tf.summary.FileWriter(logDirValidation)
    merge = tf.summary.merge_all()
    
    for e in range(epochs + 1):
        for i in range(mninst.train.num_examples // samples):      
            counter += 1     
            
            # Grab the next minibatch
            xBatch, yBatch = mninst.train.next_batch(samples)
            
            # Reshape x to [samples, timeSteps, features] for the LSTM:
            #   Each image arrives as a flat vector of 784 pixels,
            #   so we regroup it into 28 rows (timeSteps) of 28 pixels (features)
            xBatch = xBatch.reshape(samples, timeSteps, features)
            
            # Train the model
            summary, _ = sess.run([merge, opt], feed_dict = {x: xBatch, y: yBatch})
            
            # Capture summary data every N steps
            if counter % 10 == 0:
                # Manually add to the train accuracy summary value
                summary, accTrain = sess.run([merge, accuracy], feed_dict = {x: xBatch, y: yBatch})
                trainWriter.add_summary(summary, counter) 
                
                # Manually add to the validation accuracy summary value
                
                # If validation accuracy calcs are causing speed issues you can reduce the number tested via the following:
                #summary, accValidation = sess.run([merge, accuracy], feed_dict = {
                #    x: mninst.validation.images[:450].reshape(-1, timeSteps, features), 
                #    y: mninst.validation.labels[:450]})
                summary, accValidation = sess.run([merge, accuracy], feed_dict = {
                    x: mninst.validation.images.reshape(-1, timeSteps, features), 
                    y: mninst.validation.labels})
                validationWriter.add_summary(summary, counter)
                
        if e % 10 == 0:
            print(e, "Train Acc: ", accTrain, "Validation Acc: ", accValidation)
        
    print(" ")
    # Compute test set accuracy rating
    summary, accTest = sess.run([merge, accuracy], feed_dict = {
                    x: mninst.test.images.reshape(-1, timeSteps, features), 
                    y: mninst.test.labels})
    print("FINAL :: ", "Train Acc: ", accTrain, "Validation Acc: ", accValidation, "Test Acc: ", accTest)

0 Train Acc:  0.96 Validation Acc:  0.9686
10 Train Acc:  1.0 Validation Acc:  0.9898
 
FINAL ::  Train Acc:  1.0 Validation Acc:  0.9898 Test Acc:  0.9882
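A quick back-of-the-envelope count of how much work the loop above does, assuming the 55,000-image training split that `input_data.read_data_sets` produces by default (this figure is not printed anywhere in the notebook):

```python
trainExamples = 55000   # assumed default training split size
samples = 50            # mini batch size used above
epochs = 10

stepsPerEpoch = trainExamples // samples
epochsRun = len(range(epochs + 1))        # the loop uses range(epochs + 1)
totalSteps = stepsPerEpoch * epochsRun
summaryPoints = totalSteps // 10          # summaries are written every 10 steps

print(stepsPerEpoch)   # 1100
print(epochsRun)       # 11
print(totalSteps)      # 12100
print(summaryPoints)   # 1210
```

Each summary point also evaluates the full validation set, which is why the commented-out option to evaluate on a subset can speed training up noticeably.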


<img style="float: left; margin-right: 15px;" src="images/mnist-run-one.png" />

Although a little slower than some other models we've looked at, the LSTM has excellent accuracy on this problem.

## LSTM v2

* Enclose the CG architecture in a graph object; pass to the training session
* Utilize tf.contrib.rnn.BasicLSTMCell
* Utilize tf.nn.dynamic_rnn, so we don't need to unstack the 'x' tensor
* Remove manual weight and bias definitions and replace with a dense layer
* Utilize He initialization
* Track training and validation loss and accuracy in TensorBoard
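He initialization draws weights with a variance of `2 / fan_in`, where `fan_in` is the number of inputs feeding the layer. A minimal NumPy sketch, with a hypothetical helper `heNormal` and a plain (untruncated) normal distribution assumed for simplicity:

```python
import numpy as np

def heNormal(fanIn, fanOut, rng):
    """Draw a [fanIn, fanOut] weight matrix with variance 2 / fanIn."""
    std = np.sqrt(2.0 / fanIn)
    return rng.randn(fanIn, fanOut) * std

rng = np.random.RandomState(10)

# Same shape as a dense layer mapping 128 LSTM units to 10 classes
w = heNormal(128, 10, rng)

print(w.shape)              # (128, 10)
print(np.sqrt(2.0 / 128))   # 0.125 -> the target standard deviation
```

Compared with the unit-variance `tf.random_normal` weights of the first model, this starts the dense layer with much smaller weights, so the initial logits are less extreme.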

In [88]:
# Reset the TF CG
resetGraph()

# Clean away any old log files
cleanLogs()

# Set the seed
tf.set_random_seed(seed)

# Set the TB logdir - We want two log dirs since we are going to be plotting two values on the same plot
logDirTrain = './logs/mnistLSTM/runTwo/train'
logDirValidation = './logs/mnistLSTM/runTwo/validation'

# Create the graph object and populate it
graph = tf.Graph()

with graph.as_default():
    # Create place holders
    x = tf.placeholder(tf.float32, shape = [None, timeSteps, features], name = 'x')
    # Give 2nd dimension arg to shape since we are using one hot encodings
    y = tf.placeholder(tf.int64, shape = [None, classes], name = 'y')

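    # Add the LSTM cells with He-style variance scaling initialization (we'll let TF worry about the "w" and "b" values)
    # Note: tf.variance_scaling_initializer() defaults to scale = 1.0; strict He initialization uses scale = 2.0
    # Notice to do this we switch from "tf.name_scope" to "tf.variable_scope" and add the "initializer" param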
    with tf.variable_scope("LSTM", initializer = tf.variance_scaling_initializer()):

        # Create the basic LSTM cell
        # It does not allow cell clipping, a projection layer, and does not use peep-hole connections: it is the basic baseline.
        # https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell
        cell = tf.contrib.rnn.BasicLSTMCell(lstmUnits)

        # Add the cell to the RNN
        # https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn
        # Notice we don't have to unstack x as in the previous model
        output, state = tf.nn.dynamic_rnn(cell, x, dtype = tf.float32)

        # We only care about the final output which should be the model's prediction
        # "state" is a (c, h) tuple, so state[-1] is h: the LSTM output at the last time step
        yH = tf.layers.dense(state[-1], classes)

    # Add loss function
    with tf.name_scope("loss"):
        # We don't use "tf.nn.sparse_softmax_cross_entropy_with_logits" here since we have one hot encodings
        entropy = tf.nn.softmax_cross_entropy_with_logits(logits = yH, labels = y)
        loss = tf.reduce_mean(entropy, name = "loss")
        # Capture loss
        tf.summary.scalar("loss", loss)

    with tf.name_scope("optimizer"):
        opt = tf.train.AdamOptimizer(learning_rate = lr).minimize(loss)

    # Eval the model's accuracy
    with tf.name_scope("eval"):
        # We don't use "tf.nn.in_top_k(yH, y, 1)" here since we aren't using "tf.nn.sparse_softmax_cross_entropy_with_logits"
        correct = tf.equal(tf.argmax(yH, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
        # Capture accuracy
        tf.summary.scalar("accuracy", accuracy)

    init = tf.global_variables_initializer()

In [89]:
# Execute the TF CG
counter = 0

with tf.Session(graph = graph) as sess:
    init.run()
    
    # Create the TB writer and init
    trainWriter = tf.summary.FileWriter(logDirTrain, sess.graph)
    validationWriter = tf.summary.FileWriter(logDirValidation)
    merge = tf.summary.merge_all()
    
    for e in range(epochs + 1):
        for i in range(mninst.train.num_examples // samples):      
            counter += 1     
            
            # Grab the next minibatch
            xBatch, yBatch = mninst.train.next_batch(samples)
            
            # Reshape x to [samples, timeSteps, features] for the LSTM:
            #   Each image arrives as a flat vector of 784 pixels,
            #   so we regroup it into 28 rows (timeSteps) of 28 pixels (features)
            xBatch = xBatch.reshape(samples, timeSteps, features)
            
            # Train the model
            summary, _ = sess.run([merge, opt], feed_dict = {x: xBatch, y: yBatch})
            
            # Capture summary data every N steps
            if counter % 10 == 0:
                # Manually add to the train accuracy summary value
                summary, accTrain = sess.run([merge, accuracy], feed_dict = {x: xBatch, y: yBatch})
                trainWriter.add_summary(summary, counter) 
                
                # Manually add to the validation accuracy summary value
                
                # If validation accuracy calcs are causing speed issues you can reduce the number tested via the following:
                #summary, accValidation = sess.run([merge, accuracy], feed_dict = {
                #    x: mninst.validation.images[:450].reshape(-1, timeSteps, features), 
                #    y: mninst.validation.labels[:450]})
                summary, accValidation = sess.run([merge, accuracy], feed_dict = {
                    x: mninst.validation.images.reshape(-1, timeSteps, features), 
                    y: mninst.validation.labels})
                validationWriter.add_summary(summary, counter)
                
        if e % 10 == 0:
            print(e, "Train Acc: ", accTrain, "Validation Acc: ", accValidation)
        
    print(" ")
    # Compute test set accuracy rating
    summary, accTest = sess.run([merge, accuracy], feed_dict = {
                    x: mninst.test.images.reshape(-1, timeSteps, features), 
                    y: mninst.test.labels})
    print("FINAL :: ", "Train Acc: ", accTrain, "Validation Acc: ", accValidation, "Test Acc: ", accTest)

0 Train Acc:  0.9 Validation Acc:  0.9606
10 Train Acc:  1.0 Validation Acc:  0.9884
 
FINAL ::  Train Acc:  1.0 Validation Acc:  0.9884 Test Acc:  0.9877


<img style="float: left; margin-right: 15px;" src="images/mnist-run-two.png" />

## LSTM v3

* Enclose the CG architecture in a graph object; pass to the training session
* Utilize tf.contrib.rnn.LSTMBlockCell
* Utilize tf.nn.dynamic_rnn, so we don't need to unstack the 'x' tensor
* Apply [Batch normalization](https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization)
* Dense layer with He initialization for the weights
* Using [tf.nn.softmax_cross_entropy_with_logits_v2](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2) in the 'loss' calculations
* Perform gradient clipping via tf.clip_by_value during optimization ([tf.clip_by_global_norm](https://www.tensorflow.org/versions/r1.2/api_docs/python/tf/clip_by_global_norm) is an alternative that rescales all gradients together)
* Track training and validation loss and accuracy in TensorBoard

LSTM types and benchmarks:  https://returnn.readthedocs.io/en/latest/tf_lstm_benchmark.html
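The two common styles of gradient clipping behave differently, which a short NumPy sketch can show. The helper names `clipByValue` and `clipByGlobalNorm` and the gradient values are made up for illustration; the global norm version follows the usual formula `g * clipNorm / max(globalNorm, clipNorm)`:

```python
import numpy as np

def clipByValue(grads, low, high):
    """Clamp every gradient element independently into [low, high]."""
    return [np.clip(g, low, high) for g in grads]

def clipByGlobalNorm(grads, clipNorm):
    """Rescale all gradients together so their combined L2 norm is at most clipNorm."""
    globalNorm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clipNorm / max(globalNorm, clipNorm)
    return [g * scale for g in grads], globalNorm

# Made-up gradients for two variables
grads = [np.array([3.0, -4.0]), np.array([0.5])]

valueClipped = clipByValue(grads, -1.0, 1.0)
normClipped, globalNorm = clipByGlobalNorm(grads, 1.0)

print(valueClipped[0])   # [ 1. -1.] -> the 3:4 ratio between elements is lost
print(normClipped[0][0] / normClipped[0][1])   # -0.75 -> the ratio is preserved
```

Clipping by value can change the direction of the update, while clipping by global norm only shortens it.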

In [10]:
# Reset the TF CG
resetGraph()

# Clean away any old log files
cleanLogs()

# Set the seed
tf.set_random_seed(seed)

# Set the TB logdir - We want two log dirs since we are going to be plotting two values on the same plot
logDirTrain = './logs/mnistLSTM/runThree/train'
logDirValidation = './logs/mnistLSTM/runThree/validation'

# Create the graph object and populate it
graph = tf.Graph()

with graph.as_default():
    # We need a way to track if we are training or not for the batch normalization layer
    isTraining = tf.placeholder_with_default(False, shape = (), name = 'isTraining')
    
    # Create place holders
    x = tf.placeholder(tf.float32, shape = [None, timeSteps, features], name = 'x')
    # Give 2nd dimension arg to shape since we are using one hot encodings
    y = tf.placeholder(tf.int64, shape = [None, classes], name = 'y')

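    # Add the LSTM cells with He-style variance scaling initialization (we'll let TF worry about the "w" and "b" values)
    # Note: tf.variance_scaling_initializer() defaults to scale = 1.0; strict He initialization uses scale = 2.0
    # Notice to do this we switch from "tf.name_scope" to "tf.variable_scope" and add the "initializer" param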
    with tf.variable_scope("LSTM", initializer = tf.variance_scaling_initializer()):

        # Create LSTMBlockCell which should be faster
        # https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMBlockCell
        # LSTM types and benchmarks:  https://returnn.readthedocs.io/en/latest/tf_lstm_benchmark.html
        cell = tf.contrib.rnn.LSTMBlockCell(lstmUnits)

        # Add the LSTMBlockCell to the RNN
        # https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn
        # Notice we don't have to unstack x as in the previous model
        output, state = tf.nn.dynamic_rnn(cell, x, dtype = tf.float32)
        
        # Return the last output for each sample and apply batch normalization
        # Ex: 
        #   x = np.arange(24)
        #   x = x.reshape((2,3,4))
        #   x
        #   >>>
        #   array([[[ 0,  1,  2,  3],
        #           [ 4,  5,  6,  7],
        #           [ 8,  9, 10, 11]],
        #   
        #          [[12, 13, 14, 15],
        #           [16, 17, 18, 19],
        #           [20, 21, 22, 23]]])
        #
        #   x[:,-1,:]
        #   >>>
        #   array([[ 8,  9, 10, 11],
        #          [20, 21, 22, 23]])
        #
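        # Don't forget to enable/disable training! Batch statistics are only used when isTraining is fed as True,
        # and the moving averages are only updated if the ops in tf.GraphKeys.UPDATE_OPS are run with the train op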
        bnormOutput = tf.layers.batch_normalization(output[:, -1, :], training = isTraining)
        
        # Apply the dense layer to output the class logits (the softmax is applied inside the loss)
        yH = tf.layers.dense(bnormOutput, classes)

    # Add loss function
    with tf.name_scope("loss"):
        # We don't use "tf.nn.sparse_softmax_cross_entropy_with_logits" here since we have one hot encodings
        entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits = yH, labels = y)
        loss = tf.reduce_mean(entropy, name = "loss")
        # Capture loss
        tf.summary.scalar("loss", loss)

    with tf.name_scope("optimizer"):
        # Since we want to apply gradient clipping we need to compute the gradients,
        # process them, and then update the model's parameters by hand
        # https://stackoverflow.com/questions/36498127/how-to-apply-gradient-clipping-in-tensorflow
        # https://www.tensorflow.org/api_docs/python/tf/clip_by_value
        _opt = tf.train.AdamOptimizer(learning_rate = lr)
        gvs = _opt.compute_gradients(loss)
        cappedGvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
        opt = _opt.apply_gradients(cappedGvs)       

    # Eval the model's accuracy
    with tf.name_scope("eval"):
        # We don't use "tf.nn.in_top_k(yH, y, 1)" here since we aren't using "tf.nn.sparse_softmax_cross_entropy_with_logits"
        correct = tf.equal(tf.argmax(yH, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
        # Capture accuracy
        tf.summary.scalar("accuracy", accuracy)

    init = tf.global_variables_initializer()
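The difference between the training and inference modes of batch normalization can be sketched in NumPy. This is a simplified illustration, not the TensorFlow layer itself; the function `batchNorm` is hypothetical, and the `momentum` and `eps` defaults are chosen to match the documented defaults of `tf.layers.batch_normalization`:

```python
import numpy as np

def batchNorm(x, gamma, beta, movingMean, movingVar, training,
              momentum=0.99, eps=1e-3):
    """Normalize each feature column of x; returns the output and updated moving statistics."""
    if training:
        # Training mode: normalize with the statistics of the current mini batch
        mean, var = x.mean(axis=0), x.var(axis=0)
        movingMean = momentum * movingMean + (1 - momentum) * mean
        movingVar = momentum * movingVar + (1 - momentum) * var
    else:
        # Inference mode: normalize with the stored moving averages
        mean, var = movingMean, movingVar
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta
    return y, movingMean, movingVar

rng = np.random.RandomState(10)
x = rng.randn(50, 128) * 2.0 + 5.0    # hypothetical stand-in for the last LSTM outputs
gamma, beta = np.ones(128), np.zeros(128)
movingMean, movingVar = np.zeros(128), np.ones(128)   # initial moving statistics

yTrain, newMean, newVar = batchNorm(x, gamma, beta, movingMean, movingVar, training=True)
yInfer, _, _ = batchNorm(x, gamma, beta, movingMean, movingVar, training=False)

print(abs(yTrain.mean(axis=0)).max() < 1e-6)          # True: batch is centred
print(np.allclose(yInfer, x / np.sqrt(1.0 + 1e-3)))   # True: input passes through almost unchanged
```

With moving statistics still at their initial values, inference mode leaves the input essentially unscaled, which is why the value fed for the training flag matters.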

In [11]:
# Execute the TF CG
counter = 0

with tf.Session(graph = graph) as sess:
    init.run()
    
    # Create the TB writer and init
    trainWriter = tf.summary.FileWriter(logDirTrain, sess.graph)
    validationWriter = tf.summary.FileWriter(logDirValidation)
    merge = tf.summary.merge_all()
    
    for e in range(epochs + 1):
        for i in range(mninst.train.num_examples // samples):      
            counter += 1     
            
            # Grab the next minibatch
            xBatch, yBatch = mninst.train.next_batch(samples)
            
            # Reshape x to [samples, timeSteps, features] for the LSTM:
            #   Each image arrives as a flat vector of 784 pixels,
            #   so we regroup it into 28 rows (timeSteps) of 28 pixels (features)
            xBatch = xBatch.reshape(samples, timeSteps, features)
            
            # Train the model
            summary, _ = sess.run([merge, opt], feed_dict = {x: xBatch, y: yBatch})
            
            # Capture summary data every N steps
            if counter % 10 == 0:
                # Manually add to the train accuracy summary value
                summary, accTrain = sess.run([merge, accuracy], feed_dict = {x: xBatch, y: yBatch})
                trainWriter.add_summary(summary, counter) 
                
                # Manually add to the validation accuracy summary value
                
                # If validation accuracy calcs are causing speed issues you can reduce the number tested via the following:
                #summary, accValidation = sess.run([merge, accuracy], feed_dict = {
                #    x: mninst.validation.images[:450].reshape(-1, timeSteps, features), 
                #    y: mninst.validation.labels[:450]})
                summary, accValidation = sess.run([merge, accuracy], feed_dict = {
                    x: mninst.validation.images.reshape(-1, timeSteps, features), 
                    y: mninst.validation.labels})
                validationWriter.add_summary(summary, counter)
                
        if e % 10 == 0:
            print(e, "Train Acc: ", accTrain, "Validation Acc: ", accValidation)
        
    print(" ")
    # Compute test set accuracy rating
    summary, accTest = sess.run([merge, accuracy], feed_dict = {
                    x: mninst.test.images.reshape(-1, timeSteps, features), 
                    y: mninst.test.labels})
    print("FINAL :: ", "Train Acc: ", accTrain, "Validation Acc: ", accValidation, "Test Acc: ", accTest)

0 Train Acc:  0.98 Validation Acc:  0.9666
10 Train Acc:  1.0 Validation Acc:  0.9888
 
FINAL ::  Train Acc:  1.0 Validation Acc:  0.9888 Test Acc:  0.9875


<img style="float: left; margin-right: 15px;" src="images/mnist-run-three.png" />

# Summary

We developed, trained, and gathered metrics for three implementations of an MNIST LSTM classifier.  The three models had the following accuracy ratings on the train, validation, and test data sets:

|Model  |Train Accuracy |Validation Accuracy|Test Accuracy|
|-------|---------------|-------------------|-------------|
|LSTM v1|1.0            |0.9898             |0.9882       |
|LSTM v2|1.0            |0.9884             |0.9877       |
|LSTM v3|1.0            |0.9888             |0.9875       |

Each of the models had around the same performance, and the TensorBoard graphs were also almost identical.  The choice of one of these models over the others would likely come down to personal preference.  However, given a different data set this could certainly change.
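To put "around the same performance" into numbers, the test accuracies from the table can be converted into misclassified image counts. This sketch assumes the standard 10,000-image MNIST test set:

```python
testImages = 10000   # assumed size of the MNIST test set

testAccuracy = {"LSTM v1": 0.9882, "LSTM v2": 0.9877, "LSTM v3": 0.9875}

# Number of test images each model got wrong
errors = {name: round(testImages * (1 - acc)) for name, acc in testAccuracy.items()}

print(errors)   # {'LSTM v1': 118, 'LSTM v2': 123, 'LSTM v3': 125}
print(max(errors.values()) - min(errors.values()))   # 7
```

The best and worst models differ by only 7 images out of 10,000.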