In this project, I fit neural networks to the MNIST data set using TensorFlow. Various network structures, activation functions, optimization methods, and/or hyperparameter settings are tested. Classification accuracy and processing time are assessed using a benchmark experiment design. Only the train and test datasets are used, in a 2x2 design (2 or 5 hidden layers, 10 or 20 nodes per layer) that is repeated for each activation function.

The model with the greatest train and test set accuracy is the 5-layer NN with 20 nodes per layer and the ELU activation function.
The model with the least runtime is the 2-layer NN with 10 nodes per layer and the Tanh activation function.

The models with middling train and test set accuracy and runtime are:

1. 5-Layer NN having 20 nodes and Tanh activation function
2. 5-Layer NN having 10 nodes and ELU activation function
3. 2-Layer NN having 20 nodes and Tanh activation function
4. 2-Layer NN having 10 nodes and ELU activation function

If accuracy is the priority, I recommend the 5-layer NN with 20 nodes per layer and the ELU activation function, as it has the best train and test accuracy. However, the 2-layer NN with 20 nodes per layer and the Tanh activation function offers a good trade-off between accuracy and runtime, so it is the better choice when training time also matters.
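
The trade-off behind this recommendation can be checked with a little arithmetic. The sketch below hard-codes the test accuracy and runtime figures reported in the results tables at the end of this notebook; the dictionary layout and variable names are illustrative assumptions, not part of the original analysis.

```python
# Test accuracy and runtime (seconds) copied from the results tables.
results = {
    "5-layer, 20 nodes, ELU": {"test_acc": 0.9543, "runtime": 19.796811},
    "2-layer, 20 nodes, Tanh": {"test_acc": 0.9442, "runtime": 15.870896},
}

best = results["5-layer, 20 nodes, ELU"]
fast = results["2-layer, 20 nodes, Tanh"]

# Accuracy gained and extra time spent by choosing the deeper model.
acc_gain = best["test_acc"] - fast["test_acc"]
extra_seconds = best["runtime"] - fast["runtime"]
relative_slowdown = extra_seconds / fast["runtime"]

print("Accuracy gain: {:.4f}".format(acc_gain))
print("Extra runtime: {:.2f} s ({:.1%} slower)".format(
    extra_seconds, relative_slowdown))
```

On these figures, the deeper model buys roughly one percentage point of test accuracy for about a quarter more runtime.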

## Importing Train and Test files for MNIST dataset

In [1]:
# ensure common functions across Python 2 and 3
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# import base packages into the namespace for this program
import warnings
import numpy as np
import os
import sys
import time

import pandas as pd

#from astropy.table import Table
#from tabulate import tabulate

# Used to build the neural networks (TensorFlow 1.x graph API)
import tensorflow as tf

# Suppress TensorFlow log messages below ERROR level
tf.logging.set_verbosity(tf.logging.ERROR)

from six.moves.urllib.request import urlretrieve

SOURCE_URL = 'https://storage.googleapis.com/cvdf-datasets/mnist/'
WORK_DIRECTORY = "./mnist-data"

import gzip  # the MNIST files are gzip-compressed

In [2]:
def maybe_download(filename):
    """A helper to download the data files if not present."""
    if not os.path.exists(WORK_DIRECTORY):
        os.mkdir(WORK_DIRECTORY)
    filepath = os.path.join(WORK_DIRECTORY, filename)
    if not os.path.exists(filepath):
        filepath, _ = urlretrieve(SOURCE_URL + filename, filepath)
        statinfo = os.stat(filepath)
        print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
    else:
        print('Already downloaded', filename)
    return filepath

train_data_filename = maybe_download('train-images-idx3-ubyte.gz')
train_labels_filename = maybe_download('train-labels-idx1-ubyte.gz')
test_data_filename = maybe_download('t10k-images-idx3-ubyte.gz')
test_labels_filename = maybe_download('t10k-labels-idx1-ubyte.gz')

Already downloaded train-images-idx3-ubyte.gz
Already downloaded train-labels-idx1-ubyte.gz
Already downloaded t10k-images-idx3-ubyte.gz
Already downloaded t10k-labels-idx1-ubyte.gz


## Unzipping the Train and Test files to extract the Train and Test dataset

In [3]:
IMAGE_SIZE = 28
PIXEL_DEPTH = 255

def extract_data(filename, num_images):
    """Extract the images into a 4D tensor [image index, y, x, channels].
  
    For MNIST data, the number of channels is always 1.

    Values are rescaled from [0, 255] down to [-0.5, 0.5].
    """
    print('Extracting', filename)
    with gzip.open(filename) as bytestream:
        # Skip the magic number and dimensions; we know these values.
        bytestream.read(16)

        buf = bytestream.read(IMAGE_SIZE * IMAGE_SIZE * num_images)
        data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
        data = (data - (PIXEL_DEPTH / 2.0)) / PIXEL_DEPTH
        data = data.reshape(num_images, IMAGE_SIZE, IMAGE_SIZE, 1)
        return data

train_data = extract_data(train_data_filename, 60000)
test_data = extract_data(test_data_filename, 10000)

Extracting ./mnist-data\train-images-idx3-ubyte.gz
Extracting ./mnist-data\t10k-images-idx3-ubyte.gz
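
The rescaling step in `extract_data` can be illustrated on a few sample pixel values. This is a minimal sketch that reuses the same formula; the three sample values are chosen for illustration only.

```python
import numpy as np

PIXEL_DEPTH = 255

# Three sample raw pixel values: black, mid-gray, and white.
raw = np.array([0, 128, 255], dtype=np.uint8).astype(np.float32)

# Same centering and scaling as in extract_data.
scaled = (raw - (PIXEL_DEPTH / 2.0)) / PIXEL_DEPTH

print(scaled)
```

A raw value of 0 maps to -0.5 and 255 maps to 0.5, so every pixel ends up in the range [-0.5, 0.5].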


In [4]:
NUM_LABELS = 10

def extract_labels(filename, num_images):
    """Extract the labels into a 1-hot matrix [image index, label index]."""
    print('Extracting', filename)
    with gzip.open(filename) as bytestream:
        # Skip the magic number and count; we know these values.
        bytestream.read(8)
        buf = bytestream.read(1 * num_images)
        labels = np.frombuffer(buf, dtype=np.uint8)
    # Convert to dense 1-hot representation.
    return (np.arange(NUM_LABELS) == labels[:, None]).astype(np.float32)

train_labels = extract_labels(train_labels_filename, 60000)
test_labels = extract_labels(test_labels_filename, 10000)

Extracting ./mnist-data\train-labels-idx1-ubyte.gz
Extracting ./mnist-data\t10k-labels-idx1-ubyte.gz


In [5]:
# Convert the one-hot label rows back to integer digits 0-9:
# the index of the single 1 in each row is the digit.
y_train = np.argmax(train_labels, axis=1)
y_test = np.argmax(test_labels, axis=1)
    
# 28x28 matrix of entries converted to vector of 784 entries    
X_train = train_data.reshape(60000, 784)
X_test = test_data.reshape(10000, 784)

# check the types and shapes of the arrays fed to the network
print('\nX_train object:', type(X_train), X_train.shape)    
print('\ny_train object:', type(y_train),  y_train.shape)  
print('\nX_test object:', type(X_test),  X_test.shape)  
print('\ny_test object:', type(y_test),  y_test.shape)


X_train object: <class 'numpy.ndarray'> (60000, 784)

y_train object: <class 'numpy.ndarray'> (60000,)

X_test object: <class 'numpy.ndarray'> (10000, 784)

y_test object: <class 'numpy.ndarray'> (10000,)


In [6]:
N_INPUTS = 28*28  # MNIST dataset features
N_OUTPUTS = 10    # Categories (number of digits)
  
tf.set_random_seed(111)

# optimizer to minimize the cost function
learning_rate = 0.01

In [7]:
# reset graph to make output stable across runs
def reset_graph(seed=111):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

## Model

#### Two layer Neural Network

In [8]:
def two_layer_NN(n_hidden1, n_hidden2, activate):
    
    # reset graph
    reset_graph()
    
    tf.set_random_seed(111)
    # setup placeholder nodes to represent the training data and targets
    X = tf.placeholder(tf.float32, shape=(None, N_INPUTS), name="X")
    y = tf.placeholder(tf.int64, shape=(None), name="y") 
    
    # Create Neural Network
    with tf.name_scope("dnn"):
        hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1",
                                activation=activate)
        hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2",
                                activation=activate)
        logits = tf.layers.dense(hidden2, N_OUTPUTS, name="outputs") 
    
    # Cost function used to train Neural Network
    with tf.name_scope("loss"):
        xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, 
                                                                logits=logits)
        loss = tf.reduce_mean(xentropy, name="loss")
    
    with tf.name_scope("train"):
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        training_op = optimizer.minimize(loss)
        
    # Measure classification performance
    with tf.name_scope("eval"):
        correct = tf.nn.in_top_k(logits, y, 1)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))  
    
    # Node to initialize all the variables
    init = tf.global_variables_initializer()
    
    # Create saver object to save trained model parameters to disk
    saver = tf.train.Saver(save_relative_paths=True) 

    # Execute Model 
    
    # Train model 
    n_epochs = 20
    batch_size = 50
    
    # Start clock to time training and evaluation of the NN
    # (time.clock() was removed in Python 3.8; use perf_counter() instead)
    start_time = time.perf_counter()
    
    # Shuffle the training indices each epoch and split them into mini-batches
    with tf.Session() as sess:
        init.run()
        for epoch in range(n_epochs):
            rnd_idx = np.random.permutation(len(X_train))
            n_batches = X_train.shape[0] // batch_size
            for iteration in np.array_split(rnd_idx, n_batches):
                X_batch, y_batch = X_train[iteration], y_train[iteration]
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            # Note: train accuracy is measured on the last mini-batch only
            acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
    
        save_path = saver.save(sess, './model_2_final.ckpt')
    
    # Restore graph parameters and run against holdout data
    with tf.Session() as sess:
        saver.restore(sess, save_path)
        acc_test = accuracy.eval(feed_dict={X: X_test, 
                                            y: y_test})
    
    # Stop clock; the runtime covers training, checkpointing, and test evaluation
    stop_time = time.perf_counter()
    
    # Total time in seconds
    runtime = stop_time - start_time 
    return n_hidden1, n_hidden2, str(activate), acc_train, runtime, acc_test
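
The training loop draws its mini-batches by shuffling the sample indices and splitting them into equal parts. The sketch below isolates that step with NumPy alone; the use of a separate `RandomState` object is an assumption made to keep the example self-contained.

```python
import numpy as np

n_samples = 60000   # size of the MNIST training set
batch_size = 50

rng = np.random.RandomState(111)  # seed chosen to mirror the notebook
rnd_idx = rng.permutation(n_samples)
n_batches = n_samples // batch_size

# Each element of `batches` is an array of row indices for one mini-batch.
batches = np.array_split(rnd_idx, n_batches)

print(len(batches), "batches of size", len(batches[0]))
```

Every training example is visited exactly once per epoch, in a different order each time.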

#### Five Layer Neural Network

In [9]:
def five_layer_NN(n_hidden1, n_hidden2, n_hidden3, n_hidden4,
                    n_hidden5, activate):
    
    # reset graph
    reset_graph()
    
    tf.set_random_seed(111)
    
    # setup placeholder nodes to represent the training data and targets
    X = tf.placeholder(tf.float32, shape=(None, N_INPUTS), name="X")
    y = tf.placeholder(tf.int64, shape=(None), name="y") 
    
    # Create Neural Network
    with tf.name_scope("dnn"):
        hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1",
                                activation=activate)
        hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2",
                                activation=activate)
        hidden3 = tf.layers.dense(hidden2, n_hidden3, name="hidden3",
                                activation=activate)
        hidden4 = tf.layers.dense(hidden3, n_hidden4, name="hidden4",
                                activation=activate)
        hidden5 = tf.layers.dense(hidden4, n_hidden5, name="hidden5",
                                activation=activate)
        logits = tf.layers.dense(hidden5, N_OUTPUTS, name="outputs") 
    
    # Cost function used to train Neural Network
    with tf.name_scope("loss"):
        xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, 
                                                                logits=logits)
        loss = tf.reduce_mean(xentropy, name="loss")
    
    with tf.name_scope("train"):
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        training_op = optimizer.minimize(loss)
        
    # Measure classification performance
    with tf.name_scope("eval"):
        correct = tf.nn.in_top_k(logits, y, 1)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))  
    
    # Node to initialize all the variables
    init = tf.global_variables_initializer()
    
    # Create saver object to save trained model parameters to disk
    saver = tf.train.Saver(save_relative_paths=True) 

    # Execute Model 
    
    # Train model 
    n_epochs = 20
    batch_size = 50
    
    # Start clock to time training and evaluation of the NN
    # (time.clock() was removed in Python 3.8; use perf_counter() instead)
    start_time = time.perf_counter()
    
    # Shuffle the training indices each epoch and split them into mini-batches
    with tf.Session() as sess:
        init.run()
        for epoch in range(n_epochs):
            rnd_idx = np.random.permutation(len(X_train))
            n_batches = X_train.shape[0] // batch_size
            for iteration in np.array_split(rnd_idx, n_batches):
                X_batch, y_batch = X_train[iteration], y_train[iteration]
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            # Note: train accuracy is measured on the last mini-batch only
            acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
    
        save_path = saver.save(sess, './model_5_final.ckpt')
    
    # Restore graph parameters and run against holdout data for the final score
    with tf.Session() as sess:
        saver.restore(sess, save_path)
        acc_test = accuracy.eval(feed_dict={X: X_test, 
                                            y: y_test})
    
    # Stop clock; the runtime covers training, checkpointing, and test evaluation
    stop_time = time.perf_counter()
    
    # Total time in seconds
    runtime = stop_time - start_time  
    
    return n_hidden1, n_hidden2, n_hidden3, n_hidden4, n_hidden5,\
                str(activate), acc_train, runtime, acc_test
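
Both network functions use a sparse softmax cross-entropy loss and a top-1 accuracy measure. The NumPy sketch below shows what these two quantities compute on a tiny made-up batch. It is an illustrative re-implementation, not the TensorFlow code itself, and it uses `argmax` for accuracy, which may differ from `tf.nn.in_top_k` when logits are tied.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and integer labels."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability assigned to the true class of each example.
    return -log_probs[np.arange(len(labels)), labels]

def top1_accuracy(logits, labels):
    """Fraction of examples whose highest logit is the true class."""
    return np.mean(np.argmax(logits, axis=1) == labels)

# Made-up logits for 3 examples and 3 classes.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.0, 1.5, 0.3]])
labels = np.array([0, 2, 0])

loss = sparse_softmax_cross_entropy(logits, labels).mean()
acc = top1_accuracy(logits, labels)

print("mean loss:", loss)
print("accuracy:", acc)
```

The first two examples are classified correctly and the third is not, so the accuracy is two thirds.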

## Experiment

#### Experimenting with 2-layer NN

In [14]:
# different node configurations are passed to the function with
# different activation functions
nn_layer_2 = [[10,10, tf.nn.relu], [20,20, tf.nn.relu],
              [10,10, tf.nn.tanh], [20,20, tf.nn.tanh], 
              [10,10, tf.nn.elu], [20,20, tf.nn.elu]]

exp_result_2 = []
for m in nn_layer_2 :
    exp_result_2.append(two_layer_NN(m[0],m[1], m[2]))
    
names=['hidden1','hidden2', 'activation',
        'Train Accuracy','Runtime','Test Accuracy']
result_df_2 = pd.DataFrame(exp_result_2)
result_df_2.columns=names
result_df_2['layers']= 2

#### Experimenting with 5-layer NN

In [15]:
# different node configurations are passed to the function with
# different activation functions
nn_layer_5 = [[10,10,10,10,10,tf.nn.relu], [20,20,20,20,20,tf.nn.relu],
              [10,10,10,10,10,tf.nn.tanh], [20,20,20,20,20,tf.nn.tanh], 
              [10,10,10,10,10,tf.nn.elu], [20,20,20,20,20,tf.nn.elu]]

exp_result_5 = []
for m in nn_layer_5 :
    exp_result_5.append(five_layer_NN(m[0],m[1], m[2], m[3], m[4], m[5]))

names5=['hidden1','hidden2','hidden3','hidden4', 'hidden5', 'activation',
        'Train Accuracy','Runtime','Test Accuracy']
result_df_5 = pd.DataFrame(exp_result_5)
result_df_5.columns=names5
result_df_5['layers']= 5
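
To put the four layer and node combinations in perspective, the number of trainable parameters in each network can be counted by hand. The sketch below assumes fully connected layers with a bias term, which is the default for `tf.layers.dense`; the helper name is hypothetical.

```python
def count_dense_params(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

N_INPUTS, N_OUTPUTS = 784, 10

param_counts = {}
for n_layers in (2, 5):
    for n_nodes in (10, 20):
        sizes = [N_INPUTS] + [n_nodes] * n_layers + [N_OUTPUTS]
        param_counts[(n_layers, n_nodes)] = count_dense_params(sizes)
        print(n_layers, "layers x", n_nodes, "nodes:",
              param_counts[(n_layers, n_nodes)])
```

The first hidden layer accounts for most of the parameters because of the 784 inputs, so going from 2 to 5 hidden layers adds comparatively few. This may partly explain why the runtimes of the two architectures are close.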

#### 2-Layer NN Experiment Results

In [16]:
# ordered by test accuracy (descending), then runtime (ascending)
result_df_2.sort_values(['Test Accuracy', 'Runtime'], ascending=[False, True])

| | hidden1 | hidden2 | activation | Train Accuracy | Runtime | Test Accuracy | layers |
|---|---|---|---|---|---|---|---|
| 5 | 20 | 20 | `<function elu at 0x0000017F5C2A0048>` | 0.92 | 15.994504 | 0.9466 | 2 |
| 1 | 20 | 20 | `<function relu at 0x0000017F5C2AF598>` | 0.92 | 15.86086 | 0.9459 | 2 |
| 3 | 20 | 20 | `<function tanh at 0x0000017F5C1350D0>` | 0.98 | 15.870896 | 0.9442 | 2 |
| 4 | 10 | 10 | `<function elu at 0x0000017F5C2A0048>` | 0.94 | 14.503777 | 0.932 | 2 |
| 0 | 10 | 10 | `<function relu at 0x0000017F5C2AF598>` | 0.94 | 15.677235 | 0.9255 | 2 |
| 2 | 10 | 10 | `<function tanh at 0x0000017F5C1350D0>` | 0.92 | 14.536947 | 0.9237 | 2 |


#### 5-Layer NN Experiment Results

In [17]:
# ordered by test accuracy (descending), then runtime (ascending)
result_df_5.sort_values(['Test Accuracy', 'Runtime'], ascending=[False, True])

| | hidden1 | hidden2 | hidden3 | hidden4 | hidden5 | activation | Train Accuracy | Runtime | Test Accuracy | layers |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 20 | 20 | 20 | 20 | 20 | `<function elu at 0x0000017F5C2A0048>` | 0.98 | 19.796811 | 0.9543 | 5 |
| 1 | 20 | 20 | 20 | 20 | 20 | `<function relu at 0x0000017F5C2AF598>` | 0.96 | 18.012435 | 0.9534 | 5 |
| 3 | 20 | 20 | 20 | 20 | 20 | `<function tanh at 0x0000017F5C1350D0>` | 0.98 | 17.551077 | 0.9472 | 5 |
| 4 | 10 | 10 | 10 | 10 | 10 | `<function elu at 0x0000017F5C2A0048>` | 0.98 | 15.977314 | 0.9386 | 5 |
| 2 | 10 | 10 | 10 | 10 | 10 | `<function tanh at 0x0000017F5C1350D0>` | 0.96 | 15.844253 | 0.9302 | 5 |
| 0 | 10 | 10 | 10 | 10 | 10 | `<function relu at 0x0000017F5C2AF598>` | 0.92 | 16.029119 | 0.9202 | 5 |
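
The activation column shows the full function representation, including a memory address, because the network functions return `str(activate)`. A shorter label could be produced along the following lines. This is a sketch that assumes the activation functions expose a `__name__` attribute like ordinary Python functions, as the `<function elu at ...>` output suggests; `activation_label` is a hypothetical helper, and the local `elu` is only a stand-in so that the example runs without TensorFlow.

```python
def activation_label(fn):
    """Return a short, readable name for an activation function."""
    # Fall back to str() for objects that have no __name__ attribute.
    return getattr(fn, "__name__", str(fn))

# Stand-in for tf.nn.elu so the example runs without TensorFlow installed.
def elu(x):
    return x

print(activation_label(elu))
```

Returning `activation_label(activate)` in place of `str(activate)` would make the results tables easier to read and to sort or group by activation.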
