# Convolutional Network for MNIST Dataset

Before building the model, we first need to define some parameters and load the MNIST dataset. N_CLASSES is the number of classes in our dataset (10 digits). The other variables that must be initialized are hyperparameters that can be tuned further to better fit our model. The given values have been tested to reach >95% test accuracy with only N_EPOCHS = 1.

In [1]:
""" Using convolutional net on MNIST dataset of handwritten digit
(http://yann.lecun.com/exdb/mnist/)
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import time

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import utils



# Step 1: Read in data
# using TF Learn's built in function to load MNIST data to the folder data/mnist
mnist = input_data.read_data_sets("/data/mnist", one_hot=True)
N_CLASSES = 10
# Step 2: Define parameters for the model
LEARNING_RATE = 0.0001
BATCH_SIZE = 50
SKIP_STEP = 100
DROPOUT = 0.5
N_EPOCHS = 1



Extracting /data/mnist/train-images-idx3-ubyte.gz
Extracting /data/mnist/train-labels-idx1-ubyte.gz
Extracting /data/mnist/t10k-images-idx3-ubyte.gz
Extracting /data/mnist/t10k-labels-idx1-ubyte.gz


Finally, we also need to create some variables for our model and our dataset. Each image in MNIST is a 28x28 grid of pixels, which is represented by a 1x784 tensor in this model.
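
As a minimal sketch of that layout, the plain-Python snippet below flattens a 28x28 grid into a 784-element vector and back. It assumes row-major order, which is also how `tf.reshape` interprets the data. The helper names `flatten_image` and `unflatten_image` are hypothetical and are not part of the notebook.

```python
# Hypothetical helpers; only the 28x28 -> 784 layout comes from the text.
IMAGE_SIDE = 28


def flatten_image(image):
    """Flatten a 2-D list of pixels into a 1-D list in row-major order."""
    return [pixel for row in image for pixel in row]


def unflatten_image(vector, side=IMAGE_SIDE):
    """Rebuild the side x side grid from a row-major vector."""
    return [vector[r * side:(r + 1) * side] for r in range(side)]


# Fill each pixel with its own flat index so the mapping is easy to inspect.
image = [[r * IMAGE_SIDE + c for c in range(IMAGE_SIDE)]
         for r in range(IMAGE_SIDE)]
flat = flatten_image(image)

print(len(flat))                             # number of features per image
print(flat[5 * IMAGE_SIDE + 3] == image[5][3])  # pixel (5, 3) lands at 5*28 + 3
```

The pixel at row `r`, column `c` ends up at index `r * 28 + c`, which is why the model can later reshape the 784 features back to 28 x 28 without losing the spatial structure.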

We also create a placeholder for the dropout keep probability, which we will apply to the hidden (fully connected) layer of our model.

In [2]:
# Step 3: create placeholders for features and labels
# each image in the MNIST data is of shape 28*28 = 784
# therefore, each image is represented with a 1x784 tensor
# We'll be doing dropout for hidden layer so we'll need a placeholder
# for the dropout probability too
# Use None for shape so we can change the batch_size once we've built the graph
with tf.name_scope('data'):
    X = tf.placeholder(tf.float32, [None, 784], name="X_placeholder")
    Y = tf.placeholder(tf.float32, [None, 10], name="Y_placeholder")

dropout = tf.placeholder(tf.float32, name='dropout')

# Step 4 + 5: create weights + do inference
# the model is conv -> relu -> pool -> conv -> relu -> pool -> fully connected -> softmax

global_step = tf.Variable(
    0, dtype=tf.int32, trainable=False, name='global_step')

utils.make_dir('checkpoints')
utils.make_dir('checkpoints/convnet_mnist')

## Build a Convolutional Network Model

As we saw with our previous model, logistic regression leaves room for improvement. In this example we will create a convolutional network with two layers of convolution and pooling, a fully connected ReLU layer, a dropout layer, and finally a layer for softmax regression.

![Fig 1: Convolutional Network](images/change.png "Convolutional Network")

### First Convolutional Layer

In [3]:
with tf.variable_scope('conv1') as scope:
    # first, reshape the image to [BATCH_SIZE, 28, 28, 1] to make it work with tf.nn.conv2d
    # use the dynamic dimension -1
    images = tf.reshape(X, shape=[-1, 28, 28, 1])

    kernel = tf.get_variable(
        'kernel', [5, 5, 1, 32], initializer=tf.truncated_normal_initializer())

    biases = tf.get_variable( 
        'biases', [32], initializer=tf.constant_initializer(0.1))

    # apply tf.nn.conv2d. strides [1, 1, 1, 1], padding is 'SAME'

    conv = tf.nn.conv2d(images, kernel, strides=[1, 1, 1, 1], padding='SAME')
    # conv1 = tf.layers.conv2d(x,32,5,activation=tf.nn.relu)

    # apply relu on the sum of convolution output and biases

    conv1 = tf.nn.relu(conv + biases, name=scope.name)
    # conv1 = tf.contrib.layers.conv2d(
      #  images, 32, 5, 1, activation_fn=tf.nn.relu, padding='SAME')
    # output is of dimension BATCH_SIZE x 28 x 28 x 32

To create the model, we start with our first layer, which consists of convolution followed by max pooling. The convolution computes 32 features for each 5x5 patch. The weight tensor has dimensions [5, 5, 1, 32], where the first two dimensions are the patch size, the third is the number of input channels, and the last is the number of output channels. We also have a bias vector with one component for each output channel.
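
To make the weight-tensor dimensions concrete, here is a small sketch that counts the trainable parameters of a convolutional layer from its kernel shape. The function name `conv_param_count` is a hypothetical helper used only for illustration.

```python
def conv_param_count(kernel_h, kernel_w, in_channels, out_channels):
    """Count trainable parameters of a conv layer: kernel weights plus biases."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    biases = out_channels  # one bias per output channel
    return weights + biases


# First layer from the text: kernel shape [5, 5, 1, 32] and a bias of shape [32].
conv1_params = conv_param_count(5, 5, 1, 32)
print(conv1_params)  # 5*5*1*32 weights + 32 biases = 832
```

The count does not depend on the image size, because the same 5x5 kernels are reused at every position of the image.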

In [4]:
with tf.variable_scope('pool1') as scope:
    # apply max pool with ksize [1, 2, 2, 1], and strides [1, 2, 2, 1], padding 'SAME'
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[
                           1, 2, 2, 1], padding='SAME')

# output is of dimension BATCH_SIZE x 14 x 14 x 32

Pooling reduces the image size for our next layer: the 2x2 max pool with stride 2 halves each spatial dimension, giving an output of size BATCH_SIZE x 14 x 14 x 32.
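
The effect of 2x2 max pooling with stride 2 can be sketched in plain Python on a tiny 4x4 feature map. This is an illustration of the operation only, not how `tf.nn.max_pool` is implemented, and it assumes even height and width so that no padding is needed. The name `max_pool_2x2` is hypothetical.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            # Collect the four values covered by the 2x2 window.
            window = (feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Each output value keeps only the strongest activation in its window, so the height and width are halved while the number of channels is unchanged.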

### Second Convolutional Layer

In [5]:
with tf.variable_scope('conv2') as scope:
    # similar to conv1, except kernel now is of the size 5 x 5 x 32 x 64
    kernel = tf.get_variable('kernels', [5, 5, 32, 64],
                             initializer=tf.truncated_normal_initializer())
    biases = tf.get_variable('biases', [64],
                             initializer=tf.constant_initializer(0.1))
    conv = tf.nn.conv2d(pool1, kernel, strides=[1, 1, 1, 1], padding='SAME')
    conv2 = tf.nn.relu(conv + biases, name=scope.name)

    # output is of dimension BATCH_SIZE x 14 x 14 x 64
    
with tf.variable_scope('pool2') as scope:
    # similar to pool1
    pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                           padding='SAME')

In a similar fashion, we construct the second convolutional layer, with 64 features for each 5x5 patch, which eventually gives us an output of BATCH_SIZE x 7 x 7 x 64 after pooling.
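
The shapes through both convolution and pooling blocks can be traced with a short sketch. It assumes what the code cells use: 'SAME' padding, stride 1 for the convolutions, and stride 2 for the pools, so a convolution keeps the spatial size and a pool produces `ceil(size / 2)`. The function `trace_shapes` is a hypothetical helper.

```python
import math


def trace_shapes(side=28, block_channels=(32, 64)):
    """List (name, height, width, channels) after each conv and pool layer."""
    shapes = [("input", side, side, 1)]
    for i, channels in enumerate(block_channels, start=1):
        # 'SAME' padding with stride 1 keeps the spatial size.
        shapes.append(("conv{}".format(i), side, side, channels))
        # 'SAME' padding with stride 2 gives ceil(size / 2).
        side = math.ceil(side / 2)
        shapes.append(("pool{}".format(i), side, side, channels))
    return shapes


shapes = trace_shapes()
for name, height, width, channels in shapes:
    print("{:<6} BATCH_SIZE x {} x {} x {}".format(name, height, width, channels))

# Number of features handed to the fully connected layer.
_, height, width, channels = shapes[-1]
fc_input_features = height * width * channels
print(fc_input_features)  # 7 * 7 * 64 = 3136
```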

### Fully Connected Layer and Dropout Layer

In [6]:
with tf.variable_scope('fc') as scope:
    # use weight of dimension 7 * 7 * 64 x 1024
    input_features = 7 * 7 * 64

    # create weights and biases
    w = tf.get_variable(
        'weights', [input_features, 1024], initializer=tf.truncated_normal_initializer())
    b = tf.get_variable(
        'biases', [1024], initializer=tf.constant_initializer(0.1))

    # reshape pool2 to 2 dimensional
    pool2 = tf.reshape(pool2, [-1, input_features])

    # apply relu on matmul of pool2 and w + b
    fc = tf.nn.relu(tf.matmul(pool2, w) + b, name='relu')

    # apply dropout
    fc = tf.nn.dropout(fc, dropout, name='relu_dropout')

In our fully connected layer, 1024 neurons each process the entire image. To feed it, we reshape the pooled output into a batch of vectors, multiply by a weight matrix, add a bias, and apply another ReLU.

Dropout is used to reduce overfitting. The dropout placeholder holds the probability that a neuron's output is kept. This lets us turn dropout on during training (by feeding DROPOUT) and off during testing (by feeding 1.0).
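
The following plain-Python sketch illustrates the idea of dropout with a keep probability. It assumes the "inverted" form, in which kept values are scaled by `1 / keep_prob` so that the expected activation is unchanged. The function name and the fixed random seed are illustrative choices, not part of the notebook.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob."""
    if keep_prob >= 1.0:
        # A keep probability of 1.0 disables dropout (the evaluation setting).
        return list(values)
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = [1.0] * 10000

train_out = dropout(activations, keep_prob=0.5, rng=rng)  # training setting
test_out = dropout(activations, keep_prob=1.0, rng=rng)   # testing setting

print(sum(train_out) / len(train_out))  # close to 1.0 on average
print(test_out == activations)          # unchanged when nothing is dropped
```

Because the kept values are rescaled, the layer after dropout sees activations of the same average magnitude in training and testing, which is why simply feeding 1.0 at test time is enough.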

### Specify Loss Function

In [None]:
with tf.variable_scope('softmax_linear') as scope:
    # compute the logits without applying softmax
    # this layer needs its own weights and biases

    w = tf.get_variable(
        'weights', [1024, N_CLASSES], initializer=tf.truncated_normal_initializer())
    b = tf.get_variable('biases', [N_CLASSES],
                        initializer=tf.constant_initializer(0.1))

    logits = tf.matmul(fc, w) + b

    # Step 6: define loss function
    # use softmax cross entropy with logits as the loss function
    # compute mean cross entropy, softmax is applied internally
with tf.name_scope('loss'):
    # average the per-example cross entropy over the batch

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits), name='loss')



Finally, we specify the final layer, which is a simple softmax regression. The softmax is applied inside the cross-entropy function, which gives us the loss that we want to minimize.
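
To show what "softmax cross entropy with logits" computes for a single example, here is a sketch in plain Python. It uses the log-sum-exp trick for numerical stability, which is an assumption about good practice rather than a description of TensorFlow's internals. The function name is hypothetical.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross entropy between a one-hot label and softmax(logits)."""
    # Subtract the max logit before exponentiating to avoid overflow.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(y * lp for y, lp in zip(one_hot_label, log_probs))


label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # the digit 3, one-hot encoded

# Equal logits mean a uniform prediction over 10 classes: loss = ln(10).
uniform_loss = softmax_cross_entropy([0.0] * 10, label)

# A large logit on the correct class gives a loss close to zero.
confident_logits = [0.0] * 10
confident_logits[3] = 10.0
confident_loss = softmax_cross_entropy(confident_logits, label)

print(uniform_loss, confident_loss)
```

An untrained 10-class model is therefore expected to start near a loss of ln(10), about 2.3, if its outputs are roughly uniform.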

## Training and Evaluation

In [None]:
# Step 7: define training op
# using the Adam optimizer with learning rate LEARNING_RATE to minimize the loss
# don't forget to pass in global_step

with tf.name_scope('summaries'):
    tf.summary.scalar('loss', loss)
    tf.summary.histogram('histogram loss', loss)
    summary_op = tf.summary.merge_all()

optimizer = tf.train.AdamOptimizer(
    LEARNING_RATE).minimize(loss, global_step=global_step)

For our optimizer, we replace the traditional gradient descent optimizer with the more sophisticated Adam optimizer. We also add TensorBoard summaries to be able to visualize the performance of our model.
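
For intuition about how Adam differs from plain gradient descent, the sketch below applies the standard Adam update rule to a single scalar parameter. The hyperparameter defaults (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are commonly used values and are assumptions here, as are the function name `adam_step` and the toy objective f(x) = x^2.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.0001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.01)

print(x)  # close to the minimum at 0
```

Unlike plain gradient descent, the step is normalized by the running estimate of the gradient magnitude, so the very first update moves the parameter by roughly `lr` regardless of how large the gradient is.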

We use tf.Session rather than tf.InteractiveSession since this better separates the process of creating the graph from the process of evaluating it. Because the session is opened in a with block, it is closed automatically as soon as the block is exited.

In [None]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    # to visualize using TensorBoard
    writer = tf.summary.FileWriter('./my_graph/mnist', sess.graph)

    # You have to create folders to store checkpoints
    ckpt = tf.train.get_checkpoint_state(
        os.path.dirname('checkpoints/convnet_mnist/checkpoint'))
    # if that checkpoint exists, restore from checkpoint
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)

    initial_step = global_step.eval()

    start_time = time.time()
    n_batches = int(mnist.train.num_examples / BATCH_SIZE)

    total_loss = 0.0
    for index in range(initial_step, n_batches * N_EPOCHS):  # train the model n_epochs times
        X_batch, Y_batch = mnist.train.next_batch(BATCH_SIZE)
        _, loss_batch, summary = sess.run([optimizer, loss, summary_op],
                                          feed_dict={X: X_batch, Y: Y_batch, dropout: DROPOUT})
        writer.add_summary(summary, global_step=index)
        total_loss += loss_batch
        if (index + 1) % SKIP_STEP == 0:
            print('Average loss at step {}: {:5.1f}'.format(
                index + 1, total_loss / SKIP_STEP))
            total_loss = 0.0
            saver.save(sess, 'checkpoints/convnet_mnist/mnist-convnet', index)

    print("Optimization Finished!")  # should be around 0.35 after 25 epochs
    print("Total time: {0} seconds".format(time.time() - start_time))

The above code trains our model using the optimizer described earlier. Every SKIP_STEP (100) iterations we print the average loss and save a checkpoint, so that we can see how training is progressing. DROPOUT is the keep probability fed to the dropout placeholder during training. Here are the plots produced from the fully trained model.


![Fig 2: Training Accuracy](images/graph_acc.png)


![Fig 3: Training Accuracy](images/summaries_acc.png)


![Fig 4: Loss Function](images/graph_loss.png)


![Fig 5: Loss Histogram](images/summaries_histo.png)
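
The periodic logging used in the training loop can be isolated as a small sketch: losses are accumulated, and every `skip_step` iterations the average over that window is reported and the accumulator is reset. The function name `windowed_average_losses` and the sample loss values are hypothetical.

```python
def windowed_average_losses(losses, skip_step):
    """Return (step, average loss) pairs, one per completed window."""
    averages = []
    total_loss = 0.0
    for index, loss in enumerate(losses):
        total_loss += loss
        if (index + 1) % skip_step == 0:
            averages.append((index + 1, total_loss / skip_step))
            total_loss = 0.0  # start a fresh window
    return averages


# Made-up per-batch losses, logged every 2 steps for brevity.
sample_losses = [1.0, 3.0, 2.0, 4.0, 10.0]
report = windowed_average_losses(sample_losses, skip_step=2)
print(report)  # [(2, 2.0), (4, 3.0)]
```

Losses from an incomplete final window (the last value above) are never reported, which matches the behavior of the loop when the total number of steps is not a multiple of SKIP_STEP.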

In [None]:
    # test the model
    # (this cell continues inside the `with tf.Session() as sess:` block above)
    n_batches = int(mnist.test.num_examples / BATCH_SIZE)
    total_correct_preds = 0
    for i in range(n_batches):
        X_batch, Y_batch = mnist.test.next_batch(BATCH_SIZE)
        # feed dropout = 1.0 so that every neuron is kept during evaluation
        logits_batch = sess.run(logits, feed_dict={X: X_batch, dropout: 1.0})
        # softmax is monotonic, so the argmax of the logits is the predicted class;
        # comparing the NumPy arrays avoids adding new ops to the graph on every batch
        correct_preds = logits_batch.argmax(axis=1) == Y_batch.argmax(axis=1)
        total_correct_preds += correct_preds.sum()

    print("Accuracy {0}".format(total_correct_preds / mnist.test.num_examples))


Running the test data through our model after completing the optimization, we end up with roughly 97.28% test accuracy, which is respectable for 25 epochs of training (that is, with N_EPOCHS raised from the single epoch configured above).
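
The accuracy calculation itself can be sketched without TensorFlow: a prediction is correct when the index of the largest logit matches the index of the 1 in the one-hot label. The helper names and the three-class sample values below are hypothetical.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def count_correct(logits_batch, labels_batch):
    """Number of examples whose predicted class matches the one-hot label."""
    return sum(argmax(logits) == argmax(label)
               for logits, label in zip(logits_batch, labels_batch))


# Made-up logits and one-hot labels for three examples and three classes.
logits_batch = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
labels_batch = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]

correct = count_correct(logits_batch, labels_batch)
accuracy = correct / len(labels_batch)
print(correct, accuracy)  # 2 correct out of 3
```

Applying softmax before the argmax would not change the result, because softmax preserves the ordering of the logits.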