<h1 style="text-align:center;"> MNIST Classifier using TensorFlow</h1>

<h3>The following changes are made to the previous classifier:</h3>
<ul>
    <li><h5>A convolutional layer is added</h5></li>
    <li><h5>A max pooling layer is added</h5></li>
    <li><h5>The batch size is reduced to 100</h5></li>
    <li><h5>The number of epochs is increased to 10</h5></li>
</ul>

------------------------------------------------------------------------------------------------------------------------------

<h6> Import the libraries TensorFlow and NumPy </h6>

In [1]:
import tensorflow as tf
import numpy as np

The TensorFlow version used here is 1.13.2
<br>Use <b>pip install tensorflow==1.13.2</b> to install TensorFlow 1.13.2
<br>This replaces any version of TensorFlow that is already installed

------------------------------------------------------------------------------------------------------------------------------

<h6> Construct the CNN classifier by building the graph using TensorFlow </h6>

In [2]:
tf.reset_default_graph() # Reset any existing graph

In [3]:
X = tf.placeholder(tf.float32, shape=[None, 28*28])

# Reshape the input to the format (batch_size, 28, 28, 1); -1 lets TensorFlow infer the batch size
X_reshaped = tf.reshape(X, shape=[-1, 28, 28, 1])

In [4]:
y = tf.placeholder(tf.int32, shape=[None])

In [5]:
# The convolutional layer below, with 32 filters, a kernel size of 3, a stride of 1, SAME padding and the
# ReLU activation function, processes the input and produces an output of the shape (-1, 28, 28, 32)

convolutional_layer_1 = tf.layers.conv2d(X_reshaped, filters=32, kernel_size=3,strides=1, 
                                         padding="SAME", activation=tf.nn.relu)

Instructions for updating:
Use keras.layers.conv2d instead.
Instructions for updating:
Colocations handled automatically by placer.


In [6]:
# The below max pooling layer with a kernel size of 2, a stride of 2, VALID padding processes the input 
# and produces an output of the shape (-1, 14, 14, 32)

pooling_layer_1 = tf.nn.max_pool(convolutional_layer_1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")

In [7]:
# The below convolutional layer with 64 filters, a kernel size of 3, a stride of 2, SAME padding and with the 
# ReLU activation function processes the input and produces an output of shape (-1, 7, 7, 64)

convolutional_layer_2 = tf.layers.conv2d(pooling_layer_1, filters=64, kernel_size=3,strides=2, 
                                         padding="SAME", activation=tf.nn.relu)

In [8]:
# The max pooling layer below, with a kernel size of 2, a stride of 2 and VALID padding, processes the input
# and produces an output of the shape (-1, 3, 3, 64)

pooling_layer_2 = tf.nn.max_pool(convolutional_layer_2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")

In [9]:
# The pooling layer output is flattened to the shape (-1, 576)

flattened_pooling_layer = tf.reshape(pooling_layer_2, shape=[-1, 64 * 3 * 3])
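
The shapes quoted in the comments above can be checked with a little arithmetic. The sketch below is plain Python, not part of the TensorFlow graph, and the helper name `conv_output_size` is made up for illustration. It assumes the usual TensorFlow rules: SAME padding gives ceil(size / stride), and VALID padding gives floor((size - kernel) / stride) + 1.

```python
import math

def conv_output_size(size, kernel, stride, padding):
    """Output size along one spatial dimension of a convolution or pooling layer."""
    if padding == "SAME":
        # SAME pads the input, so the output size depends only on the stride
        return math.ceil(size / stride)
    if padding == "VALID":
        # VALID adds no padding; windows that do not fit completely are dropped
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")

conv1 = conv_output_size(28, kernel=3, stride=1, padding="SAME")      # 28
pool1 = conv_output_size(conv1, kernel=2, stride=2, padding="VALID")  # 14
conv2 = conv_output_size(pool1, kernel=3, stride=2, padding="SAME")   # 7
pool2 = conv_output_size(conv2, kernel=2, stride=2, padding="VALID")  # 3
flattened = pool2 * pool2 * 64                                        # 576

print(conv1, pool1, conv2, pool2, flattened)  # prints: 28 14 7 3 576
```

The second pooling layer therefore outputs 3 x 3 feature maps, which is why the reshape uses 64 * 3 * 3 = 576 values per image.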

In [10]:
# The dense layer applies the ReLU activation function and produces an output of the shape (-1, 64)

fully_connected_layer = tf.layers.dense(flattened_pooling_layer, 64, activation=tf.nn.relu)

Instructions for updating:
Use keras.layers.dense instead.


In [11]:
# The fully connected output layer, with a linear activation function, produces an output of the shape (-1, 10)

logits = tf.layers.dense(fully_connected_layer, 10)
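
To get a feel for the size of the network, the number of trainable parameters can be estimated by hand. This is a rough sketch, assuming every layer keeps its default bias term and that pooling layers have no parameters. The helper names `conv2d_params` and `dense_params` are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # One kernel_size x kernel_size x in_channels kernel per filter, plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per (input, unit) pair, plus one bias per unit
    return inputs * units + units

conv1_params = conv2d_params(3, 1, 32)    # 320
conv2_params = conv2d_params(3, 32, 64)   # 18496
dense1_params = dense_params(576, 64)     # 36928
output_params = dense_params(64, 10)      # 650

total_params = conv1_params + conv2_params + dense1_params + output_params
print(total_params)  # prints: 56394
```

Under these assumptions, most of the parameters sit in the first dense layer and the second convolutional layer.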

In [12]:
# The class probabilities are computed by applying the softmax function to the logits

Y_probability = tf.nn.softmax(logits)
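
For intuition, softmax can be sketched in NumPy. This is an illustration of the formula rather than TensorFlow's implementation, and the name `softmax_rows` and the example values are made up. The row maximum is subtracted before exponentiating, which leaves the result unchanged but avoids overflow.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax of a 2-D array of logits."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

example_logits = np.array([[2.0, 1.0, 0.1],
                           [0.0, 0.0, 0.0]])
example_probs = softmax_rows(example_logits)

print(example_probs.sum(axis=1))  # each row sums to 1 (up to rounding)
```

Equal logits give equal probabilities, and the largest logit always receives the largest probability.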

In [13]:
# The cross entropy between the logits and the integer labels is computed for each example

cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)

In [14]:
# The loss is the mean of the per-example cross-entropy values

loss = tf.reduce_mean(cross_entropy)
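
The two steps above (per-example cross entropy, then the mean) can be sketched in NumPy as follows. The function name `sparse_cross_entropy` is hypothetical and the code only illustrates the formula: for integer labels, the loss of each example is the negative log of the softmax probability assigned to its true class.

```python
import numpy as np

def sparse_cross_entropy(logits, labels):
    """Per-example cross entropy for integer class labels."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    # Log-softmax computed directly, which is more stable than log(softmax(...))
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    # Pick the log-probability of the true class in every row
    return -log_probs[np.arange(len(labels)), labels]

example_logits = np.zeros((2, 10))    # an untrained model that is equally unsure about all 10 digits
example_labels = np.array([3, 7])

per_example_loss = sparse_cross_entropy(example_logits, example_labels)
mean_loss = per_example_loss.mean()
print(mean_loss)  # approximately ln(10), about 2.3026
```

A loss near ln(10) is roughly what one would expect from a 10-class classifier that has not learned anything yet.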

In [15]:
# Adam optimizer is used to reduce the loss

optimizer = tf.train.AdamOptimizer()
training_op = optimizer.minimize(loss)

In [16]:
# tf.nn.in_top_k returns, for each example, whether the target class is among the 'K' highest predictions
# With K = 1, as below, it checks whether the target class has the highest predicted value

correct = tf.nn.in_top_k(logits, y, 1)

In [17]:
# The accuracy is the fraction of correct predictions: the booleans are cast to floats and averaged

accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
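
With K = 1, the check and the accuracy computation are roughly equivalent to comparing the arg max of each row with its label. The NumPy sketch below uses made-up values and assumes there are no ties in the logits. With ties, `tf.nn.in_top_k` and `np.argmax` may disagree.

```python
import numpy as np

example_logits = np.array([[0.1, 2.0, 0.3],
                           [1.5, 0.2, 0.1],
                           [0.3, 0.2, 0.9],
                           [0.8, 0.1, 0.4]])
example_labels = np.array([1, 0, 1, 0])

# True where the class with the highest logit matches the label
example_correct = np.argmax(example_logits, axis=1) == example_labels
# Cast booleans to floats and average them, as the graph above does
example_accuracy = example_correct.astype(np.float32).mean()

print(example_correct)   # prints: [ True  True False  True]
print(example_accuracy)  # prints: 0.75
```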

In [18]:
# Create a Saver object, which is used during training to save the model parameters to disk

saver = tf.train.Saver()

------------------------------------------------------------------------------------------------------------------------------

<h6> Load MNIST data from the Library</h6>
<br><ul><li>The data is already split into training and test sets</li>
<li>Flatten each 28 x 28 image into a vector of 784 values</li>
<li>Scale the pixel values to floats in the range 0 to 1</li>
<li>Split the training set further to create a validation set of 5000 samples</li></ul>

In [19]:
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]
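
The preprocessing can be tried out on a small random stand-in for the images, without downloading MNIST. The sketch below uses 60 fake images and a validation split of 5 in place of the real 60000 and 5000. All names starting with `demo_` are made up for illustration.

```python
import numpy as np

# Stand-in data: 60 random "images" with pixel values 0..255, and 60 random labels
demo_images = np.random.randint(0, 256, size=(60, 28, 28), dtype=np.uint8)
demo_labels = np.random.randint(0, 10, size=60).astype(np.int32)

# Same steps as above: cast to float32, flatten to 784 values, scale to [0, 1]
demo_X = demo_images.astype(np.float32).reshape(-1, 28 * 28) / 255.0

# The first samples become the validation set, the rest stay in the training set
n_valid = 5
demo_X_valid, demo_X_train = demo_X[:n_valid], demo_X[n_valid:]
demo_y_valid, demo_y_train = demo_labels[:n_valid], demo_labels[n_valid:]

print(demo_X_valid.shape, demo_X_train.shape)  # prints: (5, 784) (55, 784)
```

With the real data, the same slicing leaves 55000 training samples and 5000 validation samples.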

------------------------------------------------------------------------------------------------------------------------------

<h6>Define the number of epochs, the batch size and a function to create batches of samples</h6>

In [20]:
n_epochs = 10 
batch_size = 100

In [21]:
# Function to create batches of training data to be fed to the classifier

def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
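
One detail of this function is worth seeing in isolation: `np.array_split` divides the shuffled indices into `n_batches` nearly equal parts, so batches can be slightly larger than `batch_size` when the number of samples is not a multiple of it. The sketch below reproduces only the splitting logic. The helper name `batch_sizes` is hypothetical.

```python
import numpy as np

def batch_sizes(n_samples, batch_size):
    """Sizes of the batches produced by the splitting logic used in shuffle_batch."""
    n_batches = n_samples // batch_size
    return [len(idx) for idx in np.array_split(np.arange(n_samples), n_batches)]

sizes_even = batch_sizes(55000, 100)  # the training set size and batch size used here
sizes_uneven = batch_sizes(10, 3)     # a case where the sizes do not divide evenly

print(len(sizes_even), set(sizes_even))  # prints: 550 {100}
print(sizes_uneven)                      # prints: [4, 3, 3]
```

With 55000 training samples and a batch size of 100, every epoch consists of 550 batches of exactly 100 samples.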

------------------------------------------------------------------------------------------------------------------------------

<h6> Train the model and evaluate the training and validation accuracy</h6>

In [22]:
init = tf.global_variables_initializer() # Create an op that initializes all variables
sess = tf.Session() # Create a TensorFlow session
sess.run(init) # Run the op to initialize the variables

In [23]:
for epoch in range(n_epochs):
    for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    acc_batch = sess.run(accuracy, feed_dict={X: X_batch, y: y_batch})
    acc_valid = sess.run(accuracy, feed_dict={X: X_valid, y: y_valid})
    print("Epoch:", epoch+1 , "Last batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)
    save_path = saver.save(sess, "./my_mnist_model")

Epoch: 1 Last batch accuracy: 0.97 Validation accuracy: 0.9672
Epoch: 2 Last batch accuracy: 0.99 Validation accuracy: 0.9764
Epoch: 3 Last batch accuracy: 1.0 Validation accuracy: 0.9798
Epoch: 4 Last batch accuracy: 0.97 Validation accuracy: 0.9828
Epoch: 5 Last batch accuracy: 0.99 Validation accuracy: 0.9862
Epoch: 6 Last batch accuracy: 0.98 Validation accuracy: 0.9836
Epoch: 7 Last batch accuracy: 1.0 Validation accuracy: 0.988
Epoch: 8 Last batch accuracy: 1.0 Validation accuracy: 0.989
Epoch: 9 Last batch accuracy: 0.99 Validation accuracy: 0.987
Epoch: 10 Last batch accuracy: 1.0 Validation accuracy: 0.988


------------------------------------------------------------------------------------------------------------------------------

<h6>Use the trained model for prediction</h6>

In [24]:
selection = np.random.choice(len(X_test), size=10) # 10 random test indices, sampled with replacement
X_pred = X_test[selection]
y_actual = y_test[selection]
y_probability_pred = sess.run(Y_probability, feed_dict = {X: X_pred})
y_pred=[]
for val in y_probability_pred:
    y_pred.append(np.argmax(val))
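
The loop above picks the most probable class row by row. The same result can be obtained in a single call with `np.argmax(..., axis=1)`, as the sketch below shows on made-up probabilities. The `example_` and `loop_` names are for illustration only.

```python
import numpy as np

example_probs = np.array([[0.1, 0.7, 0.2],
                          [0.8, 0.1, 0.1],
                          [0.2, 0.3, 0.5]])

# Row-by-row version, as in the cell above
loop_pred = []
for row in example_probs:
    loop_pred.append(int(np.argmax(row)))

# Vectorized version: arg max along the class axis
vectorized_pred = np.argmax(example_probs, axis=1)

print(loop_pred)        # prints: [1, 0, 2]
print(vectorized_pred)  # prints: [1 0 2]
```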

In [25]:
print("Actual Predicted")
for i in range(len(X_pred)):
    print("{} \t {}".format(y_actual[i],y_pred[i]))

Actual Predicted
5 	 5
3 	 3
5 	 5
2 	 2
3 	 3
1 	 1
2 	 2
7 	 7
0 	 0
2 	 2


In [26]:
acc_test = sess.run(accuracy, feed_dict={X: X_test, y: y_test})

In [27]:
print("The accuracy on the test set is {}%".format('{0:.4}'.format(acc_test*100)))

The accuracy on the test set is 98.9%


<b>Stacking an additional layer of convolution and pooling achieves a much better performance: 98.9% accuracy on the test set</b>

------------------------------------------------------------------------------------------------------------------------------

In [28]:
sess.close() #Close the tensorflow session