# Building a Multilayer Convolutional Neural Network for the MNIST data


<img src="files/mnist_deep.png">


# 1) Loading the MNIST Dataset into the Project


The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits 
that is commonly used for training various image processing systems.[1][2] The database is also widely used for training and 
testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The
creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was 
taken from American high school students, it was not well-suited for machine learning experiments.[5] Furthermore, the black
and white images from NIST were size-normalized to fit into a 20x20 pixel box, anti-aliased (which introduced grayscale
levels), and then centered in a 28x28 pixel image.

The MNIST database contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test 
set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were 
taken from NIST's testing dataset. There have been a number of scientific papers on attempts to achieve the lowest error rate; 
one paper, using a hierarchical system of convolutional neural networks, manages to get an error rate on the MNIST database of 
0.23 percent. The original creators of the database keep a list of some of the methods tested on it. In their original 
paper, they use a support vector machine to get an error rate of 0.8 percent. An extended dataset similar to MNIST, called EMNIST,
was published in 2017; it contains 240,000 training images and 40,000 testing images of handwritten digits.

In [10]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz


# 2) Starting the TensorFlow Interactive Session

We import TensorFlow and create a new InteractiveSession, which lets us interleave operations that build the computation graph with operations that run it.

In [11]:
import tensorflow as tf
sess = tf.InteractiveSession()

# 3) Building a Softmax Regression Model

In this section we will build a softmax regression model with a single linear layer. In the next section, we will extend this to the case of softmax regression with a multilayer convolutional network.

## Placeholders

We start building the computation graph by creating nodes for the input images and target output classes.

In [12]:
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

Here x and y_ aren't specific values. Rather, they are each a placeholder -- a value that we'll input when we ask TensorFlow to run a computation.

The input images x will consist of a 2d tensor of floating point numbers. Here we assign it a shape of [None, 784], where 784 is the dimensionality of a single flattened 28 by 28 pixel MNIST image, and None indicates that the first dimension, corresponding to the batch size, can be of any size. The target output classes y_ will also consist of a 2d tensor, where each row is a one-hot 10-dimensional vector indicating which digit class (zero through nine) the corresponding MNIST image belongs to.

The shape argument to placeholder is optional, but it allows TensorFlow to automatically catch bugs stemming from inconsistent tensor shapes.
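
To make the two input shapes concrete, here is a minimal sketch in plain Python (no TensorFlow) of how one 28 by 28 image becomes a 784-element row of x and how a digit label becomes a one-hot row of y_. The helper names `flatten_image` and `one_hot` are hypothetical and exist only for this illustration.

```python
# Hypothetical helpers for illustration; they are not part of TensorFlow or the notebook.

def flatten_image(image):
    """Flatten a 2-D list of pixel rows into one list, row by row."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


# A blank 28x28 image stands in for a real MNIST digit.
image = [[0.0] * 28 for _ in range(28)]
flat = flatten_image(image)

print(len(flat))   # 784
print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

A batch is then simply a list of such rows, which is why the first dimension of both placeholders is left as None.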

## Variables

We now define the weights W and biases b for our model. We could imagine treating these like additional inputs, but 
TensorFlow has an even better way to handle them: Variable. A Variable is a value that lives in TensorFlow's computation graph.
It can be used and even modified by the computation. In machine learning applications, one generally has the model
parameters be Variables.

In [13]:
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

Before Variables can be used within a session, they must be initialized using that session.

In [14]:
sess.run(tf.global_variables_initializer())

## Predicted Class and Loss Function

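We can now implement our regression model. It only takes one line! We multiply the vectorized input images x by the weight matrix W and add the bias b. The loss function is the cross-entropy between the target and the softmax of the model's prediction: tf.nn.softmax_cross_entropy_with_logits applies the softmax to the unnormalized predictions y and sums over the classes, and tf.reduce_mean averages the result over the batch.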

In [15]:
y = tf.matmul(x,W) + b

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
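
For readers who want to see what the loss computes, the following is a rough plain-Python sketch of softmax followed by cross-entropy, averaged over a batch. It is an illustration of the math, not TensorFlow's implementation, and the function names are made up for this example.

```python
import math


def softmax(logits):
    """Convert unnormalized scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, logits):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


def mean_cross_entropy(labels, batch_logits):
    """Average the per-example loss over a batch."""
    losses = [cross_entropy(t, z) for t, z in zip(labels, batch_logits)]
    return sum(losses) / len(losses)


# All-zero logits give a uniform prediction over the 10 classes.
label = [0.0] * 10
label[3] = 1.0
loss = mean_cross_entropy([label], [[0.0] * 10])
print(loss)  # ln(10), about 2.3026
```

Because W and b start at zero in this notebook, every logit is zero before training, so the initial loss should be close to ln(10).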

## Train the model

Now that we have defined our model and training loss function, it is straightforward to train using TensorFlow. Because TensorFlow knows the entire computation graph, it can use automatic differentiation to find the gradients of the loss with respect to each of the variables.

In [16]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

In [17]:
for _ in range(1000):
  batch = mnist.train.next_batch(100)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
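
Each run of train_step applies one gradient descent update to the variables. The sketch below shows that update rule on a toy one-dimensional loss, assuming a hand-written gradient in place of TensorFlow's automatic differentiation. The toy loss, the learning rate of 0.1, and the function names are all choices made for this illustration.

```python
def gradient_descent(grad_fn, start, learning_rate, steps):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w


def toy_gradient(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, derived by hand."""
    return 2.0 * (w - 3.0)


w_final = gradient_descent(toy_gradient, start=0.0, learning_rate=0.1, steps=100)
print(w_final)  # very close to the minimizer w = 3
```

In the notebook the same rule is applied to W and b, with the gradient computed from a fresh batch of 100 images at every step.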

## Evaluate the Model

tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the true label. We can use tf.equal to check if our prediction matches the truth.

That gives us a list of booleans. To determine what fraction are correct, we cast to floating point numbers and then take the mean. For example, [True, False, True, True] would become [1,0,1,1] which would become 0.75.
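
The same argmax, compare, cast, and mean steps can be sketched in plain Python. The scores and labels below are made up so that the comparison yields [True, False, True, True]; the helper names are hypothetical.

```python
def argmax(values):
    """Index of the largest entry (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(batch_scores, batch_labels):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = [argmax(s) == argmax(t) for s, t in zip(batch_scores, batch_labels)]
    as_floats = [1.0 if c else 0.0 for c in correct]  # the "cast" step
    return sum(as_floats) / len(as_floats)            # the "mean" step


# Made-up scores for four examples over three classes.
scores = [[2.0, 0.1, 0.3], [0.2, 0.1, 1.5], [0.0, 3.0, 0.1], [0.4, 0.2, 0.9]]
labels = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
print(accuracy(scores, labels))  # 0.75
```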

Finally, we can evaluate our accuracy on the test data.

In [18]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9189


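The **accuracy** of the model comes out to about **92%** on the test set. That is a reasonable baseline for a single linear layer, but it leaves an error rate of roughly 8 percent, far above the 0.23 percent cited earlier for convolutional networks.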

# 4) Building a Multilayer Convolutional Network

## Weight Initialization

To create this model, we're going to need to create a lot of weights and biases. One should generally initialize weights with a small amount of noise for symmetry breaking, and to prevent 0 gradients. Since we're using ReLU neurons, it is also good practice to initialize them with a slightly positive initial bias to avoid "dead neurons". Instead of doing this repeatedly while we build the model, let's create two handy functions to do it for us.

In [19]:
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
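
tf.truncated_normal draws from a normal distribution and re-picks any value that falls more than two standard deviations from the mean. A rough plain-Python sketch of that sampling rule, using the standard library's random module rather than TensorFlow, might look like this; the function name and the fixed seed are assumptions for the example.

```python
import random


def truncated_normal(count, stddev=0.1, mean=0.0, seed=None):
    """Draw `count` normal samples, re-drawing any beyond 2 stddev of the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples


# Enough weights for a [5, 5, 1, 32] tensor, plus a constant positive bias.
weights = truncated_normal(5 * 5 * 1 * 32, stddev=0.1, seed=0)
biases = [0.1] * 32

print(len(weights))                            # 800
print(all(abs(w) <= 0.2 for w in weights))     # True
```

With stddev=0.1 every weight therefore lies in [-0.2, 0.2], which keeps the initial noise small while still breaking symmetry.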

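## Convolution and Pooling

Both helpers below use the plain settings: the convolution uses a stride of one with zero padding ('SAME'), so its output has the same spatial size as its input, and the pooling is max pooling over 2x2 blocks with a stride of two.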


In [20]:
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
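
As a small illustration of what these two helpers do to image size, the sketch below computes the output size under 'SAME' padding (the input size divided by the stride, rounded up) and applies 2x2 max pooling to a tiny grid in plain Python. Both function names are hypothetical, and the pooling helper assumes an even height and width.

```python
import math


def same_output_size(input_size, stride):
    """Spatial output size under 'SAME' padding: ceil(input_size / stride)."""
    return math.ceil(input_size / stride)


def max_pool_2x2_grid(grid):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            row.append(max(grid[r][c], grid[r][c + 1],
                           grid[r + 1][c], grid[r + 1][c + 1]))
        pooled.append(row)
    return pooled


print(same_output_size(28, 1))  # 28: the stride-1 convolution keeps the size
print(same_output_size(28, 2))  # 14: the stride-2 pooling halves it

grid = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 6, 4, 8]]
print(max_pool_2x2_grid(grid))  # [[4, 5], [6, 8]]
```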

## First Convolutional Layer

We can now implement the first layer. It consists of a convolution followed by max pooling. The convolution computes 32 features for each 5x5 patch, so its weight tensor has a shape of [5, 5, 1, 32]: the first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. There is also a bias vector with one component for each output channel.

To apply the layer, we first reshape x to a 4d tensor, with the second and third dimensions corresponding to image height and width, and the final dimension corresponding to the number of color channels.

We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool. The max_pool_2x2 function reduces the image size to 14x14.

In [21]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
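
The -1 in the reshape asks TensorFlow to work out that dimension from the total number of elements. A minimal plain-Python sketch of that inference, assuming at most one -1 in the shape, is shown here; `infer_shape` is a made-up name for this example.

```python
def infer_shape(total_elements, shape):
    """Resolve a single -1 in `shape` so the element count matches."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if total_elements % known != 0:
        raise ValueError("shape does not divide the element count")
    return [total_elements // known if dim == -1 else dim for dim in shape]


# A batch of 100 flattened images holds 100 * 784 values.
print(infer_shape(100 * 784, [-1, 28, 28, 1]))  # [100, 28, 28, 1]
```

The inferred dimension is the batch size, so the same graph works for a training batch of 100 images and for the full test set.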

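## Second Convolutional Layer

To build a deep network, we stack a second layer of the same type. It computes 64 features for each 5x5 patch, taking the 32 channels produced by the first layer as input.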

In [22]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
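
To check the bookkeeping for the two stacked layers, the sketch below traces the per-image shape through each convolution and pooling step and counts the trainable parameters implied by the weight shapes above. It is plain Python arithmetic under the 'SAME' padding and stride settings used in this notebook, and the helper names are hypothetical.

```python
import math


def conv_pool_shape(height, width, out_channels):
    """Shape after a stride-1 'SAME' conv, then 2x2 stride-2 'SAME' max pooling."""
    return math.ceil(height / 2), math.ceil(width / 2), out_channels


def conv_params(patch, in_channels, out_channels):
    """Number of weights plus biases in one convolutional layer."""
    return patch * patch * in_channels * out_channels + out_channels


shape1 = conv_pool_shape(28, 28, 32)                 # after the first layer
shape2 = conv_pool_shape(shape1[0], shape1[1], 64)   # after the second layer

print(shape1, shape2)                      # (14, 14, 32) (7, 7, 64)
print(conv_params(5, 1, 32))               # 832
print(conv_params(5, 32, 64))              # 51264
print(shape2[0] * shape2[1] * shape2[2])   # 3136 values per image
```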

## Densely Connected Layer

Now that the image size has been reduced to 7x7, we add a fully-connected layer with 1024 neurons to allow processing on the entire image. We reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU.