Our goal here is to train a Convolutional Neural Network (CNN) to read handwritten digits. 

The input images are 28 x 28 pixels, so the input tensor has shape [batch_size, 28, 28].
The network architecture is as follows:

## Layer 1:
i) Convolutional layer [width = 5, height = 5, channels = 32]

ii) ReLU

iii) Max-pooling (using 2X2 filter)

Final feature-set  dimensions : [14X14X32]

## Layer 2:
i) Convolutional layer [width = 5, height = 5, channels = 64]

ii) ReLU

iii) Max-pooling (using 2X2 filter)

Final feature-set  dimensions : [7X7X64]

## Layer 3:
i) Fully connected layer 

ii) ReLU

iii) Dropout

Final feature-set  dimensions : [1X1024]

## Final Output 
Fully connected layer: [1X10]
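
The feature-map dimensions listed above can be checked with a little arithmetic. The sketch below assumes 'SAME' padding, stride 1 for the convolutions and stride 2 for the pooling, which matches the TensorFlow calls used later in this notebook; the helper names `same_out` and `trace_shapes` are made up for illustration.

```python
import math

def same_out(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

def trace_shapes(size=28, conv_channels=(32, 64)):
    """Follow one image's feature-map shape through each conv + pool stage."""
    shapes = []
    for channels in conv_channels:
        size = same_out(size, 1)  # 5x5 convolution, stride 1: size unchanged
        size = same_out(size, 2)  # 2x2 max pooling, stride 2: size halved
        shapes.append((size, size, channels))
    return shapes

shapes = trace_shapes()
h, w, c = shapes[-1]
flat_features = h * w * c  # inputs to the fully connected layer

print(shapes)         # [(14, 14, 32), (7, 7, 64)]
print(flat_features)  # 3136
```

The 3136 flattened features are what the first fully connected layer maps to 1024 units.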





In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

#Starting interactive session
sess = tf.InteractiveSession()

In [2]:
#Load MNIST data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
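
The `one_hot=True` argument means each label is stored as a length-10 vector with a single 1 at the index of the digit. A minimal pure-Python sketch of that encoding (the `one_hot` helper here is illustrative, not part of the MNIST loader):

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a 1.0 at position `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```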


In [3]:
#Define model attributes
width = 28 # image width (in pixels) 
height = 28 # image height 
flat = width * height # total number of pixels in flattened image 
class_output = 10 # number of classes in final output

In [4]:
#Create PlaceHolders
x  = tf.placeholder(tf.float32, shape=[None, flat])
y_ = tf.placeholder(tf.float32, shape=[None
                                       , class_output])

In [5]:
#Reshape input into tensor of size [batch_size, height, width, num_channels]
x_image = tf.reshape(x, [-1, height, width, 1])


### Layer 1

In [6]:
#Initialize weights and biases
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))# initialize weights randomly
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))

In [7]:
# Apply convolution with same padding
conv1= tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1

In [8]:
# Apply ReLU activation
h_conv1 = tf.nn.relu(conv1)

In [9]:
# Apply 2X2 max pooling
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
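
To make the pooling step concrete, here is a small pure-Python sketch of 2x2 max pooling with stride 2 on a single 4x4 feature map. It assumes even height and width, so no padding is needed; `max_pool_2x2` is an illustrative helper, not a TensorFlow function.

```python
def max_pool_2x2(grid):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            window = (grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1])
            row.append(max(window))  # keep only the strongest response
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [6, 7]]
```

Each output value summarizes one non-overlapping 2x2 window, which is why the spatial size halves from 28 to 14.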

In [10]:
h_pool1

<tf.Tensor 'MaxPool:0' shape=(?, 14, 14, 32) dtype=float32>

### Layer 2

In [11]:
#Initialize weights and biases
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))# initialize weights randomly
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))
#Convolution
conv2= tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2
#ReLU
h_conv2 = tf.nn.relu(conv2)
#Maxpool
h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

In [12]:
h_pool2

<tf.Tensor 'MaxPool_1:0' shape=(?, 7, 7, 64) dtype=float32>

### Layer 3
Now we are ready to apply the first fully connected layer. To do that, we first have to flatten each image's 7 x 7 x 64 feature maps into a single vector of 3136 values.

In [13]:
layer2_flat = tf.reshape(h_pool2, [-1, 7*7*64])

In [14]:
layer2_flat

<tf.Tensor 'Reshape_1:0' shape=(?, 3136) dtype=float32>

In [15]:
#Initialize weights and biases for fully connected layer
W_fc1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))
b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024]))

In [16]:
fc1 = tf.matmul(layer2_flat,W_fc1) + b_fc1

In [17]:
h_fc1 = tf.nn.relu(fc1)

In [18]:
h_fc1

<tf.Tensor 'Relu_2:0' shape=(?, 1024) dtype=float32>

### Dropout
To control overfitting, we randomly turn off some of the layer's activations during training. The `keep_prob` placeholder sets the probability that each activation is kept.

In [19]:
keep_prob = tf.placeholder(tf.float32)
layer3_drop = tf.nn.dropout(h_fc1, keep_prob)
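
As a rough illustration of what `tf.nn.dropout` does, the sketch below implements inverted dropout in pure Python: each activation is kept with probability `keep_prob` and scaled by `1 / keep_prob`, otherwise it is set to zero. The `dropout` function and the seeded `rng` are assumptions made for this example.

```python
import random

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each value with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob so the expected value is unchanged."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(activations, 0.5, rng)  # training: survivors are doubled
eval_out = dropout(activations, 1.0, rng)   # evaluation: nothing is dropped

print(train_out)
print(eval_out)  # [1.0, 2.0, 3.0, 4.0]
```

This is why the training loop feeds `keep_prob: 0.5` for the training step but `keep_prob: 1.0` when measuring accuracy.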

## Final classification layer

In [20]:
W_fc2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1)) 
b_fc2 = tf.Variable(tf.constant(0.1, shape=[10])) 

In [21]:
fc2 =tf.matmul(layer3_drop, W_fc2) + b_fc2
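
With all four layers defined, the number of trainable parameters can be tallied from the weight and bias shapes used above. This is only a back-of-the-envelope sketch; `conv_params` and `dense_params` are hypothetical helpers.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

counts = {
    "conv1": conv_params(5, 5, 1, 32),
    "conv2": conv_params(5, 5, 32, 64),
    "fc1": dense_params(7 * 7 * 64, 1024),
    "fc2": dense_params(1024, 10),
}
total = sum(counts.values())

print(counts)  # {'conv1': 832, 'conv2': 51264, 'fc1': 3212288, 'fc2': 10250}
print(total)   # 3274634
```

Almost all of the parameters sit in the first fully connected layer, which is one reason dropout is applied there.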

In [22]:
final_probs = tf.nn.softmax(fc2)

In [23]:
final_probs

<tf.Tensor 'Softmax:0' shape=(?, 10) dtype=float32>

### Loss function
We define the cross-entropy loss function.

In [24]:
loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(final_probs), reduction_indices=[1]))
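
To see what this loss measures for a single example, here is a pure-Python sketch. Note that it is a variant, not the cell above: it computes the log-softmax directly from the logits with the log-sum-exp trick, which avoids taking the log of a probability that has underflowed to zero. The function names are illustrative.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, one_hot_label):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    return -sum(y * lp for y, lp in zip(one_hot_label, log_softmax(logits)))

label = [0.0] * 10
label[3] = 1.0

# An untrained network that spreads probability evenly over 10 classes
uniform_loss = cross_entropy([0.0] * 10, label)
print(uniform_loss)  # ln(10), about 2.3026

# A confident, correct prediction gives a loss close to zero
confident_logits = [0.0] * 10
confident_logits[3] = 10.0
confident_loss = cross_entropy(confident_logits, label)
print(confident_loss)
```

A loss near ln(10) corresponds to guessing uniformly, which is consistent with the accuracy of roughly 0.1 reported at step 0 of training.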

## Adam Optimizer

In [34]:
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)

In [35]:
correct_ = tf.equal(tf.argmax(final_probs,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_, tf.float32))
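
The accuracy computation compares the index of the largest predicted probability with the index of the 1 in the one-hot label, then averages the matches. A small pure-Python sketch of the same idea, using made-up probabilities for a 3-class problem:

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(probs_batch, labels_batch):
    """Fraction of rows where the predicted class equals the labeled class."""
    hits = [argmax(p) == argmax(y) for p, y in zip(probs_batch, labels_batch)]
    return sum(hits) / len(hits)

probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(accuracy(probs, labels))  # 0.5
```

With batches of 50 images, this fraction can only change in steps of 0.02, which explains part of the noise in the per-batch training accuracy.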

In [36]:
sess.run(tf.global_variables_initializer())

In [37]:
for i in range(5001):
    batch = mnist.train.next_batch(50)
    if i%100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g"%(i, float(train_accuracy)))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

step 0, training accuracy 0.1
step 100, training accuracy 0.84
step 200, training accuracy 0.92
step 300, training accuracy 0.96
step 400, training accuracy 0.88
step 500, training accuracy 1
step 600, training accuracy 0.92
step 700, training accuracy 0.84
step 800, training accuracy 0.92
step 900, training accuracy 0.98
step 1000, training accuracy 0.96
step 1100, training accuracy 0.98
step 1200, training accuracy 0.98
step 1300, training accuracy 1
step 1400, training accuracy 0.98
step 1500, training accuracy 0.98
step 1600, training accuracy 0.96
step 1700, training accuracy 0.92
step 1800, training accuracy 0.96
step 1900, training accuracy 0.98
step 2000, training accuracy 1
step 2100, training accuracy 0.98
step 2200, training accuracy 0.98
step 2300, training accuracy 0.98
step 2400, training accuracy 1
step 2500, training accuracy 0.96
step 2600, training accuracy 0.94
step 2700, training accuracy 0.98
step 2800, training accuracy 1
step 2900, training accuracy 0.96
step 300

In [43]:
print('Test set performance')
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob:1.0}))


Test set performance
0.987


### Conclusion
We have trained a convolutional neural network to recognize handwritten digits. With a relatively simple network (two convolutional layers and two fully connected layers) and a relatively short training run, it reaches an accuracy of 0.987 on the test set.