**Objective:** Train a computer system to recognize images of handwritten digits, then test how well it predicts the digits in new handwritten images. The system is trained on the MNIST training set and evaluated on the MNIST test images, which it does not see during training, to measure the accuracy of its predictions.

**Importing TensorFlow & MNIST data**

In [0]:
# Importing TensorFlow (this notebook uses the TensorFlow 1.x API, e.g. tf.placeholder and tf.Session)
import tensorflow as tf

In [2]:
# Importing the MNIST dataset (the tensorflow.examples.tutorials module exists only in TensorFlow 1.x)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

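Placeholders are different from variables. A placeholder is an input slot whose value is supplied each time the graph is run (through `feed_dict`), while a variable holds state that training updates. Here, the placeholder `x` holds the training images: each row is one image flattened into 784 (28 × 28) pixel values, and `None` lets the number of images per batch vary.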

In [0]:
#Creating placeholders
x = tf.placeholder(tf.float32, [None, 784])
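`input_data.read_data_sets` already returns each image in this flattened 784-value form. To make the `[None, 784]` shape concrete, here is a minimal NumPy sketch of the flattening, using random arrays as hypothetical stand-ins for real MNIST images:

```python
import numpy as np

# Hypothetical stand-in for a batch of 5 grayscale 28x28 images (values in [0, 1]).
images = np.random.rand(5, 28, 28).astype(np.float32)

# Flatten each image row by row into a 784-long vector; -1 keeps the batch size.
flat = images.reshape(-1, 28 * 28)

print(flat.shape)  # (5, 784): matches the placeholder shape [None, 784]

# Pixel (row r, column c) ends up at index r * 28 + c of the flat vector.
r, c = 3, 7
assert flat[0, r * 28 + c] == images[0, r, c]
```

Because the first dimension is `None`, the same graph accepts a batch of 100 during training and all 10,000 test images at evaluation time.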

**Weights and Biases:**

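The weights `w` and biases `b` are variables, initialized here to zeros. Their values are learned during training, and for this single-layer softmax model the starting point does not matter much: the loss is convex, so gradient descent is not trapped by a poor start. (The convolutional network later in this notebook instead starts from small random weights, which deeper networks need in order to break the symmetry between units.) `w` has shape 784 × 10 because each of the 784 input features (pixels) contributes to each of the 10 outputs; `b` has shape 10, one bias per output class (the digits 0 to 9).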



In [0]:
#Creating weights and biases
w = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
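With all-zero parameters, every image initially gets the same prediction: equal probability 0.1 for each digit. Training still moves the weights because the gradient for each class depends on which labels appear in the batch. A small NumPy sketch (the random batch is hypothetical) showing the starting point and the resulting initial loss of ln(10):

```python
import numpy as np

# Zero-initialized parameters, same shapes as w and b above.
w = np.zeros((784, 10), dtype=np.float32)
b = np.zeros(10, dtype=np.float32)

# Hypothetical batch of 3 flattened images.
x = np.random.rand(3, 784).astype(np.float32)

logits = x @ w + b                      # shape (3, 10), all zeros
exp = np.exp(logits)
probs = exp / exp.sum(axis=1, keepdims=True)

print(logits.shape)      # (3, 10)
print(probs[0])          # every digit gets probability 0.1
# The initial cross-entropy is therefore -log(0.1) = log(10), about 2.3026.
print(-np.log(probs[0, 0]))
```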

**Prediction Model:**

The model multiplies the input matrix `x` by the weight matrix `w` and adds the bias `b`, producing 10 scores (logits) per image. The softmax function exponentiates these scores and normalizes them so that they are positive and sum to 1. They can then be read as the probability of each digit (0 to 9), and the predicted digit is the one with the highest probability.

In [0]:
#Model
y = tf.nn.softmax(tf.matmul(x,w) + b)
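A NumPy sketch of what `tf.nn.softmax` computes for each row of `tf.matmul(x, w) + b`. Subtracting the row maximum before exponentiating is a common safeguard added here against overflow; it does not change the result. The scores are hypothetical:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax: exponentiate, then normalize each row to sum to 1."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical scores (x @ w + b) for one image over the 10 digits.
scores = np.array([[1.0, 2.0, 0.5, 0.0, 3.0, 0.0, 0.0, 1.0, 0.0, 0.0]])
probs = softmax(scores)

print(np.isclose(probs.sum(), 1.0))   # True: the probabilities sum to 1
print(int(probs.argmax()))            # 4: the digit with the highest score
```

Softmax preserves the ordering of the scores, so the most likely digit is also the one with the largest raw score.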

In [0]:
# Placeholder for the true labels (one-hot vectors of length 10)
y_ = tf.placeholder(tf.float32, [None,10])

**Cross-Entropy function:**

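Cross-entropy is the cost (loss) function. For each image it computes −Σ y_ · log(y) over the 10 classes; because the true label `y_` is one-hot, this is simply the negative log of the probability the model assigns to the correct digit. `tf.reduce_mean` then averages it over the batch. A confident, correct prediction gives a loss near 0, and training aims to minimize this average.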

In [0]:
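# Cross-entropy loss: mean over the batch of -sum(y_ * log(y)) across the 10 classes.
# Note: tf.log(y) becomes -inf if a predicted probability underflows to 0.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))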
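The same formula in NumPy, on two hypothetical predictions, shows that only the probability of the true class contributes to the loss. The function name is illustrative, not part of TensorFlow:

```python
import numpy as np

def cross_entropy(y_true, y_pred):
    """Mean over the batch of -sum(y_true * log(y_pred)) across the classes.

    Mirrors tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1)).
    """
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

# Two hypothetical images whose true digits are 3 and 7 (one-hot labels).
y_true = np.zeros((2, 10))
y_true[0, 3] = 1.0
y_true[1, 7] = 1.0

# Predicted probabilities: image 0 is confident and right, image 1 is unsure.
y_pred = np.full((2, 10), 0.01)
y_pred[0, 3] = 0.91
y_pred[1] = 0.1

loss = cross_entropy(y_true, y_pred)
# Equals (-log 0.91 - log 0.1) / 2: the confident image adds little, the unsure one a lot.
print(round(loss, 4))
```

If a predicted probability for the true class were exactly 0, `np.log` (like `tf.log`) would return `-inf`; this is one reason the convolutional model later computes the loss from logits with `tf.nn.softmax_cross_entropy_with_logits`.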

**Gradient Descent Optimizer:**

Training means finding the values of `w` and `b` that minimize the cross-entropy loss. The gradient descent optimizer does this iteratively: at each step it computes the gradient of the loss with respect to each variable and moves the variable a small step in the opposite direction, downhill on the loss surface. The size of each step is the learning rate, which is set manually (0.5 here). A very small learning rate such as 0.001 makes progress slowly, so reaching the minimum takes many steps. A learning rate that is too large can overshoot the minimum, so the loss oscillates or diverges instead of settling.



In [0]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
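The effect of the learning rate can be seen on a toy one-dimensional loss, f(w) = (w − 3)², whose minimum is at w = 3. This is a hypothetical illustration, not the MNIST loss; the three learning rates are chosen only to show slow, good, and divergent behaviour:

```python
def gradient_descent(learning_rate, steps=100, start=0.0):
    """Minimize the toy loss f(w) = (w - 3)**2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)          # derivative of (w - 3)^2
        w -= learning_rate * grad       # step downhill, scaled by the learning rate
    return w

for lr in (0.001, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

Each step multiplies the distance to the minimum by (1 − 2·lr): with 0.001 the factor is 0.998, so after 100 steps w is still far from 3; with 0.1 it is 0.8 and w converges; with 1.1 it is −1.2, so w overshoots further on every step and diverges.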

In [0]:
init = tf.global_variables_initializer()
# Variables are not initialized automatically; running this op in a session assigns their initial values.

**Training model:**

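Here, the model is trained for 100 steps. At each step, `mnist.train.next_batch(100)` returns the next 100 images and labels from the shuffled training set, and one gradient descent update is run on that batch. In total this uses 10,000 images, less than one pass over the 55,000 training images.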

In [0]:
# Create a session to run the graph:
sess = tf.Session()
sess.run(init)  # assign the initial (zero) values to w and b

In [0]:
for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x : batch_xs, y_: batch_ys})
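`mnist.train.next_batch` hides the batching logic. As a rough sketch of the idea (the `minibatches` function below is hypothetical, not the MNIST helper's actual source, and the real helper may handle the end of an epoch differently), shuffled mini-batches can be produced like this:

```python
import numpy as np

def minibatches(images, labels, batch_size, rng):
    """Yield (batch_xs, batch_ys) pairs covering one shuffled pass over the data."""
    order = rng.permutation(len(images))          # new random order each epoch
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

rng = np.random.default_rng(0)
# Hypothetical data: 250 flattened images with one-hot labels.
images = rng.random((250, 784), dtype=np.float32)
labels = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=250)]

sizes = [len(bx) for bx, by in minibatches(images, labels, 100, rng)]
print(sizes)  # [100, 100, 50]: the last batch holds the remainder
```

Shuffling each epoch means every update sees a different random subset, which is what makes this stochastic (mini-batch) gradient descent.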

**Evaluating and Testing the model:**

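Here, the model is evaluated on the test set. `tf.argmax(y, 1)` returns the index of the largest value along axis 1, which is the predicted digit for each image; `tf.argmax(y_, 1)` returns the true digit from the one-hot label. `tf.equal` compares them element-wise, giving one boolean per image, and casting these to floats and taking the mean gives the fraction of correct predictions. The model reaches a test accuracy of 0.8954 (about 89.5%).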

In [0]:
#Evaluating the model:
prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

In [0]:
#Accuracy
accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))

In [14]:
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.8954
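The accuracy computation above, reproduced in NumPy on four hypothetical predictions (three classes instead of ten, for brevity):

```python
import numpy as np

# Hypothetical predicted probabilities for 4 images.
y = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4],
              [0.5, 0.4, 0.1]])
# One-hot true labels: classes 0, 1, 1, 0.
y_ = np.array([[1, 0, 0],
               [0, 1, 0],
               [0, 1, 0],
               [1, 0, 0]])

prediction = np.equal(np.argmax(y, 1), np.argmax(y_, 1))   # [True, True, False, True]
accuracy = np.mean(prediction.astype(np.float32))
print(accuracy)  # 0.75
```

The third image is the only miss: its highest probability (0.4) is on class 2, while the label says class 1.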


**Multilayer Convolutional Network:**

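Here, the single softmax layer is replaced by a deeper network (two convolutional layers, a fully connected layer, dropout, and a readout layer) with more weights and biases, to increase accuracy.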

The first convolutional layer computes 32 features (filters) for each 5×5 patch. Its weight tensor has shape [5, 5, 1, 32]: the first two dimensions are the patch size, the third is the number of input channels (1 for grayscale images), and the last is the number of output channels (32).

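To apply the convolutional layer, `x` is reshaped into a 4D tensor of shape [-1, 28, 28, 1]: the first dimension is the batch size (-1 lets TensorFlow infer it), the second and third are the image height and width (28×28), and the last is the number of color channels (1).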

The convolution output plus bias is then passed through the ReLU function, which sets negative values to 0 and leaves positive values unchanged. 2×2 max pooling keeps the maximum of each 2×2 block, reducing the image size from 28×28 to 14×14.

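The second convolutional layer computes 64 features for each 5×5 patch of its 32 input channels (weight shape [5, 5, 32, 64]), and a second 2×2 max pool reduces the image size from 14×14 to 7×7.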

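Now that the image size has been reduced to 7×7, a fully connected layer with 1024 neurons is added to process the entire image. The 7×7×64 output of the second pooling layer is flattened into a vector of 3136 values, multiplied by a [3136, 1024] weight matrix, and passed through ReLU.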
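The sizes quoted above follow from TensorFlow's rule for `'SAME'` padding: the output height and width are ceil(input / stride). A short sketch tracing an image through the network (the helper name is hypothetical):

```python
import math

def same_output_size(size, stride):
    """Spatial output size for 'SAME' padding: ceil(input / stride)."""
    return math.ceil(size / stride)

size = 28
size = same_output_size(size, 1)   # conv1, stride 1: 28
size = same_output_size(size, 2)   # pool1, stride 2: 14
size = same_output_size(size, 1)   # conv2, stride 1: 14
size = same_output_size(size, 2)   # pool2, stride 2: 7

flat_features = size * size * 64   # 64 channels after conv2
print(size, flat_features)         # 7 3136
```

This is why the first fully connected weight matrix has 7 * 7 * 64 = 3136 rows.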

In [0]:
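#  Multilayer Convolutional Network:

# Helper functions that create weight and bias variables for the layers below.
def weight_variable(shape):
    # Small random values (truncated normal, stddev 0.1) break symmetry between units.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # A slightly positive bias (0.1) helps keep ReLU units from starting out inactive.
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)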

In [0]:
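def conv2d(x, W):
    # Stride 1 with 'SAME' zero padding: output has the same height and width as the input.
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 windows with stride 2: halves the height and width.
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')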
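To show what these two helpers compute, here is a naive single-channel NumPy sketch of a stride-1 `'SAME'` convolution and a 2×2 max pool. It assumes an odd kernel size (such as the 5×5 used here) so the zero padding is symmetric, and it is far slower than TensorFlow's implementation; the function names are hypothetical:

```python
import numpy as np

def conv2d_same(image, kernel):
    """Single-channel 2-D convolution, stride 1, zero 'SAME' padding.

    Like tf.nn.conv2d, this is cross-correlation: the kernel is not flipped.
    Assumes an odd kernel size so the padding is symmetric.
    """
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))  # zeros around the border
    out = np.zeros_like(image, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            # Weighted sum of the patch centred on (r, c).
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool_2x2_np(image):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0                     # identity kernel: output equals input
print(np.array_equal(conv2d_same(image, kernel), image))  # True
print(max_pool_2x2_np(image).tolist())  # [[5.0, 7.0], [13.0, 15.0]]
```

The real layers do the same per input channel, sum the results across channels, and repeat this for each of the 32 or 64 output filters.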


In [0]:
#First Convolutional Layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

In [0]:
x_image = tf.reshape(x, [-1, 28, 28, 1])  # [batch, height, width, channels]; -1 infers the batch size

In [0]:
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

In [0]:
#Second Convolutional Layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

In [0]:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

**Dropout and Readout Layer:**

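To reduce overfitting, dropout randomly sets a fraction of the fully connected layer's outputs to zero during training, so the network cannot rely on any single unit. `keep_prob` is a placeholder holding the probability that each unit is kept, which lets dropout be turned on during training (0.5) and off during evaluation (1.0). The final readout layer maps the 1024 features to 10 scores (logits), one per digit, like the softmax regression model above.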

In [0]:
#Dropout Layer
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
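In TensorFlow 1.x, `tf.nn.dropout` keeps each element with probability `keep_prob` and scales the kept elements by 1 / `keep_prob`, so the expected value of each activation is unchanged. A NumPy sketch of this "inverted dropout" on hypothetical activations (the function name is illustrative):

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob, scale kept units by 1/keep_prob."""
    mask = rng.random(activations.shape) < keep_prob   # True = keep this unit
    return np.where(mask, activations / keep_prob, 0.0)

rng = np.random.default_rng(0)
h = np.ones((1000, 1024))             # hypothetical fully connected activations

train_out = dropout(h, 0.5, rng)      # about half the units zeroed, the rest doubled
eval_out = dropout(h, 1.0, rng)       # keep_prob = 1.0: output equals input

print(np.unique(train_out))           # [0. 2.]
print(round(train_out.mean(), 2))     # close to 1.0: the expected value is preserved
print(np.array_equal(eval_out, h))    # True
```

Scaling at training time is what allows evaluation to simply feed `keep_prob: 1.0` without any other change to the graph.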

In [0]:
#Readout Layer

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
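`y_conv` holds raw logits with no softmax applied. The training cell passes them to `tf.nn.softmax_cross_entropy_with_logits`, which combines softmax and cross-entropy; computing the log-probabilities directly from logits avoids the `log(0)` and overflow problems of applying `tf.nn.softmax` and `tf.log` separately, as the first model does. A NumPy sketch of the idea using the log-sum-exp trick (the function name is hypothetical and TensorFlow's internal implementation may differ):

```python
import numpy as np

def softmax_cross_entropy_with_logits_np(labels, logits):
    """Per-example cross-entropy computed from logits via log-softmax.

    log softmax(z)_i = z_i - logsumexp(z), with the max subtracted for stability.
    """
    z_max = logits.max(axis=1, keepdims=True)
    log_sum_exp = z_max + np.log(np.exp(logits - z_max).sum(axis=1, keepdims=True))
    log_probs = logits - log_sum_exp
    return -np.sum(labels * log_probs, axis=1)

labels = np.array([[0.0, 1.0, 0.0]])
logits = np.array([[0.0, -800.0, 800.0]])   # extreme but finite scores

# Naive route: softmax first, then log. exp(800) overflows and the result is not finite.
with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    naive = -np.sum(labels * np.log(probs), axis=1)

stable = softmax_cross_entropy_with_logits_np(labels, logits)
print(naive)    # not finite
print(stable)   # [1600.]
```

For ordinary, moderate logits both routes give the same value; the fused form only differs when the naive one would break down.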

In [30]:
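# Define the CNN loss, optimizer, and accuracy, then train and evaluate it.
with tf.Session() as sess:
    # Softmax cross-entropy computed directly from the logits y_conv.
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    # Adam optimizer with a small learning rate of 1e-4.
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # Initialize all variables, including the CNN weights and Adam's internal variables.
    sess.run(tf.global_variables_initializer())

    for i in range(20000):
        batch = mnist.train.next_batch(50)  # 50 images and labels per step
        if i % 100 == 0:
            # Report accuracy on the current batch with dropout disabled.
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print("step %d, training accuracy %g" % (i, train_accuracy))
        # One training step with dropout enabled (keep half of the units).
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    # Evaluate on the full test set with dropout disabled.
    print("test accuracy %g" % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))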


step 0, training accuracy 0.06
step 100, training accuracy 0.84
step 200, training accuracy 0.9
step 300, training accuracy 0.92
step 400, training accuracy 0.92
step 500, training accuracy 0.94
step 600, training accuracy 0.94
step 700, training accuracy 0.92
step 800, training accuracy 0.94
step 900, training accuracy 1
step 1000, training accuracy 0.96
step 1100, training accuracy 0.86
step 1200, training accuracy 1
step 1300, training accuracy 1
step 1400, training accuracy 0.98
step 1500, training accuracy 0.96
step 1600, training accuracy 1
step 1700, training accuracy 0.92
step 1800, training accuracy 0.94
step 1900, training accuracy 0.96
step 2000, training accuracy 0.98
step 2100, training accuracy 1
step 2200, training accuracy 0.96
step 2300, training accuracy 0.98
step 2400, training a

**Final Test Accuracy: 0.9923**