# Handwritten Digits with Tensorflow

In [2]:
import tensorflow as tf

## 1. Import The Data

In [4]:
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # y labels are oh-encoded

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [5]:
n_train = mnist.train.num_examples  # 55,000
n_validation = mnist.validation.num_examples  # 5000
n_test = mnist.test.num_examples  # 10,000

## 2. Defining the Neural Network Architecture

In [6]:
n_input = 784  # input layer (28x28 pixels)
n_hidden1 = 512  # 1st hidden layer
n_hidden2 = 256  # 2nd hidden layer
n_hidden3 = 128  # 3rd hidden layer
n_output = 10  # output layer (0-9 digits)

In [7]:
# The network hyperparameters
learning_rate = 1e-4
n_iterations = 10000
batch_size = 64
dropout = 0.5

The `dropout` variable represents a threshold at which we eliminate some units at random. We will be using dropout in our final hidden layer to give each unit a 50% chance of being eliminated at every training step. This helps prevent overfitting.

## 3. Building the Tensorflow Graph

In [8]:
# defining tensors
inputs = tf.placeholder(tf.float32, shape=(None, n_input), name='inputs')
labels = tf.placeholder(tf.float32, shape=(None, n_output), name='labels')
keep_prob = tf.placeholder(tf.float32)

The `keep_prob` tensor is used to control the dropout rate, and we initialize it as a placeholder rather than an immutable variable because we want to use the same tensor both for training (when`dropout` is set to `0.5`) and testing (when `dropout` is set to `1.0`).

We will use random values from a truncated normal distribution for the weights. We want them to be close to zero, so they can adjust in either a positive or negative direction, and slightly different, so they generate different errors.

In [None]:
weights = {
    'w1': tf.Variable(tf.truncated_normal([n_input, n_hidden1], stddev=0.1)),
    'w2': tf.Variable(tf.truncated_normal([n_hidden1, n_hidden2], stddev=0.1)),
    'w3': tf.Variable(tf.truncated_normal([n_hidden2, n_hidden3], stddev=0.1)),
    'out': tf.Variable(tf.truncated_normal([n_hidden3, n_output], stddev=0.1)),
}

For the bias, we use a small constant value to ensure that the tensors activate in the initial stages and therefore contribute to the propagation.

In [None]:
biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden1])),
    'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden2])),
    'b3': tf.Variable(tf.constant(0.1, shape=[n_hidden3])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_output]))
}

In [None]:
layer_1 = tf.add(tf.matmul(inputs, weights['w1']), biases['b1'])
layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
layer_3 = tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])
layer_drop = tf.nn.dropout(layer_3, rate=1-keep_prob)
output_layer = tf.matmul(layer_3, weights['out']) + biases['out']

In [9]:
# Strategie 2 of defining Layers
hidden1 = tf.layers.dense(inputs, n_hidden1,
                              activation=tf.nn.relu,
                              name='hidden1')
hidden2 = tf.layers.dense(inputs, n_hidden2,
                              activation=tf.nn.relu,
                              name='hidden2')
hidden3 = tf.layers.dense(inputs, n_hidden3,
                              activation=tf.nn.relu,
                              name='hidden3')
logits = tf.layers.dense(hidden2, n_output,
                             name='logits')

Instructions for updating:
Use keras.layers.dense instead.
Instructions for updating:
Colocations handled automatically by placer.


In [None]:
# Defining the loss function

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=labels, logits=output_layer
        ))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

In [10]:
# Strategie 2 for loss function


# convert labels to tf.float32
labels_float = tf.cast(labels, tf.float32)

# calculate the cross entropy using softmax
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels_float, logits=logits)    

# calculate the loss
loss = tf.reduce_mean(cross_entropy)

# set the training operation using AdamOptimizer
adam = tf.train.AdamOptimizer()
train_op = adam.minimize(loss)

We’ve now defined the network and built it out with TensorFlow. The next step is to feed data through the graph to train it, and then test that it has actually learned something.

## 4. Training and Testing

Before starting the training process, we will define our method of evaluating the accuracy so we can print it out on mini-batches of data while we train. These printed statements will allow us to check that from the first iteration to the last, loss decreases and accuracy increases; they will also allow us to track whether or not we have ran enough iterations to reach a consistent and optimal result:

In [None]:
correct_pred = tf.equal(tf.argmax(output_layer, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

In [11]:
# accuracy strategie 2

# calculate the probabilities from logits using the softmax function
probs = tf.nn.softmax(logits)

# calculate label predictions using argmax
predictions = tf.argmax(probs, axis=-1)

# convert labels back to class indexes to calculate our accuracy (from hot vectors)
class_labels = tf.argmax(labels, axis=-1)

# calculate the number of false and true predictions
is_correct = tf.equal(predictions, class_labels)

# calculate the accuracy of our model
is_correct_float = tf.cast(is_correct, tf.float32)
accuracy = tf.reduce_mean(is_correct_float)

We are now ready to initialize a session for running the graph. In this session we will feed the network with our training examples, and once trained, we feed the same graph with new test examples to determine the accuracy of the model. Add the following lines of code to your file:

In [None]:
init_op = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init_op)

In [None]:
# train on mini batches
for i in range(n_iterations):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    sess.run(train_step, feed_dict={
        inputs: batch_x,
        labels: batch_y,
        keep_prob: dropout
    })
    
    # print loss and accuracy (per minibatch)
    if i % 100 == 0:
        minibatch_loss, minibatch_accuracy = sess.run(
            [cross_entropy, accuracy],
            feed_dict={inputs: batch_x, labels: batch_y, keep_prob: 1.0}
        )
        print(
            "Iteration",
            str(i),
            "\t| Loss =",
            str(minibatch_loss),
            "\t| Accuracy =",
            str(minibatch_accuracy)
            )

In [12]:
# startegy 2 for training
init_op = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init_op)
    
for i in range(n_iterations):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    feed_dict = {
        inputs: batch_x,
        labels: batch_y
    }
    sess.run(train_op, feed_dict=feed_dict)

Once the training is complete, we can run the session on the test images. This time we are using a `keep_prob dropout` rate of `1.0` to ensure all units are active in the testing process.

In [None]:
test_accuracy = sess.run(accuracy, feed_dict={inputs: mnist.test.images,\
                                              labels: mnist.test.labels, keep_prob: 1.0})
print("\nAccuracy on test set:", test_accuracy)

In [13]:
# strategy 2 for testing
test_accuracy = sess.run(accuracy, feed_dict={inputs: mnist.test.images,\
                                              labels: mnist.test.labels})
print("\nAccuracy on test set:", test_accuracy)


Accuracy on test set: 0.9798


To demonstrate that the network is actually recognizing the hand-drawn images, let’s test it on a single image of my own.

In [14]:
import numpy as np
from PIL import Image

In [22]:
img = np.invert(Image.open("test_image.png").convert('L')).ravel()

The `open` function of the `Image` library loads the test image as a 4D array containing the three RGB color channels and the Alpha transparency. This is not the same representation we used previously when reading in the dataset with TensorFlow, so we’ll need to do some extra work to match the format.

First, we use the `convert` function with the `L` parameter to reduce the 4D RGBA representation to one grayscale color channel. We store this as a `numpy` array and invert it using `np.invert`, because the current matrix represents black as 0 and white as 255, whereas we need the opposite. Finally, we call `ravel` to flatten the array.

Now that the image data is structured correctly, we can run a session in the same way as previously, but this time only feeding in the single image for testing.

In [23]:
prediction = sess.run(tf.argmax(logits, 1), feed_dict={inputs: [img]})
print ("Prediction for test image:", np.squeeze(prediction))

Prediction for test image: 2
