# Handwritten Digit Classification in Tensorflow

The goal here is to classify hand-drawn images of the numbers 0-9 and build and train a neural network to recognize and predict the correct label for the digit displayed. The dataset we will be using is called the MNIST dataset, and it is a classic in the machine learning community. This dataset is made up of images of handwritten digits, 28x28 pixels in size. Here are some examples of the digits included in the dataset:
![title](MNIST.png)

## Step1: Import MNIST Dataset

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data # Download MNIST Dataset
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) # y labels are one-hot encoded


Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


### Ground truth as one -hot encoding for values in the range 0-9

When reading in the data, we are using one-hot-encoding to represent the labels (the actual digit drawn, e.g. "3") of the images. One-hot-encoding uses a vector of binary values to represent numeric or categorical values. As our labels are for the digits 0-9, the vector contains ten values, one for each possible digit. One of these values is set to 1, to represent the digit at that index of the vector, and the rest are set to 0. For example, the digit 3 is represented using the vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. As the value at index 3 is stored as 1, the vector therefore represents the digit 3. 

In [2]:
#---------- Train, val and test dataset ----------# 
n_train = mnist.train.num_examples  # 55,000
n_validation = mnist.validation.num_examples  # 5000
n_test = mnist.test.num_examples  # 10,000

## Step2: Defining the Neural Network
The architecture of the neural network refers to elements such as the number of layers in the network, the number of units in each layer, and how the units are connected between layers. In the inline below, add the number of units per layer in global variables. 


In [3]:
n_input = 784  # input layer (28x28 pixels)
n_hidden1 = 512  # 1st hidden layer
n_hidden2 = 256  # 2nd hidden layer
n_hidden3 = 128  # 3rd hidden layer
n_output = 10  # output layer (0-9 digits)


 Example: ![title](Neural_Network.png)

In [4]:
#---------- Fix hyperparameters to train neural network ----------# 
learning_rate = 1e-4
n_iterations = 1000
batch_size = 128
dropout = 0.5

## Step3: Building Tensorflow Graph

In [5]:
X = tf.placeholder("float", [None, n_input]) 
#----------  Example: [None, 784], Where None denotes any number of samples of 784 pixels each ----------# 

Y = tf.placeholder("float", [None, n_output])
#---------- Example: [None, 10], Where None denotes any number of samples with 10 possible classes ----------#

#---------- The keep_prob tensor is used to control the dropout rate, and we initialize it as a placeholder rather 
# than an immutable variable because we want to use the same tensor both for training (when dropout is set 
# to 0.5) and testing (when dropout is set to 1.0). ----------#
keep_prob = tf.placeholder(tf.float32)

In [6]:
#---------- Define weights and Biases ----------#

weights = {
    'w1': tf.Variable(tf.truncated_normal([n_input, n_hidden1], stddev=0.1)),
    'w2': tf.Variable(tf.truncated_normal([n_hidden1, n_hidden2], stddev=0.1)),
    'w3': tf.Variable(tf.truncated_normal([n_hidden2, n_hidden3], stddev=0.1)),
    'out': tf.Variable(tf.truncated_normal([n_hidden3, n_output], stddev=0.1)),
}

#---------- Set bias to some constant value ----------#
biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden1])),
    'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden2])),
    'b3': tf.Variable(tf.constant(0.1, shape=[n_hidden3])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_output]))
}

#---------- Note: Weights and bias are assigned as dictionary for ease of access ----------#

In [7]:
#---------- Define Layers ----------#
layer_1 = tf.add(tf.matmul(X, weights['w1']), biases['b1'])
layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
layer_3 = tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])
layer_drop = tf.nn.dropout(layer_3, keep_prob)
output_layer = tf.matmul(layer_3, weights['out']) + biases['out']

In [8]:
#---------- Loss function used: Cross Entropy ----------#
#---------- Optimization Algo: Gradient Descent Algorithm ----------#

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels=Y, logits=output_layer
        ))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.



## Step4: Training and Testing

In [9]:
## In correct_pred, we use the arg_max function to compare which images are being predicted correctly by looking
## at the output_layer (predictions) and Y (labels), and we use the equal function to return this as a list of 
## Booleans. We can then cast this list to floats and calculate the mean to get a total accuracy score. 

correct_pred = tf.equal(tf.argmax(output_layer, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))


In [10]:
#---------- Initialize a session for running the graph ----------#
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

In [11]:
#---------- train on mini batches ----------#
for i in range(n_iterations):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    sess.run(train_step, feed_dict={
        X: batch_x, Y: batch_y, keep_prob: dropout
        })

    # print loss and accuracy (per minibatch)
    if i % 100 == 0:
        minibatch_loss, minibatch_accuracy = sess.run(
            [cross_entropy, accuracy],
            feed_dict={X: batch_x, Y: batch_y, keep_prob: 1.0}
            )
        print(
            "Iteration",
            str(i),
            "\t| Loss =",
            str(minibatch_loss),
            "\t| Accuracy =",
            str(minibatch_accuracy)
            )

Iteration 0 	| Loss = 3.689118 	| Accuracy = 0.0859375
Iteration 100 	| Loss = 0.6484231 	| Accuracy = 0.8125
Iteration 200 	| Loss = 0.29669413 	| Accuracy = 0.921875
Iteration 300 	| Loss = 0.34371233 	| Accuracy = 0.890625
Iteration 400 	| Loss = 0.3677974 	| Accuracy = 0.890625
Iteration 500 	| Loss = 0.30938524 	| Accuracy = 0.9296875
Iteration 600 	| Loss = 0.11597563 	| Accuracy = 0.9765625
Iteration 700 	| Loss = 0.32951662 	| Accuracy = 0.9375
Iteration 800 	| Loss = 0.2728576 	| Accuracy = 0.9296875
Iteration 900 	| Loss = 0.33552873 	| Accuracy = 0.8828125


In [12]:
#---------- Compute the results on test data ----------#
test_accuracy = sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels, keep_prob: 1.0})
print("\nAccuracy on test set:", test_accuracy)


Accuracy on test set: 0.917


############# Done ##############