# Up and Running with MNIST

## Project Outline
Your first project is to build, train, and evaluate a model that identifies images of handwritten digits (0-9) from the MNIST dataset. To do this, we’ll explicitly define a deep neural network (i.e., a neural network with two or more hidden layers), including everything from its structure to its activation functions to its optimization algorithm. Because the notebook runs on cuDNN- and TensorRT-accelerated hardware, veterans of this dataset should see a noticeable improvement in training time. Outlining the model this concretely makes it easier to tweak individual parameters or the structure of the network as a whole, so you can explore how each one affects model performance.

## The Data
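As loaded in this notebook, MNIST provides 55,000 training images, 5,000 validation images, and 10,000 testing images (the loader splits the original 60,000-image training set into the first two). Each image is a 28x28 grayscale picture of a handwritten digit from 0 to 9 and looks like this: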

<img src="./images/seven.png">

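To import the dataset into your Python project, add the following lines (the tensorflow.examples.tutorials module, along with the placeholders and sessions used throughout this notebook, belongs to the TensorFlow 1.x API):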

In [None]:
import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# one_hot=True encodes each label as a length-10 vector with a single 1 ("hot") at the digit's index and 0 everywhere else.
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

This loads the data already partitioned into a training set and a testing set - data to train on and data to evaluate our model’s performance on. A label accompanies each image, denoting which digit the image shows. We will compare our prediction for each image with its label and adjust the network based on any discrepancies.
We also need to define how many possible outputs we can have, called “classes.” In this case, since the only possible answer is a digit from 0 to 9, we have 10 classes:

In [None]:
num_classes = 10
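
With ten classes, a one-hot label is simply a length-10 vector with a 1 at the digit's index. The following is a minimal pure-Python sketch of that encoding; the helper names one_hot and decode are hypothetical and are not part of TensorFlow or of this notebook's code.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range [0, num_classes)")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def decode(vector):
    """Recover the digit by finding the index of the largest entry."""
    return vector.index(max(vector))


encoded_seven = one_hot(7)
print(encoded_seven)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(decode(encoded_seven))  # 7
```

Decoding by "index of the largest entry" is the same idea the notebook later applies to the network's output when it measures accuracy.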

The next thing we have to do is store the image-label pairs, like an (x,y) data point, where the ‘x’ is a one-dimensional array of size 784 (28^2 = 784) containing the pixel values of the image and ‘y’ is the label for that image:

In [None]:
x = tf.placeholder('float', [None, 784])  # Data
y = tf.placeholder('float', [None, num_classes])  # One-hot labels of that data
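
To illustrate where the 784 comes from, here is a small pure-Python sketch that flattens a 28x28 grid into a one-dimensional list. It assumes row-major order (rows laid end to end), and the stand-in image and the flat_index helper are made up for illustration rather than taken from the dataset.

```python
IMAGE_SIDE = 28

# Hypothetical stand-in image: each "pixel" stores its own position so the
# mapping from (row, col) to the flat index is easy to check.
image = [[row * IMAGE_SIDE + col for col in range(IMAGE_SIDE)]
         for row in range(IMAGE_SIDE)]

# Flatten row by row (row-major order, assumed here).
flat = [pixel for row in image for pixel in row]


def flat_index(row, col):
    """Position of pixel (row, col) inside the flattened list."""
    return row * IMAGE_SIDE + col


print(len(flat))  # 784
print(flat[flat_index(3, 5)] == image[3][5])  # True
```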

Then, we need to define how many images to send through the network per training iteration - our “batch size”:

In [None]:
batch_size = 100

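This controls how many images pass through the network before the error is propagated backwards and the weights are updated. Smaller batch sizes therefore mean more weight updates per epoch, which usually lengthens training time; their effect on accuracy depends on the model and the optimizer settings, so it is worth experimenting with different values.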
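
To make the trade-off concrete, the number of weight updates in one pass over the training data follows directly from the batch size. A small sketch, assuming the 55,000-image training set this notebook loads (updates_per_epoch is a hypothetical helper name):

```python
NUM_TRAIN_EXAMPLES = 55000  # size of the training split used in this notebook


def updates_per_epoch(num_examples, batch_size):
    """Number of full batches, and so weight updates, in one pass over the data."""
    return num_examples // batch_size


for size in (50, 100, 500):
    print(size, updates_per_epoch(NUM_TRAIN_EXAMPLES, size))
# 50 1100
# 100 550
# 500 110
```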

## Defining the Network
Now to build out the structure of the network, which will look something like this:

<img src="./images/network1.png">

We will be making a feedforward neural network, so called because data passes through it in one direction, from input to output. Our network, however, will have 784 input neurons, 10 output neurons, and many more than 9 neurons in each hidden layer. Here’s how the data will flow:

<img src="./images/network2.png">

The value at each neuron will follow this formula: 

<img src="./images/formula.png"> 

Where
	W = the weights on the connections coming into that neuron
	x = the input data values (from the previous layer)
	b = the bias (helps if all the data is 0, in which case no neurons would otherwise fire)
Each input is multiplied by its weight, the products are summed together, and the bias is added. This value is then fed into an “activation function,” but we’ll get to that later. Let’s start by specifying how many neurons (nodes) will be in each hidden layer:

In [None]:
num_nodes_hl1 = 500
num_nodes_hl2 = 500
num_nodes_hl3 = 500
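
These sizes determine how many trainable values the network has. As a rough sketch, assuming the 784-input, three-hidden-layer, 10-output layout described in this notebook, the count of weights and biases can be tallied like this (count_parameters is a hypothetical helper):

```python
# Input layer, three hidden layers of 500 nodes, and the 10-class output layer.
layer_sizes = [784, 500, 500, 500, 10]


def count_parameters(sizes):
    """Sum weights (inputs * outputs) plus one bias per output node for every layer."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out + n_out
    return total


print(count_parameters(layer_sizes))  # 898510
```

Changing any of the hidden-layer sizes changes this total, which is one reason wider layers take longer to train.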

Now let’s define each layer. We’ll start by creating a function to hold our model, and within it we’ll outline the connections between each layer:

In [None]:
def nn_model(data):  # The network model
    hidden_layer_1 = {
        'weights': tf.Variable(tf.random_normal([784, num_nodes_hl1])),
        'biases': tf.Variable(tf.random_normal([num_nodes_hl1])),
    }

    hidden_layer_2 = {
        'weights': tf.Variable(tf.random_normal([num_nodes_hl1, num_nodes_hl2])),
        'biases': tf.Variable(tf.random_normal([num_nodes_hl2])),
    }

    hidden_layer_3 = {
        'weights': tf.Variable(tf.random_normal([num_nodes_hl2, num_nodes_hl3])),
        'biases': tf.Variable(tf.random_normal([num_nodes_hl3])),
    }

    output_layer = {
        'weights': tf.Variable(tf.random_normal([num_nodes_hl3, num_classes])),
        'biases': tf.Variable(tf.random_normal([num_classes])),
    }
    
    # Wx + b
    layer_1 = tf.add(tf.matmul(data, hidden_layer_1['weights']), hidden_layer_1['biases'])
    layer_1 = tf.nn.relu(layer_1)

    layer_2 = tf.add(tf.matmul(layer_1, hidden_layer_2['weights']), hidden_layer_2['biases'])
    layer_2 = tf.nn.relu(layer_2)

    layer_3 = tf.add(tf.matmul(layer_2, hidden_layer_3['weights']), hidden_layer_3['biases'])
    layer_3 = tf.nn.relu(layer_3)

    output = tf.add(tf.matmul(layer_3, output_layer['weights']), output_layer['biases'])

    return output

Loosely like the neurons in the human brain, each node uses an “activation function” to determine whether, and how strongly, it should be activated. More broadly, the activation function introduces nonlinearity into the model, which crucially allows the network to handle more complex tasks than a linear regression model could. A good general-purpose activation function is ReLU (Rectified Linear Unit), which we use here on every hidden layer; it passes positive values through unchanged and outputs 0 otherwise.
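
For a single neuron, the weighted sum plus bias followed by ReLU can be sketched in a few lines of plain Python. The weights, inputs, and biases below are made-up numbers chosen only to show both the "fires" and the "does not fire" case; neuron_output is a hypothetical helper, not part of the model above.

```python
def relu(z):
    """Rectified Linear Unit: keep positive values, clamp negatives to 0."""
    return max(0.0, z)


def neuron_output(weights, inputs, bias):
    """Compute relu(W . x + b) for one neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


weights = [0.5, -1.0, 2.0]
inputs = [1.0, 2.0, 3.0]

print(neuron_output(weights, inputs, bias=-1.0))   # 3.5  (weighted sum is positive)
print(neuron_output(weights, inputs, bias=-10.0))  # 0.0  (weighted sum is negative)
print(neuron_output(weights, [0.0, 0.0, 0.0], bias=0.3))  # 0.3  (only the bias contributes)
```

The last call shows the role of the bias: with all-zero inputs, the neuron's output comes entirely from b.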

## Training and Testing
Now we need to train our model to classify the digits. In order to improve our network, we need to define a metric by which we can judge our performance per iteration - a “cost function.” To do this, we’re going to use softmax cross-entropy, which measures the probability error in discrete, mutually exclusive classification tasks. The softmax step outputs, for each of our classes, a value from 0 to 1 denoting the probability that the corresponding digit is the one depicted in the input image, with the probabilities summing to 1. The cross-entropy step then compares those probabilities with the label, producing one error value per image. We’re then going to extract the value of our cost by taking the mean of the Tensor of per-image errors.
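
A minimal pure-Python sketch of the two steps involved, softmax followed by cross-entropy, may help here. It uses three classes and made-up logits for brevity and only illustrates the math; it is not how TensorFlow implements the operation internally.

```python
import math


def softmax(logits):
    """Turn raw network outputs (logits) into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, one_hot_label):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_label, probabilities) if t > 0)


label = [0.0, 0.0, 1.0]        # the true class is index 2
good_logits = [0.5, 0.2, 4.0]  # network favors the correct class
bad_logits = [4.0, 0.2, 0.5]   # network favors the wrong class

good_loss = cross_entropy(softmax(good_logits), label)
bad_loss = cross_entropy(softmax(bad_logits), label)
mean_cost = (good_loss + bad_loss) / 2  # the cost is the mean over the batch

print(good_loss < bad_loss)  # True
```

A confident correct prediction yields a small loss and a confident wrong one a large loss, which is what gives the optimizer a useful signal.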

Now that we have a cost function defined, we need to use it to improve our model. To do so, we’ll use an “optimizer” to minimize our cost function by updating the weights between layers during backpropagation. I’ve chosen to use the Adam optimization algorithm, but others can be used here as well (e.g., Adagrad, gradient descent, RMSProp).
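
To show what "minimizing a cost by updating weights" means, here is a sketch of plain gradient descent on a one-variable toy cost, (w - 3)^2. This is deliberately the simplest of the optimizers listed above, not Adam, and the cost function, learning rate, and step count are illustrative assumptions.

```python
def gradient(w):
    """Derivative of the toy cost (w - 3)**2 with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=50):
    """Repeatedly nudge w against the gradient to lower the cost."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


final_w = gradient_descent(0.0)
print(abs(final_w - 3.0) < 1e-3)  # True: w has moved close to the minimum at 3
```

Adam follows the same basic loop but adapts the step size for each weight using running averages of past gradients.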

When training the network, we can specify how many “epochs” we want it to train for. One epoch is one full pass over the training set: every batch goes through a feedforward pass followed by a backpropagation cycle. In theory, the more epochs, the more accurate your model, though the improvements do start to diminish.
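
The diminishing returns can be quantified as the percentage drop in total loss from one epoch to the next. A small sketch, using hypothetical loss values rather than real training output:

```python
def percent_improvement(previous_loss, current_loss):
    """Percentage by which the loss dropped relative to the previous epoch."""
    return (1 - current_loss / previous_loss) * 100


# Hypothetical per-epoch losses, for illustration only.
epoch_losses = [400.0, 200.0, 150.0, 135.0]

improvements = [percent_improvement(prev, cur)
                for prev, cur in zip(epoch_losses, epoch_losses[1:])]

for epoch, gain in enumerate(improvements, start=2):
    print("Epoch", epoch, "improved by", round(gain, 1), "%")
```

With these numbers the gain shrinks each epoch even though the loss keeps falling, which is the pattern to look for when deciding how many epochs are worthwhile.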

To train the network, we have to start a TensorFlow session, iterate through each batch, compute the cost for that batch, optimize the weights based on it, and then repeat this for however many epochs we choose. To show how your improvement diminishes as your epoch count increases, a percent improvement counter displays how much better the current epoch did compared to its predecessor.

With all of this in mind, within a new function add:

In [None]:
def run_nn(x):    
    prediction = nn_model(x)
    costVal = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer().minimize(costVal)
    
    num_epochs = 10
        
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        percent_improvement = 0
        for epoch in range(num_epochs):
            current_loss = 0
            num_examples = mnist.train.num_examples
            # Num_examples / batch size = how many times to cycle
            for _ in range(num_examples // batch_size):
                # Partitions out the dataset to use each time
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                _, cost = sess.run([optimizer, costVal], feed_dict={x: epoch_x, y: epoch_y})
                current_loss += cost  # Sum up the cost in this epoch
            if epoch != 0:
                percent_improvement = (1 - (current_loss / previous_loss)) * 100
            previous_loss = current_loss
            print('Epoch', epoch + 1, '/', num_epochs, ': Loss:', current_loss, "Percent Improvement:", percent_improvement)
        
        # How close are we to the actual answer? Compare the predicted digit with the labeled digit.
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        # Cast the booleans to floats so that their mean is the fraction of correct predictions
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        # eval() gets the value of the accuracy Tensor
        print('Accuracy:', accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
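
The accuracy calculation at the end of this function boils down to "take the index of the largest value in each prediction and each label, then count the matches." A plain-Python sketch of the same idea, using three classes and made-up predictions (argmax and accuracy here are hypothetical helpers, not TensorFlow calls):

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(predictions, labels):
    """Fraction of rows where the predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(l) for p, l in zip(predictions, labels)]
    return sum(correct) / len(correct)


# Made-up network outputs and one-hot labels for four examples.
predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]

print(accuracy(predictions, labels))  # 0.75
```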

All that needs to be done now is to run our model:

In [None]:
start = time.time()  # Record start time
run_nn(x)  # Then train and test the model
# And print the elapsed time in seconds
print("Total time: " + "%.2f" % (time.time() - start))

Upon running this notebook, you should see an accuracy score for your model, as well as the time it took to train and evaluate it. In the future, especially for larger models, a trained model can be saved using tf.saved_model.simple_save and restored with tf.saved_model.loader.load. Reloading a saved model avoids retraining on every run, which massively decreases runtime because evaluation is almost always faster than training.