## A Neural Network 

Now we’ll create a simple three-layer neural network in TensorFlow.  In future articles, we’ll show how to build more complicated neural network structures such as convolutional neural networks and recurrent neural networks.  For this example though, we’ll keep it simple.  If you need to brush up on your neural network basics, check out my popular tutorial on the subject.  In this example, we’ll be using the MNIST dataset (and its associated loader) that the TensorFlow package provides.  This MNIST dataset is a set of 28×28 pixel grayscale images which represent hand-written digits.  It has 55,000 training rows, 10,000 testing rows and 5,000 validation rows.

We can load the data by running:

In [12]:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


The one_hot=True argument specifies that, instead of the label associated with each image being the digit itself (e.g. “4”), it is a vector with a single “one hot” node set to one and all the other nodes set to zero, e.g. [0, 0, 0, 0, 1, 0, 0, 0, 0, 0].  This lets us easily feed it into the output layer of our neural network.
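To make the encoding concrete, here is a minimal plain-Python sketch of one-hot encoding and decoding.  The helper names one_hot and decode are hypothetical and are not part of the TensorFlow MNIST loader; they only illustrate the idea.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def decode(vec):
    """Recover the digit as the index of the largest entry."""
    return max(range(len(vec)), key=lambda i: vec[i])


encoded = one_hot(4)
print(encoded)          # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(decode(encoded))  # 4
```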

## Setting things up

Next, we can set up the placeholder variables for the training data (and some training parameters):


In [13]:
# Python optimisation variables
learning_rate = 0.5
epochs = 10
batch_size = 100

# declare the training data placeholders
# input x - for 28 x 28 pixels = 784
x = tf.placeholder(tf.float32, [None, 784])
# now declare the output data placeholder - 10 digits
y = tf.placeholder(tf.float32, [None, 10])

Notice the x input layer is 784 nodes corresponding to the 28 x 28 (=784) pixels, and the y output layer is 10 nodes corresponding to the 10 possible digits.  Again, the size of x is (? x 784), where the ? stands for an as yet unspecified number of samples to be input – this is the function of the placeholder variable.

Now we need to set up the weight and bias variables for the three-layer neural network.  There are always L-1 weight/bias tensors, where L is the number of layers.  So in this case, we need to set up two tensors for each:

In [14]:
# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')

Ok, so let’s unpack the above code a little.  First, we declare some variables for W1 and b1, the weights and bias for the connections between the input and hidden layer.  This neural network will have 300 nodes in the hidden layer, so the size of the weight tensor W1 is [784, 300].  We initialise the values of the weights using a random normal distribution with a mean of zero and a standard deviation of 0.03.  The biases are drawn from the same function without a stddev argument, so they use its default standard deviation of 1.0.  TensorFlow provides its own version of the numpy random normal function, which allows you to create a matrix of a given size populated with random samples drawn from a given distribution.  Likewise, we create W2 and b2 variables to connect the hidden layer to the output layer of the neural network.
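As a rough illustration of the L-1 rule and of the random normal initialisation, the sketch below derives the weight shapes from the layer sizes and fills one matrix with Gaussian samples using only the Python standard library.  The helpers weight_shapes and random_normal are hypothetical stand-ins, not TensorFlow functions, and the fixed seed is an arbitrary choice.

```python
import random


def weight_shapes(layer_sizes):
    """One (inputs, outputs) weight shape per pair of adjacent layers."""
    return list(zip(layer_sizes[:-1], layer_sizes[1:]))


def random_normal(rows, cols, stddev, rng):
    """rows x cols nested list of samples drawn from N(0, stddev**2)."""
    return [[rng.gauss(0.0, stddev) for _ in range(cols)] for _ in range(rows)]


layer_sizes = [784, 300, 10]          # input, hidden, output
shapes = weight_shapes(layer_sizes)   # L = 3 layers, so L - 1 = 2 shapes
print(shapes)                         # [(784, 300), (300, 10)]

rng = random.Random(0)                # fixed seed so the run is repeatable
w1 = random_normal(784, 300, 0.03, rng)

# Check that the sample standard deviation is close to the requested 0.03
flat = [w for row in w1 for w in row]
mean = sum(flat) / len(flat)
std = (sum((w - mean) ** 2 for w in flat) / len(flat)) ** 0.5
print(len(w1), len(w1[0]))            # 784 300
print(abs(std - 0.03) < 0.002)        # True
```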

Next, we have to set up the node inputs and activation functions of the hidden layer nodes:

In [15]:
# calculate the output of the hidden layer
hidden_out = tf.add(tf.matmul(x, W1), b1)
hidden_out = tf.nn.relu(hidden_out)

In the first line, we execute the standard matrix multiplication of the input x by the weights (W1) and we add the bias b1.  The matrix multiplication is executed using the tf.matmul operation.  Next, we finalise the hidden_out operation by applying a rectified linear unit activation function to the matrix multiplication plus bias.  Note that TensorFlow has a rectified linear unit activation already set up for us, tf.nn.relu.

This is to execute the following equations, as detailed in the neural networks tutorial:

\begin{align}
z^{(l+1)} &= W^{(l)} x + b^{(l)} \\
h^{(l+1)} &= f(z^{(l+1)})
\end{align}
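The two equations can be sketched for a single input row in plain Python.  Note the assumption about orientation: like tf.matmul(x, W1), this sketch treats x as a row and computes x W + b, whereas the equations write the product as W x.  The function name dense_relu and the tiny example values are made up for illustration.

```python
def dense_relu(x, W, b):
    """Compute relu(x W + b) for one input row.

    x: list of n_in floats
    W: n_in x n_out nested list of weights
    b: list of n_out biases
    """
    n_out = len(b)
    # z_j = sum_i x_i * W[i][j] + b_j  (the linear step)
    z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(n_out)]
    # h_j = f(z_j) with f the rectified linear unit
    return [max(0.0, v) for v in z]


x = [1.0, 2.0]
W = [[0.5, -1.0, 0.0],
     [0.25, -0.5, 1.0]]
b = [0.0, 1.0, -1.0]
print(dense_relu(x, W, b))  # [1.0, 0.0, 1.0]
```

The middle node has a pre-activation of -1.0, so the rectified linear unit sets it to zero.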

Now, let’s set up the output layer, y_:

In [16]:
# now calculate the output layer from the hidden layer output - in this case,
# let's use a softmax activated output layer
y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))

Again we perform the weight multiplication with the output from the hidden layer (hidden_out) and add the bias, b2.  In this case, we are going to use a softmax activation for the output layer – we can use the included TensorFlow softmax function tf.nn.softmax.
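For reference, the softmax activation turns a vector of raw scores into positive values that sum to one.  The sketch below is a plain-Python version; subtracting the maximum before exponentiating is a common numerical-stability trick that I have added here, and it is not a statement about how tf.nn.softmax is implemented internally.

```python
import math


def softmax(logits):
    """Softmax of a list of floats, shifted by the max for stability."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The largest raw score receives the largest probability, which is why taking the index of the maximum output gives the predicted digit.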

We also have to include a cost or loss function for the optimisation / backpropagation to work on. Here we’ll use the cross entropy cost function, represented by:
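\begin{align}
J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ y_j^{(i)} \log\left({y\_}_j^{(i)}\right) + \left(1 - y_j^{(i)}\right) \log\left(1 - {y\_}_j^{(i)}\right) \right]
\end{align}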
Where $y_j^{(i)}$ is the ith training label for output node j, ${y\_}_j^{(i)}$ is the ith predicted label for output node j, m is the number of training / batch samples and n is the number of output nodes.  There are two operations occurring in the above equation.  The first is the summation of the logarithmic products and additions across all the output nodes.  The second is taking a mean of this summation across all the training samples.  We can implement this cross entropy cost function in TensorFlow with the following code:

In [17]:
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped)
                         + (1 - y) * tf.log(1 - y_clipped), axis=1))

Some explanation is required.  The first line is an operation converting the output y_ to a clipped version, limited to the range 1e-10 to 0.9999999.  This is to make sure that we never get a case where we have a log(0) operation occurring during training – this would produce infinite or NaN values and break the training process.  The second line is the cross entropy calculation.

To perform this calculation, first we use TensorFlow’s tf.reduce_sum function – this function basically takes the sum of a given axis of the tensor you supply.  In this case, the tensor that is supplied is the element-wise cross-entropy calculation for a single node and training sample i.e.: $y_j^{(i)} \log({y\_}_j^{(i)}) + (1 - y_j^{(i)}) \log(1 - {y\_}_j^{(i)})$.  Remember that y and y_clipped in the above calculation are (m x 10) tensors – therefore we need to perform the first sum over the second axis.  This is specified using the axis=1 argument, where “1” actually refers to the second axis because Python uses zero-based indexing.

After this operation, we have a tensor of length m, holding one summed value per training sample.  To take the mean of this tensor and complete our cross entropy cost calculation (i.e. execute the $\frac{1}{m}\sum_{i=1}^{m}$ part), we use TensorFlow’s tf.reduce_mean function.  This function simply takes the mean of whatever tensor you provide it.  So now we have a cost function that we can use in the training process.
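The clip, sum-over-nodes and mean-over-samples steps can be traced in a small plain-Python sketch.  It mirrors the structure of the TensorFlow code above using nested lists; the function names clip and cross_entropy and the toy labels and predictions are illustrative assumptions only.

```python
import math


def clip(value, low=1e-10, high=0.9999999):
    """Keep value inside [low, high] so log(value) and log(1 - value) stay finite."""
    return min(max(value, low), high)


def cross_entropy(labels, predictions):
    """labels and predictions are lists of m rows with n entries each."""
    per_sample = []
    for y_row, p_row in zip(labels, predictions):
        total = 0.0
        for y, p in zip(y_row, p_row):
            p = clip(p)
            total += y * math.log(p) + (1 - y) * math.log(1 - p)
        per_sample.append(total)               # sum over output nodes (axis=1)
    return -sum(per_sample) / len(per_sample)  # negative mean over samples


labels = [[0.0, 1.0], [1.0, 0.0]]
good = [[0.1, 0.9], [0.8, 0.2]]   # predictions close to the labels
bad = [[0.9, 0.1], [0.2, 0.8]]    # predictions far from the labels

print(cross_entropy(labels, good) < cross_entropy(labels, bad))  # True
# Predictions of exactly 0 or 1 still give a finite cost thanks to clipping
print(math.isfinite(cross_entropy(labels, [[1.0, 0.0], [0.0, 1.0]])))  # True
```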

Let’s set up the optimiser in TensorFlow:

In [18]:
# add an optimiser
optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

Here we are just using the gradient descent optimiser provided by TensorFlow.  We initialise it with a learning rate, then specify what we want it to do – i.e. minimise the cross entropy cost operation we created.  This function will then perform the gradient descent (for more details on gradient descent see here and here) and the backpropagation for you.  How easy is that?  TensorFlow has a library of popular neural network training optimisers, see here.
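To show what a gradient descent update does, here is a minimal sketch on a toy one-dimensional cost.  This is not how TensorFlow's optimiser is implemented (there the gradients come from backpropagation through the graph); the quadratic cost, the function names and the learning rate of 0.1 are assumptions chosen for illustration.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly apply the update w <- w - learning_rate * grad(w)."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def toy_gradient(w):
    """Gradient of the toy cost J(w) = (w - 3)^2, which is minimised at w = 3."""
    return 2.0 * (w - 3.0)


w_final = gradient_descent(toy_gradient, start=0.0, learning_rate=0.1, steps=100)
print(abs(w_final - 3.0) < 1e-6)  # True
```

Each step moves the parameter a fraction of the way down the slope, so the distance to the minimum shrinks by a constant factor per update in this toy case.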

Finally, before we move on to the main show, where we actually run the operations, let’s set up the variable initialisation operation and an operation to measure the accuracy of our predictions:

In [19]:
# finally set up the initialisation operator
init_op = tf.global_variables_initializer()

# define an accuracy assessment operation
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

The correct prediction operation correct_prediction makes use of the TensorFlow tf.equal function, which returns True or False element by element depending on whether the two arguments supplied to it are equal.  The tf.argmax function is the same as the numpy argmax function, which returns the index of the maximum value in a vector / tensor.  Therefore, the correct_prediction operation returns a tensor of length m of True and False values designating whether the neural network has correctly predicted each digit.  We then want to calculate the mean accuracy from this tensor – first we have to cast the type of the correct_prediction operation from a Boolean to a TensorFlow float in order to perform the reduce_mean operation.  Once we’ve done that, we now have an accuracy operation ready to assess the performance of our neural network.
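The same argmax, compare, cast and mean sequence can be followed in plain Python on a handful of made-up samples.  The helper names and the three-class toy data are illustrative assumptions, not part of the tutorial's model.

```python
def argmax(values):
    """Index of the largest entry (first one if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(labels, predictions):
    # One True/False per sample: does the predicted class match the label?
    correct = [argmax(y) == argmax(p) for y, p in zip(labels, predictions)]
    # Cast Booleans to floats, then take the mean
    return sum(1.0 if c else 0.0 for c in correct) / len(correct)


labels = [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
predictions = [[0.1, 0.2, 0.7],
               [0.6, 0.3, 0.1],
               [0.5, 0.4, 0.1],   # wrong: predicts class 0, label is class 1
               [0.2, 0.2, 0.6]]
print(accuracy(labels, predictions))  # 0.75
```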

## Setting up the training
We now have everything we need to set up the training process of our neural network.  I’m going to show the full code below, then talk through it:



In [20]:
# start the session
with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimiser, cross_entropy],
                            feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))

('Epoch:', 1, 'cost =', '0.566')
('Epoch:', 2, 'cost =', '0.220')
('Epoch:', 3, 'cost =', '0.156')
('Epoch:', 4, 'cost =', '0.126')
('Epoch:', 5, 'cost =', '0.099')
('Epoch:', 6, 'cost =', '0.082')
('Epoch:', 7, 'cost =', '0.066')
('Epoch:', 8, 'cost =', '0.054')
('Epoch:', 9, 'cost =', '0.045')
('Epoch:', 10, 'cost =', '0.038')
0.9758


Stepping through the lines above, the first couple relate to setting up the with statement and running the initialisation operation.  The third line relates to our mini-batch training scheme: it calculates the number of batches to run through in each training epoch.  If you want to know about mini-batch gradient descent, check out this post.  After that, we loop through each training epoch and initialise an avg_cost variable to keep track of the average cross entropy cost for each epoch.  The next line is where we extract a randomised batch of samples, batch_x and batch_y, from the MNIST training dataset.  The TensorFlow provided MNIST dataset has a handy utility function, next_batch, that makes it easy to extract batches of data for training.

The following line is where we run two operations.  Notice that sess.run is capable of taking a list of operations to run as its first argument.  In this case, supplying [optimiser, cross_entropy] as the list means that both these operations will be performed.  As such, we get two outputs, which we have assigned to the variables _ and c.  We don’t really care too much about the output from the optimiser operation but we want to know the output from the cross_entropy operation – which we have assigned to the variable c.  Note, we run the optimiser (and cross_entropy) operation on the batch samples.  In the following line, we use c to calculate the average cost for the epoch.
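The batching and cost-averaging logic can be sketched without TensorFlow.  The generator below is a hypothetical stand-in and does not reproduce the internals of next_batch; the integer "samples" and the per-batch "cost" (simply the batch mean) are placeholders, used only to show that accumulating c / total_batch yields the mean of the batch costs.

```python
import random


def minibatches(samples, batch_size, rng):
    """Shuffle the indices, then yield consecutive full-sized batches."""
    order = list(range(len(samples)))
    rng.shuffle(order)
    total_batch = len(samples) // batch_size
    for b in range(total_batch):
        idx = order[b * batch_size:(b + 1) * batch_size]
        yield [samples[i] for i in idx]


samples = list(range(55000))   # stand-in for the 55,000 training rows
batch_size = 100
rng = random.Random(0)         # fixed seed, an arbitrary choice

batches = list(minibatches(samples, batch_size, rng))
total_batch = len(batches)
print(total_batch)             # 550

avg_cost = 0.0
costs = []
for batch in batches:
    c = sum(batch) / len(batch)   # placeholder "cost" for this batch
    costs.append(c)
    avg_cost += c / total_batch   # same accumulation as in the training loop

# The running accumulation matches the plain mean of the batch costs
print(abs(avg_cost - sum(costs) / len(costs)) < 1e-6)  # True
```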

Finally, we print out our progress in the average cost, and after the training is complete, we run the accuracy operation to print out the accuracy of our trained network on the test set.  Running this program produces the output shown beneath the code above.

There we go – approximately 97.6% accuracy on the test set, not bad.  We could do a number of things to improve the model, such as regularisation (see this tips and tricks post), but here we are just interested in exploring TensorFlow.  You can also use TensorBoard visualisation to look at things like the increase in accuracy over the epochs:

[Figure: TensorBoard plot of the increase in accuracy for the neural network]
