# Introduction to Artificial Neural Networks

In [0]:
# To support both python 2 and python 3
from __future__ import division, print_function, unicode_literals

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12


## CNN for MNIST

In this section we will build a convolutional neural network (CNN) with TensorFlow and train it on the MNIST dataset using Mini-batch Gradient Descent. The first step is the construction phase, in which we build the TensorFlow graph. The second step is the execution phase, in which we actually run the graph to train the model.

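First we need to import the tensorflow library (the `%tensorflow_version 1.x` magic selects TensorFlow 1.x on Colab). Then we set the number of filters in the convolutional layer, the number of neurons in the hidden fully connected layer, and the number of outputs, and we load the MNIST data: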

In [0]:
%tensorflow_version 1.x

In [0]:
import tensorflow as tf

In [4]:
tf.__version__

'1.15.0'

In [0]:
n_conv1 = 64
n_hidden1 = 512
n_outputs = 10

In [0]:
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = np.expand_dims(X_train,-1).astype(np.float32) / 255.0
X_test = np.expand_dims(X_test,-1).astype(np.float32) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

We use placeholder nodes to represent the training data and targets. The shape of X is only partially defined. We know that it will be a 4D tensor (number of instances, height, width, number of colour channels), but we don’t know yet how many instances each training batch will contain. So the shape of X is (None, 28, 28, 1). Similarly, we know that y will be a 1D tensor with one entry per instance, but again we don’t know the size of the training batch at this point, so its shape is (None).

In [0]:
reset_graph()

X = tf.placeholder(tf.float32, shape=(None, 28, 28, 1), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y") 
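
To make the shape convention concrete, here is a small NumPy-only sketch. The array `demo_images` is a made-up stand-in for the MNIST arrays (it is not part of the notebook), and the code only illustrates how the channel axis is added and why the batch dimension can stay open:

```python
import numpy as np

# Hypothetical stand-in for a few grayscale images: (instances, height, width)
demo_images = np.zeros((5, 28, 28), dtype=np.uint8)

# Append a channel axis and scale to [0, 1], mirroring the preprocessing above
demo_images_4d = np.expand_dims(demo_images, -1).astype(np.float32) / 255.0
print(demo_images_4d.shape)   # (5, 28, 28, 1)

# Any batch size matches the (None, 28, 28, 1) pattern; only the last three axes are fixed
for demo_batch in (demo_images_4d[:2], demo_images_4d[:5]):
    assert demo_batch.shape[1:] == (28, 28, 1)
```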

Now we need to create one convolutional layer, one pooling layer, one flatten layer, one fully connected layer, and finally one output layer (which is also fully connected).

In [8]:
with tf.name_scope("dnn"):
    conv1 = tf.layers.conv2d(X, 
                             n_conv1, 
                             kernel_size = (7,7), 
                             strides=(3,3), 
                             name="conv1", 
                             activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(conv1, 
                                    pool_size=(4,4), 
                                    strides=(4,4))
    
    flatten = tf.layers.flatten(pool1)

    fc1 = tf.layers.dense(flatten, 
                          n_hidden1, 
                          name="fc1",
                          activation=tf.nn.relu)
    
    logits = tf.layers.dense(fc1, 
                             n_outputs, 
                             name="outputs")

Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
Instructions for updating:
Please use `layer.__call__` method instead.
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
Instructions for updating:
Use keras.layers.flatten instead.
Instructions for updating:
Use keras.layers.Dense instead.
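
It can help to check by hand what size of tensor reaches the fully connected layer. The sketch below assumes that both `tf.layers.conv2d` and `tf.layers.max_pooling2d` use their default `padding="valid"` (no padding argument is passed above); the helper `output_size` is a hypothetical name introduced only for this illustration:

```python
def output_size(input_size, kernel_size, stride):
    """Spatial output size of a convolution or pooling layer with 'valid' padding."""
    return (input_size - kernel_size) // stride + 1

conv_out = output_size(28, kernel_size=7, stride=3)        # conv1: 7x7 kernel, stride 3
pool_out = output_size(conv_out, kernel_size=4, stride=4)  # pool1: 4x4 window, stride 4
n_flat = pool_out * pool_out * 64                          # 64 feature maps (n_conv1)

print(conv_out, pool_out, n_flat)  # 8 2 256
```

Under these assumptions, each 28x28 image is reduced to a 2x2 grid of 64 feature maps, so the flatten layer hands 256 values per instance to `fc1`.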


Now that we have the neural network model ready to go, we need to define the cost function that we will use to train it. We will use cross entropy, which penalizes models that estimate a low probability for the target class.

In [0]:
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
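
To see why cross entropy penalizes a low probability for the target class, here is a minimal NumPy sketch of the same computation. The function `demo_sparse_xentropy` and the toy arrays are illustrative assumptions, not TensorFlow's implementation:

```python
import numpy as np

def demo_sparse_xentropy(logits, labels):
    """Per-instance cross entropy from raw logits and integer class labels."""
    # Subtract the row-wise max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the target class for each instance
    return -log_probs[np.arange(len(labels)), labels]

toy_logits = np.array([[4.0, 0.0, 0.0],   # confident and correct (target is class 0)
                       [0.0, 4.0, 0.0]])  # confident but wrong (target is class 0)
toy_labels = np.array([0, 0])

toy_xentropy = demo_sparse_xentropy(toy_logits, toy_labels)
print(toy_xentropy)         # the second instance gets a much larger loss
print(toy_xentropy.mean())  # scalar loss, analogous to tf.reduce_mean
```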

We have the neural network model, we have the cost function, and now we need to define a GradientDescentOptimizer that will tweak the model parameters to minimize the cost function.

In [0]:
learning_rate = 0.01

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)
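
As a rough picture of what a gradient descent optimizer does at each training step, the sketch below minimizes a toy one-parameter cost by hand. The cost function, `toy_gradient`, `eta`, and `theta` are all made up for illustration; the real optimizer computes gradients of the loss with respect to every trainable variable in the graph:

```python
eta = 0.01  # learning rate, same value as in the cell above

def toy_gradient(theta):
    """Gradient of the toy cost J(theta) = (theta - 3)**2."""
    return 2.0 * (theta - 3.0)

theta = 0.0
for step in range(1000):
    # Move against the gradient, scaled by the learning rate
    theta = theta - eta * toy_gradient(theta)

print(theta)  # approaches the minimum at theta = 3
```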

The last important step in the construction phase is to specify how to evaluate the model. We will simply use accuracy as our performance measure. First, for each instance, determine if the neural network’s prediction is correct by checking whether or not the highest logit corresponds to the target class. For this you can use the in_top_k() function. This returns a 1D tensor full of boolean values, so we need to cast these booleans to floats and then compute the average. This will give us the network’s overall accuracy.

In [0]:
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
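
The same accuracy computation can be sketched in plain NumPy. The toy arrays below are invented for illustration, and `np.argmax` is used as a stand-in for `in_top_k` with k=1 (the two may treat exact ties between logits differently):

```python
import numpy as np

toy_logits = np.array([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0],
                       [1.5, 1.0, 0.2],
                       [0.1, 2.2, 0.4]])
toy_targets = np.array([0, 2, 1, 1])

# An instance is correct when the index of its highest logit equals its target
toy_correct = np.argmax(toy_logits, axis=1) == toy_targets
toy_accuracy = toy_correct.astype(np.float32).mean()

print(toy_correct)   # three of the four predictions match
print(toy_accuracy)  # 0.75
```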

And, as usual, we need to create a node to initialize all variables, and we will also create a Saver to save our trained model parameters to disk:

In [0]:
init = tf.global_variables_initializer()
saver = tf.train.Saver()

Now we define the number of epochs that we want to run, as well as the size of the mini-batches:

In [0]:
n_epochs = 20
batch_size = 50

And now we can train the model:

In [14]:
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
        
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_valid = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
        print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

    save_path = saver.save(sess, "./my_model_final.ckpt")

0 Batch accuracy: 0.92 Validation accuracy: 0.8896
1 Batch accuracy: 0.94 Validation accuracy: 0.9272
2 Batch accuracy: 0.96 Validation accuracy: 0.9412
3 Batch accuracy: 0.92 Validation accuracy: 0.9486
4 Batch accuracy: 0.96 Validation accuracy: 0.9562
5 Batch accuracy: 0.98 Validation accuracy: 0.9588
6 Batch accuracy: 1.0 Validation accuracy: 0.9646
7 Batch accuracy: 1.0 Validation accuracy: 0.967
8 Batch accuracy: 0.98 Validation accuracy: 0.9684
9 Batch accuracy: 0.94 Validation accuracy: 0.9678
10 Batch accuracy: 0.94 Validation accuracy: 0.971
11 Batch accuracy: 0.98 Validation accuracy: 0.9734
12 Batch accuracy: 0.96 Validation accuracy: 0.973
13 Batch accuracy: 0.96 Validation accuracy: 0.971
14 Batch accuracy: 0.98 Validation accuracy: 0.9732
15 Batch accuracy: 0.94 Validation accuracy: 0.9744
16 Batch accuracy: 1.0 Validation accuracy: 0.9766
17 Batch accuracy: 1.0 Validation accuracy: 0.9748
18 Batch accuracy: 1.0 Validation accuracy: 0.9778
19 Batch accuracy: 0.96 Validat

This code opens a TensorFlow session and runs the init node, which initializes all the variables. Then it runs the main training loop: at each epoch, the code shuffles the training set and iterates through it in mini-batches, so that every training instance is used once per epoch. For each mini-batch, the code simply runs the training operation, feeding it the current mini-batch input data and targets. Next, at the end of each epoch, the code evaluates the model on the last mini-batch and on the validation set, and it prints out the result. Finally, the model parameters are saved to disk.
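
The batching logic can be checked in isolation. This sketch reproduces only the index splitting done inside `shuffle_batch`, assuming the 55,000 training instances that remain after 5,000 are held out for validation; the names `n_instances`, `demo_batch_size`, and `index_batches` are introduced here for illustration:

```python
import numpy as np

n_instances, demo_batch_size = 55000, 50   # 60,000 MNIST training images minus 5,000 held out

rng = np.random.RandomState(42)
rnd_idx = rng.permutation(n_instances)
n_batches = n_instances // demo_batch_size
index_batches = np.array_split(rnd_idx, n_batches)

print(n_batches)              # 1100 mini-batches per epoch
print(len(index_batches[0]))  # 50 instances in each one

# Every training instance appears in exactly one mini-batch per epoch
assert np.array_equal(np.sort(np.concatenate(index_batches)), np.arange(n_instances))
```

Because `np.array_split` tolerates uneven splits, the same code also works when the training set size is not a multiple of the batch size; some batches are then one instance larger than others.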

Now that the neural network is trained, you can use it to make predictions. To do that, you can reuse the same construction phase, but change the execution phase like this:

In [15]:
with tf.Session() as sess:
    saver.restore(sess, "./my_model_final.ckpt") # or better, use save_path
    X_new_scaled = X_test[:20]
    Z = logits.eval(feed_dict={X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)
    
print("Predicted classes:", y_pred)
print("Actual classes:   ", y_test[:20])

INFO:tensorflow:Restoring parameters from ./my_model_final.ckpt
Predicted classes: [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]
Actual classes:    [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]
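
If class probabilities are wanted rather than just the predicted class, the logits can be passed through a softmax. The NumPy sketch below uses made-up logits (`demo_Z`) and a hand-written `demo_softmax` helper, both assumptions for illustration, to show that taking the argmax of the logits directly gives the same class:

```python
import numpy as np

def demo_softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

demo_Z = np.array([[1.0, 3.0, 0.5],
                   [2.5, 0.1, 0.3]])
demo_proba = demo_softmax(demo_Z)

# Softmax preserves the ordering of the logits, so the predicted class is unchanged
assert (np.argmax(demo_proba, axis=1) == np.argmax(demo_Z, axis=1)).all()
print(np.argmax(demo_Z, axis=1))  # [1 0]
print(demo_proba.sum(axis=1))     # each row sums to 1
```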


## Exercises

**Task 1:** Train your own CNN on the MNIST dataset and see if you can get over 98% accuracy.

Hints: use a deeper CNN; use a more advanced optimiser than SGD (e.g., Adam); use dropout.