# Training a simple CNN on MNIST

In this notebook we are going to train a simple neural network on the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) which contains small images of handwritten numerical digits. MNIST is a very simple dataset and has become the 'hello world' of machine learning. 

We will be training a CNN (convolutional neural network) to classify these handwritten digits. Don't worry too much about how a convolutional network works, or about the loss functions specific to classification; these will be covered in more detail in later classes. This exercise is meant to give you an overview of the training process, and to get you familiar with using a deep learning library like TensorFlow.

If you want a more intuitive understanding of how a CNN like this is connected together and performs classification, you can look at [this interactive visualisation of a CNN](https://blog.terencebroad.com/archive/convnetvis/vis.html), which was also trained to classify MNIST digits.

This notebook is based on a notebook originally by [fchollet](https://twitter.com/fchollet) - the original creator of Keras.

### Installing TensorFlow / Keras

If you haven't installed TensorFlow and Keras, follow the instructions in the other notebook for this session. 

### Importing TensorFlow / Keras

In [1]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

### Preparing the dataset

This code loads the MNIST dataset using the handy utility functions in Keras, and also prepares the data for training. When training neural networks it is important to ensure that the shape of each data sample matches the input shape the model expects, otherwise things will quickly break!

Here our images of digits are 28x28 pixels and grayscale, which means they have only one colour channel.

As our digits are in the range 0-9, that means we have 10 classes in total that we are trying to classify.
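To see what the preparation steps in the next cell do to the array shapes without downloading anything, here is a small NumPy sketch. It uses a random array as a stand-in for a few MNIST images (an assumption for illustration; the real data comes from `keras.datasets.mnist.load_data()`).

```python
import numpy as np

# Hypothetical stand-in for 5 MNIST images: 28x28 pixels, values 0-255, dtype uint8
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)

# Scale pixel values from 0-255 down to the [0, 1] range
scaled = fake_images.astype("float32") / 255

# Add a trailing channel axis: (5, 28, 28) -> (5, 28, 28, 1), i.e. one grayscale channel
with_channel = np.expand_dims(scaled, -1)

print(fake_images.shape, "->", with_channel.shape)
print("min:", with_channel.min(), "max:", with_channel.max())
```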

In [6]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# Load the data and split it between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# Convert integer labels (0-9) into one-hot vectors of length num_classes
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
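`keras.utils.to_categorical` turns each integer label into a one-hot vector of length `num_classes`. The NumPy sketch below reproduces that behaviour for a few example labels by indexing into an identity matrix; it illustrates the idea and is not Keras' own implementation.

```python
import numpy as np

num_classes = 10
labels = np.array([5, 0, 4, 1])  # example digit labels

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)           # (4, 10)
print(one_hot[0])              # 1.0 at index 5, 0.0 everywhere else
print(one_hot.argmax(axis=1))  # [5 0 4 1]: argmax recovers the integer labels
```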


### Build the model

This next section of code is where we build the neural network model. By defining the model with the class `keras.Sequential`, we are defining the *order* in which one layer connects to another, which is how we tell TensorFlow which part of the network connects to the next.
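Conceptually, a sequential model is just a list of layers applied one after another, with each layer's output fed in as the next layer's input. The sketch below mimics that with plain Python functions. `flatten`, `dense` and `run_sequential` are hypothetical helpers written for illustration, not Keras APIs, and the dense weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def flatten(x):
    # Collapse everything except the batch axis: (batch, 28, 28, 1) -> (batch, 784)
    return x.reshape(x.shape[0], -1)

def dense(units, in_features):
    # Return a function computing x @ W + b with random (untrained) weights
    W = rng.normal(scale=0.01, size=(in_features, units))
    b = np.zeros(units)
    return lambda x: x @ W + b

def run_sequential(layer_list, x):
    # Apply each layer in list order, feeding each output into the next layer
    for layer in layer_list:
        x = layer(x)
    return x

toy_layers = [flatten, dense(10, 28 * 28)]
batch = np.zeros((3, 28, 28, 1), dtype="float32")
out = run_sequential(toy_layers, batch)
print(out.shape)  # (3, 10)
```

Swapping the order of `toy_layers` would break the shapes, which is exactly why the order of layers in `keras.Sequential` matters.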

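This network combines convolutional and pooling layers (something we will cover in Week 3.1) in the first several layers of the network with a dense layer (the Keras term for a fully connected layer) at the end. In between, a `Flatten` layer reshapes the feature maps into a single vector, and a `Dropout` layer randomly sets half of its inputs to zero during training, which helps reduce overfitting.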
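The output shapes in the model summary printed below follow from simple arithmetic. With the Keras default `padding="valid"`, a 3x3 convolution shrinks each spatial side by 2, and a 2x2 max-pooling layer (whose stride defaults to the pool size) halves each side, rounding down. The helper functions here are hypothetical, written only to check those numbers.

```python
def conv_out(size, kernel=3, stride=1):
    # 'valid' convolution: no padding, so the kernel must fit entirely inside the input
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    # 'valid' max pooling with a stride equal to the pool size
    return (size - pool) // pool + 1

side = 28
side = conv_out(side)    # 26
side = pool_out(side)    # 13
side = conv_out(side)    # 11
side = pool_out(side)    # 5
flat = side * side * 64  # 5 * 5 positions * 64 channels
print(side, flat)        # 5 1600
```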

The output of the network is a vector of 10 values, the same as the number of classes we are classifying. Each value represents how likely the network thinks it is that the input digit belongs to that class. We use the class with the highest confidence as the prediction from the model.
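The final `Dense` layer uses a `softmax` activation, which turns the 10 raw output scores into values between 0 and 1 that sum to 1, so they can be read as confidences. Picking the predicted digit is then an `argmax`. Here is a NumPy sketch using made-up scores for a single image:

```python
import numpy as np

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Made-up raw scores (logits) for the 10 classes of one image
logits = np.array([1.0, 0.5, 0.2, 3.0, 0.1, 0.0, -1.0, 0.3, 2.0, 0.4])
probs = softmax(logits)

print(probs.round(3))
print("sum:", probs.sum())
print("predicted digit:", probs.argmax())  # 3, the class with the largest score
```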

In [7]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 1600)              0         
                                                                 
 dropout (Dropout)        
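The `Param #` column of the summary can be checked by hand. A `Conv2D` layer has one weight per kernel position, per input channel, per filter, plus one bias per filter; a `Dense` layer has one weight per input/output pair plus one bias per output. Pooling, flattening and dropout have no trainable parameters. A quick sketch of that arithmetic (the first two values match the summary above):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # kernel_h * kernel_w * in_channels weights per filter, plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    # one weight per input/output pair, plus one bias per output unit
    return (in_features + 1) * units

conv1 = conv2d_params(3, 3, 1, 32)       # 320
conv2 = conv2d_params(3, 3, 32, 64)      # 18496
dense_out = dense_params(1600, 10)       # 16010
print(conv1, conv2, dense_out, conv1 + conv2 + dense_out)
```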

### Training the model

This block of code is where we run the training of the model. It is only a few lines of code, because most of what happens in training is handled behind the scenes by TensorFlow. 

There are two parameters we need to define: the `batch_size` and the number of `epochs`. The batch size defines how many data samples we process at once during training. A bigger batch size can speed up training, because more samples are processed in parallel, but how big we can go depends on the memory of our computer. Because the weights are updated based on the average loss over the whole batch, training in batches is generally more stable than updating the weights after every single example. Bigger is not always better, though: the noise that comes from using smaller batches can act as a form of *regularisation*, something that will come up again and again with different tricks for getting the best performance out of training.
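To make the batch size concrete: with `validation_split=0.1`, `model.fit` trains on 54,000 of the 60,000 images, so one epoch with `batch_size = 254` takes 213 weight updates, the last one on a smaller batch. The NumPy sketch below shows one common way of splitting shuffled indices into mini-batches; `iterate_minibatches` is a hypothetical helper, not the exact mechanism Keras uses internally.

```python
import math
import numpy as np

def iterate_minibatches(num_samples, batch_size, rng):
    # Shuffle the sample indices once per epoch, then yield them in consecutive chunks
    indices = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

rng = np.random.default_rng(0)
num_train = 54_000  # 60,000 training images minus the 10% held out for validation
batch_size = 254

batch_sizes = [len(b) for b in iterate_minibatches(num_train, batch_size, rng)]
print(len(batch_sizes), math.ceil(num_train / batch_size))  # 213 213
print(batch_sizes[-1])  # 152: the final, smaller batch
```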

The number of `epochs` defines how many passes over the whole training dataset we perform. The more epochs we train for, the longer training is going to take, but it often (though not always) leads to better performance.
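Putting batches and epochs together, the loop that `model.fit` runs for us has roughly this shape: an outer loop over epochs and an inner loop over mini-batches, with one weight update per batch. This sketch fits a toy linear model `y = 2x + 1` with plain mini-batch gradient descent in NumPy. The data, learning rate and model are all assumptions for illustration; the real Keras loop (with Adam, metrics and validation) does much more.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: y = 2x + 1 plus a little noise
x = rng.uniform(-1, 1, size=1000)
y = 2 * x + 1 + rng.normal(scale=0.05, size=1000)

w, b = 0.0, 0.0
learning_rate = 0.1
toy_batch_size, toy_epochs = 32, 10

for epoch in range(toy_epochs):
    order = rng.permutation(len(x))  # reshuffle the data every epoch
    for start in range(0, len(x), toy_batch_size):
        idx = order[start:start + toy_batch_size]
        error = (w * x[idx] + b) - y[idx]
        # Gradients of the mean squared error over this batch only
        grad_w = 2 * np.mean(error * x[idx])
        grad_b = 2 * np.mean(error)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    loss = np.mean((w * x + b - y) ** 2)  # loss over the full dataset after this epoch
    print(f"epoch {epoch + 1}: loss {loss:.4f}")

print(f"learned w={w:.2f}, b={b:.2f}")
```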

In the call to `model.compile` we define the loss function, the optimiser used to update the weights, and the metrics (here, accuracy) that are reported during training.
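For one-hot labels, categorical cross-entropy is the negative log of the probability the model assigns to the correct class, averaged over the batch. The NumPy sketch below computes it by hand for two made-up predictions over 3 classes (fewer than MNIST's 10, to keep it readable). The `eps` clip that guards against `log(0)` is an assumption of this sketch.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then take -sum(y_true * log(y_pred)) for each example
    y_pred = np.clip(y_pred, eps, 1.0)
    per_example = -np.sum(y_true * np.log(y_pred), axis=1)
    return per_example.mean()

y_true = np.array([[0, 0, 1],          # true class 2
                   [1, 0, 0]], float)  # true class 0
confident = np.array([[0.05, 0.05, 0.90],
                      [0.80, 0.10, 0.10]])
wrong = np.array([[0.70, 0.20, 0.10],
                  [0.10, 0.80, 0.10]])

print(categorical_crossentropy(y_true, confident))  # about 0.164
print(categorical_crossentropy(y_true, wrong))      # about 2.303
```

Confident, correct predictions give a low loss; confident, wrong predictions are penalised heavily.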

In the call to `model.fit` we actually perform the training of the model.
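The `model.fit` call in the next cell also passes `validation_split=0.1`. According to the Keras documentation, this holds out the *last* 10% of the training arrays (taken before any shuffling) as validation data, which is evaluated at the end of each epoch but never trained on. In NumPy terms the split looks roughly like this; it is a sketch using a stand-in array of sample indices, not Keras' own code.

```python
import numpy as np

validation_split = 0.1
num_samples = 60_000
x_stand_in = np.arange(num_samples)  # stand-in for x_train: just the sample indices

# Hold out the last 10% of samples; no shuffling happens before the split
num_val = int(round(num_samples * validation_split))
x_fit, x_val = x_stand_in[:-num_val], x_stand_in[-num_val:]

print(len(x_fit), len(x_val))  # 54000 6000
print(x_val[0])                # 54000: the first held-out sample
```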

In [11]:
batch_size = 254
epochs = 10

# Here we are defining the loss function and the optimiser used for training.
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Train for the number of epochs defined above, holding out the last 10% of the training data for validation.
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2cd8af340>

### Evaluating the model

Here we are going to evaluate the model. This is where we take our trained model and test it against the test dataset, which the model never saw during training. This gives us an overall accuracy score used to assess the model.
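The accuracy that `model.evaluate` reports here is the fraction of test images whose highest-scoring class matches the true label. With one-hot labels, that is a comparison of two `argmax` results, as in this NumPy sketch with made-up predictions for four images over 3 classes:

```python
import numpy as np

# Made-up softmax outputs for 4 images over 3 classes (each row sums to 1)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])
# One-hot true labels for classes 0, 1, 2, 1
y_true = np.eye(3)[[0, 1, 2, 1]]

correct = y_pred.argmax(axis=1) == y_true.argmax(axis=1)
accuracy = correct.mean()
print(correct)   # [ True  True  True False]
print(accuracy)  # 0.75
```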

From here we would usually want to save a model, and then use it in another piece of code where we test it on new inputs. We won't be doing that today, but we will be looking at how we do this in later classes.

In [9]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.025400575250387192
Test accuracy: 0.9919000267982483


### Next steps

Have a go at changing the batch size and number of epochs and see how that impacts the accuracy of the model, as well as the speed of training. For instance, you could try training a model with a low batch size for 1 epoch, and see how that differs from a training run with a high batch size.

You could also have a look at the [Keras documentation](https://keras.io/api/optimizers/) for the different optimisers available, and try changing the optimiser in the `model.compile` function to see how that impacts training performance.
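To give a feel for what changing the optimiser means, here is a NumPy sketch comparing a plain gradient-descent step with the standard Adam update rule, both minimising the toy function `f(x) = (x - 3)**2`. Keras' own `Adam` class follows the same idea, but its implementation details (for example, exactly where epsilon is applied) may differ; the learning rate and step count here are arbitrary assumptions.

```python
import numpy as np

def grad(x):
    # Derivative of the toy loss f(x) = (x - 3)**2
    return 2 * (x - 3)

def sgd(x, lr=0.1, steps=300):
    # Plain gradient descent: step against the gradient with a fixed learning rate
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def adam(x, lr=0.1, steps=300, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # running average of gradients
        v = beta2 * v + (1 - beta2) * g ** 2  # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for starting at zero
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

print("SGD: ", sgd(0.0))
print("Adam:", adam(0.0))
```

Both end up near the minimum at `x = 3`, but they take different paths: Adam scales each step by its running gradient statistics, while plain gradient descent uses the raw gradient.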

Finally, once you have done a few training runs here with different settings, head over to [https://playground.tensorflow.org/](https://playground.tensorflow.org/) and experiment with training different neural network models interactively on some toy data. 