# Building a NN

Welcome to building NNs for hand-written digit classification. In this workbook we'll take you through the main steps of building a decent categorisation algorithm for digits 0-9:


1.   Setting up the workbook for MAXIMUM SPEED
2.   Loading the required modules for the workbook to work
3.   Loading the data and analysing the data 
4.   Pre-processing the data - **honestly, this is the dull bit**
5.   Defining a model
6.   Training a model
7.   Testing the model

And then we're going to do it all over again for CNNs!

### 1. MAXIMUM SPEED

Google Colab generously gives you one GPU (graphics processing unit) to run computations on.
A GPU is much quicker than a CPU, in that it can perform many more FLOPs (floating point operations [read "calculations"]) per second.

To turn this feature on go to:
Edit > Notebook Settings > Change the hardware accelerator to GPU


In [0]:
# If this code runs and says "Found GPU ..." etc then congrats, you've turned the computation machine to full volume

import tensorflow as tf # Importing our first module (as below) but we need it 
                        # earlier to check whether we have the GPU running in the correct place!
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

### 2. Loading the required modules

Python relies on loading in other modules and libraries that have special in-built functions for tasks that we want to perform. 

Keras (which is a higher-level abstraction of TensorFlow) is a super easy API which has pre-programmed a lot of the dependencies and code necessary to build and train a neural net.

Next to each module I've included a comment about wtf it does.

In [0]:

import numpy as np   # Loading in numpy (a module for vector / array calculations) and calling it "np"
np.random.seed(123)  # As the models are initiated randomly, we seed numpy's random numbers so that runs are more repeatable (TensorFlow keeps its own random state, so expect small differences).
import tensorflow as tf # Already imported in the GPU check above; repeated here so this cell also runs on its own

from keras.models import Sequential, load_model # Keras holds all the tools, here we're accessing its "models" library 
                                                # and importing the "Sequential" model that allows us to stack all the layers 
                                                # in the network while it takes care of all the maths and setup. load_model allows us to
                                                # load previously saved models.


from keras.layers import Dense, Dropout, Activation, Flatten 
# Dense: the old fully-connected layer
# Dropout: randomly drops out some of the connections during each training phase (may be helpful in comp...)
# Activation: your choice of activation function to apply for non-linearities
# Flatten: transforms from high-dimension to a 1D vector

from keras.layers import Conv2D, MaxPooling2D  # Conv2D: finds the 2D features; MaxPooling2D: reduces the dimensionality of the features
from keras.utils import np_utils # This allows us to manipulate the data a bit easier later

from keras.callbacks import ModelCheckpoint # Allows us to store versions of the model as it goes through its training, 
                                            # may be useful in competition time...


from keras.datasets import mnist	# The lovely people at Keras already have the data for us ready in a nice format, so might as well use it


from matplotlib import pyplot as plt  # Some plotting 

### 3. Loading the data and analysing the data 

It's important we have separate test and train sets for our data so that we can comparatively measure the performance between models on unseen data.

If we fed all the data to the NN then it may "overfit", ie. memorise the particular 7s it has seen rather than learn what 7s look like in general. We'd have no way of seeing whether this was the case as we'd only be able to test it on seen data, which we know it performs well on.

By holding some data back, we can see when the performance of the model begins to deteriorate due to this overfitting effect.
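
MNIST arrives already split into train and test sets, but if you ever need to hold data back yourself, the idea can be sketched roughly like this. The function name `holdout_split` and its arguments are made up for this illustration (they are not part of Keras), and it only shuffles index numbers rather than real images:

```python
import numpy as np

def holdout_split(n_samples, n_holdout, seed=0):
    """Shuffle the indices 0..n_samples-1 and set aside n_holdout of them."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    return indices[n_holdout:], indices[:n_holdout]

# Hypothetical numbers: keep 6,000 of 60,000 examples away from training
train_idx, held_idx = holdout_split(60000, 6000)
print(len(train_idx), len(held_idx))             # 54000 6000
print(np.intersect1d(train_idx, held_idx).size)  # 0, no example is in both sets
```

The model would then only ever be fitted on the `train_idx` rows and scored on the `held_idx` rows.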

In [0]:
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print(X_train.shape) # Should be: (60000, 28, 28) ie. 60,000 instances of numbers in images that are 28x28 pixels.
print(y_train.shape) # Should be: (60000,) ie. 60,000 classifications ie. either 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
print(X_test.shape)  # We have 10,000 test images to try out with our model later
print(y_test.shape)

I mean, that's cool and all, but what the hell does this stuff actually look like?

In [0]:
print(y_train[0]) # Printing the correct label of the first (yes, Python starts counting from zero) image, supposedly a 5
plt.imshow(X_train[0], cmap = 'Greys') # Plotting the image to see what it looks like (reasonably a 5), cmap specifies the colour scheme
# Note the 28x28 image size on the x and y axis

In [0]:
# Can you write some code to obtain two images of 4s? (May need 2 code cells here)
# It would be helpful to be able to plot a small list of labels from y_train to see where they turn up...

print(y_train[:10]) # Printing the first 10 labels: there is a 4 at positions 3 and 10 (which are indices 2 and 9 in Python...)
plt.imshow(X_train[2], cmap = 'Greys')

In [0]:
plt.imshow(X_train[9], cmap= 'Greys')

### 4. Pre-processing the data for NNs

The image and label data aren't particularly in the format that the Keras model will want.

NNs simply take a vector of numbers as their input and transform them to a different vector as their output. These images are 28x28 pixels, each one with a value from 0 to 255 to represent how black/white that pixel is.
The outputs we've got here are 0,1,2,3,4,5,6,7,8,9, which aren't actually vectors that the machine can work with.

What the model wants is an input that's a 1D vector that's 784 pieces long (ie. 28*28), and an output that's a vector of 10 probabilities that sum to 1 (ie. the probability that it thinks it's a 0, a 1, a 2 etc.).


It'll then compare its output probabilities to the true probabilities of the input. ie. a true 7 would be [0, 0, 0, 0, 0, 0, 0, 1, 0, 0] meaning it's 100% a 7 and it isn't anything else.

(The machine might put out something like [0, 0, 0, 0, 0, 0.2, 0, 0.8, 0, 0] meaning it thinks the image is most likely a 7 with 80% chance, but could also be a 5 with a 20% chance).

The difference between the output vector from the model and the true probability is measured by a loss function called categorical cross-entropy, which is some needless maths. Just bear in mind it gives us a measure of how much we need to update our model parameters to get closer to an optimal answer (hopefully).
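
If you do fancy a peek at that maths, here is a minimal numpy sketch of the label vector and the loss for the example above. The helper names `one_hot` and `categorical_cross_entropy` and the small `eps` guard are my own choices for illustration; Keras has its own implementation, which this does not claim to reproduce exactly:

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Vector of zeros with a single 1 at position `label`."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Minus the log of the probability given to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)  # assumption: clip to avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred)))

true_seven = one_hot(7)
confident = np.array([0, 0, 0, 0, 0, 0.2, 0, 0.8, 0, 0])  # the example above
unsure = np.full(10, 0.1)                                 # "no idea" guess

print(true_seven)
print(categorical_cross_entropy(true_seven, confident))  # about 0.223
print(categorical_cross_entropy(true_seven, unsure))     # about 2.303
```

The more probability the model puts on the right digit, the smaller the loss, which is all we really need from it.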

In [0]:
# The image data isn't quite in the format that the Keras NN will want, so we want to change it:
print(X_train.shape)
X_train = X_train.reshape(60000, 784)
print(X_train.shape)

In [0]:
# Can you do it for the image test set?
X_test = X_test.reshape(X_test.shape[0], 784)

The pixel values in MNIST (as the images are greyscale) go from 0 to 255 - all the way from white, through grey, to black. 

If we use the pixel values as-is then their size and variation is so large that, when being multiplied through the weight-matrices, they could cause large errors in the output - which would cause large corrections in the backprop during training. Having these constantly large corrections may mean that the model may not converge.

On account of this we transform the pixel data to the range [0, 1] by dividing each pixel value by the max pixel value possible (255).

In [0]:
X_train = X_train.astype('float32')
X_train /= 255 

In [0]:
# Same for the test set again please :) 

X_test = X_test.astype('float32')
X_test /= 255

And now to convert the label data into the vectors we want

In [0]:
# Convert 1-dimensional class arrays to 10-dimensional class matrices
print(y_train.shape)
Y_train = np_utils.to_categorical(y_train, 10)
print(Y_train.shape)

In [0]:
# Again, do the same for the test set...
Y_test = np_utils.to_categorical(y_test, 10)

### 5. Defining a model

This is the fun bit, as we can stack the layers like legos and Keras will pretty much take care of all the inbetween layer stuff.

Here we call the Sequential API that allows us to stack feed-forward layers of varying sizes, with varying activation functions, without having to worry about a lot of the connections/ output to input size and luckily forgoing a lot of maths!

In [0]:
model = Sequential() # Gets us ready to build the model sequentially
model.add(Dense(500, input_dim=784, activation="relu")) # layer that takes our 784 length vector and crushes it to 500

# From here on in (inc the 500 above) these layer length vector choices are completely arbitrary, you can make them what you want.
# I've done them so there are approx the same number of params that we'll have in the CNN later.
model.add(Dense(300, activation="relu")) # Goes 500 to 300
model.add(Dense(200, activation="relu")) # 300 to 200
model.add(Dense(50, activation="relu")) # 200 to 50
model.add(Dense(10, activation="relu")) # layer that crushes a length 50 vector to a length 10 vector
model.add(Activation("softmax")) # converts our 10 length vector to a set of probabilities that sum to 1.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']) # Defines our loss function, optimiser and other metrics we want measuring.
# At this stage the model has been initiated, but with random weights - it has yet to be trained

print(model.summary()) # Allows us to have a high-level overview of the model!
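
The parameter total in the summary can be checked by hand. A quick sketch, assuming each Dense layer has one weight per input-output pair plus one bias per output (`dense_params` is just a helper name for this example):

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs

layer_sizes = [784, 500, 300, 200, 50, 10]  # input size, then each Dense layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [392500, 150300, 60200, 10050, 510]
print(sum(per_layer))  # 613560
```

The softmax Activation layer adds no parameters of its own, so it doesn't appear in the sum.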

You can use the block below to build your own model, we've started it for you by calling your model: "my_model" and calling Sequential(). 

Fill in the rest of your model, remembering to use my_model.add() as your model has a different name!

In [0]:
my_model = Sequential()

### 6. Training a model

Given the absolute headache of all that we've done previously, this bit's a doddle.
Keras takes care of all of the training using the model.fit() function, which takes our training data and breaks it into batches of 32 images.
It'll feed the 32 image vectors through, record the errors (categorical log-loss) and then make changes to the weights in the net accordingly. You can make the batch size smaller, so that the weights are updated more frequently - but then each epoch needs more update steps and so takes longer to run...

Once all 60,000 images have gone through then that's one "epoch" done. However it is likely that in seeing some of the later image vectors that the NN has undone its learning about some of the earlier images, so we feed the data through for multiple epochs (here 10) so that it tries to remember all the data.
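
To put numbers on that, here's a small sketch of how many weight updates happen per epoch for a given batch size (`updates_per_epoch` is a name invented for this example):

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of batches, and so weight updates, in one pass over the data."""
    return math.ceil(n_samples / batch_size)

print(updates_per_epoch(60000, 32))       # 1875
print(updates_per_epoch(60000, 128))      # 469 (the last batch is a bit smaller)
print(updates_per_epoch(60000, 32) * 10)  # 18750 updates over 10 epochs
```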


You can see the model fit function here: https://keras.io/models/model/

In [0]:
checkpointer = ModelCheckpoint('simple_NN-{epoch:02d}.hdf5', verbose = 1) # This will (temporarily) save the model to our drive after each epoch, with the epoch number 
model.fit(X_train, Y_train, batch_size=32, epochs=10, verbose=1, callbacks=[checkpointer]) # Verbose = 1 shows the progress of the model training!

See whether you can fit your own model (my_model) for:
- 12 epochs
- batch size of 128 images
- Don't use callbacks, but investigate them here: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint?version=stable

- Then save the final model

### 7. Testing a model

Again, Keras has us sorted and it's as simple as calling one simple function: evaluate.

Here we'll be looking at: 
- How well did our original model do after 9 epochs?
- How about after 10? Did we see any improvement?
- How about your model after 12 epochs?

Original model after 9 epochs: (we'll need to load model)

In [0]:
epoch9_model = load_model('simple_NN-09.hdf5')
epoch9_score = epoch9_model.evaluate(X_test, Y_test, verbose = 1)
print('Test loss: ', epoch9_score[0])
print('Test accuracy: ', epoch9_score[1])

Load and test 10 epoch model:

In [0]:
# Technically the model is already stored in memory as "model", so we don't really need
# to load it as something else, but it's good practice...

epoch10_model = load_model('simple_NN-10.hdf5')
epoch10_score = epoch10_model.evaluate(X_test, Y_test, verbose = 1)
print('Test loss: ', epoch10_score[0])
print('Test accuracy: ', epoch10_score[1])

Finally load and test your model:

# Building a CNN

The problem with having a feed-forward neural net for image classification is that the locality of the pixels is lost when compressing to a single vector.

For instance, the immediate pixels around pixel 31 are: 2, 3, 4, 30, 32, 58, 59, 60 - which are quite some distance apart when considering they're actually next to each other. This problem will be exacerbated in larger images.
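
Those neighbour numbers can be reproduced with a few lines of plain Python. This is only a sketch: `flat_neighbours` is a made-up helper, and it counts pixels from zero, row by row:

```python
def flat_neighbours(index, width=28, height=28):
    """Positions in the flattened vector of the pixels touching `index`."""
    row, col = divmod(index, width)
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue  # skip the pixel itself
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width:
                result.append(r * width + c)
    return result

print(flat_neighbours(31))  # [2, 3, 4, 30, 32, 58, 59, 60]
```

The pixels directly above and below end up a whole image-width (28 places) away in the flattened vector.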


MNIST is quite a simple dataset, so even the feed-forward net does relatively well (taking us up to ~97% test accuracy).
We will find that CNNs, by preserving the locality of the pixels, find much better features in the images to predict from, and can take us to the 99.7% level.

### 1. Reloading the data

We already loaded the necessary modules at the beginning of this document, such as MaxPooling2D and Conv2D, but we will need to reload our data sets given that we changed their shape in part 1:

In [0]:
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print(X_train.shape) # Should be: (60000, 28, 28) ie. 60,000 instances of numbers in images that are 28x28 pixels.
print(y_train.shape) # Should be: (60000,) ie. 60,000 classifications ie. either 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
print(X_test.shape)  # We have 10,000 test images to try out with our model later
print(y_test.shape)

And once again, it has to be reshaped. The images are already in their 28 x 28 pixel form, but now we have to specify that we're only looking at one colour (greyscale).

Keras treats this as the number of "channels". So our greyscale images have one channel, but if we were to have them in full RGB colour, then Keras would have to look over 3 images of 28 x 28:
- One in red
- One in green
- One in blue

In [0]:
# The image data isn't quite in the format that the Keras CNN will want, so we want to change it:
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1) # reshape it into 60,000 instances of 28 height x 28 width x 1 channel images
input_shape = (28, 28, 1)  # Just useful here to store what will be the eventual input shape for the keras model.

# Can you do it for the image test set?
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

And again, convert it to [0,1]...

In [0]:
X_train = X_train.astype('float32')
X_train /= 255 

# Same for the test set again please :) 

X_test = X_test.astype('float32')
X_test /= 255

And finally, rename our y labels from 0,1,2,3,4,5,6,7,8,9 to our vector representations:

In [0]:
# Convert 1-dimensional class arrays to 10-dimensional class matrices
print(y_train.shape)
Y_train = np_utils.to_categorical(y_train, 10)
print(Y_train.shape)


# Again, do the same for the test set...
Y_test = np_utils.to_categorical(y_test, 10)

### 2. Defining a model

Two key layers in building convolutional neural nets are:
1. Conv2D (convolutional layers)

2. MaxPooling2D (provides robustness by extracting the important features to a lower dimensional representation).


The convolutional layers (in the case below) sweep 32 grids (filters) of dimension 3x3 over the entirety of the image. 
As the grid moves one pixel at a time, each filter sweeps out its own 26x26 pixel image. This image will look different for each filter, and each will highlight the feature in the image that it deems important.

Then, for all of these 32 images simultaneously, a different set of 3x3 filters attempts to extract their important info.
Thus we end up with 32 feature extractions of 24x24 pixels.
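
To see where 26x26 and 24x24 come from, here is a deliberately slow numpy sketch of a single 3x3 filter sliding over a single-channel image. It is a simplification and not how Keras implements Conv2D: there is no bias or ReLU, the random arrays are stand-ins for a digit and a learned filter, and a real second layer combines all 32 incoming images for each filter rather than looking at one:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` one pixel at a time, with no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the patch under the grid by the filter and add it up
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))           # stand-in for one MNIST digit
kernel = rng.standard_normal((3, 3))   # stand-in for one learned filter

first = conv2d_valid(image, kernel)
second = conv2d_valid(first, kernel)
print(first.shape)   # (26, 26)
print(second.shape)  # (24, 24)
```

Each pass loses one pixel on every edge, because the 3x3 grid can't be centred on the border pixels.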

This is then down-sampled by max-pooling (it aggregates the data contained in these images by splitting the images into non-overlapping blocks and taking "the most important" number in that block, so 4 pixels may be given 1 value).
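
Here "the most important" number is simply the largest one in each block. A small numpy sketch of 2x2 max-pooling on a single image (`max_pool_2x2` is a name made up for this example, and the 4x4 values are invented):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]  # drop any odd row/column
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])
print(max_pool_2x2(feature_map))
# [[4 2]
#  [2 8]]
print(max_pool_2x2(np.zeros((24, 24))).shape)  # (12, 12)
```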

The down-sample is then flattened out into a long vector that represents the image in terms of the learned features.

A couple of dense layers then try to learn the interactions between the extracted features and the desired output - using a softmax activation to output 10 probabilities - one for each digit.
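
For the curious, softmax itself is only a couple of lines. This is a sketch with invented scores, not the Keras source:

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into positive numbers that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Made-up raw outputs for the 10 digits; the largest score is at index 7
scores = np.array([1.0, 2.0, 0.5, 0.0, 0.0, 3.0, 0.0, 5.0, 0.0, 0.0])
probs = softmax(scores)

print(probs.round(3))
print(probs.sum())       # 1.0, up to floating-point rounding
print(np.argmax(probs))  # 7
```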

In [0]:
model = Sequential() # Start the legos

model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))) # Tell it to use 32 filters of size 3x3, then ReLu the output.
print(model.output_shape)

model.add(Conv2D(32, (3, 3), activation='relu')) # Do the same again!
print(model.output_shape)

model.add(MaxPooling2D(pool_size=(2,2))) # Down-sample in boxes of size 2x2 with no-overlap. Allows for some translation invariance in the image.
print(model.output_shape)

model.add(Flatten()) # Convert to one long vector for the computer to figure out what's important.
print(model.output_shape)

model.add(Dense(128, activation='relu')) # The thinking (fully-connected) layer.
print(model.output_shape)

model.add(Dense(10, activation='softmax')) # The dense layer that outputs the probabilities
print(model.output_shape)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']) # Defines our loss function, optimiser and other metrics we want measuring.
# At this stage the model has been initiated, but with random weights - it has yet to be trained

print(model.summary()) # Allows us to have a high-level overview of the model!
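
As a sanity check on the summary, the layer sizes and parameter counts can be worked out by hand. A sketch, assuming 3x3 convolutions without padding and one bias per filter or unit (`conv_params` and `dense_params` are helper names invented here):

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus a bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_outputs):
    """One weight per input-output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

side = 28
side = side - 2           # first 3x3 convolution  -> 26
side = side - 2           # second 3x3 convolution -> 24
side = side // 2          # 2x2 max-pooling        -> 12
flat = side * side * 32   # flatten 32 images of 12x12

cnn_total = (conv_params(3, 3, 1, 32)
             + conv_params(3, 3, 32, 32)
             + dense_params(flat, 128)
             + dense_params(128, 10))
print(flat)       # 4608
print(cnn_total)  # 600810
```

That is in the same ballpark as the total the feed-forward net's summary reported earlier, which is what makes the comparison between the two fair-ish.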

### 3. Training the model

Same as before! (Got to love Keras).

In [0]:
checkpointer = ModelCheckpoint('CNN-{epoch:02d}.hdf5', verbose = 1) # This will (temporarily) save the model to our drive after each epoch, with the epoch number 
model.fit(X_train, Y_train, batch_size=32, epochs=10, verbose=1, callbacks=[checkpointer]) # Verbose = 1 shows the progress of the model training!

### 4. Testing the model

As the model is stored as "model" in memory currently, then we can just call it directly for testing, without having to load it from Drive:

In [0]:
model_score = model.evaluate(X_test, Y_test, verbose = 1)
print('Test loss: ', model_score[0])
print('Test accuracy: ', model_score[1])

Hopefully, you'll see that, with roughly the same number of parameters, our CNN is outperforming our NN!

Though note that this may not be the case, as the initialisation state of the weights in these models is random. 
So even if we trained exactly the same model again from scratch, it would produce a different result as it would have a different starting point.

Therefore we can still expect some variance in our test score.

Let's try a second CNN. This one, taken from https://keras.io/examples/mnist_cnn/, uses:

- Convolutional Layer (32 filters of size 3x3, with relu)
- Same again, but 64 filters!
- The MaxPooling layer as above
- A dropout layer with rate 0.25 (these dropout layers reduce overfitting in the model, so should help performance)
- Flatten
- Dense(128) layer
- Dropout with rate 0.5
- Output layer of size 10, with the softmax activation
- The AdaDelta optimiser, with the loss as before
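
The dropout layers in that list are less mysterious than they sound. A rough numpy sketch of what happens to a layer's outputs during training, using the "zero some values, scale up the survivors" scheme (`dropout` here is a toy function of my own, and at test time dropout is simply switched off):

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the values and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = keep this value
    return activations * mask / keep_prob

rng = np.random.default_rng(123)
activations = np.ones(10000)  # pretend layer output, all ones for clarity
dropped = dropout(activations, rate=0.25, rng=rng)

print((dropped == 0).mean())  # roughly 0.25
print(dropped.mean())         # roughly 1.0, because survivors are scaled up
```

Because a different random set is dropped on every batch, the net can't lean too heavily on any single connection.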

In [0]:
# Start with sequential...
model2 = Sequential() 

model2.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 28, 1)))

model2.add(Conv2D(64, (3, 3), activation='relu'))

model2.add(MaxPooling2D(pool_size=(2, 2)))

model2.add(Dropout(0.25))

model2.add(Flatten())

model2.add(Dense(128, activation='relu'))

model2.add(Dropout(0.5))

model2.add(Dense(10, activation='softmax'))

# Compile it ...
model2.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

# Print summary...
print(model2.summary())

Train, but this time with 12 epochs and a batch size of 128:

In [0]:
model2.fit(X_train, Y_train, batch_size=128, epochs=12, verbose=1) # Verbose = 1 shows the progress of the model training!

And evaluate:

In [0]:
model2_score = model2.evaluate(X_test, Y_test, verbose = 1)
print('Test loss: ', model2_score[0])
print('Test accuracy: ', model2_score[1])

^ for me, at time of running, that's breaking the 99% accuracy marker

# Taking real images from my laptop

I get it, the above is pretty boring. It's just numbers. Can I make it do something from my phone camera though?

Step one: a single prediction on a data point we already have:

In [0]:
image = X_train[100].reshape(28, 28) # turn it back into the shape the plots can handle.
print(Y_train[100]) # prints the one-hot label: the 1 sits at index 5, so it's supposedly a 5...
plt.imshow(image, cmap = 'Greys')

Does the model predict it's a 5 though?

In [0]:
input_img = X_train[100].reshape(1,28,28,1)
result = model2.predict(input_img) # gives a vector of probabilities
print(np.sum(result)) # shows they sum to 1.

print(np.argmax(result)) # will pick the entry with the highest prob...

It's a 5!!! Impressive for such a squiggle!

Can I now get one from my laptop though?

In [0]:
# Writing a quick function to load an image, crop it to the middle, 
# resize it to 28x28 and turn it to greyscale, which is the format we want:

from PIL import Image # quite a nice library for this task
import PIL.ImageOps  # Used to turn the image into a black number

def file_to_input(string):
  img  = Image.open(string) # loads image from the file
  width, height = img.size # get the size of the image
  area = (width/4, height/4, 3*width/4, 3*height/4) 
  img = img.crop(area) # crops to the area, which happens to be the middle of the image

  img = img.resize((28, 28)) # resize the image to 28x28
  img = img.convert('L') # turn to greyscale
  img = PIL.ImageOps.invert(img)

  pic = np.array(img)
  pic = pic/255
  # np.place(pic, pic < 0.5, 0) # these could possibly be used to sharpen the images...
  # np.place(pic, pic > 0.5, 1)

  return pic

In [0]:
img1 = file_to_input('5 v1.jpg')
print(img1.shape)
plt.imshow(img1, cmap='Greys')
img1_reshape = img1.reshape(1,28,28,1)
result = model2.predict(img1_reshape)
print(np.argmax(result))

Correct!

In [0]:
img2 = file_to_input('5 v2.jpg')
print(img2.shape)
plt.imshow(img2, cmap='Greys')
img2_reshape = img2.reshape(1,28,28,1)
result = model2.predict(img2_reshape)
print(np.argmax(result))

Correct!

In [0]:
img3 = file_to_input('5 v3.jpg')
print(img3.shape)
plt.imshow(img3, cmap='Greys')
img3_reshape = img3.reshape(1,28,28,1)
result = model2.predict(img3_reshape)
print(np.argmax(result))

Finally tricked it! Now just to close out all the images we opened...

In [0]:
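plt.close('all')  # closes all the matplotlib figures
# PIL has no Image.close('all'): an image is closed by calling .close() on the image object itself.
# The images opened inside file_to_input are local to that function, so there is nothing left to close here.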