<a href="https://colab.research.google.com/github/PhilipMottershead/CSM6420/blob/master/prac_6/Practical6_CNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Convolutional Neural Networks

Standard CNNs comprise three types of layers: convolutional layers, pooling layers and fully-connected layers. Stacking these layers forms a CNN architecture. A simplified CNN architecture for image classification is illustrated in the figure below.

<a title="Aphex34, CC BY-SA 4.0 &lt;https://creativecommons.org/licenses/by-sa/4.0&gt;, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Typical_cnn.png"><img width="718" alt="Typical cnn" src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Typical_cnn.png/512px-Typical_cnn.png"></a>

**Figure:** A common form of CNN architecture in which convolutional layers are stacked before a pooling layer subsamples their output. The resulting features are fed to the fully connected (or dense) layers, which produce the final output.

Understanding the overall architecture of a CNN is not sufficient on its own. Creating and optimising these models can take considerable time and can be confusing. We will now explore the individual layers in detail, covering their hyperparameters and connectivity.

The parameters of a convolutional layer are a set of learnable kernels (filters). As a kernel slides across the input, the scalar product between the kernel and the region of the input it covers is calculated at each position (Figure 1). From this the network learns kernels that 'fire' when they see a specific feature at a given spatial position of the input. These responses are commonly known as **activations**.

<a title="Source: Dive into Deep Learning (d2l.ai)" href="https://d2l.ai/_images/correlation.svg"><img width="500" alt="Two-dimensional cross-correlation operation" src="https://d2l.ai/_images/correlation.svg"></a>

**Figure 1:** Illustration of a single step in the convolution operation. The shaded portions are the first output element and the input and kernel tensor elements used to compute it: 0×0+1×1+3×2+4×3=19.
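
The step in Figure 1 can be reproduced with a few lines of NumPy. This is a minimal sketch rather than how Keras implements convolution: `corr2d` is a hypothetical helper, and the input and kernel values are assumed to match the figure (a 3×3 input holding 0 to 8 and a 2×2 kernel holding 0 to 3).

```python
import numpy as np

def corr2d(x, k):
    """Slide kernel k over input x (stride 1, no padding) and sum the products."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Scalar product between the kernel and the patch it currently covers.
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

# Assumed values: 3x3 input with entries 0..8, 2x2 kernel with entries 0..3.
x = np.arange(9.0).reshape(3, 3)
k = np.arange(4.0).reshape(2, 2)
result = corr2d(x, k)
print(result)  # the first element is 0*0 + 1*1 + 3*2 + 4*3 = 19
```

Each output element corresponds to one placement of the kernel, so a 3×3 input and a 2×2 kernel give a 2×2 output.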

Every kernel has a corresponding activation (feature) map, and these maps are stacked along the depth dimension to form the full output volume of the convolutional layer.

These kernels are usually small in spatial dimensionality, but extend through the entire depth of the input. When the data reaches a convolutional layer, the layer convolves each filter across the spatial dimensions of the input to produce a 2D activation map.
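
To make the stacking of activation maps concrete, here is a rough NumPy sketch of a convolutional layer's forward pass without bias or activation function. The helper `conv_layer_forward`, the random inputs and the choice of 16 kernels of size 3×3 are illustrative assumptions; the shapes are picked to mirror a 28×28 single-channel image.

```python
import numpy as np

def conv_layer_forward(x, kernels):
    """x: (height, width, depth); kernels: (num_kernels, kh, kw, depth)."""
    n, kh, kw, depth = kernels.shape
    assert depth == x.shape[2], "each kernel must span the full input depth"
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, n))
    for f in range(n):                # one 2D activation map per kernel
        for i in range(out_h):
            for j in range(out_w):
                out[i, j, f] = np.sum(x[i:i + kh, j:j + kw, :] * kernels[f])
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(28, 28, 1))          # hypothetical single-channel image
kernels = rng.normal(size=(16, 3, 3, 1))  # hypothetical 16 kernels of size 3x3
volume = conv_layer_forward(x, kernels)
print(volume.shape)  # (26, 26, 16)
```

The last axis of `volume` is the depth of the output: one slice per kernel.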

One of the key differences from the MLP is that the layers within a CNN are composed of neurons organised into three dimensions: the spatial dimensions of the input **(height and width) and the depth**. The depth is the third dimension of an activation volume, that is, the number of filters/kernels used. Unlike in standard MLPs, the neurons within any given layer connect only to a small region (the receptive field) of the preceding layer.

We can also define the **stride**, which sets how far the receptive field moves across the spatial dimensions of the input between successive positions. For example, a stride of 1 produces heavily overlapping receptive fields and a large activation map. Setting the stride to a greater number reduces the amount of overlap and produces an output with smaller spatial dimensions.
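
The effect of the stride can be seen by listing where the receptive field is placed along a single axis. The sketch below uses a hypothetical helper, `receptive_field_starts`, with an assumed input size of 7 and kernel size of 3, and ignores padding.

```python
def receptive_field_starts(input_size, kernel_size, stride):
    """Start indices of the receptive field along one spatial axis (no padding)."""
    return list(range(0, input_size - kernel_size + 1, stride))

for stride in (1, 2, 4):
    starts = receptive_field_starts(input_size=7, kernel_size=3, stride=stride)
    print(f"stride={stride}: starts={starts}, output size={len(starts)}")
```

With a stride of 1, neighbouring receptive fields share two of their three positions; with a stride of 4 they no longer overlap and the output shrinks to two positions.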

**Zero-padding** is the simple process of padding the border of the input with zeros, and is an effective way to gain further control over the dimensionality of the output volumes. It is important to understand that using these techniques alters the spatial dimensionality of the convolutional layer's output. We can calculate this using the following method:

In [1]:
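def calculate_conv_output(height, width, depth, kernel_size, zero_padding, stride):
    # Receptive field size = kernel size.
    # Output size along one axis: (input - kernel + 2 * padding) / stride + 1.
    # The input depth does not affect the spatial output size (each kernel spans
    # the full depth); the output depth equals the number of kernels in the layer.
    out_height = (height - kernel_size + 2 * zero_padding) / stride + 1
    out_width = (width - kernel_size + 2 * zero_padding) / stride + 1

    return out_height, out_width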

If this calculation does not yield a whole number, then the stride has been set incorrectly, as the neurons will be unable to fit neatly across the given input.
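
As an illustration of this check, the sketch below computes the output size along one spatial axis and raises an error when the kernel does not fit neatly. `conv_output_size` is a hypothetical helper written for this example. Keras itself does not raise in this situation with the default `'valid'` padding: it rounds down and ignores the leftover border.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output size along one spatial axis; raises if the kernel does not fit neatly."""
    span = input_size - kernel_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError(
            f"input={input_size}, kernel={kernel_size}, padding={padding}, "
            f"stride={stride} do not tile the input evenly"
        )
    return span // stride + 1

print(conv_output_size(28, 3))             # 26
print(conv_output_size(26, 3))             # 24
print(conv_output_size(28, 3, padding=1))  # 28
try:
    conv_output_size(28, 3, stride=2)
except ValueError as err:
    print("rejected:", err)
```

The first two calls correspond to stacking two 3×3 convolutions on a 28×28 image, and the third shows how one pixel of zero-padding preserves the input size.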


See the slides for lecture10-CNNs for more information on CNNs.
Alternatively, see the Stanford course notes on CNNs: https://cs231n.github.io/convolutional-networks/
or work through this short tutorial on the basic components of a ConvNet:
https://machinelearningmastery.com/crash-course-convolutional-neural-networks/


## Task One: MNIST Classification

Using the slides given last week, build a CNN to classify MNIST digits.

Last week we reduced the data dimensionality with PCA before applying a feedforward neural network. This time, we'll train a network on the complete image using a CNN, a sparsely connected network.


#### Recall that in the last practical we learned how to build a simple fully connected neural network, also known as a Multilayer Perceptron (MLP), using dense layers

In [2]:
import tensorflow as tf
import tensorflow.keras as keras
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.backend import clear_session

# Good Practice Klaxon: Free your memory from previously made models.
clear_session()

# Create a new blank model
model = Sequential()
# Add a hidden layer with 2 units and an input of size 4. (4,) is a one-element tuple;
# the batch dimension is left unspecified and appears as None in the summary.
model.add(Dense(2, input_shape=(4,)))
# And finally, add an output layer with 1 unit
model.add(Dense(1))

# Print out a summary of the model
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 2)                 10        
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 3         
Total params: 13
Trainable params: 13
Non-trainable params: 0
_________________________________________________________________


Next, prepare the data

In [3]:
from keras.utils import to_categorical
from keras.datasets import mnist

# input image dimensions
width = 28
height = 28

num_classes = 10

# the data, split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape for CNN
X_train = X_train.reshape(X_train.shape[0], height, width, 1)
X_test = X_test.reshape(X_test.shape[0], height, width, 1)
input_shape = (height, width, 1)


# Convert to float32 and scale pixel values from [0, 255] to [0, 1]; normalised inputs train faster.
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
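
To see what "binary class matrices" means, the following NumPy sketch builds the same kind of one-hot encoding by hand and recovers the labels with `argmax`. The `one_hot` helper and the three example labels are assumptions made for illustration; in the notebook this job is done by `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (num_samples, num_classes) matrix with a single 1 per row."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([5, 0, 4])  # hypothetical digit labels
encoded = one_hot(labels, num_classes=10)
print(encoded[0])  # 1.0 at index 5, zeros elsewhere

# argmax along the class axis inverts the encoding.
decoded = encoded.argmax(axis=1)
print(decoded)
```

The same `argmax` step is what turns the network's 10 output probabilities back into a predicted digit.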


Build your convolutional neural network below (you can get some inspiration from this [keras example](https://keras.io/examples/vision/mnist_convnet/)).

In [4]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPool2D, Dropout, Flatten

model = Sequential()
model.add(Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25)) # Randomly drop 25% of the previous layer's outputs during training
model.add(Flatten())     # Flatten, and add a fully connected layer
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))  # Randomly drop 50% of the dense layer's outputs during training
model.add(Dense(10, activation='softmax')) # Last layer: 10 class nodes with softmax output
model.summary()


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 16)        160       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 32)        4640      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 32)        0         
_________________________________________________________________
flatten (Flatten)            (None, 4608)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               589952    
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)              
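
The summary shows the pooling layer halving each spatial dimension (24×24 becomes 12×12). A minimal NumPy sketch of 2×2 max pooling with stride 2 on a single feature map is given below. `max_pool_2x2` is a hypothetical helper that assumes even side lengths; it is not how Keras implements `MaxPool2D`.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = x.shape
    # Group the array into non-overlapping 2x2 blocks and take each block's maximum.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.arange(16).reshape(4, 4)  # toy 4x4 feature map holding 0..15
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[ 5  7]
#  [13 15]]
```

Pooling has no trainable weights, which is why the layer contributes 0 parameters.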

Note that we have roughly 600,000 parameters, almost all of them in the first dense layer. With a strong optimizer like Adam and a large dataset like MNIST, this shouldn't be a problem.
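
The parameter counts can be checked by hand: each convolutional kernel has one weight per element plus a bias, and each dense unit has one weight per input plus a bias. The helper names below are hypothetical, and the layer sizes are taken from the model defined above.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per kernel (kernel_h * kernel_w * in_channels) plus one bias, per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, per unit."""
    return (inputs + 1) * units

counts = {
    "conv2d": conv2d_params(3, 3, 1, 16),
    "conv2d_1": conv2d_params(3, 3, 16, 32),
    "dense (128 units)": dense_params(12 * 12 * 32, 128),
    "dense (10 units)": dense_params(128, 10),
}
total = sum(counts.values())
for name, count in counts.items():
    print(f"{name}: {count}")
print("total:", total)
```

The two convolutional layers together hold fewer than 5,000 parameters, while the first dense layer alone holds 589,952, because every one of its 128 units connects to all 4,608 flattened features.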

Also consider using a GPU for accelerated computing if training is too slow on the CPU alone.

In Colab, you can easily add a GPU to your runtime: go to the top menu and click "Runtime" -> "Change runtime type". "Hardware accelerator" is None by default; you can select "GPU" or "TPU" here.

You can also upload the notebook to Kaggle and run it there with GPU-accelerated training.

TensorFlow and Keras will automatically execute on GPU if a GPU is available, so there’s nothing more you need to do after you’ve selected the GPU runtime.

In [5]:
from keras.optimizers import Adam

optimizer = Adam()
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model, iterating on the data in batches of 32 samples
model.fit(X_train, y_train, epochs=15, batch_size=32, validation_split=1/6)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<tensorflow.python.keras.callbacks.History at 0x7f156ee9ac90>
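
The `validation_split=1/6` argument holds back part of the training data for validation. The arithmetic below is a sketch of how the 60,000 MNIST training images are divided, assuming the split point is the floor of `n_samples * (1 - validation_split)`. Keras takes the validation samples from the end of the arrays, before any shuffling.

```python
import math

n_samples = 60000
validation_split = 1 / 6
batch_size = 32

# Assumed split rule: the first n_train samples train, the remainder validate.
n_train = math.floor(n_samples * (1 - validation_split))
n_val = n_samples - n_train
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_val, steps_per_epoch)
```

Under these assumptions the model trains on 50,000 images and validates on 10,000, the same size as the test set.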

## Now evaluate the trained model.

In [6]:
score = model.evaluate(X_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.030247045680880547
Test accuracy: 0.9929999709129333


In [7]:
# Classification report using scikit-learn 
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_test)
print(y_pred)  # y_pred is a 2-D array with 10 columns, one probability per class
y_predc = y_pred.argmax(axis=1)  # get the class labels by choosing the class with the highest output
y_testc = y_test.argmax(axis=1)

print(classification_report(y_testc, y_predc))
print(confusion_matrix(y_true=y_testc, y_pred=y_predc))

[[1.55239633e-16 2.25809683e-14 2.63065475e-13 ... 1.00000000e+00
  3.07390883e-18 2.97734249e-13]
 [1.54880761e-10 3.83274568e-09 1.00000000e+00 ... 1.56422726e-14
  1.09838625e-14 1.07271438e-19]
 [1.18689486e-15 9.99999523e-01 9.27655782e-12 ... 5.23029826e-07
  5.03183745e-11 6.66612056e-13]
 ...
 [5.25038435e-17 1.72199809e-12 1.09858267e-15 ... 8.64090396e-12
  8.07054645e-10 2.42530418e-09]
 [7.83052551e-14 1.89444577e-17 1.00276498e-16 ... 2.03198058e-15
  1.22499216e-07 1.47309942e-12]
 [1.70203940e-09 1.36431986e-18 3.39773110e-12 ... 6.81060288e-21
  1.10602395e-10 3.26676203e-16]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99       980
           1       0.99      1.00      1.00      1135
           2       0.99      0.99      0.99      1032
           3       0.99      1.00      1.00      1010
           4       0.99      1.00      1.00       982
           5       0.99      0.99      0.99       892
           6       1.
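
The link between the predicted probabilities, the confusion matrix and the per-class precision and recall can be sketched with NumPy on a toy example. The probabilities and labels below are made up for illustration (5 samples, 3 classes) and are not outputs of the model above.

```python
import numpy as np

# Hypothetical predicted probabilities for 5 samples and 3 classes.
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
])
y_true = np.array([0, 1, 2, 2, 1])
y_hat = probs.argmax(axis=1)  # predicted class = column with the highest probability

n_classes = probs.shape[1]
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_hat), 1)  # rows: true class, columns: predicted class

precision = cm.diagonal() / cm.sum(axis=0)  # correct / everything predicted as the class
recall = cm.diagonal() / cm.sum(axis=1)     # correct / everything truly in the class
print(cm)
print("precision:", precision)
print("recall:", recall)
```

The row/column convention matches scikit-learn's `confusion_matrix`, so the off-diagonal entries show which true class was mistaken for which predicted class.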

Exercise:

Try out different network architectures and hyperparameter settings, and observe the effect on performance.

You can also try out the classic [LeNet architecture (LeCun et al. 1998)](https://d2l.ai/chapter_convolutional-neural-networks/lenet.html#sec-lenet), given in the [deep learning textbook d2l.ai](https://d2l.ai/index.html) and shown below.
 - 2 convolutional layers, each using a 5×5 kernel and a sigmoid activation function. The first convolutional layer has 6 output channels, while the second has 16. Each is followed by a 2×2 average-pooling operation (stride 2). In d2l.ai the convolutional block emits an output with shape (batch size, number of channels, height, width); Keras by default orders it as (batch size, height, width, number of channels).
 - 3 dense layers, with 120, 84, and 10 outputs, respectively. Because we are still performing classification, the 10-dimensional output layer corresponds to the number of possible output classes.
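
Before building the network, it can help to trace the spatial size through the convolutional block described above. The sketch below is plain arithmetic with hypothetical helper names. It assumes a 28×28 input and treats the padding of the first convolution as a free choice: the d2l.ai version is assumed here to pad by 2 pixels, while a Keras `Conv2D` with default settings uses no padding.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output size along one spatial axis (rounding down, as 'valid' padding does)."""
    return (size - kernel + 2 * padding) // stride + 1

def lenet_flatten_size(input_size=28, first_padding=0):
    size = conv_out(input_size, 5, padding=first_padding)  # conv 5x5, 6 channels
    size = conv_out(size, 2, stride=2)                     # 2x2 average pooling
    size = conv_out(size, 5)                               # conv 5x5, 16 channels
    size = conv_out(size, 2, stride=2)                     # 2x2 average pooling
    return size * size * 16                                # features fed to the dense layers

print(lenet_flatten_size(first_padding=0))  # 256 (28 -> 24 -> 12 -> 8 -> 4)
print(lenet_flatten_size(first_padding=2))  # 400 (28 -> 28 -> 14 -> 10 -> 5)
```

Either variant is small compared with the 4,608 flattened features of the earlier model, so the first dense layer needs far fewer weights.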

<a title="Source: Dive into Deep Learning (d2l.ai)" href="https://d2l.ai/_images/lenet-vert.svg"><img width="200" alt="LeNet architecture" src="https://d2l.ai/_images/lenet-vert.svg"></a>

Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_20 (Conv2D)           (None, 26, 26, 16)        160       
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 24, 24, 32)        4640      
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
dropout_10 (Dropout)         (None, 12, 12, 32)        0         
_________________________________________________________________
flatten_10 (Flatten)         (None, 4608)              0         
_________________________________________________________________
dense_25 (Dense)             (None, 128)               589952    
_________________________________________________________________
dropout_11 (Dropout)         (None, 128)             

In [10]:
# Your code
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPool2D, Dropout, Flatten,AveragePooling2D
model = Sequential()
model.add(Conv2D(filters=6, kernel_size=(5, 5), activation='sigmoid', input_shape=(28,28,1)))
model.add(AveragePooling2D())
model.add(Conv2D(filters=16, kernel_size=(5, 5), activation='sigmoid'))
model.add(AveragePooling2D())
model.add(Flatten())
model.add(Dense(units=120, activation='relu'))
model.add(Dense(units=84, activation='relu'))
model.add(Dense(units=10, activation = 'softmax'))
optimizer = Adam()
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model, iterating on the data in batches of 32 samples
model.fit(X_train, y_train, epochs=15, batch_size=32, validation_split=1/6)

score = model.evaluate(X_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

# Classification report using scikit-learn 
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_test)
print(y_pred)  # y_pred is a 2-D array with 10 columns, one probability per class
y_predc = y_pred.argmax(axis=1)  # get the class labels by choosing the class with the highest output
y_testc = y_test.argmax(axis=1)

print(classification_report(y_testc, y_predc))
print(confusion_matrix(y_true=y_testc, y_pred=y_predc))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Test loss: 0.04651264473795891
Test accuracy: 0.9864000082015991
[[9.2845722e-09 4.4788169e-07 2.4556124e-07 ... 9.9998999e-01
  7.0673867e-09 7.8926923e-06]
 [1.9222799e-09 3.7318124e-08 9.9999976e-01 ... 1.3201962e-11
  1.2271141e-09 1.2619092e-11]
 [6.1616809e-08 9.9988937e-01 1.6242942e-05 ... 6.8861977e-05
  3.1876725e-06 2.3318475e-07]
 ...
 [3.3895703e-15 4.6882750e-08 4.0785552e-11 ... 5.2646580e-08
  2.8320480e-08 1.2943839e-09]
 [7.4099349e-05 3.9847499e-09 5.7792393e-09 ... 1.4560706e-09
  1.3565169e-04 1.8715943e-07]
 [7.2865856e-09 8.3997783e-12 1.6826393e-09 ... 3.8767409e-16
  9.0798631e-09 9.6931341e-10]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99       980
           1       0.99      0.99      0.99      1135
           2       0.99      0.99    

## Task Two (optional): Fashion MNIST Classification

Develop and evaluate a model on a more difficult MNIST-style dataset, [the Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist). To load the data from Keras:

from keras.datasets import fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
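
Fashion MNIST labels are integers from 0 to 9, like the digit labels, but each stands for a clothing category. The sketch below maps label indices to the category names published with the dataset. The helper `label_names` and the example indices are hypothetical.

```python
import numpy as np

# Category names in label order (0 to 9), as listed in the Fashion MNIST repository.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_names(class_indices):
    """Translate integer class labels into readable category names."""
    return [FASHION_MNIST_CLASSES[i] for i in class_indices]

example_indices = np.array([9, 0, 3])  # hypothetical labels, e.g. from y_pred.argmax(axis=1)
names = label_names(example_indices)
print(names)
```

Readable names make the classification report and confusion matrix easier to interpret, for example when checking whether shirts are confused with T-shirts or pullovers.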

In [18]:
# Your code here
from keras.datasets import fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Reshape for CNN
X_train = X_train.reshape(X_train.shape[0], height, width, 1)
X_test = X_test.reshape(X_test.shape[0], height, width, 1)
input_shape = (height, width, 1)


# Convert to float32 and scale pixel values from [0, 255] to [0, 1]; normalised inputs train faster.
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)


from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPool2D, Dropout, Flatten,AveragePooling2D,MaxPooling2D
model = Sequential()
model.add(Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25)) # Randomly drop 25% of the previous layer's outputs during training
model.add(Flatten())     # Flatten, and add a fully connected layer
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))  # Randomly drop 50% of the dense layer's outputs during training
model.add(Dense(10, activation='softmax')) # Last layer: 10 class nodes with softmax output
model.summary()

optimizer = Adam()
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model, iterating on the data in batches of 32 samples
model.fit(X_train, y_train, epochs=15, batch_size=32, validation_split=1/6)

score = model.evaluate(X_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

# Classification report using scikit-learn 
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_test)
print(y_pred)  # y_pred is a 2-D array with 10 columns, one probability per class
y_predc = y_pred.argmax(axis=1)  # get the class labels by choosing the class with the highest output
y_testc = y_test.argmax(axis=1)

print(classification_report(y_testc, y_predc))
print(confusion_matrix(y_true=y_testc, y_pred=y_predc))

X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_14 (Conv2D)           (None, 26, 26, 16)        160       
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 24, 24, 32)        4640      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 12, 12, 32)        0         
_________________________________________________________________
flatten_7 (Flatten)          (None, 4608)              0         
_________________________________________________________________
dense_19 (Dense)             (None, 128)               589952    
_______________________________________________

In [31]:
# Your code
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPool2D, Dropout, Flatten,AveragePooling2D
model = Sequential()
model.add(Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25)) # Randomly drop 25% of the previous layer's outputs during training
model.add(Flatten())     # Flatten, and add a fully connected layer
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))  # Randomly drop 50% of the dense layer's outputs during training
model.add(Dense(10, activation='softmax')) # Last layer: 10 class nodes with softmax output
model.summary()

from keras.optimizers import Adam

optimizer = Adam()
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model, iterating on the data in batches of 64 samples
model.fit(X_train, y_train, epochs=15, batch_size=64, validation_split=1/6)

score = model.evaluate(X_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

# Classification report using scikit-learn 
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_test)
print(y_pred)  # y_pred is a 2-D array with 10 columns, one probability per class
y_predc = y_pred.argmax(axis=1)  # get the class labels by choosing the class with the highest output
y_testc = y_test.argmax(axis=1)

print(classification_report(y_testc, y_predc))
print(confusion_matrix(y_true=y_testc, y_pred=y_predc))

Model: "sequential_22"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_40 (Conv2D)           (None, 26, 26, 16)        160       
_________________________________________________________________
conv2d_41 (Conv2D)           (None, 24, 24, 32)        4640      
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 12, 12, 32)        0         
_________________________________________________________________
dropout_26 (Dropout)         (None, 12, 12, 32)        0         
_________________________________________________________________
flatten_20 (Flatten)         (None, 4608)              0         
_________________________________________________________________
dense_51 (Dense)             (None, 128)               589952    
_________________________________________________________________
dropout_27 (Dropout)         (None, 128)             