# Convolutional Neural Networks

Standard CNNs comprise three types of layers: convolutional layers, pooling layers (for subsampling) and fully connected layers. Stacking these layers forms a CNN architecture. A typical CNN architecture is illustrated in Figure 1.

<a title="Aphex34, CC BY-SA 4.0 &lt;https://creativecommons.org/licenses/by-sa/4.0&gt;, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Typical_cnn.png"><img width="718" alt="Typical cnn" src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Typical_cnn.png/512px-Typical_cnn.png"></a>

**Figure 1:** A common form of CNN architecture, in which convolutional layers are followed by pooling (subsampling) layers. The resulting features are fed to the fully connected (dense) layers, which produce the final output. Source: Wikimedia Commons


It is important to note that understanding the overall architecture of a CNN is not enough. Creating and optimising these models can take quite some time and can be quite confusing. We will now explore the individual layers in detail, describing their hyperparameters and connectivity.

## Convolutional operation

### Filters (i.e. kernels) and feature maps (i.e. activations)
As the filter, or kernel, glides across the input, the scalar product between the kernel weights and the input values beneath it is calculated at each position (Figure 2). From this, the network learns kernels that 'fire' when they see a specific feature at a given spatial position of the input. These responses are commonly known as **activations**.

<a title="Source: d2l.ai (Dive into Deep Learning)" href="https://d2l.ai/_images/correlation.svg"><img width="500" alt="Two-dimensional cross-correlation example" src="https://d2l.ai/_images/correlation.svg"></a>

**Figure 2:** Illustration of a single step in the convolution operation. The shaded portions are the first output element and the input and kernel tensor elements used to compute it: 0×0+1×1+3×2+4×3=19.
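
The single step in Figure 2 can be reproduced in a few lines of NumPy. The sketch below is a minimal illustration, not how Keras implements convolution. The helper name correlate2d_valid is our own, and the 3×3 input (values 0 to 8) and 2×2 kernel (values 0 to 3) are assumed to match the values shown in the figure.

```python
import numpy as np

def correlate2d_valid(x, k):
    """Slide kernel k over x (no padding, stride 1) and sum the element-wise products."""
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Scalar product between the kernel and the input patch beneath it
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(9).reshape(3, 3)   # assumed input: 0..8
k = np.arange(4).reshape(2, 2)   # assumed kernel: 0..3
feature_map = correlate2d_valid(x, k)
print(feature_map)  # first element: 0*0 + 1*1 + 3*2 + 4*3 = 19
```

The 2×2 result is the feature map for this one kernel; a layer with several kernels would produce one such map per kernel.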

Every kernel has a corresponding activation (feature) map; these maps are stacked along the depth dimension to form the full output volume of the convolutional layer.

These kernels are usually small in their spatial dimensions, but extend through the entire depth of the input. When the data reaches a convolutional layer, the layer convolves each filter across the spatial dimensions of the input to produce a 2D activation map.

One of the key differences from the MLP is that the neurons in a CNN layer are organised into three dimensions: **height, width, and depth**. Height and width are the spatial dimensions; the depth, or number of channels, is the third dimension of an activation volume and equals the number of filters (kernels, and hence feature maps) used by the layer that produced it. Unlike in a standard MLP, the neurons within any given layer connect only to a small region (the receptive field) of the layer preceding it.
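
To see why this local connectivity matters, it helps to count weights. The sketch below compares a dense layer with a convolutional layer on a 28×28 single-channel image. The layer sizes (32 units, 32 filters of size 3×3) and both helper names are illustrative assumptions, not values taken from this practical.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    # Each filter spans the full input depth, plus one bias per filter
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

dense_count = dense_params(28 * 28 * 1, 32)
conv_count = conv2d_params(3, 3, 1, 32)
print("Dense layer:", dense_count)  # 25120
print("Conv layer: ", conv_count)   # 320
```

The convolutional layer needs far fewer weights because each kernel is small and is reused at every spatial position.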

### Stride and padding
We can also define the **stride**: the number of positions the kernel moves between successive placements of the receptive field across the spatial dimensions of the input. For example, with a stride of 1 the receptive fields overlap heavily and the output is large. Setting the stride to a greater number reduces the amount of overlap and produces an output with smaller spatial dimensions.
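
A small sketch makes the effect of the stride concrete. It lists where the left edge of a kernel of width 3 is placed along one dimension of an input of width 7; the sizes and the helper name receptive_field_starts are assumptions chosen for illustration.

```python
def receptive_field_starts(input_size, kernel_size, stride):
    """Left edges of every position where the kernel fits fully inside the input."""
    return list(range(0, input_size - kernel_size + 1, stride))

starts_by_stride = {s: receptive_field_starts(7, 3, s) for s in (1, 2)}
for stride, starts in starts_by_stride.items():
    print(f"stride={stride}: starts={starts}, output size={len(starts)}")
# stride=1: starts=[0, 1, 2, 3, 4], output size=5
# stride=2: starts=[0, 2, 4], output size=3
```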

**Zero-padding** is the simple process of padding the border of the input with zeros, and is an effective way to gain further control over the dimensionality of the output volumes. It is important to understand that these techniques alter the spatial dimensionality of the convolutional layer's output. We can calculate this using the following method:


In [None]:
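def calculate_conv_output(height, width, depth, kernel_size, zero_padding, stride):
    # Receptive field size = kernel size.
    # The formula (V - R + 2Z) / S + 1 is applied to each spatial dimension separately.
    # The input depth does not affect the spatial output size, because every kernel
    # spans the full depth; the output depth equals the number of kernels.
    out_height = (height - kernel_size + 2 * zero_padding) / stride + 1
    out_width = (width - kernel_size + 2 * zero_padding) / stride + 1

    return out_height, out_width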

If the calculated result is not a whole number, then the stride (or padding) has been set incorrectly, as the kernel will be unable to fit neatly across the given input.
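
As a worked example, the standard output-size formula for one spatial dimension is (V - R + 2Z) / S + 1, where V is the input size, R the kernel (receptive field) size, Z the zero-padding and S the stride. The sketch below evaluates it for a few settings on a 28-pixel-wide input; the helper name conv_output_size and the chosen settings are assumptions for illustration.

```python
from fractions import Fraction

def conv_output_size(input_size, kernel_size, zero_padding=0, stride=1):
    """Output size along one spatial dimension: (V - R + 2Z) / S + 1."""
    return Fraction(input_size - kernel_size + 2 * zero_padding, stride) + 1

cases = [
    (28, 3, 0, 1),  # no padding, stride 1
    (28, 3, 1, 1),  # padding of 1 keeps the input size
    (28, 3, 0, 2),  # stride 2 does not fit neatly
    (28, 4, 0, 2),  # a 4-wide kernel with stride 2 does fit
]
results = [conv_output_size(*case) for case in cases]
for (v, r, z, s), size in zip(cases, results):
    fits = size.denominator == 1
    print(f"V={v}, R={r}, Z={z}, S={s} -> {size} ({'fits' if fits else 'does not fit'})")
```

Note that Keras does not raise an error in the third case: with 'valid' padding it rounds the result down, so the last input column is simply not covered.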


See the slides for lecture10-CNNs for more information on CNNs.
Or see the Stanford course on CNNs: https://cs231n.github.io/convolutional-networks/
Or go through this short tutorial on the basic components of a ConvNet:
https://machinelearningmastery.com/crash-course-convolutional-neural-networks/


## Task One: MNIST Classification

Using the slides given last week, build a CNN to classify MNIST digits:

Last week we reduced the data dimensionality with PCA prior to applying a feedforward neural network. This time, we'll train a network on the complete image using a CNN, a sparsely connected network.


#### Recall that in the last practical we learned how to build a simple fully connected neural network, also known as a multilayer perceptron (MLP), using dense layers

In [None]:
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.backend import clear_session

# Good Practice Klaxon: Free your memory from previously made models.
clear_session()

# Create a new blank model
model = Sequential()
# Input of shape (4,): each sample has 4 features; the batch size is left variable
model.add(Input((4,)))
model.add(Dense(2, activation="relu"))
# And finally, add an output layer with 10 units (one per class)
model.add(Dense(10, activation="softmax"))

# Print out a summary of the model
model.summary()

Next, prepare the data. Notice the difference in the shape of the input data: the CNN used here takes each image as a (height, width, channels) array, whereas the MLP in the last practical took a flat feature vector per sample.
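
The shape difference, and the label encoding used with a softmax output, can be seen on a tiny stand-in array before loading the real data. This is a sketch under stated assumptions: the images and labels below are random placeholders rather than MNIST, and flattening is used only to illustrate a flat MLP input.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 random 28x28 grayscale "images" and labels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3, 7])

# MLP-style input: every image flattened into one feature vector
mlp_input = images.reshape(images.shape[0], -1)
# CNN input: spatial layout kept, with an explicit channel axis of size 1
cnn_input = images.reshape(images.shape[0], 28, 28, 1)

# Scale pixel values from [0, 255] to [0, 1]
cnn_input = cnn_input.astype("float32") / 255

# One-hot encode the integer labels (the same content to_categorical produces)
one_hot = np.eye(10, dtype="float32")[labels]

print(mlp_input.shape)  # (5, 784)
print(cnn_input.shape)  # (5, 28, 28, 1)
print(one_hot[0])       # 1.0 at index 3, zeros elsewhere
```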

In [None]:
from keras.utils import to_categorical
from keras.datasets import mnist

# input image dimensions
width = 28
height = 28

num_classes = 10

# the data, split between train and test sets
(X_dev, y_dev), (X_test, y_test) = mnist.load_data()

# Reshape for CNN: (n_samples, height, width, n_channels)
X_dev = X_dev.reshape(X_dev.shape[0], height, width, 1)
X_test = X_test.reshape(X_test.shape[0], height, width, 1)
input_shape = (height, width, 1)


# Convert to float32 and scale pixel values from [0, 255] to [0, 1].
X_dev = X_dev.astype('float32')
X_test = X_test.astype('float32')
X_dev /= 255
X_test /= 255
print('X_dev shape:', X_dev.shape)
print(X_dev.shape[0], 'development samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_dev = to_categorical(y_dev, num_classes)
y_test = to_categorical(y_test, num_classes)


Next, following the usual procedure for ML model development, we need to set aside a validation set from the original training set for model selection, i.e. to tune the hyperparameters and the model architecture.

Here we choose hold-out validation, splitting the data using the scikit-learn function [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split).

Make sure to set **the random state** for reproducibility.
E.g.
X_train, X_val, y_train, y_val = train_test_split(X_dev, y_dev, test_size=0.33, **random_state**=42)
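
Conceptually, a hold-out split shuffles the sample indices once with a seeded random generator and cuts them in two. The NumPy sketch below illustrates that idea on toy data. It is not scikit-learn's implementation, holdout_split is a hypothetical helper, and for the exercise itself train_test_split remains the simpler choice.

```python
import numpy as np

def holdout_split(X, y, val_fraction, seed):
    """Shuffle indices with a fixed seed, then hold out val_fraction of them for validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_val = int(round(len(X) * val_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

X_toy = np.arange(12).reshape(12, 1)  # toy data: 12 samples, 1 feature
y_toy = np.arange(12) % 2

X_tr, X_va, y_tr, y_va = holdout_split(X_toy, y_toy, val_fraction=1 / 6, seed=42)
print(X_tr.shape, X_va.shape)  # (10, 1) (2, 1)

# The same seed gives the same split, which is what makes experiments repeatable
X_tr2, X_va2, _, _ = holdout_split(X_toy, y_toy, val_fraction=1 / 6, seed=42)
print(np.array_equal(X_va, X_va2))  # True
```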

In [None]:
# Split the development set into training and validation sets (validation = 1/6 of the dev set)
# Your code here:
#



Build your convolutional neural network below (you can get some inspiration from this [keras example](https://keras.io/examples/vision/mnist_convnet/)).

In [None]:
from keras.models import Sequential
from keras.layers import Input, Dense, Activation, Conv2D, MaxPool2D, Dropout, Flatten

# Use a function to define different models for reuse in experiments
def create_cnn_model():
    model = Sequential()
    model.add(Input(shape=(28,28,1,)))
    model.add(Conv2D(16, kernel_size=(3, 3), activation='relu'))
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(0.25)) # Dropout 25% of the nodes of the previous layer during training
    model.add(Flatten())     # Flatten, and add a fully connected layer
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax')) # Last layer: 10 class nodes (dropout is applied to its input)
    return model

model=create_cnn_model()
model.summary()


Note that we have about 0.6 million parameters (596,042 in total), most of them in the first dense layer. With a strong optimizer like Adam, and a big dataset like MNIST, this shouldn't be a problem.
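
The total reported by model.summary() can be checked by hand. The sketch below walks through the layers of create_cnn_model, assuming the Keras defaults used there ('valid' padding and stride 1 for the convolutions, stride 2 for the 2×2 pooling); the helper name conv_out is our own.

```python
def conv_out(size, kernel):
    # 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

side = 28
conv1 = 3 * 3 * 1 * 16 + 16    # Conv2D(16, 3x3) on 1 input channel
side = conv_out(side, 3)       # 26
conv2 = 3 * 3 * 16 * 32 + 32   # Conv2D(32, 3x3) on 16 input channels
side = conv_out(side, 3)       # 24
side = side // 2               # MaxPool2D(2x2) -> 12
flat = side * side * 32        # Flatten -> 4608 features
dense1 = flat * 128 + 128      # Dense(128)
dense2 = 128 * 10 + 10         # Dense(10)

total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, dense1, dense2)  # 160 4640 589952 1290
print("Total:", total)               # 596042
```

Dropout, pooling and flatten layers have no trainable parameters, so they do not appear in the sum.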

Also consider using a GPU for accelerated computing if training is too slow on the CPU alone.

In Colab, you can easily add a GPU to your runtime: go to the top menu and click "Runtime" -> "Change runtime type". "Hardware accelerator" is None by default; you can select "GPU" or "TPU" here.

You can also upload the notebook to Kaggle and run it there with GPU-accelerated training.

TensorFlow and Keras will automatically execute on GPU if a GPU is available, so there’s nothing more you need to do after you’ve selected the GPU runtime.

In [None]:
from keras.optimizers import Adam
optimizer = Adam()
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model, iterating on the data in batches of 32 samples
history = model.fit(X_train, y_train, epochs=5, batch_size=32, validation_data=(X_val, y_val))

**Question:**
If GPU acceleration is enabled, how should you change the batch size (assuming the current batch size is 32) to make better use of the GPU?
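
One thing to keep in mind when answering: the batch size determines how many weight updates happen in each epoch. The quick calculation below assumes a training set of 50,000 images (what remains after holding out 1/6 of the 60,000 development images); adjust n_train to your own split.

```python
import math

n_train = 50_000  # assumed training-set size; depends on your split
steps_per_epoch = {bs: math.ceil(n_train / bs) for bs in (32, 128, 512)}
for batch_size, steps in steps_per_epoch.items():
    print(f"batch_size={batch_size:>3}: {steps} updates per epoch")
# batch_size= 32: 1563 updates per epoch
# batch_size=128: 391 updates per epoch
# batch_size=512: 98 updates per epoch
```

Larger batches give the GPU more parallel work per step but mean fewer updates per epoch, so the number of epochs or the learning rate may need revisiting.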





Generate learning curves, e.g. by plotting the training and validation loss (or accuracy) side by side.


In [None]:
import matplotlib.pyplot as plt
history_dict = history.history
loss_values = history_dict["loss"] # on training set
val_loss_values = history_dict["val_loss"] # on validation set
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, "--", color='green', label="Training loss")
plt.plot(epochs, val_loss_values, "-", label="Validation loss")
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

**Exercise**:
Plot the learning curve with training accuracy and validation accuracy against the epoch number.

In [None]:
# Your code here:
#


## Now evaluate the trained model.


In [None]:
score = model.evaluate(X_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

In [None]:
# Classification report using scikit-learn
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_test)
print(y_pred) # y_pred is a 2-D array with 10 columns (one probability per class)
y_predc = y_pred.argmax(axis=1) # get the class labels by choosing the class with the highest output
y_testc = y_test.argmax(axis=1)

print(classification_report(y_testc, y_predc))
print(confusion_matrix(y_true=y_testc, y_pred=y_predc))
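
The two steps above, taking the argmax and tabulating a confusion matrix, can be followed by hand on a tiny example. The probabilities and labels below are made up for illustration (4 samples, 3 classes) and are not model output.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes (each row sums to 1)
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
])
true_labels = np.array([0, 1, 2, 2])

pred_labels = probs.argmax(axis=1)  # index of the largest probability in each row
print(pred_labels)                  # [0 1 1 2]

# Confusion matrix: rows are true classes, columns are predicted classes
n_classes = probs.shape[1]
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(true_labels, pred_labels):
    cm[t, p] += 1
print(cm)
```

Correct predictions land on the diagonal; the single off-diagonal entry is the third sample, a true class 2 predicted as class 1.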

## Training Monitoring and visualization with TensorBoard
To do good research or develop good models, you need rich, frequent feedback about what’s going on inside your models during your experiments. That’s the point of running experiments: to get as much information as possible about how well a model performs.

TensorBoard (www.tensorflow.org/tensorboard) is a browser-based application that you can run locally. It’s the best way to monitor everything that goes on inside your model during training. With TensorBoard, you can
- Visually monitor metrics during training
- Visualize your model architecture
- Visualize histograms of activations and gradients
- Explore embeddings in 3D

The easiest way to use TensorBoard with a Keras model and the fit() method is to use the keras.callbacks.TensorBoard callback.

In [None]:
# Create a log directory (whatever path and name that suits)
!mkdir -p log

In [None]:
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
tensorboard = keras.callbacks.TensorBoard(log_dir="log")

history = model.fit(X_train, y_train, epochs=3, batch_size=32, validation_data=(X_val, y_val),
          callbacks=[tensorboard])

Once the model starts running, it will write logs at the target location. If you are running your Python script on a local machine, you can then launch the local TensorBoard server using the following command (note that the tensorboard executable should be already available if you have installed TensorFlow via pip; if not, you can install TensorBoard manually via pip install tensorboard):

tensorboard --logdir /full_path_to_your_log_dir

You can then navigate to the URL that the command returns in order to access the TensorBoard interface.

If you are running your script in a Colab notebook, you can run an embedded TensorBoard instance as part of your notebook, using the following commands:

In [None]:
%load_ext tensorboard
%tensorboard --logdir log

**Exercise**:

Try out different network architectures and hyperparameter settings, and observe the effect on performance using TensorBoard.
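
When comparing several settings, it helps to write each experiment to its own sub-directory, because TensorBoard lists every sub-directory of the log directory as a separate run. The sketch below is one possible way to do that; make_run_dir and the run name "baseline-cnn" are hypothetical, not part of Keras or TensorBoard.

```python
from datetime import datetime
from pathlib import Path

def make_run_dir(base_dir, run_name):
    """Create and return a sub-directory named <run_name>-<timestamp> under base_dir."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = Path(base_dir) / f"{run_name}-{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir

run_dir = make_run_dir("log", "baseline-cnn")
print(run_dir)
# The path can then be passed to the callback, e.g.
# keras.callbacks.TensorBoard(log_dir=str(run_dir))
```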

You can also try out the classic [LeNet architecture (LeCun et al. 1998)](https://d2l.ai/chapter_convolutional-neural-networks/lenet.html#sec-lenet), described in the [deep learning textbook d2l.ai](https://d2l.ai/index.html); see below.
 - 2 convolutional layers, each using a 5×5 kernel and a sigmoid activation function. The first convolutional layer has 6 output channels, while the second has 16. Each convolutional layer is followed by a 2×2 AvgPooling operation (stride 2). In d2l's channels-first layout, the convolutional block emits an output with shape (batch size, number of channels, height, width); Keras uses (batch size, height, width, number of channels) by default.
 - 3 dense layers, with 120, 84, and 10 outputs, respectively. Because we are still performing classification, the 10-dimensional output layer corresponds to the number of possible output classes.

<a title="Source: d2l.ai (Dive into Deep Learning)" href="https://d2l.ai/_images/lenet-vert.svg"><img width="200" alt="LeNet architecture" src="https://d2l.ai/_images/lenet-vert.svg"></a>
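
Before building LeNet, it can help to trace the shapes through the network by hand. The sketch below does so for a 28×28 input. It assumes a padding of 2 in the first convolution, following the d2l.ai version of the network (the description above does not state it), and the helper names conv_size and pool_size are our own.

```python
def conv_size(size, kernel, padding=0, stride=1):
    # Output size along one spatial dimension of a convolution
    return (size - kernel + 2 * padding) // stride + 1

def pool_size(size, pool=2, stride=2):
    # Output size along one spatial dimension of a pooling layer
    return (size - pool) // stride + 1

side = 28                                    # MNIST image side
side = conv_size(side, kernel=5, padding=2)  # conv 1 (padding 2 assumed) -> 28
side = pool_size(side)                       # avg pool -> 14
side = conv_size(side, kernel=5)             # conv 2 -> 10
side = pool_size(side)                       # avg pool -> 5
flat = side * side * 16                      # 16 channels flattened -> 400

params = (
    (5 * 5 * 1 * 6 + 6)      # conv 1
    + (5 * 5 * 6 * 16 + 16)  # conv 2
    + (flat * 120 + 120)     # dense 120
    + (120 * 84 + 84)        # dense 84
    + (84 * 10 + 10)         # dense 10
)
print("Flattened features:", flat)  # 400
print("Parameters:", params)        # 61706
```

If your Keras model reports different numbers in model.summary(), the padding of the first convolution is the first thing to check.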

In [None]:
# Your code




## Task 2 (optional): Classification using different benchmarking datasets

Develop and evaluate a model with different datasets, e.g.
- a more difficult MNIST-style dataset: [the Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist). To load the data from Keras:

from keras.datasets import fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

- the CIFAR-10 or CIFAR-100 dataset, e.g.
https://keras.io/api/datasets/cifar100/
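
These datasets come in different shapes: Fashion MNIST images are 28×28 grayscale arrays without a channel axis, while CIFAR images are 32×32 with 3 colour channels. A small helper such as the hypothetical ensure_channel_axis below can bring both into the (samples, height, width, channels) layout a Conv2D layer expects. The arrays used here are zero-filled placeholders, not real data.

```python
import numpy as np

def ensure_channel_axis(images):
    """Return images as (n_samples, height, width, channels), adding a channel axis if absent."""
    if images.ndim == 3:               # grayscale: (n, h, w)
        return images[..., np.newaxis]
    return images                      # already (n, h, w, c)

# Zero-filled placeholders with the datasets' image shapes (not real data)
fashion_like = np.zeros((4, 28, 28), dtype=np.uint8)
cifar_like = np.zeros((4, 32, 32, 3), dtype=np.uint8)

fashion_ready = ensure_channel_axis(fashion_like)
cifar_ready = ensure_channel_axis(cifar_like)
print(fashion_ready.shape)  # (4, 28, 28, 1)
print(cifar_ready.shape)    # (4, 32, 32, 3)
```

Remember to change the model's input shape accordingly, and the number of output units for CIFAR-100.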


In [None]:
# Your code here
