<a href="https://colab.research.google.com/github/ashkree/ICT303_Practical/blob/main/ICT303_Lab_Topic_4_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **ICT303 - Advanced Machine Learning and Artificial Intelligence**
# **Lab of Topic 4 - Convolutional Neural Networks (Part I)**

The goal of this lab is to learn how to implement, train and test convolutional neural networks (CNNs). We will look at a particular CNN called **LeNet**, which was briefly explained in the lecture.

The goal is to implement the LeNet architecture and train it so that it can recognize handwritten digits (0 to 9). For this, we will use the MNIST dataset, which contains $28\times28$ grayscale images of handwritten digits.

This lab is adapted from Chapter 6 of the textbook:
- https://classic.d2l.ai/chapter_convolutional-neural-networks/lenet.html

## **1. LeNet-5 architecture**

There are different LeNet versions. The main difference between them is in the number of layers and the number of neurons within each layer. Here, we focus on **LeNet-5**.

LeNet-5, introduced in the 1990s, consists of two parts:
1.   A convolutional network, also called convolutional encoder, and
2.   A **dense block** of fully connected layers, which is another term used in deep learning for a multilayer perceptron (MLP).

The **convolutional network** consists of two convolutional layers. Each layer uses
- $5\times5$ convolutional kernels (filters), with stride $1$. In the textbook's version, the first layer also pads its input by $2$ pixels on each side, so that a $28\times28$ input yields $28\times28$ feature maps.
- A sigmoid activation function. Note that although ReLU and max pooling are now known to work better, these discoveries had not yet been made in the 1990s.


Each convolutional layer is followed by a $2\times 2$ **average pooling layer** with stride $2$, which halves the height and width of the feature maps.

Each convolutional layer, together with the pooling layer that follows it, maps its input to an output of lower resolution but with more channels:
- The first convolutional layer has $6$ output channels,
- The second convolutional layer has $16$ output channels.

This way, if the input to the encoder is a grayscale image of size $28\times28$, its final output would have $16$ channels of size $5\times 5$ each.
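The sizes quoted above can be checked with the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$, where $n$ is the input size, $k$ the kernel size, $s$ the stride and $p$ the padding. The sketch below is only an illustration: `conv_output_size` is a hypothetical helper name, and the padding of $2$ on the first convolution is an assumption taken from the textbook's version of LeNet.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 28                                                # input: 1 channel, 28 x 28
size = conv_output_size(size, kernel_size=5, padding=2)  # conv 1 (padding assumed): 28
size = conv_output_size(size, kernel_size=2, stride=2)   # average pooling 1: 14
size = conv_output_size(size, kernel_size=5)             # conv 2: 10
size = conv_output_size(size, kernel_size=2, stride=2)   # average pooling 2: 5

flattened = 16 * size * size  # 16 channels of size x size each
print(size, flattened)        # prints: 5 400
```

The same helper can be reused to work out the feature-map sizes for inputs of other resolutions.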

The **dense block** is a multilayer perceptron (MLP) of $3$ layers. In deep learning, we often say **three fully connected layers** to refer to an MLP of three layers:
- The first layer has $120$ neurons,
- The second layer has $84$ neurons.

**Question:** Can you figure out how many neurons the output (last) layer should have?

The dense block is placed right after the encoder. To pass the output of the encoder to the dense block, it needs to be converted into a vector. This operation is called **flattening**.
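As a small illustration of flattening, the sketch below uses a random tensor as a stand-in for the encoder output (the batch size of $4$ is an arbitrary choice) and converts it into one vector per image with `nn.Flatten`.

```python
import torch
from torch import nn

# Stand-in for the encoder output: a batch of 4 images,
# each with 16 channels of size 5 x 5 (random values, for illustration only).
features = torch.randn(4, 16, 5, 5)

# nn.Flatten keeps the batch dimension and merges all the others.
flat = nn.Flatten()(features)
print(flat.shape)  # torch.Size([4, 400]), since 16 * 5 * 5 = 400
```

The first fully connected layer of the dense block therefore receives $400$ input values per image.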

The figure below shows the architecture of the network (from the textbook). The notation $6@28\times28$ means that the feature map has $6$ channels, each of height $28$ and width $28$.

lenet.svg



It can also be represented in a compact way as follows:

lenet-vert.svg

## **2. Exercise 1**

Extend the MLP code you created last week to create the LeNet network.

To create a convolutional layer, you can use the command:


```
nn.Conv2d(n_input_channels, n_output_channels, kernel_size, padding=padding)
```

where:
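- n_input_channels: number of input channels,
- n_output_channels: number of output channels,
- kernel_size: size of the convolutional kernel (e.g. $5$ for a $5\times5$ kernel),
- padding: number of pixels of zero padding added to each side of the input.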
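For example, the following sketch creates a single convolutional layer and applies it to a random input. The values ($1$ input channel, $6$ output channels, a $5\times5$ kernel and a padding of $2$) are taken from the first layer of the textbook's LeNet and are used here only as an illustration. Note that `padding` is passed by keyword, because the fourth positional parameter of `nn.Conv2d` is `stride`.

```python
import torch
from torch import nn

# One convolutional layer: 1 input channel, 6 output channels, 5 x 5 kernel.
# stride is left at its default value of 1.
conv = nn.Conv2d(1, 6, kernel_size=5, padding=2)

# A random batch containing one grayscale 28 x 28 image:
# the shape is (batch size, channels, height, width).
x = torch.randn(1, 1, 28, 28)
y = conv(x)
print(y.shape)  # torch.Size([1, 6, 28, 28])
```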

To create an average pooling layer, you can use the command:

```
nn.AvgPool2d(kernel_size, stride)
```

where kernel_size is the size of the pooling window and stride is the step by which the window moves across the input.
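The short sketch below applies a $2\times2$ average pooling layer with stride $2$ to a small $4\times4$ input, chosen only so that the averages are easy to check by hand.

```python
import torch
from torch import nn

pool = nn.AvgPool2d(kernel_size=2, stride=2)

# A single-channel 4 x 4 input holding the values 0, 1, ..., 15.
x = torch.arange(16.0).reshape(1, 1, 4, 4)
y = pool(x)

print(y.shape)  # torch.Size([1, 1, 2, 2])
print(y[0, 0])  # each entry is the mean of one 2 x 2 window:
                # tensor([[ 2.5000,  4.5000],
                #         [10.5000, 12.5000]])
```

Each output value is the average of a non-overlapping $2\times2$ window, so the height and width are halved while the number of channels is unchanged.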



## **3. Exercise 2**
Using the classes you created last week and in the previous exercise (possibly with some customizations):
- Train the network on the MNIST data set (available, for example, through `torchvision.datasets.MNIST`), and analyze the computation time.
- Update the code so that if a GPU is available, it will train on the GPU instead of the CPU.
- Test it on a few images both on CPU as well as on GPU, and compare the computation time.
- Can you figure out how much memory the training phase and testing phase require?
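One possible way to approach the device selection, timing and memory questions is sketched below. It is not the only approach: the small `nn.Sequential` model is only a stand-in for your own LeNet class, `timed_forward` is a hypothetical helper name, and the batch size of $64$ is arbitrary. The parameter size it reports is only a lower bound on the memory used, since activations, gradients and optimizer state also take space.

```python
import time
import torch
from torch import nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model for illustration only; replace it with your LeNet class.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
x = torch.randn(64, 1, 28, 28, device=device)  # the data must be on the same device


def timed_forward(model, x):
    """Run one forward pass and return (output, elapsed seconds)."""
    if x.is_cuda:
        torch.cuda.synchronize()  # GPU calls are asynchronous: wait before timing
    start = time.perf_counter()
    with torch.no_grad():
        out = model(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return out, time.perf_counter() - start


out, elapsed = timed_forward(model, x)
print(f"{device}: {elapsed * 1000:.3f} ms for a batch of {x.shape[0]}")

# Memory taken by the parameters alone.
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"parameters: {param_bytes / 1024:.1f} KiB")
if device.type == "cuda":
    peak = torch.cuda.max_memory_allocated(device)
    print(f"peak GPU memory: {peak / 1024**2:.1f} MiB")
```

The first call on a GPU usually includes one-off start-up costs, so it is worth timing several passes and ignoring the first one.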

As a good practice when training deep neural networks, you should save the weights of the network every fixed number of epochs. This will serve two purposes:
- Training takes time, even on a GPU. You may need to stop the training and resume it later, or the PC may crash. In either case, you will be able to resume from the last saved state.
- As we will see next week, when we look at a formal way of evaluating the performance/accuracy of the network, we need to plot the training and validation errors and choose the state that avoids overfitting. TensorFlow's TensorBoard provides a convenient way of doing this.
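A minimal sketch of periodic checkpointing with `torch.save` and `torch.load` is given below. The tiny linear model, the temporary directory, the file names, the dictionary keys and the interval `save_every` are all placeholders chosen for illustration; adapt them to your own training class.

```python
import os
import tempfile
import torch
from torch import nn

# Placeholder model and optimizer; use your LeNet and your own optimizer instead.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

checkpoint_dir = tempfile.mkdtemp()  # in practice, a folder that persists
save_every = 2                       # save the state every 2 epochs

for epoch in range(1, 5):
    # ... one epoch of training would go here ...
    if epoch % save_every == 0:
        path = os.path.join(checkpoint_dir, f"checkpoint_epoch_{epoch}.pt")
        torch.save(
            {
                "epoch": epoch,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict(),
            },
            path,
        )

# Resuming later: rebuild the model and optimizer, then restore their states.
checkpoint = torch.load(path)
model.load_state_dict(checkpoint["model_state"])
optimizer.load_state_dict(checkpoint["optimizer_state"])
start_epoch = checkpoint["epoch"] + 1
print(f"resuming from epoch {start_epoch}")
```

Saving the optimizer state together with the weights matters for optimizers that keep internal statistics (momentum, for instance), so that training continues as if it had not been interrupted.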

## **4. Exercise 3**
Using the code of Exercise 2 (with some customization), train the network on the [Fashion-MNIST](https://www.kaggle.com/datasets/zalando-research/fashionmnist) data set and test it on a few images.