# **ICT303 - Advanced Machine Learning and Artificial Intelligence**
# **Lab 5 - Convolutional Neural Networks (Part II)**

The goal of this lab is to learn how to use TensorBoard to do some visualizations. We are particularly interested in visualizing the training and validation loss, so that we can decide when to stop training.

Also, during the training process, you should save the state (weights) of the network so that, if the computer stops for some reason, you can resume from the latest saved state instead of restarting the training from scratch.

Another important aspect to consider when training neural networks is overfitting. Once the network starts overfitting, i.e., when the validation loss starts going up, you need to stop training and use one of the previous states as your final network weights. Thus, it is important to save the learned parameters at regular epoch intervals.

In this lab, you will also learn how to perform training on GPU when it is available, instead of CPU.

Finally, you will be asked to implement and train some of the networks we have seen in this week's lecture.

This lab is adapted from [Chapter 7](https://classic.d2l.ai/chapter_convolutional-neural-networks/index.html) and [Chapter 8](https://classic.d2l.ai/chapter_convolutional-neural-networks/index.html) of the textbook.

## **1. Training and validation**

When training a machine learning model, we need to make sure that it not only achieves a low error (i.e., good performance) on the data used to fit the model (i.e., the training data), but also performs well on new data that the model did not see during training. This is called **generalization**. If the network performs:
- poorly on both the training and test data, then we say that it is **underfitting**.
- well on the training data but poorly on the test data, then we say that it is **overfitting** the training data. In other words, the network has just memorized the correct answers on the training data. When it is given new data, it is unable to find the correct answer.

Ideally, the network should perform well on the training data and also well on the test data (or when it is deployed for usage).

The problem is how can we make sure that the model will perform well on the data that we don't have yet?

This is done using a validation data set. In other words, before you start implementing a machine learning model, you need to collect some data with their corresponding ground-truth outputs (or labels). This data can be either real or synthetic, as in the previous labs. Then, you need to split this data into three parts:
- **The training data set:** This is the one that you will use to fit/train the model.
- **The validation data set:** This set should **not** be used to learn/update the model parameters. Instead, it is used to evaluate the performance of the model and check whether it is overfitting. It is also used to fine-tune the hyperparameters. In other words, you need to try multiple hyperparameter settings and choose the ones that give you the best performance on the validation set.
- **The test data set:** This is the data set you will use to test and report the final performance of the neural network. This set should **not** be used to learn the model parameters. Also, it should **not** be used for validation, i.e., for assessing whether the model is overfitting or for tuning the hyperparameters.
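As a minimal sketch of such a three-way split, the snippet below uses `torch.utils.data.random_split`. The random tensors are a hypothetical stand-in for a real dataset, and the 70/15/15 proportions are only an illustrative assumption, not a requirement of this lab.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in data: 100 random "images" with integer class labels
full_dataset = TensorDataset(torch.randn(100, 3, 32, 32),
                             torch.randint(0, 10, (100,)))

# Illustrative 70% / 15% / 15% split sizes (integer arithmetic avoids rounding issues)
n_total = len(full_dataset)
n_train = n_total * 70 // 100
n_val = n_total * 15 // 100
n_test = n_total - n_train - n_val  # whatever remains goes to the test set

# A seeded generator makes the split reproducible across runs
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(
    full_dataset, [n_train, n_val, n_test], generator=generator)

print(len(train_set), len(val_set), len(test_set))
```

Each of the three subsets can then be wrapped in its own `DataLoader`. Note that CIFAR-10 already ships with a separate test set, so in that case only the training portion needs to be split into training and validation parts.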

Every time you update the model parameters (e.g., every training epoch), you need to compute two errors (losses):
- **The training loss:** it indicates how the model is performing on the training data. Ideally, this loss should decrease at every epoch. In practice, however, the curve will be somewhat noisy.
- **The generalization loss:** it indicates how the model performs on the validation data set. In other words, it is an indication of how the model would generalize to unseen data.
- Importantly, **do not** evaluate the performance of the network on the test data yet.
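The generalization loss can be computed with a plain forward pass over the validation data, without any gradient computation or parameter update. The function below is a minimal sketch of this idea; the name `evaluate_loss`, the tiny linear model and the random data are hypothetical and used only for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_loss(model, loader, loss_fn):
    """Average loss of `model` over all items in `loader`, without updating it."""
    model.eval()                      # switch layers such as dropout to evaluation mode
    total_loss, n_items = 0.0, 0
    with torch.no_grad():             # no gradients are needed for evaluation
        for inputs, target in loader:
            outputs = model(inputs)
            # loss_fn returns the batch mean, so weight it by the batch size
            total_loss += loss_fn(outputs, target).item() * target.shape[0]
            n_items += target.shape[0]
    model.train()                     # switch back to training mode
    return total_loss / n_items

# Hypothetical stand-in model and validation data
model = nn.Linear(8, 3)
val_data = TensorDataset(torch.randn(20, 8), torch.randint(0, 3, (20,)))
val_loader = DataLoader(val_data, batch_size=5)

val_loss = evaluate_loss(model, val_loader, nn.CrossEntropyLoss())
print(f"Validation loss: {val_loss:.3f}")
```

Calling such a function once per epoch, right after the training pass, gives one validation loss value per epoch that can be plotted next to the training loss.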

Ideally, the generalization loss will also decrease over time. However, if you plot the two losses as a function of the epoch number, you will obtain curves that look like the ones in the figure below (the figure is from https://d2l.ai/chapter_linear-regression/generalization.html):



[Figure: capacity-vs-error.svg]

You will see that the two errors (the training loss and the generalization loss) start high. It means that the model has not been trained properly yet - in other words, at this stage, the model **underfits** the data. It did not learn anything yet.

If you run the training procedure for a large number of epochs, you will observe that the training loss converges to a very low value (ideally $0$, although in practice it will not reach $0$), which looks good at first sight. However, you will also observe that, at a certain point, the generalization loss, i.e., the error when the model is run on the validation data, starts increasing! This means that the model's performance on **unseen** data starts to degrade. This is called **overfitting**: the model starts to memorize the training data and can no longer generalize what it has learned to new data.

If you look closely at the curve, you will notice that the generalization loss first decreases over time but, at a certain point, starts increasing. It is at this point that the model starts to overfit. This tipping point is the optimum, i.e., you should stop training at that point and use the parameters learned at that point as the optimal ones.

Thus, while training any neural network, you need to
- Plot the training and generalization losses as a function of the number of epochs.
- Save the parameters that the network has learned at regular epoch intervals.
- Use the parameters learned at the tipping point as the optimal network parameters.
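Once the validation loss has been recorded at every epoch, locating the tipping point amounts to finding the epoch with the lowest validation loss. The snippet below is a minimal sketch; the helper name `best_epoch` and the loss values are hypothetical.

```python
def best_epoch(val_losses):
    """Return the (0-based) epoch index with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda e: val_losses[e])

# Hypothetical validation losses: decreasing at first, then increasing (overfitting)
val_losses = [2.10, 1.75, 1.52, 1.47, 1.49, 1.58, 1.71]

epoch = best_epoch(val_losses)
print(f"Best epoch: {epoch}, validation loss: {val_losses[epoch]:.2f}")
# prints "Best epoch: 3, validation loss: 1.47"
```

Real validation curves are noisy, so in practice it is common to keep training for a few more epochs after the minimum before concluding that the loss is really going up. This is also why the parameters need to be saved regularly: the weights from the best epoch must still be available when you decide to stop.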

To see this in practice, we will use the MLP network we created in Lab 3 and train it on the CIFAR-10 dataset. The code is reproduced here for completeness.

In your case, you are asked to customize the code for the LeNet-5 network you created in the previous lab.

In [None]:
## MLP network (copied from Lab 4)

# Importing all dependencies
import os # for some OS ops
import torch
from torch import nn

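from torch.utils.data import DataLoader
import torchvision  # needed later for torchvision.utils.make_grid
import torchvision.transforms as transforms  # needed later for the data transforms
from torchvision.datasets import CIFAR10  # The data set that we will use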

import matplotlib.pyplot as plt
import numpy as np

## The MLP class
class MLP(nn.Module):
  '''
    Multilayer Perceptron.
  '''
  def __init__(self, inputSize=32 * 32 * 3, outputSize=10, lr=0.01):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Flatten(),
      nn.Linear(inputSize, 64),
      nn.ReLU(),
      nn.Linear(64, 32),
      nn.ReLU(),
      nn.Linear(32, outputSize),
    )
    # Setting the learning rate
    self.lr = lr

  ## The forward step
  def forward(self, X):
    # Computes the output given the input X
    return self.layers(X)

  ## The loss function - Here, we will use Cross Entropy Loss
  def loss(self, y_hat, y):
    fn = nn.CrossEntropyLoss() # see https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
    return fn(y_hat, y)

  ## The optimization algorithm
  def configure_optimizers(self):
    # return torch.optim.SGD(self.parameters(), self.lr)
    return torch.optim.Adam(self.parameters(), self.lr)

In [None]:
## The training loop
class Trainer:

  def __init__(self, n_epochs = 3):
    self.max_epochs = n_epochs
    return

  def fit(self, model, data):

    self.data = data

    # configure the optimizer
    self.optimizer = model.configure_optimizers()
    self.model     = model

    for epoch in range(self.max_epochs):
      self.fit_epoch()

    print("Training process has finished")

  def fit_epoch(self):

    current_loss = 0.0

    # iterate over the DataLoader for training data
    for i, data in enumerate(self.data):
      # Get input
      inputs, target = data

      # Clear the gradient buffers: PyTorch accumulates gradients by default, and we
      # don't want gradients from the previous mini-batch to carry over
      self.optimizer.zero_grad()

      # get output from the model, given the inputs
      outputs = self.model(inputs)

      # get loss for the predicted output
      loss = self.model.loss(outputs, target)

      # get gradients w.r.t to the parameters of the model
      loss.backward()

      # update the parameters (perform optimization)
      self.optimizer.step()

      # Let's print some statistics (average training loss over the last 500 mini-batches)
      current_loss += loss.item()
      if i % 500 == 499:
          print('Loss after mini-batch %5d: %.3f' %
                (i + 1, current_loss / 500))
          current_loss = 0.0

The training program looks as follows:

In [None]:
## The main training program

# 1. Loading the CIFAR-10 data set
# Transforms to apply to the data - More about this later
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Loading the data
train_dataset     = CIFAR10(os.getcwd(), train=True, download=True, transform=transform)

# If you would like to see the list of classes, uncomment the line below
# print(train_dataset.classes)

batch_size = 10
trainloader = torch.utils.data.DataLoader(train_dataset, batch_size, shuffle=True, num_workers=1)

# 2. The MLP model
mlp_model = MLP(lr=1e-04)

# 3. Training the network
# 3.1. Creating the trainer class
trainer = Trainer(n_epochs=1)

# 3.2. Training the model
trainer.fit(mlp_model, trainloader)

The testing code looks as follows:


In [None]:
'''
# printing some info about the dataset
print("Number of points:", dataset.shape[0])
print("Number of features:", dataset.shape[1])
print("Features:", dataset.columns.values)
print("Number of Unique Values")
for col in dataset:
    print(col,":",len(dataset[col].unique()))
plt.figure(figsize=(12,8))
'''
classes = ('airplanes', 'cars', 'birds', 'cats', 'deer', 'dogs', 'frogs', 'horses', 'ships', 'truck')
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# Loading the test set
testset = CIFAR10(os.getcwd(), train=False, download=True, transform=transform)

testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=1)

dataiter = iter(testloader)
images, labels = next(dataiter)  # this gets one batch
# print(images.shape[0]) # should be equal to batch_size
# print(labels)  # the labels of the images in the batch

# Let's see some images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(images.shape[0])))

# Now, let's see what the network thinks these examples are
output = mlp_model(images)
estimated_labels = torch.max(output, 1).indices
#print(estimated_labels)
#print(labels)

print('Estimated Labels: ', ' '.join(f'{classes[estimated_labels[j]]:5s}' for j in range(images.shape[0])))

### **1.1. Plotting the training and validation curves**

We would like to visualize the training and validation loss curves, similar to the figure above. For this, TensorFlow offers a powerful visualization tool called [TensorBoard](https://www.tensorflow.org/tensorboard). Please refer to this link for a detailed description of the functionalities offered by TensorBoard and how to use them. Below, we only explain the basic ones.

To use TensorBoard with PyTorch, please refer to this [link](https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html). Here, I will briefly describe how to visualize images from the dataset, the network architecture, and the loss and accuracy curves.

First, to use TensorBoard, we need to load it using the following code:

In [None]:
%load_ext tensorboard

# Clear any logs from previous runs (the SummaryWriter below writes to ./runs)
%rm -rf ./runs/

from torch.utils.tensorboard import SummaryWriter

# default `log_dir` is "./runs" - we'll be more specific here
writer = SummaryWriter('runs/CIFAR-10_experiment_1')

TensorBoard can be used to visualize images as well as the architecture of the neural network:


In [None]:
import matplotlib.pyplot as plt
import torchvision
import torchvision.transforms as transforms

## Traditional way of displaying images
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))

## Let's see some images

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # dataiter.next() is not available in recent PyTorch versions

# create grid of images
img_grid = torchvision.utils.make_grid(images)

# show images (traditional image display without TensorBoard)
# matplotlib_imshow(img_grid, one_channel=True)

# write to tensorboard
writer.add_image('cifar10_images', img_grid)

## Let's see the network model
writer.add_graph(mlp_model, images)

# Make sure you close the writer so that all pending data is flushed to disk
writer.close()

To launch TensorBoard, run the following command:

In [None]:
%tensorboard --logdir runs

Now, to visualize the training loss, we need to update the Trainer class as follows:
- We need to pass the TensorBoard writer (the `SummaryWriter` instance) to the constructor of the Trainer.
- The methods fit() and/or fit_epoch() need to log the values that are to be visualized as curves. In the example below, we log the training loss after each epoch.


In [None]:
## The training loop
class Trainer:

  def __init__(self, tb, n_epochs = 3):
    self.max_epochs = n_epochs
    self.writer = tb  # the tensorboard instance
    return

  def fit(self, model, data):

    self.data = data

    # configure the optimizer
    self.optimizer = model.configure_optimizers()
    self.model     = model

    for epoch in range(self.max_epochs):
      self.fit_epoch()

      # Logging the average training loss so that it can be visualized in the tensorboard
      self.writer.add_scalar("Training Loss", self.avg_training_loss, epoch)

    print("Training process has finished")

  def fit_epoch(self):

    current_loss = 0.0
    self.avg_training_loss = 0.0

    # iterate over the DataLoader for training data
    for i, data in enumerate(self.data):
      # Get input
      inputs, target = data

      # Clear the gradient buffers: PyTorch accumulates gradients by default, and we
      # don't want gradients from the previous mini-batch to carry over
      self.optimizer.zero_grad()

      # get output from the model, given the inputs
      outputs = self.model(inputs)

      # get loss for the predicted output
      loss = self.model.loss(outputs, target)

      # get gradients w.r.t to the parameters of the model
      loss.backward()

      # update the parameters (perform optimization)
      self.optimizer.step()

      # Let's print some statistics (average training loss over the last 500 mini-batches)
      current_loss += loss.item()

      # Adding training loss
      self.avg_training_loss += loss.item()

      if i % 500 == 499:
          print('Loss after mini-batch %5d: %.3f' %
                (i + 1, current_loss / 500))
          current_loss = 0.0

    # The average training loss
    self.avg_training_loss = self.avg_training_loss / (i + 1) # i is the last 0-based batch index, so there are i + 1 mini-batches



The training code then becomes:

In [None]:
%load_ext tensorboard

from torch.utils.tensorboard import SummaryWriter

# Clear any logs from previous runs (the SummaryWriter below writes to ./runs)
%rm -rf ./runs/

# default `log_dir` is "runs" - we'll be more specific here
writer = SummaryWriter('runs/CIFAR-10_experiment_1')

## The main training program
# 1. Loading the CIFAR-10 data set
# Transforms to apply to the data - More about this later
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Loading the data
train_dataset = CIFAR10(os.getcwd(), train=True, download=True, transform=transform)

# If you would like to see the list of classes, uncomment the line below
# print(train_dataset.classes)

batch_size = 10
trainloader = torch.utils.data.DataLoader(train_dataset, batch_size, shuffle=True, num_workers=1)

# 2. The MLP model
mlp_model = MLP(lr=1e-04)

# 3. Training the network
# 3.1. Creating the trainer class - note that here, I passed writer as a  parameter to the trainer
trainer = Trainer(writer, n_epochs=5)

# 3.2. Training the model
trainer.fit(mlp_model, trainloader)

# Visualize
writer.close()

Launch the TensorBoard:

In [None]:
%tensorboard --logdir runs

### **1.2. Exercise 1 - Training MLP with validation**

Extend the Trainer class so that it trains on $80\%$ of the training data and uses the remaining $20\%$ for validation. At each epoch, the method **fit_epoch()** uses the training data to update the network parameters. It then needs to compute the loss on the validation data set. Make sure you store the validation loss at each epoch and then visualize both the training loss and the validation loss in TensorBoard. It is better to visualize the two curves on the same plot so that you can compare them.
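One possible way to draw two curves on the same TensorBoard plot is `SummaryWriter.add_scalars`, which logs several named values under one main tag. The snippet below is only a sketch: the loss values are made up, and the temporary log directory is an assumption for illustration (in this lab you would keep logging under `runs/`).

```python
import os
import tempfile
from torch.utils.tensorboard import SummaryWriter

# Hypothetical log location; in the lab you would use something like 'runs/...'
log_dir = tempfile.mkdtemp()
writer = SummaryWriter(log_dir)

# Made-up per-epoch losses, standing in for the values computed by the Trainer
train_losses = [2.0, 1.6, 1.3]
val_losses = [2.1, 1.8, 1.7]

for epoch, (train_loss, val_loss) in enumerate(zip(train_losses, val_losses)):
    # Both values share the main tag "Loss", so they appear on the same chart
    writer.add_scalars("Loss", {"train": train_loss, "validation": val_loss}, epoch)

writer.close()
print(sorted(os.listdir(log_dir)))
```

In the Trainer, the same call would replace the single `add_scalar` call, with the two values computed at the end of each epoch.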

### **1.3. Exercise 2 - Accuracy**

Extend your code in Exercise 1 to also compute the accuracy at each epoch, both on the training set as well as on the validation set. Plot the accuracy curves on a separate plot.

To compute the accuracy, you need to run a forward pass on the data and compare the predicted labels with the ground-truth labels. The accuracy is then defined as the number of correct predictions divided by the total number of data items you passed into your network.
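As a small worked example of this definition, the sketch below computes the accuracy for one batch. The function name `accuracy` and the hand-written network outputs are hypothetical; in your code, the outputs come from the forward pass of the model.

```python
import torch

def accuracy(outputs, labels):
    """Fraction of items whose highest-scoring class matches the label."""
    predicted = outputs.argmax(dim=1)              # predicted class per item
    n_correct = (predicted == labels).sum().item()
    return n_correct / labels.shape[0]

# Hypothetical network outputs for 4 items and 3 classes
outputs = torch.tensor([[2.0, 0.1, 0.3],   # predicted class 0
                        [0.2, 1.5, 0.1],   # predicted class 1
                        [0.9, 0.2, 0.4],   # predicted class 0 (wrong, label is 2)
                        [0.1, 0.2, 3.0]])  # predicted class 2
labels = torch.tensor([0, 1, 2, 2])

print(accuracy(outputs, labels))  # 3 correct out of 4 -> 0.75
```

To get the accuracy over a whole epoch, accumulate the number of correct predictions and the number of items over all mini-batches and divide at the end, rather than averaging per-batch accuracies (the last batch may be smaller).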


### **1.4. Exercise 3 - Training on GPU**

Deep networks are computationally expensive, both during training and at test time. Luckily, they are highly parallelizable, especially CNNs, so they are well suited to running on modern GPUs.

In this exercise, you are asked to extend the MLP class as well as the LeNet class so that you can train and test them on a GPU when one is available.

In the free version of Colab, you get up to $12$ hours of execution time, but the session is disconnected if you are idle for more than $60$ minutes. This means that, after at most $12$ hours, the disk, RAM, CPU cache and data on your allocated virtual machine are erased. Thus, if you plan to train for longer than $12$ hours, you must make sure that the state of your training is saved regularly. In fact, even if your expected training time is less than $12$ hours, you should still save the learned network parameters regularly.
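A common way to save and restore the training state in PyTorch is to store the `state_dict` of the model and of the optimizer with `torch.save`. The sketch below assumes hypothetical helper names (`save_checkpoint`, `load_checkpoint`), a tiny stand-in model and a temporary file. On Colab, you would instead write to a location that survives the session, for example a mounted Google Drive folder.

```python
import os
import tempfile
import torch
from torch import nn

def save_checkpoint(path, model, optimizer, epoch):
    """Store everything needed to resume training after `epoch`."""
    torch.save({"epoch": epoch,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()}, path)

def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place; return the epoch to resume from."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["epoch"] + 1

# Hypothetical stand-in model (use your MLP or LeNet instead)
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
save_checkpoint(path, model, optimizer, epoch=2)

# Later (e.g., after a disconnection): rebuild the objects, then restore their state
restored = nn.Linear(4, 2)
restored_optimizer = torch.optim.Adam(restored.parameters(), lr=1e-4)
start_epoch = load_checkpoint(path, restored, restored_optimizer)
print("Resume training from epoch", start_epoch)
```

Saving one file per epoch (for example with the epoch number in the file name) additionally lets you go back to the weights from the tipping point once you have identified it.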

To enable GPU hardware accelerator on Colab, just go to the menu **Runtime -> Change runtime type -> Hardware accelerator -> GPU**.

Then, inside the training loop, in the function *fit_epoch()*, you need to transfer the training data to the GPU. Instead of writing:


```
inputs, target = data
```
You need to write:


```
inputs, target = data
if torch.cuda.is_available():
   inputs, target = inputs.cuda(), target.cuda()
```

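You do the same if you add a validation loop (Exercise 1). Remember that the model itself must also be moved to the GPU (e.g., with `model.cuda()`) before training; otherwise the inputs and the network parameters will live on different devices and PyTorch will raise an error.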
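An alternative that avoids repeating the `if torch.cuda.is_available()` test is to pick a device once and move both the model and every batch to it with `.to(device)`. The snippet below is a minimal sketch of this pattern; the small linear model and the random batch are hypothetical stand-ins for your network and data.

```python
import torch
from torch import nn

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model; the parameters must live on the same device as the data
model = nn.Linear(8, 2).to(device)

# Hypothetical batch, moved to the chosen device exactly like a batch from a DataLoader
inputs = torch.randn(4, 8).to(device)
target = torch.randint(0, 2, (4,)).to(device)

outputs = model(inputs)
loss = nn.functional.cross_entropy(outputs, target)
print(outputs.device, loss.item())
```

The same code then runs unchanged on a machine with or without a GPU, which makes it easy to compare the CPU and GPU training times.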

Update your LeNet class, train it on GPU and then compare the training time with the previous version that runs on CPU.

## **2. Modern Convolutional Neural Networks**

Pick one or two of the networks we saw during the lecture and try to implement them using PyTorch. Train them on the MNIST or CIFAR-10 datasets and compare their performance with the LeNet that you implemented last week.

For this exercise, start with the LeNet class you created in the previous lab.