[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io)

# AI Program

## Machine Learning - Deep Learning - The ResNet Model

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 26/05/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0092DeepLearningResNet.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import ParameterGrid

# Deep Learning
import torch
import torch.nn            as nn
import torch.nn.functional as F
from torch.optim.optimizer import Optimizer
from torch.optim.lr_scheduler import LRScheduler
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
import torchinfo
from torchmetrics.classification import MulticlassAccuracy
import torchvision
from torchvision.transforms import v2 as TorchVisionTrns

# Miscellaneous
import copy
from enum import auto, Enum, unique
import math
import os
from platform import python_version
import random
import time

# Typing
from typing import Callable, Dict, Generator, List, Optional, Self, Set, Tuple, Union

# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import HTML, Image
from IPython.display import display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout, SelectionSlider

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

# Improve performance by benchmarking
torch.backends.cudnn.benchmark = True

# Reproducibility (Per PyTorch Version on the same device)
# torch.manual_seed(seedNum)
# torch.backends.cudnn.deterministic = True
# torch.backends.cudnn.benchmark     = False #<! Makes things slower


In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

D_CLASSES_CIFAR_10  = {0: 'Airplane', 1: 'Automobile', 2: 'Bird', 3: 'Cat', 4: 'Deer', 5: 'Dog', 6: 'Frog', 7: 'Horse', 8: 'Ship', 9: 'Truck'}
L_CLASSES_CIFAR_10  = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']
T_IMG_SIZE_CIFAR_10 = (32, 32, 3)

DATA_FOLDER_PATH    = 'Data'
TENSOR_BOARD_BASE   = 'TB'


In [None]:
# Download Auxiliary Modules for Google Colab
if runInGoogleColab:
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataManipulation.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataVisualization.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DeepLearningPyTorch.py

In [None]:
# Courses Packages

from DataVisualization import PlotLabelsHistogram, PlotMnistImages
from DeepLearningPyTorch import NNMode, ResidualBlock
from DeepLearningPyTorch import InitWeightsKaiNorm, TrainModel


In [None]:
# General Auxiliary Functions



## The ResNet Model

The ResNet model is considered to be one of the most successful architectures.  
Its main novelty is the _Skip Connection_, which made training much deeper models feasible and improved the performance greatly.

Using _hand waving_ arguments, the contribution of the skip connection can be explained as:

 * Learn the Residual  
   Instead of building the features from scratch, a block only needs to refine (emphasize) the features it receives.  
   Namely, since the input of the block is passed forward as is, the active path only has to learn the residual.
 * Ensemble of Models  
   Since the data can flow forward along many paths, each path can be seen as a model.
 * Skip Vanishing Gradients  
   If the gradient vanishes along the active path, the skip connection provides a path around it during backpropagation.
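
The _Skip Vanishing Gradients_ argument can be illustrated with a toy computation.  
The sketch below assumes each layer is a scalar linear map $f_l \left( x \right) = w_l x$ with small, hypothetical, weights and compares the end to end derivative of a plain chain with that of a residual chain. It is an illustration under these assumptions, not a model of an actual network.

```python
import numpy as np

# Toy model (Assumption): each "layer" is a scalar linear map f_l(x) = w_l * x with small weights
rng       = np.random.default_rng(seed = 0)
numLayers = 20
vW        = rng.uniform(low = 0.05, high = 0.15, size = numLayers)

# Plain chain: x_{l+1} = f_l(x_l) = w_l * x_l  =>  dx_L / dx_0 = prod_l w_l
gradPlain = np.prod(vW)

# Residual chain: x_{l+1} = x_l + f_l(x_l) = (1 + w_l) * x_l  =>  dx_L / dx_0 = prod_l (1 + w_l)
gradRes = np.prod(1.0 + vW)

print(f'Plain chain gradient:    {gradPlain:.3e}') #<! Vanishes (Product of small numbers)
print(f'Residual chain gradient: {gradRes:.3e}')   #<! At least 1 (The identity path)
```

The identity term of the skip connection contributes the `1` in each factor, so the gradient reaching the early layers cannot vanish through the product alone.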


This notebook:
 - Reviews a model based on a paper.
 - Implements the _Residual Block_.
 - Implements a _ResNet_ like model.
 - Trains the model on the _CIFAR 10_ dataset.

</br>

* <font color='brown'>(**#**)</font> A great recap on ResNet is given in the book [Dive into Deep Learning](https://d2l.ai): [Residual Networks (ResNet) and ResNeXt](https://d2l.ai/chapter_convolutional-modern/resnet.html).
* <font color='brown'>(**#**)</font> Analysis of the _Skip Connection_ is given in [Skip Connections Eliminate Singularities](https://arxiv.org/abs/1701.09175).
* <font color='brown'>(**#**)</font> Discussions and notes on _skip connection_: [CrossValidated - Neural Network with Skip Layer Connections](https://stats.stackexchange.com/questions/56950), [FastAI Forum - What’s the Intuition behind Skip Connections in ResNet](https://forums.fast.ai/t/63589).
* <font color='brown'>(**#**)</font> [Intuitive Explanation of Skip Connections in Deep Learning](https://theaisummer.com/skip-connections).
* <font color='brown'>(**#**)</font> [How Many Models Is ResNet](https://lernapparat.de/resnet-how-many-models) - A different point of view on ResNet.

In [None]:
# Parameters

# Data
numSamplesPerClsTrain   = 4000
numSamplesPerClsVal     = 400

# Model
dropP = 0.5 #<! Dropout Layer

# Training
batchSize   = 256
numWork     = 2 #<! Number of workers
nEpochs     = 45

# Visualization
numImg = 3


## Generate / Load Data

Load the [CIFAR 10 Data Set](https://en.wikipedia.org/wiki/CIFAR-10).  
It is composed of 60,000 RGB images of size `32x32` in 10 classes, uniformly distributed (6,000 images per class).  
It is split into 50,000 training images and 10,000 test images (Used here as the validation set).

* <font color='brown'>(**#**)</font> The dataset is retrieved using [Torch Vision](https://pytorch.org/vision/stable/index.html)'s built-in datasets.

In [None]:
# Load Data

dsTrain = torchvision.datasets.CIFAR10(root = DATA_FOLDER_PATH, train = True,  download = True, transform = torchvision.transforms.ToTensor())
dsVal   = torchvision.datasets.CIFAR10(root = DATA_FOLDER_PATH, train = False, download = True, transform = torchvision.transforms.ToTensor())
lClass  = dsTrain.classes


print(f'The training data set data shape: {dsTrain.data.shape}')
print(f'The validation data set data shape: {dsVal.data.shape}')
print(f'The unique values of the labels: {np.unique(dsTrain.targets)}')
print(f'The class names: {lClass}')

* <font color='brown'>(**#**)</font> The dataset is indexable (Subscriptable). Indexing returns a tuple of the features and the label.
* <font color='brown'>(**#**)</font> While the data is stored as `H x W x C` (`uint8`), the `ToTensor()` transform, applied on access, converts each image into a `C x H x W` float tensor with values in `[0, 1]`.
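
The layout change can be sketched with NumPy.  
The array below is a hypothetical random `uint8` image. The transpose and scaling mirror what `ToTensor()` does for `uint8` input, though the actual transform operates on PIL images and returns a PyTorch tensor.

```python
import numpy as np

# A hypothetical uint8 image in `H x W x C` layout (As stored in `dsTrain.data`)
rng = np.random.default_rng(seed = 0)
mI  = rng.integers(low = 0, high = 256, size = (32, 32, 3), dtype = np.uint8)

# Move the channel axis first and scale [0, 255] -> [0, 1]
tI = np.transpose(mI, (2, 0, 1)).astype(np.float32) / 255.0

print(mI.shape, '->', tI.shape) #<! (32, 32, 3) -> (3, 32, 32)
```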

In [None]:
# Element of the Data Set

mX, valY = dsTrain[0]

print(f'The features shape: {mX.shape}')
print(f'The label value: {valY}')

### Plot the Data

In [None]:
# Extract Data

tX = dsTrain.data #<! NumPy Tensor (NDarray)
mX = np.reshape(tX, (tX.shape[0], -1))
vY = dsTrain.targets #<! NumPy Vector


In [None]:
# Plot the Data

hF = PlotMnistImages(mX, vY, numImg, tuImgSize = T_IMG_SIZE_CIFAR_10)

In [None]:
# Histogram of Labels

hA = PlotLabelsHistogram(vY, lClass = L_CLASSES_CIFAR_10)
plt.show()

* <font color='red'>(**?**)</font> If the data is converted into _grayscale_, how would it affect the performance of the classifier? Explain.  
  You may assume the conversion is done using the mean value of the RGB pixel.

## Pre Process Data

This section:

 * Standardizes the data using the per channel statistics of the training set.
 * Applies data augmentation (Random horizontal flip) to the training set.
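
The per channel standardization can be sketched on hypothetical random data.  
The statistics are reduced over all axes but the channel axis, and broadcasting applies them per channel. They should be estimated from the training data only and then applied to both the training and the validation sets.

```python
import numpy as np

# Hypothetical data in `N x H x W x C` layout, scaled to [0, 1] like `dsTrain.data / 255.0`
rng = np.random.default_rng(seed = 0)
tX  = rng.integers(low = 0, high = 256, size = (100, 8, 8, 3)).astype(np.float64) / 255.0

# Per channel statistics: reduce over all axes but the channel axis
vMean = np.mean(tX, axis = (0, 1, 2)) #<! Shape (3,)
vStd  = np.std(tX, axis = (0, 1, 2))  #<! Shape (3,)

# Broadcasting aligns the `(3,)` vectors with the last (Channel) axis
tZ = (tX - vMean) / vStd

print(np.mean(tZ, axis = (0, 1, 2))) #<! ~0 per channel
print(np.std(tZ, axis = (0, 1, 2)))  #<! ~1 per channel
```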

In [None]:
# Calculate the Standardization Parameters
vMean = np.mean(dsTrain.data / 255.0, axis = (0, 1, 2))
vStd  = np.std(dsTrain.data / 255.0, axis = (0, 1, 2)) #<! Statistics of the training set only

print('µ =', vMean)
print('σ =', vStd)

In [None]:
# Update Transforms
# Using v2 Transforms
oDataTrnsTrain = TorchVisionTrns.Compose([
    TorchVisionTrns.ToImage(),
    TorchVisionTrns.ToDtype(torch.float32, scale = True),
    TorchVisionTrns.RandomHorizontalFlip(p = 0.5),
    TorchVisionTrns.Normalize(mean = vMean, std = vStd),
])
oDataTrnsVal = TorchVisionTrns.Compose([
    TorchVisionTrns.ToImage(),
    TorchVisionTrns.ToDtype(torch.float32, scale = True),
    TorchVisionTrns.Normalize(mean = vMean, std = vStd),
])

# Update the DS transformer
dsTrain.transform   = oDataTrnsTrain
dsVal.transform     = oDataTrnsVal

* <font color='red'>(**?**)</font> What does `RandomHorizontalFlip` do? Why can it be used?

In [None]:
# "Normalized" Image

mX, valY = dsTrain[5]

hF, hA = plt.subplots()
hImg = hA.imshow(mX.permute(1, 2, 0).numpy()) #<! `C x H x W` -> `H x W x C`; Matplotlib clips RGB float values outside [0, 1]
hF.colorbar(hImg)
plt.show()

### Data Loaders

This section defines the data loaders.
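
The number of iterations per epoch follows from the dataset sizes and the batch sizes.  
A small sketch of the arithmetic, assuming the CIFAR 10 split sizes (50,000 / 10,000) and the batch sizes used by the loaders (`batchSize` for training, `2 * batchSize` for validation). The `*Demo` names avoid overwriting the notebook's variables.

```python
import math

# Values assumed from this notebook
numSamplesTrainDemo = 50_000
numSamplesValDemo   = 10_000
batchSizeDemo       = 256

# With the default `drop_last = False` the last (Partial) batch is kept, hence the ceiling
numBatchesTrain = math.ceil(numSamplesTrainDemo / (1 * batchSizeDemo))
numBatchesVal   = math.ceil(numSamplesValDemo / (2 * batchSizeDemo))

print(numBatchesTrain, numBatchesVal) #<! 196 20
```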



In [None]:
# Data Loader

dlTrain = torch.utils.data.DataLoader(dsTrain, shuffle = True, batch_size = 1 * batchSize, num_workers = numWork, persistent_workers = True)
dlVal   = torch.utils.data.DataLoader(dsVal, shuffle = False, batch_size = 2 * batchSize, num_workers = numWork, persistent_workers = True)


* <font color='red'>(**?**)</font> Why is the batch size twice as big for the validation data loader?

In [None]:
# Iterate on the Loader
# The first batch.
tX, vY = next(iter(dlTrain)) #<! PyTorch Tensors

print(f'The batch features dimensions: {tX.shape}')
print(f'The batch labels dimensions: {vY.shape}')

In [None]:
# Looping
for ii, (tX, vY) in zip(range(1), dlVal): #<! https://stackoverflow.com/questions/36106712
    print(f'The batch features dimensions: {tX.shape}')
    print(f'The batch labels dimensions: {vY.shape}')

## Define the Model

The model is defined as a sequential model built with non-sequential blocks.

In PyTorch, models are defined by 2 main methods:
1. The `__init__()` Method  
   Sets the parameters and elements (Layers) of the model.
2. The `forward()` Method  
   Sets the structure of the computation.

This holds whether the module is a single layer, a block or a sub module of a bigger model.
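
A minimal sketch of a custom module shows the role of each method.  
The class `TinyConvNet` is hypothetical (Not part of the course package) and only illustrates the pattern.

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module): #<! Hypothetical example model
    def __init__( self, numCls: int = 10 ) -> None:
        super().__init__()
        # Elements are registered by assigning them as attributes
        self.oConv = nn.Conv2d(3, 16, 3, padding = 1)
        self.oPool = nn.AdaptiveAvgPool2d(1)
        self.oFc   = nn.Linear(16, numCls)

    def forward( self, tX: torch.Tensor ) -> torch.Tensor:
        # The structure of the computation (Any Python control flow may be used here)
        tX = torch.relu(self.oConv(tX))
        tX = self.oPool(tX).flatten(1)
        return self.oFc(tX)

oNet = TinyConvNet()
tY   = oNet(torch.randn(4, 3, 32, 32))
print(tY.shape) #<! torch.Size([4, 10])
```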

### The Residual Block

This section implements the residual block as a `torch.nn.Module`.

![](https://i.imgur.com/uCUvner.png)

* <font color='blue'>(**!**)</font> Go through the `ResidualBlock` class.
* <font color='brown'>(**#**)</font> The `ResidualBlock` class is an example of a non sequential model (Though it is a block).
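* <font color='brown'>(**#**)</font> Some argue that the _pre activation_ form of the residual block (`BatchNorm -> ReLU -> Conv`, as proposed by He et al. in _Identity Mappings in Deep Residual Networks_) yields better results. Its left path: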

```python
import torch.nn as nn

C = 64 #<! Number of channels (Example value)
oLeftPath = nn.Sequential(
    nn.BatchNorm2d(C), nn.ReLU(), nn.Conv2d(C, C, 3, padding = 1, bias = False),
    nn.BatchNorm2d(C), nn.ReLU(), nn.Conv2d(C, C, 3, padding = 1, bias = False),
)
```
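
For reference, a sketch of the _basic_ residual block of the original ResNet paper (Post activation: `Conv -> BN -> ReLU -> Conv -> BN`, then the addition and a final `ReLU`).  
The class `ResidualBlockSketch` is hypothetical and its details may differ from the course's `ResidualBlock`. It assumes the number of channels and the spatial size are kept, so the identity skip connection needs no projection.

```python
import torch
import torch.nn as nn

class ResidualBlockSketch(nn.Module): #<! Hypothetical, may differ from `DeepLearningPyTorch.ResidualBlock`
    def __init__( self, numChnl: int ) -> None:
        super().__init__()
        # Left (Active) path: Conv -> BN -> ReLU -> Conv -> BN
        self.oConv1 = nn.Conv2d(numChnl, numChnl, 3, padding = 1, bias = False)
        self.oBn1   = nn.BatchNorm2d(numChnl)
        self.oConv2 = nn.Conv2d(numChnl, numChnl, 3, padding = 1, bias = False)
        self.oBn2   = nn.BatchNorm2d(numChnl)

    def forward( self, tX: torch.Tensor ) -> torch.Tensor:
        tF = torch.relu(self.oBn1(self.oConv1(tX)))
        tF = self.oBn2(self.oConv2(tF))
        # Skip connection: add the input (Shapes match since channels and spatial size are kept)
        return torch.relu(tF + tX)

oBlock = ResidualBlockSketch(64)
tY     = oBlock(torch.randn(4, 64, 56, 56))
print(tY.shape) #<! torch.Size([4, 64, 56, 56])
```

If the active path outputs zero, the block reduces to `ReLU(x)`, which is the "learn only the residual" intuition.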

In [None]:
# The Residual Block
oResBlock = ResidualBlock(64)
torchinfo.summary(oResBlock, (4, 64, 56, 56), col_names = ['kernel_size', 'output_size', 'num_params'], device = 'cpu')

* <font color='green'>(**@**)</font> Implement the `ResidualBlock` class using `nn.Sequential` to describe the left path.
* <font color='brown'>(**#**)</font> Assume the output of the _Residual Layer_ is $g \left( \boldsymbol{x} \right) = \alpha f \left( \boldsymbol{x} \right) + \left( 1 - \alpha \right) \boldsymbol{x}$.  
  For $\alpha$ to be a learned parameter, it has to be registered as such (For instance, by wrapping it with `nn.Parameter`).  
  See [PyTorch: Custom nn Modules](https://pytorch.org/tutorials/beginner/examples_nn/polynomial_module.html), [Dive into Deep Learning - Builder's Guide - Custom Layers](https://d2l.ai/chapter_builders-guide/custom-layer.html), [Writing a Custom Layer in PyTorch](https://scribe.rip/14ab6ac94b77).
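
A sketch of such a layer, using a hypothetical `ConvexSkip` wrapper around an arbitrary module `oF` (Standing for $f$).  
Registering $\alpha$ with `nn.Parameter` makes it part of `parameters()`, so it receives gradients and the optimizer updates it like any other weight.

```python
import torch
import torch.nn as nn

class ConvexSkip(nn.Module): #<! Hypothetical: g(x) = alpha * f(x) + (1 - alpha) * x
    def __init__( self, oF: nn.Module, alpha0: float = 0.5 ) -> None:
        super().__init__()
        self.oF = oF
        # `nn.Parameter` registers alpha as a learnable parameter of the module
        self.alpha = nn.Parameter(torch.tensor(alpha0))

    def forward( self, tX: torch.Tensor ) -> torch.Tensor:
        return self.alpha * self.oF(tX) + (1.0 - self.alpha) * tX

oLayer = ConvexSkip(nn.Conv2d(8, 8, 3, padding = 1))
tY     = oLayer(torch.randn(2, 8, 16, 16))
tY.sum().backward()
print(oLayer.alpha.grad is not None) #<! True
```

A plain Python float or a tensor stored as an attribute without `nn.Parameter` would not be returned by `parameters()` and hence would not be trained.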

In [None]:
# Model
# Defining a sequential model.

numChannels = 128

def BuildModel( nC: int ) -> nn.Module:

    oModel = nn.Sequential(
        nn.Identity(),
        nn.Conv2d(3, nC, 3, padding = 1, bias = False),  nn.BatchNorm2d(nC), nn.ReLU(),                  nn.Dropout2d(0.2),
        nn.Conv2d(nC, nC, 3, padding = 1, bias = False), nn.BatchNorm2d(nC), nn.ReLU(), nn.MaxPool2d(2), nn.Dropout2d(0.2),
        
        ResidualBlock(nC), nn.Dropout2d(0.2),
        ResidualBlock(nC), nn.Dropout2d(0.2),
        
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(nC, 10)
    )

    oModel.apply(InitWeightsKaiNorm)

    return oModel

oModel = BuildModel(numChannels)

torchinfo.summary(oModel, (batchSize, 3, 32, 32), col_names = ['kernel_size', 'output_size', 'num_params'], device = 'cpu')

* <font color='green'>(**@**)</font> Use the default initialization and compare results.
* <font color='red'>(**?**)</font> What is the motivation to _build on your own_ the ResNet model instead of using a pre-trained model?  
  Think about the dimensions of the dataset samples.
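
For comparison with the default initialization, a sketch of a _Kaiming Normal_ initialization function applied through `Module.apply()`.  
The function `InitWeightsKaimingSketch` is hypothetical; the course's `InitWeightsKaiNorm` may use a different mode or handle other layer types.

```python
import torch
import torch.nn as nn

def InitWeightsKaimingSketch( oLayer: nn.Module ) -> None: #<! Hypothetical, not the course's `InitWeightsKaiNorm`
    # `Module.apply()` calls this function on every sub module (Recursively)
    if isinstance(oLayer, (nn.Conv2d, nn.Linear)):
        # Variance scaled by the fan in, tuned for ReLU activations
        nn.init.kaiming_normal_(oLayer.weight, mode = 'fan_in', nonlinearity = 'relu')
        if oLayer.bias is not None:
            nn.init.zeros_(oLayer.bias)

oNet = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
oNet.apply(InitWeightsKaimingSketch)
```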

## Train the Model

This section trains the model:

 - Defines the loss and score functions.
 - Defines the optimizer (`AdamW`) and the learning rate scheduler (`OneCycleLR`).
 - Trains the model and plots the training phase (Loss, score and learning rate per epoch).
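
The shape of the `OneCycleLR` schedule can be previewed without training by stepping the scheduler of a dummy optimizer.  
The sketch uses the same optimizer and scheduler settings as below and assumes `TrainModel()` steps the scheduler once per epoch (Which `total_steps = nEpochs` suggests). With the PyTorch defaults (`div_factor = 25`, `final_div_factor = 1e4`, `pct_start = 0.3`, cosine annealing) the rate rises from `max_lr / 25` to about `max_lr` and then decays to a much lower value. The `*Demo` names avoid overwriting the notebook's objects.

```python
import torch

# Hypothetical setup: a single dummy parameter, same settings as the notebook's optimizer and scheduler
nEpochsDemo = 45
vPDemo      = torch.nn.Parameter(torch.zeros(1))
oOptDemo    = torch.optim.AdamW([vPDemo], lr = 1e-3, betas = (0.9, 0.99), weight_decay = 1e-3)
oSchDemo    = torch.optim.lr_scheduler.OneCycleLR(oOptDemo, max_lr = 5e-3, total_steps = nEpochsDemo)

lLR = []
for ii in range(nEpochsDemo):
    lLR.append(oSchDemo.get_last_lr()[0]) #<! Learning rate used in epoch `ii`
    oOptDemo.step() #<! No gradients, so the parameter is untouched; keeps the expected `step()` order
    oSchDemo.step()

print(f'Start: {lLR[0]:.1e}, Peak: {max(lLR):.2e}, End: {lLR[-1]:.1e}')
```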

In [None]:
# Run Device

runDevice = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #<! The 1st CUDA device
oModel    = oModel.to(runDevice) #<! Transfer model to device

In [None]:
# Loss and Score Function

hL = nn.CrossEntropyLoss()
hS = MulticlassAccuracy(num_classes = len(lClass), average = 'micro')
hL = hL.to(runDevice) #<! Not required!
hS = hS.to(runDevice)

* <font color='brown'>(**#**)</font> The averaging mode `macro` computes the accuracy of each class separately and then averages the per class results.
* <font color='brown'>(**#**)</font> The averaging mode `micro` computes the accuracy over all samples, regardless of their class.
* <font color='red'>(**?**)</font> Given 8 samples of class `A` with 6 predictions being correct and 2 samples of class `B` with 1 being correct.  
  What will be the _macro average_? What will be the _micro average_?
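
A sketch of both averaging modes in NumPy, on a hypothetical example with different numbers than the question above.  
The function `MicroMacroAccuracy` is illustrative and assumes every class has at least one sample; `MulticlassAccuracy` has its own handling of such edge cases.

```python
import numpy as np

def MicroMacroAccuracy( vY: np.ndarray, vYHat: np.ndarray, numCls: int ) -> tuple[float, float]:
    # Micro: the fraction of all samples predicted correctly
    microAcc = np.mean(vY == vYHat)
    # Macro: the accuracy of each class (Over its own samples), then the mean over the classes
    vClsAcc  = np.array([np.mean(vYHat[vY == cc] == cc) for cc in range(numCls)])
    macroAcc = np.mean(vClsAcc)
    return float(microAcc), float(macroAcc)

# Hypothetical example: class 0 has 3 samples (3 correct), class 1 has 1 sample (0 correct)
vY    = np.array([0, 0, 0, 1])
vYHat = np.array([0, 0, 0, 0])
microAcc, macroAcc = MicroMacroAccuracy(vY, vYHat, 2)
print(microAcc, macroAcc) #<! 0.75 0.5
```

The macro average gives the small class the same weight as the large one, so poor performance on a rare class pulls it down more than it pulls down the micro average.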

In [None]:
# Define Optimizer

oOpt = torch.optim.AdamW(oModel.parameters(), lr = 1e-3, betas = (0.9, 0.99), weight_decay = 1e-3) #<! Define optimizer

In [None]:
# Define Scheduler

oSch = torch.optim.lr_scheduler.OneCycleLR(oOpt, max_lr = 5e-3, total_steps = nEpochs)

In [None]:
# Train Model

oModel, lTrainLoss, lTrainScore, lValLoss, lValScore, lLearnRate = TrainModel(oModel, dlTrain, dlVal, oOpt, nEpochs, hL, hS, oSch = oSch)

In [None]:
# Plot Training Phase

hF, vHa = plt.subplots(nrows = 1, ncols = 3, figsize = (12, 5))
vHa = np.ravel(vHa)

hA = vHa[0]
hA.plot(lTrainLoss, lw = 2, label = 'Train')
hA.plot(lValLoss, lw = 2, label = 'Validation')
hA.set_title('Cross Entropy Loss')
hA.set_xlabel('Epoch')
hA.set_ylabel('Loss')
hA.legend()

hA = vHa[1]
hA.plot(lTrainScore, lw = 2, label = 'Train')
hA.plot(lValScore, lw = 2, label = 'Validation')
hA.set_title('Accuracy Score')
hA.set_xlabel('Epoch')
hA.set_ylabel('Score')
hA.legend()

hA = vHa[2]
hA.plot(lLearnRate, lw = 2)
hA.set_title('Learn Rate Scheduler')
hA.set_xlabel('Epoch')
hA.set_ylabel('Learn Rate')

* <font color='blue'>(**!**)</font> Implement a ResNeXt like model. Start with the block and then embed it into the model.
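
As a hint for the block: ResNeXt replaces the dense middle convolution of the bottleneck block with a _grouped convolution_, where the number of groups is the _cardinality_.  
The sketch below only compares the parameter count of a dense and a grouped `nn.Conv2d`; the helper `NumParams` is hypothetical and the cardinality of 32 is an assumption for illustration.

```python
import torch.nn as nn

def NumParams( oLayer: nn.Module ) -> int: #<! Hypothetical helper
    return sum(p.numel() for p in oLayer.parameters())

numChnl      = 128
oConvDense   = nn.Conv2d(numChnl, numChnl, 3, padding = 1, bias = False)
oConvGrouped = nn.Conv2d(numChnl, numChnl, 3, padding = 1, bias = False, groups = 32) #<! Cardinality 32 (Assumption)

# Each group maps 128 / 32 = 4 input channels to 4 output channels
print(NumParams(oConvDense), NumParams(oConvGrouped)) #<! 147456 4608
```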