[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io)

# AI Program

## Machine Learning - Deep Learning - PyTorch Regression - Exercise

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.001 | 02/06/2024 | Royi Avital | Changed `Test` into `Validation`                                   |
|         |            |             | Added `α` as a parameter to the `LeakyReLU` layer                  |
| 1.0.000 | 27/04/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0083DeepLearningPyTorchCifar10.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import fetch_california_housing

# Deep Learning
import torch
import torch.nn            as nn
import torch.nn.functional as F
from torch.optim.optimizer import Optimizer
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from torchmetrics.regression import R2Score
import torchinfo

# Miscellaneous
import copy
import math
import os
from platform import python_version
import random
import time

# Typing
from typing import Any, Callable, Dict, Generator, List, Optional, Self, Set, Tuple, Union

# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import HTML, Image
from IPython.display import display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout, SelectionSlider

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi Line to Fill (At least one)

 ```python
 # You need to start writing
 ????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

???
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

# Improve performance by benchmarking
torch.backends.cudnn.benchmark = True

# Reproducibility
# torch.manual_seed(seedNum)
# torch.backends.cudnn.deterministic = True
# torch.backends.cudnn.benchmark     = False


In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

D_CLASSES_CIFAR_10  = {0: 'Airplane', 1: 'Automobile', 2: 'Bird', 3: 'Cat', 4: 'Deer', 5: 'Dog', 6: 'Frog', 7: 'Horse', 8: 'Ship', 9: 'Truck'}
L_CLASSES_CIFAR_10  = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']
T_IMG_SIZE_CIFAR_10 = (32, 32, 3)

DATA_FOLDER_PATH    = 'Data'
TENSOR_BOARD_BASE   = 'TB'


In [None]:
# Download Auxiliary Modules for Google Colab
if runInGoogleColab:
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataManipulation.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataVisualization.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DeepLearningPyTorch.py

In [None]:
# Courses Packages

from DeepLearningPyTorch import NNMode


In [None]:
# General Auxiliary Functions


## California House Pricing Regression with PyTorch

This notebook applies regression (a single value per sample) to the [California Housing Dataset](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html).  

The notebook presents:

 * Using PyTorch tools to split the data.
 * Using the MSE loss in PyTorch.
 * Using the ${R}^{2}$ score in PyTorch.
 * Using grid search for the weight decay parameter (`λ`).
 * Using [TensorBoard](https://www.tensorflow.org/tensorboard) with PyTorch.
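
A minimal sketch of the two metrics, assuming the standard definitions $\text{MSE} = \frac{1}{N} \sum_{i} {\left( {z}_{i} - {y}_{i} \right)}^{2}$ and ${R}^{2} = 1 - \frac{\sum_{i} {\left( {y}_{i} - {z}_{i} \right)}^{2}}{\sum_{i} {\left( {y}_{i} - \bar{y} \right)}^{2}}$ (the single output definition `R2Score` is expected to follow). The helper `R2Manual()` and the random data are illustrative only and not part of the notebook. `nn.MSELoss()` with its default `reduction = 'mean'` should match the manual MSE, and ${R}^{2}$ should be 1 for a perfect prediction and 0 for predicting the mean.

```python
import torch
import torch.nn as nn

def R2Manual( vZ: torch.Tensor, vY: torch.Tensor ) -> torch.Tensor:
    # R² = 1 - SS_res / SS_tot (hypothetical helper for illustration)
    ssRes = torch.sum((vY - vZ) ** 2)          #<! Residual sum of squares
    ssTot = torch.sum((vY - vY.mean()) ** 2)   #<! Total sum of squares
    return 1.0 - ssRes / ssTot

torch.manual_seed(0)
vY = torch.randn(100)              #<! Reference targets (toy data)
vZ = vY + 0.3 * torch.randn(100)   #<! Noisy "predictions"

mseManual = torch.mean((vZ - vY) ** 2)
mseTorch  = nn.MSELoss()(vZ, vY)   #<! Default `reduction = 'mean'`

vZMean = torch.full_like(vY, vY.mean().item())  #<! Predict the mean for every sample

print(f'MSE (manual / nn.MSELoss): {mseManual:.4f} / {mseTorch:.4f}')
print(f'R² of the noisy prediction: {R2Manual(vZ, vY):.4f}')
print(f'R² of a perfect prediction: {R2Manual(vY, vY):.4f}')
print(f'R² of predicting the mean: {R2Manual(vZMean, vY):.4f}')
```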



In [None]:
# Parameters

# Data
numSamplesTrain = 15_000
numSamplesVal   = 5_640

# Model
dropP = 0.1 #<! Dropout Layer
α     = 0.1 #<! LeakyReLU negative slope

# Training
batchSize   = 256
numWork     = 2 #<! Number of workers
nEpochs     = 200

lλ = [0.0, 1e-5, 1e-4, 1e-3, 1e-2]

# Visualization


## Generate / Load Data

This section loads the [California Housing Dataset](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html):

 * The dataset is retrieved using [`fetch_california_housing()`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html).  
 * It is wrapped into a `Dataset` using [`torch.utils.data.TensorDataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset).
 * The data is split into 15,000 training samples and 5,640 validation samples.

</br>

* <font color='brown'>(**#**)</font> The `TensorDataset` is suited for small datasets which fit in memory.
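
A minimal sketch of the wrap and split steps on hypothetical toy arrays (not the California Housing data). It assumes `float32` is the desired `dtype`, matching the default weights of `nn.Linear`, and uses a seeded `torch.Generator` so the split is reproducible.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, random_split

# Toy data standing in for the real features / targets
rng   = np.random.default_rng(0)
mXToy = rng.standard_normal((100, 8))   #<! NumPy defaults to `float64`
vYToy = rng.standard_normal(100)

# Convert to tensors with an explicit `dtype` before wrapping
dsToy = TensorDataset(torch.tensor(mXToy, dtype = torch.float32), torch.tensor(vYToy, dtype = torch.float32))

# Split into 80 / 20 samples with a seeded generator
oGen = torch.Generator().manual_seed(0)
dsToyTrain, dsToyVal = random_split(dsToy, [80, 20], generator = oGen)

print(len(dsToy), len(dsToyTrain), len(dsToyVal))
vX0, valY0 = dsToyTrain[0]   #<! A single sample: (features, target)
print(vX0.shape, vX0.dtype)
```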

In [None]:
# Load Data

mX, vY  = fetch_california_housing(return_X_y = True)

print(f'The features data shape: {mX.shape}')
print(f'The labels data shape: {vY.shape}')


In [None]:
# PyTorch Dataset

#===========================Fill This===========================#
# 1. Wrap the data using `torch.utils.data.TensorDataset`.
# 2. Split the data using `torch.utils.data.random_split` (numSamplesTrain, numSamplesVal).
# !! Data must be converted into Tensors before using `TensorDataset`.
# !! Make sure to define the `dtype` properly.
dsData          = ???
dsTrain, dsVal  = ???
#===============================================================#

print(f'The training data set data shape: {(len(dsTrain), dsTrain.dataset.tensors[0].shape[1])}')
print(f'The validation data set data shape: {(len(dsVal), dsVal.dataset.tensors[0].shape[1])}')


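* <font color='brown'>(**#**)</font> Note that `dsTrain` and `dsVal` are [`Subset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset) objects: they hold indices (`indices`) into the shared `dsData`, which is accessible through their `dataset` attribute.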
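
A small sketch of what a `Subset` holds, on a hypothetical 10 sample dataset: the subset keeps a reference to the parent dataset (not a copy) and maps its own index through `indices`. This is why modifying `dsTrain.dataset.tensors[0]` in place changes the data seen by both subsets.

```python
import torch
from torch.utils.data import Subset, TensorDataset, random_split

# Toy dataset: feature i and target i hold the value i
dsFull = TensorDataset(torch.arange(10, dtype = torch.float32).unsqueeze(1), torch.arange(10, dtype = torch.float32))
dsA, dsB = random_split(dsFull, [7, 3], generator = torch.Generator().manual_seed(0))

print(isinstance(dsA, Subset))   #<! True
print(dsA.dataset is dsFull)     #<! True: a reference, not a copy
print(sorted(list(dsA.indices) + list(dsB.indices)))  #<! Each of 0..9 exactly once
# Indexing the subset goes through `indices` into the parent dataset
print(dsA[0][1].item() == dsFull[dsA.indices[0]][1].item())  #<! True
```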

## Pre Process Data

Make the data zero mean and unit variance.

</br>

* <font color='brown'>(**#**)</font> The normalization is applied per feature.
* <font color='brown'>(**#**)</font> The parameters are calculated on the training data only and applied to both the training and validation data.
* <font color='brown'>(**#**)</font> Since the data fits in memory, there is no need for a `transform`. If one is needed, a [custom `Dataset` subclass](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) should be created.
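
A minimal sketch of per feature standardization on hypothetical toy data with very different column scales. The statistics come from the training indices only and are applied to both subsets in place, which works because both subsets share the same underlying tensor. After the transform, the training features should have mean close to 0 and standard deviation close to 1 (the validation features only approximately so).

```python
import torch
from torch.utils.data import TensorDataset, random_split

torch.manual_seed(0)
# Toy features with different scale / offset per column
mXToy = torch.randn(1000, 3) * torch.tensor([1.0, 10.0, 100.0]) + torch.tensor([0.0, 5.0, -50.0])
dsToy = TensorDataset(mXToy, torch.randn(1000))
dsToyTrain, dsToyVal = random_split(dsToy, [800, 200], generator = torch.Generator().manual_seed(0))

# Statistics per feature (over samples, `dim = 0`), training data only
mXTrain = dsToy.tensors[0][dsToyTrain.indices]   #<! Advanced indexing returns a copy
vMean   = mXTrain.mean(dim = 0)
vStd    = mXTrain.std(dim = 0)

# Apply to both subsets; assignment writes into the shared tensor
dsToy.tensors[0][dsToyTrain.indices] = (dsToy.tensors[0][dsToyTrain.indices] - vMean) / vStd
dsToy.tensors[0][dsToyVal.indices]   = (dsToy.tensors[0][dsToyVal.indices] - vMean) / vStd

print(dsToy.tensors[0][dsToyTrain.indices].mean(dim = 0))  #<! Close to 0 per feature
print(dsToy.tensors[0][dsToyTrain.indices].std(dim = 0))   #<! Close to 1 per feature
```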

In [None]:
# Calculate the Standardization Parameters

#===========================Fill This===========================#
# 1. Calculate the mean per feature.
# 2. Calculate the standard deviation per feature.
# !! Calculation by train data only.
vMean = ???
vStd  = ???
#===============================================================#

print('µ =', vMean)
print('σ =', vStd)

In [None]:
# Apply the Standardization Parameters

# Train
dsTrain.dataset.tensors[0][dsTrain.indices, :] -= vMean
dsTrain.dataset.tensors[0][dsTrain.indices, :] /= vStd

# Validation
dsVal.dataset.tensors[0][dsVal.indices, :] -= vMean
dsVal.dataset.tensors[0][dsVal.indices, :] /= vStd

### Data Loaders

The `DataLoader` loads the data into memory in batches.  
Its challenge is to supply the data fast enough so that loading (e.g. reading from the disk) is not the training bottleneck.  
To achieve that, it can load batches in parallel using multiple worker processes.

* <font color='brown'>(**#**)</font> Multi process loading (controlled by the `num_workers` parameter) does not work well _out of the box_ on Windows.  
  See [Errors When Using `num_workers > 0` in `DataLoader`](https://discuss.pytorch.org/t/97564), [On Windows `DataLoader` with `num_workers > 0` Is Slow](https://github.com/pytorch/pytorch/issues/12831).  
  A way to overcome it is to define the training loop as a function in a different module (File) and import it (https://discuss.pytorch.org/t/97564/4, https://discuss.pytorch.org/t/121588/21). 
* <font color='brown'>(**#**)</font> The `num_workers` should be set to the lowest number which feeds the GPU fast enough.  
  The idea is to preserve as much of the CPU resources as possible for other tasks.
* <font color='brown'>(**#**)</font> On Windows, keep the `persistent_workers` parameter `True` (_Windows_ is slower at starting worker processes).
* <font color='brown'>(**#**)</font> The `DataLoader` is an iterable, hence it can be looped on directly.
* <font color='brown'>(**#**)</font> In order to fetch a single batch with `next()`, it has to be wrapped with `iter()` to create an iterator.
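
The points above can be seen on a hypothetical toy `TensorDataset`. This sketch uses `num_workers = 0` (loading in the main process) to stay portable, and shows the effect of `drop_last = True`: 1,000 samples with a batch size of 256 give 3 full batches, and the last 232 samples are dropped for that epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dsToy = TensorDataset(torch.randn(1000, 8), torch.randn(1000))
# `num_workers = 0`: no worker processes, avoids the Windows issues above
dlToy = DataLoader(dsToy, batch_size = 256, shuffle = True, drop_last = True, num_workers = 0)

print(len(dlToy))                  #<! 3 (1000 // 256) due to `drop_last = True`
for mXB, vYB in dlToy:             #<! Looping directly, no `iter()` needed
    print(mXB.shape, vYB.shape)    #<! torch.Size([256, 8]) torch.Size([256])

# A single batch: create an iterator and call `next()` on it
mXB, vYB = next(iter(dlToy))
```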

In [None]:
# Data Loader

# The `drop_last` parameter has a default of False in PyTorch
dlTrain = torch.utils.data.DataLoader(dsTrain, shuffle = True, batch_size = 1 * batchSize, num_workers = numWork, drop_last = True, persistent_workers = True)
dlVal   = torch.utils.data.DataLoader(dsVal, shuffle = False, batch_size = 2 * batchSize, num_workers = numWork, persistent_workers = True)


* <font color='red'>(**?**)</font> Why is the batch size twice as big for the validation dataset?

In [None]:
# Iterate on the Loader
# The first batch.
tX, vY = next(iter(dlTrain)) #<! PyTorch Tensors

print(f'The batch features dimensions: {tX.shape}')
print(f'The batch labels dimensions: {vY.shape}')

## Define the Model

The model is defined as a sequential model.

The model is given by:

```python
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Sequential                               --                        --
├─Identity: 1-1                          [128, 8]                  --
├─Linear: 1-2                            [128, 900]                8,100
├─LeakyReLU: 1-3                         [128, 900]                --
├─Dropout: 1-4                           [128, 900]                --
├─Linear: 1-5                            [128, 700]                630,700
├─LeakyReLU: 1-6                         [128, 700]                --
├─Dropout: 1-7                           [128, 700]                --
├─Linear: 1-8                            [128, 500]                350,500
├─LeakyReLU: 1-9                         [128, 500]                --
├─Dropout: 1-10                          [128, 500]                --
├─Linear: 1-11                           [128, 300]                150,300
├─LeakyReLU: 1-12                        [128, 300]                --
├─Dropout: 1-13                          [128, 300]                --
├─Linear: 1-14                           [128, 100]                30,100
├─LeakyReLU: 1-15                        [128, 100]                --
├─Dropout: 1-16                          [128, 100]                --
├─Linear: 1-17                           [128, 1]                  101
├─Flatten: 1-18                          [128]                     --
==========================================================================================
Total params: 1,169,801
Trainable params: 1,169,801
```

* <font color='brown'>(**#**)</font> The batch size in the summary above (128) is arbitrary; the model accepts any batch size.
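
The parameter counts in the summary follow from the layer widths: a `Linear(nIn, nOut)` layer holds an `(nOut x nIn)` weight matrix plus `nOut` biases. A short check, with the widths read off the summary table:

```python
# Layer widths from the summary: 8 input features -> ... -> 1 output
lWidths = [8, 900, 700, 500, 300, 100, 1]

# Parameters of each `Linear` layer: weights + biases
lParams = [nIn * nOut + nOut for nIn, nOut in zip(lWidths[:-1], lWidths[1:])]

print(lParams)       #<! [8100, 630700, 350500, 150300, 30100, 101]
print(sum(lParams))  #<! 1169801
```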

In [None]:
# Model
# Defining a sequential model.

numFeatures = mX.shape[1]

def GetModel( dropP: float, α: float = 0.1 ) -> nn.Module:
    oModel = nn.Sequential(
        nn.Identity(), #<! Allows seeing the dimensions of the input
        #===========================Fill This===========================#
        # 1. Define the model layers.
        # !! Use `dropP` for the Dropout layers.
        ?????
        #===============================================================#
        nn.Flatten(start_dim = 0)
        )
    
    return oModel



* <font color='brown'>(**#**)</font> Dropout and _LeakyReLU_ / _ReLU_ are mathematically commutative.
* <font color='red'>(**?**)</font> Why is there no Dropout on the last layer?
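
The commutativity holds since Dropout multiplies each element by either 0 or the positive scale $\frac{1}{1 - p}$, and LeakyReLU is positively homogeneous: $f \left( c x \right) = c f \left( x \right)$ for $c \geq 0$. A numerical sketch, assuming a single fixed Dropout mask shared by both orders:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
p, α = 0.1, 0.1
vX   = torch.randn(10_000)

# A Dropout mask: 0 with probability `p`, otherwise the scale 1 / (1 - p)
vMask = (torch.rand_like(vX) >= p).float() / (1.0 - p)

vA = F.leaky_relu(vX * vMask, negative_slope = α)  #<! Dropout, then LeakyReLU
vB = F.leaky_relu(vX, negative_slope = α) * vMask  #<! LeakyReLU, then Dropout

print(torch.allclose(vA, vB))  #<! True
```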

In [None]:
# Model Summary

oModel = GetModel(dropP, α)
torchinfo.summary(oModel, tX.shape, device = 'cpu')
# torchinfo.summary(oModel, tX.shape, col_names = ['input_size', 'output_size', 'num_params'], device = 'cpu') #<! See input, hence Identity is redundant

* <font color='brown'>(**#**)</font> Note that the `p` parameter of PyTorch's `nn.Dropout` is the probability of zeroing out an element (not the probability of keeping it).
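
A small sketch of `nn.Dropout` behavior on an all ones input (the value `p = 0.25` is arbitrary). In training mode, about a fraction `p` of the elements is zeroed and the surviving ones are scaled by $\frac{1}{1 - p}$ so the expected value is preserved. In evaluation mode, the layer is the identity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
oDrop = nn.Dropout(p = 0.25)
vX    = torch.ones(100_000)

oDrop.train()                          #<! Training mode: random zeroing + scaling
vZ         = oDrop(vX)
fracZero   = (vZ == 0).float().mean().item()
vSurvivors = vZ[vZ != 0].unique()
print(fracZero)                        #<! Close to 0.25, the fraction zeroed is `p`
print(vSurvivors)                      #<! A single value, 1 / (1 - 0.25)

oDrop.eval()                           #<! Inference mode: identity
vZEval = oDrop(vX)
print(torch.equal(vZEval, vX))         #<! True
```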

In [None]:
# Initialization Function

def InitWeights( oLayer: nn.Module ) -> None:
    if isinstance(oLayer, nn.Linear):
        nn.init.kaiming_normal_(oLayer.weight.data) #<! Only on weights, not biases

In [None]:
# Apply Manual Initialization

oModel.apply(InitWeights) #<! Applies the function on all layers

In [None]:
# Run Model
# Apply a test run.

mXX   = torch.randn(batchSize, numFeatures)
vYHat = oModel(mXX)

print(f'The input dimensions: {mXX.shape}')
print(f'The output dimensions: {vYHat.shape}')

## Training Loop


### TensorBoard

[TensorBoard](https://www.tensorflow.org/tensorboard) is a tool to analyze runs of models.  
The concept is to log data to the disk during the run and display it using a local web server.

Using _TensorBoard_ is based on:

 * Defining a `SummaryWriter` object which documents a session.
 * Using the `SummaryWriter`'s methods to add data: scalars, images, etc.

</br>

* <font color='brown'>(**#**)</font> While [TensorBoard](https://www.tensorflow.org/tensorboard) is common in the DL world, it might be used for any ML analysis.
* <font color='brown'>(**#**)</font> See [`torch.utils.tensorboard.writer.SummaryWriter`](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard) documentation.
* <font color='brown'>(**#**)</font> Alternatives: [ClearML](https://clear.ml), [Weights & Biases](https://wandb.ai), [ML Flow](https://mlflow.org).
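
A minimal sketch of the two steps above, assuming the `tensorboard` package is installed. It logs a hypothetical decaying "loss" curve into a temporary folder. Each `add_scalar(tag, scalar_value, global_step)` call adds one point to the curve named `tag`, and `close()` flushes the events file to the disk.

```python
import math
import os
import tempfile
from torch.utils.tensorboard import SummaryWriter

logDir    = tempfile.mkdtemp()               #<! Throwaway folder for this sketch
oTBWriter = SummaryWriter(log_dir = logDir)  #<! One writer per session / run

for ii in range(10):
    # One point per step on the curve named 'Toy/Loss'
    oTBWriter.add_scalar('Toy/Loss', math.exp(-0.5 * ii), ii)
oTBWriter.close()                            #<! Flush the events to the disk

print(os.listdir(logDir))                    #<! An `events.out.tfevents.*` file
```

The folder can then be viewed with `tensorboard --logdir <logDir>`.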

In [None]:
# Define the Writer
oTBWriter = SummaryWriter(log_dir = os.path.join(TENSOR_BOARD_BASE, 'Test'))
oTBWriter.add_graph(oModel, mXX) #<! Graph of the Model
oTBWriter.close()

* <font color='brown'>(**#**)</font> Alternatives to visualize a net: [StackOverflow - Visualize a PyTorch Model](https://stackoverflow.com/questions/52468956), [Tools to Design or Visualize Architecture of Neural Network](https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network).

### PyTorch Epoch

In [None]:
# Run Epoch PyTorch

def RunEpoch( oModel: nn.Module, dlData: DataLoader, hL: Callable, hS: Callable, oOpt: Optional[Optimizer] = None, opMode: NNMode = NNMode.TRAIN ) -> Tuple[float, float]:
    """
    Runs a single Epoch (Train / Test) of a model.  
    Input:
        oModel      - PyTorch `nn.Module` object.
        dlData      - PyTorch `Dataloader` object.
        hL          - Callable for the Loss function.
        hS          - Callable for the Score function.
        oOpt        - PyTorch `Optimizer` object.
        opMode      - An `NNMode` to set the mode of operation.
    Output:
        valLoss     - Scalar of the loss.
        valScore    - Scalar of the score.
    Remarks:
      - The `dlData` object returns a Tuple of (mX, vY) per batch.
      - The `hL` function should accept `mZ` (Output of the NN) and `vY` (Reference target).
        It should return a scalar tensor `valLoss`; the gradient is computed by autograd.
      - The `hS` function should accept `mZ` (Output of the NN) and `vY` (Reference target).
        It should return a scalar tensor `valScore` of the score.
      - The optimizer is required for training mode.
    """
    
    epochLoss   = 0.0
    epochScore  = 0.0
    numSamples  = 0
    numBatches  = len(dlData)

    runDevice = next(oModel.parameters()).device #<! CPU \ GPU

    if opMode == NNMode.TRAIN:
        oModel.train(True) #<! Equivalent of `oModel.train()`
    elif opMode == NNMode.INFERENCE:
        oModel.eval() #<! Equivalent of `oModel.train(False)`
    else:
        raise ValueError(f'The `opMode` value {opMode} is not supported!')
    
    for ii, (mX, vY) in enumerate(dlData):
        # Move Data to Model's device
        mX = mX.to(runDevice) #<! No copy if already on the device
        vY = vY.to(runDevice) #<! No copy if already on the device

        batchSize = mX.shape[0]
        
        if opMode == NNMode.TRAIN:
            # Forward
            mZ      = oModel(mX) #<! Model output
            valLoss = hL(mZ, vY) #<! Loss
            
            # Backward
            oOpt.zero_grad()    #<! Set gradients to zeros
            valLoss.backward()  #<! Backward
            oOpt.step()         #<! Update parameters
        else: #<! Value of `opMode` was already validated
            with torch.no_grad():
                # No computational graph is built
                mZ      = oModel(mX) #<! Model output
                valLoss = hL(mZ, vY) #<! Loss

        with torch.no_grad():
            # Score
            valScore = hS(mZ, vY)
            # Normalize so each sample has the same weight
            epochLoss  += batchSize * valLoss.item()
            epochScore += batchSize * valScore.item()
            numSamples += batchSize

        print(f'\r{"Train" if opMode == NNMode.TRAIN else "Val"} - Iteration: {ii:3d} ({numBatches}): loss = {valLoss:.6f}', end = '')
    
    print('', end = '\r')
            
    return epochLoss / numSamples, epochScore / numSamples

* <font color='brown'>(**#**)</font> One could use `with torch.inference_mode():` instead of `torch.no_grad()` in inference mode.  
  See [Inference in PyTorch: What Do the Wrappers Mean](https://muellerzr.github.io/blog/PyTorchInference.html).
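
A small sketch comparing the two context managers on a toy `nn.Linear` layer. Both skip recording the computational graph; `torch.inference_mode()` additionally marks its outputs as inference tensors, which cannot later be used in operations recorded by autograd.

```python
import torch
import torch.nn as nn

oLayer = nn.Linear(4, 2)
mX     = torch.randn(3, 4)

with torch.no_grad():
    mZ1 = oLayer(mX)    #<! No graph recorded

with torch.inference_mode():
    mZ2 = oLayer(mX)    #<! No graph recorded, output is an inference tensor

print(mZ1.requires_grad, mZ2.requires_grad)    #<! False False
print(mZ1.is_inference(), mZ2.is_inference())  #<! False True
print(torch.allclose(mZ1, mZ2))                #<! True
```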

### PyTorch Training Loop

In [None]:
# Training Model Loop Function

def TrainModel( oModel: nn.Module, dlTrain: DataLoader, dlVal: DataLoader, oOpt: Optimizer, numEpoch: int, hL: Callable, hS: Callable , oTBWriter: Optional[SummaryWriter] = None) -> Tuple[nn.Module, List, List, List, List]:

    lTrainLoss  = []
    lTrainScore = []
    lValLoss    = []
    lValScore   = []

    #!!!
    # R² may be negative, hence the very low initial value
    bestScore = -1e9 #<! Assuming higher is better
    #!!!

    for ii in range(numEpoch):
        startTime           = time.time()
        trainLoss, trainScr = RunEpoch(oModel, dlTrain, hL, hS, oOpt, opMode = NNMode.TRAIN) #<! Train
        valLoss,   valScr   = RunEpoch(oModel, dlVal, hL, hS, oOpt, opMode = NNMode.INFERENCE)    #<! Score Validation
        epochTime           = time.time() - startTime

        # Aggregate Results
        lTrainLoss.append(trainLoss)
        lTrainScore.append(trainScr)
        lValLoss.append(valLoss)
        lValScore.append(valScr)

        #!!!
        if oTBWriter is not None:
            oTBWriter.add_scalar('Train Loss', trainLoss, ii)
            oTBWriter.add_scalar('Train Score', trainScr, ii)
            oTBWriter.add_scalar('Validation Loss', valLoss, ii)
            oTBWriter.add_scalar('Validation Score', valScr, ii)
        #!!!
        
        # Display (Babysitting)
        print('Epoch '              f'{(ii + 1):4d} / ' f'{numEpoch}:', end = '')
        print(' | Train Loss: '     f'{trainLoss          :6.3f}', end = '')
        print(' | Val Loss: '       f'{valLoss            :6.3f}', end = '')
        print(' | Train Score: '    f'{trainScr           :6.3f}', end = '')
        print(' | Val Score: '      f'{valScr             :6.3f}', end = '')
        print(' | Epoch Time: '     f'{epochTime          :5.2f}', end = '')

        # Save best model ("Early Stopping")
        if valScr > bestScore:
            bestScore = valScr
            print(' | <-- Checkpoint!', end = '')
            try:
                dCheckpoint = {'Model' : oModel.state_dict(), 'Optimizer' : oOpt.state_dict()}
                torch.save(dCheckpoint, 'BestModel.pt')
            except Exception as oErr:
                print(f' | Failed saving the checkpoint: {oErr}', end = '')
        print(' |')
    
    # Load best model ("Early Stopping")
    dCheckpoint = torch.load('BestModel.pt')
    oModel.load_state_dict(dCheckpoint['Model'])

    return oModel, lTrainLoss, lTrainScore, lValLoss, lValScore


* <font color='red'>(**?**)</font> Why is the state of the optimizer saved as well?

## Train the Model

In [None]:
# Check GPU Availability

runDevice = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #<! The 1st CUDA device

In [None]:
# Set the Loss & Score

hL = nn.MSELoss()
hS = R2Score(num_outputs = 1)
hS = hS.to(runDevice)

In [None]:
# Train the Model

for ii, λ in enumerate(lλ):
    # Hyper Parameter Loop
    oTBWriter = SummaryWriter(log_dir = os.path.join(TENSOR_BOARD_BASE, f'Cali{ii:03d}'))
    # oRunModel = GetModel(dropP, α)
    oRunModel = copy.deepcopy(oModel) #<! All models with the same initialization
    oRunModel = oRunModel.to(runDevice) #<! Transfer model to device
    oOpt = torch.optim.AdamW(oRunModel.parameters(), lr = 1e-4, betas = (0.9, 0.99), weight_decay = λ) #<! Define optimizer
    oRunModel, lTrainLoss, lTrainScore, lValLoss, lValScore = TrainModel(oRunModel, dlTrain, dlVal, oOpt, nEpochs, hL, hS, oTBWriter)
    oTBWriter.close()

* <font color='red'>(**?**)</font> If all `λ` were the same, would all `oRunModel` give the same output? Think about the Dropout layer.
* <font color='green'>(**@**)</font> Optimize model / hyper parameters to get ${R}^{2} \approx 0.82$.
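
The grid above sweeps the `weight_decay` parameter of `AdamW`, which applies decoupled weight decay: the weights are shrunk directly by the factor $1 - \eta \lambda$ (with $\eta$ the learning rate) instead of adding an L2 term to the loss. A sketch on a toy parameter, assuming a zero gradient so the Adam part of the update vanishes and only the decay remains (values of `lr` and `λ` are arbitrary):

```python
import torch

lr, λ = 1e-2, 0.1
vW    = torch.nn.Parameter(torch.ones(3))
oOpt  = torch.optim.AdamW([vW], lr = lr, weight_decay = λ)

# Zero gradient: the adaptive moment update is 0, only the decay acts
vW.grad = torch.zeros_like(vW)
oOpt.step()

print(vW.detach())  #<! Each entry is 1 - lr * λ = 0.999
```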

### TensorBoard Results

 1. Open _Command Line_ (`cmd` on Windows).
 2. Change the directory to the notebook folder.
 3. Run `tensorboard --logdir=TB`.
 4. Open the browser at the given address.