[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io)

# AI Program

## Machine Learning - Deep Learning - PyTorch Schedulers

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 11/05/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0089DeepLearningPyTorchSchedulers.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning

# Deep Learning
import torch
import torch.nn            as nn
import torch.nn.functional as F
from torch.optim.optimizer import Optimizer
from torch.optim.lr_scheduler import LRScheduler
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
import torchinfo
from torchmetrics.classification import MulticlassAccuracy
import torchvision

# Miscellaneous
import copy
import math
import os
from platform import python_version
import random
import time

# Typing
from typing import Callable, Dict, Generator, List, Optional, Self, Set, Tuple, Union

# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import HTML, Image
from IPython.display import display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout, SelectionSlider

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

# Improve performance by benchmarking
torch.backends.cudnn.benchmark = True

# Reproducibility (Per PyTorch Version on the same device)
# torch.manual_seed(seedNum)
# torch.backends.cudnn.deterministic = True
# torch.backends.cudnn.benchmark     = False #<! Makes things slower


In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

D_CLASSES_CIFAR_10  = {0: 'Airplane', 1: 'Automobile', 2: 'Bird', 3: 'Cat', 4: 'Deer', 5: 'Dog', 6: 'Frog', 7: 'Horse', 8: 'Ship', 9: 'Truck'}
L_CLASSES_CIFAR_10  = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']
T_IMG_SIZE_CIFAR_10 = (32, 32, 3)

DATA_FOLDER_PATH    = 'Data'
TENSOR_BOARD_BASE   = 'TB'


In [None]:
# Download Auxiliary Modules for Google Colab
if runInGoogleColab:
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataManipulation.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataVisualization.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DeepLearningPyTorch.py

In [None]:
# Courses Packages

from DataVisualization import PlotLabelsHistogram, PlotMnistImages
from DeepLearningPyTorch import NNMode
from DeepLearningPyTorch import InitWeightsKaiNorm, TrainModel


In [None]:
# General Auxiliary Functions


## PyTorch Schedulers

PyTorch _Schedulers_ are objects which alter the learning rate according to an event: An update of the iteration index, an update of the loss function value, etc.  
The scheduling of the _learning rate_ can assist with better convergence, both in speed and in "quality" (Wide basin).

One could implement schedulers manually as part of the training loop, yet PyTorch offers some built in recipes which are easier to use.
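
As a minimal sketch of the manual approach, the learning rate can be changed by writing into the optimizer's `param_groups`. The toy model, the decay factor and the names `oModelDemo`, `oOptDemo` and `decayFactor` are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

oModelDemo = nn.Linear(4, 2) #<! Hypothetical toy model
oOptDemo   = torch.optim.SGD(oModelDemo.parameters(), lr = 0.1)

decayFactor = 0.5 #<! Assumed decay factor (Illustration only)
lLearnRate  = []

for ii in range(3):
    # A real loop would run the forward / backward passes and `oOptDemo.step()` here
    lLearnRate.append(oOptDemo.param_groups[0]['lr'])
    for dParamGroup in oOptDemo.param_groups:
        dParamGroup['lr'] *= decayFactor #<! Manual update of the learning rate

print(lLearnRate)
```

The built in schedulers wrap this kind of bookkeeping and also keep a state which can be saved in a checkpoint.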

The notebook presents:

 * The concept of _Schedulers_.
 * A comparison of the results of the training loop with different schedulers.


</br>

* <font color='brown'>(**#**)</font> [YouTube - Sebastian Raschka - Learning Rate Schedulers in PyTorch](https://www.youtube.com/watch?v=tB1rz4L93JA).
* <font color='brown'>(**#**)</font> [PyTorch Training Performance Guide - LR Schedulers, Adaptive Optimizers](https://residentmario.github.io/pytorch-training-performance-guide/lr-sched-and-optim.html).
* <font color='brown'>(**#**)</font> [Guide to Pytorch Learning Rate Scheduling](https://www.kaggle.com/code/isbhargav/guide-to-pytorch-learning-rate-scheduling).
* <font color='brown'>(**#**)</font> [Distill - Why Momentum Really Works](https://distill.pub/2017/momentum).

In [None]:
# Parameters

# Data

# Model
dropP = 0.5 #<! Dropout Layer

# Training
batchSize   = 256
numWork     = 2 #<! Number of workers
nEpochs     = 10

# Visualization
numImg = 3


## Generate / Load Data

Load the [CIFAR 10 Data Set](https://en.wikipedia.org/wiki/CIFAR-10).  
It is composed of 60,000 RGB images of size `32x32` with 10 classes uniformly spread.

* <font color='brown'>(**#**)</font> The dataset is retrieved using [Torch Vision](https://pytorch.org/vision/stable/index.html)'s built in datasets.  


In [None]:
# Load Data

# PyTorch 
dsTrain = torchvision.datasets.CIFAR10(root = DATA_FOLDER_PATH, train = True,  download = True, transform = torchvision.transforms.ToTensor())
dsTest  = torchvision.datasets.CIFAR10(root = DATA_FOLDER_PATH, train = False, download = True, transform = torchvision.transforms.ToTensor())
lClass  = dsTrain.classes


print(f'The training data set data shape: {dsTrain.data.shape}')
print(f'The test data set data shape: {dsTest.data.shape}')
print(f'The unique values of the labels: {np.unique(lClass)}')

* <font color='brown'>(**#**)</font> The dataset is indexable (Subscriptable). It returns a tuple of the features and the label.
* <font color='brown'>(**#**)</font> While data is arranged as `H x W x C` the transformer, when accessing the data, will convert it into `C x H x W`. 
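
The layout conversion can be sketched in NumPy on a synthetic image. This is an illustration of the `H x W x C` to `C x H x W` reordering and of the scaling into `[0, 1]`, not the actual `ToTensor` implementation. The names `mImgHwc` and `mImgChw` are hypothetical.

```python
import numpy as np

mImgHwc = np.zeros((32, 32, 3), dtype = np.uint8) #<! Synthetic image (H x W x C), not taken from CIFAR 10
mImgHwc[:, :, 0] = 255                            #<! Fill the first channel

# Move the channel axis to the front and scale into [0, 1]
mImgChw = np.transpose(mImgHwc, (2, 0, 1)).astype(np.float32) / 255.0

print(mImgHwc.shape, mImgChw.shape)
```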

In [None]:
# Element of the Data Set

mX, valY = dsTrain[0]

print(f'The features shape: {mX.shape}')
print(f'The label value: {valY}')

### Plot the Data

In [None]:
# Extract Data

tX = dsTrain.data #<! NumPy Tensor (NDarray)
mX = np.reshape(tX, (tX.shape[0], -1))
vY = dsTrain.targets #<! Python list of labels


In [None]:
# Plot the Data

hF = PlotMnistImages(mX, vY, numImg, tuImgSize = T_IMG_SIZE_CIFAR_10)

In [None]:
# Histogram of Labels

hA = PlotLabelsHistogram(vY, lClass = L_CLASSES_CIFAR_10)
plt.show()

## Pre Process Data

This section normalizes the data to have zero mean and unit variance per **channel**.  
It is required to calculate:

 * The average pixel value per channel.
 * The standard deviation per channel.

</br>

* <font color='brown'>(**#**)</font> The values are calculated on the train set and applied to both sets.
* <font color='brown'>(**#**)</font> The values will be used to pre process the images on loading by the `transformer`.
* <font color='brown'>(**#**)</font> There are packages which specialize in transforms: [`Kornia`](https://github.com/kornia/kornia), [`Albumentations`](https://github.com/albumentations-team/albumentations).  
  They are commonly used for _Data Augmentation_ at scale.
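
A small sketch of per channel standardization on synthetic data (Random values, not CIFAR 10). It assumes the `N x H x W x C` layout and relies on broadcasting over the last axis. The names `tData`, `vMeanDemo` and `vStdDemo` are hypothetical.

```python
import numpy as np

oRng  = np.random.default_rng(0)
tData = oRng.integers(0, 256, size = (100, 32, 32, 3)).astype(np.float64) / 255.0 #<! Synthetic stand in (N x H x W x C)

vMeanDemo = np.mean(tData, axis = (0, 1, 2)) #<! Per channel mean
vStdDemo  = np.std(tData, axis = (0, 1, 2))  #<! Per channel standard deviation

tDataNorm = (tData - vMeanDemo) / vStdDemo   #<! Broadcasting over the last (Channel) axis

print(np.mean(tDataNorm, axis = (0, 1, 2))) #<! Approximately 0 per channel
print(np.std(tDataNorm, axis = (0, 1, 2)))  #<! Approximately 1 per channel
```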

* <font color='red'>(**?**)</font> What do you expect the mean value to be?
* <font color='red'>(**?**)</font> What do you expect the standard deviation value to be?

In [None]:
# Calculate the Standardization Parameters
vMean = np.mean(dsTrain.data / 255.0, axis = (0, 1, 2))
vStd  = np.std(dsTrain.data / 255.0, axis = (0, 1, 2)) #<! Calculated on the train set, like the mean

print('µ =', vMean)
print('σ =', vStd)

In [None]:
# Update Transformer

oDataTrns = torchvision.transforms.Compose([           #<! Chaining transformations
    torchvision.transforms.ToTensor(),                 #<! Convert to Tensor (C x H x W), Normalizes into [0, 1] (https://pytorch.org/vision/main/generated/torchvision.transforms.ToTensor.html)
    torchvision.transforms.Normalize(vMean, vStd), #<! Normalizes the Data (https://pytorch.org/vision/main/generated/torchvision.transforms.Normalize.html)
    ])

# Update the DS transformer
dsTrain.transform = oDataTrns
dsTest.transform  = oDataTrns

In [None]:
# "Normalized" Image

mX, valY = dsTrain[5]

hF, hA = plt.subplots()
hImg = hA.imshow(np.transpose(mX, (1, 2, 0)))
hF.colorbar(hImg)
plt.show()

### Data Loaders

The dataloader is the functionality which loads the data into memory in batches.  
Its challenge is to bring data fast enough so the Hard Disk is not the training bottleneck.  
In order to achieve that, Multi Threading / Multi Process is used.

* <font color='brown'>(**#**)</font> Multi processing, set by the `num_workers` parameter, does not work well _out of the box_ on Windows.  
  See [Errors When Using `num_workers > 0` in `DataLoader`](https://discuss.pytorch.org/t/97564), [On Windows `DataLoader` with `num_workers > 0` Is Slow](https://github.com/pytorch/pytorch/issues/12831).  
  A way to overcome it is to define the training loop as a function in a different module (File) and import it (https://discuss.pytorch.org/t/97564/4, https://discuss.pytorch.org/t/121588/21). 
* <font color='brown'>(**#**)</font> The `num_workers` should be set to the lowest number which feeds the GPU fast enough.  
  The idea is to preserve as much of the CPU resources as possible for other tasks.
* <font color='brown'>(**#**)</font> On Windows set the `persistent_workers` parameter to `True` (_Windows_ is slower at creating processes / threads).
* <font color='brown'>(**#**)</font> The Dataloader is an iterable which can be looped on.
* <font color='brown'>(**#**)</font> In order to fetch a single batch using `next()` it has to be wrapped with `iter()`, which creates an iterator.
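
A small sketch of looping over a `DataLoader` and of fetching a single batch using `iter()` and `next()`. The data set is synthetic and the names `dsDemo`, `dlDemo` and `itDemo` are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dsDemo = TensorDataset(torch.randn(10, 3), torch.arange(10)) #<! Synthetic data set (10 samples)
dlDemo = DataLoader(dsDemo, batch_size = 4, shuffle = False)

# Looping: The loader can be used directly in a `for` loop
lBatchSize = [mXDemo.shape[0] for mXDemo, vYDemo in dlDemo]

# Single batch: `next()` requires an iterator, created by `iter()`
itDemo = iter(dlDemo)
mXFirst, vYFirst = next(itDemo)

print(lBatchSize)
print(vYFirst)
```

The last batch is smaller since 10 is not a multiple of the batch size and `drop_last` is left at its default.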

In [None]:
# Data Loader

dlTrain  = torch.utils.data.DataLoader(dsTrain, shuffle = True, batch_size = 1 * batchSize, num_workers = numWork, persistent_workers = True)
dlTest   = torch.utils.data.DataLoader(dsTest, shuffle = False, batch_size = 2 * batchSize, num_workers = numWork, persistent_workers = True)

* <font color='red'>(**?**)</font> Why is the size of the batch twice as big for the test dataset?

In [None]:
# Iterate on the Loader
# The first batch.
tX, vY = next(iter(dlTrain)) #<! PyTorch Tensors

print(f'The batch features dimensions: {tX.shape}')
print(f'The batch labels dimensions: {vY.shape}')

In [None]:
# Looping
for ii, (tX, vY) in zip(range(1), dlTest): #<! https://stackoverflow.com/questions/36106712
    print(f'The batch features dimensions: {tX.shape}')
    print(f'The batch labels dimensions: {vY.shape}')

## Define the Model

The model is defined as a sequential model.

In [None]:
# Model
# Defining a sequential model.

numFeatures = np.prod(tX.shape[1:])

oModel = nn.Sequential(
    nn.Identity(),
        
    nn.Conv2d(3,   32,  3, bias = False), nn.BatchNorm2d(32),                   nn.ReLU(),
    nn.Conv2d(32,  64,  3, bias = False), nn.BatchNorm2d(64),  nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(64,  128, 3, bias = False), nn.BatchNorm2d(128), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(128, 256, 3, bias = False), nn.BatchNorm2d(256),                  nn.ReLU(),
    nn.Conv2d(256, 256, 3, bias = False), nn.BatchNorm2d(256),                  nn.ReLU(),
    
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(256, len(lClass)),
)

torchinfo.summary(oModel, tX.shape, col_names = ['kernel_size', 'output_size', 'num_params'], device = 'cpu')

* <font color='red'>(**?**)</font> Why is `bias = False` used above?
* <font color='brown'>(**#**)</font> Using a number of channels which is a multiple of 8 accelerates the run time (In most cases).
* <font color='brown'>(**#**)</font> Pay attention to the model size and the RAM of the GPU. Rule of thumb, up to ~40%.

## Train the Model

This section trains the model using different schedulers:

 - Updates the training function.
 - Updates the _epoch_ function to log information at mini batch level.
 - Creates a class for a logger of TensorBoard.

In [None]:
# Run Device

runDevice = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #<! The 1st CUDA device

In [None]:
# Loss and Score Function

hL = nn.CrossEntropyLoss()
hS = MulticlassAccuracy(num_classes = len(lClass), average = 'micro')
hL = hL.to(runDevice) #<! Not required!
hS = hS.to(runDevice)

### Schedulers

![](https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/DeepLearningMethods/07_PyTorch2/Schedulers.PNG)

The motivation of the scheduling is to increase the chances of:

1. Moving fast towards a minimum.
2. Avoiding being stuck in a "bad" local minimum (Sharp, Narrow).
3. Finding a "good" local minimum (Deep, Wide).

There are a few common policies:

1. Linear  
   Linearly interpolate between a starting _learning rate_ and a final _learning rate_.  
   It can be used for constant _learning rate_ as well.
2. Exponential   
   Multiplies the _learning rate_ by a constant at each step.
3. Cosine Annealing  
   Similar to the Linear policy with smoother operation by using the fall of a cosine.
4. Cyclic  
   The _learning rate_ is a damped (Optionally) saw tooth function.  
   Going up occasionally has empirically proved to be effective in avoiding "bad" stationary points.
5. One Cycle  
   Goes up and down a single time asymmetrically.
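
The first three policies can be sketched in closed form using plain Python. The functions are written from the descriptions above (Linear with a starting factor of 1) and are not taken from the PyTorch source. The names `LinearLr`, `ExponentialLr` and `CosineLr` are hypothetical.

```python
import math

def LinearLr( baseLr: float, endFactor: float, ii: int, numIter: int ) -> float:
    # Linear interpolation between `baseLr` and `endFactor * baseLr`
    return baseLr * (1.0 + (endFactor - 1.0) * min(ii, numIter) / numIter)

def ExponentialLr( baseLr: float, gamma: float, ii: int ) -> float:
    # Multiplication by the constant `gamma` at each step
    return baseLr * (gamma ** ii)

def CosineLr( baseLr: float, minLr: float, ii: int, numIter: int ) -> float:
    # Half a period of a cosine, falling from `baseLr` to `minLr`
    return minLr + 0.5 * (baseLr - minLr) * (1.0 + math.cos(math.pi * ii / numIter))

numIterDemo = 100 #<! Assumed number of iterations (Illustration only)
for ii in (0, numIterDemo // 2, numIterDemo):
    print(ii, LinearLr(0.1, 0.01, ii, numIterDemo), ExponentialLr(0.1, 0.97, ii), CosineLr(0.1, 0.0, ii, numIterDemo))
```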


### Step Policy  

It used to be common to update the _learning rate_ at the end of an epoch.  
Yet as data sets have become large, it is commonly updated at the end of each mini batch since the number of epochs might be low.
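
A minimal sketch of stepping per mini batch: The number of scheduler steps becomes the number of epochs times the number of batches per epoch. The toy model and the values of `numEpochDemo` and `numBatchDemo` are assumptions for illustration, and the forward / backward passes are omitted.

```python
import torch
import torch.nn as nn

numEpochDemo = 2 #<! Assumed values (Illustration only)
numBatchDemo = 5
numIterDemo  = numEpochDemo * numBatchDemo #<! Number of scheduler steps when stepping per mini batch

oModelDemo = nn.Linear(4, 2) #<! Hypothetical toy model
oOptDemo   = torch.optim.SGD(oModelDemo.parameters(), lr = 0.1)
oSchDemo   = torch.optim.lr_scheduler.LinearLR(oOptDemo, start_factor = 1.0, end_factor = 0.1, total_iters = numIterDemo)

for ii in range(numEpochDemo):
    for jj in range(numBatchDemo):
        oOptDemo.step() #<! No gradients in this sketch, kept to preserve the order of the `step()` calls
        oSchDemo.step() #<! Step per mini batch

print(oSchDemo.get_last_lr()[0]) #<! Approximately `0.1 * 0.1`
```

Stepping per epoch would instead call `oSchDemo.step()` once in the outer loop with `total_iters = numEpochDemo`.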


* <font color='brown'>(**#**)</font> The implementation in this notebook applies the step per mini batch iteration.  
  Yet the function in `DeepLearningPyTorch.py` is at the _epoch_ level.
* <font color='brown'>(**#**)</font> More schedulers are available at the [`torch.optim`](https://pytorch.org/docs/stable/optim.html) page.
* <font color='brown'>(**#**)</font> PyTorch has the flexibility of assigning different learning rate per module.
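
A sketch of assigning a different learning rate per module using the optimizer's parameter groups. The toy model and the specific values are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

oModelDemo = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)) #<! Hypothetical toy model

oOptDemo = torch.optim.SGD([
    {'params': oModelDemo[0].parameters(), 'lr': 1e-3}, #<! Lower learning rate for the first layer
    {'params': oModelDemo[2].parameters()},             #<! Uses the default `lr` below
], lr = 1e-2)

oSchDemo = torch.optim.lr_scheduler.ExponentialLR(oOptDemo, gamma = 0.5)
oOptDemo.step() #<! No gradients in this sketch, kept to preserve the order of the `step()` calls
oSchDemo.step()

print(oSchDemo.get_last_lr()) #<! One value per parameter group
```

The scheduler scales each group relative to its own learning rate, hence the ratio between the groups is kept.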

In [None]:
# Schedulers (Demo)

nIter           = 50_000
baseLearnRate   = 0.1

lScheduler = [
    ('Constant', torch.optim.lr_scheduler.LinearLR, {'start_factor': 1.0}),
    ('Linear', torch.optim.lr_scheduler.LinearLR, {'start_factor': 1.0, 'end_factor': 0.01, 'total_iters': nIter}),
    ('Exponential', torch.optim.lr_scheduler.ExponentialLR, {'gamma': 0.99985}),
    ('Cosine', torch.optim.lr_scheduler.CosineAnnealingLR, {'T_max': nIter} ),
    ('Cyclic', torch.optim.lr_scheduler.CyclicLR, {'base_lr': 1e-6, 'max_lr': baseLearnRate, 'step_size_up': nIter // 6, 'step_size_down' : nIter // 6, 'mode':'triangular2', 'cycle_momentum': False}),
    ('OneCycle', torch.optim.lr_scheduler.OneCycleLR, {'max_lr': baseLearnRate, 'total_steps': nIter}),
]

In [None]:
# Draw the Step Size
# Scheduler require an optimizer.
# Optimizer requires parameters.

numSched    = len(lScheduler)
mStepSize   = np.full(shape = (numSched, nIter + 1), fill_value = np.nan)

for ii, (_, SchedCls, dParams) in enumerate(lScheduler):
    # ii: The index of the scheduler
    oModelTmp = copy.deepcopy(oModel) #<! Dummy model
    oOpt = torch.optim.SGD(oModelTmp.parameters(), lr = baseLearnRate) #<! Define optimizer
    oSched = SchedCls(oOpt, **dParams)
    for jj in range(nIter):
        mStepSize[ii, jj] = oSched.get_last_lr()[0]
        oSched.step()
    jj += 1
    mStepSize[ii, jj] = oSched.get_last_lr()[0] #<! Last iteration


In [None]:
# Plot the Learning Rate

hF, hA = plt.subplots(figsize = (10, 5))

for ii, (schedStr, *_) in enumerate(lScheduler):
    hA.plot(mStepSize[ii], label = schedStr)
hA.legend()
hA.set_title(f'Learning Rate Schedulers, Base Learning Rate: {baseLearnRate: 0.2f}')
hA.set_xlabel('Iteration')
hA.set_ylabel('Learning Rate')

plt.show();

* <font color='brown'>(**#**)</font> Schedulers are set per iteration (Batch) or epoch.

In [None]:
# Logger 
# Wrapper of TensorBoard's `SummaryWriter` with index for iteration and epoch.

class TBLogger():
    def __init__( self, logDir: Optional[str] = None ) -> None:

        self.oTBWriter  = SummaryWriter(log_dir = logDir)
        self.iiEpcoh    = 0
        self.iiItr      = 0
        
        pass

    def close( self ) -> None:

        self.oTBWriter.close()

In [None]:
# Training Epoch
def RunEpoch( oModel: nn.Module, dlData: DataLoader, hL: Callable, hS: Callable, oOpt: Optional[Optimizer] = None, oSch: Optional[LRScheduler] = None, opMode: NNMode = NNMode.TRAIN, oTBLogger: Optional[TBLogger] = None ) -> Tuple[float, float, float]:
    """
    Runs a single Epoch (Train / Test) of a model.  
    Input:
        oModel      - PyTorch `nn.Module` object.
        dlData      - PyTorch `Dataloader` object.
        hL          - Callable for the Loss function.
        hS          - Callable for the Score function.
        oOpt        - PyTorch `Optimizer` object.
        oSch        - PyTorch `Scheduler` (`LRScheduler`) object.
        opMode      - An `NNMode` to set the mode of operation.
        oTBLogger   - An `TBLogger` object.
    Output:
        valLoss     - Scalar of the loss.
        valScore    - Scalar of the score.
        learnRate   - Scalar of the average learning rate over the epoch.
    Remarks:
      - The `dlData` object returns a Tuple of (mX, vY) per batch.
      - The `hL` function should accept the `mZ` (Output of the NN) and `vY` (Reference target).  
        It should return a scalar tensor `valLoss` of the loss (The gradient is calculated by `backward()`).
      - The `hS` function should accept the `mZ` (Output of the NN) and `vY` (Reference target).  
        It should return a scalar `valScore` of the score.
      - The optimizer / scheduler are required for training mode.
    """
    
    epochLoss   = 0.0
    epochScore  = 0.0
    numSamples  = 0
    #!!!
    epochLr     = 0.0
    #!!!
    numBatches = len(dlData)

    runDevice = next(oModel.parameters()).device #<! CPU \ GPU

    if opMode == NNMode.TRAIN:
        oModel.train(True) #<! Equivalent of `oModel.train()`
    elif opMode == NNMode.INFERENCE:
        oModel.eval() #<! Equivalent of `oModel.train(False)`
    else:
        raise ValueError(f'The `opMode` value {opMode} is not supported!')
    
    for ii, (mX, vY) in enumerate(dlData):
        # Move Data to Model's device
        mX = mX.to(runDevice) #<! Lazy
        vY = vY.to(runDevice) #<! Lazy


        batchSize = mX.shape[0]
        
        if opMode == NNMode.TRAIN:
            # Forward
            mZ      = oModel(mX) #<! Model output
            valLoss = hL(mZ, vY) #<! Loss
            
            # Backward
            oOpt.zero_grad()    #<! Set gradients to zeros
            valLoss.backward()  #<! Backward
            oOpt.step()         #<! Update parameters

            #!!!
            learnRate = oSch.get_last_lr()[0]
            oSch.step() #<! Update learning rate
            #!!!
            
            # The model is kept in training mode for the next batches (The score below uses the already calculated `mZ`)

        else: #<! Value of `opMode` was already validated
            with torch.no_grad():
                # No computational graph
                mZ      = oModel(mX) #<! Model output
                valLoss = hL(mZ, vY) #<! Loss
                
                learnRate = 0.0

        with torch.no_grad():
            # Score
            valScore = hS(mZ, vY)
            # Normalize so each sample has the same weight
            epochLoss  += batchSize * valLoss.item()
            epochScore += batchSize * valScore.item()
            epochLr    += batchSize * learnRate
            numSamples += batchSize

            #!!!
            if (oTBLogger is not None) and (opMode == NNMode.TRAIN):
                # Logging at Iteration level for training
                oTBLogger.iiItr += 1
                oTBLogger.oTBWriter.add_scalar('Train Loss', valLoss.item(), oTBLogger.iiItr)
                oTBLogger.oTBWriter.add_scalar('Train Score', valScore.item(), oTBLogger.iiItr)
                oTBLogger.oTBWriter.add_scalar('Learning Rate', learnRate, oTBLogger.iiItr)
            #!!!

        print(f'\r{"Train" if opMode == NNMode.TRAIN else "Val"} - Iteration: {ii:3d} ({numBatches}): loss = {valLoss:.6f}', end = '')
    
    print('', end = '\r')
            
    return epochLoss / numSamples, epochScore / numSamples, epochLr / numSamples

In [None]:
# Training Loop
def TrainModel( oModel: nn.Module, dlTrain: DataLoader, dlVal: DataLoader, oOpt: Optimizer, oSch: LRScheduler, numEpoch: int, hL: Callable, hS: Callable, oTBLogger: Optional[TBLogger] = None ) -> Tuple[nn.Module, List, List, List, List, List]:

    lTrainLoss  = []
    lTrainScore = []
    lValLoss    = []
    lValScore   = []
    #!!!
    lLearnRate  = []
    #!!!

    # Support R2
    bestScore = -1e9 #<! Assuming higher is better

    for ii in range(numEpoch):
        startTime                       = time.time()
        #!!!
        trainLoss, trainScr, trainLr    = RunEpoch(oModel, dlTrain, hL, hS, oOpt, oSch, opMode = NNMode.TRAIN, oTBLogger = oTBLogger) #<! Train
        #!!!
        valLoss,   valScr, _            = RunEpoch(oModel, dlVal, hL, hS, opMode = NNMode.INFERENCE)    #<! Score Validation
        epochTime                       = time.time() - startTime

        # Aggregate Results
        lTrainLoss.append(trainLoss)
        lTrainScore.append(trainScr)
        lValLoss.append(valLoss)
        lValScore.append(valScr)
        #!!!
        lLearnRate.append(trainLr)
        #!!!

        if oTBLogger is not None:
            #!!!
            oTBLogger.iiEpcoh += 1
            oTBLogger.oTBWriter.add_scalars('Loss (Epoch)', {'Train': trainLoss, 'Validation': valLoss}, ii)
            oTBLogger.oTBWriter.add_scalars('Score (Epoch)', {'Train': trainScr, 'Validation': valScr}, ii)
            oTBLogger.oTBWriter.add_scalar('Learning Rate (Epoch)', trainLr, ii)
            oTBLogger.oTBWriter.flush()
            #!!!
        
        # Display (Babysitting)
        print('Epoch '              f'{(ii + 1):4d} / ' f'{numEpoch}:', end = '')
        print(' | Train Loss: '     f'{trainLoss          :6.3f}', end = '')
        print(' | Val Loss: '       f'{valLoss            :6.3f}', end = '')
        print(' | Train Score: '    f'{trainScr           :6.3f}', end = '')
        print(' | Val Score: '      f'{valScr             :6.3f}', end = '')
        print(' | Epoch Time: '     f'{epochTime          :5.2f}', end = '')

        # Save best model ("Early Stopping")
        if valScr > bestScore:
            bestScore = valScr
            print(' | <-- Checkpoint!', end = '')
            try:
                #!!!
                dCheckpoint = {'Model' : oModel.state_dict(), 'Optimizer' : oOpt.state_dict(), 'Scheduler': oSch.state_dict()}
                #!!!
                torch.save(dCheckpoint, 'BestModel.pt')
            except Exception as oErr:
                print(f' | Failed to save the checkpoint: {oErr}', end = '')
        print(' |')
    
    # Load best model ("Early Stopping")
    dCheckpoint = torch.load('BestModel.pt')
    oModel.load_state_dict(dCheckpoint['Model'])

    return oModel, lTrainLoss, lTrainScore, lValLoss, lValScore, lLearnRate

In [None]:
# Set Schedulers

nIter         = nEpochs * len(dlTrain)
baseLearnRate = 1e-2

lScheduler = [
    ('Constant', torch.optim.lr_scheduler.LinearLR, {'start_factor': 1.0}),
    ('Linear', torch.optim.lr_scheduler.LinearLR, {'start_factor': 1.0, 'end_factor': 0.01, 'total_iters': nIter}),
    ('Exponential', torch.optim.lr_scheduler.ExponentialLR, {'gamma': 0.997}),
    ('Cosine', torch.optim.lr_scheduler.CosineAnnealingLR, {'T_max': nIter} ),
    ('Cyclic', torch.optim.lr_scheduler.CyclicLR, {'base_lr': 1e-6, 'max_lr': baseLearnRate, 'step_size_up': nIter // 6, 'step_size_down': nIter // 6, 'mode': 'triangular2', 'cycle_momentum': False}),
    ('OneCycle', torch.optim.lr_scheduler.OneCycleLR, {'max_lr': baseLearnRate, 'total_steps': nIter}),
]

* <font color='brown'>(**#**)</font> Some schedulers (For instance `OneCycleLR`) do not allow iterations beyond what is defined.
* <font color='brown'>(**#**)</font> Some schedulers are score / loss event driven. See `torch.optim.lr_scheduler.ReduceLROnPlateau`.
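
A minimal sketch of an event driven scheduler using `torch.optim.lr_scheduler.ReduceLROnPlateau`. The validation loss values are synthetic and the toy model, `factor` and `patience` values are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

oModelDemo = nn.Linear(4, 2) #<! Hypothetical toy model
oOptDemo   = torch.optim.SGD(oModelDemo.parameters(), lr = 0.1)
oSchDemo   = torch.optim.lr_scheduler.ReduceLROnPlateau(oOptDemo, mode = 'min', factor = 0.5, patience = 2)

lValLossDemo = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9] #<! Synthetic validation loss which stops improving
lLearnRate   = []

for valLossDemo in lValLossDemo:
    oOptDemo.step()            #<! No gradients in this sketch, kept to preserve the order of the `step()` calls
    oSchDemo.step(valLossDemo) #<! The monitored value is passed to `step()`
    lLearnRate.append(oOptDemo.param_groups[0]['lr'])

print(lLearnRate) #<! The learning rate drops once the loss stalls for more than `patience` steps
```

Unlike the schedulers in the list above, its `step()` requires the monitored value, hence it is usually called once per epoch after the validation phase.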

In [None]:
# Train Model

dModelHist = {}

for ii, (schedName, SchedCls, dSchedParam) in enumerate(lScheduler):
    print(f'Training with the {schedName} scheduler')
    oRunModel = copy.deepcopy(oModel)
    oRunModel = oRunModel.to(runDevice) #<! Transfer model to device
    oOpt = torch.optim.AdamW(oRunModel.parameters(), lr = baseLearnRate, betas = (0.9, 0.99), weight_decay = 1e-4) #<! Define optimizer
    oScd = SchedCls(oOpt, **dSchedParam)
    oTBLogger = TBLogger(logDir = os.path.join(TENSOR_BOARD_BASE, f'{schedName}'))
    _, lTrainLoss, lTrainScore, lValLoss, lValScore, lLearnRate = TrainModel(oRunModel, dlTrain, dlTest, oOpt, oScd, nEpochs, hL, hS, oTBLogger)
    dModelHist[schedName] = lTrainLoss, lTrainScore, lValLoss, lValScore, lLearnRate
    oTBLogger.close()

* <font color='brown'>(**#**)</font> A tuned combination of the optimizer and scheduler hyper parameters might give a different result.
* <font color='blue'>(**!**)</font> Display results: Learning Rate, Train Loss, Validation Score using MatPlotLib.