[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io)

# AI Program

## Machine Learning - Deep Learning - PyTorch Hooks

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 07/05/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0088DeepLearningPyTorchHooks.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning

# Deep Learning
import torch
import torch.nn            as nn
import torch.nn.functional as F
from torch.optim.optimizer import Optimizer
from torch.utils.data import DataLoader
import torchinfo
from torchmetrics.classification import MulticlassAccuracy
import torchvision

# Miscellaneous
import copy
import math
import os
from platform import python_version
import random
import time

# Typing
from typing import Callable, Dict, Generator, List, Optional, Self, Set, Tuple, Union

# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import HTML, Image
from IPython.display import display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout, SelectionSlider
from ipywidgets import interact

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

# Improve performance by benchmarking
torch.backends.cudnn.benchmark = True

# Reproducibility (Per PyTorch Version on the same device)
# torch.manual_seed(seedNum)
# torch.backends.cudnn.deterministic = True
# torch.backends.cudnn.benchmark     = False #<! Makes things slower


In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

DATA_FOLDER_PATH = 'Data'


In [None]:
# Download Auxiliary Modules for Google Colab
if runInGoogleColab:
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataManipulation.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataVisualization.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DeepLearningPyTorch.py

In [None]:
# Courses Packages

from DataVisualization import PlotLabelsHistogram, PlotMnistImages
from DeepLearningPyTorch import TrainModel


In [None]:
# General Auxiliary Functions


## PyTorch Hooks

PyTorch _Hooks_ / _Callbacks_ are _event driven functions_.  
They are attached to objects (Tensors / `nn.Module`) and are executed when the hooked event happens.

Conceptually, one can think of the model as a `while` loop with an `if` clause which executes the hook when some condition holds.  
Specifically in the context of PyTorch, the event is updating or executing the Tensor / Module.
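
The following is a minimal sketch of the idea in plain Python (No PyTorch involved). The class `ToyLayer` and its method `RegisterForwardHook()` are hypothetical names used only for illustration, they are not part of any library:

```python
class ToyLayer:
    """A toy layer (Hypothetical, for illustration only) which supports forward hooks."""

    def __init__( self, hFun ) -> None:
        self.hFun   = hFun #<! The operation of the layer
        self.lHooks = []   #<! Registered hooks

    def RegisterForwardHook( self, hHook ):
        self.lHooks.append(hHook)
        return lambda: self.lHooks.remove(hHook) #<! Calling it removes the hook

    def __call__( self, valX ):
        valY = self.hFun(valX) #<! The "forward pass"
        for hHook in self.lHooks: #<! The event: The forward pass is done
            valOut = hHook(self, valX, valY)
            if valOut is not None: #<! A returned value overrides the output
                valY = valOut
        return valY

lLog    = []
oLayer  = ToyLayer(lambda valX: 2 * valX)
hRemove = oLayer.RegisterForwardHook(lambda oLyr, valX, valY: lLog.append((valX, valY)))

valA = oLayer(3) #<! The hook is executed
hRemove()
valB = oLayer(4) #<! The hook is not executed

print(valA, valB, lLog)
```

The layer itself is unaware of what the hook does. It only calls whatever was registered once the event happens.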

Objects which support hooks:
 - Tensors (`torch.Tensor`).
 - Modules (`nn.Module`).
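
For Tensors, a hook is registered with `torch.Tensor.register_hook()` and is called with the gradient of the tensor during the backward pass. A minimal sketch (The variable names are illustrative):

```python
import torch

tX = torch.tensor([1.0, 2.0, 3.0], requires_grad = True)
tY = 2.0 * tX #<! Intermediate (Non leaf) tensor

lGrad = []
# The hook receives the gradient with respect to `tY`.
# Returning `None` (As `append()` does) keeps the gradient as is.
hHook = tY.register_hook(lambda tGrad: lGrad.append(tGrad.clone()))

tY.sum().backward()
hHook.remove() #<! Remove the hook

print(lGrad[0]) #<! Gradient with respect to `tY`
print(tX.grad)  #<! Gradient with respect to `tX`
```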

Events:
 - Forward Pre Hook - Before forward pass.
 - Forward Hook - After forward pass.
 - Backward Hook - After backward pass (Gradient is available).
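
The three events can be sketched on a single layer as follows. The hook functions `PreHook()`, `FwdHook()` and `BwdHook()` are illustrative names, while the registration methods are those of `nn.Module`:

```python
import torch
import torch.nn as nn

lEvents = []

def PreHook( oModule, tuIn ):
    lEvents.append('forward_pre') #<! Before the forward pass

def FwdHook( oModule, tuIn, tOut ):
    lEvents.append('forward') #<! After the forward pass

def BwdHook( oModule, tuGradIn, tuGradOut ):
    lEvents.append('backward') #<! Gradients of the module are available

oLayer   = nn.Linear(4, 2)
lHandles = [
    oLayer.register_forward_pre_hook(PreHook),
    oLayer.register_forward_hook(FwdHook),
    oLayer.register_full_backward_hook(BwdHook),
]

tX = torch.randn(3, 4, requires_grad = True)
oLayer(tX).sum().backward()

for hHandle in lHandles:
    hHandle.remove() #<! Each registration returns a removable handle

oLayer(tX) #<! No hook is executed

print(lEvents)
```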


Some use cases:
 - Tracking the distribution of values of a certain layer during training.
 - Tracking the distribution of the values of the gradient of a certain layer during training.
 - Counting how many neurons have "died".

The notebook presents:

 * The concept of _Hooks_ for `nn.Module` in PyTorch.
 * An implementation of a use case: Analysis of activations using hooks.
 * Using _Normal Kaiming Initialization_ instead of the default _Uniform Kaiming Initialization_.
 * Using model's `apply()` method for initialization.
 * Comparing the effect of the initialization on the data distribution using _Hooks_.


</br>

* <font color='brown'>(**#**)</font> [YouTube - Elliot Waite - PyTorch Hooks Explained](https://www.youtube.com/watch?v=syLFCVYua6Q).
* <font color='brown'>(**#**)</font> [How to Use PyTorch Hooks](https://scribe.rip/5041d777f904).
* <font color='brown'>(**#**)</font> [PyTorch Hooks](https://scribe.rip/5909c7636fb).

```mermaid
flowchart LR
%% Nodes
    X(fa:fa-image X)
    Y((Y))
    H{"f()"} 

subgraph Model
  C1[fa:fa-layer-group Conv2D]
  C2[fa:fa-layer-group Conv2D]
end

%% Edge connections between nodes
    X  --> C1
    C1 --> C2
    C2 --> Y
    C1 <-. Hook .-> H

%% Individual node styling. Try the visual editor toolbar for easier styling!
    style X  color:#FFFFFF, stroke:#AA00FF, fill:#AA00FF
    style Y  color:#FFFFFF, stroke:#00C853, fill:#00C853
    style C1 color:#FFFFFF, stroke:#2962FF, fill:#2962FF
    style C2 color:#FFFFFF, stroke:#2962FF, fill:#2962FF
    style H  color:#FFFFFF, stroke:#296255, fill:#88AA00
    
%% You can add notes with two "%" signs in a row!
```

### Use Case

If most of the neurons of a net at some layer have vanished, it means the net only uses a small part of its capacity.  
This should be diagnosed for multiple reasons:

 - Adjust the net model: Smaller / Different.
 - Prune the model to make inference faster.

</br>

* <font color='brown'>(**#**)</font> This notebook focuses on the _Hook_ as a tool to diagnose such cases, not on how to prevent or handle them.
* <font color='brown'>(**#**)</font> [The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635).  
  Shows that most known models can be heavily pruned with no performance hit.  
  Though it seems the models have excess capacity, it does not mean we know how to make them more efficient.
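
As a rough sketch of such a diagnosis, a forward hook on a _ReLU_ layer can count the units which output zero for a whole batch. The toy model, the forced "dead" units and the criterion (Zero over the whole batch) are assumptions made for illustration only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

oNet = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# Force the first 4 units to be "dead": Zero weights and a negative bias
with torch.no_grad():
    oNet[0].weight[:4] = 0.0
    oNet[0].bias[:4]   = -1.0

dStats = {}

def CountDead( oModule, tuIn, tOut ):
    # A unit is counted as "dead" if its output is zero for all samples in the batch
    dStats['numDead']  = int((tOut <= 0).all(dim = 0).sum())
    dStats['numUnits'] = tOut.shape[1]

hHook = oNet[1].register_forward_hook(CountDead)
oNet(torch.randn(64, 8))
hHook.remove()

print(f"Dead units: {dStats['numDead']} / {dStats['numUnits']}")
```

In practice the count should be accumulated over many batches, as a unit may be inactive on a single batch by chance.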

In [None]:
# Parameters

# Data

# Model
dropP = 0.5 #<! Dropout Layer

# Training
batchSize   = 256
numWork     = 2 #<! Number of workers
nEpochs     = 5

# Visualization
numImg = 3


## Generate / Load Data

This section loads the [MNIST Data set](https://en.wikipedia.org/wiki/MNIST_database).

The data is split into 60,000 train samples and 10,000 test samples.

* <font color='brown'>(**#**)</font> The dataset is retrieved using [Torch Vision](https://pytorch.org/vision/stable/index.html)'s built in datasets.  
* <font color='brown'>(**#**)</font> In PyTorch a `Dataset` object defines how to access a dataset on the hard drive.  
  It abstracts the data on the drive as an array like object.
* <font color='brown'>(**#**)</font> In PyTorch a `DataLoader` object handles the actual loading at scale during training: Fetching the data from a dataset and pushing it into the net.
* <font color='brown'>(**#**)</font> For custom data one should sub class the [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) class.  
  See [Writing Custom Datasets, DataLoaders and Transforms](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html). 
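
A minimal sketch of a custom `Dataset` and its use with a `DataLoader`. The class `SquaresDataset` and its synthetic data are hypothetical, made up for illustration:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """A toy data set (Hypothetical): The feature is a number, the label is its square."""

    def __init__( self, numSamples: int ) -> None:
        self.vX = torch.arange(numSamples, dtype = torch.float32)

    def __len__( self ) -> int:
        return len(self.vX) #<! Number of samples

    def __getitem__( self, idx: int ):
        valX = self.vX[idx]
        return valX, valX ** 2 #<! Tuple of (Features, Label)

dsToy = SquaresDataset(10)
dlToy = DataLoader(dsToy, batch_size = 4, shuffle = False)

lBatchSize = [len(vX) for vX, vY in dlToy] #<! The loader batches the samples

print(f'Number of samples: {len(dsToy)}')
print(f'Batch sizes: {lBatchSize}')
```

The last batch is smaller since the number of samples is not a multiple of the batch size (See the `drop_last` parameter of `DataLoader`).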

In [None]:
# Load Data

# PyTorch 
dsTrain = torchvision.datasets.MNIST(root = DATA_FOLDER_PATH, train = True,  download = True, transform = torchvision.transforms.ToTensor())
dsTest  = torchvision.datasets.MNIST(root = DATA_FOLDER_PATH, train = False, download = True, transform = torchvision.transforms.ToTensor())
lClass  = dsTrain.classes


print(f'The training data set data shape: {dsTrain.data.shape}')
print(f'The test data set data shape: {dsTest.data.shape}')
print(f'The unique values of the labels: {np.unique(lClass)}')

* <font color='brown'>(**#**)</font> The dataset is indexable (Subscriptable). It returns a tuple of the features and the label.
* <font color='brown'>(**#**)</font> While data is arranged as `H x W x C` the transformer, when accessing the data, will convert it into `C x H x W`. 

In [None]:
# Element of the Data Set

mX, valY = dsTrain[0]

print(f'The features shape: {mX.shape}')
print(f'The label value: {valY}')

### Plot the Data

In [None]:
# Extract Data

tX = dsTrain.data #<! PyTorch Tensor (N x H x W, `uint8`)
mX = np.reshape(tX, (tX.shape[0], -1))
vY = dsTrain.targets #<! PyTorch Tensor (Labels)


In [None]:
# Plot the Data

hF = PlotMnistImages(mX, vY, numImg)

In [None]:
# Histogram of Labels

hA = PlotLabelsHistogram(vY, lClass = list(range(10)))
plt.show()

## Pre Process Data

This section normalizes the data to have zero mean and unit variance per **channel**.  
It is required to calculate:

 * The average pixel value per channel.
 * The standard deviation per channel.
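
A minimal sketch of per channel standardization on a batch arranged as `N x C x H x W`. The data here is synthetic (Random), used only to illustrate the calculation:

```python
import torch

torch.manual_seed(0)

tX = torch.rand(100, 3, 8, 8) #<! Synthetic batch: N x C x H x W

# Statistics per channel: Reduce over all dimensions but the channel
vMean = tX.mean(dim = (0, 2, 3))
vStd  = tX.std(dim = (0, 2, 3))

# Broadcast the per channel values over N, H and W
tXn = (tX - vMean[None, :, None, None]) / vStd[None, :, None, None]

print(f'Mean per channel after: {tXn.mean(dim = (0, 2, 3))}')
print(f'Std per channel after : {tXn.std(dim = (0, 2, 3))}')
```

MNIST has a single channel, hence a single scalar for the mean and a single scalar for the standard deviation are sufficient.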

</br>

* <font color='brown'>(**#**)</font> The values are calculated on the train set and applied to both sets.
* <font color='brown'>(**#**)</font> The values will be used to pre process the images on loading by the `transformer`.
* <font color='brown'>(**#**)</font> There are packages which specialize in transforms: [`Kornia`](https://github.com/kornia/kornia), [`Albumentations`](https://github.com/albumentations-team/albumentations).  
  They are commonly used for _Data Augmentation_ at scale.

* <font color='red'>(**?**)</font> What do you expect the mean value to be?
* <font color='red'>(**?**)</font> What do you expect the standard deviation value to be?

In [None]:
# Calculate the Standardization Parameters
valMean = torch.mean(dsTrain.data / 255.0)
valStd  = torch.std(dsTrain.data / 255.0) #<! Train set statistics (Applied to both sets)

print('µ =', valMean)
print('σ =', valStd)

In [None]:
# Update Transformer

oDataTrns = torchvision.transforms.Compose([           #<! Chaining transformations
    torchvision.transforms.ToTensor(),                 #<! Convert to Tensor (C x H x W), Normalizes into [0, 1] (https://pytorch.org/vision/main/generated/torchvision.transforms.ToTensor.html)
    torchvision.transforms.Normalize(valMean, valStd), #<! Normalizes the Data (https://pytorch.org/vision/main/generated/torchvision.transforms.Normalize.html)
    ])

# Update the DS transformer
dsTrain.transform = oDataTrns
dsTest.transform  = oDataTrns

In [None]:
# "Normalized" Image

mX, valY = dsTrain[5]

hF, hA = plt.subplots()
hImg = hA.imshow(np.transpose(mX, (1, 2, 0)))
hF.colorbar(hImg)
plt.show()

* <font color='red'>(**?**)</font> How come the values are not centered around $0$? Think about the data distribution.

### Data Loaders

The dataloader is the functionality which loads the data into memory in batches.  
Its challenge is to bring data fast enough so the Hard Disk is not the training bottleneck.  
In order to achieve that, Multi Threading / Multi Process is used.

* <font color='brown'>(**#**)</font> Multi processing, set by the `num_workers` parameter, does not work well _out of the box_ on Windows.  
  See [Errors When Using `num_workers > 0` in `DataLoader`](https://discuss.pytorch.org/t/97564), [On Windows `DataLoader` with `num_workers > 0` Is Slow](https://github.com/pytorch/pytorch/issues/12831).  
  A way to overcome it is to define the training loop as a function in a different module (File) and import it (https://discuss.pytorch.org/t/97564/4, https://discuss.pytorch.org/t/121588/21). 
* <font color='brown'>(**#**)</font> The `num_workers` should be set to the lowest number which feeds the GPU fast enough.  
  The idea is to preserve as much CPU resources as possible for other tasks.
* <font color='brown'>(**#**)</font> On Windows set the `persistent_workers` parameter to `True` (_Windows_ is slower on forking processes / threads).
* <font color='brown'>(**#**)</font> The `DataLoader` is an iterable which can be looped on.
* <font color='brown'>(**#**)</font> In order to fetch a single batch using `next()` it has to be wrapped with `iter()`.

In [None]:
# Data Loader

dlTrain  = torch.utils.data.DataLoader(dsTrain, shuffle = True, batch_size = 1 * batchSize, num_workers = numWork, drop_last = True, persistent_workers = True)
dlTest   = torch.utils.data.DataLoader(dsTest, shuffle = False, batch_size = 2 * batchSize, num_workers = numWork, persistent_workers = True)


* <font color='red'>(**?**)</font> Why is the size of the batch twice as big for the test dataset?

In [None]:
# Iterate on the Loader
# The first batch.
tX, vY = next(iter(dlTrain)) #<! PyTorch Tensors

print(f'The batch features dimensions: {tX.shape}')
print(f'The batch labels dimensions: {vY.shape}')

In [None]:
# Looping
for ii, (tX, vY) in zip(range(1), dlTest): #<! https://stackoverflow.com/questions/36106712
    print(f'The batch features dimensions: {tX.shape}')
    print(f'The batch labels dimensions: {vY.shape}')

## Define the Model

The model is defined as a sequential model.

In [None]:
# Model
# Defining a sequential model.

numFeatures = np.prod(tX.shape[1:])

oModel = nn.Sequential(
    nn.Identity(),
        
    nn.Conv2d(1,  8,  5, stride = 1), nn.ReLU(),
    nn.Conv2d(8,  16, 5, stride = 1), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride = 1), nn.ReLU(),
    nn.Conv2d(32, 32, 5, stride = 1), nn.ReLU(),
    nn.Conv2d(32, 32, 5, stride = 1), nn.ReLU(),
    nn.Conv2d(32, 32, 5, stride = 1), nn.ReLU(),
    
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, len(lClass)),
)

torchinfo.summary(oModel, tX.shape, device = 'cpu')
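
The spatial size along the convolution layers can be verified by hand. The following sketch uses the output size formula of `nn.Conv2d` for a dilation of 1, $\left\lfloor \frac{n + 2 p - k}{s} \right\rfloor + 1$. The function `ConvOutSize()` is a hypothetical helper written for illustration:

```python
def ConvOutSize( inSize: int, kernelSize: int, stride: int = 1, padding: int = 0 ) -> int:
    # Output size of a convolution along a single dimension (Dilation of 1)
    return ((inSize + 2 * padding - kernelSize) // stride) + 1

lSizes = [28] #<! MNIST image: 28 x 28
for _ in range(6): #<! 6 convolution layers, kernel of 5, stride of 1, no padding
    lSizes.append(ConvOutSize(lSizes[-1], kernelSize = 5))

print(lSizes)
```

The last convolution outputs a `32 x 4 x 4` tensor per sample, which `nn.AdaptiveAvgPool2d(1)` reduces to `32 x 1 x 1`.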

* <font color='red'>(**?**)</font> What will be the effect of a bigger stride for `Conv2d`?
* <font color='brown'>(**#**)</font> Pay attention to the model size and the RAM of the GPU. Rule of thumb, up to ~40%.

### Initialization Function

By default, PyTorch initializes the weights of the linear layers using the _Kaiming_ method to "control" the output variance.  
Yet by default it initializes the weights using a _Uniform Distribution_.  

PyTorch has several initialization methods as described in the [`torch.nn.init`](https://pytorch.org/docs/stable/nn.init.html) module.

This section implements a function to initialize weights using the Kaiming method with _Gaussian Distribution_.

* <font color='brown'>(**#**)</font> The implementation is designed to be used with the [`apply()`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.apply) method of a model.
* <font color='brown'>(**#**)</font> The variance of a Uniform Distribution over `[a, b]`, $\mathcal{U}_{\left[ a, b \right]}$ is given by $\frac{1}{12} {\left( b - a \right)}^{2}$.
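
The variance formula can be verified empirically. The following is a small sketch using only the standard library, the interval `[-2, 3]` is an arbitrary choice for illustration:

```python
import random
import statistics

random.seed(512)

valA, valB = -2.0, 3.0 #<! Arbitrary interval
lSamples   = [random.uniform(valA, valB) for _ in range(200_000)]

varEmp    = statistics.pvariance(lSamples) #<! Empirical variance
varTheory = ((valB - valA) ** 2) / 12      #<! Analytic variance

print(f'Empirical variance: {varEmp:0.3f}')
print(f'Analytic variance : {varTheory:0.3f}')
```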

In [None]:
def WeightInit( oModule: nn.Module ) -> None:
    if isinstance(oModule, nn.Conv2d) or isinstance(oModule, nn.Conv1d) or isinstance(oModule, nn.Linear):
        nn.init.kaiming_normal_(oModule.weight.data)

* <font color='brown'>(**#**)</font> The function alters only the modules which match the listed classes.
* <font color='brown'>(**#**)</font> Convention in PyTorch: Functions ending with `_` are in place.  
  See [`torch.nn.functional.relu_()`](https://pytorch.org/docs/stable/generated/torch.nn.functional.relu_.html#torch.nn.functional.relu_) vs. [`torch.nn.functional.relu()`](https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html#torch.nn.functional.relu).  
  They are also accessible as `torch.relu()` and `torch.relu_()`.
* <font color='red'>(**?**)</font> In the case of Linear Layers (_Fully Connected_ / _Dense_) with input of dimensions $d$ and a _ReLU_ layer.  
  What would be the value of $a$ in order to have weights uniformly distributed over $\left[ -a, a \right]$ matching the Kaiming initialization?

## Train the Model

This section trains the model twice:

 1. Using the default initialization (Kaiming, Uniform Distribution).
 2. Using the implemented initialization (Kaiming, Gaussian Distribution).

Both methods will be compared in their performance and analyzed using Hooks.

In [None]:
# Run Device

runDevice   = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #<! The 1st CUDA device

In [None]:
# Loss and Score Function

hL = nn.CrossEntropyLoss()
hS = MulticlassAccuracy(num_classes = len(lClass), average = 'micro')
hL = hL.to(runDevice) #<! Not required!
hS = hS.to(runDevice)

* <font color='red'>(**?**)</font> For binary problems one should use Binary Cross Entropy instead of Cross Entropy.  
  Yet the number of outputs is only 1 and not 2. Why? Explain.

In [None]:
# Train Model - Default Initialization

oRunModel = copy.deepcopy(oModel)
oRunModel = oRunModel.to(runDevice) #<! Transfer model to device
oOpt = torch.optim.SGD(oRunModel.parameters(), lr = 2e-2) #<! Define optimizer
_, lTrainLossU, lTrainScoreU, lValLossU, lValScoreU, _ = TrainModel(oRunModel, dlTrain, dlTest, oOpt, nEpochs, hL, hS)

In [None]:
# Train Model - Implemented Initialization

oRunModel = copy.deepcopy(oModel)
oRunModel = oRunModel.to(runDevice) #<! Transfer model to device
oRunModel = oRunModel.apply(WeightInit)
oOpt = torch.optim.SGD(oRunModel.parameters(), lr = 2e-2) #<! Define optimizer
_, lTrainLossG, lTrainScoreG, lValLossG, lValScoreG, _ = TrainModel(oRunModel, dlTrain, dlTest, oOpt, nEpochs, hL, hS)

* <font color='brown'>(**#**)</font> The method [`apply()`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.apply) applies a given function recursively on every sub module of the model and on the model itself.  
  The direct sub modules of the model are given by the result of the [`children()`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.children) method (Iterator).
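
A small sketch of the behavior of `apply()` on a toy model: It records the order in which the modules are visited and checks that the standard deviation of the weights after `nn.init.kaiming_normal_()` is close to $\sqrt{2 / d}$ where $d$ is the input dimension (Fan in). The function `InitKaimingNormal()` and the toy model are illustrative, not the notebook's objects:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

def InitKaimingNormal( oModule: nn.Module ) -> None:
    if isinstance(oModule, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(oModule.weight, nonlinearity = 'relu') #<! In place

oNet = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

lVisited = []
oNet.apply(lambda oModule: lVisited.append(type(oModule).__name__)) #<! Sub modules first, then the model itself
oNet.apply(InitKaimingNormal)

stdEmp    = oNet[0].weight.std().item()
stdTheory = math.sqrt(2.0 / 512) #<! Gain of ReLU is sqrt(2), Fan in is 512

print(lVisited)
print(f'Empirical STD: {stdEmp:0.4f}, Analytic STD: {stdTheory:0.4f}')
```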

In [None]:
# Plot Results
hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (12, 6))
vHa = vHa.flat

hA = vHa[0]
hA.plot(lTrainLossU, lw = 2, ls = ':', label = 'Train Uniform')
hA.plot(lValLossU, lw = 2, label = 'Validation Uniform')
hA.plot(lTrainLossG, lw = 2, ls = ':', label = 'Train Gaussian')
hA.plot(lValLossG, lw = 2, label = 'Validation Gaussian')
hA.grid()
hA.set_title('Cross Entropy Loss')
hA.set_xlabel('Epoch Index')
hA.set_ylabel('Loss')
hA.legend();


hA = vHa[1]
hA.plot(lTrainScoreU, lw = 2, ls = ':', label = 'Train Uniform')
hA.plot(lValScoreU, lw = 2, label = 'Validation Uniform')
hA.plot(lTrainScoreG, lw = 2, ls = ':', label = 'Train Gaussian')
hA.plot(lValScoreG, lw = 2, label = 'Validation Gaussian')
hA.grid()
hA.set_title('Accuracy Score')
hA.set_xlabel('Epoch Index')
hA.set_ylabel('Score')
hA.legend();


* <font color='brown'>(**#**)</font> The results can not answer globally which initialization is superior. This is a specific limited case.
* <font color='brown'>(**#**)</font> The motivation is to create a simple case to analyze using _Hook_.

## Hook

This section implements a _Forward Hook_ to analyze the distribution of the values at the output of a layer in the model.

The signature of an `nn.Module` hook is: `def ModuleHook(module: nn.Module, tIn: Tensor, tOut: Tensor):`.

* <font color='brown'>(**#**)</font> The definition of the hook is given in [`torch.nn.Module.register_forward_hook()`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_forward_hook).
* <font color='brown'>(**#**)</font> If the hook returns a tensor it will override the output of the layer.
* <font color='brown'>(**#**)</font> A hook which only reads the tensors (Returns `None`) does not alter the _computational graph_. Hence it won't affect the backward pass.
* <font color='brown'>(**#**)</font> There is a global (All modules) variation in [`torch.nn.modules.module.register_module_forward_hook()`](https://pytorch.org/docs/stable/generated/torch.nn.modules.module.register_module_forward_hook.html).
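
A minimal sketch of a forward hook which overrides the output of a layer. The hook `ClampHook()` is a made up example which limits the activation to a maximum of 1:

```python
import torch
import torch.nn as nn

def ClampHook( oModule: nn.Module, tuIn: tuple, tOut: torch.Tensor ) -> torch.Tensor:
    # Returning a tensor overrides the output of the layer
    return tOut.clamp(max = 1.0)

oLayer = nn.ReLU()
tX     = torch.tensor([-1.0, 0.5, 3.0])

hHook    = oLayer.register_forward_hook(ClampHook)
tWith    = oLayer(tX) #<! Output overridden by the hook
hHook.remove()
tWithout = oLayer(tX) #<! Output of the layer as is

print(f'With hook   : {tWith}')
print(f'Without hook: {tWithout}')
```

Note the input to the hook is given as a tuple of the positional inputs of the module.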

In [None]:
# Layer Statistics Container

class LayerStats():
    def __init__( self, numIter: int, numBins: int, tuRange: Tuple ) -> None:
        
        self.vMean  = np.full(numIter, np.nan)
        self.vStd   = np.full(numIter, np.nan)
        self.mHist  = np.full((numBins, numIter), np.nan)
        self.mEdges = np.full((numBins + 1, numIter), np.nan)
        
        self.numIter    = numIter
        self.ii         = 0 #<! Iteration index
        self.numBins    = numBins #<! Number of bins for the histogram
        self.tuRange    = tuRange #<! Range of the histogram



In [None]:
# Hook Function 

def ForwardHook( oLayer: nn.Module, mX: torch.Tensor, mZ: torch.Tensor, oLyrStats: LayerStats ) -> None:
    # mX : Input Tensor.
    # mZ : Output Tensor.
    # No Return: No override of mZ
    
    if not oLayer.training: #<! Skip validation
        return
    
    ii      = oLyrStats.ii
    numBins = oLyrStats.numBins
    tuRange = oLyrStats.tuRange

    oLyrStats.vMean[ii] = mZ.data.mean().cpu()
    oLyrStats.vStd[ii]  = mZ.data.std().cpu()
    
    oLyrStats.mHist[:, ii], oLyrStats.mEdges[:, ii] = np.histogram(mZ.data.view(-1).cpu(), bins = numBins, range = tuRange)
    
    oLyrStats.ii += 1   



In [None]:
# Implementation for Analysis

nEpochs = 1 #<! Single Epoch
numIter = nEpochs * len(dlTrain) #<! Number of Epochs x Number of Batches
numBins = 101
tuRange = (-1.0, 7.0)

oLyrStat        = LayerStats(numIter, numBins, tuRange)
hForwardHook    = lambda oLayer, mX, mZ: ForwardHook(oLayer, mX, mZ, oLyrStat) #<! Matching signature of the hook
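
The lambda above adapts `ForwardHook()`, which takes 4 arguments, to the 3 arguments signature of a module hook. A lambda looks up `oLyrStat` by name when it is called, hence rebinding the name changes the container the hook writes into. An alternative is `functools.partial`, which binds the object at creation. A sketch in plain Python, where `RecordHook()` and the list containers are illustrative stand ins:

```python
from functools import partial

def RecordHook( oLayer, mX, mZ, lStats ) -> None:
    lStats.append(mZ) #<! Stand in for collecting statistics

lStatsA = []
lStats  = lStatsA

hLambda  = lambda oLayer, mX, mZ: RecordHook(oLayer, mX, mZ, lStats) #<! Name resolved at call time
hPartial = partial(RecordHook, lStats = lStats)                      #<! Object bound at creation

lStatsB = []
lStats  = lStatsB #<! Rebinding the name

hLambda(None, None, 1.0)  #<! Writes into `lStatsB`
hPartial(None, None, 2.0) #<! Writes into `lStatsA`

print(f'lStatsA: {lStatsA}, lStatsB: {lStatsB}')
```

In this notebook the hook is removed before the name is rebound and a new lambda is defined for the next run, so either approach gives the same result.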

In [None]:
# Training with Hook - Default Initialization

oRunModel = copy.deepcopy(oModel)
oRunModel = oRunModel.to(runDevice) #<! Transfer model to device
oLayer    = oRunModel[6] #<! The activation after the 3rd conv layer
hHook     = oLayer.register_forward_hook(hForwardHook)
oOpt      = torch.optim.SGD(oRunModel.parameters(), lr = 2e-2) #<! Define optimizer

_, lTrainLossU, lTrainScoreU, lValLossU, lValScoreU, _ = TrainModel(oRunModel, dlTrain, dlTest, oOpt, nEpochs, hL, hS)


hHook.remove() #<! Remove hook
oLyrStatU = oLyrStat #<! Keep a reference (A new container is created for the next run)

In [None]:
# Instance of the Object
oLyrStat        = LayerStats(numIter, numBins, tuRange)
hForwardHook    = lambda oLayer, mX, mZ: ForwardHook(oLayer, mX, mZ, oLyrStat) #<! Matching signature of the hook

In [None]:
# Training with Hook - Implemented Initialization

oRunModel = copy.deepcopy(oModel)
oRunModel = oRunModel.to(runDevice) #<! Transfer model to device
oRunModel = oRunModel.apply(WeightInit)
oLayer    = oRunModel[6] #<! The activation after the 3rd conv layer
hHook     = oLayer.register_forward_hook(hForwardHook)
oOpt      = torch.optim.SGD(oRunModel.parameters(), lr = 2e-2) #<! Define optimizer

_, lTrainLossG, lTrainScoreG, lValLossG, lValScoreG, _ = TrainModel(oRunModel, dlTrain, dlTest, oOpt, nEpochs, hL, hS)

hHook.remove() #<! Remove hook
oLyrStatG = oLyrStat #<! Keep a reference to the collected statistics

* <font color='red'>(**?**)</font> What happened to the run time? Explain.

### Show Results


In [None]:
# Plot Statistics Function

def PlotStatistics( oLyrStats: LayerStats, hF: plt.Figure ) -> plt.Figure:
    
    vMean = oLyrStats.vMean
    vStd  = oLyrStats.vStd
    mHist = oLyrStats.mHist

    tuRange = oLyrStats.tuRange

    vAx = hF.axes

    vAx[0].plot(oLyrStats.vMean, lw = 2)
    vAx[1].plot(oLyrStats.vStd, lw = 2)
    vAx[2].imshow(np.log(oLyrStats.mHist + 0.1), origin = 'lower', extent = [0, oLyrStats.ii, tuRange[0], tuRange[1]], aspect = 'auto')
    vAx[0].set_title ('Activation Output - Mean')
    vAx[1].set_title ('Activation Output - Standard Deviation')
    vAx[2].set_title ('Activation Output - Histogram')
    vAx[0].set_xlabel('Iteration')
    vAx[1].set_xlabel('Iteration')
    vAx[2].set_xlabel('Iteration')
    vAx[0].grid()
    vAx[1].grid()
    
    # hF.tight_layout()

    return hF

In [None]:
# Display Results

hF, _ = plt.subplots(nrows = 1, ncols = 3, figsize = (15, 6))
PlotStatistics(oLyrStatU, hF)
hF.suptitle('Activation Output Analysis - Uniform')

hF, _ = plt.subplots(nrows = 1, ncols = 3, figsize = (15, 6))
PlotStatistics(oLyrStatG, hF)
hF.suptitle('Activation Output Analysis - Gaussian')

plt.show()

* <font color='brown'>(**#**)</font> The more variation in the values, the better (Up to a limit), as the net is taking better advantage of its capacity.
* <font color='red'>(**?**)</font> What would be the results of running more epochs?
* <font color='green'>(**@**)</font> Increase the number of epochs and rerun the analysis.
* <font color='brown'>(**#**)</font> The concept of activation of a neuron is "firing" (Positive value) when the feature the neuron specializes in is detected.  
  Hence vanishing neurons mean no features were detected.  
  This is a crude analogy, yet the intuition works in many cases.
* <font color='brown'>(**#**)</font> **Don't generalize** (Which initialization is superior) the results to other models!