<a href="https://colab.research.google.com/github/Sylar257/My-data-science-tool-kit/blob/master/145_PyTorch_Tricks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 145 PyTorch Tricks
This is a series of useful PyTorch tricks inspired by **vainaijr** on his [YouTube channel](https://www.youtube.com/watch?v=nnHQT9JnY74&list=PLUY8w37x-QUUkawz-cBnjLpvaZWvPZh_s&index=2&t=29s).<br>
This notebook implements each of these techniques and is organized to demonstrate how they are useful in practice.

## Trick #1
Visualizing a model with `torchsummaryX`

In [0]:
import torch
import torchvision.models as models
from Utils import *

Here we will build a Single Shot Detector (SSD) model with just 20 classes.

In [3]:
# Create SSD300 with pretrained weights in the base-architecture
n_classes = 20
model = SSD300(n_classes)

Downloading: "https://download.pytorch.org/models/vgg16_bn-6c64b313.pth" to /root/.cache/torch/checkpoints/vgg16_bn-6c64b313.pth
100%|██████████| 528M/528M [00:14<00:00, 39.2MB/s]



Loaded base model with pre-trained weights



In [4]:
# install torchsummaryX
!pip install torchsummaryX

Collecting torchsummaryX
  Downloading https://files.pythonhosted.org/packages/36/23/87eeaaf70daa61aa21495ece0969c50c446b8fd42c4b8905af264b40fe7f/torchsummaryX-1.3.0-py3-none-any.whl
Installing collected packages: torchsummaryX
Successfully installed torchsummaryX-1.3.0


In [0]:
from torchsummaryX import summary

`summary(model, input)` takes the model we want to inspect and a dummy input **with the correct shape**.

In [7]:
# pseudo input of batch size = 3, num_channel = 3, pixel: 300x300
summary(model, torch.zeros((3,3,300,300)))

                                         Kernel Shape        Output Shape  \
Layer                                                                       
0_base.Conv2d_conv1_1                   [3, 64, 3, 3]   [3, 64, 300, 300]   
1_base.BatchNorm2d_bn_1_1                        [64]   [3, 64, 300, 300]   
2_base.Conv2d_conv1_2                  [64, 64, 3, 3]   [3, 64, 300, 300]   
3_base.BatchNorm2d_bn_1_2                        [64]   [3, 64, 300, 300]   
4_base.MaxPool2d_pool1                              -   [3, 64, 150, 150]   
5_base.Conv2d_conv2_1                 [64, 128, 3, 3]  [3, 128, 150, 150]   
6_base.BatchNorm2d_bn_2_1                       [128]  [3, 128, 150, 150]   
7_base.Conv2d_conv2_2                [128, 128, 3, 3]  [3, 128, 150, 150]   
8_base.BatchNorm2d_bn_2_2                       [128]  [3, 128, 150, 150]   
9_base.MaxPool2d_pool2                              -    [3, 128, 75, 75]   
10_base.Conv2d_conv3_1               [128, 256, 3, 3]    [3, 256, 75, 75]   

| Layer | Kernel Shape | Output Shape | Params | Mult-Adds |
|---|---|---|---|---|
| 0_base.Conv2d_conv1_1 | [3, 64, 3, 3] | [3, 64, 300, 300] | 1792.0 | 155520000.0 |
| 1_base.BatchNorm2d_bn_1_1 | [64] | [3, 64, 300, 300] | 128.0 | 64.0 |
| 2_base.Conv2d_conv1_2 | [64, 64, 3, 3] | [3, 64, 300, 300] | 36928.0 | 3317760000.0 |
| 3_base.BatchNorm2d_bn_1_2 | [64] | [3, 64, 300, 300] | 128.0 | 64.0 |
| 4_base.MaxPool2d_pool1 | - | [3, 64, 150, 150] | | |
| 5_base.Conv2d_conv2_1 | [64, 128, 3, 3] | [3, 128, 150, 150] | 73856.0 | 1658880000.0 |
| 6_base.BatchNorm2d_bn_2_1 | [128] | [3, 128, 150, 150] | 256.0 | 128.0 |
| 7_base.Conv2d_conv2_2 | [128, 128, 3, 3] | [3, 128, 150, 150] | 147584.0 | 3317760000.0 |
| 8_base.BatchNorm2d_bn_2_2 | [128] | [3, 128, 150, 150] | 256.0 | 128.0 |
| 9_base.MaxPool2d_pool2 | - | [3, 128, 75, 75] | | |


Final Note: Normally, if we use architectures directly from `TorchVision` or `Keras`, we already get a nice model summary like this one.<br>
This library is particularly useful when we want to inspect other people's models, or a version that we have modified based on a commonly used model, like the example above.<br>
In addition, we get a clear view of the **number of parameters** and the **output dimension** of each layer, which is handy for debugging your own model or simply for reference.
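
To make the "Params" and "Output Shape" columns less of a black box, here is a minimal sketch that collects the same two quantities with plain PyTorch forward hooks. It is only an illustration under assumptions: the three-layer `nn.Sequential` stands in for the first block of the base network, the `make_hook` helper and the `rows` list are hypothetical names, and this is not claimed to be how `torchsummaryX` is implemented.

```python
import torch
import torch.nn as nn

# Toy stand-in for the first few layers of the base network (assumption).
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.MaxPool2d(2),
).eval()

rows = []  # one (name, output shape, parameter count) tuple per layer

def make_hook(name):
    """Build a forward hook that records the layer's output shape and params."""
    def hook(module, inputs, output):
        n_params = sum(p.numel() for p in module.parameters(recurse=False))
        rows.append((name, tuple(output.shape), n_params))
    return hook

handles = [
    layer.register_forward_hook(make_hook(f"{i}_{type(layer).__name__}"))
    for i, layer in enumerate(model)
]

# Same dummy input shape as in the cell above: batch 3, 3 channels, 300x300.
with torch.no_grad():
    model(torch.zeros(3, 3, 300, 300))

# Remove the hooks so they do not fire on later forward passes.
for handle in handles:
    handle.remove()

for row in rows:
    print(row)
```

The convolution's count is `3*64*3*3` weights plus `64` biases, which is the 1792 reported in the first row of the table.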


# Trick #2
PyTorch Hooks

A PyTorch hook is a function that we can *register* on any **tensor** or **nn.Module** so that we can monitor what is going on during the `forward` and `backward` passes.<br>
The `forward` here does not refer to `nn.Module.forward` but to the `forward` of the `torch.autograd.Function` object that is the `grad_fn` of a **tensor**.<br>
Note that an `nn.Module` like `nn.Linear` can involve more than one such `forward` invocation. Its output, $Y = W*X+B$, is conceptually created by two operations, a *multiplication* and an *addition*, and thus there can be two `forward` calls.
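
As a rough illustration of the point about `grad_fn`, the sketch below builds $Y = W*X+B$ from explicit element-wise tensor operations and inspects the autograd graph. The tensors `x`, `w`, `b` are made up for the example, and element-wise ops are used on purpose: `nn.Linear` itself may dispatch to a single fused operation depending on the PyTorch version, so the exact node names printed here should be treated as version-dependent.

```python
import torch

# Hypothetical tensors standing in for input, weight and bias.
x = torch.ones(3, requires_grad=True)
w = torch.full((3,), 2.0, requires_grad=True)
b = torch.zeros(3, requires_grad=True)

y = w * x + b  # two operations: a multiplication, then an addition

# The node that produced y corresponds to the last operation (the addition).
outer = type(y.grad_fn).__name__

# Its parents in the graph include the node for the multiplication.
inner = [type(fn).__name__ for fn, _ in y.grad_fn.next_functions if fn is not None]

print(outer, inner)
```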

## Hook types
1. The Forward Hook
2. The Backward Hook

A forward hook is executed during the forward pass, while a backward hook is executed when the `backward` function is called. Both are *functions* of the `torch.autograd.Function` object.

A hook in PyTorch is basically a function with a very specific signature. When we say a hook is executed, we are really talking about this function being executed. For a tensor, the signature is `hook(grad) -> Tensor or None`.<br>
`grad` is basically the value contained in the `grad` attribute of the tensor **after** `backward` is called. The function is not supposed to modify its argument. It must either return `None` or a Tensor, which will be used in place of `grad` for further gradient computations.<br>
The below example clarifies this point:

In [1]:
import torch
a = torch.ones(10)
a.requires_grad

False

In [2]:
a.requires_grad = True
a.requires_grad

True

In [3]:
b = 2*a
b.requires_grad

True

In [4]:
print(a.is_leaf)
print(b.is_leaf)

True
False


Since `b` is not a **leaf tensor**, its `grad` will by default be freed during the backward computation.<br>
We can use `b.retain_grad()` to ask PyTorch to retain its `grad`.

In [0]:
b.retain_grad()

In [7]:
c = b.mean()
print(f"requires_grad: {c.requires_grad}")
print(f"is_leaf: {c.is_leaf}")

requires_grad: True
is_leaf: False


In [8]:
# pretend c is the loss being computed
c.backward()
print(a.grad, b.grad)

tensor([0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000,
        0.2000]) tensor([0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000,
        0.1000])


Now we redo the experiment, but with a **hook** registered on `b` that prints the gradient it receives.

In [11]:
a = torch.ones(10)
a.requires_grad = True
b = 2*a
b.retain_grad()
b.register_hook(lambda x:print(x))
b.mean().backward() # pretend the mean of b is the loss we want to back-prop

tensor([0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000,
        0.1000])


Here we can see that the hook prints exactly the same values as `b.grad`: the `lambda` function automatically receives `b`'s gradient as its input.<br>
This gives us a sense of what the hook is tracking.

In [12]:
print(a.grad, b.grad)

tensor([0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000,
        0.2000]) tensor([0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000,
        0.1000])


Hooks like the one above have several uses:
1. We can print the *value* of a gradient for **debugging**. We can also log it. This is especially useful for non-leaf tensors, whose gradients are freed unless we call `retain_grad` on them. Calling `retain_grad` can increase memory retention, whereas hooks provide a much cleaner way to aggregate these values.
2. We can modify a gradient **during** the backward pass. This is very important. While we can still access the `grad` attribute of a tensor in a network, we can only access it after the **entire backward pass** has been processed. For example, if a hook multiplies `b`'s gradient by 2, then the subsequent gradient calculations, like those of `a` (or any tensor that depends on `b` for its gradient), use `2*grad(b)` instead of `grad(b)`. In contrast, had we scaled the gradients ourselves **after** `backward`, we would have to multiply `b.grad` as well as `a.grad`.
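
The second point can be sketched with a hook that actually returns a modified gradient. This is a minimal example reusing the same `a` and `b` as above; the doubling `lambda` and the `handle` variable are illustrative choices, not part of the original experiment.

```python
import torch

a = torch.ones(10, requires_grad=True)
b = 2 * a

# The hook returns a new tensor, which replaces grad(b) for the rest of
# the backward pass. It does not modify its argument in place.
handle = b.register_hook(lambda grad: grad * 2)

b.mean().backward()  # pretend the mean of b is the loss
handle.remove()      # detach the hook once it is no longer needed

# grad(b) would be 0.1 per element; the hook doubles it to 0.2,
# and a = b / 2 upstream, so a.grad = 2 * 0.2 = 0.4 per element.
print(a.grad)
```

Because the doubled gradient is what flows back into `a`, there is no need to touch `a.grad` by hand afterwards.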

In [13]:
# to demonstrate
a = torch.ones(10)
a.requires_grad = True
b = 2*a
b.retain_grad()
b.mean().backward()

print(a.grad, b.grad)

tensor([0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000,
        0.2000]) tensor([0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000,
        0.1000])


In [14]:
b.grad *= 2
print(a.grad, b.grad) # Note that in this case, a's grad still needs to be updated manually

tensor([0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000,
        0.2000]) tensor([0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000,
        0.2000])


## Hooks for nn.Module objects
For **backward hook**:
`hook(module, grad_input, grad_output)`
___
For **forward hook**:
`hook(module, input, output)`
___
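
A minimal sketch of both signatures on a single layer, assuming a PyTorch version that provides `register_full_backward_hook`. The `nn.Linear(4, 2)` layer, the `seen` dictionary and the hook names are hypothetical and only serve to show which arguments each hook receives.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
seen = {}  # filled in by the hooks below

def forward_hook(module, inputs, output):
    # `inputs` is a tuple holding the positional inputs of the forward call.
    seen["input_shape"] = tuple(inputs[0].shape)
    seen["output_shape"] = tuple(output.shape)

def backward_hook(module, grad_input, grad_output):
    # `grad_output` is a tuple of gradients w.r.t. the module's outputs.
    seen["grad_output_shape"] = tuple(grad_output[0].shape)

fwd_handle = layer.register_forward_hook(forward_hook)
bwd_handle = layer.register_full_backward_hook(backward_hook)

x = torch.ones(3, 4, requires_grad=True)
layer(x).sum().backward()  # triggers the forward hook, then the backward hook

# Hooks stay attached until removed explicitly.
fwd_handle.remove()
bwd_handle.remove()

print(seen)
```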

In [0]:
import torch.nn as nn
class myNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3,10,2, stride=2) # for an 8x8 input: (8-2+0)/2+1 = 4, so 10*4*4 = 160 features
        self.relu = nn.ReLU()
        self.flatten = lambda x: x.view(-1)
        self.fc1  = nn.Linear(160,5)

    def forward(self, x):
        x = self.relu(self.conv(x))
        return self.fc1(self.flatten(x))

In [17]:
Net = myNet()
Net.named_modules

<bound method Module.named_modules of myNet(
  (conv): Conv2d(3, 10, kernel_size=(2, 2), stride=(2, 2))
  (relu): ReLU()
  (fc1): Linear(in_features=160, out_features=5, bias=True)
)>
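
The sizes in `myNet` can be checked with a short calculation. The sketch below assumes a single 3x8x8 input, which is what the comment next to the convolution implies; `conv_out_size` is a hypothetical helper written for this check, not a PyTorch function.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

side = conv_out_size(8, kernel=2, stride=2)  # (8 - 2 + 0) / 2 + 1

# Same convolution as in myNet, applied to one assumed 3x8x8 image.
conv = nn.Conv2d(3, 10, 2, stride=2)
out = conv(torch.zeros(1, 3, 8, 8))
flat = out.view(-1)  # what the `flatten` lambda does

print(side, tuple(out.shape), flat.numel())
```

With 10 output channels and a 4x4 feature map, flattening gives the 160 features that `nn.Linear(160, 5)` expects. Note that `x.view(-1)` also flattens the batch dimension, so this only lines up for a batch of one.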