# Intro to ``torch.autograd``

In [1]:
import torch,torchvision

In [2]:
model=torchvision.models.resnet18(pretrained=True)
data=torch.rand(1,3,64,64)
labels=torch.rand(1,1000)

Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth


In [3]:
prediction=model(data) #forward pass

In [4]:
loss=(prediction-labels).sum()
loss.backward() #backward pass

In [5]:
optim=torch.optim.SGD(model.parameters(),lr=1e-2,momentum=0.9)

In [6]:
optim.step() #gradient descent
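
The cells above perform a single forward pass, backward pass and optimizer step. A minimal sketch of how these steps might repeat in a loop is given below. It assumes a small ``nn.Linear`` model in place of ``resnet18`` (to avoid the weight download) and a mean squared error loss; both are illustrative choices, not part of the original example. Gradients accumulate in ``.grad`` across calls to ``backward()``, so the sketch clears them at the start of each iteration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Assumption: a tiny linear model stands in for resnet18
model = nn.Linear(4, 2)
data = torch.rand(8, 4)
labels = torch.rand(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

losses = []
for _ in range(3):
    optimizer.zero_grad()                       # clear gradients from the previous step
    prediction = model(data)                    # forward pass
    loss = ((prediction - labels) ** 2).mean()  # assumed loss: mean squared error
    loss.backward()                             # backward pass fills param.grad
    optimizer.step()                            # update parameters from param.grad
    losses.append(loss.item())

print(losses)
```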

#### Differentiation in Autograd

$ Q = 3a^3 - b^2 $

In [7]:
a=torch.tensor([2.,3.],requires_grad=True)
b=torch.tensor([6.,4.],requires_grad=True)
Q=3*a**3-b**2

In [8]:
external_grad=torch.tensor([1.,1.]) #gradient must have same shape as Q
Q.backward(gradient=external_grad) #external gradient is needed, since Q is not a scalar but a vector

$ \frac{\partial Q}{\partial a}=9a^2 \quad \frac{\partial Q}{\partial b}=-2b $

In [9]:
print(a.grad==9*a**2)
print(b.grad==-2*b)

tensor([True, True])
tensor([True, True])
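
Since ``external_grad`` is a vector of ones, the same gradients should result from reducing ``Q`` to a scalar and calling ``backward()`` without arguments. The sketch below rebuilds ``a``, ``b`` and ``Q`` so that it runs on its own and checks this.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3 * a**3 - b**2

# Summing Q gives a scalar, so no explicit gradient argument is needed
Q.sum().backward()

print(a.grad == 9 * a**2)
print(b.grad == -2 * b)
```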


#### Vector calculus using Autograd

For a vector-valued function $\vec{y}=f(\vec{x})$, the gradient of $\vec{y}$ with
respect to $\vec{x}$ is the Jacobian matrix $J$:

\begin{align}J
     =
      \left(\begin{array}{ccc}
      \frac{\partial \vec{y}}{\partial x_{1}} &
      \cdots &
      \frac{\partial \vec{y}}{\partial x_{n}}
      \end{array}\right)
     =
     \left(\begin{array}{ccc}
      \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
      \vdots & \ddots & \vdots\\
      \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
      \end{array}\right)\end{align}

Let $\vec{v}$ be the gradient of a scalar function $l=g\left(\vec{y}\right)$ with respect to $\vec{y}$:

\begin{align}\vec{v}
   =
   \left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}\end{align}

By the chain rule, the vector-Jacobian product $J^{T}\cdot \vec{v}$ is the gradient of $l$ with respect to $\vec{x}$:

\begin{align}J^{T}\cdot \vec{v}=\left(\begin{array}{ccc}
      \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
      \vdots & \ddots & \vdots\\
      \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
      \end{array}\right)\left(\begin{array}{c}
      \frac{\partial l}{\partial y_{1}}\\
      \vdots\\
      \frac{\partial l}{\partial y_{m}}
      \end{array}\right)=\left(\begin{array}{c}
      \frac{\partial l}{\partial x_{1}}\\
      \vdots\\
      \frac{\partial l}{\partial x_{n}}
      \end{array}\right)\end{align}

The differentiation example above uses this property of the vector-Jacobian product: ``external_grad`` plays the role of $\vec{v}$.

``torch.autograd`` is an engine for computing the vector-Jacobian product $J^{T}\cdot \vec{v}$.
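
The following sketch compares the result of ``backward(gradient=v)`` with an explicitly computed $J^{T}\cdot \vec{v}$. The function ``f`` and the values of ``x`` and ``v`` are made up for illustration; the Jacobian is built with ``torch.autograd.functional.jacobian``.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical vector-valued function from R^3 to R^2
    return torch.stack([x[0] * x[1], x[1] ** 2 + x[2]])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.5, -1.0])  # plays the role of external_grad

# autograd computes J^T v directly, without materializing J
y = f(x)
y.backward(gradient=v)

# Explicit check: build the full Jacobian (shape 2x3), then multiply
J = jacobian(f, x.detach())
vjp_explicit = J.T @ v

print(x.grad)
print(vjp_explicit)
```

Materializing the full Jacobian is only practical for small inputs; here it serves as a cross-check on ``x.grad``.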

#### Computational Graph

- ``autograd`` records tensors and the operations executed on them in a directed acyclic graph (DAG)
- Setting the ``requires_grad`` flag of a tensor to ``False`` excludes it from the gradient computation DAG
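
One way to see whether a result was recorded in the graph is to inspect its ``grad_fn`` and ``is_leaf`` attributes. The sketch below uses made-up tensor names and is meant only as an illustration of the two bullet points above.

```python
import torch

x = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

tracked = (x + z).sum()    # depends on z, so autograd records the operations
untracked = (x + x).sum()  # no input requires gradients, so nothing is recorded

print(tracked.grad_fn)     # a backward node object
print(untracked.grad_fn)   # None
print(z.is_leaf, tracked.is_leaf)
```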





In [10]:
x=torch.rand(5,5)
y=torch.rand(5,5)
z=torch.rand((5,5),requires_grad=True)
a=x+y #neither input requires gradients
b=x+z #z requires gradients, so b does too
print(a.requires_grad,b.requires_grad)

False True


In [11]:
from torch import nn,optim
model=torchvision.models.resnet18(pretrained=True)
for param in model.parameters(): #frozen parameters
    param.requires_grad=False
model.fc=nn.Linear(512,10) #finetuning

In [12]:
optimizer=optim.SGD(model.fc.parameters(),lr=1e-2,momentum=0.9)
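
To check that freezing behaves as intended, the sketch below repeats the same pattern on a small ``nn.Sequential`` model. This model is a hypothetical stand-in for ``resnet18`` so that the example runs without downloading weights. After one optimizer step, only the replaced head should have gradients, and the frozen layer should be unchanged.

```python
import torch
from torch import nn, optim

# Assumption: a small stand-in for a pretrained backbone followed by a head
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
for param in model.parameters():  # freeze everything
    param.requires_grad = False
model[2] = nn.Linear(16, 10)      # new head; its parameters require gradients by default

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)

optimizer = optim.SGD(model[2].parameters(), lr=1e-2, momentum=0.9)
frozen_before = model[0].weight.clone()

loss = model(torch.rand(2, 8)).sum()
loss.backward()
optimizer.step()

print(model[0].weight.grad)                       # frozen layer receives no gradient
print(torch.equal(model[0].weight, frozen_before))
```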