PyTorch's Official Beginner Tutorial: Deep Learning with PyTorch: A 60 Minute Blitz

A GENTLE INTRODUCTION TO TORCH.AUTOGRAD

https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html

torch.autograd is PyTorch's automatic differentiation engine.

The following code runs only on the CPU; it will not work on GPU devices.

In [6]:
import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights =  ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)

Run the data through the model to get a prediction (the forward pass).

In [7]:
prediction = model(data)

Now we perform backpropagation: we first compute the loss, then call .backward() on it. Autograd computes the gradient of the loss with respect to each model parameter and stores it in that parameter's .grad attribute.

In [8]:
loss = (prediction - labels).sum()
loss.backward()

tensor(-494.4492, grad_fn=<SumBackward0>)
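To see where the gradients end up, here is a minimal sketch. It uses a single `torch.nn.Linear` layer as a stand-in for `resnet18`, an assumption made so that no pretrained weights need to be downloaded. The names `tiny_model`, `x`, and `y` are illustrative only and do not appear in the tutorial.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for resnet18: one linear layer (4 inputs -> 2 outputs).
tiny_model = torch.nn.Linear(4, 2)
x = torch.rand(1, 4)
y = torch.rand(1, 2)

# Before any backward pass, no gradients have been accumulated.
grads_before = [p.grad for p in tiny_model.parameters()]

# Same toy loss as in the cell above.
loss = (tiny_model(x) - y).sum()
loss.backward()

# After backward(), each parameter's .grad has the same shape as the parameter.
grad_shapes = [tuple(p.grad.shape) for p in tiny_model.parameters()]

print(grads_before)   # both entries are None
print(grad_shapes)    # shapes of the weight and bias gradients
```

Each `.grad` tensor matches its parameter in shape, which is what lets an optimizer apply the update element by element.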


Next, load an optimizer and register all of the model's parameters with it. Here we use SGD with a learning rate of 0.01 and a momentum of 0.9.

In [10]:
optim = torch.optim.SGD(model.parameters(), lr = 1e-2, momentum = 0.9)

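Finally, call .step() to initiate gradient descent. The optimizer adjusts each parameter using the gradient stored in its .grad attribute.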

In [11]:
optim.step()
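The steps so far (forward pass, loss, `.backward()`, `.step()`) can be put together in one small sketch. As an assumption for illustration, a `torch.nn.Linear` layer replaces `resnet18` so the example runs without downloading weights; `tiny_model`, `x`, `y`, and `weight_before` are hypothetical names. The call to `optim.zero_grad()` is not in the cells above; it is included because gradients accumulate across `.backward()` calls.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for resnet18 so the sketch runs without downloads.
tiny_model = torch.nn.Linear(4, 2)
optim = torch.optim.SGD(tiny_model.parameters(), lr=1e-2, momentum=0.9)

x = torch.rand(8, 4)
y = torch.rand(8, 2)

weight_before = tiny_model.weight.detach().clone()

optim.zero_grad()                     # clear gradients left over from earlier steps
loss = (tiny_model(x) - y).sum()      # same toy loss as in the text
loss.backward()                       # populate .grad on every parameter
optim.step()                          # update the parameters in place

# On the very first step the momentum buffer equals the gradient,
# so the update reduces to plain gradient descent: w <- w - lr * grad.
expected = weight_before - 1e-2 * tiny_model.weight.grad
print(torch.allclose(tiny_model.weight.detach(), expected, atol=1e-6))
```

On later steps the momentum buffer also carries a decayed sum of earlier gradients, so the simple `w - lr * grad` check holds only for the first update.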

Differentiation in autograd. We create two tensors, a and b, with requires_grad=True, which signals to autograd that every operation on them should be tracked.

In [12]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Create another tensor Q defined by a and b

\begin{align*}
Q = 3a^3-b^2
\end{align*}

In [13]:
Q = 3*a**3 - b**2
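A quick way to check what "tracked" means is to inspect the tensors themselves. The sketch below repeats the definitions of `a`, `b`, and `Q` so that it is self-contained; the tensor `c` is a hypothetical addition used only for contrast.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2

# Tensors created directly by the user are leaves and have no grad_fn.
print(a.is_leaf, a.grad_fn)

# Q was produced by tracked operations, so it requires grad and
# carries a grad_fn node that knows how to backpropagate through it.
print(Q.requires_grad, Q.grad_fn)

# Contrast (hypothetical tensor c): without requires_grad=True on the
# input, the result of an operation is not tracked.
c = torch.tensor([1., 2.])
print((c * 2).requires_grad)
```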

Assume a and b are parameters of the neural network and Q is the error. In training, we want the gradients of the error w.r.t. parameters, i.e.

\begin{align*}
\frac{\partial Q}{\partial a} = 9a^2
\end{align*}

\begin{align*}
\frac{\partial Q}{\partial b} = -2b
\end{align*}
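As a worked check, substituting the values $a = (2, 3)$ and $b = (6, 4)$ used in this example into the two formulas gives the gradients we should expect autograd to produce:

\begin{align*}
\frac{\partial Q}{\partial a} &= 9a^2 = (9 \cdot 2^2,\ 9 \cdot 3^2) = (36,\ 81)\\
\frac{\partial Q}{\partial b} &= -2b = (-2 \cdot 6,\ -2 \cdot 4) = (-12,\ -8)
\end{align*}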




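When we call .backward() on Q, autograd computes these gradients and stores them in a.grad and b.grad. Because Q is a vector rather than a scalar, we need to pass a gradient argument explicitly to Q.backward(). gradient is a tensor of the same shape as Q, and it represents the gradient of Q w.r.t. itself, i.e.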

\begin{align*}
\frac{dQ}{dQ}=1
\end{align*}

Equivalently, we can aggregate Q into a scalar and then call .backward(), i.e., Q.sum().backward()

In [14]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient = external_grad)

The gradients are now stored in a.grad and b.grad, and we can check them against the analytic formulas.

In [18]:
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])
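The equivalent route mentioned above, reducing Q to a scalar before calling `.backward()`, can be sketched as follows. The tensors are re-created here so the block is self-contained and starts with empty `.grad` attributes.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# Reduce Q to a scalar first; backward() then needs no gradient argument.
Q = 3*a**3 - b**2
Q.sum().backward()

print(a.grad)  # matches 9*a**2, i.e. the values 36 and 81
print(b.grad)  # matches -2*b, i.e. the values -12 and -8
```

Summing works here because the derivative of `Q.sum()` with respect to each element of Q is 1, which is exactly the `external_grad` tensor of ones passed in the earlier cell.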


### Optional Reading - Vector Calculus using ``autograd``

Mathematically, if you have a vector valued function
$\vec{y}=f(\vec{x})$, then the gradient of $\vec{y}$ with
respect to $\vec{x}$ is a Jacobian matrix $J$:

\begin{align*}J
     =
      \left(\begin{array}{ccc}
      \frac{\partial \mathbf{y}}{\partial x_{1}} &
      \cdots &
      \frac{\partial \mathbf{y}}{\partial x_{n}}
      \end{array}\right)
     =
     \left(\begin{array}{ccc}
      \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
      \vdots & \ddots & \vdots\\
      \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
      \end{array}\right)
\end{align*}

Generally speaking, ``torch.autograd`` is an engine for computing
vector-Jacobian products. That is, given any vector $\vec{v}$, it computes the product
$J^{T}\cdot \vec{v}$.

If $\vec{v}$ happens to be the gradient of a scalar function $l=g\left(\vec{y}\right)$:

\begin{align*}\vec{v}
   =
   \left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}
   \end{array}\right)^{T}
\end{align*}

then by the chain rule, the vector-Jacobian product would be the
gradient of $l$ with respect to $\vec{x}$:

\begin{align*}
J^{T}\cdot \vec{v}=
\left(\begin{array}{ccc}
      \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
      \vdots & \ddots & \vdots\\
      \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
      \end{array}\right)
      \left(\begin{array}{c}
      \frac{\partial l}{\partial y_{1}}\\
      \vdots\\
      \frac{\partial l}{\partial y_{m}}
      \end{array}\right)=
      \left(\begin{array}{c}
      \frac{\partial l}{\partial x_{1}}\\
      \vdots\\
      \frac{\partial l}{\partial x_{n}}
      \end{array}\right)
\end{align*}

This property of the vector-Jacobian product is what we use in the example above;
``external_grad`` represents $\vec{v}$.
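To make the vector-Jacobian product concrete, the sketch below builds $J$ explicitly with `torch.autograd.functional.jacobian`, forms $J^{T}\cdot \vec{v}$ by hand, and compares it with what `.backward(gradient=v)` stores in `x.grad`. The function `f` and the values of `x` and `v` are hypothetical, chosen only for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical vector-valued function from R^3 to R^2.
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.tensor([1., 2., 3.], requires_grad=True)
v = torch.tensor([1., 0.5])

# Explicit Jacobian J with shape (m, n) = (2, 3), then J^T v by hand.
J = jacobian(f, x.detach())
manual_vjp = J.T @ v

# The same product from autograd, without ever materializing J.
y = f(x)
y.backward(gradient=v)

print(J)
print(manual_vjp)
print(x.grad)
```

For this `f`, the Jacobian rows are $(x_2, x_1, 0)$ and $(0, 1, 2x_3)$, so both `manual_vjp` and `x.grad` should hold the values 2, 1.5, and 3. In practice autograd never forms $J$; it only propagates $\vec{v}$ backward through the graph, which is far cheaper for large models.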
