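`torch.autograd` is PyTorch's automatic differentiation engine.

Training a neural network (NN) happens in two steps:
1. Forward propagation: the model runs the input through its layers to produce a prediction.
2. Backward propagation: the model's parameters are adjusted using the gradients of the error w.r.t. each parameter.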

- We load a pretrained resnet18 model from `torchvision`.
- We create a random data tensor to represent a single image with 3 channels and a height and width of 64.
- The corresponding label is initialized to random values.
- The label for this pretrained model has shape (1, 1000), one value per output class.

In [1]:
import torch
from torchvision.models import resnet18, ResNet18_Weights

In [2]:
model = resnet18(weights = ResNet18_Weights.DEFAULT)



In [3]:
data = torch.rand(1,3,64,64)
labels = torch.rand(1,1000)

In [4]:
# forward pass
prediction = model(data)

- We use the model's prediction and the labels to calculate the loss.
- After calculating the loss, we backpropagate it through the network. Backpropagation starts when we call `.backward()` on the loss tensor.
- Autograd then calculates the gradient for each model parameter and stores it in the parameter's `.grad` attribute.

In [6]:
loss = (prediction - labels).sum()
loss.backward()
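
To see what `.backward()` leaves behind, here is a minimal sketch that inspects `.grad` before and after the call. It assumes a tiny `nn.Linear` layer (named `tiny_model` here, a hypothetical stand-in for resnet18 so that nothing has to be downloaded); the shapes and variable names are illustrative only.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for resnet18: a single linear layer.
tiny_model = nn.Linear(4, 3)
inputs = torch.rand(1, 4)
targets = torch.rand(1, 3)

# Before any backward pass, no gradients have been stored.
grads_before = [p.grad for p in tiny_model.parameters()]

# Same style of loss as in the cell above.
loss = (tiny_model(inputs) - targets).sum()
loss.backward()

# After backward(), each parameter holds a gradient with the parameter's own shape.
for name, param in tiny_model.named_parameters():
    print(name, tuple(param.shape), tuple(param.grad.shape))
```

The same loop over `model.named_parameters()` should work for the resnet18 model above, just with many more entries.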

In [7]:
optimizer = torch.optim.SGD(model.parameters(),lr = 1e-3, momentum = 0.9)

In [8]:
# calling optimizer.step() initiates gradient descent
optimizer.step()
# the optimizer adjusts each parameter using the gradient stored in its .grad attribute
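
The steps above can be put together into one training step. The following is a sketch under stated assumptions: `tiny_model` is again a hypothetical `nn.Linear` stand-in for resnet18, and the `optimizer.zero_grad()` call is an addition that does not appear in the cells above. It is included because gradients accumulate in `.grad` across backward calls, so they are usually cleared before each new step.

```python
import torch
from torch import nn

torch.manual_seed(0)

tiny_model = nn.Linear(4, 3)  # hypothetical stand-in for resnet18
optimizer = torch.optim.SGD(tiny_model.parameters(), lr=1e-3, momentum=0.9)

inputs = torch.rand(1, 4)
targets = torch.rand(1, 3)

weights_before = tiny_model.weight.detach().clone()

optimizer.zero_grad()                        # clear gradients left over from earlier steps
loss = (tiny_model(inputs) - targets).sum()  # forward pass and loss
loss.backward()                              # populate .grad on every parameter
optimizer.step()                             # update parameters using .grad

# On the very first step the momentum buffer equals the gradient,
# so the update reduces to: weight = weight - lr * grad
expected = weights_before - 1e-3 * tiny_model.weight.grad
print(torch.allclose(tiny_model.weight, expected))
```

In a real training loop these four calls would typically be repeated once per batch.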

## Differentiation in Autograd
Let's take a look at how ``autograd`` collects gradients. We create two tensors ``a`` and ``b`` with
``requires_grad=True``. This signals to ``autograd`` that every operation on them should be tracked.



In [9]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

We create another tensor ``Q`` from ``a`` and ``b``.

\begin{align}Q = 3a^3 - b^2\end{align}



In [10]:
Q = 3*a**3 - b**2
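
One way to see the tracking in action is to look at the `requires_grad`, `is_leaf` and `grad_fn` attributes. This is a small sketch; the tensor `c` is a hypothetical extra tensor added only for contrast and is not part of the example above.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
c = torch.tensor([1., 1.])  # hypothetical extra tensor; requires_grad defaults to False

Q = 3*a**3 - b**2

# Tensors created directly by the user are leaves and have no grad_fn.
print(a.is_leaf, a.grad_fn)

# Q was produced by tracked operations, so it records the function that created it.
print(Q.requires_grad, Q.grad_fn)

# Operations that involve only untracked tensors are not recorded.
print((c * 2).requires_grad)
```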

Let's assume ``a`` and ``b`` to be parameters of an NN, and ``Q``
to be the error. In NN training, we want gradients of the error
w.r.t. parameters, i.e.

\begin{align}\frac{\partial Q}{\partial a} = 9a^2\end{align}

\begin{align}\frac{\partial Q}{\partial b} = -2b\end{align}


When we call ``.backward()`` on ``Q``, autograd calculates these gradients
and stores them in the respective tensors' ``.grad`` attribute.
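
As a worked check, substituting the values of ``a`` and ``b`` defined earlier, $a = (2, 3)$ and $b = (6, 4)$, into the two partial derivatives gives the values we would expect to find in ``a.grad`` and ``b.grad``:

\begin{align}\frac{\partial Q}{\partial a} = 9a^2 = (9 \cdot 2^2,\ 9 \cdot 3^2) = (36,\ 81)\end{align}

\begin{align}\frac{\partial Q}{\partial b} = -2b = (-2 \cdot 6,\ -2 \cdot 4) = (-12,\ -8)\end{align}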

We need to explicitly pass a ``gradient`` argument in ``Q.backward()`` because ``Q`` is a vector rather than a scalar.
``gradient`` is a tensor of the same shape as ``Q``, and it represents the
gradient of Q w.r.t. itself, i.e.

\begin{align}\frac{dQ}{dQ} = 1\end{align}

Equivalently, we can also aggregate Q into a scalar and call backward implicitly, like ``Q.sum().backward()``.




In [11]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

In [12]:
# check if collected gradients are correct
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])
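
The equivalence between passing ``gradient`` explicitly and calling ``Q.sum().backward()`` can also be checked directly. This is a minimal sketch; the helper `grads_via` is a hypothetical name introduced here, and it rebuilds ``a``, ``b`` and ``Q`` on each call so that gradients from the two runs do not accumulate into each other.

```python
import torch

def grads_via(backward_call):
    """Rebuild a, b and Q, run the given backward call, and return (a.grad, b.grad)."""
    a = torch.tensor([2., 3.], requires_grad=True)
    b = torch.tensor([6., 4.], requires_grad=True)
    Q = 3*a**3 - b**2
    backward_call(Q)
    return a.grad, b.grad

# Explicit: pass dQ/dQ = 1 for every element of Q.
explicit = grads_via(lambda Q: Q.backward(gradient=torch.ones_like(Q)))

# Implicit: reduce Q to a scalar first, then call backward without arguments.
implicit = grads_via(lambda Q: Q.sum().backward())

print(torch.equal(explicit[0], implicit[0]))
print(torch.equal(explicit[1], implicit[1]))
```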
