# Autograd in PyTorch

In [1]:
import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights = ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)



## Forward pass

- Input data is passed to the model and, after going through each of the layers, a prediction is made.

In [2]:
prediction = model(data)

In [None]:
prediction.size()

torch.Size([1, 1000])

In [10]:
labels.size()

torch.Size([1, 1000])

## Backward Propagation

- Here the loss is simply the sum of the differences between the predicted values and their corresponding truth values. Calling `.backward()` on it computes the gradient of the loss with respect to every model parameter.

In [3]:
loss = (prediction - labels).sum()

loss.backward()
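
To see what `loss.backward()` leaves behind, here is a minimal sketch on a tiny stand-in model. The names `tiny_model`, `x`, `y`, and `tiny_loss` are hypothetical and not part of the notebook above; a small `nn.Linear` is assumed in place of `resnet18` so that the gradients are easy to inspect.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for resnet18: one linear layer with 4 inputs and 2 outputs.
tiny_model = nn.Linear(4, 2)
x = torch.rand(1, 4)   # stand-in for `data`
y = torch.rand(1, 2)   # stand-in for `labels`

# Before any backward pass, parameters have no gradient stored.
grads_before = [p.grad for p in tiny_model.parameters()]

# Same loss as above: sum of (prediction - label).
tiny_loss = (tiny_model(x) - y).sum()
tiny_loss.backward()

# After backward, every parameter holds a gradient of its own shape.
grad_shapes = {name: tuple(p.grad.shape) for name, p in tiny_model.named_parameters()}
print(grads_before)   # [None, None]
print(grad_shapes)    # {'weight': (2, 4), 'bias': (2,)}
```

For this particular loss the bias gradient is a vector of ones, because each output contributes to the sum with coefficient 1.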

## Optimization

In [11]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum = 0.9)

## Gradient Descent Step

In [12]:
optim.step()
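
The effect of `optim.step()` can be checked by hand on a small model. The sketch below is illustrative only: `tiny_model`, `sgd`, and the other names are hypothetical, and a call to `zero_grad()` is included because gradients accumulate across backward passes in a real training loop. On the very first step the momentum buffer equals the gradient, so the update should reduce to `param - lr * grad`.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in model and data.
tiny_model = nn.Linear(4, 2)
x = torch.rand(1, 4)
y = torch.rand(1, 2)

lr = 1e-2
sgd = torch.optim.SGD(tiny_model.parameters(), lr=lr, momentum=0.9)

sgd.zero_grad()                          # clear any previously accumulated gradients
tiny_loss = (tiny_model(x) - y).sum()    # forward pass and loss
tiny_loss.backward()                     # populate .grad on each parameter

# Snapshot the weight and its gradient before the update.
weight_before = tiny_model.weight.detach().clone()
weight_grad = tiny_model.weight.grad.detach().clone()

sgd.step()                               # apply one SGD update in place

# First step with momentum: the buffer is just the gradient.
expected = weight_before - lr * weight_grad
step_matches = torch.allclose(tiny_model.weight.detach(), expected)
print(step_matches)   # True
```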

## Differentiation

In [18]:
a = torch.tensor([2., 3.], requires_grad = True)
b = torch.tensor([6., 4.], requires_grad = True)

In [19]:
Q = 3*a**3 - b**2

$$ Q = 3a^3 - b^2  $$

In [20]:
Q

tensor([-12.,  65.], grad_fn=<SubBackward0>)

In [21]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

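- `Q` is a vector, so `.backward()` needs an explicit `gradient` argument with the same shape as `Q`. Here it represents the gradient of `Q` with respect to itself:
$$ \frac{dQ}{dQ} = 1 $$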

- When `.backward()` is called on `Q`, autograd calculates the gradients below and stores them in the respective tensors' `.grad` attribute.

$$ \frac{dQ}{da} = 9a^2$$

$$ \frac{dQ}{db} = -2b $$

In [22]:
print(a.grad)
print(b.grad)

tensor([36., 81.])
tensor([-12.,  -8.])


In [23]:
print(9*a**2)
print(-2*b)

tensor([36., 81.], grad_fn=<MulBackward0>)
tensor([-12.,  -8.], grad_fn=<MulBackward0>)
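
As a cross-check, passing a vector of ones as `gradient` should give the same result as reducing `Q` to a scalar with `.sum()` and calling `.backward()` without arguments. The sketch below recreates `a` and `b` from scratch so that the gradients are not accumulated on top of the ones computed earlier.

```python
import torch

# Fresh leaf tensors with the same values as above.
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3*a**3 - b**2

# A scalar needs no explicit `gradient` argument.
Q.sum().backward()

print(a.grad)   # tensor([36., 81.])
print(b.grad)   # tensor([-12.,  -8.])
```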


- In a forward pass, autograd does two things simultaneously:

    - run the requested operation to compute a resulting tensor, and

    - maintain the operation’s gradient function in the DAG (Directed Acyclic Graph).

- The backward pass kicks off when `.backward()` is called on the DAG root. autograd then:

    - computes the gradients from each `.grad_fn`,

    - accumulates them in the respective tensor's `.grad` attribute, and

    - using the chain rule, propagates all the way to the leaf tensors.

- `torch.autograd` tracks operations on all tensors which have their `requires_grad` flag set to `True`. For tensors that don't require gradients, setting this attribute to `False` excludes them from the gradient computation DAG.

- Parameters for which gradients are not computed (`requires_grad=False`) are called __frozen parameters__.
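
The tracking rules in the list above can be inspected directly on small tensors. This is a minimal sketch with hypothetical names (`x`, `w`, `y`, `z`): a result is tracked only if at least one of its inputs requires gradients, and only tracked results carry a `grad_fn`.

```python
import torch

x = torch.rand(3)                       # requires_grad defaults to False
w = torch.rand(3, requires_grad=True)   # leaf tensor tracked by autograd

y = x * 2            # no tracked input, so y is outside the graph
z = (w * x).sum()    # depends on w, so z is tracked and has a grad_fn

print(y.requires_grad, y.grad_fn)               # False None
print(z.requires_grad, z.grad_fn is not None)   # True True
print(w.is_leaf, z.is_leaf)                     # True False

# Backward from the root z propagates to the leaf w only.
z.backward()
print(torch.equal(w.grad, x))   # True, since dz/dw = x
print(x.grad)                   # None, x was never tracked
```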

**In finetuning, we freeze most of the model and typically only modify the classifier layers to make predictions on new labels.**
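
A minimal sketch of that freezing pattern follows. The two-part network here (`backbone`, `classifier`, `net`) is a hypothetical stand-in rather than the `resnet18` loaded earlier; with torchvision's `resnet18` the same idea would be applied by freezing all parameters and replacing the final `fc` layer.

```python
import torch
from torch import nn

# Hypothetical "pretrained" feature extractor.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())

# Freeze it: autograd will not compute gradients for these parameters.
for param in backbone.parameters():
    param.requires_grad = False

# New classifier head for 3 hypothetical labels; trainable by default.
classifier = nn.Linear(16, 3)
net = nn.Sequential(backbone, classifier)

# Only the classifier's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(classifier.parameters(), lr=1e-2, momentum=0.9)

out = net(torch.rand(4, 8))
out.sum().backward()
optimizer.step()

frozen_have_grads = any(p.grad is not None for p in backbone.parameters())
head_has_grads = all(p.grad is not None for p in classifier.parameters())
print(frozen_have_grads, head_has_grads)   # False True
```

Freezing also saves work in the backward pass, since autograd skips the gradient computation for the frozen parameters.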