# autograd

### input initialization
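data is a single random 3-channel 64x64 image, stored as a tensor of shape (1, 3, 64, 64).

labels is a random tensor of shape (1, 1000): one random value for each of the 1000 classes that the pretrained resnet18 predicts, not a single class index.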

In [2]:
import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)

In [3]:
# forward pass: run the input through the model to get its prediction
prediction = model(data)

# backward pass
loss = (prediction - labels).sum()  # toy error: sum of the differences
loss.backward()  # backpropagate; autograd stores each parameter's gradient in its .grad attribute

# optimizer
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)  # register the model's parameters with the optimizer
optim.step()  # gradient descent: update each parameter using the gradient stored in .grad
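
The cell above runs a single update. When the step is repeated, gradients accumulate in .grad across backward calls, so they are usually cleared before each backward pass. The following is a minimal sketch of one such step, not the tutorial's own code: it assumes a tiny torch.nn.Linear model as a hypothetical stand-in for resnet18, so that nothing has to be downloaded, and the shapes are chosen only for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for resnet18: a tiny linear model with 4 inputs and 3 outputs.
model = torch.nn.Linear(4, 3)
data = torch.rand(1, 4)
labels = torch.rand(1, 3)

optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# Keep a copy of the parameters so the effect of the update can be checked.
before = [p.detach().clone() for p in model.parameters()]

optim.zero_grad()                    # clear gradients left over from any earlier step
prediction = model(data)             # forward pass
loss = (prediction - labels).sum()   # same toy error as in the cell above
loss.backward()                      # fills .grad for every parameter
optim.step()                         # SGD update using the values in .grad

# True where a parameter tensor was modified by the step.
changed = [not torch.equal(b, p.detach()) for b, p in zip(before, model.parameters())]
print(changed)
```

Creating the optimizer before the loop and calling zero_grad() at the top of each iteration is a common ordering; the single-step cell above works without it only because no gradients exist yet.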

### differentiation in autograd
Create two tensors a and b with **requires_grad=True**, which signals to autograd that every operation on them should be tracked.

Then create another tensor Q from a and b: $Q=3a^3-b^2$

In [4]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3*a**3 - b**2


Let’s assume a and b to be parameters of an NN, and Q to be the error. In NN training, we want gradients of the error w.r.t. parameters, i.e.

$\frac{\partial Q}{\partial a} = 9a^2$

$\frac{\partial Q}{\partial b} = -2b$

When we call .backward() on Q, autograd calculates these gradients and stores them in the respective tensors’ .grad attribute.

We need to explicitly pass a gradient argument to Q.backward() because Q is a vector, not a scalar. The gradient argument is a tensor of the same shape as Q, and it represents the gradient of Q w.r.t. itself, i.e.

$\frac{dQ}{dQ} = 1$

Equivalently, we can aggregate Q into a scalar and call backward without a gradient argument, like Q.sum().backward().
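
To make the equivalence concrete, here is a small sketch that computes the gradients both ways and compares them. The helper grads is a hypothetical name introduced only for this illustration; it rebuilds a and b on each call so that gradients from one run do not accumulate into the next. The last call passes a non-uniform gradient argument to show that it acts as a per-element weight on Q.

```python
import torch

def grads(backward_call):
    # Fresh leaf tensors on every call, so .grad starts out empty each time.
    a = torch.tensor([2., 3.], requires_grad=True)
    b = torch.tensor([6., 4.], requires_grad=True)
    Q = 3 * a**3 - b**2
    backward_call(Q)
    return a.grad, b.grad

# Explicit gradient argument of ones, same shape as Q.
ga_vec, gb_vec = grads(lambda Q: Q.backward(gradient=torch.ones_like(Q)))

# Reduce Q to a scalar first, then call backward with no argument.
ga_sum, gb_sum = grads(lambda Q: Q.sum().backward())

# Non-uniform weights: each element of Q contributes in proportion to its weight.
ga_w, gb_w = grads(lambda Q: Q.backward(gradient=torch.tensor([1., 0.5])))

print(torch.equal(ga_vec, ga_sum), torch.equal(gb_vec, gb_sum))  # True True
print(ga_vec, gb_vec)  # tensor([36., 81.]) tensor([-12.,  -8.])
print(ga_w, gb_w)      # tensor([36.0000, 40.5000]) tensor([-12.,  -4.])
```

With a = [2, 3] and b = [6, 4], the analytic values are $9a^2 = [36, 81]$ and $-2b = [-12, -8]$, which is what both calls produce.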

In [5]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

The gradients are now stored in a.grad and b.grad.

In [6]:
# check if collected gradients are correct
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])
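
The exact == comparison succeeds here because every value involved is a small integer that float32 represents exactly. For arbitrary inputs, a tolerance-based comparison is the safer check. The sketch below repeats the computation in a self-contained form and uses torch.allclose; the names expected_a and expected_b are introduced only for this illustration.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3 * a**3 - b**2
Q.sum().backward()

# Analytic gradients, computed without recording them in the autograd graph.
with torch.no_grad():
    expected_a = 9 * a**2
    expected_b = -2 * b

print(a.grad, b.grad)
# allclose tolerates small floating-point differences, unlike elementwise ==.
print(torch.allclose(a.grad, expected_a), torch.allclose(b.grad, expected_b))
```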
