- `torch.autograd`: PyTorch's **automatic differentiation engine** that powers NN training
- In this section, we will get an understanding of how **autograd** helps a NN train.

**Background**
- NN: a collection of nested **functions** that are executed on some input data
- These functions are defined by **parameters** (weights, biases, ...), which PyTorch stores in **tensors**.
- Training a NN happens in 2 steps:
1. **Forward Propagation**
    - NN makes its best **guess** about the correct output.
    - It runs the input data through each of its functions to make this guess.
2. **Backward Propagation**
    - NN adjusts its parameters in proportion to the **error in its guess**.
    - It does this by traversing backwards from the output and collecting the derivatives of the error with respect to the parameters of the functions (**gradients**).
    - It then optimizes the parameters using **gradient descent**.
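
The two steps above can be sketched on a one-parameter model. This is a minimal illustration and not part of the original walkthrough; the names `w`, `x`, `y_true` and `lr`, and the squared-error loss, are assumptions chosen so the gradient can be checked by hand.

```python
import torch

# Hypothetical one-parameter "network": prediction = w * x
w = torch.tensor(2.0, requires_grad=True)  # parameter tracked by autograd
x = torch.tensor(3.0)                      # input data
y_true = torch.tensor(9.0)                 # correct output
lr = 0.01                                  # learning rate (assumed value)

# Forward propagation: make a guess and measure its error
y_pred = w * x
loss = (y_pred - y_true) ** 2

# Backward propagation: autograd stores d(loss)/dw in w.grad
loss.backward()

# By hand: d(loss)/dw = 2 * (w*x - y_true) * x = 2 * (6 - 9) * 3 = -18
grad_w = w.grad.item()

# Gradient descent: move the parameter against its gradient
with torch.no_grad():
    w -= lr * w.grad
w_after = w.item()  # 2 - 0.01 * (-18) = 2.18
```

The update is wrapped in `torch.no_grad()` so that the parameter change itself is not recorded as part of the computation graph.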

**Usage in PyTorch**
- load a pretrained **resnet 18 model** from `torchvision`
- create a **random data tensor** (represents a single image with 3 channels, height & width of 64) and its **corresponding label** initialized to some random values
- the label for this pretrained model has shape (1, 1000): one value per ImageNet class

In [1]:
import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)

- **Forward pass**: run the input data through each of the model's layers to make a prediction

In [2]:
prediction = model(data) # forward pass

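- use the model's prediction and the corresponding label to calculate the error (loss)
- backpropagate this error through the network
- backward propagation starts when we call `.backward()` on the error tensor
- autograd then calculates the gradient for each model parameter and stores it in the parameter's `.grad` attribute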


In [3]:
loss = (prediction - labels).sum()
loss.backward() # backward pass
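
To see what `.backward()` leaves behind, the sketch below inspects `.grad` before and after the backward pass. It swaps in a small `torch.nn.Linear` layer as a stand-in for resnet18 so that it runs without downloading weights; the stand-in model and its sizes are assumptions, not part of the walkthrough above.

```python
import torch

torch.manual_seed(0)

# Small stand-in for resnet18 (assumption: avoids the weight download)
model = torch.nn.Linear(4, 2)
data = torch.rand(1, 4)
labels = torch.rand(1, 2)

# Before any backward pass, no gradients have been stored
grads_before = [p.grad for p in model.parameters()]  # [None, None]

prediction = model(data)            # forward pass
loss = (prediction - labels).sum()  # same toy loss as above
loss.backward()                     # backward pass

# Each parameter now holds a gradient with the same shape as itself
shapes_match = all(p.grad.shape == p.shape for p in model.parameters())

# With a plain sum as the loss, d(loss)/d(bias) is 1 for every output
bias_grad = model.bias.grad
```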

- load an optimizer (SGD) with a learning rate of 0.01 and momentum of 0.9
- register all the parameters of the model in the optimizer

In [4]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

- call `.step()` to initiate gradient descent
- the optimizer adjusts each parameter using the gradient stored in its `.grad` attribute

In [5]:
optim.step() # gradient descent
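
The effect of a single `.step()` can be checked directly. The sketch below again uses a small `torch.nn.Linear` stand-in model (an assumption, chosen to keep it self-contained) and compares the updated bias with `p - lr * grad`. It also calls `optim.zero_grad()` before the backward pass, which is common practice in a training loop because gradients otherwise accumulate across iterations.

```python
import torch

torch.manual_seed(0)

model = torch.nn.Linear(4, 2)  # stand-in model (assumption)
data = torch.rand(1, 4)
labels = torch.rand(1, 2)
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

optim.zero_grad()                      # clear any previously accumulated gradients
loss = (model(data) - labels).sum()    # forward pass and toy loss
loss.backward()                        # backward pass fills .grad

bias_before = model.bias.detach().clone()
bias_grad = model.bias.grad.clone()

optim.step()                           # gradient descent

# On the very first step the momentum buffer equals the gradient,
# so the update reduces to p_new = p - lr * grad
expected = bias_before - 1e-2 * bias_grad
first_step_matches = torch.allclose(model.bias.detach(), expected)
```

On later steps the momentum term mixes in earlier gradients, so this simple equality is only expected to hold for the first update.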