## Intro to `torch.autograd`

### Background
Training a neural network (NN) happens in two broad steps:
1. Forward propagation: the NN runs the input data through each of its functions to make its best guess about the correct output, based on its current weights and biases.  
  
2. Backward propagation: the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the weights and biases (called gradients), and then optimizing the parameters using gradient descent.
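The two steps can be traced by hand on a tiny model. The sketch below assumes a toy linear model `w * x + b`, made-up data that follows `y = 2x + 1`, and a mean squared error. None of these come from the example that follows. It runs one forward pass, lets autograd compute the gradients, and then applies a single gradient descent update manually.

```python
import torch

# Toy data (an assumption for illustration): y = 2x + 1, no noise
x = torch.tensor([1.0, 2.0, 3.0])
y = 2 * x + 1

# Parameters start at zero; requires_grad=True tells autograd to track them
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Forward propagation: make a guess with the current parameters
guess = w * x + b
error = ((guess - y) ** 2).mean()  # mean squared error

# Backward propagation: autograd fills w.grad and b.grad with
# d(error)/dw and d(error)/db
error.backward()

# Gradient descent: move each parameter against its gradient.
# no_grad() keeps this update itself out of the autograd graph.
lr = 0.1
with torch.no_grad():
    w -= lr * w.grad
    b -= lr * b.grad
```

Here `w.grad` is -68/3 and `b.grad` is -10, so after this one update `w` is about 2.27 and `b` is 1.0, and the error drops from about 27.7 to about 0.33.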

### Usage in PyTorch

- We look at a single training step
- Load a pretrained ResNet-18 model from `torchvision`
- Create a random data tensor representing a single image with 3 channels and a height and width of 64
- Initialize the corresponding label to random values, with shape (1, 1000) to match the model's 1000 output classes

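```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

# Pretrained ResNet-18 (the weights are downloaded on first use)
model = resnet18(weights=ResNet18_Weights.DEFAULT)
# One random RGB image: (batch, channels, height, width)
data = torch.rand(1, 3, 64, 64)
# Random target values, one per output class
labels = torch.rand(1, 1000)
```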

##### Forward Pass

- Run the input data through the model to make a prediction

```python
# Forward pass: the prediction has shape (1, 1000)
prediction = model(data)
```
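The forward pass is also where autograd records the operations it will later differentiate. A minimal sketch, assuming a small `nn.Linear` layer as a stand-in for `resnet18` so nothing needs to be downloaded, shows that the output carries a `grad_fn` node, while an output computed under `torch.no_grad()` does not.

```python
import torch
from torch import nn

# A small stand-in for resnet18 (an assumption, so no weights are downloaded)
model = nn.Linear(in_features=4, out_features=3)
data = torch.rand(2, 4)  # a batch of 2 samples with 4 features each

prediction = model(data)

# The output remembers the operation that produced it; backward() later
# walks these recorded operations in reverse
print(prediction.shape)               # torch.Size([2, 3])
print(prediction.requires_grad)       # True
print(prediction.grad_fn is not None) # True

# Under torch.no_grad() no graph is recorded, which suits inference
with torch.no_grad():
    inference_out = model(data)
print(inference_out.requires_grad)    # False
```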

##### Back Propagation

- Use the model's prediction and the corresponding label to calculate the error (`loss`)
  
- Backpropagation is triggered when we call `.backward()` on the error tensor
  
- Autograd then calculates and stores the gradients for each model parameter in the parameter's `.grad` attribute

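```python
# A simple error measure: the summed difference between prediction and label
loss = (prediction - labels).sum()
# Backpropagate: fills .grad for every parameter that requires gradients
loss.backward()
```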
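What `.backward()` leaves behind can be checked directly. This sketch again assumes a small `nn.Linear` stand-in and random data of made-up sizes. For this particular loss the gradients have a closed form, so they can be compared with autograd's result. The last lines show that a second backward pass adds to `.grad` instead of overwriting it.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 3)  # stand-in for resnet18 (an assumption)
data = torch.rand(2, 4)
labels = torch.rand(2, 3)

# Before any backward pass the .grad attributes are empty
print(model.weight.grad, model.bias.grad)  # None None

loss = (model(data) - labels).sum()
loss.backward()

# Each parameter now holds a gradient with the same shape as the parameter
print(model.weight.grad.shape)  # torch.Size([3, 4])

# For this loss, d(loss)/d(bias[i]) is the batch size (2) for every output,
# and d(loss)/d(weight[i, j]) is the sum of feature j over the batch
print(model.bias.grad)  # tensor([2., 2., 2.])
expected_weight_grad = data.sum(dim=0).expand(3, 4)
print(torch.allclose(model.weight.grad, expected_weight_grad))  # True

# Gradients accumulate: a second backward pass adds to the existing .grad
loss = (model(data) - labels).sum()
loss.backward()
print(model.bias.grad)  # tensor([4., 4., 4.])
```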

##### Load an optimizer

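- Load an optimizer (here SGD with a learning rate of 0.01 and momentum of 0.9) and register all of the model's parameters with it
  
- Call `.step()` to initiate gradient descent
  
- The optimizer adjusts each parameter by its gradient stored in `.grad`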

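```python
# SGD over all model parameters
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# Gradient descent: update each parameter using its .grad
optim.step()
```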
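To see what `.step()` does, the sketch below (again using an assumed `nn.Linear` stand-in and random data) snapshots each parameter and its gradient, takes one SGD step, and compares the result with `p - lr * grad`. That simple rule holds on the first step because PyTorch's SGD initializes the momentum buffer to the gradient itself; on later steps the buffer becomes `0.9 * previous_buffer + grad` (with the default `dampening=0`).

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 3)  # stand-in for resnet18 (an assumption)
data = torch.rand(2, 4)
labels = torch.rand(2, 3)
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

loss = (model(data) - labels).sum()
loss.backward()

# Snapshot parameters and gradients before the update
before = [p.detach().clone() for p in model.parameters()]
grads = [p.grad.detach().clone() for p in model.parameters()]

optim.step()

# On the first step the momentum buffer equals the gradient,
# so each parameter moves by exactly -lr * grad
for p, p0, g in zip(model.parameters(), before, grads):
    print(torch.allclose(p, p0 - 1e-2 * g))  # True

# Clear gradients before the next step, because backward() accumulates them.
# Recent PyTorch versions set .grad to None; older ones fill it with zeros.
optim.zero_grad()
```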