## Autograd

`torch.autograd` is PyTorch's automatic differentiation engine that powers neural network training.

### Training a NN happens in two steps:
- Forward propagation: the NN makes its best guess about the correct output by running the input data through each of its functions.
- Backpropagation: this is how the NN learns. It adjusts its parameters in proportion to the error in its guess. It traverses backwards from the output, collects the derivatives of the error with respect to the parameters (the gradients), and then uses gradient descent to optimise the parameters.
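To make the two steps concrete, here is a minimal sketch on a hypothetical one-weight model `y = w * x + b`. The names and values are illustrative and not part of the ResNet example below. The forward pass computes a squared error, `backward()` fills `w.grad` and `b.grad`, and a manual gradient descent step updates the parameters.

```python
import torch

# Hypothetical toy "network": a single linear function y = w * x + b.
w = torch.tensor(2.0, requires_grad=True)  # learnable parameter
b = torch.tensor(0.5, requires_grad=True)  # learnable parameter
x = torch.tensor(3.0)                      # input data (no gradient needed)
target = torch.tensor(10.0)

# Forward propagation: make a guess and measure how wrong it is.
y = w * x + b                # 2.0 * 3.0 + 0.5 = 6.5
loss = (y - target) ** 2     # (6.5 - 10.0)^2 = 12.25

# Backpropagation: autograd walks the graph from `loss` back to w and b.
loss.backward()

# Chain rule: dloss/dw = 2 * (y - target) * x = 2 * (-3.5) * 3 = -21
#             dloss/db = 2 * (y - target)     = -7
print(w.grad.item(), b.grad.item())  # -21.0 -7.0

# Gradient descent step (learning rate 0.01), done outside the graph.
with torch.no_grad():
    w -= 0.01 * w.grad
    b -= 0.01 * b.grad

print(w.item(), b.item())
```

Both parameters move against their gradients, so the next guess `w * x + b` is closer to the target.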

In [1]:
import torch
from torchvision.models import resnet18, ResNet18_Weights

In [2]:
print(resnet18)

<function resnet18 at 0x148790ea0>


We'll be using the `resnet18` model from torchvision. We create a random data tensor to represent a single image with 3 channels and a height and width of 64, and its corresponding `labels` tensor initialised to random values.

In [3]:
# Load ResNet-18 with its default pretrained weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)

In [4]:
data = torch.rand(1, 3, 64, 64)   # batch of 1 image: 3 channels, 64 x 64 pixels
labels = torch.rand(1, 1000)      # the pretrained model outputs 1000 class scores, so labels have shape (1, 1000)

In [10]:
# Forward pass: run the input through the model
prediction = model(data)
prediction.shape, prediction.grad_fn

(torch.Size([1, 1000]), <AddmmBackward0 at 0x1487c0ca0>)
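`grad_fn` is the node in autograd's graph that created `prediction`; `AddmmBackward0` presumably comes from the final fully connected layer, which computes a bias plus a matrix product (`addmm`). A small sketch with hypothetical tensors `a`, `b` and `c` shows how each result tensor points back to the operation that produced it, while a leaf tensor created by the user has no `grad_fn`.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf tensor created by the user
b = a * 3                                         # result of a multiplication
c = b.sum()                                       # scalar, like `loss` below

print(a.is_leaf, a.grad_fn)        # True None
print(type(b.grad_fn).__name__)    # MulBackward0
print(type(c.grad_fn).__name__)    # SumBackward0

# Each node links to the nodes that produced its inputs; this chain of
# links is what backward() follows from the output back to the leaves.
parent, _ = c.grad_fn.next_functions[0]
print(type(parent).__name__)       # MulBackward0
```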

In [6]:
# Calculate a simple stand-in for the loss: the element-wise difference between prediction and labels
loss = prediction - labels
loss.shape

torch.Size([1, 1000])

In [8]:
loss = loss.sum()   # reduce to a scalar, since backward() needs a scalar output
loss

tensor(-496.0244, grad_fn=<SumBackward0>)
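Summing matters because `backward()` can only create the starting gradient implicitly for a scalar output. The sketch below uses a hypothetical 5-element `pred` as a stand-in for `prediction`. It shows the error raised for a non-scalar tensor, and that the gradient of `sum(pred - labels)` with respect to `pred` is 1 for every element.

```python
import torch

pred = torch.randn(1, 5, requires_grad=True)  # hypothetical stand-in for `prediction`
labels = torch.rand(1, 5)

diff = pred - labels  # shape (1, 5), not a scalar

# backward() on a non-scalar tensor needs an explicit `gradient` argument.
try:
    diff.backward()
    needs_scalar = False
except RuntimeError as err:
    needs_scalar = True
    print("RuntimeError:", err)

# Reducing to a scalar first makes backward() work.
loss = (pred - labels).sum()
loss.backward()

# d(sum(pred - labels)) / d pred = 1 for every element.
print(pred.grad)  # tensor([[1., 1., 1., 1., 1.]])
```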

In [9]:
# Next, backpropagate this error through the network; autograd stores each parameter's gradient in its `.grad` attribute
loss.backward()
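One detail worth knowing at this step: gradients accumulate. Each call to `backward()` adds to `.grad` instead of overwriting it, which is why training loops reset gradients between iterations. A minimal sketch with a hypothetical scalar weight `w`:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)  # hypothetical scalar weight

(w * 2).backward()            # d(2w)/dw = 2
after_first = w.grad.item()
print(after_first)            # 2.0

(w * 2).backward()            # a second backward pass ADDS to w.grad
after_second = w.grad.item()
print(after_second)           # 4.0

# Reset before the next iteration. optimizer.zero_grad() does this for every
# parameter (recent PyTorch versions set .grad to None instead of zeroing it).
w.grad.zero_()
print(w.grad.item())          # 0.0
```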

In [14]:
# Load an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
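With momentum, SGD does not step along the raw gradient; it keeps a running buffer. Following the update rule in the `torch.optim.SGD` documentation (with the defaults `dampening=0`, `weight_decay=0`, `nesterov=False`), the buffer starts as the first gradient and then becomes `momentum * buf + grad`, and the parameter moves by `lr * buf`. This sketch checks a hand-written version against the optimizer on a hypothetical toy loss `p ** 2`:

```python
import torch

lr, momentum = 0.01, 0.9

# Parameter updated by torch.optim.SGD.
p = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([p], lr=lr, momentum=momentum)

# Same parameter updated by hand.
p_manual = 1.0
buf = None  # momentum buffer

for step in range(3):
    opt.zero_grad()
    loss = (p ** 2).sum()   # toy loss; its gradient is 2 * p
    loss.backward()
    opt.step()

    g = 2 * p_manual                                    # same gradient, by hand
    buf = g if buf is None else momentum * buf + g      # update momentum buffer
    p_manual -= lr * buf                                # step along the buffer

print(p.item(), p_manual)  # both approximately 0.889712
```

Because the buffer keeps growing while the gradient points the same way, later steps are larger than `lr * grad` alone would give.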

In [28]:
# After backward(), every parameter has a .grad tensor with the same shape as the parameter itself
for name, param in model.named_parameters():
    print(f"Parameter Name: {name}")
    print(f"Gradient: {param.grad.shape}")

Parameter Name: conv1.weight
Gradient: torch.Size([64, 3, 7, 7])
Parameter Name: bn1.weight
Gradient: torch.Size([64])
Parameter Name: bn1.bias
Gradient: torch.Size([64])
Parameter Name: layer1.0.conv1.weight
Gradient: torch.Size([64, 64, 3, 3])
Parameter Name: layer1.0.bn1.weight
Gradient: torch.Size([64])
Parameter Name: layer1.0.bn1.bias
Gradient: torch.Size([64])
Parameter Name: layer1.0.conv2.weight
Gradient: torch.Size([64, 64, 3, 3])
Parameter Name: layer1.0.bn2.weight
Gradient: torch.Size([64])
Parameter Name: layer1.0.bn2.bias
Gradient: torch.Size([64])
Parameter Name: layer1.1.conv1.weight
Gradient: torch.Size([64, 64, 3, 3])
Parameter Name: layer1.1.bn1.weight
Gradient: torch.Size([64])
Parameter Name: layer1.1.bn1.bias
Gradient: torch.Size([64])
Parameter Name: layer1.1.conv2.weight
Gradient: torch.Size([64, 64, 3, 3])
Parameter Name: layer1.1.bn2.weight
Gradient: torch.Size([64])
Parameter Name: layer1.1.bn2.bias
Gradient: torch.Size([64])
Parameter Name: layer2.0.conv1.we
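Each `.grad` tensor has the same shape as its parameter, because it holds one partial derivative per parameter entry. A quick check on a hypothetical two-layer model, used as a stand-in for `resnet18` so it runs without downloading weights:

```python
import torch
from torch import nn

# Hypothetical small model standing in for resnet18.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
out = model(torch.rand(1, 4))
out.sum().backward()

# Print each parameter's name, its shape, and the shape of its gradient.
for name, param in model.named_parameters():
    print(name, tuple(param.shape), tuple(param.grad.shape))
```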

In [29]:
# Gradient descent: the optimizer adjusts each parameter by its gradient stored in `.grad`
optim.step()
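Put together, one training iteration is: reset gradients, forward pass, compute the loss, `backward()`, `step()`. A minimal sketch of that loop, fitting a hypothetical one-weight model to toy data `y = 2 * x` with plain SGD (no momentum), so each step is simply `w <- w - lr * w.grad`:

```python
import torch

# Hypothetical toy data following y = 2 * x.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

w = torch.zeros(1, requires_grad=True)   # single learnable weight
optim = torch.optim.SGD([w], lr=0.01)

for step in range(100):
    optim.zero_grad()                      # clear gradients from the previous step
    prediction = w * x                     # forward pass
    loss = ((prediction - y) ** 2).mean()  # mean squared error
    loss.backward()                        # fill w.grad
    optim.step()                           # w <- w - lr * w.grad

print(w.item())  # close to 2.0
```

Without `optim.zero_grad()`, gradients from earlier iterations would accumulate in `w.grad` and distort each step.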

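### Dataset vs DataLoader vs DataLoaders
Dataset -> DataLoader -> DataLoaders (fastai)

- Dataset: a collection of `(x, y)` pairs, e.g. `list[(x, y), ...]`
- DataLoader: groups the pairs into mini-batches, e.g. `[((x1, x2, x3), (y1, y2, y3)), ...]`
- DataLoaders (fastai): bundles a training and a validation DataLoader, e.g. `(Training DataLoader, Validation DataLoader)`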
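In plain PyTorch, the first two stages map onto `torch.utils.data`: a Dataset can be as simple as a list of `(x, y)` pairs, and `DataLoader` collates consecutive pairs into batched tensors. The sketch below uses a hypothetical 9-item dataset and a hypothetical 6/3 train/validation split; a plain tuple stands in for fastai's `DataLoaders`, which is not used here.

```python
import torch
from torch.utils.data import DataLoader

# A Dataset in its simplest form: anything with __len__ and __getitem__,
# here a plain list of (x, y) pairs (hypothetical values).
dataset = [(torch.tensor([float(i)]), i % 2) for i in range(9)]

# Hypothetical split: first 6 items for training, last 3 for validation.
train_ds, valid_ds = dataset[:6], dataset[6:]

# DataLoader turns 3 pairs into one batch: (tensor of 3 xs, tensor of 3 ys).
train_dl = DataLoader(train_ds, batch_size=3, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=3)

# Stand-in for fastai's DataLoaders: just the pair of loaders.
dls = (train_dl, valid_dl)

for xb, yb in valid_dl:
    print(xb.shape, yb)  # torch.Size([3, 1]) tensor([0, 1, 0])
```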