## Autograd: Automatic Differentiation Engine

Automatic differentiation is a building block of every deep learning library.

Autograd, PyTorch's automatic differentiation engine, is a good tool for understanding how automatic differentiation works.
Modern neural architectures can have millions of learnable parameters. From a computational point of view, training a neural network consists of two phases.

### Two phases of training a neural network (NN)
Forward propagation - The neural network makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

Backward propagation - The neural network computes the gradients of its learnable parameters and adjusts them in proportion to the error in its guess. It does this in three steps:

1. Traverses backward from the output
2. Collects the derivatives of the error with respect to the parameters of the functions (the gradients)
3. Optimizes the parameters using gradient descent
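
The two phases and the three backward steps can be sketched as a single training step. This is only a minimal illustration: the toy linear model, the random data, the mean squared error loss, and the learning rate are all assumptions chosen for the example, not part of the original notebook.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)  # make the illustration reproducible

# Hypothetical toy setup: one linear layer fit to random data.
model = nn.Linear(3, 1)
inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)

# Forward propagation: run the inputs through the model to get a guess.
prediction = model(inputs)
loss_before = loss_fn(prediction, targets)

# Backward propagation, steps 1 and 2: traverse back from the loss and
# store d(loss)/d(parameter) in each parameter's .grad attribute.
optimizer.zero_grad()
loss_before.backward()

# Step 3: update the parameters with gradient descent.
optimizer.step()

# A second forward pass shows the effect of the update.
loss_after = loss_fn(model(inputs), targets)
print(loss_before.item(), loss_after.item())
```

With a small enough learning rate, the loss after the update should be lower than the loss before it.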
    
Forward propagation is fairly straightforward: the output of one layer is the input to the next, and so on.

Backward propagation is a bit more complicated, since it requires us to use the chain rule to compute the gradients of the loss function with respect to the weights. It is impractical to calculate the gradients of such large composite functions by solving the equations by hand, especially because these functions live in a very large number of dimensions and are impossible to visualize. This is where PyTorch's Autograd comes in.

Autograd can calculate gradients of high-dimensional curves with only a few lines of code.

In [18]:
import torch
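# Variable is deprecated since PyTorch 0.4 and now simply returns a Tensor;
# torch.ones(2, 2, requires_grad=True) is the equivalent modern form
from torch.autograd import Variable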

In [19]:
x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [20]:
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [21]:
print(y.grad_fn)

<AddBackward0 object at 0x000001BF6FB4FA90>


In [22]:
z = y * y * 3
out = z.mean()
print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
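
Each `grad_fn` is a node in the graph that Autograd records during the forward pass. As a rough sketch, the graph for the cells above can be walked backward from `out` through the `next_functions` attribute of each node. That attribute is not used in the original notebook, and the loop below follows only the first input of every node, which is a simplification made for this example.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()

# Walk backward from the output, following the first input of each node.
node = out.grad_fn
chain = []
while node is not None:
    chain.append(type(node).__name__)
    # next_functions holds (node, input_index) pairs for this node's inputs
    node = node.next_functions[0][0] if node.next_functions else None

print(" <- ".join(chain))
```

The walk ends at an `AccumulateGrad` node, which is where the gradient for the leaf tensor `x` is written into `x.grad`.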


In [23]:
out.backward()

In [24]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
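
The value 4.5 can be checked by hand. A short derivation, using only the definitions of y, z, and out from the cells above (the index i runs over the four entries of x):

$$
\text{out} = \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,y_i^2 = 3\,(x_i + 2)^2
$$

$$
\frac{\partial\,\text{out}}{\partial x_i} = \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2), \qquad \left.\frac{\partial\,\text{out}}{\partial x_i}\right|_{x_i = 1} = \frac{3}{2}\cdot 3 = 4.5
$$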


## Advanced Autograd

In [29]:
x = torch.randn(3)
x = Variable(x, requires_grad=True)
print(x)

tensor([ 1.0034, -0.0717, -0.5494], requires_grad=True)


In [30]:
y = x * 2
while y.detach().norm() < 1000:  # detach() replaces the legacy .data for this graph-free check
    y = y * 2
print(y)

tensor([1027.5193,  -73.4579, -562.6027], grad_fn=<MulBackward0>)


In [31]:
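# y is a vector, not a scalar, so backward() needs a tensor of the same shape
# to weight each component of y (a vector-Jacobian product)
gradients = torch.tensor([0.1, 1.0, 0.0001])
y.backward(gradients)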

In [32]:
print(x.grad)

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
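
In the cells above, the loop doubles x ten times in total, so y = 1024 * x and each entry of x.grad is the matching entry of `gradients` multiplied by 1024. More generally, `y.backward(v)` computes the vector-Jacobian product of v with the Jacobian of y with respect to x. The sketch below checks this on a smaller case. The function `f`, the input values, and the use of `torch.autograd.functional.jacobian` are assumptions made for the illustration.

```python
import torch
from torch.autograd.functional import jacobian


def f(t):
    # Hypothetical stand-in for the repeated doubling above
    return t * 4


x = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

y = f(x)
y.backward(v)  # accumulates the vector-Jacobian product into x.grad

J = jacobian(f, x.detach())  # full 3x3 Jacobian of f at x
expected = v @ J             # the same product, computed explicitly

print(x.grad)
print(expected)
```

Building the full Jacobian is only practical for tiny examples like this one; `backward` never materializes it.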


### Frozen Parameters
Parameters that don't compute gradients are called frozen parameters. It is useful to freeze part of your model if you know in advance that you won't need the gradients of those parameters. This offers some performance benefits by reducing Autograd computations.

In [33]:
from torch import nn, optim
import torchvision

In [36]:
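import ssl
# Workaround for SSL certificate errors while downloading the pretrained weights.
# It disables certificate verification, so use it only in a trusted environment.
ssl._create_default_https_context = ssl._create_unverified_context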

In [38]:
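# Pretrained ResNet-18; the weights argument replaces the deprecated pretrained=True
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)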

In [39]:
# Freeze all the parameters
for param in model.parameters():
    param.requires_grad = False

In ResNet, the classifier is the final linear layer, model.fc. We can simply replace it with a new linear layer, unfrozen by default, that acts as our classifier.

In [40]:
model.fc = nn.Linear(512, 10)

All the parameters in this model, except for the parameters of model.fc, are frozen. The only parameters that compute gradients are the weights and bias of model.fc.
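
One way to confirm which parameters are frozen is to list the ones whose `requires_grad` flag is still set. The sketch below does this on a hypothetical `TinyNet` model, a small stand-in for ResNet so that no weights need to be downloaded; the class and its layer sizes are assumptions made for the example.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained network with a classifier head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)
        self.fc = nn.Linear(4, 3)

    def forward(self, x):
        return self.fc(torch.relu(self.backbone(x)))


model = TinyNet()

# Freeze everything, then replace the head with a fresh, unfrozen layer.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(4, 2)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)

# After a backward pass, frozen parameters have no stored gradient.
model(torch.randn(5, 8)).sum().backward()
print(model.backbone.weight.grad)
print(model.fc.weight.grad.shape)
```

Only the parameters of the new `fc` layer appear in the list, and the frozen backbone weight keeps `grad` set to `None`.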

In [42]:
# All parameters are registered with the optimizer, but only model.fc has
# gradients, so only the classifier is updated
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

In [43]:
print(optimizer)

SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    lr: 0.01
    maximize: False
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)
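
An alternative, shown here only as a sketch, is to hand the optimizer just the parameters that still require gradients. The small `nn.Sequential` model, the random input, and the squared-output loss below are assumptions for the illustration; the check at the end confirms that one optimizer step changes the unfrozen layer and leaves the frozen one untouched.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)  # make the illustration reproducible

# Hypothetical model: a frozen first layer and a trainable last layer.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
for param in model[0].parameters():
    param.requires_grad = False

# Register only the parameters that still require gradients.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable_params, lr=1e-2, momentum=0.9)

frozen_before = model[0].weight.detach().clone()
head_before = model[2].weight.detach().clone()

# One training step on random data with a toy loss.
loss = model(torch.randn(16, 8)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

frozen_unchanged = torch.equal(model[0].weight.detach(), frozen_before)
head_changed = not torch.equal(model[2].weight.detach(), head_before)
print(frozen_unchanged, head_changed)
```

Both approaches update the same parameters. Filtering makes the intent explicit and keeps the optimizer's parameter group limited to the tensors that are actually trained.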
