# `autograd` introduction

<img src='assets/computational_graph_example_pytorch.png'>

Torch provides a module, `autograd`, for automatically calculating gradients. We can use `autograd` to calculate the gradient of the loss with respect to each of our parameters.

`autograd` works by *keeping track of operations performed on tensors*, then going backwards through those operations, calculating gradients along the way.
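As a minimal sketch of this record-then-go-backwards idea, the cell below differentiates a small scalar function. The function $f(x) = x^2 + 2x$ and the variable names are chosen here purely for illustration and do not come from the notebook.

```python
import torch

# Illustrative scalar input; requires_grad=True asks autograd to record operations on it.
x = torch.tensor(3.0, requires_grad=True)

# Forward pass: autograd records the power, the multiplication, and the addition.
f = x**2 + 2 * x

# Backward pass: walk the recorded operations in reverse, applying the chain rule.
f.backward()

# Analytically, df/dx = 2x + 2, which equals 8 at x = 3.
print(x.grad)
```

The value stored in `x.grad` should agree with the hand-derived derivative, which is a quick sanity check when learning the API.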


In [287]:
import torch

In [36]:
x = torch.tensor([1., 2., 3.])
print(x)

tensor([1., 2., 3.])


To make sure PyTorch **keeps track of operations on a tensor** (i.e., builds the computational graph), set `requires_grad=True` when creating the tensor.

In [37]:
x = torch.tensor([1., 2., 3.], requires_grad=True)
print(x)

tensor([1., 2., 3.], requires_grad=True)


Or, equivalently, call `requires_grad_(True)` on an existing tensor (the trailing underscore marks an in-place method):

In [38]:
y = torch.tensor([1., 2., 3.])
y.requires_grad_(True)
print(y)

tensor([1., 2., 3.], requires_grad=True)


Let's perform a simple operation:

In [39]:
z = 2*y
print(z)

tensor([2., 4., 6.], grad_fn=<MulBackward0>)


`z` has a `grad_fn` attribute that references the operation that created it; autograd uses it to compute gradients. Every tensor has this attribute, but for a tensor created directly by the user rather than by an operation (a leaf tensor, such as `y`), it is `None`.

In [40]:
print(y.grad_fn)
print(z.grad_fn)

None
<MulBackward0 object at 0x0000023580EDE160>


PyTorch uses `grad_fn` as a reference to the last operation that produced a tensor. Whenever backpropagation is performed, this node is the starting point for the backward traversal of the graph.
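To make the "starting point" idea concrete, here is a small sketch that follows the graph backward from `grad_fn` through each node's `next_functions` attribute. The helper `walk` is a hypothetical name introduced for this illustration; it is not part of PyTorch.

```python
import torch

y = torch.tensor([1., 2., 3.], requires_grad=True)
z = (2 * y).mean()

def walk(fn, depth=0, names=None):
    """Depth-first walk over backward nodes, collecting their class names."""
    if names is None:
        names = []
    if fn is None:
        # Inputs that do not require grad (e.g. the constant 2) have no node.
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    # next_functions holds (node, input_index) pairs for the node's inputs.
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

node_names = walk(z.grad_fn)
```

The walk starts at the node for the final operation and ends at an `AccumulateGrad` node, which is where the gradient for the leaf tensor `y` gets written into `y.grad`.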

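Let's compute the derivative of `z` w.r.t. `y`. Calling `backward()` without arguments only works on a scalar, so we first reduce `z` to a single value with `mean()`: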

In [41]:
# z.backward() # ERROR: grad can be implicitly created only for scalar outputs
z = z.mean()
print(z)

tensor(4., grad_fn=<MeanBackward0>)


In [42]:
z.backward()

In [43]:
print(y.grad)

tensor([0.6667, 0.6667, 0.6667])
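The value 0.6667 follows from the chain rule: the scalar is $\frac{1}{3}\sum_i 2y_i$, so its derivative with respect to each $y_i$ is $2/3$. As a hedged alternative to reducing with `mean()`, a non-scalar tensor can be backpropagated by passing a `gradient` argument to `backward()`, which supplies the weights of a vector-Jacobian product. The name `weights` below is chosen for illustration.

```python
import torch

y = torch.tensor([1., 2., 3.], requires_grad=True)
z = 2 * y  # non-scalar output, so backward() needs an explicit gradient

# One weight per element of z; 1/3 each reproduces the effect of z.mean().
weights = torch.full_like(z, 1.0 / 3.0)
z.backward(gradient=weights)

print(y.grad)  # each entry is 2 * (1/3) = 2/3
```

Passing a tensor of ones instead would correspond to reducing with `sum()` rather than `mean()`.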


## Prevent gradient computations

There are various ways to turn off gradient computations:

In [19]:
x = torch.ones(1, requires_grad=True)
print(x)

tensor([1.], requires_grad=True)


Way 1: switch the flag off in place with `requires_grad_(False)`

In [20]:
x.requires_grad_(False)

tensor([1.])

Way 2: create a tensor that is cut off from the graph with `detach()`

In [21]:
x = torch.zeros(1, requires_grad=True)
y = x.detach() # returns a new tensor with requires_grad=False that shares memory with x (not a copy)
print(x)
print(y)

tensor([0.], requires_grad=True)
tensor([0.])
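One detail worth checking by hand is that `detach()` does not copy the data: the detached tensor shares storage with the original. The sketch below illustrates this and shows `detach().clone()` as one way to get an independent copy; the variable names `same_storage` and `independent` are made up for this example.

```python
import torch

x = torch.zeros(1, requires_grad=True)
y = x.detach()

# Both tensors point at the same underlying memory.
same_storage = x.data_ptr() == y.data_ptr()

# An in-place change made through the detached tensor is visible in x as well.
y.add_(1.0)
print(same_storage, x, y)

# clone() after detach() gives a tensor with its own memory.
independent = x.detach().clone()
independent.add_(1.0)
print(x, independent)
```

Because of the shared storage, in-place edits on a detached tensor should be made with care when the original is still needed for a backward pass.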


Way 3: wrap the computation in a `torch.no_grad()` block

In [22]:
x = torch.ones(1, requires_grad=True)

with torch.no_grad():
    y = 2 * x
    print(y)

tensor([2.])
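The effect of the context manager can be inspected directly. In this small sketch (variable names are illustrative), the result computed inside the block carries no graph, while the same expression outside the block is tracked again.

```python
import torch

x = torch.ones(1, requires_grad=True)

with torch.no_grad():
    inside = 2 * x                             # computed without recording a graph
    enabled_inside = torch.is_grad_enabled()   # False while inside the block

outside = 2 * x                                # tracking resumes after the block

print(inside.requires_grad, inside.grad_fn)
print(outside.requires_grad, outside.grad_fn)
```

Note that `x` itself still has `requires_grad=True`; only the operations performed inside the block are left untracked.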


## By default, PyTorch accumulates the gradients

### $\text{output} = \frac{1}{3}\sum_{i = 1}^{3} y_i \implies \frac{d(\text{output})}{dy} = [1/3,\ 1/3,\ 1/3]$

In [46]:
y = torch.tensor([1., 2., 3.], requires_grad=True)

for i in range(4):
    
    model_output = y.mean()
    
    model_output.backward()
    
    print(y.grad)
    
    # y.grad.zero_()  # uncomment to reset the gradient instead of accumulating

tensor([0.3333, 0.3333, 0.3333])
tensor([0.6667, 0.6667, 0.6667])
tensor([1., 1., 1.])
tensor([1.3333, 1.3333, 1.3333])
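For comparison, here is the same loop with the gradient reset after every backward pass. This is a sketch of the usual pattern; the list `history` is added only so the per-iteration values can be inspected afterwards.

```python
import torch

y = torch.tensor([1., 2., 3.], requires_grad=True)
history = []

for i in range(4):
    model_output = y.mean()
    model_output.backward()

    # Store a copy, because zero_() below overwrites y.grad in place.
    history.append(y.grad.clone())
    print(y.grad)

    # Reset the accumulated gradient before the next backward pass.
    y.grad.zero_()
```

With the reset in place, every iteration reports the gradient of a single pass, `[1/3, 1/3, 1/3]`, rather than a running sum. When training with an optimizer from `torch.optim`, calling its `zero_grad()` method serves the same purpose.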


## Loss and Autograd together

When we create a network with PyTorch, all of its parameters are initialized with `requires_grad=True`.

This means that when we calculate the loss and call `loss.backward()`, the gradients for the parameters are calculated. These gradients are used to update the weights with gradient descent. 
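A single gradient descent update can be sketched as follows. To keep the example self-contained it uses a tiny `nn.Linear` model on random data instead of the network and dataset in this notebook; the sizes, the seed, and the learning rate `lr` are arbitrary assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Toy stand-ins for a real model and batch (assumed shapes: 8 samples, 4 features, 2 classes).
model = nn.Linear(4, 2)
inputs = torch.randn(8, 4)
targets = torch.randint(0, 2, (8,))
criterion = nn.CrossEntropyLoss()
lr = 0.1  # illustrative learning rate

# Forward and backward pass: fills p.grad for every parameter p.
loss_before = criterion(model(inputs), targets)
loss_before.backward()

# The update itself must not be tracked by autograd, hence no_grad().
with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad   # step against the gradient
        p.grad.zero_()     # reset so the next backward pass starts from zero

loss_after = criterion(model(inputs), targets)
print(loss_before.item(), loss_after.item())
```

In practice this manual loop is usually replaced by an optimizer such as `torch.optim.SGD`, which performs the same kind of update through `optimizer.step()`.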

Below is an example of calculating the gradients with a backward pass.

### Data

In [307]:
import numpy as np

In [308]:
### Run this cell

from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
# Download and load the training data
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

### Build a model and compute the loss

In [309]:
from torch import nn

# Build a feed-forward network
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10))

criterion = nn.CrossEntropyLoss()
images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1)

scores = model(images)
loss = criterion(scores, labels)

In [310]:
print('Before backward pass: \n', model[0].weight.grad)

loss.backward()

print('After backward pass: \n', model[0].weight.grad)

Before backward pass: 
 None
After backward pass: 
 tensor([[-0.0020, -0.0020, -0.0020,  ..., -0.0020, -0.0020, -0.0020],
        [ 0.0002,  0.0002,  0.0002,  ...,  0.0002,  0.0002,  0.0002],
        [-0.0005, -0.0005, -0.0005,  ..., -0.0005, -0.0005, -0.0005],
        ...,
        [-0.0007, -0.0007, -0.0007,  ..., -0.0007, -0.0007, -0.0007],
        [ 0.0022,  0.0022,  0.0022,  ...,  0.0022,  0.0022,  0.0022],
        [ 0.0019,  0.0019,  0.0019,  ...,  0.0019,  0.0019,  0.0019]])
