# Autograd

How do we use the loss to perform backpropagation?

PyTorch provides a module, `autograd`, for automatically calculating the gradients of tensors. We can use it to calculate the gradient of the loss with respect to each of our parameters.

Autograd works by keeping track of operations performed on tensors, then going backwards through those operations, calculating gradients along the way. To make sure PyTorch keeps track of operations on a tensor and calculates the gradients, you need to set `requires_grad = True` on a tensor. You can do this at creation with the `requires_grad` keyword, or at any time with `x.requires_grad_(True)`.
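As a minimal sketch of the two ways to enable tracking described above (the tensor names `a` and `b` are illustrative and not part of the original notebook):

```python
import torch

# Enable tracking at creation time with the keyword argument
a = torch.ones(2, 2, requires_grad=True)

# Enable tracking later, in place, on an existing tensor
b = torch.ones(2, 2)
print(b.requires_grad)   # False, the default
b.requires_grad_(True)
print(b.requires_grad)   # True
```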

You can turn off gradients for a block of code with the `torch.no_grad()` context manager:
```python
>>> import torch
>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
```

You can also turn gradient tracking on or off globally with `torch.set_grad_enabled(True)` or `torch.set_grad_enabled(False)`.
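A short sketch of the global switch, assuming you want to disable tracking and then restore it afterwards (the names `w`, `off`, and `on` are illustrative):

```python
import torch

w = torch.zeros(1, requires_grad=True)

# Globally disable gradient tracking
torch.set_grad_enabled(False)
off = w * 2
print(off.requires_grad)  # False

# Re-enable it so that later code tracks operations again
torch.set_grad_enabled(True)
on = w * 2
print(on.requires_grad)   # True
```

Because the setting is global, forgetting to switch it back on would silently disable gradients for everything that follows, which is why the `with torch.no_grad():` form is usually the safer choice for a limited block.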

Gradients are computed by calling `z.backward()` on some tensor `z`. This does a backward pass through the operations that created `z`, computing the gradient of `z` with respect to each tensor that requires gradients.

Let's compute:

$$
z = \frac{1}{n}\sum_{i=1}^{n} x_i^2
$$

In [1]:
import torch

In [2]:
x = torch.randn(2,2, requires_grad=True)
print(x)

tensor([[ 1.0606, -0.1873],
        [ 1.7177, -1.3912]], requires_grad=True)


In [3]:
y = x**2
print(y)

tensor([[1.1249, 0.0351],
        [2.9506, 1.9355]], grad_fn=<PowBackward0>)


The autograd module keeps track of all operations and knows how to calculate the gradient for each one.

Let's check this for `y`. The `grad_fn` attribute shows the function that generated the tensor:

In [4]:
print(y.grad_fn)

<PowBackward0 object at 0x0000022DC917BEB8>


In [5]:
z = y.mean()
print(z)

tensor(1.5115, grad_fn=<MeanBackward0>)


Note: If `requires_grad` is set to False, `grad_fn` would be None.
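To illustrate the note, here is a small sketch using hypothetical tensors `p` and `q` that are created without gradient tracking:

```python
import torch

p = torch.randn(2, 2)    # requires_grad defaults to False
q = p ** 2               # so this operation is not recorded

print(q.requires_grad)   # False
print(q.grad_fn)         # None
```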

You can inspect `z.grad`, but no backward pass has been run yet, so it is currently `None`.

In [6]:
print(z.grad)

None


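To calculate the gradients, you need to call the `.backward()` method on a tensor, `z` for example. This computes the gradient of `z` with respect to `x` and stores it in `x.grad`. Gradients are only stored on leaf tensors such as `x`, so `z.grad` stays `None`. Here `x` has $n = 4$ elements, which gives:
$$
\frac{\partial z}{\partial x_i} = \frac{\partial}{\partial x_i}\left[\frac{1}{n}\sum_{j=1}^{n} x_j^2\right] = \frac{2 x_i}{n} = \frac{x_i}{2}
$$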

In [7]:
z.backward()

In [8]:
print(z.grad)
print(x.grad)

None
tensor([[ 0.5303, -0.0937],
        [ 0.8589, -0.6956]])


In [9]:
print(x/2)

tensor([[ 0.5303, -0.0937],
        [ 0.8589, -0.6956]], grad_fn=<DivBackward0>)


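Note: `backward()` can only be called without arguments on a scalar. For a non-scalar tensor such as `y`, PyTorch does not implicitly compute the derivative of one vector with respect to another (the Jacobian).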

In [16]:
y

tensor([[1.1249, 0.0351],
        [2.9506, 1.9355]], grad_fn=<PowBackward0>)

If you call `y.backward()`, you will get:

```shell
RuntimeError: grad can be implicitly created only for scalar outputs
```
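If you do need gradients from a non-scalar tensor, one option is to pass an explicit `gradient` argument to `backward()`, which supplies the weights used to combine the output elements. The following is a minimal sketch with hypothetical tensors `v` and `out`, not part of the original notebook:

```python
import torch

v = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
out = v ** 2

# A tensor of ones weights every element of `out` equally,
# which is equivalent to calling out.sum().backward()
out.backward(gradient=torch.ones_like(out))

print(v.grad)  # 2 * v, i.e. tensor([[2., 4.], [6., 8.]])
```

Reducing the tensor to a scalar first (for example with `.sum()` or `.mean()`, as was done for `z` earlier) is the more common approach.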

## Loss and Autograd together

When we create a network with PyTorch, all of the parameters are initialized with `requires_grad = True`. 

This means that when we calculate the loss and call `loss.backward()`, the gradients for the parameters are calculated. These gradients are used to update the weights with gradient descent. 
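A single gradient descent step can be sketched by hand as follows. The parameter `theta`, the toy loss, and the learning rate `lr` are all assumptions chosen for illustration. In practice the optimizers in `torch.optim` perform this update for you.

```python
import torch

# Hypothetical parameter and toy loss, for illustration only
theta = torch.tensor([1.0, -2.0], requires_grad=True)
lr = 0.1  # assumed learning rate

toy_loss = (theta ** 2).sum()
toy_loss.backward()          # theta.grad is now 2 * theta = [2., -4.]

# Update the parameter without recording the update itself in the graph
with torch.no_grad():
    theta -= lr * theta.grad

# Gradients accumulate across backward passes, so reset them after each step
theta.grad.zero_()

print(theta)  # approximately [0.8, -1.6]
```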

Below is an example of calculating the gradients with a backward pass.

### Data

In [11]:
import numpy as np

In [12]:
# The MNIST datasets are hosted on yann.lecun.com, which has moved behind Cloudflare protection.
# Run this cell to enable the dataset download.
# Reference: https://github.com/pytorch/vision/issues/1938

import urllib.request

# Install a global opener that sends a browser-like User-agent header
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

In [13]:
### Run this cell

from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
# Download and load the training data
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

### Build a model and compute the loss

In [14]:
from torch import nn

# Build a feed-forward network
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10))

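# CrossEntropyLoss expects raw scores (logits), so the model has no softmax layer
criterion = nn.CrossEntropyLoss()

# Take one batch and flatten each 28x28 image into a 784-long vector
images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1)

# Forward pass, then compute the loss
scores = model(images)
loss = criterion(scores, labels)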

In [15]:
print('Before backward pass: \n', model[0].weight.grad)

loss.backward()

print('After backward pass: \n', model[0].weight.grad)

Before backward pass: 
 None
After backward pass: 
 tensor([[ 0.0044,  0.0044,  0.0044,  ...,  0.0044,  0.0044,  0.0044],
        [ 0.0005,  0.0005,  0.0005,  ...,  0.0005,  0.0005,  0.0005],
        [-0.0009, -0.0009, -0.0009,  ..., -0.0009, -0.0009, -0.0009],
        ...,
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0009, -0.0009, -0.0009,  ..., -0.0009, -0.0009, -0.0009],
        [ 0.0002,  0.0002,  0.0002,  ...,  0.0002,  0.0002,  0.0002]])
