<h1>torch.autograd</h1>
torch.autograd is PyTorch’s automatic differentiation engine that powers neural network training. In this section, you will get a conceptual understanding of how autograd helps a neural network train.

<h2>Background</h2>
Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of weights and biases), which in PyTorch are stored in tensors.
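Here is a small example of "nested functions defined by parameters". It is a minimal sketch that assumes a two-layer network built with torch.nn.Sequential. The layer sizes and the input batch are arbitrary choices for illustration, not values from this section. Each layer is one of the nested functions, and its weights and biases are ordinary tensors.

```python
import torch

torch.manual_seed(0)

# Hypothetical two-layer network: model(x) computes layer2(relu(layer1(x)))
model = torch.nn.Sequential(
    torch.nn.Linear(3, 4),  # weight: 4x3, bias: 4
    torch.nn.ReLU(),        # has no parameters
    torch.nn.Linear(4, 1),  # weight: 1x4, bias: 1
)

# Every parameter is a tensor, and autograd tracks it (requires_grad=True).
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

x = torch.randn(2, 3)  # a batch of 2 inputs with 3 features each
print(model(x).shape)  # torch.Size([2, 1])
```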

Training an NN happens in two steps:

Forward Propagation: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

Backward Propagation: In backprop, the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent.
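The sketch below walks through both steps once: a forward pass, a backward pass, and a plain gradient descent update. It makes several assumptions that do not come from this section: a single torch.nn.Linear layer, random inputs and targets, mean squared error as the error measure, and a learning rate of 0.01.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy setup: one linear layer with random inputs and targets.
model = torch.nn.Linear(3, 1)
inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)
lr = 0.01

# Forward propagation: run the inputs through the model to get a guess.
prediction = model(inputs)
loss = torch.nn.functional.mse_loss(prediction, targets)

# Backward propagation: autograd fills param.grad for every parameter.
loss.backward()

# Gradient descent: move each parameter a small step against its gradient.
# no_grad() keeps autograd from recording the update itself.
with torch.no_grad():
    for param in model.parameters():
        param -= lr * param.grad

new_loss = torch.nn.functional.mse_loss(model(inputs), targets)
print(loss.item(), new_loss.item())  # the error goes down after the step
```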

In [1]:
import torch
# torch.autograd.Variable is deprecated: tensors created with requires_grad=True are tracked directly

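In a full training loop, we would next load an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9, and register all the parameters of the model in it. The rest of this section sets the model aside and focuses on how autograd computes the gradients that the optimizer uses.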
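The optimizer step described above can be sketched with torch.optim.SGD. The linear model and the random data are hypothetical stand-ins, because this section does not define a model. The squared-error loss is also an assumption made for the example.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in model and data.
model = torch.nn.Linear(4, 2)
data = torch.randn(5, 4)
labels = torch.randn(5, 2)

# SGD with learning rate 0.01 and momentum 0.9, registering every model parameter.
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

before = [p.detach().clone() for p in model.parameters()]

optim.zero_grad()                           # clear gradients from any earlier step
loss = ((model(data) - labels) ** 2).sum()  # squared error, summed over the batch
loss.backward()                             # autograd populates p.grad for each parameter
optim.step()                                # SGD updates each parameter using p.grad
```

The momentum buffer starts out equal to the first gradient. As a result, this very first step matches a plain gradient descent update (p - 0.01 * p.grad), and momentum only changes the steps that follow.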

<h2>Differentiation in Autograd</h2>
Let’s take a look at how autograd collects gradients. We create two tensors a and b with requires_grad=True. This signals to autograd that every operation on them should be tracked.
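To see what requires_grad=True changes, compare a tensor that has the flag with one that does not. In this minimal sketch, the names x and w are illustrative. Only results computed from a tracked tensor get a grad_fn.

```python
import torch

x = torch.tensor(2.0)                      # requires_grad defaults to False
w = torch.tensor(2.0, requires_grad=True)  # autograd records operations on w

untracked = x * 3
tracked = w * 3
mixed = x * w  # tracked, because one of its inputs requires gradients

print(untracked.requires_grad, untracked.grad_fn)  # False None
print(tracked.requires_grad)                        # True
print(mixed.requires_grad)                          # True
```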

In [None]:
# requires_grad=True tells autograd to record every operation on a and b
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(6.0, requires_grad=True)

In [None]:
# each result stores the operation that produced it in grad_fn
a + b, a - b, a * b, b / a


(tensor(8., grad_fn=<AddBackward0>),
 tensor(-4., grad_fn=<SubBackward0>),
 tensor(12., grad_fn=<MulBackward0>),
 tensor(3., grad_fn=<DivBackward0>))
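Each grad_fn shown above is a node in the graph that autograd builds. This sketch recreates a and b and inspects the node for a * b through is_leaf and grad_fn.next_functions. next_functions is a somewhat low-level attribute, so treat its printed form as version-dependent.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(6.0, requires_grad=True)
c = a * b

# a and b were created directly by the user, so they are leaves.
# c was produced by an operation, so it is not a leaf.
print(a.is_leaf, b.is_leaf, c.is_leaf)  # True True False

# Each grad_fn links back to the nodes that produced its inputs.
# For leaf tensors, these are AccumulateGrad nodes, which write into .grad.
for fn, _ in c.grad_fn.next_functions:
    print(type(fn).__name__)
```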

We create another tensor Q from a and b.

$$Q = 3a^3 - b^2$$

In [None]:
Q = 3*a**3 - b**2
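Evaluating Q at the values chosen above gives a concrete number to keep in mind. The sketch recreates a and b with the same values so it runs on its own.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(6.0, requires_grad=True)
Q = 3 * a**3 - b**2

# 3 * 2^3 - 6^2 = 24 - 36 = -12
print(Q)  # tensor(-12., grad_fn=<SubBackward0>)
```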

Let’s assume a and b are parameters of an NN and Q is the error. In NN training, we want the gradients of the error with respect to the parameters, i.e.

$$\frac{\partial Q}{\partial a} = 9a^2$$
$$\frac{\partial Q}{\partial b} = -2b$$
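You can check these formulas independently with a central finite difference, which approximates each partial derivative numerically. This sketch deliberately uses plain Python floats rather than autograd. The helper names and the step size h = 1e-4 are choices made for the example. The function is named q_fn so it does not overwrite the tensor Q.

```python
def q_fn(a, b):
    return 3 * a**3 - b**2

def central_diff(f, x, h=1e-4):
    # (f(x + h) - f(x - h)) / (2h) approximates f'(x) with O(h^2) error
    return (f(x + h) - f(x - h)) / (2 * h)

a0, b0 = 2.0, 6.0
dQ_da = central_diff(lambda a: q_fn(a, b0), a0)
dQ_db = central_diff(lambda b: q_fn(a0, b), b0)

print(round(dQ_da, 4), 9 * a0**2)  # 36.0 36.0
print(round(dQ_db, 4), -2 * b0)    # -12.0 -12.0
```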
When we call .backward() on Q, autograd calculates these gradients and stores them in the respective tensors’ .grad attribute.

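Here a and b are scalars, so Q is also a scalar, and Q.backward() can be called without arguments. If Q were a vector (for example, if a and b each held several elements), we would need to pass an explicit gradient argument to Q.backward(). gradient is a tensor of the same shape as Q, and it represents the gradient of Q with respect to itself, i.e.

$$\frac{dQ}{dQ} = \mathbf{1}$$

Equivalently, we can aggregate Q into a scalar and call backward without a gradient argument, like Q.sum().backward().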
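Here is the vector case in practice. This sketch assumes two-element versions of a and b with illustrative values. It uses the names a_vec, b_vec, and Q_vec so it does not overwrite the scalar tensors used in the next cells. Both routes should produce the same gradients.

```python
import torch

# Hypothetical vector-valued version of the example: Q_vec has 2 elements.
a_vec = torch.tensor([2.0, 3.0], requires_grad=True)
b_vec = torch.tensor([6.0, 4.0], requires_grad=True)
Q_vec = 3 * a_vec**3 - b_vec**2

# Option 1: pass dQ/dQ explicitly as a tensor of ones shaped like Q_vec.
Q_vec.backward(gradient=torch.ones_like(Q_vec))
grad_a_explicit = a_vec.grad.clone()  # 9 * a_vec**2, i.e. 36 and 81
grad_b_explicit = b_vec.grad.clone()  # -2 * b_vec, i.e. -12 and -8

# Option 2: reduce to a scalar first. Clear .grad so the two results do not add up.
a_vec.grad = None
b_vec.grad = None
Q_vec = 3 * a_vec**3 - b_vec**2  # rebuild: the first backward() freed the graph
Q_vec.sum().backward()

print(torch.equal(a_vec.grad, grad_a_explicit))  # True
print(torch.equal(b_vec.grad, grad_b_explicit))  # True
```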



In [None]:
Q.backward()

In [None]:
a.grad

tensor(36.)

In [None]:
b.grad

tensor(-12.)
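backward() adds to whatever is already stored in .grad rather than overwriting it. This sketch rebuilds the same expression twice, because backward() frees the graph by default, and shows the stored gradients doubling. It uses x and y instead of a and b so it does not disturb the tensors checked in the next cell. This accumulation is why training loops zero the gradients between steps.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(6.0, requires_grad=True)

for _ in range(2):
    out = 3 * x**3 - y**2  # rebuild the graph: backward() frees it by default
    out.backward()         # adds d(out)/dx and d(out)/dy into x.grad and y.grad

accumulated_x = x.grad.item()  # 72.0, i.e. 36 added twice
accumulated_y = y.grad.item()  # -24.0, i.e. -12 added twice

# Zero the stored gradients before an unrelated backward pass.
x.grad.zero_()
y.grad.zero_()
print(accumulated_x, accumulated_y, x.grad.item())  # 72.0 -24.0 0.0
```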

In [None]:
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor(True)
tensor(True)
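Optimizers read gradients from .grad, but autograd can also return them directly. As an alternative sketch, torch.autograd.grad returns a tuple of gradients and does not write into .grad. For these freshly created tensors, a.grad therefore stays None.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(6.0, requires_grad=True)
Q = 3 * a**3 - b**2

# Returns (dQ/da, dQ/db) without populating a.grad or b.grad.
dQ_da, dQ_db = torch.autograd.grad(Q, (a, b))

print(dQ_da, dQ_db)    # tensor(36.) tensor(-12.)
print(a.grad is None)  # True
```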
