A Gentle Introduction to `torch.autograd`
=========================================

`torch.autograd` is PyTorch's automatic differentiation engine that
powers neural network training. In this section, you will get a
conceptual understanding of how autograd helps a neural network train.



Differentiation in Autograd
===========================



In [None]:
import torch

# Create tensors.
x = torch.tensor(3.)
m = torch.tensor(4., requires_grad=True)
c = torch.tensor(5., requires_grad=True)

In [None]:
x,m,c

(tensor(3.), tensor(4., requires_grad=True), tensor(5., requires_grad=True))

We can combine tensors with the usual arithmetic operations.



In [None]:
y = m * x + c
print(y)

tensor(17., grad_fn=<AddBackward0>)


$$y = mx + c$$

In [None]:
# Compute gradients
y.backward()

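The first call to `y.backward()` succeeds and produces no output. Running this cell a second time without recomputing `y` raises the following error, because the saved intermediate values of the graph are freed by the first call:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.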
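The `RuntimeError` quoted above can be reproduced on purpose. The following is a minimal sketch, not part of the original notebook: it calls `.backward()` twice on the same graph, then rebuilds `y` and uses `retain_graph=True` (the option named in the error message). The flag `second_call_failed` is a hypothetical name introduced only for illustration.

```python
import torch

x = torch.tensor(3.)
m = torch.tensor(4., requires_grad=True)
c = torch.tensor(5., requires_grad=True)

y = m * x + c
y.backward()                  # first call succeeds and frees the saved tensors

try:
    y.backward()              # second call on the same, already-freed graph
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("second backward failed:", type(err).__name__)

# Start over: clear the stored gradients and rebuild y to get a fresh graph.
m.grad = None
c.grad = None
y = m * x + c
y.backward(retain_graph=True)  # keep the graph alive after this call
y.backward()                   # allowed now; gradients accumulate into .grad
print(m.grad, c.grad)
```

Note that the two successful calls add up in `.grad`, so the stored values are twice the single-pass gradients.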

$$\frac{d y}{d m} = x$$

$$\frac{d y}{d c} = 1$$

In [None]:
# Display gradients
print('dy/dm:', m.grad)
print('dy/dc:', c.grad)

dy/dm: tensor(3.)
dy/dc: tensor(1.)


`autograd` tracks only tensors that have `requires_grad=True`. Here `x` was created without it, so no gradient is computed for `x` and `x.grad` remains `None`.
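As a small sketch of this tracking rule, the snippet below re-creates the tensors from the example so that it runs on its own, then inspects which of them received a gradient.

```python
import torch

x = torch.tensor(3.)                      # requires_grad defaults to False
m = torch.tensor(4., requires_grad=True)  # tracked by autograd
c = torch.tensor(5., requires_grad=True)  # tracked by autograd

y = m * x + c
y.backward()

print(x.requires_grad, x.grad)  # untracked tensor: no gradient is stored
print(m.requires_grad, m.grad)  # tracked tensor: holds dy/dm
print(c.requires_grad, c.grad)  # tracked tensor: holds dy/dc
print(y.requires_grad)          # a result is tracked if any input is tracked
```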

Let's now create two tensors `a` and `b` with `requires_grad=True`. This signals to
`autograd` that every operation on them should be tracked.

In [None]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

We create another tensor `Q` from `a` and `b`.

$$Q = 3a^3 - b^2$$


In [None]:
Q = 3*a**3 - b**2
print(Q)

tensor([-12.,  65.], grad_fn=<SubBackward0>)


Let's assume `a` and `b` are the parameters of a neural network (NN) and `Q` is
the error. In NN training, we want the gradients of the error with respect to
the parameters, i.e.

$$\frac{\partial Q}{\partial a} = 9a^2$$

$$\frac{\partial Q}{\partial b} = -2b$$

When we call `.backward()` on `Q`, autograd calculates these gradients
and stores them in the respective tensors' `.grad` attribute.


In [None]:
Q.backward()

RuntimeError: grad can be implicitly created only for scalar outputs

As the error says, `grad` can be implicitly created only for scalar outputs. Because `Q` is a vector rather than a scalar, we cannot call `.backward()` on it directly, at least not without extra information.

In `PyTorch`, `.backward()` called with no arguments works only on a scalar output. That covers most neural network training, where the loss is a single number.

However, when the output (`Q` in our case) is a vector rather than a scalar, `PyTorch` requires an additional argument to `.backward()`: a tensor of the same shape as the output that holds the gradient of some scalar with respect to each element of the output. In our example this argument is `gradient=external_grad`.

In [None]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

The `external_grad` tensor we pass to `Q.backward(gradient=external_grad)` holds the gradient of some higher-level scalar with respect to `Q`. It must have the same shape as `Q`.

Essentially, it tells `PyTorch` how to weight each element of `Q` in the gradient computation: the elements of `external_grad` act as coefficients of a linear combination of the elements of `Q`, and that combination is a scalar. With all coefficients equal to 1, as here, the scalar is simply the sum of the elements of `Q`.
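To make the "linear combination" reading concrete, here is a hedged sketch that compares two routes to the same gradients: passing a weight tensor through the `gradient` argument, and building the weighted sum by hand before calling `.backward()`. The helper `q_of` and the weights `v = [1.0, 0.5]` are illustrative assumptions, not part of the original example.

```python
import torch

def q_of(a, b):
    # Same expression as Q in the text: Q = 3a^3 - b^2 (hypothetical helper).
    return 3 * a**3 - b**2

v = torch.tensor([1.0, 0.5])  # illustrative weights, one per element of Q

# Route 1: pass the weights through the `gradient` argument.
a1 = torch.tensor([2., 3.], requires_grad=True)
b1 = torch.tensor([6., 4.], requires_grad=True)
q_of(a1, b1).backward(gradient=v)

# Route 2: form the scalar sum_i v_i * Q_i explicitly, then call backward.
a2 = torch.tensor([2., 3.], requires_grad=True)
b2 = torch.tensor([6., 4.], requires_grad=True)
(q_of(a2, b2) * v).sum().backward()

print(a1.grad, a2.grad)
print(b1.grad, b2.grad)
print(torch.allclose(a1.grad, a2.grad), torch.allclose(b1.grad, b2.grad))
```

Separate copies of `a` and `b` are used for each route because gradients accumulate in `.grad` across calls.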



In [None]:
a.grad, b.grad

(tensor([36., 81.]), tensor([-12.,  -8.]))

The gradients are now stored in `a.grad` and `b.grad`.


In [None]:
# check if collected gradients are correct
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])


Equivalently, we can aggregate `Q` into a scalar and call `.backward()`
without a `gradient` argument, as in `Q.sum().backward()`.
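A short sketch of the scalar route follows. It rebuilds `a`, `b`, and `Q` from scratch, since the graph used earlier has already been consumed by a backward pass and the existing `.grad` values would otherwise accumulate.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3 * a**3 - b**2

# Q.sum() is a scalar, so no `gradient` argument is needed.
Q.sum().backward()

print(a.grad)  # compare with 9 * a**2
print(b.grad)  # compare with -2 * b
```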


Source: [A Gentle Introduction to torch.autograd](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)