Based on: https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

This plays the same role as automatic differentiation with `tf.GradientTape` in TensorFlow: https://www.tensorflow.org/guide/autodiff

In [1]:
import torch

In [2]:
x = torch.ones(2, 2, requires_grad=True)

In [3]:
x

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

In [4]:
y = x + 2

In [5]:
y

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

In [6]:
y.grad_fn

<AddBackward0 at 0x7f9360a167f0>

Since `y` was created by an addition (`+`), PyTorch records `AddBackward0` as its `grad_fn`: the node that knows how to backpropagate gradients through that addition.
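A tensor's `grad_fn` is one node of the graph that autograd records during the forward pass, and its `next_functions` attribute links to the nodes for its inputs. The minimal sketch below re-creates `x` and `y` so it runs on its own. Because the exact contents of `next_functions` (for example, whether the constant `2` gets its own `None` entry) may differ between PyTorch versions, it only inspects the first entry.

```python
import torch

# Re-create the tensors from the cells above so this block is self-contained
x = torch.ones(2, 2, requires_grad=True)
y = x + 2

# The node that will backpropagate through the addition
print(type(y.grad_fn).__name__)  # AddBackward0

# next_functions holds (node, input_index) pairs, one per input of the addition.
# The first input is the leaf x; its gradient is stored by an AccumulateGrad node.
leaf_node, _ = y.grad_fn.next_functions[0]
print(type(leaf_node).__name__)  # AccumulateGrad

# Leaf tensors created directly by the user have no grad_fn
print(x.grad_fn)  # None
```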

In [7]:
z = y * y * 3
out = z.mean()

print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
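`out.grad_fn` is the root of the whole recorded graph. The sketch below walks it through `next_functions` and collects the node types; `graph_node_names` is a hypothetical helper written for this illustration, not a PyTorch API. For the computation above it should reach `MeanBackward0`, two `MulBackward0` nodes (one for `y * y`, one for `* 3`), `AddBackward0`, and the `AccumulateGrad` node of `x`.

```python
import torch

# Rebuild the computation from the cells above
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()


def graph_node_names(root):
    """Hypothetical helper: depth-first walk over grad_fn nodes, returning class names.

    Skips None entries (inputs that do not require grad) and visits each node once.
    """
    names, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if node is None or node in seen:
            continue
        seen.add(node)
        names.append(type(node).__name__)
        # Push the nodes that produced this node's inputs
        stack.extend(next_node for next_node, _ in node.next_functions)
    return names


print(graph_node_names(out.grad_fn))
```

Note that `y * y` uses `y` twice, so the `AddBackward0` node is reached through two edges; the `seen` set keeps it from being listed twice.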


### Gradients

Now we can compute the gradients. Because `out` is a scalar, `out.backward()` needs no arguments.

In [8]:
out.backward()

Because `x` was created with `requires_grad=True`, autograd tracks it, so after the backward pass `x.grad` holds the gradient `d(out)/d(x)`.
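To see that only tracked tensors receive gradients, here is a minimal sketch that replaces the constant `2` with an explicit tensor `c` that does not require gradients (an assumption made only for this illustration; the values of `y` and `out` are unchanged). `retain_grad()` is used so that the intermediate `y` also keeps its gradient, which autograd otherwise discards for non-leaf tensors.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # tracked leaf
c = torch.full((2, 2), 2.0)               # untracked constant (requires_grad=False)

y = x + c
y.retain_grad()  # keep d(out)/dy for this non-leaf tensor as well
out = (y * y * 3).mean()
out.backward()

print(x.grad)  # d(out)/dx: every entry is 4.5
print(y.grad)  # d(out)/dy: also 4.5, since dy/dx is 1 element-wise
print(c.grad)  # None: c is not tracked, so no gradient is computed for it
```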

In [9]:
x.grad

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

Generally speaking, `torch.autograd` is an engine for computing vector-Jacobian products.

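Here `out` is a scalar, so `x.grad` holds the gradient `d(out)/d(x)`, which has the same shape as `x` (2x2). It is not a full Jacobian matrix: for a non-scalar output `y` with Jacobian `J`, `y.backward(v)` accumulates the vector-Jacobian product `v^T J` into the `.grad` of the leaves without ever building `J` (see the docs for the full explanation).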
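To make the vector-Jacobian relationship concrete, the sketch below builds the full Jacobian with `torch.autograd.functional.jacobian` and compares it with `torch.autograd.functional.vjp` and with `backward(v)`. The function `f`, the point `x` and the vector `v` are hypothetical, chosen only for illustration.

```python
import torch
from torch.autograd.functional import jacobian, vjp


def f(x):
    # Hypothetical example function R^3 -> R^2
    return torch.stack([x[0] * x[1], x[2] ** 2])


x = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([1.0, 0.5])

J = jacobian(f, x)            # shape (2, 3): J[i, j] = d f_i / d x_j
_, vjp_result = vjp(f, x, v)  # v^T J, computed by backpropagation

# The same product through backward(): seed the output with v instead of building J
x_leaf = x.clone().requires_grad_(True)
f(x_leaf).backward(v)

print(J)
print(v @ J, vjp_result, x_leaf.grad)  # three routes to the same vector


# For a scalar output like `out`, the Jacobian has the same shape as the input,
# so it coincides with the gradient that out.backward() stores in x.grad
def mean_fn(x):
    return (3 * (x + 2) ** 2).mean()


J_out = jacobian(mean_fn, torch.ones(2, 2))
print(J_out)  # every entry is 4.5
```

In practice `backward(v)` is preferred because materializing `J` costs memory proportional to (output size) x (input size), while the vector-Jacobian product only needs one backward pass.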

Each entry of `x.grad` is the partial derivative of `out` with respect to the corresponding entry of `x`, i.e. how much `out` changes for a small change in that element of `x`. Here every entry is `4.5`.
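The value 4.5 can be checked by hand. Writing the four entries of `x` as x_1, ..., x_4 (indexing introduced here for illustration), only the term z_i depends on x_i:

```latex
\begin{aligned}
\text{out} &= \frac{1}{4} \sum_{i=1}^{4} z_i, \qquad z_i = 3\,(x_i + 2)^2 \\
\frac{\partial\,\text{out}}{\partial x_i} &= \frac{1}{4} \cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2) \\
\left.\frac{\partial\,\text{out}}{\partial x_i}\right|_{x_i = 1} &= \frac{3}{2} \cdot 3 = 4.5
\end{aligned}
```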

### Vector-Jacobian product example

In [10]:
x = torch.randn(3, requires_grad=True)

# Keep doubling until the L2 norm of y reaches 1000, so y ends up as 2**k * x for some k
y = x * 2
while y.detach().norm() < 1000:
    y = y * 2

In [11]:
y

tensor([  309.4257, -1326.7697,  1308.4841], grad_fn=<MulBackward0>)

In [12]:
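# y is not a scalar, so backward() needs a vector v with the same shape as y;
# autograd then stores the vector-Jacobian product v^T J in x.grad
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
# v = torch.tensor([1, 1, 1], dtype=torch.float)  # alternative: weight every output equally
y.backward(v)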

x.grad

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
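Because the loop only multiplies by 2, `y = 2**k * x`, so the Jacobian is `2**k` times the identity and `v^T J` reduces to `2**k * v`. In the run shown above the factor was 1024 (102.4 / 0.1, 1024 / 1.0, 0.1024 / 0.0001). The sketch below checks this and also shows that calling `backward()` on the non-scalar `y` without `v` fails; the fixed seed is an assumption added only to make the run repeatable.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only so the run is repeatable
x = torch.randn(3, requires_grad=True)

# Same doubling loop as above, but also track the overall factor 2**k
y = x * 2
scale = 2.0
while y.detach().norm() < 1000:
    y = y * 2
    scale *= 2

# y has 3 elements, so autograd cannot create an implicit gradient of 1 for it
raised = False
try:
    y.backward()
except RuntimeError:
    raised = True  # "grad can be implicitly created only for scalar outputs"
print("backward() without v raised RuntimeError:", raised)

# With an explicit v: J = scale * I, so v^T J = scale * v
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)
print(x.grad)
print(torch.allclose(x.grad, scale * v))  # True
```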