In [None]:
import torch

# Gradients
## Calculating gradients

In PyTorch, computing gradients is as easy as calling `backward()` on a tensor.
The only requirement is that the tensor is enabled for gradient calculation.
The tensors we created previously, for example, have `requires_grad` set to `False`, which is the default:

In [None]:
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=False)
print(a.requires_grad)

This is easy to fix: we can just tell PyTorch that a tensor should record gradients:

In [None]:
x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)
print("x =", x)
print("y =", y)
z = x * 3 + y + 2
print("z = x * 3 + y + 2 =", z)
u = z * z
print("u = z * z =", u)

From this point on, PyTorch records every differentiable operation applied to these tensors (see `grad_fn` in the printed output), so we can calculate the gradients:

In [None]:
z.retain_grad() # keeps gradient info for z instead of only for the leaf nodes x, y
# backward pass from u
u.backward()
print("du/dz =", z.grad) # du/dz
print("du/dx =", x.grad) # du/dx
print("du/dy =", y.grad) # du/dy
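As a sanity check, the gradients above can be reproduced by hand with the chain rule: `du/dz = 2z`, `du/dx = du/dz * 3` and `du/dy = du/dz * 1`. The following is a minimal sketch that rebuilds the same graph and compares both results; the names `du_dz`, `du_dx` and `du_dy` are only illustrative.

```python
import torch

# Rebuild the same small graph as above
x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)
z = x * 3 + y + 2      # z = 1*3 + 2 + 2 = 7
z.retain_grad()        # keep the gradient of the non-leaf tensor z
u = z * z              # u = 49
u.backward()

# Chain rule by hand (detach so these values are not tracked by autograd)
du_dz = 2 * z.detach() # d(z*z)/dz = 2z
du_dx = du_dz * 3      # dz/dx = 3
du_dy = du_dz * 1      # dz/dy = 1

print("autograd:", z.grad, x.grad, y.grad)
print("by hand: ", du_dz, du_dx, du_dy)
```

Both lines should show the same three values, which is a quick way to convince yourself that `backward()` does nothing more than apply the chain rule along the recorded graph.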

**NOTE:** Don't call `backward()` multiple times on the same graph instance. PyTorch frees the graph's intermediate buffers after the first backward pass, so a second call will typically throw an error (you probably don't want to do this anyway)!
If you really have to do this, pass `retain_graph=True` as an argument to the first `backward()` call.
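To illustrate the note, here is a small sketch on a fresh toy graph (the tensors and the flag `third_call_failed` are made up for this example). It runs two backward passes with `retain_graph=True` on the first one, then shows that a further call fails once the graph has been freed.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
u = (x * 3) ** 2                 # the power op saves its input for the backward pass

u.backward(retain_graph=True)    # keep the saved buffers for another pass
first = x.grad.clone()           # du/dx = 18 * x

u.backward()                     # works, because the graph was retained
second = x.grad.clone()          # gradients were accumulated on top of the first pass

try:
    u.backward()                 # the previous call freed the graph
    third_call_failed = False
except RuntimeError as err:
    third_call_failed = True
    print("RuntimeError:", err)

print(first, second, third_call_failed)
```

Note that the second pass adds to `x.grad` instead of overwriting it.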

> We usually only need to call backward on the output of our loss function.

You can also assign individual elements of a tensor.\
There is an important caveat, though:\
**NEVER modify individual tensor entries in-place if you want to calculate a gradient later** (on a leaf tensor that requires gradients, as in the next cell, PyTorch raises a `RuntimeError`)!\
Use out-of-place PyTorch operations instead:

In [None]:
bad = torch.zeros(2,2, requires_grad=True)
bad[0,0] = 1.0 # in-place modification of a leaf tensor that requires grad -> RuntimeError
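The cell above stops with an error. A minimal sketch of what happens, and of one possible workaround when you only want to set initial values before any gradient is tracked, is shown below. The use of `torch.no_grad()` here is an addition for illustration, not something the text above relies on.

```python
import torch

bad = torch.zeros(2, 2, requires_grad=True)

try:
    bad[0, 0] = 1.0              # in-place write to a leaf tensor that requires grad
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Assumption: we only want to initialize values, not differentiate through the write.
# Inside torch.no_grad() the in-place assignment is allowed and is not recorded.
with torch.no_grad():
    bad[0, 0] = 1.0

print(bad)                       # still a leaf tensor with requires_grad=True
```

If the assignment itself has to be part of the computational graph, build a new tensor with out-of-place operations instead.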

In [None]:
good = torch.zeros(2,2, requires_grad=True)
good = good + torch.eye(2) # re-assignment: a new tensor is created, but you lose access to the original tensor `good`
good.retain_grad() # only to be able to access `grad` attribute later
print(good)

We can easily get a different view of the tensor, e.g. flatten it:

In [None]:
good_loss = good.view(-1).sum(dim=0) # flatten tensor first, then sum all values
print(good_loss)

In [None]:
good_loss.backward() # success -> gradients computed
print(good.grad) # the computed gradient for tensor `good`

CAUTION: Gradients are accumulated across `backward()` calls, so you have to zero them after you've used them (e.g. after updating your weights).

In [None]:
good_loss.backward() # compute gradients again: they are added to the existing values in `good.grad`
print(good.grad)
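Resetting the gradients could look like the following sketch. It uses a small stand-in tensor `w` (a hypothetical weight, not one of the tensors above) and zeroes its `.grad` in-place with `zero_()`.

```python
import torch

w = torch.ones(2, requires_grad=True)

# Two backward passes without zeroing: the gradients add up
(w * 2).sum().backward()
(w * 2).sum().backward()
accumulated = w.grad.clone()     # 2 + 2 per entry

# Zero the stored gradient, then compute a fresh one
w.grad.zero_()
(w * 2).sum().backward()
fresh = w.grad.clone()           # 2 per entry

print("accumulated:", accumulated)
print("fresh:      ", fresh)
```

When training with an optimizer from `torch.optim`, calling `optimizer.zero_grad()` once per step serves the same purpose for all parameters it manages.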