# Imports

In [1]:
import torch



# Calculating Gradients

In PyTorch, computing gradients is as easy as calling `backward()` on a tensor.
The only requirement is that the tensors involved have gradient tracking enabled.
It is disabled by default: tensors created as before have `requires_grad` set to `False`:

In [2]:
a = torch.tensor([1.0, 2.0, 3.0])
print(a.requires_grad)

False


This is easy to fix: we can just tell PyTorch that a tensor should record gradients:

In [3]:
x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)
print("x =", x)
print("y =", y)
z = x * 3 + y + 2
print("z = x * 3 + y + 2 =", z)
u = z * z
print("u = z * z =", u)

x = tensor([1.], requires_grad=True)
y = tensor([2.], requires_grad=True)
z = x * 3 + y + 2 = tensor([7.], grad_fn=<AddBackward0>)
u = z * z = tensor([49.], grad_fn=<MulBackward0>)
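
Gradient tracking can also be switched on for a tensor that already exists. The sketch below uses the in-place method `requires_grad_()`, which does not appear elsewhere in this notebook; the tensor `a` and the helper value `b` are only illustrative.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])  # gradient tracking is off by default
a.requires_grad_()                 # switch tracking on for the existing tensor
print(a.requires_grad)

b = (a * 2).sum()                  # scalar result that depends on a
b.backward()
print(a.grad)                      # derivative of sum(2 * a) with respect to each entry of a
```

This is handy when a tensor comes from elsewhere (e.g., loaded data) and cannot be created with `requires_grad=True` directly.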


From this point on, PyTorch records every differentiable operation applied to these tensors (see the `grad_fn` of each result), and we can calculate the gradients. In-place operations are a special case and are discussed further below.

In [4]:
z.retain_grad() # keeps the gradient for the intermediate tensor z instead of only for the leaf nodes x, y
# backward pass from u
u.backward()
print("du/dz =", z.grad) # du/dz
print("du/dx =", x.grad) # du/dx
print("du/dy =", y.grad) # du/dy

du/dz = tensor([14.])
du/dx = tensor([42.])
du/dy = tensor([14.])
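
These values can be checked by hand with the chain rule, using the definitions $z = 3x + y + 2$ and $u = z^2$ from the cell above, evaluated at $x = 1$, $y = 2$ (so $z = 7$):

$$
\begin{aligned}
\frac{\partial u}{\partial z} &= 2z = 14 \\
\frac{\partial u}{\partial x} &= \frac{\partial u}{\partial z}\cdot\frac{\partial z}{\partial x} = 2z \cdot 3 = 42 \\
\frac{\partial u}{\partial y} &= \frac{\partial u}{\partial z}\cdot\frac{\partial z}{\partial y} = 2z \cdot 1 = 14
\end{aligned}
$$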


**NOTE:** Don't call `backward()` multiple times on the same graph instance; it will usually throw an error, because the buffers saved for the backward pass are freed after the first call (and you probably don't want to do this anyway)!
If you really have to, pass `retain_graph=True` as an argument to the `backward()` call.

> We usually only need to call backward on the output of our loss function.
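
A minimal sketch of the note above, reusing the `u = z * z` pattern from earlier. The flag `second_call_failed` is only an illustrative helper; the exact error message may differ between PyTorch versions.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

# Without retain_graph: the multiplication saves z for its backward pass,
# and those saved buffers are freed by the first backward() call.
z = x * 3
u = z * z
u.backward()
try:
    u.backward()
    second_call_failed = False
except RuntimeError:
    second_call_failed = True
print("second backward failed:", second_call_failed)

# With retain_graph=True on the first call, a second pass is possible.
x.grad = None                  # discard the gradient from the experiment above
z = x * 3
u = z * z
u.backward(retain_graph=True)  # keep the saved buffers alive
u.backward()                   # second pass over the same graph
print(x.grad)                  # gradients of both passes are summed
```

Note that the second pass adds to `x.grad` rather than overwriting it, which is rarely what one wants outside of special cases.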

You can also assign values to individual elements of a tensor.\
There is an important restriction though:\
**NEVER modify individual entries of a tensor in-place if you want to calculate a gradient for it later** (for a leaf tensor that requires gradients, PyTorch rejects such a write with an error)!

In [5]:
bad = torch.zeros(2,2, requires_grad=True)
bad[0,0] = 1.0 # in-place modification

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

Use out-of-place PyTorch operations instead:

In [6]:
good = torch.zeros(2,2, requires_grad=True)
good = good + torch.eye(2) # re-assignment: a new tensor is created, but you lose access to the original tensor `good`
good.retain_grad() # only to be able to access `grad` attribute later; by default, `grad` is not stored for intermediate tensors
print(good)

tensor([[1., 0.],
        [0., 1.]], grad_fn=<AddBackward0>)


Sometimes you will need a different view on your tensor.\
That is, a tensor with the same data but of different shape.\
For this, PyTorch provides the `view` function on tensors.\
For example:

In [7]:
# create a 1D tensor with values 0,1,2,...,15
tensor_1d = torch.tensor(range(16))
print(tensor_1d)

# view tensor as 4x4 tensor
tensor_2d = tensor_1d.view(4,4)
print(tensor_2d)

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])


Note how the 4x4 tensor is created: the values fill the last dimension first, i.e., row by row.
You can use the same function to view any tensor as a 1D tensor:

In [8]:
# view tensor as vector of length 16
tensor_2d_as_1d = tensor_2d.view(-1) # -1 means "infer this dimension"
print(tensor_2d_as_1d)

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
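
Because a view has "the same data", it shares memory with the tensor it was created from. The following sketch illustrates this on a small tensor without gradient tracking; the names `base` and `grid` are made up for this example.

```python
import torch

base = torch.arange(6)      # tensor([0, 1, 2, 3, 4, 5])
grid = base.view(2, 3)      # same data, seen as 2 rows of 3 values

grid[0, 0] = 100            # write through the view ...
print(base)                 # ... and the original tensor changes too

# Both tensors start at the same memory address
print(base.data_ptr() == grid.data_ptr())

# -1 can stand in for one dimension; it is inferred from the others
print(base.view(-1, 2).shape)
```

If an independent copy is needed instead, `clone()` can be called on the tensor before or after reshaping.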


Applying this to our tensor `good`, we can compute a "loss" as the sum of its entries (not really a loss, but we need a scalar that we can differentiate):

In [9]:
good_loss = good.view(-1).sum(dim=0) # flatten tensor first, then sum all values
print(good_loss)
good_loss = good.sum() # equivalent to above
print(good_loss)

tensor(2., grad_fn=<SumBackward1>)
tensor(2., grad_fn=<SumBackward0>)


In [13]:
good_loss.backward() # success -> gradients computed
print(good.grad) # the computed gradient for tensor `good`

tensor([[4., 4.],
        [4., 4.]])


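CAUTION: Gradients are accumulated, so you have to zero them after you've used them (e.g., after an optimizer step). The derivative of a sum with respect to each entry is 1, so a single `backward()` call would give all ones; the 4s printed above most likely stem from that cell having been executed several times (note the jump in the execution counter). Each further call adds 1 to every entry: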

In [14]:
good_loss.backward() # compute gradients again
print(good.grad)

tensor([[5., 5.],
        [5., 5.]])
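
A minimal sketch of resetting accumulated gradients by hand, assuming a fresh leaf tensor `w` (a made-up name standing in for model weights). In a training loop, an optimizer's `zero_grad()` method typically takes care of this.

```python
import torch

w = torch.ones(2, 2, requires_grad=True)

w.sum().backward()
w.sum().backward()   # second backward pass: gradients are added, not replaced
print(w.grad)        # every entry has accumulated two contributions of 1

w.grad.zero_()       # reset the accumulated gradient in place
w.sum().backward()
print(w.grad)        # only the latest contribution remains
```

Each call to `w.sum()` builds a new graph, so no `retain_graph=True` is needed here.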
