# Automatic differentiation
torch.autograd is a built-in differentiation engine that:
- automatically computes gradients for any computational graph.
- keeps a record of data (tensors) and all executed operations (along with the resulting new tensors).

This is useful for neural network training, specifically the backpropagation algorithm, where the model weights (parameters) are adjusted based on the gradient of the loss function w.r.t. each parameter.
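To make the "adjusted based on the gradient" part concrete, here is a minimal sketch of one gradient descent step done by hand. The one-parameter model, the data values and the learning rate `lr` are all made up for illustration and are not part of the notebook.

```python
import torch

# Hypothetical one-parameter model: prediction = w * x
x = torch.tensor(2.0)
y_true = torch.tensor(6.0)
w = torch.tensor(1.0, requires_grad=True)
lr = 0.1  # assumed learning rate, chosen only for this example

# Forward pass: squared error, (1 * 2 - 6)^2 = 16
loss = (w * x - y_true) ** 2

# Backward pass: dloss/dw = 2 * (w * x - y_true) * x = 2 * (-4) * 2 = -16
loss.backward()
grad_value = w.grad.item()

# Update step: done under no_grad so the update itself is not tracked
with torch.no_grad():
    w -= lr * w.grad  # 1.0 - 0.1 * (-16) = 2.6

# Reset the gradient so it does not accumulate into the next step
w.grad.zero_()

print(grad_value, w.item())
```

In practice an optimiser such as `torch.optim.SGD` performs the update and reset steps, but the role of the gradient is the same.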

## NEED TO UNDERSTAND BACKPROPAGATION ALGORITHM AND EACH EQUATION (ENGSCI 712)

In [2]:
import torch

# A tensor created directly by the user (not as the result of an operation) is a leaf node.
# All inputs and weights of a neural network are leaf nodes of the computational graph.
# The result of an operation on a tensor that requires grad (e.g. z, loss) is not a leaf node.
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)  #  nn parameters that we want to optimise.
b = torch.randn(3, requires_grad=True) #  nn parameters that we want to optimise.
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

In [5]:
print("Gradient function for z = {}".format(z.grad_fn))
print("Gradient function for loss = {}".format(loss.grad_fn))

Gradient function for z = <AddBackward0 object at 0x7be8d3a41ae0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7be8213c7ee0>


Remember that we set "requires_grad=True" on a tensor when we want autograd to compute gradients w.r.t. it, which is the case for the parameters we want to optimise. We can set it when creating the tensor, or afterwards using "x.requires_grad_(True)".
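A small sketch of turning tracking on after creation and checking which tensors are leaf nodes. The tensor names `a` and `c` are only for illustration.

```python
import torch

a = torch.randn(3)                  # created directly: a leaf, not tracked yet
print(a.requires_grad, a.is_leaf)   # False True

a.requires_grad_(True)              # in-place switch, same effect as requires_grad=True at creation
print(a.requires_grad, a.is_leaf)   # True True

c = a * 2                           # result of an operation on a tracked tensor
print(c.requires_grad, c.is_leaf)   # True False
print(c.grad_fn)                    # a MulBackward0 object
```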

## Computing gradients

We want to compute the derivatives of our loss function w.r.t. the neural network parameters that we want to optimise.

In [6]:
loss.backward()
print("dLoss/dw:", w.grad)
print("dLoss/db:", b.grad)

dLoss/dw: tensor([[0.3303, 0.2642, 0.3327],
        [0.3303, 0.2642, 0.3327],
        [0.3303, 0.2642, 0.3327],
        [0.3303, 0.2642, 0.3327],
        [0.3303, 0.2642, 0.3327]])
dLoss/db: tensor([0.3303, 0.2642, 0.3327])
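Every row of dLoss/dw is identical to dLoss/db here because x is all ones. As a sanity check, the sketch below compares autograd's result against the hand-derived gradient. It assumes the default mean reduction of "binary_cross_entropy_with_logits", for which dLoss/dz = (sigmoid(z) - y) / N, so dLoss/db = dLoss/dz and dLoss/dw = outer(x, dLoss/dz). The seed is arbitrary, so the numbers will differ from the output above.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of this check
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradients (mean reduction over the N = 3 outputs)
with torch.no_grad():
    dz = (torch.sigmoid(z) - y) / z.numel()  # dLoss/dz
    manual_db = dz                           # dz/db is the identity
    manual_dw = torch.outer(x, dz)           # dz_j/dw_ij = x_i

print(torch.allclose(b.grad, manual_db, atol=1e-6))
print(torch.allclose(w.grad, manual_dw, atol=1e-6))
```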


The graph is recreated from scratch on every forward pass: after each ".backward()" call the old graph is freed by default, and autograd starts populating a new one the next time the operations are run. When ".backward()" is called, autograd:
- computes the gradients from each ".grad_fn"
- accumulates them in the respective tensor's ".grad" attribute
- using the chain rule, propagates all the way to the leaf tensors.
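The word "accumulates" matters: gradients are added to ".grad" rather than overwriting it. A minimal sketch with a made-up tensor `w`, showing why the gradient is usually zeroed between steps:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First forward + backward: d/dw sum(w^2) = 2 * w
(w ** 2).sum().backward()
first = w.grad.clone()
print(first)    # tensor([2., 4.])

# Second forward builds a fresh graph; backward adds to the existing .grad
(w ** 2).sum().backward()
second = w.grad.clone()
print(second)   # tensor([4., 8.])

# Reset before the next step so gradients do not keep piling up
w.grad.zero_()
print(w.grad)   # tensor([0., 0.])
```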

## Disabling gradient tracking

Why do we want to disable gradient tracking?
- To mark some parameters in a neural network as frozen (e.g. when fine-tuning only part of a network).
- To speed up computations when we only need the forward pass, since operations on tensors that do not track gradients are more efficient.
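A sketch of the first use case, freezing part of a network. The small `Sequential` model and its layer sizes are hypothetical and chosen only to keep the example short.

```python
import torch

# Hypothetical two-layer model, only for illustration
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)

# Freeze the first layer: autograd will not compute gradients for it
for p in model[0].parameters():
    p.requires_grad_(False)

out = model(torch.ones(1, 4)).sum()
out.backward()

# Frozen parameters never receive a gradient; the last layer still does
frozen = [p.grad is None for p in model[0].parameters()]
trainable = [p.grad is not None for p in model[2].parameters()]
print(frozen, trainable)  # [True, True] [True, True]
```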

In [8]:
z = torch.matmul(x, w) + b
print(z.requires_grad)

# Wrap the computation in torch.no_grad() to stop tracking, so that we only
# perform forward computations through the network.
with torch.no_grad():
  z = torch.matmul(x, w) + b
print(z.requires_grad)

# We can get the same result by calling .detach() on a tensor, which returns
# a new tensor that shares the same data but is not tracked by autograd.
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

True
False
False
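One difference worth noting: "torch.no_grad()" disables tracking for a whole block, while ".detach()" cuts a single tensor out of the graph, so it can be used as a constant inside an otherwise tracked computation. A minimal sketch with made-up values:

```python
import torch

w = torch.ones(3, requires_grad=True)
z = w * 2
z_det = z.detach()

print(z_det.requires_grad)                # False
print(z_det.data_ptr() == z.data_ptr())   # True: same underlying data, no copy

# z_det is treated as the constant [2, 2, 2], so d/dw sum(z_det * w) = z_det.
# With the tracked z instead, the gradient would be 4 * w.
loss = (z_det * w).sum()
loss.backward()
print(w.grad)                             # tensor([2., 2., 2.])
```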
