## AUTOMATIC DIFFERENTIATION WITH `torch.autograd` ##

When training neural networks, the most frequently used algorithm is backpropagation. In this algorithm, the parameters (model weights) are adjusted according to the gradient of the loss function with respect to each parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called `torch.autograd`. It supports automatic computation of gradients for any computational graph.

In [5]:
import torch 

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b  # logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)  # each op on tracked tensors records a grad_fn used for backpropagation

In [6]:
print('Gradient function for b =', b.grad_fn)
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Gradient function for b = None
Gradient function for z = <AddBackward0 object at 0x000002373D6D4AF0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward object at 0x000002373D6D43A0>
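The output above shows `None` for `b` but a backward function for `z` and `loss`. This is because `b` was created directly by the user (a leaf tensor), while `z` is the result of an operation. The following is a minimal sketch of that distinction using the standard `is_leaf` attribute; it rebuilds the same tensors so it can run on its own.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b

# Leaf tensors are created by the user, so no operation produced them
print(b.is_leaf, b.grad_fn)        # True None

# z was produced by an addition, so it carries a reference to its backward function
print(z.is_leaf)                   # False
print(type(z.grad_fn).__name__)    # AddBackward0
```

Gradients are stored in `.grad` only for leaf tensors that have `requires_grad=True`, which is why `w.grad` and `b.grad` are the values of interest later on.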


To optimize the parameters of the neural network, we need the derivatives of the loss function with respect to those parameters. To compute them, we call `loss.backward()` and then read the values from `w.grad` and `b.grad`:

#### By default, `backward` can be called only once on a given graph, because PyTorch frees the graph's intermediate buffers after the call for performance reasons. If we need several `backward` calls on the same graph, we need to pass `retain_graph=True` to the `backward` call ####
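The single-use behavior described in the note can be sketched as follows. The example rebuilds the same graph so it runs on its own; the names `first`, `accumulated`, and `third_call_failed` are illustrative only. It also shows that repeated `backward` calls add to `.grad` rather than overwrite it.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

loss.backward(retain_graph=True)   # keep the graph's buffers for another pass
first = b.grad.clone()

loss.backward()                    # second pass; buffers are freed afterwards
accumulated = b.grad.clone()
print(torch.allclose(accumulated, 2 * first))  # True: gradients accumulate

# Reset the gradients before the next independent computation
w.grad.zero_()
b.grad.zero_()

try:
    loss.backward()                # the buffers were freed by the previous call
except RuntimeError as err:
    third_call_failed = True
    print("third backward failed:", type(err).__name__)
else:
    third_call_failed = False
```

Because gradients accumulate, re-running a cell that calls `backward` without zeroing `.grad` first yields sums over all previous calls.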

In [10]:
loss.backward(retain_graph=False)
print(w.grad)
print(b.grad)

tensor([[1.2419, 0.3172, 1.1146],
        [1.2419, 0.3172, 1.1146],
        [1.2419, 0.3172, 1.1146],
        [1.2419, 0.3172, 1.1146],
        [1.2419, 0.3172, 1.1146]])
tensor([1.2419, 0.3172, 1.1146])
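As a sanity check on what `loss.backward()` stores, the gradients can be compared against a hand-derived formula. The sketch below assumes the default `reduction='mean'` of `binary_cross_entropy_with_logits`, for which the derivative of the loss with respect to each logit is `(sigmoid(z) - y) / N`, with `N = 3` here. The seed and the names `dz`, `expected_b_grad`, and `expected_w_grad` are illustrative and not part of the original notebook.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the check is repeatable

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradient of the mean-reduced loss with respect to the logits
dz = (torch.sigmoid(z.detach()) - y) / z.numel()

# z = x @ w + b, so dloss/db = dz and dloss/dw[i, j] = x[i] * dz[j]
expected_b_grad = dz
expected_w_grad = x[:, None] * dz[None, :]

print(torch.allclose(b.grad, expected_b_grad, atol=1e-6))
print(torch.allclose(w.grad, expected_w_grad, atol=1e-6))
```

Because `x` is all ones, every row of `w.grad` equals `b.grad`, which matches the pattern in the printed output above. With `y` all zeros, a single `backward` call gives entries of at most 1/3, so printed values such as 1.2419 suggest that the cell was run after earlier `backward` calls had already accumulated into `.grad`.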


By default, all tensors with `requires_grad=True` track their computational history and support gradient computation. However, in some cases we do not need that, for example when we have trained the model and only want to apply it to some input data, i.e. we only want to do forward computations through the network. We can stop tracking computations by wrapping the computation code in a `torch.no_grad()` block:

In [25]:
z = torch.matmul(x, w) + b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)

True
False


In [28]:
# detach() gives a similar result: a tensor that is excluded from gradient tracking

z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

False
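The two approaches differ in one respect worth seeing in code: inside `torch.no_grad()` no graph is recorded at all, whereas `detach()` is applied after the graph for `z` has already been built and returns a new tensor that is cut off from it. A minimal sketch, rebuilding the same tensors so it runs on its own:

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b   # the graph is recorded here
z_det = z.detach()           # new tensor, cut off from that graph

print(z.requires_grad, z_det.requires_grad)   # True False
print(z_det.grad_fn)                          # None
print(z_det.data_ptr() == z.data_ptr())       # True: same underlying storage
```

Since `z_det` shares storage with `z`, in-place changes to one are visible in the other; call `.clone()` on the detached tensor if an independent copy is needed.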


### Reasons you might want to disable gradient tracking: ###
- To mark some parameters in your neural network as frozen parameters. This is a very common scenario for fine-tuning a pretrained network (more on this later!)
- To speed up computations when you are only doing a forward pass, because computations on tensors that do not track gradients are more efficient.
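The first reason, freezing parameters, can be sketched as follows. The two-layer model here is hypothetical and only stands in for a pretrained network; treating its first layer as the "backbone" and its last layer as the "head" is an assumption made for illustration.

```python
import torch
from torch import nn

# Hypothetical model standing in for a pretrained network
model = nn.Sequential(
    nn.Linear(5, 4),   # treated as the pretrained "backbone"
    nn.ReLU(),
    nn.Linear(4, 3),   # treated as the new "head" to fine-tune
)

# Freeze the backbone: autograd will not compute gradients for these parameters
for param in model[0].parameters():
    param.requires_grad_(False)

x = torch.ones(2, 5)
y = torch.zeros(2, 3)
loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
loss.backward()

print(model[0].weight.grad)              # None: frozen
print(model[2].weight.grad is not None)  # True: still trainable

# Pass only the trainable parameters to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)
```

Filtering the parameter list is optional, since parameters without gradients are skipped by the optimizer anyway, but it makes the intent explicit.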