# Automatic differentiation with torch.autograd

When training neural networks, the most frequently used algorithm is **backpropagation**. In this algorithm, parameters (model weights) are adjusted according to the **gradient** of the loss function with respect to each parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called `torch.autograd`. It supports automatic computation of gradients for any computational graph.

Consider the simplest one-layer neural network, with input `x`, parameters `w` and `b`, and some loss function. It can be defined in PyTorch in the following manner:

In [14]:
import torch

x = torch.ones(5) # input tensor
y = torch.zeros(3) # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

You can set the value of `requires_grad` when creating a tensor, or later by using the `x.requires_grad_(True)` method.
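
As a minimal sketch of the second option, the snippet below creates a tensor without gradient tracking and switches it on afterwards. The tensor name `a` and the toy computation are illustrative assumptions, not part of the network defined above.

```python
import torch

a = torch.randn(3)          # requires_grad defaults to False
print(a.requires_grad)      # False

a.requires_grad_(True)      # in-place switch; the trailing underscore marks an in-place op
print(a.requires_grad)      # True

out = (a * 2).sum()         # toy computation: d(out)/d(a) is 2 for every element
out.backward()
print(a.grad)               # tensor([2., 2., 2.])
```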

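Every tensor produced by an operation keeps a reference to the function that will compute its gradient during the backward pass. This reference is stored in the tensor's `grad_fn` property:

In [15]: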
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Gradient function for z = <AddBackward0 object at 0x7fc570250280>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward object at 0x7fc570250310>


## Computing Gradients

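To optimize the parameters, we need the derivatives of the loss with respect to `w` and `b`. Calling `loss.backward()` computes them and stores the results in `w.grad` and `b.grad`:

In [16]: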
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.3233, 0.1993, 0.1272],
        [0.3233, 0.1993, 0.1272],
        [0.3233, 0.1993, 0.1272],
        [0.3233, 0.1993, 0.1272],
        [0.3233, 0.1993, 0.1272]])
tensor([0.3233, 0.1993, 0.1272])
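
Every row of `w.grad` is identical because the input `x` is a vector of ones. The sketch below checks this by hand, assuming the default `reduction='mean'` of `binary_cross_entropy_with_logits`, for which the gradient of the loss with respect to `z` is `(sigmoid(z) - y) / N`. The seed and the names `grad_z`, `manual_w_grad` and `manual_b_grad` are illustrative; the numbers will differ from the output above because `w` and `b` are random.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the sketch repeatable

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

with torch.no_grad():
    # Gradient of the mean-reduced loss with respect to the logits z
    grad_z = (torch.sigmoid(z) - y) / z.numel()
    # Chain rule: z = x @ w + b, so dL/dw = outer(x, dL/dz) and dL/db = dL/dz
    manual_w_grad = torch.outer(x, grad_z)
    manual_b_grad = grad_z

print(torch.allclose(w.grad, manual_w_grad, atol=1e-6))
print(torch.allclose(b.grad, manual_b_grad, atol=1e-6))
```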


We can only obtain the `grad` properties for the leaf nodes of the computational graph that have the `requires_grad` property set to `True`. For all other nodes in our graph, gradients will not be available.
By default, `backward` can be called only once on a given graph, because PyTorch frees the graph's intermediate buffers after the backward pass for performance reasons. If we need several `backward` calls on the same graph, we need to pass `retain_graph=True` to the `backward` call.
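
The following sketch illustrates both points on a fresh copy of the same network. It also shows a detail not stated above: repeated `backward` calls add to the existing `grad` values rather than overwriting them, so the gradients usually need to be reset between passes. The names `first` and `doubled` are illustrative.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

# w was created directly by the user, z is the result of an operation
print(w.is_leaf, z.is_leaf)      # True False

# Keep the graph alive so that a second backward pass is possible
loss.backward(retain_graph=True)
first = w.grad.clone()

# Second pass over the same graph; gradients are accumulated into w.grad
loss.backward()
doubled = torch.allclose(w.grad, 2 * first)
print(doubled)                   # True

# Reset the accumulated gradients before any further backward pass
w.grad.zero_()
b.grad.zero_()
```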

## Disabling Gradient Tracking

By default, all tensors with `requires_grad=True` track their computational history and support gradient computation. However, in some cases we do not need this, for example when we have trained the model and only want to apply it to some input data, i.e. we only want to do *forward* computations through the network. We can stop tracking computations by surrounding our computation code with a `torch.no_grad()` block:

In [18]:
z = torch.matmul(x, w) + b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)

True
False


Another way to achieve the same result is to call the `detach()` method on the tensor:

In [19]:
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

False


Some reasons to disable gradient tracking:
- To mark some parameters in your neural network as **frozen parameters**. This is a very common scenario when **fine-tuning a pretrained network**.
- To **speed up computations** when you are only doing a forward pass, because computations on tensors that do not track gradients are more efficient.
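
The first scenario can be sketched as follows. The two-layer `model` here is a hypothetical stand-in for a pretrained network, and the choice to freeze its first layer is an assumption made only for illustration.

```python
import torch
from torch import nn

# Hypothetical small network standing in for a pretrained model
model = nn.Sequential(nn.Linear(5, 4), nn.ReLU(), nn.Linear(4, 3))

# Freeze the first layer: its parameters no longer take part in gradient computation
for p in model[0].parameters():
    p.requires_grad_(False)

x = torch.ones(2, 5)
y = torch.zeros(2, 3)
loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
loss.backward()

print(model[0].weight.grad)              # None: frozen parameters receive no gradient
print(model[2].weight.grad is not None)  # True: the last layer is still trainable

# Only the trainable parameters would be handed to an optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))                    # 2 (weight and bias of the last layer)
```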