# Automatic Differentiation with torch.autograd

When training neural networks, the most frequently used algorithm is backpropagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to each parameter. In PyTorch, these gradients are computed by the built-in differentiation engine torch.autograd, which supports automatic computation of gradients for any computational graph and provides methods for computing *derivatives*, and in particular *gradients*.

Backpropagation first propagates the input data forward through the network to compute the predicted output. It then calculates the loss between the predicted output and the target output. Next, it computes the gradients of the loss function with respect to each parameter of the network using the chain rule of calculus, propagating the gradients backward through the network. Finally, the parameters are updated by an optimization algorithm such as stochastic gradient descent (SGD) to reduce the loss and improve the model's performance.
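
The four steps described above (forward pass, loss, backward pass, parameter update) can be sketched as a short training loop. This is only a minimal illustration: the tensor shapes, the fixed seed, the learning rate of 0.1, and the 20 iterations are assumptions chosen for the example, not values prescribed by this tutorial.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the sketch is reproducible

x = torch.ones(5)                          # input tensor
y = torch.zeros(3)                         # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights tracked by autograd
b = torch.randn(3, requires_grad=True)     # bias tracked by autograd

# assumption: plain SGD with an illustrative learning rate
optimizer = torch.optim.SGD([w, b], lr=0.1)

losses = []
for step in range(20):
    optimizer.zero_grad()                  # clear gradients from the previous step
    z = torch.matmul(x, w) + b             # 1. forward pass
    loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)  # 2. loss
    loss.backward()                        # 3. backward pass: fills w.grad and b.grad
    optimizer.step()                       # 4. parameter update
    losses.append(loss.item())

print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.4f}")
```

The call to optimizer.zero_grad() matters because PyTorch accumulates gradients into the .grad attribute across calls to backward().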

Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following manner:

In [9]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weight tensor with random values and gradient tracking enabled
b = torch.randn(3, requires_grad=True)  # bias tensor with random values and gradient tracking enabled
z = torch.matmul(x, w) + b  # linear transformation followed by addition of bias
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)  # compute binary cross-entropy loss
print(loss)  # print the computed loss


tensor(0.8102, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)


![Computational graph](https://pytorch.org/tutorials/_images/comp-graph.png)

In this network, ***w and b are parameters***, which we need to optimize. Thus, we need to be able to compute the gradients of the loss function with respect to those variables. To do that, we set the requires_grad property of those tensors.
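
As a small illustration of the requires_grad property, the sketch below shows that a tensor does not track gradients by default, that tracking can also be switched on after creation with the in-place method requires_grad_(), and that a result computed from a tracked tensor is itself tracked. The variable names mirror the example above but are redefined here so the snippet runs on its own.

```python
import torch

x = torch.ones(5)         # input: gradient tracking is off by default
w = torch.randn(5, 3)     # created without requires_grad=True
b = torch.randn(3, requires_grad=True)

print(x.requires_grad)    # False
print(w.requires_grad)    # False

w.requires_grad_(True)    # enable tracking in place, after creation
print(w.requires_grad)    # True

z = torch.matmul(x, w) + b
print(z.requires_grad)    # True: z depends on tracked tensors
```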

A function that we apply to tensors to construct the computational graph is in fact an object of class Function. This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in the grad_fn property of a tensor.

In [10]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x73b14b087130>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x73b14b087a00>
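
To make the idea of a Function object more concrete, here is a minimal sketch of a user-defined function built on torch.autograd.Function. The class name Square and the tensor t are hypothetical names introduced only for this illustration. The forward method computes the result, and the backward method applies the chain rule to the incoming gradient.

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical example: computes inp**2 with a hand-written derivative."""

    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)   # keep the input for the backward pass
        return inp * inp

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        # chain rule: d(inp^2)/d(inp) = 2 * inp, scaled by the upstream gradient
        return grad_output * 2 * inp

t = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = Square.apply(t).sum()
out.backward()
print(t.grad)  # tensor([2., 4., 6.])
```

Built-in operations such as the addition and the loss above follow the same pattern, which is why their results carry a grad_fn.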


# Computing Gradients
To optimize the parameters of the neural network, we need to compute the derivatives of our loss function with respect to the parameters, namely, we need
$$
\frac{\partial \text{loss}}{\partial w}
$$
and
$$
\frac{\partial \text{loss}}{\partial b}
$$
under some fixed values of x and y. To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad:

In [11]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.2379, 0.1330, 0.1629],
        [0.2379, 0.1330, 0.1629],
        [0.2379, 0.1330, 0.1629],
        [0.2379, 0.1330, 0.1629],
        [0.2379, 0.1330, 0.1629]])
tensor([0.2379, 0.1330, 0.1629])
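
The gradients returned by autograd can be cross-checked by hand. With the default mean reduction, binary cross-entropy with logits has the derivative (sigmoid(z_j) - y_j) / 3 with respect to each of the three logits z_j. Since z = xw + b, the chain rule then gives the gradient for b directly and the gradient for w as the outer product of x with that vector. This also explains why every row of w.grad above is identical: all entries of x are 1. The sketch below redefines the tensors with a fixed seed (an assumption for reproducibility, so the numbers will differ from the output above) and compares both results.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, values differ from the cell above

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Manual chain rule; detach() keeps this arithmetic out of the graph
dloss_dz = (torch.sigmoid(z.detach()) - y) / y.numel()
manual_b_grad = dloss_dz                   # dz/db is the identity
manual_w_grad = torch.outer(x, dloss_dz)   # dz_j/dw_ij = x_i

w_matches = torch.allclose(w.grad, manual_w_grad, atol=1e-6)
b_matches = torch.allclose(b.grad, manual_b_grad, atol=1e-6)
print(w_matches, b_matches)
```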
