# PyTorch beginner course: Autograd (PyTorch for gradient computation)

## Summary

- [What is Autograd?](#what-is-autograd)
- [How machines compute derivatives](#how-machines-compute-derivatives)
- [Gradient calculation](#gradient-calculation)
- [Backward function and Jacobian matrix](#backward-function-and-jacobian-matrix)
- [Gradient history management](#gradient-history-management)
- [Glossary of the used tools](#glossary-of-the-used-tools)
    - [Methods](#methods)
    - [Properties](#properties)
- [References](#references)
- [Author](#author)

## What is autograd?

Machine learning can be framed as an optimization problem, usually a minimization problem. For this reason, **derivatives** are the main tool used by training algorithms. PyTorch offers a specific module, `torch.autograd`, to compute *derivatives* and, in particular, the *gradients* of functions.

As we will see, derivatives are essential to understanding how a learning algorithm works. With `autograd` we do not have to manually implement the functions that compute gradients, and this is one of the reasons PyTorch is among the most widely used deep learning frameworks.

The main advantages of `autograd` are:
1. *Ease of use*
2. *Efficiency*
3. *Flexibility*

## How machines compute derivatives

In computer science there are three main ways to compute derivatives:

* Numerical differentiation
* Symbolic differentiation
* Automatic differentiation *(which shares traits with both of the previous approaches)*

In this lecture we will not study the theory behind these three approaches, but it is important to know their main pros and cons:

| Method | Precision | Speed | Application |
|:------:|:---------:|:-----:|:-----------:|
| Numerical differentiation | Approximate | Fast | Simple functions |
| Symbolic differentiation | Exact | Slow | Complex functions |
| Automatic differentiation | Exact (up to floating-point error) | Fast | Complex functions |

The `autograd` module uses **automatic differentiation**: it records the operations applied to tensors and applies the chain rule to them. Like the **symbolic** approach it follows exact differentiation rules, and like the **numerical** approach it works on concrete numeric values rather than on formulas.
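To make the comparison concrete, here is a minimal sketch (not part of the original lecture) that differentiates $f(x) = x^3$ at $x = 2$ in two ways: with a central finite difference, which is a numerical approximation, and with `autograd`. The helper names `f` and `numerical_derivative` and the step size `h` are illustrative assumptions.

```python
import torch


def f(x):
    # Example function; its analytic derivative is 3 * x**2
    return x ** 3


def numerical_derivative(func, x, h=1e-4):
    # Central finite difference: an approximation whose error depends on h
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 2.0

# Numerical differentiation: approximate result
approx = numerical_derivative(f, x0)

# Automatic differentiation: autograd records x ** 3 and applies the chain rule
x = torch.tensor(x0, dtype=torch.float64, requires_grad=True)
f(x).backward()
exact = x.grad.item()

print("finite difference:", approx)
print("autograd:         ", exact)
```

The analytic derivative at $x = 2$ is $3 \cdot 2^2 = 12$. The finite difference should land very close to that value, with a small error that depends on `h`, while `autograd` applies the differentiation rule directly.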

## Gradient calculation

Now we will see in practice how to calculate gradients with `autograd`.

First, we need to create the tensors with `requires_grad=True`, which tells `autograd` to track the operations performed on them so that gradients can be computed later.

In [27]:
import torch

x = torch.tensor([2.,3.], requires_grad=True)
y = torch.tensor([4.,2.], requires_grad=True)

print(x)
print(y)

tensor([2., 3.], requires_grad=True)
tensor([4., 2.], requires_grad=True)


Our tensors are now tracked by `autograd` during computation.

For simplicity, we apply a simple operation to these two tensors and save the output of the operation in another tensor, `z`.

In [28]:
z = (x**3)+(y**2)

print(z)

tensor([24., 31.], grad_fn=<AddBackward0>)


Pay close attention to the last output: the tensor `z` has an attribute named `grad_fn`, which references the gradient function that will be used when we call the **backward** method. Here the gradient function is named `<AddBackward0>`. This is useful information, because the name tells us that `z` was created by an addition *(the sum of $x^3$ and $y^2$)*.

The name of the *gradient function* changes with the operation that produced the output tensor. Some of these functions are listed below:

| Operation | Gradient Function |
|:---------:|:-----------------:|
| `+` | `<AddBackward0>` |
| `-` | `<SubBackward0>` |
| `*` | `<MulBackward0>` |
| `/` | `<DivBackward0>` |
| `mean()` | `<MeanBackward0>` |
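The names in the table can be checked with a short sketch like the one below. The tensors `a` and `b` and the dictionary `results` are illustrative names, and the exact class names (including the numeric suffix) may differ between PyTorch versions.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([4., 2.], requires_grad=True)

# One output tensor per operation listed in the table
results = {
    "a + b": a + b,
    "a - b": a - b,
    "a * b": a * b,
    "a / b": a / b,
    "a.mean()": a.mean(),
}

for expression, tensor in results.items():
    # type(...).__name__ gives the class name of the grad_fn object
    print(expression, "->", type(tensor.grad_fn).__name__)
```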

Essentially, `grad_fn` references the object representing the backward function that created the tensor. It is set only when the tensor is the result of an operation on tensors that require gradients; otherwise, as for tensors created directly by the user, `grad_fn` is `None`.

In [29]:
print("grad_fn of z: ", z.grad_fn)   # z was created by an operation between x and y
print("grad_fn of x: ", x.grad_fn)   # x was created by the user (it has no gradient function)
print("grad_fn of y: ", y.grad_fn)   # y was created by the user (it has no gradient function)

grad_fn of z:  <AddBackward0 object at 0x000001B454F939A0>
grad_fn of x:  None
grad_fn of y:  None


Now we can proceed to compute the gradients of `x` and `y`.

In [30]:
# backward() computes the gradients with respect to the leaves of the graph (x and y are the leaves in our scenario)
# Called without arguments, backward() only works on a scalar, so we first reduce the output tensor to a scalar value.
# Here we use sum(), but another reduction such as mean() would also work
z = z.sum()

# Now we can calculate the gradients...
z.backward()

# ...and print the gradients of our input tensor x and y
print("Gradients of x:", x.grad)

print("Gradients of y:", y.grad)

Gradients of x: tensor([12., 27.])
Gradients of y: tensor([8., 4.])


The output follows from differentiating $z = \sum_i (x_i^3 + y_i^2)$ with respect to each element of `x` and `y`:

$$
\begin{bmatrix}
\frac{\partial z}{\partial x}
\\
\frac{\partial z}{\partial y}
\end{bmatrix}
=
\begin{bmatrix}
3x^2
\\
2y
\end{bmatrix}
=
\begin{bmatrix}
3x_1^2 & 3x_2^2
\\
2y_1 & 2y_2
\end{bmatrix}
=
\begin{bmatrix}
3 \cdot 2^2 & 3 \cdot 3^2
\\
2 \cdot 4 & 2 \cdot 2
\end{bmatrix}
=
\begin{bmatrix}
12 & 27
\\
8 & 4
\end{bmatrix}
$$

The diagram below represents this simple computation:

![backward concept](images/backward.svg)
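As a cross-check, the gradients returned by `autograd` can be compared with the analytic formulas $3x^2$ and $2y$. This is a minimal sketch that rebuilds the same tensors as the cells above; the names `expected_x` and `expected_y` are illustrative.

```python
import torch

x = torch.tensor([2., 3.], requires_grad=True)
y = torch.tensor([4., 2.], requires_grad=True)

# Same computation as in the lecture, reduced to a scalar with sum()
z = (x ** 3 + y ** 2).sum()
z.backward()

# Analytic gradients; detach() returns tensors that are not tracked by autograd
expected_x = 3 * x.detach() ** 2
expected_y = 2 * y.detach()

print(torch.allclose(x.grad, expected_x))
print(torch.allclose(y.grad, expected_y))
```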

## Backward function and Jacobian matrix

This section covers the `backward()` function when it receives a vector as input, and how it works under the hood: it is essentially based on a product between a vector and the Jacobian matrix.
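As a minimal sketch of the vector case: when the output `z` is not a scalar, `backward()` needs a `gradient` argument `v` with the same shape as `z`, and the value stored in `x.grad` is the product $v^T J$, where $J$ is the Jacobian of `z` with respect to `x`. The cross-check below uses `torch.autograd.functional.jacobian` to build $J$ explicitly; the variable names are illustrative.

```python
import torch

x = torch.tensor([2., 3.], requires_grad=True)

# z is a vector, so backward() needs an explicit `gradient` argument
z = x ** 3

# v weights each component of z; a vector of ones is equivalent to z.sum().backward()
v = torch.tensor([1., 1.])
z.backward(gradient=v)
print("x.grad from backward(gradient=v):", x.grad)

# Cross-check: build the full Jacobian and multiply it by v explicitly
jacobian = torch.autograd.functional.jacobian(lambda t: t ** 3, x.detach())
print("v @ J:", v @ jacobian)
```

Building the full Jacobian is only done here for illustration. `backward()` computes the product directly, without ever materializing the matrix.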

## Gradient history management

This section covers the methods and procedures for disabling gradient tracking, such as `with torch.no_grad()`. Keep in mind that not tracking gradients greatly reduces the computational cost.
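A minimal sketch of three common ways to disable tracking is shown below: the `torch.no_grad()` context manager, `detach()`, and `requires_grad_(False)`. The selection of these three is an assumption about what this section is meant to cover, and the variable names are illustrative.

```python
import torch

x = torch.tensor([2., 3.], requires_grad=True)

# 1. Inside torch.no_grad(), operations are not recorded in the graph
with torch.no_grad():
    y = x ** 3
print(y.requires_grad, y.grad_fn)

# 2. detach() returns a tensor that shares data with x but is not tracked
x_detached = x.detach()
print(x_detached.requires_grad)

# 3. requires_grad_(False) switches tracking off in place on a leaf tensor
w = torch.tensor([1., 2.], requires_grad=True)
w.requires_grad_(False)
print(w.requires_grad)
```

`torch.no_grad()` is typically the choice for whole blocks of code such as inference, while `detach()` is handy when only one tensor has to leave the graph.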

## Glossary of the used tools

### Methods

- `torch.Tensor.backward()` (wraps `torch.autograd.backward()`)

### Properties

- `torch.Tensor.grad`
- `torch.Tensor.requires_grad`
- `torch.Tensor.grad_fn`

## References

[PyTorch documentation](https://pytorch.org/docs/stable/index.html)

## Author

Emilio Garzia, 2024

[Github](https://github.com/EmilioGarzia)

[Linkedin](https://www.linkedin.com/in/emilio-garzia-58a934294/)