<a href="https://githubtocolab.com/BorjaRequena/Neural-Network-Course/blob/master/nbs/course/deep_learning/pytorch_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/></a>

# PyTorch basics

[PyTorch](https://pytorch.org) is an automatic differentiation framework that, essentially, is your [NumPy](https://numpy.org) for machine learning and anything that involves exact derivatives.
PyTorch natively supports hardware accelerators, such as GPUs, that can significantly speed up matrix multiplication operations, as well as distributed computing to handle large workloads.

The central object in PyTorch is the tensor, which behaves very much like a NumPy array.

```python
import torch

a_tensor = torch.tensor([1.0, 2.0, 3.0])
b_tensor = torch.tensor([4.0, 5.0, 6.0])
type(a_tensor)
```

```
torch.Tensor
```
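Because tensors mirror NumPy arrays, moving data between the two is straightforward. The sketch below assumes NumPy is installed alongside PyTorch, and the variable names are illustrative. `torch.from_numpy` wraps an existing array without copying it, and `Tensor.numpy()` goes the other way for CPU tensors, so both objects share the same memory. Note that NumPy's default `float64` dtype carries over, whereas `torch.tensor([1.0, ...])` creates `float32` tensors by default.

```python
import numpy as np
import torch

# Wrap a NumPy array as a tensor; no copy is made, so memory is shared
np_array = np.array([1.0, 2.0, 3.0])
from_np = torch.from_numpy(np_array)

# In-place changes to the array are visible through the tensor
np_array[0] = 10.0
print(from_np)  # first entry is now 10., dtype is torch.float64

# Convert back to NumPy (works for CPU tensors that do not require gradients)
back_to_np = from_np.numpy()
print(type(back_to_np), back_to_np.dtype)
```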

We can perform all kinds of operations on tensors, from element-wise arithmetic to matrix operations.

```python
a_tensor + 3
```

```
tensor([4., 5., 6.])
```

```python
a_tensor @ b_tensor  # dot product
```

```
tensor(32.)
```
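A few more operations on the same two tensors help contrast the element-wise and matrix flavours. This is a short sketch: the reshapes turn the 1D tensors into a column and a row so that `@` produces an outer product, and the resulting matrix is then applied to `a_tensor`.

```python
import torch

a_tensor = torch.tensor([1.0, 2.0, 3.0])
b_tensor = torch.tensor([4.0, 5.0, 6.0])

# Element-wise operations act entry by entry
elementwise_product = a_tensor * b_tensor  # [4., 10., 18.]
elementwise_sine = torch.sin(a_tensor)

# Matrix operations: a 3x1 column times a 1x3 row gives a 3x3 outer product
outer = a_tensor.reshape(3, 1) @ b_tensor.reshape(1, 3)

# Matrix-vector product: (3, 3) @ (3,) -> (3,)
matrix_vector = outer @ a_tensor  # each entry is a_i * (b . a) = 32 * a_i

print(elementwise_product, outer.shape, matrix_vector)
```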

Tensors have a `requires_grad` property that indicates whether gradients should be computed with respect to their values.
By default, it is set to `False`.

```python
a_tensor.requires_grad
```

```
False
```
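Instead of changing the attribute afterwards, `requires_grad` can also be set when the tensor is created. In this short sketch with illustrative names, any tensor computed from one that requires gradients also requires them, and it records the operation that produced it in its `grad_fn` attribute.

```python
import torch

# Request gradient tracking at creation time
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Results of operations on x inherit gradient tracking
y = torch.sin(x)
print(y.requires_grad)  # True
print(y.grad_fn)        # <SinBackward0 object at ...>

# Tensors created without it stay untracked
z = torch.tensor([1.0, 2.0, 3.0])
print(z.requires_grad, z.grad_fn)  # False None
```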

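However, if we set it to `True`, we can compute the gradient of any scalar quantity with respect to the tensor.
Let's consider a simple example: writing $\mathbf{a}$ and $\mathbf{b}$ for `a_tensor` and `b_tensor`, we sum the sine of every entry of $\mathbf{a}$ and the cosine of every entry of $\mathbf{b}$:
$$\sum_i \left[\sin(a_i) + \cos(b_i)\right]$$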

::: {.callout-note}
## Derivatives
Recall that $\frac{d}{dx}\sin(x) = \cos(x)$ and $\frac{d}{dx}\cos(x) = -\sin(x)$.
:::
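Since each term of the sum depends on a single entry, applying these identities entry by entry gives the gradients we expect PyTorch to return:

$$\frac{\partial}{\partial a_i}\sum_j \left[\sin(a_j) + \cos(b_j)\right] = \cos(a_i), \qquad \frac{\partial}{\partial b_i}\sum_j \left[\sin(a_j) + \cos(b_j)\right] = -\sin(b_i).$$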

```python
a_tensor.requires_grad = True
b_tensor.requires_grad = True

value = torch.sum(torch.sin(a_tensor) + torch.cos(b_tensor))
value.backward()
```

The result of the sum, `value`, is also a tensor.
When we call its `backward` method, PyTorch computes the gradient of `value` with respect to the tensors with `requires_grad=True` that took part in its calculation, here `a_tensor` and `b_tensor`.
The resulting gradients are stored in the `.grad` attribute of these tensors.
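Strictly speaking, only *leaf* tensors (those we create directly with `requires_grad=True`) get their `.grad` populated. Intermediate results, such as `torch.sin(a_tensor)` above, take part in the calculation but do not keep their gradient unless we ask for it with `retain_grad()`. A minimal sketch with illustrative names:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Intermediate (non-leaf) tensor; ask autograd to keep its gradient too
sin_a = torch.sin(a)
sin_a.retain_grad()

total = torch.sum(sin_a)
total.backward()

print(a.is_leaf, sin_a.is_leaf)  # True False
print(sin_a.grad)  # derivative of the sum w.r.t. each entry: all ones
print(a.grad)      # cos(a), obtained through the chain rule
```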

```python
a_tensor.grad, torch.cos(a_tensor)
```

```
(tensor([ 0.5403, -0.4161, -0.9900]),
 tensor([ 0.5403, -0.4161, -0.9900], grad_fn=<CosBackward0>))
```

```python
b_tensor.grad, -torch.sin(b_tensor)
```

```
(tensor([0.7568, 0.9589, 0.2794]),
 tensor([0.7568, 0.9589, 0.2794], grad_fn=<NegBackward0>))
```

::: {.callout-warning}
## Zero out gradients

PyTorch accumulates gradients: every subsequent gradient computation with respect to the same tensor adds the new gradient to the one already stored in `.grad`.
We must take this into account and reset the gradients manually when needed.

:::
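The accumulation is easy to see by calling `backward` twice on the same graph. In this sketch, `retain_graph=True` keeps the computational graph alive after the first call (by default it is freed), so the second call can run and the stored gradient doubles.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Build the graph once and keep it alive so backward can run twice
value = torch.sum(x ** 2)
value.backward(retain_graph=True)
first = x.grad.clone()   # 2 * x

value.backward()         # the second call adds to the existing gradient
second = x.grad.clone()  # 2 * x + 2 * x = 4 * x

print(first, second)
```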

Computing the gradient of another quantity with respect to the same tensors modifies their stored gradients.
Consider the sum of all the entries of $\mathbf{a}$.
Its gradient with respect to $\mathbf{a}$ is 1 for every entry.
This value is added to the gradient already stored in `a_tensor.grad`, whereas the gradient of $\mathbf{b}$ is unaffected, since $\mathbf{b}$ plays no part in this sum.

```python
a_sum = torch.sum(a_tensor)
a_sum.backward()

a_tensor.grad, torch.cos(a_tensor) + 1
```

```
(tensor([1.5403, 0.5839, 0.0100]),
 tensor([1.5403, 0.5839, 0.0100], grad_fn=<AddBackward0>))
```

```python
b_tensor.grad
```

```
tensor([0.7568, 0.9589, 0.2794])
```

To reset the gradient of a tensor, we can either zero it in place with `zero_()` or set its `.grad` attribute to `None`.

```python
a_tensor.grad.zero_()
b_tensor.grad = None
```
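The two options differ slightly: `zero_()` keeps a tensor of zeros in `.grad`, whereas assigning `None` removes it, and the next `backward` call allocates a fresh gradient tensor. In both cases the next computation starts from a clean slate, as this minimal sketch with illustrative names shows.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
torch.sum(x).backward()

# Option 1: zero in place; .grad stays a tensor filled with zeros
x.grad.zero_()
after_zero = x.grad.clone()
print(after_zero)  # tensor([0., 0., 0.])

# Option 2: drop the gradient entirely; .grad becomes None
x.grad = None
after_none = x.grad
print(after_none)  # None

# The next backward call does not see any previous gradient
torch.sum(3 * x).backward()
print(x.grad)  # tensor([3., 3., 3.])
```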

::: {.callout-note}
In machine learning applications, we hardly ever zero out gradients at the tensor level.
We typically rely on the `zero_grad()` method from either our [optimizer](https://pytorch.org/docs/stable/optim.html#torch.optim.Optimizer) or [module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) to reset the gradients.
See [the docs](https://pytorch.org/tutorials/recipes/recipes/zeroing_out_gradients.html) for further details.
::: 
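The sketch below shows where `zero_grad()` usually sits in a training loop. It assumes a toy linear-regression problem; the data, model, learning rate, and number of steps are illustrative choices rather than part of the course material. Here `model.zero_grad()` would work equally well, since the optimizer manages all the model's parameters, and in recent PyTorch versions both set the gradients to `None` by default.

```python
import torch

torch.manual_seed(0)

# Toy data (illustrative): y = 2x + 1 plus a little noise
x = torch.linspace(-1, 1, 50).unsqueeze(1)
y = 2 * x + 1 + 0.05 * torch.randn_like(x)

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()        # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # populate .grad of the model parameters
    optimizer.step()             # update the parameters using those gradients

print(model.weight.item(), model.bias.item())  # close to 2 and 1
```

Without the `zero_grad()` call, each step would use the sum of all previous gradients rather than the gradient of the current loss.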