<a href="https://colab.research.google.com/github/AleMazzeo2001/PyTorch_Tutorials/blob/main/01_Pytorch_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LABORATORY OF MACHINE LEARNING

# Introduction to Pytorch

[Pytorch](https://pytorch.org/) is a free and open-source deep learning framework originally developed by Meta AI and now part of the Linux Foundation.
Pytorch is widely used by the research community (while the industry may prefer [Tensorflow](https://www.tensorflow.org/)).
The reference language is Python, but C++ and Java are also supported.
Pytorch supports the main building blocks of a deep learning system: optimized numerical methods, automatic differentiation, and GPU computing.


Topics:
- Tensors
- Autograd
- Models
- Data
- Optimization


To start using Pytorch in Colab, import it with the command:
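A minimal sketch of the import, assuming the `torch` package that Colab is expected to provide preinstalled; printing the version is only a sanity check and is not required:

```python
import torch

# Print the installed version as a quick sanity check.
print(torch.__version__)
```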

## Tensors

Pytorch stores data in **tensors** of class `torch.Tensor`. In this context, a tensor is a multi-dimensional array of homogeneous numeric values.

Pytorch tensors are similar to numpy arrays but with all the extra features needed to build deep learning systems (namely, automatic differentiation and GPU computing).

You can create a tensor by calling the `torch.tensor` function and passing a (possibly nested) list of values as an argument.
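As a small illustration (the variable names `v` and `m` are chosen here only for the example), a flat list gives a 1D tensor and a nested list gives a 2D one:

```python
import torch

# A 1D tensor from a flat list and a 2D tensor from a nested list.
v = torch.tensor([1.0, 2.0, 3.0])
m = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(v)  # tensor([1., 2., 3.])
print(m)  # a 2x3 tensor of integers
```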

You can also create tensors by using one of the many factory functions.
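A few commonly used factory functions are sketched below; this is a selection made for illustration, not an exhaustive list:

```python
import torch

zeros = torch.zeros(2, 3)        # 2x3 tensor filled with 0
ones = torch.ones(4)             # vector of four 1s
steps = torch.arange(0, 10, 2)   # tensor([0, 2, 4, 6, 8])
grid = torch.linspace(0, 1, 5)   # 5 evenly spaced values from 0 to 1
noise = torch.rand(2, 2)         # uniform random samples in [0, 1)
eye = torch.eye(3)               # 3x3 identity matrix
```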

Two important properties of tensors are the shape and the data type.
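Both properties can be read directly from the tensor. A short sketch, using an example tensor `m` defined only for this purpose:

```python
import torch

m = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(m.shape)  # torch.Size([2, 3])
print(m.dtype)  # torch.float32, the default for floating point values
print(m.ndim)   # 2, the number of dimensions
```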

You can change the shape and the dtype of tensors with the `view` and `to` methods (`view` creates a new tensor for the same data while `to` may create a copy).
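The difference between sharing data and copying can be checked with a small experiment. The sketch below uses illustrative names (`a`, `b`, `c`) and assumes a dtype conversion, which is a case where `to` has to copy:

```python
import torch

a = torch.arange(6)        # shape (6,), dtype torch.int64
b = a.view(2, 3)           # same data seen as a 2x3 matrix
c = b.to(torch.float32)    # converted copy with a different dtype

# `view` shares the data: writing through `b` is visible in `a`.
b[0, 0] = 100
print(a[0])      # tensor(100)
print(c[0, 0])   # tensor(0.) because `c` is an independent copy
```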

Tensors support the usual arithmetic operations `+`, `-`, `*`, `/`, `**`, all applied element-wise. There is also the `@` operator for the inner product of vectors and for matrix multiplication.
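A brief sketch of these operators on small example tensors (the values are arbitrary and chosen so that the results are easy to verify by hand):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print(a + b)    # tensor([5., 7., 9.])
print(a * b)    # element-wise product: tensor([ 4., 10., 18.])
print(a ** 2)   # tensor([1., 4., 9.])
print(a @ b)    # inner product: tensor(32.)

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
print(A @ B)    # matrix product: [[2., 1.], [4., 3.]]
```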

Tensors also support reduction operators like `sum`, `mean`, `max`, `min`, `argmax`, `argmin`.

They can work on all the values in the tensor, or along a given dimension.
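The `dim` argument selects the dimension that is reduced. A small sketch on an example matrix:

```python
import torch

m = torch.tensor([[1.0, 5.0, 3.0], [4.0, 2.0, 6.0]])

print(m.sum())          # all values: tensor(21.)
print(m.mean())         # all values: tensor(3.5000)
print(m.sum(dim=0))     # column sums: tensor([5., 7., 9.])
print(m.sum(dim=1))     # row sums: tensor([ 9., 12.])
print(m.argmax(dim=1))  # position of the largest value in each row: tensor([1, 2])

# With `dim`, `max` returns both the values and their indices.
values, indices = m.max(dim=1)
```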

A very useful mechanism in Pytorch is *broadcasting*. It allows tensors of different shapes to be used together in the same operation, provided that some conditions are fulfilled:
- corresponding dimensions agree;
- or when they disagree, at least one is exactly 1;
- if the number of dimensions is different, dimensions of length one are added at the beginning of the shape until they match.

A broadcasted dimension of length one behaves as if its values were replicated to match the length of the other tensor.

For instance:
- shapes $(3, 4)$ and $(3, 4)$ are OK (identical);
- shapes $(3, 1)$ and $(3, 4)$ are OK (the second dimension is broadcasted);
- shapes $(3, 1)$ and $(1, 4)$ are OK (both dimensions are broadcasted);
- shapes $(4,)$ and $(3, 4)$ are OK (the first dimension is added and broadcasted).
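The cases listed above can be checked directly. The sketch below uses example tensors with the same shapes, plus one incompatible pair, $(3,)$ and $(3, 4)$, added here to show the failure mode:

```python
import torch

col = torch.tensor([[1.0], [2.0], [3.0]])        # shape (3, 1)
row = torch.tensor([[10.0, 20.0, 30.0, 40.0]])   # shape (1, 4)
vec = torch.tensor([1.0, 2.0, 3.0, 4.0])         # shape (4,)
mat = torch.zeros(3, 4)                          # shape (3, 4)

print((col + mat).shape)  # torch.Size([3, 4])
print((col + row).shape)  # torch.Size([3, 4])
print((vec + mat).shape)  # torch.Size([3, 4])
print(col + row)
# [[11., 21., 31., 41.],
#  [12., 22., 32., 42.],
#  [13., 23., 33., 43.]]

# Shapes (3,) and (3, 4) do not match: the last dimensions are 3 and 4.
try:
    torch.zeros(3) + mat
except RuntimeError as err:
    print("incompatible shapes:", err)
```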

Dimensions of length one can be removed with the `squeeze` method, and added with `unsqueeze`. The `T` attribute transposes 2D tensors.
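A short sketch of these reshaping tools on an example vector; the integer passed to `unsqueeze` is the position where the new dimension is inserted:

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])   # shape (3,)
col = v.unsqueeze(1)                # shape (3, 1)
row = v.unsqueeze(0)                # shape (1, 3)

print(col.squeeze().shape)  # torch.Size([3])
print(col.T.shape)          # torch.Size([1, 3])

# Combined with broadcasting, the two views give an outer product.
outer = col * row           # shape (3, 3)
print(outer)
```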

### Exercise

Given the $4 \times 2$ tensor `x` and the $5 \times 2$ tensor `y`, each representing a set of vectors of length 2, compute the $4 \times 5$ tensor `M` representing the pairwise Euclidean distance between elements of `x` and `y`:

$\rm M[i,j] = \| x[i, :] - y[j, :] \|$

Try to avoid loops, and use only basic operators and reduction methods (and the function `torch.sqrt` to compute the square root).

```python
x = torch.tensor([[1.0, 2], [0, 1], [2, 2], [-1, 3]])
y = torch.tensor([[0.0, -1], [1, 1], [2, 2], [0, 3], [-1, -1]])
# Expected result
# tensor([[3.1623, 1.0000, 1.0000, 1.4142, 3.6056],
#         [2.0000, 1.0000, 2.2361, 2.0000, 2.2361],
#         [3.6056, 1.4142, 0.0000, 2.2361, 4.2426],
#         [4.1231, 2.8284, 3.1623, 1.0000, 4.0000]])
```


## Autograd

Autograd is the Pytorch component providing classes and functions for the automatic differentiation of arbitrary scalar-valued functions. Typically you don't need to explicitly refer to autograd primitives. The other components of Pytorch will manage them in the right way unless special behavior is needed. Nevertheless, some knowledge of the basics of autograd can be useful.

The `requires_grad` attribute of a tensor must be set to `True` to enable the automatic differentiation mechanism. Pytorch will implicitly build a graph representing the computation involving these tensors. The `backward` method called on a scalar tensor will compute all the relevant derivatives and will store them in the `grad` attribute of the tensors.
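A minimal sketch of this mechanism, using the function $f(x) = \sum_i x_i^2$ chosen here only because its gradient, $2x$, is easy to verify by hand:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# f(x) = sum(x_i^2); the result remembers how it was computed.
f = (x ** 2).sum()
print(f)        # tensor(14., grad_fn=<SumBackward0>)

f.backward()    # compute df/dx and store it in x.grad
print(x.grad)   # tensor([2., 4., 6.]) since df/dx_i = 2 * x_i
```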




Gradients need to be reset before starting a new computation.
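The reason is that `backward` adds the new gradient to whatever is already stored in `grad`. The sketch below, on an example tensor `w`, shows the accumulation and one way to reset it in place with `zero_`:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()
print(first)          # tensor([3., 3.])

# Without a reset, the new gradient is added to the stored one.
(3 * w).sum().backward()
accumulated = w.grad.clone()
print(accumulated)    # tensor([6., 6.])

# Reset in place, then compute again.
w.grad.zero_()
(3 * w).sum().backward()
print(w.grad)         # tensor([3., 3.])
```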

Sometimes it is useful to ignore how a tensor has been obtained, blocking the backward computation at that point. To do so, it is enough to call the `detach` method, which will return a new tensor for the same data, detached from the original graph.
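The effect is visible in the gradient. In the sketch below (an example constructed for illustration), $z = x \cdot x^2$ would normally have derivative $3x^2$, but detaching the factor $x^2$ makes it behave like a constant:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

y = x ** 2             # part of the graph
y_const = y.detach()   # same value, but treated as a constant

# z = x * y_const, so dz/dx = y_const = 4 (not 3 * x^2 = 12).
z = (x * y_const).sum()
z.backward()
print(x.grad)                 # tensor([4.])
print(y_const.requires_grad)  # False
```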

If you want to completely disable autograd for a sequence of operations, you can enclose them in the `torch.no_grad` context manager. This is often used to save memory and computation when running a model that has already been trained.
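A short sketch of the context manager on an example tensor; the same operation is tracked outside the block and untracked inside it:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

with torch.no_grad():
    y = x * 2    # no graph is recorded inside the block

tracked = x * 2  # outside the block, the graph is recorded as usual

print(y.requires_grad)        # False
print(tracked.requires_grad)  # True
```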

### Exercise

Consider the function $L$ defined as:

$L(y, z) = -y \log p - (1 - y) \log (1 - p), \text{ where } p = \frac{1}{1 + \exp(-z)}.$

Compute the value of $L$ in the following four cases:

| z | y |
|-|-|
| -3 | 0 |
| -1 | 0 |
|  1.5 | 1 |
|  4 | 1 |

Then use autograd to compute the derivative of the average of $L$ with respect to the four values of $z$.

The natural logarithm is computed by `torch.log`, and the exponential by `torch.exp`.

```python
z = torch.tensor([-3, -1, 1.5, 4])
y = torch.tensor([0, 0, 1, 1])
# Expected result:
# 0.14535307884216309 (average L)
# tensor([ 0.0119, 0.0672, -0.0456, -0.0045]) (gradient wrt z)
```


## Models

In Pytorch, machine learning models are compositions of "modules". The most important class is `torch.nn.Module`, which is the base class for common ML operators and for their compositions.

For instance, the following operations, widely used in the definition of neural networks, are subclasses of `torch.nn.Module`:
- `torch.nn.Linear`: fully connected layers;
- `torch.nn.ReLU`, `torch.nn.Sigmoid`, and other activation functions;
- `torch.nn.Conv1d`, `torch.nn.Conv2d`: convolutional layers;
- `torch.nn.BatchNorm2d`, `torch.nn.InstanceNorm2d`, and other normalization functions;
- `torch.nn.LSTM`, `torch.nn.GRU`: recurrent modules;
- and many others.

The main feature of modules is the `forward` method, which takes a tensor as input and computes a new tensor (some modules take or return multiple tensors, but they are not very common).

A module stores its own parameters which, by default, are randomly initialized.
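As a sketch, a single fully connected layer can be applied to a batch of example vectors and its parameters inspected. Calling the module object itself, rather than `forward` directly, is the usual convention and runs `forward` internally:

```python
import torch

layer = torch.nn.Linear(in_features=3, out_features=2)

x = torch.rand(5, 3)    # a batch of 5 vectors of length 3
y = layer(x)            # calling the module runs its `forward` method
print(y.shape)          # torch.Size([5, 2])

# The layer owns a weight matrix and a bias vector, randomly initialized.
for name, p in layer.named_parameters():
    print(name, tuple(p.shape))   # weight (2, 3), then bias (2,)
```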

Simple models, consisting of a linear chain of modules, can be defined using the `torch.nn.Sequential` class.
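For example, a small network with one hidden layer could be sketched as follows (the layer sizes are arbitrary choices for the illustration):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 1),
)

x = torch.rand(10, 4)   # a batch of 10 input vectors
print(model(x).shape)   # torch.Size([10, 1])
```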

For less simple models, where the application of modules is not strictly sequential, you must define your own `torch.nn.Module` subclass.
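A minimal sketch of such a subclass is given below. `ResidualBlock` is a hypothetical name introduced for this example: it adds its input to the output of two layers, which a plain `torch.nn.Sequential` chain cannot express. Submodules are created in `__init__` and combined in `forward`:

```python
import torch


class ResidualBlock(torch.nn.Module):
    """Hypothetical example: y = x + f(x), which is not a plain chain."""

    def __init__(self, size):
        super().__init__()
        self.linear1 = torch.nn.Linear(size, size)
        self.linear2 = torch.nn.Linear(size, size)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        h = self.relu(self.linear1(x))
        return x + self.linear2(h)   # skip connection around the two layers


block = ResidualBlock(4)
x = torch.rand(3, 4)
print(block(x).shape)   # torch.Size([3, 4])
```

Modules assigned as attributes are registered automatically, so their parameters are included in `block.parameters()`.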

## Dataset

Pytorch uses instances of `torch.utils.data.Dataset` to manage data. Many popular datasets have been made available by the Pytorch community. If you want to define your own, it is enough to:
1. define a new subclass of `torch.utils.data.Dataset`;
2. implement a suitable `__init__` method;
3. implement the `__len__` method, which returns the number of elements in the dataset;
4. implement the `__getitem__` method, which returns an element of the dataset, given its index.

The elements in the dataset are typically tuples of tensors, or other kinds of data.
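The four steps above can be sketched with a toy dataset. `SquaresDataset` is a hypothetical class written for this example; its elements are pairs $(x, x^2)$ generated in memory:

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset of pairs (x, x^2)."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]


data = SquaresDataset(10)
print(len(data))   # 10
print(data[3])     # (tensor(3.), tensor(9.))
```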

In many cases, you cannot afford to keep all the data in memory. The class `torch.utils.data.DataLoader` is Pytorch's main solution to iterate over a dataset.

Dataloaders provide several important features:
- they collate data into batches;
- they can randomly shuffle the data in the dataset;
- they allow parallel loading of the data by instantiating multiple workers (i.e., subprocesses).

To create a dataloader, just call the constructor, pass the dataset, and configure it by setting the many options it supports.
To use the dataloader, just iterate over it with a Python loop.
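A minimal sketch of both steps, assuming random example data wrapped in a `torch.utils.data.TensorDataset`; the batch size and the other options are arbitrary choices:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.rand(10, 3)    # 10 example vectors of length 3
labels = torch.arange(10)       # one integer label per vector
dataset = TensorDataset(features, labels)

# num_workers=0 loads the data in the main process.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

for batch_features, batch_labels in loader:
    print(batch_features.shape, batch_labels.shape)
# Three batches: two of size 4 and a final one of size 2.
```

With `shuffle=True` the order of the elements changes at every pass over the loader.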

## Optimization

Optimization is a key component of modern Machine Learning.
Pytorch offers many optimization algorithms based on gradient descent, collected in the `torch.optim` package.

Thanks to autograd, using them is very simple. First, you have to create one optimizer object, by selecting one of the many optimization algorithms. Then, in the main training loop, you must:

1. reset the gradient by calling `optimizer.zero_grad()`;
2. compute the loss function;
3. compute the gradient by calling `loss.backward()`;
4. take a step by calling `optimizer.step()`.
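The four steps can be sketched on a toy regression problem. Everything in the example is an assumption made for illustration: the synthetic data follow $y = 2x + 1$, the model is a single linear layer, and plain SGD with a learning rate of 0.1 is one possible choice among the available optimizers:

```python
import torch

torch.manual_seed(0)

# Synthetic data for the line y = 2x + 1.
x = torch.linspace(-1, 1, 50).unsqueeze(1)   # shape (50, 1)
y = 2 * x + 1

model = torch.nn.Linear(1, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()          # 1. reset the gradient
    loss = loss_fn(model(x), y)    # 2. compute the loss function
    loss.backward()                # 3. compute the gradient
    optimizer.step()               # 4. take a step

print(loss.item())
print(model.weight.item(), model.bias.item())   # close to 2 and 1
```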
