<img src='https://upload.wikimedia.org/wikipedia/fr/thumb/e/ed/Logo_Universit%C3%A9_du_Maine.svg/1280px-Logo_Universit%C3%A9_du_Maine.svg.png' width="300" height="500">
<br>
<div style="border: solid 3px #000;">
    <h1 style="text-align: center; color:#000; font-family:Georgia; font-size:26px;">Introduction to AI</h1>
    <p style='text-align: center;'>Master's in Computer Science</p>
    <p style='text-align: center;'>Anthony Larcher</p>
</div>


# Introduction to PyTorch

**Objective**: 





# Introduction to PyTorch: the base classes

The base class in **PyTorch** is the tensor; the basic tensor operations are described in the [**PyTorch** documentation](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html).

In PyTorch there are two ways to define a neural network:
* subclass the **Module** class (`torch.nn.Module`)
* use a **Sequential** object (`torch.nn.Sequential`)
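
As a minimal sketch of the two options listed above, the same tiny network can be written both ways. The name `TinyNet` and the layer sizes are illustrative assumptions, not part of the course material.

```python
import torch


# Way 1: subclass torch.nn.Module and implement forward()
class TinyNet(torch.nn.Module):  # hypothetical example name
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)
        self.activation = torch.nn.ReLU()

    def forward(self, x):
        return self.activation(self.linear(x))


# Way 2: chain the same kinds of layers in a torch.nn.Sequential container
tiny_seq = torch.nn.Sequential(
    torch.nn.Linear(4, 2),
    torch.nn.ReLU(),
)

x = torch.rand(3, 4)            # a batch of 3 vectors of size 4
out_module = TinyNet()(x)       # calling the module runs forward()
out_seq = tiny_seq(x)
print(out_module.shape, out_seq.shape)
```

Subclassing gives full control over the forward pass (skip connections, several inputs), while `Sequential` is enough when the layers are simply applied one after the other.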

In [2]:
import torch

Every computation in PyTorch runs on a *device* that you are responsible for choosing.

In general, the **CPU** is used to prepare and visualize the data, and the **GPU** is used for the computation inside the networks.

```python
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))
```

## The base class for defining a neural network is the **Module** class

You subclass this class to create your own architecture, then implement the **forward** method.

We are going to define a residual block (to implement a ResNet).

```python
class ResBlock(torch.nn.Module):
    """

    """
    expansion = 1

    def __init__(self, channels, stride=1):
        super(ResBlock, self).__init__()
        ...  # to be completed
        
    def forward(self, x):
        ...  # to be completed
```

This block contains, in order:

* 1 2D convolutional layer with:
    * input dimension = *channels*
    * output dimension = *channels*
    * a kernel of size 3
    * a stride equal to the one passed as a parameter
    * padding = 1
    * no bias
* 1 batch normalization layer (of dimension *channels*)
* 1 *LeakyReLU* activation
* another convolutional layer with the same parameters
* 1 batch normalization layer (of dimension *channels*)
* 1 *LeakyReLU* activation
* Once the input has gone through these layers, the unmodified input is added to the result.
* 1 *LeakyReLU* activation
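
The layer list above could be implemented along the following lines. This is one possible sketch, not the reference solution: the class name `ResBlockSketch` is hypothetical, and it assumes `stride=1`, since a larger stride would change the spatial size and the unmodified input could no longer be added.

```python
import torch


class ResBlockSketch(torch.nn.Module):  # hypothetical name, one possible solution
    def __init__(self, channels, stride=1):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(channels, channels, kernel_size=3,
                                     stride=stride, padding=1, bias=False)
        self.bn1 = torch.nn.BatchNorm2d(channels)
        self.conv2 = torch.nn.Conv2d(channels, channels, kernel_size=3,
                                     stride=stride, padding=1, bias=False)
        self.bn2 = torch.nn.BatchNorm2d(channels)
        self.activation = torch.nn.LeakyReLU()

    def forward(self, x):
        out = self.activation(self.bn1(self.conv1(x)))
        out = self.activation(self.bn2(self.conv2(out)))
        out = out + x                  # residual connection (assumes stride == 1)
        return self.activation(out)


block = ResBlockSketch(2, 1)
y = block(torch.rand(1, 2, 10, 100))   # 1 sample, 2 channels, 10 x 100
print(y.shape)
```

With a kernel of size 3, a padding of 1 and a stride of 1, each convolution preserves the spatial size, which is what makes the addition with the input valid.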

In [4]:
class ResBlock(torch.nn.Module):
    """

    """
    def __init__(self, channels, stride):

        super(ResBlock, self).__init__()
 

    def forward(self, x):
        """

        :param x:
        :return:
        """


In [5]:
# Initialize a ResBlock with two channels and a stride of 1
rb = ResBlock(2, 1)

In [20]:
# Create a batch of 1 sample with 2 channels of size 10 x 100 and test the block
data = torch.rand(1, 2, 10, 100)

## Sequential

Now create a *ResNet* class by subclassing the Module class.
This network contains 1 input layer, a 2D convolution with:

* input dimension = 2
* output dimension = 2
* a kernel of size 3
* a stride equal to the one passed as a parameter
* padding = 1
* no bias

followed by a batch normalization layer and 5 *ResBlock(2, 1)* blocks that are added to a *torch.nn.Sequential*.


```python 
class ResNet(torch.nn.Module):
    """

    """
    def __init__(self):
        super(ResNet, self).__init__()
        ...
        
    def forward(self, x):
        ...
```
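
A possible way to fill in this skeleton is sketched below. To stay self-contained it uses a compact stand-in block, `_Block`, in place of the `ResBlock` of the previous exercise; both `_Block` and `ResNetSketch` are hypothetical names, and the stride is assumed to default to 1.

```python
import torch


class _Block(torch.nn.Module):
    """Compact stand-in for the residual block described earlier (stride fixed to 1)."""

    def __init__(self, channels):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False),
            torch.nn.BatchNorm2d(channels),
            torch.nn.LeakyReLU(),
            torch.nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False),
            torch.nn.BatchNorm2d(channels),
            torch.nn.LeakyReLU(),
        )
        self.activation = torch.nn.LeakyReLU()

    def forward(self, x):
        return self.activation(self.body(x) + x)


class ResNetSketch(torch.nn.Module):  # hypothetical name, one possible solution
    def __init__(self, stride=1):
        super().__init__()
        self.input_conv = torch.nn.Conv2d(2, 2, kernel_size=3, stride=stride,
                                          padding=1, bias=False)
        self.input_bn = torch.nn.BatchNorm2d(2)
        # The 5 residual blocks are chained in a Sequential container
        self.blocks = torch.nn.Sequential(*[_Block(2) for _ in range(5)])

    def forward(self, x):
        return self.blocks(self.input_bn(self.input_conv(x)))


net = ResNetSketch()
out = net(torch.rand(1, 2, 10, 100))
print(out.shape)  # torch.Size([1, 2, 10, 100])
```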




In [36]:
class ResNet(torch.nn.Module):
    """

    """
    def __init__(self):
        super(ResNet, self).__init__()
        
    def forward(self, x):
        ...  # to be completed

In [37]:
# Instantiate a ResNet
nnet = ResNet()

In [38]:
output = nnet(data)

In [39]:
output.shape

torch.Size([1, 2, 10, 100])

# Managing models

During training: 
* Certain layers of neural networks (batch normalization, dropout) behave differently in training and in evaluation, so you need to switch from one mode to the other depending on the use.
* Neural networks require the computation of gradients in order to back-propagate during the backward pass. This is **extremely costly and not necessary in production** for inference. For this reason, you need to tell PyTorch that you don't want to compute the gradients.

`model.train()` tells your model that you are training.

`model.eval()` tells your model that you are in inference mode.

`torch.no_grad()` deactivates the autograd engine. It reduces memory usage and speeds up computation, but you won't be able to backpropagate (which you don't need in an evaluation script).

`with torch.no_grad():
    ... your code...
`
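
The three calls described above can be combined as in the following sketch. The small model with a dropout layer is an illustrative assumption, chosen because dropout behaves differently in the two modes.

```python
import torch

# Toy model (assumption): a linear layer followed by dropout
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.Dropout(p=0.5),
)
x = torch.rand(4, 8)

model.train()                       # training mode: dropout is active
y_train = model(x)                  # the autograd graph is recorded
print(model.training, y_train.requires_grad)

model.eval()                        # evaluation mode: dropout is the identity
with torch.no_grad():               # no graph is recorded inside this block
    y_eval_1 = model(x)
    y_eval_2 = model(x)
print(model.training, y_eval_1.requires_grad)
```

Note that `model.eval()` and `torch.no_grad()` are independent: the first changes the behaviour of layers, the second disables gradient tracking, and an evaluation script usually needs both.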

# Devices

PyTorch enables computation on CPU or GPU.

It is your responsibility to decide where to put your data and models to optimize the computation.

Fortunately, this is very easy thanks to the PyTorch API.

The steps of an example are outlined below.

In [None]:
# Generate random data in the CPU memory

# Move this data to GPU

# Move it back to CPU

# Convert it to Numpy 

# In case your node has several GPUs, it is possible to decide which one to use.
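
One possible way to carry out these steps is sketched below. It falls back to the CPU when no GPU is available, so the "move to GPU" step is then a no-op; the variable names are illustrative assumptions.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Generate random data in the CPU memory
data_cpu = torch.rand(4, 3)

# Move this data to the GPU (stays on the CPU if no GPU is available)
data_dev = data_cpu.to(device)

# Move it back to the CPU
data_back = data_dev.cpu()

# Convert it to NumPy (only possible for tensors living on the CPU)
array = data_back.numpy()

# With several GPUs, a specific one can be selected by its index
if torch.cuda.device_count() > 1:
    data_gpu1 = data_cpu.to('cuda:1')

print(data_dev.device, data_back.device, array.shape)
```

A model is moved the same way, with `model.to(device)`; the model and its input data must be on the same device before the forward pass.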


# Optimizer

Training a neural network requires setting up an optimizer.
Several optimization algorithms are available in PyTorch, among which the most commonly used are:

* SGD
* Adam
* ...


In [None]:
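from torch import optim

# SGD with momentum over all the parameters of an existing `model`
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Adam over an explicit list of tensors (`var1` and `var2` are placeholders)
optimizer = optim.Adam([var1, var2], lr=0.0001)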

How to use an optimizer?

In [None]:
for input, target in dataset:
    optimizer.zero_grad()            # reset the gradient
    output = model(input)            # Forward pass
    loss = loss_fn(output, target)   # compute the loss
    loss.backward()                  # backward pass
    optimizer.step()                 # update the parameters using the gradients
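
To make the loop above concrete, here is a small runnable sketch on synthetic data. The linear model, the regression target and the hand-made list of batches are assumptions chosen only to keep the example short.

```python
import torch
from torch import optim

torch.manual_seed(0)

model = torch.nn.Linear(3, 1)                      # toy model (assumption)
loss_fn = torch.nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Synthetic regression task: the target is the sum of the 3 input features
inputs = torch.rand(32, 3)
targets = inputs.sum(dim=1, keepdim=True)
dataset = [(inputs[i:i + 8], targets[i:i + 8]) for i in range(0, 32, 8)]

with torch.no_grad():
    initial_loss = loss_fn(model(inputs), targets).item()

for epoch in range(50):
    for batch, target in dataset:
        optimizer.zero_grad()                      # reset the gradients
        output = model(batch)                      # forward pass
        loss = loss_fn(output, target)             # compute the loss
        loss.backward()                            # backward pass
        optimizer.step()                           # update the parameters

with torch.no_grad():
    final_loss = loss_fn(model(inputs), targets).item()

print(initial_loss, final_loss)
```

Forgetting `optimizer.zero_grad()` is a common mistake: gradients accumulate across calls to `backward()` by default.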

# Scheduler

The scheduler is responsible for managing the evolution of the learning rate across epochs. It is a very sensitive element of the training process and should be chosen carefully depending on the data and the type of model.

Learning rate scheduling should be applied after the optimizer’s update (but not necessarily after each batch);
e.g., you should write your code this way:

In [None]:
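import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR

# `model`, `dataset` and `loss_fn` stand for your own objects
params = [Parameter(torch.randn(2, 2, requires_grad=True))] # stand-in for model.parameters()
optimizer = SGD(params, 0.1)                                # set up the optimizer
scheduler = ExponentialLR(optimizer, gamma=0.9)             # choose a scheduler

for epoch in range(20):                                     # training loop
    for input, target in dataset:                           # batch loop
        optimizer.zero_grad()            
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()                                        # update the learning rate according to the scheduler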
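
The effect of the scheduler can be observed with a short runnable sketch. The toy model and the dummy loss are assumptions; only the order of the calls (`optimizer.step()` then `scheduler.step()`) matters here.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR

model = torch.nn.Linear(2, 1)                      # toy model (assumption)
optimizer = SGD(model.parameters(), lr=0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)

learning_rates = []
for epoch in range(3):
    # Learning rate used during this epoch
    learning_rates.append(scheduler.get_last_lr()[0])

    optimizer.zero_grad()
    loss = model(torch.rand(4, 2)).pow(2).mean()   # dummy loss
    loss.backward()
    optimizer.step()                               # parameter update first
    scheduler.step()                               # then learning rate update

print(learning_rates)
```

With `gamma=0.9`, the learning rate is multiplied by 0.9 after each epoch, so the recorded values are close to 0.1, 0.09 and 0.081.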

# Parallel computation

There are several ways to parallelize computation on GPUs:
* `DataParallel`: splits the data across several GPUs of a single node
* `DistributedDataParallel`: splits the data across several GPUs and nodes
* model parallelism: splits the model across several GPUs and nodes

`DataParallel` is not recommended, even on a single node; the PyTorch documentation recommends `DistributedDataParallel` instead.

## DistributedDataParallel

## Model parallelism

# Datasets, DataLoaders and Samplers

Datasets are the classes that load and prepare the examples.

DataLoaders handle the batches.

Samplers define precisely which data are fed to the network at each epoch (batch balancing, ordering...).
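
The interplay between the three classes can be sketched on a toy example. `ToyDataset` and its sizes are hypothetical, and `num_workers` is left at its default of 0 so that everything runs in the main process.

```python
import torch
from torch.utils.data import Dataset, DataLoader, SequentialSampler


class ToyDataset(Dataset):  # hypothetical example dataset
    def __init__(self, n=20):
        self.data = torch.rand(n, 5)           # n examples of dimension 5
        self.labels = torch.arange(n) % 2      # alternating 0/1 labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]


dataset = ToyDataset()
# The sampler decides the order of the indices, the loader groups them in batches
loader = DataLoader(dataset,
                    batch_size=8,
                    sampler=SequentialSampler(dataset),
                    drop_last=True)

batches = [(x.shape, y.shape) for x, y in loader]
print(len(batches), batches[0])
```

With 20 examples, a batch size of 8 and `drop_last=True`, the last 4 examples are dropped and only two full batches are produced.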

Explain what the following blocks do:


In [48]:
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch.utils.data import Sampler

In [54]:
import numpy


class RandomDataset(Dataset):
    def __init__(self):
        """
        Many transformations exist that make it easy to augment the data.
        """
        self.data = numpy.random.randn(128 * 10 * 100).reshape(128, 10, 100)
        self.labels = numpy.random.randint(0, 10, 128)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

In [55]:
training_set = RandomDataset()

training_loader = DataLoader(training_set,
                             batch_size=16,
                             shuffle=True,
                             drop_last=True,
                             pin_memory=True,
                             sampler=None,
                             num_workers=1,
                             persistent_workers=False,
                             worker_init_fn=None)  # expects None or a callable taking the worker id

In [67]:
class BatchSampler(torch.utils.data.Sampler):
    """
    Data Sampler used to generate uniformly distributed batches
    """

    def __init__(self, batch_size):
        tmp = numpy.arange(128)
        numpy.random.shuffle(tmp)
        #self.order = tmp.reshape(batch_size,...)
        self.batch_size = batch_size
        self.order = tmp

    def __iter__(self):
        return iter(self.order)

    def __len__(self) -> int:
        return len(self.order)

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

In [69]:
sampler = BatchSampler(16)
training_loader = DataLoader(training_set,
                             batch_size=16,
                             shuffle=False,
                             drop_last=True,
                             pin_memory=True,
                             sampler=sampler,
                             num_workers=1,
                             persistent_workers=False,
                             worker_init_fn=None)  # expects None or a callable taking the worker id