# Introduction to PyTorch and Poutyne

In this notebook, we train a simple fully-connected network and a simple convolutional network on MNIST. First, we train them by writing our own training loop, as PyTorch expects us to. Then, we use Poutyne to simplify our code.

In [None]:
# Import the packages needed.
%matplotlib inline
import matplotlib.pyplot as plt

import math
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data.dataset import Subset

from torchvision import transforms, utils
from torchvision.datasets.mnist import MNIST

from poutyne.framework import Model, ModelCheckpoint, CSVLogger, Callback, Experiment
from poutyne import torch_to_numpy, set_seeds

In [None]:
# Set Python's, NumPy's and PyTorch's seeds so that our training runs are (almost) reproducible.
set_seeds(42)

# Basis of Training a Neural Network

In **stochastic gradient descent**, a **batch** of `m` examples is drawn from the training dataset. In the so-called forward pass, these examples are passed through the neural network and their loss values are averaged. In the backward pass, this average loss is backpropagated through the network to compute the gradient of each parameter, and each parameter is then updated by a small step in the direction opposite to its gradient. In practice, the `m` examples of a batch are drawn without replacement. Thus, we define one **epoch** of training as the number of batches needed to go through the entire training dataset once.
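As a minimal sketch of one SGD step in PyTorch, assuming a toy single-layer network and random data in place of MNIST (both hypothetical, not the architectures used later), the forward pass averages the batch losses, the backward pass fills each parameter's `.grad`, and `optimizer.step()` applies the update `p - lr * grad`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A toy batch: m = 32 random "images" of 1x28x28 pixels with random labels in [0, 10).
m = 32
inputs = torch.rand(m, 1, 28, 28)
targets = torch.randint(0, 10, (m,))

# A tiny stand-in network (hypothetical, for illustration only).
network = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.SGD(network.parameters(), lr=0.1)
# CrossEntropyLoss averages the per-example losses over the batch by default.
loss_function = nn.CrossEntropyLoss()

# Forward pass: compute the average loss of the batch.
logits = network(inputs)
loss = loss_function(logits, targets)

# Backward pass: reset old gradients, then backpropagate the average loss.
optimizer.zero_grad()
loss.backward()

# Every parameter now has a gradient with the same shape as itself.
weight = network[1].weight
assert weight.grad is not None and weight.grad.shape == weight.shape

# Plain SGD update (no momentum): p <- p - lr * grad.
before = weight.detach().clone()
expected = before - 0.1 * weight.grad
optimizer.step()
print(torch.allclose(weight.detach(), expected))  # True
```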

In addition to the training dataset, a **validation dataset** is used to evaluate the neural network at the end of each epoch. This validation dataset can be used to select the best model during training and thus avoid overfitting the training set. It can also serve other purposes, such as selecting hyperparameters.

Finally, a **test dataset** is used at the end to evaluate the final model.

## Training constants

In [None]:
# Train on GPU if one is present
cuda_device = 0
device = torch.device("cuda:%d" % cuda_device if torch.cuda.is_available() else "cpu")

# The MNIST training set is split 80/20 into the train and validation datasets, respectively.
train_split_percent = 0.8

# The MNIST dataset has 10 classes
num_classes = 10

# Training hyperparameters
batch_size = 32
learning_rate = 0.1
num_epochs = 5
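With these constants, the number of batches per epoch follows directly from the size of the training split. The sketch below assumes the standard 60,000-image MNIST training set and recomputes the counts by hand; it is pure arithmetic and loads no data:

```python
import math

# Constants mirrored from the cell above; 60,000 is the size of the MNIST training split.
num_mnist_train = 60000
train_split_percent = 0.8
batch_size = 32
num_epochs = 5

num_train = math.floor(train_split_percent * num_mnist_train)  # training examples
num_valid = num_mnist_train - num_train                          # validation examples

# One epoch = enough batches to see every training example once;
# the last batch may be smaller, hence the ceiling.
batches_per_epoch = math.ceil(num_train / batch_size)
total_batches = batches_per_epoch * num_epochs

print(num_train, num_valid)              # 48000 12000
print(batches_per_epoch, total_batches)  # 1500 7500
```

The 7,500 total batches is the figure used later to compute the final temperatures in the temperature experiments.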

## Loading the MNIST dataset

The following code loads the MNIST dataset and creates the PyTorch DataLoaders that split our datasets into batches. The train DataLoader reshuffles the training examples at every epoch so that, within an epoch, examples are drawn without replacement.

In [None]:
train_dataset = MNIST('./mnist/', train=True, download=True, transform=transforms.ToTensor())
valid_dataset = MNIST('./mnist/', train=True, download=True, transform=transforms.ToTensor())
test_dataset = MNIST('./mnist/', train=False, download=True, transform=transforms.ToTensor())

num_data = len(train_dataset)
indices = list(range(num_data))
np.random.shuffle(indices)

split = math.floor(train_split_percent * num_data)

train_indices = indices[:split]
train_dataset = Subset(train_dataset, train_indices)

valid_indices = indices[split:]
valid_dataset = Subset(valid_dataset, valid_indices)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size)

loaders = train_loader, valid_loader, test_loader
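The split above relies on `Subset` exposing only the listed indices of the underlying dataset. A small sketch on a hypothetical 10-example `TensorDataset` (standing in for MNIST) checks that the two subsets produced by the same shuffle-then-cut recipe are disjoint and together cover the whole dataset:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, Subset

np.random.seed(0)

# A toy dataset of 10 examples standing in for MNIST (hypothetical data).
# Each example's feature is its own index, which makes membership easy to check.
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.zeros(10, dtype=torch.long)
full_dataset = TensorDataset(features, labels)

# Same recipe as above: shuffle the indices once, then cut at 80%.
indices = list(range(len(full_dataset)))
np.random.shuffle(indices)
split = int(0.8 * len(full_dataset))

train_subset = Subset(full_dataset, indices[:split])
valid_subset = Subset(full_dataset, indices[split:])

# Collect the feature value (i.e., original index) of every example in each subset.
train_values = {int(x.item()) for x, _ in train_subset}
valid_values = {int(x.item()) for x, _ in valid_subset}

print(len(train_subset), len(valid_subset))          # 8 2
print(train_values.isdisjoint(valid_values))         # True
print(train_values | valid_values == set(range(10)))  # True
```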

Let's take a look at some examples of the dataset.

In [None]:
# Get the first batch of our train DataLoader and
# format it as a grid.
inputs = next(iter(train_loader))[0]
input_grid = utils.make_grid(inputs)

# Plot the images.
fig = plt.figure(figsize=(10, 10))
inp = input_grid.numpy().transpose((1, 2, 0))
plt.imshow(inp)

## Neural Network Architectures

We train a fully-connected neural network and a convolutional neural network with a comparable number of parameters.

### Fully-connected Network
In short, the fully-connected network follows this architecture: ``Input -> [Linear -> ReLU]*3 -> Linear``. The following table shows it in detail:

| Layer Type                  | Output Size |    # of Parameters    |
|-----------------------------|:-----------:|:---------------------:|
| Input                       |   1x28x28   |           0           |
| Flatten                     |  1\*28\*28  |           0           |
| **Linear with 256 neurons** |     256     | 28\*28\*256 = 200,704 |
| ReLU                        |     256     |           0           |
| **Linear with 128 neurons** |     128     |   256\*128 = 32,768   |
| ReLU                        |     128     |           0           |
| **Linear with 64 neurons**  |      64     |    128\*64 = 8,192    |
| ReLU                        |      64     |           0           |
| **Linear with 10 neurons**  |      10     |     64\*10 = 640      |

Total # of parameters of the fully-connected network: 242,304 (weights only; the 458 biases are not counted)
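The counts in the table only include weight matrices. `nn.Linear` also creates one bias per output neuron by default (`bias=True`), so summing `p.numel()` over the parameters of such a network gives a slightly larger number. A plain-Python sketch of both counts, assuming default-bias linear layers (the helper name `linear_params` is hypothetical):

```python
def linear_params(in_features, out_features, bias=True):
    """Parameters of a fully-connected layer: a weight matrix plus one bias per output neuron."""
    return in_features * out_features + (out_features if bias else 0)

# (in_features, out_features) of each Linear layer in Input -> [Linear -> ReLU]*3 -> Linear.
fc_layers = [(28 * 28, 256), (256, 128), (128, 64), (64, 10)]

weights_only = sum(linear_params(i, o, bias=False) for i, o in fc_layers)
with_biases = sum(linear_params(i, o) for i, o in fc_layers)

print(weights_only)  # 242304
print(with_biases)   # 242762
```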

### Convolutional Network

The convolutional neural network starts with two blocks of convolution, ReLU and max-pooling layers, which are followed by fully-connected layers. In short, it follows this architecture: ``Input -> [Conv -> ReLU -> MaxPool]*2 -> Dropout -> Linear -> ReLU -> Dropout -> Linear``. The following table shows it in detail, along with the number of parameters of each layer:

| Layer Type                                     | Output Size |     # of Parameters     |
|------------------------------------------------|:-----------:|:-----------------------:|
| Input                                          |   1x28x28   |            0            |
| **Conv with 16 3x3 filters with padding of 1** |   16x28x28  |    16\*1\*3\*3 = 144    |
| ReLU                                           |   16x28x28  |            0            |
| MaxPool 2x2                                    |   16x14x14  |            0            |
| **Conv with 32 3x3 filters with padding of 1** |   32x14x14  |   32\*16\*3\*3 = 4,608  |
| ReLU                                           |   32x14x14  |            0            |
| MaxPool 2x2                                    |    32x7x7   |            0            |
| Dropout of 0.25                                |    32x7x7   |            0            |
| Flatten                                        |   32\*7\*7  |            0            |
| **Linear with 128 neurons**                    |     128     | 32\*7\*7\*128 = 200,704 |
| ReLU                                           |     128     |            0            |
| Dropout of 0.5                                 |     128     |            0            |
| **Linear with 10 neurons**                     |      10     |     128\*10 = 1,280     |

Total # of parameters of the convolutional network: 206,736 (weights only; the 186 biases are not counted)
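The same bookkeeping for the convolutional network, as a plain-Python sketch. Each `nn.Conv2d` filter spans all input channels, so each of the 32 filters of the second convolution holds `16*3*3` weights, and both `nn.Conv2d` and `nn.Linear` add one bias per output channel or neuron by default. The helper names are hypothetical; the spatial-size formula assumes no dilation, and the pooling step assumes 2x2 max-pooling with stride 2 on even sizes:

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Each filter spans every input channel, plus one bias per filter."""
    return out_channels * in_channels * kernel_size * kernel_size + (out_channels if bias else 0)

def linear_params(in_features, out_features, bias=True):
    return in_features * out_features + (out_features if bias else 0)

def conv2d_out_size(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Trace the spatial size: a 3x3 conv with padding 1 keeps it, a 2x2 max-pool halves it.
size = 28
size = conv2d_out_size(size, 3, padding=1) // 2  # conv1 + pool -> 14
size = conv2d_out_size(size, 3, padding=1) // 2  # conv2 + pool -> 7
flattened = 32 * size * size                     # 32*7*7

def total(bias):
    return (conv2d_params(1, 16, 3, bias) + conv2d_params(16, 32, 3, bias)
            + linear_params(flattened, 128, bias) + linear_params(128, 10, bias))

print(size, flattened)    # 7 1568
print(total(bias=False))  # 206736
print(total(bias=True))   # 206922
```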

In [None]:
def create_fully_connected_network():
    """
    This function should return the fully-connected network laid out above.
    """
    pass

def create_convolutional_network():
    """
    This function should return the convolutional network laid out above.
    """
    pass

# Training the PyTorch way 

That is, writing your own training loop.

In [None]:
def pytorch_accuracy(y_pred, y_true):
    """
    Computes the accuracy for a batch of predictions
    
    Args:
        y_pred (torch.Tensor): the logit predictions of the neural network.
        y_true (torch.Tensor): the ground truths.
        
    Returns:
        The average accuracy of the batch.
    """
    y_pred = y_pred.argmax(1)
    return (y_pred == y_true).float().mean() * 100

def pytorch_train_one_epoch(pytorch_network, optimizer, loss_function):
    """
    This function should train the neural network for one epoch on the train DataLoader.
    
    Args:
        pytorch_network (torch.nn.Module): The neural network to train.
        optimizer (torch.optim.Optimizer): The optimizer of the neural network.
        loss_function: The loss function.
    
    Returns:
        A tuple (loss, accuracy) corresponding to an average of the losses and
        an average of the accuracy, respectively, on the train DataLoader.
    """
    pytorch_network.train(True)
    with torch.enable_grad():
        pass

def pytorch_test(pytorch_network, loader, loss_function):
    """
    This function should test the neural network on a DataLoader.
    
    Args:
        pytorch_network (torch.nn.Module): The neural network to test.
        loader (torch.utils.data.DataLoader): The DataLoader to test on.
        loss_function: The loss function.
    
    Returns:
        A tuple (loss, accuracy) corresponding to an average of the losses and
        an average of the accuracy, respectively, on the DataLoader.
    """
    pytorch_network.eval()
    with torch.no_grad():
        pass
        
    
def pytorch_train(pytorch_network):   
    """
    This function should transfer the neural network to the right device,
    train it for a certain number of epochs, test at each epoch on the 
    validation set and output the results on the test set at the end of
    training.
    
    Args:
        pytorch_network (torch.nn.Module): The neural network to train.
        
    Example:
        This function should display something like this:
        
        .. code-block:: python

            Epoch 1/5: loss: 0.5026924496193726, acc: 84.26666259765625, val_loss: 0.17258917854229608, val_acc: 94.75
            Epoch 2/5: loss: 0.13690324830015502, acc: 95.73332977294922, val_loss: 0.14024296019474666, val_acc: 95.68333435058594
            Epoch 3/5: loss: 0.08836929737279813, acc: 97.29582977294922, val_loss: 0.10380942322810491, val_acc: 96.66666412353516
            Epoch 4/5: loss: 0.06714504160980383, acc: 97.91874694824219, val_loss: 0.09626663728555043, val_acc: 97.18333435058594
            Epoch 5/5: loss: 0.05063822727650404, acc: 98.42708587646484, val_loss: 0.10017542181412378, val_acc: 96.95833587646484
            Test:
                Loss: 0.09501855444908142
                Accuracy: 97.12999725341797
    """
    print(pytorch_network)
    
    pass
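
The docstrings above ask for averages over a whole DataLoader. Averaging the per-batch values directly gives every batch the same weight, which is slightly off when the last batch is smaller: the 10,000-image MNIST test set with `batch_size = 32` ends with a batch of 16 examples. A minimal sketch of a weighted average, using hypothetical per-batch accuracies chosen to make the difference visible (the helper name `weighted_epoch_average` is also hypothetical):

```python
def weighted_epoch_average(batch_values, batch_sizes):
    """Average per-batch metrics, weighting each batch by its number of examples."""
    total = sum(value * size for value, size in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)

# With 10,000 test examples and batch_size = 32, the loader yields 312 full batches
# and a final batch of 16 examples.
num_examples, batch_size = 10000, 32
batch_sizes = [batch_size] * (num_examples // batch_size)
if num_examples % batch_size:
    batch_sizes.append(num_examples % batch_size)

# Hypothetical per-batch accuracies: 100% on every full batch, 0% on the last one.
batch_accs = [100.0] * (len(batch_sizes) - 1) + [0.0]

naive = sum(batch_accs) / len(batch_accs)
weighted = weighted_epoch_average(batch_accs, batch_sizes)
print(len(batch_sizes), batch_sizes[-1])    # 313 16
print(round(naive, 2), round(weighted, 2))  # 99.68 99.84
```

In a training loop, one way to apply this is to accumulate `loss.item() * len(y)` and the number of examples seen, then divide at the end of the epoch.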

In [None]:
fc_net = create_fully_connected_network()
pytorch_train(fc_net)

In [None]:
conv_net = create_convolutional_network()
pytorch_train(conv_net)

# Training the Poutyne way

That is, only 8 lines of code and a nicer output.

In [None]:
def poutyne_train(pytorch_network):
    """
    This function should create a Poutyne Model (see https://poutyne.org/model.html),
    send the Model to the right device, and use the `fit_generator` method to train
    the neural network. At the end, the `evaluate_generator` method should be used on
    the test set.
    
    Args:
        pytorch_network (torch.nn.Module): The neural network to train.
    """
    print(pytorch_network)
    
    pass

In [None]:
fc_net = create_fully_connected_network()
poutyne_train(fc_net)

In [None]:
conv_net = create_convolutional_network()
poutyne_train(conv_net)

# Poutyne Callbacks

One nice feature of Poutyne is [callbacks](https://poutyne.org/callbacks.html). Callbacks let you perform actions at specific points during the training of the neural network. In the following exercise, use 3 callbacks: one that saves the latest weights to a file so that the optimization can be resumed at the end of training if more epochs are needed, another that saves the best weights according to the performance on the validation dataset, and a last one that saves the displayed logs into a TSV file.
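To make the idea of a "save the best weights" callback concrete, here is a toy re-implementation in plain Python. It is not Poutyne's `ModelCheckpoint`: the class name `BestWeightsKeeper` and the explicit `weights` argument are hypothetical (a Poutyne callback receives `epoch_number` and `logs` in `on_epoch_end` and reaches the network through `self.model`), and the validation accuracies are rounded from the example output in the `pytorch_train` docstring:

```python
import copy

class BestWeightsKeeper:
    """Toy stand-in for a 'save best' checkpoint callback (hypothetical, not Poutyne's class)."""

    def __init__(self, monitor="val_acc", mode="max"):
        self.monitor = monitor
        self.mode = mode
        self.best_value = None
        self.best_epoch = None
        self.best_weights = None

    def _is_better(self, value):
        if self.best_value is None:
            return True
        return value > self.best_value if self.mode == "max" else value < self.best_value

    def on_epoch_end(self, epoch_number, logs, weights):
        value = logs[self.monitor]
        if self._is_better(value):
            self.best_value = value
            self.best_epoch = epoch_number
            # A real callback would write the weights to a file here.
            self.best_weights = copy.deepcopy(weights)

# Validation accuracies for 5 epochs, rounded from the docstring example above.
val_accs = [94.75, 95.68, 96.67, 97.18, 96.96]
keeper = BestWeightsKeeper()
weights = {"w": 0.0}
for epoch, val_acc in enumerate(val_accs, start=1):
    weights["w"] += 1.0  # pretend the optimizer changed the weights during the epoch
    keeper.on_epoch_end(epoch, {"val_acc": val_acc}, weights)

print(keeper.best_epoch, keeper.best_value, keeper.best_weights)  # 4 97.18 {'w': 4.0}
```

The deep copy matters: keeping a reference to the live weights would silently track later, worse epochs.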

In [None]:
def train_with_callbacks(name, pytorch_network):
    """
    In addition to what `poutyne_train` does, this function should save checkpoints and logs as described above.

    Args:
        name (str): a name used to save logs and checkpoints.
        pytorch_network (torch.nn.Module): The neural network to train.
    """
    print(pytorch_network)
    
    pass

In [None]:
fc_net = create_fully_connected_network()
train_with_callbacks('fc', fc_net)

In [None]:
conv_net = create_convolutional_network()
train_with_callbacks('conv', conv_net)

# Making Your Own Callback

While Poutyne provides a great number of [predefined callbacks](https://poutyne.org/callbacks.html), it is sometimes useful to make your own callback. 

In the following exercise, we want to see the effect of temperature on the optimization of our neural network. To do so, you should either increase or decrease the temperature during the optimization. As the results will show, the temperature either has no effect or has a detrimental effect on the performance of the neural network. This is because dividing the logits by a temperature artificially rescales the gradients, much like changing the learning rate. Since our learning rate is already well chosen, increasing or decreasing it does not improve the results.
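Dividing the logits `z` by a temperature `T` multiplies the gradient flowing back into the network by `1/T`, by the chain rule. The PyTorch sketch below (random logits and labels, hypothetical) checks that the gradient of the batch-averaged cross-entropy of `z / T` equals `(softmax(z / T) - one_hot(y)) / (T * batch_size)`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))

def grad_wrt_logits(temperature):
    # Cross-entropy on logits divided by the temperature, averaged over the batch.
    loss = F.cross_entropy(logits / temperature, targets)
    (grad,) = torch.autograd.grad(loss, logits)
    return grad

T = 4.0
grad = grad_wrt_logits(T)

# Chain rule: d/dz CE(z / T) = (softmax(z / T) - one_hot(y)) / (T * batch_size).
one_hot = F.one_hot(targets, num_classes=10).float()
expected = (torch.softmax(logits.detach() / T, dim=1) - one_hot) / (T * logits.shape[0])
print(torch.allclose(grad, expected, atol=1e-6))  # True
```

With plain SGD, this extra `1/T` factor acts much like dividing the learning rate by `T`, although the softmax is also evaluated at `z / T`, so the two are not exactly equivalent.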

In [None]:
class CrossEntropyLossWithTemperature(nn.Module):
    """
    This loss module is the cross-entropy loss function
    with temperature. It divides the logits by a temperature
    value before computing the cross-entropy loss.
    
    Args:
        initial_temperature (float): The initial value of the temperature.
    """
    def __init__(self, initial_temperature):
        super().__init__()
        self.temperature = initial_temperature
        self.celoss = nn.CrossEntropyLoss()
        
    def forward(self, y_pred, y_true):
        pass

class TemperatureCallback(Callback):
    """
    This callback multiplies the loss temperature by a decay factor before
    each batch.
    
    Args:
        celoss_with_temp (CrossEntropyLossWithTemperature): the loss module.
        decay (float): The value of the temperature decay.
    """
    def __init__(self, celoss_with_temp, decay):
        super().__init__()
        self.celoss_with_temp = celoss_with_temp
        self.decay = decay
    
    def on_batch_begin(self, batch, logs):
        pass

def train_with_temperature(pytorch_network, initial_temperature, temperature_decay):
    """
    In addition to what `poutyne_train` does, this function should use a cross-entropy
    loss with temperature and should decay the temperature at each batch.

    Args:
        pytorch_network (torch.nn.Module): The neural network to train.
        initial_temperature (float): The initial value of the temperature.
        temperature_decay (float): The value of the temperature decay.
    """
    print(pytorch_network)
    
    pass

In [None]:
conv_net = create_convolutional_network()
# Initial temperature = 0.1
# Final temperature ≈ 0.1 * 1.0008^7500 ≈ 40.25
train_with_temperature(conv_net, 
                       initial_temperature=0.1, 
                       temperature_decay=1.0008)

In [None]:
conv_net = create_convolutional_network()
# Initial temperature = 4.25
# Final temperature ≈ 4.25 * 0.9995^7500 ≈ 0.1
train_with_temperature(conv_net,
                       initial_temperature=4.25, 
                       temperature_decay=0.9995)
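The final temperatures follow from the multiplicative schedule `T_n = T_0 * decay ** n`, with `n` = 5 epochs × 1,500 batches = 7,500 updates (assuming the 48,000-example training split and `batch_size = 32` defined earlier). A quick check of the values passed to `train_with_temperature` in the two cells above (the helper name `final_temperature` is hypothetical):

```python
# Temperature after n multiplicative updates: T_n = T_0 * decay ** n.
total_batches = 5 * 1500  # num_epochs * batches per epoch

def final_temperature(initial_temperature, decay, num_updates=total_batches):
    return initial_temperature * decay ** num_updates

increasing = final_temperature(0.1, 1.0008)   # ≈ 40.2
decreasing = final_temperature(4.25, 0.9995)  # ≈ 0.1
print(increasing, decreasing)
```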

# Poutyne Experiment
Most of the time when using Poutyne (or even PyTorch in general), we find ourselves in an iterative loop of hyperparameter fine-tuning. For an efficient model search, we usually want to save our best-performing models and their training and testing statistics, and sometimes even retrain an already trained model for further tuning. All of the above can easily be implemented with Poutyne callbacks, but having to define and initialize every single Callback object we want for our model quickly becomes cumbersome.

This is why Poutyne provides an [Experiment class](https://poutyne.org/experiment.html), which aims specifically at enabling a quick model search without sacrificing the quality of a single experiment: statistics logging, best model saving, etc. Experiment is actually a simple wrapper around a PyTorch module and Poutyne's core Callback objects for logging and saving. Given a working directory in which to write the various logging files and a PyTorch module, the Experiment class reduces the whole training loop to a single line.

The following code should use [Poutyne's Experiment class](https://poutyne.org/experiment.html) to train a network for 5 epochs. The code should be much simpler than the code in the Poutyne Callbacks section while doing more (only 3 lines). Once the network has been trained for 5 epochs, the same function can resume the optimization from the 5th epoch and train for 5 more epochs, up to the 10th epoch.

In [None]:
def experiment_train(pytorch_network, working_directory, epochs=5):
    """
    This function should create a Poutyne Experiment, and use it to train the input
    module on the train loader and test its performance on the test loader.
    All training and testing statistics will be saved by the Experiment class, as well 
    as best model checkpoints.
    
    Args:
        pytorch_network (torch.nn.Module): The neural network to train.
        working_directory (str): The directory in which to save the output files.
        epochs (int): The number of epochs. (Default: 5)
    """
    print(pytorch_network)

    pass

In [None]:
conv_net = create_convolutional_network()
experiment_train(conv_net, './conv_net_experiment')

In [None]:
conv_net = create_convolutional_network()
experiment_train(conv_net, './conv_net_experiment', epochs=10)