# 02 - Define, train and evaluate a basic Neural Network in PyTorch

#### Introductory note 

These tutorials are inspired by the book "[Deep Learning with PyTorch](https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf)" by Stevens et al. They can be seen as a summary of Part I of the book, which covers PyTorch itself. Normally, following the tutorials should be enough and reading the book is not required. But of course, if you are interested and curious, you can try to follow the book while reading these tutorials. I tried to associate the most important parts of these tutorials with their respective book sections. Other parts of the tutorials were written from scratch or inspired by the PyTorch documentation. If you have any questions, you can ask me (Natacha): it could help me improve these tutorials and/or help other students who are struggling as much as you are.

These tutorials are a "bonus": they are not mandatory and are not graded (there is nothing to do anyway, just read and run). They are just here to help you if you are new to PyTorch, and to save you some time by not having to read the book (or at least not as intensively).

In short: To understand deep learning concepts, the number one priority is Andrew's course. To understand PyTorch, the priority is the documentation (always), then these tutorials, and if that's still not enough, don't be afraid of looking for good tutorials on the internet. There are plenty of them, and you can share them with other students (and with us) if you find some really good ones.

## Contents

1. Loading data 

  1.1 Loading CIFAR-10  (see previous tutorial)
  1.2 From CIFAR-10 to CIFAR-2  

2. Basic building blocks for neural networks in PyTorch  

  2.1 The 'torch.nn' module and the 'torch.nn.Module' class  
  2.2 Our network as a nn.Sequential object  
  2.3 Inspecting a module object

3. Training our model

  3.1 Training on CPU  
  3.2 Training on GPU 

  

In [1]:
import numpy as np
import collections
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
import datetime

torch.manual_seed(123)

<torch._C.Generator at 0x7f4f2c09d710>

## 1. Loading data

### 1.1 Loading CIFAR-10  (see previous tutorial)

In [2]:
# Where to find the data or where to download the data if not found
data_path = 'data/'

# Instantiates a dataset for the training data and downloads the data if it is not present
cifar10_train = datasets.CIFAR10(
    data_path,       # location from which the data will be downloaded
    train=True,      # says whether we’re interested in the training set or the validation set
    download=True,   # says whether we allow PyTorch to download the data if not found in 'data_path'
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4915, 0.4823, 0.4468),
                             (0.2470, 0.2435, 0.2616))
    ]))

print('Size of the training dataset: ', len(cifar10_train))

cifar10_val = datasets.CIFAR10(
    data_path, 
    train=False,
    download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4915, 0.4823, 0.4468),
                             (0.2470, 0.2435, 0.2616))
    ]))
print('Size of the validation dataset: ', len(cifar10_val))

Files already downloaded and verified
Size of the training dataset:  50000
Files already downloaded and verified
Size of the validation dataset:  10000
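The `Normalize` constants above are the per-channel means and standard deviations of the CIFAR-10 training images, computed after `ToTensor()` (so on pixel values in [0, 1]). A minimal sketch of how such statistics can be computed, assuming the images are stacked into one tensor of shape (N, 3, H, W). To keep it runnable without downloading anything, a random tensor (a hypothetical stand-in, `fake_imgs`) replaces the real images, and the helper name `channel_stats` is our own:

```python
import torch

def channel_stats(imgs: torch.Tensor):
    """Per-channel mean and std of a batch of images shaped (N, C, H, W)."""
    # Put the channel axis first, then flatten everything else: (C, N*H*W)
    per_channel = imgs.transpose(0, 1).reshape(imgs.shape[1], -1)
    return per_channel.mean(dim=1), per_channel.std(dim=1)

# Hypothetical stand-in for the stacked ToTensor() images of CIFAR-10
torch.manual_seed(0)
fake_imgs = torch.rand(100, 3, 32, 32)

mean, std = channel_stats(fake_imgs)
print(mean)  # close to 0.5 per channel for uniform noise
print(std)   # close to 1/sqrt(12) ~ 0.289 per channel
```

With the real data, `torch.stack([img for img, _ in cifar10_raw])` would play the role of `fake_imgs`, where `cifar10_raw` (hypothetical name) is CIFAR-10 loaded with `transforms.ToTensor()` only, i.e. before normalization.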


### 1.2 From CIFAR-10 to CIFAR-2

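We define a lighter version of CIFAR-10, called CIFAR-2, that keeps only two classes: airplanes (CIFAR-10 label 0, mapped to 0) and birds (CIFAR-10 label 2, mapped to 1).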

In [3]:
label_map = {0: 0, 2: 1}
class_names = ['airplane', 'bird']

cifar2_train = [(img, label_map[label])
                for img, label in cifar10_train
                if label in [0, 2]]
print('Size of the training dataset: ', len(cifar2_train))

cifar2_val = [(img, label_map[label])
              for img, label in cifar10_val
              if label in [0, 2]]

print('Size of the validation dataset: ', len(cifar2_val))

Size of the training dataset:  10000
Size of the validation dataset:  2000
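The list comprehensions above iterate over the whole of CIFAR-10, applying the transforms to every image, and keep all the selected images in memory. A lighter alternative, sketched below under our own naming (the `RemappedSubset` class is hypothetical, not part of PyTorch), selects the indices from the labels only, then loads and remaps each image lazily when it is requested. A tiny `TensorDataset` stands in for CIFAR-10 so the cell runs on its own:

```python
import torch
from torch.utils.data import Dataset, TensorDataset

class RemappedSubset(Dataset):
    """Hypothetical helper: keeps only the samples whose label is in
    label_map and remaps those labels, without loading images up front."""

    def __init__(self, dataset, targets, label_map):
        self.dataset = dataset
        self.label_map = label_map
        # Only the integer labels are scanned here, never the images
        self.indices = [i for i, t in enumerate(targets) if t in label_map]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        img, label = self.dataset[self.indices[idx]]
        return img, self.label_map[int(label)]

# Synthetic stand-in for CIFAR-10: 10 fake images with labels 0..9
fake_labels = torch.arange(10)
fake_cifar10 = TensorDataset(torch.zeros(10, 3, 32, 32), fake_labels)

label_map = {0: 0, 2: 1}
fake_cifar2 = RemappedSubset(fake_cifar10, fake_labels.tolist(), label_map)
print(len(fake_cifar2))                     # 2
print([label for _, label in fake_cifar2])  # [0, 1]
```

With the real dataset, this would read `RemappedSubset(cifar10_train, cifar10_train.targets, label_map)`: torchvision's `CIFAR10` exposes its labels as the `targets` list, so the filtering step never touches the images.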


## 2. Basic building blocks for neural networks in PyTorch 

### 2.1 The 'torch.nn' module and the 'torch.nn.Module' class

In PyTorch, the basic building blocks for neural networks are available in the [torch.nn](https://pytorch.org/docs/stable/nn.html) module (often imported as 'nn').

The base class for all the basic components of a neural network is the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class.

For example:

- the [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU) activation function is a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class
- the 1D convolutional layer [nn.Conv1d](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html#torch.nn.Conv1d) is a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class
- the MSE loss function [nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) is a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class
- the distance function [nn.PairwiseDistance](https://pytorch.org/docs/stable/generated/torch.nn.PairwiseDistance.html#torch.nn.PairwiseDistance) is a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class
- the container [nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential) (we will see exactly what it is in section 2.2) is also a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class

Exception: 

- [nn.Parameter](https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html) is not a subclass of [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) but of [torch.Tensor](https://pytorch.org/docs/stable/tensors.html#torch.Tensor) instead (the other extremely important class in PyTorch)

So in short, almost everything in torch.nn can be seen as an nn.Module in PyTorch :)

In [4]:
print("Things implemented in the torch.nn module inherit from the nn.Module class")
print(issubclass(nn.ReLU, nn.Module))
print(issubclass(nn.Conv1d, nn.Module))
print(issubclass(nn.MSELoss, nn.Module))
print(issubclass(nn.PairwiseDistance, nn.Module))
print(issubclass(nn.Sequential, nn.Module))
print("nn.Parameter is not a subclass of nn.Module but of torch.Tensor instead")
print(issubclass(nn.Parameter, nn.Module))
print(issubclass(nn.Parameter, torch.Tensor))
print("Things implemented outside the torch.nn module don't necessarily inherit from the nn.Module class")
# Note: in recent torchvision versions, many transforms (e.g. transforms.Resize)
# do subclass nn.Module so that they can be scripted, but transforms.ToTensor does not
print(issubclass(torchvision.transforms.ToTensor, nn.Module))
print(issubclass(torch.Tensor, nn.Module))

Things implemented in the torch.nn module inherit from the nn.Module class
True
True
True
True
True
nn.Parameter is not a subclass of nn.Module but of torch.Tensor instead
False
True
Things implemented outside the torch.nn module don't necessarily inherit from the nn.Module class
False
False
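Because these classes are `nn.Module` subclasses, an instance is used like a function: calling it runs its `forward` method. Most stateless modules also have a functional counterpart in `torch.nn.functional` (imported as `F` at the top of this notebook). A small sketch, with arbitrary input values, comparing the two forms for an activation and a loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.0, 2.0]])
target = torch.tensor([[0.0, 0.0, 0.0]])

relu = nn.ReLU()             # a module instance...
print(relu(x))               # ...called like a function: tensor([[0., 0., 2.]])
print(F.relu(x))             # same result with the functional form

mse = nn.MSELoss()           # loss functions are modules too
print(mse(x, target))        # mean of (1 + 0 + 4) -> tensor(1.6667)
print(F.mse_loss(x, target))
```

The module form is convenient inside containers such as `nn.Sequential`, which only accept `nn.Module` objects.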




### 2.2 Our network as a nn.Sequential object

*(inspired by 6.3. Finally a neural network)*

Now what is this *[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential) container* thing?
Well, [nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential) provides a simple way to chain [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) objects: the input is passed to the first module, and the output of each module is fed as input to the next one, in the order in which they were given to the constructor.
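A small sketch (layer sizes chosen arbitrarily) checking that calling an `nn.Sequential` gives the same result as feeding the input through each submodule by hand, in order. It also shows that `nn.Sequential` accepts an `OrderedDict`, which gives the submodules readable names instead of `'0'`, `'1'`, ...:

```python
import collections
import torch
import torch.nn as nn

torch.manual_seed(0)
seq = nn.Sequential(
    nn.Linear(4, 3),
    nn.ReLU(),
    nn.Linear(3, 2),
)

x = torch.randn(5, 4)   # a batch of 5 samples with 4 features each

# Calling the container...
out_seq = seq(x)
# ...is the same as feeding x through each submodule in order
h = x
for layer in seq:
    h = layer(h)
print(torch.allclose(out_seq, h))   # True

# Submodules can be named by passing an OrderedDict
named_seq = nn.Sequential(collections.OrderedDict([
    ("hidden", nn.Linear(4, 3)),
    ("activation", nn.ReLU()),
    ("output", nn.Linear(3, 2)),
]))
print(named_seq.hidden)   # Linear(in_features=4, out_features=3, bias=True)
```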

In [5]:
n_in = 32*32*3   # Determined by our dataset: 32x32 RGB images
n_hidden1 = 256  # Choose whatever you want here, often powers of 2
n_hidden2 = 64
n_out = 2        # Determined by our number of classes, so 2: birds and planes

model_seq = nn.Sequential(
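    # Flatten is required in our case (and in many cases) because
    # each input image is a 3x32x32 (C x H x W) tensor, while nn.Linear
    # expects the last dimension of its input to hold the n_in features.
    # nn.Flatten() keeps the batch dimension: (N, 3, 32, 32) -> (N, 3072)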
    nn.Flatten(),                    
    nn.Linear(n_in, n_hidden1),
    nn.Tanh(),                       # Choose any activation function available in the nn module
    nn.Linear(n_hidden1, n_hidden2), # Choose any layer available in the nn module
    nn.ReLU(),
    nn.Linear(n_hidden2, n_out),
    )

#### Feeding one image to our custom neural network

In [6]:
batch_t = torch.unsqueeze(cifar2_train[0][0], 0)
out = model_seq(batch_t)
print(batch_t)
print(out)   # out values are just rubbish since the nn is not trained yet!

tensor([[[[ 0.6139, -0.3228, -0.1164,  ..., -0.2593, -0.2752, -0.5451],
          [ 0.6615, -0.1482, -0.8467,  ..., -0.3228, -0.3228, -0.5768],
          [ 0.2329,  0.2646, -0.1005,  ..., -0.3387, -0.6562, -0.7515],
          ...,
          [ 0.2170,  0.2646,  0.1535,  ..., -0.5768, -0.4498,  0.0106],
          [ 0.5980,  0.4393,  0.3281,  ..., -0.6404, -0.4340,  0.0265],
          [ 0.9156,  0.8044,  0.4551,  ..., -0.4975, -0.5451, -0.0529]],

         [[ 1.3369,  0.2740,  0.4028,  ...,  0.3867,  0.3867,  0.0968],
          [ 1.4497,  0.5961, -0.2253,  ...,  0.3062,  0.3062,  0.0646],
          [ 1.0954,  1.1276,  0.6444,  ...,  0.2579, -0.0481, -0.1286],
          ...,
          [ 0.4028,  0.5156,  0.5317,  ...,  0.1774,  0.4028,  0.8538],
          [ 0.5478,  0.6605,  0.6605,  ...,  0.1130,  0.4028,  0.8860],
          [ 0.4834,  0.9504,  0.4995,  ...,  0.1774,  0.1613,  0.7572]],

         [[-0.4487, -0.7935, -0.1939,  ..., -0.6136, -0.6736, -0.8535],
          [-0.4487, -0.9734, ...
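The model expects a batch, which is why `torch.unsqueeze(..., 0)` adds a leading batch dimension of size 1. A sketch tracing the shapes through a network with the same layer sizes as `model_seq`, using a random tensor as a hypothetical stand-in for a CIFAR-2 image. `torch.softmax` turns the two raw outputs (logits) into probabilities, which are still meaningless here because the weights are random:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.Tanh(),
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)

img = torch.randn(3, 32, 32)          # one image: (C, H, W)
batch = img.unsqueeze(0)              # a batch of one: (1, 3, 32, 32)
print(batch.shape)                    # torch.Size([1, 3, 32, 32])
print(nn.Flatten()(batch).shape)      # torch.Size([1, 3072])

logits = model(batch)
print(logits.shape)                   # torch.Size([1, 2])
probs = torch.softmax(logits, dim=1)  # each row now sums to 1
print(probs)
```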

### 2.3 Inspecting a module object

We saw earlier that the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) is an essential part of the PyTorch library and that it is the base class for all the basic components of a neural network.
The fact that so many PyTorch objects inherit from this class has many advantages. One of them is that they share many important methods, such as:

- [forward](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward): Defines the computation performed at every call. **Should be overridden by all subclasses** (we'll see that later). Note that you call the module itself (e.g. `model(x)`) rather than `model.forward(x)`, so that any registered hooks are run.
- [modules](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.modules): Returns an iterator over all modules in the network.
- [named\_modules](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.named_modules): Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- [parameters](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.parameters): Returns an iterator over module parameters (the so-called [nn.Parameter](https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html), the subclass of Tensor).
- [named\_parameters](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.named_parameters): Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- [requires\_grad\_](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.requires_grad_): Changes whether autograd should record operations on parameters in this module. This method is helpful for freezing part of the module for fine-tuning or training parts of a model individually (e.g., GAN training).
- [state_dict](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.state_dict): Returns a dictionary containing the whole state of the module (parameters and persistent buffers).
- [load\_state\_dict](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.load_state_dict): Copies parameters and buffers from a state_dict into this module and its descendants.
- [to](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to): Moves and/or casts the parameters and buffers (typically to a GPU or CPU).
- [cpu](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.cpu) / [cuda](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.cuda): Moves all model parameters and buffers to the CPU / GPU.
- [train](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train) / [eval](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval): Sets the module in training / evaluation mode.
- [zero_grad](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.zero_grad): Resets the gradients of all model parameters (to zero, or to `None` by default in recent PyTorch versions).
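A short sketch of a few of these methods on a small hypothetical model: `state_dict` / `load_state_dict` to copy the weights into a fresh model with the same architecture (through an in-memory buffer rather than a file), `requires_grad_` to freeze a layer, and `eval` to switch modes:

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# state_dict(): maps parameter/buffer names to tensors (ReLU has no parameters)
print(list(model.state_dict().keys()))
# ['0.weight', '0.bias', '2.weight', '2.bias']

# Save the state to an in-memory buffer (a file path works the same way)
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# load_state_dict(): copy the saved values into a fresh model
clone = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
clone.load_state_dict(torch.load(buffer))

# requires_grad_(False): freeze the first layer, e.g. for fine-tuning
clone[0].requires_grad_(False)
print([p.requires_grad for p in clone.parameters()])
# [False, False, True, True]

# eval() recursively sets the `training` flag to False; layers such as
# Dropout and BatchNorm read this flag to change their behavior
clone.eval()
print(clone.training)   # False
```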

We will use most of these methods in this tutorial. Let's start with the ones returning parameters and modules.

[torch.numel](https://pytorch.org/docs/stable/generated/torch.numel.html#torch.numel) returns the total number of elements in a given tensor (the same operation is available as the `Tensor.numel()` method, which is what we call below).

In [7]:
print("Inspecting parameters")
for name, param in model_seq.named_parameters():
    print("name: ", name, "   length: ", param.numel())
print([p.numel() for p in model_seq.parameters() if p.requires_grad])
print("Total number of parameters: ", sum(p.numel() for p in model_seq.parameters()))

print("\nInspecting modules")
for m in model_seq.named_modules():
    print(m)


Inspecting parameters
name:  1.weight    length:  786432
name:  1.bias    length:  256
name:  3.weight    length:  16384
name:  3.bias    length:  64
name:  5.weight    length:  128
name:  5.bias    length:  2
[786432, 256, 16384, 64, 128, 2]
Total number of parameters:  803266

Inspecting modules
('', Sequential(
  (0): Flatten()
  (1): Linear(in_features=3072, out_features=256, bias=True)
  (2): Tanh()
  (3): Linear(in_features=256, out_features=64, bias=True)
  (4): ReLU()
  (5): Linear(in_features=64, out_features=2, bias=True)
))
('0', Flatten())
('1', Linear(in_features=3072, out_features=256, bias=True))
('2', Tanh())
('3', Linear(in_features=256, out_features=64, bias=True))
('4', ReLU())
('5', Linear(in_features=64, out_features=2, bias=True))
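The counts above follow from the shape of `nn.Linear`: a layer with `in_features` inputs and `out_features` outputs stores a weight matrix of shape (out_features, in_features) plus one bias per output. A quick check of the total against that formula (the helper `linear_param_count` is our own, for illustration only):

```python
import torch.nn as nn

def linear_param_count(n_in, n_out):
    # weight: n_out x n_in entries, bias: n_out entries
    return n_in * n_out + n_out

layer_sizes = [(3072, 256), (256, 64), (64, 2)]
expected = sum(linear_param_count(i, o) for i, o in layer_sizes)
print(expected)   # 803266 = 786688 + 16448 + 130

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3072, 256), nn.Tanh(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
print(sum(p.numel() for p in model.parameters()) == expected)   # True
print(model[1].weight.shape)   # torch.Size([256, 3072])
```

Almost all of the parameters sit in the first layer, because it connects every one of the 3072 input values to every hidden unit.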


## 3. Training our model

### 3.1 Training on CPU

#### Defining the training loop 

*(inspired by 8.4 Training our convnet)*

In [8]:
def training_loop(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1): 
        loss_train = 0

        # Loop over our dataset in the batches the data loader creates for us
        for imgs, labels in train_loader:
            
            # Feed a batch through our model
            outputs = model(imgs)
            
            # Compute the loss we wish to minimize
            loss = loss_fn(outputs, labels) 
            
            # Get rid of the gradients from the last round
            optimizer.zero_grad() 
            
            # Perform the backward step. That is, compute the gradients of all parameters we want the network to learn
            loss.backward()
            
            # Updates the model
            optimizer.step() 

            # Sum the losses seen over the epoch.
            # Converting the loss to a Python number with .item() is important:
            # accumulating the loss tensor itself would keep its autograd graph alive.
            loss_train += loss.item() 

        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(), epoch,
                loss_train / len(train_loader))) 
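The call to `optimizer.zero_grad()` matters because `backward()` *adds* the new gradients to whatever is already stored in each parameter's `.grad`. A tiny sketch with a single hypothetical parameter and f(w) = w², whose gradient at w = 3 is 6, followed by one SGD update:

```python
import torch
import torch.optim as optim

w = torch.tensor(3.0, requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

(w ** 2).backward()
print(w.grad)          # tensor(6.)
(w ** 2).backward()    # without zeroing, the gradients accumulate
print(w.grad)          # tensor(12.)

optimizer.zero_grad()  # reset the gradient before the next backward pass
(w ** 2).backward()
print(w.grad)          # tensor(6.) again

optimizer.step()       # SGD update: w <- w - lr * grad = 3 - 0.1 * 6
print(w)               # tensor(2.4000, requires_grad=True)
```

Forgetting `zero_grad()` in the training loop would therefore make each update use the sum of all previous batches' gradients.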

#### Train the model using the training loop

On my computer, the timestamps in the log below show roughly 10 seconds per 10 epochs on the CPU (we will compare this with the GPU version of the training loop).

In [9]:
# The DataLoader batches up the examples of our cifar2 dataset
# Here we use shuffle = True to shuffle the dataset for the training
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=True) 

# Instantiate the optimizer, here:
# 1. Stochastic Gradient Descent optimizer, 
# 2. that has to be applied to our parameters (model.parameters())
# 3. With a learning rate of 1e-2
optimizer = optim.SGD(model_seq.parameters(), lr=1e-2)

# Instantiate the loss function (here we use cross entropy)
loss_fn = nn.CrossEntropyLoss()

# Now all we have to do is call the training loop
# WARNING THIS MIGHT BE EXTREMELY SLOW. STOP YOUR KERNEL TO STOP THE TRAINING
# DON'T TRY TO LET IT FINISH WE WILL SEE HOW TO USE GPUs AT THE END OF THIS NOTEBOOK
training_loop(
    n_epochs = 21,
    optimizer = optimizer,
    model = model_seq,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

2021-02-08 10:54:53.259972 Epoch 1, Training loss 0.5393446640224214
2021-02-08 10:55:04.230352 Epoch 10, Training loss 0.33766555340047094
2021-02-08 10:55:14.297124 Epoch 20, Training loss 0.23722393321953003


#### Measuring accuracy

In [10]:
# Here we use shuffle = False,
# because it makes it easier to check the predictions made.
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=False)
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)

def validate(model, train_loader, val_loader):
    for name, loader in [("train", train_loader), ("val", val_loader)]:
        correct = 0
        total = 0

        # We do not want gradients here, as we will not want to update the parameters.
        with torch.no_grad(): 
            for imgs, labels in loader:
                outputs = model(imgs)
                _, predicted = torch.max(outputs, dim=1) 
                total += labels.shape[0] 
                correct += int((predicted == labels).sum()) 

        print("Accuracy {}: {:.2f}".format(name, correct / total))

validate(model_seq, train_loader, val_loader)

Accuracy train: 0.92
Accuracy val: 0.85
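`torch.max(outputs, dim=1)` returns both the maximum values and their indices along the class dimension; only the indices (the predicted classes) are used. A sketch on made-up logits (hypothetical values for 4 images) showing that `argmax` gives the same predictions, and how the accuracy is then computed:

```python
import torch

# Hypothetical logits for 4 images and 2 classes (airplane=0, bird=1)
outputs = torch.tensor([[ 2.0, -1.0],
                        [ 0.1,  0.3],
                        [-0.5,  1.5],
                        [ 1.0,  0.0]])
labels = torch.tensor([0, 1, 0, 0])

values, predicted = torch.max(outputs, dim=1)
print(predicted)                                      # tensor([0, 1, 1, 0])
print(torch.equal(predicted, outputs.argmax(dim=1)))  # True

correct = int((predicted == labels).sum())
accuracy = correct / labels.shape[0]
print(accuracy)                                       # 0.75
```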


### 3.2 Training on GPU

*(Inspired by 8.4.3 Training on the GPU)*

#### Check if a GPU is available


In [11]:
device = (torch.device('cuda') if torch.cuda.is_available()
          else torch.device('cpu'))
print(f"Training on device {device}.")

Training on device cuda.
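These two lines work whether or not a GPU is present, which keeps the rest of the code device-agnostic. One subtlety, sketched below on a small hypothetical layer: `Module.to` moves the module's parameters in place (and returns the module), whereas `Tensor.to` returns a new tensor, so its result must be reassigned. This is why the loops below write `imgs = imgs.to(device=device)`, while `model_seq.to(device=device)` needs no assignment:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)
model.to(device)            # in place: the parameters now live on `device`

x = torch.randn(8, 4)
x = x.to(device)            # returns a new tensor: reassign it

# Ask the model where its parameters live
print(next(model.parameters()).device)
print(model(x).device)      # the output is on the same device
```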


#### Defining the training loop 

In [12]:
def training_loop_on_gpu(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            # These two lines are what differs from
            # our previous training_loop function.
            # They move imgs and labels to the device we are training
            # on (gpu if available, cpu otherwise)
            imgs = imgs.to(device=device) 
            labels = labels.to(device=device)

            outputs = model(imgs)
            loss = loss_fn(outputs, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            loss_train += loss.item()

        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(), epoch,
                loss_train / len(train_loader)))
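`training_loop_on_gpu` reads the global `device` variable defined in the previous cell. A sketch of a variant that takes the device as an argument instead (the name `training_loop_on_device`, the extra parameter and the returned loss history are our own choices, not from the book), so a single function serves both CPU and GPU. It is exercised here on a tiny random dataset (hypothetical) rather than CIFAR-2:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def training_loop_on_device(n_epochs, optimizer, model, loss_fn, train_loader, device):
    model.train()                       # training mode (matters for Dropout/BatchNorm)
    history = []
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            # Move each batch to the same device as the model
            imgs, labels = imgs.to(device), labels.to(device)
            loss = loss_fn(model(imgs), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()
        history.append(loss_train / len(train_loader))
    return history                      # mean training loss per epoch

# Tiny random dataset: 64 samples, 10 features, 2 classes
torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2)).to(device)
optimizer = optim.SGD(model.parameters(), lr=1e-1)
history = training_loop_on_device(20, optimizer, model, nn.CrossEntropyLoss(), loader, device)
print(history[0], history[-1])
```

Passing the device explicitly avoids hidden dependencies on notebook state, which makes the function easier to reuse in a script.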

#### Train the model using the training loop

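On my computer, 10 epochs now take only about 3 seconds (see the timestamps in the log below), several times faster than on the CPU! :) Note that `model_seq` has already been trained for 21 epochs on the CPU, so this run continues from those weights, which is why the loss is already low at epoch 1.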

In [13]:
# Again shuffle = True for the training phase
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=True)

# Moves our model (all parameters) to the GPU. If 
# you forget to move either the model or the inputs to the
# GPU, you will get errors about tensors not being on the same
# device, because the PyTorch operators do not support
# mixing GPU and CPU inputs.
model_seq.to(device=device) 
optimizer = optim.SGD(model_seq.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


# WARNING: THIS IS SUPPOSED TO BE MUCH, MUCH FASTER THAN BEFORE, BUT IT MIGHT STILL
# TAKE A WHILE IF:
#  - YOUR GPU IS NOT AVAILABLE
#  - YOUR GPU IS NOT THE BEST GPU EVER. (Trying not to hurt your GPU's feeling here :) )
# AGAIN STOP YOUR KERNEL IF IT'S TOO SLOW 
training_loop_on_gpu(
    n_epochs = 100,
    optimizer = optimizer,
    model = model_seq,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

2021-02-08 10:55:18.710844 Epoch 1, Training loss 0.21863427715506523
2021-02-08 10:55:21.117956 Epoch 10, Training loss 0.13799390324931235
2021-02-08 10:55:23.789728 Epoch 20, Training loss 0.07704896918812375
2021-02-08 10:55:26.571278 Epoch 30, Training loss 0.05107200264361254
2021-02-08 10:55:29.385551 Epoch 40, Training loss 0.021427364219098712
2021-02-08 10:55:32.053173 Epoch 50, Training loss 0.013399796317195057
2021-02-08 10:55:34.861965 Epoch 60, Training loss 0.00854919287880325
2021-02-08 10:55:38.069740 Epoch 70, Training loss 0.005819924617672612
2021-02-08 10:55:41.179055 Epoch 80, Training loss 0.004303469106104154
2021-02-08 10:55:44.255177 Epoch 90, Training loss 0.003382411909995565
2021-02-08 10:55:47.308105 Epoch 100, Training loss 0.0028218572935575894


#### Measuring accuracy

In [14]:
# Again shuffle = False for the validation phase
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=False)
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)
all_acc_dict = collections.OrderedDict()

def validate_on_gpu(model, train_loader, val_loader):
    accdict = {}
    for name, loader in [("train", train_loader), ("val", val_loader)]:
        correct = 0
        total = 0

        with torch.no_grad():
            for imgs, labels in loader:
                # These two lines are what differs from
                # our previous validate function.
                # They move imgs and labels to the device we are predicting
                # on (gpu if available, cpu otherwise)
                imgs = imgs.to(device=device)
                labels = labels.to(device=device)

                outputs = model(imgs)
                _, predicted = torch.max(outputs, dim=1)
                total += labels.shape[0]
                correct += int((predicted == labels).sum())

        print("Accuracy {}: {:.2f}".format(name, correct / total))
        accdict[name] = correct / total
    return accdict

all_acc_dict["baseline"] = validate_on_gpu(model_seq, train_loader, val_loader)

Accuracy train: 1.00
Accuracy val: 0.85
