# 03 - Define a custom deep Neural Network in Pytorch

#### Introductory note 

These tutorials are inspired by the book "[Deep Learning with PyTorch](https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf)" by Stevens et al. They can be seen as a summary of the part I of book regarding PyTorch itself. Normally, following the tutorials should be enough and reading the book is not required. But of course, if you are interested and curious you can try to follow the book while reading these tutorials. I tried to associate the most important part of these tutorials with their respective book sections. Some other parts of the tutorials have been done from scratch or inspired by the PyTorch documentation. If you have any questions, you can ask me (Natacha), it could help me improve these tutorials and / or help other students who are struggling as much as you are. 

These tutorials are a "bonus", they are not mandatory and are not graded (there is nothing to do anyway, just read and run). They are just here to help you if you are new to PyTorch and to help you save some time by not reading the book (or at least less intensively). 

In short: To understand deep learning concepts, the number one priority is Andrew's course. To understand PyTorch, the priority is the documentation (always), these tutorials and if it's still not enough, don't be afraid of trying to find good tutorials on the internet, there are plenty of them and you can share them with other students (and with us) if you find some really good ones.

## Contents 

1. Loading data, training loop and validation loop (see previous tutorial)
2. Define a simple custom neural network

  2.1 Naive (but totally ok) method  
  2.2 Figuring out input and output shapes  
  2.3 Using the functional API  
  2.4 Train our custom network (as any other model)  
  2.5 Measuring accuracy (as any other model)  

3. Going deeper: defining blocks of layers

  3.1 Using nn.Sequential  
  3.2 Using a subclass of nn.Module  

  
  

In [1]:
import numpy as np
import collections
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
import datetime

torch.manual_seed(123)

<torch._C.Generator at 0x7f94582d26d0>

## 1. Loading data, training loop and validation loop (see previous tutorial)

#### Loading CIFAR-10

In [2]:
# Where to find the data or where to download the data if not found
data_path = 'data/'

# Instantiates a dataset for the training data and downloads the data if it is not present
cifar10_train = datasets.CIFAR10(
    data_path,       # location from which the data will be downloaded
    train=True,      # says whether we’re interested in the training set or the validation set
    download=True,   # says whether we allow PyTorch to download the data if not found in 'data_path'
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4915, 0.4823, 0.4468),
                             (0.2470, 0.2435, 0.2616))
    ]))

print('Size of the training dataset: ', len(cifar10_train))

cifar10_val = datasets.CIFAR10(
    data_path, 
    train=False,
    download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4915, 0.4823, 0.4468),
                             (0.2470, 0.2435, 0.2616))
    ]))
print('Size of the validation dataset: ', len(cifar10_val))

Files already downloaded and verified
Size of the training dataset:  50000
Files already downloaded and verified
Size of the validation dataset:  10000


#### From CIFAR-10 to CIFAR-2

We define a lighter version of CIFAR-10, which is now CIFAR-2

In [3]:
label_map = {0: 0, 2: 1}
class_names = ['airplane', 'bird']

cifar2_train = [(img, label_map[label])
          for img, label in cifar10_train
          if label in [0, 2]]
print('Size of the training dataset: ', len(cifar2_train))

cifar2_val = [(img, label_map[label])
              for img, label in cifar10_val
              if label in [0, 2]]

print('Size of the validation dataset: ', len(cifar2_val))

Size of the training dataset:  10000
Size of the validation dataset:  2000


#### Training loop and validation loop on GPU

In [4]:
def training_loop_on_gpu(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device=device) 
            labels = labels.to(device=device)

            outputs = model(imgs)
            loss = loss_fn(outputs, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            loss_train += loss.item()

        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(), epoch,
                loss_train / len(train_loader)))

def validate_on_gpu(model, train_loader, val_loader):
    accdict = {}
    for name, loader in [("train", train_loader), ("val", val_loader)]:
        correct = 0
        total = 0

        with torch.no_grad():
            for imgs, labels in loader:
                imgs = imgs.to(device=device)
                labels = labels.to(device=device)

                outputs = model(imgs)
                _, predicted = torch.max(outputs, dim=1)
                total += labels.shape[0]
                correct += int((predicted == labels).sum())

        print("Accuracy {}: {:.2f}".format(name , correct / total))
        accdict[name] = correct / total
    return accdict

## 2. Define a simple custom neural network

### 2.1 Naive (but totally ok) method

*(Inspired by 8.3.1 Our network as subclass of an nn.Module)*

We saw earlier how to define a neural network using [nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential). This solution is simple and convenient but might suffer from a lack of flexibility. In order to take advantage of Pytorch's flexibility we need to define our own [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). 

Since most of the basic building blocks for neural networks are [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) in Pytorch, we will proceed in a similar way if we want to define a custom layer, block of layers, neural network, activation function, loss function etc. etc. It will always start by subclassing the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class. 

Let's start with a custom neural network!

In order to subclass nn.Module, at a minimum we need to define a forward function that takes the inputs to the module and returns the output. This is where we define our module’s computation. With PyTorch, if we use standard torch operations, autograd will take care of the backward pass automatically.

In [5]:
class MyNet(nn.Module):
    def __init__(self):
        super().__init__()  # to inherit the '__init__' method from the 'nn.Module' class
        # Add whatever you want here (e.g layers and activation functions)
        # The order and names don't matter here but it is easier to understand
        # if you go for Layer1, activation fun, layer2, fun2, etc
        # Some conventions:
        # - conv stands for convolution
        # - pool for pooling
        # - fc for fully connected

        # A few comments about the shapes are coming in the next cell
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)  
        self.act1 = nn.Tanh()
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.act2 = nn.Tanh()
        self.pool2 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(8 * 8 * 8, 32)
        self.act3 = nn.Tanh()
        self.fc2 = nn.Linear(32, 2)

    # Remember, we saw earlier that `forward` defines the 
    # computation performed at every call and that it
    # should be overridden by all subclasses.
    # So here we go, overidding the forward method!
    def forward(self, x):
        # Now the order matters! 
        # This function defines the forward pass of your neural network.
        out = self.pool1(self.act1(self.conv1(x)))
        out = self.pool2(self.act2(self.conv2(out)))
        out = out.view(-1, 8 * 8 * 8) # This reshape operation was not possible when using directly nn.Sequential
        out = self.act3(self.fc1(out))
        out = self.fc2(out)
        return out

### 2.2 Figuring out input and output shapes

**see Andrew Ng's videos about convolution and pooling for detailed info (especially from C4W1L02 to C4W1L11) **

**[Convolution](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#conv2d)**

Let's recall that we are dealing with 32x32 RGB images and let's take a closer look at the following line: 

``nn.Conv2d(in_channels=3, out_channels16, kernel_size=3, padding=1)  ``

- ``in_channels=3`` because we have 3 channels in our data (RGB). You are not free to choose whatever you want here
- ``out_channels16`` You can put whatever you want here. Andrew refers to this number as the number of filter.
- ``kernel_size=3`` You can put whatever you want here also. It will not really affect the shape of the output
- ``stride=1`` (default value). THIS WILL AFFECT THE SHAPE OF YOUR OUTPUT, the greater the stride, the smaller your output image gets.
- ``padding=1`` THIS WILL AFFECT THE SHAPE OF YOUR OUTPUT. If you choose ``padding=0`` your image will get smaller (30x30 instead of 32x32 if ``stride=1``) In order to keep the same dimension you must choose ``padding=1`` if ``stride=1``

Therefore here the input shape is $(1, 3, 32, 32)$ and the output shape is $(1, 16, 32, 32)$


**[MaxPool](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#maxpool2d)**

- ``kernel_size=2`` You can put whatever you want here. It will not really affect the shape of the output
- ``stride=kernel_size`` (default value) THIS WILL AFFECT THE SHAPE OF YOUR OUTPUT. By default ``stride=kernel_size``, as a consequence, your image size is divide by ``kernel_size``. 

Therefore here the input shape is $(1, 16, 32, 32)$ and the output shape is $(1, 16, 16, 16)$



In [6]:
model = MyNet()

numel_list = [p.numel() for p in model.parameters()]
print("Total number of parameters: ", sum(numel_list))
print("Number of parameter per layer: ", numel_list)

img, _ = cifar2_train[0]
output_tensor = model(img.unsqueeze(0))
print("Output: \n", output_tensor)

Total number of parameters:  18090
Number of parameter per layer:  [432, 16, 1152, 8, 16384, 32, 64, 2]
Output: 
 tensor([[0.0908, 0.0938]], grad_fn=<AddmmBackward>)


### 2.3 Using the functional API
*(Inspired by 8.3.3 The functional API)*

We could write a more concise - but equivalent - definition of our custom network. Many things are automatically managed when using already defined [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) objects. For instance we don't need to specify the convolution operation nor what the parameters that need to be trained are nor how to train (update) them. Now some of the operations used above are let's say simpler than others. Indeed, nn.Linear and nn.Conv2d automatically instanciate trainable parameters (see [nn.parameter.Parameter](https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html?highlight=parameter#torch.nn.parameter.Parameter)), link them to the network, tell the network how to do the operations, how to derive them, etc. But the nn.MaxPool2d has no associated trainable parameters and the same holds for activation functions. Modules (eg layers or activation functions) that do not generate trainable parameters can be more concisely used in Pytorch using [nn.functional](https://pytorch.org/docs/stable/nn.functional.html#torch-nn-functional) (often imported as ``F``) 

For example, the functional counterpart of [nn.MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d) is [nn.functional.max_pool2d](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.max_pool2d) (often imported as ``F.max_pool2d``). And the functional counterpart of [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html?highlight=relu#torch.nn.ReLU) is [relu](https://pytorch.org/docs/stable/nn.functional.html?highlight=relu#torch.nn.functional.relu) (often imported as ``F.relu``). Since ``tanh`` is a generic math function and not only used as an activation function, the counterpart of [nn.Tanh](https://pytorch.org/docs/stable/generated/torch.nn.Tanh.html#torch.nn.Tanh) is directly implemented at [torch.tanh](https://pytorch.org/docs/stable/generated/torch.tanh.html?highlight=tanh#torch.tanh)

We need to keep using [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) for nn.Linear and nn.Conv2d so that our custom net will be able to manage their Parameters during training. However, we can safely switch to the functional counterparts of pooling and activation, since they have no trainable parameters. 

This is a lot more concise than and fully equivalent to our previous definition of CustomNet

Whether to use the [functional]((https://pytorch.org/docs/stable/nn.functional.html#torch-nn-functional)) or the [modular](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) API regarding operations without trainable parameters is a decision based on style and taste. When part of a network is so simple that we want to use nn.Sequential , we're in the modular realm. When we are writing our own forwards, it may be more natural to use the functional interface for things that do not need state in the form of parameters.



In [7]:
class MyNetBis(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 8 * 8, 32)
        self.fc2 = nn.Linear(32, 2)
        
    def forward(self, x):
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
        out = F.max_pool2d(torch.tanh(self.conv2(out)), 2)
        out = out.view(-1, 8 * 8 * 8)
        out = torch.tanh(self.fc1(out))
        out = self.fc2(out)
        return out

In [8]:
img, _ = cifar2_train[0]
model = MyNetBis()
output_tensor = model(img.unsqueeze(0))
print(output_tensor)

tensor([[ 0.0746, -0.0411]], grad_fn=<AddmmBackward>)


### 2.4 Train our custom network (as any other model)

On my computer training 10 epochs takes 7 seconds

In [9]:
device = (torch.device('cuda') if torch.cuda.is_available()
          else torch.device('cpu'))
print(f"Training on device {device}.")

train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=True)
model = MyNetBis().to(device=device) 
optimizer = optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


# WARNING THIS IS SUPPOSED TO MUCH MUCH FASTER THAN PREVIOUSLY BUT IT MIGHT STILL
# TAKE A WHILE IF:
#  - YOUR GPU IS NOT AVAILABLE
#  - YOUR GPU IS NOT THE BEST GPU EVER. (Trying not to hurt your GPU's feeling here :) )
# AGAIN STOP YOUR KERNEL IF IT'S TOO SLOW 
training_loop_on_gpu(
    n_epochs = 21,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

Training on device cuda.
2021-02-08 10:57:10.152071 Epoch 1, Training loss 0.5686810261504666
2021-02-08 10:57:15.922905 Epoch 10, Training loss 0.33619848273362324
2021-02-08 10:57:22.285335 Epoch 20, Training loss 0.298646777773359


### 2.5 Measuring accuracy (as any other model)

In [10]:
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=False)
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)
all_acc_dict = collections.OrderedDict()

all_acc_dict["baseline"] = validate_on_gpu(model, train_loader, val_loader)

Accuracy train: 0.87
Accuracy val: 0.87


## 3. Going deeper: defining blocks of layers 

### 3.1 Using nn.Sequential

In [11]:
# output shape of conv1 will be: (1, 8, 32, 32)
# We will use a Pooling with stride = 4 in the forward function 
# so n_in of fcblock must be: 8*(32/4)*(32/4)
# Also n_in and n_out in nn.Linear must be consistent with each other so (n_in = n_out)

class MyDeepNN(nn.Module):
    def __init__(self, n_blocks=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        n_in_out = 8*8*8
        self.fcblock = nn.Sequential(
            *[nn.ReLU( nn.Linear(n_in_out, n_in_out) ) for _ in range(n_blocks)]
        )
        self.fc1 = nn.Linear(n_in_out, 2)
        
    def forward(self, x):
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 4)
        out = out.view(-1, 8 * 8 * 8)
        out = self.fcblock(out)
        out = F.relu(self.fc1(out))
        return out



In [12]:
img, _ = cifar2_train[0]
model = MyDeepNN(n_blocks=10)
output_tensor = model(img.unsqueeze(0))
print(output_tensor)

numel_list = [p.numel() for p in model.parameters()]
print("\nTotal number of parameters: ", sum(numel_list))
print("Number of layers: ", len(numel_list))
print("Number of parameter per layer: ", numel_list)

print("\n", model)

tensor([[0.7003, 0.2633]], grad_fn=<ReluBackward0>)

Total number of parameters:  2627810
Number of layers:  24
Number of parameter per layer:  [216, 8, 262144, 512, 262144, 512, 262144, 512, 262144, 512, 262144, 512, 262144, 512, 262144, 512, 262144, 512, 262144, 512, 262144, 512, 1024, 2]

 MyDeepNN(
  (conv1): Conv2d(3, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fcblock): Sequential(
    (0): ReLU(
      inplace=True
      (inplace): Linear(in_features=512, out_features=512, bias=True)
    )
    (1): ReLU(
      inplace=True
      (inplace): Linear(in_features=512, out_features=512, bias=True)
    )
    (2): ReLU(
      inplace=True
      (inplace): Linear(in_features=512, out_features=512, bias=True)
    )
    (3): ReLU(
      inplace=True
      (inplace): Linear(in_features=512, out_features=512, bias=True)
    )
    (4): ReLU(
      inplace=True
      (inplace): Linear(in_features=512, out_features=512, bias=True)
    )
    (5): ReLU(
      inplace=True
      (inpla

### 3.2 Using a subclass of nn.Module

In [13]:

class MyBlock(nn.Module):
    def __init__(
        self, 
        n_in_out,
        conv_sizes = [16, 8],
        kernel_sizes = [3,3,3],
    ):
        super().__init__()
        # If you want to stack your blocks, the input size and the 
        # output size must be consistent, so here n_in_out
        self.conv1 = nn.Conv2d(n_in_out, conv_sizes[0], kernel_size=kernel_sizes[0], padding=1)
        self.conv2 = nn.Conv2d(conv_sizes[0], conv_sizes[1], kernel_size=kernel_sizes[1], padding=1)
        self.conv3 = nn.Conv2d(conv_sizes[1], n_in_out, kernel_size=kernel_sizes[2], padding=1)

    def forward(self, x):
        #shape = (1, n_in_out, 32, 32)
        out = torch.tanh(self.conv1(x))
        #shape = (1, conv_sizes[0], 32, 32)
        out = torch.tanh(self.conv2(out))
        #shape = (1, conv_sizes[1], 32, 32)
        out = torch.tanh(self.conv3(out))
        #shape = (1, n_in_out, 32, 32)
        return out

class MyDeepNN_WithMyBlock(nn.Module):
    def __init__(
        self, 
        n_blocks=10, 
        conv_sizes=[16, 8], 
        kernel_sizes=[3,3,3]
    ):
        super().__init__()
        # shape = (1, 3, 32, 32)
        self.myblocks = nn.Sequential(
            *[MyBlock(n_in_out=3, conv_sizes=conv_sizes, kernel_sizes=kernel_sizes) for _ in range(n_blocks)]
        )
        # shape = (1, n_in_out, 32, 32) but we will use maxpool with stride = 4 in the forward method so:
        # shape = (1, n_in_out, 32/4, 32/4)
        self.fc1 = nn.Linear(3*8*8, 2)

        
    def forward(self, x):
        out = self.myblocks(x)
        out = F.max_pool2d(out, 4)
        out = out.view(-1,3*8*8)
        out = F.relu(self.fc1(out))
        return out

In [14]:
img, _ = cifar2_train[0]
model = MyDeepNN_WithMyBlock(n_blocks=30)
output_tensor = model(img.unsqueeze(0))
print(output_tensor)


numel_list = [p.numel() for p in model.parameters()]
print("\nTotal number of parameters: ", sum(numel_list))
print("Number of layers: ", len(numel_list))
print("Number of parameter per layer: ", numel_list)

print("\n", model)

tensor([[0.0244, 0.0096]], grad_fn=<ReluBackward0>)

Total number of parameters:  55196
Number of layers:  182
Number of parameter per layer:  [432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 432, 16, 1152, 8, 216, 3, 384, 2]

 MyDeepNN_WithMyBlock(
  (myblocks): Sequential(
    (0): MyBlock(
