# Deep Learning course - LAB 7

## ConvNets 101

### Recap from previous Lab

* we introduced some basic image processing functionalities with OpenCV
* we saw how to import a custom dataset in PyTorch, how to apply data augmentation, and how to create a DataLoader out of it

### Agenda for today

* we will construct our first Convolutional Neural Network (CNN)
* we will show how to do transfer learning on CNNs
* we will show how to introduce Deconvolution/Inverse Convolution inside CNNs
* we will construct our own implementation of two famous CNN architectures: ResNet and U-Net

In [1]:
import torch
from torch import nn
from torchvision.datasets import MNIST
from scripts.mnistm import MNISTM

## Our first CNN

Building CNNs is actually not hard once you know all the pieces to construct a MultiLayer Perceptron.

We can distinguish between two macro-categories of CNNs, at least for the part concerning image classification. We might call them "historical" and "modern", although characteristics of both can sometimes get pretty much mixed-up.

* "Historical" CNNs are a stack composed of two parts:
  * a **convolutional** part, where we have a cascade of convolutional layers interleaved with pooling layers for dimensionality and complexity reduction
    * usually the number of filters per convolutional layer grows as the image size shrinks (i.e., as we get further from the input)
  * a **fully-connected** part, where we have a sequence of a few fully-connected layers, ending in the output layer, where, as usual, we have as many neurons as there are categories
  
 The epitome of the historical CNN (which is nonetheless still used in research today) is VGGNet (or simply VGG). Its *core* is a **convolutional block** composed of two or three convolutional layers, each with the same number of filters. This number is double that of the previous block, up to 512 filters per layer. At the end of each block, there is a Max Pooling layer which halves the spatial dimensions (height and width) of the image.

 In the picture below, you can see a modern implementation of VGG with only one fully-connected layer.
  ![](img/vgg.png)
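
 The doubling of filters and halving of spatial size described above can be traced with a few lines of plain Python. This is only a sketch: the 224x224 input, the 64 filters in the first block and the five blocks are assumptions borrowed from the usual VGG-16 configuration, not values read from the figure, and the convolutions are assumed to be padded so that only the pooling changes height and width.

```python
# Trace (channels, height, width) at the output of each VGG-style block.
# Assumed values (VGG-16-like, not taken from the figure): 224x224 input,
# 64 filters in the first block, filters capped at 512, five blocks.
def vgg_shape_trace(in_size=224, first_filters=64, max_filters=512, num_blocks=5):
    shapes = []
    size, filters = in_size, first_filters
    for _ in range(num_blocks):
        size = size // 2  # the 2x2 max pooling halves height and width
        shapes.append((filters, size, size))
        filters = min(filters * 2, max_filters)  # double the filters, up to the cap
    return shapes

trace = vgg_shape_trace()
for block, shape in enumerate(trace, start=1):
    print(f"block {block}: {shape}")
# the last block outputs (512, 7, 7)
```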
  


* "Modern" CNNs, instead, get rid of the fully-connected part, for three main reasons:
    1. the fully-connected part introduces a considerable number of parameters in the network (note that convolutional layers, due to their local connectivity and shared weights, have far fewer parameters than fully-connected layers)
    2. removing it keeps the 2D spatial structure of the image intact up to the last hidden layer, allowing for more interpretable parameters/neurons (*insert Olah citation here*)
    3. the fully-connected part is a "rigid" portion of the network, in the sense that it constrains the size of the input image to be fixed. We will see later how this can pose a problem.
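
 Points 1 and 3 above can be checked numerically. The sketch below compares the parameter count of one convolutional layer with that of one fully-connected layer, then shows that a head based on global average pooling accepts inputs of different sizes. All layer sizes (64 channels, a 7x7 feature map, 512 hidden units, 10 classes) are illustrative assumptions, not values prescribed by this lab.

```python
import torch
from torch import nn

def count_params(module):
    # total number of learnable scalars (weights and biases)
    return sum(p.numel() for p in module.parameters())

# Illustrative sizes (assumptions): 64 channels, 7x7 feature map, 512 hidden units
conv = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
fc = nn.Linear(64 * 7 * 7, 512)

conv_params = count_params(conv)  # 64*64*3*3 + 64 = 36928
fc_params = count_params(fc)      # 3136*512 + 512 = 1606144
print(conv_params, fc_params)

# A head without a "rigid" flattened input: global average pooling collapses
# any height/width to 1x1, so the Linear layer only depends on the channels.
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))
out_shapes = [tuple(head(torch.randn(2, 64, s, s)).shape) for s in (7, 28, 64)]
print(out_shapes)  # (2, 10) for every input size
```

 By contrast, `fc` above only works on feature maps of exactly 64x7x7 values once flattened, which is what ties a network with a fully-connected part to one input resolution.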

 Sometimes, even the pooling layers may get replaced by convolutional layers with a large kernel size or a stride greater than 1, as their effect is likewise to reduce the spatial dimensions (i.e., height and width) of the corresponding image.
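
 As a small illustration of a convolution taking over the downsampling role of pooling, the sketch below applies three layers to the same tensor and compares the output shapes. The tensor size, the channel count and the specific kernel/stride values are assumptions chosen so that all three layers halve a 32x32 input.

```python
import torch
from torch import nn

x = torch.randn(1, 16, 32, 32)  # (batch, channels, height, width), sizes are illustrative

pool = nn.MaxPool2d(kernel_size=2)
# stride 2 halves height and width; unlike pooling, the layer has learnable weights
strided_conv = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
# a large kernel without padding also shrinks the image: 32 - 17 + 1 = 16
large_kernel_conv = nn.Conv2d(16, 16, kernel_size=17)

pooled_shape = tuple(pool(x).shape)
strided_shape = tuple(strided_conv(x).shape)
large_kernel_shape = tuple(large_kernel_conv(x).shape)
print(pooled_shape, strided_shape, large_kernel_shape)  # all (1, 16, 16, 16)
```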
 
 Recently, the **residual block** has become one of the paramount structures in modern CNNs. It forces the network to learn image features by learning to *reconstruct its own input* (sometimes at a lower spatial resolution), rather than immediately learning features for the classification task. In the image below, you can see how a **residual network (resnet)** can be structured:

 ![](img/resnet.png)

 Each of the 2 residual blocks is composed of two convolutional layers: the input of the first layer bypasses both layers and is summed with the output of the second layer. This bypass is called a **skip connection**.
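
 A skip connection of this kind can be sketched in a few lines. This is a minimal illustration under stated assumptions, not necessarily the block drawn in the figure: the name `ResidualBlock`, the 3x3 kernels, the ReLU activations and the absence of batch normalization are all choices made here for brevity. The number of channels and the spatial size are kept unchanged so that the input and the output of the two layers can be summed directly.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, channels):
        super().__init__()
        # padding=1 with 3x3 kernels keeps height and width unchanged
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.activ = nn.ReLU()

    def forward(self, x):
        out = self.activ(self.conv1(x))
        out = self.conv2(out)
        # skip connection: the block input is summed with the second conv output
        return self.activ(out + x)

block = ResidualBlock(channels=16)
x = torch.randn(2, 16, 28, 28)
y = block(x)
print(y.shape)  # same shape as x
```

 One consequence of the sum is that if the two convolutions output zeros, the block simply passes its (rectified) input through, so the convolutions only need to learn a correction on top of the input.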

In [4]:
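class CNN(nn.Module):
    def _build_vgg_block(self, num_conv_layers, in_channels, out_channels, batchnorm=True, activ=nn.ReLU, maxpool=True):
        # VGG-style block: num_conv_layers x (conv -> [batchnorm] -> activation), then an optional 2x2 max pooling
        layers = []
        for i in range(num_conv_layers):
            # only the first convolution of the block receives the block's input channels
            num_channels_in = in_channels if i == 0 else out_channels

            # kernel_size is a required argument of nn.Conv2d; 3x3 with padding=1 is assumed here
            # (the usual VGG choice), which keeps height and width unchanged
            layers.append(nn.Conv2d(in_channels=num_channels_in, out_channels=out_channels, kernel_size=3, padding=1))
            if batchnorm:
                layers.append(nn.BatchNorm2d(out_channels))
            # activ is a class (e.g. nn.ReLU), so it must be instantiated
            layers.append(activ())

        if maxpool:
            # halves height and width
            layers.append(nn.MaxPool2d(kernel_size=2))
        return nn.Sequential(*layers)

    def __init__(self, num_classes=10):
        super().__init__()
        # each block takes as input the output channels of the previous one: 3 -> 16 -> 32 -> 64
        self.features = nn.Sequential(
            self._build_vgg_block(2, 3, 16, activ=nn.SiLU),
            self._build_vgg_block(2, 16, 32, activ=nn.SiLU),
            self._build_vgg_block(2, 32, 64, activ=nn.SiLU)
        )
        # global average pooling: collapses any height/width to 1x1
        self.avgpool = nn.AdaptiveAvgPool2d(output_size=1)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        return self.classifier(x)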

In [5]:
net = CNN()
net

On PyTorch versions older than 1.7, which do not provide `nn.SiLU`, this cell fails with:

AttributeError: module 'torch.nn' has no attribute 'SiLU'