# Advanced Convolutional Networks

In this "Advanced Convolutional Networks" practical, we implement three foundational architectures by hand. We start with LeNet, one of the earliest convolutional networks. We then build an Inception block, which runs several convolutional paths in parallel and uses 1x1 convolutions to reduce channel counts, so the network can grow wider and deeper at a moderate computational cost. Finally, we implement the residual block (ResBlock) at the core of ResNet architectures, whose skip connections make it possible to train substantially deeper networks. Building each model from scratch is a good way to understand the mechanics and innovations behind their success.

In [1]:
import time

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
from torchvision import transforms
import matplotlib.pyplot as plt
from PIL import Image

In [2]:
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
device

device(type='cuda')

For your practical, you will be adapting the structure of the famous LeNet architecture to create a modified version. Here’s what your model should include:

1. Start with a convolutional layer. This layer should have 32 output channels and use a 5x5 kernel. Follow this with a ReLU activation function.
2. Apply a max pooling layer with a 2x2 kernel to reduce the spatial dimensions.
3. Add another convolutional layer, this time with 64 output channels and a 5x5 kernel, again followed by a ReLU activation function.
4. Use a max pooling layer with a 2x2 kernel, similar to step 2.
5. Incorporate a third convolutional layer with 64 output channels and a 5x5 kernel, followed by a ReLU activation function.
6. Follow this with a max pooling layer using a 2x2 kernel, as in steps 2 and 4.
7. Flatten the output from the previous layers to prepare it for the fully connected layers.
8. Add a fully connected (dense) layer with 1000 output neurons, combined with a ReLU activation function.
9. Conclude with a fully connected layer that has 10 output neurons.

An important note for your assignment: Ensure the spatial dimensions are preserved after each convolutional layer. This will require careful selection of padding and stride parameters.

The input images for your model will have the shape (3, 32, 32), indicating 3 color channels with 32x32 pixel resolution.
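The padding requirement can be checked with the standard output-size formula for convolution and pooling layers, `out = floor((in + 2 * padding - kernel) / stride) + 1`. The sketch below traces a 32x32 input through the three conv / pool stages. The helper name `conv_out_size` is ours (it is not a PyTorch function), and stride 1 with padding 2 is one choice that keeps the size unchanged for a 5x5 kernel.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32
for stage in range(3):
    # 5x5 convolution with stride 1 and padding 2 keeps the size unchanged.
    size = conv_out_size(size, kernel_size=5, stride=1, padding=2)
    # 2x2 max pooling (nn.MaxPool2d uses stride = kernel size by default) halves it.
    size = conv_out_size(size, kernel_size=2, stride=2)
    print(f"after stage {stage + 1}: {size}x{size}")

# Number of features entering the first fully connected layer
# (64 channels come out of the last convolution).
flat_features = 64 * size * size
print(flat_features)
```

The last value is what the first fully connected layer needs as its input size.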

In [None]:
class CustomLeNet(nn.Module):
  def __init__(self):
    super().__init__()  # Important, otherwise will throw an error

    # FIXME

  def forward(self, x):
    # FIXME
    return x


net = CustomLeNet()
print(net)

Run a quick test with random data: pass a random batch through the model and check the shape of the output.
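As a point of comparison, here is a minimal sketch of one possible layout for the network described above, together with a random-data smoke test. The class name `LeNetSketch` and the grouping into `features` / `classifier` are our own choices, and padding 2 is assumed for the 5x5 convolutions. Try the exercise yourself before reading it.

```python
import torch
from torch import nn


class LeNetSketch(nn.Module):
    """One possible layout for the modified LeNet described above."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),   # 16x16 -> 8x8
            nn.Conv2d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),   # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 1000), nn.ReLU(),
            nn.Linear(1000, 10),   # raw logits, one per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))


sketch_net = LeNetSketch()
random_batch = torch.randn(8, 3, 32, 32)   # 8 random "images"
with torch.no_grad():
    logits = sketch_net(random_batch)
print(logits.shape)
```

The output should have one row per image and one column per class.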

Use the following code to download CIFAR10. Update the transforms so that the images are converted to tensors and normalized.

In [7]:
# Per-channel mean and std of the CIFAR10 training set
cifar_mean = torch.tensor([0.4914, 0.4822, 0.4465])
cifar_std = torch.tensor([0.2470, 0.2435, 0.2616])

train_transforms = transforms.Compose([
    # FIXME
])
test_transforms = transforms.Compose([
    # FIXME
])


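# Two views of the same training images: one with the train transforms, one
# with the test transforms. Setting `.transform` on a Subset returned by
# random_split has no effect, because the transform lives on the wrapped dataset.
train_full = CIFAR10("/tmp", download=True, train=True, transform=train_transforms)
val_full = CIFAR10("/tmp", download=False, train=True, transform=test_transforms)

# Shuffle the indices once so that the 80% / 20% split does not overlap.
n_train = int(0.8 * len(train_full))
indices = torch.randperm(len(train_full)).tolist()
train_dataset = torch.utils.data.Subset(train_full, indices[:n_train])
val_dataset = torch.utils.data.Subset(val_full, indices[n_train:])

test_dataset = CIFAR10("/tmp", download=False, train=False, transform=test_transforms)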

Files already downloaded and verified


Display the number of images in the train / val / test sets.

In [None]:
# FIXME

Build the dataloaders (train, val and test) and display the number of batches in each.

In [None]:
# FIXME
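The two steps above can be sketched on a stand-in dataset of random tensors, so that nothing has to be downloaded. The names `toy_dataset`, `toy_train_loader` and `toy_eval_loader`, and the batch size of 128, are assumptions made for this illustration.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 1000 random "images" with random labels in [0, 10).
toy_dataset = TensorDataset(
    torch.randn(1000, 3, 32, 32),
    torch.randint(0, 10, (1000,)),
)

batch_size = 128  # assumed value, tune it to your hardware
# Shuffle only the training data; keep the evaluation order fixed.
toy_train_loader = DataLoader(toy_dataset, batch_size=batch_size, shuffle=True)
toy_eval_loader = DataLoader(toy_dataset, batch_size=batch_size, shuffle=False)

print(len(toy_dataset))        # number of samples
print(len(toy_train_loader))   # number of batches = ceil(samples / batch_size)
```

With the real data, the same `len(...)` calls apply to `train_dataset`, `val_dataset`, `test_dataset` and their loaders.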

Now display some images with the associated classes from the dataset.

In [13]:
classes = [
    'plane',
    'car',
    'bird',
    'cat',
    'deer',
    'dog',
    'frog',
    'horse',
    'ship',
    'truck'
]

In [None]:
# FIXME
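Normalized tensors cannot be passed to `imshow` directly, so the normalization has to be undone first. The sketch below shows one way to do this; the helper names `unnormalize` and `show_samples` are ours, and the dataset is assumed to yield `(image_tensor, label)` pairs as CIFAR10 does once `ToTensor` is applied.

```python
import matplotlib.pyplot as plt
import torch


def unnormalize(image, mean, std):
    """Undo per-channel normalization and return an (H, W, C) tensor in [0, 1]."""
    mean = torch.as_tensor(mean).view(-1, 1, 1)
    std = torch.as_tensor(std).view(-1, 1, 1)
    return (image * std + mean).clamp(0, 1).permute(1, 2, 0)


def show_samples(dataset, class_names, mean, std, n=8):
    """Plot the first n (image, label) pairs of a dataset in one row."""
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2.5), squeeze=False)
    for idx, ax in enumerate(axes[0]):
        image, label = dataset[idx]
        ax.imshow(unnormalize(image, mean, std).numpy())
        ax.set_title(class_names[int(label)])
        ax.axis("off")
    return fig
```

In the notebook, this would be called as `show_samples(train_dataset, classes, cifar_mean, cifar_std)`.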

Now build a function that will be used to evaluate the model:

- Loop over the dataloader
- Run the model on each batch
- Compute the loss
- Take the argmax of the prediction logits
- Compute the accuracy

In [None]:
# FIXME
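The steps listed above can be sketched as follows. This is one possible implementation rather than a reference solution: the function name `evaluate` and its argument order are our choices, and the smoke test uses a tiny linear stand-in model on random data.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model, dataloader, criterion, device):
    """Return (mean loss, accuracy) of `model` over all samples in `dataloader`."""
    model.eval()  # disable dropout, use running statistics in batch norm
    total_loss, total_correct, total_samples = 0.0, 0, 0
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        # The criterion returns the batch mean, so weight it by the batch size.
        total_loss += criterion(logits, labels).item() * labels.size(0)
        total_correct += (logits.argmax(dim=1) == labels).sum().item()
        total_samples += labels.size(0)
    return total_loss / total_samples, total_correct / total_samples


# Smoke test on random data with a tiny stand-in model.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
toy_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))),
    batch_size=16,
)
loss, acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss(), torch.device("cpu"))
print(f"loss={loss:.3f} acc={acc:.3f}")
```

Weighting each batch loss by its size keeps the average correct when the last batch is smaller than the others.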

Now:

- Send your model to the device.

- Build the SGD optimizer.

- Use the cross-entropy loss.

- Build the training loop.

Do not forget to keep the train / val accuracy and loss for each epoch.


In [None]:
# FIXME
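A minimal sketch of such a training loop is shown below, again on random data with a tiny stand-in model. The names `train_one_epoch` and `history`, the learning rate and the momentum value are illustrative assumptions, not tuned settings.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, dataloader, criterion, optimizer, device):
    """Run one pass over `dataloader` and return (mean loss, accuracy)."""
    model.train()
    total_loss, total_correct, total_samples = 0.0, 0, 0
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()          # clear gradients from the previous step
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()                # backpropagate
        optimizer.step()               # update the weights
        total_loss += loss.item() * labels.size(0)
        total_correct += (logits.argmax(dim=1) == labels).sum().item()
        total_samples += labels.size(0)
    return total_loss / total_samples, total_correct / total_samples


demo_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(demo_device)
demo_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))),
    batch_size=16,
    shuffle=True,
)
demo_criterion = nn.CrossEntropyLoss()
demo_optimizer = torch.optim.SGD(demo_model.parameters(), lr=0.01, momentum=0.9)

history = {"train_loss": [], "train_acc": []}
for epoch in range(3):
    train_loss, train_acc = train_one_epoch(
        demo_model, demo_loader, demo_criterion, demo_optimizer, demo_device
    )
    history["train_loss"].append(train_loss)
    history["train_acc"].append(train_acc)
    # In the practical, also run your evaluation function on the validation
    # loader here and store val_loss / val_acc in the same dictionary.
    print(f"epoch {epoch + 1}: loss={train_loss:.3f} acc={train_acc:.3f}")
```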

Plot the train and val accuracy and loss

In [None]:
# FIXME
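One way to plot the curves is to keep the per-epoch values in a dictionary of lists and draw loss and accuracy side by side. The function name `plot_history` and the dictionary keys are assumptions, and the numbers in `example_history` are dummy values that only show the expected layout; they are not results.

```python
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot loss and accuracy curves from a dict of per-epoch lists."""
    epochs = range(1, len(history["train_loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, history["train_loss"], label="train")
    ax_loss.plot(epochs, history["val_loss"], label="val")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    ax_acc.plot(epochs, history["train_acc"], label="train")
    ax_acc.plot(epochs, history["val_acc"], label="val")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()
    return fig


# Dummy values, only to illustrate the dictionary layout.
example_history = {
    "train_loss": [2.0, 1.5, 1.2],
    "val_loss": [2.1, 1.7, 1.5],
    "train_acc": [0.25, 0.45, 0.55],
    "val_acc": [0.24, 0.40, 0.48],
}
example_fig = plot_history(example_history)
```

A growing gap between the train and val curves is the usual sign of overfitting.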

Now evaluate the model on the test set.

In [None]:
# FIXME

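Your goal is to determine how much each of the following enhancements contributes to the model's performance. Add them one at a time and retrain, so that each result can be compared against the baseline. Here's how you should proceed: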

- Data Augmentation: Start by enriching the training data with more varied examples. This can include applying mirror flips, random crops, and color jittering to the input images. The aim is to make your model more robust to variations in the input data.

- Regularization: Incorporate techniques like dropout or weight decay to prevent the model from overfitting. Overfitting happens when a model learns the training data too well, including its noise, which hurts its performance on unseen data.

- Improved Architecture: Integrate Batch Normalization into your model. Batch Normalization normalizes the activations of a layer over each mini-batch to roughly zero mean and unit standard deviation, then applies a learned scale and shift. It often accelerates training; the original paper attributes this in part to reduced internal covariate shift.

- Optimization Techniques: Experiment with advanced optimization strategies. Consider using different learning rates, momentum (including the possibility of Nesterov momentum), and adaptive learning rate methods like RMSProp or Adam. These techniques can help find good minima faster and more reliably than plain stochastic gradient descent.
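The regularization and optimization items above map onto a few PyTorch building blocks. The sketch below lists some configurations worth comparing; every hyperparameter value is an illustrative assumption, and the names `reg_model` and `optimizers` are ours. In an actual experiment you would create one optimizer per training run, each time with a freshly initialized model.

```python
import torch
from torch import nn

# Tiny stand-in model; dropout is one of the regularizers mentioned above.
reg_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 100), nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations, only in train mode
    nn.Linear(100, 10),
)

# Candidate optimizers (illustrative values, not tuned settings).
optimizers = {
    # Plain SGD baseline.
    "sgd": torch.optim.SGD(reg_model.parameters(), lr=0.01),
    # SGD with Nesterov momentum and weight decay (L2 regularization).
    "sgd_nesterov_wd": torch.optim.SGD(
        reg_model.parameters(), lr=0.01, momentum=0.9, nesterov=True, weight_decay=5e-4
    ),
    # Adaptive learning rate methods.
    "rmsprop": torch.optim.RMSprop(reg_model.parameters(), lr=1e-3),
    "adam": torch.optim.Adam(reg_model.parameters(), lr=1e-3),
}

for name, optimizer in optimizers.items():
    print(name, type(optimizer).__name__)
```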

In [None]:
# FIXME

You've successfully implemented a modified version of LeNet, a pioneering convolutional neural network by Yann LeCun. While impressive for its time, today's demands call for more sophisticated models.

Moving forward, we'll explore advanced techniques that have significantly boosted neural network performance since the era of AlexNet. These improvements encompass data augmentation, regularization, architectural enhancements, and optimization strategies.

Our next step is to create a reusable convolution block: a convolutional layer, followed by batch normalization, followed by a ReLU activation function. It will serve as a building block for more complex and efficient networks. As practice, try implementing and training these enhanced models on datasets like CIFAR10 to see their full potential.

In [None]:
class ConvBlock(nn.Module):
  def __init__(self, in_channels, out_channels, kernel_size, padding=0):
    super().__init__()
    # TODO

  def forward(self, x):
    # TODO
    return x
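A minimal sketch of such a block, using the same constructor signature as the skeleton above, could look like this. The class name `ConvBlockSketch` is ours, and dropping the convolution bias is a common convention rather than a requirement.

```python
import torch
from torch import nn


class ConvBlockSketch(nn.Module):
    """Conv2d -> BatchNorm2d -> ReLU."""

    def __init__(self, in_channels, out_channels, kernel_size, padding=0):
        super().__init__()
        # bias=False: the shift learned by batch norm makes a conv bias redundant.
        self.conv = nn.Conv2d(
            in_channels, out_channels, kernel_size, padding=padding, bias=False
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


conv_block = ConvBlockSketch(3, 32, kernel_size=5, padding=2)
block_out = conv_block(torch.randn(4, 3, 32, 32))
print(block_out.shape)
```

With `kernel_size=5` and `padding=2`, the spatial size is preserved and only the channel count changes.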

In [None]:
# FIXME

Now we are going to implement an Inception block.

Paper: https://arxiv.org/pdf/1409.4842.pdf

In [19]:
class InceptionBlock(nn.Module):
  def __init__(self, in_channels, reduced_channels, out_channels):
    super().__init__()

    # TODO

  def forward(self, x):
    # TODO
    return x
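The skeleton above leaves the meaning of `reduced_channels` and `out_channels` open. The sketch below makes one assumption: each of the four parallel branches produces `out_channels` feature maps, and the 3x3 and 5x5 branches first shrink the input to `reduced_channels` with a 1x1 convolution. The concatenated output therefore has `4 * out_channels` channels. The original paper predates batch normalization; here we reuse the conv + batch norm + ReLU pattern of this practical through a hypothetical helper, `conv_bn_relu`.

```python
import torch
from torch import nn


def conv_bn_relu(in_channels, out_channels, kernel_size, padding=0):
    """Conv2d -> BatchNorm2d -> ReLU, as in the convolution block above."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )


class InceptionBlockSketch(nn.Module):
    """Four parallel branches whose outputs are concatenated along channels."""

    def __init__(self, in_channels, reduced_channels, out_channels):
        super().__init__()
        # Branch 1: 1x1 convolution.
        self.branch1 = conv_bn_relu(in_channels, out_channels, 1)
        # Branch 2: 1x1 reduction, then 3x3 convolution.
        self.branch3 = nn.Sequential(
            conv_bn_relu(in_channels, reduced_channels, 1),
            conv_bn_relu(reduced_channels, out_channels, 3, padding=1),
        )
        # Branch 3: 1x1 reduction, then 5x5 convolution.
        self.branch5 = nn.Sequential(
            conv_bn_relu(in_channels, reduced_channels, 1),
            conv_bn_relu(reduced_channels, out_channels, 5, padding=2),
        )
        # Branch 4: 3x3 max pooling (stride 1), then 1x1 projection.
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            conv_bn_relu(in_channels, out_channels, 1),
        )

    def forward(self, x):
        branches = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)  # concatenate along the channel axis


inception = InceptionBlockSketch(in_channels=32, reduced_channels=8, out_channels=16)
inception_out = inception(torch.randn(2, 32, 8, 8))
print(inception_out.shape)
```

Every branch keeps the spatial size unchanged, which is what makes the channel-wise concatenation possible.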

Inception full implementation:

https://github.com/pytorch/vision/blob/main/torchvision/models/inception.py

Now we are going to implement a ResNet block

Paper: https://arxiv.org/pdf/1512.03385.pdf

In [21]:
class ResBlock(nn.Module):
  def __init__(self, input_channels, hidden_channels):
    super().__init__()
    # Use a kernel size of 3

    # FIXME

  def forward(self, x):
    # FIXME

    return x
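One way to fill in the skeleton above is a basic residual block with two 3x3 convolutions and an identity skip connection. The sketch assumes that the second convolution maps back from `hidden_channels` to `input_channels`, so that the input can be added to the output without a projection. The class name `ResBlockSketch` is ours.

```python
import torch
from torch import nn
from torch.nn import functional as F


class ResBlockSketch(nn.Module):
    """conv3x3 -> BN -> ReLU -> conv3x3 -> BN, plus identity skip, then ReLU."""

    def __init__(self, input_channels, hidden_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, hidden_channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(hidden_channels)
        self.conv2 = nn.Conv2d(hidden_channels, input_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(input_channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the unchanged input before the final activation.
        return F.relu(out + x)


res_block = ResBlockSketch(input_channels=16, hidden_channels=32)
res_in = torch.randn(2, 16, 8, 8)
res_out = res_block(res_in)
print(res_out.shape)
```

Because the block learns only a correction added to its input, it can fall back to the identity mapping, which is what eases the training of very deep stacks.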

ResNet full implementation: https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py

To go further, read the following papers, implement them, and try to run them on a small dataset like CIFAR.

Here is a non-exhaustive list:

- MobileNet: https://arxiv.org/abs/1704.04861
- ShuffleNet: https://arxiv.org/abs/1707.01083
- SENet: https://arxiv.org/abs/1709.01507
- DenseNet: https://arxiv.org/abs/1608.06993
- ResNeXt: https://arxiv.org/abs/1611.05431
- EfficientNet: https://arxiv.org/abs/1905.11946