# Regularizations with CNNs

## Lab 2 Regularization by Model Size

Author: M. Rußwurm, 2024, based on notebooks from D. Tuia (2020)

In this lab, we start with a comparatively complex CNN model that fails to train with the given settings. We then simplify it by removing layers.

Main takeaway from this lab:
Don't overkill: use simple models for simple problems (like FashionMNIST) and complex models for complex problems. $\leftarrow$ this is prior knowledge (about the problem) and inductive bias (through changing the model architecture) in practice.

### Setup

Let's get the required Python packages.

**d2l** Package:
The "d2l" (short for "dive into deep learning") package is a Python library designed to accompany the book "Dive into Deep Learning".

**PyTorch**:
PyTorch is an open-source machine learning library and scientific computing framework, primarily used for deep learning applications.

**sklearn.metrics**:
The "sklearn.metrics" module is part of the scikit-learn library, a popular machine learning library in Python. The metrics module specifically focuses on providing tools for evaluating the performance of machine learning models.

In [13]:
!pip install -q d2l

from d2l import torch as d2l
import torch
from torch import nn
from sklearn.metrics import classification_report
    

## Data - FashionMNIST

Let's start by loading FashionMNIST data

Fashion MNIST is a dataset used in machine learning and computer vision, serving as a benchmark for image classification tasks. It consists of 70,000 grayscale images of clothing items, categorized into 10 classes such as t-shirts, dresses, and sneakers. Fashion MNIST is a popular alternative to the traditional handwritten digit MNIST dataset, providing a more complex challenge for developing and testing image recognition algorithms.

In [14]:

fashionMNIST = d2l.FashionMNIST(batch_size=512)

train_dataloader = fashionMNIST.get_dataloader(train=True)
val_dataloader = fashionMNIST.get_dataloader(train=False)

text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

for batch in train_dataloader:
    X,y = batch
    fashionMNIST.visualize(batch)
    break
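
As a rough sanity check on the loader settings above, the number of mini-batches per epoch follows from the dataset size and the batch size. The sketch below assumes the standard FashionMNIST split of 60,000 training and 10,000 test images (together the 70,000 images mentioned above) and that the final partial batch is kept, which is the default of PyTorch's `DataLoader` (`drop_last=False`). The helper name `batches_per_epoch` is made up for this illustration.

```python
import math

def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of mini-batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)

BATCH_SIZE = 512      # same value as passed to d2l.FashionMNIST above
NUM_TRAIN = 60_000    # assumed standard training split
NUM_VAL = 10_000      # assumed standard test split (used for validation here)

train_batches = batches_per_epoch(NUM_TRAIN, BATCH_SIZE)
val_batches = batches_per_epoch(NUM_VAL, BATCH_SIZE)
print(f"training batches per epoch:   {train_batches}")
print(f"validation batches per epoch: {val_batches}")
```

Each training batch corresponds to one gradient step, so this also gives the number of parameter updates per epoch under these assumptions.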

Later, we would like to validate the model.
The following helper function
1. iterates through all data in a (validation) dataloader,
2. stores the ground truth (y_true) and predictions (y_pred), and
3. prints a classification report.

In [15]:
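@torch.no_grad()
def validate(model, dataloader):
    # run on whatever device the model currently lives on (CPU or GPU)
    device = next(model.parameters()).device
    y_pred = []
    y_true = []
    for X, y in dataloader:
        y_true.append(y)
        # predicted class = index of the largest logit; move back to CPU for sklearn
        y_pred.append(model(X.to(device)).argmax(1).cpu())

    y_true = torch.hstack(y_true)
    y_pred = torch.hstack(y_pred)

    print(classification_report(y_true=y_true.numpy(), y_pred=y_pred.numpy(),
                                labels=torch.arange(10).numpy(),
                                target_names=text_labels))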
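
To make the printed report easier to read, here is a small plain-Python sketch of the per-class precision, recall, and F1 score that `classification_report` tabulates. The toy labels and the helper `per_class_scores` are invented for this illustration; the real report additionally lists the support per class and averaged scores.

```python
def per_class_scores(y_true, y_pred, num_classes):
    """Return {class: (precision, recall, f1)} computed from two label lists."""
    scores = {}
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)   # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)   # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)   # true c that was missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores

# Toy example with three classes (hypothetical labels, not FashionMNIST output)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
toy_scores = per_class_scores(y_true, y_pred, num_classes=3)
for c, (p, r, f1) in toy_scores.items():
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Precision answers "of everything predicted as this class, how much was right?", while recall answers "of everything that truly is this class, how much was found?".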
    

## Run 1: Model - Convolutional Neural Network (LeNet)

Let's create an instance of the LeNet model

In [26]:
class LeNetModel(d2l.Classifier):
    def __init__(self, num_classes=10, lr=1):
        super().__init__()
        self.save_hyperparameters()
        self.net = nn.Sequential(
            nn.LazyConv2d(6, kernel_size=5, padding=2), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.LazyConv2d(16, kernel_size=5), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Flatten(),
            nn.LazyLinear(120), nn.Sigmoid(),
            nn.LazyLinear(84), nn.Sigmoid(),
            nn.LazyLinear(num_classes))

    def training_step(self, batch):
        Y_hat = self(*batch[:-1])
        loss = self.loss(Y_hat, batch[-1])
        self.plot('loss', loss, train=True)
        self.plot('acc', self.accuracy(Y_hat, batch[-1]), train=True)
        return loss  # the d2l Trainer takes care of the backward pass and optimizer step

    def validation_step(self, batch):
        Y_hat = self(*batch[:-1])
        self.plot('loss', self.loss(Y_hat, batch[-1]), train=False)
        self.plot('acc', self.accuracy(Y_hat, batch[-1]), train=False)

In [27]:
model = LeNetModel()
model.layer_summary(X_shape=X.shape)
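
The shapes printed by `layer_summary` can be checked by hand with the usual output-size formula for convolution and pooling layers, $\lfloor (n + 2p - k)/s \rfloor + 1$. A minimal sketch in plain Python, assuming the 28×28 single-channel FashionMNIST input; `out_size` is a helper made up for this illustration.

```python
def out_size(n: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28                                     # FashionMNIST images are 28x28
size = out_size(size, kernel=5, padding=2)    # LazyConv2d(6, 5, padding=2) keeps 28
size = out_size(size, kernel=2, stride=2)     # AvgPool2d(2, 2) halves to 14
size = out_size(size, kernel=5)               # LazyConv2d(16, 5) shrinks to 10
size = out_size(size, kernel=2, stride=2)     # AvgPool2d(2, 2) halves to 5

flat_features = 16 * size * size              # channels * height * width
print(f"final feature map: 16 x {size} x {size}, flattened: {flat_features}")
```

The flattened size is what the first `nn.LazyLinear(120)` infers as its input dimension on the first forward pass.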

**Task**
* Initialize the LeNetModel with a learning rate of 1 and train it on the fashionMNIST data for 5 epochs.

Hint: don't get stuck when the model does not train well. This is by design. Sometimes larger models are hard to train and don't find a good solution in the first place. Go ahead and simplify the model in the next step.

In [28]:
# TODO train the model 
# model = ...
# trainer = ...
# trainer.fit(...)

#SOLUTIONSTART
model = LeNetModel()

trainer = d2l.Trainer(max_epochs=5, num_gpus=1)  # set num_gpus=0 to train on the CPU
trainer.fit(model, fashionMNIST)
#SOLUTIONEND

validate(model, val_dataloader)

## Run 2: Model - MLP (LeNet without the convolutions)

Removing complexity is a first step for regularization by simplifying the model architecture. It makes models run faster and can yield surprisingly good results.
Don't overkill: use simple models for simple problems (like FashionMNIST) and complex models for complex problems. $\leftarrow$ this is prior knowledge and inductive bias in practice.

**Task**
* Simplify the model by removing (e.g., commenting out) the convolution and pooling layers.

In [30]:
class MLPModel(d2l.Classifier):
    def __init__(self, num_classes=10, lr=1):
        super().__init__()
        self.save_hyperparameters()
        #TODO: copy the self.net from above but remove the convolution and pooling layers
        #self.net = nn.Sequential(...)
        #SOLUTIONSTART
        self.net = nn.Sequential(
            #nn.LazyConv2d(6, kernel_size=5),
            #nn.Sigmoid(), nn.AvgPool2d(kernel_size=2, stride=2),
            #nn.LazyConv2d(16, kernel_size=5),
            #nn.Sigmoid(), nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Flatten(), nn.LazyLinear(120),
            nn.Sigmoid(), nn.LazyLinear(84),
            nn.Sigmoid(), nn.LazyLinear(num_classes))
        #SOLUTIONEND

    def training_step(self, batch):
        Y_hat = self(*batch[:-1])
        loss = self.loss(Y_hat, batch[-1])
        self.plot('loss', loss, train=True)
        self.plot('acc', self.accuracy(Y_hat, batch[-1]), train=True)
        return loss  # the d2l Trainer takes care of the backward pass and optimizer step

    def validation_step(self, batch):
        Y_hat = self(*batch[:-1])
        self.plot('loss', self.loss(Y_hat, batch[-1]), train=False)
        self.plot('acc', self.accuracy(Y_hat, batch[-1]), train=False)

model = MLPModel()
model.layer_summary(X_shape=X.shape)
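
It can help to check what "simpler" means here in numbers. The sketch below counts weights and biases by hand for both architectures, assuming 28×28 single-channel inputs (so the LeNet flatten layer yields 16·5·5 = 400 features). The helper names are invented for this illustration. Under these assumptions the MLP has fewer layers but not fewer parameters, because its first linear layer sees all 784 pixels directly.

```python
def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights plus biases of a 2D convolution with a square kernel."""
    return c_out * c_in * kernel * kernel + c_out

def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Layers shared by both models: 120 -> 84 -> 10
head = linear_params(120, 84) + linear_params(84, 10)

lenet_total = (conv_params(1, 6, 5) + conv_params(6, 16, 5)
               + linear_params(16 * 5 * 5, 120) + head)
mlp_total = linear_params(28 * 28, 120) + head

print(f"LeNet parameters: {lenet_total}")
print(f"MLP parameters:   {mlp_total}")
```

So the simplification in this run is mainly about depth and the number of nonlinearities between input and output, rather than about the raw parameter count.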

trainer = d2l.Trainer(max_epochs=10)
trainer.fit(model, fashionMNIST)
validate(model, val_dataloader)

## Run 3: Deeper MLP Model

For reference, let's complicate the model again by adding additional linear layers to make it deeper.

**Task**
* Add two additional nn.LazyLinear(120) layers to the MLP model from above.

The model summary should look like this:

```
Flatten output shape:	 torch.Size([512, 784])
Linear output shape:	 torch.Size([512, 120])
Sigmoid output shape:	 torch.Size([512, 120])
Linear output shape:	 torch.Size([512, 120])
Sigmoid output shape:	 torch.Size([512, 120])
Linear output shape:	 torch.Size([512, 120])
Sigmoid output shape:	 torch.Size([512, 120])
Linear output shape:	 torch.Size([512, 84])
Sigmoid output shape:	 torch.Size([512, 84])
Linear output shape:	 torch.Size([512, 10])
```

In [36]:
class DeepMLPModel(d2l.Classifier):
    def __init__(self, lr=1, num_classes=10):
        super().__init__()
        self.save_hyperparameters()

        #TODO: add two additional nn.LazyLinear(120) layers to the MLP model from above.
        #self.net = nn.Sequential(...)
        #SOLUTIONSTART
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(120), nn.Sigmoid(),
            nn.LazyLinear(120), nn.Sigmoid(),
            nn.LazyLinear(120), nn.Sigmoid(),
            nn.LazyLinear(84), nn.Sigmoid(),
            nn.LazyLinear(num_classes))
        #SOLUTIONEND

    def training_step(self, batch):
        Y_hat = self(*batch[:-1])
        loss = self.loss(Y_hat, batch[-1])
        self.plot('loss', loss, train=True)
        self.plot('acc', self.accuracy(Y_hat, batch[-1]), train=True)
        return loss  # the d2l Trainer takes care of the backward pass and optimizer step

    def validation_step(self, batch):
        Y_hat = self(*batch[:-1])
        self.plot('loss', self.loss(Y_hat, batch[-1]), train=False)
        self.plot('acc', self.accuracy(Y_hat, batch[-1]), train=False)

model = DeepMLPModel()
model.layer_summary(X_shape=X.shape)

In [35]:
model = DeepMLPModel()

trainer = d2l.Trainer(max_epochs=20)
trainer.fit(model, fashionMNIST)
validate(model, val_dataloader)

## Questions

1. Why did the CNN (Run 1) not train? Please explain with a loss surface in mind.

#SOLUTIONSTART
The model was too complex. Without any regularization, the loss surface was full of local minima that gradient descent could not escape, so a better minimum was not found.
#SOLUTIONEND

2. In Run 2, you simplified the model by removing the convolutions. Why did this model train now?

#SOLUTIONSTART
Removing layers and simplifying the model removes complexity. This effectively smooths the loss surface, which makes it much more likely that the model finds a good optimum regardless of the initialization.
#SOLUTIONEND

3. In Run 3, the training curve does not look promising until it suddenly decreases. What is going on in terms of gradient descent? How do the deeper layers affect the loss surface?

#SOLUTIONSTART
Here, the model was stuck near a saddle point for a while before moving into an optimum. As in Q2, additional layers (depth) increase complexity and make it more difficult for the model to find an optimum.
#SOLUTIONEND
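
The plateau-then-drop behaviour described for Run 3 can be reproduced on a toy loss surface. The sketch below is only an illustration, not the lab's model: it runs plain gradient descent on the hand-picked function $f(x, y) = x^2 + y^4/4 - y^2/2$, which has a saddle point at the origin and minima at $y = \pm 1$. The starting point, step size, and number of steps are arbitrary choices.

```python
def loss(x, y):
    """Toy surface with a saddle at (0, 0) and minima at (0, +1) and (0, -1)."""
    return x**2 + y**4 / 4 - y**2 / 2

def grad(x, y):
    """Analytic gradient of the toy loss."""
    return 2 * x, y**3 - y

x, y = 1.0, 1e-6    # start almost exactly on the line y = 0 that leads into the saddle
lr = 0.1            # arbitrary step size for this toy problem
losses = [loss(x, y)]
for _ in range(300):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y - lr * gy
    losses.append(loss(x, y))

for step in (0, 10, 50, 100, 150, 200, 300):
    print(f"step {step:3d}: loss = {losses[step]:+.4f}")
```

The loss first falls quickly along $x$, then stays almost flat near the saddle because the gradient along $y$ is tiny there, and only later drops towards the minimum once $y$ has grown enough.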