# 1. Adding dropout and normalization layers
Study the PyTorch documentation for:
- Dropout: https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html
- Normalization layers: https://pytorch.org/docs/stable/nn.html#normalization-layers

Experiment with adding dropout and normalization layers to your model. Some rough guidelines on where to add them relative to Linear or Conv2d layers:
- Dropout: after Linear or Conv2d layers. It is often added after the last Linear layer *before* the output layer, but it can be used in more places.
- Normalization layers: right after (blocks of) Linear or Conv2d layers, but before activation functions.
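
As a minimal sketch of these guidelines, the block below places `BatchNorm2d` between a `Conv2d` layer and its activation, and `Dropout` after the last hidden `Linear` layer (here, after its activation) and before the output layer. All layer sizes, the dropout probability and the fake input are assumptions chosen for illustration; they are not taken from the exercise.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
conv_block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),  # normalization right after the convolution
    nn.ReLU(),           # activation comes after the normalization
    nn.MaxPool2d(kernel_size=2),
)

dense_block = nn.Sequential(
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 64),
    nn.BatchNorm1d(64),  # normalization between Linear and activation
    nn.ReLU(),
    nn.Dropout(p=0.3),   # dropout after the last hidden layer
    nn.Linear(64, 5),    # output layer: nothing is added after it
)

model = nn.Sequential(conv_block, dense_block)
x = torch.randn(8, 3, 32, 32)  # fake batch: 8 RGB images of 32x32 pixels

model.train()  # dropout is active, batch norm uses batch statistics
train_out = model(x)

model.eval()   # dropout is disabled, batch norm uses running statistics
with torch.no_grad():
    eval_out_a = model(x)
    eval_out_b = model(x)

print(train_out.shape)
print(torch.equal(eval_out_a, eval_out_b))
```

Both layer types behave differently during training and evaluation, so switching between `model.train()` and `model.eval()` matters once they are part of the model.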

In [1]:
from pathlib import Path
import torch
import torch.nn as nn
from loguru import logger
import warnings
warnings.simplefilter("ignore", UserWarning)

In [2]:
from mads_datasets import DatasetFactoryProvider, DatasetType
from mltrainer.preprocessors import BasePreprocessor

for dataset in DatasetType:
    print(dataset)

DatasetType.FLOWERS
DatasetType.IMDB
DatasetType.GESTURES
DatasetType.FASHION
DatasetType.SUNSPOTS
DatasetType.IRIS
DatasetType.PENGUINS
DatasetType.FAVORITA
DatasetType.SECURE


In [4]:
from mads_datasets import DatasetFactoryProvider, DatasetType
from mltrainer.preprocessors import BasePreprocessor
preprocessor = BasePreprocessor()

#fashionfactory = DatasetFactoryProvider.create_factory(DatasetType.FASHION)
#streamers = fashionfactory.create_datastreamer(batchsize=64, preprocessor=preprocessor)
flowersfactory = DatasetFactoryProvider.create_factory(DatasetType.FLOWERS)
streamers = flowersfactory.create_datastreamer(batchsize=32, preprocessor=preprocessor)
train = streamers["train"]
valid = streamers["valid"]

2024-11-26 20:02:12.030 | INFO     | mads_datasets.base:download_data:94 - Start download...
2024-11-26 20:02:12.239 | INFO     | mads_datasets.datatools:get_file:105 - Downloading C:\Users\Francesca\.cache\mads_datasets\flowers\flowers.tgz
100%|██████████| 229M/229M [00:08<00:00, 26.4MiB/s]
2024-11-26 20:02:20.922 | INFO     | mads_datasets.datatools:extract:128 - Unzipping C:\Users\Francesca\.cache\mads_datasets\flowers\flowers.tgz
2024-11-26 20:02:28.494 | INFO     | mads_datasets.base:download_data:112 - Digest of C:\Users\Francesca\.cache\mads_datasets

In [5]:
len(train), len(valid)

(91, 22)

In [6]:
trainstreamer = train.stream()
validstreamer = valid.stream()
x, y = next(iter(trainstreamer))
x.shape, y.shape

(torch.Size([32, 3, 224, 224]), torch.Size([32]))

In [8]:
# in_channels: the number of color channels (3 for RGB)
in_channels = x.shape[1]
in_channels

3

In [18]:
import torch
from torch import nn
from loguru import logger
from torchsummary import summary
import copy


# Define model
class CNN(nn.Module):
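    """
    Small CNN with three convolutional blocks and a dense classifier.

    filters: number of output channels (kernels) of each Conv2d layer
    units1: number of output units of the first fully connected layer
    units2: number of output units of the second fully connected layer
    input_size: shape of an input batch as (batch, channels, height, width)
    """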
    def __init__(self, filters: int, units1: int, units2: int, input_size: tuple):
        super().__init__()
        self.in_channels = input_size[1]
        self.input_size = input_size

        self.convolutions = nn.Sequential(
            nn.Conv2d(self.in_channels, filters, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(p=0.5),
            nn.Conv2d(filters, filters, kernel_size=3, stride=1, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(filters, filters, kernel_size=3, stride=1, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )

        activation_map_size = self._conv_test(self.input_size)
        print(activation_map_size)
        logger.info(f"Aggregating activationmap with size {activation_map_size}")
        self.agg = nn.AvgPool2d(activation_map_size)

        self.dense = nn.Sequential(
            nn.Flatten(),
            nn.Linear(filters, units1),
            nn.ReLU(),
            nn.Linear(units1, units2),
            nn.ReLU(),
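            # Output layer: the number of outputs should match the number of classes.
            # NOTE: 32 equals the batch size used above; check it against the class count of the dataset.
            nn.Linear(units2, 32)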
        )

    def _conv_test(self, input_size):
        x = torch.ones(input_size, dtype=torch.float32)
        x = self.convolutions(x)
        return x.shape[-2:]

    def forward(self, x):
        x = self.convolutions(x)
        x = self.agg(x)
        logits = self.dense(x)
        return logits

In [19]:
model = CNN(filters=128, units1=128, units2=224, input_size=(32, 3, 224, 224))
summary(model, input_size=(3, 224, 224), device="cpu")

2024-11-26 21:12:37.291 | INFO     | __main__:__init__:37 - Aggregating activationmap with size torch.Size([26, 26])


torch.Size([26, 26])
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1        [-1, 128, 224, 224]           3,584
              ReLU-2        [-1, 128, 224, 224]               0
         MaxPool2d-3        [-1, 128, 112, 112]               0
           Dropout-4        [-1, 128, 112, 112]               0
            Conv2d-5        [-1, 128, 110, 110]         147,584
              ReLU-6        [-1, 128, 110, 110]               0
         MaxPool2d-7          [-1, 128, 55, 55]               0
            Conv2d-8          [-1, 128, 53, 53]         147,584
              ReLU-9          [-1, 128, 53, 53]               0
        MaxPool2d-10          [-1, 128, 26, 26]               0
        AvgPool2d-11            [-1, 128, 1, 1]               0
          Flatten-12                  [-1, 128]               0
           Linear-13                  [-1, 128]          16,512
             ReLU-
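
The spatial sizes in the summary above can be checked by hand. The sketch below uses the standard output-size formula for convolution and pooling layers, `floor((size + 2 * padding - kernel) / stride) + 1`, assuming a dilation of 1 and the default floor rounding. The helper name `conv_out` is hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224  # height and width of the input images
sizes = []

# (kernel, stride, padding) of the three Conv2d layers in the model above;
# each one is followed by MaxPool2d(kernel_size=2).
for kernel, stride, padding in [(3, 1, 1), (3, 1, 0), (3, 1, 0)]:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)
    # MaxPool2d uses its kernel size as the stride by default.
    size = conv_out(size, kernel=2, stride=2)
    sizes.append(size)

print(sizes)  # [224, 112, 110, 55, 53, 26]
```

The last value, 26, is the activation map size that the model logs and then passes to `AvgPool2d`.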

# 2. Adding convolutional and pooling layers
In previous lessons, you started to experiment with your model.
You might have tested the impact of the number of units, the depth of the network and different learning rates.

In this lesson, we added some new types of layers: convolutional and pooling layers.
Experiment with adding these new layers.

Also, have a look at the `ModuleList`: https://pytorch.org/docs/stable/generated/torch.nn.ModuleList.html#modulelist
It can be really useful to create a list of layers from a config file, and then use that list to create your model.
Instead of just adding a single layer, you could also add a block of layers (e.g. a Conv2d layer, followed by a BatchNorm2d layer, a ReLU layer and a MaxPool2d layer) and repeat that in a loop, adding each block to the `ModuleList`.
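
A minimal sketch of this idea could look as follows. The names `conv_block` and `BlockCNN`, the keys of the `config` dictionary and all sizes are assumptions made for illustration; in a real pipeline the dictionary would come from a config file. Normalization is placed before the activation here, following the guideline from section 1.

```python
import torch
from torch import nn


def conv_block(in_channels: int, out_channels: int) -> nn.Sequential:
    """One repeatable block: convolution, normalization, activation, pooling."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
    )


class BlockCNN(nn.Module):
    def __init__(self, config: dict):
        super().__init__()
        # ModuleList registers every block, so its parameters are trained.
        self.blocks = nn.ModuleList()
        in_channels = config["in_channels"]
        for _ in range(config["num_blocks"]):
            self.blocks.append(conv_block(in_channels, config["filters"]))
            in_channels = config["filters"]
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # average each activation map to 1x1
            nn.Flatten(),
            nn.Linear(config["filters"], config["num_classes"]),
        )

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return self.head(x)


# Hypothetical settings; these would normally be read from a config file.
config = {"in_channels": 3, "filters": 16, "num_blocks": 3, "num_classes": 5}
model = BlockCNN(config)
out = model(torch.randn(4, 3, 64, 64))
print(out.shape)  # torch.Size([4, 5])
```

Changing `num_blocks` in the config changes the depth of the model without touching the class. A plain Python list would not work here, because PyTorch would not register the layers inside it as submodules.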

# 3. Improve your pipeline
In addition to new layers, we have expanded our logging tools with MLflow, so we can now choose between gin-config, TensorBoard and MLflow.

Expand the training pipeline you started in the previous lesson such that:

- you can switch between models by changing a config file
- you can test different hyperparameters by changing a config file
- you automatically log settings (model picked, hyperparameters, metrics, etc.): use gin-config, TensorBoard, MLflow or a combination, whatever you prefer.
- Important: doing a master's means you don't just start engineering a pipeline; you also need to reflect. Why do you see the results you see? What does this mean, considering the theory? Write down lessons learned and reflections, based on experimental results.
- continuously improve your code:
    - clean up your experimental environment, such that it doesn't get too messy
    - automate the boring stuff: use a Makefile, use config files, automate logging, etc.
    - use git: commit your changes often and with descriptive messages
    - separate code for pipelines, configs, models, model training and results.
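
The first three points can be sketched with a small model registry and a config that selects a model by name. This is only an illustration under stated assumptions: the registry `MODELS`, the helpers `build` and `settings_to_log`, the config keys and all values are hypothetical, and plain JSON stands in for a gin-config file to keep the example self-contained.

```python
import json

import torch
from torch import nn

# Hypothetical registry: maps a name in the config to a model constructor.
MODELS = {
    "linear": lambda hp: nn.Sequential(
        nn.Flatten(),
        nn.Linear(hp["in_features"], hp["num_classes"]),
    ),
    "mlp": lambda hp: nn.Sequential(
        nn.Flatten(),
        nn.Linear(hp["in_features"], hp["units"]),
        nn.ReLU(),
        nn.Dropout(hp["dropout"]),
        nn.Linear(hp["units"], hp["num_classes"]),
    ),
}

# Stand-in for the contents of a config file (hypothetical values).
CONFIG_TEXT = """
{
  "model": "mlp",
  "learning_rate": 0.001,
  "hyperparameters": {"in_features": 784, "units": 64, "dropout": 0.2, "num_classes": 10}
}
"""


def build(config: dict):
    """Create the model and optimizer that the config asks for."""
    model = MODELS[config["model"]](config["hyperparameters"])
    optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
    return model, optimizer


def settings_to_log(config: dict) -> dict:
    """Flatten the config into one dictionary of settings to log."""
    flat = {"model": config["model"], "learning_rate": config["learning_rate"]}
    flat.update(config["hyperparameters"])
    return flat


config = json.loads(CONFIG_TEXT)
model, optimizer = build(config)
logged = settings_to_log(config)
print(logged)
```

Switching to another model or another set of hyperparameters then only requires editing the config. A flat dictionary like `logged` is also the kind of object that can be handed to a tracking tool, for example `mlflow.log_params`, so that every run records the settings that produced it.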