# 1. Adding dropout and normalization layers
Study the PyTorch documentation for:
- Dropout https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html
- normalization layers https://pytorch.org/docs/stable/nn.html#normalization-layers

Experiment with adding dropout and normalization layers to your model. Some rough guidelines on where to add them relative to Linear or Conv2d layers:
- Dropout: after Linear or Conv2d layers. It is often added after the last Linear layer *before* the output layer, but it can be used more often.
- Normalization layers: right after (blocks of) Linear or Conv2d layers, but before activation functions.
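
The placement guidelines above can be sketched as a small dense block. This is only an illustration, not a prescribed architecture: the sizes (`in_features`, `hidden`, `n_classes`) and the dropout rate are arbitrary assumptions. The sketch also shows that dropout is only active in training mode, so outputs in evaluation mode are deterministic.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
in_features, hidden, n_classes = 16, 32, 10

block = nn.Sequential(
    nn.Linear(in_features, hidden),
    nn.BatchNorm1d(hidden),  # normalization right after the Linear layer, before the activation
    nn.ReLU(),
    nn.Dropout(p=0.5),       # dropout after the last hidden layer, before the output layer
    nn.Linear(hidden, n_classes),
)

x = torch.randn(8, in_features)

block.train()  # dropout is active, BatchNorm uses batch statistics
train_out = block(x)

block.eval()   # dropout is disabled, BatchNorm uses its running statistics
with torch.no_grad():
    eval_out_1 = block(x)
    eval_out_2 = block(x)

print(train_out.shape)                      # torch.Size([8, 10])
print(torch.equal(eval_out_1, eval_out_2))  # True: no randomness in eval mode
```

Remember to call `model.eval()` before measuring validation metrics, otherwise dropout keeps zeroing units at random.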

# 2. Adding convolutional and pooling layers
In previous lessons, you started to experiment with your model.
You might have tested the impact of the number of units, the depth of the network, and different learning rates.

In this lesson, we have added some new types of layers: convolutional and pooling layers.
Experiment with adding these new layers.

Also, have a look at the `ModuleList`: https://pytorch.org/docs/stable/generated/torch.nn.ModuleList.html#modulelist
It can be really useful to create a list of layers from a config file, and then use that list to create your model.
Instead of just adding a single layer, you could also add a block of layers (e.g. a Conv2d layer, followed by a ReLU layer, followed by a BatchNorm2d layer, followed by a MaxPool2d layer) and repeat that in a loop, adding each block to the `ModuleList`.
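
A minimal sketch of that idea, assuming a hypothetical `ConvStack` class and a plain dictionary standing in for the config file. The block follows the layer order listed above; the filter counts are arbitrary.

```python
import torch
from torch import nn


class ConvStack(nn.Module):
    """Hypothetical example: repeat a block of layers once per entry in `filters`."""

    def __init__(self, in_channels: int, filters: list[int]):
        super().__init__()
        self.blocks = nn.ModuleList()
        for out_channels in filters:
            self.blocks.append(
                nn.Sequential(
                    nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                    nn.ReLU(),
                    nn.BatchNorm2d(out_channels),
                    nn.MaxPool2d(kernel_size=2),
                )
            )
            in_channels = out_channels  # the next block consumes this block's output

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


# Stand-in for values that would normally be read from a config file.
config = {"in_channels": 1, "filters": [16, 32]}
model = ConvStack(**config)
out = model(torch.ones(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 32, 7, 7])
```

Using `nn.ModuleList` instead of a plain Python list matters because it registers the parameters of every block with the parent module, so the optimizer can see them.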

# 3. Improve your pipeline
In addition to new layers, we have expanded our logging tools with MLflow, so we can now choose between gin-config, TensorBoard and MLflow.

Expand the training pipeline you started in the previous lesson such that:

- you can switch between models by changing a config file
- you can test different hyperparameters by changing a config file
- you automatically log settings: model picked, hyperparameters, metrics, etc. Use gin-config, TensorBoard, MLflow, or a combination, whichever you prefer.
- Important: doing a master's means you don't just engineer a pipeline; you also need to reflect. Why do you see the results you see? What does this mean, considering the theory? Write down lessons learned and reflections, based on experimental results.
- continuously improve your code:
    - clean up your experimental environment, such that it doesn't get too messy
    - automate the boring stuff: use a Makefile, use config files, automate logging, etc.
    - use git: commit your changes often and with descriptive messages
    - separate code for pipelines, configs, models, model training and results.
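
Switching between models and hyperparameters through a config file can be sketched with a small registry. Everything named here (`MODEL_REGISTRY`, `build_mlp`, `build_cnn`, `model_from_config`, and the JSON layout) is a hypothetical illustration; gin-config or another config tool could fill the same role.

```python
import json

from torch import nn


def build_mlp(units: int) -> nn.Module:
    # Hypothetical dense baseline for 28x28 grayscale images.
    return nn.Sequential(
        nn.Flatten(), nn.Linear(28 * 28, units), nn.ReLU(), nn.Linear(units, 10)
    )


def build_cnn(filters: int) -> nn.Module:
    # Hypothetical small convolutional model.
    return nn.Sequential(
        nn.Conv2d(1, filters, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(filters, 10),
    )


# Maps the model name used in the config to a builder function.
MODEL_REGISTRY = {"mlp": build_mlp, "cnn": build_cnn}


def model_from_config(config: dict) -> nn.Module:
    builder = MODEL_REGISTRY[config["model"]]
    return builder(**config["hyperparameters"])


# In practice this text would live in a config file; it is inlined here
# to keep the sketch self-contained.
config_text = '{"model": "cnn", "hyperparameters": {"filters": 32}}'
config = json.loads(config_text)
model = model_from_config(config)
n_params = sum(p.numel() for p in model.parameters())
print(config["model"], n_params)  # cnn 650
```

Because the whole experiment is described by the config, the same dictionary can be passed to the logger, which keeps the logged settings and the model that was actually trained in sync.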

To improve the model, dropout and normalization layers are added. Because of performance constraints, the Fashion MNIST dataset is used with 3 epochs of training, as in the MLflow exercise. In addition, the best hyperparameters found in the MLflow exercise ({'filters': 32.0, 'units1': 56.0, 'units2': 120.0}) are used as parameters for the model.

In [15]:
from pathlib import Path
import torch
import torch.nn as nn
from loguru import logger
import warnings
warnings.simplefilter("ignore", UserWarning)

In [16]:
from mads_datasets import DatasetFactoryProvider, DatasetType
from mltrainer.preprocessors import BasePreprocessor

In [17]:
fashionfactory = DatasetFactoryProvider.create_factory(DatasetType.FASHION)
batchsize = 64
preprocessor = BasePreprocessor()
streamers = fashionfactory.create_datastreamer(batchsize=batchsize, preprocessor=preprocessor)
train = streamers["train"]
valid = streamers["valid"]
trainstreamer = train.stream()
validstreamer = valid.stream()

2024-11-17 13:02:13.507 | INFO     | mads_datasets.base:download_data:121 - Folder already exists at C:\Users\dilek\.cache\mads_datasets\fashionmnist
2024-11-17 13:02:13.509 | INFO     | mads_datasets.base:download_data:124 - File already exists at C:\Users\dilek\.cache\mads_datasets\fashionmnist\fashionmnist.pt


In [24]:
x, y = next(iter(trainstreamer))
x.shape, y.shape

(torch.Size([64, 1, 28, 28]), torch.Size([64]))

In [25]:
import torch
if torch.backends.mps.is_available() and torch.backends.mps.is_built():
    device = torch.device("mps")
    print("Using MPS")
elif torch.cuda.is_available():
    device = "cuda:0"
    print("using cuda")
else:
    device = "cpu"
    print("using cpu")

using cpu


In [26]:
from torch import nn
print(f"Using {device} device")

# Define model
class CNN(nn.Module):
    def __init__(self, filters, units1, units2, input_size=(32, 1, 28, 28)):
        super().__init__()

        self.convolutions = nn.Sequential(
            nn.Conv2d(1, filters, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(filters),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(filters, filters, kernel_size=3, stride=1, padding=0),
            nn.BatchNorm2d(filters),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(filters, filters, kernel_size=3, stride=1, padding=0),
            nn.BatchNorm2d(filters),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )

        activation_map_size = self._conv_test(input_size)
        logger.info(f"Aggregating activationmap with size {activation_map_size}")
        self.agg = nn.AvgPool2d(activation_map_size)

        self.dense = nn.Sequential(
            nn.Flatten(),
            nn.Linear(filters, units1),
            nn.ReLU(),
            nn.Dropout(p=0.5),  # Adding Dropout
            nn.Linear(units1, units2),
            nn.ReLU(),
            nn.Dropout(p=0.5),  # Adding Dropout
            nn.Linear(units2, 10)
        )

    def _conv_test(self, input_size = (32, 1, 28, 28)):
        x = torch.ones(input_size)
        x = self.convolutions(x)
        return x.shape[-2:]

    def forward(self, x):
        x = self.convolutions(x)
        x = self.agg(x)
        logits = self.dense(x)
        return logits

# Best parameters from the MLflow exercise: {'filters': 32.0, 'units1': 56.0, 'units2': 120.0}
model = CNN(filters=32, units1=56, units2=120).to("cpu")

2024-11-17 13:11:13.514 | INFO     | __main__:__init__:25 - Aggregating activationmap with size torch.Size([2, 2])


Using cpu device
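
The logged activation map size of `[2, 2]` can be checked by hand with the standard output-size formula for convolution and pooling layers, `floor((n + 2 * padding - kernel) / stride) + 1`. The helper `conv_out` below is a hypothetical name used only for this check.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28
sizes = [size]
# The three Conv2d layers in the model use padding 1, 0 and 0.
for padding in (1, 0, 0):
    size = conv_out(size, kernel=3, stride=1, padding=padding)  # Conv2d
    sizes.append(size)
    size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(kernel_size=2), stride defaults to kernel size
    sizes.append(size)

print(sizes)  # [28, 28, 14, 12, 6, 4, 2]
```

The final value of 2 matches the `torch.Size([2, 2])` reported by `_conv_test`, which is why `AvgPool2d` with that kernel size reduces each feature map to a single value.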


In [27]:
from torchsummary import summary
summary(model, input_size=(1, 28, 28), device="cpu")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 28, 28]             320
       BatchNorm2d-2           [-1, 32, 28, 28]              64
              ReLU-3           [-1, 32, 28, 28]               0
         MaxPool2d-4           [-1, 32, 14, 14]               0
            Conv2d-5           [-1, 32, 12, 12]           9,248
       BatchNorm2d-6           [-1, 32, 12, 12]              64
              ReLU-7           [-1, 32, 12, 12]               0
         MaxPool2d-8             [-1, 32, 6, 6]               0
            Conv2d-9             [-1, 32, 4, 4]           9,248
      BatchNorm2d-10             [-1, 32, 4, 4]              64
             ReLU-11             [-1, 32, 4, 4]               0
        MaxPool2d-12             [-1, 32, 2, 2]               0
        AvgPool2d-13             [-1, 32, 1, 1]               0
          Flatten-14                   

In [28]:
import torch.optim as optim
from mltrainer import metrics
optimizer = optim.Adam
loss_fn = torch.nn.CrossEntropyLoss()
accuracy = metrics.Accuracy()

In [29]:
yhat = model(x.to("cpu"))
accuracy(y.to("cpu"), yhat)

tensor(0.1094)

In [30]:
experiment_path = "mlflow_test"

In [31]:
import mlflow
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment(experiment_path)

<Experiment: artifact_location='file:///C:/Users/dilek/Desktop/Advanced_AI_Applications_WS24-25_MADS_HSRW/notebooks/2_convolutions/mlruns/1', creation_time=1730121984736, experiment_id='1', last_update_time=1730121984736, lifecycle_stage='active', name='mlflow_test', tags={}>

In [32]:
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from hyperopt.pyll import scope

In [33]:
modeldir = Path("../../models/mnist").resolve()
if not modeldir.exists():
    # parents=True also creates missing parent folders such as ../../models
    modeldir.mkdir(parents=True)
    print(f"Created {modeldir}")

In [34]:
import torch.optim as optim
from mltrainer import metrics, Trainer, TrainerSettings, ReportTypes
from datetime import datetime

# Define the trainer settings (epochs, metrics, logging); the search space is defined in a later cell
settings = TrainerSettings(
    epochs=3,
    metrics=[accuracy],
    logdir="modellog",
    train_steps=100,
    valid_steps=100,
    reporttypes=[ReportTypes.MLFLOW],
)


# Define the objective function for hyperparameter optimization
def objective(params):
    # Start a new MLflow run for tracking the experiment
    with mlflow.start_run():
        # Set MLflow tags to record metadata about the model and developer
        mlflow.set_tag("model", "convnet")
        mlflow.set_tag("dev", "raoul")
        # Log hyperparameters to MLflow
        mlflow.log_params(params)
        mlflow.log_param("batchsize", f"{batchsize}")


        # Initialize the optimizer, loss function, and accuracy metric
        optimizer = optim.Adam
        loss_fn = torch.nn.CrossEntropyLoss()
        accuracy = metrics.Accuracy()

        # Instantiate the CNN model with the given hyperparameters
        model = CNN(**params)
        # Train the model using a custom train loop
        trainer = Trainer(
            model=model,
            settings=settings,
            loss_fn=loss_fn,
            optimizer=optimizer,
            traindataloader=trainstreamer,
            validdataloader=validstreamer,
            scheduler=optim.lr_scheduler.ReduceLROnPlateau,
            device=device,
        )
        trainer.loop()

        # Save the trained model with a timestamp
        tag = datetime.now().strftime("%Y%m%d-%H%M")
        modelpath = modeldir / (tag + "model.pt")
        torch.save(model, modelpath)

        # Log the saved model as an artifact in MLflow
        mlflow.log_artifact(local_path=modelpath, artifact_path="pytorch_models")
        return {'loss' : trainer.test_loss, 'status': STATUS_OK}

In [35]:
search_space = {
    'filters' : scope.int(hp.quniform('filters', 16, 128, 8)),
    'units1' : scope.int(hp.quniform('units1', 32, 128, 8)),
    'units2' : scope.int(hp.quniform('units2', 32, 128, 8)),
}
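
According to the hyperopt documentation, `hp.quniform(label, low, high, q)` returns a value like `round(uniform(low, high) / q) * q`, and `scope.int` casts it to an integer. The sketch below reproduces only that arithmetic in plain Python to show which values the search space above can produce. It is not hyperopt's implementation, and `quniform_int` is a hypothetical helper name.

```python
import random


def quniform_int(rng: random.Random, low: float, high: float, q: float) -> int:
    # Same arithmetic as documented for hp.quniform, followed by an int cast.
    return int(round(rng.uniform(low, high) / q) * q)


rng = random.Random(0)
samples = [quniform_int(rng, 16, 128, 8) for _ in range(1000)]

# Every sample is a multiple of 8 between 16 and 128.
print(min(samples), max(samples))
print(all(s % 8 == 0 for s in samples))
```

Under this formula the endpoints (16 and 128) are drawn about half as often as the interior values, because only half a step of the uniform range rounds to them.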

In [36]:
best_result = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=3,
    trials=Trials()
)

  0%|                                                                            | 0/3 [00:00<?, ?trial/s, best loss=?]

2024-11-17 13:13:07.531 | INFO     | __main__:__init__:25 - Aggregating activationmap with size torch.Size([2, 2])
2024-11-17 13:13:07.533 | INFO     | mltrainer.trainer:dir_add_timestamp:29 - Logging to modellog\20241117-131307
2024-11-17 13:13:07.536 | INFO     | mltrainer.trainer:__init__:70 - Found earlystop_kwargs in settings.Set to None if you dont want earlystopping.

 33%|████████████████▋                                 | 1/3 [00:14<00:29, 14.51s/trial, best loss: 0.5331311982870102]

2024-11-17 13:13:22.021 | INFO     | __main__:__init__:25 - Aggregating activationmap with size torch.Size([2, 2])
2024-11-17 13:13:22.021 | INFO     | mltrainer.trainer:dir_add_timestamp:29 - Logging to modellog\20241117-131322
2024-11-17 13:13:22.021 | INFO     | mltrainer.trainer:__init__:70 - Found earlystop_kwargs in settings.Set to None if you dont want earlystopping.

 67%|█████████████████████████████████▎                | 2/3 [00:34<00:17, 17.98s/trial, best loss: 0.5331311982870102]

2024-11-17 13:13:42.416 | INFO     | __main__:__init__:25 - Aggregating activationmap with size torch.Size([2, 2])
2024-11-17 13:13:42.418 | INFO     | mltrainer.trainer:dir_add_timestamp:29 - Logging to modellog\20241117-131342
2024-11-17 13:13:42.419 | INFO     | mltrainer.trainer:__init__:70 - Found earlystop_kwargs in settings.Set to None if you dont want earlystopping.

100%|██████████████████████████████████████████████████| 3/3 [00:57<00:00, 19.06s/trial, best loss: 0.5331311982870102]


In [37]:
best_result

{'filters': 80.0, 'units1': 88.0, 'units2': 72.0}

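After adding normalization layers (BatchNorm2d) and dropout layers, the best hyperparameters found by the search changed from {'filters': 32.0, 'units1': 56.0, 'units2': 120.0} to {'filters': 80.0, 'units1': 88.0, 'units2': 72.0}. The best loss also decreased, from 0.8067 in the previous version to 0.5331 with the new layers. Because this search ran only 3 trials of 3 epochs each, the comparison is indicative rather than conclusive.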