In [1]:
%config Completion.use_jedi = False
%reload_ext autoreload
%autoreload 2

import sys
import os
sys.path.append("../")
#os.environ["MLFLOW_TRACKING_URI"] = 'http://localhost:5000/'

In [2]:
from torchvision import transforms
import torch

from lightning.pytorch import Trainer
from lightning.pytorch.loggers import MLFlowLogger
from lightning.pytorch.callbacks import ModelCheckpoint

from src.custom_datasets import MultiLabelDataModule
from src.model import MultiLabelClassifier

# Table of Contents
1. [Introduction](#intro)
2. [Experiment Design](#design) <br/>
    2.1 [General Settings](#settings) <br/>
    2.2 [Transform/Augmentations](#aug) <br/>
3. [Experiments](#exp) <br/>
    3.1 [Experiment 1 - Backbone pretrained and frozen weights](#id1) <br/>
    3.2 [Experiment 2 - Backbone pretrained and unfrozen weights](#id2) <br/>
    3.3 [Experiment 3 - Backbone untrained](#id3) <br/>
4. [Evaluation Summary](#eval)


# 1. Introduction <a class="anchor" id="intro"></a>

As mentioned in the README, I'll build a multi-label classifier, so the model can predict more than one class per image (e.g. the color of the card and whether it's a creature or a special card). For this purpose I'll use the binary cross-entropy loss with [logits](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html). This loss treats each output/logit independently and is [suitable for multi-label classification](https://discuss.pytorch.org/t/is-there-an-example-for-multi-class-multilabel-classification-in-pytorch/53579/7). For this experiment I'll use a ResNet18 as the backbone. <br/> 
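
To make the "independent per label" point concrete, here is a minimal pure-Python sketch of what binary cross-entropy on logits computes for a single sample. The logits, the targets and both function names are made up for illustration. The actual training uses `torch.nn.BCEWithLogitsLoss`, whose default reduction is also the mean.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for one logit/target pair."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def multilabel_loss(logits, targets):
    """Mean of the per-label losses; every label contributes independently."""
    per_label = [bce_with_logits(x, y) for x, y in zip(logits, targets)]
    return sum(per_label) / len(per_label)


# Hypothetical card with two active labels out of four (e.g. one color + "creature").
logits = [2.5, -1.0, 3.0, -2.0]
targets = [1.0, 0.0, 1.0, 0.0]

loss = multilabel_loss(logits, targets)

# At inference each logit goes through its own sigmoid and is thresholded at 0.5,
# so any number of labels can be active at the same time.
probabilities = [1.0 / (1.0 + math.exp(-x)) for x in logits]
predicted = [int(p >= 0.5) for p in probabilities]
print(round(loss, 4), predicted)
```

Unlike a softmax, the per-label sigmoids do not compete with each other, which is what allows several classes per image.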

**_Visual concept:_** <br/>
<img src="../img/NeuralNetwork_MultiLabel_Concept.svg" width=500 height=400 align="center"/>

**Training:** via PyTorch Lightning with torch 2.0.1 - I will use torch.compile  <br/>
**Accelerator:** MPS (M1 Max) <br/>
**Logger:** MLFlow (local - `mlflow ui --backend-store-uri Coding_Projects/MLFlow_runs`) <br/>
**Datamodule:** Custom Module in src (built on the CustomDataset [MultiLabelImageFolder](../src/custom_datasets.py)) <br/>
**Metrics:** Multi-label accuracy will be the benchmark. I will also track precision and recall with the [MetricCollection](https://torchmetrics.readthedocs.io/en/stable/pages/overview.html?highlight=metriccollection#metriccollection) of torchmetrics <br/>
**Callbacks:** ModelCheckpoint, monitoring the validation accuracy <br/>
**Profiler:** While testing I used "simple", an informative profiler that measures the execution time of each action
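
As a rough illustration of what the three tracked metrics measure in the multi-label case, the sketch below computes them per label on thresholded predictions and then averages over the labels. The macro averaging, the toy data and the function name are assumptions for this example. torchmetrics supports several averaging modes, and the configuration used in `src/model.py` is not shown in this notebook.

```python
def macro_multilabel_scores(preds, targets):
    """Per-label accuracy, precision and recall, averaged over the labels."""
    num_labels = len(targets[0])
    acc, prec, rec = [], [], []
    for j in range(num_labels):
        pairs = [(p[j], t[j]) for p, t in zip(preds, targets)]
        tp = sum(1 for p, t in pairs if p == 1 and t == 1)
        fp = sum(1 for p, t in pairs if p == 1 and t == 0)
        fn = sum(1 for p, t in pairs if p == 0 and t == 1)
        tn = sum(1 for p, t in pairs if p == 0 and t == 0)
        acc.append((tp + tn) / len(pairs))
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
        rec.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(acc) / num_labels, sum(prec) / num_labels, sum(rec) / num_labels


# Toy batch: 4 samples, 3 labels, already thresholded to 0/1.
targets = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
preds = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

accuracy, precision, recall = macro_multilabel_scores(preds, targets)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Accuracy counts the many correct "label absent" cases as hits, so it tends to look higher than precision and recall when most labels are inactive for a given image.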

# 2. Experiment Design <a class="anchor" id="design"></a>

I'll run 3 different experiments with the same augmentations, image input size and hyperparameters. The backbone will be ResNet18 to keep the model small. Each experiment trains for 50 epochs.

1. Backbone pretrained frozen
2. Backbone pretrained unfrozen
3. Backbone untrained unfrozen (normalized on training data)

At the end I'll take the best model and use it in a Streamlit app to visualize the model's inference.
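
The three setups differ only in the backbone configuration, which can be summarized as a small lookup. The dictionary values are the ones passed to the model in section 3; the `EXPERIMENTS` name and the `describe` helper are hypothetical and only serve this overview.

```python
# Backbone settings of the three runs, as passed to MultiLabelClassifier in section 3.
EXPERIMENTS = {
    "Experiment 1": {"freeze_params": True, "backbone": "resnet18", "weights": "IMAGENET1K_V1"},
    "Experiment 2": {"freeze_params": False, "backbone": "resnet18", "weights": "IMAGENET1K_V1"},
    "Experiment 3": {"freeze_params": False, "backbone": "resnet18", "weights": None},
}


def describe(config: dict) -> str:
    """Summarize one backbone config in words."""
    init = "pretrained" if config["weights"] is not None else "untrained"
    mode = "frozen" if config["freeze_params"] else "unfrozen"
    return f"{config['backbone']}: {init}, {mode}"


for name, config in EXPERIMENTS.items():
    print(name, "->", describe(config))
```

Experiment 3 additionally swaps the ImageNet normalization statistics for the ones computed on the training data.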

#### <u> Attention </u>
- If you're planning to run more experiments or want to do hyperparameter tuning, you should look at [Ray (Hyperparameter Tuning)](https://docs.ray.io/en/latest/tune/examples/includes/mlflow_ptl_example.html), [papermill (Parameterize Notebooks)](https://papermill.readthedocs.io/en/latest/) or the [LightningCLI/script parameters](https://lightning.ai/docs/pytorch/stable/common/hyperparameters.html). 
- Sometimes I had out-of-memory issues on my Mac where the memory was still allocated after the training. As a temporary workaround I'm "emptying" the MPS cache after each training run to avoid that problem.
- Set `persistent_workers` to True in your DataLoader (this requires `num_workers` > 0). It made the training much faster.

### 2.1 General Settings <a class="anchor" id="settings"></a>

In [3]:
# MLFlow Experiment Name
PROJECT_NAME = "Magic The Gathering - Multilabel Classification"

# Other MLFlow related parameters
LOCAL_MLFLOW_URI = "/Users/ryoshibata/Coding_projects/MLFlow_runs/"

# Local paths where I stored the data
data_dirs = {"train": "../data/0.7-0.15-0.15_split/train/",
             "test": "../data/0.7-0.15-0.15_split/test/",
             "val": "../data/0.7-0.15-0.15_split/val/"}

# Experiment Settings
batch_size = 32 # with 64 and 4 workers I had memory leak issues 
hidden_size = 1024
lr = 0.001
num_classes = 10
n_epochs = 50

PROFILER = None

### 2.2 Transform/Augmentations <a class="anchor" id="aug"></a>

- I kept the main image shape of my dataset, 445x312 (see Image Exploration)
    - 224x224 is usually recommended for ImageNet-trained architectures, but I wanted to keep the aspect ratio of my dataset
- I only used the augmentations "RandomRotation" and "RandomHorizontalFlip" to keep it simple while still adding some randomness to improve the model's performance
    - I will use some real cards for my inference, which is why I added the rotation augmentation
- In the last experiment I changed the mean and std of the normalize step to the values I calculated in the exploration notebook
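
For reference, per-channel normalization statistics can be computed roughly as follows. This is a pure-Python sketch on tiny made-up "images", not the code from the exploration notebook; the data layout (a list of channels, each a flat list of pixel values in [0, 1]) and both function names are assumptions for the example.

```python
import math


def channel_stats(images):
    """Per-channel mean and std over all pixels of all images."""
    n_channels = len(images[0])
    total = [0.0] * n_channels
    total_sq = [0.0] * n_channels
    count = [0] * n_channels
    for image in images:
        for c, channel in enumerate(image):
            total[c] += sum(channel)
            total_sq[c] += sum(v * v for v in channel)
            count[c] += len(channel)
    mean = [total[c] / count[c] for c in range(n_channels)]
    # Var[x] = E[x^2] - E[x]^2, clamped at 0 to absorb rounding errors.
    std = [math.sqrt(max(total_sq[c] / count[c] - mean[c] ** 2, 0.0))
           for c in range(n_channels)]
    return mean, std


def normalize(channel, mean, std):
    """What a Normalize step does per channel: (value - mean) / std."""
    return [(v - mean) / std for v in channel]


# Two toy RGB "images" with two pixels each.
images = [
    [[0.0, 1.0], [0.2, 0.4], [0.1, 0.3]],
    [[1.0, 0.0], [0.6, 0.8], [0.5, 0.7]],
]
mean, std = channel_stats(images)
red_normalized = normalize(images[0][0], mean[0], std[0])
print(mean, std, red_normalized)
```

Accumulating sums and squared sums keeps memory usage constant, which matters when the statistics are computed over the whole training set.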

In [4]:
train_transform = transforms.Compose([transforms.RandomRotation(degrees=(0, 180)),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.Resize((445, 312)),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

inference_transform = transforms.Compose([transforms.Resize((445, 312)),
                                          transforms.ToTensor(),
                                          transforms.Normalize([0.485, 0.456, 0.406],
                                                               [0.229, 0.224, 0.225])])

# 3. Experiments <a class="anchor" id="exp"></a>

In [5]:
# Helper functions to reduce code repetition
from typing import Tuple

def set_mlflow_and_checkpoint_callback(run_name: str) -> Tuple[MLFlowLogger, ModelCheckpoint]:
    """Create the MLFlowLogger and the ModelCheckpoint callback for one experiment.

    Only run_name differs between experiments; it sets the MLFlow run name
    and the checkpoint directory."""

    checkpoint_callback = ModelCheckpoint(
        dirpath=f"./checkpoints/{run_name}/",
        save_top_k=2,
        monitor="val_MultilabelAccuracy",
        mode="max",
    )

    mlf_logger = MLFlowLogger(
        experiment_name=PROJECT_NAME,
        run_name=run_name,
        tracking_uri=LOCAL_MLFLOW_URI,
        log_model=True,
    )

    return mlf_logger, checkpoint_callback


def set_MultiLabelClassifier(backbone_config: dict) -> MultiLabelClassifier:
    """Build the classifier; only the backbone config changes between experiments."""
    multilabel_model = MultiLabelClassifier(
        backbone_config=backbone_config,
        num_classes=num_classes,
        hidden_size_1=hidden_size,
        hidden_size_2=hidden_size,
        lr=lr
    )

    return multilabel_model
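
The checkpoint settings above (`save_top_k=2`, `mode="max"`) can be pictured with a simple ranking rule. This is only a mental model under made-up validation scores, not Lightning's implementation; `keep_top_k` is a hypothetical helper.

```python
def keep_top_k(scores_by_epoch, k=2, mode="max"):
    """Return the epochs whose checkpoints survive a simple top-k rule."""
    ranked = sorted(scores_by_epoch.items(),
                    key=lambda item: item[1],
                    reverse=(mode == "max"))
    return [epoch for epoch, _ in ranked[:k]]


# Hypothetical validation accuracy per epoch.
val_accuracy = {0: 0.81, 1: 0.88, 2: 0.86, 3: 0.91}

kept = keep_top_k(val_accuracy)  # epochs 3 and 1 have the two highest scores
best = kept[0]                   # the checkpoint that ckpt_path="best" refers to
print(kept, best)
```

With `mode="max"` a higher monitored value is better, which fits an accuracy metric; a loss would be monitored with `mode="min"`.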

### 3.1 Experiment 1 - Backbone pretrained and frozen weights <a class="anchor" id="id1"></a>

In [6]:
run_name = "Experiment_1-Resnet18-pretrained_frozen-weights"

mlf_logger, checkpoint_callback = set_mlflow_and_checkpoint_callback(run_name)

mtg_data = MultiLabelDataModule(data_dirs=data_dirs,
                                train_transform=train_transform,
                                inference_transform=inference_transform,
                                batch_size=batch_size)

mlf_logger.log_hyperparams(train_transform.__dict__)

In [7]:
# Backbone settings
backbone_config = {
    "freeze_params": True,
    "backbone": "resnet18",
    "weights": "IMAGENET1K_V1",
}
# model settings
multilabel_model = set_MultiLabelClassifier(backbone_config)

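# torch 2.x compile: torch.compile returns a new compiled module and leaves the
# original untouched, so this call has no effect unless its result is passed to the Trainer
torch.compile(multilabel_model)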

# Trainer settings
trainer = Trainer(
    max_epochs=n_epochs,
    log_every_n_steps=5,
    logger=mlf_logger,
    accelerator="mps",
    profiler=PROFILER,
    callbacks=[checkpoint_callback],
)

trainer.fit(model=multilabel_model, datamodule=mtg_data)


Using cache found in /Users/ryoshibata/.cache/torch/hub/pytorch_vision_main
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name          | Type              | Params
----------------------------------------------------
0 | backbone      | ResnetBackbone    | 11.2 M
1 | classifier    | ClassifierHead    | 1.6 M 
2 | criterion     | BCEWithLogitsLoss | 0     
3 | train_metrics | MetricCollection  | 0     
4 | valid_metrics | MetricCollection  | 0     
5 | test_metrics  | MetricCollection  | 0     
----------------------------------------------------
1.6 M     Trainable params
11.2 M    Non-trainable params
12.8 M    Total params
51.047    Total estimated model params size (MB)


Epoch 49: 100%|██████████| 42/42 [00:04<00:00,  8.79it/s, v_num=11cb]      

`Trainer.fit` stopped: `max_epochs=50` reached.


Epoch 49: 100%|██████████| 42/42 [00:04<00:00,  8.63it/s, v_num=11cb]


In [8]:
trainer.test(multilabel_model,
             datamodule=mtg_data,
             ckpt_path="best")

Restoring states from the checkpoint path at /Users/ryoshibata/PycharmProjects/MultiLabelClassification/notebooks/checkpoints/Experiment_1-Resnet18-pretrained_frozen-weights/epoch=49-step=2100.ckpt
Loaded model weights from the checkpoint at /Users/ryoshibata/PycharmProjects/MultiLabelClassification/notebooks/checkpoints/Experiment_1-Resnet18-pretrained_frozen-weights/epoch=49-step=2100.ckpt


Testing DataLoader 0: 100%|██████████| 10/10 [00:00<00:00, 11.09it/s]


[{'test_loss': 0.13455015420913696,
  'test_MultilabelAccuracy': 0.9451826810836792,
  'test_MultilabelPrecision': 0.9019148349761963,
  'test_MultilabelRecall': 0.8737625479698181}]

In [9]:
torch.mps.empty_cache()  # public equivalent of the private torch._C._mps_emptyCache()

### 3.2 Experiment 2 - Backbone pretrained and unfrozen weights <a class="anchor" id="id2"></a>

In [10]:
run_name = "Experiment_2-Resnet18-pretrained_unfrozen-weights"

mlf_logger, checkpoint_callback = set_mlflow_and_checkpoint_callback(run_name)

mlf_logger.log_hyperparams(train_transform.__dict__)

In [11]:
# Backbone Settings
backbone_config = {
    "freeze_params": False,
    "backbone": "resnet18",
    "weights": "IMAGENET1K_V1",
}

# Model settings
multilabel_model = set_MultiLabelClassifier(backbone_config)

# torch.compile returns the compiled module; the result is not assigned here,
# so the Trainer below still receives the uncompiled model
torch.compile(multilabel_model)

# train model
trainer = Trainer(
    max_epochs=n_epochs,
    log_every_n_steps=5,
    logger=mlf_logger,
    accelerator="mps",
    profiler=PROFILER,
    callbacks=[checkpoint_callback],
)

trainer.fit(model=multilabel_model, datamodule=mtg_data)

Using cache found in /Users/ryoshibata/.cache/torch/hub/pytorch_vision_main
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name          | Type              | Params
----------------------------------------------------
0 | backbone      | ResnetBackbone    | 11.2 M
1 | classifier    | ClassifierHead    | 1.6 M 
2 | criterion     | BCEWithLogitsLoss | 0     
3 | train_metrics | MetricCollection  | 0     
4 | valid_metrics | MetricCollection  | 0     
5 | test_metrics  | MetricCollection  | 0     
----------------------------------------------------
12.8 M    Trainable params
0         Non-trainable params
12.8 M    Total params
51.047    Total estimated model params size (MB)


Epoch 49: 100%|██████████| 42/42 [00:11<00:00,  3.52it/s, v_num=1ac3]      

`Trainer.fit` stopped: `max_epochs=50` reached.


Epoch 49: 100%|██████████| 42/42 [00:11<00:00,  3.52it/s, v_num=1ac3]


In [12]:
trainer.test(multilabel_model,
             datamodule=mtg_data,
             ckpt_path="best")

Restoring states from the checkpoint path at /Users/ryoshibata/PycharmProjects/MultiLabelClassification/notebooks/checkpoints/Experiment_2-Resnet18-pretrained_unfrozen-weights/epoch=47-step=2016.ckpt
Loaded model weights from the checkpoint at /Users/ryoshibata/PycharmProjects/MultiLabelClassification/notebooks/checkpoints/Experiment_2-Resnet18-pretrained_unfrozen-weights/epoch=47-step=2016.ckpt


Testing DataLoader 0: 100%|██████████| 10/10 [00:00<00:00, 11.30it/s]


[{'test_loss': 0.005829510744661093,
  'test_MultilabelAccuracy': 0.9990032911300659,
  'test_MultilabelPrecision': 0.999009907245636,
  'test_MultilabelRecall': 0.9986666440963745}]

In [13]:
torch.mps.empty_cache()  # workaround: memory sometimes stayed allocated after training

### 3.3 Experiment 3 - Backbone untrained <a class="anchor" id="id3"></a>

In [14]:
# changing mean and std of normalization step
train_transform = transforms.Compose([transforms.RandomRotation(degrees=(0, 180)),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.Resize((445, 312)),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.50476729, 0.48440304, 0.46218942], 
                                                           [0.30981703, 0.3034715 , 0.30258951])])

inference_transform = transforms.Compose([transforms.Resize((445, 312)),
                                          transforms.ToTensor(),
                                          transforms.Normalize([0.50476729, 0.48440304, 0.46218942],
                                                               [0.30981703, 0.3034715 , 0.30258951])])

In [15]:
run_name = "Experiment_3-Resnet18-untrained"

mlf_logger, checkpoint_callback = set_mlflow_and_checkpoint_callback(run_name)

mtg_data = MultiLabelDataModule(data_dirs=data_dirs,
                                train_transform=train_transform,
                                inference_transform=inference_transform,
                                batch_size=batch_size)

mlf_logger.log_hyperparams(train_transform.__dict__)

In [16]:
# backbone settings
backbone_config = {
    "freeze_params": False,
    "backbone": "resnet18",
    "weights": None,
}
# model settings
multilabel_model = set_MultiLabelClassifier(backbone_config)

# torch.compile returns the compiled module; the result is not assigned here,
# so the Trainer below still receives the uncompiled model
torch.compile(multilabel_model)

# Trainer settings
trainer = Trainer(
    max_epochs=n_epochs,
    log_every_n_steps=5,
    logger=mlf_logger,
    accelerator="mps",
    profiler=PROFILER,
    callbacks=[checkpoint_callback],
)

trainer.fit(model=multilabel_model, datamodule=mtg_data)

Using cache found in /Users/ryoshibata/.cache/torch/hub/pytorch_vision_main
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name          | Type              | Params
----------------------------------------------------
0 | backbone      | ResnetBackbone    | 11.2 M
1 | classifier    | ClassifierHead    | 1.6 M 
2 | criterion     | BCEWithLogitsLoss | 0     
3 | train_metrics | MetricCollection  | 0     
4 | valid_metrics | MetricCollection  | 0     
5 | test_metrics  | MetricCollection  | 0     
----------------------------------------------------
12.8 M    Trainable params
0         Non-trainable params
12.8 M    Total params
51.047    Total estimated model params size (MB)


Epoch 49: 100%|██████████| 42/42 [00:11<00:00,  3.57it/s, v_num=4a0c]      

`Trainer.fit` stopped: `max_epochs=50` reached.


Epoch 49: 100%|██████████| 42/42 [00:11<00:00,  3.57it/s, v_num=4a0c]


In [17]:
trainer.test(multilabel_model,
             datamodule=mtg_data,
             ckpt_path="best")

Restoring states from the checkpoint path at /Users/ryoshibata/PycharmProjects/MultiLabelClassification/notebooks/checkpoints/Experiment_3-Resnet18-untrained/epoch=35-step=1512.ckpt
Loaded model weights from the checkpoint at /Users/ryoshibata/PycharmProjects/MultiLabelClassification/notebooks/checkpoints/Experiment_3-Resnet18-untrained/epoch=35-step=1512.ckpt


Testing DataLoader 0: 100%|██████████| 10/10 [00:00<00:00, 11.70it/s]


[{'test_loss': 0.1341966986656189,
  'test_MultilabelAccuracy': 0.9245847463607788,
  'test_MultilabelPrecision': 0.845035195350647,
  'test_MultilabelRecall': 0.8440000414848328}]

In [18]:
torch.mps.empty_cache()  # public equivalent of the private torch._C._mps_emptyCache()

# 4. Evaluation Summary <a class="anchor" id="eval"></a>

Looking at the test metrics (accuracy, precision, recall), the 2nd model performs best, so fine-tuning the whole network led to the best results. It reaches a test accuracy of 0.999, compared with 0.945 (experiment 1) and 0.925 (experiment 3), and the gap is even larger in precision (0.999 vs. 0.902 and 0.845) and recall (0.999 vs. 0.874 and 0.844). I will use this model for the inference in the next notebook and in the Streamlit app. All metrics of the 2nd experiment, including the losses, are provided [here](./experiment_results/experiment_2/metrics/) (copied from my MLFlow folder). During training the validation loss was spiky but showed a decreasing trend, so a smaller learning rate might work better in another run. The best checkpoint was saved at epoch 47 (zero-indexed, step 2016).
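
The epoch number in a checkpoint name is zero-based, which can be cross-checked against the step counter. The sketch below parses the three checkpoint file names from the test logs and divides the step by the 42 batches per epoch shown in the progress bars; `parse_checkpoint` is a hypothetical helper written for this check.

```python
import re

STEPS_PER_EPOCH = 42  # batches per epoch, read from the progress bars ("42/42")


def parse_checkpoint(name: str):
    """Extract (epoch_index, global_step) from a name like 'epoch=47-step=2016.ckpt'."""
    match = re.fullmatch(r"epoch=(\d+)-step=(\d+)\.ckpt", name)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {name}")
    return int(match.group(1)), int(match.group(2))


best_checkpoints = {
    "Experiment 1": "epoch=49-step=2100.ckpt",
    "Experiment 2": "epoch=47-step=2016.ckpt",
    "Experiment 3": "epoch=35-step=1512.ckpt",
}

for experiment, name in best_checkpoints.items():
    epoch_index, step = parse_checkpoint(name)
    completed_epochs = step // STEPS_PER_EPOCH
    print(f"{experiment}: epoch index {epoch_index}, {completed_epochs} epochs completed")
```

For experiment 2 this gives 48 completed epochs, so the best weights come from near the end of the 50-epoch run.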

**Table View MLFlow:**<br/><br/>
<img src="../img/MLFlow_tracking_results_table.png" width=800 height=180 align="center"/> <br/>

**Barcharts MLFlow:**<br/><br/>
<img src="../img/MLFlow_tracking_results_chart.png" width=800 height=180 align="center"/>

**Experiment 2 Accuracy Train/Val Chart:**<br/><br/>
<img src="../img/experiment_2_accuracy_train_val_chart.png" width=800 height=400 align="center"/>

**Experiment 2 Loss Train/Val Chart:**<br/><br/>
<img src="../img/experiment_2_loss_train_val_chart .png" width=800 height=400 align="center"/>

# Next Step - [Model Inference](./02_Model_Inference.ipynb)