# **Model training**

---

This notebook is dedicated to training deep learning models for the task of **aortic segmentation**. To address this challenge, we selected five different segmentation models for comparison:
1. **UNet**
2. **FCN with ResNet-50 backbone**
3. **FCN with pretrained ResNet-50 backbone**
4. **DeepLabV3 with MobileNetV3-Large backbone**
5. **DeepLabV3 with pretrained MobileNetV3-Large backbone**

The notebook includes:
* Loading and preprocessing of the dataset  
* Definition of the training class (`TrainModel`)  
* Hyperparameter optimization using **Optuna**

During optimization, the **best-performing models from each trial loop are saved** and will later be **used for evaluation and comparison of final results**.

**Note:** This notebook was run on the Kaggle platform. Remember to install the MONAI library before running it.

**Importing Libraries and Modules**
 
---

The first step includes importing all necessary libraries and helper modules:

- `os` – for navigating the dataset folder structure
- `json` – for saving and loading file names in JSON format (e.g., for dataset splits)
- `train_test_split` from `sklearn.model_selection` – to split the data into training and validation sets
- `optuna` – for hyperparameter optimization
- `torch`, `torch.nn` – core PyTorch libraries used for training models
- `time` – for tracking training duration
- `torchvision.models` – includes prebuilt segmentation models like FCN and DeepLabV3
- `monai.networks.nets.UNet`, `monai.losses.DiceLoss` – UNet architecture and the Dice loss function used for the segmentation task
- `sys` – used to add the custom dataset loader module (`Dataset_loader`) to the Python path (see `Dataset_loader.py` for details)

From the custom module `Dataset_loader`, the `Dataset` class is imported, which handles loading and preprocessing of the aorta segmentation dataset.

In [None]:
import os
import json
from sklearn.model_selection import train_test_split
import optuna
import torch as tc
import torch.nn as nn
import time
from torchvision import models
from torchvision.models.segmentation import FCN_ResNet50_Weights, DeepLabV3_MobileNet_V3_Large_Weights
from monai.networks.nets import UNet
from monai.losses import DiceLoss


import sys
sys.path.insert(1, "/kaggle/input/dataset-loader/") #Dataloader path
from Dataset_loader import Dataset

**Dataset splitting and loading**

---
In this step, we prepare the dataset splits for model training, validation, and testing. Each of the three sub-datasets — **Dongyang**, **KiTS**, and **Rider** — is first processed using a helper function `get_files()` (as in the `preprocessing_testing` and `data_overview` notebooks), which collects paths to all raw `.nrrd` image files (excluding segmentation masks).

After collecting all file paths, the `split_data()` helper function is used to divide each dataset into training, validation, and test sets. The split is done in two stages:
* 80% of the data is kept for training, and 20% is set aside as "pre-test".
* The "pre-test" is then split 50/50 into validation and test sets.

A fixed random seed ensures that the splits remain consistent across different runs.

Finally, the file paths for each subset (training, validation, and test) are combined across all three datasets and stored in a dictionary. This dictionary is saved to disk as a JSON file. In future notebooks, especially during model evaluation, we will load this saved file directly to ensure consistency and avoid regenerating the splits.
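
The two-stage split and the JSON round trip can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: `two_stage_split` is a hypothetical stand-in for `split_data()` (it shuffles with `random.Random` instead of calling `train_test_split`, so the exact assignment of files differs), and the file names are made up.

```python
import json
import os
import random
import tempfile


def two_stage_split(names, pre_test_size=0.2, val_ratio=0.5, seed=5):
    """Hypothetical stand-in for split_data(): hold out 20%, then halve it."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same order on every run
    n_pre_test = round(len(shuffled) * pre_test_size)
    pre_test, train = shuffled[:n_pre_test], shuffled[n_pre_test:]
    n_test = round(len(pre_test) * val_ratio)
    test, val = pre_test[:n_test], pre_test[n_test:]
    return train, val, test


# Made-up file names, for illustration only
names = [f"case_{i:03d}.nrrd" for i in range(100)]
train, val, test = two_stage_split(names)
splits = {"train": train, "validation": val, "test": test}

# Save the split and load it back, as a later notebook would
path = os.path.join(tempfile.mkdtemp(), "data_names_dict.txt")
with open(path, "w") as f:
    json.dump(splits, f, indent=2)
with open(path) as f:
    loaded = json.load(f)

print({key: len(value) for key, value in loaded.items()})
```

With 100 names this gives 80 training, 10 validation and 10 test files, and calling the function again with the same seed reproduces the same subsets.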


In [None]:
def get_files(data_path, folder):
    names = []
    folder_path = os.path.join(data_path, folder)
    for subfolder in os.listdir(folder_path):
        subfolder_path = os.path.join(folder_path, subfolder)
        for file in os.listdir(subfolder_path):
            if file.endswith(".nrrd") and not file.endswith(".seg.nrrd"):
                names.append(os.path.join(subfolder_path, file))
    return names

def split_data(names_list, pre_test_size = 0.2, val_ratio = 0.5, seed=5):
    train_names, pre_test_names = train_test_split(names_list, test_size = pre_test_size, random_state = seed)
    val_names, test_names = train_test_split(pre_test_names, test_size = val_ratio, random_state = seed)
    return train_names, val_names, test_names

Data_path = r"/kaggle/input/mri-data/Data" #Dataset path

Dongyang_data_names = get_files(Data_path, "Dongyang")
KiTS_data_names = get_files(Data_path, "KiTS")
Rider_data_names = get_files(Data_path, "Rider")

D_train_names, D_val_names, D_test_names = split_data(Dongyang_data_names)
K_train_names, K_val_names, K_test_names = split_data(KiTS_data_names)
R_train_names, R_val_names, R_test_names = split_data(Rider_data_names)

train_data_names = D_train_names + K_train_names + R_train_names
val_data_names = D_val_names + K_val_names + R_val_names
test_data_names = D_test_names + K_test_names + R_test_names

data_names_dict = {
    "train": train_data_names,
    "validation": val_data_names,
    "test": test_data_names
}

with open("/kaggle/working/data_names_dict.txt", "w") as file:
    json.dump(data_names_dict, file, indent=2)

After splitting the dataset, we use our custom `Dataset` class to load and preprocess the data. The class takes a list of file paths and handles the reading and formatting of image-mask pairs.

We initialize two datasets:
* `Training_dataset` – containing the training samples
* `Validation_dataset` – used to monitor model performance during training

The `.preprocess_data()` method is responsible for preprocessing the data – it loads the image and corresponding mask volumes, applies the necessary transformations, and prepares the data for training. For more details on the preprocessing steps, see the **`preprocessing_testing`** notebook and **`Dataset_loader.py`** file. At this stage, the data is ready for model training and hyperparameter optimization.

In [9]:
Training_dataset = Dataset(train_data_names)
Training_dataset.preprocess_data()

Validation_dataset = Dataset(val_data_names)
Validation_dataset.preprocess_data()

**Training Class Definition**

---

To modularize the training process and avoid code repetition, a custom `TrainModel` class was implemented. This class encapsulates the full training workflow, from loss computation and optimization to epoch-wise loss tracking and early stopping. The model itself is passed externally and not instantiated within the class.

Key features of the class include:
* **Separation of training and validation**: The `training_loop(validation=False)` method handles both training and validation. Its behavior changes depending on the `validation` flag, allowing shared logic while avoiding redundant code.
* **Early stopping mechanism**: A basic early stopping strategy is included. Training is stopped if the validation loss does not improve for a specified number of consecutive epochs (default: 3), helping prevent overfitting.
* **Loss tracking**: Both training and validation losses are recorded per epoch, making it easier to visualize learning dynamics or analyze performance trends.
* **Device handling**: The class moves the model to the appropriate device (`cuda` if available), and supports multi-GPU training via `DataParallel` (when running on Kaggle).
* **Model and loss access**: The trained model and recorded losses can be retrieved through `get_model()` and `get_losses()` respectively.
* **Mode-based behavior**: The class supports configurable output depending on the mode (`"Training"` or `"Study"`), such as printing per-epoch logs only when desired.
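
The early stopping rule can be tried out in isolation. The sketch below mirrors the patience logic of `check_early_stopping()` on a plain list of validation losses; `epochs_until_stop` is a hypothetical helper and the loss values are made up.

```python
def epochs_until_stop(val_losses, patience_limit=3):
    """Return the 1-based epoch at which patience-based early stopping triggers,
    or the total number of epochs if it never triggers."""
    best = float("inf")
    patience = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, patience = loss, 0  # improvement: reset the counter
        else:
            patience += 1             # no improvement this epoch
        if patience >= patience_limit:
            return epoch
    return len(val_losses)


# Made-up validation losses: three epochs without improvement after epoch 2
losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.30]
stopped_at = epochs_until_stop(losses)
print(stopped_at)
```

Here training would stop after epoch 5, so the later improvement to 0.30 is never reached. That is the usual trade-off of a small patience limit.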

For the segmentation task, the **DiceLoss** function from the MONAI library was selected, as it performs well in medical image segmentation tasks. As the optimizer, **Adam** was used — a widely adopted choice known for its reliable performance and adaptive learning rate behavior across a variety of deep learning tasks.
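
To make the loss more concrete, here is a small pure-Python sketch of a soft Dice loss on flat lists. It is an illustration of the general formula, not MONAI's implementation: `soft_dice_loss` and its `smooth` argument are assumptions, and MONAI's `DiceLoss` has further options that are not modeled here. The sigmoid step corresponds to the `sigmoid=True` setting used in `TrainModel`.

```python
import math


def soft_dice_loss(logits, targets, smooth=1e-5):
    """Illustrative soft Dice loss: 1 - 2 * overlap / (sum(pred) + sum(target))."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]  # sigmoid on raw outputs
    intersection = sum(p * g for p, g in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


targets = [1, 1, 0, 0]
good = soft_dice_loss([8.0, 8.0, -8.0, -8.0], targets)  # confident and correct
bad = soft_dice_loss([-8.0, -8.0, 8.0, 8.0], targets)   # confident and wrong
print(f"good: {good:.4f}, bad: {bad:.4f}")
```

A prediction that overlaps the mask almost perfectly gives a loss close to 0, while a prediction with almost no overlap gives a loss close to 1.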

This object-oriented structure improves code organization and reuse throughout the notebook.

**Note:** Some segmentation models from `torchvision`, such as **FCN** and **DeepLabV3**, return their output wrapped in a dictionary under the `"out"` key. Therefore, before computing the loss, we check if the model output is a dictionary:
```python
if isinstance(output, dict):
    output = output["out"]
```

In [10]:
class TrainModel:
    def __init__(self, model, training_loader, validation_loader, learning_rate, num_epochs, early_stopping = True, mode = "Study"):
        self.model = model
        self.training_loader = training_loader
        self.validation_loader = validation_loader
        self.learning_rate = learning_rate
        self.num_epochs = num_epochs
        self.early_stopping = early_stopping 
        self.mode = mode
        
        self.losses = []
        self.val_losses = []
        self.best_val_loss = float("inf")
        self.patience = 0
        self.patience_limit = 3

        self.obj_func = DiceLoss(sigmoid=True)
        self.optimizer = tc.optim.Adam(model.parameters(), lr=learning_rate)
        self.device = tc.device("cuda" if tc.cuda.is_available() else "cpu")

    def training_loop(self, validation=False):
        epoch_loss = 0
        loader = self.validation_loader if validation else self.training_loader

        for images, masks in loader:
            images, masks = images.to(self.device), masks.to(self.device)
            output = self.model(images.unsqueeze(1))
            if isinstance(output, dict):
                output = output["out"]
            loss = self.obj_func(output, masks.unsqueeze(1))
            if not validation:
                loss.backward()
                self.optimizer.step()
                self.optimizer.zero_grad()
            epoch_loss += loss.item() * images.size(0)

        mean_loss = epoch_loss / len(loader.dataset)
        if validation:
            self.val_losses.append(mean_loss)
        else:
            self.losses.append(mean_loss)
    
    def check_early_stopping(self, val_loss):
        if val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            self.patience = 0
        else:
            self.patience += 1
        if self.patience >= self.patience_limit:
            return True
        return False
    
    def print_epoch_info(self, epoch):
        print(f"\nCurrent epoch: {epoch+1}")
        print(f"Train loss: {self.losses[-1]:.4f}") 
        print(f"Validation loss: {self.val_losses[-1]:.4f}") 

    def train(self):
        self.model = self.model.to(self.device)
        if tc.cuda.device_count() > 1:  # Kaggle offers 2x T4 GPUs; DataParallel splits each batch between them
            self.model = nn.DataParallel(self.model)
        
        for epoch in range(self.num_epochs):
            self.model.train()
            self.training_loop(validation=False)
            self.model.eval()
            with tc.no_grad():
                self.training_loop(validation=True)
            
            if self.mode == "Training":
                self.print_epoch_info(epoch)
            if self.early_stopping and self.check_early_stopping(val_loss=self.val_losses[-1]):
                break

    def get_losses(self):
        return self.losses, self.val_losses
    
    def get_model(self):
        return self.model

**Hyperparameter Optimization and Training**

---

We begin this section by implementing two utility functions essential for the Optuna-based hyperparameter tuning workflow:
* `logging_callback()` - prints Optuna trial-related messages. For each trial, it displays the tested hyperparameters and indicates whether the trial achieved the best validation result so far. This makes the tuning process easier to follow, especially across many trials.
* `run_trial_training()` - shared across all model-specific objective functions, it encapsulates the core training routine for a single trial. It performs the following steps:
    * initializes the `TrainModel` class, which handles the training loop and optional early stopping,
    * starts the training process for the current model and hyperparameters,
    * after training, compares the final validation loss with the best loss so far; if a new best is found, it saves the trained model weights (`.pth` file) and stores training information such as loss curves and training time (`.pt` file),
    * prints a brief summary of the trial,
    * returns the final validation loss of the trial and the best validation loss so far to the calling objective function.

**Note:** The saved model is the one with the best final validation loss (after all epochs or early stopping), not a checkpoint from its best epoch. We also print the epoch with the lowest validation loss to better understand the training process.
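
The difference between the final loss and the best-epoch loss can be seen in a short sketch. `summarize_trial` is a hypothetical helper and the loss curves are made up; only the selection criterion (final validation loss) follows `run_trial_training()`.

```python
def summarize_trial(val_losses):
    """Return (final_loss, best_epoch, best_loss) for one validation curve."""
    best_loss = min(val_losses)
    best_epoch = val_losses.index(best_loss) + 1  # 1-based, as in the trial summary
    return val_losses[-1], best_epoch, best_loss


# Made-up validation curves for two trials
trials = {
    "trial_a": [0.40, 0.33, 0.35],  # best at epoch 2, then slightly worse
    "trial_b": [0.45, 0.38, 0.34],  # still improving at the last epoch
}

best_final = float("inf")
saved = None
for name, curve in trials.items():
    final_loss, best_epoch, best_loss = summarize_trial(curve)
    if final_loss < best_final:  # selection uses the final loss only
        best_final, saved = final_loss, name
    print(name, final_loss, best_epoch, best_loss)

print("saved:", saved)
```

In this example `trial_b` is the one kept, even though `trial_a` reached a lower loss at its second epoch, because only the loss after the last epoch enters the comparison.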




In [11]:
def logging_callback(study, trial):
    print(f"[Trial {trial.number}] Params: {trial.params}")
    if study.best_trial.number == trial.number:
        print("New best result")

optuna.logging.set_verbosity(optuna.logging.WARNING)

In [12]:
def run_trial_training(model, training_loader, validation_loader, learning_rate, num_epochs, best_val_loss, model_name, trial):
    start_time = time.time()
    train_model = TrainModel(model, training_loader, validation_loader, learning_rate, num_epochs, early_stopping = True, mode = "Study")
    train_model.train()
    final_losses, final_val_losses = train_model.get_losses()
    end_time =  time.time()
    
    if final_val_losses[-1] < best_val_loss:
        best_val_loss = final_val_losses[-1]
        training_info = {'train_loss_lst': final_losses, 'val_loss_lst': final_val_losses, 'time': end_time-start_time}
        model = train_model.get_model()
        tc.save(training_info, f"/kaggle/working/{model_name}_info.pt")
        tc.save(model.state_dict(), f"/kaggle/working/{model_name}_trained.pth")

    best_epoch = final_val_losses.index(min(final_val_losses)) + 1
    training_time = time.strftime("%H:%M:%S", time.gmtime(end_time-start_time))
    print(f"\n--- Trial {trial.number} ---")
    print(f"Final val loss: {final_val_losses[-1]:.4f}")
    print(f"Best epoch: {best_epoch} - loss value: {min(final_val_losses)}")
    print(f"Training time: {training_time}")
    
    return final_val_losses[-1], best_val_loss

After defining the utility functions, we perform hyperparameter optimization using **Optuna** for each segmentation model. Each model type (UNet, FCN, DeepLabV3) has a dedicated objective function that defines the hyperparameters to be tuned, the model architecture, and the training procedure. Optimization is based on the **final validation loss** at the end of each training run. A model is saved only if its final validation loss is the best among all trials so far, so the saved model is the one with the best end-of-training performance. It will later be used for evaluation on the test set (model_ebal.ipynb).
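
One practical detail when reloading the saved weights: `TrainModel.train()` wraps the model in `nn.DataParallel` whenever more than one GPU is visible, and `get_model()` returns that wrapper, so the keys of the saved `state_dict` may start with `module.`. If a later notebook builds a plain, unwrapped model, that prefix would have to be removed first. A minimal sketch, where `strip_module_prefix` is a hypothetical helper and the key names and values are placeholders rather than real tensors:

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DataParallel prefix removed."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Placeholder values stand in for tensors; key names are illustrative only
saved = {"module.backbone.conv1.weight": "w0", "module.classifier.4.bias": "b0"}
cleaned = strip_module_prefix(saved)
print(list(cleaned))
```

Keys without the prefix are left untouched, so the helper is safe to apply to weights saved from a single-GPU run as well.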

The number of optimization trials for each model was selected based on their computational complexity:
* **UNet**: 20 trials,
* **FCN** (ResNet-50 backbone):
    * 5 trials without pre-trained weights,
    * 5 trials with pre-trained weights,
* **DeepLabV3** (MobileNetV3-Large backbone):
    * 10 trials without pre-trained weights,
    * 10 trials with pre-trained weights.

**Note:** Unlike U-Net, which was specifically designed for medical image segmentation, FCN and DeepLabV3 (with MobileNet or ResNet backbones) were originally developed for general-purpose semantic segmentation tasks. As a result, their architectures assume RGB input and multi-class outputs, which requires structural adjustments when applied to medical segmentation tasks — such as changing the first convolutional layer to accept single-channel input and modifying the classifier to output a binary mask.
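
The effect of the input-channel change can be read off the weight shapes. The sketch below only does the arithmetic for the ResNet-50 stem convolution (64 filters of size 7x7); the helper name is hypothetical. Because the replacement layer has a different shape, it is a newly created `nn.Conv2d` and starts from a fresh initialization, even in the variants that otherwise load pretrained weights.

```python
import math


def conv2d_weight_shape(in_channels, out_channels, kernel_size):
    """Shape of a 2D convolution weight tensor: (out, in, kernel_h, kernel_w)."""
    return (out_channels, in_channels, kernel_size, kernel_size)


rgb_stem = conv2d_weight_shape(3, 64, 7)   # stem as shipped, expecting RGB input
gray_stem = conv2d_weight_shape(1, 64, 7)  # replacement for single-channel slices

print(rgb_stem, math.prod(rgb_stem))
print(gray_stem, math.prod(gray_stem))
```

The single-channel stem has one third of the weights of the RGB stem (3136 instead of 9408), so the pretrained filters cannot be copied over without some form of conversion.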

In [25]:
best_val_loss = float("inf")

def objective(trial):
    global best_val_loss
    channels = trial.suggest_categorical("channels", [[16, 32, 64, 128], [16, 32, 64, 128, 256]])
    dropout = trial.suggest_float("dropout", 0.0, 0.2)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    num_epochs = trial.suggest_int("num_epochs", 2, 15)
    

    training_loader = tc.utils.data.DataLoader(Training_dataset, batch_size=32, shuffle=True)
    validation_loader = tc.utils.data.DataLoader(Validation_dataset, batch_size=32, shuffle=True)

    strides = (2,) * (len(channels) - 1)
    model = UNet(
        spatial_dims = 2,
        in_channels=1,
        out_channels=1,
        channels=channels,
        strides=strides,
        dropout=dropout
    )

    last_loss, best_val_loss = run_trial_training(model, training_loader, validation_loader, learning_rate, num_epochs, best_val_loss, model_name="UNet", trial=trial)
    
    tc.cuda.empty_cache()
    return last_loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20, callbacks=[logging_callback])

print(f"\nBest trial: #{study.best_trial.number}")
print(f"  Value (val_loss): {study.best_trial.value:.4f}")
print(f"  Params: {study.best_trial.params}")


--- Trial 0 ---
Final val loss: 0.3488
Best epoch: 3 - loss value: 0.3360832684304808
Training time: 00:01:41
[Trial 0] Params: {'channels': [16, 32, 64, 128, 256], 'dropout': 0.01077287031055012, 'learning_rate': 0.00866984140875161, 'num_epochs': 12}
New best result

--- Trial 1 ---
Final val loss: 0.3331
Best epoch: 9 - loss value: 0.32696925308567387
Training time: 00:02:52
[Trial 1] Params: {'channels': [16, 32, 64, 128], 'dropout': 0.11292249111318349, 'learning_rate': 0.0011799895057967428, 'num_epochs': 12}
New best result

--- Trial 2 ---
Final val loss: 0.3250
Best epoch: 9 - loss value: 0.32497288228004806
Training time: 00:02:09
[Trial 2] Params: {'channels': [16, 32, 64, 128], 'dropout': 0.01754145086034069, 'learning_rate': 0.009957769564287594, 'num_epochs': 9}
New best result

--- Trial 3 ---
Final val loss: 0.3921
Best epoch: 2 - loss value: 0.3710207793695685
Training time: 00:00:43
[Trial 3] Params: {'channels': [16, 32, 64, 128], 'dropout': 0.0150118858947073, 'lea

In [None]:
best_val_loss = float("inf")

def pytorch_objective_FCN(trial):
    global best_val_loss
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    num_epochs = trial.suggest_int("num_epochs", 2, 15)

    training_loader = tc.utils.data.DataLoader(Training_dataset, batch_size=64, shuffle=True)
    validation_loader = tc.utils.data.DataLoader(Validation_dataset, batch_size=64, shuffle=True)

    model = models.segmentation.fcn_resnet50(weights=None, num_classes=1)
    model.backbone.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) #Adjusting the first convolutional layer to accept single-channel (grayscale) input

    last_loss, best_val_loss = run_trial_training(model, training_loader, validation_loader, learning_rate, num_epochs, best_val_loss, model_name="FCN", trial=trial)
    
    tc.cuda.empty_cache()
    return last_loss

study = optuna.create_study(direction="minimize")
study.optimize(pytorch_objective_FCN, n_trials=5, callbacks=[logging_callback])

print(f"\nBest trial: #{study.best_trial.number}")
print(f"  Value (val_loss): {study.best_trial.value:.4f}")
print(f"  Params: {study.best_trial.params}")


--- Trial 0 ---
Final val loss: 0.3838
Best epoch: 8 - loss value: 0.3838325616157463
Training time: 00:52:30
[Trial 0] Params: {'learning_rate': 4.7326111416923234e-05, 'num_epochs': 8}
New best result

--- Trial 1 ---
Final val loss: 0.3375
Best epoch: 6 - loss value: 0.31712396952742683
Training time: 00:58:37
[Trial 1] Params: {'learning_rate': 0.0004635054719612116, 'num_epochs': 10}
New best result

--- Trial 2 ---
Final val loss: 0.5405
Best epoch: 14 - loss value: 0.5404831943318027
Training time: 01:32:21
[Trial 2] Params: {'learning_rate': 1.4579548300895439e-05, 'num_epochs': 14}

--- Trial 3 ---
Final val loss: 0.6268
Best epoch: 5 - loss value: 0.6267939262654876
Training time: 00:33:02
[Trial 3] Params: {'learning_rate': 1.2871503088557745e-05, 'num_epochs': 5}

--- Trial 4 ---
Final val loss: 0.3624
Best epoch: 7 - loss value: 0.35577251570235535
Training time: 01:02:46
[Trial 4] Params: {'learning_rate': 0.005876816821760395, 'num_epochs': 15}

Best trial: #1
  Value (

In [None]:
best_val_loss = float("inf")

def pytorch_objective_FCN_pretrained(trial):
    global best_val_loss
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    num_epochs = trial.suggest_int("num_epochs", 2, 15)

    training_loader = tc.utils.data.DataLoader(Training_dataset, batch_size=64, shuffle=True)
    validation_loader = tc.utils.data.DataLoader(Validation_dataset, batch_size=64, shuffle=True)

    model = models.segmentation.fcn_resnet50(weights=FCN_ResNet50_Weights.DEFAULT) #Pass an enum member (DEFAULT) rather than the enum class itself
    model.backbone.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    model.classifier[4] = nn.Conv2d(512, 1, kernel_size=1) #Replace the final classifier layer to output a single class (binary segmentation)

    last_loss, best_val_loss = run_trial_training(model, training_loader, validation_loader, learning_rate, num_epochs, best_val_loss, model_name="FCN_pretrained", trial=trial)
    
    tc.cuda.empty_cache()
    return last_loss

study = optuna.create_study(direction="minimize")
study.optimize(pytorch_objective_FCN_pretrained, n_trials=5, callbacks=[logging_callback])

print(f"\nBest trial: #{study.best_trial.number}")
print(f"  Value (val_loss): {study.best_trial.value:.4f}")
print(f"  Params: {study.best_trial.params}")

Downloading: "https://download.pytorch.org/models/fcn_resnet50_coco-1167a1af.pth" to /root/.cache/torch/hub/checkpoints/fcn_resnet50_coco-1167a1af.pth
100%|██████████| 135M/135M [00:00<00:00, 186MB/s]  



--- Trial 0 ---
Final val loss: 0.3513
Best epoch: 7 - loss value: 0.35128252509334457
Training time: 00:46:35
[Trial 0] Params: {'learning_rate': 0.00012397419410209337, 'num_epochs': 7}
New best result

--- Trial 1 ---
Final val loss: 0.4337
Best epoch: 9 - loss value: 0.4336885421256083
Training time: 01:00:24
[Trial 1] Params: {'learning_rate': 4.7121657525888465e-05, 'num_epochs': 9}

--- Trial 2 ---
Final val loss: 0.4228
Best epoch: 9 - loss value: 0.42280878056396276
Training time: 01:00:18
[Trial 2] Params: {'learning_rate': 4.8698753287319715e-05, 'num_epochs': 9}

--- Trial 3 ---
Final val loss: 0.3530
Best epoch: 2 - loss value: 0.35302645229692997
Training time: 00:13:11
[Trial 3] Params: {'learning_rate': 0.0006344993994652242, 'num_epochs': 2}

--- Trial 4 ---
Final val loss: 0.4913
Best epoch: 2 - loss value: 0.49129426308106117
Training time: 00:13:18
[Trial 4] Params: {'learning_rate': 0.0002665698117907311, 'num_epochs': 2}

Best trial: #0
  Value (val_loss): 0.3513

In [15]:
best_val_loss = float("inf")

def objective_deeplabv3(trial):
    global best_val_loss
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    num_epochs = trial.suggest_int("num_epochs", 2, 20)

    training_loader = tc.utils.data.DataLoader(Training_dataset, batch_size=64, shuffle=True)
    validation_loader = tc.utils.data.DataLoader(Validation_dataset, batch_size=64, shuffle=True)

    model = models.segmentation.deeplabv3_mobilenet_v3_large(weights=None, num_classes=1)
    model.backbone._modules["0"]._modules["0"] = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1, bias=False)

    last_loss, best_val_loss = run_trial_training(model, training_loader, validation_loader, learning_rate, num_epochs, best_val_loss, model_name="DeepLabv3", trial=trial)
    
    tc.cuda.empty_cache()
    return last_loss

study = optuna.create_study(direction="minimize")
study.optimize(objective_deeplabv3, n_trials=10, callbacks=[logging_callback])

print(f"\nBest trial: #{study.best_trial.number}")
print(f"  Value (val_loss): {study.best_trial.value:.4f}")
print(f"  Params: {study.best_trial.params}")


--- Trial 0 ---
Final val loss: 0.6195
Best epoch: 3 - loss value: 0.6195443281440908
Training time: 00:03:27
[Trial 0] Params: {'learning_rate': 0.00041286358842804703, 'num_epochs': 3}
New best result

--- Trial 1 ---
Final val loss: 0.4825
Best epoch: 10 - loss value: 0.48248552621911617
Training time: 00:11:28
[Trial 1] Params: {'learning_rate': 0.0015652831479416713, 'num_epochs': 10}
New best result

--- Trial 2 ---
Final val loss: 0.4845
Best epoch: 7 - loss value: 0.4845328123589498
Training time: 00:08:01
[Trial 2] Params: {'learning_rate': 0.0033826286582314125, 'num_epochs': 7}

--- Trial 3 ---
Final val loss: 0.5356
Best epoch: 6 - loss value: 0.5243412455179524
Training time: 00:10:18
[Trial 3] Params: {'learning_rate': 0.003658942384153327, 'num_epochs': 11}

--- Trial 4 ---
Final val loss: 0.6404
Best epoch: 2 - loss value: 0.6404491218926447
Training time: 00:02:17
[Trial 4] Params: {'learning_rate': 0.001206018114562143, 'num_epochs': 2}

--- Trial 5 ---
Final val los

In [14]:
best_val_loss = float("inf")

def objective_deeplabv3_pretrained(trial):
    global best_val_loss
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    num_epochs = trial.suggest_int("num_epochs", 2, 20)

    training_loader = tc.utils.data.DataLoader(Training_dataset, batch_size=64, shuffle=True)
    validation_loader = tc.utils.data.DataLoader(Validation_dataset, batch_size=64, shuffle=True)

    model = models.segmentation.deeplabv3_mobilenet_v3_large(weights=DeepLabV3_MobileNet_V3_Large_Weights.DEFAULT) #Pass an enum member (DEFAULT) rather than the enum class itself
    model.backbone._modules["0"]._modules["0"] = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1, bias=False)
    model.classifier[4] = nn.Conv2d(256, 1, kernel_size=1)

    last_loss, best_val_loss = run_trial_training(model, training_loader, validation_loader, learning_rate, num_epochs, best_val_loss, model_name="DeepLabv3_pretrained", trial=trial)
    
    tc.cuda.empty_cache()
    return last_loss

study = optuna.create_study(direction="minimize")
study.optimize(objective_deeplabv3_pretrained, n_trials=10, callbacks=[logging_callback])

print(f"\nBest trial: #{study.best_trial.number}")
print(f"  Value (val_loss): {study.best_trial.value:.4f}")
print(f"  Params: {study.best_trial.params}")

Downloading: "https://download.pytorch.org/models/deeplabv3_mobilenet_v3_large-fc3c493d.pth" to /root/.cache/torch/hub/checkpoints/deeplabv3_mobilenet_v3_large-fc3c493d.pth
100%|██████████| 42.3M/42.3M [00:00<00:00, 163MB/s]



--- Trial 0 ---
Final val loss: 0.5207
Best epoch: 9 - loss value: 0.5207283987469485
Training time: 00:10:58
[Trial 0] Params: {'learning_rate': 0.00032718360242898033, 'num_epochs': 9}
New best result

--- Trial 1 ---
Final val loss: 0.4534
Best epoch: 12 - loss value: 0.4534124423797243
Training time: 00:14:38
[Trial 1] Params: {'learning_rate': 0.0011963329885838906, 'num_epochs': 12}
New best result

--- Trial 2 ---
Final val loss: 0.6264
Best epoch: 19 - loss value: 0.6263532185646855
Training time: 00:23:10
[Trial 2] Params: {'learning_rate': 1.773118724942854e-05, 'num_epochs': 19}

--- Trial 3 ---
Final val loss: 0.5445
Best epoch: 6 - loss value: 0.5134639170494597
Training time: 00:08:30
[Trial 3] Params: {'learning_rate': 0.0024503796227529624, 'num_epochs': 7}

--- Trial 4 ---
Final val loss: 0.6070
Best epoch: 2 - loss value: 0.6069670074443189
Training time: 00:02:25
[Trial 4] Params: {'learning_rate': 0.0032835626906802824, 'num_epochs': 2}

--- Trial 5 ---
Final val l