### Introduction
This notebook addresses the problem of **depth estimation in comics images**, which presents unique challenges compared to natural image depth estimation. Comics often feature exaggerated perspectives, inconsistent object scaling, and non-realistic visual elements, making it difficult to infer accurate depth information. The goal of this project is to predict two types of depth values in comics images: **intra-depth** (depth within objects) and **inter-depth** (depth between objects).

To tackle this problem, we implement a deep learning solution using a **Convolutional Neural Network (CNN)**. The network is designed to extract hierarchical features from the images and predict depth values. The notebook follows a structured workflow:
1. **Data Loading and Preprocessing**: Custom dataset loading that handles comics images and their depth annotations.
2. **Model Architecture**: Two model versions are explored, including a baseline model and an improved model, which incorporates batch normalization, additional convolutional layers, and dropout for better generalization.
3. **Training and Evaluation**: The models are trained using PyTorch, and their performance is evaluated on validation and test sets using mean squared error (MSE) as the evaluation metric.
4. **Inference**: The trained model is applied to unseen test data to generate depth predictions, which are saved for further analysis.

**Note:** to import the dataset needed for the Colab notebook, you need to add a shortcut to the shared folder to your Drive

- Click on the shared folder name
- "Add shortcut to Drive"
- "My Drive" 

You can also run the notebook directly by opening it from the shared folder.

In [16]:
import json
import pytz
import os
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from sklearn.model_selection import train_test_split
from datetime import datetime
from sklearn.metrics import mean_squared_error
import pandas as pd
from google.colab import drive
import glob
from torch.optim.lr_scheduler import ReduceLROnPlateau

drive.mount('/content/drive')
PROJECT_PATH = '/content/drive/MyDrive/Progetto_Deep_Learning/'      #mettere il proprio path al progetto

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device: ", device)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Device:  cpu


# **Data processing**

In this section, we define a custom dataset class, `DepthDataset`, which is essential for loading and preprocessing the comics images along with their corresponding depth annotations. This class is specifically designed to handle the unique structure of our dataset, where each image is associated with two types of depth values: **intra-depth** (depth within an object) and **inter-depth** (depth between objects).

The class performs the following key tasks:
1. **Loading Image Data**: The images are loaded from a specified directory, resized to a uniform size of 128x128 pixels, and normalized to a [0, 1] range.
2. **Loading Annotations**: Depth annotations are read from a JSON file, which provides the intra-depth and inter-depth values for each image.
3. **Preprocessing**: The class applies any specified transformations (e.g., augmentations or tensor conversions) and ensures that the images and depth values are properly formatted for input into a deep learning model.

In [17]:
# Dataset class for custom dataset loading
class DepthDataset(Dataset):
    def __init__(self, data_dir, annotations_file, img_size=(128, 128), transform=None):
        """
        Args:
        - data_dir (str): La directory dove si trovano le immagini.
        - annotations_file (str): Il file JSON contenente le annotazioni.
        - img_size (tuple): La dimensione a cui ridimensionare le immagini (default: (128, 128)).
        - transform: Trasformazioni da applicare alle immagini.
        """
        self.data_dir = data_dir
        self.img_size = img_size
        self.transform = transform

        with open(annotations_file, 'r') as f:
            self.annotations = json.load(f)

        self.image_paths = []
        self.intradepth = []
        self.interdepth = []

        # Processa ogni immagine e le sue annotazioni
        for img_info in self.annotations['images']:
            img_id = img_info['id']
            img_name = img_info['file_name']
            img_path = os.path.join(data_dir, img_name)

            # Verifica che l'immagine esista
            if not os.path.exists(img_path):
                print(f"Warning: Image {img_name} not found.")
                continue

            self.image_paths.append(img_path)

            # Trova le annotazioni per questa immagine
            for annotation in self.annotations['annotations']:
                if annotation['image_id'] == img_id:
                    self.intradepth.append(annotation['attributes'].get('Intradepth', 0))
                    self.interdepth.append(annotation['attributes'].get('Interdepth', 0))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = cv2.imread(img_path)
        image = cv2.resize(image, self.img_size)
        image = image / 255.0  # Normalizza

        if self.transform:
            image = self.transform(image)

        # Converte l'immagine in torch.float32
        image = image.float()

        intradepth = torch.tensor(self.intradepth[idx], dtype=torch.float32)
        interdepth = torch.tensor(self.interdepth[idx], dtype=torch.float32)

        return image, intradepth, interdepth

# **Model architecture definition**
In this section, we define two convolutional neural network (CNN) architectures for the task of depth estimation in comics images: the **default model** and an enhanced **Version 2 model**. These models are designed to predict two types of depth values—**intra-depth** (depth within an object) and **inter-depth** (depth between objects).

1. **Default Model**:
   The default model is a basic CNN architecture that consists of two convolutional layers followed by max-pooling for downsampling, a fully connected layer, and two separate output layers for intra-depth and inter-depth predictions. While this model provides a solid starting point, its simplicity may limit its ability to generalize well on complex visual scenes in comics.

.

In [18]:
# Modello default
class DepthOrderingModel(nn.Module):
    def __init__(self):
        super(DepthOrderingModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(128 * 128 * 128, 256)
        self.fc_inter = nn.Linear(256, 1)
        self.fc_intra = nn.Linear(256, 1)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.pool(x)
        x = torch.relu(self.conv2(x))
        x = self.upsample(x)
        x = self.flatten(x)
        x = torch.relu(self.fc1(x))
        inter_depth = self.fc_inter(x)
        intra_depth = self.fc_intra(x)
        return inter_depth, intra_depth

2. **Version 2 Model**:
   The Version 2 model introduces several improvements over the default model:
   - **Additional Convolutional Layer**: A third convolutional layer is added to extract more complex features from the images.
   - **Batch Normalization**: After each convolutional layer, batch normalization is applied to stabilize and accelerate training.
   - **Global Average Pooling**: Replaces the flattening operation to reduce the number of parameters and improve generalization.
   - **Dropout**: A 10% dropout layer is included before the fully connected layer to prevent overfitting

In [19]:
#modello v2
class DepthOrderingModel(nn.Module):
    def __init__(self):
        super(DepthOrderingModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(128)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(256)

        self.pool = nn.MaxPool2d(2, 2)
        self.global_avg_pool = nn.AdaptiveAvgPool2d(1)  # Global average pooling
        self.dropout = nn.Dropout(p=0.1)  # Dropout with 10% probability

        self.fc1 = nn.Linear(256, 128)
        self.fc_inter = nn.Linear(128, 1)
        self.fc_intra = nn.Linear(128, 1)

    def forward(self, x):
        x = torch.relu(self.bn1(self.conv1(x)))
        x = self.pool(x)

        x = torch.relu(self.bn2(self.conv2(x)))
        x = self.pool(x)

        x = torch.relu(self.bn3(self.conv3(x)))
        x = self.global_avg_pool(x)  # Shape: (batch_size, 256, 1, 1)

        x = x.view(x.size(0), -1)  # Flatten the tensor
        x = self.dropout(torch.relu(self.fc1(x)))

        inter_depth = self.fc_inter(x)
        intra_depth = self.fc_intra(x)
        return inter_depth, intra_depth


# **Training**
In this section, we define the complete training pipeline for the **Depth Ordering Model**, ensuring that the model can be trained, validated, and saved efficiently. This part of the notebook is responsible for managing the training process, data loading, and model saving in an organized and automated manner.

Key steps in this cell include:

1. **Data Preparation**:
   - We define the transformations needed for the images (converting to tensors) and load the dataset from the project directory.
   - The dataset is split into training and validation sets, following an 80/20 ratio.

2. **Model Initialization**:
   - The `DepthOrderingModel` is instantiated and transferred to the available device (GPU if available).
   
3. **Training Loop**:
   - The model is trained over 30 epochs, with each epoch consisting of a forward pass, loss calculation, backpropagation, and optimizer step.
   - The model is evaluated on the validation set after each epoch to monitor performance.
   - The `ReduceLROnPlateau` scheduler dynamically adjusts the learning rate if the validation loss plateaus.

4. **Early Stopping and Model Saving**:
   - The script implements early stopping, terminating the training process if the validation loss does not improve for 5 consecutive epochs.
   - The best performing model (in terms of validation loss) is saved automatically with a timestamp in the filename, ensuring that the optimal model is preserved without manual intervention.

In [20]:
# Definisco le trasformazioni per le immagini
transform = transforms.Compose([
    transforms.ToTensor()
])

# Dati di training
data_dir = PROJECT_PATH + 'data/images/train/'  #nella cartella del progetto
annotations_file = PROJECT_PATH + 'data/annotations/train-annotations.json'
saving_directory = PROJECT_PATH + 'models/best_model_'+str(datetime.now(pytz.timezone('Europe/Rome')).strftime("%Y-%m-%d_%H:%M")) + '.pth'

dataset = DepthDataset(data_dir, annotations_file, transform=transform)

# Divido il dataset in training e validation set
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# Istanzia il modello e lo sposta sulla GPU se disponibile
model = DepthOrderingModel().to(device)

# Initialize optimizer and learning rate scheduler
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, 'min', patience=3, factor=0.5)
criterion = nn.MSELoss()

num_epochs = 30  # Increased epochs
patience = 5
best_val_loss = float('inf')
early_stop_counter = 0

for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    for inputs, y_intra, y_inter in train_loader:
        inputs, y_intra, y_inter = inputs.to(device), y_intra.to(device), y_inter.to(device)

        optimizer.zero_grad()

        outputs_inter, outputs_intra = model(inputs)
        loss_inter = criterion(outputs_inter.squeeze(), y_inter)
        loss_intra = criterion(outputs_intra.squeeze(), y_intra)

        # Loss balancing (if needed, based on dataset scale)
        loss = loss_inter + loss_intra

        loss.backward()
        optimizer.step()

        train_loss += loss.item()

    train_loss /= len(train_loader)
    print(f"Epoch {epoch+1}/{num_epochs}, Training Loss: {train_loss:.4f}")

    # Validation step
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for inputs, y_intra, y_inter in val_loader:
            inputs, y_intra, y_inter = inputs.to(device), y_intra.to(device), y_inter.to(device)
            outputs_inter, outputs_intra = model(inputs)
            loss_inter = criterion(outputs_inter.squeeze(), y_inter)
            loss_intra = criterion(outputs_intra.squeeze(), y_intra)
            val_loss += (loss_inter.item() + loss_intra.item())

    val_loss /= len(val_loader)
    print(f"Validation Loss: {val_loss:.4f}")

    # Learning rate scheduler
    scheduler.step(val_loss)

    # Early stopping logic
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), saving_directory)  # Save the best model
        early_stop_counter = 0
    else:
        early_stop_counter += 1

    if early_stop_counter >= patience:
        print("Early stopping triggered")
        break

print("Training completed.")
print("Best validation loss:", best_val_loss)

Epoch 1/30, Training Loss: 2.5653
Validation Loss: 2.3268
Validation Loss: 1.8595
Epoch 3/30, Training Loss: 1.6458
Validation Loss: 2.5933
Epoch 4/30, Training Loss: 1.5324
Validation Loss: 2.3803
Epoch 5/30, Training Loss: 1.5065
Validation Loss: 2.9026
Epoch 6/30, Training Loss: 1.4849
Validation Loss: 3.7843
Epoch 7/30, Training Loss: 1.4303
Validation Loss: 2.9217
Early stopping triggered
Training completed.
Best validation loss: 1.8595452457666397


# **Model Evaluation**
This section of the notebook performs the evaluation of the latest trained **Depth Ordering Model** on the validation dataset. It automates the process of loading the most recently trained model, applying it to the validation images, and calculating the evaluation metrics, specifically the **mean squared error (MSE)** for both **intra-depth** and **inter-depth** predictions.

The evaluation process is structured as follows:

1. **Loading the Latest Model**:
   - The script automatically retrieves the most recent model from the saved models directory. This ensures that you are always evaluating the latest version without manually specifying the model path.

2. **Ground Truth and Image Loading**:
   - The ground truth annotations (intra-depth and inter-depth) are loaded from a JSON file.
   - Validation images are preprocessed, including resizing, normalization, and conversion into PyTorch tensors, preparing them for input to the model.

3. **Model Inference**:
   - The validation images are passed through the model to generate predictions for both intra-depth and inter-depth.
   - These predictions are flattened into 1D arrays to match the structure of the ground truth annotations.

4. **Evaluation**:
   - The predictions are compared against the ground truth annotations using the **mean squared error (MSE)** metric for both intra-depth and inter-depth. The overall MSE is also calculated by averaging these two values.
   
5. **Saving Results**:
   - The evaluation metrics (MSE values) are saved to a text file in the results directory, along with the date and time of the evaluation and the model path.
   - This ensures that each evaluation is logged and can be easily referenced in the future.

In [22]:
%%time
# geth the last model trained
directory = PROJECT_PATH + 'models/'
list_of_files = sorted(filter(os.path.isfile, glob.glob(directory + '*')), key=os.path.getmtime)
latest_file = list_of_files[-1]

#paths
model_path = latest_file  # Path al modello PyTorch
val_data_dir = PROJECT_PATH + 'data/images/val/'              #Caricamento dei dati di validazione
gt_path = PROJECT_PATH + 'data/annotations/val-annotations.json'  # Path agli annotations del ground truth
results_dir = PROJECT_PATH + 'results'           #risultati della valutazione

# Funzione per valutare il modello
def evaluate_model(predictions, ground_truth):
    pred_inter_depths = []
    pred_intra_depths = []
    true_inter_depths = []
    true_intra_depths = []

    # Processa ogni previsione
    for index, row in predictions.iterrows():
        img_id = str(row['img_id'])
        category_id = str(row['category_id'])

        pred_inter_depth = row['pred_Interdepth']
        pred_intra_depth = row['pred_Intradepth']

        # Verifica se l'immagine e la categoria esistono nel ground truth
        if img_id in ground_truth and category_id in ground_truth[img_id]:
            true_inter_depth = ground_truth[img_id][category_id]['Interdepth']
            true_intra_depth = ground_truth[img_id][category_id]['Intradepth']

            # Aggiungi previsioni e ground truth
            pred_inter_depths.append(pred_inter_depth)
            pred_intra_depths.append(pred_intra_depth)
            true_inter_depths.append(true_inter_depth)
            true_intra_depths.append(true_intra_depth)
        else:
            print(f"Warning: Image ID {img_id} or Category ID {category_id} not found in ground truth")

    # Calcolo del MSE
    if len(true_inter_depths) > 0 and len(pred_inter_depths) > 0:
        mse_inter = mean_squared_error(true_inter_depths, pred_inter_depths)
        mse_intra = mean_squared_error(true_intra_depths, pred_intra_depths)
        mse_overall = (mse_inter + mse_intra) / 2
    else:
        mse_inter, mse_intra, mse_overall = None, None, None
        print("No valid data for MSE calculation.")

    return mse_inter, mse_intra, mse_overall


# Carica le annotazioni ground truth
with open(gt_path, 'r') as f:
    ground_truth_data = json.load(f)

# Trasforma i dati del ground truth per un lookup più semplice
ground_truth = {}
for ann in ground_truth_data['annotations']:
    img_id = str(ann['image_id'])
    category_id = str(ann['category_id'])
    intradepth = ann['attributes'].get('Intradepth', 0)
    interdepth = ann['attributes'].get('Interdepth', 0)
    if img_id not in ground_truth:
        ground_truth[img_id] = {}
    ground_truth[img_id][category_id] = {
        'Intradepth': intradepth,
        'Interdepth': interdepth
    }

# Definizione del dispositivo (CPU o GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Carica il modello PyTorch
model = DepthOrderingModel()  # Usa la stessa architettura del modello usata in fase di training
if torch.cuda.is_available():
    model.load_state_dict(torch.load(model_path, weights_only= True)) # Carica lo stato del modello su gpu
else:
    model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu'), weights_only= True)) # Carica lo stato del modello e mappa i pesi sulla CPU se la GPU non è disponibile
model.to(device)
model.eval()


# Trasformazione delle immagini in tensori PyTorch
transform = transforms.Compose([
    transforms.ToTensor(),  # Converte le immagini in tensor
    transforms.Resize((128, 128)),  # Ridimensiona le immagini
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalizza le immagini
])

# Carica e processa le immagini
X_val = []
img_ids = []
for img_info in ground_truth_data['images']:
    img_id = img_info['id']
    img_name = img_info['file_name']
    img_path = os.path.join(val_data_dir, img_name)
    image = cv2.imread(img_path)
    if image is not None:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Converti in RGB
        image = transform(image).to(device)  # Applica la trasformazione
        X_val.append(image)
        img_ids.append(img_id)

# Converti la lista di immagini in un batch di tensori
X_val = torch.stack(X_val)

# Ottieni le previsioni dal modello PyTorch
with torch.no_grad():
    pred_intra_depth, pred_inter_depth = model(X_val)

# Trasforma le previsioni in array 1D
pred_intra_depth = pred_intra_depth.cpu().numpy().flatten()
pred_inter_depth = pred_inter_depth.cpu().numpy().flatten()

# Assicurati che tutte le liste abbiano la stessa lunghezza
category_ids = [cat_id for img in ground_truth.values() for cat_id in img.keys()]
min_length = min(len(pred_intra_depth), len(pred_inter_depth), len(img_ids), len(category_ids))

# Taglia gli array alla lunghezza minima
pred_intra_depth = pred_intra_depth[:min_length]
pred_inter_depth = pred_inter_depth[:min_length]
img_ids = img_ids[:min_length]
category_ids = category_ids[:min_length]

# Crea il DataFrame delle previsioni
predictions_df = pd.DataFrame({
    'pred_Intradepth': pred_intra_depth,
    'pred_Interdepth': pred_inter_depth,
    'img_id': img_ids,
    'category_id': category_ids
})

# Valuta il modello
mse_inter, mse_intra, mse_overall = evaluate_model(predictions_df, ground_truth)


if not os.path.exists(results_dir):
    os.makedirs(results_dir)

# Ottieni la data e l'ora corrente
current_time = datetime.now(pytz.timezone('Europe/Rome')).strftime("%Y-%m-%d %H:%M")

# Scrivi i risultati nel file, includendo data e ora
with open(os.path.join(results_dir, 'evaluation_metrics.txt'), 'a') as file:
    file.write(f"\nEvaluation written on: {current_time}\n")
    file.write(f'Model: {model_path}\n')
    file.write(f'MSE of Inter-depth: {mse_inter}\n')
    file.write(f'MSE of Intra-depth: {mse_intra}\n')
    file.write(f'Overall MSE: {mse_overall}\n')

print(f'MSE of Inter-depth: {mse_inter}')
print(f'MSE of Intra-depth: {mse_intra}')
print(f'Overall MSE: {mse_overall}')


MSE of Inter-depth: 1.322116233084517
MSE of Intra-depth: 2.4228754494645957
Overall MSE: 1.8724958412745565
CPU times: user 11.8 s, sys: 1.13 s, total: 12.9 s
Wall time: 13.6 s


# **Model Inference**
This section of the notebook focuses on running **inference** using the latest trained **Depth Ordering Model** on a separate test dataset. The objective is to apply the trained model to unseen data, generate depth predictions, and store the results for further evaluation or analysis.

Key steps in this cell include:

1. **Loading the Latest Model**:
   - The script automatically identifies and loads the most recently trained model from the saved models directory. This ensures consistency by always using the latest version without manual intervention.

2. **Loading Test Annotations**:
   - The ground truth depth annotations for the test dataset are loaded from a JSON file, which provides **intra-depth** and **inter-depth** values for each image category in the test set.
   - A lookup dictionary is created to efficiently access the test segments data during prediction processing.

3. **Data Preprocessing**:
   - Test images are preprocessed through transformations such as resizing, normalization, and conversion into PyTorch tensors, making them suitable for input into the model.

4. **Model Inference**:
   - The test images are passed through the model to obtain predictions for intra-depth and inter-depth.
   - The predicted depth values are flattened into 1D arrays.

5. **Storing Results**:
   - The predictions, along with their associated image and category IDs, are compiled into a Pandas DataFrame.
   - The resulting DataFrame is then saved as a CSV file in the results directory for further evaluation or visualization.

In [29]:
%%time
# geth the last model trained
directory = PROJECT_PATH + 'models/'
list_of_files = sorted(filter(os.path.isfile, glob.glob(directory + '*')), key=os.path.getmtime)
latest_file = list_of_files[-1]

#paths
model_path = latest_file  # Path dell'ultimo modello PyTorch
test_data_dir = PROJECT_PATH + 'data/images/test/'              #Caricamento dei dati di validazione
test_segments_path = PROJECT_PATH + 'data/annotations/depth_TEST_segments.json'
results_dir = PROJECT_PATH + 'results'           #risultati della inferenza


# Carica le annotazioni ground truth
with open(test_segments_path, 'r') as f:
    test_segments = json.load(f)

# Trasforma i dati per un lookup più semplice
test_segments_dict = {}
for ann in test_segments['annotations']:
    img_id = str(ann['image_id'])
    category_id = str(ann['category_id'])
    intradepth = ann['attributes'].get('Intradepth', 0)
    interdepth = ann['attributes'].get('Interdepth', 0)
    if img_id not in test_segments_dict:
        test_segments_dict[img_id] = {}
    test_segments_dict[img_id][category_id] = {
        'Intradepth': intradepth,
        'Interdepth': interdepth
    }

# Definizione del dispositivo (CPU o GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Carica il modello PyTorch
model = DepthOrderingModel()  # Usa la stessa architettura del modello usata in fase di training
if torch.cuda.is_available():
    model.load_state_dict(torch.load(model_path, weights_only= True)) # Carica lo stato del modello su gpu
else:
    model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu'), weights_only= True)) # Carica lo stato del modello e mappa i pesi sulla CPU se la GPU non è disponibile
model.to(device)
model.eval()


# Trasformazione delle immagini in tensori PyTorch
transform = transforms.Compose([
    transforms.ToTensor(),  # Converte le immagini in tensor
    transforms.Resize((128, 128)),  # Ridimensiona le immagini
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalizza le immagini
])

# Carica e processa le immagini
X_test = []
img_ids = []
for img_info in test_segments['images']:
    img_id = img_info['id']
    img_name = img_info['file_name']
    img_path = os.path.join(test_data_dir, img_name)
    image = cv2.imread(img_path)
    if image is not None:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Converti in RGB
        image = transform(image).to(device)  # Applica la trasformazione
        X_test.append(image)
        img_ids.append(img_id)

# Converti la lista di immagini in un batch di tensori
X_test = torch.stack(X_test)

# Ottieni le previsioni dal modello PyTorch
with torch.no_grad():
    pred_intra_depth, pred_inter_depth = model(X_test)

# Trasforma le previsioni in array 1D
pred_intra_depth = pred_intra_depth.cpu().numpy().flatten()
pred_inter_depth = pred_inter_depth.cpu().numpy().flatten()

# Assicurati che tutte le liste abbiano la stessa lunghezza
category_ids = [cat_id for img in test_segments_dict.values() for cat_id in img.keys()]
min_length = min(len(pred_intra_depth), len(pred_inter_depth), len(img_ids), len(category_ids))

# Taglia gli array alla lunghezza minima
pred_intra_depth = pred_intra_depth[:min_length]
pred_inter_depth = pred_inter_depth[:min_length]
img_ids = img_ids[:min_length]
category_ids = category_ids[:min_length]

# Crea il DataFrame delle previsioni
predictions_df = pd.DataFrame({
    'img_id': img_ids,
    'category_id': category_ids,
    'pred_Intradepth': pred_intra_depth,
    'pred_Interdepth': pred_inter_depth
})

if not os.path.exists(results_dir):
    os.makedirs(results_dir)

# Save the DataFrame to a CSV file
pred_csv = predictions_df.to_csv(os.path.join(results_dir, 'test-predictions.csv'), index=False)
print(predictions_df.to_string())

    img_id category_id  pred_Intradepth  pred_Interdepth
0      131          28         0.341507         0.700975
1      158          22         0.223486         0.466395
2      163           1         0.372284         0.699417
3      188          16         0.344535         0.519607
4      198          12         0.426177         0.770635
5      207          25         0.392649         0.699364
6      213          27         0.475495         0.848142
7      245          24         0.337580         0.594709
8      261          23         0.398065         0.787483
9      269          11         0.316297         0.644748
10     275          28         0.390858         0.697132
11     282          12         0.418563         0.764017
12       1          22         0.451510         0.864075
13       8          23         0.729622         1.435773
14      21          16         0.417270         0.823564
15      25           1         0.271126         0.547276
16      38          19         