# DeepLabV3+ Manual Implementation
### With a ResNet-50 backbone for high- and low-level features

In [1]:
"""
***COMP9517 Group Project: Manual DeepLabV3+/ResNet-50 based Image Segmentation***
Author: Maximilian Keller (maximilian.keller@unsw.edu.au)
Date: 18/07/24
Purpose: This script is designed for semantic segmentation using a manually defined DeepLabv3+ model on the WildScenes 2D dataset using a predefined train-val-test split.
Notes: Uses the entire dataset (uneven class distribution) through a stratified data split by class label.

Revision History:

18/07/24 - Document created
20/07/24 - Implemented baseline DeepLabV3+ model (https://arxiv.org/pdf/1802.02611); (https://medium.com/@r1j1nghimire/semantic-segmentation-using-deeplabv3-from-scratch-b1ff57a27be)
27/07/24 - Created Data Loader to handle pre-defined train-val-test file structure (https://github.com/csiro-robotics/WildScenes/tree/main)
29/07/24 - First time training on entire Dataset. 4 Epochs of entire dataset. mIoU: 15
31/07/24 - Fixed class indexation issue and re-trained model. Drastic improvement in classification accuracy. Only trained 1 Epoch. mIoU: 17
31/07/24 - Experimented with different backbones of the ResNet-50 and ResNet-101 models (using different layers). 4 epochs for each config. mIoU: 18
01/08/24 - Finalised script for submission. Added comments and headings.

Assumed File Hierarchy:

The script expects a specific file structure for input data, model checkpoints, and output predictions. Change the PATH variables if desired.

WildScenes_base_dir/
|-- DeepLabV3Plus_Backbone_Study.ipynb          # This Jupyter Notebook
|-- data/
|   |-- WildScenes/                             # The WildScenes dataset
|       |-- WildScenes2d/
|           |-- V-01/
|           |   |-- image/
|           |   |-- indexLabel/
|           |   |-- label/
|           |-- V-02/
|           |   |-- image/
|           |   |-- indexLabel/
|           |   |-- label/
|           |-- | ....
|           |-- Test/
|               |-- predictions/                # Folder for 'RGB label' predictions output
|               |-- predictions_label/          # Folder for 'indexLabel' predictions output
|-- splits/
|   |-- train.csv                               # CSV file containing training image paths and label paths
|   |-- val.csv                                 # CSV file containing validation image paths and label paths
|   |-- test.csv                                # CSV file containing test image paths and label paths
|-- jupyter_images/                             # Folder containing the images embedded in this notebook

CSV File Format:

The CSV files (train.csv, val.csv, test.csv) should have the following format:
id,im_path,label_path
1623370893-376340233,WildScenes2d/V-02/image/1623370893-376340233.png,WildScenes2d/V-02/indexLabel/1623370893-376340233.png
1623370896-529507864,WildScenes2d/V-02/image/1623370896-529507864.png,WildScenes2d/V-02/indexLabel/1623370896-529507864.png
"""


#### Import Libraries

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import os
from PIL import Image
import numpy as np
import albumentations as A
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset, DataLoader
import torch.optim as optim
from tqdm import tqdm
import pandas as pd
import torchvision
import cv2
from copy import deepcopy
import json
from sklearn.metrics import f1_score

#### Setting global variables

In [3]:
# DEVICE SELECTION
DEVICE = (
    "cuda"                                                                  # Prefers CUDA acceleration (NVIDIA) -> then Metal acceleration (MAC) -> then CPU
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

# HYPERPARAMETERS
LEARNING_RATE = 1e-4
BATCH_SIZE = 8
NUM_EPOCHS = 4
NUM_WORKERS = 0                                                             # Sub-processes used by Data Loader (set to 0 for main thread only)
PIN_MEMORY = True
NUM_CLASSES = 18
IMG_RESIZE_DIM = 800                                                        # Image dimension (in pixels) for training and inference (512 in paper)
OUTPUT_DIM = (2016, 1512)                                                   # Output prediction dimension (in pixels) for 'IndexLabel' and 'label'

# PATHS
TRAIN_CSV = "splits/train.csv"                                              # paths to the CSV files (train-val-test split used by the 'CustomDataset' class)
VAL_CSV = "splits/val.csv"
TEST_CSV = "splits/test.csv"
WILDSCENES_PATH = "data/WildScenes"                                         # path to folder containing 'WildScenes2d'
MODEL_PATH = "checkpoint.pth.tar"                                           # location to write/save training weights + optimiser state / read for testing
PREDICTIONS = 'data/WildScenes/WildScenes2d/Test/predictions'               # location to save RGB predictions
PREDICTIONS_LABELS = 'data/WildScenes/WildScenes2d/Test/predictions_label'  # location to save 'indexLabel' style index predictions

#### Class for Atrous Convolution filter definition

In [4]:
class Atrous_Convolution(nn.Module):
    def __init__(self, input_channels, kernel_size, pad, dilation_rate, output_channels=256):
        super(Atrous_Convolution, self).__init__()
        self.conv = nn.Conv2d(in_channels=input_channels, out_channels=output_channels, kernel_size=kernel_size, padding=pad, dilation=dilation_rate, bias=False)
        self.batchnorm = nn.BatchNorm2d(output_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.batchnorm(x)
        x = self.relu(x)
        return x

![Atrous.png](jupyter_images/Atrous-convolution.png)
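
As a rough check of the padding values passed to this layer, the output-size formula documented for `nn.Conv2d` can be evaluated by hand. The helper names `effective_kernel` and `conv_out_size` are illustrative and not part of the notebook, and the 50-pixel feature map is only an example size.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_out_size(size: int, kernel_size: int, pad: int, dilation: int, stride: int = 1) -> int:
    """Output length along one axis, following the formula documented for nn.Conv2d."""
    return (size + 2 * pad - dilation * (kernel_size - 1) - 1) // stride + 1


# A 3x3 kernel with pad == dilation leaves the feature map size unchanged.
for rate in (6, 12, 18):
    assert conv_out_size(50, kernel_size=3, pad=rate, dilation=rate) == 50

# Extent covered by the 3x3 kernel at dilation rates 6, 12 and 18.
print(effective_kernel(3, 6), effective_kernel(3, 12), effective_kernel(3, 18))  # 13 25 37
```

Setting the padding equal to the dilation rate is what keeps the 3x3 branches the same size as the 1x1 branch, so their outputs can later be concatenated along the channel axis.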

#### Class for Atrous Spatial Pyramid Pooling (ASPP)

In [5]:
class ASSP(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(ASSP, self).__init__()
        self.conv_1x1 = Atrous_Convolution(in_channels, 1, 0, 1, out_channels)
        self.conv_3x3_6 = Atrous_Convolution(in_channels, 3, 6, 6, out_channels)
        self.conv_3x3_12 = Atrous_Convolution(in_channels, 3, 12, 12, out_channels)
        self.conv_3x3_18 = Atrous_Convolution(in_channels, 3, 18, 18, out_channels)
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        self.final_conv = nn.Sequential(
            nn.Conv2d(out_channels * 5, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x1 = self.conv_1x1(x)
        x2 = self.conv_3x3_6(x)
        x3 = self.conv_3x3_12(x)
        x4 = self.conv_3x3_18(x)
        x5 = self.image_pool(x)
        x5 = F.interpolate(x5, size=x4.size()[2:], mode='bilinear', align_corners=True)
        x = torch.cat([x1, x2, x3, x4, x5], dim=1)
        x = self.final_conv(x)
        return x

![ASSP.png](jupyter_images/ASPP.png)
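
One thing that may be worth checking for the larger dilation rates is how many kernel taps actually land on the feature map rather than in the zero padding. The sketch below counts them along one axis. It assumes an 800 px input (`IMG_RESIZE_DIM`) and the standard torchvision ResNet strides, which would give a 25-wide map after `layer4` and a 50-wide map after `layer3`. `valid_taps_1d` is a hypothetical helper, not part of the notebook.

```python
def valid_taps_1d(size: int, pos: int, dilation: int, kernel_size: int = 3) -> int:
    """Count kernel taps along one axis that land inside the feature map (not in zero padding)."""
    half = kernel_size // 2
    offsets = [(i - half) * dilation for i in range(kernel_size)]
    return sum(0 <= pos + off < size for off in offsets)


# Centre position of a 25-wide map (assumed layer4 output) and a 50-wide map (assumed layer3 output).
for size in (25, 50):
    centre = size // 2
    print(size, [valid_taps_1d(size, centre, rate) for rate in (6, 12, 18)])
# 25 [3, 3, 1]
# 50 [3, 3, 3]
```

Under these assumptions, the rate-18 branch sees only its centre tap in the middle of a 25-wide map, so there it behaves much like a 1x1 convolution. On the 50-wide map all three taps stay inside.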

#### Backbone Feature Extraction Selection

In [6]:
# List available models
print("Available Torchvision classification models:",models.list_models(module=torchvision.models))
# print("ResNet-50:",models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2))
# print("ResNet-101:",models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V2))

# Chosen experimental configurations
experiments = [
    {
        "backbone": "resnet50",
        "output_layer_high": "layer4",
        "output_layer_low": "layer1"
    },
    {
        "backbone": "resnet101",
        "output_layer_high": "layer4",
        "output_layer_low": "layer1"
    },
    {
        "backbone": "resnet50",
        "output_layer_high": "layer3",
        "output_layer_low": "layer1"
    },
    {
        "backbone": "resnet101",
        "output_layer_high": "layer3",
        "output_layer_low": "layer1"
    }
]

# Function to get the backbone network
def get_backbone(name, output_layer):
    if name == "resnet50":
        pretrained_model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)       # IMAGENET1K_V1 (accuracy 76.130%); IMAGENET1K_V2 (accuracy 80.858%)
    elif name == "resnet101":
        pretrained_model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V2)     # IMAGENET1K_V1 (accuracy 77.374%); IMAGENET1K_V2 (accuracy 81.886%)
    else:
        raise ValueError(f"Unsupported backbone: {name}")
    
    layers = list(pretrained_model.children())
    if output_layer == 'layer1':  # Stem (conv1, bn1, relu, maxpool) plus the first residual stage
        net = nn.Sequential(*layers[:5])
    elif output_layer == 'layer2':  # Up to and including the second residual stage
        net = nn.Sequential(*layers[:6])
    elif output_layer == 'layer3':  # Up to and including the third residual stage
        net = nn.Sequential(*layers[:7])
    elif output_layer == 'layer4':  # Up to and including the fourth residual stage
        net = nn.Sequential(*layers[:8])
    else:
        raise ValueError(f"Invalid output_layer: {output_layer}")
    
    return net

Available Torchvision classification models: ['alexnet', 'convnext_base', 'convnext_large', 'convnext_small', 'convnext_tiny', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'efficientnet_b3', 'efficientnet_b4', 'efficientnet_b5', 'efficientnet_b6', 'efficientnet_b7', 'efficientnet_v2_l', 'efficientnet_v2_m', 'efficientnet_v2_s', 'googlenet', 'inception_v3', 'maxvit_t', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3', 'mobilenet_v2', 'mobilenet_v3_large', 'mobilenet_v3_small', 'regnet_x_16gf', 'regnet_x_1_6gf', 'regnet_x_32gf', 'regnet_x_3_2gf', 'regnet_x_400mf', 'regnet_x_800mf', 'regnet_x_8gf', 'regnet_y_128gf', 'regnet_y_16gf', 'regnet_y_1_6gf', 'regnet_y_32gf', 'regnet_y_3_2gf', 'regnet_y_400mf', 'regnet_y_800mf', 'regnet_y_8gf', 'resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50', 'resnext101_32x8d', 'resnext101_64x4d', 'resnext50_32x4d', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v
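
To make the effect of the `output_layer` choice concrete, the sketch below tabulates what each truncation point would return for a square input. The channel counts and strides are the standard values for torchvision's ResNet-50/101 and are stated here as assumptions rather than read from the model. `STAGES` and `feature_shape` are illustrative names, and the input size is assumed to be a multiple of 32.

```python
# Stage name -> (number of leading child modules kept, output channels, total stride).
# Assumed standard values for torchvision's ResNet-50/101 (bottleneck blocks).
STAGES = {
    "layer1": (5, 256, 4),
    "layer2": (6, 512, 8),
    "layer3": (7, 1024, 16),
    "layer4": (8, 2048, 32),
}


def feature_shape(output_layer: str, img_size: int) -> tuple:
    """Return (channels, height, width) of the truncated backbone output for a square input."""
    _, channels, stride = STAGES[output_layer]
    side = img_size // stride  # exact when img_size is a multiple of 32
    return (channels, side, side)


for name in STAGES:
    print(name, feature_shape(name, 800))
```

For an 800 px input this gives a 200x200 map at `layer1`, 50x50 at `layer3` and 25x25 at `layer4`.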

#### DeepLab V3+ Definition

In [7]:
class Deeplabv3Plus(nn.Module):
    def __init__(self, num_classes, backbone_name, output_layer_high, output_layer_low):
        super(Deeplabv3Plus, self).__init__()
        self.backbone = get_backbone(backbone_name, output_layer_high)
        self.low_level_features = get_backbone(backbone_name, output_layer_low)
        
        # Determine the output channels of the high-level features based on the output layer
        if output_layer_high == 'layer4':
            high_level_channels = 2048
        elif output_layer_high == 'layer3':
            high_level_channels = 1024
        else:
            raise ValueError(f"Unsupported output layer for high-level features: {output_layer_high}")
        
        self.assp = ASSP(in_channels=high_level_channels, out_channels=256)
        self.conv1x1 = Atrous_Convolution(256, 1, 0, 1, 48)
        self.conv_3x3 = nn.Sequential(
            nn.Conv2d(304, 256, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True)
        )
        self.classifier = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, x):
        x_backbone = self.backbone(x)
        x_low_level = self.low_level_features(x)
        x_assp = self.assp(x_backbone)

        x_assp_upsampled = F.interpolate(x_assp, size=x_low_level.shape[2:], mode='bilinear', align_corners=True)
        x_conv1x1 = self.conv1x1(x_low_level)
        x_cat = torch.cat([x_conv1x1, x_assp_upsampled], dim=1)
        x_3x3 = self.conv_3x3(x_cat)
        x_3x3_upscaled = F.interpolate(x_3x3, scale_factor=(4, 4), mode='bilinear', align_corners=True)
        x_out = self.classifier(x_3x3_upscaled)
        return x_out


![DeepLabV3plus.png](jupyter_images/Modified_DeepLabV3Plus.png)
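
The tensor shapes through the decoder can be traced without running the model. This is a minimal sketch that assumes a square input whose side is a multiple of 32, a stride of 4 for the low-level features and a stride of 16 or 32 for the high-level features. `decoder_shapes` is a hypothetical helper and the dictionary keys are descriptive labels only.

```python
def decoder_shapes(img_size: int, high_stride: int, num_classes: int = 18) -> dict:
    """Trace (channels, height, width) through the decoder for a square input."""
    low = img_size // 4             # low-level features from layer1 (assumed stride 4)
    high = img_size // high_stride  # high-level features from layer3 (16) or layer4 (32)
    return {
        "aspp_out": (256, high, high),
        "aspp_upsampled": (256, low, low),          # bilinear resize to the low-level size
        "low_level_projected": (48, low, low),      # 1x1 conv: 256 -> 48 channels
        "concatenated": (48 + 256, low, low),       # 304 channels into conv_3x3
        "refined": (256, low, low),
        "upscaled_x4": (256, low * 4, low * 4),
        "logits": (num_classes, low * 4, low * 4),  # 1x1 classifier at full resolution
    }


for key, shape in decoder_shapes(800, high_stride=32).items():
    print(f"{key:<20} {shape}")
```

The 304 input channels of `conv_3x3` are the 48 projected low-level channels plus the 256 ASPP channels. The final x4 upscaling returns the logits to the input resolution only because the low-level features sit at stride 4.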

#### Creating custom Data Loader

In [8]:
# Inherit from dataset class and define the essential __len__ and __getitem__ methods
class CustomDataset(Dataset):
    def __init__(self, csv_file, WildScenes2d_path, transform=None):
        self.base_path = WildScenes2d_path
        self.data = pd.read_csv(csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        img_path = self.data.iloc[idx, 1]
        label_path = self.data.iloc[idx, 2]
        
        #print(f"{self.base_path}/{img_path}")
        image = Image.open(f"{self.base_path}/{img_path}").convert("RGB")
        label = Image.open(f"{self.base_path}/{label_path}").convert("RGB")

        image = np.array(image)
        label = np.array(label)[:, :, 0]  # Extract the red channel as the class index
        label = label - 1  # Shift labels from range 1-18 to 0-17
        
        # Apply image transform (normalisation)
        if self.transform:
            augmented = self.transform(image=image, mask=label)
            image = augmented['image']
            label = augmented['mask']
        
        return image, label

train_transform = A.Compose([
    A.Resize(IMG_RESIZE_DIM, IMG_RESIZE_DIM),  # Resize to avoid memory issues
    A.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0], max_pixel_value=255.0),
    ToTensorV2()
])

val_transform = A.Compose([
    A.Resize(IMG_RESIZE_DIM, IMG_RESIZE_DIM),  # Resize to avoid memory issues
    A.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0], max_pixel_value=255.0),
    ToTensorV2()
])

test_transform = A.Compose([
    A.Resize(IMG_RESIZE_DIM, IMG_RESIZE_DIM),  # Resize to avoid memory issues
    A.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0], max_pixel_value=255.0),
    ToTensorV2()
])

def get_data_loaders(train_csv, val_csv, test_csv, root_path, batch_size, train_transform, val_transform, test_transform, num_workers=4, pin_memory=True):

    train_ds = CustomDataset(csv_file=train_csv, WildScenes2d_path=root_path, transform=train_transform)
    val_ds = CustomDataset(csv_file=val_csv, WildScenes2d_path=root_path, transform=val_transform)
    test_ds = CustomDataset(csv_file=test_csv, WildScenes2d_path=root_path, transform=test_transform)

    train_loader = DataLoader(
        train_ds,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=pin_memory,
        shuffle=False,
    )

    val_loader = DataLoader(
        val_ds,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=pin_memory,
        shuffle=False,
    )

    test_loader = DataLoader(
        test_ds,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=pin_memory,
        shuffle=False,
    )

    return train_loader, val_loader, test_loader

train_loader, val_loader, test_loader = get_data_loaders(
    TRAIN_CSV,
    VAL_CSV,
    TEST_CSV,
    WILDSCENES_PATH,
    BATCH_SIZE,
    train_transform, 
    val_transform,
    test_transform,
    num_workers=NUM_WORKERS,
    pin_memory=PIN_MEMORY
)

In [9]:
print(f"Number of training images: {len(train_loader)*BATCH_SIZE}")
print(f"Number of validation images: {len(val_loader)*BATCH_SIZE}")
print(f"Number of testing images: {len(test_loader)*BATCH_SIZE}")

Number of training images: 6056
Number of validation images: 288
Number of testing images: 2136
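
These counts are the number of batches multiplied by `BATCH_SIZE`, so they are upper bounds whenever the last batch is only partly filled. The sketch below shows the range of dataset sizes consistent with a given batch count, assuming the loader's default `drop_last=False`. `dataset_size_bounds` is an illustrative helper, not part of the notebook.

```python
def dataset_size_bounds(num_batches: int, batch_size: int) -> tuple:
    """Smallest and largest dataset size that yields `num_batches` batches (drop_last=False)."""
    return ((num_batches - 1) * batch_size + 1, num_batches * batch_size)


# 6056 / 8 = 757 training batches, 288 / 8 = 36 validation batches.
print(dataset_size_bounds(757, 8))  # (6049, 6056)
print(dataset_size_bounds(36, 8))   # (281, 288)
```

If exact counts are needed, `len(train_loader.dataset)` returns the number of rows in the underlying CSV file.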


#### Training Implementation

In [10]:
# Fraction of the training file (train.csv) to use for each training iteration
TRAIN_FRACTION = 1

# Training and evaluation function
def train_and_evaluate(config, train_loader, val_loader, num_classes, device, epochs=10, learning_rate=1e-4):
    # Instantiate a new DeepLabV3+ model with backbone config
    model = Deeplabv3Plus(num_classes=num_classes, 
                          backbone_name=config["backbone"], 
                          output_layer_high=config["output_layer_high"], 
                          output_layer_low=config["output_layer_low"]).to(device)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    loss_fn = nn.CrossEntropyLoss()
    
    best_val_iou = 0
    best_metrics = {}
    # Main training loop
    for epoch in range(epochs):
        model.train()
        total_loss = 0
        # Determine the number of batches to process based on train_fraction
        total_batches = int(len(train_loader) * TRAIN_FRACTION)
        for batch_idx, (data, targets) in enumerate(tqdm(train_loader, desc=f"Training Epoch {epoch + 1}")):
            # Break Training early if desired training fraction is reached
            if batch_idx >= total_batches:
                break
            data, targets = data.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(data)
            loss = loss_fn(outputs, targets)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        
        # Print metrics
        val_metrics = evaluate(model, val_loader, device, num_classes)
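        # Average the loss over the batches actually processed (equals len(train_loader) when TRAIN_FRACTION = 1)
        print(f"Epoch {epoch + 1}, Loss: {total_loss / max(total_batches, 1)}, Validation Metrics: {val_metrics}")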
        # Save the best model weights across all epochs
        if val_metrics['mean_iou'] > best_val_iou:
            best_val_iou = val_metrics['mean_iou']
            best_metrics = val_metrics
            torch.save(model.state_dict(), f"best_model_{config['backbone']}_{config['output_layer_high']}_{config['output_layer_low']}.pth")
    
    return best_metrics

# Function to evaluate the performance metrics
def evaluate(model, val_loader, device, num_classes):
    model.eval()
    total_iou_per_class = np.zeros(num_classes)
    all_preds = []
    all_targets = []
    with torch.no_grad():
        for data, targets in tqdm(val_loader, desc="Evaluating"):
            data, targets = data.to(device), targets.to(device)
            outputs = model(data)
            preds = torch.argmax(outputs, dim=1)
            iou_per_class = compute_iou(preds, targets, num_classes)
            total_iou_per_class += np.nan_to_num(iou_per_class)
            all_preds.extend(preds.cpu().numpy().flatten())
            all_targets.extend(targets.cpu().numpy().flatten())
    
    mean_iou_per_class = total_iou_per_class / len(val_loader)
    mean_iou = np.nanmean(mean_iou_per_class)
    accuracy = np.mean(np.array(all_preds) == np.array(all_targets))
    f1 = f1_score(all_targets, all_preds, average='weighted')
    
    return {"mean_iou": mean_iou, "accuracy": accuracy, "f1_score": f1, "class_iou": mean_iou_per_class.tolist()}

# Function to manually compute the IoU
def compute_iou(preds, targets, num_classes):
    iou_per_class = []
    for cls in range(num_classes):
        intersection = ((preds == cls) & (targets == cls)).sum().item()
        union = ((preds == cls) | (targets == cls)).sum().item()
        if union == 0:
            iou_per_class.append(np.nan)
        else:
            iou_per_class.append(intersection / union)
    return iou_per_class
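
The evaluation above averages per-batch IoU values and treats a class that is absent from a batch as IoU 0, which pulls the score of rare classes down. A common alternative, sketched here in plain Python, is to accumulate one confusion matrix over the whole split and compute IoU once at the end. This is not the method used to produce the numbers reported in this notebook. `update_confusion` and `iou_from_confusion` are hypothetical helpers that operate on flat lists of class indices.

```python
def update_confusion(conf, preds, targets, num_classes):
    """Accumulate counts into conf[target][pred] from flat sequences of class indices."""
    for p, t in zip(preds, targets):
        if 0 <= t < num_classes:  # skip targets outside the valid class range
            conf[t][p] += 1
    return conf


def iou_from_confusion(conf):
    """Per-class IoU = TP / (TP + FP + FN); None marks classes that never occur."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp
        fp = sum(conf[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        ious.append(tp / union if union else None)
    return ious


num_classes = 3
conf = [[0] * num_classes for _ in range(num_classes)]
# Two toy "batches"; class 2 only appears in the second one.
update_confusion(conf, preds=[0, 0, 1, 1], targets=[0, 1, 1, 1], num_classes=num_classes)
update_confusion(conf, preds=[2, 2, 0, 1], targets=[2, 2, 0, 1], num_classes=num_classes)
ious = iou_from_confusion(conf)
present = [v for v in ious if v is not None]
print(ious, sum(present) / len(present))
```

In this toy case class 2 is predicted perfectly and gets IoU 1.0 from the accumulated matrix. The per-batch scheme would report 0.5 for it, because the first batch contributes a 0 for a class it never contained.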

#### Iterate through the experimental configurations

In [10]:
results = []
# Loop through each backbone configuration and retrain the model
for config in experiments:
    print(f"Running experiment with config: {config}")
    val_metrics = train_and_evaluate(config, train_loader, val_loader, NUM_CLASSES, DEVICE, epochs=NUM_EPOCHS, learning_rate=LEARNING_RATE)
    results.append({"config": config, "metrics": val_metrics})

# Save results to a file
with open("experiment_results.json", "w") as f:
    json.dump(results, f, indent=4)

print("Experiments completed. Results saved to experiment_results.json")

Running experiment with config: {'backbone': 'resnet50', 'output_layer_high': 'layer4', 'output_layer_low': 'layer1'}


Training Epoch 1: 100%|██████████| 757/757 [28:45<00:00,  2.28s/it]
Evaluating: 100%|██████████| 36/36 [00:49<00:00,  1.36s/it]


Epoch 1, Loss: 0.5290548138448395, Validation Metrics: {'mean_iou': 0.1739582516075541, 'accuracy': 0.8421132950530036, 'f1_score': 0.8360674306378592, 'class_iou': [0.0, 0.5590915472165695, 0.0, 0.0, 0.0006078201727154711, 0.0, 0.45820561555074224, 0.8367674474826029, 6.906314916727569e-05, 0.0, 0.009134198821373778, 0.0, 0.0, 0.0, 0.06284386460945039, 0.005633216279488564, 0.6339286105486445, 0.5649671451052191]}


Training Epoch 2: 100%|██████████| 757/757 [28:02<00:00,  2.22s/it]
Evaluating: 100%|██████████| 36/36 [00:54<00:00,  1.51s/it]


Epoch 2, Loss: 0.3487351562537479, Validation Metrics: {'mean_iou': 0.18129346036478491, 'accuracy': 0.8510170494699647, 'f1_score': 0.8451913836007036, 'class_iou': [0.0, 0.6160892017729683, 0.0, 0.0, 0.0015019397999240117, 0.0, 0.48572259374269744, 0.8411310771566076, 0.010099187586213912, 0.0, 0.012485967410953565, 0.0, 0.0, 0.0, 0.08435723626398044, 0.009054872473402826, 0.6382877976532169, 0.5645524127061635]}


Training Epoch 3: 100%|██████████| 757/757 [28:05<00:00,  2.23s/it]
Evaluating: 100%|██████████| 36/36 [00:57<00:00,  1.60s/it]


Epoch 3, Loss: 0.2945696851103278, Validation Metrics: {'mean_iou': 0.17867544747882821, 'accuracy': 0.8456877760600706, 'f1_score': 0.8433651262355633, 'class_iou': [0.0, 0.5464308082803019, 0.0, 0.0, 0.0005968940334419477, 0.0, 0.4945735853916585, 0.8422806934158831, 0.014166953283923374, 0.0, 0.022447499300438236, 0.0, 0.0, 0.0, 0.0907664853516072, 0.00980435815223845, 0.6306202583317898, 0.5644705190776256]}


Training Epoch 4: 100%|██████████| 757/757 [28:14<00:00,  2.24s/it]
Evaluating: 100%|██████████| 36/36 [00:56<00:00,  1.57s/it]


Epoch 4, Loss: 0.25515021391636156, Validation Metrics: {'mean_iou': 0.17998776279208215, 'accuracy': 0.8471141177120142, 'f1_score': 0.8451568351323852, 'class_iou': [0.0, 0.5595091717189109, 0.0, 0.0, 0.0006050738031458829, 0.0, 0.503425722328688, 0.8422096009848682, 0.024849086051835428, 0.0, 0.017851433281242227, 0.0, 0.0, 0.0, 0.08177979450380536, 0.014076403016665637, 0.6199560861539456, 0.5755173584143717]}
Running experiment with config: {'backbone': 'resnet101', 'output_layer_high': 'layer4', 'output_layer_low': 'layer1'}


Training Epoch 1: 100%|██████████| 757/757 [35:38<00:00,  2.82s/it]
Evaluating: 100%|██████████| 36/36 [00:58<00:00,  1.61s/it]


Epoch 1, Loss: 0.5729096755877511, Validation Metrics: {'mean_iou': 0.17699884306912372, 'accuracy': 0.8440756349381625, 'f1_score': 0.8388631058737326, 'class_iou': [0.0, 0.5672619592911528, 0.0, 0.0, 0.0005807843056108786, 0.0, 0.4696332818020108, 0.8384354862234713, 0.00041153067101642863, 0.0, 0.011465732524726678, 0.0, 0.0, 0.0, 0.08482649568307613, 0.012277856258393066, 0.6321193172388448, 0.5689667312459243]}


Training Epoch 2: 100%|██████████| 757/757 [35:27<00:00,  2.81s/it]
Evaluating: 100%|██████████| 36/36 [01:00<00:00,  1.69s/it]


Epoch 2, Loss: 0.3460655860219008, Validation Metrics: {'mean_iou': 0.18170381765847549, 'accuracy': 0.8491220682420495, 'f1_score': 0.8445987838022346, 'class_iou': [0.0, 0.6068247634646092, 0.0, 0.0, 0.0009188527402424619, 0.0, 0.4863505340037952, 0.8401191559777014, 0.015766797872184133, 0.0, 0.020857254692516886, 0.0, 0.0, 0.0, 0.08997479241986667, 0.01447585978835979, 0.6381222686536989, 0.5572584382395842]}


Training Epoch 3: 100%|██████████| 757/757 [35:33<00:00,  2.82s/it]
Evaluating: 100%|██████████| 36/36 [01:03<00:00,  1.77s/it]


Epoch 3, Loss: 0.2918993609033456, Validation Metrics: {'mean_iou': 0.18456825272090846, 'accuracy': 0.8520105565371024, 'f1_score': 0.8489384747824524, 'class_iou': [0.0, 0.6021493082209688, 0.0, 0.0, 0.001132443550178612, 0.0, 0.504243628068186, 0.843423684893764, 0.022507889618699144, 0.0, 0.02473269393922742, 0.0, 0.0, 0.0, 0.1022257994051618, 0.015317738125729246, 0.6375437397417856, 0.5689516234126518]}


Training Epoch 4: 100%|██████████| 757/757 [35:30<00:00,  2.81s/it]
Evaluating: 100%|██████████| 36/36 [01:01<00:00,  1.72s/it]


Epoch 4, Loss: 0.2569111863168417, Validation Metrics: {'mean_iou': 0.18058369893927695, 'accuracy': 0.8469532575088339, 'f1_score': 0.8442150427465841, 'class_iou': [0.0, 0.5687016380966664, 0.0, 0.0, 0.0006219111439853096, 0.0, 0.4991325355499748, 0.8439697673285216, 0.01590326377082026, 0.0, 0.03284351984021219, 0.0, 0.0, 0.0, 0.08428341961450235, 0.014071572461192085, 0.6218253833127558, 0.5691535697883542]}
Running experiment with config: {'backbone': 'resnet50', 'output_layer_high': 'layer3', 'output_layer_low': 'layer1'}


Training Epoch 1: 100%|██████████| 757/757 [28:04<00:00,  2.23s/it]
Evaluating: 100%|██████████| 36/36 [00:55<00:00,  1.55s/it]


Epoch 1, Loss: 0.5409465094381154, Validation Metrics: {'mean_iou': 0.17907080086009333, 'accuracy': 0.8492385711130742, 'f1_score': 0.8462843191020996, 'class_iou': [0.0, 0.5617384659860583, 0.0, 0.0, 0.0005692741703163306, 0.0, 0.5152392227919677, 0.8460883832506503, 3.069877030457266e-05, 0.0, 0.018120707889262885, 0.0, 0.0, 0.0, 0.07626838642971334, 0.0, 0.6355078419748316, 0.5697114342185753]}


Training Epoch 2: 100%|██████████| 757/757 [28:00<00:00,  2.22s/it]
Evaluating: 100%|██████████| 36/36 [00:55<00:00,  1.54s/it]


Epoch 2, Loss: 0.3464593748750271, Validation Metrics: {'mean_iou': 0.18347130720872662, 'accuracy': 0.8548314211572439, 'f1_score': 0.8520396128179081, 'class_iou': [0.0, 0.5788132025709126, 0.0, 0.0, 0.000616721988181396, 0.0, 0.5375748656193703, 0.8498227460096861, 0.005229686176742266, 0.0, 0.018450069426673276, 0.0, 0.0, 0.0, 0.08591216877413278, 0.007306065212524009, 0.6423037984693954, 0.5764542055094606]}


Training Epoch 3: 100%|██████████| 757/757 [28:01<00:00,  2.22s/it]
Evaluating: 100%|██████████| 36/36 [00:54<00:00,  1.50s/it]


Epoch 3, Loss: 0.31213464425529797, Validation Metrics: {'mean_iou': 0.18339193936999137, 'accuracy': 0.8537919887367491, 'f1_score': 0.8517695365495741, 'class_iou': [0.0, 0.5785006322676575, 0.0, 0.0, 0.0006087949297954551, 0.0, 0.538946910343594, 0.849084850922203, 0.01590927625010066, 0.0, 0.020499444171209528, 0.0, 0.0, 0.0033672891907187323, 0.08403860555042501, 0.0073395964556421485, 0.6421759550365492, 0.5605835535419497]}


Training Epoch 4: 100%|██████████| 757/757 [28:04<00:00,  2.23s/it]
Evaluating: 100%|██████████| 36/36 [01:02<00:00,  1.75s/it]


Epoch 4, Loss: 0.28281824420670065, Validation Metrics: {'mean_iou': 0.1834707726995304, 'accuracy': 0.8522773078621908, 'f1_score': 0.8520318090900258, 'class_iou': [0.0, 0.5503010807435238, 0.0, 0.0, 0.0005783204857278931, 0.0, 0.5387008910677895, 0.8505573617176075, 0.022347206806299827, 0.0, 0.02883003219813753, 0.0, 0.0, 0.0038777032065622666, 0.0946008972391352, 0.00456713304741147, 0.6424863359082951, 0.5656269461710574]}
Running experiment with config: {'backbone': 'resnet101', 'output_layer_high': 'layer3', 'output_layer_low': 'layer1'}


Training Epoch 1: 100%|██████████| 757/757 [35:15<00:00,  2.80s/it]
Evaluating: 100%|██████████| 36/36 [01:01<00:00,  1.70s/it]


Epoch 1, Loss: 0.5249223810284739, Validation Metrics: {'mean_iou': 0.18320535710286007, 'accuracy': 0.8508658734540636, 'f1_score': 0.8488046724746044, 'class_iou': [0.0, 0.578501010358892, 0.0, 0.0, 0.0005779987190739648, 0.0, 0.5251002296549848, 0.8460416341136783, 0.00031523225042525457, 0.0, 0.019102823830561645, 0.0, 0.0, 0.0, 0.10079220292454392, 0.013698041475819253, 0.6433294712313514, 0.5702377832921507]}


Training Epoch 2: 100%|██████████| 757/757 [35:13<00:00,  2.79s/it]
Evaluating: 100%|██████████| 36/36 [01:01<00:00,  1.70s/it]


Epoch 2, Loss: 0.33443515546019514, Validation Metrics: {'mean_iou': 0.18741579824464066, 'accuracy': 0.8567411605565372, 'f1_score': 0.8547944051525402, 'class_iou': [0.0, 0.5958474386099571, 0.0, 0.0, 0.000703845942033955, 0.0, 0.5464750424965014, 0.851115169536453, 0.007057597135578117, 0.0, 0.02410394408537695, 0.0, 0.0, 0.0, 0.1073477601321176, 0.014323770491803278, 0.6455982917937341, 0.580911508179976]}


Training Epoch 3: 100%|██████████| 757/757 [35:22<00:00,  2.80s/it]
Evaluating: 100%|██████████| 36/36 [01:02<00:00,  1.74s/it]


Epoch 3, Loss: 0.29316964184513494, Validation Metrics: {'mean_iou': 0.18549826580727566, 'accuracy': 0.85282761704947, 'f1_score': 0.8534718662788048, 'class_iou': [0.0, 0.5531995520191597, 0.0, 0.0, 0.0006076436837615243, 0.0, 0.5494103362376799, 0.8495820714269072, 0.021017906891624173, 0.0, 0.025020060644421922, 0.0, 0.0, 0.0, 0.1090985798801154, 0.01569611265099408, 0.6425946926319981, 0.5727418284642999]}


Training Epoch 4: 100%|██████████| 757/757 [35:55<00:00,  2.85s/it]
Evaluating: 100%|██████████| 36/36 [01:04<00:00,  1.78s/it]


Epoch 4, Loss: 0.26186677051740087, Validation Metrics: {'mean_iou': 0.18420417387073423, 'accuracy': 0.8542459750441697, 'f1_score': 0.8525610554196401, 'class_iou': [0.0, 0.5335501697556532, 0.0, 0.0, 0.0006726891691802028, 0.0, 0.5442503513607826, 0.8530091498251285, 0.014694408860222012, 0.0, 0.02331869098493842, 0.0, 0.0, 0.0, 0.10732931267136286, 0.015268896540416462, 0.6357052216878669, 0.5878762388176653]}
Experiments completed. Results saved to experiment_results.json
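
To compare the four runs at a glance, the saved records can be sorted by their best validation mean IoU. The sketch below uses the same record layout as `experiment_results.json` (a list of `{"config": ..., "metrics": ...}` entries). The values are the best-epoch mean IoU of each configuration, rounded from the log above. `rank_by_miou` is an illustrative helper.

```python
def rank_by_miou(results):
    """Sort experiment records by their best validation mean IoU, highest first."""
    return sorted(results, key=lambda r: r["metrics"]["mean_iou"], reverse=True)


# Best-epoch mean IoU per configuration, rounded from the training log.
results = [
    {"config": {"backbone": "resnet50", "output_layer_high": "layer4"}, "metrics": {"mean_iou": 0.1813}},
    {"config": {"backbone": "resnet101", "output_layer_high": "layer4"}, "metrics": {"mean_iou": 0.1846}},
    {"config": {"backbone": "resnet50", "output_layer_high": "layer3"}, "metrics": {"mean_iou": 0.1835}},
    {"config": {"backbone": "resnet101", "output_layer_high": "layer3"}, "metrics": {"mean_iou": 0.1874}},
]

for record in rank_by_miou(results):
    cfg = record["config"]
    print(f"{cfg['backbone']:<10} {cfg['output_layer_high']}  mIoU={record['metrics']['mean_iou']:.4f}")
```

The spread between the best and worst configuration is under one percentage point of mIoU, so with a single run per configuration the ranking should be read with caution.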


#### Definition of class label indices/RGB values (from GitHub)

In [11]:
METAINFO = {
    "classes": (
        "unlabelled",
        "asphalt",
        "dirt",
        "mud",
        "water",
        "gravel",
        "other-terrain",
        "tree-trunk",
        "tree-foliage",
        "bush",
        "fence",
        "structure",
        "pole",
        "vehicle",
        "rock",
        "log",
        "other-object",
        "sky",
        "grass",
    ),
    "palette": [
        (0, 0, 0),
        (230, 25, 75),
        (60, 180, 75),
        (255, 225, 25),
        (0, 130, 200),
        (145, 30, 180),
        (70, 240, 240),
        (240, 50, 230),
        (210, 245, 60),
        (230, 25, 75),
        (0, 128, 128),
        (170, 110, 40),
        (255, 250, 200),
        (128, 0, 0),
        (170, 255, 195),
        (128, 128, 0),
        (250, 190, 190),
        (0, 0, 128),
        (128, 128, 128),
    ],
    "cidx": [
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10,
        11,
        12,
        13,
        14,
        15,
        16,
        17,
        18
    ]
}
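
Because the dataset class shifts the labels from 1-18 to 0-17, a model output index `k` corresponds to entry `k + 1` of `METAINFO` (entry 0 is "unlabelled"). The sketch below illustrates this offset using only the first few class names. `model_index_to_label_index` and `model_index_to_name` are hypothetical helpers, not part of the notebook.

```python
# First entries of the class list from METAINFO; index 0 is "unlabelled".
CLASSES = ("unlabelled", "asphalt", "dirt", "mud", "water")


def model_index_to_label_index(model_idx: int) -> int:
    """Undo the `label - 1` shift applied in the dataset class."""
    return model_idx + 1


def model_index_to_name(model_idx: int, classes=CLASSES) -> str:
    """Class name for a 0-based model output index (which excludes 'unlabelled')."""
    return classes[model_index_to_label_index(model_idx)]


print(model_index_to_name(0), model_index_to_name(3))  # asphalt water
```

The same offset applies when looking up palette colours or when writing predictions back in the `indexLabel` convention.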

#### Testing Implementation

In [13]:
# Fraction of the testing file (test.csv) to use for testing models
TEST_FRACTION = 1
# Choose to save image predictions
SAVE_IMAGES = False

# Function to carry out testing of image segmentation
def SegmentScenes(test_loader, model_path, num_classes, config, device, save_images=False):
    # Instantiate DeepLabV3+ model and load saved weights
    model = Deeplabv3Plus(num_classes=num_classes, 
                          backbone_name=config["backbone"], 
                          output_layer_high=config["output_layer_high"], 
                          output_layer_low=config["output_layer_low"]).to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))  # map_location lets weights saved on another device load here
    model.eval()
    
    total_iou_per_class = np.zeros(num_classes)
    all_preds = []
    all_targets = []
    
    # Main testing loop (with break condition if TEST_FRACTION reached)
    with torch.no_grad():
        for batch_idx, (data, targets) in enumerate(tqdm(test_loader, desc="Testing")):
            if batch_idx >= len(test_loader) * TEST_FRACTION:
                break
            data, targets = data.to(device), targets.to(device)
            outputs = model(data)
            preds = torch.argmax(outputs, dim=1)
            iou_per_class = compute_iou(preds, targets, num_classes)
            total_iou_per_class += np.nan_to_num(iou_per_class)
            all_preds.extend(preds.cpu().numpy().flatten())
            all_targets.extend(targets.cpu().numpy().flatten())
            
            # Save images only if desired
            if save_images:
                save_predictions(data, preds, targets)
    
    # Peformance metrics
    mean_iou_per_class = total_iou_per_class / (len(test_loader) * TEST_FRACTION)
    mean_iou = np.nanmean(mean_iou_per_class)
    accuracy = np.mean(np.array(all_preds) == np.array(all_targets))
    f1 = f1_score(all_targets, all_preds, average='weighted')
    
    return {"mean_iou": mean_iou, "accuracy": accuracy, "f1_score": f1, "class_iou": mean_iou_per_class.tolist()}

def save_predictions(data, preds, targets):
    for i in range(data.size(0)):
        image = data[i].cpu().numpy().transpose(1, 2, 0)
        preds_resized = preds[i].cpu().numpy()
        filename = f"test_image_{i}.png"
        
        # Save prediction (RGB label format)
        pred_rgb = np.zeros((*preds_resized.shape, 3), dtype=np.uint8)
        for class_idx, color in enumerate(METAINFO['palette']):
            pred_rgb[preds_resized == class_idx] = color
        pred_image = Image.fromarray(pred_rgb)
        pred_filename = f"prediction_{filename}"
        pred_image.save(os.path.join(PREDICTIONS, pred_filename))

        # Save prediction (index label format)
        pred_label = np.zeros((*preds_resized.shape, 3), dtype=np.uint8)
        for class_idx, color in enumerate(METAINFO['palette']):
            pred_label[preds_resized == class_idx] = color
        pred_label[..., 0] = preds_resized  # Red channel
        pred_label[..., 1] = preds_resized  # Green channel
        pred_label[..., 2] = preds_resized  # Blue channel
        pred_label_image = Image.fromarray(pred_label)
        pred_label_filename = f"prediction_label_{filename}"
        pred_label_image.save(os.path.join(PREDICTIONS_LABELS, pred_label_filename))

# Loop through each backbone configuration and test the saved model
results = []
for config in experiments:
    model_path = f"best_model_{config['backbone']}_{config['output_layer_high']}_{config['output_layer_low']}.pth"
    print("Running experiment with config:", config)
    test_metrics = SegmentScenes(test_loader, model_path, NUM_CLASSES, config, DEVICE, save_images=SAVE_IMAGES)
    print(test_metrics)
    results.append({"config": config, "metrics": test_metrics})

# Save results to a file
with open("experiment_results_test_set.json", "w") as f:
    json.dump(results, f, indent=4)

print("Testing Complete. Results saved to experiment_results_test_set.json")

Running experiment with config: {'backbone': 'resnet50', 'output_layer_high': 'layer4', 'output_layer_low': 'layer1'}


Testing: 100%|██████████| 267/267 [05:51<00:00,  1.32s/it]


{'mean_iou': 0.19008691577114056, 'accuracy': 0.8445787798874824, 'f1_score': 0.8360513886814105, 'class_iou': [0.0, 0.6304506815910224, 0.00023967924877025043, 0.029427619695273435, 4.138473317193288e-06, 0.0, 0.4817987515783103, 0.8420911469038814, 0.031030748976616323, 0.0, 0.03788944113119673, 0.0, 0.0, 0.01397874935240465, 0.13365205193324425, 0.0479065291767016, 0.6056355423286769, 0.5674594034911149]}
Running experiment with config: {'backbone': 'resnet101', 'output_layer_high': 'layer4', 'output_layer_low': 'layer1'}


Testing: 100%|██████████| 267/267 [06:53<00:00,  1.55s/it]


{'mean_iou': 0.19310884108787235, 'accuracy': 0.8445811181434599, 'f1_score': 0.839494745385048, 'class_iou': [0.0, 0.6101475659231544, 0.0031461536980999312, 0.026887461420761018, 0.002333143788986656, 4.208033404607587e-05, 0.4995293403954374, 0.8428774556484061, 0.04732471705020927, 0.0008013488699387541, 0.0395954850502689, 0.0, 0.0, 0.013094932634762203, 0.15032935822903623, 0.06130080640401789, 0.6070629753297718, 0.5714863148048053]}
Running experiment with config: {'backbone': 'resnet50', 'output_layer_high': 'layer3', 'output_layer_low': 'layer1'}


Testing: 100%|██████████| 267/267 [06:03<00:00,  1.36s/it]


{'mean_iou': 0.1928018330619099, 'accuracy': 0.8464796977555087, 'f1_score': 0.8411083588394737, 'class_iou': [0.0, 0.6054698931002198, 0.0, 0.026803288055493563, 0.001039225143950104, 0.0, 0.5249150655917117, 0.850248274048556, 0.020234899963034212, 0.0, 0.030847333905563226, 0.0, 0.0, 0.020061711949647766, 0.14419808665454964, 0.05241936681813284, 0.6147813909156281, 0.5794144589678912]}
Running experiment with config: {'backbone': 'resnet101', 'output_layer_high': 'layer3', 'output_layer_low': 'layer1'}


Testing: 100%|██████████| 267/267 [06:53<00:00,  1.55s/it]


{'mean_iou': 0.19945503354982563, 'accuracy': 0.8532773258028599, 'f1_score': 0.8458698779788422, 'class_iou': [0.0, 0.6465552159972368, 0.0, 0.032013878437491254, 0.00030508743389685996, 0.0, 0.5339058524221552, 0.8520785341515528, 0.015775702300860592, 0.0, 0.03573066912369483, 0.0, 0.0, 0.0170214526311022, 0.1773166136526842, 0.07790119778358737, 0.6188645346354121, 0.5827218653271867]}
Testing Complete. Results saved to experiment_results_test_set.json
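
The per-class IoU values above are averages of per-batch IoU, with a class that is absent from a batch contributing 0 to the sum. A common alternative is to accumulate one confusion matrix over the whole test set and compute IoU once at the end, so that absent classes do not pull the average down. The following is a minimal sketch of that alternative; `update_confusion` and `iou_from_confusion` are hypothetical helpers, not part of this notebook, and the figures they produce would differ from those reported above.

```python
import numpy as np

def update_confusion(conf, preds, targets, num_classes):
    """Add one batch to a (C, C) confusion matrix indexed as conf[target, pred].

    Assumes preds already lie in [0, num_classes), as argmax outputs do.
    """
    preds = np.asarray(preds).ravel()
    targets = np.asarray(targets).ravel()
    valid = (targets >= 0) & (targets < num_classes)  # drop out-of-range labels
    flat = targets[valid] * num_classes + preds[valid]
    conf += np.bincount(flat, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf

def iou_from_confusion(conf):
    """Per-class IoU = TP / (TP + FP + FN); NaN where a class never occurs."""
    tp = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, tp / union, np.nan)

# Two toy "batches" with three classes; class 2 never appears.
conf = np.zeros((3, 3), dtype=np.int64)
update_confusion(conf, preds=[0, 1, 1, 0], targets=[0, 1, 0, 0], num_classes=3)
update_confusion(conf, preds=[1, 1], targets=[1, 1], num_classes=3)
iou = iou_from_confusion(conf)
print(round(float(np.nanmean(iou)), 4))  # 0.7083
```

Here `np.nanmean` skips class 2 instead of treating it as a zero, which is the main behavioural difference from the batch-averaged metric.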


#### (OPTIONAL) Check the prediction set class distribution

In [None]:
def get_all_unique_pixel_values(directory_path):
    unique_values_set = set()
    # Iterate over each file in the directory
    for filename in tqdm(os.listdir(directory_path), desc="Processing Images"):
        # Skip anything that is not a PNG image
        if not filename.endswith(".png"):
            continue
        # Construct the full file path
        file_path = os.path.join(directory_path, filename)
        # Load the image as a NumPy array; the context manager closes the file
        with Image.open(file_path) as image:
            image_array = np.array(image)
        # Add the unique pixel values to the set as plain Python ints
        unique_values_set.update(np.unique(image_array).tolist())
    return unique_values_set

unique_pixel_values = get_all_unique_pixel_values(PREDICTIONS_LABELS)
# Print the unique pixel values across all images
print(f"Unique pixel values across all predictions: {unique_pixel_values}")

Processing Images: 100%|██████████| 2133/2133 [03:26<00:00, 10.35it/s]

Unique pixel values across all predictions: {2, 4, 5, 7, 8, 9, 11, 14, 15, 16, 17, 18}
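
The check above reports which class indices occur in the predictions, but not how often. Counting pixels per class gives the actual distribution. The following is a minimal sketch on small synthetic masks rather than the saved PNG files; `class_pixel_counts` is a hypothetical helper and `num_classes=19` is an assumption taken from the length of the `cidx` list.

```python
import numpy as np

def class_pixel_counts(masks, num_classes):
    """Sum per-class pixel counts over an iterable of integer index masks."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        flat = np.asarray(mask).ravel()
        # minlength pads missing classes with 0; the slice ignores larger indices
        counts += np.bincount(flat, minlength=num_classes)[:num_classes]
    return counts

# Two synthetic 2x2 masks standing in for prediction images.
masks = [np.array([[2, 2], [4, 7]]), np.array([[7, 7], [7, 2]])]
counts = class_pixel_counts(masks, num_classes=19)
fractions = counts / counts.sum()
print({c: int(n) for c, n in enumerate(counts) if n > 0})  # {2: 3, 4: 1, 7: 4}
```

For real data, each mask would be one channel of a saved index-label image, since all three channels hold the same class index.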





In [None]:
def get_all_unique_pixel_values2(paths_df):
    unique_values_set = set()
    # Iterate over each row of the test split
    for idx in tqdm(range(len(paths_df)), desc="Processing Images"):
        # Read the label path from the third column of the CSV
        label_path = paths_df.iloc[idx, 2]
        with Image.open(f"{WILDSCENES_PATH}/{label_path}") as image:
            image_array = np.array(image.convert("RGB"))
        # Add the unique pixel values to the set as plain Python ints
        unique_values_set.update(np.unique(image_array).tolist())
    return unique_values_set

test_paths = pd.read_csv(TEST_CSV)
unique_pixel_values = get_all_unique_pixel_values2(test_paths)
# Print the unique pixel values across all images
print(f"Unique pixel values across same GT labels (indexLabel for test.csv): {unique_pixel_values}")

Processing Images: 100%|██████████| 2133/2133 [02:55<00:00, 12.17it/s]

Unique pixel values across same GT labels (indexLabel for test.csv): {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18}
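
Comparing the two printed sets shows which ground-truth classes never appear in any prediction. The following is a small sketch using the values printed above, assuming that the prediction files and the ground-truth labels share the same index convention; the variable names are illustrative.

```python
# Values copied from the two outputs above.
predicted = {2, 4, 5, 7, 8, 9, 11, 14, 15, 16, 17, 18}
ground_truth = set(range(1, 19))  # 1 to 18 inclusive

# Classes present in the ground truth but absent from every prediction
never_predicted = sorted(ground_truth - predicted)
# Indices predicted but absent from the ground truth (none here)
unexpected = sorted(predicted - ground_truth)

print(never_predicted)  # [1, 3, 6, 10, 12, 13]
print(unexpected)       # []
```

Under that assumption, six of the eighteen ground-truth indices are never predicted, which is consistent with the several zero entries in the `class_iou` lists reported by the test runs.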



