## RetinaNet Implementation

**Please Note:** For better usability, the `Training` and `Testing` sections should be placed in separate source files. Furthermore, some definitions, such as `get_retinanet_model`, are repeated in each section (the copies in `Testing` are commented out).

### Imports

For this project, PyTorch was installed with the command below because it downloads a CUDA-enabled build of PyTorch (CUDA 11.8, via the `cu118` index). This allows the model to run on the GPU instead of the CPU, which decreases execution time.

In [None]:
!pip install pillow
!pip install matplotlib
!pip install seaborn
!pip install scikit-learn
!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --no-cache-dir

In [1]:
import os
import numpy as np
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision.datasets import VisionDataset
from torchvision.models.detection import RetinaNet_ResNet50_FPN_Weights
from torchvision.transforms import functional
import xml.etree.ElementTree as ET
from PIL import Image
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, auc, confusion_matrix

%matplotlib inline

## Training

### Helper Functions

The code in this section prepares the data and configures a RetinaNet model. The `CLASS_MAPPING` dictionary assigns numerical IDs to object classes as follows: `Background` to 0, `Mixed Waste -Black Bag-` to 1, `Organic Waste -White Bag-` to 2, `Other` to 3, and `Recycled Waste -Grey or Green Bag-` to 4. This mapping converts the textual class labels in the annotations into the numerical IDs required by the model.

Furthermore, the `get_retinanet_model` function initializes a pre-trained RetinaNet model with default weights (`RetinaNet_ResNet50_FPN_Weights.DEFAULT`) and customizes it for the number of classes in the dataset. It replaces the output layer of the model's classification head with a convolution that produces `num_anchors * num_classes` channels and updates the head's `num_classes` attribute for dataset compatibility.

Additionally, the `VOCTransform` class processes images and their annotations from the VOC dataset format. It converts images to PyTorch tensors and parses the bounding boxes and class labels extracted from the XML files. Bounding box coordinates are validated (boxes with zero or negative width or height are skipped), and class labels are mapped to numerical IDs using the `CLASS_MAPPING` dictionary. The transformed data is returned as PyTorch-compatible tensors.
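
The parsing and validation steps described above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's implementation: `SAMPLE_XML` and `parse_voc` are hypothetical names, the mapping is a two-entry subset of `CLASS_MAPPING`, and plain lists stand in for tensors.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation used only for illustration.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>Other</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>Other</name>
    <bndbox><xmin>50</xmin><ymin>60</ymin><xmax>50</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>
"""

# Subset of the notebook's mapping, kept short for the example.
CLASS_MAPPING = {"Background": 0, "Other": 3}


def parse_voc(xml_text, class_mapping):
    """Return (boxes, labels), skipping boxes with non-positive width or height."""
    root = ET.fromstring(xml_text.strip())
    boxes, labels = [], []
    for obj in root.findall("object"):
        name = obj.find("name").text
        xmin, ymin, xmax, ymax = (
            float(obj.find(f"bndbox/{tag}").text)
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Degenerate box: drop the box and its label together.
        if xmax <= xmin or ymax <= ymin:
            continue
        if name not in class_mapping:
            raise ValueError(f"Unknown class name: {name}")
        boxes.append([xmin, ymin, xmax, ymax])
        labels.append(class_mapping[name])
    return boxes, labels


boxes, labels = parse_voc(SAMPLE_XML, CLASS_MAPPING)
print(boxes, labels)
```

The second object has `xmax == xmin`, so it is dropped and only one box and one label remain, which keeps the two lists the same length.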

Finally, the `VOCDataset` class loads a dataset stored in the VOC format. On initialization, it collects and sorts the image files (.jpg) and their corresponding annotation files (.xml). The `__getitem__` method retrieves an image and its annotation based on an index, iterates through the XML file to extract bounding box coordinates and labels, and applies optional transformations. The `__len__` method returns the total number of samples in the dataset.
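
Because `VOCDataset` pairs images and annotations by position in two separately sorted lists, it presumably relies on every image having an annotation with the same file stem. The sketch below is one way such a check could look; `paired_samples` and the example file names are hypothetical and not part of the notebook.

```python
import os


def paired_samples(filenames):
    """Pair each .jpg with the .xml sharing its stem; also report unmatched stems."""
    images = {os.path.splitext(f)[0]: f for f in filenames if f.endswith(".jpg")}
    annots = {os.path.splitext(f)[0]: f for f in filenames if f.endswith(".xml")}
    common = sorted(images.keys() & annots.keys())
    unmatched = sorted(images.keys() ^ annots.keys())
    return [(images[stem], annots[stem]) for stem in common], unmatched


# Made-up directory listing: c.jpg has no annotation.
files = ["b.jpg", "a.xml", "a.jpg", "c.jpg", "b.xml"]
pairs, unmatched = paired_samples(files)
print(pairs)
print(unmatched)
```

If `unmatched` is non-empty, index-based pairing of the two sorted lists would silently shift images against the wrong annotations, so it is worth checking once before training.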

In [2]:
# Class Mapping
CLASS_MAPPING = {
    "Background": 0,
    "Mixed Waste -Black Bag-": 1,
    "Organic Waste -White Bag-": 2,
    "Other": 3,
    "Recycled Waste -Grey or Green Bag-": 4
}


# * ##############################################################################################################


# Function to get RetinaNet model
def get_retinanet_model(num_classes):
    # Load pre-trained RetinaNet model
    model = torchvision.models.detection.retinanet_resnet50_fpn(weights=RetinaNet_ResNet50_FPN_Weights.DEFAULT)

    # Update the number of classes in the classification head
    in_features = model.head.classification_head.cls_logits.in_channels
    num_anchors = model.head.classification_head.num_anchors
    
    # Update classification head
    model.head.classification_head.cls_logits = torch.nn.Conv2d(
        in_features, num_anchors * num_classes, kernel_size=3, stride=1, padding=1
    )
    
    # Update number of classes
    model.head.classification_head.num_classes = num_classes
    
    # Return model
    return model

# Class to transform VOC XML annotations to PyTorch tensors
class VOCTransform:
    def __call__(self, image, target):
        # Convert PIL image to tensor
        image = functional.to_tensor(image)

        # Parse bounding boxes and labels from XML
        boxes = []
        labels = []
        
        # Extract bounding boxes and labels from XML (target)
        for obj in target['annotation']['object']:
            bbox = obj['bndbox']
            xmin = float(bbox['xmin'])
            ymin = float(bbox['ymin'])
            xmax = float(bbox['xmax'])
            ymax = float(bbox['ymax'])

            # Validate bounding boxes
            if xmax <= xmin or ymax <= ymin:
                print(f"Invalid bounding box detected: {bbox}")
                continue
            
            # Append bounding boxes
            boxes.append([xmin, ymin, xmax, ymax])

            # Map class name to numerical ID
            class_name = obj['name'] # get class name
            
            # If class name is in the mapping
            if class_name in CLASS_MAPPING:
                # Append the numerical ID
                labels.append(CLASS_MAPPING[class_name])
            else:
                # Error
                raise ValueError(f"Unknown class name: {class_name}")

        # Filter out invalid bounding boxes before creating tensors
        if len(boxes) == 0 or len(labels) == 0:
            print(f"Warning: Skipping image due to invalid annotations.")
            return None, None

        # Convert target to PyTorch tensors
        target = {
            "boxes": torch.tensor(boxes, dtype=torch.float32),
            "labels": torch.tensor(labels, dtype=torch.int64),
        }
        
        # Return transformed image and target
        return image, target

# Class to load VOC dataset
class VOCDataset(VisionDataset):
    def __init__(self, root, transforms=None):
        super().__init__(root, transforms=transforms)   # Initialise parent class
        self.images = sorted([f for f in os.listdir(root) if f.endswith(".jpg")])   # Get sorted list of all images in dataset
        self.annotations = sorted([f for f in os.listdir(root) if f.endswith(".xml")]) # Get sorted list of all annotations in dataset

    def __getitem__(self, index):
        image_path = os.path.join(self.root, self.images[index])    # Get image path
        annotation_path = os.path.join(self.root, self.annotations[index]) # Get annotation path

        # Load image
        image = functional.to_pil_image(torchvision.io.read_image(image_path)) # Read image and convert to PIL image

        # Load annotation
        tree = ET.parse(annotation_path)
        root = tree.getroot() 

        # Parse annotation into dictionary
        target = {"annotation": {}}             # Empty dictionary to store annotation key
        target['annotation']['object'] = []     # Empty list to store objects
        
        # Iterate over all 'object' elements in the XML root
        for obj in root.findall('object'):
            
            # Create dictionary for each object with name and bounding box coordinates
            obj_struct = {
                "name": obj.find("name").text,
                "bndbox": {
                    "xmin": obj.find("bndbox/xmin").text,
                    "ymin": obj.find("bndbox/ymin").text,
                    "xmax": obj.find("bndbox/xmax").text,
                    "ymax": obj.find("bndbox/ymax").text,
                },
            }
            
            # Append object dictionary to the target dictionary
            target['annotation']['object'].append(obj_struct)

        # Apply transformations
        if self.transforms is not None:
            image, target = self.transforms(image, target)
            
            # If image or target is invalid
            if image is None or target is None:
                # Skip invalid samples
                return self.__getitem__((index + 1) % len(self))

        # Return image and target
        return image, target

    def __len__(self):
        # Return number of samples in dataset
        return len(self.images)

### Training One Epoch Function

The `train_one_epoch` function trains the RetinaNet model for one epoch using the provided `data_loader`, `optimizer`, and `device`. It sets the model to `training mode` with `model.train()` and initializes total_loss to track the cumulative loss.

For each batch, images and targets are moved to the specified device, and the model calculates the losses. The losses are summed, and any NaN values trigger an early exit with an error message. Gradients are then reset using optimizer.zero_grad(), and backpropagation is performed with losses.backward(). To prevent exploding gradients, they are clipped to a maximum norm of 10.0, followed by updating the model parameters with optimizer.step().

The average loss over all batches in the epoch is then calculated and displayed.
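
To make the clipping step concrete, here is a pure-Python sketch of clipping by global norm on a flat list of gradient values. It follows the same idea as `torch.nn.utils.clip_grad_norm_` but is not that function; `clip_by_global_norm` and the small `eps` constant are assumptions made for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by one common factor so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The factor never exceeds 1, so small gradients are left untouched.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


large, large_norm = clip_by_global_norm([30.0, 40.0], max_norm=10.0)
small, small_norm = clip_by_global_norm([0.3, 0.4], max_norm=10.0)
print(large_norm, large)
print(small_norm, small)
```

The gradient direction is preserved in both cases; only the magnitude of the large gradient is reduced.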

In [3]:
# Function to train one epoch
def train_one_epoch(model, optimizer, data_loader, device, epoch):
    # Set model to training mode
    model.train()
    # Initialize total loss
    total_loss = 0

    # Iterate over all batches and extract images and targets
    for batch_idx, (images, targets) in enumerate(data_loader):
        # Output progress
        print(f"Epoch: {epoch + 1} | Batch: {batch_idx + 1}/{len(data_loader)}", end='\r')

        # Move images and targets to device
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # Forward pass
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        # Debugging loss in case of NaN values
        if torch.isnan(losses).any():
            print(f"NaN Detected in Loss at Epoch {epoch + 1}, Batch {batch_idx + 1}")
            return
        
        # Update total loss
        total_loss += losses.item()

        # Perform backpropagation and optimizer step
        optimizer.zero_grad()   # Reset gradients for all model parameters to zero
        losses.backward()       # Compute gradients by performing backpropagation using the loss

        # Clip gradients to a maximum norm of 10.0 to prevent exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
        optimizer.step()        # Update model parameters using the computed gradients

    # Calculate average loss over all batches and display
    average_loss = total_loss / len(data_loader)
    print(f"Epoch: {epoch + 1} | Batch: {len(data_loader)}/{len(data_loader)} | Average Loss (over all batches in epoch): {average_loss:.3f}")

### Training Main Pipeline

#### Loading Dataset and Creating DataLoader

The `train_dataset` is an instance of the VOCDataset class comprised of images and targets. The transforms=VOCTransform() parameter applies the VOCTransform class to preprocess the images and annotations, converting them into PyTorch-compatible tensors. This dataset serves as the source for the DataLoader during training.

The `train_loader` loads data from `train_dataset` in batches of 8 samples (`batch_size=8`). Setting `shuffle=True` reshuffles the data at the start of each epoch, which helps generalization. Because each image can contain a different number of boxes, the targets cannot be stacked into a single tensor; `collate_fn=lambda x: tuple(zip(*x))` instead groups each batch into a tuple of images and a tuple of targets, which matches the list-style input the detection model expects.
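
The effect of the collate function can be seen without PyTorch. In this small sketch the strings and dictionaries are placeholders for image tensors and target dictionaries, and the batch count assumes the `DataLoader` default of keeping the last, partially filled batch.

```python
import math

# Same collate function as in the notebook.
collate_fn = lambda batch: tuple(zip(*batch))

# Placeholder samples: (image, target) pairs as returned by a dataset.
batch = [
    ("image_0", {"labels": [1]}),
    ("image_1", {"labels": [2, 4]}),
]
images, targets = collate_fn(batch)
print(images)
print(targets)

# Number of batches per epoch for 1215 samples and batch_size=8.
num_batches = math.ceil(1215 / 8)
print(num_batches)
```

The computed batch count agrees with the `152/152` shown in the training log.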

In [4]:
# Load Dataset and Create DataLoader
print("----- <Loading Training Dataset> -----")
train_dir = "./trashy-dataset-roboflow.voc/train"

train_dataset = VOCDataset(train_dir, transforms=VOCTransform())
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, collate_fn=lambda x: tuple(zip(*x)))
print(f"----- <Training Dataset Loaded Successfully - {len(train_dataset)} Training Samples> -----\n")

----- <Loading Training Dataset> -----
----- <Training Dataset Loaded Successfully - 1215 Training Samples> -----



#### Load RetinaNet Architecture Model

This code initializes the RetinaNet model for 5 classes (including the background) using `get_retinanet_model`. It checks for GPU availability and sets the device to cuda if available; otherwise, it defaults to cpu. The model is then moved to the chosen device, ensuring compatibility with the hardware for training or inference.

In [5]:
# Number of classes (including background)
num_classes = 5  # Background, Mixed Waste, Organic Waste, Other, Recycled Waste

# Get RetinaNet model
print("----- <Loading RetinaNet Model Architecture> -----")
model = get_retinanet_model(num_classes)

# Device configuration
if torch.cuda.is_available():
    device = torch.device("cuda")
    # output message that GPU is available and display the device name
    print(f"{torch.cuda.get_device_name(0)} Found - Moving to GPU")
else:
    device = torch.device("cpu")
    # output message that GPU is not available
    print("GPU Not Found - Moving to CPU")
    
# Move model to device
model.to(device)
print("----- <Model Loaded and Moved Successfully> -----\n")

----- <Loading RetinaNet Model Architecture> -----
NVIDIA GeForce RTX 3070 Ti Laptop GPU Found - Moving to GPU
----- <Model Loaded and Moved Successfully> -----



#### Define Hyperparameters and Training Loop

This code defines the training process for the RetinaNet model. An `SGD optimizer` is configured with a learning rate of 0.001, momentum of 0.9, and weight decay of 0.0005, while a `StepLR scheduler` reduces the learning rate by a factor of 0.1 every 3 epochs. 

The training loop runs for 100 epochs, calling `train_one_epoch` to train the model for each epoch and updating the learning rate using the scheduler. At the final epoch, the model's weights are saved to a specified directory, creating it if it doesn't already exist.
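
It can help to look at what this schedule does to the learning rate over many epochs. The sketch below uses the closed-form step-decay rule, `base_lr * gamma ** (epoch // step_size)`, with the notebook's values; `step_lr` is a hypothetical helper, not a PyTorch function.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate in effect during a zero-based epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 3, 6, 9, 12):
    lr = step_lr(base_lr=0.001, gamma=0.1, step_size=3, epoch=epoch)
    print(f"epoch {epoch + 1:>2}: lr = {lr:.1e}")
```

With these settings the learning rate shrinks tenfold every three epochs, so it is already around 1e-7 by epoch 13. This may be one reason the logged loss barely changes after the first several epochs; a larger `step_size` or `gamma` would be worth trying for a 100-epoch run.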

In [6]:
# Optimizer and Scheduler
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1) # Reduce learning rate by a factor of 0.1 every 3 epochs

# Training loop
print("----- <Training Model> -----")

num_epochs = 100
model_save_path = "./RetinaNet-Weights"

# Iterate over all epochs
for epoch in range(num_epochs):
    # Train epoch
    train_one_epoch(model, optimizer, train_loader, device, epoch)
    
    # Update learning rate
    scheduler.step()

    # If last epoch
    if epoch + 1 == num_epochs:
        # Create directory if it doesn't exist
        if not os.path.exists(model_save_path):
            os.makedirs(model_save_path)
            
        # Save model
        torch.save(model.state_dict(), os.path.join(model_save_path, f"retinanet_epoch_{epoch + 1}.pth"))
        print(f"\nModel saved at epoch {epoch + 1} in {model_save_path}")
        
# Completion message
print("----- <Model Trained Successfully> -----\n")

----- <Training Model> -----
Epoch: 1 | Batch: 152/152 | Average Loss (over all batches in epoch): 39.003
Epoch: 2 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.525
Epoch: 3 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.443
Epoch: 4 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.383
Epoch: 5 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.371
Epoch: 6 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.363
Epoch: 7 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.356
Epoch: 8 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.355
Epoch: 9 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.355
Epoch: 10 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.354
Epoch: 11 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.354
Epoch: 12 | Batch: 152/152 | Average Loss (over all batches in epoch): 0.354
Epoch: 13 | Batch: 152/152 | Average Loss (over all bat

## Testing

### Helper Functions

The code in this section prepares the data and configures a RetinaNet model for testing. It reuses `CLASS_MAPPING` and `get_retinanet_model` from the `Training` section, so their definitions are left commented out in the cell below. As before, `CLASS_MAPPING` maps `Background` to 0, `Mixed Waste -Black Bag-` to 1, `Organic Waste -White Bag-` to 2, `Other` to 3, and `Recycled Waste -Grey or Green Bag-` to 4, converting textual class labels into the numerical IDs required by the model.

The `get_retinanet_model` function initializes a pre-trained RetinaNet model with default weights (`RetinaNet_ResNet50_FPN_Weights.DEFAULT`), replaces the output layer of the classification head so that it handles the specified number of classes, and updates the head's `num_classes` attribute for dataset compatibility.

The `preprocess_image` function prepares an image for inference. It opens the image from the given path, converts it to RGB, transforms it into a PyTorch tensor, adds a batch dimension, and moves it to the specified device (GPU or CPU).

The `load_ground_truth` function processes annotation files in the dataset directory. For each XML file, it gets the class labels from the annotations using the CLASS_MAPPING dictionary. These labels are associated with their corresponding image filenames in a dictionary. If an unrecognised class name is encountered, a warning is printed. The function returns a dictionary mapping image filenames to their ground truth labels.

In [7]:
# # Class Mapping - ALREADY DEFINED ABOVE
# CLASS_MAPPING = {
#     "Background": 0,
#     "Mixed Waste -Black Bag-": 1,
#     "Organic Waste -White Bag-": 2,
#     "Other": 3,
#     "Recycled Waste -Grey or Green Bag-": 4
# }


# * ######################################################################################################################


# # Function to get RetinaNet model - ALREADY DEFINED ABOVE
# def get_retinanet_model(num_classes):
#     # Load pre-trained RetinaNet model
#     model = torchvision.models.detection.retinanet_resnet50_fpn(weights=RetinaNet_ResNet50_FPN_Weights.DEFAULT)

#     # Update the number of classes in the classification head
#     in_features = model.head.classification_head.cls_logits.in_channels
#     num_anchors = model.head.classification_head.num_anchors
    
#     # Update classification head
#     model.head.classification_head.cls_logits = torch.nn.Conv2d(
#         in_features, num_anchors * num_classes, kernel_size=3, stride=1, padding=1
#     )
    
#     # Update number of classes
#     model.head.classification_head.num_classes = num_classes
    
#     # Return model
#     return model

# Function to preprocess image
def preprocess_image(img_path, device):
    # Open image
    img = Image.open(img_path).convert('RGB')
    
    # Convert image to tensor and add batch dimension
    img_tensor = functional.to_tensor(img).unsqueeze(0)
    
    # move image to device and return it
    return img_tensor.to(device)

# Function to load ground truth from XML files in dataset directory
def load_ground_truth(dataset_dir, CLASS_MAPPING):
    ground_truth = {}
    
    # Iterate through all files in dataset directory
    for file in os.listdir(dataset_dir):
        # Check if file is an XML (annotation) file
        if file.endswith('.xml'):
            # Parse file and extract structure
            tree = ET.parse(os.path.join(dataset_dir, file))
            root = tree.getroot()  # Get the root element of the XML tree

            # Empty list for labels
            labels = []
            
            # Iterate through all objects in the XML file
            for obj in root.findall('object'):
                # Extract class name
                class_name = obj.find('name').text
                
                # Map class name to class ID and append to labels
                if class_name in CLASS_MAPPING:
                    labels.append(CLASS_MAPPING[class_name])
                else:
                    # Error
                    print(f"Warning: Unknown class '{class_name}' in {file}")
                    
            # Extract image name
            image_name = root.find('filename').text
            # Add labels to ground truth dictionary
            ground_truth[image_name] = labels
    
    # return ground truth dictionary
    return ground_truth

The `plot_precision_recall` function calculates and plots a precision-recall curve for each class, together with its `AUC (Area Under Curve)` value, plus an overall micro-averaged curve. It takes the predicted scores (`y_scores`) and true labels (`y_true`) and skips classes that have no positive examples. The resulting plot is saved as an image in the specified `output_dir`.
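
Each point on a precision-recall curve corresponds to one score threshold. The sketch below computes a single such point by hand; it is an illustration of the definitions only, not scikit-learn's `precision_recall_curve`, and `precision_recall_at` together with the toy data is hypothetical.

```python
def precision_recall_at(y_true, y_scores, threshold):
    """Precision and recall when every score >= threshold counts as a positive."""
    predicted = [score >= threshold for score in y_scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and not p)
    # Conventions for empty denominators are a choice made for this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data for one class: two positives, two negatives.
y_true = [1, 0, 1, 0]
y_scores = [0.9, 0.8, 0.4, 0.1]

points = [precision_recall_at(y_true, y_scores, thr) for thr in (0.85, 0.5, 0.3)]
print(points)
```

Lowering the threshold raises recall and generally lowers precision; the AUC summarizes that trade-off in a single number.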

The `plot_multiclass_confusion_matrix` function generates a confusion matrix to evaluate the model's performance across classes. It maps numerical predictions and ground truth labels to class names using `CLASS_MAPPING`. The confusion matrix is computed, optionally normalized, and displayed as a heatmap with annotations. The plot is saved to the output directory, offering insights into classification accuracy.
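
Row normalization turns counts into per-true-class proportions. A minimal pure-Python sketch follows, using the same small epsilon as the notebook to avoid division by zero for classes with no samples; `normalize_rows` and the 2x2 matrix are made up for illustration.

```python
def normalize_rows(cm, eps=1e-10):
    """Divide each row by its sum so each row shows proportions per true class."""
    return [[value / (sum(row) + eps) for value in row] for row in cm]


# Toy matrix: the second true class has no samples at all.
cm = [
    [8, 2],
    [0, 0],
]
normalized = normalize_rows(cm)
print(normalized)
```

Rows for classes without any ground-truth samples stay at zero instead of producing NaN values.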

In [8]:
# Function to plot precision-recall curve and calculate AUC
def plot_precision_recall(y_true, y_scores, num_classes, output_dir, CLASS_MAPPING):
    # Output progress
    print("Plotting Precision-Recall Curve and Calculating AUC", end='\r')
    
    # Convert dictionary keys to a list for consistent indexing
    class_names = list(CLASS_MAPPING.keys())

    # Set up the figure size for the plot
    plt.figure(figsize=(10, 8))

    # Initialize variables for micro-average
    y_true_flat = y_true.ravel()
    y_scores_flat = y_scores.ravel()

    # Compute the micro-average Precision-Recall curve
    precision_micro, recall_micro, _ = precision_recall_curve(y_true_flat, y_scores_flat)

    # Calculate the micro-average AUC
    micro_auc = auc(recall_micro, precision_micro)

    # Plot the micro-average curve
    plt.plot(recall_micro, precision_micro, linestyle='--', color='mediumblue', linewidth=2.5,
             label=f"Overall Micro-average (AUC = {micro_auc:.3f})")

    # Loop through each class and compute its precision-recall curve
    for class_id in range(num_classes):
        class_name = class_names[class_id]
        # Check if there are any positive examples for the class
        if np.sum(y_true[:, class_id]) == 0:
            print(f"Warning: No positive samples found for class '{class_name}'. Skipping.")
            continue

        # Compute precision, recall, and thresholds for the class
        precision, recall, _ = precision_recall_curve(y_true[:, class_id], y_scores[:, class_id])

        # Calculate the area under the precision-recall curve (AUC)
        pr_auc = auc(recall, precision)

        # Plot the precision-recall curve for the class, including the AUC in the label
        plt.plot(recall, precision, label=f"{class_name} (AUC = {pr_auc:.3f})")

    # Label the axes
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.title("RetinaNet - Precision-Recall Curve")
    # Add a legend to identify the curves by class
    plt.legend(loc="best")
    # Add a grid for better readability
    plt.grid()
    # Save the plot as an image file
    plt.savefig(f"{output_dir}/precision_recall_curve_with_auc.png")
    # Close the plot to free up memory
    plt.close('all')
    
    print("Plotting Precision-Recall Curve and Calculating AUC | Done")

# Function to plot confusion matrix
def plot_multiclass_confusion_matrix(y_true, y_pred, num_classes, output_dir, CLASS_MAPPING, normalize=False):
    # Output progress
    print("Plotting Confusion Matrix", end='\r')
    
    # Get class names from CLASS_MAPPING keys (insertion order matches class IDs)
    class_names = list(CLASS_MAPPING.keys())

    # Compute the confusion matrix
    cm = confusion_matrix(
        y_true.argmax(axis=1),
        y_pred.argmax(axis=1),
        labels=range(num_classes)
    )

    if normalize:
        cm = cm.astype('float') / (cm.sum(axis=1, keepdims=True) + 1e-10)
        cm = np.nan_to_num(cm)  # Replace NaN with 0 if division by 0 occurs

    plt.figure(figsize=(14, 10))

    # Create a heatmap for the confusion matrix
    heatmap = sns.heatmap(cm, 
                          annot=True, 
                          fmt='.2f' if normalize else 'd', 
                          cmap='Blues', 
                          xticklabels=class_names, 
                          yticklabels=class_names,
                          annot_kws={"size": 14})  # Annotation font size

    # Add titles and axis labels
    plt.title("RetinaNet - Confusion Matrix", fontsize=18)
    plt.xlabel("Predicted Labels", fontsize=14)
    plt.ylabel("True Labels", fontsize=14)
    # Rotate the x-axis labels for better visibility
    plt.xticks(rotation=30, ha='right', fontsize=12)
    plt.yticks(rotation=0, fontsize=12)
    # Add a color bar label
    colorbar = heatmap.collections[0].colorbar
    colorbar.set_label("Count" if not normalize else "Proportion", fontsize=12)
    # Automatically adjust the layout to avoid truncation
    plt.tight_layout()
    # Save the plot
    plt.savefig(os.path.join(output_dir, "confusion_matrix_multiclass.png"))
    plt.close('all')
    
    print("Plotting Confusion Matrix | Done")

### Function to Draw Bounding Boxes on Test Images

The `draw_bboxes` function draws predicted bounding boxes, labels, and confidence scores on an image and saves the result. It gets the bounding box coordinates, class labels, and prediction scores from the prediction object. For each valid prediction, the function draws a bounding box and annotates it with the class name and confidence score. The image is saved to the specified output_dir, and progress is displayed during the process. This function helps visually evaluate the model's predictions.
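
Since `CLASS_MAPPING` maps class names to IDs, turning a predicted ID back into a name requires the inverse mapping. The sketch below shows one way to build it and to apply the 0.35 score threshold; `ID_TO_NAME`, `visible_predictions`, and the sample predictions are hypothetical.

```python
CLASS_MAPPING = {
    "Background": 0,
    "Mixed Waste -Black Bag-": 1,
    "Organic Waste -White Bag-": 2,
    "Other": 3,
    "Recycled Waste -Grey or Green Bag-": 4,
}

# Invert the mapping once: numerical ID -> class name.
ID_TO_NAME = {class_id: name for name, class_id in CLASS_MAPPING.items()}


def visible_predictions(labels, scores, threshold=0.35):
    """Keep (class name, score) pairs whose score is above the threshold."""
    return [
        (ID_TO_NAME.get(int(label), "Unknown"), score)
        for label, score in zip(labels, scores)
        if score > threshold
    ]


shown = visible_predictions([1, 4, 2], [0.90, 0.20, 0.36])
print(shown)
```

Calling `int(label)` matters when labels arrive as NumPy integers, and the `"Unknown"` fallback avoids printing `None` for an unexpected ID.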

In [9]:
# Function to draw bounding boxes on images
def draw_bboxes(output_dir, image, image_name, prediction, fig_size, CLASS_MAPPING, saved_images_counter, total_images):
    boxes = prediction[0]['boxes'].cpu().numpy() # get predicted bounding boxes
    labels = prediction[0]['labels'].cpu().numpy() # get predicted labels
    scores = prediction[0]['scores'].cpu().numpy() # get predicted scores

    # Set a threshold for showing bounding boxes
    threshold = 0.35

    fig, ax = plt.subplots(figsize=fig_size)
    ax.imshow(image)

    # Draw bboxes
    for box, label, score in zip(boxes, labels, scores):
        # Only draw predictions whose score exceeds the threshold
        if score > threshold:
            # Get box coordinates
            x_min, y_min, x_max, y_max = box
            # CLASS_MAPPING maps class names to IDs, so look the name up by its ID
            class_name = next((name for name, class_id in CLASS_MAPPING.items() if class_id == int(label)), "Unknown")

            # Draw bbox
            ax.add_patch(
                plt.Rectangle(
                    (x_min, y_min), x_max - x_min, y_max - y_min,
                    fill=False, edgecolor='red', linewidth=2
                )
            )
            
            # Add class name and confidence score
            ax.text(
                x_min, y_min,
                f'{class_name} ({score:.3f})',
                color='blue',
                fontsize=10,
            )
    
    # Remove axis
    ax.axis('off')
    # Save image
    fig.savefig(f'{output_dir}/{image_name}.png', bbox_inches='tight', pad_inches=0)
    
    # Display progress
    if (saved_images_counter + 1) == (total_images - 1):
        print(f"Saved image {saved_images_counter + 1}/{total_images - 1}")
    else:
        print(f"Saved image {saved_images_counter + 1}/{total_images - 1}", end='\r')

### Main Pipeline

#### Parameters 

In [14]:
# Figure size
fig_size = (8, 8)
# Num of classes
numOfClasses = 5 # Background, Mixed Waste -Black Bag-, Organic Waste -White Bag-, Other, Recycled Waste -Grey or Green Bag-
testing_dir = './trashy-dataset-roboflow.voc/test'  # Get testing directory
output_dir_images = './images'                      # Output directories for images
output_dir_plots = './plots'                        # Output directories for plots
saved_model_dir = './RetinaNet-Weights'             # Saved model directory
# saved images counter
saved_images_counter = 0

# Initialize lists for true labels and predicted scores
y_true_list = []
y_scores_list = []

#### Checking for Device Availability and Loading Weights From Trained Model

This code loads the trained RetinaNet model from `saved_model_dir` and moves it to the appropriate device (GPU or CPU). Because training saves only the final epoch, the directory is expected to contain a single weights file, so the model path is built from the first entry returned by `os.listdir(saved_model_dir)`. If a GPU is available it is selected; otherwise the CPU is used. The model is initialized using `get_retinanet_model`, its weights are loaded from the saved state dictionary, and it is then moved to the selected device and set to evaluation mode with `model.eval()`.
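
Taking the first entry of `os.listdir` works when the directory holds exactly one file, but the listing order is arbitrary and stray files would break it. A slightly more defensive lookup could look like the sketch below; `find_weights` is a hypothetical helper, and the temporary directory only simulates a weights folder.

```python
import os
import tempfile


def find_weights(directory, suffix=".pth"):
    """Return the path of the lexicographically last file ending in suffix."""
    candidates = sorted(f for f in os.listdir(directory) if f.endswith(suffix))
    if not candidates:
        raise FileNotFoundError(f"No {suffix} file found in {directory}")
    return os.path.join(directory, candidates[-1])


# Simulate a weights directory that also contains an unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("notes.txt", "retinanet_epoch_100.pth"):
        open(os.path.join(tmp, name), "w").close()
    chosen = os.path.basename(find_weights(tmp))

print(chosen)
```

Note that lexicographic order is not numeric order, so if several epochs were saved the epoch number would need to be parsed out of the file name instead.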

In [15]:
# Load Model and Move to Device
print("\n----- <Loading Model> -----")
model_path = os.path.join(saved_model_dir, os.listdir(saved_model_dir)[0])

if torch.cuda.is_available():
    device = torch.device('cuda')
    print(f"GPU: {torch.cuda.get_device_name(0)} is available - moving model to GPU")
else:
    device = torch.device('cpu')
    print("No GPU available. Moving testing to CPU")

model = get_retinanet_model(numOfClasses)
state_dict = torch.load(model_path, weights_only=True, map_location=device)
model.load_state_dict(state_dict)
model.to(device)
model.eval()
print(f"----- <Model [{os.listdir(saved_model_dir)[0]}] Loaded and Moved Successfully> -----\n")


----- <Loading Model> -----
GPU: NVIDIA GeForce RTX 3070 Ti Laptop GPU is available - moving model to GPU
----- <Model [retinanet_epoch_100.pth] Loaded and Moved Successfully> -----



#### Testing Loop

The code in this section tests the trained RetinaNet model on images from the testing directory. It starts by creating an output directory for saving annotated images. Ground truth labels are loaded using `load_ground_truth`, and for each test image, the following steps are performed:

- **Preprocessing:** The image is preprocessed into a tensor using preprocess_image and moved to the device.
- **Ground Truth Conversion:** True labels are converted into a one-hot encoded format for precision-recall calculations.
- **Prediction:** The model predicts bounding boxes, class labels, and scores for the image using torch.no_grad() for efficiency.
- **Score Collection:** Predicted scores are stored for later evaluation.
- **Visualization:** Bounding boxes and labels are drawn on the image using draw_bboxes, and the result is saved to the output directory.

The process iterates over all test images, saving predictions and displaying progress, with memory management ensured by closing all plots after each iteration.
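
The ground-truth conversion and score collection steps reduce each image to two fixed-length vectors. The sketch below reproduces that logic with plain lists; `one_hot` and `per_class_scores` are hypothetical helper names, and the sample labels and scores are made up.

```python
def one_hot(labels, num_classes):
    """Mark every class that appears in the image's ground truth with 1."""
    vector = [0.0] * num_classes
    for label in labels:
        vector[label] = 1.0
    return vector


def per_class_scores(pred_labels, pred_scores, num_classes):
    """Keep the highest score per class; background is the complement of the rest."""
    scores = [0.0] * num_classes
    for label, score in zip(pred_labels, pred_scores):
        scores[label] = max(scores[label], score)
    scores[0] = 1 - sum(scores[1:])  # same heuristic as the notebook
    return scores


true_vector = one_hot([1, 1, 4], num_classes=5)
score_vector = per_class_scores([1, 1, 4], [0.9, 0.6, 0.3], num_classes=5)
print(true_vector)
print(score_vector)
```

In this example the background entry comes out negative because two classes score highly at once, so the complement heuristic should be read as a rough ranking signal and not as a probability.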

In [16]:
print("----- <Testing Model> -----\n")
    
# Create output image directory
if not os.path.exists(output_dir_images):
    os.makedirs(output_dir_images)

# Get all images in testing directory
test_images = [img for img in os.listdir(testing_dir) if img.endswith(('.jpg', '.png', '.jpeg'))]

# Load ground truth
ground_truth = load_ground_truth(testing_dir, CLASS_MAPPING)

# Iterate through all images
for image in test_images:
    # Get ground truth labels for the image
    true_labels = ground_truth.get(image, [0])  # Default to background if no labels
    
    # Get image path
    img_path = os.path.join(testing_dir, image)
    # Preprocess image
    image_tensor = preprocess_image(img_path, device)
    
    # Convert to one-hot encoding for precision-recall computation
    true_one_hot = np.zeros(numOfClasses)
    
    for label in true_labels:
        true_one_hot[label] = 1

    y_true_list.append(true_one_hot)

    # Disable gradient computation for faster inference
    with torch.no_grad():
        # Get model prediction
        prediction = model(image_tensor)
        
        # Collect predicted scores
        pred_scores = np.zeros(numOfClasses)
        
        for label, score in zip(prediction[0]['labels'].cpu().numpy(), prediction[0]['scores'].cpu().numpy()):
            pred_scores[label] = max(pred_scores[label], score)

        # Add background scores if not explicitly included
        pred_scores[0] = 1 - np.sum(pred_scores[1:])  # Assume background is the complement of other scores

        # Append the predicted scores to the y_scores_list for later metric computation
        y_scores_list.append(pred_scores)

    # Draw bounding boxes on image and save result
    draw_bboxes(
        output_dir_images, 
        Image.open(img_path), 
        os.path.splitext(image)[0], 
        prediction, fig_size, 
        CLASS_MAPPING, 
        saved_images_counter, 
        total_images=len(test_images)
    )
    
    # Increment saved images counter
    saved_images_counter += 1

    # Closing all figures to free up memory
    plt.close('all')
        
print(f"Results Saved in {output_dir_images} Folder\n")

----- <Testing Model> -----

Saved image 48/48
Results Saved in ./images Folder



### Plot Metrics - Precision-Recall and Confusion Matrix 

This code finalises the testing process by evaluating the model's performance and saving the evaluation plots. It first creates an output directory for the plots if one doesn't already exist. The collected true labels (`y_true_list`) and predicted scores (`y_scores_list`) are then converted to NumPy arrays for compatibility with the plotting functions.

The `plot_precision_recall` function generates precision-recall curves and calculates the AUC for each class, saving the plots to the output directory. The `plot_multiclass_confusion_matrix` function creates a normalized confusion matrix to visualize classification performance across all classes. A completion message is printed to indicate the end of the testing phase.

In [17]:
# Add Output Directory for Plots
if not os.path.exists(output_dir_plots):
    os.makedirs(output_dir_plots)

# Convert scores and predictions to NumPy arrays
y_true_np = np.array(y_true_list)
y_scores_np = np.array(y_scores_list)

# Plot Precision-Recall Curve and AUC for each class
plot_precision_recall(y_true_np, y_scores_np, numOfClasses, output_dir_plots, CLASS_MAPPING)
# Plot Confusion Matrix
plot_multiclass_confusion_matrix(y_true_np, y_scores_np, numOfClasses, output_dir_plots, CLASS_MAPPING, normalize=True)

# Display completion message
print("\n----- <Testing Completed Successfully> -----\n")

Plotting Precision-Recall Curve and Calculating AUC | Done
Plotting Confusion Matrix | Done

----- <Testing Completed Successfully> -----

