# 1. Introduction

## 1.1. Goal
The goal of the project is to build a deep-learning model that is able to detect potholes.

## 1.2. Dataset
The pothole dataset contains 665 images annotated with bounding boxes around potholes. The annotations are in XML format (similar to Pascal VOC). A `splits.json` file defines the training and test sets.
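
As a rough illustration of the annotation format, the sketch below parses a Pascal VOC-style XML string into bounding boxes. The tag names (`object`, `bndbox`, `xmin`, ...) follow the usual Pascal VOC layout and are an assumption about this dataset's files; the sample XML and the helper name `parse_voc_boxes` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the usual Pascal VOC layout (assumed, not taken from the dataset)
SAMPLE_XML = """
<annotation>
  <filename>img-1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>pothole</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>90</ymax></bndbox>
  </object>
  <object>
    <name>pothole</name>
    <bndbox><xmin>300</xmin><ymin>200</ymin><xmax>420</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples from a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall('object'):
        label = obj.findtext('name')
        bndbox = obj.find('bndbox')
        # Coordinates are sometimes stored as floats, so go through float() first
        coords = tuple(int(float(bndbox.findtext(tag))) for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append((label, *coords))
    return boxes

boxes = parse_voc_boxes(SAMPLE_XML)
print(boxes)
```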

- - -
- - -
# 2. Object proposals

## 2.1. Visualize the dataset
> Familiarise yourself with the data and visualize some examples with the ground-truth bounding boxes.

In [None]:
from utils.globalConst import *

In [None]:
from Potholes.DatasetAnalyzer import DatasetAnalyzer

dataset_analyzer = DatasetAnalyzer()
dataset_analyzer.visualize_samples('train', num_samples=6, rows=2, cols=3)

In [None]:
import os

# Count total images
train_images = len(os.listdir(TRAIN_DIR))
val_images = len(os.listdir(VAL_DIR))
test_images = len(os.listdir(TEST_DIR))
total_images = train_images + val_images + test_images

print(
    f'The dataset contains {total_images} images, with the following distribution:',
    f'- Training: {train_images} ({train_images / total_images * 100:.0f}%)',
    f'- Validation: {val_images} ({val_images / total_images * 100:.0f}%)',
    f'- Testing: {test_images} ({test_images / total_images * 100:.0f}%)',
    sep='\n'
)

In [None]:
# Get unique image sizes and aspect ratios
unique_sizes, unique_aspect_ratios = dataset_analyzer.get_unique_image_sizes_and_ratios()
print(f"Number of unique image sizes: {len(unique_sizes)}")
print(f"Number of unique image aspect ratios: {len(unique_aspect_ratios)}")

In [None]:
# Count total annotations and track objects per image
total_annotations, objects_per_image = dataset_analyzer.count_annotations()
print(f"Total number of pothole annotations: {total_annotations}")

In [None]:
max_objects, image_indexes = dataset_analyzer.find_images_with_max_objects(objects_per_image)
min_objects = min(objects_per_image.values())
print(f"Maximum number of objects in an image: {max_objects}")
print(f"Minimum number of objects in an image: {min_objects}")

In [None]:
print(f"Image(s) with the maximum number of potholes ({max_objects}):")
for idx in image_indexes: 
    dataset_analyzer.visualize_image_with_annotations('train', idx)

- - -
## 2.2. Calculate object proposals
> Extract object proposals for all the images of the dataset (e.g. Selective Search, Edge Boxes, etc.).
> 
> ``` admonition
> Note that you may have to resize the images before you run SS for better efficiency. 
> ```

The goal is to generate candidate regions in each image that may contain potholes using object proposal algorithms.

**What are object proposals?**
Regions in an image that are likely to contain objects of interest. They reduce the search space for the object detection model by focusing it on promising areas.

Some methods:

**Selective Search (SS):**
- It groups pixels based on color, texture, size and shape compatibility. Hierarchical grouping leads to region proposals.

**Edge Boxes:**
- It scores boxes based on the number of enclosed edges. It is faster than SS and generates high-quality proposals.

How to extract the proposals?

1. Resize the images to speed up the process (this is especially necessary with SS).
2. Implement the algorithm.
3. Iterate over the dataset and apply the algorithm.
4. Save the proposals as NumPy arrays.
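
Resizing in step 1 implies that proposals computed on the resized image have to be mapped back to the original resolution before they are compared with the ground truth. A minimal sketch of that bookkeeping, assuming boxes are stored as `(xmin, ymin, xmax, ymax)` and that the longer side is resized to a hypothetical `max_side` of 500 pixels (neither is confirmed by the project code):

```python
def resize_scale(width, height, max_side=500):
    """Scale factor that makes the longer image side equal to max_side (never upscales)."""
    return min(1.0, max_side / max(width, height))

def to_original_coords(box, scale):
    """Map an (xmin, ymin, xmax, ymax) box from the resized image back to the original image."""
    return tuple(round(v / scale) for v in box)

scale = resize_scale(1000, 750)          # a 1000x750 image is shrunk by a factor of 0.5
box_resized = (50, 100, 200, 250)        # proposal found on the resized image
box_original = to_original_coords(box_resized, scale)
print(scale, box_original)
```

Keeping the scale factor next to the saved proposals avoids having to recompute it when the proposals are evaluated later.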

In [None]:
from RegionProposals.ProposalExtractor import ProposalExtractor
proposal_extractor = ProposalExtractor()

Extract the proposals with both algorithms (unnecessary if they have already been extracted):

In [None]:
# proposal_extractor.extract_with('selective_search')
# proposal_extractor.extract_with('edge_boxes')

We can also plot an example of the generated proposals for an image in the dataset:

In [None]:
proposal_extractor.visualize_top_proposals(idx=EXAMPLE_IDX, alg="selective_search", top_n=10)
proposal_extractor.visualize_top_proposals(idx=EXAMPLE_IDX, alg="edge_boxes", top_n=10)

- - -
## 2.3. Proposals evaluation
> Evaluate the extracted proposals on the training set of the dataset and determine the number of required proposals

Two metrics give valuable insight into how well the algorithms generate regions of interest:
- **Pascal-Recall:** The percentage of ground-truth objects that have at least one proposal overlapping with them at an IoU (Intersection over Union) greater than or equal to a threshold (typically 0.5). It measures how well the proposals cover the actual objects in the dataset.
- **MABO (Mean Average Best Overlap):** For each ground-truth object, find the proposal with the highest IoU. MABO is the average of these maximum IoUs over all ground-truth objects. It assesses the localization quality of the proposals.
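
Both metrics can be computed from the best IoU per ground-truth box. The following is a small self-contained sketch on made-up boxes, not the `ProposalEvaluator` implementation; the function names and the `(xmin, ymin, xmax, ymax)` box layout are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall_and_mabo(gt_boxes, proposals, iou_threshold=0.5):
    """Recall and mean best overlap over all ground-truth boxes (single class)."""
    # Best IoU achieved by any proposal for each ground-truth box
    best = [max((iou(gt, p) for p in proposals), default=0.0) for gt in gt_boxes]
    if not best:
        return 0.0, 0.0
    recall = sum(b >= iou_threshold for b in best) / len(best)
    mabo = sum(best) / len(best)
    return recall, mabo

gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
proposals = [(0, 0, 10, 5), (0, 0, 10, 10), (25, 20, 35, 30), (50, 50, 60, 60)]
recall, mabo = recall_and_mabo(gt_boxes, proposals)
print(recall, mabo)
```

In this toy case the first box is matched exactly and the second only reaches an IoU of 1/3, so the recall is 0.5 and the MABO is 2/3. To study the effect of the number of proposals, the same function would be called on the top-N proposals for each N.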

In [None]:
from RegionProposals.ProposalEvaluator import ProposalEvaluator
proposal_evaluator = ProposalEvaluator()

In [None]:
# Define the number of proposals to evaluate
num_proposals_list = [1, 8, 64, 512, 2048]  
iou_threshold = 0.7

# Evaluate Selective Search
results_ss = proposal_evaluator.evaluate_proposals(
    alg='selective_search',
    num_proposals_list=num_proposals_list,
    iou_threshold=iou_threshold
)

# Evaluate Edge Boxes
results_eb = proposal_evaluator.evaluate_proposals(
    alg='edge_boxes',
    num_proposals_list=num_proposals_list,
    iou_threshold=iou_threshold
)

# Plot Pascal-Recall vs. Number of Proposals
proposal_evaluator.plot_metric_vs_proposals(
    num_proposals_list=num_proposals_list,
    metric_ss=results_ss['recalls'],
    metric_eb=results_eb['recalls'],
    metric_name='Pascal-Recall'
)

# Plot MABO vs. Number of Proposals
proposal_evaluator.plot_metric_vs_proposals(
    num_proposals_list=num_proposals_list,
    metric_ss=results_ss['mabo'],
    metric_eb=results_eb['mabo'],
    metric_name='MABO'
)

In [None]:
proposal_evaluator.display_ground_truth_and_proposals(idx=EXAMPLE_IDX, alg='selective_search', top_n=10)
proposal_evaluator.display_ground_truth_and_proposals(idx=EXAMPLE_IDX, alg='edge_boxes', top_n=10)

- - -
- - -
# 3. Data preparation

## 3.1. Proposal labeling
> Prepare the proposals for the training of the object detector. This requires assigning a label (i.e., class or background label) to each proposal.

The labeling step uses IoU thresholds to decide which proposals are marked as positive.

A possible concern is that if, say, three proposals each have an IoU with the ground truth higher than 0.5 and are all labeled positive, we end up with multiple proposals for each object. This could be a problem in a multi-class setup, where excessive positive proposals for the same object could result in class imbalance. Additionally, a high number of overlapping proposals could introduce redundancy, making training less efficient.

On the other hand, having multiple proposals that cover the same object with different positions and extents can provide varied training examples, serving as a form of data augmentation. It can also ensure that there are enough training samples for less frequent objects, such as a very small pothole.
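
A labeling rule with two IoU thresholds could look like the sketch below. It assumes that `k_1` and `k_2` act as lower and upper thresholds: below `k_1` a proposal is background, at or above `k_2` it is a pothole, and anything in between is left out of training. This is an assumption about the intent, and the actual `assign_labels_to_proposals` method may differ. The function name `label_proposal` and the `None` marker are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_proposal(proposal, gt_boxes, k_1=0.3, k_2=0.7):
    """Label a proposal from its best IoU with any ground-truth box (assumed rule)."""
    best = max((iou(proposal, gt) for gt in gt_boxes), default=0.0)
    if best >= k_2:
        return 'pothole'
    if best < k_1:
        return 'background'
    return None  # ambiguous overlap: excluded from training in this sketch

gt_boxes = [(0, 0, 10, 10)]
proposals = [(0, 0, 10, 9), (0, 0, 10, 5), (8, 8, 18, 18)]
labels = [label_proposal(p, gt_boxes) for p in proposals]
print(labels)
```

Under this rule several proposals can be positive for the same pothole, which matches the trade-off discussed above.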

Here we call the evaluator method that performs the assignment (unnecessary if the labels have already been assigned):

In [None]:
# proposal_evaluator.assign_labels_to_proposals(k_1=0.3, k_2=0.7)

And then we plot the result for an example image:

In [None]:
proposal_evaluator.visualize_labeled_proposals(idx=EXAMPLE_IDX, alg='selective_search')
proposal_evaluator.visualize_labeled_proposals(idx=EXAMPLE_IDX, alg='edge_boxes')

- - -
## 3.2. Custom Data loader
> Build a dataloader for the object detection task. 
> 
> ```admonition
> Think about the class imbalance issue of the background proposals
> ```

For the data loader we need to define the batch size and the ratio between pothole and background proposals, to avoid the class imbalance issue.
Then, after defining the data transformations, we can build the data loader so that it takes the bounding box proposals as input and their generated labels as targets.

*See ProposalDataset class in ProposalDataset.py*
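
One simple way to enforce a fixed pothole/background ratio is to draw each batch from two index pools. The sketch below assumes a hypothetical `positive_fraction` of 0.25 and plain Python lists of indices; it is only an illustration of the idea and not the `ProposalDataset` implementation.

```python
import random

def sample_balanced_batch(pos_indices, neg_indices, batch_size=64,
                          positive_fraction=0.25, rng=random):
    """Draw one batch of indices with a fixed share of positive (pothole) proposals."""
    n_pos = min(int(batch_size * positive_fraction), len(pos_indices))
    n_neg = min(batch_size - n_pos, len(neg_indices))
    batch = rng.sample(pos_indices, n_pos) + rng.sample(neg_indices, n_neg)
    rng.shuffle(batch)  # avoid having all positives at the start of the batch
    return batch

rng = random.Random(0)                   # seeded for reproducibility
pos_indices = list(range(0, 50))         # pretend proposals 0..49 are potholes
neg_indices = list(range(50, 1000))      # pretend the rest are background
batch = sample_balanced_batch(pos_indices, neg_indices, rng=rng)
n_pos = sum(i < 50 for i in batch)
print(len(batch), n_pos)
```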

- - -
## 3.3. Handle class imbalance

1. The `CrossEntropyLoss` function accepts a `weight` parameter, which assigns a weight to each class. This makes the loss function pay more attention to under-represented classes.
2. `compute_class_weights` computes each class weight as `weight = total_samples / (num_classes * count)`.
3. The `WeightedRandomSampler` lets the `DataLoader` draw samples according to assigned weights, so that under-represented classes are drawn more often during training.

Basically, instead of using a fixed 75-25 ratio, the weights are calculated by counting the negative and positive samples. These weights are then passed to both the sampler and the loss function.
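
To make the weight formula concrete, here is a small worked example with made-up counts (900 background and 100 pothole proposals). It also shows why weighting every sample by its class weight gives both classes the same total sampling mass, which is what balances the batches in expectation.

```python
counts = {'background': 900, 'pothole': 100}   # made-up counts for illustration
total_samples = sum(counts.values())
num_classes = len(counts)

# weight = total_samples / (num_classes * count)
class_weights = {label: total_samples / (num_classes * count)
                 for label, count in counts.items()}
print(class_weights)   # roughly 0.56 for background and 5.0 for pothole

# Total sampling mass per class when every sample carries its class weight
mass = {label: counts[label] * class_weights[label] for label in counts}
print(mass)            # both classes end up with about the same mass
```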

In [None]:
from collections import Counter
import torch

def compute_class_weights(dataset, num_classes):
    labels = [sample['label'] for sample in dataset.samples]
    label_counts = Counter(labels)
    total_samples = len(labels)
    class_weights = []

    for label in dataset.label_encoder.classes_:
        count = label_counts.get(label, 0)
        weight = total_samples / (num_classes * count) if count > 0 else 0
        class_weights.append(weight)

    class_weights = torch.tensor(class_weights, dtype=torch.float)
    return class_weights

In [None]:
from torch.utils.data import DataLoader, WeightedRandomSampler

def create_sampler(dataset, class_weights):
    # Encode all labels in one call instead of once per sample
    labels = dataset.label_encoder.transform([sample['label'] for sample in dataset.samples])
    # One weight per sample, taken from the weight of its class
    sample_weights = [class_weights[label].item() for label in labels]
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)
    return sampler

In [None]:
import json
from torchvision import transforms
from sklearn.preprocessing import LabelEncoder

from RegionProposals.ProposalDataset import ProposalDataset

with open('./Potholes/splits.json', 'r') as f:
    splits = json.load(f)

train_ids = splits.get('train', [])
val_ids = splits.get('val', [])
test_ids = splits.get('test', [])

label_encoder = LabelEncoder()

# Collect the labels of all proposals so the encoder sees every class
all_labels = []
for indices, split in [(train_ids, 'train'), (val_ids, 'val'), (test_ids, 'test')]:
    for idx in indices:
        with open(f'{ROOT_DIR}/{split}/img-{idx}/selective_search.json') as f:
            labeled_proposals = json.load(f)
        all_labels.extend(proposal['label'] for proposal in labeled_proposals)

label_encoder.fit(all_labels)
num_classes = len(label_encoder.classes_)

transform = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
])

train_dataset = ProposalDataset('train', label_encoder, transform=transform)
val_dataset = ProposalDataset('val', label_encoder, transform=transform)
test_dataset = ProposalDataset('test', label_encoder, transform=transform)

class_weights = compute_class_weights(train_dataset, num_classes)
sampler = create_sampler(train_dataset, class_weights)

- - -
- - -
# 4. Models and training

> Build a convolutional neural network to classify object proposals ($N+1$ classes)

In [None]:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

## 4.1. Custom CNN

In [None]:
import torch.nn as nn
import torch.optim as optim

from Models.CNN import CNN

num_epochs = 10
batch_size = 64
learning_rate = 1e-3

train_loader = DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

model = CNN(device, num_classes)

class_weights_tensor = class_weights.to(device)
criterion = nn.CrossEntropyLoss(weight=class_weights_tensor)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

model.train_(train_loader, val_loader, num_epochs, criterion, optimizer)

test_loss, test_acc = model.eval_(test_loader, criterion)
print(f'Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.4f}')

torch.save(model.state_dict(), 'proposal_classifier.pth')
print("Model saved as 'proposal_classifier.pth'.")

model.plot_training_history()
model.plot_confusion_matrix(val_loader, label_encoder)

- - -
## 4.2. ResNet18

First we can import the ResNet18 pre-trained model and modify the last layer so that it only has two outputs, one for the 'pothole' class and one for the 'background' class.

In [None]:
from Models.ResNet18 import ResNet18

num_epochs = 2
batch_size = 32
learning_rate = 1e-3

train_loader = DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

model = ResNet18(device, num_classes)

class_weights_tensor = class_weights.to(device)
criterion = nn.CrossEntropyLoss(weight=class_weights_tensor)
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=learning_rate)

model.train_(train_loader, val_loader, num_epochs, criterion, optimizer)

test_loss, test_acc = model.eval_(test_loader, criterion)
print(f'Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.4f}')

torch.save(model.state_dict(), 'proposal_classifier_resnet.pth')
print("Model saved as 'proposal_classifier_resnet.pth'.")

model.plot_training_history()
model.plot_confusion_matrix(val_loader, label_encoder)

- - -
## 4.3. VGG16

We can do the same for the VGG16 model.

In [None]:
from Models.VGG16 import VGG16

num_epochs = 2
batch_size = 32
learning_rate = 1e-3

train_loader = DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

model = VGG16(device, num_classes)

class_weights_tensor = class_weights.to(device)
criterion = nn.CrossEntropyLoss(weight=class_weights_tensor)
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=learning_rate)

model, history = model.train_(train_loader, val_loader, num_epochs, criterion, optimizer)

test_loss, test_acc = model.eval_(test_loader, criterion)
print(f'Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.4f}')

torch.save(model.state_dict(), 'proposal_classifier_vgg16.pth')
print("Model saved as 'proposal_classifier_vgg16.pth'.")

model.plot_training_history()
model.plot_confusion_matrix(val_loader, label_encoder)

- - -
- - -
# 5. Evaluation
> Evaluate the classification accuracy of the network on the validation set.
> 
> ```admonition
> Note that this is different from the evaluation of the object detection task.
> ```
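
Because background proposals usually outnumber pothole proposals, plain accuracy can look good even when few potholes are found, so it may help to report per-class recall next to it. A minimal sketch on made-up predictions (the helper name `accuracy_and_recall` is hypothetical, and the label strings simply mirror the two classes used above):

```python
from collections import Counter

def accuracy_and_recall(y_true, y_pred):
    """Overall accuracy and per-class recall for two equally long label lists."""
    total = Counter(y_true)      # number of samples per true class
    correct = Counter()          # number of correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    accuracy = sum(correct.values()) / len(y_true)
    recall = {label: correct[label] / total[label] for label in total}
    return accuracy, recall

# Made-up validation labels: 8 background and 2 pothole proposals
y_true = ['background'] * 8 + ['pothole'] * 2
y_pred = ['background'] * 8 + ['background', 'pothole']
accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(accuracy, recall)
```

Here the accuracy is 0.9 although only half of the pothole proposals are recognized, which is the kind of gap the confusion matrix plots make visible.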

- - -