# CV - Assignment 2

Alexandre Vilhena da Costa

Tiago Marques Claro

## **Introduction**

This work aims to develop a model capable of detecting sacrificial anodes in underwater environments. The dataset used in this assignment was captured in two distinct settings: an indoor pool with minimal noise and clear images (figure 1), and the sea, where the presence of biomatter and suspended particles poses a significant challenge for detecting anodes (figure 2). Additionally, as depth increases, the absorption of light begins to filter out certain wavelengths, resulting in images dominated by green and blue hues [1].

To address these challenges, this work begins by pre-processing the dataset to enhance image quality and splitting it into training, validation, and testing subsets. Subsequently, we design and implement Neural Networks, which are trained, validated, and tested using this dataset. Finally, we present a discussion of the results, including a comparison with state-of-the-art models.

<div style="text-align: center;">
  <img src="266.png" alt="Underwater Image" width="400"/>
  <p><em>Figure 1: Pool Image</em></p>
</div>


<div style="text-align: center;">
  <img src="5482.png" alt="Underwater Image" width="400"/>
  <p><em>Figure 2: Underwater Image</em></p>
</div>

[1] Zhou, J., Yang, T. & Zhang, W. Underwater vision enhancement technologies: a comprehensive review, challenges, and recent trends. Appl Intell 53, 3594–3621 (2023). https://doi.org/10.1007/s10489-022-03767-y


## **Dataset Preprocessing**

In [None]:
import os
import shutil
import random
import cv2
from tqdm import tqdm
import matplotlib.pyplot as plt

### Definition of the paths to the organized dataset

Firstly we created our organized dataset in the following folder structure:

organized_dataset/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/


In [None]:
# Define paths
datasets_root = "datasets"  # Root folder containing the subfolders
output_dir = "organized_dataset"  # Root directory for the organized dataset
images_dir = os.path.join(output_dir, "images")
labels_dir = os.path.join(output_dir, "labels")
train_dir = os.path.join(images_dir, "train")
val_dir = os.path.join(images_dir, "val")
test_dir = os.path.join(images_dir, "test")
train_labels_dir = os.path.join(labels_dir, "train")
val_labels_dir = os.path.join(labels_dir, "val")
test_labels_dir = os.path.join(labels_dir, "test")

# Create output directories for images and labels
os.makedirs(train_dir, exist_ok=True)
os.makedirs(val_dir, exist_ok=True)
os.makedirs(test_dir, exist_ok=True)
os.makedirs(train_labels_dir, exist_ok=True)
os.makedirs(val_labels_dir, exist_ok=True)
os.makedirs(test_labels_dir, exist_ok=True)

### Split the images and labels into the training, validation and testing datasets

In [None]:
# Collect all images and labels
image_label_pairs = []
for folder in os.listdir(datasets_root):
    folder_path = os.path.join(datasets_root, folder)
    if os.path.isdir(folder_path):
        images_path = os.path.join(folder_path, "images")
        labels_path = os.path.join(folder_path, "labels")

        if os.path.exists(images_path) and os.path.exists(labels_path):
            images = os.listdir(images_path)
            for image in images:
                label = os.path.splitext(image)[0] + ".txt"  # Label file shares the image's base name
                image_path = os.path.join(images_path, image)
                label_path = os.path.join(labels_path, label)

                if os.path.exists(image_path) and os.path.exists(label_path):
                    image_label_pairs.append((image_path, label_path))

# Shuffle dataset
random.shuffle(image_label_pairs)

# Split into train, test, and validation sets
train_ratio = 0.7
val_ratio = 0.2
test_ratio = 0.1

total_size = len(image_label_pairs)
train_size = int(total_size * train_ratio)
val_size = int(total_size * val_ratio)

train_set = image_label_pairs[:train_size]
val_set = image_label_pairs[train_size:train_size + val_size]
test_set = image_label_pairs[train_size + val_size:]
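
The split above depends on the global random state, so each run produces different subsets. As a minimal sketch, assuming reproducibility is wanted, the same 70/20/10 split can be made deterministic with a seeded generator. The helper name `split_pairs` and the seed value are illustrative assumptions, not part of the original notebook.

```python
import random

def split_pairs(pairs, train_ratio=0.7, val_ratio=0.2, seed=42):
    """Shuffle a copy of `pairs` with a fixed seed and split it into train/val/test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # Local generator: global state is untouched

    train_size = int(len(shuffled) * train_ratio)
    val_size = int(len(shuffled) * val_ratio)

    train = shuffled[:train_size]
    val = shuffled[train_size:train_size + val_size]
    test = shuffled[train_size + val_size:]  # Remainder, absorbs rounding
    return train, val, test

# Hypothetical file names, only to exercise the function
pairs = [(f"img_{i}.jpg", f"img_{i}.txt") for i in range(105)]
train, val, test = split_pairs(pairs)
print(len(train), len(val), len(test))
```

Because the test subset takes whatever remains after the first two slices, no pair is lost to integer rounding.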

### Application of CLAHE

Water absorbs red light quickly, often within the first 10 meters of depth, so underwater images take on a predominantly bluish-green appearance with reduced contrast. To address this, we applied the CLAHE (Contrast Limited Adaptive Histogram Equalization) technique. This method equalizes the histogram locally, on a grid of tiles, and clips each tile's histogram to limit noise amplification, which improves contrast and makes the anodes easier to identify. The following image illustrates this approach.

<div style="text-align: center;">
  <img src="histogram_equalization.png" alt="Underwater Image" width="400"/>
  <p><em>Figure 3: Contrast Limited Adaptive Histogram Equalization</em></p>
</div>
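
To make the "contrast limited" part concrete, the following is a minimal NumPy sketch of the per-tile step only: clip the histogram, redistribute the clipped counts, and map pixels through the resulting cumulative distribution. It is an illustration under simplifying assumptions, not OpenCV's implementation: the function name `clipped_equalize_tile` is hypothetical, and real CLAHE additionally interpolates between the mappings of neighbouring tiles.

```python
import numpy as np

def clipped_equalize_tile(tile, clip_limit=2.0, bins=256):
    """Contrast-limited histogram equalization of a single uint8 tile."""
    hist = np.bincount(tile.ravel(), minlength=bins).astype(np.float64)

    # Clip every bin at clip_limit times the mean bin height
    limit = clip_limit * tile.size / bins
    excess = np.sum(np.maximum(hist - limit, 0.0))
    hist = np.minimum(hist, limit)

    # Spread the clipped counts evenly over all bins
    hist += excess / bins

    # Map intensities through the normalized cumulative histogram
    cdf = np.cumsum(hist) / tile.size
    lut = np.clip(np.round(cdf * (bins - 1)), 0, bins - 1).astype(np.uint8)
    return lut[tile]

# Synthetic low-contrast tile: all values fall in the narrow range [100, 120]
rng = np.random.default_rng(0)
tile = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
out = clipped_equalize_tile(tile)
print(f"std before: {tile.std():.2f}, std after: {out.std():.2f}")
```

A lower `clip_limit` flattens the mapping and yields a milder contrast stretch, which is why the parameter controls how strongly noise in uniform regions is amplified.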

In [None]:
# Function to apply CLAHE processing to an image
def apply_clahe(image_path):
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError(f"Could not read image at {image_path}")

    # Convert to LAB color space
    lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)

    # Apply CLAHE to the L-channel
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l_clahe = clahe.apply(l)

    # Merge channels and convert back to BGR
    lab_clahe = cv2.merge((l_clahe, a, b))
    processed_image = cv2.cvtColor(lab_clahe, cv2.COLOR_LAB2BGR)
    return processed_image

The image below shows the results on three different samples, where the contrast has been enhanced. The enhanced contrast is intended to make the anodes easier for the CNN to detect.

<div style="text-align: center;">
  <img src="Pasted image.png" alt="Underwater Image" width="800"/>
  <p><em>Figure 4: Application of CLAHE</em></p>
</div>

### Save the images of each dataset into the respective folders

In [None]:
def copy_and_process_files(file_pairs, image_dest, label_dest, prefix):
    for i, (image_path, label_path) in enumerate(tqdm(file_pairs, desc=f"Processing {prefix} set")):
        unique_id = f"{prefix}_{i}"
        new_image_name = f"{unique_id}.jpg"
        new_label_name = f"{unique_id}.txt"

        # Apply CLAHE processing
        processed_image = apply_clahe(image_path)

        # Save the processed image
        processed_image_path = os.path.join(image_dest, new_image_name)
        cv2.imwrite(processed_image_path, processed_image)

        # Copy the label file
        shutil.copy(label_path, os.path.join(label_dest, new_label_name))

# Copy and process train, validation, and test sets
copy_and_process_files(train_set, train_dir, train_labels_dir, "train")
copy_and_process_files(val_set, val_dir, val_labels_dir, "val")
copy_and_process_files(test_set, test_dir, test_labels_dir, "test")

print("Dataset organized and processed successfully!")
print(f"Train set: {len(train_set)} images")
print(f"Validation set: {len(val_set)} images")
print(f"Test set: {len(test_set)} images")


## **CNN implementation**

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import ToTensor
import numpy as np
import cv2
import os
import albumentations as A
from albumentations.pytorch import ToTensorV2
import matplotlib.pyplot as plt
from tqdm import tqdm
import shutil

### Dataset Class

To train and test our model, we first need to create a custom class for our dataset that converts the provided information into an image and its corresponding bounding box. For this purpose, we developed the AnodeDataset class, which takes the following parameters: the images directory, the annotations directory, the image size, and the transformations to be applied.

The image size parameter is used to resize the input images, enabling a more efficient training process. The transform parameter defines the data augmentation procedures, such as changes in contrast or saturation.

In [None]:
# Dataset Class
class AnodeDataset(Dataset):
    def __init__(self, image_dir, annotation_dir, img_size, transform=None):

        self.image_dir = image_dir
        self.annotation_dir = annotation_dir
        self.img_size = img_size
        self.transform = transform

        # Match images and annotations by filenames
        self.image_files = sorted([f for f in os.listdir(image_dir) if f.endswith((".jpg", ".png"))])
        self.annotation_files = sorted([f for f in os.listdir(annotation_dir) if f.endswith(".txt")])
        assert len(self.image_files) == len(self.annotation_files), "Mismatch in images and annotations count."


    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):

        # Load image
        img_path = os.path.join(self.image_dir, self.image_files[idx])
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert to RGB

        # Resize image
        img = cv2.resize(img, (self.img_size, self.img_size))

        # Load annotation
        anno_path = os.path.join(self.annotation_dir, self.annotation_files[idx])

        # If the annotation file is missing or empty, use an empty list of boxes
        if os.path.exists(anno_path) and os.path.getsize(anno_path) > 0:
            with open(anno_path, "r") as f:
                line = f.readline().strip()
            if line:  # File is not empty and has valid content
                _, x_center, y_center, width, height = map(float, line.split())
                bboxes = [[x_center, y_center, width, height]]
            else:
                bboxes = []
        else:
            bboxes = []

        # Apply augmentations (image only: the box is not passed to the transform,
        # so only photometric augmentations keep the annotation valid)
        if self.transform and bboxes:
            transformed = self.transform(image=img)
            img = transformed["image"]

        # Normalize image and convert to tensor
        img = img / 255.0
        img = torch.tensor(img, dtype=torch.float32).permute(2, 0, 1)

        # If there are no bounding boxes, return a dummy target
        if not bboxes:
            target = torch.tensor([0, 0, 0, 0], dtype=torch.float32)
        else:
            target = torch.tensor(bboxes[0], dtype=torch.float32)  # Assuming one bounding box per image

        return img, target
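
Since the dataset class passes only the image to the transform, a geometric augmentation would move the anode without moving its box. As a minimal sketch of what a box-aware augmentation has to do, the following mirrors an image and its normalized YOLO box together. The helper `hflip_with_bbox` is a hypothetical name used for illustration and is not part of the notebook.

```python
import numpy as np

def hflip_with_bbox(image, bbox):
    """Flip an HxWxC image left-right and mirror a normalized YOLO box.

    bbox is (x_center, y_center, width, height), all in [0, 1].
    """
    flipped = image[:, ::-1].copy()
    x_center, y_center, width, height = bbox
    # Only the horizontal center changes; the size and vertical center are kept
    return flipped, (1.0 - x_center, y_center, width, height)

# Tiny synthetic image with a bright patch on the left side
image = np.zeros((4, 8, 3), dtype=np.uint8)
image[1:3, 0:2] = 255
bbox = (0.125, 0.5, 0.25, 0.5)  # Box that covers the bright patch

flipped, new_bbox = hflip_with_bbox(image, bbox)
print(new_bbox)
```

Photometric changes such as contrast or saturation need no such adjustment, because they leave pixel positions untouched.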

### Model Definition

In [None]:
# Model Definition
class NeuralNetwork(nn.Module):
    def __init__(self, img_size):
        super(NeuralNetwork, self).__init__()
        self.img_size = img_size
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        feature_size = 256 * (img_size // 32) * (img_size // 32)
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feature_size, 128),
            nn.ReLU(),
            nn.Dropout(0.1),    # Dropout layer to prevent overfitting
            nn.Linear(128, 4),  # 4 for bbox
            nn.Sigmoid()        # Sigmoid activation for values in [0, 1]
        )

    def forward(self, x):
        x = self.backbone(x)
        x = self.fc(x)
        return x
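
The `feature_size` expression in the model follows from the layer stack: each 3x3 convolution with padding 1 keeps the spatial size, and each of the five `MaxPool2d(2, 2)` layers halves it, giving a total reduction of 32. The short sketch below reproduces that arithmetic without PyTorch; the function name `flattened_feature_size` is an illustrative assumption.

```python
def flattened_feature_size(img_size, channels=(16, 32, 64, 128, 256), pool=2):
    """Number of features entering the first Linear layer of the network above."""
    side = img_size
    for _ in channels:
        side //= pool  # The conv keeps the side length; the max-pool halves it
    return channels[-1] * side * side

for size in (128, 256, 512):
    print(size, flattened_feature_size(size))
```

For the 256-pixel input used in this work, the backbone output is 256 channels of 8 x 8, so the first fully connected layer receives 16384 features.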


### Loss Function and IoU calculation

In [None]:
# Loss Function
def loss_function(predictions, targets):
    pred_boxes = predictions[:, :4]  # x_center, y_center, width, height
    target_boxes = targets[:, :4]

    bbox_loss = nn.L1Loss()(pred_boxes, target_boxes)

    return bbox_loss

# Function to calculate Intersection over Union (IoU)
def calculate_iou(pred_box, gt_box):
    x1, y1, x2, y2 = pred_box
    gx1, gy1, gx2, gy2 = gt_box

    # Compute intersection area
    inter_x1 = max(x1, gx1)
    inter_y1 = max(y1, gy1)
    inter_x2 = min(x2, gx2)
    inter_y2 = min(y2, gy2)

    inter_width = max(0, inter_x2 - inter_x1)
    inter_height = max(0, inter_y2 - inter_y1)
    intersection_area = inter_width * inter_height

    # Compute areas
    pred_area = (x2 - x1) * (y2 - y1)
    gt_area = (gx2 - gx1) * (gy2 - gy1)

    # Compute IoU
    union_area = pred_area + gt_area - intersection_area
    iou = intersection_area / union_area if union_area > 0 else 0
    return iou
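
As a worked example of the IoU computation, the sketch below converts two boxes from the normalized center format used in the labels to corner format and computes their overlap. It is self-contained, so it repeats the IoU logic; the names `yolo_to_corners` and `iou_xyxy` are hypothetical helpers and the box values are made up for illustration.

```python
def yolo_to_corners(box):
    """Convert (x_center, y_center, width, height) to (xmin, ymin, xmax, ymax)."""
    x_center, y_center, width, height = box
    return (x_center - width / 2, y_center - height / 2,
            x_center + width / 2, y_center + height / 2)

def iou_xyxy(a, b):
    """Intersection over Union of two boxes in corner format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0

pred = yolo_to_corners((0.5, 0.5, 0.4, 0.4))  # Corners: 0.3, 0.3, 0.7, 0.7
gt = yolo_to_corners((0.6, 0.5, 0.4, 0.4))    # Corners: 0.4, 0.3, 0.8, 0.7
print(f"IoU: {iou_xyxy(pred, gt):.3f}")
```

Here the boxes overlap over a 0.3 x 0.4 region (area 0.12), each box has area 0.16, and the union is 0.20, so the IoU is about 0.6.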

### Model Saver

In [None]:
def save_model(model, num_epochs):

    model_name = f"anode_detector_{num_epochs}.pth"
    os.makedirs(f"models/{num_epochs}", exist_ok=True)
    torch.save(model.state_dict(), f"models/{num_epochs}/{model_name}")

    # Save the model's network architecture in a text file
    with open(f"models/{num_epochs}/model_architecture.txt", "w") as f:
        f.write(str(model))

def makedirs_clean(name, mode=0o777, exist_ok=True):
    if os.path.exists(name):
        if exist_ok:
            shutil.rmtree(name)  # Remove the existing directory and its contents
        else:
            raise OSError(f"Directory {name} already exists.")
    os.makedirs(name, mode=mode, exist_ok=exist_ok)

### Training Loop

In [None]:
# Training Loop
def train_model(model, dataloader, dataloader_val, optimizer, device, num_epochs=10):
    model.to(device)

    best_accuracy = 0.0
    best_loss = 0.0
    best_epoch = 0

    # Create directory for the model cleaning it if already exists
    makedirs_clean(f"models/{num_epochs}")

    for epoch in range(num_epochs):

        # Re-enable training mode (dropout) after the previous epoch's validation pass
        model.train()
        train_loss = 0.0
        train_acc = 0.0
        invalid_files = 0

        for images, targets in tqdm(dataloader, desc=f"Epoch {epoch+1}/{num_epochs}"):

            images, targets = images.to(device), targets.to(device)

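            # Drop samples whose target is the all-zero dummy box (missing annotation).
            # Comparing the whole (batch, 4) tensor against a single box never matches.
            valid = targets.abs().sum(dim=1) > 0
            if not valid.any():
                invalid_files += 1  # Counts fully skipped batches
                continue
            images, targets = images[valid], targets[valid]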

            predictions = model(images)

            loss = loss_function(predictions, targets)
            train_loss += loss.item()
            optimizer.zero_grad()


            loss.backward()
            optimizer.step()

            # Calculate and accumulate accuracy metric across all batches
            iou = 0.0
            for i in range(len(predictions)):
                pred_bbox = predictions[i, :4]
                target_bbox = targets[i, :4]
                # convert values to (xmin, ymin, xmax, ymax)
                pred_bbox = (pred_bbox[0] - pred_bbox[2] / 2, pred_bbox[1] - pred_bbox[3] / 2,
                            pred_bbox[0] + pred_bbox[2] / 2, pred_bbox[1] + pred_bbox[3] / 2)
                target_bbox = (target_bbox[0] - target_bbox[2] / 2, target_bbox[1] - target_bbox[3] / 2,
                            target_bbox[0] + target_bbox[2] / 2, target_bbox[1] + target_bbox[3] / 2)

                iou += calculate_iou(pred_bbox, target_bbox)

            train_acc += iou/len(predictions)

        torch.cuda.empty_cache()


        train_loss = train_loss / (len(dataloader) - invalid_files)
        train_acc = train_acc / (len(dataloader) - invalid_files)

        # Measure the accuracy using the dataloader_val
        model.eval()
        val_acc = 0.0
        invalid_files = 0

        with torch.no_grad():
            for images, targets in tqdm(dataloader_val, desc=f"Epoch {epoch+1}/{num_epochs}"):

                images, targets = images.to(device), targets.to(device)

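                # Drop samples whose target is the all-zero dummy box (missing annotation)
                valid = targets.abs().sum(dim=1) > 0
                if not valid.any():
                    invalid_files += 1  # Counts fully skipped batches
                    continue
                images, targets = images[valid], targets[valid]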

                predictions = model(images)
                # Calculate and accumulate accuracy metric across all batches
                iou = 0.0
                for i in range(len(predictions)):
                    pred_bbox = predictions[i, :4]
                    target_bbox = targets[i, :4]
                    # convert values to (xmin, ymin, xmax, ymax)
                    pred_bbox = (pred_bbox[0] - pred_bbox[2] / 2, pred_bbox[1] - pred_bbox[3] / 2,
                                pred_bbox[0] + pred_bbox[2] / 2, pred_bbox[1] + pred_bbox[3] / 2)
                    target_bbox = (target_bbox[0] - target_bbox[2] / 2, target_bbox[1] - target_bbox[3] / 2,
                                target_bbox[0] + target_bbox[2] / 2, target_bbox[1] + target_bbox[3] / 2)

                    iou += calculate_iou(pred_bbox, target_bbox)

                val_acc += iou/len(predictions)

        val_acc = val_acc / (len(dataloader_val) - invalid_files)

        torch.cuda.empty_cache()

        # Save the loss and accuracy for each epoch in a txt file
        with open(f"models/{num_epochs}/train_metrics.txt", "a") as f:
            f.write(f"{epoch+1},{train_loss:.6f},{train_acc:.4f},{val_acc:.4f}\n")

        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {train_loss:.6f}, Accuracy: {val_acc:.4f}")
        # Stores the model with the best accuracy
        if val_acc > best_accuracy:
            save_model(model, num_epochs)
            best_loss = train_loss
            best_accuracy = val_acc
            best_epoch = epoch
            print(f"*** Best Epoch #{epoch+1} ***")

        # Early stopping if the accuracy has not improved for more than 10 epochs
        if epoch - best_epoch > 10:
            print(f"Early stopping at epoch {epoch+1}")
            break


    return best_loss, best_accuracy, best_epoch
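
The early-stopping rule in the loop can be examined in isolation. The sketch below replays the same bookkeeping on a list of validation scores; the function name `simulate_early_stopping` and the scores are made up for illustration, and no model is trained.

```python
def simulate_early_stopping(val_scores, patience=10):
    """Return (best_epoch, last_epoch_run) using the same rule as the training loop."""
    best_score, best_epoch = 0.0, 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
        # Strict inequality: stop once more than `patience` epochs passed without improvement
        if epoch - best_epoch > patience:
            return best_epoch, epoch
    return best_epoch, len(val_scores) - 1

# Hypothetical validation IoU values: improvement stops after the third epoch
scores = [0.1, 0.3, 0.5] + [0.4] * 20
best_epoch, last_epoch = simulate_early_stopping(scores)
print(best_epoch, last_epoch)
```

Because the comparison is strict, training continues for 11 non-improving epochs before it stops, one more than the patience value suggests.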

## **Usage of the CNN**

### Set the dataset path and prepare for training

In [None]:
dataset_path = "organized_dataset"

image_train_path = f"{dataset_path}/images/train"
annotations_train_path = f"{dataset_path}/labels/train"

val_images_path = f"{dataset_path}/images/val"
val_annotations_path = f"{dataset_path}/labels/val"

# Prepare Dataset and DataLoader for training
dataset = AnodeDataset(image_train_path, annotations_train_path, img_size=256, transform=None)
dataloader = DataLoader(dataset, batch_size=128, shuffle=True)

# Prepare Dataset and DataLoader for validation
val_dataset = AnodeDataset(val_images_path, val_annotations_path, img_size=256, transform=None)
val_dataloader = DataLoader(val_dataset, batch_size=128, shuffle=False)  # Order is irrelevant for validation


# Model Initialization
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
model = NeuralNetwork(img_size=256)
optimizer = optim.Adam(model.parameters(), lr=0.001)

### Train the model

In [None]:
# Train the Model
num_epochs = 40
_, _, best_epoch = train_model(model, dataloader, val_dataloader, optimizer, device, num_epochs)

### Evaluation of the training process

In [None]:
def plot_training_data(num_epochs):

	# Get data from the training metrics file
	with open(f"models/{num_epochs}/train_metrics.txt", "r") as f:
		lines = f.readlines()

	epochs = []
	losses = []
	accuracies_train = []
	accuracies_val = []

	for line in lines:
		epoch, loss, acc_train, acc_val = map(float, line.strip().split(","))
		epochs.append(epoch)
		losses.append(loss)
		accuracies_train.append(acc_train)
		accuracies_val.append(acc_val)

	fig, ax1 = plt.subplots()

	color = 'tab:red'
	ax1.set_xlabel('Epoch')
	ax1.set_ylabel('Loss', color=color)
	ax1.plot(epochs, losses, color=color)
	ax1.tick_params(axis='y', labelcolor=color)

	ax2 = ax1.twinx()
	color = 'tab:blue'
	ax2.set_ylabel('Accuracy', color=color)
	ax2.plot(epochs, accuracies_train, color=color)
	ax2.plot(epochs, accuracies_val, color="green")
	ax2.tick_params(axis='y', labelcolor=color)

	fig.tight_layout()
	plt.title(f"Training Metrics for {num_epochs} epochs")
	plt.show()

In [None]:
num_epochs = 40
plot_training_data(num_epochs)

### Evaluation of the model in the testing dataset

In [None]:
def load_model(model_path, img_size):
    model = NeuralNetwork(img_size)
    # map_location lets weights saved on a GPU load on a CPU-only machine
    model.load_state_dict(torch.load(model_path, map_location="cpu"))
    model.eval()
    return model

# Function to process image and make prediction
def predict(image_path, model, img_size):
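    # Load and preprocess the image
    img = cv2.imread(image_path)
    img_resized = cv2.resize(img, (img_size, img_size))
    # The model was trained on RGB input, so convert before building the tensor;
    # img_resized stays in BGR for drawing with OpenCV
    img_rgb = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)
    img_normalized = img_rgb / 255.0  # Normalize to [0, 1]
    img_tensor = torch.tensor(img_normalized, dtype=torch.float32).permute(2, 0, 1).unsqueeze(0)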

    # Make the prediction
    with torch.no_grad():
        prediction = model(img_tensor)

    # Convert prediction to bounding box (x_center, y_center, width, height)
    x_center, y_center, width, height = prediction[0]
    x_center, y_center = x_center.item(), y_center.item()
    width, height = width.item(), height.item()

    # Return the bounding box and the resized image
    return x_center, y_center, width, height, img_resized

# Function to load ground truth
def load_ground_truth(annotation_path, img_size):
    if os.path.exists(annotation_path) and os.path.getsize(annotation_path) > 0:
        with open(annotation_path, "r") as f:
            line = f.readline().strip()
        if line:
            class_id, x_center, y_center, width, height = map(float, line.split())
            return x_center, y_center, width, height

    return None  # No ground truth


# Function to visualize prediction on image
def visualize_prediction(pred_box, gt_box, img):
    # Draw predicted bounding box (green) if available
    if pred_box is not None:
        x1, y1, x2, y2 = pred_box
        # Convert to image size
        x1, y1, x2, y2 = int(x1 * img.shape[1]), int(y1 * img.shape[0]), int(x2 * img.shape[1]), int(y2 * img.shape[0])
        # Draw bounding box
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

    # Draw ground truth bounding box if available
    if gt_box is not None:
        x1, y1, x2, y2 = gt_box
        # Convert to image size
        x1, y1, x2, y2 = int(x1 * img.shape[1]), int(y1 * img.shape[0]), int(x2 * img.shape[1]), int(y2 * img.shape[0])
        # Draw bounding box
        cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 2)

    # Show the image with both predicted and ground truth boxes
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.axis('off')  # Hide axes
    plt.show()

In [None]:
model = load_model(f"models/{num_epochs}/anode_detector_{num_epochs}.pth", img_size=256)

In [None]:
import random
from tqdm import tqdm

test_images_path = f"{dataset_path}/images/test"
test_annotations_path = f"{dataset_path}/labels/test"

test_images = [os.path.join(test_images_path, img) for img in os.listdir(test_images_path)]
test_annotations = [os.path.join(test_annotations_path, os.path.splitext(os.path.basename(img))[0] + ".txt") for img in test_images]

# Get average IoU for all test images
iou_values = []

for image_path, annotation_path in tqdm(zip(test_images, test_annotations), total=len(test_images), desc="Processing test images"):
	x_center, y_center, width, height, img = predict(image_path, model, img_size=256)
	# Load ground truth
	gt_box = load_ground_truth(annotation_path, img_size=256)
	# Calculate IoU if ground truth exists
	if gt_box is not None:
		# Convert bounding box to (xmin, ymin, xmax, ymax) format
		pred_box = (x_center - width / 2, y_center - height / 2, x_center + width / 2, y_center + height / 2)
		gt_box = (gt_box[0] - gt_box[2] / 2, gt_box[1] - gt_box[3] / 2, gt_box[0] + gt_box[2] / 2, gt_box[1] + gt_box[3] / 2)
		iou = calculate_iou(pred_box, gt_box)
		iou_values.append(iou)

print(f"Average IoU: {np.mean(iou_values):.4f}")

plt.boxplot(iou_values)
plt.title("IoU Distribution for Test Images")
plt.show()

print("Displaying 30 random images with predictions and ground truth...")

# Display up to 30 random images with predictions and ground truth
test_images_rand = random.sample(test_images, min(30, len(test_images)))
test_annotations_rand = [os.path.join(test_annotations_path, os.path.splitext(os.path.basename(img))[0] + ".txt") for img in test_images_rand]

# Display the images, predictions and ground truth
for image_path, annotation_path in zip(test_images_rand, test_annotations_rand):
    # Make prediction
    x_center, y_center, width, height, img = predict(image_path, model, img_size=256)
    # Convert bounding box to (xmin, ymin, xmax, ymax) format
    pred_box = (x_center - width / 2, y_center - height / 2, x_center + width / 2, y_center + height / 2)
    # Load ground truth
    gt_box = load_ground_truth(annotation_path, img_size=256)

    # Calculate IoU if ground truth exists
    if gt_box is not None:
        # Convert bounding box to (xmin, ymin, xmax, ymax) format
        gt_box = (gt_box[0] - gt_box[2] / 2, gt_box[1] - gt_box[3] / 2, gt_box[0] + gt_box[2] / 2, gt_box[1] + gt_box[3] / 2)
        iou = calculate_iou(pred_box, gt_box)
        print(f"IoU: {iou:.4f}")
    else:
        print("No ground truth available for this image.")

    # Visualize the prediction + ground truth (if available)
    visualize_prediction(pred_box, gt_box, img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

## **Results**


We conducted three evaluations of our model: the first two without data augmentation, and the third with data augmentation. Let’s start by reviewing the results without data augmentation.

This model was trained using a batch size of 6 and 15 epochs. The following graph illustrates the evolution of both training and validation accuracy over the epochs, along with the progression of the loss.

**add graph**

The boxplot below displays the distribution of test accuracy, measured as the IoU between the predicted and ground-truth boxes. We can observe some outliers, indicating that the model occasionally struggles with certain cases. However, the spread and central tendency suggest relatively high accuracy, so the model performs reasonably well, though not perfectly.

**add boxplot**
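
To read such a boxplot numerically, the quartiles and outliers can be computed directly from the list of IoU values. The sketch below uses the 1.5 x IQR whisker rule that matplotlib's `boxplot` applies by default. The IoU values are hypothetical placeholders, not results from this work, and `boxplot_summary` is an illustrative helper name.

```python
import numpy as np

def boxplot_summary(values, whis=1.5):
    """Quartiles and outliers using the 1.5 x IQR whisker rule."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    outliers = values[(values < low) | (values > high)]
    return {"q1": float(q1), "median": float(median), "q3": float(q3),
            "outliers": outliers.tolist()}

# Hypothetical IoU values, for illustration only
example_iou = [0.05, 0.70, 0.72, 0.75, 0.78, 0.80, 0.82, 0.85, 0.88]
summary = boxplot_summary(example_iou)
print(summary)
```

In this made-up sample the single very low IoU falls below the lower whisker and is reported as an outlier, which is how a missed detection would appear in the plot.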

Below are some examples of the results obtained with this model.

**add images**

----------------

Our second evaluation used the same batch size as the previous one but 200 epochs. The evolution of the training and validation accuracy shows **...**.

**add graph**

The results on the testing dataset produced the following boxplot, which shows **...**

**add boxplot**

Below are some examples of the results obtained with this model.

**add images**

----------------

Our last model was trained with the same batch size and 50 epochs.