<a href="https://colab.research.google.com/github/KetilJacobsen/DAT255-Deep-learning-engineering.-Prosjektoppgave/blob/main/YOLOv11_Small_Version.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# In this notebook we train the small version of YOLOv11 (YOLO11s).

# References
@software{yolo11_ultralytics, \
  author = {Glenn Jocher and Jing Qiu}, \
  title = {Ultralytics YOLO11}, \
  version = {11.0.0}, \
  year = {2024}, \
  url = {https://github.com/ultralytics/ultralytics}, \
  orcid = {0000-0001-5950-6979, 0000-0002-7603-6750, 0000-0003-3783-7069},\
  license = {AGPL-3.0}
}

# Step 1: Install Required Libraries
We begin by installing the necessary Python libraries for this project:

- **Ultralytics**: Provides the YOLOv11 framework for training, evaluating, and deploying object detection models.
- **Roboflow** *(optional)*: Used to easily download and manage datasets hosted on [Roboflow](https://roboflow.com/).
- **OpenCV & Matplotlib**: Used later in the notebook to draw ground truth boxes and display side-by-side comparisons between model predictions and actual labels.



Ultralytics is updated frequently, so we pass `--upgrade` when installing it to get the latest features and bug fixes.

In [None]:
!pip install ultralytics --upgrade
!pip install roboflow
!pip install opencv-python matplotlib

Collecting ultralytics
  Downloading ultralytics-8.3.107-py3-none-any.whl.metadata (37 kB)
Collecting ultralytics-thop>=2.0.0 (from ultralytics)
  Downloading ultralytics_thop-2.0.14-py3-none-any.whl.metadata (9.4 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.8.0->ultralytics)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.8.0->ultralytics)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.8.0->ultralytics)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.8.0->ultralytics)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=1.8.0->ultralytics)
  Downloading n
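
Because `--upgrade` installs whatever release is current, it can help to record which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is ours and is not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names as passed to pip in the install cell
print(installed_versions(["ultralytics", "roboflow", "opencv-python", "matplotlib"]))
```

Saving this output alongside the results makes it easier to reproduce a run later.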

# Step 2: Clone the Ultralytics Repository

We clone the official [Ultralytics GitHub repository](https://github.com/ultralytics/ultralytics) to access the YOLOv11 model configuration files directly.

This allows us to:
- Modify the model architecture (e.g., `yolo11.yaml`)
- Train a custom version of YOLOv11

Note: The `ultralytics` package is installed via pip and used for training, inference, and evaluation.  
We clone the GitHub repo only to edit the model architecture files.

In [None]:
!git clone https://github.com/ultralytics/ultralytics.git

fatal: destination path 'ultralytics' already exists and is not an empty directory.
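
The `fatal: destination path 'ultralytics' already exists` message above appears when the clone cell is re-run in the same session. One way to make the step safe to repeat is to clone only when the folder is missing. The helper below is a hypothetical sketch (`clone_if_missing` is our own name, not an Ultralytics function) and assumes `git` is available on the path.

```python
import subprocess
from pathlib import Path


def clone_if_missing(repo_url, dest):
    """Clone repo_url into dest unless dest already exists. Return True if a clone was run."""
    dest = Path(dest)
    if dest.exists():
        print(f"'{dest}' already exists, skipping clone")
        return False
    # check=True raises CalledProcessError if git exits with an error
    subprocess.run(["git", "clone", repo_url, str(dest)], check=True)
    return True
```

In the notebook this could be called as `clone_if_missing("https://github.com/ultralytics/ultralytics.git", "ultralytics")`.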


# Step 3: View and Modify the YOLOv11 Detection Model Configuration

The YOLOv11 detection architecture is defined in `yolo11.yaml`.  
We view this file to:
- Update the number of object classes (`nc`)
- Optionally customize the model’s architecture (e.g., layers, modules, channels)

In this case, we change:
- `nc: 80` → `nc: 4`  
  to match our dataset's four classes: `person`, `aware`, `unaware`, and `partially-aware`.
- Optionally add or remove backbone layers to modify the structure.

Below, we display the contents of the file before editing:

In [None]:
# Used to view the structure
!cat ultralytics/ultralytics/cfg/models/11/yolo11.yaml

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO11 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo11
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 4 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 181 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 181 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 231 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 357 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 357 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, C

To enhance the model's capacity for extracting context-rich and subtle features (such as head orientation and posture), we added a 3×3 convolutional layer followed by an additional C3k2 block with 1024 channels before the final SPPF and C2PSA layers. The Conv layer helps reorganize the feature maps for better gradient flow, while the C3k2 block deepens the network, allowing it to capture more abstract representations. This structure aims to improve detection performance for our 4-class awareness classification task.
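
Inserting layers into the backbone changes the index of every later layer, so any absolute `from` index in the head that points at or past the insertion point has to be renumbered. The sketch below only illustrates that bookkeeping. It assumes two layers are inserted at index 9 (directly before SPPF) and uses the `from` fields of the standard YOLO11 head; `shift_refs` is a hypothetical helper, not part of Ultralytics.

```python
def shift_refs(from_field, position, shift):
    """Shift absolute layer references at or after `position` by `shift`.

    Negative references (such as -1) are relative to the current layer and stay unchanged.
    """
    def fix(ref):
        return ref + shift if ref >= position else ref

    if isinstance(from_field, list):
        return [fix(ref) for ref in from_field]
    return fix(from_field)


# `from` fields of the head layers, in order, as listed in the standard config
head_from = [-1, [-1, 6], -1, -1, [-1, 4], -1, -1, [-1, 13], -1, -1, [-1, 10], -1, [16, 19, 22]]

# Assumption: two new layers are inserted at backbone index 9
shifted = [shift_refs(f, position=9, shift=2) for f in head_from]

for before, after in zip(head_from, shifted):
    if before != after:
        print(before, "->", after)
```

References to layers 4 and 6 sit before the insertion point and keep their numbers, while references to 10, 13, and the `Detect` inputs all move by two. Forgetting this renumbering would connect the head to the wrong feature maps.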

# How to change the yaml file and what was changed

Add `%%writefile ultralytics/ultralytics/cfg/models/11/yolo11s.yaml` as the first line of the cell. \
This cell magic overwrites the file with the structure below.
First we changed the number of classes (`nc`) to 4, and then we added blocks 9 and 10 below.

In [None]:
%%writefile ultralytics/ultralytics/cfg/models/11/yolo11s.yaml
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO11 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo11
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 4 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  s: [0.50, 0.50, 1024] # summary: 181 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs


# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)



Writing ultralytics/ultralytics/cfg/models/11/yolo11s.yaml
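
The `scales` entry `s: [0.50, 0.50, 1024]` holds the depth multiplier, the width multiplier, and the channel cap. As a rough sketch of how we understand Ultralytics to apply them to the convolutional blocks when building the model (channels are capped, scaled, and rounded up to a multiple of 8; repeat counts greater than 1 are scaled and rounded), the arithmetic looks like this. The function names are ours, and the exact internals may differ between versions.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_channels(channels, width, max_channels):
    """Apply the channel cap and width multiplier to a channel count from the YAML."""
    return make_divisible(min(channels, max_channels) * width, 8)


def scaled_repeats(repeats, depth):
    """Apply the depth multiplier to a repeat count; single-repeat layers are left alone."""
    return max(round(repeats * depth), 1) if repeats > 1 else repeats


depth, width, max_channels = 0.50, 0.50, 1024  # the 's' scale from the config

for c in (64, 128, 256, 512, 1024):
    print(f"{c} channels in YAML -> {scaled_channels(c, width, max_channels)}")
print(f"2 repeats in YAML -> {scaled_repeats(2, depth)}")
```

Under these assumptions, every channel count in the YAML is halved for the small model, and each `C3k2` entry with 2 repeats is built with a single repeat.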


In [None]:
# View the structure after changing it
!cat ultralytics/ultralytics/cfg/models/11/yolo11s.yaml

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO11 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo11
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 4 # number of classes
depth_multiple: 0.50
width_multiple: 0.50

scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  s: [0.50, 0.50, 1024] # summary: 181 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs


# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024

# Step 4: Download Our Custom Dataset from Roboflow

We use the Roboflow Python SDK to download our own annotated dataset,  
`human_awareness_face` (version 39), from the **HVL Robotics workspace**.

This dataset was created and labeled by our project group to support  
awareness classification of humans in agriculture.

We download it in **YOLOv11 format**, ensuring compatibility with the custom YOLOv11 model configuration  
we are using for training.

The download includes:
- `train`, `valid`, and `test` image sets
- A `data.yaml` file with class names and paths


In [None]:
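import os
from roboflow import Roboflow

# Read the API key from an environment variable instead of hard-coding it in the notebook.
# ROBOFLOW_API_KEY is an assumed variable name; set it before running this cell.
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])
project = rf.workspace("hvl-robotics").project("human_awareness_face")
version = project.version(39)
dataset = version.download("yolov11")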

loading Roboflow workspace...
loading Roboflow project...
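
After the download it is worth confirming that each split actually contains images. The sketch below counts image files per split using only the standard library. It assumes the `<split>/images` layout used later in this notebook (for example `valid/images`); `count_images` is a hypothetical helper.

```python
from pathlib import Path


def count_images(dataset_root, splits=("train", "valid", "test"),
                 exts=(".jpg", ".jpeg", ".png")):
    """Return a dict with the number of image files found in <root>/<split>/images."""
    root = Path(dataset_root)
    counts = {}
    for split in splits:
        img_dir = root / split / "images"
        if img_dir.is_dir():
            counts[split] = sum(1 for p in img_dir.iterdir() if p.suffix.lower() in exts)
        else:
            counts[split] = 0  # a missing folder is reported as empty
    return counts


print(count_images("/content/human_awareness_face-39"))
```

A split that reports 0 usually means the dataset path is wrong or the download did not finish.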


# Step 5: Train the Custom YOLOv11 Detection Model (Small)

We now train our custom YOLOv11 object detection model using the modified architecture  
defined in `yolo11s.yaml`, and the dataset we downloaded from Roboflow.

Key training parameters:
- `model='s'` (small scale)
- `data="/content/human_awareness_face-39/data.yaml"`
- `epochs=50`
- `imgsz=640`
- `batch=16`
- `name="yolo11s_aug_es"` (custom name for logs)

Data augmentations:
- `shear=10`
- `hsv_h=0.010`
- `hsv_s=0.25`
- `hsv_v=0.15`

Early stopping:
- `patience=10`

The model will be trained to detect and classify the following 4 classes:
- `person`
- `aware`
- `unaware`
- `partially-aware`

We experimented with different YOLOv11 scale variants by leveraging Ultralytics' compound scaling system. The 'n' (nano) variant trains and runs faster but is less accurate, while the 's' (small) variant trades some speed for higher accuracy. As the comment in the config file notes, the scale is selected by the letter in the model YAML name (for example, `yolo11s.yaml` uses scale `s`); we also pass `model='s'` to the training call. \
To avoid overfitting and reduce unnecessary training time, we enabled early stopping with a patience of 10 epochs. Training halts once the validation metrics have not improved for 10 consecutive epochs, and the best-performing weights are kept as `best.pt`.
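
The patience rule can be illustrated with a few lines of plain Python. This is a simplified sketch of the idea, not the Ultralytics implementation: it assumes one fitness score per epoch (higher is better) and stops once `patience` epochs have passed since the best one. The function name and the example scores are made up for illustration.

```python
def early_stop_epoch(fitness_per_epoch, patience=10):
    """Return (stop_epoch, best_epoch) as 0-based indices.

    Training stops at the first epoch where `patience` epochs have passed
    without beating the best score; otherwise it runs to the last epoch.
    """
    best_fitness, best_epoch = float("-inf"), 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness > best_fitness:
            best_fitness, best_epoch = fitness, epoch
        if epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(fitness_per_epoch) - 1, best_epoch


# Made-up scores: improvement for three epochs, then a plateau
scores = [0.1, 0.2, 0.3] + [0.25] * 20
print(early_stop_epoch(scores, patience=10))
```

In this example the best score is reached at epoch index 2, so training stops at index 12 instead of running through all 23 epochs.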

In [None]:
from ultralytics import YOLO

# Load the YOLOv11 model
model = YOLO("ultralytics/ultralytics/cfg/models/11/yolo11s.yaml")

# Train the small version
model.train(
    model='s',  # small scale
    data="/content/human_awareness_face-39/data.yaml",
    epochs=50,
    imgsz=640,
    batch=16,
    name="yolo11s_aug_es",  # custom name for logs

    # Data augmentations
    shear=10,
    hsv_h=0.010,
    hsv_s=0.25,
    hsv_v=0.15,

    # Early stopping
    patience=10,
)


Ultralytics 8.3.107 🚀 Python-3.11.12 torch-2.6.0+cu124 CPU (Intel Xeon 2.20GHz)
engine/trainer: task=detect, mode=train, model=s, data=/content/human_awareness_face-39/data.yaml, epochs=50, time=None, patience=10, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=yolo11s_aug_es2, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, sh

train: Scanning /content/human_awareness_face-39/train/labels.cache... 541 images, 29 backgrounds, 0 corrupt: 100%|██████████| 541/541 [00:00<?, ?it/s]

albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, num_output_channels=3, method='weighted_average'), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))

val: Scanning /content/human_awareness_face-39/valid/labels.cache... 151 images, 7 backgrounds, 0 corrupt: 100%|██████████| 151/151 [00:00<?, ?it/s]

Plotting labels to runs/detect/yolo11s_aug_es2/labels.jpg... 

optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.00125, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
TensorBoard: model graph visualization added ✅
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs/detect/yolo11s_aug_es2
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       1/50         0G      3.798      4.846      4.208         93        640:   6%|▌         | 2/34 [01:42<27:03, 50.74s/it]

# Step 6: Evaluate the Trained YOLOv11 Model

After training, we load the best-performing weights (`best.pt`)  
from the `runs/detect/yolo11s_aug_es/weights/` directory and evaluate the model on the validation set.

This evaluation provides key performance metrics:
- **Precision**: How many of the predicted bounding boxes are correct
- **Recall**: How many of the actual objects were detected
- **mAP@0.5**: Mean Average Precision at IoU 0.5
- **mAP@0.5:0.95**: Average over 10 IoU thresholds from 0.5 to 0.95 (stricter and more informative)

It also generates visual outputs such as:
- Confusion matrix
- Precision-recall curves
- Overall results summary plots
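
To make the metric definitions above concrete, the sketch below computes IoU for two boxes, precision and recall from match counts, and the ten IoU thresholds used for mAP@0.5:0.95. It is a plain-Python illustration with made-up numbers, not the evaluation code Ultralytics runs; boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def iou_xyxy(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# IoU thresholds 0.50, 0.55, ..., 0.95 used for mAP@0.5:0.95
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Made-up example: two 10x10 boxes that overlap by half
iou = iou_xyxy((0, 0, 10, 10), (5, 0, 15, 10))
print(f"IoU = {iou:.3f}")
print("Counts as a match at:", [t for t in thresholds if iou >= t])
print("precision, recall =", precision_recall(tp=8, fp=2, fn=4))
```

The two example boxes share half their area but have an IoU of only one third, so the prediction would not count as correct even at the 0.5 threshold.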


In [None]:
# Total number of model parameters
total_params = sum(p.numel() for p in model.model.parameters())
print(f"Number of parameters: {total_params:,}")
model.model.info(verbose=True)

In [None]:
from ultralytics import YOLO

# Load best weights from the small model
model = YOLO("runs/detect/yolo11s_aug_es/weights/best.pt")

# Evaluate on the validation set (same one used during training)
metrics = model.val()


The training dataset consisted of 541 original images, annotated across four classes: 'aware', 'partially-aware', 'person', and 'unaware'. To improve generalization and reduce overfitting, on-the-fly data augmentation was applied during training: the shear and HSV color shifts configured above, together with the augmentations Ultralytics enables by default, such as random scaling, horizontal flipping, and mosaic. As a result, the model saw many distinct variations of each image across the training epochs (up to 50), even though the underlying dataset remained the same.
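
The scale of this effect follows from simple arithmetic on the numbers in the training log: 541 training images, a batch size of 16, and at most 50 epochs.

```python
import math

num_images = 541   # training images reported in the log
batch_size = 16
epochs = 50        # upper bound; early stopping may end training sooner

batches_per_epoch = math.ceil(num_images / batch_size)
max_augmented_samples = num_images * epochs

print(f"Batches per epoch: {batches_per_epoch}")
print(f"Augmented samples seen over {epochs} epochs (at most): {max_augmented_samples}")
```

The 34 batches per epoch match the `2/34` progress counter in the training output above.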

# Step 7: Explore Evaluation Output Files

After evaluating the trained model, Ultralytics automatically generates a `runs/detect/val/` folder  
containing all visual and numerical outputs related to model performance.

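This includes:
- `confusion_matrix.png`: Class-level confusion matrix
- `PR_curve.png`: Precision-recall curve for each class

Two related files are written to the training run folder (for example `runs/detect/yolo11s_aug_es/`) rather than to `val/`:
- `results.png`: Combined loss and metric plots
- `labels.jpg`: Overview plot of the dataset labels, created when training starts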

We list the contents of this folder below to confirm that evaluation outputs were generated.


In [None]:
!ls runs/detect/val/
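
The same check can be done in Python, which makes it easy to report exactly which expected files are missing. This is a minimal sketch; `missing_outputs` is a hypothetical helper, and the file names are the ones listed above.

```python
from pathlib import Path


def missing_outputs(folder, expected):
    """Return the names in `expected` that are not present as files in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


missing = missing_outputs("runs/detect/val", ["confusion_matrix.png", "PR_curve.png"])
print("Missing files:", missing if missing else "none")
```

Note that Ultralytics numbers repeated runs (`val`, `val2`, and so on), so the folder name may need adjusting after re-running the evaluation.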

# Step 8: Visualize Evaluation Results

To better understand the performance of the model, we display key visual outputs from the evaluation step:

- **Confusion Matrix**: Shows how well the model distinguishes between the four classes.
- **Results Summary**: Includes training loss curves, precision, recall, and mAP progression over epochs.

These visuals help diagnose performance bottlenecks (e.g., misclassifications between similar classes like `aware` vs `partially-aware`) and guide further improvements.


In [None]:
from IPython.display import Image, display

# Show confusion matrix
display(Image(filename='runs/detect/val/confusion_matrix.png'))

# Show results summary (make sure this path matches the latest run)
display(Image(filename='/content/runs/detect/yolo11s_aug_es/results.png'))
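
Because Ultralytics appends a number when a run name already exists (the training log above shows `yolo11s_aug_es2`), hard-coded paths can silently point at an older run. One option is to pick the most recently modified matching folder. The helper below is a hypothetical sketch using only the standard library.

```python
from pathlib import Path


def latest_run_dir(base="runs/detect", prefix="yolo11s_aug_es"):
    """Return the most recently modified run folder whose name starts with prefix, or None."""
    base = Path(base)
    if not base.is_dir():
        return None
    candidates = [d for d in base.iterdir() if d.is_dir() and d.name.startswith(prefix)]
    # max() with default=None handles the case where nothing matches
    return max(candidates, key=lambda d: d.stat().st_mtime, default=None)


run_dir = latest_run_dir()
print("Latest run folder:", run_dir)
```

The returned path can then be used to build the file names, for example `run_dir / "results.png"` or `run_dir / "weights" / "best.pt"`.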


# Step 9: Display Key Evaluation Metrics

To summarize the performance of the trained model, we print the core evaluation metrics:

- **Precision**: The percentage of predicted bounding boxes that are correct
- **Recall**: The percentage of actual objects that were detected
- **mAP@0.5**: Mean Average Precision at 50% IoU (standard object detection score)
- **mAP@0.5:0.95**: Stricter mean Average Precision averaged across 10 IoU thresholds (from 0.5 to 0.95)

These values provide a quick, numerical snapshot of how well the model performs on the validation set.


In [None]:
# Run this after: metrics = model.val()

print(f"Precision:      {metrics.box.mp:.3f}")
print(f"Recall:         {metrics.box.mr:.3f}")
print(f"mAP@0.5:        {metrics.box.map50:.3f}")
print(f"mAP@0.5:0.95:   {metrics.box.map:.3f}")
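
Precision and recall are often combined into a single F1 score, their harmonic mean. The notebook does not print it, so the following is only a sketch of the formula with made-up input values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example values, not results from this notebook
print(f"F1: {f1_score(0.8, 0.6):.3f}")
```

With the evaluation results available, the same function could be called as `f1_score(metrics.box.mp, metrics.box.mr)`.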


In [None]:
import os
import cv2
import matplotlib.pyplot as plt
from ultralytics import YOLO
import glob

# === CONFIGURATION ===
val_img_dir = "/content/human_awareness_face-39/valid/images"
val_lbl_dir = "/content/human_awareness_face-39/valid/labels"
model_path = "runs/detect/yolo11s_aug_es/weights/best.pt"
num_images_to_show = 5  # Adjust this number

# === LOAD MODEL ===
model = YOLO(model_path)
class_names = model.names  # Automatically uses correct class order

# === COLOR MAP MATCHING CLASS ORDER IN YOUR YAML ===
# ['aware', 'partially-aware', 'person', 'unaware']
color_map = {
    0: (0, 255, 0),       # aware - green
    1: (255, 165, 0),     # partially-aware - orange
    2: (0, 0, 255),       # person - blue
    3: (255, 0, 0),       # unaware - red
}

# === GET IMAGE LIST ===
image_paths = glob.glob(os.path.join(val_img_dir, "*.jpg"))

# === LOOP THROUGH IMAGES ===
for idx, img_path in enumerate(image_paths[:num_images_to_show]):
    img_name = os.path.basename(img_path)
    label_path = os.path.join(val_lbl_dir, img_name.replace(".jpg", ".txt"))

    # Load image
    image = cv2.imread(img_path)
    if image is None or not os.path.exists(label_path):
        print(f"Skipping {img_name} (missing image or label)")
        continue
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    h, w = image_rgb.shape[:2]

    # === DRAW GROUND TRUTH ===
    img_gt = image_rgb.copy()
    with open(label_path, 'r') as f:
        for line in f:
            cls_id, x, y, bw, bh = map(float, line.strip().split())
            x1 = int((x - bw/2) * w)
            y1 = int((y - bh/2) * h)
            x2 = int((x + bw/2) * w)
            y2 = int((y + bh/2) * h)
            cls_id = int(cls_id)
            cls_name = class_names[cls_id]
            color = color_map.get(cls_id, (0, 255, 255))  # fallback to cyan (RGB order)

            # Draw box
            cv2.rectangle(img_gt, (x1, y1), (x2, y2), color, 2)

            # Draw label with background
            label = f"{cls_name}"
            (text_w, text_h), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
            cv2.rectangle(img_gt, (x1, y1 - text_h - 6), (x1 + text_w + 4, y1), color, -1)
            cv2.putText(img_gt, label, (x1 + 2, y1 - 4),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 0), 1)

    # === PREDICTIONS ===
    results = model(img_path)[0]
    img_pred = image_rgb.copy()
    for box in results.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        cls_id = int(box.cls[0])
        conf = box.conf[0].item()
        cls_name = class_names[cls_id]
        color = color_map.get(cls_id, (0, 255, 255))

        # Draw box
        cv2.rectangle(img_pred, (x1, y1), (x2, y2), color, 2)

        # Draw label with background
        label = f"{cls_name} {conf:.2f}"
        (text_w, text_h), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
        cv2.rectangle(img_pred, (x1, y1 - text_h - 6), (x1 + text_w + 4, y1), color, -1)
        cv2.putText(img_pred, label, (x1 + 2, y1 - 4),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 0), 1)

    # === DISPLAY ===
    plt.figure(figsize=(14, 7))
    plt.subplot(1, 2, 1)
    plt.imshow(img_gt)
    plt.title(f"Ground Truth: {img_name}", fontsize=14)
    plt.axis("off")

    plt.subplot(1, 2, 2)
    plt.imshow(img_pred)
    plt.title(f"Prediction: {img_name}", fontsize=14)
    plt.axis("off")

    plt.tight_layout()
    plt.show()
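
The ground-truth drawing step above converts each YOLO label (class id plus a normalized center, width, and height) into pixel corner coordinates. Isolated as a small function, the conversion looks like the sketch below; `yolo_to_xyxy` and `parse_label_line` are hypothetical helper names, and the example label is made up.

```python
def yolo_to_xyxy(cx, cy, bw, bh, img_w, img_h):
    """Convert a normalized (center x, center y, width, height) box to pixel (x1, y1, x2, y2)."""
    x1 = int((cx - bw / 2) * img_w)
    y1 = int((cy - bh / 2) * img_h)
    x2 = int((cx + bw / 2) * img_w)
    y2 = int((cy + bh / 2) * img_h)
    return x1, y1, x2, y2


def parse_label_line(line):
    """Parse 'cls cx cy w h' into (class id, box values); return None for blank lines."""
    parts = line.split()
    if len(parts) < 5:
        return None
    cls_id = int(float(parts[0]))
    cx, cy, bw, bh = map(float, parts[1:5])
    return cls_id, (cx, cy, bw, bh)


# Made-up label: class 0, centered box covering a quarter of the width and half of the height
cls_id, box = parse_label_line("0 0.5 0.5 0.25 0.5")
print(cls_id, yolo_to_xyxy(*box, img_w=640, img_h=480))  # 0 (240, 120, 400, 360)
```

Skipping blank lines in the parser avoids the unpacking error that an empty trailing line in a label file would otherwise cause.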
