<a href="https://colab.research.google.com/github/PavanDaniele/drone-person-detection/blob/main/model_Training_and_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Set up: mount drive + import libraries

**Important Information:** We need to activate the GPU on Colab (_Runtime --> Change runtime type_). \
Every time you start a new session (or reopen the notebook after a few hours) check that the GPU is still active. If we are not using the GPU it can take up to tens of hours to train the models. \
_GPU T4 is the best choice._

In [1]:
# Run this Every time you start a new session
from google.colab import drive
drive.mount('/content/drive') # to mount google drive (to see/access it)

Mounted at /content/drive


In [2]:
!pip install ultralytics # Installation of Ultralytics for YOLO models

Collecting ultralytics
  Downloading ultralytics-8.3.167-py3-none-any.whl.metadata (37 kB)
Collecting ultralytics-thop>=2.0.0 (from ultralytics)
  Downloading ultralytics_thop-2.0.14-py3-none-any.whl.metadata (9.4 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.8.0->ultralytics)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.8.0->ultralytics)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.8.0->ultralytics)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.8.0->ultralytics)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=1.8.0->ultralytics)
  Downloading n

In [5]:
from ultralytics import YOLO # Import of Ultralytics for YOLO models
import shutil
import os

# General Explanation

### Backbone

In Computer Vision, a _Backbone_ is the part of a convolutional neural network responsible for extracting the main features from an image. \
It serves as the shared base upon which subsequent modules are built (such as heads for classification, object detection, segmentation, etc.).

\
Each backbone has been pre-trained on specific datasets (e.g., ImageNet) using particular preprocessing steps, input dimensions, normalization, and augmentation techniques, which should ideally be replicated during fine-tuning to maintain compatibility and achieve optimal performance.

### Data Loader

To train a deep learning model, it is essential to properly handle data loading and preparation. This is the task of the _Data Loader_, a component responsible for:
- Loading images and their corresponding annotations (e.g., .txt or .json) from the dataset.
- Applying preprocessing operations, such as resizing, normalization, data augmentation, etc.
- Organizing data into batches to feed the model during training.
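As a rough illustration of the batching step only, the sketch below groups a list of samples into fixed-size batches. It is not how the Ultralytics or torchvision loaders are implemented; the function name `iter_batches` and the file names are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(samples: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


# Hypothetical (image, annotation) pairs standing in for a real dataset.
samples = [(f"img_{i}.jpg", f"img_{i}.txt") for i in range(5)]
batches = list(iter_batches(samples, batch_size=2))
print([len(batch) for batch in batches])
```

A real data loader additionally shuffles the samples, decodes the images, and runs these steps in parallel worker processes.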

\
Considering the limited resources of my development environment, at first I decided to perform the image and annotation resizing in a separate phase (prior to training), in order to:
- Reduce the workload of the data loader during fine-tuning;
- Increase data loading and training speed;
- Ensure consistency between images and annotations.

However, because each model applies its own scaling technique, I now want to try fine-tuning the models without any pre-scaling.

The other transformations, instead, are handled by the model-specific data loader, since each model uses different preprocessing and normalization techniques. \
Moreover, some models require specific transformations to achieve optimal performance, and the libraries that provide the models (e.g., Ultralytics for YOLO, torchvision for EfficientDet/SSD) already implement loaders that are properly configured and optimized.

### Normalization

Image normalization consists in scaling pixel values from the range [0, 255] to a more suitable interval (e.g., [0, 1] or [-1, 1]), often based on the mean and standard deviation of the pre-training dataset, with the goal of:
- Avoiding overly large values in the tensors;
- Making the model more stable during training;
- Speeding up convergence.

\
Normalization helps maintain a consistent pixel range and distribution, which is essential for pre-trained models.
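The arithmetic can be sketched on a single RGB pixel. This is a minimal illustration, assuming the ImageNet per-channel mean and standard deviation commonly used with ImageNet-pretrained backbones; the right statistics depend on the model, and `normalize_pixel` is a hypothetical helper, not a library function.

```python
from typing import Sequence, Tuple

# Assumption: ImageNet statistics, used here only as an example.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(
    rgb: Sequence[int],
    mean: Sequence[float] = IMAGENET_MEAN,
    std: Sequence[float] = IMAGENET_STD,
) -> Tuple[float, ...]:
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [channel / 255.0 for channel in rgb]
    return tuple((s - m) / d for s, m, d in zip(scaled, mean, std))


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print(black)
print(white)
```

After this step, a black pixel maps to negative values and a white pixel to positive ones, so the channels are roughly centered around zero.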

### Data Augmentations

Data augmentation consists of random transformations (e.g., rotations, flips, crops, brightness changes, etc.) applied during training. Their purpose is to:
- Simulate new visual conditions;
- Increase dataset variety;
- Reduce overfitting by improving the model’s ability to generalize.

\
In practice, the semantic content of the image doesn't change (e.g., a person remains a person), but its visual appearance is altered to help the model "learn better."
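One practical consequence is that geometric augmentations must update the annotations too. The sketch below mirrors an image horizontally and adjusts a bounding box accordingly, assuming YOLO-style labels `(class, x_center, y_center, width, height)` with coordinates normalized to [0, 1]. Both helper names are hypothetical, and the libraries apply such transforms internally.

```python
from typing import List, Tuple

Box = Tuple[int, float, float, float, float]


def hflip_image(rows: List[List[int]]) -> List[List[int]]:
    """Mirror an image (given as rows of pixel values) left to right."""
    return [list(reversed(row)) for row in rows]


def hflip_yolo_box(box: Box) -> Box:
    """Mirror a normalized YOLO box: only x_center changes."""
    cls, x_center, y_center, width, height = box
    return (cls, 1.0 - x_center, y_center, width, height)


image = [[1, 2, 3], [4, 5, 6]]
person = (0, 0.25, 0.5, 0.1, 0.2)  # hypothetical label: class 0 = person
flipped_image = hflip_image(image)
flipped_box = hflip_yolo_box(person)
print(flipped_image, flipped_box)
```

The class id is unchanged (a person remains a person); only the position of the box moves with the pixels.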


My goal is to evaluate the real-world performance of each model in its ideal scenario, in order to select the most suitable one for deployment on the Jetson Nano. \
For this reason, each model is trained using its native augmentations, meaning the ones that were designed and optimized as part of its original architecture. \
It wouldn’t make sense to disable them or enforce a uniform setup across models, because what we want to observe is the maximum potential of each model, working in the way it was designed to perform best.

# Fine-Tuning Model

### YOLOv8n

First of all we need to save the dataset locally:


In [6]:
src = '/content/drive/MyDrive/projectUPV/datasets/AERALIS_YOLOv8n'
dst = '/content/AERALIS_YOLOv8n_local'  # is now on the local VM, NOT on drive

# If the destination folder already exists, I delete it
if os.path.exists(dst):
  shutil.rmtree(dst)

# Recursive copy of ENTIRE folder (and subfolders)
shutil.copytree(src, dst)
print("Copy completed:", os.path.exists(dst))

Copy completed: True


Let's check the total free space:

In [7]:
!df -h / # It shows the total, used and free space on the root (/) of the Colab VM.

# Avail column: space still available for your files.

Filesystem      Size  Used Avail Use% Mounted on
overlay         113G   55G   58G  49% /


In [8]:
# Show space used by your local folder
!du -sh /content/AERALIS_YOLOv8n_local

6.6G	/content/AERALIS_YOLOv8n_local


In [9]:
# Show space occupied by various folders in /content/.
!du -h --max-depth=1 /content/

140K	/content/.config
6.6G	/content/AERALIS_YOLOv8n_local
du: cannot access '/content/drive/.Encrypted/.shortcut-targets-by-id/1LQbD7p_iS5KLqGNdfrYEvsAx0i_bgB0h/projectUPV': No such file or directory
67G	/content/drive
55M	/content/sample_data
74G	/content/


We want to create the data.yaml file, which YOLO uses to know:
- the path to the training, validation, and test images
- the number of classes (nc)
- the names of the classes (names)

\
This file is used by YOLO to locate the images and their annotations.
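Only image folders are listed in data.yaml. The training log further down shows the labels being scanned in a sibling `labels` folder, so the annotation path appears to be derived from the image path. A minimal sketch of that convention follows; `image_to_label_path` and the file name are hypothetical, and this is not the library's own code.

```python
def image_to_label_path(image_path: str) -> str:
    """Map .../images/<name>.<ext> to .../labels/<name>.txt."""
    head, separator, tail = image_path.rpartition("/images/")
    if not separator:
        raise ValueError("expected an '/images/' directory in the path")
    stem = tail.rsplit(".", 1)[0]
    return f"{head}/labels/{stem}.txt"


label_path = image_to_label_path(
    "/content/AERALIS_YOLOv8n_local/train/images/frame_0001.jpg"
)
print(label_path)
```

If the dataset does not follow this layout, the images may end up being treated as backgrounds without annotations.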

In [10]:
# Dataset YAML (edit the paths if your folders differ)
data_yaml = """
train: /content/AERALIS_YOLOv8n_local/train/images
val:   /content/AERALIS_YOLOv8n_local/val/images
test:  /content/AERALIS_YOLOv8n_local/test/images

nc: 1
names: ['person']
"""
open('data.yaml', 'w').write(data_yaml)

176

Perfect, we have correctly written the data.yaml file for the AERALIS_YOLOv8n dataset. Let's continue with the loading of the model:

In [11]:
# Load the pre-trained model we want to use as a starting point

model_YOLOv8n = YOLO('yolov8n.pt') # it is the model that will be fine-tuned on the custom dataset

Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt to 'yolov8n.pt'...


100%|██████████| 6.25M/6.25M [00:00<00:00, 106MB/s]


We have now downloaded the pre-trained model from the official Ultralytics repository.

The **Batch size** is the number of images processed simultaneously in each training step. With 4-8 GB of RAM, a batch size of 8 (or even less) is recommended. The Colab T4 used here has about 15 GB of GPU memory, so the training call below uses 16; reduce it if out-of-memory errors occur.
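The batch size also determines how many steps make up one epoch. As a small worked example, using the 2,395 training images reported in the training log of this notebook (`steps_per_epoch` is a hypothetical helper):

```python
import math


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


steps_batch_16 = steps_per_epoch(2395, 16)
steps_batch_8 = steps_per_epoch(2395, 8)
print(steps_batch_16, steps_batch_8)
```

With a batch size of 16 this gives 150 steps, which matches the `150/150` progress counter shown for each epoch in the log; halving the batch size doubles the number of steps.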

**Early Stopping** is a technique that automatically stops the training process if the model stops improving after a certain number of epochs. This helps prevent overfitting and saves time.
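The idea can be sketched as a simple patience counter. This is a simplified illustration, not the Ultralytics implementation (which tracks its own fitness score); `stopping_epoch` and the scores are hypothetical.

```python
from typing import Optional, Sequence


def stopping_epoch(scores: Sequence[float], patience: int) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None."""
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return None


# Hypothetical validation scores, one per epoch.
scores = [0.45, 0.48, 0.50, 0.49, 0.50, 0.47]
stop_short = stopping_epoch(scores, patience=3)
stop_long = stopping_epoch(scores, patience=20)
print(stop_short, stop_long)
```

Here the best score is reached at epoch 3, so a patience of 3 stops training at epoch 6, while a patience of 20 lets the run continue.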

**Workers** are the parallel processes used to load and preprocess data while the model is training. Because of our limited resources, we'll start with 2 workers and, if data loading errors occur, reduce this number to 1 or even 0.


In [12]:
# To see the available GPU
import torch
print(torch.cuda.is_available()) # True = a GPU is available --> if False, use device='cpu'
print(torch.cuda.device_count()) # Number of GPUs available (not their name)

# If True and at least 1, you can use device=0.
# If you don't have GPU: use device='cpu' (much slower).
# Locally (not Colab): check with nvidia-smi from terminal.

True
1


Now that we have confirmation that the GPU is active we can train the model:

In [13]:
# Fine‑tuning
results_YOLOv8n = model_YOLOv8n.train(
  data='data.yaml', # use the newly created yaml file
  epochs=100, # Maximum number of training epochs
  imgsz=640, # Image input size (recommended for YOLO).
  batch=16,  # Batch size
  patience=20, # Early stopping if the metrics do not improve for 20 epochs
  workers=2, # Number of workers for the dataloader
  device=0, # Use GPU 0 (or put 'cpu' if you don't have GPU)
  project='runs_finetune', # Folder where it will save the results of the experiments (the folder will be created automatically)
  name='person_yolov8n' # Subfolder/name specific to our experiment
)

# results -->  will contain metrics, logs, and the path of the best weights found during the training

Ultralytics 8.3.167 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (Tesla T4, 15095MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=100, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=person_yolov8n, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=20, perspective=0.0, plots=True, pose=12.0, pretrained

100%|██████████| 755k/755k [00:00<00:00, 20.0MB/s]

Overriding model.yaml nc=80 with nc=1

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics




 22        [15, 18, 21]  1    751507  ultralytics.nn.modules.head.Detect           [1, [64, 128, 256]]           
Model summary: 129 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt to 'yolo11n.pt'...

100%|██████████| 5.35M/5.35M [00:00<00:00, 91.4MB/s]

AMP: checks passed ✅
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 133.6±32.9 MB/s, size: 2026.4 KB)

train: Scanning /content/AERALIS_YOLOv8n_local/train/labels... 2395 images, 388 backgrounds, 0 corrupt: 100%|██████████| 2395/2395 [00:05<00:00, 402.72it/s]

train: New cache created: /content/AERALIS_YOLOv8n_local/train/labels.cache
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, method='weighted_average', num_output_channels=3), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 2294.6±2062.8 MB/s, size: 2606.8 KB)

val: Scanning /content/AERALIS_YOLOv8n_local/val/labels... 515 images, 75 backgrounds, 0 corrupt: 100%|██████████| 515/515 [00:00<00:00, 1369.54it/s]

val: New cache created: /content/AERALIS_YOLOv8n_local/val/labels.cache
Plotting labels to runs_finetune/person_yolov8n/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs_finetune/person_yolov8n
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      1/100      2.06G      2.225      3.149      1.096         61        640: 100%|██████████| 150/150 [02:14<00:00,  1.11it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:29<00:00,  1.74s/it]


                   all        515       1258      0.611      0.433       0.45      0.174

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      2/100       2.3G       2.13      1.834      1.072         52        640: 100%|██████████| 150/150 [02:19<00:00,  1.07it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:25<00:00,  1.49s/it]


                   all        515       1258      0.619      0.446      0.476      0.167

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      3/100      2.31G      2.116      1.459      1.073         50        640: 100%|██████████| 150/150 [02:23<00:00,  1.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:25<00:00,  1.52s/it]

                   all        515       1258      0.671      0.483        0.5      0.199






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      4/100      2.32G      2.075      1.343      1.069         39        640: 100%|██████████| 150/150 [02:20<00:00,  1.07it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:25<00:00,  1.48s/it]

                   all        515       1258      0.719      0.527      0.573      0.226






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      5/100      2.32G      2.045      1.272      1.051         45        640: 100%|██████████| 150/150 [02:23<00:00,  1.04it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:25<00:00,  1.49s/it]

                   all        515       1258      0.759      0.527      0.593      0.251






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      6/100      2.33G      1.991      1.211      1.041         37        640: 100%|██████████| 150/150 [02:15<00:00,  1.10it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:26<00:00,  1.53s/it]

                   all        515       1258       0.73      0.582       0.61      0.275






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      7/100      2.33G      1.968      1.203      1.037         47        640: 100%|██████████| 150/150 [02:11<00:00,  1.14it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:26<00:00,  1.58s/it]


                   all        515       1258      0.771      0.552       0.62      0.279

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      8/100      2.33G      1.954       1.17      1.025         39        640: 100%|██████████| 150/150 [02:18<00:00,  1.08it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:26<00:00,  1.57s/it]

                   all        515       1258      0.682      0.561      0.625      0.268






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      9/100      2.35G      1.898      1.112     0.9982         40        640: 100%|██████████| 150/150 [02:21<00:00,  1.06it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:26<00:00,  1.57s/it]

                   all        515       1258       0.81      0.603      0.679      0.307






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     10/100      2.35G      1.877      1.083      1.004         61        640: 100%|██████████| 150/150 [02:22<00:00,  1.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:25<00:00,  1.52s/it]

                   all        515       1258      0.817      0.605      0.682      0.313






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     11/100      2.35G       1.86      1.075     0.9999         40        640: 100%|██████████| 150/150 [02:21<00:00,  1.06it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:27<00:00,  1.62s/it]

                   all        515       1258       0.81       0.58      0.644      0.288






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     12/100      2.37G      1.819      1.042     0.9902         42        640: 100%|██████████| 150/150 [02:22<00:00,  1.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:25<00:00,  1.50s/it]


                   all        515       1258      0.818      0.612      0.701      0.316

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     13/100      2.37G      1.793      1.017     0.9826         47        640: 100%|██████████| 150/150 [02:14<00:00,  1.11it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:28<00:00,  1.69s/it]

                   all        515       1258      0.786      0.649        0.7      0.317






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     14/100      2.37G      1.789      1.011     0.9844         36        640: 100%|██████████| 150/150 [02:26<00:00,  1.02it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:27<00:00,  1.60s/it]

                   all        515       1258      0.828      0.632      0.715      0.348






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     15/100      2.37G      1.768       0.99     0.9761         61        640: 100%|██████████| 150/150 [02:27<00:00,  1.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:27<00:00,  1.61s/it]

                   all        515       1258      0.784      0.641      0.694      0.332






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     16/100      2.37G      1.786     0.9934     0.9768         41        640: 100%|██████████| 150/150 [02:24<00:00,  1.04it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:27<00:00,  1.60s/it]

                   all        515       1258      0.818      0.635      0.703      0.334






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     17/100      2.37G       1.74     0.9725     0.9706         61        640: 100%|██████████| 150/150 [02:22<00:00,  1.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:28<00:00,  1.68s/it]

                   all        515       1258      0.828      0.628      0.706       0.34






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     18/100      2.37G      1.743     0.9692     0.9677         65        640: 100%|██████████| 150/150 [02:25<00:00,  1.03it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:27<00:00,  1.62s/it]

                   all        515       1258      0.805      0.632      0.723      0.341






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     19/100      2.37G       1.73     0.9607     0.9642         66        640: 100%|██████████| 150/150 [02:31<00:00,  1.01s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:28<00:00,  1.70s/it]

                   all        515       1258      0.813      0.634       0.71      0.337






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     20/100      2.37G      1.711     0.9296     0.9593         29        640: 100%|██████████| 150/150 [02:27<00:00,  1.02it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:27<00:00,  1.64s/it]

                   all        515       1258      0.839      0.656      0.737      0.358






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     21/100      2.37G      1.718      0.934     0.9586         65        640: 100%|██████████| 150/150 [02:37<00:00,  1.05s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 17/17 [00:27<00:00,  1.63s/it]

                   all        515       1258      0.814      0.661      0.732      0.359






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     22/100      2.37G      1.697     0.9284     0.9529         51        640:  91%|█████████ | 136/150 [02:14<00:13,  1.01it/s]


KeyboardInterrupt: 

We now want to evaluate the trained model using the Test set defined in data.yaml. \
YOLO does not compute standard accuracy, because in object detection True Negatives (TN) are not counted. Therefore, traditional accuracy is not applicable or useful.


So, we will compute:
- **Precision**: how correct your detected positives are
- **Recall**: how many of the real objects you detected
- **mAP50**: mean Average Precision with IoU ≥ 0.5 (how accurate the predictions are)
- **mAP50-95**: average over various IoU thresholds, a more strict metric
- **F1_score**: combination of precision and recall (you can compute it as: 2 * (P * R) / (P + R))
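As a worked example of the F1 formula, the sketch below plugs in the precision and recall from the validation row of epoch 21 in the training log above (P = 0.814, R = 0.661). These are validation values used only for illustration, not test-set results, and `f1_score` is a hypothetical helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_epoch_21 = f1_score(0.814, 0.661)
print(round(f1_epoch_21, 3))
```

The result is about 0.73, which lies between the two inputs but closer to the lower one, as expected for a harmonic mean.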

In [None]:
# Evaluates the trained model using the TEST SET defined in data.yaml

metrics_YOLOv8n = model_YOLOv8n.val(data='data.yaml', split='test') # returns detection metrics (e.g., mAP, precision, recall, etc.) on the test set
print("Test metrics:", metrics_YOLOv8n)

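**Confidence** is the score, between 0 and 1, that the model assigns to each detection: its estimate that the detected object is actually real (i.e., not a false positive). Detections below the chosen confidence threshold are discarded.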
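The effect of a confidence threshold can be sketched on a few made-up detections. The dictionaries below are hypothetical and are not the data structure returned by Ultralytics; 0.25 is the threshold used in the inference cells of this notebook.

```python
from typing import Dict, List, Union

Detection = Dict[str, Union[str, float]]


def filter_by_confidence(
    detections: List[Detection], threshold: float = 0.25
) -> List[Detection]:
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d["conf"] >= threshold]


# Hypothetical detections for one image.
detections = [
    {"label": "person", "conf": 0.91},
    {"label": "person", "conf": 0.30},
    {"label": "person", "conf": 0.12},
]
kept = filter_by_confidence(detections, threshold=0.25)
print(len(kept), [d["conf"] for d in kept])
```

Raising the threshold generally trades recall for precision: fewer false positives, but more missed persons.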

In [None]:
# Inference (practical use of the fine-tuned model)

# Load the best weights found during training.
# In Ultralytics 8.x the object returned by train() does not expose a `.best` attribute
# (and the variable is never assigned if training is interrupted), so we use the
# path where best.pt is saved: <project>/<name>/weights/best.pt
best_YOLOv8n = 'runs_finetune/person_yolov8n/weights/best.pt'

model_inf_YOLOv8n = YOLO(best_YOLOv8n) # Creates a new model instance by loading the best weights

# Performs inference on one or more images, or on a video, by specifying the path in the source parameter.
# conf=0.25 → Confidence threshold for considering a detection valid.
preds_YOLOv8n = model_inf_YOLOv8n.predict(
  # source='/content/drive/MyDrive/projectUPV/datasets/AERALIS_YOLOv8n/test/images',
  source='/content/AERALIS_YOLOv8n_local/test/images',
  conf=0.25
)
# predict() returns a list with one Results object per image, so we show them one by one
for result in preds_YOLOv8n[:5]: # only the first 5 images, to keep the output short
  result.show() # displays the predictions (you can also save them with save=True in predict())

### YOLOv11n

Now we follow the same procedure for YOLOv11n:

In [None]:
src = '/content/drive/MyDrive/projectUPV/datasets/AERALIS_YOLOv11n'
dst = '/content/AERALIS_YOLOv11n_local'  # is now on the local VM, NOT on drive

# If the destination folder already exists, I delete it
if os.path.exists(dst):
  shutil.rmtree(dst)

# Recursive copy of ENTIRE folder (and subfolders)
shutil.copytree(src, dst)
print("Copy completed:", os.path.exists(dst))

In [None]:
!df -h / # It shows the total, used and free space on the root (/) of the Colab VM.

# Avail column: space still available for your files.

In [None]:
# Show space used by your local folder
!du -sh /content/AERALIS_YOLOv11n_local

In [None]:
# Show space occupied by various folders in /content/.
!du -h --max-depth=1 /content/

In [None]:
# Dataset YAML (edit the paths if your folders differ)
data_yaml = """
train: /content/AERALIS_YOLOv11n_local/train/images
val:   /content/AERALIS_YOLOv11n_local/val/images
test:  /content/AERALIS_YOLOv11n_local/test/images

nc: 1
names: ['person']
"""
open('data.yaml', 'w').write(data_yaml)

In [None]:
# Load the pre-trained model we want to use as a starting point

model_YOLOv11n = YOLO('yolo11n.pt') # the model that will be fine-tuned on the custom dataset (Ultralytics names the weights 'yolo11n.pt', without the 'v')

In [None]:
# Fine‑tuning
results_YOLOv11n = model_YOLOv11n.train(
  data='data.yaml',
  epochs=100,
  imgsz=640,
  batch=16,
  patience=20,
  workers=2,
  device=0,
  project='runs_finetune',
  name='person_yolov11n'
)

In [None]:
# Evaluates the trained model using the TEST SET defined in data.yaml

metrics_YOLOv11n = model_YOLOv11n.val(data='data.yaml', split='test') # returns detection metrics (e.g., mAP, precision, recall, etc.) on the test set
print("Test metrics:", metrics_YOLOv11n)

In [None]:
# Inference (practical use of the fine-tuned model)

# Load the best weights found during training.
# In Ultralytics 8.x the object returned by train() does not expose a `.best` attribute,
# so we use the path where best.pt is saved: <project>/<name>/weights/best.pt
best_YOLOv11n = 'runs_finetune/person_yolov11n/weights/best.pt'

model_inf_YOLOv11n = YOLO(best_YOLOv11n) # Creates a new model instance by loading the best weights

# Performs inference on one or more images, or on a video, by specifying the path in the source parameter.
# conf=0.25 → Confidence threshold for considering a detection valid.
preds_YOLOv11n = model_inf_YOLOv11n.predict(
  # source='/content/drive/MyDrive/projectUPV/datasets/AERALIS_YOLOv11n/test/images',
  source='/content/AERALIS_YOLOv11n_local/test/images',
  conf=0.25
)
# predict() returns a list with one Results object per image, so we show them one by one
for result in preds_YOLOv11n[:5]: # only the first 5 images, to keep the output short
  result.show() # displays the predictions (you can also save them with save=True in predict())

### EfficientDet D0

### EfficientDet D1

### EfficientDet D2

### MobileNetV2 + SSDLite

### MobileNetV3 + SSDLite

# Normalization and Data Augmentation

For lightweight, optimized models such as those targeting the Jetson Nano, image normalization is almost always required before passing images to the model.

PyTorch models work ONLY with tensors, NOT with PIL images or NumPy arrays.
- ToTensor() converts an image (PIL or NumPy) into a float32 PyTorch tensor in [C, H, W] format (channels, height, width).

- It also automatically scales pixel values from [0, 255] to [0, 1] (for 8-bit images).

Normalization (Normalize) works ONLY on tensors.
Normalize(mean, std) requires input that is already a float tensor and applies the shift/scale channel by channel.

If you try to normalize a PIL image or a NumPy array directly, you get an error or unexpected behavior.

So the sequence is ALWAYS:
1. (Optional) Resize
2. ToTensor() → converts and scales [0, 255] to [0, 1]
3. Normalize() → normalizes each channel using the mean/std required by the model


\
In summary:
- ToTensor is essential, not only because PyTorch requires tensors but also because normalization works ONLY on tensors, not on raw images.
- Normalization does NOT replace ToTensor: it operates on data that has already been converted.
- The ToTensor() + normalization sequence is almost always necessary, but the normalization details (mean, std, pixel range) can change depending on the model.

Let's look at the situation for your models:
-


# Fine-Tuning Models (Alexia)

I analyze Alexia's code for inspiration:

1. test_comptage_img.py
Purpose:
Loads a trained YOLO model and counts how many objects of class 0 (called "oiseaux" = birds here, but it can be adapted to "persons") are detected in a single image.

In [None]:
from ultralytics import YOLO # Import the Ultralytics YOLO library

# Load the trained YOLO model from best.pt (set the correct path)
model = YOLO("../runs/detect/train4/weights/best.pt")

# Run prediction on the image "test4.jpeg"
results = model.predict("test4.jpeg")

# Count the birds: how many bounding boxes belong to class 0
bird_count = sum(1 for cls in results[0].boxes.cls if int(cls) == 0)

print(f"Nombre d'oiseaux : {bird_count}") # Print the number of detected objects of class 0


Note:

You can replace "class 0" with "person" if your model detects persons as class 0.

2. test_comptage_video.py
Purpose:
Loads a trained YOLO model, performs tracking (with ByteTrack), and counts how many objects of class 0 ("oiseaux") enter a central rectangle within a video.
It annotates the video with bounding boxes, IDs, the current count, and the total number of unique objects that entered the rectangle.

In [None]:
from ultralytics import YOLO                # Import YOLO from Ultralytics
import cv2                                  # Import OpenCV for video and image handling
import os                                   # Import os (not used here, but often needed for paths)

# Load the trained YOLO model (set the correct path)
model = YOLO("/Users/alexiagaido--amoros/Desktop/UPV-test/entrainement_serveur/runs/detect/train9/weights/best.pt")

# Video paths
video_path = "img_video/video_test_1.mp4"   # Path of the video to analyze
output_path = "output_video.mp4"            # Path of the annotated output video

# Distance from the frame borders (in pixels) for the contact rectangle
border_distance = 50

# Open the video
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    print("Erreur : Impossible d'ouvrir la vidéo")   # Error if the video cannot be opened
    exit()

# Get the video properties
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))       # Frame width
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))     # Frame height
fps = int(cap.get(cv2.CAP_PROP_FPS))                # Frames per second

# Define the coordinates of the contact rectangle
rect_x1 = border_distance                           # Left edge (x1)
rect_y1 = border_distance                           # Top edge (y1)
rect_x2 = width - border_distance                   # Right edge (x2)
rect_y2 = height - border_distance                  # Bottom edge (y2)

# Configure the video output
fourcc = cv2.VideoWriter_fourcc(*"mp4v")            # Video codec for the output
out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))  # Writer for the annotated video

# Set storing the unique IDs of the objects that passed through the rectangle
unique_bird_ids = set()

while cap.isOpened():
    ret, frame = cap.read()                         # Read one frame
    if not ret:
        break

    # Run inference with tracking (ByteTrack); the results include tracking IDs
    results = model.track(frame, conf=0.5, tracker="bytetrack.yaml", persist=True)

    # Count the birds in this frame
    bird_count = 0
    if results[0].boxes.id is not None:             # Only if tracking IDs are available
        for box, box_id in zip(results[0].boxes, results[0].boxes.id):   # Iterate over boxes and their IDs
            # Check whether the center of the bounding box is inside the rectangle
            x_center = (box.xyxy[0][0] + box.xyxy[0][2]) / 2            # Center x
            y_center = (box.xyxy[0][1] + box.xyxy[0][3]) / 2            # Center y
            if rect_x1 < x_center < rect_x2 and rect_y1 < y_center < rect_y2:   # Center inside the central rectangle
                unique_bird_ids.add(box_id.item())                      # Add the ID to the set of unique objects
                bird_count += 1                                         # Count for this frame

    # Annotate the frame with detections and IDs
    annotated_frame = results[0].plot()

    # Draw the contact rectangle
    cv2.rectangle(
        annotated_frame,
        (rect_x1, rect_y1),
        (rect_x2, rect_y2),
        (255, 0, 0),  # Blue (BGR)
        2,            # Line thickness
    )

    # Show the number of birds in this frame and the unique total
    cv2.putText(
        annotated_frame,
        f"Oiseaux dans cette frame : {bird_count}",
        (10, 30),
        cv2.FONT_HERSHEY_SIMPLEX,
        1,
        (0, 255, 0),  # Green
        2,
    )
    cv2.putText(
        annotated_frame,
        f"Oiseaux uniques : {len(unique_bird_ids)}",
        (10, 60),
        cv2.FONT_HERSHEY_SIMPLEX,
        1,
        (0, 255, 0),
        2,
    )

    # Write the annotated frame to the output video
    out.write(annotated_frame)

    # Display the frame in real time
    cv2.imshow("YOLO Tracking", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # Press 'q' to quit
        break

# Print the total number of unique birds detected
print(f"Nombre total d'oiseaux uniques détectés dans la vidéo : {len(unique_bird_ids)}")

# Release the resources
cap.release()
out.release()
cv2.destroyAllWindows()

Technical considerations
Class 0: The code is written for "oiseaux" (birds) = class 0. If persons are class 0 in your model, it works in exactly the same way.

Tracking (ByteTrack): Assigns an ID to every object/person crossing the area, so that each one is counted only once even if it stops or moves around in the scene.

Region of interest: Only objects whose center enters a central zone are counted, which is useful, for example, to count only those passing through a certain area (adaptable to entrances, exits, etc.).

Saving the annotated video: The result is a video with boxes, IDs, and counts drawn on top.