### Transitioning from Custom Model to YOLOv8/YOLOv5 on High-Performance Infrastructure

We first built and trained a **custom object detection model** on an InceptionV3 backbone, implementing the core components ourselves:

- Data preprocessing and label transformation  
- Grid-based YOLO tensor encoding  
- Custom loss function with IoU-based penalty  
- Manual anchor assignment heuristics  
- Inference with bounding box decoding and Non-Maximum Suppression (NMS)  
- **Selective layer freezing (only `mixed7` was trainable)**  
- **Learning rate decay using `ExponentialDecay` scheduler**

This custom pipeline provided deep insights into how YOLO-style detection works at a granular level. It gave full control over feature extraction, model capacity, and training behaviour. However, it was **computationally expensive** and required careful manual tuning to achieve even modest performance.
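
To make the decoding step listed above more concrete, here is a minimal pure-Python sketch of IoU and greedy Non-Maximum Suppression. It is not the notebook's original implementation: the function names `iou` and `nms`, the corner box format `(x1, y1, x2, y2)`, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed corner format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already-kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two heavily overlapping boxes and one separate box
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed. The third box does not overlap anything and is kept.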

---

### Why We Transitioned to YOLOv8 and YOLOv5

To benchmark performance and streamline experimentation, we transitioned to **YOLOv8** and **YOLOv5** models using **GPU-enabled infrastructure** (e.g., Google Cloud). These models offer:

- **Higher accuracy out of the box**, starting from pretrained weights
- **Faster training and inference** via highly optimised pipelines
- **Minimal setup** — no need to define losses, backbones, or training loops
- Built-in support for:
  - Anchor-free and anchor-based detection  
  - Data augmentation  
  - Post-processing (NMS, confidence thresholds, etc.)

Unlike our custom setup, these models allowed us to skip architecture design, preprocessing pipelines, and manual loss handling — drastically speeding up the training-validation cycle.

---

> This experience highlights a common engineering approach:
>
> - Start **custom** to understand the inner workings and design choices
> - Then move to **robust, production-ready tools** when efficiency and scalability become priorities

---

We began the transition by installing the `ultralytics` package:
```bash
pip install ultralytics
```
This enabled us to load and evaluate YOLOv8 and YOLOv5 models easily using their `.predict()` API and visualise results immediately.

In [None]:
pip install ultralytics


### Dataset Preparation for YOLO Training

This section unpacks the dataset ZIP archive and prepares the folder structure required by Ultralytics YOLOv8:

---

#### Step 1: Unzip the Dataset
- The dataset (in YOLO/Darknet format) is stored as a `.zip` file on Google Drive, mounted at `/content/drive`.
- It is extracted into a folder alongside the ZIP file using Python's `zipfile` module.

---

#### Step 2: Define and Create Dataset Paths
- A working directory `yolo_dataset/` is created within the extracted dataset folder.
- This will hold the **training/validation images and labels** in the structure required by YOLO.

---

#### Step 3: Directory Structure Setup

YOLO expects the following structure:

```text
yolo_dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/
```

Each image should have a corresponding `.txt` file in the `labels/` folder with the same filename, containing class and bounding box information.

This setup ensures compatibility with YOLOv8’s built-in training functions.
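
As a quick sanity check on this layout, the sketch below lists images that have no matching label file. The helper name `images_without_labels` is hypothetical and not part of the notebook. A missing label is not necessarily an error: the training log later reports 2,746 "backgrounds", which appear to be images without labelled objects.

```python
from pathlib import Path
from typing import List


def images_without_labels(dataset_root: str, subset: str) -> List[str]:
    """Return the names of .jpg files in images/<subset>/ that have no
    labels/<subset>/<same stem>.txt counterpart."""
    root = Path(dataset_root)
    image_dir = root / "images" / subset
    label_dir = root / "labels" / subset
    missing = []
    for image_path in sorted(image_dir.glob("*.jpg")):
        if not (label_dir / f"{image_path.stem}.txt").exists():
            missing.append(image_path.name)
    return missing
```

Calling `images_without_labels(DATASET_PATH, "train")` after the copy step would show which training images will be treated as unlabelled.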


In [None]:
import os                                              # For file and directory operations
import zipfile                                         # To handle .zip file extraction
import shutil                                          # For file/folder copying and moving
import glob                                            # For pattern matching file paths (e.g., *.jpg)
from sklearn.model_selection import train_test_split   # For splitting dataset into train/val
from ultralytics import YOLO                           # Import YOLO (Ultralytics) for training/detection

# === Step 1: Unzip your dataset ===

# Define the path to your zipped dataset (assumed stored in Google Drive)
ZIP_PATH = r"/content/drive/MyDrive/People Detection -General-.v8i.darknet.zip"

# Define the folder where the zip will be extracted.
# This joins the zip's parent directory with the intended output folder.
EXTRACTED_PATH = os.path.join(os.path.dirname(ZIP_PATH), "People Detection -General-.v8i.darknet")

# Open the zip file and extract all contents to the specified directory
with zipfile.ZipFile(ZIP_PATH, 'r') as zip_ref:
    zip_ref.extractall(EXTRACTED_PATH)


# === Step 2: Define paths ===
# Set the base path to the extracted dataset directory
BASE_PATH = EXTRACTED_PATH

# Define a subdirectory to store processed data or outputs
DATASET_PATH = os.path.join(BASE_PATH, "yolo_dataset")

# Create the yolo_dataset directory if it doesn't exist already
os.makedirs(DATASET_PATH, exist_ok=True)


# === Step 3: Prepare directory structure ===
# Create folders for training and validation sets for both images and labels
# Required by YOLO format: one .txt file per image with matching filename
for subfolder in ["images/train", "images/val", "labels/train", "labels/val"]:
    os.makedirs(os.path.join(DATASET_PATH, subfolder), exist_ok=True)

### Organise Dataset and Create YOLO Config File (`people.yaml`)

This section prepares the dataset for training with Ultralytics YOLOv8 by performing two main tasks:

---

#### Step 4: Organise Images and Labels

- All `.jpg` images are collected from the extracted dataset:
  - `train/` → copied to `images/train/`
  - `test/` → copied to `images/val/` (the `test/` split serves as the validation set here)
- Each image's corresponding `.txt` label file (in YOLO format) is also copied to:
  - `labels/train/` or `labels/val/`
- Each label must match the image filename (e.g., `dog.jpg` ↔ `dog.txt`) and contain lines in the format:

```text
<class_id> <x_center> <y_center> <width> <height>
```

with all values normalised to `[0, 1]`.
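
To illustrate the label format, the following sketch converts one normalised label line into pixel corner coordinates. The function name `yolo_line_to_pixel_box` and the sample label are hypothetical; only the field order comes from the format above.

```python
from typing import Tuple


def yolo_line_to_pixel_box(line: str, img_w: int, img_h: int) -> Tuple[int, float, float, float, float]:
    """Convert '<class_id> <x_center> <y_center> <width> <height>' (normalised)
    into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc_px, yc_px = float(xc) * img_w, float(yc) * img_h
    w_px, h_px = float(w) * img_w, float(h) * img_h
    return int(class_id), xc_px - w_px / 2, yc_px - h_px / 2, xc_px + w_px / 2, yc_px + h_px / 2


# Hypothetical label: one person centred in a 640x640 image, half as wide and tall as the image
print(yolo_line_to_pixel_box("0 0.5 0.5 0.5 0.5", 640, 640))
# (0, 160.0, 160.0, 480.0, 480.0)
```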

---

#### 📄 Step 5: Create the `people.yaml` File

This file is required by YOLOv8 for training and specifies:
- The root `path` to the dataset
- Relative paths to the training/validation images
- `nc`: Number of object classes (`1`)
- `names`: List of class names (`['person']`)

Example content:
```yaml
path: /content/drive/MyDrive/People Detection -General-.v8i.darknet/yolo_dataset
train: images/train
val: images/val
nc: 1
names: ['person']
```

This completes the YOLO-compatible dataset setup, ready for training using:

```python
YOLO("yolov8x.pt").train(data="people.yaml", epochs=...)
```



In [None]:
# === Step 4: Copy images and labels ===

# Get all training image paths (*.jpg) from 'train' folder inside the extracted dataset
train_imgs = glob.glob(os.path.join(BASE_PATH, "train", "*.jpg"))

# Get all validation image paths (*.jpg) from 'test' folder inside the extracted dataset
val_imgs = glob.glob(os.path.join(BASE_PATH, "test", "*.jpg"))

# Define a function to copy images and their matching label files to the correct destination
def copy_files(img_list, subset):
    for img_path in img_list:
        # Swap only the final extension, so a ".jpg" elsewhere in the path is left alone
        label_path = os.path.splitext(img_path)[0] + ".txt"
        img_name = os.path.basename(img_path)          # Extracts just the filename from path
        label_name = os.path.basename(label_path)

        # Copy image to the images/<subset>/ folder
        shutil.copy(img_path, os.path.join(DATASET_PATH, "images", subset, img_name))

        # If a label file exists for this image, copy it to labels/<subset>/
        if os.path.exists(label_path):
            shutil.copy(label_path, os.path.join(DATASET_PATH, "labels", subset, label_name))

# Copy training images and labels to the correct folders
copy_files(train_imgs, "train")

# Copy validation images and labels to the correct folders
copy_files(val_imgs, "val")


# === Step 5: Create people.yaml file ===

# Convert backslashes to forward slashes in path (for compatibility across OS)
yaml_path_clean = DATASET_PATH.replace("\\", "/")

# Define the content of the YOLO dataset configuration file
# - path: root directory of the dataset
# - train: relative path to training images
# - val: relative path to validation images
# - nc: number of object classes (1 in this case)
# - names: list of class names
yaml_content = f"""
path: {yaml_path_clean}
train: images/train
val: images/val
nc: 1
names: ['person']
"""

# Write the YAML configuration to a file called 'people.yaml' in the dataset folder
with open(os.path.join(DATASET_PATH, "people.yaml"), "w") as f:
    f.write(yaml_content.strip())

### Quick Training and Inference with YOLOv5 (Low Infrastructure)

---

#### Step 6: Lightweight YOLOv5 Training

This step demonstrates how to train a **YOLOv5 model** on a custom dataset using **limited computational resources** (e.g., a single mid-range GPU):

- Pretrained model: `yolov5l.pt` (YOLOv5 Large); the log below shows Ultralytics downloading the updated `yolov5lu.pt` weights in its place
- Training config:
  - **Epochs**: 5 (quick training pass)
  - **Image size**: 640×640 (standard resolution)
  - **Batch size**: 16 (adjustable for GPU/VRAM limits)
  - **Caching**: Disabled to conserve RAM
  - **Workers**: up to 8 data-loading workers requested (the log shows only 2 were used on this machine)
- This setup suits fast iteration, pipeline testing, and training on shared or free resources (e.g., Google Colab with T4 GPUs)
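
A rough estimate shows why caching is disabled here. The sketch below assumes every cached image is stored as an uncompressed 640×640 RGB `uint8` array, which is an upper bound because non-square images take less space. The image counts (16,195 training, 1,897 validation) are taken from the training logs in this notebook, and the helper name `cache_size_gib` is hypothetical.

```python
def cache_size_gib(num_images: int, imgsz: int = 640, channels: int = 3) -> float:
    """Upper-bound RAM (in GiB) for caching uint8 images of imgsz x imgsz pixels."""
    bytes_per_image = imgsz * imgsz * channels  # one byte per channel value
    return num_images * bytes_per_image / 2**30


train_gib = cache_size_gib(16195)  # training images reported in the log
val_gib = cache_size_gib(1897)     # validation images reported in the log
print(f"train ~ {train_gib:.1f} GiB, val ~ {val_gib:.1f} GiB")
# train ~ 18.5 GiB, val ~ 2.2 GiB
```

The validation estimate is in line with the "Caching images (2.2GB RAM)" message in the YOLOv8x log. The training estimate suggests that caching the full training set would need close to 20 GB of RAM.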

---

#### Step 7: Test Prediction

After training, the model performs inference on a sample image from the validation set:

- The result is **automatically saved** under `runs/detect/` (in this run, `runs/detect/train42/`).
- If `show=True`, the image is also rendered in a pop-up window where a local display is available.

---

> YOLOv5 converges quickly even in low-epoch, low-resource settings, which makes it a good tool for rapid experimentation before scaling up training on YOLOv8 or larger models.


In [None]:
# === Step 6: Train the YOLOv5 model with low infrastructure (fast & inexpensive) ===

# Load a pre-trained YOLOv5 'large' model (you can also try 's' or 'n' for lighter versions)
model = YOLO("yolov5l.pt")

# Train the model on your custom dataset
model.train(
    data=os.path.join(DATASET_PATH, "people.yaml").replace("\\", "/"),  # Path to dataset config
    epochs=5,               # Low number of epochs for quick training
    imgsz=640,              # Image resolution (balanced between speed and accuracy)
    batch=16,               # Batch size (adjust based on available VRAM)
    cache=False,            # Avoid caching images to save memory
    device=0,               # Use GPU 0 (if available), or set to 'cpu' for CPU
    workers=8,              # Data loader workers requested (the log shows this can be capped; only 2 were used here)
)

# === Step 7: Predict on a test image ===
TEST_IMAGE = val_imgs[0]  # Predict on the first image from test set
results = model.predict(source=TEST_IMAGE, save=True, show=True)


PRO TIP 💡 Replace 'model=yolov5l.pt' with new 'model=yolov5lu.pt'.
YOLOv5 'u' models are trained with https://github.com/ultralytics/ultralytics and feature improved performance vs standard YOLOv5 models trained with https://github.com/ultralytics/yolov5.

Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov5lu.pt to 'yolov5lu.pt'...


100%|██████████| 102M/102M [00:00<00:00, 287MB/s] 


Ultralytics 8.3.167 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (Tesla T4, 15095MiB)
[34m[1mengine/trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/People Detection -General-.v8i.darknet/yolo_dataset/people.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=5, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov5l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train4, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=Tr

[34m[1mtrain: [0mScanning /content/drive/MyDrive/People Detection -General-.v8i.darknet/yolo_dataset/labels/train.cache... 16195 images, 2746 backgrounds, 0 corrupt: 100%|██████████| 16195/16195 [00:00<?, ?it/s]

[34m[1malbumentations: [0mBlur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, method='weighted_average', num_output_channels=3), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))





[34m[1mval: [0mFast image access ✅ (ping: 2.2±3.3 ms, read: 26.3±26.5 MB/s, size: 65.1 KB)


[34m[1mval: [0mScanning /content/drive/MyDrive/People Detection -General-.v8i.darknet/yolo_dataset/labels/val.cache... 1897 images, 3 backgrounds, 0 corrupt: 100%|██████████| 1897/1897 [00:00<?, ?it/s]


Plotting labels to runs/detect/train4/labels.jpg... 
[34m[1moptimizer:[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
[34m[1moptimizer:[0m AdamW(lr=0.002, momentum=0.9) with parameter groups 113 weight(decay=0.0), 120 weight(decay=0.0005), 119 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train4
Starting training for 5 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        1/5      9.57G       1.58      1.579      1.526         33        640: 100%|██████████| 1013/1013 [13:44<00:00,  1.23it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:38<00:00,  1.54it/s]


                   all       1897       4731      0.673      0.539      0.615      0.255

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        2/5       9.8G      1.575      1.614      1.538         15        640: 100%|██████████| 1013/1013 [13:15<00:00,  1.27it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:36<00:00,  1.62it/s]


                   all       1897       4731      0.681      0.589      0.677      0.323

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        3/5      9.71G      1.497      1.501      1.479         12        640: 100%|██████████| 1013/1013 [13:00<00:00,  1.30it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:37<00:00,  1.60it/s]


                   all       1897       4731      0.702      0.658      0.679      0.255

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        4/5      9.93G      1.419      1.396      1.429         11        640: 100%|██████████| 1013/1013 [12:58<00:00,  1.30it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:37<00:00,  1.60it/s]


                   all       1897       4731      0.753      0.711      0.769      0.359

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        5/5      9.83G      1.334      1.267      1.364          2        640: 100%|██████████| 1013/1013 [12:55<00:00,  1.31it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:36<00:00,  1.63it/s]


                   all       1897       4731      0.776       0.75      0.793      0.369

5 epochs completed in 1.167 hours.
Optimizer stripped from runs/detect/train4/weights/last.pt, 106.8MB
Optimizer stripped from runs/detect/train4/weights/best.pt, 106.8MB

Validating runs/detect/train4/weights/best.pt...
Ultralytics 8.3.167 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (Tesla T4, 15095MiB)
YOLOv5l summary (fused): 128 layers, 53,132,179 parameters, 0 gradients, 134.7 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:40<00:00,  1.48it/s]


                   all       1897       4731      0.773       0.75      0.793      0.369
Speed: 0.2ms preprocess, 15.3ms inference, 0.0ms loss, 1.6ms postprocess per image
Results saved to runs/detect/train4

image 1/1 /content/drive/MyDrive/People Detection -General-.v8i.darknet/test/005304_jpg.rf.19efca555ace117390f5a1e9761e630a.jpg: 640x640 2 persons, 43.6ms
Speed: 2.2ms preprocess, 43.6ms inference, 1.9ms postprocess per image at shape (1, 3, 640, 640)
Results saved to runs/detect/train42


### YOLOv5lu (Ultralytics Upgraded) – Training Summary on Google Cloud GPU

We trained the upgraded **YOLOv5lu model (`yolov5lu.pt`)** with **Ultralytics 8.3.167**, which substitutes these weights for the legacy `yolov5l.pt`.

> **PRO TIP**
> YOLOv5 'u' models (`yolov5lu.pt`) are trained with the `ultralytics` package rather than the original `yolov5` repository and, according to the Ultralytics log message, offer improved performance over the standard YOLOv5 models.

---

#### Training Configuration:

- **Model**: `yolov5lu.pt` (YOLOv5 Large, upgraded)
- **Epochs**: 5
- **Image size**: 640×640
- **Batch size**: 16
- **Optimizer**: Automatically selected (AdamW)
- **GPU**: Tesla T4 (15 GB memory)
- **Total Training Time**: ~1.17 hours
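
The step counts in the log follow directly from this configuration. The short calculation below is a sketch using the 16,195 training images reported by the `train: Scanning ...` log line and the roughly 1.30 it/s throughput seen on the T4. The variable names are illustrative.

```python
import math

train_images = 16195   # from the "train: Scanning ..." log line
batch_size = 16
epochs = 5

steps_per_epoch = math.ceil(train_images / batch_size)  # the last batch is partial
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 1013 5065

# Rough time per epoch at the ~1.30 iterations/second observed in the log
est_minutes = steps_per_epoch / 1.30 / 60
print(f"~{est_minutes:.0f} minutes per epoch")  # ~13 minutes per epoch
```

This matches the `1013/1013` progress counter and the roughly 13-minute epochs in the training output.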

---

#### Performance Summary (Epoch 5/5):

| Metric       | Value     |
|--------------|-----------|
| Precision    | **0.776** |
| Recall       | **0.750** |
| mAP@0.5      | **0.793** |
| mAP@0.5:0.95 | **0.369** |

> After just 5 epochs the model reaches a solid mAP@0.5 of **0.793**. The much lower mAP@0.5:0.95 of **0.369** shows that box localisation at stricter IoU thresholds is still weak and may improve with longer training.
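
Precision and recall can be combined into a single F1 score for quick comparison between runs. The notebook does not report F1 itself, so the helper `f1_score` below is an illustrative addition that uses the epoch 5 values from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.776, 0.750), 3))  # 0.763
```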

---

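#### Inference Speed (per image):

- **Preprocess**: 2.2 ms  
- **Inference**: 43.6 ms  
- **Postprocess**: 1.9 ms  
- **Total**: ~47.7 ms per image at 640×640

These figures come from the single `predict()` call on one test image. Averaged over the 1,897 validation images, the log reports 0.2 ms preprocess, 15.3 ms inference, and 1.6 ms postprocess per image.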
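
For a sense of throughput, per-image latency can be converted to frames per second. The sketch below uses the single-image `predict()` timings (2.2 / 43.6 / 1.9 ms) and the validation averages reported in the log (0.2 / 15.3 / 1.6 ms). The helper name `fps` is illustrative, and the result ignores image loading and any batching effects.

```python
def fps(preprocess_ms: float, inference_ms: float, postprocess_ms: float) -> float:
    """Frames per second implied by the per-image latency components."""
    return 1000.0 / (preprocess_ms + inference_ms + postprocess_ms)


single_image = fps(2.2, 43.6, 1.9)     # single predict() call on one test image
validation_avg = fps(0.2, 15.3, 1.6)   # average over the validation set
print(f"{single_image:.1f} FPS vs {validation_avg:.1f} FPS")
# 21.0 FPS vs 58.5 FPS
```

The gap between the two is a reminder that a single call is a poor basis for a speed benchmark.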

---

#### Outputs:

- Trained model saved to: `runs/detect/train4/weights/best.pt`
- Annotated predictions: `runs/detect/train42/`
- Labels plot: `runs/detect/train4/labels.jpg`

---

### Insight:
By switching to `yolov5lu.pt`, we used the current `ultralytics` training pipeline, which trains the model **without any custom architecture design** or manual learning-rate tuning (the optimizer, learning rate, and momentum were selected automatically). This makes it a practical starting point for pipelines on GPU-based infrastructure.


### Step 8: Training the YOLOv8x Supermodel with High-End Infrastructure

We upgraded to **YOLOv8x**, the **largest variant in the YOLOv8 family**, to take full advantage of high-performance GPU resources.

Compared to our custom Inception-based model and lighter YOLO variants, YOLOv8x provides:

- **Higher accuracy potential** from a larger network (about 68M parameters versus about 53M for YOLOv5lu, per the model summaries in the logs)  
- **Faster training** using optimised pipelines and mixed precision (AMP)  
- **Automatic post-processing** (IoU-based NMS, confidence filtering)  
- **Simplified training workflow** requiring minimal manual engineering

> Ideal for final-stage model deployment or benchmarking on powerful hardware (e.g., A100).

#### 🔧 Training Configuration Summary:

| Parameter        | Value                                            |
|------------------|--------------------------------------------------|
| **Model**        | `yolov8x.pt` (Ultralytics)                       |
| **Epochs**       | 20                                               |
| **Image Size**   | 640 × 640                                        |
| **Batch Size**   | 16                                               |
| **Device**       | `device=0` (first CUDA GPU; an A100 in this run) |
| **Cache**        | Enabled (images cached in RAM)                   |
| **Workers**      | 16 requested (12 used, per the log)              |
| **Dataset**      | `people.yaml` (custom dataset)                   |

#### Learning Improvements vs Previous Models:

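- **No manual loss engineering** (vs custom YOLO-Inception)
- **Higher precision and recall** than the 5-epoch YOLOv5lu run on the same data (0.790 / 0.833 vs 0.773 / 0.750), although YOLOv8x trained for 20 epochs, so the comparison is not like-for-like
- **Time-efficient**: 20 epochs over 16,195 training images completed in about 1.3 hours on an A100 using the pre-built `ultralytics` pipeline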

---

Training this version is especially valuable for **production use cases** or **benchmarking** against academic or business-grade object detection tasks.
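
To put the hardware difference in perspective, the sketch below compares average time per epoch using the totals from the two training logs in this notebook ("5 epochs completed in 1.167 hours" on the T4, "20 epochs completed in 1.302 hours" on the A100). The two runs use different models and settings, so this is only a rough indication, not a controlled benchmark. The helper name `minutes_per_epoch` is illustrative.

```python
def minutes_per_epoch(total_hours: float, epochs: int) -> float:
    """Average wall-clock minutes per epoch, including per-epoch validation."""
    return total_hours * 60 / epochs


t4_yolov5lu = minutes_per_epoch(1.167, 5)     # YOLOv5lu on a Tesla T4
a100_yolov8x = minutes_per_epoch(1.302, 20)   # YOLOv8x on an A100
ratio = t4_yolov5lu / a100_yolov8x
print(f"{t4_yolov5lu:.1f} vs {a100_yolov8x:.1f} min/epoch, ratio {ratio:.1f}x")
# 14.0 vs 3.9 min/epoch, ratio 3.6x
```

The A100 run is about 3.6 times faster per epoch even though YOLOv8x is the heavier model (257.4 vs 134.7 GFLOPs in the model summaries) and the A100 run also had image caching enabled.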


In [None]:
# === Step 8: Train the YOLOv8x model (largest YOLOv8 variant, needs a high-end GPU) ===

# Load the pretrained YOLOv8x model (largest variant in the YOLOv8 family)
model = YOLO("yolov8x.pt")

# Train the model on your custom dataset with more compute resources
model.train(
    data=os.path.join(DATASET_PATH, "people.yaml").replace("\\", "/"),  # Path to dataset config file
    epochs=20,            # Train for more epochs to increase accuracy
    imgsz=640,            # Input image size (standard for YOLOv8)
    batch=16,             # Same batch size as the YOLOv5 run
    cache=True,           # Cache dataset in memory to speed up training (uses more RAM)
    device=0,             # Use GPU 0 for training (ensure CUDA is available)
    workers=16,           # Data loading workers requested (the log shows 12 were used)
)


Ultralytics 8.3.168 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (NVIDIA A100-SXM4-40GB, 40507MiB)
[34m[1mengine/trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=True, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/People Detection -General-.v8i.darknet/yolo_dataset/people.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=20, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8x.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train5, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, ov

[34m[1mtrain: [0mScanning /content/drive/MyDrive/People Detection -General-.v8i.darknet/yolo_dataset/labels/train.cache... 16195 images, 2746 backgrounds, 0 corrupt: 100%|██████████| 16195/16195 [00:00<?, ?it/s]


[34m[1malbumentations: [0mBlur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, method='weighted_average', num_output_channels=3), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))
[34m[1mval: [0mFast image access ✅ (ping: 1.1±1.1 ms, read: 7.4±7.5 MB/s, size: 67.6 KB)


[34m[1mval: [0mScanning /content/drive/MyDrive/People Detection -General-.v8i.darknet/yolo_dataset/labels/val.cache... 1897 images, 3 backgrounds, 0 corrupt: 100%|██████████| 1897/1897 [00:00<?, ?it/s]




[34m[1mval: [0mCaching images (2.2GB RAM): 100%|██████████| 1897/1897 [00:03<00:00, 519.08it/s]


Plotting labels to runs/detect/train5/labels.jpg... 
[34m[1moptimizer:[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
[34m[1moptimizer:[0m AdamW(lr=0.002, momentum=0.9) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.0005), 103 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 12 dataloader workers
Logging results to runs/detect/train5
Starting training for 20 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       1/20      30.9G      1.572      1.608      1.555         27        640: 100%|██████████| 1013/1013 [03:53<00:00,  4.34it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:13<00:00,  4.60it/s]


                   all       1897       4731      0.607      0.572       0.54      0.212

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       2/20      31.5G      1.565        1.6      1.543          9        640: 100%|██████████| 1013/1013 [03:42<00:00,  4.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.64it/s]


                   all       1897       4731      0.684      0.646      0.673       0.27

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       3/20      31.5G      1.497      1.498       1.49         33        640: 100%|██████████| 1013/1013 [03:39<00:00,  4.62it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.67it/s]


                   all       1897       4731      0.732      0.639      0.702      0.309

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       4/20      31.7G      1.435      1.397      1.444         15        640: 100%|██████████| 1013/1013 [03:36<00:00,  4.68it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.70it/s]


                   all       1897       4731      0.723      0.736      0.762      0.363

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       5/20      31.8G      1.387      1.342      1.414          5        640: 100%|██████████| 1013/1013 [03:36<00:00,  4.68it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.71it/s]


                   all       1897       4731      0.773      0.718      0.747      0.344

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       6/20      31.5G      1.344      1.263      1.384          9        640: 100%|██████████| 1013/1013 [03:36<00:00,  4.67it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.73it/s]

                   all       1897       4731      0.764      0.753      0.778      0.392






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       7/20      31.8G       1.32      1.217      1.362          8        640: 100%|██████████| 1013/1013 [03:36<00:00,  4.68it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.74it/s]

                   all       1897       4731      0.768      0.752      0.793      0.404






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       8/20      31.8G      1.297      1.186      1.356         12        640: 100%|██████████| 1013/1013 [03:36<00:00,  4.68it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.72it/s]

                   all       1897       4731      0.771      0.788      0.803      0.404






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       9/20      31.7G      1.272      1.143      1.333         27        640: 100%|██████████| 1013/1013 [03:37<00:00,  4.65it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.73it/s]

                   all       1897       4731      0.769      0.768      0.806      0.437






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      10/20      31.5G      1.247      1.113       1.32         30        640: 100%|██████████| 1013/1013 [03:36<00:00,  4.69it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.75it/s]

                   all       1897       4731      0.761      0.791      0.818      0.461





Closing dataloader mosaic
[34m[1malbumentations: [0mBlur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, method='weighted_average', num_output_channels=3), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      11/20      31.8G       1.27     0.9869      1.337          3        640: 100%|██████████| 1013/1013 [03:50<00:00,  4.39it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.75it/s]

                   all       1897       4731       0.78      0.791      0.821      0.478






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      12/20      31.7G      1.243     0.9652      1.328          2        640: 100%|██████████| 1013/1013 [03:35<00:00,  4.71it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.74it/s]

                   all       1897       4731      0.772      0.808      0.817      0.477






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      13/20      31.7G      1.209     0.9197      1.304         13        640: 100%|██████████| 1013/1013 [03:35<00:00,  4.71it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.76it/s]

                   all       1897       4731      0.769      0.832      0.841      0.513






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      14/20      31.5G       1.18     0.8911      1.283          1        640: 100%|██████████| 1013/1013 [03:35<00:00,  4.71it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.76it/s]

                   all       1897       4731      0.801      0.806      0.853      0.505






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      15/20      31.8G      1.155     0.8447      1.267          2        640: 100%|██████████| 1013/1013 [03:35<00:00,  4.71it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.76it/s]

                   all       1897       4731      0.786      0.815      0.837      0.524






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      16/20      31.8G      1.133      0.814      1.247          4        640: 100%|██████████| 1013/1013 [03:35<00:00,  4.70it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.73it/s]

                   all       1897       4731      0.794      0.804      0.848      0.546






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      17/20      31.7G      1.099     0.7896      1.231          3        640: 100%|██████████| 1013/1013 [03:35<00:00,  4.71it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.76it/s]

                   all       1897       4731      0.789      0.828      0.864      0.568






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      18/20      31.5G      1.067     0.7561      1.211          6        640: 100%|██████████| 1013/1013 [03:35<00:00,  4.71it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.78it/s]

                   all       1897       4731      0.793      0.824      0.857       0.57






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      19/20      31.8G      1.034     0.7267      1.195          4        640: 100%|██████████| 1013/1013 [03:35<00:00,  4.71it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.78it/s]

                   all       1897       4731      0.783      0.837       0.86       0.58






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      20/20      31.7G      1.011     0.6932      1.173          3        640: 100%|██████████| 1013/1013 [03:35<00:00,  4.71it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:12<00:00,  4.78it/s]

                   all       1897       4731      0.791      0.832      0.854      0.585






20 epochs completed in 1.302 hours.
Optimizer stripped from runs/detect/train5/weights/last.pt, 136.7MB
Optimizer stripped from runs/detect/train5/weights/best.pt, 136.7MB

Validating runs/detect/train5/weights/best.pt...
Ultralytics 8.3.168 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (NVIDIA A100-SXM4-40GB, 40507MiB)
Model summary (fused): 112 layers, 68,124,531 parameters, 0 gradients, 257.4 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 60/60 [00:13<00:00,  4.34it/s]


                   all       1897       4731       0.79      0.833      0.854      0.585
Speed: 0.1ms preprocess, 3.5ms inference, 0.0ms loss, 1.2ms postprocess per image
Results saved to runs/detect/train5


ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: array([0])
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x7efd30fb2810>
curves: ['Precision-Recall(B)', 'F1-Confidence(B)', 'Precision-Confidence(B)', 'Recall-Confidence(B)']
curves_results: [[array([          0,    0.001001,    0.002002,    0.003003, ... (output truncated)

## YOLOv8x Training Summary – High-Performance Model

Training started from the pretrained **YOLOv8x.pt** weights, the largest variant in Ultralytics’ YOLOv8 detection family, using the following configuration:

### Environment & Setup

- **Framework**: Ultralytics v8.3.168  
- **Backend**: PyTorch 2.6.0 + CUDA 12.4  
- **Device**: `NVIDIA A100-SXM4-40GB`
- **Model**: `YOLOv8x` (68.1M params, 258.1 GFLOPs)  
- **Dataset**: `people.yaml` – single class ("person")  
- **Image Size**: 640x640  
- **Epochs**: 20  
- **Batch Size**: 16  
- **Workers**: 16  
- **Cache**: Enabled (fallback due to RAM limits)  
- **Augmentations**: Blur, MedianBlur, ToGray, CLAHE  
- **Optimizer**: `AdamW` (auto-selected)

---

### Performance Metrics

| Metric              | Value   |
|---------------------|---------|
| **Precision**       | 0.790   |
| **Recall**          | 0.833   |
| **mAP@0.50**        | 0.854   |
| **mAP@0.50:0.95**   | 0.585   |
| **Fitness Score**   | 0.612   |

- **Best model saved at**: `runs/detect/train5/weights/best.pt`  
- **Validation Speed**: 3.5ms inference + 1.2ms postprocess/image  
- **Training Time**: ~1.3 hours (20 epochs)
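
The fitness score in the table is consistent with a weighted sum of the two mAP values. The sketch below assumes weights of 0.1 for mAP@0.50 and 0.9 for mAP@0.50:0.95, which reproduces the reported 0.612; the weighting is an assumption and should be checked against the installed Ultralytics version. The helper names (`fitness`, `f1_score`) are ours, not library functions. The same block converts the reported per-image timings into an approximate validation throughput.

```python
# Validation figures copied from the log above.
precision, recall = 0.790, 0.833
map50, map50_95 = 0.854, 0.585


def fitness(map50, map50_95, w50=0.1, w50_95=0.9):
    """Weighted combination of the two mAP values (weights are an assumption)."""
    return w50 * map50 + w50_95 * map50_95


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Per-image timings in milliseconds: preprocess, inference, postprocess.
timings_ms = (0.1, 3.5, 1.2)
images_per_second = 1000.0 / sum(timings_ms)

print(f"fitness    = {fitness(map50, map50_95):.3f}")
print(f"F1         = {f1_score(precision, recall):.3f}")
print(f"throughput = {images_per_second:.0f} images/s (approx.)")
```

The throughput figure only reflects the validation pass on the A100 and will differ on other hardware or batch sizes.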

---

### Model Architecture (Simplified)

- 209 layers  
- 68,153,571 parameters  
- 595 total modules  
- Key blocks:
  - Multiple `C2f` and `Conv` layers
  - Upsampling + Concat for feature fusion
  - Final `Detect` head with arguments `[1, [320, 640, 640]]` (one class; three input feature maps with 320, 640, and 640 channels)  

---

### Observations

- **mAP@50** rose from `0.54` (epoch 1) to `0.854` (epoch 20). The gain was not strictly monotonic: the logged values for epochs 14 to 20 fluctuate between `0.837` and `0.864`.
- **mAP@50-95** reached `0.585`, indicating strong localisation across IoU thresholds.
- **Memory** usage peaked at ~31.8GB on the A100 GPU.
- The model handled 4731 validation instances across 1897 images.
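
To make the late-epoch trend behind these observations explicit, the validation rows for epochs 14 to 20 can be compared directly. The values below are copied from the `all` rows of the training log above; the variable names are illustrative only.

```python
# (epoch, precision, recall, mAP50, mAP50-95) from the "all" rows of the log.
val_rows = [
    (14, 0.801, 0.806, 0.853, 0.505),
    (15, 0.786, 0.815, 0.837, 0.524),
    (16, 0.794, 0.804, 0.848, 0.546),
    (17, 0.789, 0.828, 0.864, 0.568),
    (18, 0.793, 0.824, 0.857, 0.570),
    (19, 0.783, 0.837, 0.860, 0.580),
    (20, 0.791, 0.832, 0.854, 0.585),
]

# Epoch with the highest value of each metric.
best_map50_epoch = max(val_rows, key=lambda row: row[3])[0]
best_map50_95_epoch = max(val_rows, key=lambda row: row[4])[0]

# Check whether mAP50-95 never decreased over these epochs.
map50_95_values = [row[4] for row in val_rows]
map50_95_non_decreasing = all(
    earlier <= later
    for earlier, later in zip(map50_95_values, map50_95_values[1:])
)

print("best mAP50 epoch:", best_map50_epoch)
print("best mAP50-95 epoch:", best_map50_95_epoch)
print("mAP50-95 non-decreasing:", map50_95_non_decreasing)
```

Over this window mAP50-95 keeps climbing while mAP50 plateaus, which suggests the later epochs mainly tightened box localisation rather than finding more objects.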

---

### Key Takeaways

- **YOLOv8x** offers excellent performance with minimal tuning required.
- With **high compute resources** (A100 + fast storage), all 20 epochs completed in about 1.3 hours.
- The resulting weights are a reasonable starting point for **deployment** or for **further fine-tuning** on related tasks.

> Final model: `runs/detect/train5/weights/best.pt`
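
As a minimal sketch of how the saved checkpoint might be reused, the helpers below load `best.pt` and run single-image inference. `load_best_model` and `detect_people` are hypothetical names introduced here, and the default confidence threshold of 0.25 is an assumption rather than a value taken from this run.

```python
from pathlib import Path


def load_best_model(weights_path="runs/detect/train5/weights/best.pt"):
    """Load trained weights, failing early if the checkpoint is missing."""
    path = Path(weights_path)
    if not path.is_file():
        raise FileNotFoundError(f"No checkpoint found at {path}")
    # Imported lazily so the path check works without ultralytics installed.
    from ultralytics import YOLO
    return YOLO(str(path))


def detect_people(model, image_path, conf=0.25):
    """Run inference on one image and return boxes as [x1, y1, x2, y2] lists."""
    results = model.predict(source=image_path, conf=conf, verbose=False)
    return results[0].boxes.xyxy.tolist()
```

A typical call would be `model = load_best_model()` followed by `detect_people(model, "street.jpg")`, where `street.jpg` stands in for any local image.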



### Downloading the Trained YOLOv8 Model (`best.pt`) from Colab

After completing the training, the best weights are saved in:

```text
runs/detect/<latest_run>/weights/best.pt
```


The following code does three things:

1. **Finds the latest training run folder** inside `runs/detect`.
2. **Builds the full path** to the `best.pt` file (best-performing checkpoint).
3. **Triggers the download** of this file to your local machine.

In [None]:
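import os

from google.colab import files

# Identify the latest training run that contains a best.pt checkpoint.
# Runs are ranked by modification time, because a plain name sort
# would place "train10" before "train5".
runs_dir = "runs/detect"
candidates = [
    os.path.join(runs_dir, name)
    for name in os.listdir(runs_dir)
    if os.path.isfile(os.path.join(runs_dir, name, "weights", "best.pt"))
]
latest_run = max(candidates, key=os.path.getmtime)

# Construct full path to the best model weights
model_path = os.path.join(latest_run, "weights", "best.pt")

# Download the model file to your local machine
files.download(model_path)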
