# BottleCap Color Sorting with YOLOv11pico

![Banner](../assets/banner.png)

## Project Background
Effective waste management, specifically plastic recycling, is a major global challenge. Separating bottle caps from bottles is a critical step because they are often made of different plastics (e.g., PP vs. PET). Manual sorting is slow and expensive, while industrial machines are often too large for smaller facilities.

## Introduction
The goal of this project is to build a **high-speed computer vision system** capable of detecting and classifying bottle caps into three categories: **Light Blue**, **Dark Blue**, and **Others**.

**Hardware Constraint:** The system is designed to run on an edge device, specifically a **Raspberry Pi 5 equipped with an AI accelerator (e.g., Hailo-8L or Coral TPU)**. To integrate smoothly with mechanical sorters (such as air jets or robotic arms), the model must achieve an inference latency of **5-10 ms** per frame.
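
To put the latency target in throughput terms, the small sketch below converts a per-frame inference latency into an upper bound on frames per second. It only covers model inference. Camera capture, pre-processing, and post-processing are ignored, so real throughput would be lower. The `max_fps` helper is a hypothetical name used for illustration.

```python
def max_fps(latency_ms: float) -> float:
    """Upper bound on frames per second if inference alone takes latency_ms per frame."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# The 5-10 ms budget from the hardware constraint above.
for latency_ms in (5.0, 10.0):
    print(f"{latency_ms:.0f} ms per frame -> at most {max_fps(latency_ms):.0f} FPS")
```

A 5-10 ms budget therefore corresponds to roughly 100-200 inferences per second.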

### ⚠️ Important Note on Environment
To achieve the strict latency requirement, I developed a custom, smaller version of YOLOv11 called **YOLOv11p (Pico)**.
* **Repo:** This notebook runs inside a cloned and modified version of the Ultralytics repository.
* **Modifications:** I modified `ultralytics/nn/tasks.py` to support a new scaling factor **Pico** (`p`) which is smaller than the standard **Nano** (`n`) model.

In [2]:
import sys
import os

# Get the absolute path of the parent directory (project root)
project_root = os.path.abspath('..')

# Add it to sys.path if it's not already there
if project_root not in sys.path:
    sys.path.append(project_root)


import wandb
from torch.cuda import empty_cache
import gc

## 1. Environment Setup
First, we set up the project paths. Since this notebook sits inside the repository structure, we need to ensure the system path sees the root directory. We also configure Ultralytics to store datasets and training artifacts in specific folders to keep the project organized.

In [3]:
from ultralytics import settings as ultralytics_settings
ultralytics_settings.update({
    'datasets_dir': os.path.join(project_root, 'datasets'),
    'weights_dir': os.path.join(project_root, 'weights'),
    'runs_dir': os.path.join(project_root, 'runs'),
    'mlflow': False,
    'wandb': True
})

In [4]:
from bsort.utils import (
    extract_and_prepare_dataset, 
    download_from_roboflow,
    print_metrics,
    log_metrics_to_wandb
)
from ultralytics import YOLO
from pathlib import Path

## 2. Experiment Configuration
Here, I define the project constants. I will run multiple experiments:
1.  **Baseline S & N:** Standard YOLO models (with COCO pretrained weights) to establish a performance benchmark.
2.  **Baseline P:** My custom "Pico" model trained from scratch (random weights).
3.  **Pretrain P:** The Pico model pretrained on a public dataset to learn the shape of bottle caps.
4.  **Finetune P:** The final model, transfer-learned from the public dataset to the specific challenge dataset.

In [5]:
PROJECT_NAME = "bottle_cap_project_temp"
BASELINE_S_RUN_NAME = "baseline_yolo11s"
BASELINE_N_RUN_NAME = "baseline_yolo11n"
BASELINE_P_RUN_NAME = "baseline_yolo11p"
PRETRAIN_P_RUN_NAME = "pretrain_yolo11p"
FINE_TUNE_P_RUN_NAME = "finetune_yolo11p"
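
The training cells in this notebook repeat almost the same hyperparameters for every run. As an optional sketch (not how the notebook is currently written), the shared values could live in one dictionary, with per-run overrides for the pretraining stage. `COMMON_TRAIN_ARGS` and `build_train_args` are hypothetical names; the values are copied from the training cells.

```python
# Hyperparameters shared by the baseline and fine-tuning runs in this notebook.
COMMON_TRAIN_ARGS = {
    "epochs": 150,
    "verbose": True,
    "hsv_h": 0.0,
    "hsv_s": 0.25,
    "hsv_v": 0.25,
    "mixup": 0.0,
    "optimizer": "SGD",
    "lr0": 0.02,
    "weight_decay": 0.01,
    "fliplr": 0.5,
    "mosaic": 0.25,
}


def build_train_args(data, project, name, **overrides):
    """Return a fresh kwargs dict for model.train(), applying per-run overrides last."""
    args = dict(COMMON_TRAIN_ARGS)  # copy so the shared defaults are never mutated
    args.update(data=str(data), project=project, name=name)
    args.update(overrides)
    return args


# The pretraining stage differs only in epochs, hsv_v, and mosaic.
# The data path here is a placeholder for illustration.
pretrain_args = build_train_args(
    "datasets/processed/public/data.yaml",
    "bottle_cap_project_temp",
    "pretrain_yolo11p",
    epochs=25,
    hsv_v=0.0,
    mosaic=0.0,
)
print(pretrain_args["epochs"], pretrain_args["mosaic"])
```

Each cell could then call `model.train(**build_train_args(...))`, which makes any difference between runs visible at a glance.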

## 3. Data Preparation Strategy

### The "Cold Start" Problem
Standard YOLO models (N, S, M) come pretrained on the COCO dataset, which gives them a good understanding of general features. However, my custom **YOLOv11p** is a new architecture, so it has no pretrained weights.

If I train the Pico model directly on the provided small sample dataset (only ~20 images), it will likely fail to generalize.

**Solution:**
I will use a **Public Dataset** from Roboflow containing generic bottle caps to pretrain the Pico model. This allows the model to learn what a "bottle cap" looks like before we teach it to distinguish the specific colors.

[Public dataset page on Roboflow Universe](https://universe.roboflow.com/work3-dqzz5/bottle-cap-y6pzg)

![Public Dataset Example](../assets/public_dataset.png)

In [None]:
target_location = Path("..") / "datasets" / "processed" / "public"
public_config_path = target_location / "data.yaml"

roboflow_config = {
    "workspace": "work3-dqzz5",
    "project": "bottle-cap-y6pzg",
    "version": 1
}

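# roboflow_config is already the flat dict defined above; it has no "roboflow" key.
# Assumes the helper reads "workspace", "project", and "version" from this dict.
download_from_roboflow(roboflow_config, target_location)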
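
The signature of `download_from_roboflow` is not shown in this notebook. As a hedged sketch, a small check like the one below would turn a missing or misnamed key into a clear message before any download is attempted. `validate_roboflow_config` and `REQUIRED_KEYS` are hypothetical names, and the required keys are assumed from the config dict in the cell above.

```python
REQUIRED_KEYS = ("workspace", "project", "version")


def validate_roboflow_config(config: dict) -> dict:
    """Raise a descriptive error if any expected key is absent; return the config otherwise."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"roboflow config is missing keys: {', '.join(missing)}")
    return config


# The same values as the config cell above.
config = validate_roboflow_config(
    {"workspace": "work3-dqzz5", "project": "bottle-cap-y6pzg", "version": 1}
)
print(config["project"])
```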

### Preparing the Challenge Dataset
The provided sample dataset is very small. To ensure consistent evaluation, I decided not to use a random split. Instead, I manually separated the images into **Train** and **Validation** sets.

I created a utility function `extract_and_prepare_dataset` that:
1.  Extracts the raw images.
2.  Splits them based on a hardcoded list (to keep classes balanced).
3.  **Relabels the data:** The original labels are generic. I used a script to check the filename/folder and assign the correct class ID: `0: Others`, `1: Light Blue`, `2: Dark Blue`.
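
The split and relabel steps above can be sketched in plain Python. This is not the actual `extract_and_prepare_dataset` implementation, which is not shown here. The validation file list, the filename keywords (`light`, `dark`), and all function names are assumptions for illustration. Only the class IDs come from the text.

```python
# Class IDs as stated in the list above.
CLASS_IDS = {"others": 0, "light_blue": 1, "dark_blue": 2}

# Hypothetical hardcoded validation list; the real list is not shown in this notebook.
VAL_FILES = {"light_blue_02.jpg", "dark_blue_03.jpg"}


def split_name(filename: str) -> str:
    """Assign a file to 'val' if it is on the hardcoded list, otherwise to 'train'."""
    return "val" if filename in VAL_FILES else "train"


def class_id_from_path(path: str) -> int:
    """Infer the class from keywords in the file or folder name (assumed naming scheme)."""
    lowered = path.lower()
    if "light" in lowered:
        return CLASS_IDS["light_blue"]
    if "dark" in lowered:
        return CLASS_IDS["dark_blue"]
    return CLASS_IDS["others"]


def relabel_yolo_text(label_text: str, new_class_id: int) -> str:
    """Replace the class ID on every YOLO label line, keeping the box coordinates unchanged."""
    relabeled = []
    for line in label_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) < 5:
            raise ValueError(f"malformed YOLO label line: {line!r}")
        relabeled.append(" ".join([str(new_class_id)] + parts[1:]))
    return "\n".join(relabeled) + ("\n" if relabeled else "")


print(split_name("light_blue_02.jpg"), class_id_from_path("raw/dark_blue/img_07.jpg"))
```

A fixed list keeps the validation set identical across runs, so the metrics of different models stay comparable.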

![Sample Dataset Example](../assets/sample_dataset.png)

In [None]:
zip_file_path = Path("..") / "datasets" / "sample.zip"
dataset_root_path = Path("..") / "datasets"

sample_config_path = extract_and_prepare_dataset(
    zip_path=zip_file_path,
    root_dir=dataset_root_path
)

## 4. Establishing Benchmarks

### Experiment A: YOLOv11s (Small)
I start by training the **YOLOv11s** model. This model serves as the "Gold Standard" for accuracy.

In [None]:
with wandb.init(project=PROJECT_NAME, name=BASELINE_S_RUN_NAME) as run:    
    model_baseline_s = YOLO("yolo11s.pt")
    
    results_baseline_s = model_baseline_s.train(
        data=str(sample_config_path),
        project=PROJECT_NAME,
        name=BASELINE_S_RUN_NAME,
        epochs=150,
        verbose=True,

        hsv_h=0.0,
        hsv_s=0.25,
        hsv_v=0.25,
        mixup=0.0,
        optimizer="SGD",
        lr0=0.02,
        weight_decay=0.01,
        fliplr=0.5,
        mosaic=0.25
    )
    
    log_metrics_to_wandb(
        results_baseline_s, 
        run_id=run.id, 
        project_name=PROJECT_NAME
    )
    
    del model_baseline_s
    gc.collect()
    empty_cache()

### Result Analysis Baseline YOLOv11s
The Small model achieved an **mAP @ 50-95 of 0.9172** and a **Recall of 0.97**. This suggests that the labels are consistent and the task is learnable from the sample data. I treat this score as the practical upper reference for the smaller models.

In [None]:
print("Baseline YOLO11s Result on Sample Dataset:")
print_metrics(results_baseline_s)

### Experiment B: YOLOv11n (Nano)
Next, I train the **YOLOv11n**. This is the smallest standard model provided by Ultralytics.

In [None]:
with wandb.init(project=PROJECT_NAME, name=BASELINE_N_RUN_NAME) as run:    
    model_baseline_n = YOLO("yolo11n.pt")
    
    results_baseline_n = model_baseline_n.train(
        data=str(sample_config_path),
        project=PROJECT_NAME,
        name=BASELINE_N_RUN_NAME,
        epochs=150,
        verbose=True,

        hsv_h=0.0,
        hsv_s=0.25,
        hsv_v=0.25,
        mixup=0.0,
        optimizer="SGD",
        lr0=0.02,
        weight_decay=0.01,
        fliplr=0.5,
        mosaic=0.25
    )
    
    log_metrics_to_wandb(
        results_baseline_n, 
        run_id=run.id, 
        project_name=PROJECT_NAME
    )
    
    del model_baseline_n
    gc.collect()
    empty_cache()

### Result Analysis Baseline YOLOv11n
The Nano model performed impressively, achieving an **mAP @ 50-95 of 0.9018**. This is only about 1.5 percentage points (0.0154) below the significantly larger "Small" model. It suggests that a lighter architecture can still handle this task effectively without a major drop in accuracy.

In [None]:
print("Baseline YOLO11n Result on Sample Dataset:")
print_metrics(results_baseline_n)

## 5. The "Pico" Model (YOLOv11p)

To strictly meet the **5-10 ms** inference time on the edge device, I use the custom `yolo11p.yaml` (Pico) definition (Scale: 0.25, 0.25).
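
For context, upstream Ultralytics model YAMLs map each scale letter to a `[depth, width, max_channels]` triple, and the model parser derives layer repeats and channel counts from it. The sketch below reproduces that rule in plain Python. It assumes the two Pico values quoted above are the depth and width multipliers and that `max_channels` is 1024. Neither is confirmed by this notebook, and the example layer (8 repeats, 512 channels) is purely illustrative.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(repeats: int, channels: int, depth: float, width: float, max_channels: int):
    """Apply depth scaling to the repeat count and width scaling to the channel count."""
    scaled_repeats = max(round(repeats * depth), 1) if repeats > 1 else repeats
    scaled_channels = make_divisible(min(channels, max_channels) * width, 8)
    return scaled_repeats, scaled_channels


SCALES = {
    "n": (0.50, 0.25, 1024),  # Nano, as listed in the upstream yolo11.yaml
    "p": (0.25, 0.25, 1024),  # Pico: ASSUMED interpretation of "Scale: 0.25, 0.25"
}

for name, scale in SCALES.items():
    print(name, scale_layer(8, 512, *scale))
```

Under these assumptions the Pico scale would shrink the number of repeated blocks relative to Nano, while the channel widths stay the same. The actual `p` entry in the modified repository may differ.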

In [None]:
with wandb.init(project=PROJECT_NAME, name=BASELINE_P_RUN_NAME) as run:
    model_baseline_p = YOLO("yolo11p.yaml")
    
    results_baseline_p = model_baseline_p.train(
        data=str(sample_config_path),
        project=PROJECT_NAME,
        name=BASELINE_P_RUN_NAME,
        epochs=150,
        verbose=True,

        hsv_h=0.0,
        hsv_s=0.25,
        hsv_v=0.25,
        mixup=0.0,
        optimizer="SGD",
        lr0=0.02,
        weight_decay=0.01,
        fliplr=0.5,
        mosaic=0.25
    )
    
    log_metrics_to_wandb(
        results_baseline_p, 
        run_id=run.id, 
        project_name=PROJECT_NAME
    )
    
    del model_baseline_p
    gc.collect()
    empty_cache()

### Result Analysis (The "Cold Start" Failure)
As hypothesized, the model failed to generalize.
* **mAP @ 50-95:** Dropped drastically to **0.4611**.
* **Recall:** Only **0.3923**, meaning it missed roughly 61% of the objects.

This indicates that, without pretrained weights, the dataset is too small for the model to learn feature extraction from scratch.
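
Recall is TP / (TP + FN), so the share of ground-truth objects that were missed is simply 1 - recall. The short check below applies that to the recall reported above. `miss_rate` is a hypothetical helper name.

```python
def miss_rate(recall: float) -> float:
    """Fraction of ground-truth objects that were not detected, given recall = TP / (TP + FN)."""
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    return 1.0 - recall


# Recall of the randomly initialized Pico baseline.
print(f"{miss_rate(0.3923):.1%} of ground-truth objects were missed")
```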

In [None]:
print("Baseline YOLO11p Result on Sample Dataset:")
print_metrics(results_baseline_p)

## 6. Transfer Learning Strategy

To fix the poor performance of the randomly initialized Pico baseline (effectively Experiment C), I implement a two-stage training process.

### Stage 1: Pretraining on Public Data
I train the `yolov11p` architecture on the larger, public Roboflow dataset (generic bottle caps). This stage ignores color classes and focuses purely on **object localization** (learning the shape).

In [None]:
with wandb.init(project=PROJECT_NAME, name=PRETRAIN_P_RUN_NAME) as run:
    model_p_pretrain = YOLO("yolo11p.yaml")
    
    results_p_pretrain = model_p_pretrain.train(
        data=str(public_config_path),
        project=PROJECT_NAME,
        name=PRETRAIN_P_RUN_NAME,
        epochs=25,
        verbose=True,

        hsv_h=0.0,
        hsv_s=0.25,
        hsv_v=0.0,
        mixup=0.0,
        optimizer="SGD",
        lr0=0.02,
        weight_decay=0.01,
        fliplr=0.5,
        mosaic=0.0
    )

    log_metrics_to_wandb(
        results_p_pretrain, 
        run_id=run.id, 
        project_name=PROJECT_NAME
    )
    
    del model_p_pretrain
    gc.collect()
    empty_cache()

### Result Analysis (Stage 1: Pretraining)
On the public dataset, the Pico model achieved an **mAP @ 50-95 of 0.8745**. This shows that the tiny "Pico" architecture *is* capable of learning useful features when given enough data. We now have a pretrained checkpoint saved as `best.pt`.


In [None]:
print("Pretraining Result (Trained on Public Dataset):")
print_metrics(results_p_pretrain)

### Stage 2: Fine-Tuning
Now I load the weights from Stage 1 (`best.pt`) and fine-tune the model on our specific project dataset to distinguish **Light Blue vs. Dark Blue**.
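
One practical caveat: by default, Ultralytics appends a numeric suffix to the run directory when one with the same name already exists (for example `pretrain_yolo11p2`), so a hardcoded path can silently point at an older run. The sketch below is one hedged way to pick the most recent checkpoint. `find_latest_best` is a hypothetical helper, and it assumes the `<project>/<run_name>/weights/best.pt` layout used in the next cell.

```python
import re
from pathlib import Path


def find_latest_best(project_dir, run_name: str) -> Path:
    """Return best.pt from the highest-numbered run directory that actually contains one."""
    pattern = re.compile(rf"^{re.escape(run_name)}(\d*)$")
    candidates = []
    for run_dir in Path(project_dir).iterdir():
        match = pattern.match(run_dir.name)
        weights = run_dir / "weights" / "best.pt"
        if match and weights.is_file():
            # An unsuffixed directory is the first run; "name2" is the second, and so on.
            index = int(match.group(1)) if match.group(1) else 1
            candidates.append((index, weights))
    if not candidates:
        raise FileNotFoundError(f"no best.pt found for run '{run_name}' in {project_dir}")
    return max(candidates, key=lambda item: item[0])[1]
```

In the fine-tuning cell this could replace the hardcoded path, for example `stage1_weights_path = find_latest_best(PROJECT_NAME, PRETRAIN_P_RUN_NAME)`.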

In [None]:
with wandb.init(project=PROJECT_NAME, name=FINE_TUNE_P_RUN_NAME) as run:
    stage1_weights_path = Path(PROJECT_NAME) / PRETRAIN_P_RUN_NAME / "weights" / "best.pt"

    model_p_finetune = YOLO(stage1_weights_path)
    
    results_p_finetune = model_p_finetune.train(
        data=str(sample_config_path),
        project=PROJECT_NAME,
        name=FINE_TUNE_P_RUN_NAME,
        epochs=150,
        verbose=True,
        hsv_h=0.0,
        hsv_s=0.25,
        hsv_v=0.25,
        mixup=0.0,
        optimizer="SGD",
        lr0=0.02,
        weight_decay=0.01,
        fliplr=0.5,
        mosaic=0.25
    )
    
    log_metrics_to_wandb(
        results_p_finetune, 
        run_id=run.id, 
        project_name=PROJECT_NAME
    )
    
    del model_p_finetune
    gc.collect()
    empty_cache()

### Result Analysis (Stage 2: Fine-Tuning)
The transfer learning strategy was a success.
* **mAP @ 50-95:** The model achieved **0.8742**, about 0.028 below the Nano model (0.9018) and far above the random-init Pico (0.4611).
* **Recall:** It reached **1.00** on the validation set, meaning no bottle caps in that set were missed.

In [None]:
print("Fine-Tuned YOLOv11p Result (Trained on Sample Dataset):")
print_metrics(results_p_finetune)

## Conclusion

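The two-stage transfer learning approach allowed the 1.5M-parameter Pico model to reach an mAP @ 50-95 of 0.8742, within about 0.028 of YOLO11n (0.9018), whereas training the same architecture from random weights reached only 0.4611.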

![Comparison Baseline and Fine-Tuned YOLO11p](../assets/yolo11p_comparison.png)

**Final Comparison:**

| Model | Weights | mAP @ 50-95 | Params | Status |
| :--- | :--- | :--- | :--- | :--- |
| **YOLO11s** | COCO Pretrained | **0.9172** | 9.5M | Too Heavy |
| **YOLO11n** | COCO Pretrained | **0.9018** | 2.6M | Baseline |
| **YOLO11p** | Random Init | 0.4611 | 1.5M | Failed |
| **YOLO11p** | **Pretrained + Finetuned** | **0.8742** | **1.5M** | **Selected** |
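
The trade-off in the table can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the table. The `RESULTS` dictionary and the `compare` helper are hypothetical names introduced for illustration.

```python
# mAP @ 50-95 and parameter counts (in millions) copied from the table above.
RESULTS = {
    "YOLO11s (COCO pretrained)": {"map50_95": 0.9172, "params_m": 9.5},
    "YOLO11n (COCO pretrained)": {"map50_95": 0.9018, "params_m": 2.6},
    "YOLO11p (random init)": {"map50_95": 0.4611, "params_m": 1.5},
    "YOLO11p (pretrained + finetuned)": {"map50_95": 0.8742, "params_m": 1.5},
}


def compare(candidate: dict, reference: dict) -> dict:
    """Absolute mAP gap and parameter ratio of a candidate model against a reference model."""
    return {
        "map_gap": reference["map50_95"] - candidate["map50_95"],
        "params_ratio": candidate["params_m"] / reference["params_m"],
    }


summary = compare(
    RESULTS["YOLO11p (pretrained + finetuned)"],
    RESULTS["YOLO11n (COCO pretrained)"],
)
print(f"mAP gap vs. Nano: {summary['map_gap']:.4f}")
print(f"Parameters relative to Nano: {summary['params_ratio']:.0%}")
```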


**Note:** For the full inference speed analysis on the edge hardware, please refer to the `README.md` file in the repository.