<a href="https://colab.research.google.com/github/FajarKKP/Bottle_cap_color_classifier/blob/main/task_1_cap_detection/Bottle_cap_classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Bottle Cap Classification Project**

This notebook presents the development of a machine learning model for classifying bottle caps. The workflow begins with data preparation, followed by feature extraction and model training. Each step is accompanied by a brief explanation of its purpose and functionality, providing context for the code and methodology used.

The project culminates in the selection of the most suitable model based on performance metrics. Following the implementation, a detailed report discusses the rationale behind the chosen methods, insights gained during experimentation, challenges encountered, and potential avenues for improvement.

This approach ensures that the notebook serves not only as a functional tool for classification but also as a comprehensive record of the analytical process and decision-making behind the project.

A further goal of this project is to deploy the model on an edge device (e.g., Raspberry Pi 5) with an inference time of 5-10 ms per frame, so inference speed must be taken into account during training and testing.

# **Dataset Preparation**

The dataset comes from the provided sample dataset, which was processed on Roboflow. It has also been relabeled into the following categories:

*   0 = Other
*   1 = Light Blue
*   2 = Dark Blue
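
To make the label mapping concrete, here is a minimal sketch of how one annotation line could be decoded. It assumes the standard YOLO text label format (`class cx cy w h`, with coordinates normalized to the image size). `CLASS_NAMES` mirrors the list above; the function name `parse_yolo_label` and the sample label line are hypothetical and not taken from the project files.

```python
# Class IDs as relabeled for this project (see the list above).
CLASS_NAMES = {0: "Other", 1: "Light Blue", 2: "Dark Blue"}


def parse_yolo_label(line, img_w, img_h):
    """Convert one YOLO label line into (class name, pixel box x1, y1, x2, y2)."""
    cls_id, cx, cy, w, h = line.split()
    # YOLO stores the box center and size as fractions of the image size.
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return CLASS_NAMES[int(cls_id)], box


# Hypothetical label line for a 416x416 image.
name, box = parse_yolo_label("1 0.5 0.5 0.25 0.25", 416, 416)
print(name, box)
```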


Based on an initial web search, the solution will be implemented with YOLOv5 and YOLOv8, each tested with its nano variant. YOLOv5 is trained first, followed by YOLOv8.


In [1]:
# Mount and link our gdrive to this notebook.

from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [2]:
# Set up and install yolov5
!git clone https://github.com/ultralytics/yolov5.git
%cd yolov5
!pip install -r requirements.txt

Cloning into 'yolov5'...
remote: Enumerating objects: 17739, done.
remote: Counting objects: 100% (98/98), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 17739 (delta 59), reused 32 (delta 32), pack-reused 17641 (from 3)
Receiving objects: 100% (17739/17739), 17.11 MiB | 23.67 MiB/s, done.
Resolving deltas: 100% (12045/12045), done.
/content/yolov5
Collecting thop>=0.1.1 (from -r requirements.txt (line 14))
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl.metadata (2.7 kB)
Collecting ultralytics>=8.2.64 (from -r requirements.txt (line 18))
  Downloading ultralytics-8.3.229-py3-none-any.whl.metadata (37 kB)
Collecting ultralytics-thop<=2.0.18 (from ultralytics>=8.2.64->-r requirements.txt (line 18))
  Downloading ultralytics_thop-2.0.18-py3-none-any.whl.metadata (14 kB)
Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Downloading ultralytics-8.3.229-py3-none-any.whl (1.1 MB)

In [3]:
# Train on yolov5 nano
%cd /content/yolov5

!python train.py \
    --img 416 \
    --batch 8 \
    --epochs 150 \
    --data /content/drive/MyDrive/bottle_cap_project/Datset/data.yaml \
    --weights yolov5n.pt \
    --project /content/drive/MyDrive/bottle_cap_project/Result/train \
    --name bottlecap_yolo5_nano_finetune \
    --cache \
    --optimizer AdamW


/content/yolov5
Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.
2025-11-21 16:00:07.373589: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1763740807.399335    1412 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1763740807.407081    1412 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1763740807.426639    1412 computation_placer.cc:177] computation placer already registered. Plea

This section starts the training of YOLOv8.

In [4]:
# Install YOLOv8 (Ultralytics)
!pip install ultralytics

# Import the library
from ultralytics import YOLO

# Load a YOLOv8 nano model (pretrained)
model = YOLO("yolov8n.pt")  # 'n' is nano

Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt to 'yolov8n.pt': 100% ━━━━━━━━━━━━ 6.2MB 212.1MB/s 0.0s


In [5]:
# Train on yolov8 nano
model.train(
    data="/content/drive/MyDrive/bottle_cap_project/Datset/data.yaml",
    imgsz=416,
    batch=8,
    epochs=200,
    project="/content/drive/MyDrive/bottle_cap_project/Result/train",
    name="bottlecap_yolov8n",
    cache=True
)

Ultralytics 8.3.229 🚀 Python-3.12.12 torch-2.8.0+cu126 CPU (Intel Xeon CPU @ 2.20GHz)
engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=8, bgr=0.0, box=7.5, cache=True, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/bottle_cap_project/Datset/data.yaml, degrees=0.0, deterministic=True, device=cpu, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=200, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=416, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=bottlecap_yolov8n3, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_

ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: array([0, 1, 2])
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x79ae60260c20>
curves: ['Precision-Recall(B)', 'F1-Confidence(B)', 'Precision-Confidence(B)', 'Recall-Confidence(B)']

After training, the models need to be saved for testing and future use. The default checkpoint format is PyTorch's .pt, but for this project the models are exported to .onnx because that format is better suited to deployment on edge devices.

In [6]:
# Export the trained YOLOv5 weights to ONNX
!python export.py \
    --weights /content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolo5_small_4150/weights/best.pt \
    --imgsz 416 \
    --include onnx \
    --simplify \
    --device cpu \
    --dynamic



[34m[1mexport: [0mdata=data/coco128.yaml, weights=['/content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolo5_small_4150/weights/best.pt'], imgsz=[416], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, per_tensor=False, dynamic=True, cache=, simplify=True, mlmodel=False, opset=17, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx']
YOLOv5 üöÄ v7.0-448-gdeec5e45 Python-3.12.12 torch-2.8.0+cu126 CPU

Fusing layers... 
Model summary: 157 layers, 7018216 parameters, 0 gradients, 15.8 GFLOPs

[34m[1mPyTorch:[0m starting from /content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolo5_small_4150/weights/best.pt with output shape (1, 10647, 8) (13.6 MB)
[31m[1mrequirements:[0m Ultralytics requirements ['onnx>=1.12.0', 'onnxscript'] not found, attempting AutoUpdate...
Using Python 3.12.12 environment at: /usr
Resolved 8 packag

In [7]:
# Export the yolo8 nano weight to ONNX

from ultralytics import YOLO

# Load your trained YOLOv8 Nano model
model = YOLO("/content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolov8n_8200/weights/best.pt")

# Export to ONNX
model.export(
    format="onnx",
    imgsz=416,
    simplify=True,
    dynamic=True
)


Ultralytics 8.3.229 🚀 Python-3.12.12 torch-2.8.0+cu126 CPU (Intel Xeon CPU @ 2.20GHz)
💡 ProTip: Export to OpenVINO format for best performance on Intel hardware. Learn more at https://docs.ultralytics.com/integrations/openvino/
Model summary (fused): 72 layers, 3,006,233 parameters, 0 gradients, 8.1 GFLOPs

PyTorch: starting from '/content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolov8n_8200/weights/best.pt' with input shape (1, 3, 416, 416) BCHW and output shape(s) (1, 7, 3549) (5.9 MB)
requirements: Ultralytics requirement ['onnx>=1.12.0,<=1.19.1'] not found, attempting AutoUpdate...
Using Python 3.12.12 environment at: /usr
Resolved 5 packages in 114ms
Prepared 1 package in 1.98s
Uninstalled 1 package in 1.16s
Installed 1 package in 527ms
 - onnx==1.20.0rc1
 + onnx==1.19.1

requirements: AutoUpdate success ✅ 4.2s


ONNX: starting export with onnx 1.19.1 opset 22...
ONNX: slimming with onnxslim 0

'/content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolov8n_8200/weights/best.onnx'

Testing (inference on the test dataset) is done with the ONNX versions of the models.


Testing runs on the free tier of Google Colab with CPU-only inference. This simulates a default real-life Raspberry Pi 5 deployment, which uses only the CPU.

Testing the result

In [None]:
# Inspect the CPU available on the free tier of Google Colab

import psutil

# Number of logical CPUs
print("Number of CPU: ", psutil.cpu_count())
# Detailed CPU info
!cat /proc/cpuinfo

Number of CPU:  2
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU @ 2.20GHz
stepping	: 0
microcode	: 0xffffffff
cpu MHz		: 2200.194
cache size	: 56320 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa mmio_stale_data retbleed bhi its
bogomips	: 4400.38
clflush size	: 64
cache_alignment	

In [8]:
# Install Ultralytics
!pip install ultralytics

# Install ONNX Runtime
!pip install onnxruntime




In [9]:
# For yolov8 inference
from ultralytics import YOLO

# Load ONNX model
model = YOLO(
    "/content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolov8n_8200/weights/yolo8nano_8200.onnx")

# Run validation
results = model.val(
    data="/content/drive/MyDrive/bottle_cap_project/Datset/data.yaml",
    imgsz=320,
    device="cpu"
)

# Fitness
print(f"Fitness: {results.fitness:.4f}")

# Per-class mAP50-95 (results.maps holds mAP50-95 per class, not mAP50)
print("\nPer-class mAP50-95:")
for i, name in results.names.items():
    print(f"{name}: {results.maps[i]:.4f}, instances={results.nt_per_class[i]}")

# Inference speed
print("\nSpeed per image (ms):")
print(f"Preprocess: {results.speed['preprocess']:.2f}")
print(f"Inference: {results.speed['inference']:.2f}")
print(f"Postprocess: {results.speed['postprocess']:.2f}")


Ultralytics 8.3.229 🚀 Python-3.12.12 torch-2.8.0+cu126 CPU (Intel Xeon CPU @ 2.20GHz)
Loading /content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolov8n_8200/weights/yolo8nano_8200.onnx for ONNX Runtime inference...
Using ONNX Runtime 1.24.0.dev20251031003 CPUExecutionProvider
val: Fast image access ✅ (ping: 0.3±0.1 ms, read: 8.4±0.7 MB/s, size: 22.8 KB)
val: Scanning /content/drive/MyDrive/bottle_cap_project/Datset/valid/labels.cache... 3 images, 0 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 3/3 3.9Kit/s 0.0s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 2.2it/s 0.5s
                   all          3         19      0.914      0.974      0.991      0.871
                others          1          5      0.844          1      0.995      0.847
            light_blue          2          9      0.899      0.994      0.984     

In [10]:
# Navigate to yolov5 repo for yolov5 inference
%cd /content/yolov5


/content/yolov5


In [11]:
# For yolov5 inference
!python val.py \
  --weights /content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolo5_small_4150/weights/yolo5small_4150.onnx \
  --data /content/drive/MyDrive/bottle_cap_project/Datset/data.yaml \
  --img 320 \
  --task test \
  --device cpu


[34m[1mval: [0mdata=/content/drive/MyDrive/bottle_cap_project/Datset/data.yaml, weights=['/content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolo5_small_4150/weights/yolo5small_4150.onnx'], batch_size=32, imgsz=320, conf_thres=0.001, iou_thres=0.6, max_det=300, task=test, device=cpu, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 üöÄ v7.0-448-gdeec5e45 Python-3.12.12 torch-2.8.0+cu126 CPU

Loading /content/drive/MyDrive/bottle_cap_project/Result/train/bottlecap_yolo5_small_4150/weights/yolo5small_4150.onnx for ONNX Runtime inference...
Forcing --batch-size 1 square inference (1,3,320,320) for non-PyTorch models
[34m[1mtest: [0mScanning /content/drive/MyDrive/bottle_cap_project/Datset/test/labels.cache... 2 images, 0 backgrounds, 0 corrupt: 100% 2/2 [00:00<?, ?it/s]
                 Class     Images  Instances     

In the end, two models stand out.

The first is YOLOv5-nano, trained with a batch size of 8 for 150 epochs and evaluated at an image size of 320. It achieved the best inference speed, at around 25 ms.

The second is YOLOv8-nano, trained with a batch size of 8 for 200 epochs and evaluated at an image size of 320. It reaches an average mAP50 of 0.99 and mAP50-95 of 0.87 at an inference speed of 56 ms.

For immediate deployment, YOLOv8-nano is the choice. Its very high scores may indicate overfitting, but if the real-world input closely matches the dataset, it is still a usable option.

With more time, fine-tuning the YOLOv5-nano model is the other option. Reaching 25 ms shows that its architecture is capable of that speed, so the remaining effort can go into improving its accuracy. That is the next step.

# **Fine-tuning the Chosen Model**

Now YOLOv5-nano is fine-tuned using a batch size of 8 and 150 epochs.

In [None]:
# Train on yolov5 nano
%cd /content/yolov5

%env WANDB_PROJECT=train
%env WANDB_ENTITY=f-kenichi-kp-none


!python train.py \
    --img 416 \
    --batch 8 \
    --epochs 150 \
    --data /content/drive/MyDrive/bottle_cap_project/Datset/data.yaml \
    --weights yolov5n.pt \
    --project /content/drive/MyDrive/bottle_cap_project/Result/train \
    --name bottlecap_yolo5_nano_finetuned_final \
    --cache \
    --patience 50 \
    --optimizer AdamW \
    --entity f-kenichi-kp-none


/content/yolov5
env: WANDB_PROJECT=train
env: WANDB_ENTITY=f-kenichi-kp-none
2025-11-20 23:20:15.500774: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1763680815.547751   51379 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1763680815.565394   51379 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1763680815.609211   51379 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1763680815.609283   51379 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:0

In [None]:
# Export yolo5 nano weight in ONNX
!python export.py \
    --weights /content/drive/MyDrive/bottle_cap_project/Result/train/final/weights/best.pt \
    --imgsz 416 \
    --include onnx \
    --simplify \
    --device cpu \
    --dynamic



[34m[1mexport: [0mdata=data/coco128.yaml, weights=['/content/drive/MyDrive/bottle_cap_project/Result/train/final/weights/best.pt'], imgsz=[416], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, per_tensor=False, dynamic=True, cache=, simplify=True, mlmodel=False, opset=17, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx']
YOLOv5 üöÄ v7.0-448-gdeec5e45 Python-3.12.12 torch-2.8.0+cu126 CPU

Fusing layers... 
Model summary: 157 layers, 1763224 parameters, 0 gradients, 4.1 GFLOPs

[34m[1mPyTorch:[0m starting from /content/drive/MyDrive/bottle_cap_project/Result/train/final/weights/best.pt with output shape (1, 10647, 8) (3.6 MB)

[34m[1mONNX:[0m starting export with onnx 1.20.0rc1...
  torch.onnx.export(
[34m[1mONNX:[0m simplifier failure: module 'onnx.helper' has no attribute 'float32_to_bfloat16'
[34m[1mONNX:[0m export success ‚úÖ 0.7s,

In [None]:
# For yolov5 inference
!python val.py \
  --weights /content/drive/MyDrive/bottle_cap_project/Result/train/final/weights/best.onnx \
  --data /content/drive/MyDrive/bottle_cap_project/Datset/data.yaml \
  --img 320 \
  --task test \
  --device cpu


[34m[1mval: [0mdata=/content/drive/MyDrive/bottle_cap_project/Datset/data.yaml, weights=['/content/drive/MyDrive/bottle_cap_project/Result/train/final/weights/best.onnx'], batch_size=32, imgsz=320, conf_thres=0.001, iou_thres=0.6, max_det=300, task=test, device=cpu, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 üöÄ v7.0-448-gdeec5e45 Python-3.12.12 torch-2.8.0+cu126 CPU

Loading /content/drive/MyDrive/bottle_cap_project/Result/train/final/weights/best.onnx for ONNX Runtime inference...
Forcing --batch-size 1 square inference (1,3,320,320) for non-PyTorch models
[34m[1mtest: [0mScanning /content/drive/MyDrive/bottle_cap_project/Datset/test/labels.cache... 2 images, 0 backgrounds, 0 corrupt: 100% 2/2 [00:00<?, ?it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 2/2 [00:00<00:00, 

**Conclusion for the finetuning**

After experimenting, the fine-tuned model performs better on mAP50-95 but worse on mAP50. Still, an unexpected finding is that inference gets faster, down to 23 ms.

# **Final Report Regarding the Project**

This section and below will describe and explain each process in detail.

## **Dataset Preparation**

Roboflow offers the option to prepare the dataset in advance, from preprocessing through augmentation. Usually these steps are applied on the fly during training, not beforehand. Both approaches have pros and cons. The main advantage of preparing the dataset beforehand is resource-related: the work is already done, so that step is skipped during training. The disadvantage is that the dataset becomes inflexible: if the prepared data does not fit the goal, the effort is wasted and the preparation has to be redone or moved on the fly.

For this project, I did the preprocessing and augmentation beforehand, because I needed as much time as possible to train the best-performing model. Skipping on-the-fly preprocessing and augmentation lets training start faster and saves resources. In addition, the preprocessing and augmentation I would have applied on the fly were already applied during dataset preparation on Roboflow, so repeating them would be redundant.

For preprocessing, I applied the Auto-Orient step. It applies the orientation stored in each image's EXIF metadata to the pixel data, which prevents bounding boxes from being misaligned when an image is loaded with a different orientation than the one used for labeling.

For data augmentation, I mainly applied position-related augmentations such as rotation and shear. I did not apply color-related augmentations such as brightness changes or noise, because the dataset is small and color is the key feature in this project.
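
Position-related augmentations also have to move the bounding boxes along with the pixels. The sketch below illustrates the idea for rotation: each corner of a box is rotated about a center point and a new axis-aligned box is fitted around the result. This is only an illustration of the geometry, not Roboflow's actual implementation; the function name `rotate_box` and the sample coordinates are hypothetical.

```python
import math


def rotate_box(box, angle_deg, center):
    """Rotate an axis-aligned box about `center`; return the enclosing axis-aligned box."""
    x1, y1, x2, y2 = box
    cx, cy = center
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    # Standard 2D rotation of each corner around the center point.
    rotated = [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in corners
    ]
    xs = [p[0] for p in rotated]
    ys = [p[1] for p in rotated]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical box in a 100x100 image, rotated about the image center.
print(rotate_box((10, 20, 30, 40), 90, (50, 50)))
print(rotate_box((40, 40, 60, 60), 45, (50, 50)))
```

Note that for angles that are not multiples of 90 degrees the enclosing box grows, so rotated labels become looser than the originals. This is one reason to keep rotation angles moderate on a small dataset.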

## **Choosing The Model**

The main reason for choosing these models is their proven track record on this type of problem combined with a small size (under 3.5M parameters), which makes them well suited to edge devices (e.g., Raspberry Pi 5).

YOLOv5 will serve as the minimum benchmark. While slightly older, it has proven effective in practical applications. Its smaller variants are advantageous for deployment on edge devices, and many existing object detection solutions are based on this model.

YOLOv8 is a practical choice for object detection, offering a strong balance of accuracy and speed. Newer YOLO versions provide incremental improvements but are generally larger, making YOLOv8 well-suited for real-world deployment and edge applications.

## **Model Development and Analysis**

This project uses YOLOv5-nano and YOLOv8-nano as the base models. Both are trained mostly with the default hyperparameters provided by the Ultralytics tooling.

The adjustments are to batch size and number of epochs, made to work around the limited dataset size (more on this in the "Additional Findings / Issues" section).

The batch sizes used are 4 and 8, mainly because the dataset is small. Small batch sizes also allow more frequent weight updates and can help the model generalize better.

The epoch counts used are 150 and 200. These values were chosen because of the small batch sizes: the model needs to see the data more often to learn.

In total, around 8 training runs are conducted for this problem. They are trained through the Ultralytics tooling, which makes them easier to monitor and tune.
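
The count of 8 runs follows from combining 2 models, 2 batch sizes, and 2 epoch settings. A minimal sketch of that grid is shown below. The weight names `yolov5n` and `yolov8n` match the checkpoints loaded earlier in this notebook, while `NUM_TRAIN_IMAGES` is a hypothetical dataset size used only to show how batch size changes the number of batches per epoch.

```python
from itertools import product
from math import ceil

MODELS = ["yolov5n", "yolov8n"]
BATCH_SIZES = [4, 8]
EPOCHS = [150, 200]

# Every combination of model, batch size, and epoch count is one training run.
runs = [
    {"model": m, "batch": b, "epochs": e}
    for m, b, e in product(MODELS, BATCH_SIZES, EPOCHS)
]
print(f"Total runs: {len(runs)}")


def batches_per_epoch(num_images, batch_size):
    """Number of batches the data loader yields in one pass over the data."""
    return ceil(num_images / batch_size)


NUM_TRAIN_IMAGES = 100  # hypothetical; the real dataset size is not stated here
for b in BATCH_SIZES:
    print(f"batch={b}: {batches_per_epoch(NUM_TRAIN_IMAGES, b)} batches per epoch")
```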

After training, the models from each category are saved in .onnx format.
The reasons for converting them to .onnx are:

*   Deployment flexibility: ONNX can run on many platforms (TensorRT, OpenVINO, C++, C#, even some mobile/edge devices) without needing PyTorch.
*   Hardware optimization: ONNX runtimes can be faster, especially on devices with limited resources like Raspberry Pi 5.
*   Standardized format: Makes it easier to integrate with production pipelines or convert to other formats (e.g., TensorRT, CoreML).







## **Testing The Model**

The ideal scenario is to test directly on the target device. Unfortunately, due to circumstances, the inference tests were conducted on the free tier of Google Colab with CPU only. "CPU only" is set as the constraint because devices such as the Raspberry Pi 5 run on CPU only by default, so this emulates the real deployment scenario.

The CPU spec used in Google Colab:

| Feature                         | Value                                                             |
| ------------------------------- | ----------------------------------------------------------------- |
| CPU                             | Intel® Xeon® CPU @ 2.20 GHz                                       |
| Physical cores                  | 1                                                                 |
| Logical cores (threads)         | 2 (hyperthreaded)                                                 |
| Cache                           | 56,320 KB (L3)                                                    |
| Max frequency                   | 2.2 GHz                                                           |
| Supported extensions            | SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA, AES, etc. |
| Total RAM (reported separately) | ~12-13 GB in Colab free                                           |


These models are evaluated on the test dataset. The .onnx models are tested at input image sizes of 416 and 320. The size 320 was chosen because it is an input size that the Raspberry Pi 5 can handle. The idea is that at 320 the deployed model still performs up to standard while running faster than at its training size of 416.
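
One way to see why 320 is cheaper than 416 is to count the raw predictions the detection head produces. The export logs earlier in this notebook report output shapes of `(1, 10647, 8)` for YOLOv5 and `(1, 7, 3549)` for YOLOv8 at 416. The sketch below reproduces those numbers under two assumptions that are not stated in the notebook: both models predict on three grids with strides 8, 16, and 32, and YOLOv5 uses 3 anchors per grid cell.

```python
STRIDES = (8, 16, 32)        # assumed detection strides
YOLOV5_ANCHORS_PER_CELL = 3  # assumed anchors per grid cell for YOLOv5


def grid_cells(imgsz):
    """Total number of grid cells over all detection scales for a square input."""
    return sum((imgsz // s) ** 2 for s in STRIDES)


def yolov5_predictions(imgsz):
    return grid_cells(imgsz) * YOLOV5_ANCHORS_PER_CELL


def yolov8_predictions(imgsz):
    # Anchor-free head: one prediction per grid cell.
    return grid_cells(imgsz)


for size in (416, 320):
    print(size, yolov5_predictions(size), yolov8_predictions(size))
```

At 416 this gives 10647 and 3549, matching the logged shapes. At 320 the same formula gives 6300 and 2100, roughly 59% of the predictions, which is consistent with the lower inference time at the smaller size.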




## **Inference Test Result**

The complete results are in the Excel sheet located in the same folder as this notebook.

Two models have the potential to be deployed. Each has its pros and cons (see the "Additional Findings / Issues" section for further explanation).

The first model is YOLOv5-nano, with a batch size of 8, 150 epochs, and an image size of 320. It has the fastest inference time of all the models (around 25 ms). The downside is accuracy, with an average mAP50 of 0.75 and mAP50-95 of 0.44.

The other notable model is YOLOv8-nano, with a batch size of 8, 200 epochs, and an image size of 320. It has the most stable performance (mAP50 of 0.99 and mAP50-95 of 0.87) with a decent inference speed (55 ms) compared with other models in the same category. Scores this high may indicate overfitting, which could be addressed by diversifying the dataset or adding more noise during training.
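
To put these timings next to the 5-10 ms per frame target stated at the top of this notebook, the short sketch below converts latency to frames per second and compares each model with the upper bound of the target. The latencies are the ones reported in this section. They were measured on the Colab CPU, so they are only a rough proxy for a Raspberry Pi 5.

```python
TARGET_MS = (5.0, 10.0)  # target range stated at the top of this notebook

# Inference times reported in this section (Colab CPU, image size 320).
measured_ms = {"YOLOv5-nano": 25.0, "YOLOv8-nano": 55.0}


def fps(latency_ms):
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


for name, ms in measured_ms.items():
    over_budget = ms / TARGET_MS[1]
    print(f"{name}: {ms:.0f} ms -> {fps(ms):.1f} FPS, "
          f"{over_budget:.1f}x the {TARGET_MS[1]:.0f} ms upper bound")
```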


## **Further Finetune**

As stated earlier, the YOLOv5-nano model has the potential to be the answer. What followed was a set of experiments to improve its performance. The experiments included:

*   Learning rate adjustment
*   Layer freezing with different values
*   Training on a larger image size

The results show that improvement is feasible. Through the experiments, we arrived at a model with a mAP50 of 0.82 and mAP50-95 of 0.54 during training, an increase over its previous performance.

Unfortunately, when tested for inference, it gave mixed results in accuracy. However, it also turned out to run faster, with inference times down to 23 ms.



## **Conclusion**

Through the training and testing process, we arrived at two possible solutions. Both have upsides and downsides.

This is the speed-versus-accuracy trade-off that must be taken into account when deploying a solution. Ultimately, the choice depends on the circumstances, but as a starting point, the YOLOv5-nano model is the solution.

## **Additional Findings / Issues**

### **Dataset Issue**

During data preparation, one of the most prominent problems was the lack of data. No matter how good or advanced the model is, the dataset plays a huge part in development. Preprocessing and data augmentation are the main tools to compensate. Even then, over-reliance on synthetic data is dangerous because of the domain gap that can arise and the unrealistic patterns the model may learn.


Another issue that came up was splitting the dataset into train / valid / test. The provided sample dataset labels every cap with a single label. Relabeling the objects is therefore an important step before splitting the dataset.
With so little data, it is necessary to ensure that every label is represented in each of the train / valid / test splits.
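
A simple check for this requirement is sketched below: for each split, collect the class IDs that appear in its labels and report any that are missing. The file names and label lists are hypothetical placeholders, not the project's actual files, and the function name `missing_classes_per_split` is introduced only for this illustration.

```python
def missing_classes_per_split(splits, class_ids):
    """Return, for each split, the class IDs that never appear in its labels."""
    report = {}
    for split_name, images in splits.items():
        present = set()
        for class_list in images.values():
            present.update(class_list)
        report[split_name] = sorted(set(class_ids) - present)
    return report


# Hypothetical file names; each image maps to the class IDs it contains
# (0 = Other, 1 = Light Blue, 2 = Dark Blue).
splits = {
    "train": {"img_01.jpg": [0, 1], "img_02.jpg": [2], "img_03.jpg": [1, 2]},
    "valid": {"img_04.jpg": [0, 1], "img_05.jpg": [2]},
    "test": {"img_06.jpg": [1], "img_07.jpg": [1, 2]},
}

report = missing_classes_per_split(splits, class_ids=[0, 1, 2])
print(report)
```

In this made-up example the test split has no image with class 0, so an image would need to be moved before training and evaluation.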

Data preparation is an important step because, handled incorrectly, it can lead to the classic "Bias vs Variance" problem. That is why the dataset is crucial for training a model.


### **The Solution's Downside**

Both the YOLOv5-nano and the YOLOv8-nano solutions have a downside. Each has a different problem, which might be fixable given extra time.

The YOLOv5-nano issue is accuracy. It can be addressed by retraining (tweaking the hyperparameters) or by adding new data. Although an experiment showed that it can improve, the model appears to hit a bottleneck after hyperparameter tuning. New data depends on what the client or the available resources can provide from the real world.

The YOLOv8-nano issue is inference speed. It can be addressed by optimization or a hardware upgrade. Optimization was attempted but kept producing errors, so it could not be used. The alternative is to upgrade the hardware or add complementary hardware: upgrading the edge device's specification or adding a GPU may shorten inference time.


### **Optimization Issue**
Although the models perform decently, there are techniques that could still be tried. During training, especially with constrained resources, gradient accumulation could help. It works by accumulating gradients over several small batches and updating the weights only once afterwards, instead of updating after every batch. This emulates training with a larger batch on a small CPU / GPU.
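
The idea can be illustrated with a toy model. The sketch below fits `y = w * x` with a mean squared error loss and shows that averaging the gradients of several equally sized micro-batches gives the same gradient as one large batch. The data points, the starting weight, and the function name `grad_mse` are hypothetical; this is not the YOLO training loop.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the model y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


# Hypothetical (x, y) pairs and starting weight.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 14.0), (8.0, 15.9)]
w = 0.5

# Gradient computed on the full batch of 8 samples at once.
full_batch_grad = grad_mse(w, data)

# The same gradient accumulated over 4 micro-batches of 2 samples each.
micro_batch_size = 2
micro_batches = [data[i:i + micro_batch_size]
                 for i in range(0, len(data), micro_batch_size)]
accumulated_grad = 0.0
for micro_batch in micro_batches:
    # Scale each micro-batch gradient so the running sum is an average.
    accumulated_grad += grad_mse(w, micro_batch) / len(micro_batches)

print(full_batch_grad, accumulated_grad)
```

As far as I know, the `nbs=64` entry in the Ultralytics training log earlier in this notebook is a nominal batch size used for this kind of accumulation, so some accumulation may already happen by default. That is worth verifying before adding it manually.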

Unfortunately, this was not implemented because of the outcome of an earlier experiment. During training, the hyperparameters were tweaked, and the result was a model with barely any improvement. It was therefore concluded that, for these settings, this is the bottleneck and adding gradient accumulation may not solve it.

Another way to optimize is quantization: reducing the numeric precision of the values in the model, for example from 32-bit floats to 8-bit integers. This can produce a smaller model with faster inference, ideally with little loss in accuracy.
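
As a rough illustration of what reducing precision means, the sketch below applies simple affine int8 quantization to a handful of values and measures the round-trip error. It is a conceptual example only and does not reproduce the quantization tooling that was attempted in this project; the function names and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Affine-quantize floats to int8; returns (quantized ints, scale, zero_point)."""
    lo = min(min(values), 0.0)  # include 0.0 so it maps exactly to an integer
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if every value is zero
    zero_point = round(-128 - lo / scale)
    quantized = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-0.8, -0.1, 0.0, 0.25, 1.2]  # hypothetical float weights
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

Each value is now stored in 8 bits instead of 32, and the round-trip error stays within about one quantization step (`scale`). Whether that error is acceptable for detection accuracy has to be measured on the actual model.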

This method was tried to fix the inference speed issue of the YOLOv8 model, but it was not used in the end because of an error that kept occurring when the author tried to quantize the .onnx model. Another avenue is to compare a quantized .pt model with its .onnx counterpart, but that is left for future work.


### **Tested Inference on .pt and .onnx**

A small experiment was conducted in which the same model was saved in both .pt and .onnx formats. Both were tested on the same test data with all other variables held equal. The .pt model ran 2-3 times slower than its .onnx counterpart. Since both achieved the same accuracy, this shows that the .onnx format is the right choice for edge device deployment.
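
A comparison like this can be made repeatable with a small timing harness. The sketch below is one possible shape for it, assuming each backend can be wrapped in a zero-argument callable. `fast_backend` and `slow_backend` are hypothetical stand-ins, not the real ONNX or PyTorch models, so the printed numbers say nothing about the actual experiment.

```python
import statistics
import time


def benchmark_ms(fn, warmup=3, runs=20):
    """Median wall-clock time of fn() in milliseconds, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Hypothetical stand-ins for the two inference backends.
def fast_backend():
    sum(range(1_000))


def slow_backend():
    time.sleep(0.005)


fast_ms = benchmark_ms(fast_backend)
slow_ms = benchmark_ms(slow_backend)
print(f"fast: {fast_ms:.3f} ms, slow: {slow_ms:.3f} ms, ratio: {slow_ms / fast_ms:.1f}x")
```

Using the median and discarding warm-up calls reduces the influence of one-off delays such as model loading or cache misses on the first frames.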
