# Task 3 — YOLOv8 Enrichment (Run module: `src/yolo_detect.py`)

This notebook imports and runs:
- `D:\Python\Week 8\Shipping-a-Data-Product\src\yolo_detect.py`

It generates:
- `D:\Python\Week 8\Shipping-a-Data-Product\data\processed\yolo\detections.csv`

It then validates the CSV schema and prints quick summaries for the report.

## 0) Config

In [1]:
from pathlib import Path
import sys
import pandas as pd

# Raw string: single backslashes are already literal, so they must not be doubled.
PROJECT_ROOT = Path(r"D:\Python\Week 8\Shipping-a-Data-Product")
SRC_DIR = PROJECT_ROOT / "src"

IMAGES_ROOT = PROJECT_ROOT / "data" / "raw" / "images"
CSV_PATH = PROJECT_ROOT / "data" / "processed" / "yolo" / "detections.csv"

MODEL_PATH = "yolov8n.pt"
CONF = 0.25

print("PROJECT_ROOT:", PROJECT_ROOT)
print("SRC_DIR:", SRC_DIR)
print("IMAGES_ROOT:", IMAGES_ROOT)
print("CSV_PATH:", CSV_PATH)


PROJECT_ROOT: D:\Python\Week 8\Shipping-a-Data-Product
SRC_DIR: D:\Python\Week 8\Shipping-a-Data-Product\src
IMAGES_ROOT: D:\Python\Week 8\Shipping-a-Data-Product\data\raw\images
CSV_PATH: D:\Python\Week 8\Shipping-a-Data-Product\data\processed\yolo\detections.csv


## 1) Import `yolo_detect` from `src/`

In [2]:
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

import importlib
yolo_detect = importlib.import_module("yolo_detect")

print("Imported module:", yolo_detect)
print("Module file:", yolo_detect.__file__)
print("Has run():", hasattr(yolo_detect, "run"))


Imported module: <module 'yolo_detect' from 'D:\\Python\\Week 8\\Shipping-a-Data-Product\\src\\yolo_detect.py'>
Module file: D:\Python\Week 8\Shipping-a-Data-Product\src\yolo_detect.py
Has run(): True


## 2) Run YOLO detection (calls `yolo_detect.run`)

In [3]:
# Run your module's function exactly as defined in src/yolo_detect.py
out_path = yolo_detect.run(
    images_root=IMAGES_ROOT,
    out_csv=CSV_PATH,
    model_path=MODEL_PATH,
    conf=CONF,
)

print("\nReturned path:", out_path)
print("CSV exists:", CSV_PATH.exists())
print("CSV size (bytes):", CSV_PATH.stat().st_size if CSV_PATH.exists() else None)


Running YOLO detections: 100%|██████████| 8264/8264 [14:08<00:00,  9.74it/s]


[OK] Wrote YOLO detections CSV: D:\Python\Week 8\Shipping-a-Data-Product\data\processed\yolo\detections.csv (rows=19057)
[INFO] Images scanned: 8264
[INFO] Skipped unreadable images: 1
[INFO] Skipped inference errors: 0

Returned path: D:\Python\Week 8\Shipping-a-Data-Product\data\processed\yolo\detections.csv
CSV exists: True
CSV size (bytes): 4471579


## 3) Validate CSV schema

The module writes a comma-delimited CSV (`df.to_csv(out_csv, index=False)`), so `pd.read_csv` can read it without a `sep` argument.

Expected columns (12):
- image_path, channel_name, message_id, detected_class, confidence_score,
  bbox_x1, bbox_y1, bbox_x2, bbox_y2, image_category, model_name, inference_ts

In [4]:
EXPECTED_COLS = [
    "image_path",
    "channel_name",
    "message_id",
    "detected_class",
    "confidence_score",
    "bbox_x1",
    "bbox_y1",
    "bbox_x2",
    "bbox_y2",
    "image_category",
    "model_name",
    "inference_ts",
]

df = pd.read_csv(CSV_PATH)  # comma-separated
print("Rows:", len(df))
print("Columns:", list(df.columns))

missing = [c for c in EXPECTED_COLS if c not in df.columns]
extra = [c for c in df.columns if c not in EXPECTED_COLS]

print("Missing:", missing)
print("Extra:", extra)

assert not missing, f"Missing required columns: {missing}"
df.head(10)

Rows: 19057
Columns: ['image_path', 'channel_name', 'message_id', 'detected_class', 'confidence_score', 'bbox_x1', 'bbox_y1', 'bbox_x2', 'bbox_y2', 'image_category', 'model_name', 'inference_ts']
Missing: []
Extra: []


|  | image_path | channel_name | message_id | detected_class | confidence_score | bbox_x1 | bbox_y1 | bbox_x2 | bbox_y2 | image_category | model_name | inference_ts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 10 |  |  |  |  |  |  | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
| 1 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 11 | clock | 0.427296 | 14.794464 | 68.837929 | 1017.059753 | 1015.803528 | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
| 2 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 13 | hot dog | 0.503857 | 525.976807 | 634.037598 | 640.97168 | 752.583984 | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
| 3 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 13 | donut | 0.398997 | 591.705139 | 588.356323 | 700.265076 | 706.577271 | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
| 4 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 13 | hot dog | 0.309734 | 298.997375 | 615.658691 | 401.013306 | 716.196289 | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
| 5 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 13 | donut | 0.296923 | 483.271729 | 559.534302 | 580.877686 | 645.154785 | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
| 6 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 13 | hot dog | 0.26366 | 470.691895 | 687.914001 | 582.223633 | 789.909241 | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
| 7 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 13 | hot dog | 0.25024 | 482.826233 | 559.193359 | 581.625305 | 644.184814 | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
| 8 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 14 |  |  |  |  |  |  | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
| 9 | D:/Python/Week 8/Shipping-a-Data-Product/data/... | CheMed123 | 15 |  |  |  |  |  |  | other | yolov8n | 2026-01-19T13:48:44.902095+00:00 |
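
The schema check above only confirms that the columns exist. A minimal sketch of value-level checks follows, assuming confidence scores are probabilities in [0, 1] and box corners are ordered (x1 <= x2, y1 <= y2). The helper name `check_detections` and the toy frame are hypothetical and not part of `yolo_detect.py`.

```python
import pandas as pd


def check_detections(df: pd.DataFrame) -> dict:
    """Count rows that violate simple value-level expectations."""
    # Rows that carry an actual detection (images without detections have no class).
    det = df.dropna(subset=["detected_class"])
    bad_conf = ~det["confidence_score"].between(0.0, 1.0)  # NaN also counts as bad
    bad_bbox = (det["bbox_x1"] > det["bbox_x2"]) | (det["bbox_y1"] > det["bbox_y2"])
    # A score without a class would indicate a malformed row.
    orphan = df["detected_class"].isna() & df["confidence_score"].notna()
    return {
        "bad_confidence": int(bad_conf.sum()),
        "bad_bbox": int(bad_bbox.sum()),
        "orphan_scores": int(orphan.sum()),
    }


# Toy data: one valid detection, one image without detections, one broken row.
sample = pd.DataFrame(
    {
        "detected_class": ["clock", None, "donut"],
        "confidence_score": [0.43, None, 1.2],
        "bbox_x1": [14.8, None, 700.0],
        "bbox_y1": [68.8, None, 588.4],
        "bbox_x2": [1017.1, None, 591.7],
        "bbox_y2": [1015.8, None, 706.6],
    }
)
report = check_detections(sample)
print(report)
```

On the real data, `check_detections(df)` would ideally return zeros for all three counters; any non-zero value points at rows worth inspecting before they reach the report.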


## 4) Quick summaries for report

In [5]:
print("Image category distribution:")
display(df["image_category"].value_counts(dropna=False))


Image category distribution:


image_category
other              6891
lifestyle          6034
product_display    3946
promotional        2186
Name: count, dtype: int64
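
Note that the counts above are per CSV row (they sum to the 19057 rows), so an image with several detections is counted once per detection. If the report needs a per-image distribution, one possible sketch is shown below. It assumes `image_category` is constant for all rows of the same `image_path`, which the source does not state explicitly; the function name and toy data are hypothetical.

```python
import pandas as pd


def category_counts_per_image(df: pd.DataFrame) -> pd.Series:
    """Count each image once, regardless of how many detections it has."""
    per_image = df.drop_duplicates(subset=["image_path"])
    return per_image["image_category"].value_counts()


# Toy data: image a.jpg has three detections, b.jpg and c.jpg one row each.
toy = pd.DataFrame(
    {
        "image_path": ["a.jpg", "a.jpg", "a.jpg", "b.jpg", "c.jpg"],
        "image_category": ["lifestyle", "lifestyle", "lifestyle", "other", "other"],
    }
)
row_counts = toy["image_category"].value_counts()   # counts detection rows
image_counts = category_counts_per_image(toy)       # counts unique images
print(row_counts)
print(image_counts)
```

In the toy data, `lifestyle` dominates by rows but `other` dominates by images, which is the kind of shift to look for when comparing the two views on the real CSV.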

In [6]:
print("Top detected classes:")
top_classes = (
    df.dropna(subset=["detected_class"])
      .groupby("detected_class")
      .size()
      .reset_index(name="n")
      .sort_values("n", ascending=False)
      .head(20)
)
display(top_classes)


Top detected classes:


|  | detected_class | n |
| --- | --- | --- |
| 41 | person | 5226 |
| 12 | bottle | 4170 |
| 63 | tv | 662 |
| 20 | chair | 620 |
| 19 | cell phone | 614 |
| 11 | book | 557 |
| 44 | refrigerator | 439 |
| 34 | laptop | 284 |
| 24 | dining table | 279 |
| 0 | airplane | 272 |
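
Raw counts do not show how confident the model was in each class. A small sketch that adds the mean confidence next to the count is given below, using pandas named aggregation. The column names `n` and `mean_conf` and the toy values are illustrative assumptions only.

```python
import pandas as pd


def class_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Per-class detection count and mean confidence, most frequent first."""
    det = df.dropna(subset=["detected_class"])
    return (
        det.groupby("detected_class")["confidence_score"]
        .agg(n="count", mean_conf="mean")
        .sort_values("n", ascending=False)
    )


# Toy data: two person detections, one bottle, one image without detections.
toy = pd.DataFrame(
    {
        "detected_class": ["person", "person", "bottle", None],
        "confidence_score": [0.9, 0.7, 0.5, None],
    }
)
stats = class_stats(toy)
print(stats)
```

A class that is frequent but has a low mean confidence (close to the `CONF = 0.25` threshold) may deserve a caveat in the report.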


In [7]:
print("Channel visual content volume (unique images):")
img_level = df.drop_duplicates(subset=["image_path"])
display(img_level.groupby("channel_name").size().sort_values(ascending=False))


Channel visual content volume (unique images):


channel_name
tikvahpharma         5623
lobelia4cosmetics    2572
CheMed123              69
dtype: int64

## Notes
- The first run may start slowly while the YOLO weights download; that is expected.
- If you see an error like 'No images found', confirm that images exist under:
  `data/raw/images/<channel>/<message_id>.jpg`.
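
To check whether a file follows the layout above, the path can be split into its channel and message ID. The sketch below is a hypothetical helper, not code from `yolo_detect.py`, and it assumes `message_id` is a plain integer and the extension is `.jpg`.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def parse_image_path(path: str) -> Optional[Tuple[str, int]]:
    """Split '<...>/<channel>/<message_id>.jpg' into (channel, message_id).

    Returns None when the path does not match the assumed layout.
    """
    # Normalise Windows separators so the same logic works on any OS.
    p = PurePosixPath(path.replace("\\", "/"))
    if len(p.parts) < 2 or p.suffix.lower() != ".jpg" or not p.stem.isdigit():
        return None
    return p.parent.name, int(p.stem)


ok = parse_image_path("data/raw/images/CheMed123/10.jpg")
win = parse_image_path(r"D:\x\images\tikvahpharma\42.jpg")
bad = parse_image_path("data/raw/images/CheMed123/cover.png")
print(ok, win, bad)
```

Running this over the files under `IMAGES_ROOT` and listing the paths that return `None` is one way to find files that the detector might skip or mislabel.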
