
# YOLOv8 Model Comparison — Base vs Optuna Tuned

**Goal:** Evaluate and compare two YOLOv8 models on the **same dataset** and produce publication-ready **quantitative** and **qualitative** results with graphs.

**Inputs (place in the folder that `ROOT` points to, by default `/content/drive/MyDrive/Vision`):**
- `bdd100k_images_10k.zip` — images archive
- `test_data_bbd_labels.zip` — labels + `data.yaml` (or modify the YAML generation cell)
- `yolov8n_custom_coco_best.pt` — Base model (trained on custom/coco)
- `yolov8n_optuna_best.pt` — Tuned model (Optuna)

**What you get:**
- Validation metrics (mAP50, mAP50-95, Precision, Recall)
- Per-class AP table
- Confusion matrices
- Precision–Recall curves
- Side-by-side qualitative predictions
- A concise comparison report and saved plots


## 1. Environment Setup

In [None]:

# If you're running in Colab, uncomment the next line to check which GPU is attached
# !nvidia-smi

# Install dependencies (comment this line out if they are already installed;
# some environments ship with ultralytics preinstalled).
# tabulate is required by DataFrame.to_markdown() in the final report cell.
%pip install -U ultralytics matplotlib pandas scikit-learn opencv-python tqdm tabulate


Collecting ultralytics
  Downloading ultralytics-8.3.226-py3-none-any.whl.metadata (37 kB)
Collecting matplotlib
  Downloading matplotlib-3.10.7-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (11 kB)
Collecting pandas
  Downloading pandas-2.3.3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (91 kB)
Collecting scikit-learn
  Downloading scikit_learn-1.7.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (11 kB)
Collecting ultralytics-thop>=2.0.18 (from ultralytics)
  Downloading ultralytics_thop-2.0.18-py3-none-any.whl.metadata (14 kB)
Downloading ultralytics-8.3.226-py3-none-any.whl (1.1 MB)
Downloading matplotlib-3.10.7-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (8.7 MB

## 2. Imports & Global Config

In [None]:
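from pathlib import Path
import os
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# ultralytics is optional at import time; the model-loading cell checks `if YOLO`
try:
    from ultralytics import YOLO
except ImportError:
    YOLO = None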

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# ==== PATH CONFIGURATION (edit here if you move files) ====
ROOT = Path("/content/drive/MyDrive/Vision")

# Dataset archives
DATA_IMAGES_ZIP = ROOT / "bdd100k_images_10k.zip"
DATA_LABELS_ZIP = ROOT / "test_data_bbd_labels.zip"

# Extracted dataset folder
DATA_ROOT = ROOT / "data"
DATA_YAML = DATA_ROOT / "test_data_bbd_labels" / "data.yaml"

# YOLO model weights
BASE_WEIGHTS = ROOT / "yolov8n_custom_coco_best.pt"
TUNED_WEIGHTS = ROOT / "yolov8n_optuna_best.pt"

# Comparison output directory
RUNS_DIR = ROOT / "runs_compare"
BASE_TAG = "base"
TUNED_TAG = "tuned"

BASE_RUN = RUNS_DIR / f"val_{BASE_TAG}"
TUNED_RUN = RUNS_DIR / f"val_{TUNED_TAG}"
BASE_PRED = RUNS_DIR / f"pred_{BASE_TAG}"
TUNED_PRED = RUNS_DIR / f"pred_{TUNED_TAG}"

# Other configs
N_QUAL = 24           # number of images for qualitative visualization
MAX_PER_ROW = 4       # images per row in grid
IMGSZ = 640           # image size

print("✅ All paths set relative to your Google Drive Vision folder")
print("ROOT:", ROOT)


Mounted at /content/drive
✅ All paths set relative to your Google Drive Vision folder
ROOT: /content/drive/MyDrive/Vision
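
Before running the later cells, it can help to confirm that all four inputs are actually present under `ROOT`. The following is a minimal sketch; `missing_inputs` and `REQUIRED` are hypothetical names that do not appear elsewhere in this notebook.

```python
from pathlib import Path

# File names taken from the Inputs list at the top of this notebook
REQUIRED = [
    "bdd100k_images_10k.zip",
    "test_data_bbd_labels.zip",
    "yolov8n_custom_coco_best.pt",
    "yolov8n_optuna_best.pt",
]

def missing_inputs(root, filenames):
    """Return the names in `filenames` that do not exist directly under `root`."""
    root = Path(root)
    return [name for name in filenames if not (root / name).exists()]
```

Calling `missing_inputs(ROOT, REQUIRED)` should return an empty list once everything is in place; any names it returns are the files still to be copied.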


## 3. Dataset Preparation

In [None]:

import shutil

# Unzip archives if not already extracted
DATA_ROOT.mkdir(parents=True, exist_ok=True)

if DATA_IMAGES_ZIP.exists():
    if not any(DATA_ROOT.glob("bdd100k_images_10k/*")):
        print("Extracting images...")
        shutil.unpack_archive(DATA_IMAGES_ZIP, DATA_ROOT)
    else:
        print("Images already extracted.")
else:
    print(f"WARNING: {DATA_IMAGES_ZIP} not found. Place it in {ROOT}.")

if DATA_LABELS_ZIP.exists():
    if not any(DATA_ROOT.glob("test_data_bbd_labels/*")):
        print("Extracting labels...")
        shutil.unpack_archive(DATA_LABELS_ZIP, DATA_ROOT)
    else:
        print("Labels already extracted.")
else:
    print(f"WARNING: {DATA_LABELS_ZIP} not found. Place it in {ROOT}.")

print('Expected YAML:', DATA_YAML.resolve())
if not DATA_YAML.exists():
    print("""
DATA YAML not found at the expected location.
- If your labels zip already has `data.yaml`, please update DATA_YAML.
- Otherwise, create a minimal YOLO data.yaml here.

We'll generate a minimal template below. Edit class names/paths accordingly if needed.
""")


Extracting images...

### 3.1 (Optional) Generate a minimal `data.yaml` if missing

In [None]:

from textwrap import dedent

def write_minimal_yaml(yaml_path: Path, images_dir: Path, labels_dir: Path, names):
    yaml_path.parent.mkdir(parents=True, exist_ok=True)
    content = dedent(f"""
    # Auto-generated minimal data.yaml — edit paths/names to match your dataset
    path: {yaml_path.parent.as_posix()}
    train: {images_dir.as_posix()}/train
    val: {images_dir.as_posix()}/val
    test: {images_dir.as_posix()}/test
    names: {names}
    """)
    yaml_path.write_text(content)
    return yaml_path

if not DATA_YAML.exists():
    # Try to infer an images root; update these as per your extracted folder structure
    cand = list(DATA_ROOT.glob("bdd100k_images_10k/images"))
    if cand:
        images_root = cand[0]
        # Example class list — replace with your actual classes
        class_names = ["person","car","traffic light","bus","truck","bike","rider"]
        DATA_YAML = write_minimal_yaml(DATA_YAML, images_root, images_root, class_names)
        print("Wrote a minimal YAML to:", DATA_YAML.resolve())
    else:
        print("Could not infer image directories. Please set DATA_YAML manually.")
else:
    print("YAML exists:", DATA_YAML.resolve())


Could not infer image directories. Please set DATA_YAML manually.
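
When the image root cannot be inferred, listing every folder that actually contains images makes it easier to set the paths by hand. This is a small sketch under the assumption that images use `.jpg`, `.jpeg`, or `.png` extensions; `find_image_dirs` and `IMAGE_EXTS` are hypothetical names introduced for illustration.

```python
from collections import Counter
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed extensions; extend if needed

def find_image_dirs(root, exts=IMAGE_EXTS):
    """Map each directory under `root` that directly contains images to its image count."""
    counts = Counter(
        p.parent
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in exts
    )
    return dict(sorted(counts.items()))
```

Running `find_image_dirs(DATA_ROOT)` after extraction shows which folders to reference in `data.yaml` and in the glob pattern used above.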


## 4. Load Models

In [None]:

assert Path(BASE_WEIGHTS).exists(), f"Missing {BASE_WEIGHTS}"
assert Path(TUNED_WEIGHTS).exists(), f"Missing {TUNED_WEIGHTS}"
assert DATA_YAML.exists(), f"Missing data.yaml at {DATA_YAML}"

base_model = YOLO(BASE_WEIGHTS) if YOLO else None
tuned_model = YOLO(TUNED_WEIGHTS) if YOLO else None

print("Base model:", BASE_WEIGHTS)
print("Tuned model:", TUNED_WEIGHTS)


AssertionError: Missing data.yaml at /content/drive/MyDrive/Vision/data/test_data_bbd_labels/data.yaml

## 5. Quantitative Evaluation (Validation)

In [None]:

RUNS_DIR.mkdir(exist_ok=True, parents=True)

def run_val(model, out_dir: Path, imgsz=640):
    out_dir.mkdir(exist_ok=True, parents=True)
    # exist_ok=True keeps the output at <out_dir>/val on reruns instead of val2, val3, ...
    results = model.val(
        data=DATA_YAML.as_posix(), imgsz=imgsz,
        project=str(out_dir), name="val", exist_ok=True,
        save_json=True, plots=True, verbose=True,
    )
    # In ultralytics, results.box contains the aggregate box metrics
    return results

results_base = run_val(base_model, BASE_RUN, imgsz=IMGSZ)
results_tuned = run_val(tuned_model, TUNED_RUN, imgsz=IMGSZ)

print("Validation finished. Plots & JSON saved to:", RUNS_DIR.resolve())


### 5.1 Aggregate Metrics Table

In [None]:

def to_row(tag, r):
    speed = getattr(r, 'speed', None) or {}  # dict of per-image times in ms
    return dict(
        Model=tag,
        mAP50=float(getattr(r.box, 'map50', np.nan)),
        mAP50_95=float(getattr(r.box, 'map', np.nan)),
        # mp/mr are the class-averaged values; p/r are per-class arrays
        Precision=float(getattr(r.box, 'mp', np.nan)),
        Recall=float(getattr(r.box, 'mr', np.nan)),
        # speed metrics (ms per image)
        Speed_pre=float(speed.get('preprocess', np.nan)),
        Speed_infer=float(speed.get('inference', np.nan)),
        Speed_post=float(speed.get('postprocess', np.nan)),
    )

metrics_df = pd.DataFrame([
    to_row("Base (COCO)", results_base),
    to_row("Optuna Tuned", results_tuned)
])

display(metrics_df)
metrics_df.to_csv(RUNS_DIR / "comparison_metrics.csv", index=False)
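
The table shows the two models side by side, but the comparison usually comes down to the difference between the rows. The sketch below computes absolute and relative changes for each metric from two plain dicts; `metric_deltas` is a hypothetical helper, and the column names are the ones produced by `to_row` above.

```python
def metric_deltas(base_row, tuned_row,
                  keys=("mAP50", "mAP50_95", "Precision", "Recall")):
    """Absolute and relative change (tuned minus base) for each metric key."""
    out = {}
    for key in keys:
        base, tuned = float(base_row[key]), float(tuned_row[key])
        diff = tuned - base
        out[key] = {
            "abs": diff,
            # relative change is undefined when the base value is zero
            "rel_pct": diff / base * 100.0 if base else float("nan"),
        }
    return out
```

With the DataFrame from this section, the two rows can be obtained with `base_row, tuned_row = metrics_df.to_dict("records")` and passed straight in.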


### 5.2 Global Metric Comparison Plots

In [None]:

def barplot(df, cols, title, out_png):
    ax = df.set_index("Model")[cols].plot(kind="bar")
    ax.set_title(title)
    ax.set_ylabel("Score")
    plt.tight_layout()
    plt.savefig(out_png, bbox_inches="tight")
    plt.show()

barplot(metrics_df, ["mAP50","mAP50_95","Precision","Recall"],
        "YOLOv8 Model Performance Comparison", RUNS_DIR / "global_metrics.png")


### 5.3 Per-class AP Table & Plot

In [None]:

def per_class_ap(results):
    # Read per-class AP directly from the metrics object returned by model.val().
    # Assumes the installed ultralytics version exposes box.ap_class_index,
    # box.ap50, box.ap and names; returns an empty DataFrame otherwise.
    box = getattr(results, "box", None)
    idx = getattr(box, "ap_class_index", None)
    if idx is None or len(idx) == 0:
        return pd.DataFrame()
    names = getattr(results, "names", {})
    if isinstance(names, (list, tuple)):
        names = dict(enumerate(names))
    return pd.DataFrame({
        "Class": [names.get(int(i), str(int(i))) for i in idx],
        "AP50": np.asarray(box.ap50, dtype=float),
        "AP50_95": np.asarray(box.ap, dtype=float),
    })

# Fallback for versions that write a text summary: parse results.txt (if it exists)
# to extract per-class APs.
def parse_results_txt(results_dir: Path):
    txt = results_dir / "val" / "results.txt"
    if not txt.exists():
        # sometimes it's at <results_dir>/results.txt depending on version
        txt = results_dir / "results.txt"
    if not txt.exists():
        return None
    # Heuristic parse: look for lines like "class, AP50, AP50-95" (varies by version)
    rows = []
    for ln in txt.read_text().splitlines():
        if "," not in ln or "all" in ln or "metrics" in ln.lower():
            continue
        parts = [p.strip() for p in ln.split(",")]
        if len(parts) >= 3 and parts[0]:
            try:
                rows.append(dict(Class=parts[0], AP50=float(parts[1]), AP50_95=float(parts[2])))
            except ValueError:
                pass  # not a numeric row
    return pd.DataFrame(rows) if rows else None

def per_class_table(results, run_dir: Path):
    # Prefer the in-memory metrics; fall back to the text summary
    df = per_class_ap(results)
    return df if not df.empty else parse_results_txt(run_dir)

base_pc = per_class_table(results_base, BASE_RUN)
tuned_pc = per_class_table(results_tuned, TUNED_RUN)

if base_pc is not None and tuned_pc is not None:
    merged = base_pc.merge(tuned_pc, on="Class", how="outer", suffixes=(" (Base)"," (Tuned)")).fillna(0)
    display(merged.sort_values("Class"))
    merged.to_csv(RUNS_DIR / "per_class_ap_comparison.csv", index=False)

    # Plot AP50 per class (side-by-side)
    classes = merged["Class"]
    x = np.arange(len(classes))
    width = 0.35
    fig = plt.figure()
    plt.bar(x - width/2, merged["AP50 (Base)"], width, label="Base")
    plt.bar(x + width/2, merged["AP50 (Tuned)"], width, label="Tuned")
    plt.xticks(x, classes, rotation=45, ha="right")
    plt.title("Per-class AP50 Comparison")
    plt.ylabel("AP50")
    plt.legend()
    plt.tight_layout()
    plt.savefig(RUNS_DIR / "per_class_ap50.png", bbox_inches="tight")
    plt.show()
else:
    print("Per-class AP extraction skipped (could not parse results.txt).")


### 5.4 Confusion Matrices & PR Curves

In [None]:

def show_if_exists(img_path: Path, title: str):
    if img_path.exists():
        from PIL import Image
        im = Image.open(img_path)
        plt.figure()
        plt.imshow(im)
        plt.axis('off')
        plt.title(title)
        plt.show()
    else:
        print("Missing:", img_path)

# Common file names produced by ultralytics val(plots=True)
show_if_exists(BASE_RUN / "val" / "confusion_matrix.png", "Confusion Matrix — Base")
show_if_exists(TUNED_RUN / "val" / "confusion_matrix.png", "Confusion Matrix — Tuned")

show_if_exists(BASE_RUN / "val" / "PR_curve.png", "PR Curve — Base")
show_if_exists(TUNED_RUN / "val" / "PR_curve.png", "PR Curve — Tuned")
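
The hard-coded paths above report `Missing:` whenever a rerun lands in an incremented folder such as `val2`. Plot file names may also carry a prefix in some ultralytics versions, which is an assumption worth checking against your own run folder. A more tolerant lookup, sketched here with the hypothetical helper `find_plot`, searches the run directory for any file whose name ends with the expected suffix.

```python
from pathlib import Path

def find_plot(run_dir, suffix):
    """Return the first file under `run_dir` whose name ends with `suffix`, else None."""
    matches = sorted(Path(run_dir).rglob(f"*{suffix}"))
    return matches[0] if matches else None
```

For example, `find_plot(BASE_RUN, "PR_curve.png")` returns a path that can be passed to `show_if_exists`, or `None` when no such plot was written.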


## 6. Qualitative Comparison (Side-by-Side Predictions)

In [None]:

# 6.1 Run predictions (same set of test images)
def pick_images(images_root: Path, n: int):
    imgs = []
    for sub in ["test","val","images/test","images/val","images"]:
        p = images_root / sub
        if p.exists():
            imgs.extend([*p.glob("*.jpg"), *p.glob("*.png")])
    random.shuffle(imgs)
    return imgs[:n]

def run_predict(model, imgs, out_dir: Path, imgsz=640):
    out_dir.mkdir(parents=True, exist_ok=True)
    # Save visualizations; exist_ok=True keeps them at <out_dir>/pred on reruns
    model.predict(
        source=[str(p) for p in imgs], save=True,
        project=str(out_dir), name="pred", exist_ok=True,
        imgsz=imgsz, conf=0.25, iou=0.45, verbose=False,
    )

# Attempt to infer an images root
images_roots = list(DATA_ROOT.glob("bdd100k_images_10k/images"))
if images_roots:
    test_root = images_roots[0]
    sel_imgs = pick_images(test_root, N_QUAL if N_QUAL>0 else 0)
    print("Selected", len(sel_imgs), "images for qualitative viz")

    if sel_imgs:
        run_predict(base_model, sel_imgs, BASE_PRED, imgsz=IMGSZ)
        run_predict(tuned_model, sel_imgs, TUNED_PRED, imgsz=IMGSZ)
else:
    print("Could not find images root; qualitative step skipped.")
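
`pick_images` shuffles with the global random state, so each run of the notebook selects a different set of images. If the qualitative figures need to be reproducible, a seeded variant along the following lines could be used instead; `sample_images` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random

def sample_images(paths, n, seed=0):
    """Pick up to `n` paths reproducibly: sort first, then sample with a private RNG."""
    ordered = sorted(paths)
    rng = random.Random(seed)  # private generator, leaves the global state untouched
    return rng.sample(ordered, max(0, min(n, len(ordered))))
```

Sorting before sampling makes the selection independent of the order in which the file system returns the paths.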


### 6.2 Build a Side-by-Side Grid

In [None]:

from PIL import Image, ImageOps

def collect_pred_images(pred_dir: Path):
    # Find rendered images produced by ultralytics (labels on image)
    cand = []
    for p in [pred_dir / "pred", pred_dir]:
        if p.exists():
            cand.extend([*p.glob("*.jpg"), *p.glob("*.png")])
    return sorted(cand)[:N_QUAL]

def make_grid(imgs_left, imgs_right, out_png: Path, max_per_row=4, pad=4):
    assert len(imgs_left) == len(imgs_right), "Left/right lists must align."
    pairs = list(zip(imgs_left, imgs_right))
    tiles = []
    # Build rows: [Base, Tuned] side-by-side per sample
    for left, right in pairs:
        L = Image.open(left).convert("RGB")
        R = Image.open(right).convert("RGB")
        # Same height padding
        H = max(L.height, R.height)
        L = ImageOps.pad(L, (L.width, H))
        R = ImageOps.pad(R, (R.width, H))
        combo = Image.new("RGB", (L.width + R.width + pad, H), (255,255,255))
        combo.paste(L, (0,0))
        combo.paste(R, (L.width + pad, 0))
        tiles.append(combo)

    # Determine grid layout
    rows = int(np.ceil(len(tiles)/max_per_row))
    colw = max(t.width for t in tiles)
    rowh = max(t.height for t in tiles)
    grid = Image.new("RGB", (colw*max_per_row, rowh*rows), (255,255,255))

    for idx, tile in enumerate(tiles):
        r, c = divmod(idx, max_per_row)
        grid.paste(tile, (c*colw, r*rowh))

    out_png.parent.mkdir(parents=True, exist_ok=True)
    grid.save(out_png)
    return out_png

if N_QUAL > 0 and (BASE_PRED.exists() and TUNED_PRED.exists()):
    left_imgs  = collect_pred_images(BASE_PRED / "pred")
    right_imgs = collect_pred_images(TUNED_PRED / "pred")
    if left_imgs and right_imgs and len(left_imgs)==len(right_imgs):
        out_grid = RUNS_DIR / "qualitative_side_by_side.png"
        make_grid(left_imgs, right_imgs, out_grid, max_per_row=MAX_PER_ROW)
        from IPython.display import Image as DispImage, display
        display(DispImage(filename=str(out_grid)))
    else:
        print("Qualitative grid skipped: prediction images not aligned/available.")
else:
    print("Qualitative grid skipped.")
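
The grid above relies on both sorted lists having the same length and order. A sturdier pairing matches the rendered files by name, assuming each prediction image keeps the file name of its source image. The sketch below uses a hypothetical helper, `pair_by_name`, and drops any image that only one model produced.

```python
from pathlib import Path

def pair_by_name(left_paths, right_paths):
    """Pair files from two runs that share a file name; unmatched files are dropped."""
    right = {Path(p).name: Path(p) for p in right_paths}
    left = sorted((Path(p) for p in left_paths), key=lambda p: p.name)
    return [(p, right[p.name]) for p in left if p.name in right]
```

The result can be unzipped into the two lists that `make_grid` expects, for example `left_imgs, right_imgs = zip(*pairs)` when `pairs` is non-empty.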


## 7. Error Analysis Summary

In [None]:

# If available, we can summarize confusion matrices.
# For a rigorous programmatic analysis, parse underlying confusion matrix arrays if saved.
# As a practical approach, this notebook currently saves and shows them visually.
print("Review confusion matrices and PR curves above to identify common FP/FN patterns.")
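
If the raw confusion matrix is available as an array, the most frequent mistakes can be listed programmatically rather than read off the image. The sketch below works on any square matrix and assumes rows are predicted classes and columns are true classes; check this orientation, and whether a trailing background class is present, against your ultralytics version. `top_confusions` is a hypothetical helper.

```python
def top_confusions(matrix, names, k=5):
    """Return the k largest off-diagonal cells as (predicted, true, count) tuples."""
    cells = [
        (names[i], names[j], matrix[i][j])
        for i in range(len(matrix))
        for j in range(len(matrix[i]))
        if i != j and matrix[i][j] > 0
    ]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)[:k]
```

The function accepts nested lists as well as NumPy arrays, so it can be tried on a small hand-written matrix before pointing it at real validation output.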


## 8. Final Report

In [None]:

report_lines = []
def line(s): report_lines.append(s)

line("### Model Comparison — Summary")
line("")
line(metrics_df.to_markdown(index=False))
line("")
line("- **Higher is better** for mAP50/mAP50-95/Precision/Recall.")
line("- **Tuned** model should generally outperform **Base** if Optuna found stronger hyperparameters.")
line("- Inspect per-class APs to see which categories benefit most.")
line("- Review **Confusion Matrix** to spot common misclassifications and class imbalance effects.")
line("- Review **PR Curves** to compare precision–recall tradeoffs.")
line("- See the **side-by-side grid** for qualitative behavior differences.")

REPORT_MD = RUNS_DIR / "FINAL_REPORT.md"
RUNS_DIR.mkdir(exist_ok=True, parents=True)
REPORT_MD.write_text("\n".join(report_lines))
print("Saved report to", REPORT_MD.resolve())


## 9. Appendix — Training Curves (Optional)

In [None]:

# If you have training runs (ultralytics writes results.csv into each run folder,
# e.g., runs/detect/train/results.csv), point to them here to plot learning
# curves (loss, mAP, precision, recall).
TRAIN_CSV_BASE  = Path("runs/detect/train_base/results.csv")   # edit if available
TRAIN_CSV_TUNED = Path("runs/detect/train_tuned/results.csv")  # edit if available

def plot_training_curve(csv_path: Path, title: str, y_cols=('metrics/mAP50(B)','metrics/mAP50-95(B)')):
    if not csv_path.exists():
        print("Missing:", csv_path)
        return
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()  # some versions pad column names with spaces
    for col in y_cols:
        if col in df.columns:
            plt.plot(df.index, df[col], label=col)
    plt.title(title)
    plt.xlabel("Epoch")
    plt.ylabel("Value")
    plt.legend()
    plt.tight_layout()
    plt.show()

plot_training_curve(TRAIN_CSV_BASE, "Training Metrics — Base")
plot_training_curve(TRAIN_CSV_TUNED, "Training Metrics — Tuned")



---

### How to use
1. Put the four files in the folder that `ROOT` points to (by default `/content/drive/MyDrive/Vision`): the two `.pt` weights and the two `.zip` archives.
2. Run the **Environment Setup** cell to install dependencies (if needed).
3. Run the notebook top-to-bottom. All outputs will be saved under `runs_compare/` inside `ROOT`.
4. If your dataset layout differs, update the `DATA_YAML` path or generate a minimal YAML in **3.1**.

> This notebook is designed to mirror your previous "compare_epochs" workflow, but compares **two different models** on the **same dataset** with rich plots and a final report.
