# YOLO11 Two-Stage Training – HT_Vision Dataset (Robust to Interruptions)

This notebook implements a **robust two-stage YOLO11 training pipeline** for the `HT_Vision_Dataset`.

Pipeline overview:

1. **Stage 1** – Train YOLO11 at 640×640 with tuned hyperparameters.
2. **Stage 1 Evaluation** – Evaluate on **val** and **test** splits, and inspect the **best** and **worst** detections (IoU with ground truth).
3. **Stage 2** – Fine-tune starting from Stage 1 best weights at 960×960.
4. **Stage 2 Evaluation** – Again evaluate on **val** and **test**, with best/worst detection inspection.
5. **Compare Stage 1 vs Stage 2 metrics**.
6. **Inference** ‚Äì Run inference with Stage 1 and Stage 2 on a **custom image**:

```text
/mnt/Data1/mpiccolo/HT_Vision/inference_pinksalmon.png
```

7. Prepare a cell for **unseen dataset evaluation** (path to be defined later).

Robustness design:

- Training does **not** skip just because `best.pt` exists.
- It checks how many epochs are recorded in `results.csv`.
- If epochs done `< target`, it continues training (starting from `last.pt` if available).
- If epochs done `>= target` and `best.pt` exists, training is skipped as *complete*.
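
The robustness rules above can be written as a small pure function. This is a minimal sketch for illustration only: `Action` and `decide_action` are hypothetical names that do not appear in the notebook, and the inputs are assumed to come from checking `results.csv`, `best.pt`, and `last.pt` on disk.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"      # run is complete, nothing to do
    RESUME = "resume"  # continue from last.pt
    START = "start"    # no checkpoint yet, start from base weights


def decide_action(epochs_done: int, target_epochs: int,
                  has_best: bool, has_last: bool) -> Action:
    """Hypothetical helper mirroring the robustness rules listed above."""
    if has_best and epochs_done >= target_epochs:
        return Action.SKIP
    if has_last:
        return Action.RESUME
    return Action.START


print(decide_action(201, 300, has_best=True, has_last=True))   # Action.RESUME
print(decide_action(300, 300, has_best=True, has_last=True))   # Action.SKIP
print(decide_action(0, 300, has_best=False, has_last=False))   # Action.START
```

Keeping the decision separate from the training call makes it possible to check the logic without launching a run.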


## 1. Environment setup

Install and import the required libraries.

> If `ultralytics` is already installed in your environment, you can skip the `pip install` line.


In [12]:
# If ultralytics is not installed, uncomment the next line
# !pip install ultralytics --upgrade

import os
from pathlib import Path
import random
import json
import csv
import glob

import numpy as np
import torch
from ultralytics import YOLO

print(f"PyTorch version : {torch.__version__}")
print(f"CUDA available  : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device     : {torch.cuda.get_device_name(0)}")


PyTorch version : 2.8.0+cu128
CUDA available  : True
CUDA device     : NVIDIA GeForce RTX 5070 Ti


## 2. Global configuration

Define dataset paths, training results folder, model weights, and common settings.

The `val` split specified in `data.yaml` will be used automatically by YOLO for validation, early stopping, and best-model selection.


In [2]:
# Base dataset directory (matches data.yaml path)
ROOT_DIR = Path("/mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset")

# Path to data.yaml
DATA_YAML = ROOT_DIR / "data.yaml"

# Root folder for structured training results (Stage1, Stage2)
TRAINING_ROOT = Path("/mnt/Data1/mpiccolo/HT_Vision/Training_Results")
TRAINING_ROOT.mkdir(parents=True, exist_ok=True)

# Stage-specific root folders
STAGE1_ROOT = TRAINING_ROOT / "Stage1"
STAGE2_ROOT = TRAINING_ROOT / "Stage2"
STAGE3_ROOT = TRAINING_ROOT / "Stage3"
STAGE1_ROOT.mkdir(parents=True, exist_ok=True)
STAGE2_ROOT.mkdir(parents=True, exist_ok=True)
STAGE3_ROOT.mkdir(parents=True, exist_ok=True)

# YOLO model checkpoint to start from (Stage 1)
YOLO11_WEIGHTS = "yolo11m.pt"

# Common settings
IMG_SIZE_STAGE1 = 640
IMG_SIZE_STAGE2 = 960
IMG_SIZE_STAGE3 = 1024

DEVICE = 0 if torch.cuda.is_available() else "cpu"
EXPERIMENT_BASE_NAME = "yolo11_ht_vision_fish"

# Inference image
INFERENCE_IMAGE = "/mnt/Data1/mpiccolo/HT_Vision/inference_pinksalmon.png"

print(f"DATA_YAML     : {DATA_YAML}")
print(f"TRAINING_ROOT : {TRAINING_ROOT}")
print(f"Stage1 root   : {STAGE1_ROOT}")
print(f"Stage2 root   : {STAGE2_ROOT}")
print(f"Stage3 root   : {STAGE3_ROOT}")
print(f"Using model   : {YOLO11_WEIGHTS}")
print(f"Device        : {DEVICE}")
print(f"Inference img : {INFERENCE_IMAGE}")


DATA_YAML     : /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/data.yaml
TRAINING_ROOT : /mnt/Data1/mpiccolo/HT_Vision/Training_Results
Stage1 root   : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage1
Stage2 root   : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage2
Stage3 root   : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage3
Using model   : yolo11m.pt
Device        : 0
Inference img : /mnt/Data1/mpiccolo/HT_Vision/inference_pinksalmon.png


## 3. Reproducibility

We set a fixed random seed to make experiments as reproducible as possible (Python `random`, NumPy, and PyTorch CPU/CUDA).


In [3]:
def set_seed(seed: int = 42):
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

SEED = 42
set_seed(SEED)
print(f"Seed set to {SEED}")


Seed set to 42


## 4. Dataset sanity checks

Verify that:

- `data.yaml` exists.
- `images/train`, `images/val`, and `images/test` exist.


In [4]:
assert DATA_YAML.is_file(), f"data.yaml not found at: {DATA_YAML}"

train_dir = ROOT_DIR / "images" / "train"
val_dir   = ROOT_DIR / "images" / "val"
test_dir  = ROOT_DIR / "images" / "test"

print("Checking dataset directories...")
for p in [train_dir, val_dir, test_dir]:
    print(f"  {p} -> {'OK' if p.is_dir() else 'MISSING'}")


Checking dataset directories...
  /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/images/train -> OK
  /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/images/val -> OK
  /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/images/test -> OK
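
The check above only confirms that the directories exist. A slightly stronger check, sketched here under assumptions, counts the images in a split and how many of them have no matching label file under `labels/<split>`. The helper name `summarize_split` and the set of image extensions are hypothetical and not part of the notebook.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image extensions


def summarize_split(root: Path, split: str) -> dict:
    """Count images, and images lacking a label file, in one split (hypothetical helper)."""
    image_dir = root / "images" / split
    label_dir = root / "labels" / split
    images = [p for p in sorted(image_dir.glob("*")) if p.suffix.lower() in IMAGE_SUFFIXES]
    unlabeled = [p.name for p in images if not (label_dir / f"{p.stem}.txt").is_file()]
    return {"split": split, "images": len(images), "unlabeled": len(unlabeled)}
```

In the notebook this could be called as `summarize_split(ROOT_DIR, "val")`. An image without a label file is not necessarily an error, since the validation log later reports background images, so the count is a hint to inspect rather than a failure condition.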


## 5. Helper functions

We define helpers to:

- Run evaluation and save metrics to JSON.
- Export metrics to CSV.
- Analyze **best** and **worst** detections on the test split by IoU with ground truth bounding boxes.


In [5]:
def evaluate_and_save(weights_path: Path,
                      data_yaml: Path,
                      split: str,
                      imgsz: int,
                      project: Path,
                      name: str,
                      seed: int,
                      out_json: Path):
    # Run YOLO .val, save metrics to JSON, and return (metrics_dict, results_obj).
    model_eval = YOLO(str(weights_path))
    kwargs = dict(
        data=str(data_yaml),
        imgsz=imgsz,
        seed=seed,
        project=str(project),
        name=name,
        exist_ok=True,
        plots=True,
        device=DEVICE,
    )
    if split in ("train", "val", "test"):
        kwargs["split"] = split

    results_obj = model_eval.val(**kwargs)
    # results_dict holds the summary metrics; fall back to an empty dict
    # if the returned object does not expose it.
    metrics = dict(getattr(results_obj, "results_dict", None) or {})

    out_json.parent.mkdir(parents=True, exist_ok=True)
    with out_json.open("w") as f:
        # default=float converts NumPy scalars that json cannot serialize natively
        json.dump(metrics, f, indent=2, default=float)
    return metrics, results_obj


def export_metrics_csv(metrics_val: dict, metrics_test: dict, csv_path: Path):
    # Save validation and test metrics to a CSV file.
    fields = sorted(set(metrics_val.keys()) | set(metrics_test.keys()))
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["metric", "val", "test"])
        for k in fields:
            writer.writerow([k, metrics_val.get(k, ""), metrics_test.get(k, "")])
    print("Metrics CSV saved to", csv_path)


def box_iou_xyxy(box1, box2):
    # Compute IoU between two boxes in XYXY format (x1, y1, x2, y2).
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])

    inter_w = max(0.0, x2 - x1)
    inter_h = max(0.0, y2 - y1)
    inter = inter_w * inter_h

    area1 = max(0.0, box1[2] - box1[0]) * max(0.0, box1[3] - box1[1])
    area2 = max(0.0, box2[2] - box2[0]) * max(0.0, box2[3] - box2[1])
    union = area1 + area2 - inter + 1e-6

    return inter / union


def load_gt_boxes_yolo_format(image_path: str, split: str, orig_shape):
    # Load ground truth boxes from labels/<split> in YOLO format and convert to XYXY pixels.
    h, w = orig_shape  # (height, width)
    label_dir = ROOT_DIR / "labels" / split
    label_path = label_dir / (Path(image_path).stem + ".txt")
    if not label_path.is_file():
        return []

    gt_boxes = []
    with label_path.open("r") as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) < 5:
                continue
            cls, cx, cy, bw, bh = map(float, parts[:5])
            x_center = cx * w
            y_center = cy * h
            box_w = bw * w
            box_h = bh * h
            x1 = x_center - box_w / 2.0
            y1 = y_center - box_h / 2.0
            x2 = x_center + box_w / 2.0
            y2 = y_center + box_h / 2.0
            gt_boxes.append([x1, y1, x2, y2])
    return gt_boxes


def analyze_best_worst_detections(results_obj, split: str = "test", top_k: int = 2, min_iou: float = 0.0):
    # From a YOLO .val results object, compute IoU between predictions and GT and
    # print the best and worst detections (by IoU).
    if results_obj is None:
        print("No results object provided for analysis.")
        return

    img_results = getattr(results_obj, "results", None)
    if img_results is None:
        print("The results object does not contain per-image results.")
        return

    pairs = []  # list of dicts with keys: iou, image, pred_box, gt_box, score
    for r in img_results:
        image_path = r.path
        orig_shape = r.orig_shape  # (h, w)
        gt_boxes = load_gt_boxes_yolo_format(image_path, split=split, orig_shape=orig_shape)
        if not gt_boxes:
            continue

        if r.boxes is None or r.boxes.xyxy is None:
            continue

        pred_boxes = r.boxes.xyxy.cpu().numpy()
        pred_scores = r.boxes.conf.cpu().numpy()

        for i, pb in enumerate(pred_boxes):
            best_iou = 0.0
            best_gt = None
            for gb in gt_boxes:
                iou = box_iou_xyxy(pb, gb)
                if iou > best_iou:
                    best_iou = iou
                    best_gt = gb
            if best_gt is not None and best_iou >= min_iou:
                score = float(pred_scores[i]) if i < len(pred_scores) else None
                pairs.append(
                    {
                        "iou": float(best_iou),
                        "image": image_path,
                        "pred_box": pb.tolist(),
                        "gt_box": best_gt,
                        "score": score,
                    }
                )

    if not pairs:
        print("No prediction–GT pairs found for analysis.")
        return

    pairs_sorted = sorted(pairs, key=lambda x: x["iou"])
    worst = pairs_sorted[:top_k]
    best = pairs_sorted[-top_k:] if len(pairs_sorted) >= top_k else pairs_sorted

    print(f"\n=== Best {len(best)} detections (highest IoU) ===")
    for p in reversed(best):
        print(f"Image: {p['image']}")
        if p["score"] is not None:
            print(f"  IoU : {p['iou']:.4f}, score: {p['score']:.4f}")
        else:
            print(f"  IoU : {p['iou']:.4f}")
        print(f"  Pred box: {p['pred_box']}")
        print(f"  GT box  : {p['gt_box']}\n")

    print(f"\n=== Worst {len(worst)} detections (lowest IoU) ===")
    for p in worst:
        print(f"Image: {p['image']}")
        if p["score"] is not None:
            print(f"  IoU : {p['iou']:.4f}, score: {p['score']:.4f}")
        else:
            print(f"  IoU : {p['iou']:.4f}")
        print(f"  Pred box: {p['pred_box']}")
        print(f"  GT box  : {p['gt_box']}\n")
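
As a worked example of the two computations the helpers above rely on, the sketch below converts a normalized YOLO label to pixel XYXY coordinates and computes the IoU of two overlapping boxes. The function names `yolo_to_xyxy` and `iou_xyxy` are illustrative and separate from the notebook's own helpers; unlike `box_iou_xyxy`, this version uses no epsilon and returns 0.0 for an empty union.

```python
def yolo_to_xyxy(cx, cy, bw, bh, img_w, img_h):
    """Convert a normalized YOLO box (center x/y, width, height) to pixel XYXY."""
    x_c, y_c = cx * img_w, cy * img_h
    half_w, half_h = bw * img_w / 2.0, bh * img_h / 2.0
    return [x_c - half_w, y_c - half_h, x_c + half_w, y_c + half_h]


def iou_xyxy(a, b):
    """Intersection over union of two XYXY boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# A box centered in a 640x480 image, a quarter of the width and half the height.
gt = yolo_to_xyxy(0.5, 0.5, 0.25, 0.5, img_w=640, img_h=480)
print(gt)  # [240.0, 120.0, 400.0, 360.0]

# Two 10x10 boxes offset by 5 px: intersection 25, union 175.
print(iou_xyxy([0, 0, 10, 10], [5, 5, 15, 15]))
```

The second call evaluates to 25 / 175, roughly 0.143, which shows how quickly IoU falls even when half of each side overlaps.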


## 6. Stage 1 – Training at 640×640 (robust with epoch check)

We now train Stage 1 using your **final consolidated hyperparameters**:

```python
BEST_OPTIMIZER = "SGD"
BEST_DROPOUT = 0.1
BEST_LR0 = 0.00164
BEST_BOX_WEIGHT = 5.0
BEST_CLS_WEIGHT = 0.4
BEST_OBJ_WEIGHT = 1.0   # kobj

# Augmentation
BEST_MOSAIC = 0.9129
BEST_MIXUP = 0.4553
BEST_FLIPUD = 0.07835
BEST_FLIPLR = 0.5
BEST_HSV_H = 0.0083
BEST_HSV_S = 0.02738
BEST_HSV_V = 0.33474

# Schedule
BATCH_SIZE_STAGE1 = 16
EPOCHS_STAGE1 = 300
WARMUP_EPOCHS = 3
WARMUP_MOMENTUM = 0.8
```

Robustness:

- We check how many epochs are recorded in `results.csv`.
- If `epochs_done < EPOCHS_STAGE1`, we continue training (from `last.pt` if available).
- If `epochs_done >= EPOCHS_STAGE1` and `best.pt` exists, we skip training as *complete*.


In [7]:
# === Stage 1 Hyperparameters ===
BEST_OPTIMIZER = "SGD"
BEST_DROPOUT = 0.1
BEST_LR0 = 0.00164
BEST_BOX_WEIGHT = 5.0
BEST_CLS_WEIGHT = 0.4
BEST_OBJ_WEIGHT = 1.0   # passed as kobj (keypoint objectness gain in Ultralytics; likely unused for detection)

# Augmentation
BEST_MOSAIC = 0.9129
BEST_MIXUP = 0.4553
BEST_FLIPUD = 0.07835
BEST_FLIPLR = 0.5
BEST_HSV_H = 0.0083
BEST_HSV_S = 0.02738
BEST_HSV_V = 0.33474

# Schedule
BATCH_SIZE_STAGE1 = 16
EPOCHS_STAGE1 = 300
WARMUP_EPOCHS = 3
WARMUP_MOMENTUM = 0.8

# Other training parameters
LRS_STAGE1_FINAL_MULT = 0.1
MOMENTUM_STAGE1 = 0.9
WEIGHT_DECAY_STAGE1 = 0.0005
DFL_STAGE1 = 2.0
PATIENCE_STAGE1 = 50

STAGE1_NAME = f"{EXPERIMENT_BASE_NAME}_stage1_640"
print("Stage 1 experiment name:", STAGE1_NAME)


Stage 1 experiment name: yolo11_ht_vision_fish_stage1_640


In [7]:
# === Stage 1: Robust training cell with epoch check ===

stage1_project = STAGE1_ROOT
stage1_exp_name = STAGE1_NAME

stage1_exp_dir = STAGE1_ROOT / STAGE1_NAME
stage1_best = stage1_exp_dir / "weights" / "best.pt"
stage1_last = stage1_exp_dir / "weights" / "last.pt"
stage1_results_csv = stage1_exp_dir / "results.csv"

print("Stage 1 experiment dir:", stage1_exp_dir)
print("Existing Stage 1 best.pt:", stage1_best.is_file())
print("Existing Stage 1 last.pt:", stage1_last.is_file())
print("Existing Stage 1 results.csv:", stage1_results_csv.is_file())

# How many epochs are already recorded?
epochs_done_s1 = 0
if stage1_results_csv.is_file():
    with stage1_results_csv.open("r") as f:
        lines = [ln for ln in f.readlines() if ln.strip()]
    if len(lines) > 1:
        epochs_done_s1 = len(lines) - 1

print(f"Stage 1 epochs recorded in results.csv: {epochs_done_s1} / {EPOCHS_STAGE1}")

stage1_train_args = dict(
    data=str(DATA_YAML),
    epochs=EPOCHS_STAGE1,
    imgsz=IMG_SIZE_STAGE1,
    batch=BATCH_SIZE_STAGE1,
    device=DEVICE,
    project=str(stage1_project),
    name=stage1_exp_name,
    optimizer=BEST_OPTIMIZER,
    lr0=BEST_LR0,
    lrf=LRS_STAGE1_FINAL_MULT,
    momentum=MOMENTUM_STAGE1,
    weight_decay=WEIGHT_DECAY_STAGE1,
    box=BEST_BOX_WEIGHT,
    cls=BEST_CLS_WEIGHT,
    kobj=BEST_OBJ_WEIGHT,
    dfl=DFL_STAGE1,
    dropout=BEST_DROPOUT,
    cos_lr=True,
    patience=PATIENCE_STAGE1,
    amp=True,
    cache="disk",
    warmup_epochs=WARMUP_EPOCHS,
    warmup_momentum=WARMUP_MOMENTUM,
    mosaic=BEST_MOSAIC,
    mixup=BEST_MIXUP,
    flipud=BEST_FLIPUD,
    fliplr=BEST_FLIPLR,
    hsv_h=BEST_HSV_H,
    hsv_s=BEST_HSV_S,
    hsv_v=BEST_HSV_V,
    copy_paste=0.0,
    scale=0.5,
    shear=2.0,
    max_det=1000,
    plots=True,
    save_period=25,
    verbose=True,
    seed=SEED,
    exist_ok=True,
)

if stage1_best.is_file() and epochs_done_s1 >= EPOCHS_STAGE1:
    print("\n--- Stage 1 appears complete (best.pt present and epochs_done >= target). Skipping training. ---")
else:
    if stage1_last.is_file():
        print("\n--- Resuming Stage 1 from last.pt with true YOLO resume ---")
        model_s1 = YOLO(str(stage1_last))
        # True internal resume: use original training args from the run
        results_s1 = model_s1.train(resume=True)
    else:
        print("\n--- Starting Stage 1 from base YOLO11 weights ---")
        model_s1 = YOLO(YOLO11_WEIGHTS)
        stage1_train_args["resume"] = False
        results_s1 = model_s1.train(**stage1_train_args)

    print("\n--- Stage 1 Training Complete ---")

# Refresh paths after training/skip
stage1_exp_dir = STAGE1_ROOT / STAGE1_NAME
stage1_best = stage1_exp_dir / "weights" / "best.pt"
stage1_last = stage1_exp_dir / "weights" / "last.pt"
print("Final Stage 1 best.pt exists:", stage1_best.is_file())


Stage 1 experiment dir: /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage1/yolo11_ht_vision_fish_stage1_640
Existing Stage 1 best.pt: True
Existing Stage 1 last.pt: True
Existing Stage 1 results.csv: True
Stage 1 epochs recorded in results.csv: 201 / 300

--- Resuming Stage 1 from last.pt with true YOLO resume ---
New https://pypi.org/project/ultralytics/8.3.228 available 😃 Update with 'pip install -U ultralytics'
Ultralytics 8.3.197 🚀 Python-3.9.13 torch-2.8.0+cu128 CUDA:0 (NVIDIA GeForce RTX 5070 Ti, 15840MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=5.0, cache=disk, cfg=None, classes=None, close_mosaic=10, cls=0.4, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=True, cutmix=0.0, data=/mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/data.yaml, degrees=0.0, deterministic=True, device=0, dfl=2.0, dnn=False, dropout=0.1, dynamic=False, embed=None, epochs=300, erasing=0.4, ex



    260/300      8.05G     0.6806     0.4837      1.396         41        640: 100% ━━━━━━━━━━━━ 1259/1259 6.1it/s 3:27<0.3s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 180/180 8.0it/s 22.6s0.1s
                   all       5752      15252      0.907       0.87      0.915       0.62
EarlyStopping: Training stopped early as no improvement observed in last 50 epochs. Best results observed at epoch 210, best model saved as best.pt.
To update EarlyStopping(patience=50) pass a new patience value, i.e. `patience=300` or use `patience=0` to disable EarlyStopping.

59 epochs completed in 3.781 hours.
Optimizer stripped from /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage1/yolo11_ht_vision_fish_stage1_640/weights/last.pt, 40.5MB
Optimizer stripped from /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage1/yolo11_ht_vision_fish_stage1_640/weights/best.pt, 40.5MB
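
The log above shows early stopping at epoch 260, with the best epoch at 210 and a patience of 50. Because 260 is still below `EPOCHS_STAGE1` (300), the skip condition in the training cell would not fire on a rerun, and the cell would attempt another resume, which may not be possible once the optimizer state has been stripped from `last.pt`. The sketch below shows one way to recognize an early-stopped run from `results.csv`. It rests on assumptions not confirmed by the notebook: that the CSV has an `epoch` column plus `metrics/mAP50(B)` and `metrics/mAP50-95(B)`, and that the best epoch is chosen by fitness = 0.1 × mAP50 + 0.9 × mAP50-95. The function name `early_stop_status` is hypothetical.

```python
import csv
import io


def early_stop_status(csv_text: str, patience: int) -> dict:
    """Estimate from results.csv text whether a run ended by early stopping.

    Column names and the fitness weighting are assumptions (see lead-in).
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    best_epoch, best_fitness, last_epoch = 0, float("-inf"), 0
    for raw in reader:
        # Some versions pad header names and values with spaces; strip them.
        row = {k.strip(): (v or "").strip() for k, v in raw.items() if k}
        epoch = int(float(row["epoch"]))
        fitness = (0.1 * float(row["metrics/mAP50(B)"])
                   + 0.9 * float(row["metrics/mAP50-95(B)"]))
        if fitness > best_fitness:
            best_epoch, best_fitness = epoch, fitness
        last_epoch = max(last_epoch, epoch)
    return {
        "last_epoch": last_epoch,
        "best_epoch": best_epoch,
        "stopped_early": last_epoch > 0 and last_epoch - best_epoch >= patience,
    }


demo = (
    "epoch,metrics/mAP50(B),metrics/mAP50-95(B)\n"
    "1,0.50,0.30\n2,0.60,0.40\n3,0.70,0.50\n4,0.69,0.49\n5,0.68,0.48\n"
)
print(early_stop_status(demo, patience=2))
# {'last_epoch': 5, 'best_epoch': 3, 'stopped_early': True}
```

With such a check, the skip condition could treat a run as complete when either the target epoch count is reached or `stopped_early` is true and `best.pt` exists.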

## 7. Stage 1 – Locate best checkpoint

Double-check that `best.pt` exists in the expected location.


In [8]:
# Refresh paths after training/skip
stage1_exp_dir = STAGE1_ROOT / STAGE1_NAME
stage1_best = stage1_exp_dir / "weights" / "best.pt"
stage1_last = stage1_exp_dir / "weights" / "last.pt"
print("Final Stage 1 best.pt exists:", stage1_best.is_file())

Final Stage 1 best.pt exists: True


In [9]:
print("Stage 1 experiment dir:", stage1_exp_dir)
print("Stage 1 best weights  :", stage1_best)
print("Stage 1 last weights  :", stage1_last)

assert stage1_best.is_file(), "Stage 1 best.pt not found - training may not have completed successfully."


Stage 1 experiment dir: /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage1/yolo11_ht_vision_fish_stage1_640
Stage 1 best weights  : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage1/yolo11_ht_vision_fish_stage1_640/weights/best.pt
Stage 1 last weights  : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage1/yolo11_ht_vision_fish_stage1_640/weights/last.pt


## 8. Stage 1 – Evaluation on val and test

We evaluate Stage 1 `best.pt` on:

- **Validation split** (`split="val"`)
- **Test split** (`split="test"`)

Then we:

- Save metrics to JSON and CSV.
- Analyze the **best 2** and **worst 2** detections on the **test split** based on IoU.


In [15]:
stage1_eval_dir = stage1_exp_dir / "eval"
stage1_eval_dir.mkdir(parents=True, exist_ok=True)

# Validation evaluation
s1_val_json = stage1_eval_dir / "metrics_val.json"
metrics_s1_val, s1_val_results_obj = evaluate_and_save(
    weights_path=stage1_best,
    data_yaml=DATA_YAML,
    split="val",
    imgsz=IMG_SIZE_STAGE1,
    project=stage1_eval_dir,
    name="val_eval_stage1",
    seed=SEED,
    out_json=s1_val_json,
)
print("Stage 1 validation metrics saved to", s1_val_json)

# Test evaluation
s1_test_json = stage1_eval_dir / "metrics_test.json"
metrics_s1_test, s1_test_results_obj = evaluate_and_save(
    weights_path=stage1_best,
    data_yaml=DATA_YAML,
    split="test",
    imgsz=IMG_SIZE_STAGE1,
    project=stage1_eval_dir,
    name="test_eval_stage1",
    seed=SEED,
    out_json=s1_test_json,
)
print("Stage 1 test metrics saved to", s1_test_json)

# Save CSV summary
s1_csv_path = stage1_eval_dir / "metrics_summary_stage1.csv"
export_metrics_csv(metrics_s1_val, metrics_s1_test, s1_csv_path)

print("\n--- Stage 1: Best and Worst Detections on Test Split ---")
analyze_best_worst_detections(s1_test_results_obj, split="test", top_k=2, min_iou=0.0)


Ultralytics 8.3.197 🚀 Python-3.9.13 torch-2.8.0+cu128 CUDA:0 (NVIDIA GeForce RTX 5070 Ti, 15837MiB)
YOLO11m summary (fused): 125 layers, 20,030,803 parameters, 0 gradients, 67.6 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 6.6±7.2 MB/s, size: 81.9 KB)
val: Scanning /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/labels/val.cache... 5752 images, 455 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 5752/5752 14.8Mit/s 0.0s
val: /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/images/val/fishclef_05829.jpg: 1 duplicate labels removed
val: /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/images/val/fishclef_05830.jpg: 2 duplicate labels removed
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 360/360 12.7it/s 28.3s<0.1s
                   all       5752      15252       0.91      0.867      0.918      0.624
Speed: 0.1

## 9.2. Stage 3 – Fine-tuning from Stage 1 best at 1024×1024
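
When the training cell in this section restarts from an intermediate checkpoint without a true resume, it keeps the old `results.csv` rows up to the restart epoch and renumbers the rows of the new run so that the epoch column stays continuous. A minimal sketch of that merge step on in-memory CSV text follows. The function name `merge_results_csv` is hypothetical, and the sketch assumes the first column is an integer epoch that restarts at 1 in the new run.

```python
import csv
import io


def _split(text: str):
    """Return (header, rows) from CSV text, skipping empty rows."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    return (rows[0], rows[1:]) if rows else (None, [])


def merge_results_csv(old_text: str, new_text: str, start_epoch: int) -> str:
    """Keep old rows with epoch <= start_epoch, then append new rows shifted by start_epoch."""
    old_header, old_rows = _split(old_text)
    new_header, new_rows = _split(new_text)
    kept = [r for r in old_rows if int(r[0]) <= start_epoch]
    shifted = [[str(start_epoch + int(r[0]))] + r[1:] for r in new_rows]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = old_header or new_header
    if header:
        writer.writerow(header)
    writer.writerows(kept + shifted)
    return out.getvalue()


old = "epoch,mAP50\n1,0.50\n2,0.55\n3,0.60\n"
new = "epoch,mAP50\n1,0.62\n2,0.64\n"
print(merge_results_csv(old, new, start_epoch=2))
```

In this example the old row for epoch 3 is dropped because the restart happens from epoch 2, and the two new rows become epochs 3 and 4.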

In [None]:
# ===== STAGE 3 TRAINING =====

EPOCHS_STAGE3 = 150
STAGE3_NAME = f"{EXPERIMENT_BASE_NAME}_stage3_1024"

stage3_project = STAGE3_ROOT
stage3_exp_name = STAGE3_NAME
stage3_exp_dir = stage3_project / stage3_exp_name
stage3_weights_dir = stage3_exp_dir / "weights"

stage3_best = stage3_weights_dir / "best.pt"
stage3_last = stage3_weights_dir / "last.pt"
stage3_results_csv = stage3_exp_dir / "results.csv"

# Stage 1 best (fallback start for Stage 3)
STAGE1_NAME = f"{EXPERIMENT_BASE_NAME}_stage1_640"
stage1_exp_dir = STAGE1_ROOT / STAGE1_NAME
stage1_best = stage1_exp_dir / "weights" / "best.pt"

print(f"Stage 3 dir   : {stage3_exp_dir}")
print(f"Stage 3 best  : {stage3_best}")
print(f"Stage 3 last  : {stage3_last}")
print(f"Stage 3 CSV   : {stage3_results_csv}")

def get_last_epoch_from_csv(csv_path: Path) -> int:
    if not csv_path.is_file():
        return 0
    last_epoch = 0
    with csv_path.open("r") as f:
        reader = csv.reader(f)
        next(reader, None)  # header
        for row in reader:
            if not row:
                continue
            try:
                e = int(row[0])
                if e > last_epoch:
                    last_epoch = e
            except Exception:
                continue
    return last_epoch

epochs_done_s3 = get_last_epoch_from_csv(stage3_results_csv)
print(f"Stage 3 epochs in CSV (last): {epochs_done_s3} / {EPOCHS_STAGE3}")

stage3_train_args = dict(
    data=str(DATA_YAML),
    epochs=EPOCHS_STAGE3,
    imgsz=IMG_SIZE_STAGE3,
    batch=8,
    device=DEVICE,
    project=str(stage3_project),
    name=stage3_exp_name,
    resume=False,
    optimizer="SGD",
    lr0=0.00164,
    lrf=0.1,
    momentum=0.9,
    weight_decay=0.0005,
    box=5.0,
    cls=0.4,
    kobj=1.0,
    dfl=2.0,
    dropout=0.1,
    cos_lr=True,
    patience=50,
    amp=True,
    cache="disk",
    warmup_epochs=3,
    warmup_momentum=0.8,
    mosaic=0.9129,
    mixup=0.4553,
    flipud=0.07835,
    fliplr=0.5,
    hsv_h=0.0083,
    hsv_s=0.02738,
    hsv_v=0.33474,
    copy_paste=0.0,
    scale=0.5,
    shear=2.0,
    max_det=1000,
    plots=True,
    save_period=5,      # <<< save every 5 epochs
    verbose=True,
    seed=SEED,
    exist_ok=True,
)

# If already complete, skip
if stage3_best.is_file() and epochs_done_s3 >= EPOCHS_STAGE3:
    print("\n--- Stage 3 appears complete (best.pt present and epochs_done >= target). Skipping. ---")
else:
    model_s3 = None
    resume_flag = False
    start_epoch = 0

    # 1) Try to resume from last.pt
    if stage3_last.is_file():
        print("\n--- Trying to resume Stage 3 from last.pt ---")
        try:
            model_s3 = YOLO(str(stage3_last))
            resume_flag = True
            print("Loaded Stage 3 last.pt; will resume.")
        except Exception as e:
            print(f"WARNING: Stage 3 last.pt corrupted ({e}). Deleting and falling back.")
            try:
                stage3_last.unlink()
                print("Deleted corrupted last.pt.")
            except Exception as del_e:
                print(f"WARNING: Could not delete last.pt: {del_e}")
            model_s3 = None
            resume_flag = False

    # 2) If last.pt unusable, use latest epoch*.pt <= epochs_done_s3
    if model_s3 is None:
        epoch_ckpts = []
        if stage3_weights_dir.is_dir():
            for p in stage3_weights_dir.glob("epoch*.pt"):
                # expects names like epoch5.pt, epoch10.pt, ...
                stem = p.stem  # 'epoch5'
                try:
                    ep = int(stem.replace("epoch", ""))
                    epoch_ckpts.append((ep, p))
                except ValueError:
                    continue

        valid = [(ep, p) for ep, p in epoch_ckpts if ep <= epochs_done_s3]
        if valid:
            valid.sort(key=lambda x: x[0], reverse=True)
            start_epoch, ckpt = valid[0]
            print(f"\n--- Resuming Stage 3 from {ckpt.name} (epoch {start_epoch}) ---")
            model_s3 = YOLO(str(ckpt))
            resume_flag = False
        else:
            # 3) Fallback: Stage 3 best, else Stage 1 best
            print("\n--- No usable epoch*.pt <= CSV epoch; falling back to Stage 3 best or Stage 1 best ---")
            if stage3_best.is_file():
                print("Starting Stage 3 from its own best.pt")
                model_s3 = YOLO(str(stage3_best))
            else:
                print("Stage 3 best.pt not found; starting from Stage 1 best.pt")
                assert stage1_best.is_file(), "Stage 1 best.pt not found - cannot start Stage 3."
                model_s3 = YOLO(str(stage1_best))
            start_epoch = min(epochs_done_s3, EPOCHS_STAGE3)
            resume_flag = False

    if model_s3 is None:
        raise RuntimeError("No valid model for Stage 3 could be created.")

    # === TRAINING ===
    if resume_flag:
        print(f"\nResuming Stage 3 with resume=True up to epoch {EPOCHS_STAGE3}.")
        stage3_train_args["resume"] = True
        stage3_train_args["epochs"] = EPOCHS_STAGE3
        results_s3 = model_s3.train(**stage3_train_args)
        print("\n--- Stage 3 training complete (resumed from last.pt) ---")
    else:
        print(f"\nStarting Stage 3 from logical epoch {start_epoch}, continuing to {EPOCHS_STAGE3}.")
        # 1) Preserve CSV rows up to start_epoch
        old_header, old_rows = None, []
        if stage3_results_csv.is_file():
            with stage3_results_csv.open("r") as f:
                reader = csv.reader(f)
                old_header = next(reader, None)
                for row in reader:
                    if not row:
                        continue
                    try:
                        e = int(row[0])
                        if e <= start_epoch:
                            old_rows.append(row)
                    except Exception:
                        continue
            print(f"Kept {len(old_rows)} CSV rows with epoch <= {start_epoch}.")
        else:
            print("No previous Stage 3 CSV; starting fresh.")

        epochs_remaining = max(EPOCHS_STAGE3 - start_epoch, 0)
        if epochs_remaining <= 0:
            print(f"No epochs remaining (start_epoch={start_epoch}, target={EPOCHS_STAGE3}). Skipping training.")
        else:
            print(f"Training {epochs_remaining} epochs (logical {start_epoch+1}..{EPOCHS_STAGE3}).")
            stage3_train_args["resume"] = False
            stage3_train_args["epochs"] = epochs_remaining

            results_s3 = model_s3.train(**stage3_train_args)
            print("\n--- Stage 3 training complete (from checkpoint or Stage1 best) ---")

            # 2) Merge new CSV with shifted epochs
            if stage3_results_csv.is_file():
                with stage3_results_csv.open("r") as f:
                    reader = csv.reader(f)
                    new_header = next(reader, None)
                    new_rows = [r for r in reader if r]

                header = old_header or new_header
                shifted_new_rows = []
                for row in new_rows:
                    try:
                        e = int(row[0])
                    except Exception:
                        continue
                    row[0] = str(start_epoch + e)
                    shifted_new_rows.append(row)

                with stage3_results_csv.open("w", newline="") as f:
                    writer = csv.writer(f)
                    if header:
                        writer.writerow(header)
                    writer.writerows(old_rows)
                    writer.writerows(shifted_new_rows)

                print(f"Stage 3 CSV updated with continuous epoch count up to {EPOCHS_STAGE3}.")
            else:
                print("WARNING: Stage 3 CSV missing after training; cannot merge epochs.")

# Refresh and report
stage3_best = stage3_weights_dir / "best.pt"
stage3_last = stage3_weights_dir / "last.pt"
print("Final Stage 3 best.pt exists:", stage3_best.is_file())
print("Final Stage 3 last.pt exists:", stage3_last.is_file())

Stage 3 dir   : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage3/yolo11_ht_vision_fish_stage3_1024
Stage 3 best  : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage3/yolo11_ht_vision_fish_stage3_1024/weights/best.pt
Stage 3 last  : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage3/yolo11_ht_vision_fish_stage3_1024/weights/last.pt
Stage 3 CSV   : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage3/yolo11_ht_vision_fish_stage3_1024/results.csv
Stage 3 epochs in CSV (last): 0 / 150

--- No usable epoch*.pt <= CSV epoch; falling back to Stage 3 best or Stage 1 best ---
Stage 3 best.pt not found; starting from Stage 1 best.pt

Starting Stage 3 from logical epoch 0, continuing to 150.
No previous Stage 3 CSV; starting fresh.
Training 150 epochs (logical 1..150).
New https://pypi.org/project/ultralytics/8.3.231 available 😃 Update with 'pip install -U ultralytics'
Ultralytics 8.3.197 🚀 Python-3.9.13 torch-2.8.0+cu128 CUDA:0 (NVIDIA GeForce RTX 5070 Ti, 15837MiB)
en

## 10. Stage 3 – Locate best checkpoint


In [16]:
print("Stage 3 experiment dir:", stage3_exp_dir)
print("Stage 3 best weights  :", stage3_best)
print("Stage 3 last weights  :", stage3_last)

assert stage3_best.is_file(), "Stage 3 best.pt not found - fine-tuning may not have completed successfully."


Stage 3 experiment dir: /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage3/yolo11_ht_vision_fish_stage3_1024
Stage 3 best weights  : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage3/yolo11_ht_vision_fish_stage3_1024/weights/best.pt
Stage 3 last weights  : /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage3/yolo11_ht_vision_fish_stage3_1024/weights/last.pt


## 11. Stage 3 – Evaluation on val and test

Repeat the evaluation pipeline for Stage 3:

- Evaluate on **val** and **test**.
- Save metrics (JSON + CSV).
- Print **best 2** and **worst 2** detections on test.


In [17]:
stage3_eval_dir = stage3_exp_dir / "eval"
stage3_eval_dir.mkdir(parents=True, exist_ok=True)

# Validation evaluation
s3_val_json = stage3_eval_dir / "metrics_val.json"
metrics_s3_val, s3_val_results_obj = evaluate_and_save(
    weights_path=stage3_best,
    data_yaml=DATA_YAML,
    split="val",
    imgsz=IMG_SIZE_STAGE3,
    project=stage3_eval_dir,
    name="val_eval_stage3",
    seed=SEED,
    out_json=s3_val_json,
)
print("Stage 3 validation metrics saved to", s3_val_json)

# Test evaluation
s3_test_json = stage3_eval_dir / "metrics_test.json"
metrics_s3_test, s3_test_results_obj = evaluate_and_save(
    weights_path=stage3_best,
    data_yaml=DATA_YAML,
    split="test",
    imgsz=IMG_SIZE_STAGE3,
    project=stage3_eval_dir,
    name="test_eval_stage3",
    seed=SEED,
    out_json=s3_test_json,
)
print("Stage 3 test metrics saved to", s3_test_json)

# Save CSV summary
s3_csv_path = stage3_eval_dir / "metrics_summary_stage3.csv"
export_metrics_csv(metrics_s3_val, metrics_s3_test, s3_csv_path)

print("\n--- Stage 3: Best and Worst Detections on Test Split ---")
analyze_best_worst_detections(s3_test_results_obj, split="test", top_k=2, min_iou=0.0)


Ultralytics 8.3.197 🚀 Python-3.9.13 torch-2.8.0+cu128 CUDA:0 (NVIDIA GeForce RTX 5070 Ti, 15837MiB)
YOLO11m summary (fused): 125 layers, 20,030,803 parameters, 0 gradients, 67.6 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 7.7±12.1 MB/s, size: 42.5 KB)
val: Scanning /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/labels/val.cache... 5752 images, 455 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 5752/5752 14.8Mit/s 0.0s
val: /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/images/val/fishclef_05829.jpg: 1 duplicate labels removed
val: /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Dataset/images/val/fishclef_05830.jpg: 2 duplicate labels removed
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 360/360 5.6it/s 1:04<0.2ss
                   all       5752      15252      0.907      0.866      0.913      0.621
Speed: 0.2

## 12. Compare Stage 1 vs Stage 3 metrics

Here we build a quick comparison table from the **test** metrics of Stage 1 and Stage 3.


In [20]:
import pandas as pd

all_keys = sorted(set(metrics_s1_test.keys()) | set(metrics_s3_test.keys()))
rows = []
for k in all_keys:
    s1_val = metrics_s1_test.get(k, None)
    s3_val = metrics_s3_test.get(k, None)
    rows.append({
        "metric": k,
        "Stage1_test": s1_val,
        "Stage3_test": s3_val,
    })

df_compare = pd.DataFrame(rows)

# Delta column (Stage3 - Stage1)
df_compare["Delta_Stage3_minus_Stage1"] = (
    df_compare["Stage3_test"] - df_compare["Stage1_test"]
)

display(df_compare)

comparison_csv = TRAINING_ROOT / "stage1_vs_stage3_test_metrics.csv"
df_compare.to_csv(comparison_csv, index=False)
print("Stage 1 vs Stage 3 test metrics comparison saved to:", comparison_csv)

|   | metric | Stage1_test | Stage3_test | Delta_Stage3_minus_Stage1 |
|---|---|---|---|---|
| 0 | fitness | 0.645356 | 0.642022 | -0.003334 |
| 1 | metrics/mAP50(B) | 0.917138 | 0.912384 | -0.004754 |
| 2 | metrics/mAP50-95(B) | 0.615158 | 0.611982 | -0.003176 |
| 3 | metrics/precision(B) | 0.911953 | 0.91066 | -0.001293 |
| 4 | metrics/recall(B) | 0.866394 | 0.858068 | -0.008326 |


Stage 1 vs Stage 3 test metrics comparison saved to: /mnt/Data1/mpiccolo/HT_Vision/Training_Results/stage1_vs_stage3_test_metrics.csv
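
Absolute deltas are hard to compare across metrics with different baselines, so a relative change in percent can be a useful complement. The sketch below recomputes it from the test values in the comparison output above. The dictionary layout and the helper name `relative_change_pct` are illustrative assumptions, not part of the notebook.

```python
# Test-split values copied from the Stage 1 vs Stage 3 comparison output.
stage1 = {
    "fitness": 0.645356,
    "metrics/mAP50(B)": 0.917138,
    "metrics/mAP50-95(B)": 0.615158,
    "metrics/precision(B)": 0.911953,
    "metrics/recall(B)": 0.866394,
}
stage3 = {
    "fitness": 0.642022,
    "metrics/mAP50(B)": 0.912384,
    "metrics/mAP50-95(B)": 0.611982,
    "metrics/precision(B)": 0.91066,
    "metrics/recall(B)": 0.858068,
}


def relative_change_pct(old, new):
    """Percent change from old to new; None when old is zero."""
    if old == 0:
        return None
    return 100.0 * (new - old) / old


rel = {name: relative_change_pct(stage1[name], stage3[name]) for name in stage1}
for name, pct in rel.items():
    print(f"{name:24s} {pct:+.2f}%")
```

For the values shown, every metric moves by less than one percent in relative terms, and all of them move downward.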


## 13. Inference Demo – Stage 1 and Stage 3

We now run inference for both Stage 1 and Stage 3 **best models** on the custom image:

```text
/mnt/Data1/mpiccolo/HT_Vision/inference_pinksalmon.png
```

Results (annotated images) are saved under the respective stage folders.


In [21]:
def run_inference(weights_path: Path, img_path: str, imgsz: int, project: Path, name: str):
    """Run single-image prediction and save the annotated result under project/name."""
    if not os.path.isfile(img_path):
        print(f"Inference image not found: {img_path}")
        return
    model_inf = YOLO(str(weights_path))
    print(f"Running inference with {weights_path} on {img_path} ...")
    _ = model_inf.predict(
        source=img_path,
        imgsz=imgsz,
        conf=0.25,
        max_det=100,
        device=DEVICE,
        project=str(project),
        name=name,
        save=True,
    )
    print("Inference complete. Check the output folder above.\n")

# Stage 1 inference
run_inference(
    weights_path=stage1_best,
    img_path=INFERENCE_IMAGE,
    imgsz=IMG_SIZE_STAGE1,
    project=STAGE1_ROOT,
    name=f"inference_{STAGE1_NAME}",
)

# Stage 3 inference
run_inference(
    weights_path=stage3_best,
    img_path=INFERENCE_IMAGE,
    imgsz=IMG_SIZE_STAGE3,
    project=STAGE3_ROOT,
    name=f"inference_{STAGE3_NAME}",
)


Running inference with /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage1/yolo11_ht_vision_fish_stage1_640/weights/best.pt on /mnt/Data1/mpiccolo/HT_Vision/inference_pinksalmon.png ...

image 1/1 /mnt/Data1/mpiccolo/HT_Vision/inference_pinksalmon.png: 352x640 33 fishs, 29.4ms
Speed: 1.1ms preprocess, 29.4ms inference, 0.9ms postprocess per image at shape (1, 3, 352, 640)
Results saved to /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage1/inference_yolo11_ht_vision_fish_stage1_640
Inference complete. Check the output folder above.

Running inference with /mnt/Data1/mpiccolo/HT_Vision/Training_Results/Stage3/yolo11_ht_vision_fish_stage3_1024/weights/best.pt on /mnt/Data1/mpiccolo/HT_Vision/inference_pinksalmon.png ...

image 1/1 /mnt/Data1/mpiccolo/HT_Vision/inference_pinksalmon.png: 544x1024 34 fishs, 30.8ms
Speed: 2.3ms preprocess, 30.8ms inference, 1.1ms postprocess per image at shape (1, 3, 544, 1024)
Results saved to /mnt/Data1/mpiccolo/HT_Vision/Training_Results/S
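
The logs above report network input shapes of 352x640 and 544x1024 rather than square inputs. That is consistent with an aspect-preserving resize followed by padding each side up to a multiple of the model stride. The sketch below illustrates the arithmetic under stated assumptions: a stride of 32, a hypothetical 1000x1920 source image (the real size of `inference_pinksalmon.png` is not given here), and a helper name `padded_shape` invented for illustration. It approximates letterbox-style preprocessing and is not the library's code.

```python
import math
from typing import Tuple


def padded_shape(h: int, w: int, imgsz: int, stride: int = 32) -> Tuple[int, int]:
    """Scale so the longer side equals imgsz, then round each side up to a stride multiple."""
    scale = imgsz / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)

    def round_up(value: int) -> int:
        return math.ceil(value / stride) * stride

    return round_up(new_h), round_up(new_w)


# Hypothetical source size; only the aspect ratio matters for the result.
for imgsz in (640, 1024):
    print(imgsz, padded_shape(1000, 1920, imgsz))
```

For this hypothetical image the helper yields `(352, 640)` and `(544, 1024)`, the same shapes as in the logs, although other source sizes with a similar aspect ratio would give the same result.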

## 14. Evaluation on Unseen Datasets

This section is prepared for evaluating Stage 1 and Stage 3 on an **unseen dataset**.

Once you have an unseen dataset ready, create a `data.yaml` for it and add its path to the `UNSEEN_DATASETS` dictionary in the next cell, e.g.:

```python
UNSEEN_DATASETS = {"My_Unseen_Dataset": Path("/path/to/unseen/data.yaml")}
```

Then you can run evaluations for both stages on that unseen data.
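
The evaluation cell further down requests `split="test"`, so each unseen `data.yaml` presumably needs a `test` entry alongside `names`. A quick pre-check can be sketched with the standard library alone. The helper name `missing_top_level_keys` and the example YAML are hypothetical, and the scan only understands simple top-level `key: value` lines. A real YAML parser would be more robust.

```python
from typing import Iterable, List


def missing_top_level_keys(yaml_text: str, required: Iterable[str]) -> List[str]:
    """Return the required keys that do not appear as top-level `key:` entries."""
    found = set()
    for line in yaml_text.splitlines():
        content = line.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        # Top-level keys start in column 0 and are followed by a colon.
        if content and not content[0].isspace() and ":" in content:
            found.add(content.split(":", 1)[0].strip())
    return [key for key in required if key not in found]


# Hypothetical data.yaml content that lacks a test split.
example_yaml = """\
path: /path/to/unseen
train: images/train
val: images/val
names:
  0: fish
"""

print(missing_top_level_keys(example_yaml, ["test", "names"]))
```

A non-empty result means the file is missing at least one of the listed keys and should be fixed before the evaluation loop runs.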


In [22]:
UNSEEN_DATASETS = {
    "Fish_Video_Object_Tracking_Kaggle": Path(
        "/mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Unseen_ds/Fish_Video_Object_Tracking_Kaggle/data.yaml"
    ),
    "Kaggle_Fish_Dataset": Path(
        "/mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Unseen_ds/Kaggle_Fish_Dataset/data.yaml"
    ),
}

In [25]:
unseen_root = TRAINING_ROOT / "Unseen"
unseen_root.mkdir(parents=True, exist_ok=True)

for ds_name, yaml_path in UNSEEN_DATASETS.items():
    if not yaml_path.is_file():
        print(f"[{ds_name}] data.yaml not found, skipping:", yaml_path)
        continue

    print(f"[{ds_name}] Running unseen evaluation on", yaml_path)

    ds_root = unseen_root / ds_name
    ds_root.mkdir(parents=True, exist_ok=True)

    # ---------- Stage 1 on unseen ----------
    s1_unseen_dir = ds_root / "Stage1"
    s1_unseen_json = s1_unseen_dir / "metrics_unseen_stage1.json"
    metrics_s1_unseen, _ = evaluate_and_save(
        weights_path=stage1_best,
        data_yaml=yaml_path,
        split="test",
        imgsz=IMG_SIZE_STAGE1,
        project=s1_unseen_dir,
        name=f"{ds_name}_unseen_stage1",
        seed=SEED,
        out_json=s1_unseen_json,
    )
    print(f"[{ds_name}] Stage 1 unseen metrics saved to", s1_unseen_json)

    # ---------- Stage 3 on unseen ----------
    s3_unseen_dir = ds_root / "Stage3"
    s3_unseen_json = s3_unseen_dir / "metrics_unseen_stage3.json"
    metrics_s3_unseen, _ = evaluate_and_save(
        weights_path=stage3_best,
        data_yaml=yaml_path,
        split="test",
        imgsz=IMG_SIZE_STAGE3,
        project=s3_unseen_dir,
        name=f"{ds_name}_unseen_stage3",
        seed=SEED,
        out_json=s3_unseen_json,
    )
    print(f"[{ds_name}] Stage 3 unseen metrics saved to", s3_unseen_json)

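    # ---------- CSV comparison: Stage 1 vs Stage 3 ----------
    # Note: export_metrics_csv is called elsewhere with (val, test) metrics; here its
    # two slots hold Stage 1 and Stage 3, so read the CSV column labels accordingly.
    unseen_csv = ds_root / "stage1_vs_stage3_unseen_metrics.csv"
    export_metrics_csv(metrics_s1_unseen, metrics_s3_unseen, unseen_csv)
    print(f"[{ds_name}] Stage1 vs Stage3 unseen metrics CSV:", unseen_csv)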


[Fish_Video_Object_Tracking_Kaggle] Running unseen evaluation on /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Unseen_ds/Fish_Video_Object_Tracking_Kaggle/data.yaml
Ultralytics 8.3.197 🚀 Python-3.9.13 torch-2.8.0+cu128 CUDA:0 (NVIDIA GeForce RTX 5070 Ti, 15837MiB)
YOLO11m summary (fused): 125 layers, 20,030,803 parameters, 0 gradients, 67.6 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 39.6±5.8 MB/s, size: 499.4 KB)
val: Scanning /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Unseen_ds/Fish_Video_Object_Tracking_Kaggle/labels... 99 images, 0 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 99/99 206.4it/s 0.5s0.0s
val: New cache created: /mnt/Data1/mpiccolo/HT_Vision/HT_Vision_Unseen_ds/Fish_Video_Object_Tracking_Kaggle/labels.cache
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 7/7 4.4it/s 1.6s0.2s
                   all         99    