# Stage 0: Input Validation with YOLOv11 🚀

**Goal:** Train a robust binary classifier that separates `Wound` (relevant) images from `Background` (irrelevant) ones.

**Model:** YOLOv11n-cls (nano classifier), pretrained on ImageNet.

**Data:**
- **Positive:** Wound Dataset (7 classes collapsed into `wound`)
- **Negative:** High-resolution random images (Lorem Picsum)

In [1]:
import os
import shutil
from pathlib import Path
import pandas as pd
from sklearn.model_selection import train_test_split
from ultralytics import YOLO
import glob
import gc
import torch

# Config
BASE_DIR = Path("../").resolve()
DATA_DIR = BASE_DIR / "data"
RAW_WOUND_DIR = DATA_DIR / "raw/type_classification"
RAW_BG_DIR = DATA_DIR / "raw/background_class_highres"
YOLO_DATASET_DIR = DATA_DIR / "processed/yolo_stage0"

# Cleanup Memory from previous failed runs
gc.collect()
torch.cuda.empty_cache()

# Keep any existing dataset: prepare_yolo_dataset() skips the copy step
# when the train/ folder is already present, so nothing is deleted here.
YOLO_DATASET_DIR.mkdir(parents=True, exist_ok=True)

## 1. Prepare Dataset Structure 📁
YOLO Classification Format:
```
root/
  train/
    wound/
    background/
  val/
    wound/
    background/
```
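A quick way to confirm this layout before training is to count the images in each `split/class` folder. The sketch below is a minimal, assumed helper: the name `count_images` and the extension set are illustrative and not part of the notebook, and it relies only on `pathlib`.

```python
from pathlib import Path

# Assumption: the same three extensions the preparation cell collects.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def count_images(root):
    """Return {split: {class_name: image_count}} for a root/split/class layout."""
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue  # ignore stray files next to train/ and val/
        counts[split_dir.name] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTS
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts
```

Calling `count_images(YOLO_DATASET_DIR)` should list both `wound` and `background` under `train` and under `val`; a missing or empty class folder is worth fixing before training.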

In [2]:
def prepare_yolo_dataset():
    # Check if already exists to save time
    if (YOLO_DATASET_DIR / 'train').exists():
        print("Dataset already exists. Skipping preparation.")
        return

    print("Preparing YOLO Dataset Structure...")
    
    # 1. Collect Wound Images (Positive)
    wound_images = []
    for ext in ['jpg', 'jpeg', 'png']:
        wound_images.extend(glob.glob(str(RAW_WOUND_DIR / f"*/*.{ext}")))
    
    # 2. Collect Background Images (Negative)
    bg_images = []
    for ext in ['jpg', 'jpeg', 'png']:
        bg_images.extend(glob.glob(str(RAW_BG_DIR / f"*.{ext}")))
        
    print(f"Found {len(wound_images)} Wound images")
    print(f"Found {len(bg_images)} Background images")
    
    # 3. Split Train/Val (stratified 80/20)
    df_w = pd.DataFrame({'path': wound_images, 'label': 'wound'})
    df_b = pd.DataFrame({'path': bg_images, 'label': 'background'})
    df = pd.concat([df_w, df_b])

    train_df, val_df = train_test_split(df, test_size=0.2, stratify=df['label'], random_state=42)

    # 4. Copy Files
    for split_name, split_df in [('train', train_df), ('val', val_df)]:
        for _, row in split_df.iterrows():
            src = Path(row['path'])
            label = row['label']
            
            dst_dir = YOLO_DATASET_DIR / split_name / label
            dst_dir.mkdir(parents=True, exist_ok=True)
            
            # Note: files sharing a name across source folders overwrite each other here.
            shutil.copy(src, dst_dir / src.name)
            
    print("Dataset Preparation Complete! ✅")

prepare_yolo_dataset()

Dataset already exists. Skipping preparation.
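Because every wound image is copied under its bare file name, two files with the same name in different class folders would land on the same destination path. One possible guard, sketched here as an assumption rather than what the notebook does, is to prefix the source folder name. `unique_dst_name` and the example paths are hypothetical.

```python
from pathlib import Path

def unique_dst_name(src):
    """Prefix the file name with its parent folder so names stay distinct
    after several class folders are merged into one target folder."""
    src = Path(src)
    return f"{src.parent.name}_{src.name}"

# Hypothetical paths: two class folders that both contain img_001.jpg.
a = unique_dst_name("raw/type_classification/burn/img_001.jpg")
b = unique_dst_name("raw/type_classification/cut/img_001.jpg")
print(a, b)
```

In the copy loop this would replace `src.name` in `dst_dir / src.name`. Background images would all receive the same folder prefix, which is harmless.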


## 2. Train YOLOv11 🏋️‍♀️
We use the pre-trained `yolo11n-cls.pt`.
**Config (STABLE):**
- `workers=0`: Strictly required; do not change. With data-loader worker processes enabled, training hangs on Windows.
- `batch=64`: A batch size that stays clear of `Insufficient memory` errors.

**Crash Prevention:** An earlier run with `workers=4` crashed with an out-of-memory error, so keep `workers=0`.
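If the notebook also has to run on other machines, the worker count could be chosen from the platform instead of being hard-coded. The helper below is a hypothetical sketch: `pick_workers` and its default of 4 for non-Windows systems are assumptions, not settings tested in this notebook.

```python
import platform

def pick_workers(system=None, default=4):
    """Return 0 data-loader workers on Windows, `default` elsewhere."""
    system = system or platform.system()
    return 0 if system == "Windows" else default

# The result can be passed on as model.train(..., workers=workers).
workers = pick_workers()
```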

In [3]:
# Load Pretrained Model
model = YOLO('yolo11n-cls.pt')

# Train
results = model.train(
    data=str(YOLO_DATASET_DIR), 
    epochs=5, 
    imgsz=224, 
    batch=64,       # SAFE DEFAULT
    project='../models',
    name='stage0_yolo_v11',
    exist_ok=True,
    workers=0,      # REQUIRED ON WINDOWS: worker processes hang or run out of memory
    device=0
)

Ultralytics 8.3.245  Python-3.12.7 torch-2.6.0+cu124 CUDA:0 (NVIDIA GeForce RTX 3050 Laptop GPU, 4096MiB)
engine\trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=64, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=F:\Housepital-AI\Housepital-AI\AI_Pipeline_V2\data\processed\yolo_stage0, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=5, erasing=0.4, exist_ok=True, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=224, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolo11n-cls.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=stage0_yolo_v11, nbs=64, nms=False, opset=None, optimize=Fa

## 3. Quick Validation
**Note:** Using `workers=0` here to ensure stability during final validation.

In [4]:
# Explicitly force workers=0 to prevent hang during standalone validation
metrics = model.val(workers=0)
print(f"Top-1 Accuracy: {metrics.top1:.4f}")

Ultralytics 8.3.245  Python-3.12.7 torch-2.6.0+cu124 CUDA:0 (NVIDIA GeForce RTX 3050 Laptop GPU, 4096MiB)
YOLO11n-cls summary (fused): 47 layers, 1,528,586 parameters, 0 gradients, 3.2 GFLOPs
train: F:\Housepital-AI\Housepital-AI\AI_Pipeline_V2\data\processed\yolo_stage0\train... found 20722 images in 2 classes
val: F:\Housepital-AI\Housepital-AI\AI_Pipeline_V2\data\processed\yolo_stage0\val... found 5181 images in 2 classes
test: None...
val: Fast image access (ping: 0.1±0.1 ms, read: 473.7±218.7 MB/s, size: 59.0 KB)
val: Scanning F:\Housepital-AI\Housepital-AI\AI_Pipeline_V2\data\processed\yolo_stage0\val... 5181 images, 0 corrupt: 100% ━━━━━━━━━━━━ 5181/5181 5.2Mit/s 0.0s
               classes   top1_acc   top5_acc: 100% ━━━━━━━━━━━━ 324/324 8.1it/s 40.0s
                   all      0.998          1
Speed: 0.2ms preprocess, 2.1ms inference, 0.0ms loss, 0.0
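
To put the reported top-1 accuracy in concrete terms, the snippet below works out how many of the 5181 validation images could have been misclassified, assuming the printed `0.998` is the true accuracy rounded to three decimals. With only two classes, a top-5 metric is 1 by construction, so top-1 is the figure that matters here.

```python
N_VAL = 5181           # validation images reported in the log
REPORTED_TOP1 = 0.998  # assumed to be rounded to three decimals

# Every error count whose accuracy rounds to the reported value.
possible_errors = [
    e for e in range(N_VAL + 1)
    if round(1 - e / N_VAL, 3) == REPORTED_TOP1
]
print(min(possible_errors), max(possible_errors))  # prints: 8 12
```

Under that rounding assumption, between 8 and 12 validation images were misclassified. The summary line does not say which class they belong to, so a per-class breakdown would be needed to tell missed wounds from false alarms.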

## 4. Export & Save


In [5]:
print(f"Best model saved at: {model.trainer.best}")

Best model saved at: F:\Housepital-AI\Housepital-AI\AI_Pipeline_V2\models\stage0_yolo_v11\weights\best.pt
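
Once `best.pt` exists, Stage 0 can act as a gate in front of later stages. The sketch below is one hedged way to do that: `is_wound` and `check_image` are hypothetical helpers, the 0.5 threshold is an arbitrary assumption, and the class name `wound` comes from the folder layout used for training. It reads the Ultralytics classification result attributes `names`, `probs.top1`, and `probs.top1conf`.

```python
def is_wound(names, top1_index, top1_conf, threshold=0.5):
    """Decide whether a prediction counts as a wound image.

    names: mapping of class index to class name (as in result.names)
    threshold: assumed confidence cut-off; tune it on validation data
    """
    return names[top1_index] == "wound" and top1_conf >= threshold

def check_image(weights_path, image_path, threshold=0.5):
    """Run the trained classifier on one image and apply is_wound."""
    from ultralytics import YOLO  # lazy import; requires ultralytics installed
    result = YOLO(weights_path)(image_path)[0]
    probs = result.probs
    return is_wound(result.names, probs.top1, float(probs.top1conf), threshold)
```

Keeping the decision rule in `is_wound` separate from the model call makes the threshold easy to test and adjust without loading any weights.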
