# Notebook 2: YOLO-World Object Detection

This notebook demonstrates **YOLO-World** object detection on captured frames.

## What's Different About YOLO-World?
- **Zero-shot detection**: Detects objects named in a text prompt, so it is not limited to the 80 COCO classes
- **Custom classes**: We can add "door", "stairs", "obstacle", etc.
- **CLIP-based**: Uses a vision-language model to match class names against image regions
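
To make the "detect by name" idea concrete, here is a toy sketch of matching a region to a text prompt by cosine similarity. It is only an illustration of the open-vocabulary principle, not YOLO-World's actual implementation: the 3-dimensional embeddings are made up, and `cosine` and `best_label` are hypothetical helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Made-up 3-d "text embeddings" for the prompt vocabulary (illustration only).
text_embeddings = {
    "door":   [0.9, 0.1, 0.0],
    "stairs": [0.1, 0.9, 0.1],
    "person": [0.0, 0.2, 0.9],
}

def best_label(region_embedding, vocabulary):
    """Return (label, score) for the prompt most similar to the region."""
    scores = {name: cosine(region_embedding, vec) for name, vec in vocabulary.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]

region = [0.8, 0.2, 0.1]  # made-up embedding of one image region
label, score = best_label(region, text_embeddings)
print(label, round(score, 2))
```

In this picture, adding a class means adding one more entry to the vocabulary, with no retraining, which is why the class list can be edited freely.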

## What You'll Learn
- Loading the YOLO-World model with custom classes
- Running inference on images
- Visualizing detection results
- Analyzing detection statistics (class counts, confidence scores, inference time)

In [None]:
import sys
sys.path.insert(0, '..')

import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from datetime import datetime
import time

from src.config import Config, DetectionConfig
from src.detector import Detector, Detection
from src.utils import Timer

print("Imports successful!")

## Load YOLO-World Model

We load YOLO-World with custom classes relevant to navigation for visually impaired users:

In [None]:
# Use YOLO-World for custom class detection (including doors!)
config = DetectionConfig(
    model="yolov8s-world.pt",  # YOLO-World model
    confidence=0.2,            # Lower threshold for better recall
    classes=[
        "door", "person", "chair", "table", "stairs", 
        "wall", "window", "car", "bicycle", "obstacle"
    ]
)

detector = Detector(config)

print("Loading YOLO-World model...")
start = time.time()
if detector.load():
    print(f"Model loaded in {time.time()-start:.2f}s")
    print(f"Custom classes: {detector.class_names}")
else:
    print("Failed to load model")
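
The `confidence=0.2` setting trades precision for recall. The sketch below shows the effect on a made-up list of raw detections; `ToyDetection` and `filter_by_confidence` are hypothetical stand-ins, and how `Detector` applies the threshold internally is an assumption.

```python
from dataclasses import dataclass

@dataclass
class ToyDetection:
    """Stand-in for a detection; only the fields needed here."""
    class_name: str
    confidence: float

def filter_by_confidence(detections, threshold):
    """Keep detections whose confidence is at or above the threshold."""
    return [d for d in detections if d.confidence >= threshold]

# Made-up raw scores, chosen so that the custom classes score low.
raw = [
    ToyDetection("door", 0.24),
    ToyDetection("person", 0.81),
    ToyDetection("stairs", 0.31),
    ToyDetection("wall", 0.12),
]

for threshold in (0.2, 0.5):
    kept = filter_by_confidence(raw, threshold)
    print(threshold, [d.class_name for d in kept])
```

With these numbers, a threshold of 0.5 would drop "door" and "stairs" entirely, while 0.2 keeps them at the cost of admitting more low-confidence boxes.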

## List Available Classes

In [None]:
print("Custom Classes (YOLO-World can detect any object by name):")
for i, name in enumerate(detector.class_names):
    print(f"  {i:2d}: {name}")

print("\nNote: YOLO-World uses CLIP for zero-shot detection.")
print("You can add any class name and it will try to detect it!")

## Load Test Images

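Load images from `../data/captures` and `../data/sample_images`. If neither directory contains any `.jpg` files, a placeholder test image is created.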

In [None]:
captures_dir = Path("../data/captures")
samples_dir = Path("../data/sample_images")

image_files = list(captures_dir.glob("*.jpg")) + list(samples_dir.glob("*.jpg"))

if not image_files:
    print("No images found. Creating a test image...")
    test_img = np.zeros((480, 640, 3), dtype=np.uint8)
    test_img[:] = (128, 128, 128)
    cv2.putText(test_img, "Test Image", (200, 240), 
                cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
    samples_dir.mkdir(parents=True, exist_ok=True)
    cv2.imwrite(str(samples_dir / "test.jpg"), test_img)
    image_files = [samples_dir / "test.jpg"]

print(f"Found {len(image_files)} images:")
for f in image_files[:5]:
    print(f"  - {f.name}")

## Run Detection on Single Image

In [None]:
if image_files:
    test_image_path = image_files[0]
    print(f"Processing: {test_image_path.name}")
    
    # cv2.imread returns None (it does not raise) when a file cannot be read
    frame = cv2.imread(str(test_image_path))
    if frame is None:
        raise FileNotFoundError(f"Could not read image: {test_image_path}")
    print(f"Image shape: {frame.shape}")
    
    timer = Timer("detection")
    timer.start()
    detections = detector.detect(frame)
    inference_time = timer.stop()  # elapsed time in seconds
    
    print(f"\nInference time: {inference_time*1000:.1f}ms")
    print(f"Detections: {len(detections)}")
    
    for det in detections:
        print(f"  - {det.class_name}: {det.confidence:.2f} at {det.bbox}")
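
The cell above relies on `Timer.stop()` returning elapsed seconds. For readers without the project source, a minimal stand-in with the same start/stop shape might look like this; `SimpleTimer` is hypothetical, and the real `src.utils.Timer` may differ.

```python
import time

class SimpleTimer:
    """Hypothetical stand-in for src.utils.Timer (start/stop, seconds)."""

    def __init__(self, name="timer"):
        self.name = name
        self._start = None

    def start(self):
        # perf_counter is a monotonic, high-resolution clock suited to timing
        self._start = time.perf_counter()

    def stop(self):
        """Return seconds elapsed since start() and reset the timer."""
        if self._start is None:
            raise RuntimeError("stop() called before start()")
        elapsed = time.perf_counter() - self._start
        self._start = None
        return elapsed

demo_timer = SimpleTimer("demo")
demo_timer.start()
time.sleep(0.01)  # stands in for detector.detect(frame)
elapsed_s = demo_timer.stop()
print(f"{demo_timer.name}: {elapsed_s * 1000:.1f}ms")
```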

## Visualize Detection Results

In [None]:
if image_files and frame is not None:
    result_frame = detector.draw_detections(frame, detections)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    axes[0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    axes[0].set_title("Original Image")
    axes[0].axis('off')
    
    axes[1].imshow(cv2.cvtColor(result_frame, cv2.COLOR_BGR2RGB))
    axes[1].set_title(f"Detection Result ({len(detections)} objects)")
    axes[1].axis('off')
    
    plt.tight_layout()
    
    results_dir = Path("../data/results")
    results_dir.mkdir(parents=True, exist_ok=True)
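    # Use a distinct prefix so the annotated frames saved later as
    # detection_<name>.jpg do not overwrite this side-by-side figure
    plt.savefig(results_dir / f"comparison_{test_image_path.stem}.jpg", dpi=150)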
    plt.show()

## Batch Processing - All Images

In [None]:
all_results = []
total_time = 0

for img_path in image_files:
    frame = cv2.imread(str(img_path))
    if frame is None:
        continue
    
    timer = Timer()
    timer.start()
    detections = detector.detect(frame)
    elapsed = timer.stop()
    total_time += elapsed
    
    all_results.append({
        'filename': img_path.name,
        'detections': detections,
        'time_ms': elapsed * 1000,
        'frame': frame
    })

print(f"Processed {len(all_results)} images")
print(f"Total time: {total_time:.2f}s")
if all_results:  # avoid dividing by zero when no image could be read
    print(f"Average time: {total_time/len(all_results)*1000:.1f}ms per image")
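
The average alone can be misleading, because the first inference call is often slower than the rest (model warm-up). A minimal sketch of a fuller summary follows; the timings are made up, and `summarize_latency` is a hypothetical helper. In this notebook the input could be `[r['time_ms'] for r in all_results]`.

```python
import math
import statistics

def summarize_latency(times_ms):
    """Mean, median, nearest-rank 95th percentile, and max of timings in ms."""
    if not times_ms:
        raise ValueError("times_ms is empty")
    ordered = sorted(times_ms)
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "mean_ms": statistics.mean(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }

# Made-up timings: the first call includes warm-up overhead.
times_ms = [420.0, 110.0, 120.0, 115.0, 125.0]
summary = summarize_latency(times_ms)
for key, value in summary.items():
    print(f"{key}: {value:.1f}")
```

Here the median stays close to the steady-state latency while the mean is pulled up by the single slow call, so reporting both gives a fairer picture.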

## Detection Statistics

In [None]:
from collections import Counter

all_classes = []
all_confidences = []

for result in all_results:
    for det in result['detections']:
        all_classes.append(det.class_name)
        all_confidences.append(det.confidence)

class_counts = Counter(all_classes)

print("Detection Summary:")
print(f"  Total detections: {len(all_classes)}")
print(f"  Unique classes: {len(class_counts)}")
print(f"\nClass distribution:")
for cls, count in class_counts.most_common(10):
    print(f"  {cls}: {count}")

if all_confidences:
    print(f"\nConfidence statistics:")
    print(f"  Min: {min(all_confidences):.2f}")
    print(f"  Max: {max(all_confidences):.2f}")
    print(f"  Mean: {np.mean(all_confidences):.2f}")
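
Overall confidence statistics hide per-class differences, and custom classes such as "door" may score lower than common ones such as "person". A small sketch of grouping confidences by class, using made-up values; `confidence_by_class` is a hypothetical helper. In this notebook the input could be `zip(all_classes, all_confidences)`.

```python
from collections import defaultdict
from statistics import mean

def confidence_by_class(pairs):
    """Map class name -> (count, mean confidence) from (name, confidence) pairs."""
    grouped = defaultdict(list)
    for name, confidence in pairs:
        grouped[name].append(confidence)
    return {name: (len(values), mean(values)) for name, values in grouped.items()}

# Made-up (class_name, confidence) pairs for illustration.
pairs = [("door", 0.25), ("door", 0.35), ("person", 0.80), ("chair", 0.50)]
stats = confidence_by_class(pairs)
for name, (count, avg) in sorted(stats.items()):
    print(f"  {name}: n={count}, mean confidence={avg:.2f}")
```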

## Confidence Distribution Plot

In [None]:
if all_confidences:
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    axes[0].hist(all_confidences, bins=20, color='steelblue', edgecolor='white')
    axes[0].set_xlabel('Confidence')
    axes[0].set_ylabel('Count')
    axes[0].set_title('Detection Confidence Distribution')
    
    if class_counts:
        # most_common() sorts by count; reverse so the largest bar is drawn on top
        top = class_counts.most_common(10)[::-1]
        classes = [c for c, _ in top]
        counts = [n for _, n in top]
        axes[1].barh(classes, counts, color='steelblue')
        axes[1].set_xlabel('Count')
        axes[1].set_title('Top 10 Detected Classes')
    
    plt.tight_layout()
    plt.savefig(results_dir / "detection_statistics.jpg", dpi=150)
    plt.show()

## Save All Detection Results

In [None]:
for result in all_results:
    frame = result['frame']
    detections = result['detections']
    filename = result['filename']
    
    result_frame = detector.draw_detections(frame, detections)
    output_path = results_dir / f"detection_{Path(filename).stem}.jpg"
    cv2.imwrite(str(output_path), result_frame)

print(f"Saved {len(all_results)} detection results to {results_dir}")
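
Annotated images are useful for inspection, but later notebooks may prefer the raw detections as data. The following sketch serializes one result to JSON without the image array. `ToyDetection` and `result_to_record` are hypothetical, the field names mirror the attributes used above (`class_name`, `confidence`, `bbox`), and the four-number `bbox` layout and the filename are assumptions.

```python
import json
from dataclasses import dataclass

@dataclass
class ToyDetection:
    """Stand-in for the project's Detection class (fields assumed)."""
    class_name: str
    confidence: float
    bbox: tuple  # assumed to be four numbers; the real layout may differ

def result_to_record(filename, detections, time_ms):
    """Convert one batch result into a JSON-serializable dict (no image data)."""
    return {
        "filename": filename,
        "time_ms": round(float(time_ms), 1),
        "detections": [
            {
                "class_name": d.class_name,
                # float() also converts NumPy scalars, which json cannot encode
                "confidence": round(float(d.confidence), 3),
                "bbox": [float(v) for v in d.bbox],
            }
            for d in detections
        ],
    }

# Made-up example result.
record = result_to_record(
    "hallway.jpg", [ToyDetection("door", 0.27, (120, 40, 310, 460))], 142.38
)
payload = json.dumps([record], indent=2)
print(payload)
```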

## Summary

This notebook demonstrated:
1. Loading the **YOLO-World** model for zero-shot object detection
2. Detecting custom classes like "door", "stairs", "obstacle"
3. Visualizing bounding boxes and labels
4. Analyzing detection statistics

**Key Advantages of YOLO-World:**
- Can detect objects not in COCO (doors, stairs, curbs)
- Flexible - just add class names
- Good for accessibility applications

**Trade-offs:**
- Slower than YOLOv8n (~100-200 ms vs. ~30 ms per image)
- Lower confidence scores (use a threshold of about 0.2)
- Larger model (26 MB vs. 6 MB)
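
To put the latency figures in terms of frame rate, here is a quick conversion. It is plain arithmetic on the numbers quoted above and ignores capture and drawing overhead; `fps_from_latency_ms` is a hypothetical helper.

```python
def fps_from_latency_ms(latency_ms):
    """Upper bound on frames per second for a given per-frame latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms

for label, latency_ms in [
    ("YOLO-World (slow end)", 200),
    ("YOLO-World (fast end)", 100),
    ("YOLOv8n", 30),
]:
    print(f"{label}: {latency_ms} ms -> {fps_from_latency_ms(latency_ms):.1f} FPS")
```

At 100-200 ms per frame the detector alone caps throughput at roughly 5-10 FPS, compared with about 33 FPS at 30 ms.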

**Next:** Run notebook `03_depth_estimation.ipynb` to add distance information.