# Notebook 4: Fusion Pipeline - Detection + Depth

This notebook combines **YOLO-World** detection with **Depth Anything V2** depth estimation
into an obstacle-awareness pipeline that reports each detected object's distance, position, and priority.

## Pipeline Flow
```
Frame → YOLO-World → Detections (door, person, stairs, etc.)
                  ↘
                    → Fusion Engine → Obstacles with distances
                  ↗
Frame → Depth Anything V2 → Depth Map
```

## What You'll Learn
- Combining detection bounding boxes with depth maps
- Calculating distance to detected objects
- Priority-based obstacle ranking
- Position awareness (left/center/right)

In [None]:
import sys
sys.path.insert(0, '..')

import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import time

from src.config import Config
from src.detector import Detector
from src.depth import DepthEstimator
from src.fusion import FusionEngine, Obstacle
from src.utils import Timer, create_side_by_side

print("Imports successful!")

## Load All Models

In [None]:
from src.config import Config, DetectionConfig, DepthConfig, FusionConfig

# Use YOLO-World for custom class detection
detection_config = DetectionConfig(
    model="yolov8s-world.pt",
    confidence=0.2,
    classes=[
        "door", "person", "chair", "table", "stairs",
        "wall", "window", "car", "bicycle", "obstacle"
    ]
)

# Depth config
depth_config = DepthConfig(model="vits")

# Fusion config
fusion_config = FusionConfig(
    danger_zone=1.5,
    warning_zone=3.0
)

print("Loading YOLO-World...")
detector = Detector(detection_config)
detector.load()
print(f"  Classes: {detector.class_names}")

print("\nLoading Depth Anything V2...")
depth_estimator = DepthEstimator(depth_config)
depth_estimator.load()
print("  Done!")

fusion = FusionEngine(fusion_config)
print("\nAll models loaded!")

## Load Test Images

In [None]:
captures_dir = Path("../data/captures")
samples_dir = Path("../data/sample_images")
results_dir = Path("../data/results")
results_dir.mkdir(parents=True, exist_ok=True)

# Sort so that image_files[0] is the same file on every run (glob order is not guaranteed)
image_files = sorted(captures_dir.glob("*.jpg")) + sorted(samples_dir.glob("*.jpg"))
print(f"Found {len(image_files)} images")

## Process Single Image Through Full Pipeline

In [None]:
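# Initialize so later cells still run if no image is found or loading fails
frame = depth_map = test_path = None
detections, obstacles = [], []

if image_files:
    test_path = image_files[0]
    frame = cv2.imread(str(test_path))  # cv2.imread returns None for unreadable files

if frame is not None:
    print(f"Processing: {test_path.name}")
    print(f"Shape: {frame.shape}")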
    
    det_timer = Timer("detection")
    det_timer.start()
    detections = detector.detect(frame)
    det_time = det_timer.stop()
    print(f"\n1. Detection: {len(detections)} objects in {det_time*1000:.1f}ms")
    
    depth_timer = Timer("depth")
    depth_timer.start()
    depth_map = depth_estimator.estimate(frame)
    depth_time = depth_timer.stop()
    print(f"2. Depth: completed in {depth_time*1000:.1f}ms")
    
    fusion_timer = Timer("fusion")
    fusion_timer.start()
    obstacles = fusion.process(detections, depth_map, frame.shape[1])
    fusion_time = fusion_timer.stop()
    print(f"3. Fusion: {len(obstacles)} obstacles in {fusion_time*1000:.1f}ms")
    
    total_time = det_time + depth_time + fusion_time
    print(f"\nTotal pipeline: {total_time*1000:.1f}ms ({1/total_time:.1f} FPS)")

## Obstacle Details

In [None]:
if obstacles:
    print("\nDetected Obstacles (sorted by priority):")
    print("=" * 60)
    
    for i, obs in enumerate(obstacles):
        danger = "DANGER" if obs.is_danger else ("WARNING" if obs.is_warning else "")
        print(f"\n{i+1}. {obs.class_name.upper()}")
        print(f"   Distance: {obs.distance:.2f}m {danger}")
        print(f"   Position: {obs.position}")
        print(f"   Confidence: {obs.confidence:.2f}")
        print(f"   Priority: {obs.priority}")
        print(f"   Alert: '{obs.to_alert_text()}'")
else:
    print("No obstacles detected")

## Visualize Full Pipeline Results

In [None]:
if frame is not None and depth_map is not None:
    detection_frame = detector.draw_detections(frame.copy(), detections)
    fusion_frame = fusion.draw_obstacles(frame.copy(), obstacles)
    depth_colored = depth_estimator.colorize(depth_map)
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    axes[0, 0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    axes[0, 0].set_title("1. Original Image")
    axes[0, 0].axis('off')
    
    axes[0, 1].imshow(cv2.cvtColor(detection_frame, cv2.COLOR_BGR2RGB))
    axes[0, 1].set_title(f"2. YOLO-World Detection ({len(detections)} objects)")
    axes[0, 1].axis('off')
    
    axes[1, 0].imshow(cv2.cvtColor(depth_colored, cv2.COLOR_BGR2RGB))
    axes[1, 0].set_title("3. Depth Estimation")
    axes[1, 0].axis('off')
    
    axes[1, 1].imshow(cv2.cvtColor(fusion_frame, cv2.COLOR_BGR2RGB))
    axes[1, 1].set_title(f"4. Fusion Result ({len(obstacles)} obstacles)")
    axes[1, 1].axis('off')
    
    plt.suptitle(f"Smart Aid Pipeline - {test_path.name}", fontsize=14)
    plt.tight_layout()
    plt.savefig(results_dir / f"pipeline_{test_path.stem}.jpg", dpi=150)
    plt.show()
else:
    print("No frame or depth map available - check image loading and the depth estimator")

## Batch Process All Images

In [None]:
all_results = []

for img_path in image_files:
    frame = cv2.imread(str(img_path))
    if frame is None:
        continue
    
    start = time.time()
    
    detections = detector.detect(frame)
    depth_map = depth_estimator.estimate(frame)
    obstacles = fusion.process(detections, depth_map, frame.shape[1])
    
    elapsed = time.time() - start
    
    all_results.append({
        'filename': img_path.name,
        'frame': frame,
        'detections': detections,
        'depth_map': depth_map,
        'obstacles': obstacles,
        'time_ms': elapsed * 1000,
        'fps': 1 / elapsed
    })

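print(f"Processed {len(all_results)} images")
if all_results:
    # Throughput = frames / total time; averaging per-frame FPS would overweight fast frames
    total_s = sum(r['time_ms'] for r in all_results) / 1000
    avg_fps = len(all_results) / total_s
    print(f"Average FPS: {avg_fps:.1f}")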

## Summary Statistics

In [None]:
from collections import Counter

all_obstacles = []
for result in all_results:
    all_obstacles.extend(result['obstacles'])

print(f"Total obstacles detected: {len(all_obstacles)}")

if all_obstacles:
    classes = Counter([o.class_name for o in all_obstacles])
    positions = Counter([o.position for o in all_obstacles])
    distances = [o.distance for o in all_obstacles]
    
    danger_count = sum(1 for o in all_obstacles if o.is_danger)
    warning_count = sum(1 for o in all_obstacles if o.is_warning and not o.is_danger)
    
    print(f"\nDanger zone (<1.5m): {danger_count}")
    print(f"Warning zone (1.5-3m): {warning_count}")
    print(f"Safe (>=3m): {len(all_obstacles) - danger_count - warning_count}")
    
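    # Class counts were computed above but never reported
    print("\nClass distribution:")
    for name, count in classes.most_common():
        print(f"  {name}: {count}")

    print("\nPosition distribution:")
    for pos, count in positions.most_common():
        print(f"  {pos}: {count}")

    print("\nDistance statistics:")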
    print(f"  Min: {min(distances):.2f}m")
    print(f"  Max: {max(distances):.2f}m")
    print(f"  Mean: {np.mean(distances):.2f}m")

## Save All Pipeline Results

In [None]:
for result in all_results:
    frame = result['frame']
    obstacles = result['obstacles']
    depth_map = result['depth_map']
    filename = result['filename']
    
    # Draw on a copy so the stored original frame is not modified
    fusion_frame = fusion.draw_obstacles(frame.copy(), obstacles)
    depth_colored = depth_estimator.colorize(depth_map)
    combined = create_side_by_side(fusion_frame, depth_colored, 
                                    ("Obstacles", "Depth"))
    
    output_path = results_dir / f"fusion_{Path(filename).stem}.jpg"
    cv2.imwrite(str(output_path), combined)

print(f"Saved {len(all_results)} fusion results to {results_dir}")

## Understanding the Fusion Logic

### Priority Calculation
```
priority = distance_score + class_score + confidence_score

distance_score:
  < 1.5m (danger)  → 100
  < 3.0m (warning) → 50
  else             → 10

class_score:
  person, car, bicycle → +20 to +30
  furniture            → +10
  other                → 0

confidence_score: confidence × 10
```
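
The scoring rules above can be sketched in Python. This is a minimal illustration, not the actual `FusionEngine` code: the function name `priority_score`, the `CLASS_SCORES` table, and the exact per-class bonuses are assumptions (the rules only give a +20 to +30 range for people and vehicles).

```python
# Hypothetical sketch of the priority rules; not the actual src.fusion code.
DANGER_ZONE_M = 1.5   # same value as FusionConfig(danger_zone=1.5)
WARNING_ZONE_M = 3.0  # same value as FusionConfig(warning_zone=3.0)

# Assumed per-class bonuses: the exact values within the stated ranges
# are illustrative only.
CLASS_SCORES = {
    "person": 30, "car": 30, "bicycle": 20,
    "chair": 10, "table": 10,
}


def priority_score(class_name: str, distance_m: float, confidence: float) -> float:
    """Return distance_score + class_score + confidence_score."""
    if distance_m < DANGER_ZONE_M:
        distance_score = 100
    elif distance_m < WARNING_ZONE_M:
        distance_score = 50
    else:
        distance_score = 10
    class_score = CLASS_SCORES.get(class_name, 0)  # unknown classes add 0
    confidence_score = confidence * 10
    return distance_score + class_score + confidence_score
```

Under these assumed values, a person 1 m away at confidence 0.5 scores 100 + 30 + 5 = 135, while a door 5 m away at confidence 0.2 scores 10 + 0 + 2 = 12. The distance term dominates by design, so a nearby low-confidence object still outranks a distant high-confidence one.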

### Position Detection
- **Left**: center_x < 33% of frame width
- **Center**: center_x between 33% and 67% of frame width
- **Right**: center_x > 67% of frame width
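
The thresholds above amount to a three-way split on the normalized x coordinate of the bounding-box center. The sketch below assumes a hypothetical helper named `horizontal_position` and lowercase string labels; the values actually stored in `Obstacle.position` may differ.

```python
def horizontal_position(center_x: float, frame_width: int) -> str:
    """Classify a bbox center as left, center, or right of the frame.

    Hypothetical helper; the label strings are an assumption.
    """
    ratio = center_x / frame_width
    if ratio < 0.33:
        return "left"
    if ratio > 0.67:
        return "right"
    return "center"


# Example with a 640-pixel-wide frame
for x in (100, 320, 600):
    print(x, horizontal_position(x, 640))
```

This is why `fusion.process` receives `frame.shape[1]`: the frame width is needed to normalize the box center.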

### Distance Estimation
1. Scale detection bbox to depth map resolution
2. Extract center region of bbox
3. Average depth values in region
4. Convert to estimated meters
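
The four steps above can be sketched with NumPy. This is an illustration under stated assumptions, not the code in `src/fusion.py`: the function name `estimate_distance`, the `center_frac` parameter (how much of the box counts as the "center region"), and the `to_meters` callable are all hypothetical. The notebook does not say how depth values are mapped to meters, so that step is left as a caller-supplied function that defaults to the identity.

```python
import numpy as np


def estimate_distance(bbox, frame_shape, depth_map, center_frac=0.5,
                      to_meters=lambda d: d):
    """Mean depth over the central part of a bbox, mapped through to_meters.

    bbox is (x1, y1, x2, y2) in frame pixels; frame_shape is (height, width, ...).
    """
    frame_h, frame_w = frame_shape[:2]
    depth_h, depth_w = depth_map.shape[:2]

    # 1. Scale the bbox from frame resolution to depth-map resolution.
    sx, sy = depth_w / frame_w, depth_h / frame_h
    x1, y1, x2, y2 = bbox
    x1, x2 = x1 * sx, x2 * sx
    y1, y2 = y1 * sy, y2 * sy

    # 2. Keep only the central center_frac of the box in each dimension.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * center_frac / 2
    half_h = (y2 - y1) * center_frac / 2
    left = int(np.clip(cx - half_w, 0, depth_w - 1))
    top = int(np.clip(cy - half_h, 0, depth_h - 1))
    right = int(np.clip(cx + half_w, left + 1, depth_w))    # at least 1 px wide
    bottom = int(np.clip(cy + half_h, top + 1, depth_h))    # at least 1 px tall

    # 3. Average the depth values in that region.
    mean_depth = float(depth_map[top:bottom, left:right].mean())

    # 4. Convert to meters (placeholder mapping supplied by the caller).
    return to_meters(mean_depth)
```

Sampling only the center of the box is a common way to reduce contamination from background pixels near the box edges, which would otherwise bias the average toward the scene behind the object.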

## Summary

This notebook demonstrated the complete fusion pipeline:
1. **YOLO-World detection** → Which objects are present (including doors and stairs)
2. **Depth estimation** → How far away they are
3. **Fusion** → Which obstacles matter most

**Key Advantages:**
- Open-vocabulary detection covers classes that are not in COCO, such as doors, stairs, and curbs
- A single RGB camera supplies both detection and depth
- Priority-based alerting surfaces the most critical obstacles first

**Performance (MacBook M1/M2):**
- YOLO-World: ~100-200ms
- Depth: ~800-1200ms (CPU)
- Fusion: ~1ms
- Total: ~1-1.5s per frame
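
As a quick check on these figures, the short calculation below sums the per-stage ranges and converts them to throughput. The numbers are copied from the list above; the variable names are illustrative.

```python
# Per-stage latency ranges in milliseconds, copied from the list above.
stage_ms = {
    "detection": (100, 200),
    "depth": (800, 1200),
    "fusion": (1, 1),
}

best_ms = sum(low for low, _ in stage_ms.values())
worst_ms = sum(high for _, high in stage_ms.values())

# Throughput is the reciprocal of per-frame latency.
fps_best = 1000 / best_ms
fps_worst = 1000 / worst_ms

# Fraction of the total time spent in depth estimation.
depth_share_best = stage_ms["depth"][0] / best_ms
depth_share_worst = stage_ms["depth"][1] / worst_ms

print(f"Total: {best_ms}-{worst_ms} ms per frame")
print(f"Throughput: {fps_worst:.2f}-{fps_best:.2f} FPS")
print(f"Depth share: {depth_share_worst:.0%}-{depth_share_best:.0%}")
```

Depth estimation accounts for well over 80% of the per-frame time in both cases, so it is the stage where optimization would pay off most.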

**Next:** Run notebook `05_full_analysis.ipynb` for thesis-ready analysis.