# Notebook 3: Depth Estimation with Depth Anything V2

This notebook demonstrates monocular depth estimation using Depth Anything V2.

## What You'll Learn
- How monocular depth estimation works
- Loading and using Depth Anything V2
- Visualizing depth maps
- Extracting distance information

In [None]:
import sys
sys.path.insert(0, '..')

import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import time

from src.config import DepthConfig
from src.depth import DepthEstimator
from src.utils import Timer, create_side_by_side

print("Imports successful!")

## Load Depth Anything V2 Model

Available models:
- `vits` (Small): 24M params, fastest
- `vitb` (Base): 97M params, balanced
- `vitl` (Large): 335M params, most accurate

In [None]:
config = DepthConfig(
    model="vits",  # Use small model for speed
    max_distance=10.0,
    colormap="inferno"
)

depth_estimator = DepthEstimator(config)

print("Loading Depth Anything V2 model...")
print("(The first load downloads the model from Hugging Face)")
start = time.time()
if depth_estimator.load():
    print(f"Model loaded in {time.time()-start:.2f}s")
else:
    print("Failed to load model")

## Load Test Images

In [None]:
captures_dir = Path("../data/captures")
samples_dir = Path("../data/sample_images")
results_dir = Path("../data/results")
results_dir.mkdir(parents=True, exist_ok=True)

image_files = list(captures_dir.glob("*.jpg")) + list(samples_dir.glob("*.jpg"))

if not image_files:
    print("No images found. Creating test image with depth variation...")
    test_img = np.zeros((480, 640, 3), dtype=np.uint8)
    
    for i in range(6):
        x = 50 + i * 100
        color = int(255 - i * 40)
        cv2.rectangle(test_img, (x, 100), (x+80, 380), (color, color, color), -1)
    
    samples_dir.mkdir(parents=True, exist_ok=True)
    cv2.imwrite(str(samples_dir / "depth_test.jpg"), test_img)
    image_files = [samples_dir / "depth_test.jpg"]

print(f"Found {len(image_files)} images")

## Run Depth Estimation on Single Image

In [None]:
test_image_path = image_files[0]
print(f"Processing: {test_image_path.name}")

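frame = cv2.imread(str(test_image_path))
# cv2.imread returns None instead of raising when the file cannot be decoded
if frame is None:
    raise FileNotFoundError(f"Could not read image: {test_image_path}")
print(f"Image shape: {frame.shape}")

timer = Timer("depth")
timer.start()
depth_map = depth_estimator.estimate(frame)
inference_time = timer.stop()

# estimate() may return None (the batch loop later in this notebook checks for it)
if depth_map is None:
    raise RuntimeError("Depth estimation failed; check that the model loaded")

print(f"\nInference time: {inference_time*1000:.1f}ms")
print(f"Depth map shape: {depth_map.shape}")
print(f"Depth range: [{depth_map.min():.3f}, {depth_map.max():.3f}]")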

## Visualize Depth Map

In [None]:
depth_colored = depth_estimator.colorize(depth_map)

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

axes[0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
axes[0].set_title("Original Image")
axes[0].axis('off')

im = axes[1].imshow(depth_map, cmap='inferno')
axes[1].set_title("Depth Map (Raw)")
axes[1].axis('off')
plt.colorbar(im, ax=axes[1], fraction=0.046, pad=0.04)

axes[2].imshow(cv2.cvtColor(depth_colored, cv2.COLOR_BGR2RGB))
axes[2].set_title("Depth Map (Colored)")
axes[2].axis('off')

plt.tight_layout()
plt.savefig(results_dir / f"depth_{test_image_path.stem}.jpg", dpi=150)
plt.show()

## Different Colormaps

In [None]:
colormaps = ['inferno', 'jet', 'viridis', 'plasma', 'magma', 'turbo']

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for ax, cmap in zip(axes, colormaps):
    colored = depth_estimator.colorize(depth_map, cmap)
    ax.imshow(cv2.cvtColor(colored, cv2.COLOR_BGR2RGB))
    ax.set_title(f"Colormap: {cmap}")
    ax.axis('off')

plt.tight_layout()
plt.savefig(results_dir / "depth_colormaps.jpg", dpi=150)
plt.show()

## Extract Distance at Specific Points

In [None]:
h, w = frame.shape[:2]
points = [
    (w//4, h//2, "Left"),
    (w//2, h//2, "Center"),
    (3*w//4, h//2, "Right"),
    (w//2, h//4, "Top"),
    (w//2, 3*h//4, "Bottom"),
]

print("Estimated distances at various points:")
print("-" * 40)

vis_frame = frame.copy()

for x, y, name in points:
    distance = depth_estimator.get_distance_at(depth_map, x, y)
    print(f"  {name:8s} ({x:3d}, {y:3d}): {distance:.2f}m")
    
    cv2.circle(vis_frame, (x, y), 10, (0, 255, 0), -1)
    cv2.putText(vis_frame, f"{distance:.1f}m", (x+15, y+5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

## Visualize Distance Points

In [None]:
combined = create_side_by_side(vis_frame, depth_colored, 
                                ("Distance Points", "Depth Map"))

plt.figure(figsize=(14, 5))
plt.imshow(cv2.cvtColor(combined, cv2.COLOR_BGR2RGB))
plt.title("Distance Estimation Results")
plt.axis('off')
plt.savefig(results_dir / "depth_with_distances.jpg", dpi=150)
plt.show()

## Process All Images

In [None]:
all_results = []
total_time = 0

for img_path in image_files:
    frame = cv2.imread(str(img_path))
    if frame is None:
        continue
    
    timer = Timer()
    timer.start()
    depth_map = depth_estimator.estimate(frame)
    elapsed = timer.stop()
    total_time += elapsed
    
    if depth_map is not None:
        all_results.append({
            'filename': img_path.name,
            'depth_map': depth_map,
            'time_ms': elapsed * 1000,
            'frame': frame
        })

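print(f"Processed {len(all_results)} images")
print(f"Total time: {total_time:.2f}s")
# Guard against division by zero when no image could be processed
if all_results:
    print(f"Average time: {total_time/len(all_results)*1000:.1f}ms per image")
else:
    print("No images were processed; check the input directories")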

## Save All Depth Results

In [None]:
for result in all_results:
    frame = result['frame']
    depth_map = result['depth_map']
    filename = result['filename']
    
    depth_colored = depth_estimator.colorize(depth_map)
    combined = create_side_by_side(frame, depth_colored, ("Original", "Depth"))
    
    output_path = results_dir / f"depth_{Path(filename).stem}.jpg"
    cv2.imwrite(str(output_path), combined)

print(f"Saved {len(all_results)} depth results to {results_dir}")

## Understanding Depth Anything V2

### How It Works
1. **Monocular Depth**: Estimates depth from a SINGLE RGB image
2. **Relative Depth**: Output is normalized to [0, 1], not absolute meters, so the meter values printed earlier are rough estimates
3. **Deep Learning**: Uses Vision Transformer (ViT) backbone
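
Point 2 can be illustrated with a minimal sketch of min-max normalization. It assumes the raw network output is a 2-D float array. `normalize_depth` is a hypothetical helper written for this illustration; it is not part of `DepthEstimator`, whose internals this notebook does not show.

```python
import numpy as np

def normalize_depth(raw: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale an arbitrary-range depth prediction to [0, 1] (hypothetical helper)."""
    raw = raw.astype(np.float32)
    lo, hi = float(raw.min()), float(raw.max())
    if hi - lo < eps:
        # A constant map carries no depth ordering; avoid dividing by ~0.
        return np.zeros_like(raw)
    return (raw - lo) / (hi - lo)

# Synthetic stand-in for a raw prediction
raw_prediction = np.array([[2.0, 4.0], [6.0, 10.0]])
relative = normalize_depth(raw_prediction)
print(relative)
```

Because the scale is set per image by its own minimum and maximum, the same value in two different images does not necessarily mean the same physical distance.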

### Key Points
- **Brighter = Closer** in the depth map
- **Darker = Farther** in the depth map
- Distance is estimated, not measured (no depth sensor)
- Works best with good lighting and contrast
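
To make the "brighter = closer" convention concrete, here is a rough sketch that turns a relative depth value into an approximate distance. It assumes values in [0, 1] with 1 meaning closest, and a simple linear mapping onto `max_distance`. Both the linear mapping and the `approx_distance_at` helper are illustrative assumptions; `DepthEstimator.get_distance_at` may be implemented differently.

```python
import numpy as np

def approx_distance_at(depth_map: np.ndarray, x: int, y: int,
                       max_distance: float = 10.0, patch: int = 2) -> float:
    """Rough distance in meters at pixel (x, y) using a hypothetical linear mapping.

    Assumes depth_map holds relative depth in [0, 1] with 1 = closest.
    """
    h, w = depth_map.shape[:2]
    # Take the median over a small window so one noisy pixel does not dominate.
    y0, y1 = max(0, y - patch), min(h, y + patch + 1)
    x0, x1 = max(0, x - patch), min(w, x + patch + 1)
    rel = float(np.median(depth_map[y0:y1, x0:x1]))
    return (1.0 - rel) * max_distance

# Synthetic map: left half near (0.9), right half far (0.2)
demo = np.full((100, 200), 0.2, dtype=np.float32)
demo[:, :100] = 0.9
near = approx_distance_at(demo, 50, 50)
far = approx_distance_at(demo, 150, 50)
print(f"near: {near:.1f} m, far: {far:.1f} m")
```

The result is only as good as the assumed mapping: without a calibration against known distances, treat the numbers as an ordering aid rather than a measurement.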

### For Smart Aid
- Combined with object detection → know WHAT is nearby
- No extra hardware needed (vs stereo cameras, LiDAR)
- Good enough for coarse obstacle avoidance, where knowing what is closest matters more than exact distance
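
The first bullet can be sketched as follows: given a relative depth map and a list of detected boxes, pick the detection whose region looks closest. The detection format `(label, (x1, y1, x2, y2))` and the `nearest_object` helper are hypothetical and chosen only for illustration; the project's actual fusion code may look different.

```python
import numpy as np

def nearest_object(depth_map, detections):
    """Return (label, score) for the detection with the highest median relative depth.

    detections: list of (label, (x1, y1, x2, y2)) in pixel coordinates (hypothetical format).
    Assumes higher depth values mean closer.
    """
    best_label, best_score = None, -1.0
    for label, (x1, y1, x2, y2) in detections:
        region = depth_map[y1:y2, x1:x2]
        if region.size == 0:
            continue  # skip degenerate boxes
        score = float(np.median(region))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Synthetic scene: far background with two objects at different depths
depth = np.full((120, 160), 0.1, dtype=np.float32)
depth[40:80, 20:60] = 0.8     # a near object
depth[30:60, 100:140] = 0.4   # a farther one
detections = [("chair", (20, 40, 60, 80)), ("door", (100, 30, 140, 60))]
label, score = nearest_object(depth, detections)
print(label, round(score, 2))
```

The median is used instead of the mean so that background pixels inside a loose bounding box have less influence on the score.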

## Summary

This notebook demonstrated:
1. Loading Depth Anything V2 model
2. Running monocular depth estimation
3. Visualizing depth maps with various colormaps
4. Extracting estimated distances

**Key Metrics:**
- Model: Depth Anything V2 Small (24M params)
- Inference time: ~60-100 ms per image on a MacBook (varies with hardware and image size)
- Output: Relative depth map [0, 1]
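
To reproduce an inference-time figure on other hardware, a small timing harness along these lines may help. It is a sketch using only the standard library; `measure_latency_ms` is a hypothetical helper, and the lambda is a stand-in workload. In this notebook, `fn` would be `depth_estimator.estimate` and `arg` a loaded frame.

```python
import statistics
import time

def measure_latency_ms(fn, arg, warmup=2, runs=10):
    """Median wall-clock latency of fn(arg) in milliseconds, plus the raw samples."""
    for _ in range(warmup):
        fn(arg)  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), samples

# Stand-in workload so the sketch runs without a model
median_ms, samples = measure_latency_ms(lambda n: sum(range(n)), 100_000)
print(f"median: {median_ms:.2f} ms over {len(samples)} runs")
```

Reporting the median after a few warm-up calls keeps one-off costs, such as lazy initialization on the first inference, from skewing the number.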

**Next:** Run notebook 04_fusion_pipeline.ipynb to combine detection + depth