# 🎨 RGBD-Depth: Real-time Depth Refinement Quickstart

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Aedelon/camera-depth-models/blob/main/quickstart_colab.ipynb)
[![PyPI](https://img.shields.io/pypi/v/rgbd-depth.svg)](https://pypi.org/project/rgbd-depth/)
[![GitHub](https://img.shields.io/github/stars/Aedelon/camera-depth-models.svg?style=social)](https://github.com/Aedelon/camera-depth-models)

Transform noisy depth camera data into clean, simulation-quality depth maps using Vision Transformers.

**This notebook:**
- ✅ Installs `rgbd-depth` from PyPI
- ✅ Downloads example RGB-D data
- ✅ Runs depth refinement in ~2s on Colab GPU
- ✅ Visualizes before/after comparison

**Use cases:** Robotics, AR/VR, 3D reconstruction, sim-to-real transfer

## 📦 Installation

Install the package from PyPI (takes ~30s):

In [None]:
!pip install -q rgbd-depth
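
To confirm which version the install step actually pulled in, the standard library can report it. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of `rgbd-depth`.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints None if the package is not installed in the current environment
print("rgbd-depth version:", installed_version("rgbd-depth"))
```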

## 🖼️ Download Example Data

We'll use real RGB-D data from an Intel RealSense D435 camera:

In [None]:
import os
import urllib.request

# Create example_data directory
os.makedirs("example_data", exist_ok=True)

# Base URL for example data
base_url = "https://raw.githubusercontent.com/Aedelon/camera-depth-models/main/example_data/"

# Download RGB and depth images
files = ["color_12.png", "depth_12.png"]
for filename in files:
    url = base_url + filename
    filepath = os.path.join("example_data", filename)
    print(f"Downloading {filename}...")
    urllib.request.urlretrieve(url, filepath)

print("✅ Example data downloaded!")
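
An interrupted or redirected download can leave a file that is not a valid image, which only surfaces later when `cv2.imread` returns `None`. A minimal sketch of a sanity check on the PNG file signature follows; the helper name `looks_like_png` is hypothetical and not part of `rgbd-depth`.

```python
from pathlib import Path

# The 8-byte signature that starts every PNG file (per the PNG specification)
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path):
    """Return True if the file exists and starts with the PNG signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```

In this notebook, one could call `looks_like_png("example_data/color_12.png")` and `looks_like_png("example_data/depth_12.png")` right after the download loop and re-run the cell if either returns `False`.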

## 🚀 Run Depth Refinement

Load the model and refine the depth map:

In [None]:
import cv2
import numpy as np
import torch
from rgbddepth import RGBDDepth

# Initialize model (downloads a ~300 MB checkpoint on first run)
print("Loading model...")
model = RGBDDepth(
    camera_type="d435",  # Intel RealSense D435
    device="auto",       # Auto-detect GPU/CPU
    use_xformers=True    # Enable optimizations if available
)

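# Load RGB and depth images (cv2.imread returns None if a file cannot be read)
rgb = cv2.imread("example_data/color_12.png")
depth_raw = cv2.imread("example_data/depth_12.png", cv2.IMREAD_UNCHANGED)
if rgb is None or depth_raw is None:
    raise FileNotFoundError("Could not read the example images; re-run the download cell.")

rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)  # BGR → RGB
depth_raw = depth_raw.astype(np.float32) / 1000.0  # mm → meters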

print(f"Input shape: RGB={rgb.shape}, Depth={depth_raw.shape}")

# Run inference
print("Running depth refinement...")
depth_refined = model(rgb, depth_raw)

print(f"✅ Refinement complete! Output shape: {depth_refined.shape}")
print(f"   Depth range: {depth_refined.min():.3f}m - {depth_refined.max():.3f}m")
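
The division by 1000 above assumes the depth PNG stores millimeters, and the later `depth > 0` checks imply that 0 marks a missing measurement. A small sketch that makes both assumptions explicit is shown below; the function name `depth_png_to_meters` and its `units_per_meter` parameter are hypothetical, and other cameras or export tools may use a different scale.

```python
import numpy as np


def depth_png_to_meters(depth_u16, units_per_meter=1000.0):
    """Convert a raw integer depth image to float32 meters plus a validity mask.

    Assumes 0 marks "no measurement" (an assumption taken from the notebook's
    `depth > 0` checks, not from the library documentation).
    """
    depth_u16 = np.asarray(depth_u16)
    if depth_u16.ndim != 2:
        raise ValueError(f"expected a single-channel depth image, got shape {depth_u16.shape}")
    depth_m = depth_u16.astype(np.float32) / np.float32(units_per_meter)
    valid = depth_u16 > 0
    return depth_m, valid


# Tiny synthetic example: 0 = missing, 500 = 0.5 m, 1500 = 1.5 m, 3000 = 3.0 m
raw = np.array([[0, 500], [1500, 3000]], dtype=np.uint16)
depth_m, valid = depth_png_to_meters(raw)
print(depth_m)
print(int(valid.sum()), "valid pixels")
```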

## 📊 Visualize Results

Compare raw vs refined depth:

In [None]:
import matplotlib.pyplot as plt

# Create visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# RGB input
axes[0].imshow(rgb)
axes[0].set_title("RGB Input", fontsize=14, fontweight="bold")
axes[0].axis("off")

# Raw depth (noisy)
im1 = axes[1].imshow(depth_raw, cmap="turbo", vmin=0, vmax=3)
axes[1].set_title("Raw Depth (Noisy)", fontsize=14, fontweight="bold")
axes[1].axis("off")
plt.colorbar(im1, ax=axes[1], fraction=0.046, pad=0.04, label="Depth (m)")

# Refined depth (clean)
im2 = axes[2].imshow(depth_refined, cmap="turbo", vmin=0, vmax=3)
axes[2].set_title("Refined Depth ✨", fontsize=14, fontweight="bold", color="green")
axes[2].axis("off")
plt.colorbar(im2, ax=axes[2], fraction=0.046, pad=0.04, label="Depth (m)")

plt.suptitle("RGBD-Depth: Real-time Depth Refinement", fontsize=16, fontweight="bold", y=0.98)
plt.tight_layout()
plt.show()

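# Print simple summary statistics over valid (non-zero) pixels.
# Note: the standard deviation of depth mixes scene structure with sensor noise,
# so this difference is only a rough indicator, not a true noise measurement.
std_change = np.std(depth_raw[depth_raw > 0]) - np.std(depth_refined[depth_refined > 0])
print(f"\n📉 Depth std change (raw - refined): {std_change:.4f}m (positive means less spread after refinement)")
print(f"📊 Valid pixels: Raw={np.sum(depth_raw > 0):,} | Refined={np.sum(depth_refined > 0):,}")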
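
A standard deviation taken over the whole image mixes scene geometry with sensor noise, so it says little about smoothness on its own. One rough alternative is to average the depth jump between neighboring valid pixels. The sketch below is an illustrative proxy, not a metric provided by `rgbd-depth`, and the name `local_roughness` is hypothetical.

```python
import numpy as np


def local_roughness(depth, valid=None):
    """Mean absolute depth difference between horizontally adjacent valid pixels (meters)."""
    depth = np.asarray(depth, dtype=np.float64)
    if valid is None:
        valid = depth > 0  # assumes 0 marks a missing measurement
    diff = np.abs(depth[:, 1:] - depth[:, :-1])
    both_valid = valid[:, 1:] & valid[:, :-1]
    if not both_valid.any():
        return float("nan")
    return float(diff[both_valid].mean())


# Synthetic check: a smooth ramp vs. the same ramp with added Gaussian noise
rng = np.random.default_rng(0)
ramp = np.tile(np.linspace(1.0, 2.0, 64), (48, 1))
noisy = ramp + rng.normal(0.0, 0.02, size=ramp.shape)
print(f"smooth: {local_roughness(ramp):.4f} m, noisy: {local_roughness(noisy):.4f} m")
```

With the notebook's arrays, comparing `local_roughness(depth_raw)` against `local_roughness(depth_refined)` gives a number that is less dominated by the scene's overall depth range, though real depth edges still contribute to both values.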

## 🎯 Next Steps

**Try different cameras:**
```python
model = RGBDDepth(camera_type="d405")  # RealSense D405
model = RGBDDepth(camera_type="l515")  # RealSense L515
model = RGBDDepth(camera_type="zed2i") # ZED 2i
model = RGBDDepth(camera_type="kinect_azure") # Azure Kinect
```

**Optimize for speed:**
```python
# Enable mixed precision (roughly 2× faster on GPU)
model = RGBDDepth(camera_type="d435", precision="fp16")
```
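
Half precision trades numeric resolution for speed. As a back-of-the-envelope illustration, the sketch below prints the gap between adjacent float16 values at a few typical depths. Whether `rgbd-depth` returns float16 output or casts back to float32 is not stated in this notebook, so treat this only as a bound on what pure float16 storage would imply; the helper `float16_step_mm` is hypothetical.

```python
import numpy as np


def float16_step_mm(depth_m):
    """Gap to the next representable float16 value at this depth, in millimeters."""
    return float(np.spacing(np.float16(depth_m))) * 1000.0


for d in (0.5, 1.0, 3.0, 10.0):
    print(f"depth {d:5.1f} m -> float16 step of about {float16_step_mm(d):.2f} mm")
```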

**Use your own data:**
```python
# Load your RGB-D pair
rgb = cv2.imread("your_rgb.png")
rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
depth = cv2.imread("your_depth.png", cv2.IMREAD_UNCHANGED).astype(np.float32) / 1000.0

# Refine
depth_refined = model(rgb, depth)
```
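
To store a refined map in the same format as the input (16-bit PNG in millimeters), the float depth has to be converted back. A minimal sketch, assuming millimeter units and 0 as the invalid marker; `meters_to_uint16_mm` is a hypothetical helper, not part of `rgbd-depth`.

```python
import numpy as np


def meters_to_uint16_mm(depth_m):
    """Convert float depth in meters to uint16 millimeters (0 stays 0 = invalid)."""
    depth_m = np.nan_to_num(
        np.asarray(depth_m, dtype=np.float32), nan=0.0, posinf=0.0, neginf=0.0
    )
    depth_mm = np.rint(depth_m * 1000.0)
    # uint16 tops out at 65535 mm (about 65.5 m); clip instead of wrapping around
    return np.clip(depth_mm, 0, 65535).astype(np.uint16)


refined = np.array([[0.0, 0.5004], [1.2349, 70.0]], dtype=np.float32)
depth_png = meters_to_uint16_mm(refined)
print(depth_png)
# With OpenCV available, the array can then be written as a 16-bit PNG:
# cv2.imwrite("your_depth_refined.png", depth_png)
```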

**Learn more:**
- 📖 [GitHub Repository](https://github.com/Aedelon/camera-depth-models)
- 📦 [PyPI Package](https://pypi.org/project/rgbd-depth/)
- 🎮 [HuggingFace Spaces Demo](https://huggingface.co/spaces/Aedelon/rgbd-depth)
- 📄 [Original Paper](https://manipulation-as-in-simulation.github.io/)

---

**Found this useful?** ⭐ Star the repo on GitHub!

## ⚡ Performance Benchmark (Optional)

Measure inference speed on this Colab instance:

In [None]:
import time

# Warmup
for _ in range(3):
    _ = model(rgb, depth_raw)

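# Benchmark (synchronize before and after each call so queued GPU work is fully timed)
use_cuda = torch.cuda.is_available()
n_runs = 10
times = []
for _ in range(n_runs):
    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    _ = model(rgb, depth_raw)
    if use_cuda:
        torch.cuda.synchronize()
    times.append(time.perf_counter() - start)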

mean_time = np.mean(times)
std_time = np.std(times)
fps = 1.0 / mean_time

device_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"

print(f"\n⚡ Benchmark Results ({device_name}):")
print(f"   Mean time: {mean_time*1000:.1f} ± {std_time*1000:.1f} ms")
print(f"   Throughput: {fps:.2f} FPS")
print("\n   Reference (NVIDIA RTX 3090):")
print("   - FP32: ~950ms (1.05 FPS)")
print("   - FP16: ~520ms (1.92 FPS)")
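
For repeated measurements (for example, comparing FP32 and FP16), the timing pattern can be wrapped in a small reusable helper. The sketch below uses only the standard library and reports the median, which is less sensitive to occasional slow runs than the mean. The `benchmark` function and its `sync` parameter are hypothetical names for illustration.

```python
import statistics
import time


def benchmark(fn, n_warmup=3, n_runs=10, sync=None):
    """Time `fn()`; `sync` is an optional callable that waits for pending device work."""
    for _ in range(n_warmup):
        fn()
    times_ms = []
    for _ in range(n_runs):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(times_ms),
        "min_ms": min(times_ms),
        "max_ms": max(times_ms),
        "runs": len(times_ms),
    }


# Stand-in workload so the sketch runs anywhere; in this notebook one would pass
# fn=lambda: model(rgb, depth_raw) and sync=torch.cuda.synchronize on a CUDA runtime.
stats = benchmark(lambda: sum(range(10_000)), n_warmup=1, n_runs=5)
print(stats)
```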