# Awesome Depth Anything 3 - Interactive Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Aedelon/awesome-depth-anything-3/blob/main/notebooks/da3_tutorial.ipynb)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-blue)](https://github.com/Aedelon/awesome-depth-anything-3)

This notebook demonstrates **Depth Anything 3**, a state-of-the-art model for:
- 🌊 **Monocular Depth Estimation** - Depth maps from single images
- 📷 **Camera Pose Estimation** - Extrinsics and intrinsics from multiple views
- ☁️ **Point Cloud Reconstruction** - 3D visualization with cameras
- 🎥 **Novel View Synthesis** - 3D Gaussian Splatting (optional)

---

## ⚠️ GPU Required

Go to **Runtime → Change runtime type → GPU** before running!


## 1. Setup & Installation

In [None]:
# @title Install awesome-depth-anything-3 { display-mode: "form" }
# @markdown This will install the package and its dependencies (~2-3 minutes)

!pip install -q awesome-depth-anything-3

# Verify GPU is available
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"✓ PyTorch {torch.__version__}")
print(f"✓ Device: {device}")
if device == "cuda":
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")
else:
    print("⚠️ No GPU detected! Go to Runtime → Change runtime type → GPU")

## 2. Quick Start - Single Image Depth

In [None]:
# @title Load the model { display-mode: "form" }
# @markdown Choose your model size (larger = more accurate but slower)

model_name = "DA3-LARGE"  # @param ["DA3-SMALL", "DA3-BASE", "DA3-LARGE", "DA3-GIANT"]

from depth_anything_3.api import DepthAnything3

print(f"Loading {model_name}...")
model = DepthAnything3.from_pretrained(f"depth-anything/{model_name}")
model = model.to(device)
print(f"✓ Model loaded on {device}")

In [None]:
# @title Download sample image { display-mode: "form" }

!wget -q https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/1280px-Camponotus_flavomarginatus_ant.jpg -O sample.jpg

from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("sample.jpg")
plt.figure(figsize=(10, 6))
plt.imshow(img)
plt.title("Input Image")
plt.axis("off")
plt.show()
print(f"Image size: {img.size}")

In [None]:
# @title Run depth estimation { display-mode: "form" }

# Run inference
result = model.inference(["sample.jpg"])

print("✓ Inference complete!")
print(f"  Depth shape: {result.depth.shape}")
print(f"  Confidence shape: {result.conf.shape}")
print(f"  Extrinsics shape: {result.extrinsics.shape}")
print(f"  Intrinsics shape: {result.intrinsics.shape}")

In [None]:
# @title Visualize depth map { display-mode: "form" }

import numpy as np
from depth_anything_3.utils.io import visualize_depth

# Get depth map
depth = result.depth[0]  # First (only) image

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Original image
axes[0].imshow(result.processed_images[0])
axes[0].set_title("Input Image")
axes[0].axis("off")

# Depth map (colorized)
depth_colored = visualize_depth(depth, colormap="Spectral")
axes[1].imshow(depth_colored)
axes[1].set_title(f"Depth Map (min={depth.min():.2f}, max={depth.max():.2f})")
axes[1].axis("off")

plt.tight_layout()
plt.show()
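`visualize_depth` takes care of colorization for you. If you want a colormap-ready array for your own plots, a minimal sketch is to clip depth to robust percentiles and rescale it to [0, 1]. `normalize_depth` and its 2nd/98th percentile defaults are our own assumptions for illustration, not the library's implementation.

```python
import numpy as np


def normalize_depth(depth, lo_pct=2.0, hi_pct=98.0):
    """Map a depth map to [0, 1] using robust percentile bounds.

    Hypothetical helper for illustration; not part of depth_anything_3.
    """
    depth = np.asarray(depth, dtype=np.float32)
    valid = np.isfinite(depth)
    lo, hi = np.percentile(depth[valid], [lo_pct, hi_pct])
    # Clip outliers, then rescale linearly; guard against a constant map
    scaled = (np.clip(depth, lo, hi) - lo) / max(hi - lo, 1e-8)
    # Non-finite pixels (NaN/inf) become 0 so the result is safe to display
    return np.where(valid, scaled, 0.0).astype(np.float32)


# Usage with the notebook's `depth` array:
# plt.imshow(normalize_depth(depth), cmap="Spectral")
```

Percentile clipping keeps a few extreme pixels (sky, reflections) from compressing the rest of the scene into a narrow color band.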

## 3. Understanding the Outputs

The `result` object contains several useful attributes:

| Attribute | Shape | Description |
|-----------|-------|-------------|
| `depth` | `[N, H, W]` | Depth maps (in relative or metric units) |
| `conf` | `[N, H, W]` | Per-pixel confidence maps (higher = more reliable) |
| `extrinsics` | `[N, 3, 4]` | Camera extrinsics (world-to-camera, OpenCV format) |
| `intrinsics` | `[N, 3, 3]` | Camera intrinsics (focal length, principal point) |
| `processed_images` | `[N, H, W, 3]` | Resized input images (uint8) |

Here `N` is the number of input images, and `H`, `W` are the height and width of the processed (resized) images.
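Because `extrinsics` are world-to-camera, getting camera positions (for example, to plot a trajectory) requires inverting them. A minimal sketch, assuming the OpenCV convention `x_cam = R @ x_world + t` stated in the table; `w2c_to_c2w` is a hypothetical helper, not a library function:

```python
import numpy as np


def w2c_to_c2w(extrinsics):
    """Invert [N, 3, 4] world-to-camera matrices into [N, 4, 4] camera-to-world poses.

    Assumes x_cam = R @ x_world + t (OpenCV convention).
    """
    ext = np.asarray(extrinsics, dtype=np.float64)
    R, t = ext[:, :, :3], ext[:, :, 3]
    R_inv = np.transpose(R, (0, 2, 1))  # inverse of a rotation is its transpose
    c2w = np.tile(np.eye(4), (ext.shape[0], 1, 1))
    c2w[:, :3, :3] = R_inv
    # Camera center in world coordinates: C = -R^T t
    c2w[:, :3, 3] = -np.einsum("nij,nj->ni", R_inv, t)
    return c2w


# Usage: camera positions in world space
# centers = w2c_to_c2w(result.extrinsics)[:, :3, 3]
```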

In [None]:
# @title Explore camera intrinsics { display-mode: "form" }

K = result.intrinsics[0]
print("Camera Intrinsics Matrix (K):")
print(K)
print(f"\nFocal length (fx, fy): ({K[0,0]:.1f}, {K[1,1]:.1f})")
print(f"Principal point (cx, cy): ({K[0,2]:.1f}, {K[1,2]:.1f})")

## 4. Multi-View 3D Reconstruction

In [None]:
# @title Download multi-view example { display-mode: "form" }

import os

# Download a few views of the same scene
os.makedirs("multiview", exist_ok=True)
!wget -q https://raw.githubusercontent.com/ByteDance-Seed/Depth-Anything-3/main/assets/examples/SOH/000.png -O multiview/000.png
!wget -q https://raw.githubusercontent.com/ByteDance-Seed/Depth-Anything-3/main/assets/examples/SOH/010.png -O multiview/010.png
!wget -q https://raw.githubusercontent.com/ByteDance-Seed/Depth-Anything-3/main/assets/examples/SOH/020.png -O multiview/020.png

images = sorted([f"multiview/{f}" for f in os.listdir("multiview") if f.endswith(".png")])
print(f"Loaded {len(images)} images")

# Display
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for i, img_path in enumerate(images):
    axes[i].imshow(Image.open(img_path))
    axes[i].set_title(f"View {i+1}")
    axes[i].axis("off")
plt.tight_layout()
plt.show()

In [None]:
# @title Run multi-view inference { display-mode: "form" }

result_mv = model.inference(images)

print("✓ Multi-view inference complete!")
print(f"  Depth maps: {result_mv.depth.shape}")
print(f"  Camera poses: {result_mv.extrinsics.shape}")

# Visualize all depth maps
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
for i in range(3):
    axes[0, i].imshow(result_mv.processed_images[i])
    axes[0, i].set_title(f"View {i+1}")
    axes[0, i].axis("off")
    
    depth_viz = visualize_depth(result_mv.depth[i], colormap="Spectral")
    axes[1, i].imshow(depth_viz)
    axes[1, i].set_title(f"Depth {i+1}")
    axes[1, i].axis("off")
plt.tight_layout()
plt.show()
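Conceptually, a multi-view point cloud can be built by back-projecting every depth map with its intrinsics and moving the points into a shared world frame with the extrinsics. The sketch below shows the standard pinhole back-projection under these assumptions: depth is z-depth along the optical axis (not ray length), pixel indices are used directly as image coordinates, and extrinsics follow `x_cam = R @ x_world + t`. `depth_to_world_points` is hypothetical and not necessarily how depth_anything_3 builds its exports.

```python
import numpy as np


def depth_to_world_points(depth, K, w2c):
    """Back-project one [H, W] depth map into world-space points of shape [H*W, 3].

    K: [3, 3] intrinsics, w2c: [3, 4] world-to-camera matrix.
    """
    depth = np.asarray(depth, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    w2c = np.asarray(w2c, dtype=np.float64)
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel grid, each [H, W]
    # Pinhole back-projection into the camera frame
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Invert x_cam = R x_world + t  =>  x_world = R^T (x_cam - t)
    R, t = w2c[:, :3], w2c[:, 3]
    return (pts_cam - t) @ R


# Usage: fuse all views into one cloud
# cloud = np.concatenate([
#     depth_to_world_points(result_mv.depth[i], result_mv.intrinsics[i], result_mv.extrinsics[i])
#     for i in range(len(images))
# ])
```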

## 5. Model Comparison

Compare different model sizes to understand the speed/quality tradeoff.

In [None]:
# @title Compare model sizes { display-mode: "form" }
# @markdown ⚠️ This will download and run 3 models (~5-10 minutes)

import time

models_to_compare = ["DA3-SMALL", "DA3-BASE", "DA3-LARGE"]
results_comparison = {}

# Use separate names so `model_name` and `result` from earlier cells are not overwritten
for name in models_to_compare:
    print(f"\n{'='*50}")
    print(f"Testing {name}...")

    # Load model
    m = DepthAnything3.from_pretrained(f"depth-anything/{name}").to(device)

    # Warmup run (excluded from timing)
    _ = m.inference(["sample.jpg"])

    # Benchmark one run; synchronize so queued GPU work is included in the timing
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    res = m.inference(["sample.jpg"])
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    results_comparison[name] = {
        "time": elapsed,
        "depth": res.depth[0],
    }
    print(f"  Inference time: {elapsed*1000:.1f} ms")

    # Cleanup
    del m
    if device == "cuda":
        torch.cuda.empty_cache()

print("\n" + "="*50)
print("Summary:")
for name, data in results_comparison.items():
    print(f"  {name}: {data['time']*1000:.1f} ms")
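The cell above times a single run after one warmup, which can be noisy on a shared Colab GPU. A slightly more robust pattern is to report the median over several runs. `benchmark` is a hypothetical helper; passing `torch.cuda.synchronize` as `sync` makes each clock read wait for queued GPU kernels.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeats=5, sync=None):
    """Return the median wall-clock time of fn() in seconds.

    sync: optional callable (e.g. torch.cuda.synchronize) run before each
    clock read so that asynchronous GPU work is included in the measurement.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Usage inside the comparison loop (m and device as defined there):
# sync = torch.cuda.synchronize if device == "cuda" else None
# elapsed = benchmark(lambda: m.inference(["sample.jpg"]), sync=sync)
```

The median is less sensitive than the mean to an occasional slow run caused by other work on the machine.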

In [None]:
# @title Visualize model comparison { display-mode: "form" }

fig, axes = plt.subplots(1, len(models_to_compare), figsize=(15, 5))

for i, (name, data) in enumerate(results_comparison.items()):
    depth_viz = visualize_depth(data["depth"], colormap="Spectral")
    axes[i].imshow(depth_viz)
    axes[i].set_title(f"{name}\n{data['time']*1000:.0f}ms")
    axes[i].axis("off")

plt.suptitle("Depth Maps by Model Size", fontsize=14)
plt.tight_layout()
plt.show()
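The side-by-side plots are qualitative. To put a number on how much two models disagree, one common approach for relative depth is to remove the global scale by median scaling and then compute the mean absolute relative difference (AbsRel) against a reference. The sketch below treats DA3-LARGE as the reference purely for illustration (it is not ground truth), assumes both depth maps have the same shape, and uses a hypothetical helper name.

```python
import numpy as np


def median_scaled_abs_rel(pred, ref, eps=1e-6):
    """Mean |s*pred - ref| / ref over valid pixels, with s = median(ref) / median(pred)."""
    pred = np.asarray(pred, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    mask = (ref > eps) & (pred > eps)  # ignore invalid or non-positive depths
    scale = np.median(ref[mask]) / np.median(pred[mask])
    return float(np.mean(np.abs(pred[mask] * scale - ref[mask]) / ref[mask]))


# Usage: compare each model against DA3-LARGE (resize first if shapes differ)
# ref = results_comparison["DA3-LARGE"]["depth"]
# for name, data in results_comparison.items():
#     print(f"{name}: AbsRel vs DA3-LARGE = {median_scaled_abs_rel(data['depth'], ref):.4f}")
```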

## 6. Export Results

In [None]:
# @title Export to GLB (3D viewable) { display-mode: "form" }

from depth_anything_3.utils.export import export

# Load DA3-LARGE for export (replaces whichever size was chosen in Section 2)
model = DepthAnything3.from_pretrained("depth-anything/DA3-LARGE").to(device)
result = model.inference(images)  # Multi-view

# Export to GLB
os.makedirs("output", exist_ok=True)
export(
    result,
    export_dir="output",
    export_format="glb",
    conf_thresh_percentile=10,
    num_max_points=50000,
    show_cameras=True,
)

print("✓ Exported to output/")
!ls -la output/
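One plausible reading of the two point-cloud parameters above is that `conf_thresh_percentile=10` drops the least confident 10% of points and `num_max_points=50000` caps the total by random subsampling. The sketch below implements that interpretation with NumPy. It is an assumption about the semantics, not depth_anything_3's code; the library may, for example, threshold per view or subsample differently. `filter_points` is hypothetical.

```python
import numpy as np


def filter_points(points, conf, conf_thresh_percentile=10, num_max_points=50000, seed=0):
    """Keep points at or above a confidence percentile, then cap the total count.

    points: [M, 3] array, conf: [M] (or any shape with M elements).
    """
    points = np.asarray(points)
    conf = np.asarray(conf).reshape(-1)
    thresh = np.percentile(conf, conf_thresh_percentile)
    keep = points[conf >= thresh]
    if len(keep) > num_max_points:
        # Uniform random subsample without replacement, seeded for reproducibility
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(keep), size=num_max_points, replace=False)
        keep = keep[idx]
    return keep
```

Using a percentile rather than a fixed value keeps the threshold meaningful regardless of the numeric range of the confidence maps.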

In [None]:
# @title Download results { display-mode: "form" }

from google.colab import files

# Zip and download
!zip -r output.zip output/
files.download("output.zip")
print("\n✓ Download started! Open the .glb file in https://gltf-viewer.donmccurdy.com/")

## 7. Save to Google Drive (Optional)

In [None]:
# @title Mount Google Drive { display-mode: "form" }

from google.colab import drive
drive.mount("/content/drive")

# Create output directory
drive_output = "/content/drive/MyDrive/DA3_Results"
os.makedirs(drive_output, exist_ok=True)
print(f"✓ Output directory: {drive_output}")

In [None]:
# @title Save results to Drive { display-mode: "form" }

import shutil

# Copy everything in output/ (including any subfolders) to Drive
shutil.copytree("output", drive_output, dirs_exist_ok=True)
for f in sorted(os.listdir("output")):
    print(f"  Saved: {f}")

print(f"\n✓ All files saved to Google Drive: {drive_output}")

## 8. Tips & Best Practices

### Model Selection
- **DA3-SMALL**: Fast, good for real-time or many images
- **DA3-BASE**: Good balance of speed and quality
- **DA3-LARGE**: High quality, recommended for most use cases
- **DA3-GIANT**: Highest quality, requires more VRAM

### Memory Management
- Use `batch_inference()` for large image sets
- Set `batch_size="auto"` for automatic memory management
- Clear cache with `torch.cuda.empty_cache()` between runs
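For explicit control over memory, you can also split a long list of images into fixed-size chunks yourself. Only do this for per-image outputs such as depth: poses estimated in separate `inference` calls are not guaranteed to share a coordinate frame. `chunked` and the chunk size of 8 are hypothetical choices for illustration.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical manual batching for per-image depth:
# for batch in chunked(all_image_paths, 8):
#     res = model.inference(batch)
#     ...  # save res.depth for this batch
#     torch.cuda.empty_cache()
```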

### Quality Tips
- More overlapping views generally give a better 3D reconstruction
- A higher confidence threshold gives a cleaner point cloud, at the cost of fewer points
- Use `process_res="high_res"` for detailed depth maps

---

## Credits

This notebook uses **Depth Anything 3** by ByteDance:
- [Paper](https://arxiv.org/abs/2511.10647)
- [Project Page](https://depth-anything-3.github.io)
- [Original Repository](https://github.com/ByteDance-Seed/Depth-Anything-3)

Optimized fork: [awesome-depth-anything-3](https://github.com/Aedelon/awesome-depth-anything-3)