# MapAnything Demo: Config-Driven Pipeline

This notebook runs the MapAnything pipeline end to end, reading dataset paths and settings from Splatter's YAML configs.

## Pipeline Steps

1. **Load Config**: Load paths and settings from Splatter config file
2. **Run MapAnything**: Execute the complete MapAnything pipeline
3. **Visualize**: Visualize the reconstructed point cloud and cameras

## Setup and Configuration

In [2]:
%load_ext autoreload
%autoreload 2

import sys
import json
from pathlib import Path

import numpy as np
import torch
import pyvista as pv
import pycolmap

# Import Splatter for config loading
from collab_splats.wrapper import Splatter

# Import our modular utilities
from mapanything_utils import (
    load_mapanything_model,
    load_and_preprocess_images,
    run_mapanything_inference,
    export_to_colmap,
    rescale_to_original_dimensions,
    convert_to_nerfstudio_format,
    cleanup_gpu_memory,
)

print(f"Python executable: {sys.executable}")
print(f"Python version: {sys.version}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("WARNING: CUDA not available")

Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.
Python executable: /opt/conda/envs/nerfstudio/bin/python
Python version: 3.10.18 | packaged by conda-forge | (main, Jun  4 2025, 14:45:41) [GCC 13.3.0]
GPU: NVIDIA A40


## Load Configuration from Splatter

Load dataset configuration using Splatter's config system.

In [3]:
# Configuration paths
config_dir = Path("/workspace/collab-splats/docs/splats/configs")
dataset_name = "birds_date-02062024_video-C0043" #"bicycle_mapanything"

print(f"Loading configuration: {dataset_name}")
print(f"Config directory: {config_dir}")

# Load the splatter configuration
splatter = Splatter.from_config_file(
    dataset=dataset_name,
    config_dir=config_dir,
)

print(f"\n" + "="*70)
print("Configuration Loaded")
print("="*70)
print(f"Dataset: {splatter.config['file_path']}")
print(f"Method: {splatter.config['method']}")
print(f"Input type: {splatter.config['input_type']}")
print(f"Output: {splatter.config['output_path']}")
print("="*70)

Loading configuration: birds_date-02062024_video-C0043
Config directory: /workspace/collab-splats/docs/splats/configs
✓ Valid video file with 2388 frames

Configuration Loaded
Dataset: /workspace/fieldwork-data/birds/2024-02-06/SplatsSD/C0043.MP4
Method: rade-features
Input type: video
Output: /workspace/fieldwork-data/birds/2024-02-06/environment/C0043


## Extract Paths and Parameters

Extract the paths and parameters we need for MapAnything from the config.

In [4]:
# Extract paths from config
image_dir = Path(splatter.config['file_path'])
output_base = Path(splatter.config['output_path'])
preproc_dir = output_base / "preproc"
colmap_dir = preproc_dir / "colmap"

print(f"Paths extracted from config:")
print(f"  Image directory: {image_dir}")
print(f"  Output base: {output_base}")
print(f"  Preprocessing dir: {preproc_dir}")

# Verify image directory exists
if not image_dir.exists():
    raise ValueError(f"Image directory does not exist: {image_dir}")

print(f"\n✓ Image directory verified")

Paths extracted from config:
  Image directory: /workspace/fieldwork-data/birds/2024-02-06/SplatsSD/C0043.MP4
  Output base: /workspace/fieldwork-data/birds/2024-02-06/environment/C0043
  Preprocessing dir: /workspace/fieldwork-data/birds/2024-02-06/environment/C0043/preproc

✓ Image directory verified


## Configure MapAnything Parameters

Set up MapAnything model and inference parameters.

In [5]:
# ============================================================================
# Model Configuration
# ============================================================================

# Model selection
model_name = "facebook/map-anything"  # CC-BY-NC 4.0 license
# model_name = "facebook/map-anything-apache"  # Apache 2.0 license

# ============================================================================
# Inference Parameters
# ============================================================================

inference_params = {
    'memory_efficient_inference': True,
    'minibatch_size': 1,
    'use_amp': True,
    'amp_dtype': 'bf16',  # Use 'fp16' for older GPUs
    'apply_mask': True,
    'mask_edges': True,
    'apply_confidence_mask': True,
    'use_multiview_confidence': True,
    'confidence_percentile': 35.0,
}

# ============================================================================
# COLMAP Export Parameters
# ============================================================================

export_params = {
    'voxel_fraction': 0.02,  # Voxel size as fraction of scene extent
    'save_ply': True,
    'save_images': True,
}

# ============================================================================
# Rescaling Parameters
# ============================================================================

rescale_params = {
    'shared_camera': True,  # MapAnything typically uses shared camera
    'shift_point2d_to_original_res': True,
}

print(f"MapAnything Configuration:")
print(f"  Model: {model_name}")
print(f"  Memory efficient: {inference_params['memory_efficient_inference']}")
print(f"  Voxel fraction: {export_params['voxel_fraction']}")
print(f"  Shared camera: {rescale_params['shared_camera']}")

MapAnything Configuration:
  Model: facebook/map-anything
  Memory efficient: True
  Voxel fraction: 0.02
  Shared camera: True
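
As a rough illustration of what `confidence_percentile` is likely doing, the sketch below keeps only the pixels whose confidence lies above a given percentile of the per-view confidence map. This is an assumption about the parameter's meaning, not MapAnything's actual implementation; `confidence_mask` is a hypothetical helper and the confidence map is random toy data with the same `(518, 294)` layout as the model output.

```python
import numpy as np


def confidence_mask(conf: np.ndarray, percentile: float) -> np.ndarray:
    """Return a boolean mask that keeps pixels above the given confidence percentile.

    Hypothetical helper: assumes "confidence_percentile=35.0" means
    "discard the 35% least confident pixels of each view".
    """
    threshold = np.percentile(conf, percentile)
    return conf > threshold


# Toy confidence map shaped like one view's `conf` output (height, width).
rng = np.random.default_rng(0)
conf = rng.random((518, 294))

mask = confidence_mask(conf, percentile=35.0)
print(f"Kept {mask.mean():.1%} of pixels")
```

Under this reading, raising the percentile trades point-cloud density for fewer low-confidence points.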


## Run MapAnything Pipeline

Execute the complete MapAnything pipeline using the paths from the config.

### Step 1: Load MapAnything Model

In [5]:
# Load model from HuggingFace
model = load_mapanything_model(
    model_name=model_name,
    enable_optimizations=True,
    verbose=True,
)

Loading pretrained dinov2_vitg14 from torch hub


Using cache found in /workspace/models/hub/facebookresearch_dinov2_main


### Step 2: Load and Preprocess Images

In [36]:
if dataset_name != 'bicycle_mapanything':
    image_dir = preproc_dir / 'images'

# Load images from directory specified in config
views, image_paths = load_and_preprocess_images(
    image_dir=image_dir,
    max_images=200,
    verbose=True,
)

# Get model dimensions from preprocessed images
model_width = views[0]['img'].shape[-1]
model_height = views[0]['img'].shape[-2]

print(f"\nModel processes images at: {model_width}x{model_height}")
print(f"Number of images: {len(image_paths)}")


Model processes images at: 294x518
Number of images: 200
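
The reported 294x518 resolution is consistent with resizing the long side to 518 pixels and snapping the short side to a multiple of the DINOv2 ViT-g/14 patch size (14). The sketch below reproduces that arithmetic. It is a guess at the preprocessing rule rather than the code in `load_and_preprocess_images`, and the 1080x1920 source resolution is a hypothetical portrait-video size, not a value taken from this dataset.

```python
def model_resolution(width: int, height: int, long_side: int = 518, patch: int = 14):
    """Scale so the long side equals `long_side`, then snap both sides to the patch size.

    Hypothetical helper illustrating one rule that yields 294x518.
    """
    scale = long_side / max(width, height)

    def snap(x: int) -> int:
        # Round to the nearest multiple of the patch size, never below one patch.
        return max(patch, round(x * scale / patch) * patch)

    return snap(width), snap(height)


# Hypothetical portrait source frame (width, height).
model_w, model_h = model_resolution(1080, 1920)
print(f"{model_w}x{model_h}")
```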


### Step 3: Run MapAnything Inference

In [37]:
# Run inference to get poses, depth, and 3D points
outputs = run_mapanything_inference(
    model=model,
    views=views,
    verbose=True,
    **inference_params
)

# Inspect outputs
print("\nOutput keys:")
for key in outputs[0].keys():
    value = outputs[0][key]
    if isinstance(value, torch.Tensor):
        print(f"  {key}: {value.shape} ({value.dtype})")
    elif isinstance(value, list):
        print(f"  {key}: list of {len(value)} items")
    else:
        print(f"  {key}: {type(value).__name__}")

Checking frustum containment: 100%|██████████| 4/4 [00:00<00:00, 298.51it/s]
Checking triangle intersections: 100%|██████████| 4/4 [00:00<00:00, 76.79it/s]



Output keys:
  pts3d: torch.Size([1, 518, 294, 3]) (torch.float32)
  pts3d_cam: torch.Size([1, 518, 294, 3]) (torch.float32)
  ray_directions: torch.Size([1, 518, 294, 3]) (torch.float32)
  depth_along_ray: torch.Size([1, 518, 294, 1]) (torch.float32)
  cam_trans: torch.Size([1, 3]) (torch.float32)
  cam_quats: torch.Size([1, 4]) (torch.float32)
  metric_scaling_factor: torch.Size([1, 1]) (torch.float32)
  conf: torch.Size([1, 518, 294]) (torch.float32)
  non_ambiguous_mask: torch.Size([1, 518, 294]) (torch.bool)
  non_ambiguous_mask_logits: torch.Size([1, 518, 294]) (torch.float32)
  img_no_norm: torch.Size([1, 518, 294, 3]) (torch.float32)
  depth_z: torch.Size([1, 518, 294, 1]) (torch.float32)
  intrinsics: torch.Size([1, 3, 3]) (torch.float32)
  camera_poses: torch.Size([1, 4, 4]) (torch.float32)
  mask: torch.Size([1, 518, 294, 1]) (torch.bool)


### Step 4: Export to COLMAP Format

In [38]:
# Get image names for COLMAP
image_names = [p.name for p in image_paths]

# # Export to COLMAP format
# colmap_sparse_dir = export_to_colmap(
#     outputs=outputs,
#     views=views,
#     image_names=image_names,
#     output_dir=colmap_dir,
#     model=model,
#     verbose=True,
#     **export_params
# )

export_params_updated = export_params.copy()
# export_params_updated['spatial_filter_percentile'] = (1.0, 99.0)
# export_params_updated['spatial_filter_max_extent'] = 25.0
export_params_updated['voxel_fraction'] = 0.005

# Remove outliers (1st-99th percentile) + clip to 30 m max extent
colmap_sparse_dir = export_to_colmap(
    outputs=outputs,
    views=views, 
    image_names=image_names, 
    output_dir=colmap_dir, 
    model=model,
    spatial_filter_percentile=(1.0, 99.0),
    spatial_filter_max_extent=30.0,
    verbose=True,
    **export_params_updated
)

print(f"\nCOLMAP reconstruction: {colmap_sparse_dir}")


COLMAP reconstruction: /workspace/fieldwork-data/birds/2024-02-06/environment/C0043/preproc/colmap/sparse/0
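
To make the export parameters more concrete, here is a minimal NumPy sketch of a percentile-based spatial filter followed by voxel downsampling with a voxel size expressed as a fraction of the scene extent. It is an illustration of the general techniques, not the code inside `export_to_colmap`: both helper functions are hypothetical, "scene extent" is assumed to mean the longest bounding-box side, and the downsampler simply keeps the first point in each voxel.

```python
import numpy as np


def percentile_filter(points: np.ndarray, lo: float = 1.0, hi: float = 99.0) -> np.ndarray:
    """Drop points outside the [lo, hi] percentile range on any axis."""
    lower = np.percentile(points, lo, axis=0)
    upper = np.percentile(points, hi, axis=0)
    keep = np.all((points >= lower) & (points <= upper), axis=1)
    return points[keep]


def voxel_downsample(points: np.ndarray, voxel_fraction: float) -> np.ndarray:
    """Keep one point per voxel; voxel size is a fraction of the longest bbox side."""
    origin = points.min(axis=0)
    extent = points.max(axis=0) - origin
    voxel_size = voxel_fraction * extent.max()
    # Integer voxel coordinates for every point.
    keys = np.floor((points - origin) / voxel_size).astype(np.int64)
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first_idx)]


# Toy cloud with one far outlier.
rng = np.random.default_rng(0)
points = rng.normal(size=(10_000, 3))
points[0] = [500.0, 500.0, 500.0]

filtered = percentile_filter(points, 1.0, 99.0)
sparse = voxel_downsample(filtered, voxel_fraction=0.005)
print(len(points), len(filtered), len(sparse))
```

Filtering before voxelizing matters in this sketch: a single far outlier would otherwise inflate the extent and therefore the voxel size.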


### Step 5: Rescale to Original Dimensions

In [None]:
# Rescale reconstruction to original image sizes
rescaled_dir = rescale_to_original_dimensions(
    colmap_sparse_dir=colmap_sparse_dir,
    image_paths=image_paths,
    model_width=model_width,
    model_height=model_height,
    output_dir=preproc_dir,
    verbose=True,
    **rescale_params
)

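# The returned path is already the sparse model directory (see the printed
# output), so reuse it under the name that later cells expect.
rescaled_sparse_dir = rescaled_dir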
print(f"\nRescaled reconstruction: {rescaled_dir}")


Rescaled reconstruction: /workspace/fieldwork-data/birds/2024-02-06/environment/C0043/preproc/colmap/sparse/0
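
Rescaling a PINHOLE camera from model resolution to the original image size amounts to multiplying the focal lengths and principal point by the per-axis size ratios. The sketch below shows that arithmetic only. `rescale_pinhole` is a hypothetical helper, the 1080x1920 original size and the intrinsics are illustrative values, and half-pixel offset conventions are ignored.

```python
def rescale_pinhole(params, model_size, original_size):
    """Scale PINHOLE params (fx, fy, cx, cy) from model_size to original_size.

    Sizes are (width, height) tuples.
    """
    fx, fy, cx, cy = params
    sx = original_size[0] / model_size[0]
    sy = original_size[1] / model_size[1]
    return (fx * sx, fy * sy, cx * sx, cy * sy)


model_size = (294, 518)        # resolution the model processed
original_size = (1080, 1920)   # hypothetical original frame size
params = (455.0, 455.0, 147.0, 259.0)  # illustrative fx, fy, cx, cy

scaled = rescale_pinhole(params, model_size, original_size)
print(scaled)
```

Any 2D keypoint coordinates stored with the reconstruction need the same per-axis scaling, which is presumably what `shift_point2d_to_original_res` controls.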


### Step 6: Convert to Nerfstudio Format

In [7]:
# Convert to transforms.json
transforms_path = convert_to_nerfstudio_format(
    colmap_sparse_dir=rescaled_sparse_dir,
    output_dir=preproc_dir,
    ply_filename="sparse_pc.ply",
    copy_ply_from_colmap=True,
    verbose=True,
)

print(f"\nNerfstudio transforms: {transforms_path}")
ply_path = preproc_dir / "sparse_pc.ply"

{200: Camera(id=200, model='PINHOLE', width=294, height=518, params=array([454.71694946, 455.62411499, 146.52827454, 258.74172974])), 199: Camera(id=199, model='PINHOLE', width=294, height=518, params=array([461.49856567, 462.29397583, 146.4912262 , 258.68356323])), 198: Camera(id=198, model='PINHOLE', width=294, height=518, params=array([454.97314453, 455.83453369, 146.44416809, 258.67327881])), 197: Camera(id=197, model='PINHOLE', width=294, height=518, params=array([457.05584717, 458.70318604, 146.29042053, 258.70367432])), 196: Camera(id=196, model='PINHOLE', width=294, height=518, params=array([457.24343872, 458.87097168, 146.34733582, 258.69766235])), 195: Camera(id=195, model='PINHOLE', width=294, height=518, params=array([462.97399902, 464.62695312, 146.31620789, 258.17672729])), 194: Camera(id=194, model='PINHOLE', width=294, height=518, params=array([469.77023315, 470.95596313, 146.61053467, 258.19003296])), 193: Camera(id=193, model='PINHOLE', width=294, height=518, params=a


Nerfstudio transforms: /workspace/fieldwork-data/birds/2024-02-06/environment/C0043/preproc/transforms.json
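
For context on what the conversion involves: COLMAP stores world-to-camera poses with the camera looking along +Z (OpenCV convention), while nerfstudio's `transforms.json` stores camera-to-world matrices with the camera looking along -Z (OpenGL convention). The sketch below shows only that core step. It is not the body of `convert_to_nerfstudio_format`, it omits any additional world-axis reorientation a converter may apply, and the file path is a placeholder.

```python
import numpy as np


def colmap_to_nerfstudio_pose(world_to_cam: np.ndarray) -> np.ndarray:
    """Invert a 4x4 world-to-camera pose and flip the camera Y and Z axes."""
    cam_to_world = np.linalg.inv(world_to_cam)
    # OpenCV (x right, y down, z forward) -> OpenGL (x right, y up, z back).
    cam_to_world[0:3, 1:3] *= -1
    return cam_to_world


# Toy COLMAP pose: no rotation, world origin 5 units in front of the camera.
w2c = np.eye(4)
w2c[:3, 3] = [0.0, 0.0, 5.0]

c2w = colmap_to_nerfstudio_pose(w2c)

# One entry of the "frames" list in transforms.json (placeholder file name).
frame = {
    "file_path": "images/frame_00001.png",
    "transform_matrix": c2w.tolist(),
}
print(frame["transform_matrix"])
```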


### Step 7: Clean Up GPU Memory

In [None]:
# Free up GPU memory
cleanup_gpu_memory(
    model=model,
    outputs=outputs,
    views=views,
    verbose=True,
)

## Update Splatter Config with Preprocessing Path

Update the Splatter config to point to our MapAnything outputs for future training.

In [8]:
# Set the preprocessing data path in splatter config
splatter.config["preproc_data_path"] = preproc_dir

print(f"Updated Splatter config:")
print(f"  Preprocessing path: {splatter.config['preproc_data_path']}")
print(f"\nThis path can now be used for training with:")
print(f"  splatter.extract_features(overwrite=True)")

Updated Splatter config:
  Preprocessing path: /workspace/fieldwork-data/birds/2024-02-06/environment/C0043/preproc

This path can now be used for training with:
  splatter.extract_features(overwrite=True)


## Visualization: Point Cloud

Visualize the reconstructed sparse point cloud.

In [10]:
# Load point cloud
point_cloud = pv.PolyData(str(ply_path))

# Handle RGBA colors
if 'RGBA' in point_cloud.point_data:
    point_cloud.point_data['RGB'] = point_cloud.point_data['RGBA'][:, :3]

print(f"Point cloud loaded: {point_cloud.n_points:,} points")

# Create plotter
plotter = pv.Plotter(window_size=[1200, 800])

# Add point cloud
plotter.add_mesh(
    point_cloud,
    scalars='RGB',
    rgb=True,
    point_size=1,
    render_points_as_spheres=False,
)

# Add coordinate axes
plotter.add_axes(
    xlabel='X',
    ylabel='Y',
    zlabel='Z',
    line_width=5,
)

# Set camera view
plotter.camera_position = 'iso'
plotter.enable_eye_dome_lighting()  # Better depth perception

# Add title
plotter.add_text(
    f"MapAnything Reconstruction\n{point_cloud.n_points:,} points",
    position='upper_left',
    font_size=12,
    color='white',
)

# Show
plotter.show()

Point cloud loaded: 675,426 points


Widget(value='<iframe src="http://localhost:38257/index.html?ui=P_0x7fa103584f40_1&reconnect=auto" class="pyvi…

## Visualization: Camera Poses

Visualize camera poses from the reconstruction.

In [None]:
# Load reconstruction
reconstruction = pycolmap.Reconstruction(str(rescaled_sparse_dir))

print(f"Reconstruction loaded:")
print(f"  Cameras: {len(reconstruction.cameras)}")
print(f"  Images: {len(reconstruction.images)}")
print(f"  Points3D: {len(reconstruction.points3D)}")

# Extract camera positions and orientations
camera_positions = []
camera_directions = []

for img_id, image in reconstruction.images.items():
    # Get camera center in world coordinates
    camera_center = image.projection_center()
    camera_positions.append(camera_center)
    
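    # Get camera viewing direction. COLMAP cameras look along +Z in camera space;
    # row 2 of the world-to-camera rotation is that axis in world coordinates.
    R = image.cam_from_world.rotation.matrix()
    view_direction = R[2, :]  # Camera +Z axis in world frame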
    camera_directions.append(view_direction)

camera_positions = np.array(camera_positions)
camera_directions = np.array(camera_directions)

print(f"\nCamera trajectory:")
print(f"  Start: {camera_positions[0]}")
print(f"  End: {camera_positions[-1]}")
print(f"  Start-to-end displacement: {np.linalg.norm(camera_positions[-1] - camera_positions[0]):.2f}")
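
The geometry behind the loop above can be checked without pycolmap. For a world-to-camera pose `x_cam = R @ x_world + t`, the camera center in world coordinates is `-R.T @ t`, and the camera's +Z axis (the viewing direction in COLMAP's convention) expressed in world coordinates is the third row of `R`. The sketch below is a small NumPy verification on a toy pose; `camera_center_and_direction` is a hypothetical helper.

```python
import numpy as np


def camera_center_and_direction(R: np.ndarray, t: np.ndarray):
    """Return (camera center, viewing direction) in world coordinates.

    R, t define the world-to-camera transform x_cam = R @ x_world + t.
    """
    center = -R.T @ t
    view_dir = R[2, :]  # camera +Z axis written in world coordinates
    return center, view_dir


# Toy pose: camera sits at (0, 0, -3) and looks toward the world origin.
R = np.eye(3)
t = np.array([0.0, 0.0, 3.0])

center, view_dir = camera_center_and_direction(R, t)

# The world origin should land 3 units in front of the camera.
origin_in_cam = R @ np.zeros(3) + t
print(center, view_dir, origin_in_cam)
```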

In [None]:
# Visualize cameras and point cloud together
plotter = pv.Plotter(window_size=[1200, 800])

# Add point cloud (smaller points)
plotter.add_mesh(
    point_cloud,
    scalars='RGB',
    rgb=True,
    point_size=1,
    opacity=0.5,
)

# Add camera centers
camera_cloud = pv.PolyData(camera_positions)
plotter.add_mesh(
    camera_cloud,
    color='red',
    point_size=10,
    render_points_as_spheres=True,
    label='Camera Centers',
)

# Add camera trajectory
if len(camera_positions) > 1:
    trajectory = pv.PolyData(camera_positions)
    trajectory.lines = np.hstack([[2, i, i+1] for i in range(len(camera_positions)-1)])
    plotter.add_mesh(
        trajectory,
        color='yellow',
        line_width=3,
        label='Camera Trajectory',
    )

# Add camera viewing directions (arrows)
arrow_scale = 0.5
for pos, direction in zip(camera_positions[::5], camera_directions[::5]):  # Show every 5th camera
    arrow = pv.Arrow(
        start=pos,
        direction=direction,
        scale=arrow_scale,
    )
    plotter.add_mesh(
        arrow,
        color='cyan',
        opacity=0.7,
    )

# Add axes
plotter.add_axes(
    xlabel='X',
    ylabel='Y',
    zlabel='Z',
    line_width=5,
)

# Add legend
plotter.add_legend()

# Set view
plotter.camera_position = 'iso'
plotter.enable_eye_dome_lighting()

# Add title
plotter.add_text(
    f"Camera Poses & Reconstruction\n{len(camera_positions)} cameras, {point_cloud.n_points:,} points",
    position='upper_left',
    font_size=12,
    color='white',
)

plotter.show()
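
The trajectory above relies on the VTK cell layout that PyVista expects for `lines`: a flat integer array in which each cell is written as `[n_points, idx_0, ..., idx_{n-1}]`. The sketch below builds the same two-point segments with vectorized NumPy instead of a list comprehension. `polyline_segments` is a hypothetical helper name.

```python
import numpy as np


def polyline_segments(n_points: int) -> np.ndarray:
    """Flat VTK-style connectivity linking consecutive points with 2-point segments."""
    idx = np.arange(n_points - 1)
    # Each row is [2, i, i + 1]: a segment made of two point indices.
    cells = np.column_stack([np.full(n_points - 1, 2), idx, idx + 1])
    return cells.ravel()


lines = polyline_segments(4)
print(lines)
```

The result can be assigned to `trajectory.lines` in place of the `np.hstack` expression.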

## Inspection: Reconstruction Quality

Analyze the quality of the reconstruction.

In [None]:
# Load transforms.json
with open(transforms_path) as f:
    transforms = json.load(f)

print("Reconstruction Statistics:")
print("="*70)
print(f"Dataset: {image_dir.name}")
print(f"\nImages:")
print(f"  Total: {len(reconstruction.images)}")
print(f"  Registered: {len([img for img in reconstruction.images.values() if img.registered])}")

print(f"\nCameras:")
print(f"  Total: {len(reconstruction.cameras)}")
for cam_id, camera in list(reconstruction.cameras.items())[:3]:  # Show first 3 cameras
    print(f"  Camera {cam_id}: {camera.model.name} - {camera.width}x{camera.height}")
    print(f"    Focal: fx={camera.params[0]:.2f}, fy={camera.params[1]:.2f}")
    print(f"    Principal: cx={camera.params[2]:.2f}, cy={camera.params[3]:.2f}")

print(f"\nPoint Cloud:")
print(f"  3D Points: {len(reconstruction.points3D):,}")
print(f"  PLY Points: {point_cloud.n_points:,}")

# Compute point cloud statistics
points = np.array([pt.xyz for pt in reconstruction.points3D.values()])
if len(points) > 0:
    print(f"\nPoint Cloud Extent:")
    print(f"  X: [{points[:, 0].min():.2f}, {points[:, 0].max():.2f}]")
    print(f"  Y: [{points[:, 1].min():.2f}, {points[:, 1].max():.2f}]")
    print(f"  Z: [{points[:, 2].min():.2f}, {points[:, 2].max():.2f}]")
    
    # Compute scene scale
    scene_extent = points.max(axis=0) - points.min(axis=0)
    scene_diagonal = np.linalg.norm(scene_extent)
    print(f"\n  Scene diagonal: {scene_diagonal:.2f}")

# Camera trajectory statistics
trajectory_length = np.sum(np.linalg.norm(np.diff(camera_positions, axis=0), axis=1))
print(f"\nCamera Trajectory:")
print(f"  Total length: {trajectory_length:.2f}")
print(f"  Average step: {trajectory_length / (len(camera_positions) - 1):.2f}")

print("="*70)
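
Two different trajectory measures appear in this notebook: the summed step lengths (path length) and the straight-line distance between the first and last camera. The toy example below shows how far apart they can be. It assumes the positions are in capture order, which may require sorting the reconstruction's images by name first; the coordinates are made up for illustration.

```python
import numpy as np

# Toy trajectory: three sides of a unit square, so the camera nearly returns home.
positions = np.array(
    [
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0],
        [0.0, 1.0, 0.0],
    ]
)

# Path length: sum of distances between consecutive camera centers.
steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
path_length = steps.sum()

# Displacement: straight-line distance from first to last camera.
displacement = np.linalg.norm(positions[-1] - positions[0])

print(f"path length = {path_length:.2f}, displacement = {displacement:.2f}")
```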

## Summary

MapAnything preprocessing is complete, and the outputs are ready for training.

In [None]:
print("\n" + "="*70)
print("MAPANYTHING PIPELINE COMPLETE")
print("="*70)

print(f"\n📁 Configuration:")
print(f"  Dataset: {dataset_name}")
print(f"  Config dir: {config_dir}")

print(f"\n📸 Input:")
print(f"  Images: {image_dir}")
print(f"  Count: {len(image_paths)} images")

print(f"\n📊 MapAnything Outputs:")
print(f"  Preprocessing: {preproc_dir}")
print(f"  transforms.json: {transforms_path}")
print(f"  Point cloud: {ply_path} ({point_cloud.n_points:,} points)")
print(f"  COLMAP: {rescaled_sparse_dir}")

print(f"\n🚀 Next Steps:")
print(f"  Train model: splatter.extract_features(overwrite=True)")
print(f"  Or use nerfstudio directly: ns-train {splatter.config['method']} --data {preproc_dir}")

print("="*70)