# MapAnything Preprocessing for Nerfstudio

This notebook demonstrates how to use **MapAnything** for image-only structure-from-motion preprocessing, with conversion to COLMAP format for nerfstudio compatibility.

## What is MapAnything?

MapAnything is a foundation model from Meta Research that:
- Performs feedforward inference for 3D reconstruction from images
- Estimates camera poses and depth maps directly
- Outputs dense 3D point clouds with confidence scores
- Supports metric scale estimation
- Exports directly to COLMAP format

Repository: https://github.com/facebookresearch/map-anything

## Workflow

### Section 1: Image-Only Inference with MapAnything
1. **Load MapAnything Model**: Initialize from HuggingFace pretrained weights
2. **Process Images**: Load and preprocess input images
3. **Run Inference**: Get camera poses, depth maps, confidence, and 3D points
4. **Export to COLMAP**: Convert outputs to COLMAP format with voxel downsampling

### Section 2: Visualization
5. **Visualize Point Cloud**: Display the reconstructed sparse point cloud
6. **Compare with VGGT**: Benchmark against VGGT-X/InfiniteVGGT (optional)

### Section 3: Loop Closure Extension
7. **Setup Loop Closure**: Configure VGGT-Long with MapAnythingAdapter
8. **Run Loop Detection**: Detect and close loops for long sequences
9. **Global Optimization**: Refine poses with Sim(3) optimization

## Prerequisites

```bash
# Install MapAnything
pip install git+https://github.com/facebookresearch/map-anything.git

# For loop closure (optional)
git clone https://github.com/DengKaiCQ/VGGT-Long.git
cd VGGT-Long && pip install -e .
```

---
# Section 1: MapAnything Image-Only Inference
---

## Setup and Imports

In [2]:
%load_ext autoreload
%autoreload 2

import sys
import os
from pathlib import Path
import time
import json

import torch
import numpy as np
from tqdm import tqdm
from PIL import Image

# Verify conda environment
print(f"Python executable: {sys.executable}")
print(f"Python version: {sys.version}")

if 'nerfstudio' not in sys.executable:
    print("\n‚ö†Ô∏è  WARNING: Not running in nerfstudio conda environment!")
    print("Please activate with: conda activate nerfstudio")
else:
    print("\n‚úì Running in nerfstudio environment")

# Import MapAnything
try:
    from mapanything.models import MapAnything
    from mapanything.utils.image import load_images
    print("‚úì MapAnything imported successfully")
except ImportError as e:
    print(f"‚ùå MapAnything import failed: {e}")
    print("Please install: pip install git+https://github.com/facebookresearch/map-anything.git")
    raise

# Import nerfstudio utilities
from nerfstudio.process_data import colmap_utils
from nerfstudio.utils.rich_utils import CONSOLE

print("‚úì All imports successful")

Python executable: /opt/conda/envs/nerfstudio/bin/python
Python version: 3.10.18 | packaged by conda-forge | (main, Jun  4 2025, 14:45:41) [GCC 13.3.0]

‚úì Running in nerfstudio environment
‚úì MapAnything imported successfully
‚úì All imports successful


## Configuration

Set up paths for input images, output directory, and model settings.

In [3]:
# ============================================================================
# Dataset Configuration
# ============================================================================

# Input: Image directory (update with your path)
image_dir = Path("/workspace/bicycle/images_4")

# Output: Will create structure similar to VGGT-X
output_base = Path("/workspace/bicycle/environment/bicycle_mapanything")
preproc_dir = output_base / "preproc"
colmap_dir = preproc_dir / "colmap"
colmap_output_dir = colmap_dir / "sparse" / "0"
image_output_dir = colmap_dir / "images"

# Create directories
colmap_output_dir.mkdir(parents=True, exist_ok=True)
image_output_dir.mkdir(parents=True, exist_ok=True)

# ============================================================================
# MapAnything Model Configuration
# ============================================================================

# Model selection
model_name = "facebook/map-anything"  # Standard model (CC-BY-NC 4.0)
# model_name = "facebook/map-anything-apache"  # Apache 2.0 licensed alternative

# Inference parameters
memory_efficient_inference = True  # Use memory-efficient mode
minibatch_size = 1                 # Process one image at a time
use_amp = True                     # Use automatic mixed precision
amp_dtype = "bf16"                 # bfloat16 for Ampere+ GPUs, use "fp16" for older GPUs
apply_mask = True                  # Apply geometric validity mask
mask_edges = True                  # Mask image edges for better quality

# COLMAP export parameters
voxel_fraction = 0.01              # Voxel size as fraction of scene extent (0.01 = 1%)
# voxel_size = None                # Explicit voxel size in meters (overrides voxel_fraction)
export_glb = False                 # Export dense mesh as GLB (optional)

print(f"Input images: {image_dir}")
print(f"Output directory: {output_base}")
print(f"COLMAP output: {colmap_output_dir}")
print(f"Model: {model_name}")

# Check input images
image_extensions = ["*.jpg", "*.jpeg", "*.png", "*.JPG", "*.JPEG", "*.PNG"]
image_paths = []
for ext in image_extensions:
    image_paths.extend(sorted(list(image_dir.glob(ext))))

print(f"\nFound {len(image_paths)} images")
if len(image_paths) == 0:
    raise ValueError(f"No images found in {image_dir}")

image_names = [os.path.basename(path) for path in image_paths]

Input images: /workspace/bicycle/images_4
Output directory: /workspace/bicycle/environment/bicycle_mapanything
COLMAP output: /workspace/bicycle/environment/bicycle_mapanything/preproc/colmap/sparse/0
Model: facebook/map-anything

Found 194 images


## Step 1: Load MapAnything Model

Load the pretrained MapAnything model from HuggingFace.

In [4]:
# Setup device
device = "cuda" if torch.cuda.is_available() else "cpu"
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is required for MapAnything inference")

print(f"Device: {device}")
print(f"GPU: {torch.cuda.get_device_name(0)}")

# Enable CUDA optimizations
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.cuda.set_per_process_memory_fraction(0.95)
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

print("\n" + "="*70)
print("LOADING MAPANYTHING MODEL")
print("="*70)

# Load model from HuggingFace
model = MapAnything.from_pretrained(model_name).to(device)
model.eval()

print(f"‚úì Model loaded: {model_name}")
print(f"  Device: {device}")
print("="*70)

Device: cuda
GPU: NVIDIA A40

LOADING MAPANYTHING MODEL
Loading pretrained dinov2_vitg14 from torch hub


Using cache found in /workspace/models/hub/facebookresearch_dinov2_main


‚úì Model loaded: facebook/map-anything
  Device: cuda


## Step 2: Load and Preprocess Images

Load images using MapAnything's utility functions.

In [5]:
print("\n" + "="*70)
print("LOADING IMAGES")
print("="*70)

# Convert paths to strings
image_path_list = [str(p) for p in image_paths]

# Load images using MapAnything's load_images function
print(f"Loading {len(image_path_list)} images...")
views = load_images(image_path_list)

print(f"‚úì Images loaded: {len(views)} views")
print(f"  First image shape: {views[0]['img'].shape if 'img' in views[0] else 'N/A'}")
print("="*70)


LOADING IMAGES
Loading 194 images...
‚úì Images loaded: 194 views
  First image shape: torch.Size([1, 3, 336, 518])


## Step 3: Run MapAnything Inference

Run feedforward inference to get camera poses, depth maps, and 3D points.

In [7]:
print("\n" + "="*70)
print("RUNNING MAPANYTHING INFERENCE")
print("="*70)

# Track inference time and memory
torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
start_time = time.time()

# Run inference
with torch.no_grad():
    outputs = model.infer(
        views,
        memory_efficient_inference=memory_efficient_inference,
        minibatch_size=minibatch_size,
        use_amp=use_amp,
        amp_dtype=amp_dtype,
        apply_mask=apply_mask,
        mask_edges=mask_edges,
        apply_confidence_mask=True,
        use_multiview_confidence=True,
        confidence_percentile=35,
    )

torch.cuda.synchronize()
end_time = time.time()

inference_time = end_time - start_time
peak_memory_gb = torch.cuda.max_memory_allocated() / (1024**3)

print("\n" + "="*70)
print("INFERENCE COMPLETE")
print("="*70)
print(f"  Total time: {inference_time:.2f}s")
print(f"  Time per frame: {inference_time / len(image_paths):.3f}s")
print(f"  Peak GPU memory: {peak_memory_gb:.2f} GB")
print(f"  Frames processed: {len(image_paths)}")
print("="*70)

# Inspect outputs
print("\nOutput keys:")
for key in outputs[0].keys():
    value = outputs[0][key]
    if isinstance(value, torch.Tensor):
        print(f"  {key}: {value.shape} ({value.dtype})")
    elif isinstance(value, list):
        print(f"  {key}: list of {len(value)} items")
    else:
        print(f"  {key}: {type(value)}")


RUNNING MAPANYTHING INFERENCE


Checking frustum containment: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 4/4 [00:00<00:00, 326.76it/s]
Checking triangle intersections: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 4/4 [00:00<00:00, 81.58it/s]



INFERENCE COMPLETE
  Total time: 77.11s
  Time per frame: 0.397s
  Peak GPU memory: 23.42 GB
  Frames processed: 194

Output keys:
  pts3d: torch.Size([1, 336, 518, 3]) (torch.float32)
  pts3d_cam: torch.Size([1, 336, 518, 3]) (torch.float32)
  ray_directions: torch.Size([1, 336, 518, 3]) (torch.float32)
  depth_along_ray: torch.Size([1, 336, 518, 1]) (torch.float32)
  cam_trans: torch.Size([1, 3]) (torch.float32)
  cam_quats: torch.Size([1, 4]) (torch.float32)
  metric_scaling_factor: torch.Size([1, 1]) (torch.float32)
  conf: torch.Size([1, 336, 518]) (torch.float32)
  non_ambiguous_mask: torch.Size([1, 336, 518]) (torch.bool)
  non_ambiguous_mask_logits: torch.Size([1, 336, 518]) (torch.float32)
  img_no_norm: torch.Size([1, 336, 518, 3]) (torch.float32)
  depth_z: torch.Size([1, 336, 518, 1]) (torch.float32)
  intrinsics: torch.Size([1, 3, 3]) (torch.float32)
  camera_poses: torch.Size([1, 4, 4]) (torch.float32)
  mask: torch.Size([1, 336, 518, 1]) (torch.bool)


## Step 4: Export to COLMAP Format

Export MapAnything outputs to COLMAP format with adaptive voxel downsampling.

In [8]:
print("\n" + "="*70)
print("EXPORTING TO COLMAP FORMAT")
print("="*70)

# Import export function
from mapanything.utils.colmap_export import export_predictions_to_colmap

# Get image names for COLMAP output
image_names = [os.path.basename(path) for path in image_path_list]

# Export to COLMAP format (includes saving processed images)
print("Exporting to COLMAP format...")
_ = export_predictions_to_colmap(
    outputs=outputs,
    processed_views=views,
    image_names=image_names,
    output_dir=colmap_dir,
    voxel_fraction=0.02,
    voxel_size=None,
    data_norm_type=model.encoder.data_norm_type,
    save_ply=True,
    save_images=True,
    skip_point2d=False,
)

print(f"‚úì Exported to COLMAP format: {colmap_dir}")

print("\n" + "="*70)
print("COLMAP EXPORT COMPLETE")
print("="*70)
print(f"Output directory: {colmap_output_dir}")
print(f"  - cameras.bin")
print(f"  - images.bin")
print(f"  - points3D.bin")
print(f"  - points.ply")
print("="*70)


EXPORTING TO COLMAP FORMAT


Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.
Exporting to COLMAP format...
Total points before downsampling: 19557475
Scene extent (IQR-based): 9.167m, full extent: 141.767m
Adaptive voxel size: 0.1833m
Downsampled from 19557475 to 575435 points
Backprojecting points to frames...
Built COLMAP reconstruction:
  - 194 images
  - 575435 3D points
  - 11255720 Point2D observations
Saved COLMAP reconstruction to: /workspace/bicycle/environment/bicycle_mapanything/preproc/colmap/sparse
Saved point cloud PLY to: /workspace/bicycle/environment/bicycle_mapanything/preproc/colmap/sparse/points.ply
Saved 194 processed images to: /workspace/bicycle/environment/bicycle_mapanything/preproc/colmap/images
‚úì Exported to COLMAP format: /workspace/bicycle/environment/bicycle_mapanything/preproc/colmap

COLMAP EXPORT COMPLETE
Output directory: /workspace/bicycle/environment/bicycle

In [9]:
import shutil
from pathlib import Path

# Move all files within sparse to sparse/0
sparse_dir = Path("/workspace/bicycle/environment/bicycle_mapanything/preproc/colmap/sparse")
sparse_0_dir = sparse_dir / "0"
sparse_0_dir.mkdir(parents=True, exist_ok=True)

for item in sparse_dir.iterdir():
    if item.is_file():
        shutil.move(str(item), str(sparse_0_dir / item.name))

## Step 5: Convert to Nerfstudio Format

Convert the COLMAP reconstruction to nerfstudio's `transforms.json` format.

In [10]:
print("\n" + "="*70)
print("CONVERTING TO NERFSTUDIO FORMAT")
print("="*70)

# MapAnything exports to colmap/sparse/ not colmap/sparse/0/
mapanything_colmap_dir = colmap_dir / "sparse" / "0"

# Convert COLMAP to transforms.json
print(f"Converting COLMAP reconstruction to transforms.json...")
colmap_utils.colmap_to_json(
    recon_dir=mapanything_colmap_dir,
    output_dir=preproc_dir,
)

transforms_path = preproc_dir / "transforms.json"
print(f"‚úì Created: {transforms_path}")

# Load transforms to get applied_transform
with open(transforms_path) as f:
    transforms = json.load(f)

applied_transform = torch.tensor(transforms["applied_transform"])

# Create point cloud PLY file
ply_filename = "sparse_pc.ply"
print(f"\nCreating point cloud PLY file...")

# Check if MapAnything already exported points.ply
mapanything_ply = mapanything_colmap_dir / "points.ply"
if mapanything_ply.exists():
    print(f"Using MapAnything's exported point cloud: {mapanything_ply}")
    import shutil
    shutil.copy(mapanything_ply, preproc_dir / ply_filename)
else:
    print(f"Creating point cloud from COLMAP reconstruction...")
    colmap_utils.create_ply_from_colmap(
        filename=ply_filename,
        recon_dir=mapanything_colmap_dir,
        output_dir=preproc_dir,
        applied_transform=applied_transform,
    )

ply_path = preproc_dir / ply_filename
print(f"‚úì Created: {ply_path}")

# Update transforms.json with PLY path
transforms["ply_file_path"] = ply_filename
with open(transforms_path, 'w') as f:
    json.dump(transforms, f, indent=2)

print("\n" + "="*70)
print("CONVERSION COMPLETE")
print("="*70)
print(f"\nOutput files:")
print(f"  üìÅ {preproc_dir}/")
print(f"    üìÑ transforms.json")
print(f"    üìÑ {ply_filename}")
print(f"    üìÅ colmap/sparse/")
print(f"      üìÑ cameras.bin")
print(f"      üìÑ images.bin")
print(f"      üìÑ points3D.bin")
print("="*70)


CONVERTING TO NERFSTUDIO FORMAT
Converting COLMAP reconstruction to transforms.json...


{194: Camera(id=194, model='PINHOLE', width=518, height=336, params=array([446.20718384, 446.7557373 , 258.47644043, 168.69714355])), 193: Camera(id=193, model='PINHOLE', width=518, height=336, params=array([443.82510376, 444.42956543, 258.24310303, 168.65664673])), 192: Camera(id=192, model='PINHOLE', width=518, height=336, params=array([439.04541016, 439.10784912, 258.19897461, 168.56225586])), 191: Camera(id=191, model='PINHOLE', width=518, height=336, params=array([463.70431519, 463.10665894, 257.81298828, 168.24563599])), 190: Camera(id=190, model='PINHOLE', width=518, height=336, params=array([472.55310059, 471.3659668 , 258.28079224, 168.24697876])), 189: Camera(id=189, model='PINHOLE', width=518, height=336, params=array([473.02542114, 474.08963013, 258.1192627 , 168.08198547])), 188: Camera(id=188, model='PINHOLE', width=518, height=336, params=array([464.24938965, 466.61468506, 258.05456543, 168.02281189])), 187: Camera(id=187, model='PINHOLE', width=518, height=336, params=a

## Step 8: Rescale Reconstruction to Original Dimensions

MapAnything processes images at a fixed resolution (typically 336x518). We need to rescale the COLMAP reconstruction back to the original image dimensions for training.

In [22]:
import pycolmap
from nerfstudio.process_data.vggt_utils import _rescale_reconstruction_to_original_dimensions

print("\n" + "="*70)
print("RESCALING RECONSTRUCTION TO ORIGINAL DIMENSIONS")
print("="*70)

# Load the COLMAP reconstruction
map_anything_recon_dir = mapanything_colmap_dir / "sparse" / "0"
reconstruction = pycolmap.Reconstruction(str(map_anything_recon_dir))

recon_dir = Path('/workspace/bicycle/environment/bicycle/preproc/colmap/sparse/0')
rescaled_dir = Path('/workspace/bicycle/environment/bicycle/preproc')

recon_dir.mkdir(parents=True, exist_ok=True)
rescaled_dir.mkdir(parents=True, exist_ok=True)

print(f"Original reconstruction:")
print(f"  Cameras: {len(reconstruction.cameras)}")
print(f"  Images: {len(reconstruction.images)}")
print(f"  Points3D: {len(reconstruction.points3D)}")

# Get original image sizes
original_coords = []
for img_path in image_paths:
    img = Image.open(img_path)
    # Format: [top_left_x, top_left_y, crop_right, crop_bottom, original_width, original_height]
    # For MapAnything, images are resized without cropping, so top_left = (0, 0)
    # and crop dimensions = model resolution
    model_width = views[0]['img'].shape[-1]  # Width from loaded views
    model_height = views[0]['img'].shape[-2]  # Height from loaded views
    original_coords.append([0, 0, model_width, model_height, img.width, img.height])

original_coords = np.array(original_coords)

print(f"\nImage dimensions:")
print(f"  Model resolution: {model_width}x{model_height}")
print(f"  Original resolution: {original_coords[0, -2]}x{original_coords[0, -1]} (sample)")

# Rescale reconstruction
reconstruction = _rescale_reconstruction_to_original_dimensions(
    reconstruction=reconstruction,
    image_paths=image_paths,
    original_image_sizes=original_coords,
    image_size=(model_width, model_height),
    shared_camera=True,  # MapAnything uses per-image cameras
    shift_point2d_to_original_res=True,
    verbose=True,
)

# Write rescaled reconstruction back
reconstruction.write_binary(str(recon_dir))
print(f"\n‚úì Wrote rescaled reconstruction to: {recon_dir}")

# Regenerate transforms.json with rescaled reconstruction
print(f"\nRegenerating transforms.json with rescaled cameras...")
colmap_utils.colmap_to_json(
    recon_dir=recon_dir,
    output_dir=rescaled_dir,
)

# Update transforms.json with PLY path
transforms_path = rescaled_dir / "transforms.json"
with open(transforms_path) as f:
    transforms = json.load(f)

transforms["ply_file_path"] = ply_filename
with open(transforms_path, 'w') as f:
    json.dump(transforms, f, indent=2)

print(f"‚úì Updated transforms.json: {transforms_path}")
print("="*70)

[autoreload of timm.models.densenet failed: Traceback (most recent call last):
  File "/opt/conda/envs/nerfstudio/lib/python3.10/site-packages/IPython/extensions/autoreload.py", line 276, in check
    superreload(m, reload, self.old_objects)
  File "/opt/conda/envs/nerfstudio/lib/python3.10/site-packages/IPython/extensions/autoreload.py", line 475, in superreload
    module = reload(module)
  File "/opt/conda/envs/nerfstudio/lib/python3.10/importlib/__init__.py", line 169, in reload
    _bootstrap._exec(spec, module)
  File "<frozen importlib._bootstrap>", line 619, in _exec
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/conda/envs/nerfstudio/lib/python3.10/site-packages/timm/models/densenet.py", line 47, in <module>
    class DenseLayer(nn.Module):
  File "/opt/conda/envs/nerfstudio/lib/python3.10/site-packages/timm/models/densenet.py", line 84, in DenseLayer
    def 


RESCALING RECONSTRUCTION TO ORIGINAL DIMENSIONS
Original reconstruction:
  Cameras: 194
  Images: 194
  Points3D: 575435

Image dimensions:
  Model resolution: 518x336
  Original resolution: 1236x821 (sample)



‚úì Wrote rescaled reconstruction to: /workspace/bicycle/environment/bicycle/preproc/colmap/sparse/0

Regenerating transforms.json with rescaled cameras...


{127: Camera(id=127, model='PINHOLE', width=1236, height=821, params=array([1125.75929313, 1148.80291712,  615.71344183,  410.0403986 ])), 126: Camera(id=126, model='PINHOLE', width=1236, height=821, params=array([483.45178223, 480.21566772, 258.07977295, 167.92936707])), 125: Camera(id=125, model='PINHOLE', width=1236, height=821, params=array([471.61141968, 471.2331543 , 258.00686646, 167.93612671])), 124: Camera(id=124, model='PINHOLE', width=1236, height=821, params=array([480.75714111, 481.12683105, 258.51016235, 167.98339844])), 123: Camera(id=123, model='PINHOLE', width=1236, height=821, params=array([472.31552124, 474.11572266, 258.10043335, 168.10075378])), 122: Camera(id=122, model='PINHOLE', width=1236, height=821, params=array([477.20794678, 477.81842041, 258.99316406, 167.65707397])), 121: Camera(id=121, model='PINHOLE', width=1236, height=821, params=array([474.63970947, 477.56521606, 258.12362671, 168.4954834 ])), 120: Camera(id=120, model='PINHOLE', width=1236, height=8

---
# Section 2: Visualization
---

## Step 6: Visualize Sparse Point Cloud

Visualize the reconstructed point cloud using PyVista.

In [24]:
import pyvista as pv

# Optional: Import visualization utilities if available
try:
    from collab_splats.utils.visualization import (
        CAMERA_KWARGS,
        MESH_KWARGS,
        VIZ_KWARGS,
        visualize_splat,
    )
    has_viz_utils = True
except ImportError:
    has_viz_utils = False
    print("collab_splats visualization utilities not available, using basic PyVista")

print(f"\nLoading point cloud from: {ply_path}")
point_cloud = pv.PolyData(str(ply_path))
point_cloud.point_data['RGB'] = point_cloud.point_data['RGBA']
print(f"  Points: {point_cloud.n_points:,}")

if has_viz_utils:
    # Use collab_splats visualization
    pcd_kwargs = MESH_KWARGS.copy()
    pcd_kwargs.update({
        "point_size": 2,
        "render_points_as_spheres": True,
        "ambient": 0.3,
        "diffuse": 0.8,
        "specular": 0.1,
    })
    
    plotter = visualize_splat(
        mesh=point_cloud,
        mesh_kwargs=pcd_kwargs,
        viz_kwargs=VIZ_KWARGS,
    )
else:
    # Basic PyVista visualization
    plotter = pv.Plotter()
    plotter.add_mesh(
        point_cloud,
        point_size=2,
        render_points_as_spheres=True,
    )
    plotter.add_axes()

plotter.show()


Loading point cloud from: /workspace/bicycle/environment/bicycle_mapanything/preproc/sparse_pc.ply
  Points: 575,435


Widget(value='<iframe src="http://localhost:35325/index.html?ui=P_0x7127fd4aae30_1&reconnect=auto" class="pyvi‚Ä¶

## Step 7: Performance Summary

Display comprehensive performance metrics for MapAnything.

In [16]:
print("\n" + "="*70)
print("MAPANYTHING PERFORMANCE SUMMARY")
print("="*70)

print(f"\nDataset: {image_dir.name}")
print(f"  Images: {len(image_paths)}")

print(f"\nInference Performance:")
print(f"  Total time: {inference_time:.2f}s")
print(f"  Per frame: {inference_time / len(image_paths):.3f}s")
print(f"  FPS: {len(image_paths) / inference_time:.2f}")
print(f"  Peak GPU memory: {peak_memory_gb:.2f} GB")

print(f"\nReconstruction Quality:")
print(f"  Point cloud points: {point_cloud.n_points:,}")
print(f"  Voxel downsampling: {voxel_fraction * 100:.1f}% of scene extent")

print(f"\nOutput Directory: {preproc_dir}")
print("="*70)


MAPANYTHING PERFORMANCE SUMMARY

Dataset: images_4
  Images: 194

Inference Performance:
  Total time: 94.99s
  Per frame: 0.490s
  FPS: 2.04
  Peak GPU memory: 40.66 GB

Reconstruction Quality:
  Point cloud points: 575,366
  Voxel downsampling: 1.0% of scene extent

Output Directory: /workspace/bicycle/environment/bicycle_mapanything/preproc


## Step 9: Train Gaussian Splatting Model

Train a Gaussian Splatting model using the preprocessed MapAnything data.

In [None]:
from collab_splats.wrapper import Splatter

print("\n" + "="*70)
print("TRAINING GAUSSIAN SPLATTING MODEL")
print("="*70)

# Create splatter pointing to MapAnything preprocessing output
splatter = Splatter(
    dataset=image_dir.name,
    method="rade-gs",
    file_path=image_dir,
    input_type="images",
    output_path=output_base,
    preproc_data_path=preproc_dir,  # Use existing MapAnything preprocessing
)

print(f"\nDataset: {splatter.config['dataset']}")
print(f"Method: {splatter.config['method']}")
print(f"Preprocessing: {preproc_dir}")
print(f"Output: {output_base}")

# Training configuration
training_kwargs = {
    "pipeline.model.cull-alpha-thresh": 0.005,
    "pipeline.model.continue-cull-post-densification": False,
    "pipeline.model.output-depth-during-training": True,
    "pipeline.model.rasterize-mode": "antialiased",
    "optimizers.xyz.optimizer": "Adam",
    "optimizers.xyz.scheduler": "ExponentialDecayScheduler",
}

print(f"\nTraining configuration:")
for key, value in training_kwargs.items():
    print(f"  {key}: {value}")

# Train the model
print(f"\nStarting training...")
splatter.train(
    kwargs=training_kwargs,
    overwrite=True
)

print("\n" + "="*70)
print("TRAINING COMPLETE")
print("="*70)
print(f"\nModel saved to: {output_base / 'outputs'}")
print(f"\nTo visualize the trained model:")
print(f"  ns-viewer --load-config {output_base / 'outputs' / 'config.yml'}")
print("="*70)

---
# Section 3: Loop Closure Extension
---

## Overview: VGGT-Long Loop Closure

**VGGT-Long** extends foundation models like MapAnything with loop closure capabilities for long video sequences. This is particularly useful for:
- Long video sequences with revisited locations
- Reducing drift in camera pose estimation
- Improving global consistency of 3D reconstructions

Repository: https://github.com/DengKaiCQ/VGGT-Long

### Architecture

The VGGT-Long system uses an **adapter pattern** to support multiple foundation models:
- `VGGTAdapter`: For VGGT models
- `Pi3Adapter`: For Pi3 models
- `MapAnythingAdapter`: For MapAnything models

### Loop Closure Pipeline

1. **Chunked Inference**: Process video in overlapping chunks
2. **Loop Detection**: Detect revisited locations using:
   - DBoW2-based retrieval
   - DNIO v2 loop detector
3. **Loop Alignment**: Align detected loops using Sim(3) transformation
4. **Global Optimization**: Refine all poses with `Sim3LoopOptimizer`
5. **Point Cloud Merging**: Combine aligned point clouds

### Key Advantages

- Handles long sequences (>1000 frames)
- Reduces pose drift
- Improves reconstruction consistency
- Memory-efficient chunked processing
- Supports metric scale from MapAnything

## Step 9: Setup VGGT-Long with MapAnythingAdapter

Configure VGGT-Long to use MapAnything with loop closure.

In [None]:
# Import VGGT-Long components
try:
    from vggt_long import VGGTLong
    from base_models.base_model import MapAnythingAdapter
    print("‚úì VGGT-Long imported successfully")
except ImportError as e:
    print(f"‚ùå VGGT-Long import failed: {e}")
    print("Please install: git clone https://github.com/DengKaiCQ/VGGT-Long.git && cd VGGT-Long && pip install -e .")
    raise

# Configuration for VGGT-Long with MapAnything
vggt_long_config = {
    # Model configuration
    "Weights": {
        "model": "Mapanything",
        "model_url": "facebook/map-anything",  # or "facebook/map-anything-apache"
    },
    
    # Chunking parameters
    "chunk_size": 100,        # Frames per chunk
    "overlap": 10,             # Overlap between chunks
    
    # Loop closure detection
    "useDBoW": True,           # Use DBoW2 for loop detection
    "dbow_threshold": 0.5,     # Similarity threshold
    "dbow_repeat_count": 3,    # Minimum repeat count
    
    # Alignment parameters
    "conf_threshold_coef": 0.1,  # Confidence threshold coefficient
    "overlap_ratio": 0.3,         # Overlap ratio for alignment
    
    # Point cloud saving
    "Pointcloud_Save": {
        "save_unaligned": True,   # Save per-chunk point clouds
        "save_aligned": True,      # Save aligned point clouds
        "save_loop": True,         # Save loop-closed point clouds
        "sample_ratio": 0.1,       # Downsampling ratio for final point cloud
    },
    
    # Output directory
    "output_dir": str(output_base / "loop_closure"),
}

print("\n" + "="*70)
print("VGGT-LONG CONFIGURATION")
print("="*70)
print(f"Model: {vggt_long_config['Weights']['model']}")
print(f"Chunk size: {vggt_long_config['chunk_size']}")
print(f"Overlap: {vggt_long_config['overlap']}")
print(f"Loop detection: DBoW2 (threshold={vggt_long_config['dbow_threshold']})")
print(f"Output: {vggt_long_config['output_dir']}")
print("="*70)

## Step 10: Run Loop Closure Pipeline

Execute the full VGGT-Long pipeline with MapAnything.

**Note**: This step is computationally intensive and best suited for long video sequences (>200 frames) with loop closures.

In [None]:
# Create output directory
loop_output_dir = Path(vggt_long_config["output_dir"])
loop_output_dir.mkdir(parents=True, exist_ok=True)

print("\n" + "="*70)
print("RUNNING VGGT-LONG LOOP CLOSURE")
print("="*70)

# Initialize VGGT-Long
vggt_long = VGGTLong(config=vggt_long_config)

# Run the full pipeline
# This will:
# 1. Process images in chunks using MapAnythingAdapter
# 2. Detect loops between chunks
# 3. Align chunks with loop constraints
# 4. Optimize poses globally
# 5. Merge point clouds

start_time = time.time()
results = vggt_long.run(image_paths=image_path_list)
end_time = time.time()

loop_closure_time = end_time - start_time

print("\n" + "="*70)
print("LOOP CLOSURE COMPLETE")
print("="*70)
print(f"  Total time: {loop_closure_time:.2f}s")
print(f"  Loops detected: {len(results.get('loops', []))}")
print(f"  Chunks processed: {results.get('num_chunks', 0)}")
print(f"  Final point cloud: {loop_output_dir / 'merged_point_cloud.ply'}")
print("="*70)

## Step 11: Visualize Loop-Closed Reconstruction

Compare the original MapAnything reconstruction with the loop-closed version.

In [None]:
import pyvista as pv

# Load both point clouds
original_pc = pv.PolyData(str(ply_path))
loop_closed_pc = pv.PolyData(str(loop_output_dir / "merged_point_cloud.ply"))

print(f"Original point cloud: {original_pc.n_points:,} points")
print(f"Loop-closed point cloud: {loop_closed_pc.n_points:,} points")

# Visualize side by side
plotter = pv.Plotter(shape=(1, 2))

# Original reconstruction
plotter.subplot(0, 0)
plotter.add_text("Original MapAnything", font_size=12)
plotter.add_mesh(original_pc, point_size=2, render_points_as_spheres=True)
plotter.add_axes()

# Loop-closed reconstruction
plotter.subplot(0, 1)
plotter.add_text("With Loop Closure", font_size=12)
plotter.add_mesh(loop_closed_pc, point_size=2, render_points_as_spheres=True)
plotter.add_axes()

plotter.link_views()
plotter.show()

## Step 12: Export Loop-Closed Results to COLMAP

Convert the loop-closed reconstruction to COLMAP format for nerfstudio training.

In [None]:
# The VGGT-Long pipeline should have already exported to COLMAP format
# Check the output directory for COLMAP files

loop_colmap_dir = loop_output_dir / "colmap" / "sparse" / "0"

if loop_colmap_dir.exists():
    print(f"Loop-closed COLMAP reconstruction found at: {loop_colmap_dir}")
    
    # Convert to nerfstudio format
    loop_preproc_dir = loop_output_dir / "preproc"
    loop_preproc_dir.mkdir(parents=True, exist_ok=True)
    
    colmap_utils.colmap_to_json(
        recon_dir=loop_colmap_dir,
        output_dir=loop_preproc_dir,
    )
    
    print(f"‚úì Nerfstudio transforms.json created: {loop_preproc_dir / 'transforms.json'}")
else:
    print(f"‚ö†Ô∏è  COLMAP output not found. Check VGGT-Long configuration.")
    print(f"    Expected: {loop_colmap_dir}")

## Comparison: MapAnything vs VGGT vs InfiniteVGGT

| Feature | MapAnything | VGGT | InfiniteVGGT |
|---------|-------------|------|---------------|
| **Speed** | Fast (feedforward) | Moderate | Fast (streaming) |
| **Memory** | Memory-efficient | High | Frame-cached |
| **Metric Scale** | ‚úÖ Yes | ‚ùå No | ‚ùå No |
| **Long Sequences** | Limited | Limited | ‚úÖ Optimized |
| **Point Density** | Dense | Dense | Dense |
| **COLMAP Export** | ‚úÖ Built-in | Manual | Manual |
| **Loop Closure** | ‚ö†Ô∏è Via VGGT-Long | ‚ö†Ô∏è Via VGGT-Long | ‚ö†Ô∏è Via VGGT-Long |
| **Best For** | Metric reconstruction, quick preview | General scenes | Long videos, streaming |

### When to Use MapAnything

‚úÖ **Use MapAnything when:**
- You need metric scale (real-world units)
- You want fast feedforward inference
- You need quick previews or prototyping
- You have short to medium sequences (<500 frames)
- You want built-in COLMAP export

‚ùå **Don't use MapAnything when:**
- You have very long sequences (use InfiniteVGGT)
- You need maximum accuracy (use VGGT with BA)
- You have limited GPU memory (use InfiniteVGGT with disk caching)

### Loop Closure Benefits

Adding loop closure (Section 3) is beneficial when:
- Your sequence has revisited locations
- You observe drift in camera poses
- You want globally consistent reconstruction
- Your sequence is longer than 200-300 frames

The overhead is:
- ~1.5-2x inference time
- Additional memory for loop detection
- More complex pipeline

But you gain:
- Reduced pose drift
- Better global consistency
- Improved reconstruction quality
- Support for very long sequences

## Troubleshooting

### Out of Memory
- Set `minibatch_size=1` for memory-efficient inference
- Use `amp_dtype="fp16"` instead of `"bf16"` on older GPUs
- Process fewer images at once
- Reduce image resolution before inference

### Poor Reconstruction Quality
- Adjust `voxel_fraction` (try 0.005 - 0.02)
- Enable `apply_mask=True` and `mask_edges=True`
- Check input image quality and lighting
- Ensure sufficient overlap between images

### Loop Closure Issues
- Adjust `dbow_threshold` (try 0.3 - 0.7)
- Increase `chunk_size` and `overlap`
- Check that sequences have actual loop closures
- Verify GPU memory is sufficient for long sequences

### Installation Issues
```bash
# MapAnything
pip install git+https://github.com/facebookresearch/map-anything.git

# VGGT-Long (for loop closure)
git clone https://github.com/DengKaiCQ/VGGT-Long.git
cd VGGT-Long
pip install -e .
```

### License Considerations
- Default model: `facebook/map-anything` (CC-BY-NC 4.0 - non-commercial)
- Apache licensed: `facebook/map-anything-apache` (Apache 2.0 - commercial OK)
- Choose based on your use case