# InfiniteVGGT Preprocessing for Nerfstudio

This notebook demonstrates how to use **InfiniteVGGT** (StreamVGGT) as an alternative to VGGT-X for structure-from-motion preprocessing, with conversion to COLMAP format for nerfstudio compatibility.

## What is InfiniteVGGT?

InfiniteVGGT (StreamVGGT) extends VGGT with:
- Streaming inference for long video sequences
- Memory-efficient processing via frame caching
- Improved pose estimation for extended scenes

Repository: https://github.com/AutoLab-SAI-SJTU/InfiniteVGGT

## Workflow

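1. **Run InfiniteVGGT Inference**: Get camera poses, depth maps, and confidence
2. **Extract and Convert Predictions**: Stack per-frame outputs and decode pose encodings into extrinsic and intrinsic matrices
3. **Convert to COLMAP Format**: Refine poses and filter points by confidence, following the VGGT-X pattern
4. **Build Reconstruction**: Create COLMAP files (cameras.bin, images.bin, points3D.bin)
5. **Generate transforms.json**: Convert to nerfstudio format
6. **Visualize Sparse Point Cloud**: Inspect the result with PyVista
7. **Summarize Performance**: Report timing, peak GPU memory, and reconstruction statistics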

## Prerequisites

```bash
# InfiniteVGGT is already cloned in /workspace/InfiniteVGGT
# Checkpoint available at: https://huggingface.co/lch01/StreamVGGT
```

## Setup and Imports

In [None]:
%load_ext autoreload
%autoreload 2

import sys
import os
from pathlib import Path
import time
import json

import torch
import numpy as np
import cv2
from tqdm import tqdm

# Verify conda environment
print(f"Python executable: {sys.executable}")
print(f"Python version: {sys.version}")

if 'nerfstudio' not in sys.executable:
    print("\n⚠️  WARNING: Not running in nerfstudio conda environment!")
    print("Please activate with: conda activate nerfstudio")
else:
    print("\n✓ Running in nerfstudio environment")

# Add InfiniteVGGT to Python path (append to end so environment VGGT takes precedence)
infinitevggt_root = Path("/workspace/InfiniteVGGT")
if str(infinitevggt_root / "src") not in sys.path:
    sys.path.append(str(infinitevggt_root / "src"))  # append (not insert) so it is searched last

# Import InfiniteVGGT components
from streamvggt.models.streamvggt import StreamVGGT
from streamvggt.utils.load_fn import load_and_preprocess_images
from streamvggt.utils.pose_enc import pose_encoding_to_extri_intri
from streamvggt.utils.geometry import FrameDiskCache

# Import nerfstudio utilities
from nerfstudio.process_data import vggt_utils, colmap_utils
from nerfstudio.process_data.process_data_utils import CameraModel
from nerfstudio.utils.rich_utils import CONSOLE

print("✓ All imports successful")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Python executable: /opt/conda/envs/nerfstudio/bin/python
Python version: 3.10.18 | packaged by conda-forge | (main, Jun  4 2025, 14:45:41) [GCC 13.3.0]

✓ Running in nerfstudio environment
✓ All imports successful



## Configuration

Set up paths for input images, output directory, and model checkpoint.

In [6]:
# ============================================================================
# Dataset Configuration
# ============================================================================

# Input: Bicycle dataset (194 images)
image_dir = Path("/workspace/bicycle/images")

# Output: Will create structure similar to VGGT-X
output_base = Path("/workspace/bicycle/environment/bicycle")
preproc_dir = output_base / "preproc"
colmap_dir = preproc_dir / "colmap"
colmap_output_dir = colmap_dir / "sparse" / "0"

# Create directories
colmap_output_dir.mkdir(parents=True, exist_ok=True)

# ============================================================================
# InfiniteVGGT Model Configuration
# ============================================================================

# Checkpoint path - download from https://huggingface.co/lch01/StreamVGGT
checkpoint_path = Path("/workspace/InfiniteVGGT/checkpoints/streamvggt.pth")

# Model parameters
total_budget = 1200000  # Total point budget for reconstruction
cache_results = True    # Cache results in memory (set False for very long sequences)
frame_cache_dir = None  # Set to a directory path to cache per-frame outputs to disk

# COLMAP conversion parameters (matching VGGT-X defaults)
conf_threshold = 50.0         # Confidence threshold percentile (0-100)
scale_factor = 2.5            # Scale points for better reconstruction
shared_camera = True          # Use single camera model for all frames
max_points_for_colmap = 500000  # Maximum points for COLMAP reconstruction

print(f"Input images: {image_dir}")
print(f"Output directory: {output_base}")
print(f"COLMAP output: {colmap_output_dir}")
print(f"Checkpoint: {checkpoint_path}")
print(f"  Exists: {checkpoint_path.exists()}")

# Check input images
image_paths = sorted(list(image_dir.glob("*.JPG")) + list(image_dir.glob("*.png")))
print(f"\nFound {len(image_paths)} images")

Input images: /workspace/bicycle/images
Output directory: /workspace/bicycle/environment/bicycle
COLMAP output: /workspace/bicycle/environment/bicycle/preproc/colmap/sparse/0
Checkpoint: /workspace/InfiniteVGGT/checkpoints/streamvggt.pth
  Exists: False

Found 194 images
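The configuration cell matches only `*.JPG` and `*.png`, and `Path.glob` is case-sensitive on Linux, so files named `.jpg` or `.PNG` would be skipped silently. A minimal sketch of a case-insensitive alternative follows. The helper name `find_images` and the suffix set are assumptions for illustration, not part of InfiniteVGGT or nerfstudio.

```python
from pathlib import Path

# Extensions treated as images; this set is an assumption, extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(image_dir: Path, suffixes=IMAGE_SUFFIXES) -> list[Path]:
    """Return image files in image_dir, sorted by path, matching suffixes case-insensitively."""
    return sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
```

Calling `find_images(image_dir)` could replace the two `glob` calls. Sorting keeps frame order stable, which matters because the frame index is later used to pair images with poses.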


## Step 1: Run InfiniteVGGT Inference

Load the StreamVGGT model and run inference on the image sequence. This produces:
- Camera pose encodings
- Depth maps with confidence
- 3D point predictions in world coordinates (the `pts3d_in_other_view` output)

In [7]:
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is required for InfiniteVGGT inference")
device = "cuda"

print(f"Device: {device}")
print(f"GPU: {torch.cuda.get_device_name(0)}")

print("\n" + "="*70)
print("LOADING MODEL")
print("="*70)

# Initialize model
model = StreamVGGT.from_pretrained("lch01/StreamVGGT", total_budget=total_budget)
print(f"✓ Model initialized with budget: {total_budget:,} points")

model = model.to(device)
model.eval()

Device: cuda
GPU: NVIDIA A40

LOADING MODEL
✓ Model initialized with budget: 1,200,000 points


StreamVGGT(
  (aggregator): Aggregator(
    (patch_embed): DinoVisionTransformer(
      (patch_embed): PatchEmbed(
        (proj): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14))
        (norm): Identity()
      )
      (blocks): ModuleList(
        (0-23): 24 x NestedTensorBlock(
          (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
          (attn): MemEffAttention(
            (qkv): Linear(in_features=1024, out_features=3072, bias=True)
            (q_norm): Identity()
            (k_norm): Identity()
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=1024, out_features=1024, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (ls1): LayerScale()
          (drop_path1): Identity()
          (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=1024, out_features=4096, bias=True)
            (act): GELU(approx

### Load and preprocess images

In [8]:
# Load and preprocess images
print(f"Loading {len(image_paths)} images...")
images = load_and_preprocess_images([str(p) for p in image_paths]).to(device)
print(f"✓ Images preprocessed: {images.shape}")
print(f"  Shape: (N={images.shape[0]}, C={images.shape[1]}, H={images.shape[2]}, W={images.shape[3]})")

# Prepare frames for inference
frames = [{"img": images[i].unsqueeze(0)} for i in range(images.shape[0])]

Loading 194 images...
✓ Images preprocessed: torch.Size([194, 3, 350, 518])
  Shape: (N=194, C=3, H=350, W=518)


### Set dtype and run inference

In [9]:
# Determine optimal dtype based on GPU capability
dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16
print(f"Using dtype: {dtype}")

# Run inference with timing and memory tracking
torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()


start_time = time.time()

with torch.no_grad():
    with torch.autocast(device_type="cuda", dtype=dtype):  # replaces the deprecated torch.cuda.amp.autocast
        output = model.inference(
            frames,
            frame_writer=None,
            cache_results=cache_results
        )

torch.cuda.synchronize()
end_time = time.time()

inference_time = end_time - start_time
peak_memory_gb = torch.cuda.max_memory_allocated() / (1024**3)

print("\n" + "="*70)
print("INFERENCE COMPLETE")
print("="*70)
print(f"  Total time: {inference_time:.2f}s")
print(f"  Time per frame: {inference_time / len(image_paths):.3f}s")
print(f"  Peak GPU memory: {peak_memory_gb:.2f} GB")
print(f"  Frames processed: {len(image_paths)}")
print("="*70)


Using dtype: torch.bfloat16



INFERENCE COMPLETE
  Total time: 67.23s
  Time per frame: 0.347s
  Peak GPU memory: 15.25 GB
  Frames processed: 194


## Step 2: Extract and Convert Predictions

Extract predictions from InfiniteVGGT output and convert to VGGT-X compatible format.

### InfiniteVGGT Output Structure
- `pts3d_in_other_view`: 3D points in world coordinates (N, H, W, 3)
- `conf`: Point confidence scores (N, H, W)
- `depth`: Depth maps (N, H, W, 1)
- `depth_conf`: Depth confidence (N, H, W)
- `camera_pose`: Pose encoding (N, 9)
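Because later cells unpack these shapes directly (for example `N, H, W, _ = depth_np.shape`), it can help to fail early when a shape is unexpected. The sketch below checks the stacked arrays against the shapes printed by the extraction cell. The function name `check_prediction_shapes` and the dictionary keys are hypothetical and chosen to mirror the variable names used in this notebook.

```python
import numpy as np


def check_prediction_shapes(pred: dict) -> tuple:
    """Validate stacked predictions and return (N, H, W).

    Shapes follow the extraction cell's printed output: world points (N, H, W, 3),
    confidences (N, H, W), depth (N, H, W, 1), pose encoding (N, 9).
    """
    n, h, w, c = np.shape(pred["world_points"])
    if c != 3:
        raise ValueError(f"world_points must end in 3 channels, got {c}")
    expected = {
        "world_points_conf": (n, h, w),
        "depth": (n, h, w, 1),
        "depth_conf": (n, h, w),
        "pose_enc": (n, 9),
    }
    for key, shape in expected.items():
        actual = tuple(np.shape(pred[key]))
        if actual != shape:
            raise ValueError(f"{key}: expected {shape}, got {actual}")
    return n, h, w
```

A call such as `check_prediction_shapes({"world_points": world_points_np, ...})` right after conversion to NumPy would surface a mismatch before it reaches the COLMAP helpers.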

In [32]:
print("\n" + "="*70)
print("EXTRACTING PREDICTIONS")
print("="*70)

if not cache_results or output.ress is None or len(output.ress) == 0:
    raise RuntimeError(
        "No cached results available. Set cache_results=True or use frame_cache_dir "
        "to save per-frame outputs."
    )

# Extract results (following InfiniteVGGT/run_inference.py pattern)
all_pts3d = [res['pts3d_in_other_view'].squeeze(0) for res in output.ress]
all_conf = [res['conf'].squeeze(0) for res in output.ress]
all_depth = [res['depth'].squeeze(0) for res in output.ress]
all_depth_conf = [res['depth_conf'].squeeze(0) for res in output.ress]
all_camera_pose = [res['camera_pose'].squeeze(0) for res in output.ress]

# Stack into tensors
world_points = torch.stack(all_pts3d, dim=0)      # (N, H, W, 3)
world_points_conf = torch.stack(all_conf, dim=0)  # (N, H, W)
depth = torch.stack(all_depth, dim=0)             # (N, H, W, 1)
depth_conf = torch.stack(all_depth_conf, dim=0)   # (N, H, W)
pose_enc = torch.stack(all_camera_pose, dim=0)    # (N, 9)

print(f"✓ Extracted predictions:")
print(f"  World points: {world_points.shape}")
print(f"  Confidence: {world_points_conf.shape}")
print(f"  Depth: {depth.shape}")
print(f"  Depth confidence: {depth_conf.shape}")
print(f"  Pose encoding: {pose_enc.shape}")

# Convert pose encoding to extrinsic and intrinsic matrices
# Following VGGT pattern: pose_encoding_to_extri_intri expects (B, N, 9)
extrinsic, intrinsic = pose_encoding_to_extri_intri(
    pose_enc.unsqueeze(0),  # Add batch dimension
    images.shape[-2:]       # (H, W)
)

# Remove batch dimension
extrinsic = extrinsic.squeeze(0)  # (N, 3, 4)
intrinsic = intrinsic.squeeze(0) if intrinsic is not None else None  # (N, 3, 3)

print(f"\n✓ Converted to camera matrices:")
print(f"  Extrinsic: {extrinsic.shape}")
print(f"  Intrinsic: {intrinsic.shape if intrinsic is not None else 'None'}")

# Move to CPU for further processing
images_cpu = images.detach().cpu()
world_points_cpu = world_points.detach().cpu()
world_points_conf_cpu = world_points_conf.detach().cpu()
depth_cpu = depth.detach().cpu()
depth_conf_cpu = depth_conf.detach().cpu()
extrinsic_cpu = extrinsic.detach().cpu()
intrinsic_cpu = intrinsic.detach().cpu() if intrinsic is not None else None

# Clean up GPU memory
torch.cuda.empty_cache()
print("\n✓ Moved predictions to CPU and cleared GPU cache")


EXTRACTING PREDICTIONS
✓ Extracted predictions:
  World points: torch.Size([194, 350, 518, 3])
  Confidence: torch.Size([194, 350, 518])
  Depth: torch.Size([194, 350, 518, 1])
  Depth confidence: torch.Size([194, 350, 518])
  Pose encoding: torch.Size([194, 9])

✓ Converted to camera matrices:
  Extrinsic: torch.Size([194, 3, 4])
  Intrinsic: torch.Size([194, 3, 3])

✓ Moved predictions to CPU and cleared GPU cache
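The extrinsics come back as `(N, 3, 4)` matrices of the form `[R | t]`. Some downstream tools expect `(N, 4, 4)` homogeneous matrices or the inverse transform. The sketch below shows both conversions in NumPy. It assumes the VGGT convention that extrinsics map world coordinates to camera coordinates, which this notebook does not confirm, and the helper names are hypothetical.

```python
import numpy as np


def to_homogeneous(extrinsic: np.ndarray) -> np.ndarray:
    """Pad (N, 3, 4) [R | t] matrices to (N, 4, 4) by appending the row [0, 0, 0, 1]."""
    n = extrinsic.shape[0]
    out = np.tile(np.eye(4, dtype=extrinsic.dtype), (n, 1, 1))
    out[:, :3, :4] = extrinsic
    return out


def invert_rigid(extrinsic: np.ndarray) -> np.ndarray:
    """Invert (N, 3, 4) rigid transforms: [R | t] -> [R^T | -R^T t], as (N, 4, 4)."""
    rot_t = np.transpose(extrinsic[:, :3, :3], (0, 2, 1))
    trans = extrinsic[:, :3, 3]
    inv = to_homogeneous(np.zeros_like(extrinsic))  # zero 3x4 block, bottom row [0, 0, 0, 1]
    inv[:, :3, :3] = rot_t
    inv[:, :3, 3] = -np.einsum("nij,nj->ni", rot_t, trans)
    return inv
```

Under the stated assumption, `invert_rigid(extrinsic_np)` would give camera-to-world matrices. The closed-form inverse avoids a general matrix inversion and stays exact for rotations.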


## Step 3: Convert to COLMAP Format

Convert InfiniteVGGT predictions to COLMAP format following the VGGT-X pattern:
1. Scale poses and depth
2. Convert world points to the format expected by COLMAP
3. Filter points by confidence
4. Build pycolmap reconstruction
5. Write COLMAP binary files
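Step 1 of this list multiplies camera translations, depth, and world points by the same `scale_factor`. A uniform scale like this changes the size of the scene but not where points land in each image, which is why it is safe to apply before writing COLMAP files. The sketch below illustrates that property with NumPy. The function names are hypothetical, and the world-to-camera convention for extrinsics is an assumption.

```python
import numpy as np


def apply_scale(extrinsic, depth, world_points, scale):
    """Return scaled copies: translations, depths, and points are multiplied by scale."""
    extrinsic = extrinsic.copy()
    extrinsic[:, :3, 3] *= scale  # rotations are left untouched
    return extrinsic, depth * scale, world_points * scale


def project(points_world, extrinsic, intrinsic):
    """Project (P, 3) world points into one camera; returns (P, 2) pixel coordinates."""
    cam = points_world @ extrinsic[:3, :3].T + extrinsic[:3, 3]
    uvw = cam @ intrinsic.T
    return uvw[:, :2] / uvw[:, 2:3]
```

Projecting the scaled points with the scaled camera gives the same pixels as before scaling, because the scale cancels in the perspective division.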

In [33]:
print("\n" + "="*70)
print("CONVERTING TO COLMAP FORMAT")
print("="*70)
# Override the Configuration defaults for this run
scale_factor = 1       # no rescaling (Configuration sets 2.5)
conf_threshold = 50.0  # confidence percentile (0-100)

# Convert tensors to numpy for COLMAP processing
images_np = images_cpu.numpy()  # (N, C, H, W)
world_points_np = world_points_cpu.numpy()  # (N, H, W, 3)
world_points_conf_np = world_points_conf_cpu.numpy()  # (N, H, W)
depth_np = depth_cpu.numpy()  # (N, H, W, 1)
depth_conf_np = depth_conf_cpu.numpy()  # (N, H, W)
extrinsic_np = extrinsic_cpu.numpy()  # (N, 3, 4)
intrinsic_np = intrinsic_cpu.numpy() if intrinsic_cpu is not None else None  # (N, 3, 3)


CONVERTING TO COLMAP FORMAT


In [25]:
from nerfstudio.process_data.vggt_utils import _run_global_alignment

# Refine camera poses with feature matching (rebinds `extrinsic` and `intrinsic`)
extrinsic, intrinsic, match_outputs = _run_global_alignment(
    images=images,
    image_paths=image_paths,
    extrinsic=extrinsic_np,
    intrinsic=intrinsic_np,
    depth_map=depth_np,
    depth_conf=depth_conf_np,
    lambda_depth=0.0,
    colmap_dir=colmap_dir,
    shared_camera=shared_camera,
    verbose=True,
)

Using cache found in /workspace/models/hub/verlab_accelerated_features_main


Total candidate image pairs found:  1790


Matching image pairs...: 100%|██████████| 14/14 [00:04<00:00,  2.86it/s]


Pose Optimization...: 100%|██████████| 300/300 [00:07<00:00, 40.56it/s]


In [41]:
from vggt.utils.geometry import unproject_depth_map_to_point_map
from nerfstudio.process_data.vggt_utils import _filter_and_prepare_points_for_pycolmap

# Use the world points predicted directly by InfiniteVGGT.
# Disabled alternative: unproject the depth maps with the camera matrices instead.
# points3d = unproject_depth_map_to_point_map(depth_cpu, extrinsic, intrinsic)
points3d = world_points_np

# Convert confidence threshold from percentile to value
if conf_threshold > 1.0:
    conf_threshold_value = np.percentile(depth_conf_np, conf_threshold)
else:
    conf_threshold_value = conf_threshold


print(f"Original shapes:")
print(f"  World points: {points3d.shape}")
print(f"  Extrinsic: {extrinsic_np.shape}")
print(f"  Depth: {depth_np.shape}")

# Filter points for pycolmap reconstruction using VGGT-X logic
points3d, points_xyf, points_rgb = _filter_and_prepare_points_for_pycolmap(
    points3d=points3d,
    depth_map=depth_np,
    depth_conf=depth_conf_np,
    images=images,
    image_paths=image_paths,
    conf_thres_value=conf_threshold_value,
    use_global_alignment=True,
    max_points_for_colmap=max_points_for_colmap,
    match_outputs=match_outputs
)

print(f"\nConfidence filtering:")
print(f"  Threshold percentile: {conf_threshold}")
print(f"  Threshold value: {conf_threshold_value:.4f}")
print(f"  Depth conf range: [{depth_conf_np.min():.4f}, {depth_conf_np.max():.4f}]")

# Count the pixels that pass the threshold (reporting only; filtering happened above)
conf_mask = depth_conf_np >= conf_threshold_value
num_valid_points = conf_mask.sum()
print(f"  Valid points: {num_valid_points:,} / {conf_mask.size:,} ({100*num_valid_points/conf_mask.size:.1f}%)")

Original shapes:
  World points: (194, 350, 518, 3)
  Extrinsic: (194, 3, 4)
  Depth: (194, 350, 518, 1)

Confidence filtering:
  Threshold percentile: 50.0
  Threshold value: 2.5808
  Depth conf range: [1.0000, 16.1755]
  Valid points: 17,586,100 / 35,172,200 (50.0%)
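The threshold is a percentile of the confidence map, so a value of 50.0 keeps roughly half of all pixels, as the 50.0% figure above shows. The sketch below reproduces that logic and then caps the result with random subsampling. It is an illustration under stated assumptions and not the implementation of `_filter_and_prepare_points_for_pycolmap`. The function name, the `seed` parameter, and the subsampling strategy are hypothetical.

```python
import numpy as np


def select_confident_points(points, conf, percentile=50.0, max_points=500_000, seed=0):
    """Keep points at or above the confidence percentile, then cap the count.

    points: (N, H, W, 3), conf: (N, H, W). Returns (M, 3) points and the threshold value.
    """
    threshold = np.percentile(conf, percentile)
    mask = conf >= threshold
    selected = points[mask]  # boolean indexing flattens to (K, 3)
    if len(selected) > max_points:
        rng = np.random.default_rng(seed)
        keep = rng.choice(len(selected), size=max_points, replace=False)
        selected = selected[np.sort(keep)]  # keep original ordering
    return selected, float(threshold)
```

Raising `percentile` trades coverage for reliability, while `max_points` bounds the size of `points3D.bin` independently of the threshold.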


### Postproc via StreamVGGT method

In [None]:

# # Convert tensors to numpy for COLMAP processing
# images_np = images_cpu.numpy()  # (N, C, H, W)
# world_points_np = world_points_cpu.numpy()  # (N, H, W, 3)
# world_points_conf_np = world_points_conf_cpu.numpy()  # (N, H, W)
# depth_np = depth_cpu.numpy()  # (N, H, W)
# depth_conf_np = depth_conf_cpu.numpy()  # (N, H, W)
# extrinsic_np = extrinsic_cpu.numpy()  # (N, 4, 4) or (N, 3, 4)
# intrinsic_np = intrinsic_cpu.numpy() if intrinsic_cpu is not None else None

# print(f"Original shapes:")
# print(f"  World points: {world_points_np.shape}")
# print(f"  Extrinsic: {extrinsic_np.shape}")
# print(f"  Depth: {depth_np.shape}")

# # Apply scale factor (following VGGT-X pattern)
# print(f"\nApplying scale factor: {scale_factor}")
# if extrinsic_np.shape[-1] == 4 and extrinsic_np.shape[-2] == 4:
#     # (N, 4, 4) format
#     extrinsic_np[:, :3, 3] *= scale_factor
# elif extrinsic_np.shape[-1] == 4 and extrinsic_np.shape[-2] == 3:
#     # (N, 3, 4) format
#     extrinsic_np[:, :, 3] *= scale_factor
# depth_np *= scale_factor
# world_points_np *= scale_factor


# print(f"\nConfidence filtering:")
# print(f"  Threshold percentile: {conf_threshold}")
# print(f"  Threshold value: {conf_threshold_value:.4f}")
# print(f"  Depth conf range: [{depth_conf_np.min():.4f}, {depth_conf_np.max():.4f}]")

# # Filter points by confidence
# conf_mask = depth_conf_np >= conf_threshold_value
# num_valid_points = conf_mask.sum()
# print(f"  Valid points: {num_valid_points:,} / {conf_mask.size:,} ({100*num_valid_points/conf_mask.size:.1f}%)")

In [None]:
# # from vggt.utils.helper import create_pixel_coordinate_grid, randomly_limit_trues
# # Filter points for pycolmap reconstruction using VGGTX logic
# points3d, points_xyf, points_rgb = _filter_and_prepare_points_for_pycolmap(
#     points3d=world_points_np,
#     depth_map=depth_np,
#     depth_conf=depth_conf_np,
#     images=images,
#     image_paths=image_paths,
#     conf_thres_value=conf_threshold_value,
#     use_global_alignment=False,
#     max_points_for_colmap=max_points_for_colmap,
# )

### Move into colmap format (Stream VGGT)

## Step 4: Build pycolmap Reconstruction

Use the `vggt_utils` helper shipped with nerfstudio to build a pycolmap `Reconstruction` from the filtered points.

In [42]:
from nerfstudio.process_data.vggt_utils import _build_pycolmap_reconstruction_without_tracks

print("\n" + "="*70)
print("BUILDING PYCOLMAP RECONSTRUCTION")
print("="*70)

# Image size [W, H]; depth_np has shape (N, H, W, 1)
N, H, W, _ = depth_np.shape
image_size = np.array([W, H])

# Camera type
camera_type = "SIMPLE_PINHOLE" if shared_camera else "PINHOLE"

print(f"Building reconstruction:")
print(f"  Camera type: {camera_type}")
print(f"  Image size: {image_size[0]}x{image_size[1]}")
print(f"  Number of cameras: {1 if shared_camera else N}")
print(f"  Number of images: {N}")
print(f"  Number of 3D points: {len(points3d)}")

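# Build reconstruction
# NOTE: shared_camera is hard-coded to False in this call, so one camera is
# created per image even though the configuration sets shared_camera=True
# and the summary printed above reports a single camera.
reconstruction = _build_pycolmap_reconstruction_without_tracks(
    points3d=points3d,
    points_xyf=points_xyf,
    points_rgb=points_rgb,
    extrinsic=extrinsic,
    intrinsic=intrinsic,
    image_paths=image_paths,
    image_size=image_size,
    camera_type=camera_type,
    shared_camera=False,
    verbose=True,
)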


if reconstruction is None:
    raise RuntimeError("Failed to build pycolmap reconstruction")

print(f"\n✓ Reconstruction built successfully:")
print(f"  Cameras: {len(reconstruction.cameras)}")
print(f"  Images: {len(reconstruction.images)}")
print(f"  Points3D: {len(reconstruction.points3D)}")


BUILDING PYCOLMAP RECONSTRUCTION
Building reconstruction:
  Camera type: SIMPLE_PINHOLE
  Image size: 518x350
  Number of cameras: 1
  Number of images: 194
  Number of 3D points: 484886



✓ Reconstruction built successfully:
  Cameras: 194
  Images: 194
  Points3D: 484886
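The reconstruction is built without feature tracks, so each 3D point is tied to a single pixel in a single frame. The name `points_xyf` suggests that each observation is stored as `(x, y, frame_index)`. The sketch below builds such a grid for every pixel. The layout is an assumption inferred from the variable name, and `pixel_frame_grid` is a hypothetical helper.

```python
import numpy as np


def pixel_frame_grid(num_frames: int, height: int, width: int) -> np.ndarray:
    """Return an (N, H, W, 3) array holding (x, y, frame_index) for every pixel."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    grid = np.empty((num_frames, height, width, 3), dtype=np.float32)
    grid[..., 0] = xs  # column index, broadcast over frames
    grid[..., 1] = ys  # row index, broadcast over frames
    grid[..., 2] = np.arange(num_frames)[:, None, None]  # frame index per image
    return grid
```

Indexing this grid with the same confidence mask used for the 3D points yields one 2D observation per point, which is consistent with an average track length of 1.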


### Expand image to original size

In [43]:
from PIL import Image
from nerfstudio.process_data.vggt_utils import _rescale_reconstruction_to_original_dimensions

img = Image.open(image_paths[0])

# Rescale reconstruction to the original image dimensions
reconstruction_resolution = (img.width, img.height)  # (width, height) order

original_coords = [[0, 0, W, H, img.width, img.height]] * len(image_paths)
original_coords = np.array(original_coords)

reconstruction = _rescale_reconstruction_to_original_dimensions(
    reconstruction=reconstruction,
    image_paths=image_paths,
    original_image_sizes=original_coords,
    image_size=reconstruction_resolution,
    shift_point2d_to_original_res=True,
    shared_camera=shared_camera,
    verbose=True,
)

# Write COLMAP files
print(f"\nWriting COLMAP files to: {colmap_output_dir}")
reconstruction.write_binary(str(colmap_output_dir))
print(f"✓ Wrote:")
print(f"  - {colmap_output_dir / 'cameras.bin'}")
print(f"  - {colmap_output_dir / 'images.bin'}")
print(f"  - {colmap_output_dir / 'points3D.bin'}")


Writing COLMAP files to: /workspace/bicycle/environment/bicycle/preproc/colmap/sparse/0
✓ Wrote:
  - /workspace/bicycle/environment/bicycle/preproc/colmap/sparse/0/cameras.bin
  - /workspace/bicycle/environment/bicycle/preproc/colmap/sparse/0/images.bin
  - /workspace/bicycle/environment/bicycle/preproc/colmap/sparse/0/points3D.bin
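Inference runs at 518x350 while the source images are 4946x3286, so camera intrinsics have to be rescaled along with the image size. The sketch below shows the usual per-axis rescaling for a pinhole camera. It is not the logic of `_rescale_reconstruction_to_original_dimensions`, which is not shown in this notebook, and `rescale_pinhole` is a hypothetical helper.

```python
def rescale_pinhole(fx, fy, cx, cy, from_size, to_size):
    """Rescale pinhole intrinsics between image sizes given as (width, height).

    Focal lengths and the principal point scale with the per-axis resize ratio.
    """
    sx = to_size[0] / from_size[0]
    sy = to_size[1] / from_size[1]
    return fx * sx, fy * sy, cx * sx, cy * sy
```

The camera dump printed in Step 5 lists `width=4946` together with a principal point of (259, 175), which is the center of the 518x350 frame. That may indicate the intrinsics were not rescaled for every camera, so it is worth comparing them against a calculation like the one above.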


## Step 5: Convert to Nerfstudio Format

Convert the COLMAP reconstruction to nerfstudio's `transforms.json` format.

In [44]:
print("\n" + "="*70)
print("CONVERTING TO NERFSTUDIO FORMAT")
print("="*70)

# Convert COLMAP to transforms.json
print(f"Converting COLMAP reconstruction to transforms.json...")
colmap_utils.colmap_to_json(
    recon_dir=colmap_output_dir,
    output_dir=preproc_dir,
)

transforms_path = preproc_dir / "transforms.json"
print(f"✓ Created: {transforms_path}")

# Load transforms to get applied_transform
with open(transforms_path) as f:
    transforms = json.load(f)

applied_transform = torch.tensor(transforms["applied_transform"])

# Create point cloud PLY file
ply_filename = "sparse_pc.ply"
print(f"\nCreating point cloud PLY file...")
colmap_utils.create_ply_from_colmap(
    filename=ply_filename,
    recon_dir=colmap_output_dir,
    output_dir=preproc_dir,
    applied_transform=applied_transform,
)

ply_path = preproc_dir / ply_filename
print(f"✓ Created: {ply_path}")

# Update transforms.json with PLY path
transforms["ply_file_path"] = ply_filename
with open(transforms_path, 'w') as f:
    json.dump(transforms, f, indent=2)

print("\n" + "="*70)
print("CONVERSION COMPLETE")
print("="*70)
print(f"\nOutput files:")
print(f"  📁 {preproc_dir}/")
print(f"    📄 transforms.json")
print(f"    📄 {ply_filename}")
print(f"    📁 colmap/sparse/0/")
print(f"      📄 cameras.bin")
print(f"      📄 images.bin")
print(f"      📄 points3D.bin")
print("="*70)


CONVERTING TO NERFSTUDIO FORMAT
Converting COLMAP reconstruction to transforms.json...


{194: Camera(id=194, model='SIMPLE_PINHOLE', width=4946, height=3286, params=array([462.96069336, 259.        , 175.        ])),
 193: Camera(id=193, model='SIMPLE_PINHOLE', width=4946, height=3286, params=array([464.77331543, 259.        , 175.        ])),
 192: Camera(id=192, model='SIMPLE_PINHOLE', width=4946, height=3286, params=array([468.1517334, 259.       , 175.       ])),
 191: Camera(id=191, model='SIMPLE_PINHOLE', width=4946, height=3286, params=array([483.56451416, 259.        , 175.        ])),
 190: Camera(id=190, model='SIMPLE_PINHOLE', width=4946, height=3286, params=array([489.1272583, 259.       , 175.       ])),
 189: Camera(id=189, model='SIMPLE_PINHOLE', width=4946, height=3286, params=array([482.90963745, 259.        , 175.        ])),
 188: Camera(id=188, model='SIMPLE_PINHOLE', width=4946, height=3286, params=array([497.43511963, 259.        , 175.        ])),
 187: Camera(id=187, model='SIMPLE_PINHOLE', width=4946, height=3286, params=array([519.1083374, 259.       , 
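The conversion cell reads `applied_transform` from `transforms.json` and passes it to the PLY export so that the point cloud ends up in the same frame as the camera poses. The sketch below applies such a transform to a set of points. It assumes the value is a 3x4 `[R | t]` matrix stored as nested lists. The helper name is hypothetical.

```python
import numpy as np


def apply_transform(points: np.ndarray, applied_transform) -> np.ndarray:
    """Apply a 3x4 [R | t] transform to (P, 3) points."""
    mat = np.asarray(applied_transform, dtype=np.float64)
    if mat.shape != (3, 4):
        raise ValueError(f"expected a 3x4 matrix, got {mat.shape}")
    return points @ mat[:, :3].T + mat[:, 3]


# Illustrative axis swap: new y takes old z, new z takes negated old y.
example = [[1, 0, 0, 0], [0, 0, 1, 0], [0, -1, 0, 0]]
print(apply_transform(np.array([[1.0, 2.0, 3.0]]), example))
```

Applying the same matrix to both cameras and points keeps them consistent. Skipping it for one of the two is a common cause of a point cloud that appears rotated relative to the cameras.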

## Step 6: Visualize Sparse Point Cloud

Visualize the reconstructed point cloud using PyVista.

In [45]:
import pyvista as pv

# Optional: Import visualization utilities if available
try:
    from collab_splats.utils.visualization import (
        CAMERA_KWARGS,
        MESH_KWARGS,
        VIZ_KWARGS,
        visualize_splat,
    )
    has_viz_utils = True
except ImportError:
    has_viz_utils = False
    print("collab_splats visualization utilities not available, using basic PyVista")

print(f"\nLoading point cloud from: {ply_path}")
point_cloud = pv.PolyData(str(ply_path))
print(f"  Points: {point_cloud.n_points:,}")

if has_viz_utils:
    # Use collab_splats visualization
    pcd_kwargs = MESH_KWARGS.copy()
    pcd_kwargs.update({
        "point_size": 2,
        "render_points_as_spheres": True,
        "ambient": 0.3,
        "diffuse": 0.8,
        "specular": 0.1,
    })
    
    plotter = visualize_splat(
        mesh=point_cloud,
        mesh_kwargs=pcd_kwargs,
        viz_kwargs=VIZ_KWARGS,
    )
else:
    # Basic PyVista visualization
    plotter = pv.Plotter()
    plotter.add_mesh(
        point_cloud,
        point_size=2,
        render_points_as_spheres=True,
    )
    plotter.add_axes()

plotter.show()


Loading point cloud from: /workspace/bicycle/environment/bicycle/preproc/sparse_pc.ply
  Points: 484,886



## Step 7: Performance Summary

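Report inference time, peak GPU memory, and reconstruction statistics for the InfiniteVGGT run. Because the reconstruction is built without tracks, each 3D point is expected to have a single observation, so the average track length should be 1.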

In [46]:
print("\n" + "="*70)
print("INFINITEVGGT PERFORMANCE SUMMARY")
print("="*70)

print(f"\nDataset: {image_dir.name}")
print(f"  Images: {len(image_paths)}")
print(f"  Resolution: {W}x{H}")

print(f"\nInference Performance:")
print(f"  Total time: {inference_time:.2f}s")
print(f"  Per frame: {inference_time / len(image_paths):.3f}s")
print(f"  FPS: {len(image_paths) / inference_time:.2f}")
print(f"  Peak GPU memory: {peak_memory_gb:.2f} GB")

print(f"\nReconstruction Quality:")
print(f"  Cameras: {len(reconstruction.cameras)}")
print(f"  Images registered: {len(reconstruction.images)} / {len(image_paths)}")
print(f"  3D points: {len(reconstruction.points3D):,}")
print(f"  Confidence threshold: {conf_threshold_value:.4f} (p{conf_threshold})")

# Calculate average track length
if len(reconstruction.points3D) > 0:
    track_lengths = [len(pt.track.elements) for pt in reconstruction.points3D.values()]
    avg_track_length = np.mean(track_lengths)
    print(f"  Avg track length: {avg_track_length:.2f}")

print(f"\nOutput Directory: {preproc_dir}")
print("="*70)


INFINITEVGGT PERFORMANCE SUMMARY

Dataset: images
  Images: 194
  Resolution: 518x350

Inference Performance:
  Total time: 67.23s
  Per frame: 0.347s
  FPS: 2.89
  Peak GPU memory: 15.25 GB

Reconstruction Quality:
  Cameras: 194
  Images registered: 194 / 194
  3D points: 484,886
  Confidence threshold: 2.5808 (p50.0)
  Avg track length: 1.00

Output Directory: /workspace/bicycle/environment/bicycle/preproc


In [47]:
# Option 1: Use Splatter wrapper for training
from collab_splats.wrapper import Splatter, SplatterConfig

# Configuration
config_dir = Path("/workspace/collab-splats/docs/splats/configs/")
dataset_name = "bicycle"
# dataset_name = "birds_date-02062024_video-C0043"

# Create splatter from config
splatter = Splatter.from_config_file(
    dataset=dataset_name,
    config_dir=config_dir,
    # overrides={
    #     "frame_proportion": 0.1,
    # }
)

splatter.preprocess()

# splatter.preprocess(
#     sfm_tool='vggt',
#     overwrite=False, 
#     kwargs={
#         "refine-vggt": "",
#         "camera-type": "pinhole",
#         "verbose": "",
#         "num_downscales": 0,
#         "vggt_conf_threshold": 35.0,
#         # "skip_image_processing": "",
#     }  # Enable bundle adjustment
# )

transforms.json already exists at /workspace/bicycle/environment/bicycle/preproc/transforms.json
To rerun preprocessing, set overwrite=True


In [None]:
feature_kwargs = {
    # "pipeline.model.strategy": "mcmc",
    "pipeline.model.output-depth-during-training": True,
    "pipeline.model.rasterize-mode": "antialiased",
    "pipeline.model.use-scale-regularization": True,
    "pipeline.model.random-scale": 1.0,
    "pipeline.model.num-downscales": 1,
    # "pipeline.datamanager.dataparser.downscale-factor": 1,
    # "pipeline.model.collider-params": "near_plane 0.1 far_plane 3.0",
}

splatter.extract_features(
    kwargs=feature_kwargs, 
    overwrite=True
)
print("\n✓ Training complete!")

[Taichi] version 1.7.4, llvm 15.0.4, commit b4b956fd, linux, python 3.10.18
[21:35:04] Using --data alias for --data.pipeline.datamanager.data    train.py:241
──────────────────────────────── Config ────────────────────────────────
_TrainerConfig(
    _target=<class 'nerfstudio.engine.trainer.Trainer'>,
    output_dir=PosixPath('/workspace/bicycle/environment/bicycle'),
    method_name='rade-features',
    experiment_name='',
    ...

[21:36:52] Caching / undistorting train images    full_images_datamanager.py:239
Caching / undistorting train images 100% 0:00:29
[21:37:23] Printing max of 10 lines. Set flag --logging.local-writer.max-log-size=0 to disable line wrapping.    writer.py:449


Traceback (most recent call last):
  File "/workspace/nerfstudio/nerfstudio/scripts/train.py", line 190, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/workspace/nerfstudio/nerfstudio/scripts/train.py", line 101, in train_loop
    trainer.train()
  File "/workspace/nerfstudio/nerfstudio/engine/trainer.py", line 270, in train
    callback.run_callback_at_location(
  File "/workspace/nerfstudio/nerfstudio/engine/callbacks.py", line 116, in run_callback_at_location
    self.run_callback(step=step)
  File "/workspace/nerfstudio/nerfstudio/engine/callbacks.py", line 106, in run_callback
    self.func(*self.args, **self.kwargs, step=step

KeyboardInterrupt

## Next Steps: Training with Nerfstudio

Now that preprocessing is complete, you can train a model using the generated data:

### Option 1: Using Splatter Wrapper

```python
from collab_splats.wrapper import Splatter

# Create config pointing to InfiniteVGGT preprocessing output
splatter = Splatter(
    dataset="bicycle",
    method="rade-features",
    file_path=image_dir,
    input_type="images",
    output_path=output_base,
)

# Training will use the existing preproc directory
splatter.extract_features(
    kwargs={
        "pipeline.model.output-depth-during-training": True,
        "pipeline.model.rasterize-mode": "antialiased",
    },
    overwrite=True
)
```

### Option 2: Direct ns-train

```bash
ns-train splatfacto \
  --data /workspace/bicycle/environment/bicycle/preproc \
  --output-dir /workspace/bicycle/environment/bicycle/outputs
```

## Key Differences: InfiniteVGGT vs VGGT-X

| Feature | InfiniteVGGT | VGGT-X |
|---------|--------------|--------|
| **Architecture** | StreamVGGT with frame caching | Memory-optimized VGGT |
| **Memory** | Uses frame budget system | Chunked processing |
| **Best for** | Long sequences, streaming | General scenes |
| **Output** | World points directly | Depth + camera poses |
| **Speed** | Comparable to VGGT-X | 30-40% faster than original VGGT |

## Troubleshooting

### Out of Memory
- Reduce `total_budget` (e.g., 800000)
- Set `cache_results=False` and use `frame_cache_dir` for disk caching
- Process fewer images at once
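To judge whether disabling `cache_results` is worthwhile, it helps to estimate how much memory the cached per-frame outputs occupy. The sketch below does that arithmetic for the tensors extracted in Step 2. It assumes 4 bytes per value (float32) and does not account for model weights, activations, or the internal frame cache. Where the cached tensors live (GPU or CPU) is not shown in this notebook.

```python
def cached_output_bytes(num_frames, height, width, bytes_per_value=4):
    """Estimate memory for per-frame outputs kept when results are cached.

    Counts world points (3 values per pixel) plus point confidence, depth, and
    depth confidence (1 value per pixel each).
    """
    values_per_pixel = 3 + 1 + 1 + 1
    return num_frames * height * width * values_per_pixel * bytes_per_value


# The bicycle run: 194 frames at 350x518.
gib = cached_output_bytes(194, 350, 518) / 1024**3
print(f"{gib:.2f} GiB")  # prints 0.79 GiB
```

The estimate grows linearly with frame count, so a sequence ten times longer would need roughly ten times as much memory for cached outputs alone.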

### Poor Reconstruction Quality
- Adjust `conf_threshold` (try 40.0 - 60.0)
- Modify `scale_factor` (try 1.0 - 5.0)
- Increase `max_points_for_colmap`

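### Missing Checkpoint

The cells above load weights with `StreamVGGT.from_pretrained("lch01/StreamVGGT")` and ran even though the local checkpoint path did not exist, so a local file is only needed if you load weights from disk.

```bash
cd /workspace/InfiniteVGGT
mkdir -p checkpoints
cd checkpoints
# Download from HuggingFace: https://huggingface.co/lch01/StreamVGGT
```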