David Recasens, Robert Maier, Aljaz Bozic, Stephane Grabli, Javier Civera, Tony Tung, Edmond Boyer
Instead of optimizing a free 3D Gaussian, PAGaS makes each pixel's Gaussian depth-dependent: its 3D position, size, and orientation are derived from depth, while color and opacity are fixed, leaving the depth parameter as the sole degree of freedom.
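This parameterization can be pictured with a small sketch (an illustrative helper, not the repository's actual code): the camera intrinsics fix the ray through each pixel, so the single depth scalar fully determines the Gaussian's 3D center, and its footprint can be scaled with depth.

```python
import numpy as np

def pixel_gaussian_center(u, v, depth, K):
    """Back-project pixel (u, v) at a given depth to a 3D Gaussian center.

    The ray through the pixel is fixed by the intrinsics K, so the depth
    scalar is the only degree of freedom of the Gaussian position.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction in camera frame
    return depth * ray                               # 3D center in camera coordinates

# Example: principal point at (320, 240), focal length 500
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
center = pixel_gaussian_center(320.0, 240.0, 2.0, K)  # pixel at the principal point
# The center lies on the optical axis at z = depth
```

Optimizing depth then moves the Gaussian along its fixed pixel ray, which is what keeps the representation pixel-aligned.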
Clone the repository, create a conda environment on Linux, and install the dependencies:
```shell
git clone https://.../pagas.git
cd pagas
conda env create --file environment.yml
conda activate pagas
# Install the PyTorch build that matches your CUDA version (check with `nvcc -V`) from https://pytorch.org/get-started/, e.g. for CUDA 12.4:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install torchmetrics[image]
pip install --no-build-isolation --no-deps "fused_ssim @ git+https://github.com/rahul-goel/fused-ssim@328dc9836f513d00c4b5bc38fe30478b4435cbb5" # Linux
pip install --no-build-isolation --no-deps "fused_ssim @ git+https://github.com/rahul-goel/fused-ssim" # Windows
```

To use the opacity-aware gsplat 3DGS rasterizer:
```shell
cd thirdparty
git clone --recursive https://github.com/nerfstudio-project/gsplat.git
cd gsplat
git checkout bd64a47 # originally built on top of this commit
git apply --reject --ignore-whitespace ../gsplat.patch
python -m pip install --no-build-isolation --no-deps .
# Go back to the main repo and install it in editable mode
cd ../../
python -m pip install -e .
```

With this patch, the gsplat 3DGS rasterizer supports these additional inputs:
- `radius_thres`: a pixel is only influenced by Gaussians whose projected centers fall within this maximum radius (in pixel units) around the pixel center. The distance from the pixel center to a pixel corner is 0.71, which we recommend as a minimum. When context views are far from the target view, areas with a low density of Gaussians can appear, and some Gaussians that should be occluded may remain visible. In that case, use a larger threshold, for example 1.42 or higher, to block them.
- `depth_thres`: this depth is added to the depth of the first Gaussian that falls inside the pixel influence area defined by `radius_thres`. Only Gaussians that lie within both the `radius_thres` disk and this depth range contribute to the pixel rendering. Gaussians inside the radius but beyond this depth range are ignored. The pixel alpha is computed using only Gaussians in this valid range. `depth_thres` is given in scene scale units.
- `opacities`: can be view dependent (new, shape `#views x #gauss`) or view independent (classic, shape `#gauss`).
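The culling that `radius_thres` and `depth_thres` control can be illustrated with a NumPy sketch (illustrative only, not the CUDA rasterizer): for one pixel, keep the Gaussians whose projected centers fall inside the `radius_thres` disk, then keep only those within `depth_thres` of the nearest such Gaussian.

```python
import numpy as np

def select_gaussians(pixel_center, proj_centers, depths, radius_thres, depth_thres):
    """Return indices of Gaussians contributing to one pixel.

    proj_centers: (N, 2) projected Gaussian centers in pixel units.
    depths: (N,) Gaussian depths in scene scale units.
    """
    # 1) Disk test: projected center within radius_thres of the pixel center.
    dist = np.linalg.norm(proj_centers - pixel_center, axis=1)
    in_disk = dist <= radius_thres
    if not np.any(in_disk):
        return np.array([], dtype=int)
    # 2) Depth test: within depth_thres of the first (nearest) in-disk Gaussian.
    near = depths[in_disk].min()
    valid = in_disk & (depths <= near + depth_thres)
    return np.nonzero(valid)[0]

pix = np.array([10.5, 10.5])
centers = np.array([[10.6, 10.4],   # close, near depth
                    [10.9, 11.0],   # close, but occluded (too deep)
                    [14.0, 10.5]])  # outside the disk
depths = np.array([2.0, 5.0, 1.0])
idx = select_gaussians(pix, centers, depths, radius_thres=1.42, depth_thres=0.5)
# Only the first Gaussian survives both tests
```

A larger `depth_thres` would let the deeper second Gaussian contribute as well; a larger `radius_thres` would admit more distant projected centers.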
To visualize the PAGaS optimization with the rerun viewer:
```shell
conda install -c conda-forge rerun-sdk
```
This section explains how to run the full or partial 3D reconstruction pipeline with PAGaS on your own scan:
images → camera intrinsics and extrinsics → optional masks → initial baseline mesh → mesh to depth → refined depth by PAGaS → refined depth to mesh
Each scan must follow this directory structure:
```
scan/
|-- images/       # [.png/.jpg/.jpeg] (e.g. 0000.png, 0001.png, ...)
|-- depth_init/   # [.npz] initial depth to refine
|-- masks/        # [.png/.jpg/.jpeg] optional binary masks
|-- sparse/       # camera intrinsics and extrinsics in COLMAP format
    |-- 0/
        |-- cameras.txt/.bin
        |-- images.txt/.bin
        |-- points3D.txt/.bin  # not needed for PAGaS, only for 2DGS and PGSR
```
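Before launching the pipeline, a small script (illustrative only, not part of the repository) can sanity-check that the expected folders and at least one COLMAP camera file are present:

```python
from pathlib import Path

def check_scan_layout(scan_dir):
    """Report which of the expected scan subfolders/files are present."""
    scan = Path(scan_dir)
    return {
        "images": (scan / "images").is_dir(),
        "depth_init": (scan / "depth_init").is_dir(),  # optional: can be created later
        "masks": (scan / "masks").is_dir(),            # optional
        "sparse/0": (scan / "sparse" / "0").is_dir(),
        "cameras": any((scan / "sparse" / "0" / f"cameras{ext}").is_file()
                       for ext in (".txt", ".bin")),
    }

# Example on a freshly created scan containing only images/
import tempfile, os
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "images"))
report = check_scan_layout(tmp)
# images is present; everything else is reported missing
```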
The easiest way to run the full pipeline is the automatic reconstruction script, run on the scan folder that contains the images folder.
If you do not want to use MVSAnywhere, 2DGS, or PGSR to estimate the initial mesh, you can provide your own initial mesh or depth maps. Save them in the scan folder as mesh_init.ply or inside the depth_init folder. The automatic pipeline detects these inputs and continues from there.
You can also save inconsistent depths in the depth folder and they will be fused into mesh_init.ply.
If you already have camera calibration in the sparse folder or masks in the masks folder, the script reuses them.
If the script is interrupted while refining depths, it resumes from the last processed view.
```shell
scripts/run_automatic.sh --data_folder /absolute/path/to/scan
```

| Optional argument | Description |
|---|---|
| `--path_mvs_method` | Path to your installed initial MVS method (PGSR, 2DGS, or MVSAnywhere). The selected method is used to estimate the initial mesh `mesh_init.ply`. |
| `--get_masks` | Disabled by default. If set, estimates foreground masks in object-centric scenes to ignore background pixels during PAGaS refinement. Skipped when a `masks` folder already exists. |
| `--mesh_res` | Default is 4000. Mesh resolution used to define the voxel size for TSDF fusion when fusing baseline depth maps into `mesh_init.ply` and when fusing refined PAGaS depths. If TSDF fusion runs out of RAM, reduce this value. You can override the voxel size and depth truncation with `--voxel_size` and `--depth_trunc`. |
| `--no_undist` | By default, if the initial mesh `mesh_init.ply`, `depth_init`, or `sparse` are missing, COLMAP undistorts the images and masks into an `undist` folder and the pipeline runs there. Phone lenses often introduce strong distortion, so undistortion is recommended. When only images and/or masks are given, the `--no_undist` flag tells COLMAP to estimate a pinhole camera model directly. |
| `--no_shared_intrinsics` | By default, if camera calibration is not given in `sparse`, COLMAP estimates a single shared intrinsic camera for all images. `--no_shared_intrinsics` tells COLMAP to estimate individual intrinsics per image. This is strongly discouraged when all images were recorded with the same device, because COLMAP self-calibration is less stable without the shared constraint. |
| `--exposure` | Optimizes a per-image exposure compensation model during refinement. Useful when images have significant exposure differences. |
| `--consistent` | Optionally runs stereo depth consistency filtering on the refined depths. |
| `--save_extra` | Saves refined normals in `.npz` format and colorized `.png` depth and normal maps. This adds some overhead. |
Advanced use for full control of each step in the pipeline
The minimum input is an `images` folder inside your scene folder.
If your input is a video video.mp4, extract all frames as images:
```shell
cd /path/to/scan
mkdir -p images
ffmpeg -i video.mp4 -vsync 0 -start_number 0 images/%06d.png # use -vf fps=8 to sample at 8 fps
```

If you only want to reconstruct a foreground object, extract binary masks. For example, using rembg:
```shell
rembg p -m birefnet-massive -om images masks # extract matting masks
mogrify -colorspace Gray -threshold 50% masks/* # threshold matting to get binary masks
```

Next, obtain camera intrinsics and extrinsics in COLMAP format. You can estimate them with the provided COLMAP script:
```shell
cd /path/to/pagas
scripts/run_colmap.sh /path/to/scan
```

Optional arguments for run_colmap.sh
| Optional argument | Description |
|---|---|
| `--shared_intrinsics` | Forces COLMAP to use a single shared camera for all images when intrinsics are unknown. This helps stabilize self-calibration in datasets recorded with the same device. |
| `--intrinsics` | Relative path inside the scene to a `cameras.txt` file, or a folder that contains it. If provided, the script fixes intrinsics to that model instead of using self-calibration. |
| `--undistort_images` | Indicates that the input images are distorted. SfM then uses the OPENCV camera model. The script also exports an undistorted copy of the sparse model (as PINHOLE) into `<scene>/undist/`. |
| `--mvs` | Runs COLMAP multi-view stereo (PatchMatch, StereoFusion, meshing) with high-quality, CPU-based, full-resolution settings. Always undistorts to `<scene>/mvs/`. |
Then estimate the initial mesh with any method. We show three options: a learning-based multi-view stereo method (MVSAnywhere, which uses local views and is faster) and two Gaussian Splatting methods (2DGS and PGSR, which use all views and are globally consistent). We recommend PGSR.
- MVSAnywhere. Install MVSAnywhere in a conda environment and estimate depth at resolution 480x640. PAGaS automatically upsamples the depth to the input image resolution:
```shell
conda activate mvsanywhere
# Estimate depth
./experiments/MVSAnywhere_depth_predictor.sh /path/to/mvsanywhere /path/to/data_folder "scan"
# Fuse depth with our script (or use the original MVSAnywhere mesh)
./experiments/fuse_depth.sh --data_folder="/path/to/scan" --depth_name="depth_mvsanywhere" --mesh_name="mesh_init.ply"
```

- 2DGS. Install 2DGS:
```shell
conda activate 2dgs
cd /path/to/2dgs
# Train and render
python train.py -s /path/to/scan -m /path/to/scan/2dgs_results -r 2 --depth_ratio 1
python render.py -s /path/to/scan -m /path/to/scan/2dgs_results -r 2 --depth_ratio 1 --skip_test --skip_train
# Move the mesh to the main folder
mv /path/to/scan/2dgs_results/train/ours_30000/fuse_post.ply /path/to/scan/mesh_init.ply
```

- PGSR. Install PGSR:
```shell
conda activate pgsr
cd /path/to/PGSR
# Train and render
python train.py -s /path/to/scan -m /path/to/scan/pgsr_results --max_abs_split_points 0 --opacity_cull_threshold 0.05
python render.py -m /path/to/scan/pgsr_results --max_depth 10.0 --voxel_size 0.01 -r 2
# Move the mesh to the main folder
mv /path/to/scan/pgsr_results/mesh/tsdf_fusion_post.ply /path/to/scan/mesh_init.ply
```

Fourth, extract depth_init from mesh_init.ply:
```shell
conda activate pagas
cd /path/to/pagas
scripts/mesh_to_depth.sh /path/to/scan
```

Fifth, refine depth with PAGaS:
```shell
scripts/run_pagas.sh --data_dir="/path/to/scan"
```

Optional arguments for run_pagas.sh
| Optional argument | Description |
|---|---|
| `--masks_name` | Name of the folder that contains the `.png` binary masks that define valid pixels to refine. |
| `--depth_folder` | Name of the folder that contains the depth to refine. Default is `depth_init`. It contains individual depth maps in `.npz` format. The input depth can have a lower resolution than the images; it is automatically upsampled to match the image resolution. The refined depth has the same resolution as the images, except for a small crop if needed to ensure clean divisions by 2. |
| `--scale_factors` | Space-separated list of scales to optimize. Use powers of two, for example "16 8 4 2 1", "1", or "4 2". Optimization starts from the largest scale (lowest resolution) and progresses to the finest scale (highest resolution). "2 1" works well in our tests. Use more scales when `depth_init` is very coarse. |
| `--max_steps` | Maximum number of optimization steps per scale. We observe convergence with values around 200 or 100 at scale 2 and 100 or 50 at scale 1. If you provide a single number, it is used for all scales. |
| `--lr` | Learning rate per scale. For coarser scales, use higher values when `depth_init` is far from the true depth, for example `1e-4 1e-5` for scales 2 and 1. If you provide a single number, it is used for all scales. |
| `--radius_thres` | Radius thresholds (in pixels) at the start and end of each scale. Provide them as a list of pairs, for example `1.7 1.42 2. 1.42` for scales 2 and 1, meaning [(1.7, 1.42), (2., 1.42)]. Within each scale, the radius changes linearly from the first to the second value. Values between 1.42 and 2. work well. Using `1.42 1.42` or `2. 2.` is often enough. If you provide a single pair, it is applied to all scales. |
| `--depth_slices` | Number of slices that the depth range (automatically determined from the camera poses) is divided into to compute the depth threshold at the start and end of each scale. It varies linearly with the steps inside each scale, similar to `--radius_thres`. Values like `100 100` work well for most scenes. Higher values such as `1000 1000` are useful for large scenes such as Tanks and Temples. Larger scenes and thinner structures benefit from higher values. Alternatively, `--depth_thres` can be used to set the depth threshold directly in the depth units of `depth_init`. If you provide a single pair, it is applied to all scales. |
| `--normal_reg` | Strength of the normals-from-depth smoothness regularization per scale. Default is 0.0, which works well in most cases. You can experiment with values in [0., 0.01]. |
| `--num_context_views` | Number of context views to use. Default is 10. -1 uses all views listed in `views.cfg`. |
| `--viewer` | Enables visualization of the optimization for the first view. |
| `--starting_view` | Index of the first view to refine. Useful to resume refinement by skipping already processed frames. |
| `--save_extra` | Saves refined normals from depth in `.npz` format and colorized `.png` depth and normal maps. This adds some overhead. |
| `--exposure` | Optimizes a per-image exposure compensation model during refinement. Useful when images have significant exposure differences. |
Optionally, run stereo depth consistency filtering. Filtered depth maps are saved in the depth_consistent folder:
```shell
python scripts/consistent_depth.py --data_folder="/path/to/scan"
```

Finally, fuse the refined depth into a mesh:
```shell
scripts/fuse_depth.sh --data_folder=/path/to/scan/results/depth_init_pagas --depth_name=depth/depth_consistent
```

Optional arguments for fuse_depth.sh
| Optional argument | Description |
|---|---|
| `--mesh_res` | Mesh resolution used to compute the voxel size when `--voxel_size` is -1. |
| `--voxel_size` | Voxel size in pose units. Use -1 to compute it automatically. |
| `--depth_name` | Name of the depth folder. Default is "depth". |
| `--mask_name` | Optional mask folder. Leave empty ("") if masks are already applied (masked pixels have depth 0). |
| `--relative_path_to_colmap` | Relative path to the COLMAP `sparse/0` folder. The default "" assumes the `sparse` folder is in the same main folder as the images. |
| `--depth_trunc` | Maximum depth in pose units. Use -1 if unknown. |
| `--sdf_trunc` | TSDF truncation distance, typically a few times the voxel size. |
| `--min_mesh_size` | Minimum triangle count for keeping a mesh component. |
| `--num_cluster` | Number of mesh clusters to retain. |
| `--erode_borders` | Erodes depth borders before fusion. |
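What border erosion does before fusion can be illustrated with a NumPy sketch (the script's actual implementation may differ): invalid pixels (depth 0) are grown so that unreliable depth at object boundaries does not leak into the TSDF.

```python
import numpy as np

def erode_depth_borders(depth, iterations=1):
    """Zero out depth pixels adjacent to invalid (zero) pixels.

    Each iteration shrinks the valid region by one pixel using a
    4-neighborhood check, discarding unreliable boundary depths.
    """
    d = depth.copy()
    for _ in range(iterations):
        valid = d > 0
        shrunk = valid.copy()
        # A pixel stays valid only if all 4 neighbors are valid too.
        shrunk[1:, :] &= valid[:-1, :]
        shrunk[:-1, :] &= valid[1:, :]
        shrunk[:, 1:] &= valid[:, :-1]
        shrunk[:, :-1] &= valid[:, 1:]
        d[~shrunk] = 0.0
    return d

depth = np.ones((5, 5))
depth[2, 2] = 0.0              # one invalid pixel in the middle
eroded = erode_depth_borders(depth)
# The hole grows: its 4 neighbors are zeroed as well
```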
Download the DTU dataset prepared by 2DGS. Since their mask folders are named mask instead of masks, and they are not cropped consistently with the images and intrinsics, run:
```shell
python scripts/crop_dtu_2dgs_masks.py --data_folder /path/to/DTU
```

To avoid confusion, we rename the uncropped masks from `mask` to `masks_uncropped`. The 2DGS authors provide COLMAP camera calibration on top of the DTU images and cropped them slightly. We use these cropped images. We evaluate as in 2DGS, culling the meshes during evaluation using the uncropped masks and `cameras.npz`.
For Tanks and Temples (TnT), follow the 2DGS instructions to obtain the dataset. Download the preprocessed TNT data by GOF and the ground truth point clouds, alignments, and crop files from the original TnT website. Combine these resources as described in 2DGS.
First, run 2DGS, PGSR, and MVSAnywhere and save their meshes as `mesh_2dgs.ply`, `mesh_pgsr.ply`, and `mesh_mvsa.ply` in each scene folder. We fuse the baseline depths into the initial meshes using our fusion script `scripts/fuse_depth.sh`. For TnT evaluation, you need a separate conda environment with open3d==0.10.0. For example, reuse the 2DGS environment and install it there:
```shell
pip install open3d==0.10.0
```

Then run the PAGaS refinement and evaluation, giving the conda environment name with `--conda_env_tnt`:
```shell
./eval.sh --path_DTU=/path/to/DTU --path_DTU_gt=/path/to/MVSData --path_TNT=/path/to/TNT --conda_env_tnt=2dgs
```

Our Occlusion-Aware 3DGS Rasterizer is built on top of gsplat. Evaluation scripts for the DTU and TnT datasets are based on DTUeval-python and TanksAndTemples, respectively. We thank all the authors for their great work.
soon...