David Recasens, Robert Maier, Aljaz Bozic, Stephane Grabli, Javier Civera, Tony Tung, Edmond Boyer
Instead of optimizing a free 3D Gaussian, PAGaS makes each pixel's Gaussian depth-dependent: its 3D position, size, and orientation are derived from depth, while color and opacity are fixed, leaving the depth parameter as the sole degree of freedom.
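This parameterization can be pictured with a small sketch (an illustrative helper, not the repository's actual code): the camera intrinsics fix the ray through each pixel, so the single depth scalar fully determines the Gaussian's 3D center, and its footprint can be scaled with depth.

```python
import numpy as np

def pixel_gaussian_center(u, v, depth, K):
    """Back-project pixel (u, v) at a given depth to a 3D Gaussian center.

    The ray through the pixel is fixed by the intrinsics K, so the depth
    scalar is the only degree of freedom of the Gaussian position.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction in camera frame
    return depth * ray                               # 3D center in camera coordinates

# Example: principal point at (320, 240), focal length 500
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
center = pixel_gaussian_center(320.0, 240.0, 2.0, K)  # pixel at the principal point
# The center lies on the optical axis at z = depth
```

Optimizing depth then moves the Gaussian along its fixed pixel ray, which is what keeps the representation pixel-aligned.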
Clone the repository, create a conda environment on Linux, and install the dependencies:
```shell
git clone https://.../pagas.git
cd pagas
conda env create --file environment.yml
conda activate pagas
# Install the PyTorch build that matches your CUDA version (check with `nvcc -V`) from https://pytorch.org/get-started/, e.g. for CUDA 12.4:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install torchmetrics[image]
pip install --no-build-isolation --no-deps "fused_ssim @ git+https://github.com/rahul-goel/fused-ssim@328dc9836f513d00c4b5bc38fe30478b4435cbb5" # Linux
pip install --no-build-isolation --no-deps "fused_ssim @ git+https://github.com/rahul-goel/fused-ssim" # Windows
```

To use the opacity-aware gsplat 3DGS rasterizer:
```shell
cd thirdparty
git clone --recursive https://github.com/nerfstudio-project/gsplat.git
cd gsplat
git checkout bd64a47 # originally built on top of this commit
git apply --reject --ignore-whitespace ../gsplat.patch
python -m pip install --no-build-isolation --no-deps .
# Go back to the main repo and install it in editable mode
cd ../../
python -m pip install -e .
```

With this patch, the gsplat 3DGS rasterizer supports these additional inputs:
- `radius_thres`: a pixel is only influenced by Gaussians whose projected centers fall within this maximum radius (in pixel units) around the pixel center. The distance from the pixel center to a pixel corner is 0.71, which we recommend as a minimum. When context views are far from the target view, areas with a low density of Gaussians can appear, and some Gaussians that should be occluded may remain visible. In that case, use a larger threshold, for example 1.42 or higher, to block them.
- `depth_thres`: this depth is added to the depth of the first Gaussian that falls inside the pixel influence area defined by `radius_thres`. Only Gaussians that lie within both the `radius_thres` disk and this depth range contribute to the pixel rendering. Gaussians inside the radius but beyond this depth range are ignored. The pixel alpha is computed using only Gaussians in this valid range. `depth_thres` is given in scene scale units.
- `opacities`: can be view dependent (new, shape `#views x #gauss`) or view independent (classic, shape `#gauss`).
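The culling that `radius_thres` and `depth_thres` control can be illustrated with a NumPy sketch (illustrative only, not the CUDA rasterizer): for one pixel, keep the Gaussians whose projected centers fall inside the `radius_thres` disk, then keep only those within `depth_thres` of the nearest such Gaussian.

```python
import numpy as np

def select_gaussians(pixel_center, proj_centers, depths, radius_thres, depth_thres):
    """Return indices of Gaussians contributing to one pixel.

    proj_centers: (N, 2) projected Gaussian centers in pixel units.
    depths: (N,) Gaussian depths in scene scale units.
    """
    # 1) Disk test: projected center within radius_thres of the pixel center.
    dist = np.linalg.norm(proj_centers - pixel_center, axis=1)
    in_disk = dist <= radius_thres
    if not np.any(in_disk):
        return np.array([], dtype=int)
    # 2) Depth test: within depth_thres of the first (nearest) in-disk Gaussian.
    near = depths[in_disk].min()
    valid = in_disk & (depths <= near + depth_thres)
    return np.nonzero(valid)[0]

pix = np.array([10.5, 10.5])
centers = np.array([[10.6, 10.4],   # close, near depth
                    [10.9, 11.0],   # close, but occluded (too deep)
                    [14.0, 10.5]])  # outside the disk
depths = np.array([2.0, 5.0, 1.0])
idx = select_gaussians(pix, centers, depths, radius_thres=1.42, depth_thres=0.5)
# Only the first Gaussian survives both tests
```

A larger `depth_thres` would let the deeper second Gaussian contribute as well; a larger `radius_thres` would admit more distant projected centers.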
To visualize the PAGaS optimization with the rerun viewer:
```shell
conda install -c conda-forge rerun-sdk
```
This section explains how to run the full or partial 3D reconstruction pipeline with PAGaS on your own scan:
images → camera intrinsics and extrinsics → optional masks → initial baseline mesh → mesh to depth → refined depth by PAGaS → refined depth to mesh
Each scan must follow this directory structure:
```
scan/
|-- images/       # [.png/.jpg/.jpeg] (e.g. 0000.png, 0001.png, ...)
|-- depth_init/   # [.npz] initial depth to refine
|-- masks/        # [.png/.jpg/.jpeg] optional binary masks
|-- sparse/       # camera intrinsics and extrinsics in COLMAP format
    |-- 0/
        |-- cameras.txt/.bin
        |-- images.txt/.bin
        |-- points3D.txt/.bin  # not needed for PAGaS, only for 2DGS and PGSR
```
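Before launching the pipeline, a small script (illustrative only, not part of the repository) can sanity-check that the expected folders and at least one COLMAP camera file are present:

```python
from pathlib import Path

def check_scan_layout(scan_dir):
    """Report which of the expected scan subfolders/files are present."""
    scan = Path(scan_dir)
    return {
        "images": (scan / "images").is_dir(),
        "depth_init": (scan / "depth_init").is_dir(),  # optional: can be created later
        "masks": (scan / "masks").is_dir(),            # optional
        "sparse/0": (scan / "sparse" / "0").is_dir(),
        "cameras": any((scan / "sparse" / "0" / f"cameras{ext}").is_file()
                       for ext in (".txt", ".bin")),
    }

# Example on a freshly created scan containing only images/
import tempfile, os
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "images"))
report = check_scan_layout(tmp)
# images is present; everything else is reported missing
```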
The easiest way to run the full pipeline is the automatic reconstruction script, run on the scan folder that contains the images folder.
If you do not want to use MVSAnywhere, 2DGS, or PGSR to estimate the initial mesh, you can provide your own initial mesh or depth maps. Save them in the scan folder as mesh_init.ply or inside the depth_init folder. The automatic pipeline detects these inputs and continues from there.
You can also save inconsistent depths in the depth folder and they will be fused into mesh_init.ply.
If you already have camera calibration in the sparse folder or masks in the masks folder, the script reuses them.
If the script is interrupted while refining depths, it resumes from the last processed view.
```shell
scripts/run_automatic.sh --data_folder /absolute/path/to/scan
```

| Optional argument | Description |
|---|---|
| `--path_mvs_method` | Path to your installed initial MVS method (PGSR, 2DGS, or MVSAnywhere). The selected method is used to estimate the initial mesh `mesh_init.ply`. |
| `--get_masks` | Disabled by default. If set, estimates foreground masks in object-centric scenes to ignore background pixels during PAGaS refinement. Skipped when a `masks` folder already exists. |
| `--mesh_res` | Default is 4000. Mesh resolution used to define the voxel size for TSDF fusion when fusing baseline depth maps into `mesh_init.ply` and when fusing refined PAGaS depths. If TSDF fusion runs out of RAM, reduce this value. You can override the voxel size and depth truncation with `--voxel_size` and `--depth_trunc`. |
| `--no_undist` | By default, if the initial mesh `mesh_init.ply`, `depth_init`, or `sparse` are missing, COLMAP undistorts the images and masks into an `undist` folder and the pipeline runs there. Phone lenses often introduce strong distortion, so undistortion is recommended. When only images and/or masks are given, the `--no_undist` flag tells COLMAP to estimate a pinhole camera model directly. |
| `--no_shared_intrinsics` | By default, if camera calibration is not given in `sparse`, COLMAP estimates a single shared intrinsic camera for all images. `--no_shared_intrinsics` tells COLMAP to estimate individual intrinsics per image. This is strongly discouraged when all images were recorded with the same device, because COLMAP self-calibration is less stable without the shared constraint. |
| `--exposure` | Optimizes a per-image exposure compensation model during refinement. Useful when images have significant exposure differences. |
| `--consistent` | Optionally runs stereo depth consistency filtering on the refined depths. |
| `--save_extra` | Saves refined normals in `.npz` format and colorized `.png` depth and normal maps. This adds some overhead. |
Advanced use for full control of each step in the pipeline
The minimum input is an `images` folder inside your scene folder.
If your input is a video video.mp4, extract all frames as images:
```shell
cd /path/to/scan
mkdir -p images
ffmpeg -i video.mp4 -vsync 0 -start_number 0 images/%06d.png # use -vf fps=8 to sample at 8 fps
```

If you only want to reconstruct a foreground object, extract binary masks. For example, using rembg:
```shell
rembg p -m birefnet-massive -om images masks # extract matting masks
mogrify -colorspace Gray -threshold 50% masks/* # threshold matting to get binary masks
```

Next, obtain camera intrinsics and extrinsics in COLMAP format. You can estimate them with the provided COLMAP script:
```shell
cd /path/to/pagas
scripts/run_colmap.sh /path/to/scan
```

Optional arguments for run_colmap.sh
| Optional argument | Description |
|---|---|
| `--shared_intrinsics` | Forces COLMAP to use a single shared camera for all images when intrinsics are unknown. This helps stabilize self-calibration in datasets recorded with the same device. |
| `--intrinsics` | Relative path inside the scene to a `cameras.txt` file, or a folder that contains it. If provided, the script fixes intrinsics to that model instead of using self-calibration. |
| `--undistort_images` | Indicates that the input images are distorted. SfM then uses the OPENCV camera model. The script also exports an undistorted copy of the sparse model (as PINHOLE) into `<scene>/undist/`. |
| `--mvs` | Runs COLMAP multi-view stereo (PatchMatch, StereoFusion, meshing) with high-quality, CPU-based, full-resolution settings. Always undistorts to `<scene>/mvs/`. |
Then estimate the initial mesh with any method. We show three options: a learning-based multi-view stereo method (MVSAnywhere, which uses local views and is faster) and two Gaussian Splatting methods (2DGS and PGSR, which use all views and are globally consistent). We recommend PGSR.
- MVSAnywhere. Install MVSAnywhere in a conda environment and estimate depth at resolution 480x640. PAGaS automatically upsamples the depth to the input image resolution:
```shell
conda activate mvsanywhere
# Estimate depth
./experiments/MVSAnywhere_depth_predictor.sh /path/to/mvsanywhere /path/to/data_folder "scan"
# Fuse depth with our script (or use the original MVSAnywhere mesh)
./experiments/fuse_depth.sh --data_folder="/path/to/scan" --depth_name="depth_mvsanywhere" --mesh_name="mesh_init.ply"
```

- 2DGS. Install 2DGS:
```shell
conda activate 2dgs
cd /path/to/2dgs
# Train and render
python train.py -s /path/to/scan -m /path/to/scan/2dgs_results -r 2 --depth_ratio 1
python render.py -s /path/to/scan -m /path/to/scan/2dgs_results -r 2 --depth_ratio 1 --skip_test --skip_train
# Move the mesh to the main folder
mv /path/to/scan/2dgs_results/train/ours_30000/fuse_post.ply /path/to/scan/mesh_init.ply
```

- PGSR. Install PGSR:
```shell
conda activate pgsr
cd /path/to/PGSR
# Train and render
python train.py -s /path/to/scan -m /path/to/scan/pgsr_results --max_abs_split_points 0 --opacity_cull_threshold 0.05
python render.py -m /path/to/scan/pgsr_results --max_depth 10.0 --voxel_size 0.01 -r 2
# Move the mesh to the main folder
mv /path/to/scan/pgsr_results/mesh/tsdf_fusion_post.ply /path/to/scan/mesh_init.ply
```

Fourth, extract depth_init from mesh_init.ply:
```shell
conda activate pagas
cd /path/to/pagas
scripts/mesh_to_depth.sh /path/to/scan
```

Fifth, refine depth with PAGaS:
```shell
scripts/run_pagas.sh --data_dir="/path/to/scan"
```

Optional arguments for run_pagas.sh
| Optional argument | Description |
|---|---|
| `--masks_name` | Name of the folder that contains the `.png` binary masks that define valid pixels to refine. |
| `--depth_folder` | Name of the folder that contains the depth to refine. Default is `depth_init`. It contains individual depth maps in `.npz` format. The input depth can have a lower resolution than the images; it is automatically upsampled to match the image resolution. The refined depth has the same resolution as the images, except for a small crop if needed to ensure clean divisions by 2. |
| `--scale_factors` | Space-separated list of scales to optimize. Use powers of two, for example "16 8 4 2 1", "1", or "4 2". Optimization starts from the largest scale (lowest resolution) and progresses to the finest scale (highest resolution). "2 1" works well in our tests. Use more scales when `depth_init` is very coarse. |
| `--max_steps` | Maximum number of optimization steps per scale. We observe convergence with values around 200 or 100 at scale 2 and 100 or 50 at scale 1. If you provide a single number, it is used for all scales. |
| `--lr` | Learning rate per scale. For coarser scales, use higher values when `depth_init` is far from the true depth, for example `1e-4 1e-5` for scales 2 and 1. If you provide a single number, it is used for all scales. |
| `--radius_thres` | Radius thresholds (in pixels) at the start and end of each scale. Provide them as a list of pairs, for example `1.7 1.42 2. 1.42` for scales 2 and 1, meaning [(1.7, 1.42), (2., 1.42)]. Within each scale, the radius changes linearly from the first to the second value. Values between 1.42 and 2. work well. Using `1.42 1.42` or `2. 2.` is often enough. If you provide a single pair, it is applied to all scales. |
| `--depth_slices` | Number of slices that the depth range (automatically determined from the camera poses) is divided into to compute the depth threshold at the start and end of each scale. It varies linearly with the steps inside each scale, similar to `--radius_thres`. Values like `100 100` work well for most scenes. Higher values such as `1000 1000` are useful for large scenes such as Tanks and Temples. Larger scenes and thinner structures benefit from higher values. Alternatively, `--depth_thres` can be used to set the depth threshold directly in the depth units of `depth_init`. If you provide a single pair, it is applied to all scales. |
| `--normal_reg` | Strength of the normals-from-depth smoothness regularization per scale. Default is 0.0, which works well in most cases. You can experiment with values in [0., 0.01]. |
| `--num_context_views` | Number of context views to use. Default is 10. -1 uses all views listed in `views.cfg`. |
| `--viewer` | Enables visualization of the optimization for the first view. |
| `--starting_view` | Index of the first view to refine. Useful to resume refinement by skipping already processed frames. |
| `--save_extra` | Saves refined normals from depth in `.npz` format and colorized `.png` depth and normal maps. This adds some overhead. |
| `--exposure` | Optimizes a per-image exposure compensation model during refinement. Useful when images have significant exposure differences. |
Optionally, run stereo depth consistency filtering. Filtered depth maps are saved in the depth_consistent folder:
```shell
python scripts/consistent_depth.py --data_folder="/path/to/scan"
```

Finally, fuse the refined depth into a mesh:
```shell
scripts/fuse_depth.sh --data_folder=/path/to/scan/results/depth_init_pagas --depth_name=depth/depth_consistent
```

Optional arguments for fuse_depth.sh
| Optional argument | Description |
|---|---|
| `--mesh_res` | Mesh resolution used to compute the voxel size when `--voxel_size` is -1. |
| `--voxel_size` | Voxel size in pose units. Use -1 to compute it automatically. |
| `--depth_name` | Name of the depth folder. Default is "depth". |
| `--mask_name` | Optional mask folder. Leave empty ("") if masks are already applied (masked pixels have depth 0). |
| `--relative_path_to_colmap` | Relative path to the COLMAP `sparse/0` folder. The default "" assumes the `sparse` folder is in the same main folder as the images. |
| `--depth_trunc` | Maximum depth in pose units. Use -1 if unknown. |
| `--sdf_trunc` | TSDF truncation distance, typically a few times the voxel size. |
| `--min_mesh_size` | Minimum triangle count for keeping a mesh component. |
| `--num_cluster` | Number of mesh clusters to retain. |
| `--erode_borders` | Erodes depth borders before fusion. |
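What border erosion does before fusion can be illustrated with a NumPy sketch (the script's actual implementation may differ): invalid pixels (depth 0) are grown so that unreliable depth at object boundaries does not leak into the TSDF.

```python
import numpy as np

def erode_depth_borders(depth, iterations=1):
    """Zero out depth pixels adjacent to invalid (zero) pixels.

    Each iteration shrinks the valid region by one pixel using a
    4-neighborhood check, discarding unreliable boundary depths.
    """
    d = depth.copy()
    for _ in range(iterations):
        valid = d > 0
        shrunk = valid.copy()
        # A pixel stays valid only if all 4 neighbors are valid too.
        shrunk[1:, :] &= valid[:-1, :]
        shrunk[:-1, :] &= valid[1:, :]
        shrunk[:, 1:] &= valid[:, :-1]
        shrunk[:, :-1] &= valid[:, 1:]
        d[~shrunk] = 0.0
    return d

depth = np.ones((5, 5))
depth[2, 2] = 0.0              # one invalid pixel in the middle
eroded = erode_depth_borders(depth)
# The hole grows: its 4 neighbors are zeroed as well
```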
Download the DTU dataset prepared by 2DGS. Since their mask folders are named mask instead of masks, and they are not cropped consistently with the images and intrinsics, run:
```shell
python scripts/crop_dtu_2dgs_masks.py --data_folder /path/to/DTU
```

To avoid confusion, we rename the uncropped masks from `mask` to `masks_uncropped`. The 2DGS authors provide COLMAP camera calibration on top of the DTU images and cropped them slightly. We use these cropped images. We evaluate as in 2DGS, culling the meshes during evaluation using the uncropped masks and `cameras.npz`.
For Tanks and Temples (TnT), follow the 2DGS instructions to obtain the dataset. Download the preprocessed TNT data by GOF and the ground truth point clouds, alignments, and crop files from the original TnT website. Combine these resources as described in 2DGS.
First, run 2DGS, PGSR, and MVSAnywhere and save their meshes as `mesh_2dgs.ply`, `mesh_pgsr.ply`, and `mesh_mvsa.ply` in each scene folder. We fuse the baseline depths into the initial meshes using our fusion script `scripts/fuse_depth.sh`. For TnT evaluation, you need a separate conda environment with open3d==0.10.0. For example, reuse the 2DGS environment and install it there:
```shell
pip install open3d==0.10.0
```

Then run the PAGaS refinement and evaluation, giving the conda environment name with `--conda_env_tnt`:
```shell
./eval.sh --path_DTU=/path/to/DTU --path_DTU_gt=/path/to/MVSData --path_TNT=/path/to/TNT --conda_env_tnt=2dgs
```

Our Occlusion-Aware 3DGS Rasterizer is built on top of gsplat. Evaluation scripts for the DTU and TnT datasets are based on DTUeval-python and TanksAndTemples, respectively. We thank all the authors for their great work.
soon...