
GenWildSplat: Generalizable Sparse-View 3D Reconstruction from Unconstrained Images


Vinayak Gupta, Chih-Hao Lin, Shenlong Wang, Anand Bhattad, Jia-Bin Huang

CVPR 2026

Overview

GenWildSplat is a feed-forward method for sparse-view 3D reconstruction from unconstrained "in-the-wild" images. From 2–6 unposed images with varying appearance and transient occluders, it produces a 3D Gaussian splat in roughly 3 seconds on a single A6000 GPU. The model jointly predicts camera poses, depth, and per-Gaussian parameters; a lightweight latent appearance encoder modulates view-dependent appearance while keeping geometry consistent, and a segmentation pathway suppresses transient objects. Renderings are then refined with SyncFix for multi-view consistency. The method is built on top of AnySplat with a VGGT backbone.
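The NumPy sketch below is a toy illustration of the geometry/appearance split described above. It modulates per-Gaussian colors with a per-image latent code and returns the Gaussian centers unchanged. It is not the GenWildSplat architecture. The FiLM-style affine modulation, the array shapes, and the names `modulate_appearance`, `w_scale`, and `w_shift` are hypothetical choices made for illustration only.

```python
import numpy as np


def modulate_appearance(means, colors, latent, w_scale, w_shift):
    """Hypothetical FiLM-style modulation of per-Gaussian colors (illustration only).

    means:   (N, 3) Gaussian centers, shared by all views (geometry)
    colors:  (N, 3) base RGB colors in [0, 1]
    latent:  (D,)   per-image appearance code
    w_scale, w_shift: (D, 3) projection weights (assumed to be learned elsewhere)
    """
    scale = 1.0 + latent @ w_scale  # (3,) per-channel gain
    shift = latent @ w_shift        # (3,) per-channel offset
    new_colors = np.clip(colors * scale + shift, 0.0, 1.0)
    # Geometry is returned untouched, so every view sees the same 3D layout.
    return means, new_colors


rng = np.random.default_rng(0)
means = rng.normal(size=(1000, 3))
colors = rng.uniform(size=(1000, 3))
w_scale = 0.1 * rng.normal(size=(8, 3))
w_shift = 0.1 * rng.normal(size=(8, 3))

# Two "views" with different appearance latents share one set of Gaussian centers.
_, colors_a = modulate_appearance(means, colors, rng.normal(size=8), w_scale, w_shift)
_, colors_b = modulate_appearance(means, colors, rng.normal(size=8), w_scale, w_shift)
```

`means` never depends on the latent. As a result, two views with different appearance codes still agree on where every Gaussian sits, and only their colors differ.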

Installation

Tested on Python 3.10, PyTorch 2.4.0, CUDA 12.4. Other PyTorch / CUDA versions may work, but you'll need to swap the gsplat and torch_scatter wheels accordingly.

1. Create the environment

git clone https://github.com/Vinayak-VG/GenWildSplat.git
cd GenWildSplat

conda create -y -n genwildsplat python=3.10
conda activate genwildsplat

2. Install GCC ≥ 9 (skip if your system already has it)

gsplat falls back to JIT compilation, which needs GCC 9+. The easiest fix is a self-contained conda gcc:

conda install -c conda-forge -y gxx_linux-64=11 gcc_linux-64=11

Then point PyTorch's extension builder at it (add to your shell rc or eval_script.sh):

export CC=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-gcc
export CXX=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-g++
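A quick check that the compiler PyTorch will use is new enough can save you a failed JIT build. This is a minimal sketch and is not part of the repository. It assumes a GCC-style compiler that supports `-dumpversion`, and `gcc_major` / `check_compiler` are hypothetical helper names.

```python
import os
import re
import shutil
import subprocess


def gcc_major(version_text):
    """Parse the major version from `gcc -dumpversion` output such as '11' or '9.4.0'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognized compiler version: {version_text!r}")
    return int(match.group(1))


def check_compiler(min_major=9):
    """Return True if $CXX (or g++ on PATH) reports a major version >= min_major."""
    cxx = os.environ.get("CXX") or shutil.which("g++")
    if not cxx:
        print("No C++ compiler found; set CXX as shown above.")
        return False
    try:
        out = subprocess.run([cxx, "-dumpversion"], capture_output=True,
                             text=True, check=True).stdout
        major = gcc_major(out)
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"Could not determine the version of {cxx}: {exc}")
        return False
    print(f"{cxx}: major version {major}")
    return major >= min_major


if __name__ == "__main__":
    print("Compiler OK" if check_compiler() else "GCC >= 9 is needed for the gsplat JIT build")
```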

3. Install PyTorch + Python deps

pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt --find-links https://data.pyg.org/whl/torch-2.4.0+cu124.html

The --find-links flag is required so pip pulls the prebuilt torch_scatter+pt24cu124 wheel from PyG instead of trying to compile from source.
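On a different PyTorch or CUDA version, the PyG index URL has to change to match. The sketch below derives that URL from `torch.__version__`. It assumes PyG's usual `torch-<major>.<minor>.0+<tag>.html` naming (one index per minor release) and a `cpu` tag when PyTorch has no CUDA suffix. `pyg_find_links` is a hypothetical helper, so confirm that the resulting page exists before relying on it.

```python
from typing import Optional


def pyg_find_links(torch_version: str, cuda_tag: Optional[str] = None) -> str:
    """Build a PyG wheel index URL for `pip install --find-links`.

    torch_version: a string like torch.__version__, e.g. "2.4.0+cu124".
    cuda_tag: optional override such as "cu121"; defaults to the local tag in
    torch_version, or "cpu" when there is none (assumed PyG naming).
    """
    base, _, local = torch_version.partition("+")
    major, minor = base.split(".")[:2]
    tag = cuda_tag or local or "cpu"
    # Assumption: PyG publishes one index per minor release, named <major>.<minor>.0.
    return f"https://data.pyg.org/whl/torch-{major}.{minor}.0+{tag}.html"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("Install PyTorch first, then rerun this check.")
    else:
        print(pyg_find_links(torch.__version__))
```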

4. Install SyncFix as a local package (no extra deps)

pip install -e SyncFix --no-deps

The --no-deps flag is critical — it makes syncfix importable without touching any of the carefully pinned versions from requirements.txt.
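To confirm that the editable SyncFix install left the pins intact, you can compare `requirements.txt` against the installed packages. This sketch uses only the standard library (`importlib.metadata`). It handles plain `name==version` lines and skips everything else (comments, flags, ranges, extras). `parse_pins` and `drifted_pins` are hypothetical names.

```python
import os
import re
from importlib import metadata

# Matches simple "name==version" lines; the version stops at whitespace, ';' or '#'.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def parse_pins(requirements_text):
    """Return {package: pinned_version} for exact pins in a requirements file."""
    pins = {}
    for line in requirements_text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def drifted_pins(pins, lookup=metadata.version):
    """Return {package: (pinned, installed)} where the installed version differs.

    A missing package is reported with installed=None.
    """
    drift = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            drift[name] = (pinned, None)
            continue
        # Per PEP 440, "==2.4.0" also accepts local builds such as "2.4.0+cu124".
        public = installed if "+" in pinned else installed.split("+", 1)[0]
        if public != pinned:
            drift[name] = (pinned, installed)
    return drift


if __name__ == "__main__" and os.path.exists("requirements.txt"):
    with open("requirements.txt") as fh:
        print(drifted_pins(parse_pins(fh.read())) or "All pins intact.")
```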

Checkpoints

GenWildSplat checkpoint

Download from these Google Drive links:

  • Main checkpoint: contains model.safetensors (GenWildSplat weights) and yolov8x-seg.pt (YOLOv8 segmentation weights for transient-object masking)
  • Latent model: latent_model.pth.tar (lighting/appearance latent UNet init weights, used at training time)

Place all three files under checkpoint/ (alongside the existing config.json):

checkpoint/
├── config.json
├── model.safetensors
├── yolov8x-seg.pt
└── latent_model.pth.tar

latent_model.pth.tar is only strictly required for training. At inference time its weights are overwritten by those in model.safetensors. If you skip it, the encoder prints a one-line notice and continues.
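Below is a small pre-flight check for the layout above, written as a sketch. The split between required and optional files follows the note about `latent_model.pth.tar`. `check_checkpoint_dir` itself is a hypothetical helper and is not shipped with the repository.

```python
from pathlib import Path

REQUIRED = ("config.json", "model.safetensors", "yolov8x-seg.pt")
OPTIONAL = ("latent_model.pth.tar",)  # only needed for training


def check_checkpoint_dir(root="checkpoint"):
    """Return (missing_required, missing_optional) file names under `root`."""
    root = Path(root)
    missing_required = [n for n in REQUIRED if not (root / n).is_file()]
    missing_optional = [n for n in OPTIONAL if not (root / n).is_file()]
    return missing_required, missing_optional


if __name__ == "__main__":
    req, opt = check_checkpoint_dir()
    if req:
        print("Missing required files:", ", ".join(req))
    if opt:
        print("Optional (training only):", ", ".join(opt))
    if not req:
        print("checkpoint/ is ready for inference.")
```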

SyncFix checkpoint

The diffusion-based refinement step needs the SyncFix model weights. Download them from SyncFix's Google Drive folder and place the contents under SyncFix/checkpoint/ (create the folder if it doesn't exist):

SyncFix/checkpoint/
├── config.yaml
└── model.safetensors

The eval scripts default to --syncfix_ckpt SyncFix/checkpoint. Without these weights the refinement step will fail; pass --no_refine to skip it (raw AnySplat output only).
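If you script the eval runs, you can choose the refinement flag automatically based on whether the SyncFix weights are in place. This is a minimal sketch that assumes the two files listed above are the only ones that matter. `refine_args` is a hypothetical helper name.

```python
from pathlib import Path


def refine_args(syncfix_dir="SyncFix/checkpoint"):
    """Return refinement flags: use SyncFix if its weights exist, else --no_refine."""
    d = Path(syncfix_dir)
    if (d / "config.yaml").is_file() and (d / "model.safetensors").is_file():
        return ["--syncfix_ckpt", str(d)]
    print(f"SyncFix weights not found in {d}; falling back to --no_refine.")
    return ["--no_refine"]
```

You can append the returned list to an eval command that you pass to `subprocess.run`.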

An example scene is provided at examples/Oidor_Chapel.

Running on Your Own Images

1. Organize the scene

<scene_folder>/
└── images/
    ├── img_001.jpg
    ├── img_002.jpg
    └── ...
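The sketch below shows one way to build this layout from a folder of loose photos. It copies files with common image extensions into `<scene_folder>/images/` without renaming them. The extension list is an assumption, because the accepted formats are not specified here. It also warns when it finds fewer than the 2 views the method needs. `make_scene` is a hypothetical helper.

```python
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed set of accepted formats


def make_scene(src_dir, scene_dir):
    """Copy images from src_dir into <scene_dir>/images/ and return their names."""
    src, images = Path(src_dir), Path(scene_dir) / "images"
    images.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            shutil.copy2(path, images / path.name)
            copied.append(path.name)
    if len(copied) < 2:
        print(f"Warning: only {len(copied)} image(s) found; GenWildSplat expects 2-6.")
    return copied
```

For example, `make_scene("my_photos", "scenes/my_scene")` creates `scenes/my_scene/images/`.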

2. Generate transient-object masks

python3 create_masks.py

This writes a <scene_folder>/masks/ directory next to images/. Tune the --conf flag of create_masks.py (default 0.1) to control how aggressively transient objects are suppressed: a lower confidence threshold masks more objects, and a higher one masks fewer.
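A toy version of the masking step shows the effect of the threshold. This NumPy sketch is not the code in create_masks.py. It assumes the detector has already produced one boolean mask and one confidence score per instance, and it marks transient pixels as `True`. It does not cover which YOLOv8 classes count as transient or how masks are saved. `transient_mask` is a hypothetical name.

```python
import numpy as np


def transient_mask(instance_masks, confidences, conf=0.1):
    """Union of instance masks whose detection confidence is >= conf.

    instance_masks: (N, H, W) boolean array, one mask per detected object
    confidences:    (N,) detection scores in [0, 1]
    Returns an (H, W) boolean mask where True marks a transient pixel.
    """
    keep = np.asarray(confidences) >= conf
    if not keep.any():
        return np.zeros(instance_masks.shape[1:], dtype=bool)
    return instance_masks[keep].any(axis=0)


masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :2, :2] = True  # low-confidence detection
masks[1, 2:, 2:] = True  # high-confidence detection
scores = np.array([0.05, 0.6])

loose = transient_mask(masks, scores, conf=0.01)  # masks both objects
strict = transient_mask(masks, scores, conf=0.5)  # masks only the confident one
```

Lowering `conf` from 0.5 to 0.01 adds the low-confidence object to the mask. This matches the rule above: a lower threshold masks more.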

3. Run inference

Use either of the eval scripts described under Evaluation below. Both expect the images/ + masks/ layout produced by steps 1 and 2.

Evaluation

eval_script.sh runs the two evaluation scripts below, which cover the two main use cases. Both load the same checkpoint; pick the one that matches what you want to produce.

By default, both scripts pipe their renderings through SyncFix for diffusion-based refinement, using the input images as references (best multi-view consistency).

  • --no_ref (video script only) — reference-free refinement. Use this if you want to preserve the lighting/appearance of the rendered video instead of pulling it back toward the reference images.
  • --no_refine — skip SyncFix entirely and keep the raw AnySplat rendering.
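The three refinement modes map onto command-line flags as sketched below. This helper is for scripting runs and is not a wrapper shipped with the repository. `eval_command` and its `mode` names are hypothetical. The helper encodes only the rules stated above (`--no_ref` is for the video script only) and mirrors the example commands below, where the no-refinement runs omit `--syncfix_ckpt`.

```python
def eval_command(script, data_dir, output_path, mode="ref",
                 ckpt="checkpoint/model.safetensors", syncfix="SyncFix/checkpoint"):
    """Assemble an eval command line as a list of arguments.

    mode: "ref" (SyncFix with reference images, the default), "no_ref"
    (reference-free SyncFix, video script only) or "none" (skip SyncFix).
    """
    cmd = ["python", script, "--data_dir", data_dir,
           "--output_path", output_path, "--ckpt_path", ckpt]
    if mode == "none":
        return cmd + ["--no_refine"]
    if mode not in ("ref", "no_ref"):
        raise ValueError(f"unknown mode: {mode!r}")
    cmd += ["--syncfix_ckpt", syncfix]
    if mode == "no_ref":
        if not script.endswith("eval_nvs_video.py"):
            raise ValueError("--no_ref is only supported by the video script")
        cmd.append("--no_ref")
    return cmd
```

To launch a run, pass the result to `subprocess.run(cmd, check=True)`.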

1. Render an interpolated novel-view video

Renders frames along the predicted camera trajectory and writes <output_path>/video.mp4 (refined RGB) and <output_path>/depth.mp4 (raw depth, never refined). The first 6 images in natural sort order are used as context views. All intermediate files are deleted after refinement.

# Default: refinement with reference images (best consistency)
python src/eval_nvs_video.py \
  --data_dir     examples/Oidor_Chapel \
  --output_path  eval_outputs/Oidor_Chapel \
  --ckpt_path    checkpoint/model.safetensors \
  --syncfix_ckpt SyncFix/checkpoint
# Reference-free refinement (preserves rendered lighting)
python src/eval_nvs_video.py \
  --data_dir     examples/Oidor_Chapel \
  --output_path  eval_outputs/Oidor_Chapel \
  --ckpt_path    checkpoint/model.safetensors \
  --syncfix_ckpt SyncFix/checkpoint \
  --no_ref
# No refinement — raw AnySplat output only
python src/eval_nvs_video.py \
  --data_dir     examples/Oidor_Chapel \
  --output_path  eval_outputs/Oidor_Chapel \
  --ckpt_path    checkpoint/model.safetensors \
  --no_refine
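The context views are chosen in natural sort order, so `img_10.jpg` comes after `img_9.jpg` rather than after `img_1.jpg`. The key below is a common way to do this in Python and is only a sketch. The exact rule used by src/eval_nvs_video.py may differ (for example in case handling), and `natural_key` / `context_views` are hypothetical names.

```python
import re


def natural_key(name):
    """Split runs of digits out so numbers compare numerically, not character by character."""
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]


def context_views(names, k=6):
    """Return the first k names in natural sort order."""
    return sorted(names, key=natural_key)[:k]
```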

2. Render & score target views

Splits the input images into 6 context views and treats the rest as target views. It then renders the targets from the predicted Gaussian splat and prints PSNR / SSIM / LPIPS to stdout. Refined views are written to <output_path>/refined/, and the intermediate gt/ and pred/ folders are deleted after refinement.

# Default: refinement with reference images
python src/eval_nvs_tgt.py \
  --data_dir     examples/Oidor_Chapel \
  --output_path  eval_outputs/Oidor_Chapel \
  --ckpt_path    checkpoint/model.safetensors \
  --syncfix_ckpt SyncFix/checkpoint
# No refinement
python src/eval_nvs_tgt.py \
  --data_dir     examples/Oidor_Chapel \
  --output_path  eval_outputs/Oidor_Chapel \
  --ckpt_path    checkpoint/model.safetensors \
  --no_refine
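For reference, PSNR (one of the three reported metrics) follows the standard definition PSNR = 10 · log10(MAX² / MSE). The sketch below assumes images stored as arrays scaled to [0, MAX]. It does not claim to reproduce how eval_nvs_tgt.py preprocesses or averages images, and it does not cover SSIM or LPIPS. `psnr` is a hypothetical helper.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

On [0, 1] images, a PSNR of 20 dB corresponds to an RMSE of 0.1. Each additional 10 dB is a tenfold drop in MSE.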

You can also run both via:

bash eval_script.sh

TODO

  • Code release for inference & evaluation
  • Interpolation video rendering script (with SyncFix refinement)
  • Target-view rendering with PSNR / SSIM / LPIPS
  • Pretrained checkpoint
  • Codebase cleanup
  • SyncFix integration (reference-based and reference-free)
  • Release of training code with reproducible recipes & dataset prep scripts
  • Hugging Face Spaces demo

Citation

@article{gupta2026genwildsplat,
  title={Generalizable Sparse-View 3D Reconstruction from Unconstrained Images},
  author={Gupta, Vinayak and Lin, Chih-Hao and Wang, Shenlong and Bhattad, Anand and Huang, Jia-Bin},
  journal={CVPR},
  year={2026}
}

Acknowledgements

Built on top of AnySplat, VGGT, and SyncFix. Thanks also to NoPoSplat, CUT3R, and gsplat.
