Vinayak Gupta, Chih-Hao Lin, Shenlong Wang, Anand Bhattad, Jia-Bin Huang
CVPR 2026
GenWildSplat is a feed-forward method for sparse-view 3D reconstruction from unconstrained "in-the-wild" images. From 2–6 unposed images with varying appearance and transient occluders, it produces a 3D Gaussian splat in roughly 3 seconds on a single A6000 GPU. The model jointly predicts camera poses, depth, and per-Gaussian parameters; a lightweight latent appearance encoder modulates view-dependent appearance while keeping geometry consistent, and a segmentation pathway suppresses transient objects. Renderings are then refined with SyncFix for multi-view consistency. The method is built on top of AnySplat with a VGGT backbone.
Tested on Python 3.10, PyTorch 2.4.0, CUDA 12.4. Other PyTorch / CUDA versions may work, but you'll need to swap the gsplat and torch_scatter wheels accordingly.
git clone https://github.com/Vinayak-VG/GenWildSplat.git
cd GenWildSplat
conda create -y -n genwildsplat python=3.10
conda activate genwildsplat
gsplat falls back to JIT compilation, which needs GCC 9+. The easiest fix is a self-contained conda gcc:
conda install -c conda-forge -y gxx_linux-64=11 gcc_linux-64=11
Then point PyTorch's extension builder at it (add to your shell rc or eval_script.sh):
export CC=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-gcc
export CXX=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-g++
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt --find-links https://data.pyg.org/whl/torch-2.4.0+cu124.html
The --find-links flag is required so pip pulls the prebuilt torch_scatter+pt24cu124 wheel from PyG instead of trying to compile from source.
pip install -e SyncFix --no-deps
The --no-deps flag is critical: it makes syncfix importable without touching any of the carefully pinned versions from requirements.txt.
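If you are on a different PyTorch / CUDA combination, the --find-links URL has to change with it. The sketch below builds that URL from the two version strings. It assumes the pattern visible in the command above (torch-<version>+cu<major><minor>.html) carries over to other versions; the helper name pyg_wheel_index is hypothetical, and PyG may not publish wheels for every combination, so check that the resulting page exists before relying on it.

```python
def pyg_wheel_index(torch_version: str, cuda_version: str) -> str:
    """Build a PyG wheel index URL for a torch / CUDA pair.

    Assumption: the URL pattern used in the install command above
    (torch-<version>+cu<major><minor>.html) also applies to other versions.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"https://data.pyg.org/whl/torch-{torch_version}+cu{major}{minor}.html"


# Reproduces the URL used for the tested setup (PyTorch 2.4.0, CUDA 12.4).
print(pyg_wheel_index("2.4.0", "12.4"))
```

The printed URL can be passed to pip install -r requirements.txt --find-links in place of the one shown above.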
Download from these Google Drive links:
- Main checkpoint: contains model.safetensors (GenWildSplat weights) and yolov8x-seg.pt (YOLOv8 segmentation weights for transient-object masking)
- Latent model: latent_model.pth.tar (lighting/appearance latent UNet init weights, used at training time)
Place all three files under checkpoint/ (alongside the existing config.json):
checkpoint/
├── config.json
├── model.safetensors
├── yolov8x-seg.pt
└── latent_model.pth.tar
latent_model.pth.tar is only strictly required for training; at inference time its weights are overwritten by model.safetensors. If you skip it, the encoder will print a one-line notice and continue.
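Before running anything, it can help to confirm that the files landed where the scripts expect them. The following is a minimal sketch, not part of the repository: the function name check_checkpoint_dir is hypothetical, and the required/optional split simply mirrors the note above (latent_model.pth.tar is treated as optional for inference).

```python
from pathlib import Path

# File names taken from the layout above. latent_model.pth.tar is treated as
# optional because it is only strictly required for training.
REQUIRED = ("config.json", "model.safetensors", "yolov8x-seg.pt")
OPTIONAL = ("latent_model.pth.tar",)


def check_checkpoint_dir(root="checkpoint"):
    """Return (missing_required, missing_optional) file names under root."""
    root = Path(root)
    missing_required = [n for n in REQUIRED if not (root / n).is_file()]
    missing_optional = [n for n in OPTIONAL if not (root / n).is_file()]
    return missing_required, missing_optional


missing, optional = check_checkpoint_dir("checkpoint")
if missing:
    print("Missing required files:", ", ".join(missing))
if optional:
    print("Missing optional files (training only):", ", ".join(optional))
```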
The diffusion-based refinement step needs the SyncFix model weights. Download them from SyncFix's Google Drive folder and place the contents under SyncFix/checkpoint/ (create the folder if it doesn't exist):
SyncFix/checkpoint/
├── config.yaml
└── model.safetensors
The eval scripts default to --syncfix_ckpt SyncFix/checkpoint. Without these weights the refinement step will fail; pass --no_refine to skip it (raw AnySplat output only).
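If you script your runs, the fallback described above can be automated. The sketch below assembles an eval command and switches to --no_refine when the SyncFix weights are absent. The helper build_eval_command is hypothetical and not part of the repository; it only uses the flags and paths documented in this README, and it assumes the two files listed above are all that SyncFix needs.

```python
from pathlib import Path


def build_eval_command(data_dir, output_path,
                       script="src/eval_nvs_video.py",
                       ckpt_path="checkpoint/model.safetensors",
                       syncfix_ckpt="SyncFix/checkpoint"):
    """Assemble an eval command, skipping refinement if SyncFix weights are absent."""
    cmd = ["python", script,
           "--data_dir", str(data_dir),
           "--output_path", str(output_path),
           "--ckpt_path", str(ckpt_path)]
    syncfix = Path(syncfix_ckpt)
    # Assumption: config.yaml and model.safetensors are the only files required.
    has_weights = all((syncfix / name).is_file()
                      for name in ("config.yaml", "model.safetensors"))
    if has_weights:
        cmd += ["--syncfix_ckpt", str(syncfix)]
    else:
        cmd.append("--no_refine")  # raw AnySplat output only
    return cmd


print(" ".join(build_eval_command("examples/Oidor_Chapel", "eval_outputs/Oidor_Chapel")))
```

The returned list can be handed to subprocess.run as-is.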
An example scene is provided at examples/Oidor_Chapel.
<scene_folder>/
└── images/
├── img_001.jpg
├── img_002.jpg
└── ...
python3 create_masks.py
This writes a <scene_folder>/masks/ directory next to images/. Tune the --conf flag (default 0.1) inside create_masks.py to control how aggressively transient objects are suppressed: a lower confidence threshold masks more, a higher one masks less.
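To see why a lower threshold masks more, consider the toy example below. It is only an illustration of confidence thresholding, not the code in create_masks.py: the detections are made up, the function select_transients is hypothetical, and whether the real script uses >= or > at the boundary is an assumption.

```python
def select_transients(detections, conf=0.1):
    """Keep the labels of detections whose confidence is at least `conf`.

    `detections` is a list of (label, confidence) pairs. These are
    hypothetical values, not real YOLOv8 output.
    """
    return [label for label, score in detections if score >= conf]


detections = [("person", 0.85), ("car", 0.40), ("bird", 0.12)]

# Low threshold: every detection survives, so more of the image is masked.
print(select_transients(detections, conf=0.1))  # ['person', 'car', 'bird']

# High threshold: only confident detections survive, so less is masked.
print(select_transients(detections, conf=0.5))  # ['person']
```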
Use either of the eval scripts below. Both expect the images/ + masks/ layout from step 1.
The two evaluation scripts in eval_script.sh cover the two main use cases. Both load the same checkpoint; pick the one that matches what you want to produce.
By default, both scripts pipe their renderings through SyncFix for diffusion-based refinement, using the input images as references (best multi-view consistency).
- --no_ref (video script only): reference-free refinement. Use this if you want to preserve the lighting/appearance of the rendered video instead of pulling it back toward the reference images.
- --no_refine: skip SyncFix entirely and keep the raw AnySplat rendering.
Walks the predicted camera trajectory and writes <output_path>/video.mp4 (refined RGB) and <output_path>/depth.mp4 (raw depth — never refined). The first 6 images (sorted naturally) are used as context views. All intermediate files are deleted after refinement.
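Because the context views are picked by natural sort order, file naming decides which images the model sees. The sketch below shows one common way to implement natural sorting and take the first 6 names; the exact routine used by the eval scripts may differ, and both function names are hypothetical. The leftover names are returned as well for convenience.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so 'img_10' sorts after 'img_2'."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def split_context(names, num_context=6):
    """Return (context, rest): the first `num_context` names in natural order, then the remainder."""
    ordered = sorted(names, key=natural_key)
    return ordered[:num_context], ordered[num_context:]


names = ["img_10.jpg", "img_2.jpg", "img_1.jpg", "img_3.jpg"]
context, rest = split_context(names, num_context=3)
print(context)  # ['img_1.jpg', 'img_2.jpg', 'img_3.jpg']
print(rest)     # ['img_10.jpg']
```

A plain lexicographic sort would place img_10.jpg before img_2.jpg, so zero-padding file names (img_001.jpg, as in the layout above) is the safest way to keep both orderings in agreement.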
# Default: refinement with reference images (best consistency)
python src/eval_nvs_video.py \
--data_dir examples/Oidor_Chapel \
--output_path eval_outputs/Oidor_Chapel \
--ckpt_path checkpoint/model.safetensors \
--syncfix_ckpt SyncFix/checkpoint
# Reference-free refinement (preserves rendered lighting)
python src/eval_nvs_video.py \
--data_dir examples/Oidor_Chapel \
--output_path eval_outputs/Oidor_Chapel \
--ckpt_path checkpoint/model.safetensors \
--syncfix_ckpt SyncFix/checkpoint \
--no_ref
# No refinement: raw AnySplat output only
python src/eval_nvs_video.py \
--data_dir examples/Oidor_Chapel \
--output_path eval_outputs/Oidor_Chapel \
--ckpt_path checkpoint/model.safetensors \
--no_refine
Splits the input into 6 context views + the rest as target views, renders the targets from the predicted Gaussian splat, and reports PSNR / SSIM / LPIPS to stdout. Refined views are written to <output_path>/refined/. All intermediate gt/ and pred/ folders are deleted after refinement.
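For readers unfamiliar with the reported metrics, PSNR is the simplest of the three: it is 10 * log10(MAX^2 / MSE) in decibels, where higher is better. The sketch below computes it in plain Python on flat lists of pixel values. It is a textbook definition, not the repository's implementation; the assumed value range [0, 1] and any per-image averaging the eval script performs are assumptions.

```python
import math


def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two equally sized sequences of pixel values in [0, max_val]."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01, i.e. about 20 dB.
print(round(psnr([0.0, 0.0], [0.1, 0.1]), 2))
```

SSIM and LPIPS need windowed statistics and a pretrained network respectively, so they are not sketched here.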
# Default: refinement with reference images
python src/eval_nvs_tgt.py \
--data_dir examples/Oidor_Chapel \
--output_path eval_outputs/Oidor_Chapel \
--ckpt_path checkpoint/model.safetensors \
--syncfix_ckpt SyncFix/checkpoint
# No refinement
python src/eval_nvs_tgt.py \
--data_dir examples/Oidor_Chapel \
--output_path eval_outputs/Oidor_Chapel \
--ckpt_path checkpoint/model.safetensors \
--no_refine
You can also run both via:
bash eval_script.sh
- Code release for inference & evaluation
- Interpolation video rendering script (with SyncFix refinement)
- Target-view rendering with PSNR / SSIM / LPIPS
- Pretrained checkpoint
- Codebase cleanup
- SyncFix integration (reference-based and reference-free)
- Release of training code with reproducible recipes & dataset prep scripts
- Hugging Face Spaces demo
@article{gupta2026genwildsplat,
title={Generalizable Sparse-View 3D Reconstruction from Unconstrained Images},
author={Gupta, Vinayak and Lin, Chih-Hao and Wang, Shenlong and Bhattad, Anand and Huang, Jia-Bin},
journal={CVPR},
year={2026}
}
Built on top of AnySplat, VGGT, and SyncFix. Thanks also to NoPoSplat, CUT3R, and gsplat.