Meixi Song¹ · Dizhe Zhang¹,* · Hao Ren¹,² · Ruiyang Zhang¹,³ · Bo Du⁴ · Ming-Hsuan Yang⁵ · Lu Qi¹,⁴,*
¹Insta360 Research · ²Sun Yat-sen University · ³Beihang University · ⁴Wuhan University · ⁵University of California, Merced
UniSHARP extends SHARP-style photorealistic monocular view synthesis to universal camera systems. Given a single image from a perspective, wide-FoV, fisheye, or panoramic camera, UniSHARP predicts a 3D Gaussian representation and renders high-quality novel views.
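What makes these camera types interchangeable is that every pixel can be expressed as a unit viewing ray. The sketch below illustrates that idea for two of the supported models, a pinhole (perspective) camera and a 360 × 180 degree equirectangular panorama. It is not UniSHARP code: in UniSHARP the rays come from UniK3D. The function names and the camera-frame convention (x right, y down, z forward, panorama centre column looking along +z) are assumptions made for illustration.

```python
import numpy as np


def pinhole_rays(width, height, fx, fy, cx, cy):
    """Unit ray directions (H, W, 3) for a pinhole camera.

    Assumed camera frame: x right, y down, z forward.
    """
    v, u = np.mgrid[0:height, 0:width] + 0.5  # pixel centres
    d = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)


def equirect_rays(width, height):
    """Unit ray directions (H, W, 3) for a full equirectangular panorama.

    Assumed convention: the centre column looks along +z and positive
    latitude points below the horizon (same frame as pinhole_rays).
    """
    v, u = np.mgrid[0:height, 0:width] + 0.5
    lon = (u / width - 0.5) * 2.0 * np.pi  # longitude in [-pi, pi)
    lat = (v / height - 0.5) * np.pi       # latitude in [-pi/2, pi/2]
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
```

Both functions return arrays of the same shape, which is the property a camera-agnostic model relies on: downstream layers see a ray per pixel regardless of how the image was captured.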
Clone this repository and enter the project directory:
git clone https://github.com/Insta360-Research-Team/UniSHARP.git Unisharp
cd Unisharp
Create a fresh conda environment:
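conda create -n unisharp python=3.12 -y
conda activate unisharp
Install PyTorch for your CUDA version, choosing the wheel index that matches your CUDA toolkit as described in the official PyTorch installation instructions. The code was smoke-tested with PyTorch 2.8 and torchvision 0.23: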
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
Install the remaining Python dependencies:
pip install -r requirements.txt
UniSHARP uses UniK3D for universal camera ray and feature prediction. From the project root, clone the official repository into Unisharp/UniK3D:
git clone https://github.com/lpiccinelli-eth/UniK3D.git UniK3D
Fisheye rendering depends on the GEER CUDA rasterizer from 3DGEER. Clone the repository into Unisharp/3dgeer:
git clone https://github.com/boschresearch/3dgeer.git 3dgeer
The GEER rasterizer is required for fisheye rendering paths. If you only run perspective or panoramic inference, you can likely skip it.
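Before training or inference it can help to confirm the environment matches the setup above. The following is a minimal sketch, not a script shipped with UniSHARP: `check_environment` and `major_minor` are hypothetical helpers that compare installed package versions against the smoke-tested PyTorch 2.8 / torchvision 0.23 and check that the `UniK3D` and `3dgeer` clones exist in the project root. Run it from the project root; remember that `3dgeer` is only needed for fisheye rendering.

```python
from importlib import metadata
from pathlib import Path

# Versions this README reports as smoke-tested (major.minor only).
TESTED = {"torch": "2.8", "torchvision": "0.23"}
# Third-party clones expected inside the project root.
EXPECTED_DIRS = ("UniK3D", "3dgeer")


def major_minor(version):
    """Reduce a version string such as '2.8.0+cu128' to '2.8'."""
    return ".".join(version.split("+")[0].split(".")[:2])


def check_environment(project_root=".", tested=TESTED, dirs=EXPECTED_DIRS):
    """Return {name: status} for the tested packages and expected clones."""
    report = {}
    for pkg, want in tested.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "missing"
            continue
        ok = major_minor(have) == want
        report[pkg] = f"{have} ({'tested' if ok else 'untested'} version)"
    root = Path(project_root)
    for name in dirs:
        report[name] = "found" if (root / name).is_dir() else "missing"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name:12s} {status}")
    try:
        import torch  # only imported here, so the helpers work without it
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not importable")
```

An "untested" version is not necessarily broken; it only means the combination differs from the one the authors report.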
The released dataset is hosted on Hugging Face:
- Dataset: Insta360-Research/OmniRooms
- Training manifests: Insta360-Research/OmniRooms/manifests/train
- Validation manifests: Insta360-Research/OmniRooms/manifests/validation
OmniRooms is a panoramic simulation dataset well suited to 3D reconstruction, in particular 3D Gaussian Splatting (3DGS). It consists of 16 large indoor scenes, each containing multiple rooms, with 300k RGB images and corresponding depth maps covering both small and large camera motions. OmniRooms is rendered with AirSim; OmniRooms-Wide is derived from it by projecting these panoramas into 130-degree equidistant fisheye views. For each anchor point on a 0.5 m voxel grid, we render one central camera and 29 cameras randomly sampled within a local axis-aligned 30 cm cube centered on that central (source) camera. To isolate view synthesis under pure translation, all cameras share a fixed orientation. Each frame is rendered as a 1024 × 2048 equirectangular (ERP) image.
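The two geometric steps described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' data-generation pipeline: `sample_anchor_cameras` draws the 29 offsets uniformly from the 30 cm cube (the README only says "randomly sampled", so the uniform distribution is an assumption), and `equidistant_fisheye_to_erp` computes, for each pixel of a 130-degree equidistant fisheye (r = f·θ), where to sample the 1024 × 2048 panorama. The camera frame (x right, y down, z forward, panorama centre looking along +z) is also assumed.

```python
import numpy as np


def sample_anchor_cameras(anchor, n_random=29, cube_size=0.30, rng=None):
    """Camera centres for one anchor: the anchor itself, then n_random
    positions drawn uniformly (an assumption) from an axis-aligned cube of
    side cube_size metres centred on it. Orientation is shared, so only
    positions are returned, shape (1 + n_random, 3)."""
    rng = np.random.default_rng() if rng is None else rng
    anchor = np.asarray(anchor, dtype=np.float64)
    half = cube_size / 2.0
    offsets = rng.uniform(-half, half, size=(n_random, 3))
    return np.vstack([anchor[None, :], anchor + offsets])


def equidistant_fisheye_to_erp(size, fov_deg, erp_h, erp_w):
    """For each pixel of a size x size equidistant fisheye (r = f * theta),
    return continuous (x, y) coordinates in an erp_h x erp_w panorama
    (pixel k spans [k, k + 1)) and a mask of pixels inside the FoV."""
    theta_max = np.deg2rad(fov_deg) / 2.0
    f = (size / 2.0) / theta_max           # FoV edge touches the image border
    c = size / 2.0
    v, u = np.mgrid[0:size, 0:size] + 0.5  # pixel centres
    dx, dy = u - c, v - c
    theta = np.hypot(dx, dy) / f           # angle from the optical axis
    phi = np.arctan2(dy, dx)               # azimuth in the image plane
    # Assumed camera frame: x right, y down, z forward (panorama centre).
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    lon = np.arctan2(x, z)
    lat = np.arcsin(np.clip(y, -1.0, 1.0))  # positive = below the horizon
    erp_x = (lon / (2.0 * np.pi) + 0.5) * erp_w
    erp_y = (lat / np.pi + 0.5) * erp_h
    return erp_x, erp_y, theta <= theta_max
```

For example, `equidistant_fisheye_to_erp(1024, 130.0, 1024, 2048)` gives lookup maps for a 1024 × 1024 fisheye view; pixels outside the returned mask lie beyond the 130-degree circle and would be left black.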
The code supports the following data sources and manifest aliases:
- RealEstate10K
- HM3D
- OmniRooms
- OmniRooms-Wide
- WildRGB-D
- DL3DV
- ScanNet++ Fisheye
- Replica and Tanks and Temples, for validation-only protocols
Training manifests use the names released under manifests/train:
dataset_manifests/
├── re10k_train_chunks.txt
├── hm3d_train_scenes.txt
├── omnirooms.txt
├── wildrgbd_train_scenes.txt
├── dl3dv_train_scenes.txt
└── scanetpp_fisheye_train_scenes.txt
Validation manifests use the names released under manifests/validation:
validation_manifests/
├── re10k.txt
├── dl3dv.txt
├── hm3d.txt
├── omnirooms.txt
├── omnirooms_wide.txt
├── wildrgbd.txt
├── scanetpp_fisheye.txt
├── replica.txt
└── tat.txt
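The two trees above can be captured in a small lookup table. The sketch below is hypothetical: the file and folder names are copied from the trees (including the released `scanetpp_fisheye` spelling), but the short keys such as `"scannetpp_fisheye"` are labels chosen for this example rather than the aliases the UniSHARP code defines, and `read_manifest` assumes each manifest lists one entry per line, which this README does not specify.

```python
from pathlib import Path

# File names from the released manifests/train folder. The keys are
# hypothetical labels for this sketch, not UniSHARP's own aliases.
TRAIN_MANIFESTS = {
    "re10k": "re10k_train_chunks.txt",
    "hm3d": "hm3d_train_scenes.txt",
    "omnirooms": "omnirooms.txt",
    "wildrgbd": "wildrgbd_train_scenes.txt",
    "dl3dv": "dl3dv_train_scenes.txt",
    "scannetpp_fisheye": "scanetpp_fisheye_train_scenes.txt",
}

# File names from the released manifests/validation folder.
VAL_MANIFESTS = {
    name: f"{name}.txt"
    for name in ("re10k", "dl3dv", "hm3d", "omnirooms", "omnirooms_wide",
                 "wildrgbd", "replica", "tat")
}
VAL_MANIFESTS["scannetpp_fisheye"] = "scanetpp_fisheye.txt"


def read_manifest(root, name, split="train"):
    """Return the non-empty lines of a manifest, assuming one entry per line."""
    if split == "train":
        path = Path(root) / "dataset_manifests" / TRAIN_MANIFESTS[name]
    else:
        path = Path(root) / "validation_manifests" / VAL_MANIFESTS[name]
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]
```

A table like this makes it easy to spot a manifest that failed to download before a long training run starts.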
Training initializes the UniSHARP heads from scratch and loads the original pretrained UniK3D weights through the UniK3D loader. By default, the official launcher does not resume from a previous UniSHARP checkpoint.
Released UniSHARP checkpoints are available at Insta360-Research/Unisharp. Place a checkpoint anywhere on disk and pass the path to validation or inference:
CHECKPOINT=/path/to/pretrained_model.pt
To train, use the official gt-override training launcher:
bash scripts/train.sh
Training outputs are saved under:
outputs/<run_name>/
├── config.json
├── losses.csv
├── step_XXXXXXX.pt
└── vis/
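Validation and inference take a single `step_XXXXXXX.pt` path, so it is convenient to pick the newest checkpoint of a run automatically. The helper below is a hypothetical sketch, not part of the repository: `latest_checkpoint` scans `outputs/<run_name>/` for files named `step_<digits>.pt` and returns the one with the largest step number, comparing numerically so it does not depend on zero padding.

```python
import re
import sys
from pathlib import Path

# Checkpoint names of the form step_<digits>.pt, e.g. step_0020000.pt.
STEP_RE = re.compile(r"^step_(\d+)\.pt$")


def latest_checkpoint(run_dir):
    """Return the checkpoint with the highest step in run_dir, or None."""
    best_step, best_path = -1, None
    for path in Path(run_dir).glob("step_*.pt"):
        match = STEP_RE.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    ckpt = latest_checkpoint(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(ckpt if ckpt else "")
```

Saved as, for example, `latest_ckpt.py` (a hypothetical file name), it can feed the validation script: `bash scripts/validate_unisharp.sh "$(python latest_ckpt.py outputs/<run_name>)"`.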
Run validation with a checkpoint:
bash scripts/validate_unisharp.sh /path/to/step_XXXXXXX.pt
Run single-image inference:
python scripts/infer_unisharp.py \
--checkpoint /path/to/step_XXXXXXX.pt \
--image /path/to/image.jpg \
--out-dir outputs/inference
Run inference on a directory of images or an image list:
python scripts/infer_unisharp.py \
--checkpoint /path/to/step_XXXXXXX.pt \
--image-dir /path/to/images \
--out-dir outputs/inference
If calibrated camera parameters are available, pass them to the script as a JSON file with --camera-json. Without this file, the script predicts rays with UniK3D and fits the camera parameters automatically.
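If you know a perspective camera's horizontal field of view but not its focal length, the standard pinhole relation fx = (W / 2) / tan(HFoV / 2) gives the value to put in the JSON file. The sketch below writes a single-image perspective file with the same keys as the perspective example in this README (`camera`, `intrinsics` with `fx`, `fy`, `cx`, `cy`). The helper names are hypothetical, and square pixels with a principal point at the image centre are assumptions; use calibrated values when you have them.

```python
import json
import math
from pathlib import Path


def pinhole_intrinsics(width, height, hfov_deg):
    """Pinhole intrinsics from image size and horizontal FoV, assuming
    square pixels and a principal point at the image centre."""
    fx = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return {"fx": fx, "fy": fx, "cx": width / 2.0, "cy": height / 2.0}


def write_perspective_json(path, width, height, hfov_deg):
    """Write a single-image perspective camera file and return its content."""
    payload = {
        "camera": "perspective",
        "intrinsics": pinhole_intrinsics(width, height, hfov_deg),
    }
    Path(path).write_text(json.dumps(payload, indent=2))
    return payload
```

For a 1024 × 768 image with a horizontal FoV of about 64 degrees, this yields fx ≈ 820, cx = 512 and cy = 384, the values used in the perspective example.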
Example perspective camera JSON:
{
"camera": "perspective",
"intrinsics": {
"fx": 820.0,
"fy": 820.0,
"cx": 512.0,
"cy": 384.0
}
}
python scripts/infer_unisharp.py \
--checkpoint /path/to/step_XXXXXXX.pt \
--image /path/to/perspective.jpg \
--camera-json /path/to/perspective_camera.json
Example Fisheye624 camera JSON:
{
"camera": "fisheye",
"camera_params": [820.0, 820.0, 512.0, 384.0, 0.01, -0.001, 0.0, 0.0]
}
python scripts/infer_unisharp.py \
--checkpoint /path/to/step_XXXXXXX.pt \
--image /path/to/fisheye.jpg \
--camera-json /path/to/fisheye_camera.json
For batched inference, the JSON can also contain per-image entries:
{
"default": {
"camera": "perspective",
"intrinsics": [820.0, 820.0, 512.0, 384.0]
},
"images": {
"panorama.jpg": {
"camera": "panorama"
},
"fisheye.jpg": {
"camera": "fisheye",
"camera_params": [820.0, 820.0, 512.0, 384.0, 0.01, -0.001, 0.0, 0.0]
}
}
}
This project builds on open-source work from:
- SHARP for monocular Gaussian view synthesis
- UniK3D for universal camera geometry and features
- 3DGEER for generic-camera Gaussian rasterization
- gsplat for Gaussian splatting utilities
@article{song2026unisharp,
title={UniSHARP: Universal Sharp Monocular View Synthesis},
author={Song, Meixi and Zhang, Dizhe and Ren, Hao and Zhang, Ruiyang and Du, Bo and Yang, Ming-Hsuan and Qi, Lu},
journal={arXiv},
year={2026}
}