
Insta360-Research-Team/UniSHARP


UniSHARP:
Universal Sharp Monocular View Synthesis

Meixi Song1 · Dizhe Zhang1,* · Hao Ren1,2 · Ruiyang Zhang1,3 · Bo Du4 · Ming-Hsuan Yang5 · Lu Qi1,4,*
1Insta360 Research · 2Sun Yat-sen University · 3Beihang University · 4Wuhan University · 5University of California, Merced

arXiv Project Page Demo Dataset GitHub

UniSHARP extends SHARP-style photorealistic monocular view synthesis to universal camera systems. Given a single image from a perspective, wide-FoV, fisheye, or panoramic camera, UniSHARP predicts a 3D Gaussian representation and renders high-quality novel views.

[Figures: UniSHARP teaser, UniSHARP teaser 2]

[Figure: UniSHARP method]

🔨 Installation

Clone this repository and enter the project directory:

git clone https://github.com/Insta360-Research-Team/UniSHARP.git
cd UniSHARP

Create a fresh conda environment:

conda create -n unisharp python=3.12 -y
conda activate unisharp

Install PyTorch for your CUDA version. The code was smoke-tested with PyTorch 2.8 and torchvision 0.23:

pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
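
After the install, it can help to confirm which versions actually landed in the environment. The sketch below uses only the standard library's importlib.metadata, so it runs even when a package is missing. The helper name installed_versions is hypothetical and not part of UniSHARP.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Compare the printed versions against the ones pinned in the command above.
    found = installed_versions(["torch", "torchvision", "torchaudio"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
```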

Install the remaining Python dependencies:

pip install -r requirements.txt

🧩 External Dependencies

UniK3D

UniSHARP uses UniK3D for universal camera ray and feature prediction. From the project root, clone the official repository into UniSHARP/UniK3D:

git clone https://github.com/lpiccinelli-eth/UniK3D.git UniK3D

3DGEER

Fisheye rendering depends on the GEER CUDA rasterizer from 3DGEER. From the project root, clone the repository into UniSHARP/3dgeer:

git clone https://github.com/boschresearch/3dgeer.git 3dgeer

If you only use perspective or panoramic inference, the GEER rasterizer may not be needed. It is required for fisheye rendering paths.

🖼️ Dataset

The released dataset is hosted on Hugging Face:

OmniRooms is a panoramic simulation dataset well suited to 3D reconstruction, especially 3DGS tasks. It consists of 16 large indoor scenes, each containing multiple rooms, and 300k RGB images with corresponding depth, covering both small and large pose movements. OmniRooms is collected via AirSim; OmniRooms-Wide is derived by projecting these panoramas into 130-degree equidistant fisheye views. For each anchor point on a 0.5 m voxel grid, we render one central (source) camera and 29 cameras randomly sampled within a local axis-aligned 30 cm cube centered on it. To isolate translation-induced synthesis, all cameras share a fixed orientation. Each frame is rendered as a 1024 x 2048 ERP image.
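
To make the panorama-to-fisheye relationship concrete, here is a minimal sketch of the equidistant model (image radius r = f * theta) mapping a fisheye pixel to its source location in the ERP image. It is not the dataset's actual conversion code. It assumes the 130-degree FoV spans the image width, that +x is right, +y is down and +z is forward, that the ERP image is 2048 wide by 1024 tall, and it ignores half-pixel offsets. All function names are hypothetical.

```python
import math


def fisheye_pixel_to_ray(u, v, width, height, fov_deg=130.0):
    """Unproject an equidistant-fisheye pixel to a unit ray (+x right, +y down, +z forward)."""
    # Equidistant model: r = f * theta, with theta = fov / 2 at the image edge (assumed).
    f = (width / 2.0) / math.radians(fov_deg / 2.0)
    dx, dy = u - width / 2.0, v - height / 2.0
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0, 1.0)  # optical axis
    theta = r / f
    scale = math.sin(theta) / r
    return (dx * scale, dy * scale, math.cos(theta))


def ray_to_erp_pixel(ray, erp_width=2048, erp_height=1024):
    """Map a unit ray to equirectangular pixel coordinates (u to the right, v downward)."""
    x, y, z = ray
    lon = math.atan2(x, z)                   # longitude in [-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, y)))  # latitude in [-pi/2, pi/2], positive down
    u = (lon / (2.0 * math.pi) + 0.5) * erp_width
    v = (lat / math.pi + 0.5) * erp_height
    return (u, v)


if __name__ == "__main__":
    # The fisheye centre looks straight ahead, so it lands at the ERP centre.
    print(ray_to_erp_pixel(fisheye_pixel_to_ray(512, 512, 1024, 1024)))
```

Sampling the ERP image at the returned coordinates for every fisheye pixel would produce the fisheye view. A real pipeline would vectorize this and interpolate.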

[Figure: preview images of the 16 OmniRooms scenes: AIUE5 vol8 03, AIUE5 vol8 04, AIUE5 vol8 05, AIUE V01 001, AIUE V01 003, AIUE V01 004, AIUE V02 001, AI vol3 01, AI vol3 02, AI vol3 03, AI vol3 04, AI vol4 01, AI vol4 02, AI vol4 03, AI vol4 04, AI vol4 05]
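
The per-anchor camera layout described for OmniRooms (one central camera plus 29 cameras inside a 30 cm cube) can be sketched as follows. This is an illustration only: it assumes uniform sampling, reads "30 cm cube" as a side length of 0.30 m (so offsets of at most 0.15 m per axis), and uses a hypothetical function name.

```python
import random


def sample_cameras(anchor, n_extra=29, cube_side=0.30, rng=None):
    """Return the central camera position plus n_extra positions around it (metres).

    Orientation is not sampled: all cameras are assumed to share one fixed rotation.
    """
    rng = rng or random.Random()
    half = cube_side / 2.0
    positions = [tuple(anchor)]  # the central (source) camera sits on the anchor
    for _ in range(n_extra):
        # Independent uniform offset per axis keeps the cube axis-aligned.
        positions.append(tuple(c + rng.uniform(-half, half) for c in anchor))
    return positions


if __name__ == "__main__":
    cams = sample_cameras((1.0, 2.0, 1.5), rng=random.Random(0))
    print(len(cams), "cameras for this anchor")
```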

The code supports the following data sources and manifest aliases:

  • RealEstate10K
  • HM3D
  • OmniRooms
  • OmniRooms-Wide
  • WildRGB-D
  • DL3DV
  • ScanNet++ Fisheye
  • Replica and Tanks and Temples (validation-only protocols)

Training manifests use the names released under manifests/train:

dataset_manifests/
├── re10k_train_chunks.txt            
├── hm3d_train_scenes.txt            
├── omnirooms.txt              
├── wildrgbd_train_scenes.txt         
├── dl3dv_train_scenes.txt            
└── scanetpp_fisheye_train_scenes.txt 

Validation manifests use the names released under manifests/validation:

validation_manifests/
├── re10k.txt                      
├── dl3dv.txt                         
├── hm3d.txt                          
├── omnirooms.txt                      
├── omnirooms_wide.txt              
├── wildrgbd.txt                     
├── scanetpp_fisheye.txt              
├── replica.txt                       
└── tat.txt
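
The manifest format is not specified here. Assuming each file is plain text with one scene or chunk name per line, a small reader could look like the sketch below. The function name, the one-entry-per-line layout and the skipping of '#' comment lines are all assumptions, not confirmed behaviour of the UniSHARP loaders.

```python
from pathlib import Path


def read_manifest(path):
    """Return the non-empty, non-comment lines of a manifest file (assumed format)."""
    entries = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries
```

For example, read_manifest("validation_manifests/omnirooms.txt") would return the list of entries to validate on, if the file follows this layout.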

🤝 Checkpoints

Training initializes the UniSHARP heads from scratch and loads the original pretrained UniK3D weights through the UniK3D loader. By default, the official launcher does not resume from a previous UniSHARP checkpoint.

Released UniSHARP checkpoints are available at Insta360-Research/Unisharp. Place a checkpoint anywhere on disk and pass the path to validation or inference:

CHECKPOINT=/path/to/pretrained_model.pt

🚀 Training

Use the official gt-override training launcher:

bash scripts/train.sh

Training outputs are saved under:

outputs/<run_name>/
├── config.json
├── losses.csv
├── step_XXXXXXX.pt
└── vis/
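
Validation and inference both take a checkpoint path, so a helper that picks the most recent step file from a run directory can be handy. The sketch below assumes the XXXXXXX placeholder stands for a (possibly zero-padded) decimal step count. The function name is hypothetical.

```python
import re
from pathlib import Path

# Matches file names such as step_0012000.pt and captures the step number.
_STEP_RE = re.compile(r"^step_(\d+)\.pt$")


def latest_checkpoint(run_dir):
    """Return the step_*.pt path with the highest step number, or None if there is none."""
    best_path, best_step = None, -1
    for path in Path(run_dir).iterdir():
        match = _STEP_RE.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_path, best_step = path, int(match.group(1))
    return best_path
```

Comparing the parsed integers rather than the file names keeps the ordering correct even if the padding width changes between runs.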

📊 Validation

Run validation with a checkpoint:

bash scripts/validate_unisharp.sh /path/to/step_XXXXXXX.pt

📒 Inference

Run single-image inference:

python scripts/infer_unisharp.py \
  --checkpoint /path/to/step_XXXXXXX.pt \
  --image /path/to/image.jpg \
  --out-dir outputs/inference

Run inference on a directory of images or an image list:

python scripts/infer_unisharp.py \
  --checkpoint /path/to/step_XXXXXXX.pt \
  --image-dir /path/to/images \
  --out-dir outputs/inference

If calibrated camera parameters are available, pass them through a JSON file. Without this file, the script predicts rays with UniK3D and fits the camera parameters automatically.

Example perspective camera JSON:

{
  "camera": "perspective",
  "intrinsics": {
    "fx": 820.0,
    "fy": 820.0,
    "cx": 512.0,
    "cy": 384.0
  }
}

python scripts/infer_unisharp.py \
  --checkpoint /path/to/step_XXXXXXX.pt \
  --image /path/to/perspective.jpg \
  --camera-json /path/to/perspective_camera.json
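
If only the image size and horizontal field of view are known, the intrinsics can be derived from the pinhole relation fx = (W / 2) / tan(hfov / 2). The sketch below writes a dictionary in the same shape as the perspective example above. It assumes square pixels (fy = fx) and a principal point at the image centre, and the function name is hypothetical.

```python
import json
import math


def perspective_camera_from_fov(width, height, hfov_deg):
    """Build a perspective camera dict from image size and horizontal FoV in degrees."""
    fx = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return {
        "camera": "perspective",
        "intrinsics": {
            "fx": fx,
            "fy": fx,             # assumption: square pixels
            "cx": width / 2.0,    # assumption: centred principal point
            "cy": height / 2.0,
        },
    }


if __name__ == "__main__":
    print(json.dumps(perspective_camera_from_fov(1024, 768, 64.0), indent=2))
```

Calibrated values, when available, are preferable to this approximation.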

Example Fisheye624 camera JSON:

{
  "camera": "fisheye",
  "camera_params": [820.0, 820.0, 512.0, 384.0, 0.01, -0.001, 0.0, 0.0]
}

python scripts/infer_unisharp.py \
  --checkpoint /path/to/step_XXXXXXX.pt \
  --image /path/to/fisheye.jpg \
  --camera-json /path/to/fisheye_camera.json

For batched inference, the JSON can also contain per-image entries:

{
  "default": {
    "camera": "perspective",
    "intrinsics": [820.0, 820.0, 512.0, 384.0]
  },
  "images": {
    "panorama.jpg": {
      "camera": "panorama"
    },
    "fisheye.jpg": {
      "camera": "fisheye",
      "camera_params": [820.0, 820.0, 512.0, 384.0, 0.01, -0.001, 0.0, 0.0]
    }
  }
}
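
How the script resolves these entries is not spelled out here. One plausible reading is that an entry under "images" is looked up by file name and the "default" block applies otherwise. The sketch below illustrates that reading only. It is not the lookup code in scripts/infer_unisharp.py, and the function name is hypothetical.

```python
from pathlib import Path


def camera_for_image(config, image_path):
    """Return the per-image camera entry if present, else the "default" entry (or None)."""
    name = Path(image_path).name  # assumption: entries are keyed by bare file name
    images = config.get("images", {})
    if name in images:
        return images[name]
    return config.get("default")


if __name__ == "__main__":
    config = {
        "default": {"camera": "perspective", "intrinsics": [820.0, 820.0, 512.0, 384.0]},
        "images": {"panorama.jpg": {"camera": "panorama"}},
    }
    print(camera_for_image(config, "/data/panorama.jpg")["camera"])
    print(camera_for_image(config, "/data/other.jpg")["camera"])
```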

🙏 Acknowledgement

This project builds on open-source work from:

  • SHARP for monocular Gaussian view synthesis
  • UniK3D for universal camera geometry and features
  • 3DGEER for generic-camera Gaussian rasterization
  • gsplat for Gaussian splatting utilities

📝 Citation

@article{song2026unisharp,
  title={UniSHARP: Universal Sharp Monocular View Synthesis},
  author={Song, Meixi and Zhang, Dizhe and Ren, Hao and Zhang, Ruiyang and Du, Bo and Yang, Ming-Hsuan and Qi, Lu},
  journal={arXiv},
  year={2026}
}

About

Official implementation of UniSHARP: Universal Sharp Monocular View Synthesis
