SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images

Authors: Jinyuan Qu, Hongyang Li, and Lei Zhang.

Overview

SegVGGT is a unified feed-forward framework for joint 3D reconstruction and 3D instance segmentation from unposed multi-view RGB images. It integrates object queries into a geometry-grounded transformer and introduces the FADA module to guide instance-aware attention, enabling accurate reconstruction and segmentation in a single forward pass.

Installation

Clone this repository and install the dependencies:

# Clone SegVGGT
git clone https://github.com/IDEA-Research/SegVGGT
cd SegVGGT
# Create environment
conda create -n segvggt python=3.10.19 -y
conda activate segvggt
# Install PyTorch 2.3.1 with CUDA 12.1
pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121
# Install other requirements
pip install -r requirements.txt
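
After installation, it can help to confirm that the pinned packages actually landed in the active environment. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and the pinned versions are copied from the install command above. It uses only the standard library, so it runs even when the packages are missing.

```python
from importlib import metadata

# Versions pinned by the install command above.
EXPECTED = {"torch": "2.3.1", "torchvision": "0.18.1"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_pin)}.

    Hypothetical helper for illustration; not shipped with SegVGGT.
    """
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Wheels from the cu121 index may report a local tag such as
        # "2.3.1+cu121", so compare only the part before the "+".
        ok = installed is not None and installed.split("+")[0] == pinned
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} [{status}]")
```

A mismatch here usually means the wrong conda environment is active or that requirements.txt pulled in a different build.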

Data Preparation

For ScanNet and ScanNet200 evaluation, please prepare the processed 3D annotations and extracted 2D RGB-D frames under data_processing/. Detailed instructions are provided in docs/data_preparation.md.

After data preparation, the directory structure should look like this:

data_processing/
├── scannet/                          # or scannet200/
│   ├── meta_data/
│   │   └── scannetv2_val.txt
│   ├── points/
│   │   └── sceneXXXX_XX.bin
│   ├── semantic_mask/
│   │   └── sceneXXXX_XX.bin
│   ├── instance_mask/
│   │   └── sceneXXXX_XX.bin
│   ├── super_points/
│   │   └── sceneXXXX_XX.bin
│   └── scenes_2d/
│       └── sceneXXXX_XX/
│           ├── color/
│           │   ├── 0.jpg
│           │   └── ...
│           ├── depth/
│           │   ├── 0.png
│           │   └── ...
│           ├── pose/
│           │   ├── 0.txt
│           │   └── ...
│           └── intrinsic/
│               └── intrinsic_depth.txt
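
Before running evaluation, the layout above can be checked programmatically. The sketch below is an illustration under stated assumptions: the function names `missing_paths` and `check_split` are hypothetical, and it assumes that scannetv2_val.txt lists one scene ID per line and that each annotation file is named after its scene, as the tree suggests. It checks only that the paths exist, not that the files are valid.

```python
from pathlib import Path

# Per-scene annotation folders shown in the tree above.
ANNOTATION_DIRS = ("points", "semantic_mask", "instance_mask", "super_points")
# Per-scene 2D subfolders under scenes_2d/<scene>/.
FRAME_DIRS = ("color", "depth", "pose")


def missing_paths(root, scene):
    """List the expected paths for one scene that do not exist under root."""
    root = Path(root)
    expected = [root / d / f"{scene}.bin" for d in ANNOTATION_DIRS]
    scene_2d = root / "scenes_2d" / scene
    expected += [scene_2d / d for d in FRAME_DIRS]
    expected.append(scene_2d / "intrinsic" / "intrinsic_depth.txt")
    return [str(p) for p in expected if not p.exists()]


def check_split(root, split_file="meta_data/scannetv2_val.txt"):
    """Map each scene in the split file to its missing paths (if any).

    Assumes the split file holds whitespace-separated scene IDs.
    """
    root = Path(root)
    scenes = (root / split_file).read_text().split()
    problems = {}
    for scene in scenes:
        missing = missing_paths(root, scene)
        if missing:
            problems[scene] = missing
    return problems
```

For example, `check_split("data_processing/scannet")` returns an empty dictionary when every validation scene has all of the expected files and folders.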

Evaluation

First, download the provided checkpoints (see the Models table below) and place them in ./checkpoint.

# Evaluate on ScanNet200 (default)
bash scripts/eval.sh

# Evaluate on ScanNet
DATASET=scannet bash scripts/eval.sh

# Example: choose a specific GPU
GPU_ID=1 DATASET=scannet200 bash scripts/eval.sh
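
To evaluate both benchmarks in one go, the shell invocations above can be driven from Python. This is a sketch rather than a script provided by the repository: `eval_command` and `run_all` are hypothetical names, and the only interface assumed is the one shown above, namely the `DATASET` and `GPU_ID` environment variables read by scripts/eval.sh.

```python
import os
import subprocess


def eval_command(dataset="scannet200", gpu_id=None, base_env=None):
    """Build (argv, env) equivalent to the shell invocations above."""
    env = dict(os.environ if base_env is None else base_env)
    env["DATASET"] = dataset
    if gpu_id is not None:
        # Environment variables must be strings.
        env["GPU_ID"] = str(gpu_id)
    return ["bash", "scripts/eval.sh"], env


def run_all(datasets=("scannet", "scannet200"), gpu_id=None):
    """Run the evaluation script once per dataset, stopping on failure."""
    for name in datasets:
        argv, env = eval_command(name, gpu_id)
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, env=env, check=True)
```

Calling `run_all(gpu_id=1)` from the repository root would evaluate ScanNet and then ScanNet200 on GPU 1, assuming the checkpoints and data are already in place.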

Models

We provide the configuration files and checkpoints for the ScanNet and ScanNet200 benchmarks (validation set).

Dataset	mAP	mAP₅₀	mAP₂₅	Download
ScanNet (val)	50.4	71.7	87.0	model | config
ScanNet200 (val)	31.9	45.7	53.7	model | config

Citation

If you find this work helpful for your research, please cite:

@article{qu2026segvggt,
  title={SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images},
  author={Qu, Jinyuan and Li, Hongyang and Zhang, Lei},
  journal={arXiv preprint arXiv:2603.19926},
  year={2026}
}

Acknowledgement

SegVGGT is built on VGGT. We thank the VGGT contributors for their open research.

License

See the LICENSE file for details about the license under which this code is made available.
