Authors: Jinyuan Qu, Hongyang Li, and Lei Zhang.
SegVGGT is a unified feed-forward framework for joint 3D reconstruction and 3D instance segmentation from unposed multi-view RGB images. It integrates object queries into a geometry-grounded transformer and introduces the FADA module to guide instance-aware attention, enabling accurate reconstruction and segmentation in a single forward pass.
First, clone this repository to your local machine and install the dependencies.
```bash
# Clone SegVGGT
git clone https://github.com/IDEA-Research/SegVGGT
cd SegVGGT
# Create environment
conda create -n segvggt python=3.10.19 -y
conda activate segvggt
# Install PyTorch 2.3.1 with CUDA 12.1
pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121
# Install other requirements
pip install -r requirements.txt
```
For ScanNet and ScanNet200 evaluation, prepare the processed 3D annotations and the extracted 2D RGB-D frames under `data_processing/`.
Detailed instructions are provided in docs/data_preparation.md.
After data preparation, the directory structure should look as follows:
```
data_processing/
├── scannet/            # or scannet200/
│   ├── meta_data/
│   │   └── scannetv2_val.txt
│   ├── points/
│   │   └── sceneXXXX_XX.bin
│   ├── semantic_mask/
│   │   └── sceneXXXX_XX.bin
│   ├── instance_mask/
│   │   └── sceneXXXX_XX.bin
│   ├── super_points/
│   │   └── sceneXXXX_XX.bin
│   └── scenes_2d/
│       └── sceneXXXX_XX/
│           ├── color/
│           │   ├── 0.jpg
│           │   └── ...
│           ├── depth/
│           │   ├── 0.png
│           │   └── ...
│           ├── pose/
│           │   ├── 0.txt
│           │   └── ...
│           └── intrinsic/
│               └── intrinsic_depth.txt
```
First, download the provided checkpoints and place them in `./checkpoint`.
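Before you run the evaluation script, you may want to confirm two things: that the prepared data matches the directory layout above, and that `./checkpoint` is not empty. The following is a minimal sketch and is not part of the SegVGGT codebase. `find_missing` and `has_checkpoints` are hypothetical helpers. The sketch assumes that `meta_data/scannetv2_val.txt` lists one scene ID per line and that each frame folder should contain at least one file.

```python
from pathlib import Path

# Per-scene annotation files expected under data_processing/<dataset>/<subdir>/<scene>.bin
PER_SCENE_BINS = ("points", "semantic_mask", "instance_mask", "super_points")
# Per-frame folders expected under data_processing/<dataset>/scenes_2d/<scene>/
FRAME_DIRS = ("color", "depth", "pose")


def find_missing(root="data_processing", dataset="scannet200"):
    """Return paths that the expected layout requires but that are absent.

    Hypothetical helper: assumes the split file lists one scene ID per line.
    """
    base = Path(root) / dataset
    split_file = base / "meta_data" / "scannetv2_val.txt"
    if not split_file.is_file():
        return [str(split_file)]

    scenes = [line.strip() for line in split_file.read_text().splitlines() if line.strip()]
    missing = []
    for scene in scenes:
        for sub in PER_SCENE_BINS:
            path = base / sub / f"{scene}.bin"
            if not path.is_file():
                missing.append(str(path))
        scene_2d = base / "scenes_2d" / scene
        for sub in FRAME_DIRS:
            folder = scene_2d / sub
            # A frame folder counts as missing if it does not exist or is empty.
            if not folder.is_dir() or not any(folder.iterdir()):
                missing.append(str(folder))
        intrinsic = scene_2d / "intrinsic" / "intrinsic_depth.txt"
        if not intrinsic.is_file():
            missing.append(str(intrinsic))
    return missing


def has_checkpoints(ckpt_dir="checkpoint"):
    """True if the checkpoint folder exists and contains at least one file."""
    folder = Path(ckpt_dir)
    return folder.is_dir() and any(p.is_file() for p in folder.iterdir())
```

Suppose `find_missing(dataset="scannet")` returns an empty list and `has_checkpoints()` returns `True`. That suggests the inputs are in place. The checks that `scripts/eval.sh` itself performs may differ.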
```bash
# Evaluate on ScanNet200 (default)
bash scripts/eval.sh
# Evaluate on ScanNet
DATASET=scannet bash scripts/eval.sh
# Example: choose a specific GPU
GPU_ID=1 DATASET=scannet200 bash scripts/eval.sh
```
We provide configuration files and checkpoints for the ScanNet and ScanNet200 benchmarks (validation set).
| Dataset | mAP | mAP50 | mAP25 | Download |
|---|---|---|---|---|
| ScanNet (val) | 50.4 | 71.7 | 87.0 | model / config |
| ScanNet200 (val) | 31.9 | 45.7 | 53.7 | model / config |
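The three metric columns are assumed to follow the usual ScanNet instance-segmentation benchmark convention:

- mAP averages AP over IoU thresholds from 0.50 to 0.90 in steps of 0.05.
- mAP50 uses a single IoU threshold of 0.50.
- mAP25 uses a single IoU threshold of 0.25.

The sketch below illustrates that convention for a single class with point-level boolean masks. It is a simplification and is not this repository's evaluation code. The official ScanNet script also averages over classes, handles ignored points and minimum instance sizes, and integrates the precision-recall curve differently. The helper names are hypothetical.

```python
import numpy as np

# IoU thresholds assumed from the ScanNet benchmark convention: 0.50, 0.55, ..., 0.90.
MAP_THRESHOLDS = [0.5 + 0.05 * i for i in range(9)]


def mask_iou(pred, gt):
    """IoU between two boolean per-point instance masks of the same length."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def average_precision(pred_masks, scores, gt_masks, iou_thr):
    """Simplified single-class AP at one IoU threshold.

    Predictions are visited in descending score order. Each one is matched to the
    unmatched ground-truth mask with the highest IoU and counts as a true
    positive if that IoU reaches iou_thr.
    """
    if len(pred_masks) == 0 or len(gt_masks) == 0:
        return 0.0
    order = np.argsort(-np.asarray(scores, dtype=float))
    matched = np.zeros(len(gt_masks), dtype=bool)
    tp = np.zeros(len(order))
    for rank, idx in enumerate(order):
        best_iou, best_gt = -1.0, -1
        for g, gt in enumerate(gt_masks):
            if matched[g]:
                continue
            iou = mask_iou(pred_masks[idx], gt)
            if iou > best_iou:
                best_iou, best_gt = iou, g
        if best_gt >= 0 and best_iou >= iou_thr:
            matched[best_gt] = True
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(order) + 1)
    recall = np.concatenate(([0.0], cum_tp / len(gt_masks)))
    # Replace each precision value by the best precision at any higher recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Area under the resulting step-shaped precision-recall curve.
    return float(np.sum((recall[1:] - recall[:-1]) * precision))


def instance_metrics(pred_masks, scores, gt_masks):
    """mAP (averaged over MAP_THRESHOLDS), mAP50 and mAP25 for one class."""
    aps = [average_precision(pred_masks, scores, gt_masks, t) for t in MAP_THRESHOLDS]
    return {
        "mAP": float(np.mean(aps)),
        "mAP50": average_precision(pred_masks, scores, gt_masks, 0.5),
        "mAP25": average_precision(pred_masks, scores, gt_masks, 0.25),
    }
```

This example shows why mAP25 can be much higher than mAP. Take a prediction that overlaps a ground-truth instance with an IoU of 1/3. It counts as correct at the 0.25 threshold but fails every threshold from 0.50 upward.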
If you find this work helpful for your research, please cite:
```bibtex
@article{qu2026segvggt,
  title={SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images},
  author={Qu, Jinyuan and Li, Hongyang and Zhang, Lei},
  journal={arXiv preprint arXiv:2603.19926},
  year={2026}
}
```
SegVGGT is built on VGGT. We thank the VGGT contributors for their open research.
See the LICENSE file for details about the license under which this code is made available.
