SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images

Authors: Jinyuan Qu, Hongyang Li, and Lei Zhang.

Paper | Checkpoint | Code

Overview

SegVGGT is a unified feed-forward framework for joint 3D reconstruction and 3D instance segmentation from unposed multi-view RGB images. It integrates object queries into a geometry-grounded transformer and introduces the FADA module to guide instance-aware attention, enabling accurate reconstruction and segmentation in a single forward pass.

Installation

First, clone this repository and install the dependencies.

# Clone SegVGGT
git clone https://github.com/IDEA-Research/SegVGGT
cd SegVGGT
# Create environment
conda create -n segvggt python=3.10.19 -y
conda activate segvggt
# Install PyTorch 2.3.1 with CUDA 12.1
pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121
# Install other requirements
pip install -r requirements.txt
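
After installation, a quick sanity check can confirm that the pinned versions ended up in the environment. The snippet below is a minimal sketch and not part of the repository: the helper name `check_versions` is hypothetical, and it only reads package metadata, so it does not verify that CUDA is usable.

```python
from importlib import metadata

# Versions pinned by the install commands above.
EXPECTED = {"torch": "2.3.1", "torchvision": "0.18.1"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the active environment.
            report[name] = (None, False)
            continue
        # Wheels from the cu121 index carry a local suffix such as "+cu121";
        # compare only the part before it.
        base = installed.split("+", 1)[0]
        report[name] = (installed, base == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

Run it inside the activated segvggt environment; a MISMATCH line usually means a different PyTorch build was pulled in by another requirement.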

Data Preparation

For ScanNet and ScanNet200 evaluation, please prepare the processed 3D annotations and extracted 2D RGB-D frames under data_processing/. Detailed instructions are provided in docs/data_preparation.md.

After data preparation, the directory structure should look like this:

data_processing/
├── scannet/                          # or scannet200/
│   ├── meta_data/
│   │   └── scannetv2_val.txt
│   ├── points/
│   │   └── sceneXXXX_XX.bin
│   ├── semantic_mask/
│   │   └── sceneXXXX_XX.bin
│   ├── instance_mask/
│   │   └── sceneXXXX_XX.bin
│   ├── super_points/
│   │   └── sceneXXXX_XX.bin
│   └── scenes_2d/
│       └── sceneXXXX_XX/
│           ├── color/
│           │   ├── 0.jpg
│           │   └── ...
│           ├── depth/
│           │   ├── 0.png
│           │   └── ...
│           ├── pose/
│           │   ├── 0.txt
│           │   └── ...
│           └── intrinsic/
│               └── intrinsic_depth.txt
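
Before running evaluation, it can help to check that a scene has every entry from the tree above. The following is a minimal sketch, not a script shipped with the repository: the function name `missing_entries` is hypothetical, and it only checks that paths exist, not that the files are valid.

```python
from pathlib import Path

# Per-scene files and folders taken from the tree above.
PER_SCENE_BINS = ("points", "semantic_mask", "instance_mask", "super_points")
FRAME_DIRS = ("color", "depth", "pose")


def missing_entries(root, dataset, scene_id):
    """List expected paths for one scene that do not exist on disk."""
    base = Path(root) / dataset
    expected = [base / "meta_data" / "scannetv2_val.txt"]
    # One .bin file per scene in each annotation folder.
    expected += [base / d / f"{scene_id}.bin" for d in PER_SCENE_BINS]
    # Per-frame folders plus the depth intrinsics file.
    scene_2d = base / "scenes_2d" / scene_id
    expected += [scene_2d / d for d in FRAME_DIRS]
    expected.append(scene_2d / "intrinsic" / "intrinsic_depth.txt")
    return [str(p) for p in expected if not p.exists()]
```

For example, `missing_entries("data_processing", "scannet200", "scene0000_00")` returns an empty list when the layout for that scene is complete; the scene ID here is only an illustration.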

Evaluation

First, download the provided checkpoints and place them in "./checkpoint".

# Evaluate on ScanNet200 (default)
bash scripts/eval.sh

# Evaluate on ScanNet
DATASET=scannet bash scripts/eval.sh

# Example: choose a specific GPU
GPU_ID=1 DATASET=scannet200 bash scripts/eval.sh
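
To evaluate both benchmarks in one go, the commands above can be driven from Python. This is a sketch under stated assumptions: the helpers `eval_env` and `run_all` are hypothetical, and the only things taken from this README are the script path and the `DATASET` and `GPU_ID` environment variables. When `gpu_id` is not given, `GPU_ID` is left unset so the script keeps whatever default it defines.

```python
import os
import subprocess

DATASETS = ("scannet", "scannet200")  # the two values used in the commands above


def eval_env(dataset, gpu_id=None, base_env=None):
    """Build the environment for one run of scripts/eval.sh."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["DATASET"] = dataset
    if gpu_id is not None:
        env["GPU_ID"] = str(gpu_id)  # environment values must be strings
    return env


def run_all(gpu_id=None):
    """Run the evaluation script once per dataset, stopping on the first failure."""
    for dataset in DATASETS:
        subprocess.run(
            ["bash", "scripts/eval.sh"],
            env=eval_env(dataset, gpu_id),
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
```

Call `run_all(gpu_id=0)` from the repository root so that the relative script path resolves.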

Models

We provide the configuration files and checkpoints for the ScanNet and ScanNet200 benchmarks (validation set).

Dataset            mAP    mAP50   mAP25   Download
ScanNet (val)      50.4   71.7    87.0    model | config
ScanNet200 (val)   31.9   45.7    53.7    model | config

Citation

If you find this work helpful for your research, please cite:

@article{qu2026segvggt,
  title={SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images},
  author={Qu, Jinyuan and Li, Hongyang and Zhang, Lei},
  journal={arXiv preprint arXiv:2603.19926},
  year={2026}
}

Acknowledgement

SegVGGT is built on VGGT. We thank the VGGT contributors for their open research.

License

See the LICENSE file for details about the license under which this code is made available.

About

Official implementation of the paper "SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images"