NEWS: 🔥 3D-DRES is accepted at AAAI 2026! 🔥
Qi Chen, Changli Wu, Jiayi Ji, Yiwei Ma, Liujuan Cao
Framework:
Current 3D visual grounding tasks operate only at the sentence level for detection or segmentation, which critically fails to leverage the rich compositional contextual reasoning within natural language expressions. To address this challenge, we introduce Detailed 3D Referring Expression Segmentation (3D-DRES), a new task that provides a phrase-to-3D-instance mapping, aiming to enhance fine-grained 3D vision-language understanding. To support 3D-DRES, we present DetailRefer, a new dataset comprising 54,432 descriptions spanning 11,054 distinct objects. Unlike previous datasets, DetailRefer implements a pioneering phrase-instance annotation paradigm in which each referenced noun phrase is explicitly mapped to its corresponding 3D elements. Additionally, we introduce DetailBase, a deliberately streamlined yet effective baseline architecture that supports dual-mode segmentation at both the sentence and phrase levels. Our experimental results demonstrate that models trained on DetailRefer not only excel at phrase-level segmentation but also show surprising improvements on traditional 3D-RES benchmarks.
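For intuition, a phrase-instance annotation can be pictured as below. This is only an illustrative sketch: the field names are hypothetical and the actual DetailRefer schema may differ.

```python
# Illustrative sketch only: field names here are hypothetical and
# may not match the actual DetailRefer JSON schema.
annotation = {
    "scene_id": "scene0011_00",  # a ScanNet v2 scene
    "description": "the black chair next to the wooden table",
    "object_id": 7,              # instance segmented at sentence level
    "phrases": [                 # phrase-level mapping introduced by 3D-DRES
        {"phrase": "the black chair", "instance_id": 7},
        {"phrase": "the wooden table", "instance_id": 12},
    ],
}
```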
Download the dataset here. If you come across any issues within the dataset that we have not yet identified, please feel free to leave a comment, and we will make the necessary corrections.
Requirements
- Python 3.7 or higher
- PyTorch 1.12
- CUDA 11.3 or higher
The following installation assumes python=3.8, pytorch=1.12.1, and cuda=11.3.
- Create a conda virtual environment

```bash
conda create -n detailbase python=3.8
conda activate detailbase
```

- Clone this repository

```bash
git clone https://github.com/80chen86/3D-DRES.git
```

- Install the dependencies

Install PyTorch 1.12.1, then:

```bash
pip install spconv-cu113
pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl  # please check the versions on the website
pip install -r requirements.txt
```

Install segmentator from this repo (we wrap the segmentator in ScanNet).

- Setup: install ipdn and pointgroup_ops

```bash
sudo apt-get install libsparsehash-dev
python setup.py develop
cd detailbase/lib/
python setup.py develop
```
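After setup, you can optionally run a quick sanity check (ours, not part of the repo) to confirm the core dependencies import and CUDA is visible:

```python
# Quick environment sanity check for the versions assumed above.
import torch
import torch_scatter
import spconv.pytorch  # spconv 2.x (spconv-cu113) import path

print("torch:", torch.__version__)              # expect 1.12.1+cu113
print("CUDA available:", torch.cuda.is_available())
print("torch_scatter:", torch_scatter.__version__)
print("spconv.pytorch imported OK")
```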
Download the ScanNet v2 dataset.
Put the downloaded scans folder as follows. For each scan, you need to download the ['.aggregation.json', '.txt', '_vh_clean_2.0.010000.segs.json', '_vh_clean_2.ply', '_vh_clean_2.labels.ply', '_vh_clean.aggregation.json'] files (a quick completeness check is sketched after the tree below).
3D-DRES
├── data
│ ├── scannetv2
│ │ ├── scans
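Before preprocessing, you may want to verify that every scan folder contains the required files. A minimal helper (ours, not repo code), assuming the layout above:

```python
import os

# Required per-scan file suffixes, e.g. scene0000_00.aggregation.json.
SUFFIXES = [
    ".aggregation.json", ".txt",
    "_vh_clean_2.0.010000.segs.json", "_vh_clean_2.ply",
    "_vh_clean_2.labels.ply", "_vh_clean.aggregation.json",
]

scans_dir = "data/scannetv2/scans"
for scan in sorted(os.listdir(scans_dir)):
    missing = [s for s in SUFFIXES
               if not os.path.isfile(os.path.join(scans_dir, scan, scan + s))]
    if missing:
        print(f"{scan} is missing: {missing}")
```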
Split and preprocess the point cloud data:

```bash
cd data/scannetv2
bash prepare_data.sh
```
The script splits the data into train/val folders and preprocesses it. After running the script, the ScanNet dataset structure should look like below.
3D-DRES
├── data
│ ├── scannetv2
│ │ ├── scans
│ │ ├── train
│ │ ├── val
Download DetailRefer and put the files as follows (a quick load check is sketched after the tree below).
3D-DRES
├── data
│ ├── DetailRefer
│ │ ├── DetailRefer_train.json
│ │ ├── DetailRefer_val.json
│ │ ├── DetailRefer_test.json
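To confirm the split files are readable, a quick unofficial check (this assumes each file is a JSON array of annotation entries; adjust if the schema differs):

```python
import json

# Assumes each split file is a JSON array of annotation entries.
for split in ("train", "val", "test"):
    with open(f"data/DetailRefer/DetailRefer_{split}.json") as f:
        data = json.load(f)
    print(f"{split}: {len(data)} entries")
```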
Download the SPFormer pretrained model and move it to backbones.

```bash
mkdir backbones
mv ${Download_PATH}/sp_unet_backbone.pth backbones/
```
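You can optionally peek at the downloaded weights with a generic PyTorch check (ours, not repo code; the exact checkpoint layout may differ between releases):

```python
import torch

# Load on CPU and list a few parameter names/shapes. Whether the file is
# a bare state_dict or a wrapped dict depends on the release.
ckpt = torch.load("backbones/sp_unet_backbone.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print("entries:", len(state))
for name, value in list(state.items())[:5]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```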
Training

```bash
bash scripts/train.sh
```
Inference

You can download and use our pretrained checkpoint.

```bash
bash scripts/test.sh
```
If you find this work useful in your research, please cite:
Sincere thanks to the MDIN and SPFormer repos. This repo is built upon them.
