OPEN

OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

Jinghua Hou ¹, Tong Wang ², Xiaoqing Ye ², Zhe Liu ¹, Shi Gong ², Xiao Tan ²,
Errui Ding ², Jingdong Wang ², Xiang Bai ¹,✉
¹ Huazhong University of Science and Technology, ² Baidu Inc.
✉ Corresponding author.

ECCV 2024

Abstract

Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise depth supervision, they overlook two significant phenomena: 1) the depth supervision obtained from LiDAR points is usually distributed on the surface of the object, which is not so friendly to existing DETR-based 3D detectors due to the lack of the depth of 3D object center; 2) for distant objects, fine-grained depth estimation of the whole object is more challenging. Therefore, we argue that the object-wise depth (or 3D center of the object) is essential for accurate detection. In this paper, we propose a new multi-view 3D object detector named OPEN, whose main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding. Specifically, we first employ an object-wise depth encoder, which takes the pixel-wise depth map as a prior, to accurately estimate the object-wise depth. Then, we utilize the proposed object-wise position embedding to encode the object-wise depth information into the transformer decoder, thereby producing 3D object-aware features for final detection. Extensive experiments verify the effectiveness of our proposed method. Furthermore, OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
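To make the distinction between surface depth and object-wise depth concrete, here is a minimal sketch, not the authors' implementation. It assumes a pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), lifts one pixel to 3D with two different depth values, and encodes each 3D point with a plain sinusoidal embedding as a stand-in for the learned object-wise position embedding. The function names `backproject` and `sinusoidal_embedding`, and all numeric values, are illustrative assumptions.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a depth value to a camera-frame 3D point (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def sinusoidal_embedding(point, num_freqs=4, temperature=10000.0):
    """Encode each coordinate with sin/cos pairs at num_freqs frequencies.

    Stand-in for a learned position embedding; output length is 3 * 2 * num_freqs.
    """
    emb = []
    for coord in point:
        for i in range(num_freqs):
            freq = temperature ** (-i / num_freqs)
            emb.append(math.sin(coord * freq))
            emb.append(math.cos(coord * freq))
    return emb


# Hypothetical camera intrinsics and pixel location (not taken from the repository).
fx, fy, cx, cy = 1000.0, 1000.0, 800.0, 450.0
u, v = 900.0, 500.0

# Hypothetical depths: a LiDAR-style hit on the visible surface vs. the object's 3D center.
surface_depth = 20.0
object_depth = 22.0

surface_point = backproject(u, v, surface_depth, fx, fy, cx, cy)
center_point = backproject(u, v, object_depth, fx, fy, cx, cy)

surface_embedding = sinusoidal_embedding(surface_point)
center_embedding = sinusoidal_embedding(center_point)

print("surface point:", surface_point)
print("center point: ", center_point)
print("embedding length:", len(center_embedding))
```

The same pixel lands at a different 3D location depending on which depth is used, so the resulting embeddings differ. This is the intuition behind feeding the decoder an embedding built from the object-wise depth rather than from the surface depth.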

News

2024.07.02: Our other work, SEED, has also been accepted by ECCV 2024. 🎉
2024.07.02: OPEN has been accepted by ECCV 2024. 🎉

Results

nuScenes Val Set

The reproduced results match or slightly exceed those reported in the paper (paper -> reproduced):

R50: 56.4 -> 56.5 NDS, 46.5 -> 47.0 mAP

R101: 60.6 -> 60.6 NDS, 51.6 -> 51.9 mAP

Model	Backbone	Pretrain	Resolution	NDS	mAP	Config	Download
OPEN	V2-99	DD3D	320 x 800	61.3	52.1	config	model
OPEN	R50	nuImage	256 x 704	56.5	47.0	config	model
OPEN	R101	nuImage	512 x 1408	60.6	51.9	config	model

nuScenes Test Set

Model	Backbone	Pretrain	Resolution	NDS	mAP	Config	Download
OPEN	V2-99	DD3D	640 x 1600	64.4	56.7	config	model
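For readers unfamiliar with how NDS relates to mAP, the sketch below applies the standard nuScenes definition: NDS is half mAP and half the mean of five true-positive scores, each computed as 1 - min(1, error). The per-metric errors for OPEN are not listed here, so the example only inverts the formula to show the mean TP score implied by the test-set numbers above. The function names are illustrative, and scores are written as fractions rather than percentages.

```python
def nds(map_score, tp_errors):
    """nuScenes detection score from mAP and the five mean TP errors."""
    if len(tp_errors) != 5:
        raise ValueError("expected five TP errors (mATE, mASE, mAOE, mAVE, mAAE)")
    # Each error is clipped at 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * map_score + sum(tp_scores)) / 10.0


def implied_mean_tp_score(nds_score, map_score):
    """Invert the NDS formula to get the mean of the five TP scores."""
    return 2.0 * nds_score - map_score


# Test-set values from the table above, as fractions.
test_nds, test_map = 0.644, 0.567
mean_tp = implied_mean_tp_score(test_nds, test_map)

# Consistency check: five identical errors with that mean score reproduce the NDS.
reconstructed = nds(test_map, [1.0 - mean_tp] * 5)

print("implied mean TP score:", round(mean_tp, 3))
print("reconstructed NDS:", round(reconstructed, 3))
```

Because only the mean is recoverable this way, the individual translation, scale, orientation, velocity, and attribute errors cannot be read off from NDS and mAP alone.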

TODO

Release the paper.
Release the code of OPEN.

Citation

@inproceedings{hou2024open,
  title={OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection},
  author={Hou, Jinghua and Wang, Tong and Ye, Xiaoqing and Liu, Zhe and Gong, Shi and Tan, Xiao and Ding, Errui and Wang, Jingdong and Bai, Xiang},
  booktitle={ECCV},
  year={2024},
}

Acknowledgements

We thank these great works and open-source repositories: 3DPPE, StreamPETR, and MMDetection3D.
