Grounding DINO 1.5

IDEA Research's Most Capable Open-World Object Detection Model Series.

The project provides examples for using the models, which are hosted on DeepDataSpace.

✨ First-Time Application: If you are interested in our project and wish to try our algorithm, you will first need to apply for an API token through our request API token website.

📌 Request Additional Token Quotas: If you find our project helpful and need a larger API token quota, you can request additional tokens by filling out this form. Our team will review your request and allocate more tokens within one or two days. You can also apply for more tokens by sending us an email.


Introduction

We introduce Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models:

Grounding DINO 1.5 Pro: Our most capable model for open-set object detection, designed for stronger generalization across a wide range of scenarios.
Grounding DINO 1.5 Edge: Our most efficient model for edge computing scenarios, optimized for the faster speed demanded by many applications that require edge deployment.

Note: We use "edge" for its dual meaning both as in pushing the boundaries and as in running on edge devices.

Model Framework

The overall framework of Grounding DINO 1.5 is shown in the following image:

Grounding DINO 1.5 Pro preserves the core architecture of Grounding DINO, which employs a deep early fusion architecture.

Performance

Side-by-Side Performance Comparison with Grounding DINO

Zero-Shot Transfer Results of Grounding DINO 1.5 Pro

Model	COCO (AP box)	LVIS-minival (AP all)	LVIS-minival (AP rare)	LVIS-val (AP all)	LVIS-val (AP rare)	ODinW35 (AP avg)	ODinW13 (AP avg)
Other Best Open-Set Model	53.4 (OmDet-Turbo)	47.6 (T-Rex2 visual)	45.4 (T-Rex2 visual)	45.3 (T-Rex2 visual)	43.8 (T-Rex2 visual)	30.1 (OmDet-Turbo)	59.8 (APE-B)
DetCLIPv3	-	48.8	49.9	41.4	41.4	-	-
Grounding DINO	52.5	27.4	18.1	-	-	26.1	56.9
T-Rex2 (text)	52.2	54.9	49.2	45.8	42.7	22.0	-
Grounding DINO 1.5 Pro	54.3	55.7	56.1	47.6	44.6	30.2	58.7

Grounding DINO 1.5 Pro achieves SOTA performance on COCO, LVIS-minival, LVIS-val, and ODinW35 zero-shot transfer benchmarks.
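The claim above can be checked directly against the zero-shot table. The following is a minimal sketch, not part of the project code: the names `BENCHMARKS`, `ZERO_SHOT_AP`, and `best_per_benchmark` are illustrative, and the numbers are copied from the table (with `None` standing in for "-").

```python
# Zero-shot AP values copied from the table above; None marks a missing entry ("-").
BENCHMARKS = [
    "COCO", "LVIS-minival all", "LVIS-minival rare",
    "LVIS-val all", "LVIS-val rare", "ODinW35", "ODinW13",
]

ZERO_SHOT_AP = {
    "Other Best Open-Set Model": [53.4, 47.6, 45.4, 45.3, 43.8, 30.1, 59.8],
    "DetCLIPv3": [None, 48.8, 49.9, 41.4, 41.4, None, None],
    "Grounding DINO": [52.5, 27.4, 18.1, None, None, 26.1, 56.9],
    "T-Rex2 (text)": [52.2, 54.9, 49.2, 45.8, 42.7, 22.0, None],
    "Grounding DINO 1.5 Pro": [54.3, 55.7, 56.1, 47.6, 44.6, 30.2, 58.7],
}


def best_per_benchmark(table):
    """Return {benchmark: model with the highest reported AP}."""
    best = {}
    for col, benchmark in enumerate(BENCHMARKS):
        # Skip models that do not report a number for this benchmark.
        scored = [(ap[col], model) for model, ap in table.items() if ap[col] is not None]
        best[benchmark] = max(scored)[1]
    return best


best = best_per_benchmark(ZERO_SHOT_AP)
for benchmark, model in best.items():
    print(f"{benchmark}: {model}")
```

Under these numbers, Grounding DINO 1.5 Pro has the highest reported AP in every column except ODinW13, where the best other open-set model (APE-B, 59.8) is ahead of its 58.7. This matches the benchmarks listed in the sentence above.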

Fine-tuning Results on Downstream Datasets

Model	LVIS-minival (AP all)	LVIS-minival (AP rare)	LVIS-val (AP all)	LVIS-val (AP rare)	ODinW35 (AP avg)	ODinW13 (AP avg)
GLIP	-	-	-	-	-	68.9
GLEE-Pro	-	-	-	-	-	69.0
GLIPv2	59.8	-	-	-	-	70.4
OWL-ST + FT †	54.4	46.1	49.4	44.6	-	-
DetCLIPv2	58.3	60.1	53.1	49.0	-	70.4
DetCLIPv3	60.5	60.7	-	-	-	72.1
DetCLIPv3 †	60.8	56.7	54.1	45.8	-	-
Grounding DINO 1.5 Pro (zero-shot)	55.7	56.1	47.6	44.6	30.2	58.7
Grounding DINO 1.5 Pro	68.1	68.7	63.5	64.0	70.6	72.4

† indicates results of fine-tuning with LVIS base categories only.
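To make the effect of fine-tuning easier to read, the gap between the zero-shot and fine-tuned rows of Grounding DINO 1.5 Pro can be computed per benchmark. This is a small illustrative sketch; the dictionary names are made up for this example, and the values are copied from the last two rows of the table above.

```python
# AP values for Grounding DINO 1.5 Pro, copied from the fine-tuning table above.
ZERO_SHOT = {
    "LVIS-minival AP all": 55.7,
    "LVIS-minival AP rare": 56.1,
    "LVIS-val AP all": 47.6,
    "LVIS-val AP rare": 44.6,
    "ODinW35 AP avg": 30.2,
    "ODinW13 AP avg": 58.7,
}

FINE_TUNED = {
    "LVIS-minival AP all": 68.1,
    "LVIS-minival AP rare": 68.7,
    "LVIS-val AP all": 63.5,
    "LVIS-val AP rare": 64.0,
    "ODinW35 AP avg": 70.6,
    "ODinW13 AP avg": 72.4,
}

# Absolute AP gain from fine-tuning, rounded to one decimal like the table.
gains = {name: round(FINE_TUNED[name] - ZERO_SHOT[name], 1) for name in ZERO_SHOT}

for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: +{gain}")
```

With these values, the largest gain is on ODinW35 (+40.4 AP) and the smallest is on LVIS-minival AP all (+12.4 AP).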

API Usage

1. Installation

pip install -v -e .

2. Request API from DeepDataSpace

Refer to DeepDataSpace to request an API token: https://deepdataspace.com/request_api

3. Running demo code

python demo/demo.py --token <API_TOKEN>

4. Online Gradio demo

python gradio_app.py --token <API_TOKEN>
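Passing the token on the command line can leave it in shell history. One possible workaround, sketched below, is to read it from an environment variable and build the command in Python. This is an assumption-laden sketch rather than part of the repository: the variable name `GDINO_API_TOKEN` and the helper `build_command` are hypothetical; only the script paths and the `--token` flag come from the commands above.

```python
import os
import sys

# Hypothetical environment variable name; the repository does not define one.
TOKEN_ENV_VAR = "GDINO_API_TOKEN"


def build_command(script, token=None):
    """Build the argument list for `python <script> --token <API_TOKEN>`.

    If no token is passed explicitly, fall back to the environment variable.
    """
    token = token or os.environ.get(TOKEN_ENV_VAR)
    if not token:
        raise ValueError(f"No API token given and {TOKEN_ENV_VAR} is not set")
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, script, "--token", token]


if __name__ == "__main__":
    # Print the commands instead of running them; pass the list to
    # subprocess.run(..., check=True) from the repository root to execute one.
    for script in ("demo/demo.py", "gradio_app.py"):
        try:
            print(build_command(script))
        except ValueError as err:
            print(err)
```

Returning an argument list rather than a shell string avoids quoting problems if the token contains special characters.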

Case Analysis and Qualitative Visualization

Common Object Detection

Long-tailed Object Detection

Short Caption Grounding

Long Caption Grounding

Dense Object Detection

Video Object Detection

Advanced Object Detection on Edge Devices

Related Work

Grounding DINO: Strong open-set object detection model.
Grounded-Segment-Anything: Open-set detection and segmentation model by combining Grounding DINO with SAM.
T-Rex/T-Rex2: Generic open-set detection model supporting both text and visual prompts.

LICENSE

Grounding DINO 1.5 API License

Grounding DINO 1.5 is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

BibTeX

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@misc{ren2024grounding,
      title={Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection}, 
      author={Tianhe Ren and Qing Jiang and Shilong Liu and Zhaoyang Zeng and Wenlong Liu and Han Gao and Hongjie Huang and Zhengyu Ma and Xiaoke Jiang and Yihao Chen and Yuda Xiong and Hao Zhang and Feng Li and Peijun Tang and Kent Yu and Lei Zhang},
      year={2024},
      eprint={2405.10300},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{jiang2024trex2,
      title={T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy}, 
      author={Qing Jiang and Feng Li and Zhaoyang Zeng and Tianhe Ren and Shilong Liu and Lei Zhang},
      year={2024},
      eprint={2403.14610},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@article{liu2023grounding,
  title={Grounding dino: Marrying dino with grounded pre-training for open-set object detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
