Grounding-DINO occupies the majority of Grounded-SAM's processing time. #421

Open
xiaobanni opened this issue Dec 21, 2023 · 3 comments

Comments

@xiaobanni

Thank you for your excellent work on the Grounded-Segment-Anything project. I've noticed that developers have recently incorporated various advanced SAM models, such as Efficient-SAM and RepViT-SAM. However, it appears that the Grounding-DINO module consumes most of the processing time in Grounded-SAM. As illustrated in the attached picture, while MobileSAM takes only 0.05s, Grounding-DINO requires 1.70s, which is significantly longer. Are there any plans to optimize the Grounding-DINO module, or is there an already available off-the-shelf solution?
image
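For anyone who wants to reproduce this kind of per-stage measurement, here is a minimal sketch. The helper names `timed` and `share_of_total` are hypothetical and are not part of the Grounded-Segment-Anything code base; the numbers are the ones quoted above. With those figures, Grounding-DINO accounts for roughly 97% of the combined time.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, store):
    """Accumulate wall-clock seconds spent inside the with-block under store[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[stage] = store.get(stage, 0.0) + (time.perf_counter() - start)


def share_of_total(store):
    """Return each stage's fraction of the summed stage time."""
    total = sum(store.values())
    if total <= 0:
        return {}
    return {stage: seconds / total for stage, seconds in store.items()}


# Figures quoted in this issue (seconds per image); the keys are illustrative labels.
reported = {"grounding_dino": 1.70, "mobile_sam": 0.05}
shares = share_of_total(reported)
print({stage: round(share, 3) for stage, share in shares.items()})
```

To time real stages, wrap each call, e.g. `with timed("detector", store): ...`. If the models run on a CUDA device, GPU work is asynchronous, so the device should be synchronized (for example with `torch.cuda.synchronize()`) before the timer stops; otherwise the measured time may be attributed to the wrong stage.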

@rentainhe
Collaborator

Hello! For now, we do not have a smaller version of Grounding-DINO. You may replace Grounding-DINO with another lightweight open-world model from the community as the box prompt generator.
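One way to read this suggestion is to treat the box prompt generator as a pluggable component. The sketch below is only an illustration under that assumption: `run_pipeline`, `detect_boxes`, and `segment` are hypothetical names, not functions from this repository, and the two stand-in functions return dummy values in place of real models.

```python
from typing import Callable, List, Sequence, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); an assumed convention.
Box = Tuple[float, float, float, float]
BoxGenerator = Callable[[object, str], List[Box]]
Segmenter = Callable[[object, Sequence[Box]], list]


def run_pipeline(image, prompt: str, detect_boxes: BoxGenerator, segment: Segmenter):
    """Run any text-conditioned detector, then pass its boxes to the segmenter."""
    boxes = detect_boxes(image, prompt)
    if not boxes:
        # No detections: skip the segmenter entirely.
        return [], []
    return boxes, segment(image, boxes)


# Stand-ins for illustration only; real models would replace these.
def dummy_detector(image, prompt):
    return [(0.0, 0.0, 10.0, 10.0)]


def dummy_segmenter(image, boxes):
    return ["mask for %s" % (box,) for box in boxes]


boxes, masks = run_pipeline(None, "a dog", dummy_detector, dummy_segmenter)
```

Because the pipeline only depends on the two callables, swapping the detector does not require changing the segmentation step, provided the replacement emits boxes in the format the segmenter expects.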

@xiaobanni
Author

@rentainhe Thank you for your quick and friendly response. I am not a professional in the field of image segmentation; I just want to use the technology in downstream applications. After some research, I did not find any clearly usable alternative to Grounding-DINO. Could you recommend some potential solutions for me to try? This need also seems to be common, as evidenced by the widespread discussion in the following link.

@HaoqianSong

Does GLIP offer the same functionality and performance? Compared with Grounding-DINO, can GLIP be seen as a combination of the Grounding-DINO detector and BLIP? GLIP seems to support arbitrary text retrieval and object localization. Can it also output image-description text?

3 participants