[multimodal] Add Foundation Model for Object Detection #3164

Merged on May 15, 2023 (38 commits)

@FANGAreNotGnu FANGAreNotGnu commented Apr 19, 2023

  • Integrate Grounding DINO into AutoGluon to support open-vocabulary detection (OVD).
  • Add an open-vocabulary detection problem type.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@zhiqiangdon zhiqiangdon left a comment

Per our offline discussion, we won't integrate the grounding-dino source code into our repo. Instead, users will install grounding-dino from source. We can also open two issues in the grounding-dino repo: one requesting a PyPI release, and one requesting support for tokenized tensor input to the model.

@github-actions

Job PR-3164-67d55be is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3164/67d55be/index.html

Comment on lines +289 to +299
elif per_name == OVD:
    # create a multimodal processor for OVD.
    data_processors[OVD].append(
        create_data_processor(
            data_type=OVD,
            config=config,
            model=per_model,
        )
    )
    if data_types is not None and IMAGE in data_types:
        data_types.remove(IMAGE)
Contributor

OVD is not a data type. If image and text processing do not depend on each other, we can create a separate image processor and text processor for the OVD model.

Contributor Author

Image and bounding-box processing will depend on each other once we support training. I will refactor out the text processor in the next PR, and will keep the OVD data type until we combine this with traditional detection (mmdet) so that both use the ROIS data type.
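
The reviewer's suggestion in this thread can be sketched roughly as follows: instead of registering a processor under an OVD key, each model declares the modalities it consumes and gets one processor per modality. This is a minimal illustration under that assumption, not AutoGluon's code; `Processor`, `build_processors`, and the `"ovd_model"` name are hypothetical. As the author notes, it only applies while image and text are processed independently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical modality keys for illustration only.
IMAGE = "image"
TEXT = "text"


@dataclass(frozen=True)
class Processor:
    """Stand-in for a per-modality data processor (hypothetical)."""
    data_type: str
    model_name: str


def build_processors(model_modalities: Dict[str, List[str]]) -> Dict[str, List[Processor]]:
    """Create one processor per (model, modality) pair, grouped by data type."""
    processors: Dict[str, List[Processor]] = defaultdict(list)
    for model_name, modalities in model_modalities.items():
        for data_type in modalities:
            processors[data_type].append(Processor(data_type, model_name))
    return dict(processors)


# An OVD model consumes images and text prompts, so it gets two processors
# and no dedicated "ovd" data type is needed.
processors = build_processors({"ovd_model": [IMAGE, TEXT]})
```

With this layout the set of data types stays limited to real modalities, at the cost of not being able to express joint image and box transforms in a single processor.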

@@ -2225,6 +2235,12 @@ def predict(
        detection_classes=self._model.model.CLASSES,
        result_path=None,
    )
elif self._problem_type == OPEN_VOCABULARY_OBJECT_DETECTION:
    pred = save_ovd_result_df(
Contributor

Why do we need to save results for OVD? Passing result_path=None means nothing is saved, but the function name is still misleading. By default, we should return results in the same format as the input.

Contributor Author

We return a dict if as_pandas is False. Here we reuse save_ovd_result_df, which both formats the result as a DataFrame and saves that DataFrame (if result_path is not None). Later we will add a save-results feature for OVD (using this function) together with evaluation for OVD.

Contributor

save_ovd_result_df seems to always return a pandas DataFrame, even when the input data is a dict.

Contributor Author

Yes, that's also true for object detection. We only save a pandas DataFrame, but we return a list of dicts if the input is not a DataFrame and as_pandas=False.
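
One way to address the naming concern raised in this thread is to split formatting from saving and to make the output-format decision explicit. The sketch below uses only the standard library and plain row dicts in place of a DataFrame; `format_ovd_results`, `save_ovd_results`, and `wants_dataframe` are hypothetical names, and the rule that `as_pandas=None` follows the input format is an assumption drawn from the reviewer's "same format as input" remark, not confirmed library behavior.

```python
import csv
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def format_ovd_results(image_paths: List[str], detections: List[List[Row]]) -> List[Row]:
    """Pair each image path with its detections (formatting only, no I/O)."""
    if len(image_paths) != len(detections):
        raise ValueError("need exactly one detection list per image")
    return [{"image": path, "bboxes": boxes} for path, boxes in zip(image_paths, detections)]


def save_ovd_results(rows: List[Row], result_path: str) -> None:
    """Write already-formatted rows to a CSV file (saving only)."""
    with open(result_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["image", "bboxes"])
        writer.writeheader()
        writer.writerows(rows)


def wants_dataframe(input_is_dataframe: bool, as_pandas: Optional[bool]) -> bool:
    """Follow the input format unless the caller sets as_pandas explicitly."""
    if as_pandas is None:
        return input_is_dataframe
    return as_pandas
```

A caller would always run the formatting step, invoke the saving step only when a result path is given, and convert to a DataFrame only when `wants_dataframe` returns True.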

@zhiqiangdon zhiqiangdon left a comment

LGTM. Consider adding an OVD unit test later.

@zhiqiangdon zhiqiangdon merged commit ee0967d into autogluon:master May 15, 2023
29 checks passed
Labels
model list checked (You have updated the model list after modifying multimodal unit tests/docs)

3 participants