[multimodal] Add Foundation Model for Object Detection #3164
Per our offline discussion, we won't integrate the grounding-dino source code into our repo. Instead, users install grounding-dino from source. In addition, we can submit two issues to the grounding-dino repo: one requesting a PyPI release and one requesting support for tokenized tensor input to the model.
multimodal/src/autogluon/multimodal/configs/model/fusion_mlp_image_text_tabular.yaml
...odal/src/autogluon/multimodal/configs/pretrain/ovd/grounding_dino/GroundingDINO_SwinB.cfg.py
elif per_name == OVD:
    # create a multimodal processor for OVD.
    data_processors[OVD].append(
        create_data_processor(
            data_type=OVD,
            config=config,
            model=per_model,
        )
    )
    if data_types is not None and IMAGE in data_types:
        data_types.remove(IMAGE)
`OVD` is not a data type. If there are no correlations between image and text processing, we can create an image processor and a text processor for the OVD model.
Image and bounding-box processing will be correlated once we support training. I will refactor out the text processor in the next PR, and will keep the OVD data type until we combine this with traditional detection (mmdet) so that both use the ROIS data type.
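A minimal sketch of the reviewer's suggestion above, registering one image processor and one text processor per OVD model instead of a combined `OVD` processor. Everything here is hypothetical: `build_ovd_processors` and the stand-in `create_data_processor` are illustrative names and signatures, not the AutoGluon API.

```python
from collections import defaultdict

# Data-type keys; assumed to mirror constants of the same name in the codebase.
IMAGE = "image"
TEXT = "text"


def create_data_processor(data_type, model_name):
    # Hypothetical stand-in: the real factory takes a config and a model object.
    return {"data_type": data_type, "model": model_name}


def build_ovd_processors(ovd_model_names):
    # Register separate image and text processors for each OVD model,
    # so that no dedicated OVD data type is needed.
    data_processors = defaultdict(list)
    for model_name in ovd_model_names:
        for data_type in (IMAGE, TEXT):
            data_processors[data_type].append(
                create_data_processor(data_type, model_name)
            )
    return dict(data_processors)


processors = build_ovd_processors(["grounding_dino"])
```

This split only works while image and text preprocessing are independent; as the reply notes, training support would couple images with bounding boxes.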
@@ -2225,6 +2235,12 @@ def predict(
        detection_classes=self._model.model.CLASSES,
        result_path=None,
    )
elif self._problem_type == OPEN_VOCABULARY_OBJECT_DETECTION:
    pred = save_ovd_result_df(
Why do we need to save results for OVD? `result_path=None` means not saving, but the function name is still misleading. By default, we need to return the same format as the input.
We return a dict if `as_pandas` is `False`. Here we reuse `save_ovd_result_df`, which both formats the result as a DataFrame and saves it (if `result_path` is not `None`). Later we will add a save-result feature for OVD (using this function) together with evaluation for OVD.
`save_ovd_result_df` seems to always return a pandas DataFrame, even though the input data is a dict.
Yes, that's also true for object detection. We only save a pandas DataFrame, but we return a list of dicts if the input is not a DataFrame and `as_pandas=False`.
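The return-format rule described in this thread could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: `format_ovd_result`, its arguments, and the `image`/`bboxes` column names are hypothetical.

```python
import pandas as pd


def format_ovd_result(data, detections, as_pandas=False):
    # Hypothetical helper: always build a DataFrame first, then decide
    # what to hand back to the caller.
    image_paths = list(data["image"])
    result_df = pd.DataFrame({"image": image_paths, "bboxes": detections})

    # Rule from the discussion: a list of dicts is returned only when the
    # input is not a DataFrame and as_pandas is False.
    if not isinstance(data, pd.DataFrame) and not as_pandas:
        return result_df.to_dict(orient="records")
    return result_df


data = {"image": ["a.jpg", "b.jpg"]}
detections = [[{"class": "cat", "bbox": [0, 0, 10, 10], "score": 0.9}], []]

as_records = format_ovd_result(data, detections, as_pandas=False)
as_frame = format_ovd_result(data, detections, as_pandas=True)
```

Separating the formatting step from any saving step would also address the naming concern raised earlier, since a function that only formats would not need `save` in its name.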
docs/tutorials/multimodal/object_detection/quick_start/quick_start_ovd.ipynb
LGTM. Consider adding an OVD unit test later.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.