
Make input a dictionary for multi-modal object detection #95

Merged

mzweilin merged 12 commits into main from dict_input_objdet on Mar 8, 2023

Conversation

Contributor

@mzweilin mzweilin commented Mar 7, 2023

What does this PR do?

Using dictionary input like {"rgb": tensor1, "depth": tensor2} should make it easier to compose multi-modal adversaries.

The dictionary is later converted back to tensor so that models understand the input.

This is backward compatible with single-modal object detection, because single-modal is a special case of multi-modal.
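A minimal sketch of that conversion (illustrative only; the helper name `convert_dict_to_tensor` and the channel-concatenation strategy are assumptions, not the PR's actual code):

```python
import torch


def convert_dict_to_tensor(inputs, modalities):
    """Flatten a per-modality dict into one tensor by concatenating
    along the channel dimension, in the order given by `modalities`."""
    if torch.is_tensor(inputs):  # single-modal input passes through unchanged
        return inputs
    return torch.cat([inputs[m] for m in modalities], dim=0)


# Multi-modal: {"rgb": ..., "depth": ...} -> one (3 + 1, H, W) tensor.
x = {"rgb": torch.rand(3, 32, 32), "depth": torch.rand(1, 32, 32)}
assert convert_dict_to_tensor(x, ["rgb", "depth"]).shape == (4, 32, 32)

# Single-modal is the special case: a plain tensor is returned as-is.
y = torch.rand(3, 32, 32)
assert convert_dict_to_tensor(y, ["rgb"]) is y
```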

Type of change

Please check all relevant options.

  • Improvement (non-breaking)
  • Bug fix (non-breaking)
  • New feature (non-breaking)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing

Please describe the tests that you ran to verify your changes. Consider listing any relevant details of your test configuration.

  • Test A
  • Test B

Before submitting

  • The title is self-explanatory and the description concisely explains the PR
  • My PR does only one thing, instead of bundling different changes together
  • I list all the breaking changes introduced by this pull request
  • I have commented my code
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have run pre-commit hooks with pre-commit run -a command without errors

Did you have fun?

Make sure you had fun coding 🙃

@mzweilin mzweilin marked this pull request as draft March 7, 2023 17:32
@mzweilin mzweilin requested a review from dxoigmn March 7, 2023 17:47
@mzweilin mzweilin marked this pull request as ready for review March 7, 2023 17:47
Contributor

@dxoigmn dxoigmn left a comment

Can you write a test?

mart/transforms/transforms.py (outdated, resolved)
mart/datamodules/coco.py (resolved)
@@ -2,6 +2,7 @@

defaults:
- COCO_TorchvisionFasterRCNN
- override /model/detection@model.modules.preprocessor: preprocessor_multi_modal
Contributor (dxoigmn)

Why is this the default now?

Contributor Author (mzweilin)

A preprocessor is required and I made the single-modal normalizer the default one.

Contributor (dxoigmn)

But that specifies the multimodal one?

_target_: torchvision.transforms.Compose
transforms:
  - _target_: mart.transforms.GetItems
    keys: ${datamodule.test_dataset.modalities}
Contributor (dxoigmn)

I don't think this is correct. These keys need to be specified independently because I could load the images in [depth, rgb] order but require they be order as [rgb, depth] for the model.
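For context, the key-ordering concern can be illustrated with a minimal `GetItems`-style transform (a sketch; the real `mart.transforms.GetItems` implementation may differ):

```python
class GetItems:
    """Pull values out of a dict in the order given by `keys` (sketch)."""

    def __init__(self, keys):
        self.keys = keys

    def __call__(self, inputs):
        return [inputs[k] for k in self.keys]


# Images loaded in [depth, rgb] order, but the model wants [rgb, depth]:
sample = {"depth": "depth_tensor", "rgb": "rgb_tensor"}
assert GetItems(keys=["rgb", "depth"])(sample) == ["rgb_tensor", "depth_tensor"]
```

Interpolating `keys` from `datamodule.test_dataset.modalities` ties the model's expected order to the loading order, which is the coupling being questioned here.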

Contributor (dxoigmn)

Would also be nice if this could use the yaml file below (preprocessor_single_modal.yaml).

Contributor Author (mzweilin)

Now it uses the other yaml file.

I think it's convenient to use interpolation by default. We can always change the keys in experiment.yaml or on the command line if we encounter that rare situation.
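For that rare situation, a hypothetical explicit override might look like this (the config path and key values are illustrative, not taken from this PR):

```yaml
# experiment.yaml (illustrative): pin the key order explicitly instead of
# interpolating from the datamodule's modalities.
model:
  modules:
    preprocessor:
      transforms:
        - _target_: mart.transforms.GetItems
          keys: [rgb, depth]
```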

Contributor (dxoigmn)

But isn't there a silent bug if I switch the datamodule modality from [rgb, depth] to [depth, rgb]?

Contributor Author (mzweilin)

Sure...

@@ -0,0 +1,12 @@
# @package model.modules.preprocessor
Contributor (dxoigmn)

Why not create a modules directory? Why does this live under detection when it has nothing to do with detection?

Contributor Author (mzweilin)

model/detection -> model/modules

@@ -0,0 +1,6 @@
# @package model.modules.preprocessor
Contributor (dxoigmn)

Why not create a modules directory? Why does this live under detection when it has nothing to do with detection?

I also think this should just be preprocessor.yaml or something like that. Perhaps 8bit_preprocessor.yaml or something indicating that this is doing 0-255 normalization?

Contributor Author (mzweilin)

I named it tuple_normalizer.yaml

@mzweilin mzweilin requested a review from dxoigmn March 8, 2023 19:22
@dxoigmn dxoigmn changed the title Make dictionary input for multi-modal object detection Make input a dictionary for multi-modal object detection Mar 8, 2023
Contributor

@dxoigmn dxoigmn left a comment

LGTM!

@mzweilin mzweilin merged commit de77a9d into main Mar 8, 2023
@mzweilin mzweilin deleted the dict_input_objdet branch March 8, 2023 23:59
dxoigmn added a commit that referenced this pull request Jun 30, 2023
mzweilin added a commit that referenced this pull request Jul 14, 2023
mzweilin added a commit that referenced this pull request Jul 15, 2023
mzweilin pushed a commit that referenced this pull request Jul 15, 2023
2 participants