
Make input a dictionary for multi-modal object detection #95

Merged

mzweilin merged 12 commits into main from dict_input_objdet on Mar 8, 2023

Conversation

Contributor

@mzweilin mzweilin commented Mar 7, 2023

What does this PR do?

Using dictionary input like {"rgb": tensor1, "depth": tensor2} should make it easier to compose multi-modal adversaries.

The dictionary is later converted back to tensor so that models understand the input.

This is backward compatible with single-modal object detection, because single-modal is a special case of multi-modal.
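A minimal sketch of that conversion (illustrative only; the helper name `convert_dict_to_tensor` and the channel-concatenation strategy are assumptions, not the PR's actual code):

```python
import torch


def convert_dict_to_tensor(inputs, modalities):
    """Flatten a per-modality dict into one tensor by concatenating
    along the channel dimension, in the order given by `modalities`."""
    if torch.is_tensor(inputs):  # single-modal input passes through unchanged
        return inputs
    return torch.cat([inputs[m] for m in modalities], dim=0)


# Multi-modal: {"rgb": ..., "depth": ...} -> one (3 + 1, H, W) tensor.
x = {"rgb": torch.rand(3, 32, 32), "depth": torch.rand(1, 32, 32)}
assert convert_dict_to_tensor(x, ["rgb", "depth"]).shape == (4, 32, 32)

# Single-modal is the special case: a plain tensor is returned as-is.
y = torch.rand(3, 32, 32)
assert convert_dict_to_tensor(y, ["rgb"]) is y
```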

Type of change

Please check all relevant options.

  • Improvement (non-breaking)
  • Bug fix (non-breaking)
  • New feature (non-breaking)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing

Please describe the tests that you ran to verify your changes. Consider listing any relevant details of your test configuration.

  • Test A
  • Test B

Before submitting

  • The title is self-explanatory and the description concisely explains the PR
  • My PR does only one thing, instead of bundling different changes together
  • I list all the breaking changes introduced by this pull request
  • I have commented my code
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have run pre-commit hooks with pre-commit run -a command without errors

Did you have fun?

Make sure you had fun coding 🙃

@mzweilin mzweilin marked this pull request as draft March 7, 2023 17:32
@mzweilin mzweilin requested a review from dxoigmn March 7, 2023 17:47
@mzweilin mzweilin marked this pull request as ready for review March 7, 2023 17:47
Contributor

@dxoigmn dxoigmn left a comment

Can you write a test?

mart/transforms/transforms.py (outdated, resolved)
mart/datamodules/coco.py (resolved)
@@ -2,6 +2,7 @@

defaults:
- COCO_TorchvisionFasterRCNN
- override /model/detection@model.modules.preprocessor: preprocessor_multi_modal
Contributor (dxoigmn)

Why is this the default now?

Contributor Author (mzweilin)

A preprocessor is required and I made the single-modal normalizer the default one.

Contributor (dxoigmn)

But that specifies the multimodal one?

_target_: torchvision.transforms.Compose
transforms:
  - _target_: mart.transforms.GetItems
    keys: ${datamodule.test_dataset.modalities}
Contributor (dxoigmn)

I don't think this is correct. These keys need to be specified independently because I could load the images in [depth, rgb] order but require they be order as [rgb, depth] for the model.
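For context, the key-ordering concern can be illustrated with a minimal `GetItems`-style transform (a sketch; the real `mart.transforms.GetItems` implementation may differ):

```python
class GetItems:
    """Pull values out of a dict in the order given by `keys` (sketch)."""

    def __init__(self, keys):
        self.keys = keys

    def __call__(self, inputs):
        return [inputs[k] for k in self.keys]


# Images loaded in [depth, rgb] order, but the model wants [rgb, depth]:
sample = {"depth": "depth_tensor", "rgb": "rgb_tensor"}
assert GetItems(keys=["rgb", "depth"])(sample) == ["rgb_tensor", "depth_tensor"]
```

Interpolating `keys` from `datamodule.test_dataset.modalities` ties the model's expected order to the loading order, which is the coupling being questioned here.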

Contributor (dxoigmn)

Would also be nice if this could use the yaml file below (preprocessor_single_modal.yaml).

Contributor Author (mzweilin)

Now it uses the other yaml file.

I think it's convenient to use interpolation by default. We can always change the keys in experiment.yaml or on the command line if we encounter that rare situation.
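For that rare situation, a hypothetical explicit override might look like this (the config path and key values are illustrative, not taken from this PR):

```yaml
# experiment.yaml (illustrative): pin the key order explicitly instead of
# interpolating from the datamodule's modalities.
model:
  modules:
    preprocessor:
      transforms:
        - _target_: mart.transforms.GetItems
          keys: [rgb, depth]
```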

Contributor (dxoigmn)

But isn't there a silent bug if I switch the datamodule modality from [rgb, depth] to [depth, rgb]?

Contributor Author (mzweilin)

Sure...

@@ -0,0 +1,12 @@
# @package model.modules.preprocessor
Contributor (dxoigmn)

Why not create a modules directory? Why does this live under detection when it has nothing to do with detection?

Contributor Author (mzweilin)

model/detection -> model/modules

@@ -0,0 +1,6 @@
# @package model.modules.preprocessor
Contributor (dxoigmn)

Why not create a modules directory? Why does this live under detection when it has nothing to do with detection?

I also think this should just be preprocessor.yaml or something like that. Perhaps 8bit_preprocessor.yaml or something indicating that this is doing 0-255 normalization?

Contributor Author (mzweilin)

I named it tuple_normalizer.yaml

@mzweilin mzweilin requested a review from dxoigmn March 8, 2023 19:22
@dxoigmn dxoigmn changed the title Make dictionary input for multi-modal object detection Make input a dictionary for multi-modal object detection Mar 8, 2023
Contributor

@dxoigmn dxoigmn left a comment

LGTM!

@mzweilin mzweilin merged commit de77a9d into main Mar 8, 2023
@mzweilin mzweilin deleted the dict_input_objdet branch March 8, 2023 23:59
dxoigmn added a commit that referenced this pull request Jun 30, 2023
mzweilin added a commit that referenced this pull request Jul 14, 2023
mzweilin added a commit that referenced this pull request Jul 15, 2023
mzweilin pushed a commit that referenced this pull request Jul 15, 2023
2 participants