[AutoMM] refactor inferring problem type and output shape #3227

zhiqiangdon · 2023-05-19T20:26:31Z

Issue #, if available:
#3017 #3004

Description of changes:

- Decouple problem type inference from output shape inference.
- Reuse the core utility infer_problem_type.
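
To illustrate what the decoupling might look like, here is a minimal sketch rather than the code merged in this PR. The function names `sketch_infer_problem_type` and `sketch_infer_output_shape`, the string constants, and the toy heuristic (floats mean regression, two unique values mean binary) are all assumptions made for illustration. It uses plain Python sequences instead of pandas to stay dependency-free, and it assumes that regression has an output shape of 1.

```python
from typing import Optional, Sequence

# Assumed constant values, defined locally for illustration only.
BINARY = "binary"
MULTICLASS = "multiclass"
REGRESSION = "regression"


def sketch_infer_problem_type(y: Sequence, provided: Optional[str] = None) -> str:
    """Decide the problem type from the label values alone (toy heuristic)."""
    if provided is not None:
        # A user-provided problem type takes precedence over inference.
        return provided
    if any(isinstance(v, float) for v in y):
        return REGRESSION
    return BINARY if len(set(y)) == 2 else MULTICLASS


def sketch_infer_output_shape(y: Sequence, problem_type: str) -> int:
    """Derive the output size from an already-known problem type."""
    if problem_type == REGRESSION:
        return 1  # assumption: a single scalar output for regression
    return len(set(y))  # one output per distinct class


labels = ["cat", "dog", "cat", "bird"]
problem_type = sketch_infer_problem_type(labels)
output_shape = sketch_infer_output_shape(labels, problem_type)
```

Because the second function takes the problem type as an input, each step can be called and tested separately.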

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2023-05-23T07:37:07Z

Job PR-3227-6ff48cb is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3227/6ff48cb/index.html

github-actions · 2023-05-23T08:51:20Z

Job PR-3227-b60e093 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3227/b60e093/index.html

FANGAreNotGnu · 2023-05-24T23:25:02Z

multimodal/src/autogluon/multimodal/data/infer_types.py

-            )
+            return class_num
+        elif problem_type == REGRESSION:
+            if class_num <= 1:  # in case users want to try toy datasets


Is there a reason to raise an error when class_num == 1? I think it's fine if a user wants to run a sanity test with only one data row.

Updated to class_num < 1.
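
A minimal sketch of the relaxed check discussed above, assuming `class_num` is the number of unique label values. The helper name `sketch_check_regression_labels` and the error message are hypothetical, and the surrounding logic in infer_types.py is not reproduced here.

```python
from typing import Sequence


def sketch_check_regression_labels(labels: Sequence[float]) -> int:
    """Reject only an empty label column; a single label value is allowed."""
    class_num = len(set(labels))  # assumed meaning: number of unique labels
    if class_num < 1:  # was `<= 1` before the review comment
        raise ValueError("The label column is empty; regression needs at least one label value.")
    return class_num
```

With `< 1`, a one-row toy dataset passes the check, while an empty label column still fails early.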

FANGAreNotGnu · 2023-05-24T23:26:15Z

multimodal/src/autogluon/multimodal/data/infer_types.py

-                f"The label column '{label_column}' has type"
-                f" '{column_types[label_column]}', which is not supported yet."
-            )
+        raise ValueError(


It might be confusing if we support CLASSIFICATION as an input problem type but do not include it in the supported problem types listed here.

Added CLASSIFICATION back.
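
For illustration, an error message that lists every accepted problem type, CLASSIFICATION included, could be sketched as below. The constant values, the tuple `SUPPORTED_PROBLEM_TYPES`, the helper `sketch_validate_problem_type`, and the message wording are assumptions, not the text used in this PR.

```python
# Assumed constant values, defined locally for illustration only.
BINARY = "binary"
MULTICLASS = "multiclass"
CLASSIFICATION = "classification"
REGRESSION = "regression"

# Every type accepted as input should also appear in the error message.
SUPPORTED_PROBLEM_TYPES = (BINARY, MULTICLASS, CLASSIFICATION, REGRESSION)


def sketch_validate_problem_type(problem_type: str) -> str:
    """Return the problem type unchanged, or raise with the full supported list."""
    if problem_type not in SUPPORTED_PROBLEM_TYPES:
        raise ValueError(
            f"Problem type '{problem_type}' is not supported yet. "
            f"Supported problem types are: {', '.join(SUPPORTED_PROBLEM_TYPES)}."
        )
    return problem_type
```

Building the message from the same tuple used for the membership check keeps the two from drifting apart.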

FANGAreNotGnu

LGTM with 2 minor comments

github-actions · 2023-05-25T08:35:06Z

Job PR-3227-4f561bc is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3227/4f561bc/index.html

github-actions · 2023-05-25T08:38:38Z

Job PR-3227-0b46482 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3227/0b46482/index.html

Innixma · 2023-05-25T16:55:34Z

multimodal/src/autogluon/multimodal/data/infer_types.py

+        # For zero-shot inference, label column is unnecessary
+        if col_name not in column_types:
+            continue


Why are we calling this method in the first place if we are doing zero-shot inference?

In zero-shot inference, we still need the column types to create the dataframe preprocessor. See https://github.com/autogluon/autogluon/blob/master/multimodal/src/autogluon/multimodal/predictor.py#L1792-L1801
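
The skip shown in the diff can be sketched in isolation as follows. The loop's real surroundings are not visible in this thread, so the helper name `sketch_present_label_columns`, its return value, and the example column type strings are hypothetical.

```python
from typing import Dict, Iterable, List


def sketch_present_label_columns(
    label_columns: Iterable[str], column_types: Dict[str, str]
) -> List[str]:
    """Keep only the label columns that actually have an inferred type."""
    kept = []
    for col_name in label_columns:
        # For zero-shot inference, the label column may be absent from the data,
        # so it has no entry in column_types and is skipped.
        if col_name not in column_types:
            continue
        kept.append(col_name)
    return kept
```

In the zero-shot case, `sketch_present_label_columns(["label"], {"image": "image_path"})` yields an empty list, so any later per-label processing has nothing to act on while the feature column types remain available.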

Innixma · 2023-05-25T16:59:16Z

multimodal/src/autogluon/multimodal/data/infer_types.py

-    column_types: Optional[Dict] = None,
-    data: Optional[pd.DataFrame] = None,
+def infer_problem_type(
+    y_train_data: Optional[pd.DataFrame] = None,


Why not adopt the following naming convention?

y = train
y_val = val
y_test = test

In tabular, we would simply name y_train_data as y.

Innixma · 2023-05-25T17:05:14Z

multimodal/src/autogluon/multimodal/predictor.py

        )

-        # FIXME: separate infer problem_type with output_shape, should be logically distinct
-        _, output_shape = infer_problem_type_output_shape(
+        output_shape = infer_output_shape(


Move infer_output_shape to be directly after getting the problem type, since they are logically sequential and generally unrelated to infer_column_types.

Makes sense. I will switch the location in a follow-up PR.

zhiqiangdon added 2 commits May 19, 2023 10:00

refactor inferring problem type and output shape

6979e3d

fix

997011b

zhiqiangdon added the "model list checked" label (You have updated the model list after modifying multimodal unit tests/docs) May 22, 2023

zhiqiangdon added 3 commits May 22, 2023 17:49

fix

01ce6c2

fix

6ff48cb

update

b60e093

zhiqiangdon requested review from Innixma, FANGAreNotGnu and tonyhoo May 23, 2023 06:19

zhiqiangdon requested a review from yongxinw May 23, 2023 21:27

FANGAreNotGnu reviewed May 24, 2023

FANGAreNotGnu approved these changes May 24, 2023

zhiqiangdon added 2 commits May 24, 2023 22:58

address comments

4f561bc

update

0b46482

zhiqiangdon merged commit 1349d01 into autogluon:master May 25, 2023
29 checks passed

zhiqiangdon mentioned this pull request May 25, 2023

AutoMM incorrectly infers problem_type #3017

Closed

Innixma approved these changes May 25, 2023

zhiqiangdon deleted the mm-refactor branch May 25, 2023 23:25
