Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoMM] refactor inferring problem type and output shape #3227

Merged
merged 7 commits into from
May 25, 2023

Conversation

zhiqiangdon
Copy link
Contributor

Issue #, if available:
#3017 #3004

Description of changes:

  • Decouple problem type and output shape inference
  • Reuse core utils infer_problem_type.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@zhiqiangdon zhiqiangdon added the model list checked You have updated the model list after modifying multimodal unit tests/docs label May 22, 2023
@github-actions
Copy link

Job PR-3227-6ff48cb is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3227/6ff48cb/index.html

@github-actions
Copy link

Job PR-3227-b60e093 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3227/b60e093/index.html

@zhiqiangdon zhiqiangdon requested a review from yongxinw May 23, 2023 21:27
)
return class_num
elif problem_type == REGRESSION:
if class_num <= 1: # in case users want to try toy datasets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to raise error when class_num==1? I think it's fine if user want a sanity test with only one data row.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to class_num < 1.

f"The label column '{label_column}' has type"
f" '{column_types[label_column]}', which is not supported yet."
)
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be confusing if we support CLASSIFICATION as input problem type but not include it in the supported problem types here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added CLASSIFICATION back.

Copy link
Contributor

@FANGAreNotGnu FANGAreNotGnu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with 2 minor comments

@github-actions
Copy link

Job PR-3227-4f561bc is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3227/4f561bc/index.html

@github-actions
Copy link

Job PR-3227-0b46482 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3227/0b46482/index.html

@zhiqiangdon zhiqiangdon merged commit 1349d01 into autogluon:master May 25, 2023
29 checks passed
Comment on lines +637 to +639
# For zero-shot inference, label column is unnecessary
if col_name not in column_types:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we calling this method in the first place if we are doing zero-shot inference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In zero-shot inference, we still need the column types to create the dataframe preprocessor. See https://github.com/autogluon/autogluon/blob/master/multimodal/src/autogluon/multimodal/predictor.py#L1792-L1801

column_types: Optional[Dict] = None,
data: Optional[pd.DataFrame] = None,
def infer_problem_type(
y_train_data: Optional[pd.DataFrame] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not adopt the following naming convention?

y = train
y_val = val
y_test = test

In tabular we would simply name y_train_data to y.

)

# FIXME: separate infer problem_type with output_shape, should be logically distinct
_, output_shape = infer_problem_type_output_shape(
output_shape = infer_output_shape(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move infer_output_shape to be directly after getting the problem type, since they are logically sequential, and generally unrelated to infer_column_types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Will switch the location in next PRs.

@zhiqiangdon zhiqiangdon deleted the mm-refactor branch May 25, 2023 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
model list checked You have updated the model list after modifying multimodal unit tests/docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants