[AutoMM] refactor inferring problem type and output shape #3227
Job PR-3227-6ff48cb is done.
Job PR-3227-b60e093 is done.
    )
    return class_num
elif problem_type == REGRESSION:
    if class_num <= 1:  # in case users want to try toy datasets
Is there a reason to raise an error when class_num == 1? I think it's fine if a user wants a sanity test with only one data row.
Updated to class_num < 1.
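For reference, the relaxed check agreed on in this thread could look roughly like the sketch below. The helper name `check_unique_label_count` and the error message are hypothetical and not taken from the PR; only the `class_num < 1` condition comes from the reply above.

```python
def check_unique_label_count(class_num: int) -> int:
    """Hypothetical helper: reject only an empty label set.

    A single unique label value (for example, a one-row toy dataset)
    is allowed, matching the `class_num < 1` condition from the review.
    """
    if class_num < 1:
        raise ValueError(
            f"Expected at least one label value, but got class_num={class_num}."
        )
    return class_num
```

With this condition, `check_unique_label_count(1)` passes, while `check_unique_label_count(0)` raises `ValueError`.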
        f"The label column '{label_column}' has type"
        f" '{column_types[label_column]}', which is not supported yet."
    )
    raise ValueError(
It might be confusing if we support CLASSIFICATION as an input problem type but do not include it in the supported problem types listed here.
Added CLASSIFICATION back.
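To illustrate the concern, here is a minimal sketch of an error message that stays in sync with the accepted inputs by building the list from a single tuple. The constant string values, the tuple `SUPPORTED_PROBLEM_TYPES`, and the helper `check_problem_type` are assumptions for illustration, not the PR's actual code.

```python
# Hypothetical stand-ins for the library's problem-type constants.
BINARY = "binary"
MULTICLASS = "multiclass"
CLASSIFICATION = "classification"
REGRESSION = "regression"

# One source of truth for both validation and the error message.
SUPPORTED_PROBLEM_TYPES = (BINARY, MULTICLASS, CLASSIFICATION, REGRESSION)


def check_problem_type(problem_type: str) -> str:
    """Return problem_type if supported, else raise with the full list."""
    if problem_type not in SUPPORTED_PROBLEM_TYPES:
        raise ValueError(
            f"Problem type '{problem_type}' is not supported. "
            f"Supported problem types are: {', '.join(SUPPORTED_PROBLEM_TYPES)}."
        )
    return problem_type
```

Because the message is generated from the same tuple used for validation, an accepted type such as CLASSIFICATION cannot be missing from the list shown to the user.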
LGTM with 2 minor comments
Job PR-3227-4f561bc is done.
Job PR-3227-0b46482 is done.
# For zero-shot inference, label column is unnecessary
if col_name not in column_types:
    continue
Why are we calling this method in the first place if we are doing zero-shot inference?
In zero-shot inference, we still need the column types to create the dataframe preprocessor. See https://github.com/autogluon/autogluon/blob/master/multimodal/src/autogluon/multimodal/predictor.py#L1792-L1801
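A minimal sketch of the skip logic discussed here, assuming the label columns are iterated and looked up in `column_types`. The function name `collect_known_label_types` and its return value are hypothetical; only the `if col_name not in column_types: continue` guard comes from the diff.

```python
from typing import Dict, List


def collect_known_label_types(
    label_columns: List[str], column_types: Dict[str, str]
) -> Dict[str, str]:
    """Hypothetical helper: gather types only for label columns that exist."""
    known = {}
    for col_name in label_columns:
        # For zero-shot inference, label column is unnecessary,
        # so a label missing from column_types is skipped, not an error.
        if col_name not in column_types:
            continue
        known[col_name] = column_types[col_name]
    return known
```

In the zero-shot case the data has no label column, so the result is simply empty while the remaining column types can still be used to build the preprocessor.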
column_types: Optional[Dict] = None,
data: Optional[pd.DataFrame] = None,
def infer_problem_type(
y_train_data: Optional[pd.DataFrame] = None,
Why not adopt the following naming convention?
y = train
y_val = val
y_test = test
In tabular we would simply rename y_train_data to y.
)

# FIXME: separate infer problem_type with output_shape, should be logically distinct
_, output_shape = infer_problem_type_output_shape(
output_shape = infer_output_shape(
Move infer_output_shape to be directly after getting the problem type, since they are logically sequential and generally unrelated to infer_column_types.
Makes sense. Will switch the location in the next PRs.
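The suggested ordering could be sketched as follows. Both functions are toy stand-ins with hypothetical names and heuristics, not the library's implementations; returning 1 for regression is an assumption of a single scalar target. The point is only that the output shape is derived immediately after the problem type.

```python
from typing import List, Tuple


def infer_problem_type_sketch(labels: List) -> str:
    """Toy heuristic (assumption): floats mean regression, else count classes."""
    if all(isinstance(v, float) for v in labels):
        return "regression"
    return "binary" if len(set(labels)) == 2 else "multiclass"


def infer_output_shape_sketch(problem_type: str, labels: List) -> int:
    """Toy rule (assumption): one output for regression, else one per class."""
    if problem_type == "regression":
        return 1
    return len(set(labels))


def setup_sketch(labels: List) -> Tuple[str, int]:
    # Step 1: decide the problem type.
    problem_type = infer_problem_type_sketch(labels)
    # Step 2: derive the output shape right away, since it depends only on
    # the problem type and the labels, not on the other column types.
    output_shape = infer_output_shape_sketch(problem_type, labels)
    return problem_type, output_shape
```

Keeping the two calls adjacent makes the dependency explicit and leaves column-type inference as a separate, unrelated step.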
Issue #, if available:
#3017 #3004
Description of changes:
infer_problem_type.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.