-
Notifications
You must be signed in to change notification settings - Fork 862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AutoMM] Support using data path in fit() #3006
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Great feature!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Job PR-3006-7c81474 is done. |
def split_train_tuning_data( | ||
train_data: Union[pd.DataFrame, str], | ||
tuning_data: Optional[Union[pd.DataFrame, str]] = None, | ||
holdout_frac: Optional[float] = None, | ||
is_classification: Optional[bool] = False, | ||
label_column: Optional[str] = None, | ||
seed: Optional[int] = 123, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data loading should not be done during data splitting. This is not the logical place for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have considered loading data in predictor.py
before calling split_train_tuning_data
, but I need repeat the same code of loading data in matcher.py
, which increase the boilerplate code. How about changing this function name to prepare_train_tuning_data
to make it more than just splitting data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering PR #3004, move it outside of split_train_tuning_data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Innixma Any further questions on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is moved outside, I am ok to merge.
The ideal solution long term is using a mix-in or having both predictor.py and matcher.py inherit from the same abstract class. This would avoid code dupe.
Job PR-3006-1f12920 is done. |
Job PR-3006-13509b2 is done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Issue #, if available:
Description of changes:
Support passing path of training data in fit().
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.