AutoMM: Add consistent seed, stratification, alignment with tabular for train-val split #3004
Conversation
Job PR-3004-9175664 is done.
```
The ratio of data to use as validation.
If 0.2, 20% of the data will be used for validation, and 80% for training.
If None, the ratio is automatically determined,
ranging from 0.2 for small row count to 0.01 for large row count.
```
Are we changing the train-test split ratio logic to match tabular here?
If we are, this might not be the right thing to do for text models. Generally you need more text rows for validation than in tabular models.
Specifically, in cases where you have 100k rows, giving 0.01 of that to validation might not provide enough information for stable validation. On top of that, we generally don't train multiple multi-modal models, unlike the defaults in tabular's top presets, so the requirements for verifying generalization are stricter.
The ratios are not changed; AutoMM was already matching Tabular. Also, 100k rows will not get 0.01, it will get 0.025 (2,500 rows).
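For reference, here is a minimal sketch of what the auto-determined ratio looks like; the thresholds below are assumptions chosen only to be consistent with the 0.2–0.01 range in the docstring and the 2,500-row figure mentioned above, not copied from the AutoGluon source:

```python
# Hypothetical sketch of a tabular-style auto holdout fraction.
# Thresholds are assumptions consistent with the numbers discussed in this thread.
def auto_holdout_frac(num_train_rows: int) -> float:
    if num_train_rows < 5000:
        # Small datasets: hold out up to 20% for validation.
        return max(0.1, min(0.2, 500.0 / num_train_rows))
    # Large datasets: aim for roughly 2,500 validation rows, never below 1%.
    return max(0.01, min(0.1, 2500.0 / num_train_rows))


print(auto_holdout_frac(1_000))    # 0.2
print(auto_holdout_frac(100_000))  # 0.025 -> 2,500 validation rows
```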
```diff
@@ -530,7 +529,7 @@ def fit(
     column_types: Optional[dict] = None,
     holdout_frac: Optional[float] = None,
     teacher_predictor: Union[str, MultiModalPredictor] = None,
-    seed: Optional[int] = 123,
+    seed: Optional[int] = 0,
```
We may need to compare the performance on AutoMM benchmark datasets before and after changing the seed. If there is no performance drop, we can make this change.
If we have a performance drop purely by changing the seed, that just means we were overfit on the seed. It shouldn't be a blocker for merging.
But yes, I would recommend a benchmark run on this PR, since there are other impactful changes, such as using stratification during splits.
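For illustration, a seeded, stratified split of the kind this PR introduces could look like the following; sklearn is used here purely as an example and is not necessarily what AutoMM calls internally:

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_train_val(data: pd.DataFrame, label: str, holdout_frac: float, seed: int = 0):
    # Stratify on the label so class proportions are preserved in both splits,
    # and fix random_state so repeated calls produce the identical split.
    return train_test_split(
        data,
        test_size=holdout_frac,
        random_state=seed,
        stratify=data[label],
    )


data = pd.DataFrame({"text": [f"row {i}" for i in range(10)], "label": [i % 2 for i in range(10)]})
train_df, val_df = split_train_val(data, label="label", holdout_frac=0.2, seed=0)
assert len(val_df) == 2  # same 2 rows every run, with balanced labels
```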
@tonyhoo Do we need to test this PR on the vision and text benchmarks?
Can we use the tutorials generated from this PR as an indicator of the performance impact? If any concerns are raised there, I am fine with rerunning the NLP and CV benchmarks.
```
@@ -718,14 +709,28 @@ def fit(
if self._config is not None:  # continuous training
    config = self._config

problem_type, output_shape = infer_problem_type_output_shape(
# FIXME: Align logic with Tabular,
# don't combine output_shape and problem_type detection, make them separate
```
Agreed. We need to separate the logic. The data split inconsistency between two fit() calls is due to the fact that we tie them together and don't infer the problem type before splitting the data.
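As a rough illustration of the separation being proposed (the function names and heuristics below are hypothetical, not the existing AutoMM API):

```python
import pandas as pd


def infer_problem_type(labels: pd.Series) -> str:
    # Hypothetical: decide the problem type from the labels alone,
    # before any train/val split happens.
    if pd.api.types.is_float_dtype(labels):
        return "regression"
    return "binary" if labels.nunique() == 2 else "multiclass"


def infer_output_shape(labels: pd.Series, problem_type: str) -> int:
    # Hypothetical: derive the output shape from an already-known problem type.
    if problem_type == "regression":
        return 1
    return labels.nunique()
```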
```
@@ -687,15 +687,6 @@ def fit(
    fit_called=fit_called,
)

train_data, tuning_data = split_train_tuning_data(
```
Moving split_train_tuning_data after infer_column_types would cause issues, since infer_column_types requires tuning_data during training. Otherwise, columns with one unique value can't be ignored. See https://github.com/autogluon/autogluon/blob/master/multimodal/src/autogluon/multimodal/data/infer_types.py#L504-L528
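A small illustration of the failure mode being described; the helper below is hypothetical, while the real logic lives in the infer_types.py lines linked above:

```python
import pandas as pd

train = pd.DataFrame({"feature": ["a", "a", "a"], "label": [0, 1, 0]})
tuning = pd.DataFrame({"feature": ["a", "b"], "label": [1, 0]})


def droppable_columns(*frames: pd.DataFrame) -> list:
    # Hypothetical helper: a column may only be ignored if it has a single
    # unique value across ALL provided frames, not just the training split.
    combined = pd.concat(frames, ignore_index=True)
    return [c for c in combined.columns if combined[c].nunique() <= 1]


print(droppable_columns(train))          # ['feature'] -- looks constant if tuning_data is unavailable
print(droppable_columns(train, tuning))  # []          -- 'feature' actually varies, so it must be kept
```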
Updated to resolve this. Note that it is a bit awkward to resolve given how the logic is coupled together in the current functions, so I've added TODOs and FIXMEs where I think it can be streamlined.
> infer_column_types requires tuning_data in training.

Fixed, although I will note that the code as previously written made it very easy to do this incorrectly. We should look to refactor the code so such bugs are harder to accidentally introduce.
Force-pushed from cfde794 to 633433c.
Job PR-3004-633433c is done.
Job PR-3004-cfde794 is done.
LGTM. We can refactor the logic of inferring problem type and output shape later.
Force-pushed from 633433c to 4398e43.
@zhiqiangdon rebased to resolve the merge conflict. Please feel free to merge when you are ready.
CI has one unit test error.
Should be fixed in the latest commit, thanks!
Force-pushed from 69484dc to b084d3c.
Job PR-3004-b084d3c is done.
Issue #, if available:
#3003
Description of changes:
Reproducing
Tabular & MultiModal Identical Result
Now the following code produces identical results for Tabular and AutoMM (previously it did not, due to (1) the seed difference and (2) the data split logic difference):
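(The original snippet did not survive extraction; the sketch below is illustrative only. The dataset path, label column, and the AG_AUTOMM hyperparameter key are assumptions, not taken from the PR.)

```python
import pandas as pd
from autogluon.tabular import TabularPredictor
from autogluon.multimodal import MultiModalPredictor

train_data = pd.read_csv("train.csv")  # assumed to contain a 'label' column
test_data = pd.read_csv("test.csv")

# Tabular wrapping a single AutoMM model.
tabular_predictor = TabularPredictor(label="label").fit(
    train_data,
    hyperparameters={"AG_AUTOMM": {}},
)

# Standalone AutoMM; its default seed (now 0) matches Tabular after this PR.
automm_predictor = MultiModalPredictor(label="label").fit(train_data, seed=0)

# With the same seed and the same train/val split logic, these scores should match.
print(tabular_predictor.evaluate(test_data))
print(automm_predictor.evaluate(test_data))
```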
Output:
MultiModalPredictor Continuous Training Identical Split
Now AutoMM uses the same train/val split when passed the same training data across multiple fit calls with the same seed. This avoids corrupting the validation score across multiple fit calls (#3003).
You can check the difference by running code along the lines of the sketch below (note: you'll need to add print statements on the train/val data rows, or step through a debugger, to see that they are misaligned on mainline and aligned in this PR):
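A hedged sketch of that check follows; the dataset path and label column are assumptions:

```python
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

train_data = pd.read_csv("train.csv")  # assumed to contain a 'label' column

predictor = MultiModalPredictor(label="label")
predictor.fit(train_data, seed=0, time_limit=60)
# First fit: note (via print statements or a debugger inside fit) which rows
# land in the validation split.
predictor.fit(train_data, seed=0, time_limit=60)
# Second (continuous-training) fit: with this PR, the validation rows are the
# same as in the first call, so the validation score is no longer computed on
# rows the model already trained on.
```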
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.