AutoMM: Add consistent seed, stratification, alignment with tabular for train-val split #3004

Merged (5 commits, Mar 23, 2023)

Conversation

@Innixma (Contributor) commented Mar 6, 2023:

Issue #, if available:

#3003

Description of changes:

  • Previously, AutoMM would split the same data differently across repeated fit calls, even with the same seed. Now it uses the same split when called with the same data and the same seed.
  • Changed the default seed from 123 to 0 to align with Tabular.
  • Changed the splitting logic to be identical to Tabular's, including adding stratification and other guardrails, such as ensuring that at least 1 sample per class exists in the train data after the split. Stratification should help improve AutoMM's test scores, especially when train_data has a small number of rows. (A sketch of this splitting approach is shown below.)
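
For illustration, here is a minimal sketch of a stratified split with a per-class guardrail, written with sklearn. This is a hypothetical simplification (the function name split_train_val is invented for the example); the PR reuses Tabular's existing split logic rather than this exact code:

import pandas as pd
from sklearn.model_selection import train_test_split

def split_train_val(data: pd.DataFrame, label: str, holdout_frac: float, seed: int = 0):
    """Deterministic split: the same data and the same seed yield the same split."""
    train, val = train_test_split(
        data,
        test_size=holdout_frac,
        random_state=seed,        # fixed seed -> reproducible split
        stratify=data[label],     # preserve class proportions in both parts
    )
    # Guardrail sketched from the PR description: every class keeps at least
    # 1 sample in train. (With stratify active, sklearn already errors on
    # singleton classes, so this loop is purely illustrative.)
    missing = set(data[label].unique()) - set(train[label].unique())
    for cls in missing:
        row = val[val[label] == cls].head(1)
        train = pd.concat([train, row])
        val = val.drop(row.index)
    return train, val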

Reproducing

Tabular & MultiModal Identical Result

Now the following code produces identical results for Tabular and AutoMM (previously it did not, due to (1) the seed difference and (2) the data split logic difference):

from autogluon.core.utils.loaders import load_pd

train_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/train.parquet')
test_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/dev.parquet')
subsample_size = 1000  # subsample data for faster demo, try setting this to larger values
train_data = train_data.sample(n=subsample_size, random_state=0)

label = 'label'
eval_metric = 'accuracy'

from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label=label, eval_metric=eval_metric)
predictor.fit(
    train_data,
    hyperparameters={'AG_AUTOMM': {
        "optimization.max_epochs": 2,
    }},
    fit_weighted_ensemble=False,
)

test_score = predictor.evaluate(test_data)
print(test_score)

from autogluon.multimodal import MultiModalPredictor
predictor = MultiModalPredictor(label=label, eval_metric=eval_metric)
predictor.fit(
    train_data,
    hyperparameters={
        "optimization.max_epochs": 2,
    }
)

test_score = predictor.evaluate(test_data)
print(test_score)

Output:

# TabularPredictor
{'accuracy': 0.8279816513761468, 'balanced_accuracy': 0.8292708596446914, 'mcc': 0.6639300307917014, 'roc_auc': 0.9188057800791447, 'f1': 0.8179611650485437, 'precision': 0.8868421052631579, 'recall': 0.759009009009009}
# MultiModalPredictor
{'accuracy': 0.8279816513761468}

MultiModalPredictor Continuous Training Identical Split

Now AutoMM uses the same train/val split when passed the same training data with the same seed across multiple fit calls. This avoids corrupting the validation score across multiple fit calls (#3003).

You can check the difference by running this code. (Note: you'll need to add print statements for the train/val data rows, or enter a debugger, to see that they are misaligned in mainline and aligned in this PR. A standalone determinism check is also sketched after the code.)

from autogluon.core.utils.loaders import load_pd

train_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/train.parquet')
test_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/dev.parquet')
subsample_size = 1000  # subsample data for faster demo, try setting this to larger values
train_data = train_data.sample(n=subsample_size, random_state=0)

label = 'label'
eval_metric = 'accuracy'

from autogluon.multimodal import MultiModalPredictor
predictor = MultiModalPredictor(label=label, eval_metric=eval_metric)

predictor.fit(
    train_data,
    hyperparameters={
        "optimization.max_epochs": 2,
    }
)

predictor.fit(
    train_data,
    hyperparameters={
        "optimization.max_epochs": 2,
    }
)
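
As a rough standalone check of the determinism property, split the same data twice with the same seed and compare indices. This is only an illustration, not AutoMM's internal split code; it assumes train_data and label from the snippet above:

from sklearn.model_selection import train_test_split

split_a = train_test_split(train_data, test_size=0.2, random_state=0, stratify=train_data[label])
split_b = train_test_split(train_data, test_size=0.2, random_state=0, stratify=train_data[label])
assert split_a[0].index.equals(split_b[0].index)  # identical train rows
assert split_a[1].index.equals(split_b[1].index)  # identical val rows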

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added this to the 0.7.1 Release milestone Mar 6, 2023
@Innixma Innixma added labels: bug, enhancement, module: multimodal, priority: 1 (Mar 6, 2023)
@github-actions

github-actions bot commented Mar 6, 2023

Job PR-3004-9175664 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3004/9175664/index.html

Review thread on the new holdout_frac docstring:

    The ratio of data to use as validation.
    If 0.2, 20% of the data will be used for validation, and 80% for training.
    If None, the ratio is automatically determined,
    ranging from 0.2 for small row count to 0.01 for large row count.

Contributor commented:

Are we changing the train-test split ratio logic to match tabular here?

If we are, this might not be the right thing to do for text models. Generally you'll need more rows for validation with text models than with tabular models.

Specifically, in cases where you have 100k rows, giving 0.01 of that to validation might not be enough information for stable validation. On top of that, we generally don't train multiple multimodal models, unlike the defaults in tabular's top presets, which means stricter requirements for verifying generalization.

@Innixma (Contributor, Author) replied Mar 7, 2023:

The ratios are not changed; AutoMM was already matching Tabular. Also, 100k rows will not get 0.01, it will get 0.025 (2500 rows).
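
For context, the auto-determined ratio follows a shape like the sketch below. This is a hypothetical simplification consistent with the numbers quoted here (e.g., 100k rows -> 0.025, i.e. 2500 validation rows), not necessarily the exact implementation:

def default_holdout_frac(num_train_rows: int) -> float:
    if num_train_rows < 5000:
        # small datasets: between 10% and 20%, targeting ~500 validation rows
        return max(0.1, min(0.2, 500.0 / num_train_rows))
    # larger datasets: between 1% and 10%, targeting ~2500 validation rows
    return max(0.01, min(0.1, 2500.0 / num_train_rows))

print(default_holdout_frac(1_000))      # 0.2   -> 200 validation rows
print(default_holdout_frac(100_000))    # 0.025 -> 2500 validation rows
print(default_holdout_frac(1_000_000))  # 0.01  -> 10000 validation rows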

@@ -530,7 +529,7 @@ def fit(
     column_types: Optional[dict] = None,
     holdout_frac: Optional[float] = None,
     teacher_predictor: Union[str, MultiModalPredictor] = None,
-    seed: Optional[int] = 123,
+    seed: Optional[int] = 0,
Contributor commented:

We may need to compare performance on the AutoMM benchmark datasets before and after changing the seed. If there is no performance drop, we can make this change.

@Innixma (Contributor, Author) replied Mar 7, 2023:

If we have a performance drop purely from changing the seed, that just means we were overfit to the seed. It shouldn't be a blocker for merging.

@Innixma (Contributor, Author) added:

But yes, I would recommend a benchmark run on this PR, since there are other impactful changes, such as using stratification during splits.

Contributor commented:

@tonyhoo Do we need to test this PR on the vision and text benchmarks?

Collaborator commented:

Can we use the tutorial generated from this PR as an indicator of the performance impact? If any concerns are raised there, I am fine with rerunning the NLP and CV benchmarks.

@@ -718,14 +709,28 @@ def fit(
if self._config is not None: # continuous training
config = self._config

problem_type, output_shape = infer_problem_type_output_shape(
# FIXME: Align logic with Tabular,
# don't combine output_shape and problem_type detection, make them separate
Contributor commented:

Agree. We need to separate the logic. The data split inconsistency between two fit() calls is due to the fact that we tie them together and don't infer the problem type before splitting the data.
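
A hypothetical toy illustration of this kind of coupling (not AutoMM's actual code): if stratification is only applied once the problem type is known, then a call that skips problem-type inference stratifies differently and therefore splits differently, even with the same seed:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": range(10), "y": [0] * 5 + [1] * 5})

# First fit(): problem type known -> stratified split.
first = train_test_split(df, test_size=0.4, random_state=0, stratify=df["y"])
# Second fit(): problem type not re-inferred -> unstratified split.
second = train_test_split(df, test_size=0.4, random_state=0, stratify=None)

print(first[0].index.tolist())   # one set of train rows
print(second[0].index.tolist())  # typically a different set, despite the same seed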

@@ -687,15 +687,6 @@ def fit(
fit_called=fit_called,
)

train_data, tuning_data = split_train_tuning_data(
Contributor commented:

Moving split_train_tuning_data after infer_column_types would cause issues, since infer_column_types needs access to tuning_data during training. Otherwise, columns with one unique value can't be ignored. See https://github.com/autogluon/autogluon/blob/master/multimodal/src/autogluon/multimodal/data/infer_types.py#L504-L528
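
A hypothetical illustration (invented data) of why type inference wants to see the tuning split as well: a column that looks constant in the train split alone may still vary across splits, so the decision to ignore it can only be made with both splits available:

import pandas as pd

train = pd.DataFrame({"text": ["a", "b", "c"], "flag": [1, 1, 1]})
val = pd.DataFrame({"text": ["d", "e"], "flag": [1, 2]})

print(train["flag"].nunique())                    # 1 -> looks constant in train alone
print(pd.concat([train, val])["flag"].nunique())  # 2 -> actually varies across splits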

@Innixma (Contributor, Author) replied:

Updated to resolve this. Note that it is a bit awkward to resolve given how the logic is coupled together in the current functions, so I've added TODOs and FIXMEs where I think it can be streamlined.

@zhiqiangdon (Contributor) left a review comment:

infer_column_types requires tuning_data in training.

@Innixma (Contributor, Author) commented Mar 7, 2023:

> infer_column_types requires tuning_data in training.

Fixed, although I will note that the code as previously written made it very easy to do this incorrectly. We should look to refactor the code so that such bugs are harder to accidentally introduce.

@github-actions

github-actions bot commented Mar 7, 2023

Job PR-3004-633433c is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3004/633433c/index.html

@github-actions

github-actions bot commented Mar 7, 2023

Job PR-3004-cfde794 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3004/cfde794/index.html

@zhiqiangdon (Contributor) left a review comment:

LGTM. We can refactor the logic of inferring problem type and output shape later.

@Innixma (Contributor, Author) commented Mar 17, 2023:

@zhiqiangdon rebased to resolve merge conflict. Please feel free to merge when you are ready.

@zhiqiangdon (Contributor) replied:

> @zhiqiangdon rebased to resolve merge conflict. Please feel free to merge when you are ready.

CI has one unit test error.

@Innixma (Contributor, Author) commented Mar 20, 2023:

> @zhiqiangdon rebased to resolve merge conflict. Please feel free to merge when you are ready.

> CI has one unit test error.

Should be fixed in the latest commit, thanks!

@github-actions

Job PR-3004-b084d3c is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3004/b084d3c/index.html
