Tabular: Fixed crash during calibration in binary stacking #2589

Merged (2 commits) on Jan 3, 2023

Conversation

Innixma
Contributor

@Innixma Innixma commented Dec 20, 2022

Issue #, if available:

Description of changes:
Fixed crash during calibration due to incorrect predict_proba dimensions when problem_type='binary', eval_metric='log_loss', num_bag_folds>1

  • This bug was introduced in the 0.6 release and affects the 0.6 and 0.6.1 releases.
  • Cause: v0.6 made out-of-fold predictions and predict_proba outputs consistent (previously they had different dimensions in binary classification by default), but the calibration code still treated them as inconsistent in binary classification, which led to the error.
  • Workaround prior to this fix: set calibrate=False in predictor.fit(...)
  • Also fixed a bug that caused quantile regression with RF to crash because of a fit parameter removed in sklearn, and added unit tests.
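To illustrate the dimension mismatch described above, here is a minimal NumPy sketch of normalizing binary predictions to one shape before calibration. The helper name `to_two_column_probs` is hypothetical and is not part of AutoGluon; the actual fix in this PR may be implemented differently.

```python
import numpy as np


def to_two_column_probs(y_pred_proba):
    """Return an (n_rows, 2) array for binary predictions.

    Hypothetical helper: accepts either a 1-D array of positive-class
    probabilities or an already 2-D (n_rows, 2) array.
    """
    y_pred_proba = np.asarray(y_pred_proba, dtype=float)
    if y_pred_proba.ndim == 1:
        # Column 0 is the negative class, column 1 the positive class.
        return np.column_stack([1.0 - y_pred_proba, y_pred_proba])
    return y_pred_proba


positive_only = np.array([0.9, 0.2, 0.6])
two_column = to_two_column_probs(positive_only)
print(two_column.shape)  # (3, 2)
```

Code that expects one row per sample and one column per class can then index the second dimension regardless of which form the caller passed in.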

Code to Reproduce

from autogluon.tabular import TabularPredictor, TabularDataset


if __name__ == '__main__':
    path_prefix = 'https://autogluon.s3.amazonaws.com/datasets/AdultIncomeBinaryClassification/'
    path_train = path_prefix + 'train_data.csv'
    path_test = path_prefix + 'test_data.csv'

    label = 'class'
    train_data = TabularDataset(path_train)
    test_data = TabularDataset(path_test)
    sample = 1000  # Number of rows to use to train / infer

    if sample is not None and (sample < len(train_data)):
        train_data = train_data.sample(n=sample, random_state=0).reset_index(drop=True)

    hyperparameters = {
        'GBM': [
            {},
        ]
    }

    predictor = TabularPredictor(label=label, eval_metric='log_loss')
    predictor.fit(
        train_data=train_data,
        hyperparameters=hyperparameters,
        presets='best_quality',
    )
    leaderboard = predictor.leaderboard(test_data)
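For affected releases, the workaround mentioned in the description could be applied to the same call roughly as follows. This is a sketch only: `fit_without_calibration` is a hypothetical wrapper name, and it assumes `predictor` is a `TabularPredictor` built as in the script above.

```python
def fit_without_calibration(predictor, train_data, hyperparameters):
    """Fit with calibration disabled (workaround for 0.6 and 0.6.1)."""
    return predictor.fit(
        train_data=train_data,
        hyperparameters=hyperparameters,
        presets='best_quality',
        calibrate=False,  # skip the calibration step that crashes
    )
```

Disabling calibration avoids the crash, at the cost of skipping the calibration step for the final model.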

Error prior to fix:

Traceback (most recent call last):
  File "/Users/neerick/workspace/virtual/autogluon38/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-68ebe127e745>", line 1, in <module>
    runfile('/Users/neerick/workspace/code/autogluon-scratch/scripts/run_adult_zs.py', wdir='/Users/neerick/workspace/code/autogluon-scratch/scripts')
  File "/Users/neerick/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-1/223.7571.64/PyCharm 2022.3 EAP.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Users/neerick/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-1/223.7571.64/PyCharm 2022.3 EAP.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/neerick/workspace/code/autogluon-scratch/scripts/run_adult_zs.py", line 24, in <module>
    predictor.fit(
  File "/Users/neerick/workspace/code/autogluon/core/src/autogluon/core/utils/decorators.py", line 30, in _call
    return f(*gargs, **gkwargs)
  File "/Users/neerick/workspace/code/autogluon/tabular/src/autogluon/tabular/predictor/predictor.py", line 868, in fit
    self._post_fit(
  File "/Users/neerick/workspace/code/autogluon/tabular/src/autogluon/tabular/predictor/predictor.py", line 921, in _post_fit
    self._trainer.calibrate_model()
  File "/Users/neerick/workspace/code/autogluon/core/src/autogluon/core/trainer/abstract_trainer.py", line 2953, in calibrate_model
    temp_scalar = tune_temperature_scaling(y_val_probs=y_val_probs, y_val=y_val,
  File "/Users/neerick/workspace/code/autogluon/core/src/autogluon/core/calibrate/temperature_scaling.py", line 57, in tune_temperature_scaling
    optimizer.step(temperature_scale_step)
  File "/Users/neerick/workspace/virtual/autogluon38/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/Users/neerick/workspace/virtual/autogluon38/lib/python3.8/site-packages/torch/optim/optimizer.py", line 113, in wrapper
    return func(*args, **kwargs)
  File "/Users/neerick/workspace/virtual/autogluon38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/neerick/workspace/virtual/autogluon38/lib/python3.8/site-packages/torch/optim/lbfgs.py", line 311, in step
    orig_loss = closure()
  File "/Users/neerick/workspace/virtual/autogluon38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/neerick/workspace/code/autogluon/core/src/autogluon/core/calibrate/temperature_scaling.py", line 50, in temperature_scale_step
    temp = temperature_param.unsqueeze(1).expand(logits.size(0), logits.size(1))
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
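The failing line reads the second dimension of its input, which does not exist when the input is 1-D. The NumPy sketch below is an analogue of that situation, not AutoGluon's torch implementation; the function name `temperature_scaled_probs` and the choice to derive logits by taking the log of the probabilities are assumptions made for illustration.

```python
import numpy as np


def temperature_scaled_probs(probs, temperature):
    """Apply temperature scaling to an (n_rows, n_classes) probability array."""
    probs = np.asarray(probs, dtype=float)
    # Reading shape[1] raises IndexError when probs is 1-D,
    # mirroring the logits.size(1) call in the traceback.
    n_rows, n_classes = probs.shape[0], probs.shape[1]
    logits = np.log(np.clip(probs, 1e-12, 1.0))
    scaled = logits / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)


two_dim = np.array([[0.2, 0.8], [0.7, 0.3]])
calibrated = temperature_scaled_probs(two_dim, temperature=2.0)

try:
    temperature_scaled_probs(np.array([0.8, 0.3]), temperature=2.0)
    failed_on_1d = False
except IndexError:
    failed_on_1d = True
```

With 2-D input the function returns rows that still sum to 1; with 1-D input it fails before any scaling happens, which matches the shape of the error reported above.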

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added the bug, urgent, and module: tabular labels Dec 20, 2022
@Innixma Innixma added this to the 0.6.2 Release milestone Dec 20, 2022
@github-actions

github-actions bot commented Jan 3, 2023

Job PR-2589-720ede7 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2589/720ede7/index.html

@Innixma Innixma merged commit 1779d03 into master Jan 3, 2023
@Innixma Innixma deleted the fix_calibrate_binary branch January 18, 2023 18:03