Add enable_categorical=True support to XGBoost #3286

Merged: 2 commits, Jun 7, 2023

@Innixma Innixma commented Jun 6, 2023

Issue #, if available:
Resolves #2429

Description of changes:

  • Added support for XGBoost's native handling of categorical features (new as of XGBoost 1.6) via enable_categorical=True.
  • Not currently enabled by default; will benchmark before deciding whether to enable it by default.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added this to the 0.8 Release milestone Jun 6, 2023
@Innixma Innixma added the enhancement and module: tabular labels Jun 6, 2023
@Innixma Innixma requested a review from gradientsky June 6, 2023 23:11
if is_train:
    self._ohe_generator.fit(X)
if self._ohe:
    if self._ohe_generator is None:
Contributor

If the model is re-trained (e.g. on the full data), wouldn't this reuse the generator? I think it's better to always create a new one if is_train and self._ohe.

Contributor Author

No, re-training creates a new class instance, so this does not cause issues.
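
The exchange above can be illustrated with a dependency-free sketch. ToyOheGenerator and ToyModel are hypothetical stand-ins, not AutoGluon classes; the sketch only assumes, as the reply states, that re-training creates a new model instance.

```python
class ToyOheGenerator:
    """Hypothetical stand-in for a one-hot-encoding feature generator."""

    def __init__(self):
        self.categories = None

    def fit(self, values):
        # Remember the distinct categories seen during training.
        self.categories = sorted(set(values))

    def transform(self, values):
        # Emit one indicator column per known category.
        return [[int(v == c) for c in self.categories] for v in values]


class ToyModel:
    """Hypothetical model wrapper mirroring the pattern under review."""

    def __init__(self, ohe=True):
        self._ohe = ohe
        self._ohe_generator = None

    def preprocess(self, values, is_train=False):
        if not self._ohe:
            return values
        if self._ohe_generator is None:
            # Created lazily, once per model instance.
            self._ohe_generator = ToyOheGenerator()
        if is_train:
            self._ohe_generator.fit(values)
        return self._ohe_generator.transform(values)


first = ToyModel()
first_out = first.preprocess(["a", "b"], is_train=True)

# Re-training (e.g. on the full data) is modeled as a brand-new instance,
# so it starts with no generator and fits its own.
refit = ToyModel()
refit_out = refit.preprocess(["a", "b", "c"], is_train=True)
```

Under that assumption the lazily created generator is never shared between the original and the re-trained model. The reviewer's concern would apply only if the same instance were fit twice.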

    self._ohe = False
else:
    """One-hot-encode categorical features"""
    self._ohe = True

Contributor

nit:

self._ohe = not enable_categorical

Contributor Author

Good point, but I will keep it as is, since the inline docstrings describe what True and False mean.
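
As a quick check of the nit, the following sketch compares the two forms. The function names are hypothetical, and the `if enable_categorical:` condition is an assumption inferred from the suggested one-liner, since the diff excerpt does not show it.

```python
def ohe_flag_branching(enable_categorical: bool) -> bool:
    """Explicit if/else form, leaving room for a comment on each branch."""
    if enable_categorical:
        # Pass categorical features to the model natively; skip one-hot encoding.
        return False
    else:
        # One-hot-encode categorical features.
        return True


def ohe_flag_negation(enable_categorical: bool) -> bool:
    """One-line form suggested in the review."""
    return not enable_categorical


# Evaluate both forms for each possible flag value.
results = {
    flag: (ohe_flag_branching(flag), ohe_flag_negation(flag))
    for flag in (True, False)
}
```

Both forms give the same value for either flag, so the choice between them is only a matter of readability.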

Contributor Author

Innixma commented Jun 7, 2023

Note: CI failures are from AutoMM, unrelated to this PR. All tabular tests and CI builds succeeded.

@Innixma Innixma merged commit 0328cbe into autogluon:master Jun 7, 2023
15 checks passed

github-actions bot commented Jun 7, 2023

Job PR-3286-80d04e3 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3286/80d04e3/index.html

Development

Successfully merging this pull request may close these issues.

Test XGBoost Categorical Support
2 participants