
[Feature request] Arbitrary base learner #5802

Open
zachmayer opened this issue Jun 16, 2020 · 7 comments

@zachmayer

It's pretty cool that I can define my own loss function and gradient for xgboost, and then use the linear, tree, or dart base learners to optimize it.

It'd be really cool if I could specify my own base learner, perhaps in the form of an sklearn class with a fit method, a predict method, and support for sample weights.

Being able to use the XGBoost algorithm to fit a wider range of base learners would really open up a whole new world of possibilities.
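
To make the request concrete, here's a rough sketch of the kind of interface I have in mind. The class below is just an illustration (not an existing xgboost or sklearn API): anything with `fit(X, y, sample_weight=...)` and `predict(X)` would qualify.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class WeightedMeanLearner(BaseEstimator, RegressorMixin):
    """Toy base learner: predicts the (weighted) mean of the targets.

    sample_weight support matters because boosting re-weights examples
    between rounds.
    """

    def fit(self, X, y, sample_weight=None):
        y = np.asarray(y, dtype=float)
        if sample_weight is None:
            sample_weight = np.ones_like(y)
        self.mean_ = float(np.average(y, weights=sample_weight))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)
```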

@hcho3
Collaborator

hcho3 commented Jun 16, 2020

@zachmayer Is StackingClassifier / StackingRegressor an option for you? We recently added support for it: #5780
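
For reference, a rough sketch of what that looks like (the dataset and hyperparameters here are just placeholders):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# XGBoost sits alongside other sklearn estimators, with a logistic
# regression meta-learner stacked on top of their predictions.
stack = StackingClassifier(
    estimators=[
        ("xgb", XGBClassifier(n_estimators=200, max_depth=3)),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```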

@hcho3
Collaborator

hcho3 commented Jun 16, 2020

Oops, my bad. When you say "base learner," you mean that you want to fit a boosted ensemble consisting of your custom models?

@zachmayer
Author

zachmayer commented Jun 16, 2020 via email

@tunguz

tunguz commented Jun 17, 2020

Here is a related issue that has just been opened:

rapidsai/cuml#2435

Adding AdaBoost to cuml might be a good stop-gap measure.

@zachmayer
Author

sklearn's AdaBoost already supports arbitrary base learners: sklearn.ensemble.AdaBoostClassifier.
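
Rough sketch (any classifier that accepts sample weights works as the base learner; note the parameter is `estimator` in recent scikit-learn releases and `base_estimator` in older ones):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Any classifier that supports sample weights can be plugged in here.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),  # `base_estimator=` in older sklearn
    n_estimators=100,
)
ada.fit(X, y)
print(ada.score(X, y))
```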

XGBoost is way better than AdaBoost though, and supports a bunch of features AdaBoost doesn't have (rough sketch of all five below):

  1. You can specify an arbitrary loss function in xgboost.
  2. You can specify a gradient for your loss function and use the gradient in your base learner.
  3. You can specify an arbitrary evaluation function in xgboost.
  4. You can do early stopping with xgboost.
  5. You can run xgboost base learners in parallel, to mix "random forest" type learning with "boosting" type learning.
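
A minimal sketch of points 1–5 with the native xgboost API (the data and hyperparameters are placeholders, and newer xgboost releases spell `feval` as `custom_metric`):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X[:, 0] + rng.normal(scale=0.1, size=1000)
dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])


# (1) + (2): custom loss, supplied as its gradient and hessian.
def squared_error_obj(predt, dtrain):
    grad = predt - dtrain.get_label()
    hess = np.ones_like(predt)
    return grad, hess


# (3): custom evaluation metric.
def rmse_eval(predt, dtrain):
    err = predt - dtrain.get_label()
    return "my-rmse", float(np.sqrt(np.mean(err ** 2)))


booster = xgb.train(
    {
        "eta": 0.1,
        "max_depth": 3,
        "num_parallel_tree": 5,        # (5): grow a small "forest" each boosting round
        "disable_default_eval_metric": 1,
    },
    dtrain,
    num_boost_round=500,
    obj=squared_error_obj,
    feval=rmse_eval,
    evals=[(dvalid, "valid")],
    early_stopping_rounds=20,          # (4): early stopping on the custom metric
)
```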

@zachmayer
Author

@tunguz "Run adaboost on a gpu" isn't really what I'm looking for.

"Run adaboost with an arbitrary base learner, arbitrary loss function, arbitrary gradient, arbitrary evaluation, early stopping, and a mix of parallel learners (aka bagging) and boosting" would suit my needs, but that's another way to say "run xgboost with an arbitrary base learner" 😁

@zachmayer
Author

Just a follow-up on this:

  • ngboost supports arbitrary base learners, which solves the problem for me for now (minimal sketch below).
  • There's an interesting new package called GrowNet, which shows some evidence that boosting different weak learners (specifically neural networks) is useful. (There's a paper too.)
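
Roughly what that looks like with ngboost (a sketch based on my understanding of its `Base`/`Dist`/`Score` parameters; the tree settings are just placeholders):

```python
from ngboost import NGBRegressor
from ngboost.distns import Normal
from ngboost.scores import LogScore
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Swap in any sklearn regressor with fit/predict as the base learner.
ngb = NGBRegressor(
    Base=DecisionTreeRegressor(criterion="friedman_mse", max_depth=4),
    Dist=Normal,
    Score=LogScore,
    n_estimators=300,
)
ngb.fit(X, y)
print(ngb.predict(X[:5]))
```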
