Speed up stacked ensembler in automl #1688

Closed
dsherry opened this issue Jan 12, 2021 · 2 comments
Labels
enhancement: An improvement to an existing feature.
performance: Issues tracking performance improvements.

Comments

@dsherry
Contributor

dsherry commented Jan 12, 2021

Background
We built V1 of stacked ensembling at the end of 2020 and added it to automl. And it works! That's great :)

Problem
I don't have exact figures, but the runtime is fairly long at the moment. The current implementation uses sklearn's stacked ensembler, which trains and scores each pipeline in the ensemble from scratch on each CV fold.

Proposal
A quick optimization would be to a) cache the predictions from each of the models in the ensemble when they are first trained, before ensembling runs, and then b) write a new stacked ensembling implementation which uses those cached predictions.

I think this would mean we have to write our own implementation instead of using the sklearn one. That will take time, but on the plus side it could actually help us avoid a threading-related bug/limitation we've been having with the sklearn implementation.
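
To make steps a) and b) concrete, here is a rough sketch of the idea in plain Python. It is not the evalml or sklearn API: the toy models, `cache_out_of_fold_predictions`, `fit_meta_weights`, and `blend` are all hypothetical names made up for illustration, and the meta-learner is a deliberately simple stand-in (weights proportional to inverse out-of-fold MSE) rather than a real final estimator. The point is only that the ensembling step reads cached predictions and never refits a base model.

```python
from statistics import mean


class MeanRegressor:
    """Toy base model: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SlopeRegressor:
    """Toy base model: one-feature least squares through the origin."""

    def fit(self, X, y):
        num = sum(row[0] * target for row, target in zip(X, y))
        den = sum(row[0] ** 2 for row in X)
        self.slope_ = num / den if den else 0.0
        return self

    def predict(self, X):
        return [self.slope_ * row[0] for row in X]


def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs using interleaved folds."""
    folds = [list(range(k, n_samples, n_folds)) for k in range(n_folds)]
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


def cache_out_of_fold_predictions(model_factories, X, y, n_folds=3):
    """Step a): fit each base model once per fold, keep its held-out predictions."""
    cache = {name: [None] * len(X) for name in model_factories}
    for train_idx, test_idx in kfold_indices(len(X), n_folds):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_test = [X[i] for i in test_idx]
        for name, factory in model_factories.items():
            preds = factory().fit(X_train, y_train).predict(X_test)
            for i, pred in zip(test_idx, preds):
                cache[name][i] = pred
    return cache


def fit_meta_weights(cache, y, eps=1e-12):
    """Step b): fit the meta-learner from cached predictions only.

    Stand-in meta-learner (an assumption, not sklearn's behavior):
    weights proportional to 1 / out-of-fold MSE of each base model.
    """
    inverse_mse = {
        name: 1.0 / (mean([(p - t) ** 2 for p, t in zip(preds, y)]) + eps)
        for name, preds in cache.items()
    }
    total = sum(inverse_mse.values())
    return {name: value / total for name, value in inverse_mse.items()}


def blend(weights, base_predictions):
    """Combine per-model predictions (dict of name -> list) using the meta weights."""
    n_rows = len(next(iter(base_predictions.values())))
    return [
        sum(weights[name] * base_predictions[name][i] for name in weights)
        for i in range(n_rows)
    ]


# Tiny synthetic dataset (made up for the sketch): y = 2 * x.
X = [[float(i)] for i in range(1, 7)]
y = [2.0 * row[0] for row in X]
factories = {"mean": MeanRegressor, "slope": SlopeRegressor}

cache = cache_out_of_fold_predictions(factories, X, y, n_folds=3)
weights = fit_meta_weights(cache, y)
ensemble_preds = blend(weights, cache)
```

In automl the cache would presumably be filled as a side effect of the normal pipeline evaluation, so the ensembling step would add only the cost of fitting the meta-learner.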

@angela97lin @rpeck

@dsherry added the enhancement and performance labels Jan 12, 2021
@dsherry
Contributor Author

dsherry commented May 4, 2021

I think our fix for #2093 will close this issue, but we'll see.

@dsherry
Contributor Author

dsherry commented Oct 7, 2021

Duplicate of #2835, closing

@dsherry dsherry closed this as completed Oct 7, 2021