Background
We built V1 of stacked ensembling at the end of 2020 and added it to automl. And it works! That's great :)
Problem
I don't have exact figures, but the runtime is fairly long at the moment. The current implementation uses sklearn's stacked ensembler, which trains and scores each pipeline in the ensemble from scratch on every CV fold.
Proposal
A quick optimization would be to a) cache the predictions from each model in the ensemble when it is first trained, before ensembling runs, and then b) write a new stacked ensembling implementation that uses those cached predictions.
I think this means we'd have to write our own implementation instead of using sklearn's. That will take time, but on the plus side it could help us avoid a threading-related bug/limitation we've been hitting with that implementation.
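To make the idea concrete, here's a minimal sketch of what "stacking on cached predictions" could look like. This is not the automl code — the base models, data, and meta-learner are all placeholders — it just illustrates that once out-of-fold predictions are cached, the stacking step only needs to fit the small meta-learner, with no refitting of base pipelines:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict

X, y = make_classification(n_samples=200, random_state=0)

# Placeholder base models standing in for automl pipelines
base_models = {
    "rf": RandomForestClassifier(n_estimators=10, random_state=0),
    "lr": LogisticRegression(max_iter=1000),
}

cv = KFold(n_splits=3, shuffle=True, random_state=0)

# a) Cache out-of-fold predictions for each base model once.
# In automl these would be saved as a side effect of the initial
# training pass, rather than computed here.
cached_preds = {
    name: cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for name, model in base_models.items()
}

# b) Fit the meta-learner on the cached predictions alone —
# no base model is retrained during the stacking step.
meta_X = np.column_stack([cached_preds[name] for name in base_models])
meta_model = LogisticRegression().fit(meta_X, y)
```

Using out-of-fold (rather than in-fold) predictions here matters: it's what keeps the meta-learner from being fit on predictions the base models made on their own training data, which is the same safeguard sklearn's `StackingClassifier` provides internally.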
@angela97lin @rpeck