Background
We built V1 of stacked ensembling at the end of 2020 and added it to automl. And it works! That's great :)
Problem
I don't have exact figures, but the runtime is fairly long at the moment. The current implementation uses sklearn's stacked ensembler, which trains and scores each pipeline in the ensemble from scratch on every CV fold.
Proposal
A quick optimization would be to a) cache the predictions from each model in the ensemble when it is first trained, before ensembling runs, and then b) write a new stacked ensembling implementation that uses those cached predictions.
I think this means we'd have to write our own implementation instead of using sklearn's. That will take time, but on the plus side it could help us avoid a threading-related bug/limitation we've been hitting with that implementation.
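To make the idea concrete, here's a minimal sketch of what "stacking on cached predictions" could look like. This is not the automl code — the base models, data, and meta-learner are all placeholders — it just illustrates that once out-of-fold predictions are cached, the stacking step only needs to fit the small meta-learner, with no refitting of base pipelines:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict

X, y = make_classification(n_samples=200, random_state=0)

# Placeholder base models standing in for automl pipelines
base_models = {
    "rf": RandomForestClassifier(n_estimators=10, random_state=0),
    "lr": LogisticRegression(max_iter=1000),
}

cv = KFold(n_splits=3, shuffle=True, random_state=0)

# a) Cache out-of-fold predictions for each base model once.
# In automl these would be saved as a side effect of the initial
# training pass, rather than computed here.
cached_preds = {
    name: cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for name, model in base_models.items()
}

# b) Fit the meta-learner on the cached predictions alone —
# no base model is retrained during the stacking step.
meta_X = np.column_stack([cached_preds[name] for name in base_models])
meta_model = LogisticRegression().fit(meta_X, y)
```

Using out-of-fold (rather than in-fold) predictions here matters: it's what keeps the meta-learner from being fit on predictions the base models made on their own training data, which is the same safeguard sklearn's `StackingClassifier` provides internally.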
@angela97lin @rpeck