
CV fold for ensembler after ensembling_indices split #2144

Closed
angela97lin opened this issue Apr 15, 2021 · 2 comments
Labels: enhancement (An improvement to an existing feature.), spike (To generate additional issues and kick off a sprint.)

Comments

angela97lin (Contributor) commented Apr 15, 2021
Currently in AutoML, if we want to train an ensemble, we create an ensembling split:

X_train, y_train = X.iloc[automl_config.ensembling_indices], y.iloc[automl_config.ensembling_indices]

This prevents overfitting by ensuring the ensemble is not trained on the same data the metalearners are trained on.

Then, in train_and_score_pipeline, we split the ensembling-indices data again with our data splitter and train/validate on a single fold:

if pipeline.model_family == ModelFamily.ENSEMBLE and i > 0:

Is this necessary? The ensemble already performs cross-validation internally. For small datasets, this means we're scoring on roughly 1/3 × 0.2 (the ensembling-indices fraction) of the data. For the happiness dataset with 128 rows (#2093), the mean_cv_score is calculated using just 8 rows. 😬 Perhaps we can remove these lines of code and just train the ensemble on the full ensembling_indices.
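To make the data-shrinkage concrete, here is a minimal sketch of the two-stage split described above. This is not evalml's actual code: the 20% ensembling fraction, the 3-fold splitter, and all variable names are assumptions chosen to mirror the issue's arithmetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame({"feature": rng.normal(size=128)})  # e.g. a 128-row dataset
y = pd.Series(rng.integers(0, 2, size=128))

# Step 1: hold out ~20% of the rows as the ensembling split
# (analogous to automl_config.ensembling_indices).
_, ensembling_indices = train_test_split(X.index, test_size=0.2, random_state=0)
X_train, y_train = X.iloc[ensembling_indices], y.iloc[ensembling_indices]

# Step 2: the data splitter then CV-splits the ensembling data,
# and the ensemble is trained/validated on a single fold.
splitter = KFold(n_splits=3)
train_idx, valid_idx = next(iter(splitter.split(X_train)))

# The validation fold is ~1/3 of the ~20% ensembling split:
print(len(X), len(X_train), len(valid_idx))
```

Under these assumptions, 128 rows shrink to 26 ensembling rows and a validation fold of only 9 rows, which is the same order of magnitude as the 8-row score reported in the issue.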

@dsherry @bchen1116 @rpeck FYI

@angela97lin angela97lin added the enhancement label Apr 15, 2021
dsherry commented Apr 20, 2021

Marking this as blocked on #2093. It's possible the fix for #2093 includes fixing the code described here!

@dsherry dsherry added the spike label Apr 20, 2021
angela97lin (Contributor, author) commented:

Closed by #2260

@angela97lin angela97lin self-assigned this May 26, 2021