Delete scikit-learn ensembler #2819

christopherbunn · 2021-09-20T21:29:00Z

Resolves #2620

codecov · 2021-09-20T21:34:41Z

Codecov Report

Merging #2819 (9c8c206) into main (c817e2b) will decrease coverage by 0.1%.
The diff coverage is 94.6%.

@@           Coverage Diff           @@
##            main   #2819     +/-   ##
=======================================
- Coverage   99.4%   99.3%   -0.0%     
=======================================
  Files        307     302      -5     
  Lines      28670   28256    -414     
=======================================
- Hits       28471   28046    -425     
- Misses       199     210     +11
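
The headline says coverage drops by 0.1% while the table row shows -0.0%, so it can help to recompute the raw percentages from the hits and lines counts. The following is a minimal sketch using only the numbers in the table above; the helper name `coverage_percent` is made up for illustration, and how Codecov rounds the displayed values is not stated here, so the sketch reports unrounded figures only.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


# Figures copied from the Coverage Diff table above.
main_hits, main_misses, main_lines = 28471, 199, 28670
pr_hits, pr_misses, pr_lines = 28046, 210, 28256

main_cov = coverage_percent(main_hits, main_lines)
pr_cov = coverage_percent(pr_hits, pr_lines)
delta = pr_cov - main_cov  # change in percentage points

print(f"main: {main_cov:.3f}%  PR: {pr_cov:.3f}%  delta: {delta:+.3f} points")
```

The unrounded change is roughly five hundredths of a percentage point, which is consistent with both displayed values if they are rounded in different directions.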

| Impacted Files | Coverage Δ | |
|---|---|---|
| evalml/automl/automl_algorithm/automl_algorithm.py | `100.0% <ø> (ø)` | |
| evalml/automl/automl_search.py | `99.9% <ø> (ø)` | |
| evalml/automl/engine/engine_base.py | `100.0% <ø> (ø)` | |
| evalml/pipelines/__init__.py | `100.0% <ø> (ø)` | |
| evalml/pipelines/components/__init__.py | `100.0% <ø> (ø)` | |
| evalml/pipelines/components/ensemble/__init__.py | `100.0% <ø> (ø)` | |
| evalml/pipelines/pipeline_base.py | `98.3% <ø> (-<0.1%)` | ⬇️ |
| ...omponent_tests/test_stacked_ensemble_classifier.py | `100.0% <ø> (ø)` | |
| ...component_tests/test_stacked_ensemble_regressor.py | `100.0% <ø> (ø)` | |
| evalml/tests/pipeline_tests/test_pipeline_utils.py | `99.5% <ø> (-<0.1%)` | ⬇️ |

... and 15 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c817e2b...9c8c206.

angela97lin

LGTM! Nothing blocking, but I left some questions for clarification! 😁 👏

angela97lin · 2021-10-05T19:56:25Z

evalml/automl/engine/engine_base.py

@@ -182,12 +179,6 @@ def train_and_score_pipeline(
    for i, (train, valid) in enumerate(
        automl_config.data_splitter.split(full_X_train, full_y_train)
    ):
-        if isinstance(pipeline.estimator, SklearnStackedEnsembleBase) and i > 0:


angela97lin · 2021-10-05T19:57:28Z

evalml/tests/automl_tests/test_automl.py

@@ -1504,6 +1506,7 @@ def test_describe_pipeline_with_ensembling(
        ensembling=True,
        optimize_thresholds=False,
        error_callback=raise_error_callback,
+        verbose=True,


Is this a necessary change? 👀

Just checked, nope! Will remove.

angela97lin · 2021-10-05T19:58:27Z

evalml/tests/automl_tests/test_automl.py

-        custom_name="Templated Pipeline",
-    )
+    ensemble = _make_stacked_ensemble_pipeline(input_pipelines, ProblemTypes.BINARY)
+    ensemble._custom_name = "Templated Pipeline"


I know this was part of the original code... but do we need this?

Updated naming convention to be in line with the other pipelines.

angela97lin · 2021-10-05T20:03:03Z

evalml/tests/model_understanding_tests/prediction_explanations_tests/test_explainers.py

    X_y_binary,
    X_y_multi,
    X_y_regression,
 ):
    if is_binary(problem_type):
        X, y = X_y_binary
-        pipeline = dummy_stacked_ensemble_binary_estimator
+        pipeline = StackedEnsembleClassifier(random_seed=0)


This is also pre-existing, but I'm confused: it's called `pipeline` but it's an estimator? Does this code work if it's a pipeline with an ensemble estimator? Is that what we're trying to test right now?

Yeah I think the point of this test was to test with a pipeline with an ensemble estimator. I updated the test to place the ensembler estimator in a pipeline.

angela97lin · 2021-10-05T20:05:16Z

evalml/tests/model_understanding_tests/test_permutation_importance.py

 pipelines_that_do_not_support_fast_permutation_importance = [
    PipelineWithDimReduction,
    PipelineWithDFS,
    PipelineWithCustomComponent,
    EnsembleDag,
-    SklearnStackedEnsemblePipeline,


Our new implementation also cannot be used for permutation importance, right? Is EnsembleDag the example of that?

Yep! I updated EnsembleDag to use our StackedEnsemble component instead of the logistic regression component.

angela97lin · 2021-10-05T20:06:20Z

evalml/tests/pipeline_tests/test_pipeline_utils.py

-    else:
-        assert pipeline_score >= comparison_pipeline_score


Why was this deleted?

@angela97lin So for the regression pipeline, the new stacked ensembler actually performs very, very slightly worse than the comparison pipeline. The stacked ensembler pipeline has an R2 of 0.9999357228610684 whereas the comparison pipeline has an R2 of 1.0. Given that it's a synthetic dataset and the stacked ensembler is competing against a perfect fit, I think it should be fine.

Hm. It bothers me that we're deleting this simply because it's no longer applicable 😂 It makes me question whether we should just delete this test entirely: if we don't consider the ensemble "correct" because it performs better than the baseline, I think it's safe to delete this test. If we do, then we should revisit this :d
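
For reference, the strict `>=` assertion fails on the two R2 values quoted above by a margin of about 6e-5. One alternative to dropping the check would be a tolerance-based comparison. The sketch below is only an illustration of that idea, not what this PR does (the commit list shows the ensemble vs. baseline test was removed); the helper `roughly_at_least` and the `1e-3` tolerance are hypothetical choices.

```python
import math


def roughly_at_least(score: float, baseline: float, abs_tol: float = 1e-3) -> bool:
    """True if score >= baseline, or if it falls short by at most abs_tol."""
    return score >= baseline or math.isclose(score, baseline, abs_tol=abs_tol)


# R2 values quoted in the comment above.
ensemble_r2 = 0.9999357228610684
comparison_r2 = 1.0

# The original assertion form: fails against a perfect fit.
strict = ensemble_r2 >= comparison_r2

# Hypothetical tolerant form: passes because the gap is well under abs_tol.
tolerant = roughly_at_least(ensemble_r2, comparison_r2)

print(f"strict: {strict}, tolerant: {tolerant}")
```

Whether a tolerance is appropriate depends on whether beating the comparison pipeline is actually a correctness criterion, which is the open question in this thread.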

angela97lin · 2021-10-05T20:06:44Z

evalml/tests/pipeline_tests/test_pipelines.py

@@ -128,9 +128,9 @@ def test_all_estimators(
        assert len((_all_estimators_used_in_search())) == 9
    else:
        if is_using_conda:
-            n_estimators = 15
+            n_estimators = 14


Why is it only -1?

Hmm, it looks like we don't need to change this after all, since both implementations were excluded as part of `_not_used_in_automl`. Thus, removing the sklearn versions doesn't have an impact.
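
The reasoning here can be sketched with plain sets: if a component is already in the exclusion list, deleting it changes nothing about what remains after the exclusion is applied. Everything below is a hypothetical stand-in (the estimator names, the `used_in_search` helper, and the set contents are invented for illustration); only the idea of an exclusion list like `_not_used_in_automl` comes from the comment above.

```python
# Hypothetical stand-ins; the real lists live in evalml and are not reproduced here.
all_estimators = {
    "Random Forest",
    "Elastic Net",
    "Stacked Ensemble",
    "Sklearn Stacked Ensemble",
}
not_used_in_automl = {"Stacked Ensemble", "Sklearn Stacked Ensemble"}


def used_in_search(estimators: set, excluded: set) -> set:
    """Return the estimators that remain after applying the exclusion list."""
    return estimators - excluded


before = used_in_search(all_estimators, not_used_in_automl)

# Delete the sklearn-based component from both the registry and the exclusion list.
removed = {"Sklearn Stacked Ensemble"}
after = used_in_search(all_estimators - removed, not_used_in_automl - removed)

print(len(before), len(after))
```

Under these assumptions the count of estimators used in search is identical before and after the deletion, which matches the conclusion that `n_estimators` does not need to change.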

christopherbunn changed the base branch from 1930_custom_ensembler to main September 20, 2021 21:29

christopherbunn changed the title ~~Remove Sklearn Ensembler~~ Deleted scikit-learn ensembler Sep 21, 2021

christopherbunn force-pushed the 2620_dep_sklearn_ensembler branch 7 times, most recently from 3c36303 to 9f4b1f1 Compare September 23, 2021 20:16

christopherbunn force-pushed the 2620_dep_sklearn_ensembler branch 3 times, most recently from 7b00cba to 6bf6f8a Compare September 29, 2021 17:05

christopherbunn changed the title ~~Deleted scikit-learn ensembler~~ Delete scikit-learn ensembler Sep 30, 2021

christopherbunn force-pushed the 2620_dep_sklearn_ensembler branch 2 times, most recently from 6b56d09 to e83a71d Compare October 4, 2021 14:46

christopherbunn marked this pull request as ready for review October 5, 2021 14:29

auto-assign bot assigned christopherbunn Oct 5, 2021

christopherbunn requested review from angela97lin and a team October 5, 2021 14:29

angela97lin approved these changes Oct 5, 2021

christopherbunn added 6 commits October 6, 2021 12:15

Initial commit

b8a3aeb

Updated test utils

f469495

Remove sklearn entry from docs

578ad25

Fixed component test

37f7793

Fixup for tests

617afba

Removed extra imports

d4e310d

christopherbunn force-pushed the 2620_dep_sklearn_ensembler branch 3 times, most recently from f24f106 to d4e310d Compare October 6, 2021 16:42

christopherbunn added 6 commits October 6, 2021 12:44

Added back line to release notes

d987c2e

Updated numbers for testing

44281c2

Test fixes

7acce6e

Switched sign for one test

1687076

Removed ensemble vs. baseline test

42732d0

RM extra import

9c8c206

christopherbunn merged commit c540a7e into main Oct 7, 2021

chukarsten mentioned this pull request Oct 14, 2021

Release v0.35.0 #2918

Merged

freddyaboulton deleted the 2620_dep_sklearn_ensembler branch May 13, 2022 15:03
