Delete scikit-learn ensembler #2819

Merged
merged 12 commits into from Oct 7, 2021

christopherbunn
Contributor

Resolves #2620

@christopherbunn christopherbunn changed the base branch from 1930_custom_ensembler to main September 20, 2021 21:29
@codecov

codecov bot commented Sep 20, 2021

Codecov Report

Merging #2819 (9c8c206) into main (c817e2b) will decrease coverage by 0.1%.
The diff coverage is 94.6%.

@@           Coverage Diff           @@
##            main   #2819     +/-   ##
=======================================
- Coverage   99.4%   99.3%   -0.0%     
=======================================
  Files        307     302      -5     
  Lines      28670   28256    -414     
=======================================
- Hits       28471   28046    -425     
- Misses       199     210     +11     
Impacted Files Coverage Δ
evalml/automl/automl_algorithm/automl_algorithm.py 100.0% <ø> (ø)
evalml/automl/automl_search.py 99.9% <ø> (ø)
evalml/automl/engine/engine_base.py 100.0% <ø> (ø)
evalml/pipelines/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/ensemble/__init__.py 100.0% <ø> (ø)
evalml/pipelines/pipeline_base.py 98.3% <ø> (-<0.1%) ⬇️
...omponent_tests/test_stacked_ensemble_classifier.py 100.0% <ø> (ø)
...component_tests/test_stacked_ensemble_regressor.py 100.0% <ø> (ø)
evalml/tests/pipeline_tests/test_pipeline_utils.py 99.5% <ø> (-<0.1%) ⬇️
... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c817e2b...9c8c206.
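The percentages in this report are hits divided by tracked lines. The sketch below recomputes them from the table's counts using only the standard library. It assumes Codecov rounds toward +infinity at one decimal place; that is a guess about this repository's configuration, chosen because it reproduces the 99.4% and 99.3% figures as well as the apparent mismatch between the "-0.0%" table row and the "decrease coverage by 0.1%" summary.

```python
from decimal import Decimal, ROUND_CEILING

# Counts copied from the Codecov table above.
MAIN_HITS, MAIN_LINES = 28471, 28670   # 199 misses
PR_HITS, PR_LINES = 28046, 28256       # 210 misses


def raw_coverage(hits: int, lines: int) -> Decimal:
    """Unrounded coverage: hits / lines, as a percentage."""
    return Decimal(hits) * 100 / Decimal(lines)


def reported_coverage(hits: int, lines: int) -> Decimal:
    """Coverage rounded toward +infinity at one decimal place.

    ASSUMPTION: ceiling rounding is a guess about this repo's Codecov
    settings, picked because it reproduces the figures in the report.
    """
    return raw_coverage(hits, lines).quantize(Decimal("0.1"), rounding=ROUND_CEILING)


main_pct = reported_coverage(MAIN_HITS, MAIN_LINES)   # Decimal('99.4')
pr_pct = reported_coverage(PR_HITS, PR_LINES)         # Decimal('99.3')

# Unrounded change in percentage points (roughly -0.049).
raw_delta = raw_coverage(PR_HITS, PR_LINES) - raw_coverage(MAIN_HITS, MAIN_LINES)

# Ceiling-rounding the signed delta gives -0.0 (the table row), while
# ceiling-rounding its magnitude gives 0.1 (the summary sentence).
table_delta = raw_delta.quantize(Decimal("0.1"), rounding=ROUND_CEILING)
summary_drop = (-raw_delta).quantize(Decimal("0.1"), rounding=ROUND_CEILING)
```

Under that assumption, the real drop is under 0.05 percentage points; the two rounded figures only look inconsistent because one rounds a negative number and the other rounds its magnitude.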

@christopherbunn christopherbunn changed the title from "Remove Sklearn Ensembler" to "Deleted scikit-learn ensembler" Sep 21, 2021
@christopherbunn christopherbunn force-pushed the 2620_dep_sklearn_ensembler branch 7 times, most recently from 3c36303 to 9f4b1f1 September 23, 2021 20:16
@christopherbunn christopherbunn force-pushed the 2620_dep_sklearn_ensembler branch 3 times, most recently from 7b00cba to 6bf6f8a September 29, 2021 17:05
@christopherbunn christopherbunn changed the title from "Deleted scikit-learn ensembler" to "Delete scikit-learn ensembler" Sep 30, 2021
@christopherbunn christopherbunn force-pushed the 2620_dep_sklearn_ensembler branch 2 times, most recently from 6b56d09 to e83a71d October 4, 2021 14:46
@christopherbunn christopherbunn marked this pull request as ready for review October 5, 2021 14:29
@christopherbunn christopherbunn requested review from angela97lin and a team October 5, 2021 14:29
Contributor

@angela97lin angela97lin left a comment

LGTM! Nothing blocking, but left some questions for clarifications! 😁 👏

@@ -182,12 +179,6 @@ def train_and_score_pipeline(
for i, (train, valid) in enumerate(
automl_config.data_splitter.split(full_X_train, full_y_train)
):
if isinstance(pipeline.estimator, SklearnStackedEnsembleBase) and i > 0:
Contributor

👏

@@ -1504,6 +1506,7 @@ def test_describe_pipeline_with_ensembling(
ensembling=True,
optimize_thresholds=False,
error_callback=raise_error_callback,
verbose=True,
Contributor

necessary change? 👀

Contributor Author

Just checked, nope! Will remove.

custom_name="Templated Pipeline",
)
ensemble = _make_stacked_ensemble_pipeline(input_pipelines, ProblemTypes.BINARY)
ensemble._custom_name = "Templated Pipeline"
Contributor

I know this was part of original code... but do we need this?

Contributor Author

Updated naming convention to be in line with the other pipelines.

X_y_binary,
X_y_multi,
X_y_regression,
):
if is_binary(problem_type):
X, y = X_y_binary
pipeline = dummy_stacked_ensemble_binary_estimator
pipeline = StackedEnsembleClassifier(random_seed=0)
Contributor

Also pre-existing, but I'm confused: it's called pipeline but is an estimator? Does this code work if it's a pipeline with an ensemble estimator? Is that what we're trying to test right now?

Contributor Author

Yeah, I think the point of this test was to test a pipeline with an ensemble estimator. I updated the test to place the ensemble estimator in a pipeline.

pipelines_that_do_not_support_fast_permutation_importance = [
PipelineWithDimReduction,
PipelineWithDFS,
PipelineWithCustomComponent,
EnsembleDag,
SklearnStackedEnsemblePipeline,
Contributor

Our new implementation also can't be used with fast permutation importance, right? Is EnsembleDag that example?

Contributor Author

Yep! I updated EnsembleDag to use our StackedEnsemble component instead of the logistic regression component.
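For context on the list above: permutation importance scores a feature by shuffling that feature's column and measuring how much the model's score drops. Pipelines in `pipelines_that_do_not_support_fast_permutation_importance` presumably go through the generic (slow) computation instead of a fast path. Below is a minimal standard-library sketch of that generic idea with a toy model; it is not evalml's implementation, and `permutation_importance`, `neg_mse`, and `toy_predict` are hypothetical helpers.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Predict = Callable[[Matrix], List[float]]
Score = Callable[[Sequence[float], Sequence[float]], float]


def permutation_importance(
    predict: Predict, X: Matrix, y: Sequence[float], score: Score,
    n_repeats: int = 5, seed: int = 0,
) -> List[float]:
    """Mean score drop when each feature column is shuffled (higher = more important)."""
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with only column j replaced by the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            # Every repeat re-runs the full predict(); this repeated work is
            # what a "fast" path would try to avoid.
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(mean(drops))
    return importances


def neg_mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Negated mean squared error, so that higher is better."""
    return -mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def toy_predict(rows: Matrix) -> List[float]:
    """Toy model that only looks at feature 0."""
    return [row[0] for row in rows]


X = [[float(i), float(i % 3)] for i in range(20)]
y = [float(i) for i in range(20)]
importances = permutation_importance(toy_predict, X, y, neg_mse)
```

Because the toy model ignores feature 1, shuffling it never changes the predictions, so its importance is exactly zero, while shuffling feature 0 degrades the score.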

Comment on lines -334 to -335
else:
assert pipeline_score >= comparison_pipeline_score
Contributor

Why was this deleted?

Contributor Author

@angela97lin So for the regression pipeline, the new stacked ensembler actually performs very, very slightly worse than the comparison pipeline. The stacked ensembler pipeline has an R2 of 0.9999357228610684 whereas the comparison pipeline has an R2 of 1.0. Given that it's a synthetic dataset and the stacked ensembler is competing against a perfect fit, I think it should be fine.

Contributor

Hm. It bothers me that we're deleting this simply because it's no longer applicable 😂 It makes me question whether we should just delete this test entirely: if we don't consider the ensemble "correct" based on whether it performs better than the baseline, I think it's safe to delete the test. If we do, then we should revisit this :D
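For reference, the deleted assertion compares R² scores, where R² = 1 - SS_res / SS_tot. Any nonzero residual keeps the score strictly below the 1.0 that a perfect fit earns, so `pipeline_score >= comparison_pipeline_score` can never pass against a perfectly fitting comparison pipeline, however close the ensemble gets. A minimal standard-library sketch with toy numbers (not the test's data); `ABS_TOL` and the tolerance-based check are hypothetical illustrations, not something this PR adopted.

```python
from statistics import mean
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.0])        # exactly 1.0
near_perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.01])  # about 0.99998

# The deleted assertion: fails whenever the comparison is a perfect fit.
strict_ok = near_perfect >= perfect

# Hypothetical tolerance-based variant (ABS_TOL is illustrative only).
ABS_TOL = 1e-3
tolerant_ok = near_perfect >= perfect - ABS_TOL
```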

@@ -128,9 +128,9 @@ def test_all_estimators(
assert len((_all_estimators_used_in_search())) == 9
else:
if is_using_conda:
n_estimators = 15
n_estimators = 14
Contributor

Why is this only -1?

Contributor Author

Hmm, it looks like we don't actually need to change this, since both implementations were excluded as part of _not_used_in_automl. Removing the sklearn versions therefore has no impact.

@christopherbunn christopherbunn force-pushed the 2620_dep_sklearn_ensembler branch 3 times, most recently from f24f106 to d4e310d Compare October 6, 2021 16:42
@christopherbunn christopherbunn merged commit c540a7e into main Oct 7, 2021
@chukarsten chukarsten mentioned this pull request Oct 14, 2021
@freddyaboulton freddyaboulton deleted the 2620_dep_sklearn_ensembler branch May 13, 2022 15:03
Development

Successfully merging this pull request may close these issues.

Deprecate Scikit-Learn based Ensembling Component
2 participants