-
Notifications
You must be signed in to change notification settings - Fork 91
Delete scikit-learn ensembler #2819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2819 +/- ##
=======================================
- Coverage 99.4% 99.3% -0.0%
=======================================
Files 307 302 -5
Lines 28670 28256 -414
=======================================
- Hits 28471 28046 -425
- Misses 199 210 +11
Continue to review full report at Codecov.
|
3c36303 to
9f4b1f1
Compare
7b00cba to
6bf6f8a
Compare
6b56d09 to
e83a71d
Compare
angela97lin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Nothing blocking, but left some questions for clarifications! 😁 👏
| for i, (train, valid) in enumerate( | ||
| automl_config.data_splitter.split(full_X_train, full_y_train) | ||
| ): | ||
| if isinstance(pipeline.estimator, SklearnStackedEnsembleBase) and i > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
| ensembling=True, | ||
| optimize_thresholds=False, | ||
| error_callback=raise_error_callback, | ||
| verbose=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
necessary change? 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checked, nope! Will remove.
| custom_name="Templated Pipeline", | ||
| ) | ||
| ensemble = _make_stacked_ensemble_pipeline(input_pipelines, ProblemTypes.BINARY) | ||
| ensemble._custom_name = "Templated Pipeline" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this was part of original code... but do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated naming convention to be in line with the other pipelines.
| if is_binary(problem_type): | ||
| X, y = X_y_binary | ||
| pipeline = dummy_stacked_ensemble_binary_estimator | ||
| pipeline = StackedEnsembleClassifier(random_seed=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also pre-existing but i'm confused: it's called pipeline but is an estimator? Does this code work if its a pipeline with an ensemble estimator? Is that what we're trying to test right now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think the point of this test was to test with a pipeline with an ensemble estimator. I updated the test to place the ensembler estimator in a pipeline.
| PipelineWithDFS, | ||
| PipelineWithCustomComponent, | ||
| EnsembleDag, | ||
| SklearnStackedEnsemblePipeline, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our new impl also cannot be used for permutation importance right? EnsembleDag is that example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! I updated EnsembleDag to use our StackedEnsemble component instead of the logistic regression component.
| else: | ||
| assert pipeline_score >= comparison_pipeline_score |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin So for the regression pipeline, the new stacked ensembler actually performs very, very slightly worse than the comparison pipeline. The stacked ensembler pipeline has an R2 of 0.9999357228610684 whereas the comparison pipeline has an R2 of 1.0. Given that it's a synthetic dataset and the stacked ensembler is competing against a perfect fit, I think it should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. It bothers me that we delete this simply because it's no longer applicable 😂 makes me question whether or not we should just delete this entirely: if we don't consider the ensemble as "correct" because it performs better than the baseline, I think it's safe to just delete this test. If we do, then we should revisit this :d
| else: | ||
| if is_using_conda: | ||
| n_estimators = 15 | ||
| n_estimators = 14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is only -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it looks like we actually don't need to change this actually since both implementations were excluded as part of _not_used_in_automl. Thus, removing the sklearn versions actually doesn't have an impact.
f24f106 to
d4e310d
Compare
Resolves #2620