Fix permutation importance when pipeline has target transformer #2782
Codecov Report
@@           Coverage Diff           @@
##            main   #2782   +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%
=======================================
  Files        298     298
  Lines      27595   27604     +9
=======================================
+ Hits       27527   27536     +9
  Misses        68      68
7904a0f to bd61f02
Looks good to me, thank you! I'm curious how you found this bug and determined which permutation importance value was correct 😂
@@ -325,5 +325,6 @@ def _fast_scorer(pipeline, features, X, y, objective):
        preds = pipeline.estimator.predict_proba(features)
    else:
        preds = pipeline.estimator.predict(features)
    preds = pipeline.inverse_transform(preds)
If I'm following correctly, we only need to change the fast_scorer because in the slow_scorer we call score --> predict --> predict_in_sample which does inverse_transform, correct?
Yeah, this is why the slow score is the correct one. In the "slow" method we basically treat the whole pipeline as an estimator, so we call pipeline.predict, which does pipeline.inverse_transform. In the fast method, we optimize by doing the feature engineering only once and getting the predictions from estimator.predict. Since the estimator has no idea it's part of a pipeline, we need to remember to call inverse_transform ourselves.
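To make the difference concrete, here is a minimal sketch of the two prediction paths. It does not use the project's real classes: LogTargetTransformer, MeanEstimator, and ToyPipeline are hypothetical stand-ins, assumed only to mirror the predict / inverse_transform relationship described in the comment above.

```python
import math


class LogTargetTransformer:
    """Hypothetical target transformer: the estimator is trained on log(y)."""

    def transform(self, y):
        return [math.log(v) for v in y]

    def inverse_transform(self, y_t):
        return [math.exp(v) for v in y_t]


class MeanEstimator:
    """Toy estimator: always predicts the mean of the target it was fit on."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ToyPipeline:
    """Hypothetical pipeline: target transformer followed by an estimator."""

    def __init__(self):
        self.target_transformer = LogTargetTransformer()
        self.estimator = MeanEstimator()

    def fit(self, X, y):
        # The estimator only ever sees the transformed target.
        self.estimator.fit(X, self.target_transformer.transform(y))
        return self

    def inverse_transform(self, preds):
        return self.target_transformer.inverse_transform(preds)

    def predict(self, X):
        # "Slow" path: predictions are mapped back to the original scale.
        return self.inverse_transform(self.estimator.predict(X))


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


X = [[0], [1], [2]]
y = [1.0, 10.0, 100.0]
pipeline = ToyPipeline().fit(X, y)

# Slow path: treat the whole pipeline as the estimator.
slow_preds = pipeline.predict(X)

# Fast path without the fix: predictions stay on the log scale.
buggy_fast_preds = pipeline.estimator.predict(X)

# Fast path with the fix: remember to undo the target transformation.
fixed_fast_preds = pipeline.inverse_transform(pipeline.estimator.predict(X))

slow_score = mean_absolute_error(y, slow_preds)
buggy_fast_score = mean_absolute_error(y, buggy_fast_preds)
fixed_fast_score = mean_absolute_error(y, fixed_fast_preds)
```

In this toy setup the unfixed fast path scores log-scale predictions against the original-scale target, so its score disagrees with the slow path; once inverse_transform is applied, the two paths agree.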
@@ -343,14 +363,17 @@ def test_fast_permutation_importance_matches_slow_output(
        "region",
        "amount",
    ]:
        if col == "amount" and pipeline_class == PipelineWithTargetTransformer:
Interesting way to use a classification dataset for regression 😂
I can't decide if I like parametrized tests or not. It's nice to be able to test a lot of cases but it's tough to add a case that doesn't line up 100% with the other cases. Maybe it means I didn't parametrize the test well enough originally 😂
Looks good to me!
bd61f02 to aaaa613
Pull Request Description
Fixes #2781
This can go in the release in two weeks. Just opening up for review.
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.