Fix permutation importance when pipeline has target transformer #2782

Merged: freddyaboulton merged 3 commits into main from 2781-permutation-importance-target-transformer on Sep 15, 2021

Conversation

@freddyaboulton freddyaboulton commented Sep 14, 2021

Pull Request Description

Fixes #2781

This can go in the release in two weeks. Just opening up for review.


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

codecov bot commented Sep 14, 2021

Codecov Report

Merging #2782 (aaaa613) into main (ce3fc7a) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2782     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        298     298             
  Lines      27595   27604      +9     
=======================================
+ Hits       27527   27536      +9     
  Misses        68      68             
Impacted Files                                          Coverage Δ
...alml/model_understanding/permutation_importance.py   100.0% <100.0%> (ø)
...understanding_tests/test_permutation_importance.py   100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@angela97lin angela97lin left a comment

Looks good to me, thank you! I'm curious how you found this bug and determined which permutation importance value was correct 😂

@@ -325,5 +325,6 @@ def _fast_scorer(pipeline, features, X, y, objective):
        preds = pipeline.estimator.predict_proba(features)
    else:
        preds = pipeline.estimator.predict(features)
        preds = pipeline.inverse_transform(preds)
Contributor

If I'm following correctly, we only need to change the fast_scorer because in the slow_scorer we call score --> predict --> predict_in_sample which does inverse_transform, correct?

Contributor Author

Yeah, this is why the slow scorer is the correct one. In the "slow" method we basically treat the whole pipeline as an estimator, so we call pipeline.predict, which in turn calls pipeline.inverse_transform. In the fast method, we optimize by doing the feature engineering only once and getting the predictions from estimator.predict. Since the estimator has no idea it's part of a pipeline, we need to remember to call inverse_transform ourselves.
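
For readers skimming the thread, here is a minimal sketch of the difference described above. It is not evalml code: `ToyEstimator`, `ToyPipeline`, `transform_features`, and the three scorer functions are hypothetical stand-ins, the target transformer is assumed to be a plain log transform, and the permutation step itself is left out so the example only shows why the fast path needs `inverse_transform`.

```python
import math


class ToyEstimator:
    """One-feature least-squares line. It knows nothing about the pipeline."""

    def fit(self, xs, ys):
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, xs):
        return [self.slope * x + self.intercept for x in xs]


class ToyPipeline:
    """Hypothetical pipeline: feature scaling, log target transformer, estimator."""

    def __init__(self):
        self.estimator = ToyEstimator()

    def transform_features(self, X):
        # Stand-in for the feature engineering components.
        return [x / 10.0 for x in X]

    def fit(self, X, y):
        # The estimator is trained on the log-transformed target.
        self.estimator.fit(self.transform_features(X), [math.log(v) for v in y])
        return self

    def inverse_transform(self, preds):
        # Undo the log transform so predictions are on the original scale.
        return [math.exp(p) for p in preds]

    def predict(self, X):
        raw = self.estimator.predict(self.transform_features(X))
        return self.inverse_transform(raw)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def slow_score(pipeline, X, y):
    # "Slow" path: the whole pipeline acts as the estimator.
    return mse(y, pipeline.predict(X))


def fast_score_buggy(pipeline, features, y):
    # "Fast" path before the fix: predictions stay on the log scale.
    return mse(y, pipeline.estimator.predict(features))


def fast_score_fixed(pipeline, features, y):
    # "Fast" path after the fix: map predictions back to the target's scale.
    preds = pipeline.estimator.predict(features)
    return mse(y, pipeline.inverse_transform(preds))


X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(0.5 * x) for x in X]
pipeline = ToyPipeline().fit(X, y)
features = pipeline.transform_features(X)  # feature engineering done once

slow = slow_score(pipeline, X, y)
buggy = fast_score_buggy(pipeline, features, y)
fixed = fast_score_fixed(pipeline, features, y)
print(f"slow={slow:.6f} buggy={buggy:.6f} fixed={fixed:.6f}")
```

In this toy setup the fixed fast scorer agrees with the slow scorer, while the buggy one compares log-scale predictions against the untransformed target and reports a much larger error.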

@@ -343,14 +363,17 @@ def test_fast_permutation_importance_matches_slow_output(
        "region",
        "amount",
    ]:
        if col == "amount" and pipeline_class == PipelineWithTargetTransformer:
Contributor

Interesting way to use a classification dataset for regression 😂

Contributor Author

I can't decide if I like parametrized tests or not. It's nice to be able to test a lot of cases but it's tough to add a case that doesn't line up 100% with the other cases. Maybe it means I didn't parametrize the test well enough originally 😂

@chukarsten chukarsten left a comment

Looks good to me!

@freddyaboulton freddyaboulton force-pushed the 2781-permutation-importance-target-transformer branch from bd61f02 to aaaa613 Compare September 15, 2021 20:38
@freddyaboulton freddyaboulton merged commit 2e6b048 into main Sep 15, 2021
@freddyaboulton freddyaboulton deleted the 2781-permutation-importance-target-transformer branch September 15, 2021 21:10
@chukarsten chukarsten mentioned this pull request Oct 1, 2021

Successfully merging this pull request may close these issues.

Permutation importance not properly calculated when there is a target transformer
3 participants