Fix permutation importance when pipeline has target transformer #2782

freddyaboulton · 2021-09-14T21:38:47Z

Pull Request Description

This can go in the release in two weeks. Just opening up for review.

codecov · 2021-09-14T21:43:16Z

Codecov Report

Merging #2782 (aaaa613) into main (ce3fc7a) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2782     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        298     298             
  Lines      27595   27604      +9     
=======================================
+ Hits       27527   27536      +9     
  Misses        68      68

Impacted Files	Coverage Δ
...alml/model_understanding/permutation_importance.py	`100.0% <100.0%> (ø)`
...understanding_tests/test_permutation_importance.py	`100.0% <100.0%> (ø)`

angela97lin

Looks good to me, thank you! I'm curious how you found this bug and determined which permutation importance value was correct 😂

angela97lin · 2021-09-15T14:52:57Z

evalml/model_understanding/permutation_importance.py

@@ -325,5 +325,6 @@ def _fast_scorer(pipeline, features, X, y, objective):
        preds = pipeline.estimator.predict_proba(features)
    else:
        preds = pipeline.estimator.predict(features)
+    preds = pipeline.inverse_transform(preds)


If I'm following correctly, we only need to change the fast_scorer because in the slow_scorer we call score --> predict --> predict_in_sample which does inverse_transform, correct?

Yes, and this is why the slow scorer's value is the correct one. In the "slow" method we basically treat the whole pipeline as an estimator: we call pipeline.predict, which calls pipeline.inverse_transform. In the fast method, we optimize by doing the feature engineering only once and getting the predictions from estimator.predict. Since the estimator has no idea it's part of a pipeline, we need to remember to call inverse_transform ourselves.
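To make the difference concrete, here is a minimal sketch of the two scoring paths. It is not evalml code: ToyLogPipeline, estimator_predict, slow_score, fast_score_buggy and fast_score_fixed are hypothetical names invented for illustration, and the log target transform and mean absolute error are assumptions chosen to keep the example small.

```python
import math


class ToyLogPipeline:
    """Hypothetical stand-in for a pipeline with a log target transformer."""

    def fit(self, X, y):
        # The estimator is trained on log(y), so it predicts in log space.
        log_y = [math.log(v) for v in y]
        # Toy "estimator": least-squares line through the origin on one feature.
        num = sum(x * t for x, t in zip(X, log_y))
        den = sum(x * x for x in X)
        self.slope = num / den
        return self

    def estimator_predict(self, X):
        # Predictions in the transformed (log) target space.
        return [self.slope * x for x in X]

    def inverse_transform(self, preds):
        # Map predictions back to the original target space.
        return [math.exp(p) for p in preds]

    def predict(self, X):
        # Whole-pipeline prediction: estimator output, then inverse transform.
        return self.inverse_transform(self.estimator_predict(X))


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def slow_score(pipeline, X, y):
    # "Slow" path: treat the whole pipeline as the estimator.
    return mae(y, pipeline.predict(X))


def fast_score_buggy(pipeline, X, y):
    # "Fast" path before the fix: estimator output is scored in log space.
    return mae(y, pipeline.estimator_predict(X))


def fast_score_fixed(pipeline, X, y):
    # "Fast" path after the fix: undo the target transform before scoring.
    preds = pipeline.inverse_transform(pipeline.estimator_predict(X))
    return mae(y, preds)


X = [1.0, 2.0, 3.0, 4.0]
y = [math.exp(x) for x in X]
pipeline = ToyLogPipeline().fit(X, y)

print("slow:", slow_score(pipeline, X, y))
print("fast (buggy):", fast_score_buggy(pipeline, X, y))
print("fast (fixed):", fast_score_fixed(pipeline, X, y))
```

In this toy setup the buggy fast score compares log-space predictions against original-space targets, so it disagrees with the slow score, while the fixed fast score matches it. Permutation importance is built from such scores, so the same mismatch would carry over to the importance values.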

angela97lin · 2021-09-15T14:54:58Z

evalml/tests/model_understanding_tests/test_permutation_importance.py

@@ -343,14 +363,17 @@ def test_fast_permutation_importance_matches_slow_output(
        "region",
        "amount",
    ]:
+        if col == "amount" and pipeline_class == PipelineWithTargetTransformer:


Interesting way to use a classification dataset for regression 😂

I can't decide if I like parametrized tests or not. It's nice to be able to test a lot of cases but it's tough to add a case that doesn't line up 100% with the other cases. Maybe it means I didn't parametrize the test well enough originally 😂

chukarsten

Looks good to me!

freddyaboulton force-pushed the 2781-permutation-importance-target-transformer branch from 7904a0f to bd61f02 Compare September 14, 2021 22:15

freddyaboulton marked this pull request as ready for review September 14, 2021 22:53

auto-assign bot assigned freddyaboulton Sep 14, 2021

freddyaboulton requested review from angela97lin, bchen1116, chukarsten, christopherbunn, dsherry, eccabay, jeremyliweishih and ParthivNaresh and removed request for angela97lin and bchen1116 September 14, 2021 22:53

angela97lin approved these changes Sep 15, 2021

chukarsten approved these changes Sep 15, 2021

freddyaboulton added 3 commits September 15, 2021 16:37

Call inverse_transform

6ad2967

Add to release notes

091d9dd

Move to right section of release notes

aaaa613

freddyaboulton force-pushed the 2781-permutation-importance-target-transformer branch from bd61f02 to aaaa613 Compare September 15, 2021 20:38

freddyaboulton merged commit 2e6b048 into main Sep 15, 2021

freddyaboulton deleted the 2781-permutation-importance-target-transformer branch September 15, 2021 21:10

chukarsten mentioned this pull request Oct 1, 2021

Release v0.34.0 #2864

Merged
