
Fix permutation importance failing when target is categorical #3017

Merged: 5 commits, merged Nov 8, 2021


angela97lin (Contributor)

Closes #3012

angela97lin self-assigned this Nov 7, 2021
codecov bot commented Nov 7, 2021

Codecov Report

Merging #3017 (2c1f99c) into main (79b9d22) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #3017     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        312     312             
  Lines      30137   30143      +6     
=======================================
+ Hits       30041   30047      +6     
  Misses        96      96             
Impacted Files Coverage Δ
...alml/model_understanding/permutation_importance.py 100.0% <100.0%> (ø)
...understanding_tests/test_permutation_importance.py 100.0% <100.0%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 79b9d22...2c1f99c.

jeremyliweishih (Contributor) left a comment

nice!

eccabay (Contributor) left a comment

LGTM, thanks for picking this up! Super useful in enabling human-readable pipeline explanations :)

chukarsten (Contributor) left a comment

Do we need to address the classification aspect that freddy mentioned in the issue?

freddyaboulton (Contributor) left a comment

Looks good to me @angela97lin !

@@ -324,6 +325,7 @@ def _fast_scorer(pipeline, features, X, y, objective):
preds = pipeline.estimator.predict_proba(features)
else:
preds = pipeline.estimator.predict(features)
preds = pipeline.inverse_transform(preds)
if is_regression(pipeline.problem_type):
Contributor, commenting on the lines above:

I think we're OK for classification here. In _fast_permutation_importance we encode the target with the pipeline._encode_targets method, so the estimator's predictions will also be encoded.

Of course, if the pipeline does not have an encoder but has string-valued targets, this would fail, but I would say that's a pipeline definition bug rather than a permutation importance bug.

The fact that our classification objectives only support integer-valued targets, and that a pipeline needs an encoder in order to be scored, may not be super clear though.
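A minimal sketch of the point above in plain Python. `ToyLabelEncoder` and `integer_only_accuracy` are hypothetical stand-ins, not evalml's `_encode_targets` or its real objectives; the sketch only assumes, as described above, that the objective accepts integer-valued targets. Once the target is encoded, the estimator's predictions are in the same integer space, so scoring works; with raw string targets and no encoder, the objective fails.

```python
class ToyLabelEncoder:
    """Hypothetical encoder mapping string class labels to integers (not an evalml API)."""

    def fit(self, labels):
        # Sort the distinct labels so the label -> integer mapping is deterministic.
        self.classes_ = sorted(set(labels))
        self._to_int = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        return [self._to_int[label] for label in labels]


def integer_only_accuracy(y_true, y_pred):
    """Hypothetical objective that, like the classification objectives described
    above, only accepts integer-valued targets."""
    if not all(isinstance(v, int) for v in list(y_true) + list(y_pred)):
        raise ValueError("objective only supports integer-valued targets")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y = ["cat", "dog", "dog", "bird"]
encoder = ToyLabelEncoder().fit(y)
y_encoded = encoder.transform(y)  # bird -> 0, cat -> 1, dog -> 2

# An estimator fit on encoded targets predicts encoded labels, so y and the
# predictions live in the same integer space and the objective can score them.
encoded_preds = [2, 2, 1, 0]  # made-up predictions for illustration
print(integer_only_accuracy(y_encoded, encoded_preds))

# With no encoder in the pipeline, raw string targets reach the objective and it fails.
try:
    integer_only_accuracy(y, ["cat", "dog", "dog", "bird"])
except ValueError as err:
    print(err)
```

Here the encoded scoring call prints 0.5 (two of the four made-up predictions match), while the string-valued call raises, mirroring the "pipeline definition bug" case.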

angela97lin (Author):

Agreed! I wish there were a way to make this clearer. Pipelines require a label encoder not only for scoring but also for fitting, so it still feels consistent, but I wonder if, moving forward, we could make these kinds of error messages clearer.
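One possible shape for a clearer error, as a hypothetical sketch only: `validate_classification_targets` and its `has_label_encoder` flag are illustrative names, not part of evalml. The idea is to check the targets up front, before fitting or scoring, and name the fix in the message instead of letting an integer-only objective fail later with a less specific error.

```python
def validate_classification_targets(y, has_label_encoder):
    """Hypothetical pre-flight check (not an evalml API): fail early with an
    actionable message when string-like targets meet a pipeline with no encoder."""
    # Collect the distinct non-integer labels so the message can show them.
    non_integer = sorted({repr(v) for v in y if not isinstance(v, int)})
    if non_integer and not has_label_encoder:
        raise ValueError(
            "Classification targets must be integer-valued to fit or score this "
            "pipeline, but found non-integer labels "
            + ", ".join(non_integer)
            + ". Add a label encoder to the pipeline."
        )


# Integer targets pass without an encoder; string targets pass only with one.
validate_classification_targets([0, 1, 1], has_label_encoder=False)
validate_classification_targets(["yes", "no"], has_label_encoder=True)

try:
    validate_classification_targets(["yes", "no", "yes"], has_label_encoder=False)
except ValueError as err:
    print(err)
```

Because the same check could run on both the fit and score paths, it would match the point above that pipelines need a label encoder for both.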

angela97lin merged commit 75b07c7 into main Nov 8, 2021
angela97lin deleted the 3012_perm_importance_inv_transform branch November 8, 2021 20:05
chukarsten mentioned this pull request Nov 9, 2021
Issues closed by this pull request: Permutation importance fails on some objectives when target is categorical
5 participants