Fix permutation importance failing when target is categorical #3017
Codecov Report
@@           Coverage Diff           @@
##             main    #3017    +/-   ##
=======================================
+ Coverage    99.7%    99.7%   +0.1%
=======================================
  Files         312      312
  Lines       30137    30143      +6
=======================================
+ Hits        30041    30047      +6
  Misses         96       96
Continue to review full report at Codecov.
nice!
LGTM, thanks for picking this up! Super useful in enabling human-readable pipeline explanations :)
Do we need to address the classification aspect that freddy mentioned in the issue?
Looks good to me @angela97lin !
@@ -324,6 +325,7 @@ def _fast_scorer(pipeline, features, X, y, objective):
preds = pipeline.estimator.predict_proba(features)
else:
preds = pipeline.estimator.predict(features)
preds = pipeline.inverse_transform(preds)
if is_regression(pipeline.problem_type):
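A minimal sketch of why mapping predictions back through an inverse transform matters before scoring, in plain Python. ToyPipeline, its target_map, and toy_fast_scorer are hypothetical stand-ins, not evalml's API. The assumption is that the estimator predicts in an internally encoded target space, so its raw output has to be mapped back to the original target space before it is compared with y.

```python
from typing import Dict, List, Sequence


class ToyPipeline:
    """Hypothetical pipeline whose estimator predicts in an encoded target space."""

    def __init__(self, target_map: Dict[str, int]):
        self.target_map = target_map  # original label -> internal code
        self.inverse_map = {code: label for label, code in target_map.items()}

    def estimator_predict(self, features: Sequence[float]) -> List[int]:
        # Stand-in estimator: predicts code 1 for positive features, else code 0.
        return [1 if f > 0 else 0 for f in features]

    def inverse_transform(self, preds: Sequence[int]) -> List[str]:
        # Map internal codes back to the original labels.
        return [self.inverse_map[p] for p in preds]


def accuracy(y_true: Sequence[str], y_pred: Sequence) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_fast_scorer(pipeline: ToyPipeline, features, y, metric, invert: bool = True) -> float:
    preds = pipeline.estimator_predict(features)
    if invert:
        # Hypothetical analogue of the added line: bring preds back to y's space.
        preds = pipeline.inverse_transform(preds)
    return metric(y, preds)


pipe = ToyPipeline({"no": 0, "yes": 1})
features = [-1.0, 2.0, 3.0, -0.5]
y = ["no", "yes", "yes", "no"]

print(toy_fast_scorer(pipe, features, y, accuracy))                # 1.0
print(toy_fast_scorer(pipe, features, y, accuracy, invert=False))  # 0.0
```

Without the inverse transform, string labels never equal integer codes, so this toy metric silently returns 0.0; a stricter objective would raise instead, which is closer to the failure the PR title describes.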
I think we're OK for classification here. In _fast_permutation_importance we encode the target with the pipeline._encode_targets method, so the estimator predictions will also be encoded.
Of course, if the pipeline has string-valued targets but no encoder, this would fail, but I would say that's a pipeline definition bug rather than a permutation importance bug.
That our classification objectives only support integer-valued targets, and that a pipeline needs an encoder in order to be scored, may not be super clear, though.
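The argument that classification is already safe can be sketched in plain Python. encode_targets and integer_only_accuracy are hypothetical stand-ins (not evalml's _encode_targets or its objectives), assuming an objective that only accepts integer-valued targets. When y is encoded with the same mapping the estimator was trained on, targets and predictions share one integer space and scoring works; without an encoder, string targets reach the objective and it fails.

```python
from typing import Dict, List, Optional, Sequence


def encode_targets(y: Sequence[str], encoder: Optional[Dict[str, int]]) -> List:
    """Hypothetical analogue of encoding y before scoring.

    With an encoder, labels become integer codes; without one, y is returned unchanged.
    """
    if encoder is None:
        return list(y)
    return [encoder[label] for label in y]


def integer_only_accuracy(y_true: Sequence, y_pred: Sequence[int]) -> float:
    """Stand-in for a classification objective that only accepts integer targets."""
    if not all(isinstance(v, int) for v in y_true):
        raise TypeError("objective expects integer-valued targets")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


encoder = {"no": 0, "yes": 1}
y = ["no", "yes", "yes", "no"]
encoded_preds = [0, 1, 0, 0]  # estimator output, already in the encoded space

# Encoded targets and encoded predictions are directly comparable.
print(integer_only_accuracy(encode_targets(y, encoder), encoded_preds))  # 0.75

# Without an encoder, the objective receives strings and refuses them.
try:
    integer_only_accuracy(encode_targets(y, None), encoded_preds)
except TypeError as err:
    print("no encoder:", err)
```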
Agreed! I wish there were a way to make this more obvious. Pipelines require a label encoder not only for scoring but also for fitting, so it still feels consistent, but I wonder if, moving forward, we could make these kinds of error messages clearer.
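One possible shape for a clearer error is an up-front check that names the problem and the fix. validate_classification_targets and its has_label_encoder flag are hypothetical, not an existing evalml function; the sketch only assumes that integer-valued targets are scorable and that anything else needs a label encoder.

```python
from typing import Sequence


def validate_classification_targets(y: Sequence, has_label_encoder: bool) -> None:
    """Hypothetical check that fails early with an actionable message."""
    # Collect a few offending values; bool is excluded even though it subclasses int.
    non_integer = sorted({repr(v) for v in y if not isinstance(v, int) or isinstance(v, bool)})
    if non_integer and not has_label_encoder:
        raise ValueError(
            "Classification targets must be integer-valued, but found "
            f"{', '.join(non_integer[:3])}. Add a label encoder to the pipeline "
            "so targets are encoded before fit and score."
        )


validate_classification_targets([0, 1, 1, 0], has_label_encoder=False)        # passes
validate_classification_targets(["no", "yes"], has_label_encoder=True)        # passes
try:
    validate_classification_targets(["no", "yes"], has_label_encoder=False)
except ValueError as err:
    print(err)
```

Raising at the start of fit or score, rather than deep inside an objective, keeps the message tied to the pipeline definition, which matches the view above that this is a pipeline definition issue.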
Closes #3012