Human Readable Pipeline Explanations #2861
Conversation
Codecov Report
```diff
@@           Coverage Diff           @@
##            main   #2861     +/-  ##
=======================================
- Coverage   99.8%   99.4%    -0.4%
=======================================
  Files        305     307       +2
  Lines      28373   28670     +297
=======================================
+ Hits       28292   28471     +179
- Misses        81     199     +118
```
Continue to review full report at Codecov.
Left some comments and questions on things I think would help clarify this! Great work on this!
This looks amazing, well done! 👏
I left a few comments for cleanup. I'm also curious about max_features. I see in the docs that there are more than the default (5) features that are considered detrimental?: https://feature-labs-inc-evalml--2861.com.readthedocs.build/en/2861/user_guide/model_understanding.html#Human-Readable-Importance
```python
from evalml.pipelines import BinaryClassificationPipeline

elasticnet_component_graph = [
```
Let's make this a pytest fixture so it doesn't automatically run unless we need it! :)
I'm curious if there are any other pytest fixture pipelines we can use, or if this is necessary?
Made it into a fixture!
There's a Logistic Regression Classifier pytest fixture I tried to replace this with, but there's a woodwork-related bug that prevents the code from running successfully :( so I'll leave this here for now, and we can consider swapping the pipeline out once the bug is fixed!
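A minimal sketch of the fixture change being discussed, assuming the component names below (they are placeholders, not the PR's actual graph): wrapping the list in a `pytest.fixture` means it is only constructed when a test requests it by parameter name.

```python
import pytest

# Hypothetical component graph; the names are illustrative assumptions.
ELASTICNET_COMPONENT_GRAPH = [
    "Imputer",
    "One Hot Encoder",
    "Elastic Net Classifier",
]

@pytest.fixture
def elasticnet_component_graph():
    # Built lazily, only for tests that declare this fixture as an argument;
    # returning a copy keeps tests from mutating shared state.
    return list(ELASTICNET_COMPONENT_GRAPH)
```

A test would then receive it simply by naming the parameter, e.g. `def test_explanation(elasticnet_component_graph): ...`.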
evalml/tests/model_understanding_tests/test_feature_explanations.py (outdated, resolved)
I love this @eccabay . The testing is super thorough, the code is really clean. Really, quite marvellous. I think this exceeds what we were expecting from this issue and will really help bridge the gap between your more casual user and your more advanced user.
I left a few open ended questions and suggestions. Nothing particularly blocking. Take them or leave them. Again, great work.
```python
from evalml.model_understanding import calculate_permutation_importance
from evalml.utils import get_logger, infer_feature_types
```
It feels to me that the supreme convenience might be to turn the current `readable_explanation()` from a stand-alone function into an instance method on a pipeline, so that your workflow goes from:

```python
from evalml.model_understanding.feature_explanations import readable_explanation
...
pl = ...blah...
pl.fit()
pl.predict()
readable_explanation(pl)
```

into

```python
pl = ...blah...
pl.fit()
pl.predict()
pl.explain()
```
I feel like that's the ultimate convenience. Does it fit and make sense with our current API? I dunno. But to me as a relatively novice user compared to all of you, that makes some sense to me as an easy way to get into explanations.
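For what it's worth, the two entry points aren't mutually exclusive. A minimal sketch of the delegation pattern, with hypothetical class and function names standing in for the real evalml ones: the instance method just forwards to the stand-alone function, so both workflows stay available.

```python
# Hypothetical stand-ins; the real function lives in
# evalml.model_understanding.feature_explanations.
def readable_explanation(pipeline) -> str:
    # Stand-alone entry point: works on any fitted pipeline object.
    return f"Explanation for {pipeline.name}"

class Pipeline:
    def __init__(self, name: str):
        self.name = name

    def explain(self) -> str:
        # Instance-method entry point that simply delegates, so adding it
        # later "down the line" is indeed not a big lift.
        return readable_explanation(self)

print(Pipeline("elasticnet").explain())  # → Explanation for elasticnet
```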
Btw, not blocking.
Btw, not blocking. If you like it better as-is, no problemo.
Hm, take it up with @freddyaboulton (https://alteryx.atlassian.net/wiki/spaces/PS/pages/1044611439/Human-readable+Pipeline+Descriptions?focusedCommentId=1051231012#comment-1051231012) 😂
I'm really ambivalent either way!
Bahaha, I have thoughts but no opinions on this, buuut I will say that it's really not a big lift if we decide later down the line that we want one or the other, so this is fine either way :))
Looks good, thank you for addressing everything! I replied to your comment about the circular import / reusing method and think it'd still be a good idea to use the method we already have. Otherwise, just left some non-blocking suggestions/comments. Excited for this to get in!! 😁
Closes #2194. Design doc here