Dataframe export #211
Conversation
Codecov Report

```
@@            Coverage Diff            @@
##           master     #211     +/-   ##
=========================================
+ Coverage    97.4%     97.4%    +<.01%
=========================================
  Files          41        42        +1
  Lines        2585      2663       +78
  Branches      496       514       +18
=========================================
+ Hits         2518      2594       +76
  Misses         35        35
- Partials       32        34        +2
```
docs/requirements.txt (Outdated)

```diff
@@ -2,4 +2,5 @@ ipython
 scipy
 numpy > 1.9.0
 scikit-learn >= 0.18
+pandas
```
Right; it could make sense to remove IPython from here, as it is also optional.
@kmike I still don't understand the failure here: https://travis-ci.org/TeamHG-Memex/eli5/jobs/239548322#L508 - it tries to import pandas and fails. pandas is indeed imported at the module level, but the same is true for lightgbm/xgboost/lightning, so I don't understand yet why they don't fail the docs build while pandas does.
On the other hand, don't we want to check the docs for all libraries when doing the Travis docs build? So maybe it makes sense to include all of the optional ones here?
@lopuhin unsupported libraries are mocked out here for docs: https://github.com/TeamHG-Memex/eli5/blob/master/docs/source/conf.py#L38
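For context, a minimal sketch of the usual pattern for mocking optional dependencies in a Sphinx conf.py; the module list and the use of MagicMock here are illustrative assumptions, not necessarily what eli5's conf.py actually does:

```python
# Minimal sketch: replace optional dependencies with mocks so the docs build
# can import eli5 modules even when the real packages are not installed.
import sys
from unittest.mock import MagicMock

# Illustrative list; eli5's actual conf.py may mock a different set of modules.
MOCK_MODULES = ['xgboost', 'lightning', 'pandas']

sys.modules.update({name: MagicMock() for name in MOCK_MODULES})
```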
@kmike aha, thanks! That's what I was missing. Let me mock pandas there too.
It seems ipython is in requirements.txt because the mock didn't work for IPython for some reason, but I'm not sure.
Fixed in e5f4fba

> It seems ipython is in requirements.txt because the mock didn't work for IPython for some reason, but I'm not sure.

Yes, just noticed - likely you already tried to mock it :)
```python
            neg=[],
        )),
    ],
)
```
I think it'd be good to add tests for format_as_dataframe(s) applied to explain_weights / explain_prediction results; the existing tests would still pass even if we changed the DataFrame output format.
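A rough sketch of the kind of test meant here, assuming eli5's public explain_weights and the format_as_dataframe function added in this PR; the asserted column name is an assumption about the export format:

```python
# Sketch: explain a small fitted model and check the exported DataFrame,
# instead of only formatting hand-built Explanation objects.
import numpy as np
from sklearn.linear_model import LogisticRegression

import eli5
from eli5.formatters.as_dataframe import format_as_dataframe


def test_explain_weights_to_dataframe():
    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
    y = np.array([0, 1, 1, 0])
    clf = LogisticRegression().fit(X, y)

    expl = eli5.explain_weights(clf, feature_names=['f0', 'f1'])
    df = format_as_dataframe(expl)

    # 'weight' as a column name is an assumption about the output format.
    assert 'weight' in df.columns
    assert len(df) > 0
```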
Agreed, that would be much more robust. I added most of the tests in b4bc427; going to add CRF checks right in the existing CRF tests - they show a failure during export, by the way :)
And CRF tests done in 83f1cfc, now everything is covered I think.
eli5/formatters/as_dataframe.py (Outdated)

```python
@format_as_dataframe.register(FeatureImportances)
def feature_importances_to_df(feature_importances):
```
What do you think about making such functions private, as they are implementation details of the format_.. functions? Or do you see them as a part of the public API?
Right, we don't want to expose them - fixed in 1e235a2, thanks!
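For illustration, a sketch of the registration pattern from the diff with the per-type converter made private, as discussed; the default dispatch behaviour and the exact columns are assumptions, not eli5's actual implementation:

```python
# Sketch: only format_as_dataframe is public; per-type converters are private.
from functools import singledispatch

import pandas as pd

from eli5.base import FeatureImportances


@singledispatch
def format_as_dataframe(explanation):
    raise TypeError('Cannot export {!r} as a DataFrame'.format(explanation))


@format_as_dataframe.register(FeatureImportances)
def _feature_importances_to_df(feature_importances):
    # Implementation detail: one row per feature weight.
    weights = feature_importances.importances
    return pd.DataFrame({
        'feature': [fw.feature for fw in weights],
        'weight': [fw.weight for fw in weights],
    })
```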
Looks good! There is a gotcha though, which could make DataFrame support less useful / less easy to use: feature weights are filtered out at the explain_... stage, so by default the DataFrame only contains a few values, not all weights. This reduces the usefulness of the DataFrame export format, as there is not a lot one may want to do with e.g. only the top-20 features. So in most cases users would have to bump the limit to a very large value before using format_as_dataframe(s). I wonder if it makes sense to add functions similar to the show_.. functions, but tailored to DataFrames. Without these helpers it'd be good to at least document this gotcha, and maybe provide an example of working with features using the DataFrame format, maybe even a small tutorial. Some recipe ideas:

This all doesn't have to be a part of this pull request, but I think we should document or fix the gotcha with limits before the release.
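To illustrate the gotcha, a sketch under the assumption that explain_weights keeps only the top features by default and that passing top=None disables the limit (as the helpers discussed below do):

```python
# Sketch: by default only the top feature weights survive explain_weights,
# so the exported DataFrame is missing most features of a wide model.
import numpy as np
from sklearn.linear_model import LogisticRegression

import eli5
from eli5.formatters.as_dataframe import format_as_dataframe

rng = np.random.RandomState(0)
X = rng.rand(100, 50)                      # 50 features
y = (X[:, 0] > 0.5).astype(int)
clf = LogisticRegression().fit(X, y)

# With the default limit, only the top features end up in the frame.
df_top = format_as_dataframe(eli5.explain_weights(clf))

# Workaround until dedicated helpers exist: lift the limit explicitly.
df_all = format_as_dataframe(eli5.explain_weights(clf, top=None))
print(len(df_top), len(df_all))            # df_all covers all weights
```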
@kmike yeah, that's a very important point. I had another idea of how to address it, but I'm not sure how viable it is. Here it is:
The major problem with this idea is that we can't do this naively without sacrificing performance: I just tried
I like the idea of these helpers - they will also make the DataFrame export more visible, and they are simpler to implement.
I think it can be merged almost as-is, if we add a warning to the format_as_dataframe(s) methods about the limits gotcha. It won't be the final user-facing API (explain_weights_df(s)?), and there won't be tutorials, but it is already a good improvement if you don't have time to finish it at the moment.
@kmike I don't think I have time to try my suggestion from #211 (comment) at the moment, but I can at least wrap it up on Friday: add the helpers and warnings (it's a blessing that notebooks show them by default).
@lopuhin sounds good, thanks!
I was thinking about warnings in docstrings, but showing real warnings also makes sense. When should we show them? When the default
Thanks for the idea @kmike!
They combine explanation and export to DataFrame and set top to None by default.
Maybe it was due to a "cyclic" import from mypy's point of view, but it's still very strange.
@kmike I added the helpers, notes in the docs about missing features, and warnings (e600f35) that are raised if any features are really missing.
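A rough sketch of what the helper plus warning behaviour might look like; this is not the exact eli5 implementation, and the `remaining` attribute check and the warning text are assumptions:

```python
# Sketch: a helper that combines explanation and DataFrame export, defaults
# top to None, and warns when features were filtered out of the explanation.
import warnings

import eli5
from eli5.formatters.as_dataframe import format_as_dataframe


def explain_weights_df(estimator, **kwargs):
    kwargs.setdefault('top', None)         # keep all feature weights by default
    expl = eli5.explain_weights(estimator, **kwargs)
    _warn_if_features_filtered(expl)
    return format_as_dataframe(expl)


def _warn_if_features_filtered(expl):
    # Hypothetical check for the FeatureImportances case: `remaining` counts
    # features dropped at the explain_... stage.
    fi = expl.feature_importances
    if fi is not None and getattr(fi, 'remaining', 0):
        warnings.warn('{} feature(s) were filtered out of the explanation; '
                      'pass top=None to keep all of them.'.format(fi.remaining))
```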
@lopuhin looks good, thanks! I'll merge it without a warning.
Merged, thanks @lopuhin!
Thanks @kmike!
Related to #196

Add `format_as_dataframe` and `format_as_dataframes`, as discussed here: #196 (comment)

TODO: