
improved LIME for text data #110

Merged: 60 commits merged into master from the lime-text branch on Dec 30, 2016

Conversation

kmike (Contributor) commented Dec 13, 2016

This is a work-in-progress.

TODO:

  • tutorial for debugging complex text processing pipelines using LIME;
  • cleanup LIME documentation, remove redundancy;
  • TextExplainer should handle target_names better when there are no examples of some class in the generated dataset;
  • fix test coverage

This PR also fixes #102 and #39.
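For context, the LIME idea this PR implements for text can be sketched in miniature: perturb a document by dropping tokens, query the black-box classifier on the perturbed texts, then fit a distance-weighted white-box linear model whose coefficients explain the prediction. A self-contained illustration (pure numpy; `black_box` and all other names here are hypothetical stand-ins, not the eli5 API):

```python
import numpy as np

def black_box(texts):
    # hypothetical opaque classifier: P(positive) is higher when "good" appears
    return np.array([[0.4, 0.6] if "good" in t.split() else [0.8, 0.2]
                     for t in texts])

def lime_weights(doc, predict_proba, n_samples=500, seed=0):
    rng = np.random.RandomState(seed)
    tokens = doc.split()
    masks = rng.randint(0, 2, size=(n_samples, len(tokens)))  # 1 = token kept
    texts = [" ".join(t for t, keep in zip(tokens, row) if keep)
             for row in masks]
    y = predict_proba(texts)[:, 1]        # black-box P(class=1) per sample
    sim = masks.mean(axis=1)              # crude similarity to the original doc
    sw = np.sqrt(sim)                     # sqrt for weighted least squares
    X = np.hstack([masks, np.ones((n_samples, 1))])  # last column: intercept
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return dict(zip(tokens, coef[:-1]))   # per-token contribution weights

weights = lime_weights("this movie is good", black_box)
```

Here `weights["good"]` comes out largest, mirroring how the white-box model in TextExplainer surfaces which tokens drive the black-box prediction.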

codecov-io commented Dec 14, 2016

Current coverage is 97.27% (diff: 98.78%)

Merging #110 into master will decrease coverage by 0.03%

@@             master       #110   diff @@
==========================================
  Files            35         37     +2   
  Lines          1894       2090   +196   
  Methods           0          0          
  Messages          0          0          
  Branches        362        390    +28   
==========================================
+ Hits           1843       2033   +190   
- Misses           24         28     +4   
- Partials         27         29     +2   
Diff Coverage   File Path
 96%   eli5/lime/lime.py
 97%   eli5/lime/_vectorizer.py (new)
 97%   eli5/lime/utils.py
100%   eli5/lime/textutils.py
100%   eli5/sklearn_crfsuite/explain_weights.py
100%   eli5/sklearn/_span_analyzers.py (new)
100%   eli5/utils.py
100%   eli5/sklearn/utils.py
100%   eli5/_graphviz.py
100%   eli5/ipython.py

27 files changed.

Powered by Codecov. Last update 5f4e975...aebd959

@kmike kmike force-pushed the lime-text branch 5 times, most recently from 24e79b4 to b73b393 Compare December 27, 2016 19:06
@kmike kmike added this to the 0.3 milestone Dec 28, 2016
@kmike kmike changed the title [wip] improved LIME for text data improved LIME for text data Dec 29, 2016
* add decision tree example;
* add some docs for sampling;
* other documentation improvements.
kmike (Contributor, Author) commented Dec 29, 2016

//cc @lopuhin I think this is ready. Do you have any comments?

Tutorial doesn't explain all options - most notably, it doesn't use the position_dependent=True flag and doesn't explain the rbf_sigma parameter. I know how they work and what they are intended to do, but I don't know when one should use them in practice, so they are not in the tutorial :)

Tutorial: https://github.com/TeamHG-Memex/eli5/blob/d6786c56f51d8fa485829adcf641967ff8839416/notebooks/TextExplainer.ipynb
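For readers wondering about rbf_sigma: in LIME, perturbed samples are weighted by their similarity to the original document, typically through an RBF kernel whose width the sigma parameter controls. A minimal sketch of that kernel (pure numpy; an illustration of the concept, not necessarily eli5's exact implementation):

```python
import numpy as np

def rbf(distance, sigma=1.0):
    # RBF kernel: sample weight decays smoothly with distance from the
    # original document; a smaller sigma concentrates weight on
    # near-identical perturbations, a larger sigma lets distant samples
    # influence the white-box fit
    return np.exp(-distance ** 2 / (2 * sigma ** 2))

# the original document itself always gets full weight (distance 0)
w_near = rbf(0.2, sigma=0.5)
w_far = rbf(0.9, sigma=0.5)
```

This is why choosing sigma is hard in practice: it trades off locality of the explanation against having enough effectively-weighted samples to fit on.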


If a library is not supported by eli5 directly, or the text processing
pipeline is too complex for eli5, eli5 can still help - it provides an
implementation of LIME (Ribeiro et al., 2016) algorithm which allows to
Review comment (Contributor):

I think it would be nice to link to the paper (https://arxiv.org/abs/1602.04938)

Reply (Contributor, Author):

Yeah, makes sense.

lopuhin (Contributor) commented Dec 30, 2016

Hey @kmike , I just finished reading the notebook, it looks great - I like that most important stuff is at the start, and you show how it can break.

The KL divergence and the score of how well the white-box classifier matches the black-box classifier's predictions seem important for judging whether an explanation should be trusted - does it make sense to include them in the explanation output by default?
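The faithfulness metric discussed here can be sketched concretely: the mean KL divergence compares the black-box and white-box predicted probability distributions over the sampled texts, with lower values meaning the explanation is more trustworthy. A pure-numpy illustration (hypothetical helper, not eli5's exact code):

```python
import numpy as np

def mean_kl_divergence(p_black, p_white, eps=1e-9):
    # mean KL(black-box || white-box) over sampled documents; values near 0
    # mean the white-box approximation reproduces the black-box predictions
    # faithfully, so the explanation derived from it can be trusted more
    p = np.clip(np.asarray(p_black, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(p_white, dtype=float), eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))
```

A perfect approximation gives a divergence of zero; a white-box model that falls back to near-uniform predictions gets penalized.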

lopuhin (Contributor) commented Dec 30, 2016

... does it make sense to include them in explanation output by default?

Hm, but I see it's not really convenient to implement, and it's not obvious that they are needed in the explanation.

The PR looks great! I didn't know about the `# type: (...) ->` syntax, that is much more convenient.
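The syntax being referred to is PEP 484's comment-based annotation form, which keeps code Python 2 compatible while remaining checkable by mypy. A small illustrative example (hypothetical function, not code from the PR):

```python
from typing import Dict, List

def top_tokens(weights, n=2):
    # type: (Dict[str, float], int) -> List[str]
    """Return the n highest-weighted tokens.

    The ``# type:`` comment above annotates the full signature without
    using Python 3 only annotation syntax.
    """
    return sorted(weights, key=weights.get, reverse=True)[:n]
```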

kmike (Contributor, Author) commented Dec 30, 2016

I was also thinking about adding scores to the output by default, but indeed it was not straightforward to implement. Adding a custom field just for scores looks like a bit too much; putting them in the description is not enough because the description is hidden by default.

@kmike kmike merged commit ad4e6bf into master Dec 30, 2016
@kmike kmike deleted the lime-text branch December 30, 2016 10:39
Successfully merging this pull request may close these issues.

LIME: add support for char-based text classifiers
3 participants