format_as_html(explain_prediction(...)) fails to show the highlighted text #361

Open
tsela opened this issue Feb 6, 2020 · 20 comments

@tsela

tsela commented Feb 6, 2020

Hi,

I'm trying to use eli5 to explain the results of a simple scikit-learn pipeline made of a TfidfVectorizer and a LogisticRegressionCV. In particular, I'm trying to replicate the look of the results of eli5.show_prediction() as shown in https://eli5.readthedocs.io/en/latest/tutorials/sklearn-text.html, but using format_as_html() and explain_prediction() directly, since I'm building a web app rather than working with Jupyter.

The problem is that whatever I try, I only get a weight table as output, and the highlighted text is missing. Even when I set force_weights to False, it still only shows the weight table. I've inspected the output of format_as_html() and I can't find any trace of highlighted text, only the HTML for the table. So it's not a case of styling moving the highlighted text away; it's quite simply missing.

Even checking the source code doesn't help, and I feel like I'm missing something. Is there a reason why I can't get the highlighted text to show up?

@lopuhin
Contributor

lopuhin commented Feb 6, 2020

@tsela I see, that sounds very reasonable. Does eli5.show_prediction() show the text explanations on your pipeline?

@tsela
Author

tsela commented Feb 7, 2020

@lopuhin I just checked, and no, eli5.show_prediction() only shows the weight table as well. Any idea where that comes from? My pipeline is relatively simple; the only complication is that I load the model from a pickled file using joblib, and I use a custom tokeniser in the TfidfVectorizer. Could one of these be the issue?

@lopuhin
Contributor

lopuhin commented Feb 7, 2020

The issue is likely due to the custom tokenizer. Here is the relevant code, which I believe checks the class:

def _get_doc_weighted_spans(doc,
                            vec,
                            feature_weights,  # type: FeatureWeights
                            feature_fn=None   # type: Optional[Callable[[str], str]]
                            ):
    # type: (...) -> Optional[Tuple[FoundFeatures, DocWeightedSpans]]
    if isinstance(vec, InvertableHashingVectorizer):
        vec = vec.vec
    if hasattr(vec, 'get_doc_weighted_spans'):
        return vec.get_doc_weighted_spans(doc, feature_weights, feature_fn)
    if not isinstance(vec, VectorizerMixin):
        return None
    span_analyzer, preprocessed_doc = build_span_analyzer(doc, vec)
    if span_analyzer is None:
        return None
and also
def build_span_analyzer(document, vec):

So one option is to define a get_doc_weighted_spans method on your vectorizer. Sorry, this part is not really documented; you'll have to check the source.

@tsela
Author

tsela commented Feb 7, 2020

Thanks for your help! It looks indeed like the custom tokeniser is the problem. I'm kind of misusing the tokeniser to do all the text preprocessing (so that the pickled model is the only thing I have to send around for whoever will be working on the production frontend), so it's quite understandable that this could cause the problem.

I'll see if I can define a get_doc_weighted_spans() method. My tokeniser is lossy (on purpose), so that might be a challenge, but I'll try and see if it's possible.

Thanks for your help!
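
For a lossy tokeniser like the one described above, one way to keep highlighting possible is to record, for every token that survives preprocessing, the character offsets it came from in the original text. The following is only a minimal sketch of that idea. It does not implement eli5's get_doc_weighted_spans interface (for that, the source has to be consulted, as noted above), and STOPWORDS, tokenize_with_spans and weighted_spans are hypothetical names invented for the illustration.

```python
import re

# Hypothetical stop-word list; a real lossy tokeniser would drop much more.
STOPWORDS = {"the", "a", "an", "is"}


def tokenize_with_spans(doc):
    """Return (token, start, end) triples for the tokens that are kept.

    The token is normalised (lower-cased) and stop words are dropped, but
    start/end always index into the original, unmodified document.
    """
    triples = []
    for match in re.finditer(r"[A-Za-z]+", doc):
        token = match.group(0).lower()
        if token in STOPWORDS:
            continue  # lossy step: this word never becomes a feature
        triples.append((token, match.start(), match.end()))
    return triples


def weighted_spans(doc, weights):
    """Pair each kept token that has a weight with its original span."""
    return [
        (start, end, weights[token])
        for token, start, end in tokenize_with_spans(doc)
        if token in weights
    ]


doc = "The cat sat."
print(tokenize_with_spans(doc))
print(weighted_spans(doc, {"cat": 0.8}))
```

Because the offsets refer to the raw text, whatever the tokeniser discards simply ends up without a span, while the surviving tokens can still be located and coloured.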

@hohl

hohl commented Mar 4, 2020

I'm having the same issue. Tried both eli5.show_prediction() and eli5.explain_weights together with eli5.format_as_html. Both show me the table, but no nicely formatted text with coloured overlays.

But I am not using a custom tokenizer. I even tried a plain TfidfVectorizer() with all parameters left at their default values, and it still didn't work.

I then even tried whether it works with TextExplainer:

from eli5.lime import TextExplainer

te = TextExplainer()
te.fit(samples[0], model.predict_proba)
te.show_prediction()

But that again showed only the table and no highlighted text. Any other ideas what I could try?

Eli5 version is 0.10.1. scikit-learn is 0.22.1. Tried to run on two different machines too: one running Ubuntu and one running macOS.

@hohl

hohl commented Mar 5, 2020

I have now even tried to run one of the sample notebooks in this repo. I had hoped that I was just using the library wrong, but that does not seem to be the case.

The output of the first block with explain_prediction(..., force_weights=False, ...) also does not show me the highlighted text, just the weights table, even though force_weights is set to False in that sample.

I also tried whether it changes anything when I downgrade ELI5 to 0.10 or 0.9.0, but both these versions delivered the same results as the 0.10.1 release.

@hohl

hohl commented Mar 5, 2020

Finally found a configuration that worked: downgrading scikit-learn to 0.21.3 does output the texts. I guess there is some incompatibility between scikit-learn 0.22 and ELI5 0.10.1?

@sobayed

sobayed commented Apr 7, 2020

Same issue here, the highlighted text does not show for show_prediction() with ELI5 0.10.1 and scikit-learn 0.22

@Querela

Querela commented Apr 7, 2020

I'm also on the latest versions, trying to get an explanation for a transformer-based model using just a prediction function, and I'm not getting any highlighted text. I'm following this tutorial: https://eli5.readthedocs.io/en/latest/tutorials/black-box-text-classifiers.html

from eli5.lime import TextExplainer

def predict_proba(docs):
    # here obviously with code ...
    pass

label_list = ["0", "1"]
doc = "My example sentence expressing a strong opinion etc."

# ---

te = TextExplainer(random_state=42)
te.fit(doc, predict_proba)
te.show_prediction(target_names=label_list)

Fishing through the source, it might be some check, like this:

if not isinstance(vec, VectorizerMixin):
(suggested by @lopuhin)
The code above is for sklearn, but passing my te.doc_, te.vec_ and te.explain_prediction().targets[0].feature_weights into _get_doc_weighted_spans fails at the VectorizerMixin check. This may be related, but I could be wrong...

@eloukas

eloukas commented May 8, 2020

Same problem here. I could not get the highlighted text to show, neither in Jupyter nor in other environments.
Solved it with what @hohl suggested.

@bakarep

bakarep commented May 21, 2020

@lopuhin, has the eli5 team looked into this to make sure eli5 is compatible with the latest version of sklearn?
I also tried downgrading sklearn, but then I get other errors such as:
ModuleNotFoundError: No module named 'sklearn.feature_selection._univariate_selection'

Please help with a permanent solution.

@datanizing

VectorizerMixin was renamed to _VectorizerMixin in scikit-learn 0.22. Changing that in eli5's text.py (two occurrences), as @Querela mentioned, makes it work again.
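
A version-tolerant variant of that change is an import fallback rather than a hard-coded name. This is a minimal sketch, assuming the mixin lives in sklearn.feature_extraction.text under one of the two names mentioned in this thread; is_sklearn_text_vectorizer is a hypothetical helper, not part of eli5 or scikit-learn.

```python
# Resolve the vectorizer mixin under whichever name this scikit-learn has.
try:
    # Name reported in this thread for scikit-learn 0.22 and later.
    from sklearn.feature_extraction.text import _VectorizerMixin as VectorizerMixin
except ImportError:
    try:
        # Older releases (for example 0.21.x) only expose the public name.
        from sklearn.feature_extraction.text import VectorizerMixin
    except ImportError:
        # scikit-learn is missing, or neither name exists in this version.
        VectorizerMixin = None


def is_sklearn_text_vectorizer(vec):
    """Hypothetical helper: True if vec derives from the resolved mixin."""
    return VectorizerMixin is not None and isinstance(vec, VectorizerMixin)


print(VectorizerMixin)
print(is_sklearn_text_vectorizer(object()))
```

The private name is tried first so that the check matches the class the 0.22+ vectorizers are reported to use. Relying on a private class is fragile, so this is a stopgap rather than a permanent fix.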

@jonas-nothnagel

Hi everyone, is there a fix now? I downgraded scikit-learn to 0.21.3 and am still not able to see any highlighted text, unfortunately.

@Bougeant

Hi @jonas-nothnagel, it looks like @icfly2 has submitted a PR for this fix. I'm not sure what needs to happen for his work to be merged in.

@lopuhin
Contributor

lopuhin commented Feb 10, 2021

@Bougeant could you please link the PR you mean? Existing sklearn compatibility PRs were merged in eli5-org/eli5#2 and released with v0.11, so if it still does not work with v0.11, then it's something else. And sorry for the confusion with the different repos; I still hope we can get back to this one.

@jonas-nothnagel

jonas-nothnagel commented Feb 10, 2021

Thank you! Being able to explain why our models make their predictions is such an important feature, in my opinion, that I almost wonder why it is not implemented in many more libraries.
For now I have worked around the issue by extracting the top 5, top 5-10 and last 5, last 5-10 feature names and weights from the explain_prediction() output, matching them with the original text and highlighting the words with HTML markup. It is a bit hacky but works as well. I could also add the weights.
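
A workaround of this kind can be sketched roughly as follows. This is not the code used above, only an illustration under stated assumptions: the weights are assumed to have already been collected into a plain dictionary mapping lower-cased single-word feature names to their weights, and highlight_tokens is a hypothetical helper name.

```python
import html
import re


def highlight_tokens(text, weights):
    """Wrap every word that has a weight in a coloured <span>.

    Positive weights are shaded green, negative ones red, and the opacity
    scales with the weight relative to the largest absolute weight.
    """
    max_abs = max((abs(w) for w in weights.values()), default=1.0) or 1.0
    parts = []
    last = 0
    for match in re.finditer(r"\w+", text):
        word = match.group(0)
        weight = weights.get(word.lower())
        # Escape the text between words so raw markup cannot leak through.
        parts.append(html.escape(text[last:match.start()]))
        if weight is None:
            parts.append(html.escape(word))
        else:
            hue = 120 if weight > 0 else 0
            opacity = min(abs(weight) / max_abs, 1.0)
            parts.append(
                '<span style="background-color: hsla({}, 100%, 50%, {:.2f})" '
                'title="{:+.3f}">{}</span>'.format(
                    hue, opacity, weight, html.escape(word)
                )
            )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight_tokens("Good movie, bad <ending>", {"good": 2.0, "bad": -1.0}))
```

The title attribute carries the numeric weight, so it shows up as a tooltip in the browser. Features that are n-grams or that were altered by preprocessing will not match with this simple word lookup.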

@Bougeant

@lopuhin Sorry, I did not actually check whether it was still failing. Awesome that you guys fixed that.

@lopuhin
Contributor

lopuhin commented Feb 10, 2021

Right, there are multiple issues. I think the original issue was about showing explanations for a custom pipeline. Then a new sklearn was released that we didn't support, so we were failing earlier. Now we support the latest sklearn, but the original issue of highlighting for a more custom pipeline remains.

@jonas-nothnagel

jonas-nothnagel commented Feb 10, 2021

I still do not see highlighted text, even without a pipeline, simply using a TfidfVectorizer and, for example, a logistic regression.
Can you share under what circumstances you obtain the highlighted text?

@cbjrobertson

Has any progress been made on this? I'm using sklearn 1.2.0 and eli5 0.13.0 with Python 3.9 and running into this issue. Downgrading sklearn no longer works; it just gives rise to a host of incompatibility errors.
