`format_as_html(explain(prediction(...))` fails to show the highlighted text #361

tsela · 2020-02-06T15:07:56Z

Hi,

I'm trying to use eli5 to explain the results of a simple Scikit-Learn pipeline made of a TfidfVectorizer and a LogisticRegressionCV. In particular, I'm trying to replicate the look of the output of eli5.show_prediction() as shown in https://eli5.readthedocs.io/en/latest/tutorials/sklearn-text.html, but using format_as_html() and explain_prediction() directly, since I'm building a web app rather than working with Jupyter.

The problem is that whatever I try, I only get a weight table as output, and the highlighted text is missing. Even when I set force_weights to False, it still only shows the weight table. I've inspected the output of format_as_html() and I can't find any trace of highlighted text, only the HTML for the table. So it's not a case of styling moving the highlighted text away; it's quite simply missing.

Even checking the source code doesn't help, and I feel like I'm missing something. Is there a reason why I can't get the highlighted text to show up?


lopuhin · 2020-02-06T15:19:03Z

@tsela I see, that sounds very reasonable. Does eli5.show_prediction() show the text explanations on your pipeline?

tsela · 2020-02-07T09:24:26Z

@lopuhin I just checked, and no, eli5.show_prediction() only shows the weight table as well. Any idea where that comes from? My pipeline is relatively simple; the only complications are that I load the model from a pickled file using joblib and that I use a custom tokeniser in the TfidfVectorizer. Could one of these be the issue?

lopuhin · 2020-02-07T09:33:25Z

The issue is likely due to a custom tokenizer, here is the relevant code which checks the class I believe:

eli5/eli5/sklearn/text.py

Lines 53 to 70 in 017c738

    
    def _get_doc_weighted_spans(doc,
                                vec,
                                feature_weights,  # type: FeatureWeights
                                feature_fn=None   # type: Optional[Callable[[str], str]]
                                ):
        # type: (...) -> Optional[Tuple[FoundFeatures, DocWeightedSpans]]
        if isinstance(vec, InvertableHashingVectorizer):
            vec = vec.vec
        if hasattr(vec, 'get_doc_weighted_spans'):
            return vec.get_doc_weighted_spans(doc, feature_weights, feature_fn)
        if not isinstance(vec, VectorizerMixin):
            return None
        span_analyzer, preprocessed_doc = build_span_analyzer(doc, vec)
        if span_analyzer is None:
            return None

and also

eli5/eli5/sklearn/_span_analyzers.py

Line 7 in 017c738

def build_span_analyzer(document, vec):

so one option is to define a get_doc_weighted_spans method on your vectorizer - sorry, this part is not really documented, you'll have to check the source.

tsela · 2020-02-07T10:38:00Z

Thanks for your help! It looks indeed like the custom tokeniser is the problem. I'm kind of misusing the tokeniser to do all the text preprocessing (so that the pickled model is the only thing I have to send around for whoever will be working on the production frontend), so it's quite understandable that this could cause the problem.

I'll see if I can define a get_doc_weighted_spans() method. My tokeniser is lossy (on purpose), so that might be a challenge, but I'll try and see if it's possible.

Thanks for your help!
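The difficulty with a lossy tokeniser is that highlighting needs to know where each token came from in the original document. The sketch below illustrates that idea in plain Python: the tokeniser alters the token but keeps the character offsets of the source text. It is not eli5's get_doc_weighted_spans interface; `TokenSpan` and `tokenize_with_spans` are hypothetical names, and the lowercasing stands in for whatever lossy preprocessing is actually applied.

```python
import re
from typing import List, NamedTuple


class TokenSpan(NamedTuple):
    token: str   # normalised (possibly lossy) token fed to the model
    start: int   # offset of the first character in the original text
    end: int     # offset one past the last character in the original text


def tokenize_with_spans(text: str) -> List[TokenSpan]:
    """Lowercase word tokens while remembering where each one came from."""
    spans = []
    for match in re.finditer(r"\w+", text):
        normalised = match.group().lower()  # assumption: lowercasing is the lossy step
        spans.append(TokenSpan(normalised, match.start(), match.end()))
    return spans


doc = "Hello, WORLD! Hello again."
for span in tokenize_with_spans(doc):
    # The original slice is still recoverable even though the token was altered.
    print(span.token, repr(doc[span.start:span.end]))
```

If the preprocessing drops or merges characters in ways that cannot be traced back to offsets, this approach would need a separate alignment step.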

hohl · 2020-03-04T21:34:37Z

I'm having the same issue. Tried both eli5.show_prediction() and eli5.explain_weights together with eli5.format_as_html. Both show me the table, but no nicely formatted text with coloured overlays.

But I am not using a custom tokenizer. I even tried a plain TfidfVectorizer() with all parameters left at their default values, and it still didn't work.

I then even tried whether it works with TextExplainer:

from eli5.lime import TextExplainer

te = TextExplainer()
te.fit(samples[0], model.predict_proba)
te.show_prediction()

But that again showed only the table and no highlighted text. Any other ideas what I could try?

Eli5 version is 0.10.1. scikit-learn is 0.22.1. Tried to run on two different machines too: one running Ubuntu and one running macOS.

hohl · 2020-03-05T10:06:54Z

I now even tried to run one of the sample notebooks in this repo. I hoped that I am just using the library wrong, but that does not seem to be the case.

The output of the first block with explain_prediction(..., force_weights=False, ...) also does not show me the highlighted text, just the weights table, even though force_weights is set to False in that sample.

I also tried whether it changes anything when I downgrade ELI5 to 0.10 or 0.9.0, but both these versions delivered the same results as the 0.10.1 release.

hohl · 2020-03-05T10:23:53Z

Finally found a configuration that works: downgrading scikit-learn to 0.21.3 does output the highlighted text. I guess there is some incompatibility between scikit-learn 0.22 and ELI5 0.10.1?
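A quick way to check whether an environment matches the combination reported here is to compare the installed versions. The sketch below uses only the standard library. The thresholds (scikit-learn 0.22 or newer together with eli5 older than 0.11) are assumptions taken from the reports in this thread, not an official compatibility matrix, and `known_bad_combination` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.22.1' into (0, 22, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def known_bad_combination(sklearn_version: str, eli5_version: str) -> bool:
    """True for the pairing reported in this thread (assumed thresholds)."""
    return (parse_version(sklearn_version) >= (0, 22)
            and parse_version(eli5_version) < (0, 11))


sk, el = installed_version("scikit-learn"), installed_version("eli5")
if sk and el and known_bad_combination(sk, el):
    print(f"scikit-learn {sk} with eli5 {el}: highlighting may be missing")
```

A negative result does not rule out other causes, since later comments report missing highlights with newer versions as well.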

sobayed · 2020-04-07T06:41:45Z

Same issue here, the highlighted text does not show for show_prediction() with ELI5 0.10.1 and scikit-learn 0.22

Querela · 2020-04-07T16:59:19Z

I'm also on the latest versions, trying to get a transformer-based explanation, but just using a prediction method and not getting any highlighted text. According to this: https://eli5.readthedocs.io/en/latest/tutorials/black-box-text-classifiers.html

from eli5.lime import TextExplainer

def predict_proba(docs):
    # here obviously with code ...
    pass

label_list = ["0", "1"]
doc = "My example sentence expressing a strong opinion etc."

# ---

te = TextExplainer(random_state=42)
te.fit(doc, predict_proba)
te.show_prediction(target_names=label_list)

Fishing through the source, it might be some check, like this:

eli5/eli5/sklearn/text.py

Line 65 in 4839d19

if not isinstance(vec, VectorizerMixin):

(suggested by @lopuhin )
The code above is for sklearn, but stuffing my te.doc_, te.vec_, te.explain_prediction().targets[0].feature_weights into _get_doc_weighted_spans fails on VectorizerMixin. This may be related but I can be wrong...
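One way to investigate a failing isinstance check like this is to list the classes the vectoriser actually inherits from. The sketch below is a generic diagnostic in plain Python. The two classes are stand-ins, not scikit-learn's own; they mimic a mixin whose name gained a leading underscore, which is what a later comment in this thread reports for scikit-learn 0.22. `base_class_names` and `has_base_named` are hypothetical helper names.

```python
from typing import List


def base_class_names(obj: object) -> List[str]:
    """Names of every class in the object's method resolution order."""
    return [cls.__name__ for cls in type(obj).__mro__]


def has_base_named(obj: object, name: str) -> bool:
    """True if some class in the MRO has exactly this name."""
    return name in base_class_names(obj)


# Stand-in classes (NOT scikit-learn's) that mimic a renamed mixin.
class _VectorizerMixin:
    pass


class ToyVectorizer(_VectorizerMixin):
    pass


vec = ToyVectorizer()
print(base_class_names(vec))
print(has_base_named(vec, "VectorizerMixin"))   # the public name is absent
print(has_base_named(vec, "_VectorizerMixin"))  # the private name is present
```

Running `base_class_names(te.vec_)` on a real fitted explainer would show which mixin name the installed scikit-learn uses.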

eloukas · 2020-05-08T10:52:26Z

Same problem here. I could not get the highlighted text to show, neither in Jupyter nor in other environments.
Solved it with what @hohl suggested.

bakarep · 2020-05-21T03:45:02Z

@lopuhin, has the eli5 team looked into making eli5 compatible with the latest version of sklearn?
I also tried downgrading sklearn, but then I get other errors, such as:
ModuleNotFoundError: No module named 'sklearn.feature_selection._univariate_selection'

Please help with a permanent solution.

datanizing · 2020-06-28T19:40:33Z

VectorizerMixin was renamed to _VectorizerMixin in scikit-learn 0.22. Changing that in text.py (two occurrences), as @Querela mentioned, makes it work again.
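Rather than hard-coding one name, code that has to run against several scikit-learn versions could look up whichever name exists. The following is a minimal sketch of that pattern using only the standard library; `first_available` is a hypothetical helper, not part of eli5 or scikit-learn, and the module path is assumed from the names mentioned in this thread.

```python
import importlib
from typing import Any, Iterable, Optional


def first_available(module_name: str, candidates: Iterable[str]) -> Optional[Any]:
    """Return the first attribute from `candidates` found on the module, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # the module is not installed at all
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    return None


# Prefer the newer private name, then fall back to the older public one.
# Assumption: both live in sklearn.feature_extraction.text.
VectorizerMixin = first_available(
    "sklearn.feature_extraction.text",
    ["_VectorizerMixin", "VectorizerMixin"],
)
print(VectorizerMixin)
```

Trying the private name first matters if a release keeps the old public name around only as a deprecated alias that vectorisers no longer inherit from.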

jonas-nothnagel · 2021-02-05T11:48:02Z

Hi everyone, is there a fix now? I downgraded scikit-learn to 0.21.3 and am unfortunately still not able to see any highlighted text.

Bougeant · 2021-02-10T07:44:17Z

Hi @jonas-nothnagel, it looks like @icfly2 has submitted a PR for this fix. I'm not sure what needs to happen for his work to be merged in.

lopuhin · 2021-02-10T07:52:29Z

@Bougeant could you please link which PR that was? Existing sklearn compatibility PRs were merged in eli5-org/eli5#2 and released with v0.11 - so if it still does not work with v0.11, then it's something else. And sorry for the confusion with the different repos - I still hope we can get back to this one.

jonas-nothnagel · 2021-02-10T09:28:49Z

Thank you! It is such an important feature, in my opinion, to be able to explain why our models do predictions that I almost wonder why it is not implemented in many more libraries.
For now I worked around the issue by extracting the top 5, top 5-10 and last 5, last 5-10 feature names and weights from the explain_prediction() function, matching them with the original text and highlighting the words with HTML markup. It is a bit hacky, but it works. I could also add the weights.
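A workaround along these lines can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code used above: the `weights` dictionary is hypothetical example data with lowercase single-word keys, the colour scheme is arbitrary, and `highlight` is a made-up helper name. In practice the word weights would be read from the explanation object.

```python
import html
import re
from typing import Dict


def highlight(text: str, weights: Dict[str, float]) -> str:
    """Wrap each weighted word in a coloured <span>; other text is only escaped."""
    max_abs = max((abs(w) for w in weights.values()), default=0.0) or 1.0
    pieces = []
    last_end = 0
    for match in re.finditer(r"\w+", text):
        pieces.append(html.escape(text[last_end:match.start()]))
        word = match.group()
        weight = weights.get(word.lower())
        if weight is None:
            pieces.append(html.escape(word))
        else:
            hue = 120 if weight > 0 else 0           # green for positive, red otherwise
            alpha = round(abs(weight) / max_abs, 2)  # stronger weight, more opaque
            pieces.append(
                f'<span style="background-color: hsla({hue}, 100%, 40%, {alpha})" '
                f'title="{weight:+.3f}">{html.escape(word)}</span>'
            )
        last_end = match.end()
    pieces.append(html.escape(text[last_end:]))
    return "".join(pieces)


# Hypothetical weights, for illustration only.
example_weights = {"great": 1.5, "boring": -0.75}
print(highlight("A great plot, but a boring <ending>.", example_weights))
```

The sketch only handles unigram features; n-gram or character-level features would need span matching instead of a per-word lookup.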

Bougeant · 2021-02-10T10:30:08Z

@lopuhin Sorry I did not actually check if it was still failing. Awesome that you guys fixed that.

lopuhin · 2021-02-10T10:38:23Z

right, there are multiple issues - I think original issue was about showing explanation for a custom pipeline, then when a new sklearn was released and we didn't support it, we were failing earlier - now we support latest sklearn, but the original issue of highlighting for a more custom pipeline remains.

jonas-nothnagel · 2021-02-10T11:56:57Z

I still do not see highlighted text, even with no pipeline at all, just a TfidfVectorizer and, for example, a logistic regression.
Can you share under what circumstances you obtain the highlighted text?

cbjrobertson · 2022-12-08T19:28:23Z

Has any progress been made on this? I'm using sklearn 1.2.0 and eli5 0.13.0 in Python 3.9 and running into this issue. Downgrading sklearn no longer works; it just gives rise to a host of incompatibility errors.

Bougeant mentioned this issue Sep 5, 2020

formatting html #388

Open
