copenlu/awesome-text-interpretability

A repo to keep all resources about interpretability in NLP organised and up to date
Opinion Papers

The Mythos of Model Interpretability

Pathologies of Neural Models Make Interpretations Difficult Model pathologies are due to model over-confidence or second-order sensitivity. The first issue is addressed by training the classifier to be less certain about examples with a reduced number of words. The second problem is interesting and understudied. There are only some indicators of second-order sensitivity, which in essence is common among interpretability techniques and occurs when slightly changing the input drastically changes the heatmap produced by the interpretability technique. It would be interesting to study which interpretability techniques are the more stable ones.
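
A minimal sketch of how this second-order sensitivity could be probed, assuming a toy PyTorch classifier and plain gradient saliency as the interpretability technique; the model, input, and perturbation size are illustrative placeholders, not the paper's setup:

```python
import torch

torch.manual_seed(0)

def gradient_saliency(model, x, target):
    """Per-feature |gradient| of the target logit, i.e. a simple saliency heatmap."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target].backward()
    return x.grad.abs().squeeze(0)

# Toy stand-ins: a real setup would use a text classifier over token embeddings.
model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
x = torch.randn(1, 20)
target = model(x).argmax().item()

saliency = gradient_saliency(model, x, target)
x_perturbed = x + 0.01 * torch.randn_like(x)  # tiny perturbation of the input

# Only compare heatmaps when the prediction itself is unchanged.
if model(x_perturbed).argmax().item() == target:
    saliency_perturbed = gradient_saliency(model, x_perturbed, target)
    cos = torch.nn.functional.cosine_similarity(saliency, saliency_perturbed, dim=0)
    print(f"Saliency cosine similarity under a tiny input perturbation: {cos.item():.3f}")
```

A stable technique would keep this similarity high across many perturbations that leave the prediction, and ideally its confidence, unchanged.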

Explanation Studies in Particular Domains

Social Sciences

  • Explanation in artificial intelligence: Insights from the social sciences Views from psychology and philosophy on explanations. Among many other things, the authors point out that explanations can take different forms, the most useful being contrastive explanations, which come in three types: Why does object a have property P rather than property Q? Why does object a have property P while object b has property Q? Why does object a have property P at time t but property Q at time t′?

Fact Checking

Machine Reading Comprehension / Question Answering

Explainability Techniques

Saliency maps

Generating Rationales

Improving Model Interpretability

Other

Adversarial Attacks as Model Interpretations

Evaluation

  • Sanity Checks for Saliency Maps The authors propose model parameter and label randomization tests to estimate whether interpretability tools are sensitive to the model and the data. They find that Guided Backpropagation and Grad-CAM are invariant to both (a minimal sketch of the parameter randomization test follows this list).

  • Manipulating and Measuring Model Interpretability The authors examine how the number of features and the transparency of the model influence its interpretability. They found that a smaller model was easier for people to simulate. However, showing a simpler model did not help the annotators correct the model's behaviour or identify wrong decisions.

  • Human-grounded Evaluations of Explanation Methods for Text Classification, EMNLP 2019 The authors design three tasks in which humans evaluate the following explanation techniques: LIME, LRP, DeepLIFT, Grad-CAM-Text and Decision Trees (for words and n-grams). They find that LIME is the most class-discriminative approach. Unfortunately, the annotator agreement is quite low in most tasks, and one general improvement would be to present the words and n-grams together with the context they appear in.

  • Analysis Methods in Neural Language Processing: A Survey, TACL 2019

  • Interpretation of Neural Networks is Fragile The authors devise adversarial attacks with different types of perturbations that do not change the prediction or the confidence scores of the model but drastically change the explanation. This, however, does not mean that the components used by the different interpretation techniques did not change. It might be a good idea to measure how the model's internal representations change, not only its confidence, because the model might pick up other words and still be confident (cf. pathologies of neural models). The most robust method turned out to be Integrated Gradients (used in the sketch after this list). The analysis is on images.
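
The parameter-randomization sanity check and the Integrated Gradients method mentioned above can be combined into one small experiment. The sketch below is an assumed, simplified setup (toy PyTorch model, random input, Spearman rank correlation as the similarity measure), not the original papers' code:

```python
import copy
import torch
from scipy.stats import spearmanr

torch.manual_seed(0)

def integrated_gradients(model, x, baseline, target, steps=50):
    """Approximate Integrated Gradients: average gradients along the path from baseline to x."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    interpolated = (baseline + alphas * (x - baseline)).requires_grad_(True)  # (steps, dim)
    model(interpolated)[:, target].sum().backward()
    avg_grads = interpolated.grad.mean(dim=0)
    return ((x - baseline).squeeze(0) * avg_grads).detach()

# Toy classifier standing in for a real model over feature vectors.
model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
x = torch.randn(1, 20)
baseline = torch.zeros(1, 20)
target = model(x).argmax().item()

attr_trained = integrated_gradients(model, x, baseline, target)

# Parameter-randomization sanity check: re-initialize the weights and recompute
# attributions; a method that is sensitive to the model should produce a very
# different map, i.e. a low rank correlation with the trained model's attributions.
random_model = copy.deepcopy(model)
for module in random_model.modules():
    if isinstance(module, torch.nn.Linear):
        module.reset_parameters()
attr_random = integrated_gradients(random_model, x, baseline, target)

rho, _ = spearmanr(attr_trained.numpy(), attr_random.numpy())
print(f"Spearman correlation between trained and randomized attributions: {rho:.3f}")
```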

Annotated Human Rationales and Datasets

On Human Rationales

  • From Language to Language-ish: How Brain-Like is an LSTM's Representation of Nonsensical Language Stimuli?, EMNLP 2020 - The syntactic signatures in Sentence and Jabberwocky LSTM representations are similar and can be predicted from either the Sentence or the Jabberwocky EEG. From these results, the authors infer which LSTM representations encode semantic and/or syntactic information, which they confirm with syntactic and semantic probing tasks. The results show similarities between the way the brain and an LSTM represent stimuli from both the Sentence (within-distribution) and Jabberwocky (out-of-distribution) conditions.
  • Evaluating and Characterizing Human Rationales, EMNLP 2020 - Examines how human rationales fare under automatic rationale metrics: they do not necessarily perform well, which reveals irrelevance and redundancy in the human annotations. The work leads to actionable suggestions for evaluating and characterizing rationales (a sketch of two common rationale metrics follows this list).
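
A minimal sketch of two automatic rationale metrics in the spirit of this line of work, sufficiency and comprehensiveness; `predict_proba`, the token-level masking, and the boolean rationale format are hypothetical placeholders for a real classifier and dataset:

```python
from typing import Callable, List

def mask_tokens(tokens: List[str], keep: List[bool], mask_token: str = "[MASK]") -> List[str]:
    """Replace tokens outside the kept set with a mask token."""
    return [tok if k else mask_token for tok, k in zip(tokens, keep)]

def rationale_metrics(
    predict_proba: Callable[[List[str]], float],  # P(predicted label | tokens)
    tokens: List[str],
    rationale: List[bool],                        # True where a token is part of the rationale
) -> dict:
    p_full = predict_proba(tokens)
    p_rationale_only = predict_proba(mask_tokens(tokens, rationale))
    p_without_rationale = predict_proba(mask_tokens(tokens, [not r for r in rationale]))
    return {
        # Low sufficiency gap: the rationale alone supports the prediction.
        "sufficiency": p_full - p_rationale_only,
        # High comprehensiveness gap: removing the rationale hurts the prediction.
        "comprehensiveness": p_full - p_without_rationale,
    }
```

Under this formulation, a lower sufficiency gap and a higher comprehensiveness gap indicate a rationale that better supports the model's prediction.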

Datasets with Highlights

Datasets with Textual explanations

Demos

AllenNLP Interpret, Demo

Tutorials

Interpretable Machine Learning

Interpretability Tutorial, EMNLP 2020: https://github.com/Eric-Wallace/interpretability-tutorial-emnlp2020/
