copenlu/awesome-text-interpretability

A repo to keep all resources about interpretability in NLP organised and up to date
Opinion Papers

The Mythos of Model Interpretability

Pathologies of Neural Models Make Interpretations Difficult Model pathologies are due to model over-confidence or second-order sensitivity. The first issue is addressed by training the classifier to be less certain about examples with a reduced number of words. The second problem is interesting and understudied. There are only some indicators of second-order sensitivity, which in essence is common among interpretability techniques and occurs when slightly changing the input drastically changes the heatmap produced by the interpretability technique. It would be interesting to study which interpretability techniques are the more stable ones.
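
A minimal sketch of how this second-order sensitivity could be probed, assuming a toy PyTorch classifier and plain gradient saliency as the interpretability technique; the model, input, and perturbation size are illustrative placeholders, not the paper's setup:

```python
import torch

torch.manual_seed(0)

def gradient_saliency(model, x, target):
    """Per-feature |gradient| of the target logit, i.e. a simple saliency heatmap."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target].backward()
    return x.grad.abs().squeeze(0)

# Toy stand-ins: a real setup would use a text classifier over token embeddings.
model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
x = torch.randn(1, 20)
target = model(x).argmax().item()

saliency = gradient_saliency(model, x, target)
x_perturbed = x + 0.01 * torch.randn_like(x)  # tiny perturbation of the input

# Only compare heatmaps when the prediction itself is unchanged.
if model(x_perturbed).argmax().item() == target:
    saliency_perturbed = gradient_saliency(model, x_perturbed, target)
    cos = torch.nn.functional.cosine_similarity(saliency, saliency_perturbed, dim=0)
    print(f"Saliency cosine similarity under a tiny input perturbation: {cos.item():.3f}")
```

A stable technique would keep this similarity high across many perturbations that leave the prediction, and ideally its confidence, unchanged.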

Explanation Studies in Particular Domains

Social Sciences

  • Explanation in artificial intelligence: Insights from the social sciences Views from psychology and philosophy on explanations. Among many other things, the authors point out that explanations can take different forms, the most useful being contrastive explanations, which come in three types: Why does object a have property P rather than property Q? Why does object a have property P while object b has property Q? Why does object a have property P at time t but property Q at time t′?

Fact Checking

Machine Reading Comprehension / Question Answering

Explainability Techniques

Saliency maps

Generating Rationales

Improving Model Interpretability

Other

Adversarial Attacks as Model Interpretations

Evaluation

  • Sanity Checks for Saliency Maps The authors propose model parameter and label randomization tests to estimate whether interpretability tools are sensitive to the model and the data. They find that Guided Backpropagation and Grad-CAM are invariant to both (a minimal sketch of the parameter randomization test follows this list).

  • Manipulating and Measuring Model Interpretability The authors examine how the number of features and the transparency of the model influence its interpretability. They found that a smaller model was easier for people to simulate. However, showing a simpler model did not help the annotators correct the model's behaviour or identify wrong decisions.

  • Human-grounded Evaluations of Explanation Methods for Text Classification, EMNLP 2019 The authors design three tasks in which humans evaluate the following explanation techniques: LIME, LRP, DeepLIFT, Grad-CAM-Text and Decision Trees (for words and n-grams). They find that LIME is the most class-discriminative approach. Unfortunately, the annotator agreement is quite low in most tasks, and one general improvement would be to present the words and n-grams together with the context they appear in.

  • Analysis Methods in Neural Language Processing: A Survey, TACL 2019

  • Interpretation of Neural Networks is Fragile The authors devise adversarial attacks with different types of perturbations that do not change the prediction or the confidence scores of the model but drastically change the explanation. This, however, does not mean that the components used by the different interpretation techniques did not change. It might be a good idea to measure how the model's internal representations change, not only its confidence, because the model might pick up other words and still be confident (cf. pathologies of neural models). The most robust method turned out to be Integrated Gradients (used in the sketch after this list). The analysis is on images.
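
The parameter-randomization sanity check and the Integrated Gradients method mentioned above can be combined into one small experiment. The sketch below is an assumed, simplified setup (toy PyTorch model, random input, Spearman rank correlation as the similarity measure), not the original papers' code:

```python
import copy
import torch
from scipy.stats import spearmanr

torch.manual_seed(0)

def integrated_gradients(model, x, baseline, target, steps=50):
    """Approximate Integrated Gradients: average gradients along the path from baseline to x."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    interpolated = (baseline + alphas * (x - baseline)).requires_grad_(True)  # (steps, dim)
    model(interpolated)[:, target].sum().backward()
    avg_grads = interpolated.grad.mean(dim=0)
    return ((x - baseline).squeeze(0) * avg_grads).detach()

# Toy classifier standing in for a real model over feature vectors.
model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
x = torch.randn(1, 20)
baseline = torch.zeros(1, 20)
target = model(x).argmax().item()

attr_trained = integrated_gradients(model, x, baseline, target)

# Parameter-randomization sanity check: re-initialize the weights and recompute
# attributions; a method that is sensitive to the model should produce a very
# different map, i.e. a low rank correlation with the trained model's attributions.
random_model = copy.deepcopy(model)
for module in random_model.modules():
    if isinstance(module, torch.nn.Linear):
        module.reset_parameters()
attr_random = integrated_gradients(random_model, x, baseline, target)

rho, _ = spearmanr(attr_trained.numpy(), attr_random.numpy())
print(f"Spearman correlation between trained and randomized attributions: {rho:.3f}")
```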

Annotated Human Rationales and Datasets

On Human Rationales

  • From Language to Language-ish: How Brain-Like is an LSTM's Representation of Nonsensical Language Stimuli?, EMNLP 2020 - The syntactic signatures in Sentence and Jabberwocky LSTM representations are similar and can be predicted from either the Sentence or the Jabberwocky EEG. From these results, the authors infer which LSTM representations encode semantic and/or syntactic information, which they confirm with syntactic and semantic probing tasks. The results show similarities between the way the brain and an LSTM represent stimuli from both the Sentence (within-distribution) and Jabberwocky (out-of-distribution) conditions.
  • Evaluating and Characterizing Human Rationales, EMNLP 2020 - Examines how human rationales fare under automatic rationale metrics: they do not necessarily perform well, which reveals irrelevance and redundancy in the human annotations. The work leads to actionable suggestions for evaluating and characterizing rationales (a sketch of two common rationale metrics follows this list).
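
A minimal sketch of two automatic rationale metrics in the spirit of this line of work, sufficiency and comprehensiveness; `predict_proba`, the token-level masking, and the boolean rationale format are hypothetical placeholders for a real classifier and dataset:

```python
from typing import Callable, List

def mask_tokens(tokens: List[str], keep: List[bool], mask_token: str = "[MASK]") -> List[str]:
    """Replace tokens outside the kept set with a mask token."""
    return [tok if k else mask_token for tok, k in zip(tokens, keep)]

def rationale_metrics(
    predict_proba: Callable[[List[str]], float],  # P(predicted label | tokens)
    tokens: List[str],
    rationale: List[bool],                        # True where a token is part of the rationale
) -> dict:
    p_full = predict_proba(tokens)
    p_rationale_only = predict_proba(mask_tokens(tokens, rationale))
    p_without_rationale = predict_proba(mask_tokens(tokens, [not r for r in rationale]))
    return {
        # Low sufficiency gap: the rationale alone supports the prediction.
        "sufficiency": p_full - p_rationale_only,
        # High comprehensiveness gap: removing the rationale hurts the prediction.
        "comprehensiveness": p_full - p_without_rationale,
    }
```

Under this formulation, a lower sufficiency gap and a higher comprehensiveness gap indicate a rationale that better supports the model's prediction.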

Datasets with Highlights

Datasets with Textual explanations

Demos

AllenNLP Interpret, Demo

Tutorials

Interpretable Machine Learning

Interpretability Tutorial, EMNLP 2020: https://github.com/Eric-Wallace/interpretability-tutorial-emnlp2020/
