# Detecting Annotation Artifacts with Local Explanations

- Add a definition of data artifact

- Introduce a controlled example from Section 6.1 of https://arxiv.org/abs/2107.00323

- Reiterate the methodology for artifact discovery from Section 4 above

## Task and Dataset

We will demonstrate how to detect annotation artifacts in the context of **binary sentiment classification** of movie reviews, i.e., classifying a given movie review as positive or negative. One commonly used dataset for this task is the [IMDB dataset](http://ai.stanford.edu/~amaas/data/sentiment/). A useful library for accessing NLP datasets is [`datasets` by Hugging Face](https://huggingface.co/docs/datasets/) 🤗. We can load the IMDB dataset with `datasets` as follows:

In [1]:
from datasets import load_dataset

In [2]:
!jupyter nbextension enable --py widgetsnbextension --sys-prefix

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK


In [3]:
dataset = load_dataset("imdb")

Reusing dataset imdb (/Users/anam/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1)



This loads a `DatasetDict` object which you can index into to view an example:

In [4]:
dataset["train"][0]

{'text': 'I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far be

You can select an example randomly like this:

In [5]:
dataset["train"].shuffle().select(range(1))[0]

{'text': "IN LOVING MEMORY OF DAVID TOMLINSON (1917-2000)<br /><br />When I watched this movie for the first time I was 4 years old and I got fascinated by this story of witches in the 2nd World War. The scene, which impressed me the most, was the fight between the Nazi soldiers and the medieval army. It was exceptional to see this army without a body walk to fight the astonished singing their march. This movie is fantastic, from the trip to Portobello Road (which became to me the most fantastic place of London) to the journey to Naboomboo. Angela Lansbury and David Tomlinson are really a fantastic couple. She is always great, it seems the good aunt of a family and David with his always astonished face is her great co-protagonist. we'll miss him a lot.",
 'label': 1}

A `label` of `0` stands for `negative`, and `1` stands for `positive`.

## Annotation Artifact

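It has been reported that neural models can rely almost entirely on the numerical rating at the end of a review (for example, "10/10") instead of reading and understanding the semantics of the review. Such a rating is an annotation artifact: it predicts the label well, but a model that depends on it has not learned to judge sentiment from the text itself.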
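
One way to make this claim checkable is to measure how often reviews contain an explicit rating and how well that rating alone predicts the label. The sketch below is a minimal illustration, not the procedure from the cited work: the regular expression, the `extract_rating` and `rating_baseline_accuracy` helpers, the threshold of 5, and the toy reviews are all assumptions made for this example.

```python
import re

# Assumed pattern: ratings written as "<number>/10" or "<number> out of 10".
RATING_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:/|out of)\s*10\b", re.IGNORECASE)


def extract_rating(text):
    """Return the last "<n>/10"-style rating in `text` as a float, or None."""
    matches = RATING_RE.findall(text)
    return float(matches[-1]) if matches else None


def rating_baseline_accuracy(examples, threshold=5.0):
    """Accuracy of predicting positive (1) iff the rating exceeds `threshold`.

    `examples` is an iterable of dicts with "text" and "label" keys, the same
    layout as the IMDB examples. Reviews without a rating are skipped.
    Returns (accuracy, number_of_rated_reviews).
    """
    correct, rated = 0, 0
    for example in examples:
        rating = extract_rating(example["text"])
        if rating is None:
            continue
        rated += 1
        predicted = 1 if rating > threshold else 0
        correct += int(predicted == example["label"])
    accuracy = correct / rated if rated else 0.0
    return accuracy, rated


# Toy reviews written for this sketch (not taken from IMDB).
toy_reviews = [
    {"text": "Wonderful cast and a clever script. 9/10", "label": 1},
    {"text": "Dull and far too long. I give it 2 out of 10.", "label": 0},
    {"text": "No rating here, but I enjoyed it.", "label": 1},
]
print(rating_baseline_accuracy(toy_reviews))  # (1.0, 2)
```

On the real data, the same function could be called on `dataset["train"]`. If a rating-only rule reaches high accuracy on a sizable share of reviews, the rating is a shortcut the model could exploit.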

## Model

In [13]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import pipeline

model_name = "aychang/roberta-base-imdb"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Reuse the model and tokenizer loaded above instead of loading them a second time.
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

In [14]:
results = nlp(["I didn't really like it because it was so terrible.", "I love how easy it is to watch and get good results."])
print(results)

[{'label': 'neg', 'score': 0.9984696507453918}, {'label': 'pos', 'score': 0.9991361498832703}]
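
The pipeline reports string labels (`neg`, `pos`), whereas the dataset uses integer labels (`0`, `1`). To compare predictions against gold labels, the two need to be aligned. The following is a minimal sketch: the `LABEL_TO_ID` mapping and the `to_label_ids` helper are names introduced here, and the mapping assumes that `neg` corresponds to `0` and `pos` to `1`, as the outputs above suggest.

```python
# Assumed mapping between the pipeline's string labels and the dataset's ids.
LABEL_TO_ID = {"neg": 0, "pos": 1}


def to_label_ids(predictions):
    """Convert pipeline outputs (dicts with a "label" key) to integer labels."""
    return [LABEL_TO_ID[prediction["label"]] for prediction in predictions]


# Predictions copied from the pipeline output shown above.
predictions = [
    {"label": "neg", "score": 0.9984696507453918},
    {"label": "pos", "score": 0.9991361498832703},
]
print(to_label_ids(predictions))  # [0, 1]
```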


## Gradient-Based Highlights

The example below uses AllenNLP's `SimpleGradient` interpreter with a publicly available sentiment predictor trained on the Stanford Sentiment Treebank, rather than the RoBERTa model loaded above.

In [24]:
from allennlp.interpret.saliency_interpreters import SimpleGradient
from allennlp.predictors import Predictor

archive = (
    "https://storage.googleapis.com/allennlp-public-models/"
    "basic_stanford_sentiment_treebank-2020.06.09.tar.gz"
)
predictor = Predictor.from_path(archive)
interpreter = SimpleGradient(predictor)

2022-02-25 18:36:51,548 - ERROR - allennlp.common.plugins - Plugin allennlp_models could not be loaded: No module named 'transformers.tokenization_bert'
2022-02-25 18:36:51,665 - INFO - cached_path - cache of https://storage.googleapis.com/allennlp-public-models/basic_stanford_sentiment_treebank-2020.06.09.tar.gz is up-to-date
2022-02-25 18:36:51,666 - INFO - allennlp.models.archival - loading archive file https://storage.googleapis.com/allennlp-public-models/basic_stanford_sentiment_treebank-2020.06.09.tar.gz from cache at /Users/anam/.allennlp/cache/a6cc14fc8a3970ecd7dd29cfdb352b0481f4b253eddd52e2b2a92e2a1ad0ca1b.933ed3c54ce1300985e2c0c124ee818dde430c718deca4e72a9a42ff857a4bdf
2022-02-25 18:36:51,667 - INFO - allennlp.models.archival - extracting archive file /Users/anam/.allennlp/cache/a6cc14fc8a3970ecd7dd29cfdb352b0481f4b253eddd52e2b2a92e2a1ad0ca1b.933ed3c54ce1300985e2c0c124ee818dde430c718deca4e72a9a42ff857a4bdf to temp dir /var/folders/qr/8__6lqs525vbb3xk4c52jhxc0000gp/T/tmpua2mkd

In [26]:
from allennlp.data.tokenizers.spacy_tokenizer import SpacyTokenizer

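inputs = {"sentence": "this movie is 10 / 10."}
# Compute a gradient-based saliency score for each input token.
interpretation = interpreter.saliency_interpret_from_json(inputs)

# Re-tokenize the sentence to pair tokens with scores; this assumes the
# predictor tokenizes the sentence the same way.
tokenized_sentence = SpacyTokenizer().tokenize(inputs["sentence"])

sentence_attribution = zip(tokenized_sentence, interpretation["instance_1"]["grad_input_1"])

for word, grad in sentence_attribution:
    print(word, grad)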

this 0.004374315073110228
movie 0.01883447832886506
is 0.040131830169589024
10 0.33293188191427076
/ 0.3059516655881175
10 0.2592315336969585
. 0.03854428921039927
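
The scores above can be summarized by asking how much of the total attribution falls on the rating tokens. The sketch below reuses the printed values; the `rating_token_share` helper and the choice to treat digit tokens and `/` as the rating are assumptions made for illustration.

```python
# Token/saliency pairs copied from the output above.
attributions = [
    ("this", 0.004374315073110228),
    ("movie", 0.01883447832886506),
    ("is", 0.040131830169589024),
    ("10", 0.33293188191427076),
    ("/", 0.3059516655881175),
    ("10", 0.2592315336969585),
    (".", 0.03854428921039927),
]


def rating_token_share(pairs):
    """Fraction of total saliency assigned to digit tokens and "/"."""
    total = sum(score for _, score in pairs)
    rating = sum(score for token, score in pairs if token.isdigit() or token == "/")
    return rating / total


print(f"{rating_token_share(attributions):.3f}")  # 0.898
```

For this sentence and this predictor, roughly 90% of the saliency mass lies on `10 / 10`, which is consistent with the model leaning on the rating rather than on the surrounding words.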


### Option 1 

https://github.com/successar/instance_attributions_NLP

Use AllenNLP to first train a classifier, then compute different instance attribution methods on it.

The scripts that compute the different instance attributions are in this directory: https://github.com/successar/instance_attributions_NLP/tree/master/influence_info/influencers

### Option 2

https://guide.allennlp.org/interpret#5

## Combining Word and Instance Attribution 

Apply feature attribution on top of instance attribution.

https://github.com/successar/instance_attributions_NLP/tree/master/influence_info/influencers

## Contrastive Edits

https://github.com/allenai/mice
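
MiCE (linked above) generates contrastive edits with a trained editor model. As a much simpler, hand-written stand-in (not MiCE itself), one could edit only the rating in a review and check whether the prediction flips. The `swap_rating` helper and its regular expression are hypothetical and only handle ratings of the form `<n>/10`.

```python
import re

# Assumed pattern for ratings such as "10/10" or "3 / 10".
RATING_SLASH_RE = re.compile(r"\b\d+(?:\.\d+)?(\s*/\s*)10\b")


def swap_rating(text, new_rating):
    """Replace every "<n>/10" rating in `text` with `new_rating`, keeping spacing."""
    return RATING_SLASH_RE.sub(lambda m: f"{new_rating}{m.group(1)}10", text)


original = "this movie is 10 / 10."
edited = swap_rating(original, 1)
print(edited)  # this movie is 1 / 10.
```

Both versions could then be passed to a classifier, for example `nlp([original, edited])` with the pipeline defined earlier. A label flip with every other word unchanged would point to the rating as the decisive feature.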