# Transformers Interpret NER Example

### Setup

In [1]:
from transformers import AutoModelForTokenClassification, AutoTokenizer



Load the NER model and its tokenizer:

In [2]:
model_path = 'dslim/bert-base-NER'
model = AutoModelForTokenClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

We can explore the possible entity tags that this model classifies each token into:

In [3]:
model.config.id2label

{0: 'O',
 1: 'B-MISC',
 2: 'I-MISC',
 3: 'B-PER',
 4: 'I-PER',
 5: 'B-ORG',
 6: 'I-ORG',
 7: 'B-LOC',
 8: 'I-LOC'}
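
The tags follow the BIO scheme: `B-` marks the first token of an entity, `I-` a continuation token, and `O` a token outside any entity. As a minimal sketch, the label map printed above can be grouped by entity type. The dictionary is copied from the output above so the snippet runs without loading the model, and `entity_types` is a hypothetical helper, not part of either library.

```python
from collections import defaultdict

# Label map copied from the model.config.id2label output shown above.
id2label = {
    0: "O",
    1: "B-MISC", 2: "I-MISC",
    3: "B-PER", 4: "I-PER",
    5: "B-ORG", 6: "I-ORG",
    7: "B-LOC", 8: "I-LOC",
}


def entity_types(label_map):
    """Group BIO tags by entity type, skipping the 'O' (outside) tag."""
    groups = defaultdict(list)
    for idx in sorted(label_map):
        tag = label_map[idx]
        if tag == "O":
            continue
        # "B-PER" -> prefix "B", entity "PER"
        _prefix, _sep, entity = tag.partition("-")
        groups[entity].append(tag)
    return dict(groups)


print(entity_types(id2label))
```

The model therefore distinguishes four entity types (MISC, PER, ORG, LOC), each with a beginning and a continuation tag.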


Import the **TokenClassificationExplainer** from the *transformers interpret* package. 

In [4]:
from transformers_interpret import TokenClassificationExplainer
from pprint import pprint

Create an instance of the explainer by providing the model and the tokenizer:

In [5]:
ner_explainer = TokenClassificationExplainer(model=model, tokenizer=tokenizer)

### Basic usage

Call the explainer to generate the attributions. In the NER task, the predicted class of each token in the sentence is taken as the target, and the attributions of every token in the sentence are calculated with respect to it. This can take a minute or two. The following sections show how to limit the computation of the attributions to only certain tokens or only certain NER labels.

In [6]:
sample_text = "We visited Paris last weekend, where Emmanuel Macron lives."

In [7]:
attributions = ner_explainer(sample_text)

The explainer returns a dictionary that maps each token to its attributions. The `predicted_class_names` property provides the predicted classes for the last input passed to the explainer.

In [8]:
attributions

{'[CLS]': [('[CLS]', 0.0),
  ('We', -0.2534486578565234),
  ('visited', 0.4686220531823899),
  ('Paris', -0.5552067191430367),
  ('last', 0.17618330045106978),
  ('weekend', 0.03978536790388006),
  (',', 0.12149553464903377),
  ('where', 0.4087050862938605),
  ('Emmanuel', -0.024215167246120547),
  ('Mac', 0.045593419168988965),
  ('##ron', 0.29089222046009366),
  ('lives', 0.0991196198708179),
  ('.', 0.31043531950674064),
  ('[SEP]', 0.0)],
 'We': [('[CLS]', 0.0),
  ('We', 0.8825076513969251),
  ('visited', -0.2027764560597029),
  ('Paris', -0.35651725240066356),
  ('last', -0.05024148371267044),
  ('weekend', -0.12606631061497517),
  (',', -0.08775108476325028),
  ('where', 0.10509517493725025),
  ('Emmanuel', -0.056409170737236135),
  ('Mac', -0.02046001774405784),
  ('##ron', 0.10371176632276123),
  ('lives', 0.003340920088968598),
  ('.', -0.037780070332314106),
  ('[SEP]', 0.0)],
 'visited': [('[CLS]', 0.0),
  ('We', 0.029099424791891152),
  ('visited', 0.8530576583802609),
  ('

**Note: the algorithm used to calculate the attributions is Layer Integrated Gradients. To read more about it, see the [Captum documentation](https://captum.ai/docs/algorithms).**
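
To give some intuition for the attribution scores, here is a toy sketch of plain integrated gradients on a hand-written two-input function. It is not Captum code and does not reproduce the layer variant used by the explainer, which applies the same idea to the output of a model layer; the function `f`, its gradient, and the all-zero baseline are illustrative assumptions.

```python
def f(x):
    # Toy "model": a quadratic term plus a linear term.
    return x[0] ** 2 + 3.0 * x[1]


def grad_f(x):
    # Analytic gradient of f with respect to each input.
    return [2.0 * x[0], 3.0]


def integrated_gradients(x, baseline, grad, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight path from the baseline
    to the input, then scaled by (input - baseline).
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * ag for xi, b, ag in zip(x, baseline, avg_grad)]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
ig = integrated_gradients(x, baseline, grad_f)
print(ig)
print(sum(ig), f(x) - f(baseline))
```

The attributions sum to the difference between the function value at the input and at the baseline, which is the completeness property of integrated gradients.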

#### Visualizing explanations

With a single call to the `visualize()` method we get an inline display showing which input tokens contributed to classifying each token into its predicted class.

In [9]:
html = ner_explainer.visualize()

| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
| --- | --- | --- | --- | --- |
| O | O (1.00) | [CLS] | 1.13 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| O | O (1.00) | We | 0.16 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| O | O (1.00) | visited | 1.15 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| B-LOC | B-LOC (1.00) | Paris | 1.03 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| O | O (1.00) | last | 1.46 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
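
The same information can be read from the dictionary without the HTML display. The sketch below assumes the structure printed earlier (a list of `(token, score)` pairs per explained token) and uses the `We` entry from that output, rounded to four decimals. `top_contributors` is a hypothetical helper. The sum of the scores comes out at roughly the 0.16 reported for `We` in the Attribution Score column; that the column is this sum is an inference from the numbers, not something the library output states.

```python
# Attributions for the token 'We', copied from the output above
# and rounded to four decimals.
we_row = [
    ("[CLS]", 0.0),
    ("We", 0.8825),
    ("visited", -0.2028),
    ("Paris", -0.3565),
    ("last", -0.0502),
    ("weekend", -0.1261),
    (",", -0.0878),
    ("where", 0.1051),
    ("Emmanuel", -0.0564),
    ("Mac", -0.0205),
    ("##ron", 0.1037),
    ("lives", 0.0033),
    (".", -0.0378),
    ("[SEP]", 0.0),
]


def top_contributors(pairs, k=3):
    """Return the k pairs with the largest absolute attribution."""
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)[:k]


top = top_contributors(we_row)
total = sum(score for _token, score in we_row)

print(top)
print(round(total, 2))
```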


### Ignore indexes

To save computation time, we can pass a list of token indexes to ignore. The explainer will not compute explanations for these tokens, but their attributions are still calculated when explaining the predictions for the other tokens.

In [10]:
attributions_2 = ner_explainer(sample_text, ignored_indexes=[0, 1, 2, 11, 12, 13, 14])

When calling *visualize()*, we can see that the tokens corresponding to the ignored indexes no longer appear.

In [11]:
html = ner_explainer.visualize()

| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
| --- | --- | --- | --- | --- |
| B-LOC | B-LOC (1.00) | Paris | 1.03 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| O | O (1.00) | last | 1.46 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| O | O (1.00) | weekend | 1.97 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| O | O (1.00) | , | 2.23 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| O | O (1.00) | where | 1.94 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
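
The indexes refer to positions in the tokenized sentence, special tokens included, so it can help to derive the list rather than count by hand. The sketch below hard-codes the tokens shown in the Word Importance column so that it runs without the model; `indexes_outside` is a hypothetical helper. With this tokenization the positions run from 0 to 13, so the value 14 passed in the call above does not appear to correspond to any token.

```python
# Tokens as shown in the Word Importance column above. With the tokenizer
# loaded, the same list could be obtained with
# tokenizer.convert_ids_to_tokens(tokenizer(sample_text)["input_ids"]).
tokens = [
    "[CLS]", "We", "visited", "Paris", "last", "weekend", ",",
    "where", "Emmanuel", "Mac", "##ron", "lives", ".", "[SEP]",
]


def indexes_outside(token_list, first_token, last_token):
    """Return every index outside the span first_token..last_token.

    Uses the first occurrence of each token, which is enough for this
    sentence because no token is repeated.
    """
    start = token_list.index(first_token)
    end = token_list.index(last_token)
    return [i for i in range(len(token_list)) if i < start or i > end]


ignored = indexes_outside(tokens, "Paris", "##ron")
print(ignored)
```

The resulting list can be passed as `ignored_indexes` to explain only the tokens from *Paris* through *##ron*.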


### Ignore labels

Similarly, we can tell the explainer to ignore certain labels. For example, we might not be interested in the explanations of tokens that are classified as *'O'*.

In [12]:
attributions_3 = ner_explainer(sample_text, ignored_labels=['O'])

In [13]:
attributions_3

{'Paris': [('[CLS]', 0.0),
  ('We', -0.026176158026416003),
  ('visited', 0.1289424214695528),
  ('Paris', 0.9820248557100411),
  ('last', -0.13107116600328042),
  ('weekend', 0.010904143358261256),
  (',', 0.011178508562829472),
  ('where', 0.023509729193194558),
  ('Emmanuel', 0.009830564279598433),
  ('Mac', 0.00034993661827931806),
  ('##ron', 0.014679480802688051),
  ('lives', 0.00523146911109651),
  ('.', 9.093164937093907e-05),
  ('[SEP]', 0.0)],
 'Emmanuel': [('[CLS]', 0.0),
  ('We', -0.02795400955500282),
  ('visited', -0.019536101884928644),
  ('Paris', 0.016961229547365956),
  ('last', 0.02993168457599209),
  ('weekend', -0.0005594575778221615),
  (',', -0.0028076008557282677),
  ('where', 0.4016979194311053),
  ('Emmanuel', 0.9076506398659525),
  ('Mac', -0.09408993595706144),
  ('##ron', -0.04634243847125355),
  ('lives', 0.02766440864687646),
  ('.', -0.026236800991730185),
  ('[SEP]', 0.0)],
 'Mac': [('[CLS]', 0.0),
  ('We', -0.023946141938553923),
  ('visited', -0.02314

In [14]:
html = ner_explainer.visualize()

| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
| --- | --- | --- | --- | --- |
| B-LOC | B-LOC (1.00) | Paris | 1.03 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| B-PER | B-PER (1.00) | Emmanuel | 1.17 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| I-PER | I-PER (1.00) | Mac | 1.27 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| I-PER | I-PER (1.00) | ##ron | 1.57 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
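
*Mac* and *##ron* are listed separately because the tokenizer splits *Macron* into WordPiece sub-tokens, where `##` marks a continuation of the previous piece. For word-level reading, one option is to merge the pieces and add up their scores. Summing is a convention assumed here, not something the library does, and `merge_wordpieces` is a hypothetical helper. The data is the `Emmanuel` entry from the output above, rounded to four decimals.

```python
# Attributions for the token 'Emmanuel', copied from the output above
# and rounded to four decimals.
emmanuel_row = [
    ("[CLS]", 0.0),
    ("We", -0.0280),
    ("visited", -0.0195),
    ("Paris", 0.0170),
    ("last", 0.0299),
    ("weekend", -0.0006),
    (",", -0.0028),
    ("where", 0.4017),
    ("Emmanuel", 0.9077),
    ("Mac", -0.0941),
    ("##ron", -0.0463),
    ("lives", 0.0277),
    (".", -0.0262),
    ("[SEP]", 0.0),
]


def merge_wordpieces(pairs):
    """Fold '##' continuation pieces into the preceding token, summing scores."""
    merged = []
    for token, score in pairs:
        if token.startswith("##") and merged:
            prev_token, prev_score = merged[-1]
            merged[-1] = (prev_token + token[2:], prev_score + score)
        else:
            merged.append((token, score))
    return merged


merged = merge_wordpieces(emmanuel_row)
print(merged)
```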


The *ignored_labels* parameter can be combined with the *ignored_indexes* parameter.

In [19]:
attributions_4 = ner_explainer(sample_text, ignored_indexes=[2, 11, 12, 13, 14], ignored_labels=['O', 'B-PER'])

In [20]:
html = ner_explainer.visualize()

| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
| --- | --- | --- | --- | --- |
| B-LOC | B-LOC (1.00) | Paris | 1.03 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| I-PER | I-PER (1.00) | Mac | 1.27 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |
| I-PER | I-PER (1.00) | ##ron | 1.57 | [CLS] We visited Paris last weekend , where Emmanuel Mac ##ron lives . [SEP] |


As we can see, the tokens are ignored based on the predicted class, which is why *Emmanuel*, with predicted label *B-PER*, is ignored (even though its true label would be *I-PER*).
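
The combined selection rule can be sketched in a few lines: a token is explained only if its index is not ignored and its predicted label is not ignored. This is an illustration of the behavior observed above, not the library's implementation. `explained_positions` is a hypothetical helper, and the predicted labels are read off the visualizations in this notebook, with tokens that never appear in the label-filtered output assumed to be `O`.

```python
tokens = [
    "[CLS]", "We", "visited", "Paris", "last", "weekend", ",",
    "where", "Emmanuel", "Mac", "##ron", "lives", ".", "[SEP]",
]

# Predicted labels per position, taken from the visualizations above
# (positions not shown there are assumed to be 'O').
predicted = [
    "O", "O", "O", "B-LOC", "O", "O", "O",
    "O", "B-PER", "I-PER", "I-PER", "O", "O", "O",
]


def explained_positions(predicted_labels, ignored_indexes=(), ignored_labels=()):
    """Return the positions left after filtering by index and by predicted label."""
    skip_indexes = set(ignored_indexes)
    skip_labels = set(ignored_labels)
    return [
        i
        for i, label in enumerate(predicted_labels)
        if i not in skip_indexes and label not in skip_labels
    ]


kept = explained_positions(
    predicted,
    ignored_indexes=[2, 11, 12, 13, 14],
    ignored_labels=["O", "B-PER"],
)
print([tokens[i] for i in kept])
```

The remaining tokens are *Paris*, *Mac* and *##ron*, matching the rows of the last visualization.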