# Introduction to *ferret*

Hi there! This notebook will guide you through the basic functionalities of *ferret*, using the sentiment classification task as an example.

Specifically, you will see how to:

- load a model from the Hugging Face Hub into our `Benchmark` client interface;
- use the class to explain a text query using all the supported post-hoc feature attribution methods;
- visualize the explanations in tabular format;
- **evaluate** all the explanations against the supported metrics (faithfulness and plausibility).

Scroll down to learn more 😉

In [1]:
%load_ext autoreload
%autoreload 2

In [6]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from ferret import Benchmark
import numpy as np
import torch

For the purpose of this tutorial, we will use the sentiment classification model `cardiffnlp/twitter-xlm-roberta-base-sentiment`.

In [7]:
device = "cuda:0" if torch.cuda.is_available() else "cpu"
name = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(name).to(device)
tokenizer = AutoTokenizer.from_pretrained(name)

## Explain a single instance

The fastest way to get started with *ferret* is using the `Benchmark` interface class. Starting from v0.5.0, the class accepts a `task_name` parameter to support different NLP tasks. By default, `task_name` is set to `text-classification`.

In [9]:
bench = Benchmark(model, tokenizer)

Since we are considering a text classification task, let's use the benchmark object to compute the class probabilities.

In [11]:
text = "I love your style!"
scores = bench.score(text)
print(scores)

{'negative': 0.012691126205027103, 'neutral': 0.061643846333026886, 'positive': 0.925665020942688}
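`bench.score` returns a plain dictionary mapping each label to a probability, so it can be post-processed with standard Python. A minimal sketch, using the values printed above, that picks the most likely label and checks that the scores behave like a probability distribution:

```python
import math

# Values copied from the output of bench.score(text) above
scores = {
    "negative": 0.012691126205027103,
    "neutral": 0.061643846333026886,
    "positive": 0.925665020942688,
}

# The predicted label is the one with the highest probability
predicted = max(scores, key=scores.get)
print(predicted)  # positive

# The probabilities over the three labels sum to (approximately) one
assert math.isclose(sum(scores.values()), 1.0, abs_tol=1e-6)
```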


Extracting post-hoc explanations with all the supported methods and standard parameters is as easy as:

In [13]:
explanations = bench.explain(text, target="positive")


Note that we are explicitly asking for the attribution scores relative to the `positive` class. Roughly, a positive score indicates that the token *pushed* the prediction toward that class, while a negative score indicates that it *pulled it away* from it.
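Since the scores are plain numbers aligned with the tokens, separating the two effects takes only a few lines. A minimal sketch, using the Partition SHAP tokens and scores printed in the next cell and skipping the special tokens `<s>` and `</s>`:

```python
# Tokens and Partition SHAP scores copied from the first Explanation printed in the next cell
tokens = ["<s>", "▁I", "▁love", "▁your", "▁style", "!", "</s>"]
scores = [0.0, -0.05859814, 0.42798148, 0.08728391, 0.19652878, 0.2296077, 0.0]

special = {"<s>", "</s>"}
pairs = [(t, s) for t, s in zip(tokens, scores) if t not in special]

# Positive scores push the prediction toward the target class, strongest first
toward = sorted((p for p in pairs if p[1] > 0), key=lambda p: p[1], reverse=True)
# Negative scores pull the prediction away from the target class
away = [p for p in pairs if p[1] < 0]

print("toward:", [t for t, _ in toward])  # ['▁love', '!', '▁style', '▁your']
print("away:  ", [t for t, _ in away])    # ['▁I']
```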

In [14]:
explanations

[Explanation(text=['I love your style!'], tokens=['<s>', '▁I', '▁love', '▁your', '▁style', '!', '</s>'], scores=array([ 0.        , -0.05859814,  0.42798148,  0.08728391,  0.19652878,
         0.2296077 ,  0.        ]), explainer='Partition SHAP', target_pos_idx=2, helper_type='sequence-classification', target_token_pos_idx=None, target='positive', target_token=None),
 Explanation(text=['I love your style!'], tokens=['<s>', '▁I', '▁love', '▁your', '▁style', '!', '</s>'], scores=array([0.        , 0.046511  , 0.42884937, 0.10180191, 0.20548557,
        0.21735215, 0.        ]), explainer='LIME', target_pos_idx=2, helper_type='sequence-classification', target_token_pos_idx=None, target='positive', target_token=None),
 Explanation(text=['I love your style!'], tokens=['<s>', '▁I', '▁love', '▁your', '▁style', '!', '</s>'], scores=array([0.10165081, 0.07546063, 0.16653863, 0.11737087, 0.3520872 ,
        0.0746433 , 0.11224855], dtype=float32), explainer='Gradient', target_pos_idx=2, helper_

Let's visualize the results.

In [16]:
bench.show_table(explanations)

| Explainer | ▁I | ▁love | ▁your | ▁style | ! |
|---|---|---|---|---|---|
| Partition SHAP | -0.06 | 0.43 | 0.09 | 0.2 | 0.23 |
| LIME | 0.05 | 0.43 | 0.1 | 0.21 | 0.22 |
| Gradient | 0.08 | 0.17 | 0.12 | 0.35 | 0.07 |
| Gradient (x Input) | -0.02 | 0.08 | -0.32 | 0.22 | 0.08 |
| Integrated Gradient | -0.03 | -0.04 | -0.15 | 0.3 | -0.14 |
| Integrated Gradient (x Input) | 0.01 | 0.23 | 0.08 | 0.03 | 0.66 |


## Evaluate the explanations of a single instance

Evaluating explanations with all the supported evaluators is straightforward. Remember to specify the `target` parameter to match the one used during the explanation!

In [17]:
explanation_evaluations = bench.evaluate_explanations(explanations, target="positive")

Again, we can look at the results in a tabular format.

In [18]:
bench.show_evaluation_table(explanation_evaluations)

| Explainer | aopc_compr | aopc_suff | taucorr_loo |
|---|---|---|---|
| Partition SHAP | 0.49 | 0.28 | 0.6 |
| LIME | 0.51 | 0.23 | 0.6 |
| Gradient | 0.43 | 0.22 | 0.2 |
| Gradient (x Input) | 0.36 | 0.39 | 0.8 |
| Integrated Gradient | 0.03 | 0.61 | 0.4 |
| Integrated Gradient (x Input) | 0.44 | 0.23 | 0.2 |


Area Over the Perturbation Curve (AOPC) Comprehensiveness (aopc_compr), AOPC Sufficiency (aopc_suff) and Correlation with Leave-One-Out scores (taucorr_loo) are three measures of faithfulness.

**AOPC Comprehensiveness**. Comprehensiveness measures the drop in the model's predicted probability when the tokens that the explanation marks as relevant are removed. We compute it via the Area Over the Perturbation Curve: we progressively remove the $k$ most important tokens, with $k$ ranging from 1 to the number of tokens (by default), and average the resulting drops. The higher the value, the better the explainer identifies the tokens that are relevant for the prediction.

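**AOPC Sufficiency**. Sufficiency captures whether the tokens in the explanation are sufficient for the model to make the prediction, i.e., how much the predicted probability drops when only the $k$ most important tokens are kept. As for comprehensiveness, we use the AOPC score. Here, the lower the value, the better the selected tokens alone preserve the prediction.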
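To make the two AOPC definitions concrete, here is a minimal sketch in the spirit of ERASER, where for each $k$ comprehensiveness is $p(y \mid x) - p(y \mid x \setminus r_k)$ and sufficiency is $p(y \mid x) - p(y \mid r_k)$, with $r_k$ the $k$ most important tokens. The toy classifier, its weights, and the helper names (`prob`, `top_k`, `aopc`) are hypothetical stand-ins for the real model; ferret's own implementation may differ, e.g. in how removed tokens are handled.

```python
import math

# Hypothetical toy classifier: the probability of the target class is a logistic
# function of the summed weights of the tokens that are still present.
WEIGHTS = {"I": -0.2, "love": 2.0, "your": 0.3, "style": 0.8, "!": 0.6}
BIAS = -0.5


def prob(tokens):
    z = BIAS + sum(WEIGHTS[t] for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def top_k(attributions, k):
    """Indices of the k tokens with the highest attribution scores."""
    order = sorted(range(len(attributions)), key=lambda i: attributions[i], reverse=True)
    return set(order[:k])


def aopc(tokens, attributions, mode):
    """Average probability drop over k = 1..len(tokens)."""
    full = prob(tokens)
    drops = []
    for k in range(1, len(tokens) + 1):
        selected = top_k(attributions, k)
        if mode == "comprehensiveness":
            # Remove the k most important tokens
            reduced = [t for i, t in enumerate(tokens) if i not in selected]
        else:
            # Sufficiency: keep only the k most important tokens
            reduced = [t for i, t in enumerate(tokens) if i in selected]
        drops.append(full - prob(reduced))
    return sum(drops) / len(drops)


tokens = ["I", "love", "your", "style", "!"]
shap = [-0.06, 0.43, 0.09, 0.2, 0.23]  # Partition SHAP row of the table above
compr = aopc(tokens, shap, "comprehensiveness")
suff = aopc(tokens, shap, "sufficiency")
print(round(compr, 2), round(suff, 2))  # 0.46 0.04
```

Because the probabilities come from the toy model, these numbers are not expected to match the table above; they only illustrate that an explanation ranking the decisive tokens first yields a high comprehensiveness and a low sufficiency value.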

**Correlation with Leave-One-Out scores**. We first compute the leave-one-out (LOO) scores as the difference in prediction when one feature at a time is omitted. We then measure the rank correlation between the LOO scores and the explanation scores using Kendall's $\tau$ (hence the name `taucorr_loo`).
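The leave-one-out scores and their correlation with an explanation can be sketched in the same way. The toy classifier and the helper names are again hypothetical; the rank correlation is Kendall's $\tau$ (the `tau` in `taucorr_loo`), written out by hand as the tau-a variant. `scipy.stats.kendalltau` computes tau-b by default, which coincides with tau-a when there are no ties.

```python
import math
from itertools import combinations

# Hypothetical toy classifier standing in for the real model
WEIGHTS = {"I": -0.2, "love": 2.0, "your": 0.3, "style": 0.8, "!": 0.6}
BIAS = -0.5


def prob(tokens):
    z = BIAS + sum(WEIGHTS[t] for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def leave_one_out(tokens):
    """Probability drop when each token is omitted, one at a time."""
    full = prob(tokens)
    return [full - prob(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(a) * (len(a) - 1) / 2)


tokens = ["I", "love", "your", "style", "!"]
loo = leave_one_out(tokens)
shap = [-0.06, 0.43, 0.09, 0.2, 0.23]  # Partition SHAP row of the table above
print(kendall_tau(loo, shap))  # 0.8: only "style" and "!" are ranked differently
```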

### Plausibility

We can also specify a human rationale and evaluate plausibility.
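The `human_rationale` passed in the next cell contains one binary value per token shown in the tables above (i.e., excluding the special `<s>` and `</s>` tokens), with 1 marking the tokens a human considers relevant. We infer this alignment from the length of the list rather than from the ferret documentation, so treat the sketch below as an assumption; the `highlighted` set is a hypothetical annotation.

```python
# Tokens of the explanation (from the output above); <s> and </s> are special tokens
tokens = ["<s>", "▁I", "▁love", "▁your", "▁style", "!", "</s>"]
special = {"<s>", "</s>"}

# Hypothetical annotation: a human highlights "love" as the key word
highlighted = {"▁love"}

content_tokens = [t for t in tokens if t not in special]
human_rationale = [1 if t in highlighted else 0 for t in content_tokens]
print(human_rationale)  # [0, 1, 0, 0, 0]
```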

In [19]:
explanation_evaluations = bench.evaluate_explanations(
    explanations,
    target="positive",  # same target used to compute the explanations
    human_rationale=[0, 1, 0, 0, 0],  # 1 marks "love" as the relevant token
    top_k_rationale=1,
)
bench.show_evaluation_table(explanation_evaluations)

| Explainer | aopc_compr | aopc_suff | taucorr_loo | auprc_plau | token_f1_plau | token_iou_plau |
|---|---|---|---|---|---|---|
| Partition SHAP | 0.49 | 0.28 | 0.6 | 1.0 | 1.0 | 1.0 |
| LIME | 0.51 | 0.23 | 0.6 | 1.0 | 1.0 | 1.0 |
| Gradient | 0.43 | 0.22 | 0.2 | 0.25 | 0.0 | 0.0 |
| Gradient (x Input) | 0.36 | 0.39 | 0.8 | 0.25 | 0.0 | 0.0 |
| Integrated Gradient | 0.03 | 0.61 | 0.4 | 0.1 | 0.0 | 0.0 |
| Integrated Gradient (x Input) | 0.44 | 0.23 | 0.2 | 0.25 | 0.0 | 0.0 |


Plausibility evaluates how well the explanation agrees with the human rationale. We evaluate plausibility via the Area Under the Precision-Recall curve (AUPRC) (auprc_plau), the token-level F1-score (token_f1_plau), and the average token-level Intersection-Over-Union (IOU) (token_iou_plau).


**Area Under the Precision Recall curve (AUPRC)** is computed by sweeping a threshold over token scores.
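A minimal sketch of the threshold sweep, assuming precision and recall are computed at every distinct score and the area is taken with the trapezoidal rule (ferret's exact interpolation may differ). The helper names `pr_points` and `auprc` are hypothetical, and the rationale must contain at least one relevant token.

```python
def pr_points(scores, rationale):
    """(recall, precision) pairs obtained by thresholding at each distinct score."""
    n_pos = sum(rationale)
    points = [(0.0, 1.0)]  # conventional starting point: recall 0, precision 1
    for thr in sorted(set(scores), reverse=True):
        selected = [s >= thr for s in scores]
        tp = sum(1 for sel, r in zip(selected, rationale) if sel and r)
        points.append((tp / n_pos, tp / sum(selected)))
    return points


def auprc(scores, rationale):
    """Trapezoidal area under the PR points (recall never decreases along the sweep)."""
    pts = pr_points(scores, rationale)
    return sum((r1 - r0) * (p0 + p1) / 2 for (r0, p0), (r1, p1) in zip(pts, pts[1:]))


rationale = [0, 1, 0, 0, 0]  # human rationale: "love"
shap = [-0.06, 0.43, 0.09, 0.2, 0.23]
gradient = [0.08, 0.17, 0.12, 0.35, 0.07]
print(auprc(shap, rationale))      # 1.0
print(auprc(gradient, rationale))  # 0.25
```

With the rounded Partition SHAP and Gradient scores from the first table, the sketch gives the same values reported in the `auprc_plau` column; other rows may differ because of rounding or implementation details.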

Token-level F1-score and average Intersection-Over-Union consider discrete rationales.
We derive a discrete rationale by taking the $k$ tokens with the highest scores. In the example, $k$ is set to 1 (`top_k_rationale=1`).*

**Token-level F1-score** is the F1 score computed from token-level precision and recall.

**Intersection-Over-Union (IOU)** is the number of tokens shared by the predicted and human rationales divided by the number of tokens in their union.

*When human rationales are available for the whole dataset, $k$ is set to the average rationale length (as in ERASER).
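The discretization and the two token-level scores can be sketched as follows, assuming a plain top-$k$ cut with no special tie handling; the helper names are hypothetical and ferret's implementation may differ in details. With $k = 1$ and the rounded scores from the first table, it yields the `token_f1_plau` and `token_iou_plau` values reported for Partition SHAP (1.0) and Gradient (0.0).

```python
def top_k_mask(scores, k):
    """Discrete rationale: 1 for the k highest-scoring tokens, 0 elsewhere."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def token_f1(pred, gold):
    """F1 score from token-level precision and recall."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    if tp == 0:
        return 0.0
    precision = tp / sum(pred)
    recall = tp / sum(gold)
    return 2 * precision * recall / (precision + recall)


def token_iou(pred, gold):
    """Size of the token intersection divided by the size of the token union."""
    intersection = sum(1 for p, g in zip(pred, gold) if p and g)
    union = sum(1 for p, g in zip(pred, gold) if p or g)
    return intersection / union if union else 0.0


human = [0, 1, 0, 0, 0]  # "love"
for name, scores in [("Partition SHAP", [-0.06, 0.43, 0.09, 0.2, 0.23]),
                     ("Gradient", [0.08, 0.17, 0.12, 0.35, 0.07])]:
    pred = top_k_mask(scores, k=1)
    print(name, token_f1(pred, human), token_iou(pred, human))
# Partition SHAP 1.0 1.0
# Gradient 0.0 0.0
```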

# Loading XAI datasets with human rationales

We can directly load a dataset with human rationales using our Dataset API. Since we rely on Hugging Face's [datasets](https://huggingface.co/datasets), each dataset is downloaded only once and then cached 🚀

In [20]:
hatexdata = bench.load_dataset("hatexplain")



Here is an example text with its human rationales.

In [21]:
hatexdata[2]["text"], hatexdata[2]["rationale"]

('<user> <user> why are you repeating yourself are you a little retarded',
 [[], [], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]])

# Bonus!

There is more! You can:

- use *ferret*'s built-in explainers directly, for fine-grained control over their *init* and *call* parameters (please refer to our [doc](https://ferret.readthedocs.io/en/latest/?version=latest) to learn more)
- compute individual faithfulness and plausibility metrics over explanations

**Interface to individual explainers**

You can also use individual explainers through an object-oriented interface.

In [22]:
from ferret import SHAPExplainer, LIMEExplainer

In [28]:
exp = LIMEExplainer(model, tokenizer)
exp(text, target="positive")

No helper provided. Using default 'text-classification' helper.

Explanation(text=['I love your style!'], tokens=['<s>', '▁I', '▁love', '▁your', '▁style', '!', '</s>'], scores=array([0.        , 0.05963124, 0.13740776, 0.0458926 , 0.1224144 ,
       0.17161505, 0.        ]), explainer='LIME', target_pos_idx=2, helper_type='sequence-classification', target_token_pos_idx=None, target='positive', target_token=None)

In [29]:
exp = SHAPExplainer(model, tokenizer)
exp(text, target="positive")

No helper provided. Using default 'text-classification' helper.


Explanation(text=['I love your style!'], tokens=['<s>', '▁I', '▁love', '▁your', '▁style', '!', '</s>'], scores=array([ 0.        , -0.0400334 ,  0.29239076,  0.0596311 ,  0.13426562,
        0.15686466,  0.        ]), explainer='Partition SHAP', target_pos_idx=2, helper_type='sequence-classification', target_token_pos_idx=None, target='positive', target_token=None)

In [30]:
exp = SHAPExplainer(model, tokenizer)
e = exp(text, target="positive")

In [33]:
bench.show_table([e])

| Explainer | ▁I | ▁love | ▁your | ▁style | ! |
|---|---|---|---|---|---|
| Partition SHAP | -0.04 | 0.29 | 0.06 | 0.13 | 0.16 |


You can also compute an individual evaluation measure:

In [36]:
from ferret import AOPC_Comprehensiveness_Evaluation

aopc_compr_eval = AOPC_Comprehensiveness_Evaluation(model, tokenizer, task_name="text-classification")

In [37]:
aopc_compr_eval.compute_evaluation(e, target="positive")

EvaluationMetricOutput(metric={'NAME': 'aopc_comprehensiveness', 'SHORT_NAME': 'aopc_compr', 'MIN_VALUE': -1.0, 'MAX_VALUE': 1.0, 'LOWER_IS_BETTER': False, 'METRIC_FAMILY': <EvaluationMetricFamily.FAITHFULNESS: 'faithfulness'>}, value=0.48677334)

In [38]:
ev = bench.evaluate_explanation(e, target="positive")
bench.show_evaluation_table([ev])

| Explainer | aopc_compr | aopc_suff | taucorr_loo |
|---|---|---|---|
| Partition SHAP | 0.49 | 0.28 | 0.6 |
