# Introduction to *ferret*

Hi there! This notebook will guide you through the basic functionality of *ferret*, using sentiment classification as an example task.

Specifically, you will see how to:

- load a model from the Hugging Face Hub into our `Benchmark` client interface;
- use the class to explain a text query using all the supported post-hoc feature attribution methods;
- visualize the explanations in tabular format;
- **evaluate** all the explanations with faithfulness and plausibility metrics.

Scroll down to learn more 😉

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from ferret import Benchmark
import numpy as np


For the purpose of this tutorial, we will use the sentiment classification model `cardiffnlp/twitter-xlm-roberta-base-sentiment`.

In [3]:
name = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

## Explain a single instance

The fastest way to get started with *ferret* is to use the `Benchmark` interface class.

In [4]:
bench = Benchmark(model, tokenizer)

Extracting post-hoc explanations with all the supported methods and standard parameters is as easy as:

In [5]:
explanations = bench.explain("I love your style!", target=2)


In [6]:
explanations

[Explanation(text='I love your style!', tokens=['<s>', '▁I', '▁love', '▁your', '▁style', '!', '</s>'], scores=array([0.        , 0.01447297, 0.423284  , 0.28050641, 0.22077435,
        0.06096227, 0.        ]), explainer='Partition SHAP', target=2),
 Explanation(text='I love your style!', tokens=['<s>', '▁I', '▁love', '▁your', '▁style', '!', '</s>'], scores=array([-0.09891703, -0.07561425,  0.33465704,  0.17949801,  0.14378843,
         0.08339744, -0.0841278 ]), explainer='LIME', target=2),
 Explanation(text='I love your style!', tokens=['<s>', '▁I', '▁love', '▁your', '▁style', '!', '</s>'], scores=tensor([0.1017, 0.0755, 0.1665, 0.1174, 0.3521, 0.0746, 0.1122]), explainer='Gradient', target=2),
 Explanation(text='I love your style!', tokens=['<s>', '▁I', '▁love', '▁your', '▁style', '!', '</s>'], scores=tensor([ 0.1606, -0.0242,  0.0818, -0.3194,  0.2226,  0.0763,  0.1152]), explainer='Gradient (x Input)', target=2),
 Explanation(text='I love your style!', tokens=['<s>', '▁I', '▁love'
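The printed objects show that each explanation carries `tokens` and `scores` fields of equal length. As a minimal sketch of how those two fields can be paired and ranked, the snippet below uses a hypothetical stand-in class (`ExplanationStub` is not part of *ferret*) filled with rounded values from the Partition SHAP entry above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExplanationStub:
    """Hypothetical stand-in mirroring the fields visible in the printed output."""
    text: str
    tokens: List[str]
    scores: List[float]
    explainer: str
    target: int


# Rounded values copied from the Partition SHAP entry printed above.
shap_like = ExplanationStub(
    text="I love your style!",
    tokens=["<s>", "▁I", "▁love", "▁your", "▁style", "!", "</s>"],
    scores=[0.0, 0.0145, 0.4233, 0.2805, 0.2208, 0.0610, 0.0],
    explainer="Partition SHAP",
    target=2,
)


def rank_tokens(explanation):
    """Return (token, score) pairs sorted from most to least important."""
    pairs = zip(explanation.tokens, explanation.scores)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = rank_tokens(shap_like)
for token, score in ranked[:3]:
    print(f"{token}: {score:.2f}")
```

The same helper should work on any object exposing `tokens` and `scores`, as long as the scores can be iterated as plain numbers.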

Let's visualize the results.

In [7]:
t = bench.show_table(explanations)
t

| Token | ▁I | ▁love | ▁your | ▁style | ! |
|---|---|---|---|---|---|
| Partition SHAP | 0.01 | 0.42 | 0.28 | 0.22 | 0.06 |
| LIME | -0.08 | 0.33 | 0.18 | 0.14 | 0.08 |
| Gradient | 0.08 | 0.17 | 0.12 | 0.35 | 0.07 |
| Gradient (x Input) | -0.02 | 0.08 | -0.32 | 0.22 | 0.08 |
| Integrated Gradient | 0.17 | 0.2 | 0.24 | -0.07 | 0.03 |
| Integrated Gradient (x Input) | 0.15 | 0.09 | 0.12 | 0.22 | 0.41 |


## Evaluate explanation of a single instance

Evaluating explanations with all the supported evaluators is straightforward. Remember to set the `target` parameter to match the one used to compute the explanations!

In [8]:
explanation_evaluations = bench.evaluate_explanations(explanations, target=2)

                                                                                

Again, we can look at the results in a tabular format.

In [9]:
bench.show_evaluation_table(explanation_evaluations)

| Explainer | aopc_compr | aopc_suff | taucorr_loo |
|---|---|---|---|
| Partition SHAP | 0.41 | 0.2 | 0.4 |
| LIME | 0.36 | 0.25 | 0.4 |
| Gradient | 0.43 | 0.22 | 0.2 |
| Gradient (x Input) | 0.36 | 0.39 | 0.8 |
| Integrated Gradient | 0.29 | 0.27 | -0.4 |
| Integrated Gradient (x Input) | 0.33 | 0.26 | 0.0 |


Area Over the Perturbation Curve (AOPC) Comprehensiveness (aopc_compr), AOPC Sufficiency (aopc_suff) and Correlation with Leave-One-Out scores (taucorr_loo) are three measures of faithfulness.

**AOPC Comprehensiveness**. Comprehensiveness measures the drop in the model probability when the tokens that the explanation marks as relevant are removed. We measure comprehensiveness via the Area Over the Perturbation Curve: we progressively remove the $k$ most important tokens, with $k$ from 1 to #tokens (by default), and then average the resulting drops. The higher the value, the better the explainer is at selecting the tokens relevant to the prediction.
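To make the averaging concrete, here is a minimal sketch of the procedure described above. It is not *ferret*'s implementation: `toy_prob` and its `WEIGHTS` are a hypothetical stand-in for the model, and tokens are removed by simply dropping them from the list.

```python
# Hypothetical toy model: each known word adds a fixed amount to the
# probability of the target class. This is an assumption for illustration.
WEIGHTS = {"love": 0.5, "style": 0.2, "your": 0.1}


def toy_prob(tokens):
    """Probability of the target class for a list of tokens (toy stand-in)."""
    return min(1.0, 0.1 + sum(WEIGHTS.get(t, 0.0) for t in tokens))


def aopc_comprehensiveness(tokens, scores, prob_fn):
    """Average probability drop after removing the k most important tokens."""
    full = prob_fn(tokens)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    drops = []
    for k in range(1, len(tokens) + 1):
        removed = set(ranked[:k])
        kept = [t for i, t in enumerate(tokens) if i not in removed]
        drops.append(full - prob_fn(kept))
    return sum(drops) / len(drops)


tokens = ["I", "love", "your", "style", "!"]
good_scores = [0.01, 0.42, 0.28, 0.22, 0.06]  # ranks "love" first
bad_scores = [0.50, 0.00, 0.10, 0.20, 0.40]   # ranks "love" last

good = aopc_comprehensiveness(tokens, good_scores, toy_prob)
bad = aopc_comprehensiveness(tokens, bad_scores, toy_prob)
print(f"good ranking: {good:.2f}, bad ranking: {bad:.2f}")
```

With this toy model, the ranking that puts "love" first scores about 0.70, while the ranking that puts it last scores about 0.26, which illustrates why higher is better.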

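**AOPC Sufficiency**. Sufficiency captures whether the tokens in the explanation are sufficient for the model to make the prediction: we keep only the $k$ most important tokens and measure the drop in the model probability, so lower values indicate a more sufficient explanation. As for comprehensiveness, we use the AOPC score.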
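A sufficiency score can be sketched in the same spirit. The snippet below assumes the ERASER-style definition (probability with all tokens minus probability with only the top-$k$ tokens kept), under which lower values are better. The toy model `toy_prob` is hypothetical and only serves the illustration.

```python
# Hypothetical toy model, assumed only for this illustration.
WEIGHTS = {"love": 0.5, "style": 0.2, "your": 0.1}


def toy_prob(tokens):
    """Probability of the target class for a list of tokens (toy stand-in)."""
    return min(1.0, 0.1 + sum(WEIGHTS.get(t, 0.0) for t in tokens))


def aopc_sufficiency(tokens, scores, prob_fn):
    """Average probability drop when only the k most important tokens are kept."""
    full = prob_fn(tokens)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    drops = []
    for k in range(1, len(tokens) + 1):
        keep = set(ranked[:k])
        kept = [t for i, t in enumerate(tokens) if i in keep]
        drops.append(full - prob_fn(kept))
    return sum(drops) / len(drops)


tokens = ["I", "love", "your", "style", "!"]
good_scores = [0.01, 0.42, 0.28, 0.22, 0.06]  # ranks "love" first
bad_scores = [0.50, 0.00, 0.10, 0.20, 0.40]   # ranks "love" last

good = aopc_sufficiency(tokens, good_scores, toy_prob)
bad = aopc_sufficiency(tokens, bad_scores, toy_prob)
print(f"good ranking: {good:.2f}, bad ranking: {bad:.2f}")
```

Here the good ranking yields about 0.10 and the bad one about 0.54: keeping the truly important tokens preserves most of the prediction, so the drop stays small.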

**Correlation with Leave-One-Out scores**. We first compute the leave-one-out scores as the prediction difference when one feature at a time is omitted. We then measure the Kendall's tau rank correlation between these scores and the explanation scores (hence the name `taucorr_loo`).
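The two steps can be sketched as follows. The model `toy_prob` is a hypothetical stand-in, and the rank correlation used here is a plain Kendall's tau, an assumption suggested by the `taucorr_loo` name. Swapping in a different rank correlation would only change the last function.

```python
from itertools import combinations

# Hypothetical toy model with a distinct contribution per token.
WEIGHTS = {"I": 0.02, "love": 0.5, "your": 0.1, "style": 0.2, "!": 0.05}


def toy_prob(tokens):
    """Probability of the target class for a list of tokens (toy stand-in)."""
    return 0.05 + sum(WEIGHTS.get(t, 0.0) for t in tokens)


def leave_one_out_scores(tokens, prob_fn):
    """Prediction difference obtained by omitting one token at a time."""
    full = prob_fn(tokens)
    return [full - prob_fn(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


tokens = ["I", "love", "your", "style", "!"]
explanation_scores = [0.01, 0.42, 0.28, 0.22, 0.06]

loo = leave_one_out_scores(tokens, toy_prob)
tau = kendall_tau(loo, explanation_scores)
print(f"tau = {tau:.1f}")
```

With five tokens there are ten pairs, so the correlation can only move in steps of 0.2, which matches the granularity of the `taucorr_loo` column above.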

### Plausibility

We can also specify a human rationale and evaluate plausibility.

In [10]:
explanation_evaluations = bench.evaluate_explanations(
    explanations,
    target=0,
    human_rationale=[0, 1, 0, 0, 0],
    top_k_rationale = 1
)
bench.show_evaluation_table(explanation_evaluations)

                                                                                

| Explainer | aopc_compr | aopc_suff | taucorr_loo | auprc_plau | token_f1_plau | token_iou_plau |
|---|---|---|---|---|---|---|
| Partition SHAP | -0.19 | -0.09 | -0.2 | 1.0 | 1.0 | 1.0 |
| LIME | -0.15 | -0.11 | -0.2 | 1.0 | 1.0 | 1.0 |
| Gradient | -0.19 | -0.1 | 0.0 | 0.25 | 0.0 | 0.0 |
| Gradient (x Input) | -0.13 | -0.18 | -0.6 | 0.25 | 0.0 | 0.0 |
| Integrated Gradient | -0.12 | -0.12 | 0.2 | 0.25 | 0.0 | 0.0 |
| Integrated Gradient (x Input) | -0.16 | -0.1 | -0.2 | 0.1 | 0.0 | 0.0 |


Plausibility evaluates how well the explanations agree with human rationales. We evaluate plausibility via the Area Under the Precision-Recall Curve (AUPRC) (auprc_plau), the token-level F1 score (token_f1_plau), and the average Intersection-Over-Union (IOU) at the token level (token_iou_plau).


**Area Under the Precision Recall curve (AUPRC)** is computed by sweeping a threshold over token scores.
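As a rough sketch of the threshold sweep, the function below builds precision-recall points by lowering the threshold one token at a time, prepends the conventional starting point (recall 0, precision 1), and integrates with the trapezoidal rule. These conventions are assumptions, ties between scores are not handled, and *ferret*'s own implementation may differ in details.

```python
def auprc(scores, rationale):
    """Trapezoidal area under the precision-recall curve for one explanation."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positive = sum(rationale)
    points = [(0.0, 1.0)]  # (recall, precision) starting point, by convention
    true_positive = 0
    for k, idx in enumerate(order, start=1):
        # Lower the threshold so that the k highest-scoring tokens are selected.
        true_positive += rationale[idx]
        points.append((true_positive / total_positive, true_positive / k))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


human_rationale = [0, 1, 0, 0, 0]               # only "love" is marked
shap_scores = [0.01, 0.42, 0.28, 0.22, 0.06]    # "love" ranked first
gradient_scores = [0.08, 0.17, 0.12, 0.35, 0.07]  # "love" ranked second

print(auprc(shap_scores, human_rationale))
print(auprc(gradient_scores, human_rationale))
```

Under these conventions the sketch gives 1.0 for the Partition SHAP scores and 0.25 for the Gradient scores, in line with the `auprc_plau` column of the table above.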

The token-level F1 score and the average Intersection-Over-Union consider discrete rationales.
We derive a discrete rationale by taking the top-k values; in this example, K is set to 1.*

**Token-level F1 score** is the F1 score derived from the token-level precision and recall.

**Intersection-Over-Union (IOU)** is the size of the overlap between the tokens covered by the two rationales, divided by the size of their union.

*When the set of human rationales for the dataset is available, K is set to the average rationale length (as in ERASER).
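The two discrete metrics can be sketched in a few lines. The helper names below are hypothetical and the code only illustrates the definitions given above: the explanation is discretized by keeping its top-k tokens, then compared with the human rationale as two binary masks.

```python
def top_k_mask(scores, k):
    """Binary mask selecting the k highest-scoring tokens."""
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(scores))]


def token_f1(predicted, gold):
    """F1 score from token-level precision and recall of two binary masks."""
    true_positive = sum(1 for p, g in zip(predicted, gold) if p and g)
    if true_positive == 0:
        return 0.0
    precision = true_positive / sum(predicted)
    recall = true_positive / sum(gold)
    return 2 * precision * recall / (precision + recall)


def token_iou(predicted, gold):
    """Size of the overlap of two binary masks divided by the size of their union."""
    intersection = sum(1 for p, g in zip(predicted, gold) if p and g)
    union = sum(1 for p, g in zip(predicted, gold) if p or g)
    return intersection / union if union else 0.0


human_rationale = [0, 1, 0, 0, 0]
shap_mask = top_k_mask([0.01, 0.42, 0.28, 0.22, 0.06], k=1)      # selects "love"
gradient_mask = top_k_mask([0.08, 0.17, 0.12, 0.35, 0.07], k=1)  # selects "style"

print(token_f1(shap_mask, human_rationale), token_iou(shap_mask, human_rationale))
print(token_f1(gradient_mask, human_rationale), token_iou(gradient_mask, human_rationale))
```

With K set to 1, an explainer scores 1.0 on both metrics only if its top token is the one marked by the human rationale, and 0.0 otherwise, which is the pattern visible in the table above.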

# Evaluating explainers on a supported XAI dataset

We can directly load a dataset with rationales using our Dataset API -- since we use Hugging Face's [datasets](https://huggingface.co/datasets), you will download the dataset just once and cache it 🚀

In [11]:
hatexdata = bench.load_dataset("hatexplain")

Reusing dataset hatexplain (/Users/eliana/.cache/huggingface/datasets/hatexplain/plain_text/1.0.0/df474d8d8667d89ef30649bf66e9c856ad8305bef4bc147e8e31cbdf1b8e0249)


Here we show an example of text and its human rationales.

In [12]:
hatexdata[2]["text"], hatexdata[2]["rationale"]

('<user> <user> why are you repeating yourself are you a little retarded',
 [[], [], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]])

We can compute and evaluate explanations for a subset of the dataset's samples.

By default, explanations and their evaluations are computed w.r.t. the predicted class. Alternatively, we can specify the target class via the `target` parameter.

In [13]:
# Compute and average evaluation scores over samples from one of the supported datasets
samples = np.arange(5)
sample_evaluations = bench.evaluate_samples(hatexdata, samples)


We can then visualize the evaluation results.

In [14]:
bench.show_samples_evaluation_table(sample_evaluations)

| Explainer | aopc_compr | aopc_suff | taucorr_loo | auprc_plau | token_f1_plau | token_iou_plau |
|---|---|---|---|---|---|---|
| Partition SHAP | 0.520906 | -0.049725 | 0.292903 | 0.803985 | 0.413793 | 0.26087 |
| LIME | 0.183729 | 0.060867 | 0.069774 | 0.76873 | 0.413793 | 0.26087 |
| Gradient | 0.412918 | 0.026478 | 0.114198 | 0.939195 | 0.482759 | 0.318182 |
| Gradient (x Input) | 0.070884 | 0.263768 | -0.041688 | 0.778192 | 0.413793 | 0.26087 |
| Integrated Gradient | 0.142813 | 0.128925 | 0.054109 | 0.668896 | 0.344828 | 0.208333 |
| Integrated Gradient (x Input) | 0.133697 | 0.255439 | 0.030192 | 0.824997 | 0.413793 | 0.26087 |


# Bonus!

There is more! You can:

- use *ferret*'s built-in explainers directly for fine-grained control over their *init* and *call* parameters (please refer to our [docs](https://ferret.readthedocs.io/en/latest/?version=latest) to learn more);
- compute individual faithfulness and plausibility metrics over explanations.

**Interface to individual explainers**

You can also use individual explainers through an object-oriented interface.

In [15]:
from ferret import SHAPExplainer, LIMEExplainer

In [16]:
exp = LIMEExplainer(model, tokenizer)
exp("hello my friend")

Explanation(text='hello my friend', tokens=['<s>', '▁hell', 'o', '▁my', '▁friend', '</s>'], scores=[0.050474912847888576, 0.03263839024009319, 0.014845006777639913, -0.0036407287599287508, -0.023999428967105964, 0.06192408403365266], explainer='LIME', target=1)

In [17]:
exp = SHAPExplainer(model, tokenizer)
exp("hello my friend")


Explanation(text='hello my friend', tokens=['<s>', '▁hell', 'o', '▁my', '▁friend', '</s>'], scores=array([ 0.        ,  0.00641441,  0.07140772, -0.01425122, -0.01147836,
        0.        ]), explainer='Partition SHAP', target=1)

In [18]:
exp = SHAPExplainer(model, tokenizer)
e = exp("I love your style!", target = 0)

In [19]:
bench.show_table([e])

| Token | ▁I | ▁love | ▁your | ▁style | ! |
|---|---|---|---|---|---|
| Partition SHAP | 0.0 | -0.13 | -0.08 | -0.09 | -0.02 |


You can also compute an individual evaluation measure.

In [20]:
from ferret import AOPC_Comprehensiveness_Evaluation
from ferret.evaluators import Model

aopc_compr_eval = AOPC_Comprehensiveness_Evaluation(Model(model), tokenizer)

In [21]:
aopc_compr_eval.compute_evaluation(e, target = 0)

Evaluation(name='aopc_compr', score=-0.0044906978)

In [22]:
ev = bench.evaluate_explanation(e, target = 0)
bench.show_evaluation_table([ev])

                                                                                

| Explainer | aopc_compr | aopc_suff | taucorr_loo |
|---|---|---|---|
| Partition SHAP | -0.0 | -0.32 | 0.4 |
