[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AkhilMunjaal/explainable-ai/blob/main/ferret_xai.ipynb)


# Introduction to *ferret*

Hi there! This notebook walks you through the basic functionality of *ferret*, using sentiment classification as the example task.

Specifically, you will see how to:

- load a model from the Hugging Face Hub into our `Benchmark` client interface;
- use the class to explain a text query using all the supported post-hoc feature attribution methods;
- visualize the explanations in tabular format;
- **evaluate** all the explanations with faithfulness and plausibility metrics.



In [1]:
!pip install ferret-xai




In [2]:
%set_env TOKENIZERS_PARALLELISM=FALSE

env: TOKENIZERS_PARALLELISM=FALSE


In [3]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from ferret import Benchmark
import numpy as np

For this tutorial, we will use the sentiment classification model __[`cardiffnlp/twitter-xlm-roberta-base-sentiment`](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)__.

It has three classes:
 - 0 - Negative
 - 1 - Neutral
 - 2 - Positive
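
These indices are what the `target` argument refers to in the cells that follow. As a minimal sketch of how raw classifier outputs relate to the three labels, the snippet below applies a softmax to a made-up logit vector and picks the most likely class. The `ID2LABEL` mapping and the logits are illustrative assumptions, not values read from the model.

```python
import math

# Assumed mapping, mirroring the class list above (not read from the model config).
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

def label_probabilities(logits):
    """Softmax over raw logits, returned as a label -> probability dict."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return {ID2LABEL[i]: e / total for i, e in enumerate(exps)}

logits = [-1.5, 0.1, 2.8]  # made-up logits for illustration
probs = label_probabilities(logits)

# Index of the most likely class; this is the kind of value passed as `target`.
target = max(ID2LABEL, key=lambda i: probs[ID2LABEL[i]])
print(probs, target)
```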

In [4]:
name = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

## Explain a single instance

The fastest way to get started with *ferret* is using the `Benchmark` interface class.

In [5]:
bench = Benchmark(model, tokenizer)

Extracting post-hoc explanations with all the supported methods and standard parameters is as easy as:

In [6]:
query = "I love your style!"
print(bench.score(query))
explanations = bench.explain(query, target=2)  # target=2 explains the "positive" class

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'negative': 0.012691131792962551, 'neutral': 0.061643872410058975, 'positive': 0.925665020942688}


Explainer:   0%|          | 0/6 [00:00<?, ?it/s]

Let's visualize the results.

 - Red -> higher contribution towards the prediction
 - Blue -> higher contribution away from the prediction

In [7]:
bench.show_table(explanations)


Token,▁I,▁love,▁your,▁style,!
Partition SHAP,0.01,0.42,0.28,0.22,0.06
LIME,0.04,0.39,0.05,0.18,0.34
Gradient,0.08,0.17,0.12,0.35,0.07
Gradient (x Input),-0.02,0.08,-0.32,0.22,0.08
Integrated Gradient,0.12,-0.04,-0.23,-0.16,0.27
Integrated Gradient (x Input),0.01,0.23,0.08,0.03,0.66
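
To read a row of this table programmatically, one option is to split the tokens by the sign of their score, which corresponds to the red and blue shading. The sketch below does this for the "Gradient (x Input)" row; `split_by_sign` is a hypothetical helper written for this illustration and is not part of *ferret*.

```python
def split_by_sign(tokens, scores):
    """Split tokens into those pushing towards and away from the target class."""
    pairs = list(zip(tokens, scores))
    # Strongest positive contributions first.
    supporting = sorted((p for p in pairs if p[1] > 0), key=lambda p: p[1], reverse=True)
    # Most negative contributions first.
    opposing = sorted((p for p in pairs if p[1] < 0), key=lambda p: p[1])
    return supporting, opposing

tokens = ["▁I", "▁love", "▁your", "▁style", "!"]
grad_x_input = [-0.02, 0.08, -0.32, 0.22, 0.08]  # "Gradient (x Input)" row above

supporting, opposing = split_by_sign(tokens, grad_x_input)
print("towards the prediction:", supporting)
print("away from the prediction:", opposing)
```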


## Evaluate explanation of a single instance

Evaluating explanations with all the supported evaluators is straightforward. Remember to set the `target` parameter to the same class you used when generating the explanations!

In [8]:
explanation_evaluations = bench.evaluate_explanations(explanations, target=2)

Explanation eval:   0%|          | 0/6 [00:00<?, ?it/s]

### Faithfulness

**AOPC Comprehensiveness**. Comprehensiveness measures the drop in the model's probability for the target class when the tokens the explanation marks as relevant are removed. Higher is better.

**AOPC Sufficiency**. Sufficiency measures whether the tokens the explanation marks as relevant are enough, on their own, for the model to make the same prediction. Lower is better.
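
The two definitions above can be sketched on a toy example. The "model" here is a hypothetical logistic score over hand-picked word weights (`TOY_WEIGHTS` and `toy_prob` are assumptions made for illustration), and the average is taken over k = 1..n removed or kept tokens. *ferret*'s own thresholds and token-removal strategy may differ, so the numbers are not meant to reproduce the evaluation table.

```python
import math

# Hypothetical stand-in for a classifier: a logistic score over made-up word weights.
TOY_WEIGHTS = {"love": 2.0, "style": 0.5, "!": 0.3}

def toy_prob(tokens):
    """Made-up 'positive' probability for a list of tokens."""
    score = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens) - 1.0
    return 1.0 / (1.0 + math.exp(-score))

def aopc(tokens, attributions, prob_fn, keep_top):
    """Average probability drop over k = 1..n.

    keep_top=False removes the k top-ranked tokens (comprehensiveness);
    keep_top=True keeps only the k top-ranked tokens (sufficiency).
    """
    full = prob_fn(tokens)
    order = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    drops = []
    for k in range(1, len(tokens) + 1):
        top = set(order[:k])
        perturbed = [t for i, t in enumerate(tokens) if (i in top) == keep_top]
        drops.append(full - prob_fn(perturbed))
    return sum(drops) / len(drops)

tokens = ["I", "love", "your", "style", "!"]
faithful = [0.01, 0.42, 0.28, 0.22, 0.06]    # ranks "love" first
unfaithful = [0.5, 0.01, 0.3, 0.02, 0.17]    # made-up scores that rank "love" last

compr_good = aopc(tokens, faithful, toy_prob, keep_top=False)
compr_bad = aopc(tokens, unfaithful, toy_prob, keep_top=False)
suff_good = aopc(tokens, faithful, toy_prob, keep_top=True)
suff_bad = aopc(tokens, unfaithful, toy_prob, keep_top=True)
print(compr_good, compr_bad, suff_good, suff_bad)
```

On this toy model, the attribution that ranks "love" first gets the higher comprehensiveness and the lower sufficiency, which is the pattern a faithful explanation is expected to show.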

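**Correlation with Leave-One-Out scores**. We first compute leave-one-out scores as the change in the prediction when one token at a time is omitted. We then measure the rank correlation between these scores and the explanation. The metric is reported as `taucorr_loo`, and the values in the tables below are consistent with Kendall's tau rather than Spearman's correlation.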
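
A minimal sketch of this metric, under stated assumptions: the classifier is replaced by a hypothetical logistic score over made-up word weights (`TOY_WEIGHTS`, `toy_prob`), and the rank correlation is a hand-written Kendall's tau (tau-a, with ties counted as neither concordant nor discordant), which the `taucorr_loo` column name suggests. In practice a library routine such as `scipy.stats.kendalltau` would be the usual choice.

```python
import math
from itertools import combinations

# Hypothetical stand-in for a classifier: logistic score over made-up word weights.
TOY_WEIGHTS = {"I": 0.1, "love": 2.0, "your": 0.4, "style": 0.5, "!": 0.3}

def toy_prob(tokens):
    """Made-up 'positive' probability for a list of tokens."""
    score = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens) - 1.0
    return 1.0 / (1.0 + math.exp(-score))

def leave_one_out(tokens, prob_fn):
    """Probability drop caused by omitting each token in turn."""
    full = prob_fn(tokens)
    return [full - prob_fn(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

tokens = ["I", "love", "your", "style", "!"]
attributions = [0.01, 0.42, 0.28, 0.22, 0.06]  # example attribution scores

loo_scores = leave_one_out(tokens, toy_prob)
tau = kendall_tau(attributions, loo_scores)
print(loo_scores, tau)
```

Because the toy model is not the real classifier, the resulting value is only an illustration of the computation and should not be compared with the evaluation table.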

In [9]:
bench.show_evaluation_table(explanation_evaluations)

Explainer,aopc_compr,aopc_suff,taucorr_loo
Partition SHAP,0.41,0.2,0.4
LIME,0.51,0.23,0.6
Gradient,0.43,0.22,0.2
Gradient (x Input),0.36,0.39,0.8
Integrated Gradient,0.06,0.57,0.2
Integrated Gradient (x Input),0.44,0.23,0.2


### Plausibility

We can also specify a human rationale and evaluate plausibility.

Plausibility evaluates how well the explanation agrees with the human rationale.


- **Area Under the Precision-Recall Curve (AUPRC)** is computed by sweeping a threshold over the token scores.
- **Token-level F1 score** is derived from the token-level precision and recall.
- **Intersection-Over-Union (IOU)** is the size of the overlap between the tokens covered by the explanation and those covered by the human rationale, divided by the size of their union.

For the token-level F1 and IOU metrics, we derive a discrete rationale from the explanation by taking its top-k values. In the example, k is set to 1.
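
The two discrete metrics can be sketched in a few lines. The helpers below (`top_k_mask`, `token_f1`, `token_iou`) are hypothetical names written for this illustration, not *ferret*'s API; they assume the explanation is discretized by keeping its k highest-scoring tokens. AUPRC is left out because it depends on how the precision-recall curve is integrated.

```python
def top_k_mask(scores, k):
    """Binary mask that marks the k highest-scoring tokens."""
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(scores))]

def token_f1(pred, gold):
    """Token-level F1 between a predicted mask and the human rationale."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    if tp == 0:
        return 0.0
    precision = tp / sum(pred)
    recall = tp / sum(gold)
    return 2 * precision * recall / (precision + recall)

def token_iou(pred, gold):
    """Overlap of the two token sets divided by the size of their union."""
    inter = sum(1 for p, g in zip(pred, gold) if p and g)
    union = sum(1 for p, g in zip(pred, gold) if p or g)
    return inter / union if union else 0.0

human_rationale = [0, 1, 0, 0, 0]        # only the second token is marked
shap = [0.01, 0.42, 0.28, 0.22, 0.06]    # top-1 token is the second one
gradient = [0.08, 0.17, 0.12, 0.35, 0.07]  # top-1 token is the fourth one

for name, scores in [("Partition SHAP", shap), ("Gradient", gradient)]:
    mask = top_k_mask(scores, k=1)
    print(name, token_f1(mask, human_rationale), token_iou(mask, human_rationale))
```

With k = 1, an explanation scores 1.0 on both metrics when its top token is the one marked by the human rationale, and 0.0 otherwise.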



In [10]:
query

'I love your style!'

In [11]:
bench.show_table(explanations)

Token,▁I,▁love,▁your,▁style,!
Partition SHAP,0.01,0.42,0.28,0.22,0.06
LIME,0.04,0.39,0.05,0.18,0.34
Gradient,0.08,0.17,0.12,0.35,0.07
Gradient (x Input),-0.02,0.08,-0.32,0.22,0.08
Integrated Gradient,0.12,-0.04,-0.23,-0.16,0.27
Integrated Gradient (x Input),0.01,0.23,0.08,0.03,0.66


In [12]:
explanation_evaluations = bench.evaluate_explanations(
    explanations,
    target=2,
    human_rationale=[0, 1, 0, 0, 0],  # one flag per token; only "▁love" is marked
    top_k_rationale=1,
)
bench.show_evaluation_table(explanation_evaluations)

Explanation eval:   0%|          | 0/6 [00:00<?, ?it/s]

Explainer,aopc_compr,aopc_suff,taucorr_loo,auprc_plau,token_f1_plau,token_iou_plau
Partition SHAP,0.41,0.2,0.4,1.0,1.0,1.0
LIME,0.51,0.23,0.6,1.0,1.0,1.0
Gradient,0.43,0.22,0.2,0.25,0.0,0.0
Gradient (x Input),0.36,0.39,0.8,0.25,0.0,0.0
Integrated Gradient,0.06,0.57,0.2,0.1,0.0,0.0
Integrated Gradient (x Input),0.44,0.23,0.2,0.25,0.0,0.0


In [13]:
query = "I hate your style!"

In [14]:
explanations = bench.explain(query, target=0)

Explainer:   0%|          | 0/6 [00:00<?, ?it/s]

In [15]:
bench.score(query)

{'negative': 0.9444717168807983,
 'neutral': 0.033811308443546295,
 'positive': 0.021716950461268425}

In [16]:
bench.show_table(explanations)


Token,▁I,▁hate,▁your,▁style,!
Partition SHAP,0.08,0.76,-0.03,-0.12,-0.01
LIME,0.06,0.81,0.03,-0.07,-0.02
Gradient,0.07,0.27,0.14,0.22,0.06
Gradient (x Input),0.04,-0.33,0.09,0.3,-0.06
Integrated Gradient,0.07,-0.16,-0.12,0.04,-0.03
Integrated Gradient (x Input),0.15,0.52,0.05,-0.09,-0.18


In [17]:
explanation_evaluations = bench.evaluate_explanations(
    explanations,
    target=0,
    human_rationale=[0, 1, 0, 0, 0],  # one flag per token; only "▁hate" is marked
    top_k_rationale=1,
)
bench.show_evaluation_table(explanation_evaluations)

Explanation eval:   0%|          | 0/6 [00:00<?, ?it/s]

Explainer,aopc_compr,aopc_suff,taucorr_loo,auprc_plau,token_f1_plau,token_iou_plau
Partition SHAP,0.85,0.24,0.8,1.0,1.0,1.0
LIME,0.82,0.16,1.0,1.0,1.0,1.0
Gradient,0.74,0.14,0.2,1.0,1.0,1.0
Gradient (x Input),0.07,0.68,-0.6,0.1,0.0,0.0
Integrated Gradient,0.02,0.7,-0.4,0.1,0.0,0.0
Integrated Gradient (x Input),0.82,0.16,0.8,1.0,1.0,1.0


In [18]:
query = "Great movie for a great nap!"

In [19]:
bench.score(query)

{'negative': 0.013735519722104073,
 'neutral': 0.06385015696287155,
 'positive': 0.9224143028259277}

In [20]:
explanations = bench.explain(query, target=2)
bench.show_table(explanations)


Explainer:   0%|          | 0/6 [00:00<?, ?it/s]

Token,▁Great,▁movie,▁for,▁a,▁great,▁nap,!
Partition SHAP,0.35,0.12,0.05,0.06,0.35,-0.0,0.05
LIME,0.24,0.09,0.09,0.03,0.27,-0.16,0.12
Gradient,0.12,0.17,0.06,0.04,0.14,0.23,0.05
Gradient (x Input),-0.11,-0.09,-0.08,0.03,0.03,0.11,-0.05
Integrated Gradient,0.09,-0.04,0.02,-0.2,0.25,-0.04,0.22
Integrated Gradient (x Input),0.02,-0.03,0.02,0.1,0.48,0.04,0.31


In [21]:
explanation_evaluations = bench.evaluate_explanations(
    explanations,
    target=2,
    human_rationale=[0, 0, 0, 0, 0, 1, 1],  # marks the last two tokens, "▁nap" and "!"
    top_k_rationale=3,
)
bench.show_evaluation_table(explanation_evaluations)

Explanation eval:   0%|          | 0/6 [00:00<?, ?it/s]

Explainer,aopc_compr,aopc_suff,taucorr_loo,auprc_plau,token_f1_plau,token_iou_plau
Partition SHAP,0.41,0.09,0.43,0.15,0.0,0.0
LIME,0.48,0.09,0.52,0.2,0.4,0.25
Gradient,0.34,0.21,0.05,0.63,0.4,0.25
Gradient (x Input),-0.01,0.44,-0.81,0.65,0.4,0.25
Integrated Gradient,0.32,0.15,0.24,0.26,0.4,0.25
Integrated Gradient (x Input),0.12,0.11,-0.24,0.33,0.4,0.25


In [22]:
explanations = bench.explain(query, target=0)
bench.show_table(explanations)


Explainer:   0%|          | 0/6 [00:00<?, ?it/s]

Token,▁Great,▁movie,▁for,▁a,▁great,▁nap,!
Partition SHAP,-0.28,-0.13,-0.08,-0.08,-0.32,-0.07,-0.04
LIME,-0.25,0.04,-0.09,0.06,-0.32,-0.0,-0.24
Gradient,0.13,0.15,0.06,0.04,0.14,0.24,0.06
Gradient (x Input),0.02,0.12,-0.12,-0.16,-0.1,0.17,0.03
Integrated Gradient,-0.02,0.06,0.01,-0.03,0.04,0.12,-0.01
Integrated Gradient (x Input),0.12,0.09,0.05,-0.03,-0.39,-0.09,-0.23


# Datasets for XAI evaluation

Human-annotated rationales and further examples of how to evaluate explanations are available in the __[official ferret repo](https://github.com/g8a9/ferret/blob/main/examples/thermostat_integration_example.ipynb)__.

