# Assess predictions on the Stanford Question Answering Dataset (SQuAD) with a Hugging Face question answering model

This notebook demonstrates how to use the `responsibleai` API, through the `responsibleai_text` package, to assess a Hugging Face question answering model on the SQuAD dataset (see https://huggingface.co/datasets/squad for more information about the dataset). It walks through the API calls needed to create a dashboard widget with model analysis insights, then guides a visual analysis of the model.

* [Launch Responsible AI Toolbox](#Launch-Responsible-AI-Toolbox)
    * [Load Model and Data](#Load-Model-and-Data)
    * [Create Model and Data Insights](#Create-Model-and-Data-Insights)

## Launch Responsible AI Toolbox

The following section walks through the code needed to load the data and the model, then uses the `responsibleai` API to generate insights that can be explored visually.

### Prepare

To run this notebook, we need to install the following packages:

```
raiutils
raiwidgets
datasets
transformers
responsibleai_text
torch
```

Run the following command to download the spaCy English pipeline (`en_core_web_sm`):

```bash
python -m spacy download en_core_web_sm
```

In [1]:
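# Update jupyter and ipywidgets
!pip install -U jupyter ipywidgets

# Install the required packages (run this if needed).
# Note: responsibleai 0.34.1 requires ipykernel<=6.8.0, so this step may
# downgrade the ipykernel version installed by the command above.
!pip install raiutils raiwidgets datasets transformers responsibleai_text torch

# Download the spaCy English pipeline
!python -m spacy download en_core_web_sm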

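Before continuing, it can help to confirm that everything installed above is importable. The sketch below is an illustrative addition, not part of the original notebook: it uses the standard library's `importlib.util.find_spec` to look up each package by its import name without importing it. The list assumes the import names match the pip names above, and adds `spacy` (needed for the download command) and `en_core_web_sm` (which `spacy download` installs as a package). The helper `missing_packages` is hypothetical.

```python
import importlib.util

# Assumed import names of the packages installed above, plus spaCy and the
# en_core_web_sm pipeline that `python -m spacy download` installs as a package.
REQUIRED_PACKAGES = [
    "raiutils", "raiwidgets", "datasets", "transformers",
    "responsibleai_text", "torch", "spacy", "en_core_web_sm",
]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names in `names` that the import system cannot find."""
    # find_spec locates a top-level module without importing it and
    # returns None when the module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, rerun the install cell before moving on.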

### Load Model and Data
*The following section can be skipped. It loads a dataset and a pretrained question answering model for illustrative purposes.*

First, import the necessary dependencies.

In [2]:
import datasets
import pandas as pd
from transformers import pipeline


Next, load the training split of the SQuAD dataset with the Hugging Face `datasets` library.

In [3]:
dataset = datasets.load_dataset("squad", split="train")
dataset

Dataset({
    features: ['id', 'title', 'context', 'question', 'answers'],
    num_rows: 87599
})
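In each SQuAD row, `answers` is a dict with two parallel lists: `text` holds the reference answer strings and `answer_start` holds the character offset of each answer inside `context`. The sketch below illustrates this on a hypothetical, hand-written record (it is not taken from the dataset), and the helper `answer_span_matches` is likewise hypothetical. It simply checks that each answer occurs at its stated offset.

```python
# A hypothetical record with the same shape as a SQuAD row (not real dataset content).
record = {
    "id": "example-0",
    "title": "Example",
    "context": "The library opened in 1901 and moved in 1958.",
    "question": "When did the library open?",
    "answers": {"text": ["1901"], "answer_start": [22]},
}


def answer_span_matches(row):
    """Check that every answer text appears at its answer_start character offset."""
    context = row["context"]
    answers = row["answers"]
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


print(answer_span_matches(record))  # True
```

The reformatting step below keeps only `answers['text'][0]`, so the offsets are dropped from the DataFrame.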

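Reformat the dataset into a pandas DataFrame with three columns: `context`, `questions`, and `answers`. Only the first reference answer for each question is kept, and the rows are then shuffled with a fixed random seed.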

In [4]:
questions = []
context = []
answers = []
for row in dataset:
    context.append(row['context'])
    questions.append(row['question'])
    answers.append(row['answers']['text'][0])
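The loop above can also be packaged as a small reusable function over any iterable of SQuAD-shaped dicts. This is a minimal sketch with a hypothetical name (`flatten_squad`) and hand-written toy rows; the second toy row has two reference answers to show that only the first one is kept, matching the cell above.

```python
def flatten_squad(rows):
    """Flatten SQuAD-shaped records into context/questions/answers columns.

    Only the first reference answer of each row is kept.
    """
    columns = {"context": [], "questions": [], "answers": []}
    for row in rows:
        columns["context"].append(row["context"])
        columns["questions"].append(row["question"])
        columns["answers"].append(row["answers"]["text"][0])
    return columns


# Hypothetical toy rows (not from SQuAD).
toy_rows = [
    {"context": "Paris is the capital of France.",
     "question": "What is the capital of France?",
     "answers": {"text": ["Paris"], "answer_start": [0]}},
    {"context": "Water boils at 100 degrees Celsius at sea level.",
     "question": "At what temperature does water boil?",
     "answers": {"text": ["100 degrees Celsius", "100"], "answer_start": [15, 15]}},
]

columns = flatten_squad(toy_rows)
print(columns["answers"])  # ['Paris', '100 degrees Celsius']
```

Because iterating over the loaded `dataset` yields dicts of the same shape, `pd.DataFrame(flatten_squad(dataset))` should produce the same unshuffled frame as the two cells here.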

In [5]:
data = pd.DataFrame({'context': context, 'questions': questions, 'answers': answers})
data = data.sample(frac=1.0, random_state=42).reset_index(drop=True)
data.head()

| | context | questions | answers |
|---|---|---|---|
| 0 | The world's first institution of technology or... | What year was the Banská Akadémia founded? | 1735 |
| 1 | The standard specifies how speed ratings shoul... | What is another speed that can also be reporte... | SOS-based speed |
| 2 | The most impressive and famous of Sumerian bui... | Where were the use of advanced materials and t... | Sumerian temples and palaces |
| 3 | Ann Arbor has a council-manager form of govern... | Who is elected every even numbered year? | mayor |
| 4 | Shortly before his death, when he was already ... | What was the purpose of top secret ICBM commit... | decide on the feasibility of building an ICBM ... |


Load a Hugging Face question answering pipeline, and use the first five rows of the shuffled data as the test set.

In [6]:
# load the question-answering model
pipeline_model = pipeline('question-answering')
test_size = 5

train_data = data
test_data = data[:test_size]

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


config.json:   0%|          | 0.00/473 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]
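The warning above notes that relying on the pipeline's default model is not recommended in production. One option, sketched below, is to pin the checkpoint the pipeline fell back to; the model name and revision are copied from that warning, while the helper `load_pinned_qa_pipeline` is hypothetical. It passes the `model` and `revision` arguments to `transformers.pipeline`. The import sits inside the function only so that the cell can be read without loading `transformers` up front.

```python
# Model name and revision copied from the warning printed above. Pinning them
# keeps results reproducible if the library's default checkpoint changes.
QA_MODEL_NAME = "distilbert/distilbert-base-cased-distilled-squad"
QA_MODEL_REVISION = "626af31"


def load_pinned_qa_pipeline(model=QA_MODEL_NAME, revision=QA_MODEL_REVISION):
    """Build a question-answering pipeline for an explicit model and revision."""
    from transformers import pipeline  # deferred import; requires transformers

    return pipeline("question-answering", model=model, revision=revision)
```

Usage: `pipeline_model = load_pinned_qa_pipeline()` could replace the unpinned `pipeline('question-answering')` call, and it should load the same checkpoint without printing the warning.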

Check one of the model's predictions against the reference answer:

In [7]:
def get_answer(dataset, idx):
    model_output = pipeline_model(question=dataset['questions'][idx], 
                                  context=dataset['context'][idx])
    pred = model_output['answer']
    return pred

def check_answer(dataset, idx):
    pred = get_answer(dataset, idx)
    print('Question  : ', dataset['questions'][idx])
    print('Answer    : ', dataset['answers'][idx])
    print('Predicted : ', pred)
    print('Correct   : ', pred == dataset['answers'][idx])

check_answer(test_data, 0)


Question  :  What year was the Banská Akadémia founded?
Answer    :  1735
Predicted :  1735
Correct   :  True
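`check_answer` compares strings exactly, so a prediction such as "the mayor" would count as wrong against the reference "mayor". The standard SQuAD v1.1 evaluation first normalizes both strings: it lowercases them, strips punctuation and the articles "a", "an", and "the", and collapses whitespace. It then reports exact match together with token-level F1. The following is a minimal stand-alone sketch of those two metrics. It is not part of `responsibleai` or `transformers`, and the function names are hypothetical.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, remove punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """True when the normalized prediction and reference are identical."""
    return normalize_answer(prediction) == normalize_answer(reference)


def f1_score(prediction, reference):
    """Token-overlap F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("the mayor", "Mayor"))       # True
print(f1_score("mayor of Ann Arbor", "mayor"))  # 0.4
```

With the cells above, `exact_match(get_answer(test_data, 0), test_data['answers'][0])` gives a more forgiving version of the `Correct` line.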


### Create Model and Data Insights

In [8]:
from responsibleai_text import RAITextInsights, ModelTask
from raiwidgets import ResponsibleAIDashboard

Dataset download attempt 1 of 4


To use the Responsible AI dashboard, initialize a `RAITextInsights` object, to which the different analysis components can then be added.

`RAITextInsights` accepts the model, the test dataset, the name of the target column (here `answers`), and the task type as its arguments.

In [9]:
rai_insights = RAITextInsights(pipeline_model, test_data, "answers",
                               task_type=ModelTask.QUESTION_ANSWERING)

5it [00:01,  4.86it/s]


Add the toolbox components to use for model assessment: error analysis and the model explainer.

In [10]:
rai_insights.error_analysis.add()
rai_insights.explainer.add()

Once all the desired components have been added, compute the insights on the test set.

In [11]:
rai_insights.compute()

PartitionExplainer explainer: 6it [04:37, 55.46s/it]

PartitionExplainer explainer: 6it [04:32, 54.56s/it]


Error Analysis
Current Status: Generating error analysis reports.
Current Status: Finished generating error analysis reports.
Time taken: 0.0 min 0.2229323060000752 sec
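In this run, the explainer is the slow step. The log above shows two explainer passes over the five test samples, finishing in 04:37 and 04:32, while error analysis took well under a second. Before raising `test_size`, a rough runtime estimate can be useful. The sketch below makes a strong simplifying assumption: runtime scales linearly with the number of samples. In practice, context length, hardware, and batching will also matter. The helper name is hypothetical.

```python
def estimate_explainer_minutes(n_samples, measured_seconds, measured_samples=5):
    """Linearly extrapolate explainer runtime (in minutes) from one measured run.

    Assumes runtime is proportional to the number of test samples, which is
    only a rough approximation.
    """
    seconds_per_sample = measured_seconds / measured_samples
    return n_samples * seconds_per_sample / 60


# Total time of the two explainer passes logged above, for 5 samples.
measured = (4 * 60 + 37) + (4 * 60 + 32)  # 549 seconds
print(f"{estimate_explainer_minutes(100, measured):.0f} min for 100 samples")
```

Under this assumption, 100 samples would take roughly three hours on the same machine, which is consistent with keeping the test set small in this notebook.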


Finally, visualize and explore the model insights, either in the resulting widget or by following the link to open the dashboard in a new tab.

In [12]:
ResponsibleAIDashboard(rai_insights)



ResponsibleAI started at http://localhost:8704


<raiwidgets.responsibleai_dashboard.ResponsibleAIDashboard at 0x7fbaba0163b0>