# Evidence classification for boolean questions

This notebook looks at evidence classification, the component of the TyDi pipeline that decides whether a boolean question should be answered `yes` or `no`.

## Preliminaries
We assume that the machine reading comprehension (MRC) and question type classifier components of the TyDi pipeline have already run, either through the integrated command line or through the step-by-step process, both described [here](../../examples/boolqa/README.md).

First, some setup. The classifier obtains its input from the `eval_predictions.json` file produced by the previous step.
Most of this setup is very similar to the setup for [mrc](../mrc/mrc.ipynb).

In [1]:
output_dir="out"
input_file="/dccstor/jsmc-nmt-01/bool/expts/toolkit/b/b21/qtc/eval_predictions.json"

from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    HfArgumentParser,
    Trainer,
    TrainingArguments)
from transformers.trainer_utils import set_seed
from oneqa.boolqa.processors.postprocessors.nway import NWayClassifierPostProcessor
from oneqa.boolqa.processors.preprocessors.default import NWayPreProcessor

from examples.boolqa.mrc2dataset  import create_dataset_from_run_mrc_output, create_dataset_from_json_str
import pandas as pd

seed = 42
set_seed(seed)

training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    do_train=False,
    do_eval=True,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    evaluation_strategy='no',
    learning_rate=4e-05,
    warmup_ratio=0.1,
    weight_decay=0.1,
    save_steps=50000,
    seed=seed,
)

In [2]:
# Load the config, tokenizer and model from the same checkpoint directory
model_name_or_path = '/dccstor/jsmc-nmt-01/bool/git/IOTA-boolean-challenge/model/evc-c5'

config = AutoConfig.from_pretrained(model_name_or_path, num_labels=3)

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, config=config)

In [3]:
label_list=['no', 'no_answer', 'yes']

postprocessor_class = NWayClassifierPostProcessor
postprocessor = postprocessor_class(
    k=10,       
    drop_label="no_answer",
    label_list = label_list,
    id_key='example_id',
    output_label_prefix='boolean_answer'
)

preprocessor_class = NWayPreProcessor
preprocessor = preprocessor_class(
    sentence1_key='question',
    sentence2_key='passage_answer_text',
    tokenizer=tokenizer,
    load_from_cache_file=False,
    max_seq_len=500,
    padding=False
)

## Inputs
Here we create a dataset from the input file. For illustrative purposes, we filter it to keep only the English questions that have been predicted to be boolean.

In [4]:
examples=create_dataset_from_run_mrc_output(input_file, unpack=False)
examples=examples.filter(lambda x:x['language']=='english' and x['question_type_pred']=='boolean')
eval_examples, eval_dataset = preprocessor.process_eval(examples)
eval_examples

Dataset({
    features: ['example_id', 'cls_score', 'start_logit', 'end_logit', 'span_answer', 'span_answer_score', 'start_index', 'end_index', 'passage_index', 'target_type_logits', 'span_answer_text', 'yes_no_answer', 'start_stdev', 'end_stdev', 'query_passage_similarity', 'normalized_span_answer_score', 'confidence_score', 'question', 'language', 'passage_answer_text', 'order', 'rank', 'question_type_pred', 'question_type_scores', 'question_type_conf'],
    num_rows: 112
})

## Run the predictions
As in MRC, a `Trainer` instance runs the predictions.

In [5]:
trainer = Trainer( 
    model=model,
    args=training_args,
    train_dataset=None,
    eval_dataset=eval_dataset,
    compute_metrics=None, #compute_metrics,
    tokenizer=tokenizer,
    data_collator=None,
)
predictions = trainer.predict(eval_dataset, metric_key_prefix="predict").predictions



The following columns in the test set  don't have a corresponding argument in `XLMRobertaForSequenceClassification.forward` and have been ignored: passage_answer_text, question, language, example_id. If passage_answer_text, question, language, example_id are not expected by `XLMRobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 112
  Batch size = 16
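
The preprocessor above is created with `padding=False`, and the `Trainer` is given a tokenizer but no `data_collator`. In that configuration, as far as we understand the `transformers` defaults, the `Trainer` falls back to `DataCollatorWithPadding`, which pads each batch only to the length of its longest sequence. The sketch below illustrates that idea in plain Python; `pad_batch`, the token ids, and the `pad_id` value are illustrative assumptions and not the library's implementation.

```python
def pad_batch(batch, pad_id=1):
    """Pad every sequence to the longest one in this batch.

    pad_id=1 is only an illustrative choice of padding token id.
    Returns the padded ids and a matching attention mask
    (1 for real tokens, 0 for padding).
    """
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return input_ids, attention_mask


# Hypothetical token ids for a batch of three question/passage pairs
batch = [[0, 11, 12, 2], [0, 21, 2], [0, 31, 32, 33, 34, 2]]
input_ids, attention_mask = pad_batch(batch)
print(input_ids)
print(attention_mask)
```

Padding per batch rather than to `max_seq_len` keeps short batches small, which can matter when most passages are much shorter than the 500-token limit.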


## Predictions

The pretrained model we provide is actually a ternary model: it predicts `no`, `no_answer`, or `yes`. The `no_answer` prediction is discarded in pipelines that end with an actual TyDi evaluation, since the TyDi evaluation script selects the score threshold that distinguishes answerable from unanswerable questions. However, other applications may want to make use of this category.

In [6]:
pd.DataFrame.from_records(predictions[0:5,:])

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | -5.042522 | 3.994114 | 1.586288 |
| 1 | -5.808026 | 0.21062 | 5.49601 |
| 2 | 5.695564 | -1.510027 | -3.645893 |
| 3 | 0.671103 | -3.745991 | 2.243662 |
| 4 | -3.872703 | 6.205071 | -1.955134 |
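
The three columns of logits follow the order of `label_list` (`no`, `no_answer`, `yes`). The sketch below shows one way to read such a row: attach label names, convert the logits to probabilities with a softmax, and pick a `yes`/`no` answer once `no_answer` is set aside. It is a minimal illustration under the assumption that dropping a label simply means ignoring its score; `label_scores`, `softmax`, and `binary_decision` are hypothetical helpers, not the code of `NWayClassifierPostProcessor`.

```python
import math

# Label order taken from label_list in the notebook:
# column i of the logits corresponds to label_list[i].
label_list = ['no', 'no_answer', 'yes']


def label_scores(logits):
    """Pair each logit with its label name."""
    return dict(zip(label_list, logits))


def softmax(scores):
    """Convert a {label: logit} dict into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def binary_decision(scores, drop_label='no_answer'):
    """Hypothetical helper: ignore drop_label, return the best remaining label."""
    kept = {k: v for k, v in scores.items() if k != drop_label}
    return max(kept, key=kept.get)


# Row 4 of the logits shown above
scores = label_scores([-3.872703, 6.205071, -1.955134])
print(softmax(scores))
print(max(scores, key=scores.get))  # best label over all three classes
print(binary_decision(scores))      # best label once no_answer is set aside
```

For this row the highest-scoring class overall is `no_answer`, but `yes` scores higher than `no`, so a binary decision yields `yes`. This matches the Magellanic Cloud example in the Questions and answers section, which has the same scores and a `boolean_answer_pred` of `yes`.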


In [7]:
eval_preds = postprocessor.process_references_and_predictions(eval_examples, eval_dataset, predictions)
eval_preds_ds = create_dataset_from_json_str(eval_preds.predictions, False)
print(eval_preds_ds)

in process_references_and_predictions
Dataset({
    features: ['example_id', 'cls_score', 'start_logit', 'end_logit', 'span_answer', 'span_answer_score', 'start_index', 'end_index', 'passage_index', 'target_type_logits', 'span_answer_text', 'yes_no_answer', 'start_stdev', 'end_stdev', 'query_passage_similarity', 'normalized_span_answer_score', 'confidence_score', 'question', 'language', 'passage_answer_text', 'order', 'rank', 'question_type_pred', 'question_type_scores', 'question_type_conf', 'boolean_answer_pred', 'boolean_answer_scores', 'boolean_answer_conf'],
    num_rows: 112
})


## Questions and answers

Here we display some questions that have been identified as boolean, together with their predicted answers, based on the output of the MRC system. A weakness of the TyDi QA dataset is that most (85%) of the boolean questions have an answer of `yes`; apparently the question writers wrote questions seeking confirmation of what they already knew or suspected.

In [8]:
from numpy.random import permutation
from IPython.display import display, HTML

# Based on https://github.com/huggingface/notebooks/blob/main/examples/question_answering.ipynb
def show_balanced_examples(dataset, perm, groups, nrows, maxchars, cols):
    """Display up to `nrows` shuffled examples for each value of `groups`."""
    df = pd.DataFrame(dataset)
    dfp = df.iloc[perm]  # shuffle rows using the given permutation
    dfg = dfp.groupby(groups)
    # Copy so that the truncation below does not write into a slice of dfp
    df_todisplay = dfg.head(nrows)[cols].copy()
    if 'passage_answer_text' in cols:
        # Truncate long passages for display
        df_todisplay['passage_answer_text'] = df_todisplay['passage_answer_text'].str.slice(0, maxchars) + '...'
    display(HTML(df_todisplay.to_html()))
    
    

english_boolean_eval_examples = eval_preds_ds.filter(lambda x:x['language']=='english' and x['question_type_pred']=='boolean')
random_idxs = permutation(len(english_boolean_eval_examples))
cols=['example_id','question','passage_answer_text', 'boolean_answer_pred', 'boolean_answer_scores']
show_balanced_examples(english_boolean_eval_examples, random_idxs, 'boolean_answer_pred', 5, 300, cols)


|   | example_id | question | passage_answer_text | boolean_answer_pred | boolean_answer_scores |
|---|---|---|---|---|---|
| 40 | 98f388c7-979b-4e7b-9ed3-c8f5d3c0d413 | Does an animal with vertebrae have to be a chordate? | The Chordata and Ambulacraria together form the superphylum Deuterostomia. Chordates are divided into three subphyla: Vertebrata (fish, amphibians, reptiles, birds, and mammals); Tunicata (salps and sea squirts); and Cephalochordata (which includes lancelets). There are also extinct taxa such as the... | yes | {'no': -0.42267709970474243, 'no_answer': -3.2861926555633545, 'yes': 3.6643331050872803} |
| 65 | 4378e5d8-fee1-4fee-be66-f2b0fe5f61e9 | Does California get snow? | The high mountains, including the Sierra Nevada, the Cascade Range, and the Klamath Mountains, have a mountain climate with snow in winter and mild to moderate heat in summer. Ski resorts at Lake Tahoe, Mammoth Lakes, and Mount Shasta routinely receive over 10 feet (3.0m) of snow in a season, and so... | yes | {'no': -5.855449676513672, 'no_answer': -0.5776659846305847, 'yes': 5.964153289794922} |
| 4 | 3a8ee247-2976-43cf-8a04-8cd7970ee323 | Does the Magellanic Cloud system have a super massive black hole? | The Large Magellanic Cloud and its neighbour and relative, the Small Magellanic Cloud, are conspicuous objects in the southern hemisphere, looking like separated pieces of the Milky Way to the naked eye. Roughly 21° apart in the night sky, the true distance between them is roughly 75,000 light-years... | yes | {'no': -3.8727025985717773, 'no_answer': 6.205071449279785, 'yes': -1.955134391784668} |
| 47 | 466eb35c-762b-4d75-84d1-b3434b512555 | Is the great horned owl endangered? | Frequently, the species were denominated a pest due to the perceived threat it posed to domestic fowl and potentially small game. The first genuine nature conservationists, while campaigning against the "Extermination Being Waged Against the Hawks and Owls", continued to advocate the destruction of ... | yes | {'no': -5.3778839111328125, 'no_answer': -1.124822735786438, 'yes': 6.441001892089844} |
| 42 | 3a1cd42c-7bfe-47b5-8288-ac7df0213b8d | Is the Mauser C96 produced today? | The Mauser C96 (Construktion 96)[4] is a semi-automatic pistol that was originally produced by German arms manufacturer Mauser from 1896 to 1937.[5] Unlicensed copies of the gun were also manufactured in Spain and China in the first half of the 20th century.[5][6]... | no | {'no': 4.936239719390869, 'no_answer': 0.35698825120925903, 'yes': -4.551218032836914} |
| 69 | d9d4a59f-01ea-4c67-8f9c-58c679ab6412 | Can the central nervous system heal itself? | Nervous system injuries affect over 90,000 people every year.[2] It is estimated that spinal cord injuries alone affect 10,000 each year.[3] As a result of this high incidence of neurological injuries, nerve regeneration and repair, a subfield of neural tissue engineering, is becoming a rapidly grow... | no | {'no': 1.9971495866775513, 'no_answer': -1.7380597591400146, 'yes': -0.20945601165294647} |
| 26 | dbcd04df-cb7b-47e0-8ae0-2f8d9f294061 | Is Daydream Software still an active company? | The Israeli company Waze Mobile developed the Waze software. Ehud Shabtai, Amir Shinar and Uri Levine founded the company. Two Israeli venture capital firms, Magma and Vertex, and an early-stage American venture capital firm, Bluerun Ventures, provided funding. Google acquired Waze Mobile in 2013.... | yes | {'no': -4.1507720947265625, 'no_answer': 6.917901992797852, 'yes': -2.0596728324890137} |
| 10 | 41471b45-8b0f-4a83-a91d-19f388a544cb | Is Cantonese written the same as Mandarin? | Written Cantonese is the written form of Cantonese, the most complete written form of Chinese after that for Mandarin Chinese and Classical Chinese. Written Chinese was originally developed for Classical Chinese, and was the main literary language of China until the 19th century. Written vernacular... | no | {'no': 4.523313522338867, 'no_answer': -5.283786773681641, 'yes': 0.3928956389427185} |
| 44 | 70b2ef92-450b-4074-978f-01204487ea27 | Does the KGB still exist? | The agency was a military service governed by army laws and regulations, in the same fashion as the Soviet Army or MVD Internal Troops. While most of the KGB archives remain classified, two online documentary sources are available.[1][2] Its main functions were foreign intelligence, counter-intellig... | no | {'no': 2.16046142578125, 'no_answer': 0.9643279314041138, 'yes': -2.6827592849731445} |
| 53 | 94276bb1-d708-4abc-8da8-616877e9996a | Is there a cure for juvenile rheumatoid arthritis? | JIA is best treated by a multidisciplinary team. The major emphasis of treatment for JIA is to help the child regain normal level of physical and social activities. This is accomplished with the use of physical therapy, pain management strategies, and social support.[28] Another emphasis of treatmen... | no | {'no': 5.612573623657227, 'no_answer': -0.8928341269493103, 'yes': -3.8825323581695557} |
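
The display above is deliberately balanced across predicted answers, so it says nothing about how skewed the predictions are overall. To check that for a given set of predictions, one could tally the `boolean_answer_pred` field. The sketch below does this with the standard library; `label_distribution` is a hypothetical helper and `records` holds made-up rows standing in for the entries of `eval_preds_ds`.

```python
from collections import Counter


def label_distribution(rows, key='boolean_answer_pred'):
    """Return the fraction of rows carrying each predicted label."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical records; in the notebook one would iterate over eval_preds_ds
records = [
    {'example_id': 'a', 'boolean_answer_pred': 'yes'},
    {'example_id': 'b', 'boolean_answer_pred': 'yes'},
    {'example_id': 'c', 'boolean_answer_pred': 'no'},
    {'example_id': 'd', 'boolean_answer_pred': 'yes'},
]
print(label_distribution(records))
```

Comparing this distribution with the share of `yes` answers in the gold labels gives a quick sense of whether the classifier simply mirrors the dataset's bias.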
