# Extractive Approach for Open Question Answering

This Jupyter notebook evaluates the performance of extractive question-answering transformer models on the Pirá dataset after the retriever step.

Extractive models produce the answer as a span of the supporting text.

The full repository is available on GitHub: https://github.com/C4AI/Pira

## Imports

In [None]:
# Standard library
import string
from collections import Counter

# Third-party
import pandas as pd
from transformers import pipeline

## Dataset information

Run the BM25.ipynb notebook first so that the retrieved supporting texts are saved.

In [1]:
PATH_BASE = 'finetune_PT_PT_100Words_5Passages/'

## Loading Dataset

In [None]:
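# Each row holds (question, reference answer, retrieved supporting text).
pira_dataset = pd.read_csv(PATH_BASE + "extractive.csv", index_col=0).values.tolist()

# Cast every field to str before passing it to the model.
quest = [[str(line[0]), str(line[1]), str(line[2])] for line in pira_dataset]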
    


## Initializing the model

Initialize the extractive QA model from Hugging Face.

In [2]:
qa_pipeline = pipeline(
    "question-answering",
    model="pierreguillou/bert-base-cased-squad-v1.1-portuguese",
    tokenizer="pierreguillou/bert-base-cased-squad-v1.1-portuguese"
)

# Quick sanity check on a single question/context pair.
predictions = qa_pipeline({
    'context': "The game was played on February 7, 2016 at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.",
    'question': "What day was the game played on?"
})

print(predictions)

{'score': 0.5075850486755371, 'start': 23, 'end': 39, 'answer': 'February 7, 2016'}
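
The `start` and `end` fields in the output above are character offsets into `context`, which is what makes the model extractive. As a minimal check that needs no model, the sketch below slices the context with the offsets copied from that output; the variable name `span` is only illustrative.

```python
# Context and prediction copied from the sanity check above.
context = ("The game was played on February 7, 2016 at Levi's Stadium "
           "in the San Francisco Bay Area at Santa Clara, California.")
prediction = {'score': 0.5075850486755371, 'start': 23, 'end': 39,
              'answer': 'February 7, 2016'}

# Slicing the context with the predicted offsets recovers the answer text.
span = context[prediction['start']:prediction['end']]
print(span)  # February 7, 2016
print(span == prediction['answer'])  # True
```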


## Generating each answer

In [None]:
true_answers = []
gen_answers = []
passages = []
questions = []
for i, (question, answer, passage) in enumerate(quest):
    print(i)  # progress indicator
    predictions = qa_pipeline({
        'context': passage,
        'question': question
    })
    passages.append(passage)
    questions.append(question)
    gen_answers.append(str(predictions["answer"]))
    # evaluate() expects a list of reference answers per question.
    true_answers.append([answer])




## Evaluation script

SQuAD evaluation script: https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py 

The script is slightly modified for this notebook: articles are not removed during normalization, so that answers are treated consistently in both Portuguese and English.
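
To make this modification concrete, the sketch below contrasts the article-removal step of the original SQuAD v1.1 script with a normalization that keeps articles. The function names `normalize_squad` and `normalize_kept` and the Portuguese sentence are illustrative assumptions, not part of the original notebook. One plausible side effect of the English-only rule is that it also deletes the Portuguese word "a".

```python
import re
import string


def _lower_and_strip_punctuation(text):
    """Lowercase the text and drop ASCII punctuation."""
    exclude = set(string.punctuation)
    return ''.join(ch for ch in text.lower() if ch not in exclude)


def normalize_squad(text):
    """SQuAD v1.1 style: also removes the English articles a/an/the."""
    text = _lower_and_strip_punctuation(text)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())


def normalize_kept(text):
    """Variant that keeps articles, as described for this notebook."""
    return ' '.join(_lower_and_strip_punctuation(text).split())


# Made-up Portuguese sentence, used only for illustration.
sentence = "A plataforma opera a 100 km da costa."
print(normalize_squad(sentence))  # plataforma opera 100 km da costa
print(normalize_kept(sentence))   # a plataforma opera a 100 km da costa
```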

In [None]:
def normalize_answer(s):
    """Lower text and remove punctuation and extra whitespace."""

    def white_space_fix(text):
        return ' '.join(text.split())

    def remove_punc(text):
        exclude = set(string.punctuation)
        return ''.join(ch for ch in text if ch not in exclude)

    def lower(text):
        return text.lower()

    return white_space_fix(remove_punc(lower(s)))


def f1_score(prediction, ground_truth):
    prediction_tokens = normalize_answer(prediction).split()
    ground_truth_tokens = normalize_answer(ground_truth).split()
    common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0
    precision = 1.0 * num_same / len(prediction_tokens)
    recall = 1.0 * num_same / len(ground_truth_tokens)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1


def exact_match_score(prediction, ground_truth):
    return (normalize_answer(prediction) == normalize_answer(ground_truth))


def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
    scores_for_ground_truths = []
    for ground_truth in ground_truths:
        score = metric_fn(prediction, ground_truth)
        scores_for_ground_truths.append(score)
    return max(scores_for_ground_truths)


def evaluate(gold_answers, predictions):
    f1 = exact_match = total = 0

    for ground_truths, prediction in zip(gold_answers, predictions):
        total += 1
        exact_match += metric_max_over_ground_truths(
            exact_match_score, prediction, ground_truths)
        f1 += metric_max_over_ground_truths(
            f1_score, prediction, ground_truths)

    exact_match = 100.0 * exact_match / total
    f1 = 100.0 * f1 / total

    return {'exact_match': exact_match, 'f1': f1}
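
As a worked example of the token-level F1 computed above, the sketch below repeats the same arithmetic on one made-up prediction and reference pair (both strings are illustrative and already normalized). The prediction misses one reference token, so precision is 1.0, recall is 0.75, and exact match would be 0 for this pair.

```python
from collections import Counter

# Made-up, already-normalized prediction and reference answer.
prediction_tokens = "bacia de campos".split()
ground_truth_tokens = "na bacia de campos".split()

# Multiset intersection counts tokens shared by both answers.
common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
num_same = sum(common.values())

precision = num_same / len(prediction_tokens)   # 3 / 3
recall = num_same / len(ground_truth_tokens)    # 3 / 4
f1 = 2 * precision * recall / (precision + recall)

print(num_same, precision, recall)  # 3 1.0 0.75
print(round(f1, 4))                 # 0.8571
```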

## Performing Evaluation

In [None]:
evaluate(true_answers, gen_answers)