# Extractive Approach for Machine Reading Comprehension (MRC)

This Jupyter notebook evaluates extractive question-answering transformer models on the Pirá dataset.

Extractive models produce the answer as a span of the supporting text.

The full project is available on GitHub: https://github.com/C4AI/Pira

## Imports

In [None]:
# Standard library
from collections import Counter
import string

# Third-party
import pandas as pd
from transformers import pipeline

## Dataset information

Here we set the values needed to load the dataset.

PATH_BASE -> Dataset path


SUPPORTING_TEXT_COLUMN -> Indicates the Supporting Text Column. Use "10" for English or "-2" for Portuguese.

ANSWER_COLUMN -> Indicates the Answer Column. Use "6" for English or "7" for Portuguese.

QUESTION_COLUMN -> Indicates the Question Column. Use "2" for English or "3" for Portuguese.
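The column indices listed above can be grouped per language so that switching languages is a one-line change. This is only a sketch: the `COLUMNS` dictionary and the `LANGUAGE` variable are hypothetical names that do not appear in the notebook, and the indices are the ones given above.

```python
# Hypothetical helper: column indices per language, copied from the list above.
COLUMNS = {
    "en": {"supporting_text": 10, "answer": 6, "question": 2},
    "pt": {"supporting_text": -2, "answer": 7, "question": 3},
}

LANGUAGE = "pt"  # assumed switch; set to "en" for the English columns

SUPPORTING_TEXT_COLUMN = COLUMNS[LANGUAGE]["supporting_text"]
ANSWER_COLUMN = COLUMNS[LANGUAGE]["answer"]
QUESTION_COLUMN = COLUMNS[LANGUAGE]["question"]
```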


In [11]:
PATH_BASE = './Data/test.csv'

SUPPORTING_TEXT_COLUMN = -2
ANSWER_COLUMN = 7
QUESTION_COLUMN = 3



## Loading Dataset

In [12]:
pira_dataset = pd.read_csv(PATH_BASE).values.tolist()
    
quest = []
for line in pira_dataset:
    quest.append([str(line[QUESTION_COLUMN]), str(line[ANSWER_COLUMN]), str(line[SUPPORTING_TEXT_COLUMN])])
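Because every field is wrapped in `str(...)`, an empty CSV cell (which pandas reads as a floating-point NaN) does not raise an error; it silently becomes the string `'nan'`. The sketch below illustrates this on toy rows. The rows, the index names `Q`, `A`, `T`, and the `missing` list are made up for illustration and are not part of the dataset.

```python
# Toy rows standing in for pd.read_csv(...).values.tolist().
rows = [
    [0, "Q1", "A1", "T1"],
    [1, "Q2", float("nan"), "T2"],  # empty answer cell, as pandas would load it
]

Q, A, T = 1, 2, 3  # hypothetical column indices for the toy rows

# Same construction as the loading loop: [question, answer, supporting text].
triples = [[str(r[Q]), str(r[A]), str(r[T])] for r in rows]

# Indices of triples where a missing cell was turned into the string 'nan'.
missing = [i for i, t in enumerate(triples) if "nan" in t]
print(missing)
```

A check like this can be run once after loading to decide whether such rows should be dropped before evaluation.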
    


In [13]:
quest[0]

['O que permitiu a descoberta de novas reservas de petróleo e gás distantes da costa nos últimos 10 anos?',
 'Avanço tecnológico',
 'Os adiantamentos no conhecimento e capacidade nova exploração e desenvolvimento em áreas offshore continuam sendo uma importante fonte de aumento da produção global de petróleo e gás. Os avanços tecnológicos na última década incentivaram a exploração nas águas profundas e ultradeeste a mais longe da costa e permitiram a descoberta de novas reservas significativas. As capacidades de profundidade da água para a exploração offshore aumentaram de cerca de 3.050 m a mais de 3.350 m entre 2010 e 2018, enquanto a capacidade de produção utilizando plataformas flutuantes atingiu quase 2.900 m em 2018, de 2.438 m em 2010 (Barton e outros, 2019). Tais avanços tecnológicos permitiram em parte a expansão do setor offshore de petróleo e gás para novas regiões, incluindo o Mediterrâneo oriental e as áreas da costa da Guiana. Também houve avanços na compreensão dos poten

## Initializing the model

Initializing the Extractive QA model from HuggingFace

In [2]:
model_name = "pierreguillou/bert-base-cased-squad-v1.1-portuguese"


qa_pipeline = pipeline(
    "question-answering",
    model = model_name,
    tokenizer = model_name
)



predictions = qa_pipeline({
    'context': "The game was played on February 7, 2016 at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.",
    'question': "What day was the game played on?"
})

print(predictions)

{'score': 0.5075850486755371, 'start': 23, 'end': 39, 'answer': 'February 7, 2016'}
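The `start` and `end` values in this output are character offsets into the context, which is what makes the model extractive: the answer is a slice of the supporting text. A short check, reusing the context string and the output shown above:

```python
context = ("The game was played on February 7, 2016 at Levi's Stadium "
           "in the San Francisco Bay Area at Santa Clara, California.")

# Output of the sanity check above, copied verbatim.
prediction = {'score': 0.5075850486755371, 'start': 23, 'end': 39,
              'answer': 'February 7, 2016'}

# Slicing the context with the returned offsets recovers the answer.
span = context[prediction['start']:prediction['end']]
print(span)
```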


## Generating each answer

In [14]:
true_answers = []
gen_answers = []
passages = []
questions = []
for i in range(len(quest)):
    print(i)  # progress indicator
    predictions = qa_pipeline({
        'context': quest[i][2],
        'question': quest[i][0]
    })
    passages.append(quest[i][2])
    questions.append(quest[i][0])
    gen_answers.append(str(predictions["answer"]))
    # Wrapped in a list because the evaluation accepts several gold answers.
    true_answers.append([quest[i][1]])




0
1
2
...
226
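The generation loop can also be written as a function that accepts any callable with the same input and output shape as the pipeline, which makes it possible to try the loop without downloading a model. This is a sketch under that assumption: `answer_all` and `fake_qa` are hypothetical names, and `fake_qa` is a stand-in that simply returns the first word of the context.

```python
def answer_all(qa_fn, triples):
    """Run qa_fn over [question, answer, context] triples.

    qa_fn is any callable that takes a dict with 'question' and 'context'
    and returns a dict with an 'answer' key.
    """
    true_answers, gen_answers = [], []
    for question, answer, context in triples:
        prediction = qa_fn({'context': context, 'question': question})
        gen_answers.append(str(prediction['answer']))
        true_answers.append([answer])  # one gold answer per question
    return true_answers, gen_answers


# Hypothetical stand-in for the real pipeline, for illustration only.
def fake_qa(inputs):
    return {'answer': inputs['context'].split()[0]}


true_answers, gen_answers = answer_all(
    fake_qa, [["Which colour?", "Blue", "Blue is the colour of the sea."]])
print(true_answers, gen_answers)
```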


## Evaluation script

SQuAD evaluation script: https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py 

Modified slightly for this notebook: articles are not removed during normalization, so that answers are treated the same way in Portuguese and in English.
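To illustrate the effect of this change, the sketch below compares normalization with and without the English article-removal step of the original SQuAD v1.1 script. The `normalize` function and its `remove_articles` flag are hypothetical names written for this comparison; they are not the notebook's `normalize_answer`.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize(text, remove_articles=False):
    """Lowercase, strip punctuation, optionally drop English articles."""
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in PUNCTUATION)
    if remove_articles:
        # English-only step from the original SQuAD v1.1 script.
        text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())


prediction, gold = "The technological advances", "technological advances"
match_with_removal = normalize(prediction, True) == normalize(gold, True)
match_without_removal = normalize(prediction) == normalize(gold)

# On Portuguese text the English rule is inconsistent: it strips "a"
# but leaves other Portuguese articles such as "o" untouched.
pt_a = normalize("a costa", True)
pt_o = normalize("o petróleo", True)
```

Here the two English strings match only when articles are removed, while the Portuguese strings show why the English rule does not carry over cleanly to both languages.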

In [2]:
def normalize_answer(s):
    """Lower text and remove punctuation and extra whitespace."""

    def white_space_fix(text):
        return ' '.join(text.split())

    def remove_punc(text):
        exclude = set(string.punctuation)
        return ''.join(ch for ch in text if ch not in exclude)

    def lower(text):
        return text.lower()

    return white_space_fix(remove_punc(lower(s)))


def f1_score(prediction, ground_truth):
    prediction_tokens = normalize_answer(prediction).split()
    ground_truth_tokens = normalize_answer(ground_truth).split()
    common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0
    precision = 1.0 * num_same / len(prediction_tokens)
    recall = 1.0 * num_same / len(ground_truth_tokens)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1


def exact_match_score(prediction, ground_truth):
    return (normalize_answer(prediction) == normalize_answer(ground_truth))


def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
    scores_for_ground_truths = []
    for ground_truth in ground_truths:
        score = metric_fn(prediction, ground_truth)
        scores_for_ground_truths.append(score)
    return max(scores_for_ground_truths)


def evaluate(gold_answers, predictions):
    f1 = exact_match = total = 0

    for ground_truths, prediction in zip(gold_answers, predictions):
        total += 1
        exact_match += metric_max_over_ground_truths(
            exact_match_score, prediction, ground_truths)
        f1 += metric_max_over_ground_truths(
            f1_score, prediction, ground_truths)

    exact_match = 100.0 * exact_match / total
    f1 = 100.0 * f1 / total

    return {'exact_match': exact_match, 'f1': f1}
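A small worked example may help in reading the two metrics. The block below re-declares compact versions of `normalize_answer` and `f1_score` with the same logic as above so that it runs on its own; the example strings are made up for illustration.

```python
from collections import Counter
import string


def normalize_answer(s):
    """Lowercase, remove punctuation, collapse whitespace."""
    s = ''.join(ch for ch in s.lower() if ch not in string.punctuation)
    return ' '.join(s.split())


def f1_score(prediction, ground_truth):
    pred = normalize_answer(prediction).split()
    gold = normalize_answer(ground_truth).split()
    num_same = sum((Counter(pred) & Counter(gold)).values())
    if num_same == 0:
        return 0
    precision = num_same / len(pred)
    recall = num_same / len(gold)
    return 2 * precision * recall / (precision + recall)


# Two of three predicted tokens are correct: precision 2/3, recall 1.
partial = f1_score("Os avanços tecnológicos", "avanços tecnológicos")

# Case and punctuation differences do not prevent an exact match.
exact = normalize_answer("Avanço tecnológico.") == normalize_answer("avanço tecnológico")

# Singular and plural forms share no tokens, so F1 is 0.
plural = f1_score("avanços tecnológicos", "Avanço tecnológico")
```

The last case shows that token-level F1 compares surface forms only, so inflected variants of the gold answer receive no credit.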

## Performing Evaluation

In [16]:
evaluate(true_answers, gen_answers)

{'exact_match': 4.405286343612334, 'f1': 37.531883499754755}
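The exact-match percentage can be converted back into a count of questions. The sketch assumes 227 test examples, which is inferred from the indices 0 to 226 printed by the generation loop; `n_examples` and `n_exact` are names introduced here.

```python
n_examples = 227  # assumption: indices 0..226 printed during generation
exact_match_pct = 4.405286343612334  # value reported above

# Number of questions whose normalized prediction equals the gold answer.
n_exact = round(exact_match_pct / 100 * n_examples)
print(n_exact)
```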