<h1><center>BiDAF Model</center></h1>

Installing the **allennlp** and **allennlp-models** packages required for the BiDAF model (uncomment the line below if they are not already installed).

In [5]:
#!pip install allennlp==1.0.0 allennlp-models==1.0.0

In [None]:
from allennlp.predictors.predictor import Predictor
import json
import re
import string
import pandas as pd

Fetching the pretrained BiDAF model archive.

In [None]:
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bidaf-model-2020.03.19.tar.gz")

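Function for normalizing text before scoring: it lowercases the text, strips punctuation, removes the articles a/an/the, and collapses whitespace.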

In [None]:
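def norm_text(text):
    # Lowercase so the comparison is case-insensitive.
    text = text.lower()
    # Drop ASCII punctuation characters.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Replace the articles "a", "an", and "the" with a space.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace into single spaces.
    text = " ".join(text.split())
    return text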
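
As a quick sanity check, the sketch below applies the same normalization steps to two sample strings. The function is restated under the hypothetical name `normalize_answer`, together with the `re` and `string` imports it relies on, so that the snippet runs on its own.

```python
import re
import string

def normalize_answer(text):
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

print(normalize_answer("The Ministry of Magic!"))  # ministry of magic
print(normalize_answer("  A dark   wizard. "))     # dark wizard
```

Because punctuation is removed before the comparison, a prediction such as "Bloomsbury." still counts as an exact match for "Bloomsbury".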

Token-level F1-score function, used as an evaluation metric alongside exact match.

In [None]:
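def f1(prediction, answer):
    # Both inputs are expected to be normalized strings; split them into tokens.
    prediction = prediction.split()
    answer = answer.split()

    # If either side is empty, the score is 1 only when both are empty.
    if len(prediction) == 0 or len(answer) == 0:
        return int(prediction == answer)

    # Distinct tokens shared by the prediction and the answer (set intersection).
    c = set(prediction) & set(answer)
    if len(c) == 0:
        return 0

    precision = len(c) / len(prediction)
    recall = len(c) / len(answer)
    # Harmonic mean of precision and recall, rounded to two decimals.
    return round(2 * (precision * recall) / (precision + recall), 2)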
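
The function above counts shared tokens with a set intersection, so a repeated token is counted once in the overlap but every time in the token counts. As a result, an identical answer that repeats a word would score below 1. A possible alternative, sketched here under the hypothetical name `token_f1`, counts the overlap with `collections.Counter` (a multiset), which is the approach used by the official SQuAD evaluation script. It is an illustration, not a drop-in replacement for the notebook's metric, and it leaves the result unrounded.

```python
from collections import Counter

def token_f1(prediction, answer):
    """Token-level F1 with multiset overlap; inputs are already-normalized strings."""
    pred_tokens = prediction.split()
    true_tokens = answer.split()
    # If either side is empty, the score is 1.0 only when both are empty.
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = Counter(pred_tokens) & Counter(true_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

# 2 of 2 predicted tokens are correct (precision 1), 2 of 6 reference tokens
# are recovered (recall 1/3), so F1 = 2 * (1 * 1/3) / (1 + 1/3) = 0.5.
print(token_f1("hogwarts school", "hogwarts school of witchcraft and wizardry"))
```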

Helper function for predicting an answer with the model and displaying its Exact Match and F1 scores.

In [None]:
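def question_answer(context, question, answer):
    # Run BiDAF on the passage and question; the predicted span text is under "best_span_str".
    prediction = predictor.predict(passage=context, question=question)["best_span_str"]
    em_score = bool(norm_text(prediction) == norm_text(answer))
    f1_score = f1(norm_text(prediction), norm_text(answer))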

    print(question)
    print(prediction)
    print(answer)
    print(em_score)
    print(f1_score)

    return question, prediction, answer, em_score, f1_score

Loading the validation data (the SQuAD v2.0 dev set) into parallel lists of contexts, questions, and answers.

In [None]:
valid_path = 'dev-v2.0.json'

with open(valid_path, 'rb') as f:
    squad = json.load(f)

valid_contexts, valid_questions,valid_answers = [],[],[]

for group in squad['data']:
    for passage in group['paragraphs']:
        context = passage['context']
        for qa in passage['qas']:
            question = qa['question']
            # One entry per reference answer; questions whose 'answers' list is empty are skipped.
            for answer in qa['answers']:
                valid_contexts.append(context)
                valid_questions.append(question)
                valid_answers.append(answer)
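
Dev-set questions often carry several reference answers, and the loop above creates one entry per reference, so each reference is later scored separately. A common alternative, used by the official SQuAD evaluation script, is to score a prediction against every reference and keep the best result. The sketch below illustrates the idea on a toy record; `best_over_references` and the simplified `exact_match` are hypothetical helpers, not part of the notebook.

```python
def exact_match(prediction, answer):
    """Simplified comparison for illustration: case-insensitive, whitespace-trimmed."""
    return prediction.strip().lower() == answer.strip().lower()

def best_over_references(metric, prediction, references):
    """Score the prediction against each reference and keep the highest score.

    Assumes `references` is non-empty; max() raises ValueError on an empty list.
    """
    return max(metric(prediction, ref) for ref in references)

# Toy record shaped like one entry of passage['qas'] in the SQuAD JSON.
qa = {
    "question": "Who wrote Harry Potter?",
    "answers": [{"text": "J. K. Rowling"}, {"text": "Rowling"}],
}
references = [a["text"] for a in qa["answers"]]
print(best_over_references(exact_match, "rowling", references))  # True
```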

Result DataFrame used for saving the output to a CSV file.

In [None]:
Result = pd.DataFrame()
Q_L, P_L, A_L, E_L, F_L = [],[],[],[],[]

In [None]:
for context, question, ans in zip(valid_contexts, valid_questions, valid_answers):
    Q, P, A, E, F = question_answer(context, question, ans['text'])
    Q_L.append(Q)
    P_L.append(P)
    A_L.append(A)
    E_L.append(E)
    F_L.append(F)

In [None]:
Result['Question'] = Q_L
Result['Prediction'] = P_L
Result['True Answer'] = A_L
Result['Exact match'] = E_L
Result['F1 score'] = F_L

Result.to_csv(r"bidaf.csv")
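
The CSV holds one row per question and reference answer, so an overall summary can be obtained by averaging the two score columns. The sketch below does this on plain lists shaped like `E_L` and `F_L`; the values are made up for illustration and are not results from the model.

```python
from statistics import mean

# Made-up illustrative values shaped like E_L and F_L (not real results).
em_flags = [True, False, True, True]
f1_scores = [1.0, 0.5, 1.0, 0.8]

# Booleans are averaged as 0/1, giving the fraction of exact matches.
em_pct = 100 * mean(int(flag) for flag in em_flags)
f1_pct = 100 * mean(f1_scores)
print(f"Exact match: {em_pct:.1f}%  F1: {f1_pct:.1f}%")
```

Because each reference answer is scored as its own row, these averages are not directly comparable to leaderboard figures that keep the best score over all references.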

In [None]:
context = """Harry Potter is a series of seven fantasy novels written by British author J. K. Rowling. The novels chronicle 
             the lives of a young wizard, Harry Potter, and his friends Hermione Granger and Ron Weasley, all of whom are 
             students at Hogwarts School of Witchcraft and Wizardry. The main story arc concerns Harry's struggle against 
             Lord Voldemort, a dark wizard who intends to become immortal, overthrow the wizard governing body known as the 
             Ministry of Magic and subjugate all wizards and Muggles (non-magical people). The series was originally published 
             in English by Bloomsbury in the United Kingdom and Scholastic Press in the United States. All versions around the
             world are printed by Grafica Veneta in Italy. A series of many genres, including fantasy, drama, coming of age, 
             and the British school story (which includes elements of mystery, thriller, adventure, horror, and romance), the 
             world of Harry Potter explores numerous themes and includes many cultural meanings and references. According to 
             Rowling, the main theme is death. Other major themes in the series include prejudice, corruption, and madness."""


questions = ["Where do Harry and his friends study?", 
             "What is the main theme of Harry Potter?", 
             "Who originally published Harry Potter in the United Kingdom?", 
             "How many novels are there in Harry Potter?", 
             "Who are Harry Potter's friends?",
             "Who is Lord Voldemort?"] 
 
answers = ["Hogwarts School of Witchcraft and Wizardry", 
           "death", 
           "Bloomsbury",  
           "seven", 
           "Hermione Granger and Ron Weasley", 
           "a dark wizard who intends to become immortal, overthrow the wizard governing body known as the Ministry of Magic and subjugate all wizards and Muggles" ]


for question, answer in zip(questions, answers):
    question_answer(context, question, answer)