## Question answering with a pretrained BERT model (fine-tuned for QA)

### Importing the libraries

In [1]:
# Hugging Face Transformers and PyTorch
!pip install transformers
!pip install torch

In [2]:
import numpy as np
import torch #pytorch
from transformers import BertForQuestionAnswering, AutoModelForQuestionAnswering
from transformers import BertTokenizer, AutoTokenizer

### Loading the pretrained models

In [3]:
# model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
# tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

model = AutoModelForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
tokenizer = AutoTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


### Question-answering function

In [4]:
question = "Where did the FIFA world cup 2022 happen ?"
text = "The 2022 FIFA World Cup was the 22nd FIFA World Cup, the quadrennial world championship for national football teams organized by FIFA. It took place in Qatar from 20 November to 18 December 2022, after the country was awarded the hosting rights in 2010."

In [5]:
input_ids = tokenizer.encode(question, text)

In [6]:
tokens = tokenizer.convert_ids_to_tokens(input_ids)

In [7]:
# Map each token ID to its token (a dict keeps one entry per ID, so repeated tokens appear once)
dict(zip(input_ids, tokens))

{101: '[CLS]',
 2073: 'where',
 2106: 'did',
 1996: 'the',
 5713: 'fifa',
 2088: 'world',
 2452: 'cup',
 16798: '202',
 2475: '##2',
 4148: 'happen',
 1029: '?',
 102: '[SEP]',
 2001: 'was',
 13816: '22nd',
 1010: ',',
 17718: 'quad',
 7389: '##ren',
 6200: '##nia',
 2140: '##l',
 2528: 'championship',
 2005: 'for',
 2120: 'national',
 2374: 'football',
 2780: 'teams',
 4114: 'organized',
 2011: 'by',
 1012: '.',
 2009: 'it',
 2165: 'took',
 2173: 'place',
 1999: 'in',
 12577: 'qatar',
 2013: 'from',
 2322: '20',
 2281: 'november',
 2000: 'to',
 2324: '18',
 2285: 'december',
 2044: 'after',
 2406: 'country',
 3018: 'awarded',
 9936: 'hosting',
 2916: 'rights',
 2230: '2010'}
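
A dictionary keeps one entry per key, so token IDs that occur more than once in the input (for example 1996, 'the') show up only once in the output above. As a minimal sketch, pairing the two lists with `list(zip(...))` keeps every position in order. The short `sample_ids` and `sample_tokens` lists are illustrative stand-ins built from IDs shown above, not the full encoded input.

```python
# Illustrative stand-ins: a few (id, token) pairs taken from the output above
sample_ids = [101, 1996, 5713, 2088, 2452, 1996, 102]
sample_tokens = ["[CLS]", "the", "fifa", "world", "cup", "the", "[SEP]"]

# A list of pairs preserves both the order and the repeated tokens
pairs = list(zip(sample_ids, sample_tokens))

# A dict collapses repeated IDs into a single key
collapsed = dict(pairs)

print(len(pairs), len(collapsed))
```

In the notebook itself, the same idea would be `list(zip(input_ids, tokens))`.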

In [8]:
tokenizer.sep_token_id

102
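
The `[SEP]` ID matters because the first `[SEP]` marks where the question ends and the passage begins. The sketch below shows that split on a short, illustrative list of IDs taken from the output above; `split_at_first_sep` is a hypothetical helper name, not part of the Transformers library.

```python
def split_at_first_sep(input_ids, sep_id=102):
    """Split token IDs into segment A (question) and segment B (passage)."""
    # Index of the first [SEP], which closes the question
    sep_idx = input_ids.index(sep_id)
    segment_a = input_ids[: sep_idx + 1]   # [CLS] + question + first [SEP]
    segment_b = input_ids[sep_idx + 1 :]   # passage + final [SEP]
    return segment_a, segment_b

# Illustrative input: [CLS] where did [SEP] it took place [SEP]
sample_ids = [101, 2073, 2106, 102, 2009, 2165, 2173, 102]
seg_a, seg_b = split_at_first_sep(sample_ids)

# Segment IDs: 0 for each question token, 1 for each passage token
segment_ids = [0] * len(seg_a) + [1] * len(seg_b)
print(segment_ids)
```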

In [9]:
def question_answer(question, text):
    """Print the answer span that the model extracts from `text`.

    Input 1 = question
    Input 2 = passage/text
    Output  = answer (printed)
    """
    # Tokenize the question and text as a pair: [CLS] question [SEP] text [SEP]
    input_ids = tokenizer.encode(question, text)

    # String version of the token IDs
    tokens = tokenizer.convert_ids_to_tokens(input_ids)

    # Segment IDs: locate the first [SEP] token, which ends the question
    sep_idx = input_ids.index(tokenizer.sep_token_id)

    # Number of tokens in segment A (question, including [CLS] and the first [SEP])
    num_seg_a = sep_idx + 1

    # Number of tokens in segment B (text)
    num_seg_b = len(input_ids) - num_seg_a

    # 0 for every segment A token, 1 for every segment B token
    segment_ids = [0] * num_seg_a + [1] * num_seg_b

    assert len(segment_ids) == len(input_ids)

    # Run the model without tracking gradients (inference only)
    with torch.no_grad():
        output = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([segment_ids]))

    # Most likely start and end positions of the answer span, as plain ints
    answer_start = int(torch.argmax(output.start_logits))
    answer_end = int(torch.argmax(output.end_logits))

    # Fallback message, used when the span is invalid or starts at [CLS]
    answer = "Unable to find the answer to your question."

    if answer_end >= answer_start and tokens[answer_start] != "[CLS]":
        # Rebuild the answer, merging WordPiece continuation pieces ("##...")
        answer = tokens[answer_start]
        for i in range(answer_start + 1, answer_end + 1):
            if tokens[i].startswith("##"):
                answer += tokens[i][2:]
            else:
                answer += " " + tokens[i]

    print("\nAnswer:\n{}".format(answer.capitalize()))
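
Taking the argmax of the start and end logits independently can yield an end position that comes before the start. One common alternative, sketched here under stated assumptions, is to score every pair with start <= end and keep the pair with the highest summed logit. The name `best_span` and the `max_answer_len` limit are hypothetical, and plain Python lists stand in for the model's logits.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logits[s] + end_logits[e] with s <= e."""
    best_score = float("-inf")
    best = (0, 0)
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best

# Toy logits where independent argmax would give start=3, end=1 (an invalid span)
start = [0.1, 0.2, 0.3, 5.0]
end = [0.0, 4.0, 0.5, 1.0]
print(best_span(start, end))
```

With the real model, the inputs would be something like `output.start_logits[0].tolist()` and `output.end_logits[0].tolist()`.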

In [10]:
text = """The 2022 FIFA World Cup was the 22nd FIFA World Cup, the quadrennial world championship for national football teams organized by FIFA. It took place in Qatar from 20 November to 18 December 2022, after the country was awarded the hosting rights in 2010"""

question = "Was the FIFA World cup Quadrennial world championship ?"

question_answer(question, text)


Answer:
The 2022 fifa world cup was the 22nd fifa world cup


### Playing with the chatbot

In [12]:
text = input("Please enter your text: \n")
question = input("\nPlease enter your question: \n")

while True:
    question_answer(question, text)

    # Keep asking until the reply starts with Y or N (case-insensitive);
    # an empty reply is treated as invalid instead of raising an IndexError
    response = ""
    while response not in ("Y", "N"):
        reply = input("\nDo you want to ask another question based on this text (Y/N)? ")
        response = reply.strip()[:1].upper()

    if response == "N":
        print("\nBye!")
        break

    question = input("\nPlease enter your question: \n")

Please enter your text: 
My name is Aditi Verma. I have completed my Masters in Applied Mathematics in 2023.

Please enter your question: 
Whne did Aditi complete her Masters?

Answer:
Applied mathematics

Do you want to ask another question based on this text (Y/N)? Y

Please enter your question: 
In which year did Aditi Completed MSc Applied mathematics?

Answer:
2023

Do you want to ask another question based on this text (Y/N)? What is the name?

Do you want to ask another question based on this text (Y/N)? Y

Please enter your question: 
What is her name?

Answer:
Aditi verma

Do you want to ask another question based on this text (Y/N)? Y

Please enter your question: 
Has she done her Masters?

Answer:
I have completed my masters in applied mathematics in 2023

Do you want to ask another question based on this text (Y/N)? N

Bye!
