In [3]:
pip install transformers

Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install torch

Note: you may need to restart the kernel to use updated packages.


In [5]:
from transformers import BertForQuestionAnswering
from transformers import BertTokenizer
import torch
import numpy as np

In [6]:
# Load the SQuAD-finetuned BERT checkpoint and its matching tokenizer
model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
model = BertForQuestionAnswering.from_pretrained(model_name)
tokenizer_for_bert = BertTokenizer.from_pretrained(model_name)

Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



In [50]:
def bert_question_answer(question, passage, max_len=500):
    # Tokenize the question and passage as a pair. encode() adds the special
    # tokens [CLS] and [SEP] and truncates the input to max_len tokens.
    input_ids = tokenizer_for_bert.encode(question, passage, max_length=max_len, truncation=True)
    # Count the tokens in the first segment (question) and the second segment
    # (passage that contains the answer). The first [SEP] ends the question.
    sep_index = input_ids.index(tokenizer_for_bert.sep_token_id)
    len_question = sep_index + 1
    len_passage = len(input_ids) - len_question
    segment_ids = [0] * len_question + [1] * len_passage
    # Convert token ids back to tokens so the answer can be rebuilt as text
    tokens = tokenizer_for_bert.convert_ids_to_tokens(input_ids)
    # Run the model once, without tracking gradients, to get the start and end
    # scores (logits). Inputs are converted to torch tensors with a batch dimension.
    with torch.no_grad():
        outputs = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([segment_ids]))
    # Convert the score tensors to flat numpy arrays
    start_token_scores = outputs[0].numpy().flatten()
    end_token_scores = outputs[1].numpy().flatten()
    # Pick the start and end index of the answer from the highest scores
    answer_start_index = np.argmax(start_token_scores)
    answer_end_index = np.argmax(end_token_scores)
    # Scores of the chosen start and end tokens
    start_token_score = np.round(start_token_scores[answer_start_index], 2)
    end_token_score = np.round(end_token_scores[answer_end_index], 2)
    # Merge subword pieces that start with ## back into full words.
    # The tokenizer splits words that are not in its vocabulary.
    answer = tokens[answer_start_index]
    for i in range(answer_start_index + 1, answer_end_index + 1):
        if tokens[i][0:2] == '##':
            answer += tokens[i][2:]
        else:
            answer += ' ' + tokens[i]

    # No answer was found in the passage: the start score is negative, the start
    # points at [CLS] (index 0), the span is inverted, or the answer is only [SEP].
    if (start_token_score < 0) or (answer_start_index == 0) or (answer_end_index < answer_start_index) or (answer == '[SEP]'):
        answer = "Sorry, I was unable to find an answer in the passage."

    return (answer_start_index, answer_end_index, start_token_score, end_token_score, answer)
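
The function above picks the start and end positions with two independent argmax calls and rejects the result if the end falls before the start. One alternative is to search only over valid spans. The following is a minimal sketch of that idea, not part of the original notebook; the helper name `best_valid_span` and the `max_answer_len` parameter are hypothetical, and the toy scores are made up for illustration.

```python
import numpy as np

def best_valid_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end, score) maximizing start_scores[s] + end_scores[e]
    subject to s <= e < s + max_answer_len."""
    start_scores = np.asarray(start_scores, dtype=float)
    end_scores = np.asarray(end_scores, dtype=float)
    best = (0, 0, -np.inf)
    for s in range(len(start_scores)):
        # Only consider end positions at or after s, within the length limit
        last = min(s + max_answer_len, len(end_scores))
        e = s + int(np.argmax(end_scores[s:last]))
        score = start_scores[s] + end_scores[e]
        if score > best[2]:
            best = (s, e, float(score))
    return best

# Toy logits: independent argmax would give start=3 and end=1, an inverted span.
toy_start = [0.1, 0.2, 1.5, 2.0]
toy_end = [0.0, 3.0, 1.0, 0.5]
span = best_valid_span(toy_start, toy_end)
print(span[0], span[1])
```

In `bert_question_answer`, such a helper could replace the two `np.argmax` calls, assuming the flattened score arrays are passed in unchanged.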

In [63]:
# Define the passage to query: a short meeting transcript
passage = """John say Let's assign tasks for the launch.
Amy say I'll handle the marketing strategy.
David say I'll take care of the product design.
Lisa say I'll manage the content creation.
David say We need at least 3 months for design.
Amy say Marketing can be ready in 2 months.
Lisa say Content creation will take 2.5 months.
Let's aim for a 4-month launch timeline.
David say I'll create a design proposal.
Amy say I'll work on the marketing plan.
john say onkar i'm assigning you a work that is start working on content planning"""
 

In [64]:
question1 = "what work did john assign to onkar "
print('\nQuestion 1:\n', question1)
_, _, _, _, ans = bert_question_answer(question1, passage)
print('\nAnswer: ', ans, '\n')



Question 1:
 what work did john assign to onkar 

Answer:  start working on content planning 
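
The call above discards the four values returned before the answer text. The two scores among them are raw logits, not probabilities, so they are hard to read as a confidence. A minimal sketch, assuming a softmax over the per-token scores is an acceptable way to normalize them; the helper name `logits_to_probs` and the toy values are hypothetical and not from the notebook.

```python
import numpy as np

def logits_to_probs(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()  # subtract the max to avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

# Toy start logits for a three-token input (made-up values)
toy_start_logits = [1.0, 2.0, 3.0]
probs = logits_to_probs(toy_start_logits)
print(probs.round(3))
```

The probability at the chosen start index could then be reported in place of the rounded logit.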



In [52]:
question1 = "who will work on marketing plan"
print('\nQuestion 1:\n', question1)
_, _, _, _, ans = bert_question_answer(question1, passage)
print('\nAnswer: ', ans)



Question 1:
 who will work on marketing plan

Answer:  amy


In [66]:
question1 = "what is main goal of this meeting?"
print('\nQuestion 1:\n', question1)
_, _, _, _, ans = bert_question_answer(question1, passage)
print('\nAnswer : ', ans, '\n')



Question 1:
 what is main goal of this meeting?

Answer :  4 - month launch timeline 
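
The answer above reads "4 - month" because the function joins tokens with single spaces, and the tokenizer splits "4-month" into three tokens. A minimal cleanup sketch follows, assuming a few regular-expression rules are enough for this passage; the helper name `tidy_answer` is hypothetical. It would also collapse a genuine spaced dash, and it does not handle apostrophes.

```python
import re

def tidy_answer(answer):
    """Remove the extra spaces that token-by-token joining puts around punctuation."""
    # "4 - month" -> "4-month"
    answer = re.sub(r"\s+-\s+", "-", answer)
    # "2 . 5" -> "2. 5" (drop the space before closing punctuation)
    answer = re.sub(r"\s+([.,!?;:])", r"\1", answer)
    # "2. 5" -> "2.5" (rejoin decimals)
    answer = re.sub(r"(\d)\.\s+(\d)", r"\1.\2", answer)
    return answer.strip()

print(tidy_answer("4 - month launch timeline"))
print(tidy_answer("2 . 5 months"))
```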



In [61]:
question1 = "John is assigning task for?"
print('\nQuestion 1:\n', question1)
_, _, _, _, ans = bert_question_answer(question1, passage)
print('\nAnswer from BERT: ', ans, '\n')


Question 1:
 John is assigning task for?

Answer from BERT:  the launch 



In [58]:
question1 = "marketing will be ready in how many month?"
print('\nQuestion 1:\n', question1)
_, _, _, _, ans = bert_question_answer(question1, passage)
print('\nAnswer from BERT: ', ans, '\n')


Question 1:
 marketing will be ready in how many month?

Answer from BERT:  2 



In [55]:
question1 = "what is time duration for timeline launch?"
print('\nQuestion 1:\n', question1)
_, _, _, _, ans = bert_question_answer(question1, passage)
print('\nAnswer from BERT: ', ans, '\n')


Question 1:
 what is time duration for timeline launch?

Answer from BERT:  4 - month 

