Data link: https://www.kaggle.com/narendrageek/covid19-frequent-asked-questions

In [1]:
!pip install transformers



In [2]:
import tensorflow as tf
from transformers import BertTokenizer, TFBertForQuestionAnswering
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
import spacy

In [3]:
data = pd.read_csv('COVID19_FAQ.csv')

In [7]:
nlp = spacy.load('en_core_web_sm')

In [4]:
data.head()

Unnamed: 0,questions,answers
0,1. How does COVID-19 spread?,People can catch COVID-19 from others who have...
1,2. What are the symptoms of COVID-19?,The most common symptoms of COVID-19 are fever...
2,3. How do I know if it is COVID-19 or just the...,A COVID-19 infection has the same signs and sy...
3,4. Can the virus that causes COVID-19 be trans...,Studies to date suggest that the virus that ca...
4,5. What can I do to protect myself and prevent...,Protection measures for everyone Stay aware ...


In [8]:
# Embed each question as one vector with spaCy (Doc.vector defaults to the average of the token vectors).
question_vectors = [nlp(x).vector for x in data['questions'].values]

In [9]:
# Reduce the question vectors to 3 dimensions with truncated SVD, then compute the pairwise cosine similarity matrix.
svd = TruncatedSVD(n_components=3)
svd_questions = svd.fit_transform(question_vectors)
cos_sim = cosine_similarity(svd_questions, svd_questions)
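
The similarity step above can be checked by hand. The following is a minimal sketch that uses small made-up vectors rather than the real spaCy embeddings. `cosine_similarity_matrix` is a hypothetical helper, not part of the notebook; it is meant to mirror what `sklearn.metrics.pairwise.cosine_similarity` computes, namely dot products between L2-normalized rows.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Toy data (an assumption for illustration only).
toy_vectors = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],  # same direction as row 0, so similarity 1
    [0.0, 3.0, 0.0],  # orthogonal to row 0, so similarity 0
])
toy_sim = cosine_similarity_matrix(toy_vectors)
print(np.round(toy_sim, 3))
```

Because cosine similarity depends only on direction, rows 0 and 1 count as identical even though their lengths differ.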

In [36]:
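question = input("Please enter your question: ")
# Embed the user's question and append it as the last row, without modifying question_vectors.
my_question_vector = np.stack([nlp(question).vector])
all_vectors = np.append(question_vectors, my_question_vector, axis=0)
# Refit the SVD on all rows so the new question shares the reduced space, then recompute the similarities.
svd_questions = svd.fit_transform(all_vectors)
cos_sim = cosine_similarity(svd_questions, svd_questions)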

Please enter your question: how covid spread?


In [42]:
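# The user's question is the last column of cos_sim; [1:10] skips its similarity with itself (1.0).
print(pd.DataFrame(cos_sim).iloc[:, -1].sort_values(ascending=False)[1:10])
# Choose one of the FAQ indices listed on the left of the printed scores.
number = int(input("Enter the sentence number that is smaller than 1.0: "))
answer = data['answers'][number]
print(answer)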

67    0.937804
0     0.837722
48    0.804925
43    0.787931
6     0.780949
20    0.775102
28    0.736660
37    0.717705
18    0.715567
Name: 68, dtype: float32
Enter the sentence number that is smaller than 1.0: 0
People can catch COVID-19 from others who have the virus. The disease can spread from person   to person through small droplets from the nose or mouth which are spread when a person with   COVID-19 coughs or exhales. These droplets land on objects and surfaces around the person.   Other people then catch COVID-19 by touching these objects or surfaces, then touching their   eyes, nose or mouth. People can also catch COVID-19 if they breathe in droplets from a person   with COVID-19 who coughs out or exhales droplets. This is why it is important to stay more than   1 meter (3 feet) away from a person who is sick.
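
The cell above lists the closest FAQ entries and asks the user to pick one by hand. As a hedged alternative, the sketch below ranks candidates against a query vector directly and returns the best indices. `rank_by_similarity` and the toy vectors are assumptions for illustration only; in the notebook, the candidates would be the question vectors and the query would be the vector of the user's question, both in the same vector space.

```python
import numpy as np

def rank_by_similarity(query_vector, candidate_vectors, top_k=3):
    """Return (index, score) pairs for the top_k candidates most similar to the query."""
    query = np.asarray(query_vector, dtype=float)
    candidates = np.asarray(candidate_vectors, dtype=float)
    # Clip the norms to avoid dividing by zero for all-zero vectors.
    query_norm = max(np.linalg.norm(query), 1e-12)
    cand_norms = np.clip(np.linalg.norm(candidates, axis=1), 1e-12, None)
    scores = candidates @ query / (cand_norms * query_norm)
    # Sort by descending score and keep the first top_k indices.
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Toy data (an assumption for illustration only).
toy_candidates = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])
toy_query = np.array([1.0, 0.1])
ranking = rank_by_similarity(toy_query, toy_candidates, top_k=2)
print(ranking)
```

The first index in `ranking` could then be used to look up the answer, for example with `data['answers'][ranking[0][0]]`.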


In [43]:
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

In [44]:
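# The tuple is tokenized as a batch of two separate sequences (row 0: question, row 1: answer), padded to equal length.
input_dict = tokenizer((question, answer), return_tensors='tf', padding=True, truncation=True)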

In [45]:
# Download size of bert-large-uncased-whole-word-masking-finetuned-squad: 1.25 GB.
model = TFBertForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
outputs = model(input_dict)

All model checkpoint layers were used when initializing TFBertForQuestionAnswering.

All the layers of TFBertForQuestionAnswering were initialized from the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForQuestionAnswering for predictions without further training.


In [46]:
start_logits = outputs.start_logits
end_logits = outputs.end_logits

In [70]:
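# Row 1 of the batch holds the answer text; slice its tokens between the arg-max start and end positions.
all_tokens = tokenizer.convert_ids_to_tokens(input_dict["input_ids"].numpy()[1])
start, end = int(tf.math.argmax(start_logits, 1)[1]), int(tf.math.argmax(end_logits, 1)[1])
answer = ' '.join(all_tokens[start:end + 1])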
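
Taking the arg-max of the start and end logits independently can produce an end position that comes before the start position, which yields an empty slice. The sketch below, with made-up logits and tokens, searches instead for the highest-scoring span with start <= end. `best_span` and `max_answer_len` are hypothetical names introduced for illustration, not part of the notebook or of the transformers library.

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logits[start] + end_logits[end] with start <= end."""
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    best, best_score = (0, 0), -np.inf
    for start in range(len(start_logits)):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best

# Toy tokens and logits (assumptions for illustration only).
toy_tokens = ["[CLS]", "how", "[SEP]", "small", "droplets", "spread", "[SEP]"]
toy_start = [0.1, 0.0, 0.0, 2.0, 0.5, 3.0, 0.0]
toy_end = [0.0, 0.0, 0.0, 0.2, 4.0, 1.0, 0.0]

# Independent arg-max gives start=5 and end=4, which is not a valid span.
span_start, span_end = best_span(toy_start, toy_end)
answer_tokens = toy_tokens[span_start:span_end + 1]
print(span_start, span_end, answer_tokens)
```

The usual extractive question-answering setup also encodes the question and the context as a single pair, for example `tokenizer(question, context, return_tensors='tf', truncation=True)`, so that the model scores context tokens with the question in view.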

In [73]:
# BERT uses WordPiece tokenization: rare words are broken down into subwords.
# Continuation pieces are prefixed with ##, so removing " ##" rejoins the split words.
answer = answer.replace(" ##", "")
print(answer)

other people then catch covid - 19 by touching these objects or surfaces , then touching their eyes , nose or mouth . people can also catch covid - 19 if they breathe in droplets from a person with covid - 19 who coughs out or exhales droplets [CLS] people can catch covid - 19 from others who have the virus . the disease can spread from person to person through small droplets from the nose or mouth which are spread when a person with covid - 19 coughs or exhales . these droplets land on objects and surfaces around the person . other people then catch covid - 19 by touching these objects or surfaces , then touching their eyes , nose or mouth . people can also catch covid - 19 if they breathe in droplets from a person with covid - 19 who coughs out or exhales droplets . this is why it is important to stay more than 1 meter ( 3 feet ) away from a person who is sick . [SEP]
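
The printed answer still contains special tokens such as [CLS] and [SEP]. A minimal sketch of a cleanup step follows, assuming a plain list of WordPiece tokens; `join_wordpieces` is a hypothetical helper and the toy pieces are illustrative, not actual tokenizer output. It drops the special tokens and glues each `##` continuation piece onto the previous word.

```python
def join_wordpieces(tokens, special_tokens=("[CLS]", "[SEP]", "[PAD]")):
    """Merge WordPiece tokens into words, skipping special tokens."""
    words = []
    for token in tokens:
        if token in special_tokens:
            continue
        if token.startswith("##") and words:
            # Continuation piece: append it to the previous word without the ## prefix.
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)

# Toy pieces (an assumption for illustration only).
toy_pieces = ["[CLS]", "cough", "##s", "or", "ex", "##hale", "##s", "[SEP]", "[PAD]"]
joined = join_wordpieces(toy_pieces)
print(joined)
```

When the token ids are at hand, `tokenizer.decode(ids, skip_special_tokens=True)` is another way to obtain cleaned-up text.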
