# Objective:

We build a simple question-answering system for COVID-19 queries using a BERT model.

# Importing the required modules and frameworks

In [1]:
import tensorflow as tf
from transformers import BertTokenizer, TFBertForQuestionAnswering 
import warnings
warnings.filterwarnings('ignore')
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:  # only enable memory growth when a GPU is actually available
    tf.config.experimental.set_memory_growth(gpus[0], True)

# Taking a pretrained model

We use a pretrained BERT model from the Hugging Face model hub. We choose the large BERT model fine-tuned on SQuAD because answer accuracy matters for questions in the medical field.

The tokenizer encodes the query and the source text into token IDs that the model can process.
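
To make the tokenizer's role more concrete, here is a minimal sketch of the greedy longest-match-first splitting that WordPiece tokenizers such as BERT's are based on. The function `wordpiece_tokenize` and the tiny `toy_vocab` are hypothetical, written only for illustration; the real `BertTokenizer` also handles lowercasing, punctuation, and a vocabulary of roughly 30,000 entries.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it from the right
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches, so the whole word is treated as unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only
toy_vocab = {"co", "##vid", "short", "##ness", "fever"}

print(wordpiece_tokenize("covid", toy_vocab))      # ['co', '##vid']
print(wordpiece_tokenize("shortness", toy_vocab))  # ['short', '##ness']
print(wordpiece_tokenize("fever", toy_vocab))      # ['fever']
```

The `##` prefix marks a piece that continues the previous one, which is why the answer text later has to be stitched back together before it is printed.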


In [2]:
modelName = 'bert-large-uncased-whole-word-masking-finetuned-squad' # https://huggingface.co/transformers/pretrained_models.html
#modelName = 'distilbert-base-cased-distilled-squad'
tokenizer = BertTokenizer.from_pretrained(modelName)
model = TFBertForQuestionAnswering.from_pretrained(modelName)

All model checkpoint weights were used when initializing TFBertForQuestionAnswering.

All the weights of TFBertForQuestionAnswering were initialized from the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use TFBertForQuestionAnswering for predictions without further training.


# Get source and query

We use the COVID-19 article text from Wikipedia as the source of truth for our model.
Link: https://en.wikipedia.org/wiki/Coronavirus_disease_2019

As an example query, we use the question **"What are the symptoms of COVID-19?"**

In [3]:
source=r"""The COVID‑19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID‑19). The outbreak was first identified in December 2019 in Wuhan, China. The World Health Organization declared the outbreak a Public Health Emergency of International Concern on 30 January 2020 and a pandemic on 11 March. As of 6 August 2020, more than 18.7 million cases of COVID‑19 have been reported in more than 188 countries and territories, resulting in more than 706,000 deaths; more than 11.3 million people have recovered.The virus is primarily spread between people during close contact,most often via small droplets produced by coughing,sneezing, and talking.The droplets usually fall to the ground or onto surfaces rather than travelling through air over long distances.However, the transmission may also occur through smaller droplets that are able to stay suspended in the air for longer periods of time in enclosed spaces, as typical for airborne diseases. Less commonly, people may become infected by touching a contaminated surface and then touching their face.It is most contagious during the first three days after the onset of symptoms, although spread is possible before symptoms appear, and from people who do not show symptoms.Common symptoms include fever, cough, fatigue, shortness of breath, and loss of sense of smell. 
Complications may include pneumonia and acute respiratory distress syndrome.The time from exposure to onset of symptoms is typically around five days but may range from two to fourteen days.There is no known vaccine or specific antiviral treatment.Primary treatment is symptomatic and supportive therapy.Recommended preventive measures include hand washing, covering one's mouth when coughing, maintaining distance from other people, wearing a face mask in public settings, disinfecting surfaces, increasing ventilation and air filtration indoors, and monitoring and self-isolation for people who suspect they are infected.Authorities worldwide have responded by implementing travel restrictions, lockdowns, workplace hazard controls, and facility closures in order to slow the spread of the disease. Many places have also worked to increase testing capacity and trace contacts of infected persons.The pandemic has caused global social and economic disruption, global famines affecting 265 million people."""
question =r"""What are the symptoms of COVID-19?"""

# Preprocess them to pass into a model

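We join the question and the source with the separator token **" [SEP] "** so the model can tell where the question ends and the source begins. We then encode the combined text with the tokenizer and add a batch dimension so it can be passed to the model.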

In [4]:
input_text=question+" [SEP] "+source
input_ids=tokenizer.encode(input_text)
input_1 = tf.constant(input_ids)[None, :]

In [5]:
print(input_ids)
print(tokenizer.decode(input_ids))

[101, 2054, 2024, 1996, 8030, 1997, 2522, 17258, 1011, 2539, 1029, 102, 1996, 2522, 17258, 1514, 2539, 6090, 3207, 7712, 1010, 2036, 2124, 2004, 1996, 21887, 23350, 6090, 3207, 7712, 1010, 2003, 2019, 7552, 3795, 6090, 3207, 7712, 1997, 21887, 23350, 4295, 10476, 1006, 2522, 17258, 1514, 2539, 1007, 1012, 1996, 8293, 2001, 2034, 4453, 1999, 2285, 10476, 1999, 8814, 4819, 1010, 2859, 1012, 1996, 2088, 2740, 3029, 4161, 1996, 8293, 1037, 2270, 2740, 5057, 1997, 2248, 5142, 2006, 2382, 2254, 12609, 1998, 1037, 6090, 3207, 7712, 2006, 2340, 2233, 1012, 2004, 1997, 1020, 2257, 12609, 1010, 2062, 2084, 2324, 1012, 1021, 2454, 3572, 1997, 2522, 17258, 1514, 2539, 2031, 2042, 2988, 1999, 2062, 2084, 19121, 3032, 1998, 6500, 1010, 4525, 1999, 2062, 2084, 3963, 2575, 1010, 2199, 6677, 1025, 2062, 2084, 2340, 1012, 1017, 2454, 2111, 2031, 6757, 1012, 1996, 7865, 2003, 3952, 3659, 2090, 2111, 2076, 2485, 3967, 1010, 2087, 2411, 3081, 2235, 27126, 2550, 2011, 21454, 1010, 1055, 24045, 6774, 1010, 1

**Observation:** The output shows how the input text is encoded: ID 101 is the `[CLS]` token added at the start, and ID 102 is the `[SEP]` token that ends the question.

In [6]:
token_ids=[0 if i < input_ids.index(102) else 1 for i in range(len(input_ids))]
print(token_ids)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

**Observation:** `token_ids` holds the segment IDs (passed to the model as `token_type_ids`) that tell BERT which tokens belong to the question (value 0) and which belong to the source (value 1).
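
As a small aside, the cell above assigns the first `[SEP]` token to segment 1, whereas BERT's usual convention places the separator that closes the question in segment 0. The sketch below follows that convention. The helper name `segment_ids` and the toy ID sequence are assumptions made for illustration, not part of the notebook.

```python
CLS_ID, SEP_ID = 101, 102  # [CLS] and [SEP] IDs in the BERT uncased vocabulary


def segment_ids(input_ids, sep_id=SEP_ID):
    """Return 0 for every token up to and including the first [SEP], 1 afterwards."""
    first_sep = input_ids.index(sep_id)
    return [0 if i <= first_sep else 1 for i in range(len(input_ids))]


# Toy sequence laid out as [CLS] q q [SEP] c c c [SEP]
toy_ids = [CLS_ID, 2054, 2024, SEP_ID, 1996, 2522, 17258, SEP_ID]
print(segment_ids(toy_ids))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this notebook the difference affects only a single token, and the answers shown were produced with the original assignment.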

# Prediction from our model:

In [7]:
answer=model({'input_ids':input_1,'token_type_ids': tf.convert_to_tensor([token_ids])})
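# Index the output instead of unpacking it: this works for both tuple and dict-style model outputs
startScores,endScores=answer[0],answer[1]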
input_tokens = tokenizer.convert_ids_to_tokens(input_ids)
startIdx = tf.math.argmax(startScores[0],0).numpy()
endIdx = tf.math.argmax(endScores[0],0).numpy()+1



In [8]:
def process(st):
    # Merge WordPiece continuation tokens back into words ("short ##ness" -> "shortness")
    return st.replace(" ##", "").replace("##", "")
    

In [9]:
ans=process(" ".join(input_tokens[startIdx:endIdx]))
print(ans)

fever , cough , fatigue , shortness of breath , and loss of sense of smell


**Observation:** The model returns the list of COVID-19 symptoms exactly as stated in the source text.
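
The prediction cell takes the argmax of the start scores and the end scores independently, which can yield an end position before the start position and therefore an empty answer. A common alternative is to search for the highest-scoring pair with start <= end. The sketch below shows that idea on plain Python lists; the function `best_span`, its `max_answer_len` parameter, and the toy scores are all hypothetical.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return the (start, end) pair, end inclusive, with the highest combined score."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start, within the length limit
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Toy scores where independent argmax would give start=2 and end=1 (an invalid span)
toy_start = [0.1, 0.2, 3.0, 0.3, 0.1]
toy_end = [0.1, 4.0, 0.2, 2.5, 0.3]

start, end = best_span(toy_start, toy_end)
print(start, end)  # 2 3
```

With the notebook's variables, the lists could be obtained with something like `startScores[0].numpy().tolist()`, and the answer tokens would then be `input_tokens[start:end + 1]`.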

# Utility function:
We wrap the steps above in a utility function so that the COVID-19 QnA system can be integrated elsewhere:


In [10]:
def covid_19_qna(question):
    # Load the pretrained model
    modelName = 'bert-large-uncased-whole-word-masking-finetuned-squad' # https://huggingface.co/transformers/pretrained_models.html
    tokenizer = BertTokenizer.from_pretrained(modelName)
    model = TFBertForQuestionAnswering.from_pretrained(modelName)
    # Covid-19 Information
    source=r"""The COVID‑19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID‑19). The outbreak was first identified in December 2019 in Wuhan, China. The World Health Organization declared the outbreak a Public Health Emergency of International Concern on 30 January 2020 and a pandemic on 11 March. As of 6 August 2020, more than 18.7 million cases of COVID‑19 have been reported in more than 188 countries and territories, resulting in more than 706,000 deaths; more than 11.3 million people have recovered.The virus is primarily spread between people during close contact,most often via small droplets produced by coughing,sneezing, and talking.The droplets usually fall to the ground or onto surfaces rather than travelling through air over long distances.However, the transmission may also occur through smaller droplets that are able to stay suspended in the air for longer periods of time in enclosed spaces, as typical for airborne diseases. Less commonly, people may become infected by touching a contaminated surface and then touching their face.It is most contagious during the first three days after the onset of symptoms, although spread is possible before symptoms appear, and from people who do not show symptoms.Common symptoms include fever, cough, fatigue, shortness of breath, and loss of sense of smell. 
Complications may include pneumonia and acute respiratory distress syndrome.The time from exposure to onset of symptoms is typically around five days but may range from two to fourteen days.There is no known vaccine or specific antiviral treatment.Primary treatment is symptomatic and supportive therapy.Recommended preventive measures include hand washing, covering one's mouth when coughing, maintaining distance from other people, wearing a face mask in public settings, disinfecting surfaces, increasing ventilation and air filtration indoors, and monitoring and self-isolation for people who suspect they are infected.Authorities worldwide have responded by implementing travel restrictions, lockdowns, workplace hazard controls, and facility closures in order to slow the spread of the disease. Many places have also worked to increase testing capacity and trace contacts of infected persons.The pandemic has caused global social and economic disruption, global famines affecting 265 million people."""
    # Concatenate the question and source, then preprocess them for our BERT model
    input_text=question+" [SEP] "+source
    input_ids=tokenizer.encode(input_text)
    input_1 = tf.constant(input_ids)[None, :]
    # Token Ids
    token_ids=[0 if i < input_ids.index(102) else 1 for i in range(len(input_ids))]
    # Model prediction
    answer=model({'input_ids':input_1,'token_type_ids': tf.convert_to_tensor([token_ids])})
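    # Index the output instead of unpacking it: this works for both tuple and dict-style model outputs
    startScores,endScores=answer[0],answer[1]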
    input_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    startIdx = tf.math.argmax(startScores[0],0).numpy()
    endIdx = tf.math.argmax(endScores[0],0).numpy()+1
    # Note: this relies on the process() helper defined earlier in the notebook
    ans=process(" ".join(input_tokens[startIdx:endIdx]))
    return ans

In [11]:
query=input("Enter your query for Covid-19: ")
print(covid_19_qna(query))

Enter your query for Covid-19: Where has covid-19 first detected ?


All model checkpoint weights were used when initializing TFBertForQuestionAnswering.

All the weights of TFBertForQuestionAnswering were initialized from the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use TFBertForQuestionAnswering for predictions without further training.


wuhan , china


**Observation:** The query returns the correct answer, and this function can be integrated wherever a COVID-19 QnA system is needed.
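
One practical caveat: `covid_19_qna` reloads the tokenizer and the model on every call, as the checkpoint messages printed before the answer show. A possible remedy, sketched here under the assumption that the same model name is reused, is to cache the loader with `functools.lru_cache`. The function `load_qa_model` and the `load_count` counter are hypothetical, and placeholder objects stand in for the real model so the sketch runs without downloading anything.

```python
from functools import lru_cache

load_count = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=None)
def load_qa_model(model_name):
    """Load the tokenizer and model once per model name and reuse them afterwards."""
    global load_count
    load_count += 1
    # Placeholders for the real loading calls used earlier in this notebook:
    #   tokenizer = BertTokenizer.from_pretrained(model_name)
    #   model = TFBertForQuestionAnswering.from_pretrained(model_name)
    tokenizer, model = object(), object()
    return tokenizer, model


name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
first = load_qa_model(name)
second = load_qa_model(name)  # served from the cache, no second load
print(first is second, load_count)  # True 1
```

Inside `covid_19_qna`, the two `from_pretrained` lines could then be replaced by a single call such as `tokenizer, model = load_qa_model(modelName)`.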