# Multilingual QA search

Downloads and parses several coronavirus-related WHO FAQ pages, then uses the Multilingual Universal Sentence Encoder QA model (USE) to encode each answer paragraph or list element.

USE supports 16 languages: Arabic, Chinese-simplified, Chinese-traditional, English, French, German, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Spanish, Thai, Turkish, and Russian.

Uses some code from https://github.com/wearetriple/ai-faqbot-who/blob/master/Corona_WHO_FAQ.ipynb

In [1]:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text
import re

import requests
from bs4 import BeautifulSoup

import numpy as np
import pandas as pd

pd.set_option("max_colwidth", 200)

### Download and parse QA sites

In [2]:
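def iter_who_faq(url="https://www.who.int/news-room/q-a-detail/q-a-coronaviruses"):
    """Yield (question, answer, url) for every paragraph or list element of a WHO Q&A page."""
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "lxml")
    # Each question/answer block is rendered as an accordion panel
    for panel in soup.find_all(attrs={"class": "sf-accordion__panel"}):
        # The first link in the panel holds the question text
        question = panel.find_all("a")[0].text.strip()
        # Non-empty paragraphs first, then list elements
        answers = [p.text.strip() for p in panel.find_all("p") if p.text.strip()] + [
            list_element.text.strip() for list_element in panel.find_all("li")
        ]
        if not answers:
            # Report questions for which no answer text was found
            print(url, question)
        for answer in answers:
            yield question, answer, url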
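
To make the parsing logic easier to follow without hitting the network, here is a minimal sketch that applies the same panel/paragraph/list-element extraction to a small inline HTML string. The HTML snippet and the helper name `iter_panel_pairs` are made up for illustration (only the `sf-accordion__panel` class comes from the code above), and the real WHO pages may be structured differently. The sketch uses the built-in `html.parser` backend so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one accordion panel of a Q&A page
SAMPLE_HTML = """
<div class="sf-accordion__panel">
  <a href="#">What is a coronavirus?</a>
  <p>Coronaviruses are a large family of viruses.</p>
  <p>   </p>
  <ul><li>First list item</li><li>Second list item</li></ul>
</div>
"""


def iter_panel_pairs(html):
    """Yield (question, answer) pairs, one per non-empty paragraph or list element."""
    soup = BeautifulSoup(html, "html.parser")
    for panel in soup.find_all(attrs={"class": "sf-accordion__panel"}):
        question = panel.find_all("a")[0].text.strip()
        # Whitespace-only paragraphs are skipped, as in iter_who_faq
        paragraphs = [p.text.strip() for p in panel.find_all("p") if p.text.strip()]
        list_items = [li.text.strip() for li in panel.find_all("li")]
        for answer in paragraphs + list_items:
            yield question, answer


pairs = list(iter_panel_pairs(SAMPLE_HTML))
for question, answer in pairs:
    print(question, "->", answer)
```

Each paragraph and each list element becomes its own row paired with the same question, which is why one question can appear several times in the resulting dataframe.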

In [3]:
urls = [
    "https://www.who.int/news-room/q-a-detail/q-a-coronaviruses",
    "https://www.who.int/news-room/q-a-detail/q-a-on-covid-19-pregnancy-childbirth-and-breastfeeding",
    "https://www.who.int/news-room/q-a-detail/q-a-on-covid-19-hiv-and-antiretrovirals",
    "https://www.who.int/news-room/q-a-detail/q-a-similarities-and-differences-covid-19-and-influenza",
    "https://www.who.int/news-room/q-a-detail/q-a-on-mass-gatherings-and-covid-19",
    "https://www.who.int/news-room/q-a-detail/q-a-on-smoking-and-covid-19",
    "https://www.who.int/news-room/q-a-detail/be-active-during-covid-19",
    "https://www.who.int/news-room/q-a-detail/malaria-and-the-covid-19-pandemic",
    "https://www.who.int/news-room/q-a-detail/q-a-on-infection-prevention-and-control-for-health-care-workers-caring-for-patients-with-suspected-or-confirmed-2019-ncov",
    "https://www.who.int/news-room/q-a-detail/middle-east-respiratory-syndrome-coronavirus-(mers-cov)",
]


def iter_who_faqs(urls=urls):
    for url in urls:
        yield from iter_who_faq(url=url)

In [4]:
%%time
data = pd.DataFrame(data=list(iter_who_faqs()), columns=["Context", "Answer", "Source"])
data.head()

CPU times: user 592 ms, sys: 13.2 ms, total: 605 ms
Wall time: 2.72 s


| | Context | Answer | Source |
|---|---|---|---|
| 0 | What is a coronavirus? | Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold t... | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |
| 1 | What is COVID-19? | COVID-19 is the infectious disease caused by the most recently discovered coronavirus. This new virus and disease were unknown before the outbreak began in Wuhan, China, in December 2019. | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |
| 2 | What are the symptoms of COVID-19? | The most common symptoms of COVID-19 are fever, tiredness, and dry cough. Some patients may have aches and pains, nasal congestion, runny nose, sore throat or diarrhea. These symptoms are usually ... | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |
| 3 | How does COVID-19 spread? | People can catch COVID-19 from others who have the virus. The disease can spread from person to person through small droplets from the nose or mouth which are spread when a person with COVID-19 co... | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |
| 4 | How does COVID-19 spread? | WHO is assessing ongoing research on the ways COVID-19 is spread and will continue to share updated findings. | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |


### Preprocess Sentences
USE has never seen the terms "covid" or "covid-19" but has seen "coronavirus", so we replace the former with the latter before encoding.

In [5]:
%%time
def preprocess_sentences(input_sentences):
    return [
        re.sub(r"(covid-19|covid)", "coronavirus", input_sentence, flags=re.I)
        for input_sentence in input_sentences
    ]


# Load module containing USE
module = hub.load(
    "https://tfhub.dev/google/universal-sentence-encoder-multilingual-qa/3"
)

CPU times: user 9.46 s, sys: 841 ms, total: 10.3 s
Wall time: 10.1 s
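
The pattern in `preprocess_sentences` covers "covid" and "covid-19", but a spelling such as "COVID19" (no hyphen) only has its "covid" part replaced and ends up as "coronavirus19". The following is a sketch of a slightly broader substitution, not the notebook's own method: the pattern and the name `normalize_covid_terms` are assumptions introduced here for illustration.

```python
import re

# Hypothetical broader pattern: "covid", optionally followed by "19"
# with an optional hyphen or whitespace character in between.
COVID_PATTERN = re.compile(r"covid(?:[-\s]?19)?", flags=re.IGNORECASE)


def normalize_covid_terms(sentences):
    """Replace covid / covid-19 / covid19 / covid 19 with 'coronavirus'."""
    return [COVID_PATTERN.sub("coronavirus", sentence) for sentence in sentences]


examples = normalize_covid_terms(
    [
        "What is COVID-19?",
        "Does smoking make it easier to get COVID19?",
        "covid 19 symptoms",
        "Is covid dangerous?",
    ]
)
for sentence in examples:
    print(sentence)
```

Whether this changes retrieval quality in practice has not been measured here; it only makes the text normalization more uniform.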


In [6]:
%%time
# Create response embeddings
response_encodings = module.signatures["response_encoder"](
    input=tf.constant(preprocess_sentences(data.Answer)),
    context=tf.constant(preprocess_sentences(data.Context)),
)["outputs"]

CPU times: user 44.2 s, sys: 8.15 s, total: 52.3 s
Wall time: 15.2 s


### And test some new questions...

In [7]:
%%time
test_questions = [
    "What about pregnant women?",
    "Wat is de lengte van de incubatietijd?",
    "Are animals contagious COVID-19?",
    "Are there medicine against the coronavirus?",
    "Can I breastfead when I have COVID-19?",
    "Should I stay inside the house?",  # English questions are also possible.
    "Kann ich mit meinem Hund spazieren gehen?",  # As well as German, and all the other languages supported by use-multilingual.
]

# Create encodings for test questions
question_encodings = module.signatures["question_encoder"](
    tf.constant(preprocess_sentences(test_questions))
)["outputs"]

# Get the responses
test_responses = data.loc[
    np.argmax(np.inner(question_encodings, response_encodings), axis=1),
    ["Answer", "Source"],
]

# Show them in a dataframe
pd.DataFrame(
    {
        "Test Questions": test_questions,
        "Test Responses": test_responses["Answer"],
        "Test Sources": test_responses["Source"],
    }
)

CPU times: user 1.77 s, sys: 37 ms, total: 1.81 s
Wall time: 1.68 s


| | Test Questions | Test Responses | Test Sources |
|---|---|---|---|
| 80 | What about pregnant women? | All pregnant women, including those with confirmed or suspected COVID-19 infections, have the right to high quality care before, during and after childbirth. This includes antenatal, newborn, post... | https://www.who.int/news-room/q-a-detail/q-a-on-covid-19-pregnancy-childbirth-and-breastfeeding |
| 45 | Wat is de lengte van de incubatietijd? | The “incubation period” means the time between catching the virus and beginning to have symptoms of the disease. Most estimates of the incubation period for COVID-19 range from 1-14 days, most com... | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |
| 46 | Are animals contagious COVID-19? | Coronaviruses are a large family of viruses that are common in animals. Occasionally, people get infected with these viruses which may then spread to other people. For example, SARS-CoV was associ... | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |
| 27 | Are there medicine against the coronavirus? | While some western, traditional or home remedies may provide comfort and alleviate symptoms of COVID-19, there is no evidence that current medicine can prevent or cure the disease. WHO does not re... | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |
| 90 | Can I breastfead when I have COVID-19? | Yes. Women with COVID-19 can breastfeed if they wish to do so. They should: | https://www.who.int/news-room/q-a-detail/q-a-on-covid-19-pregnancy-childbirth-and-breastfeeding |
| 14 | Should I stay inside the house? | Stay home if you feel unwell. If you have a fever, cough and difficulty breathing, seek medical attention and call in advance. Follow the directions of your local health authority.Why? National an... | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |
| 49 | Kann ich mit meinem Hund spazieren gehen? | As the intergovernmental body responsible for improving animal health worldwide, the World Organisation for Animal Health (OIE) has been developing technical guidance on specialised topi... | https://www.who.int/news-room/q-a-detail/q-a-coronaviruses |
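
The retrieval step above keeps only the single best-scoring answer per question, via `np.argmax` over the inner products. When inspecting results it can help to see several candidates together with their scores. The sketch below shows one way to do that with plain NumPy; the function name `top_k_matches` and the toy 2-dimensional vectors are assumptions that stand in for the real question and response encodings.

```python
import numpy as np


def top_k_matches(question_vecs, response_vecs, k=3):
    """Return (indices, scores) of the k highest inner-product responses per question."""
    # scores has shape (n_questions, n_responses)
    scores = np.inner(question_vecs, response_vecs)
    # Sort ascending, keep the last k columns, then reverse for best-first order
    top_idx = np.argsort(scores, axis=1)[:, -k:][:, ::-1]
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    return top_idx, top_scores


# Toy vectors standing in for real encodings (illustration only)
responses = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
questions = np.array([[0.9, 0.1], [0.1, 0.9]])

idx, sc = top_k_matches(questions, responses, k=2)
print(idx)
print(sc)
```

The first column of `idx` matches what `np.argmax(..., axis=1)` would return, so the rows of `data` can be looked up from it in the same way as before.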


In [8]:
from timeit import default_timer as timer

In [9]:
question = input()
start = timer()
question_encoding = module.signatures["question_encoder"](
    tf.constant(preprocess_sentences([question]))
)["outputs"]
test_response = data.loc[
    np.argmax(np.inner(question_encoding, response_encodings), axis=1),
    ["Answer", "Source"],
]
end = timer()
print(test_response["Answer"].values[0])
print("This response was gathered from:")
print(test_response["Source"].values[0])
print("This query took {} seconds".format(end - start))

 Does smoking make it easier to get COVID19?


Smokers are likely to be more vulnerable to COVID-19 as the act of smoking means that fingers (and possibly contaminated cigarettes) are in contact with lips which increases the possibility of transmission of virus from hand to mouth. Smokers may also already have lung disease or reduced lung capacity which would greatly increase risk of serious illness.
This response was gathered from:
https://www.who.int/news-room/q-a-detail/q-a-on-smoking-and-covid-19
This query took 0.04099278151988983 seconds
