# QA model  
## References:
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (24 May 2019, Google AI Language); BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; [PDF](https://arxiv.org/pdf/1810.04805.pdf)  
- Google Research (New March 11th, 2020: Smaller BERT Models), [GitHub](https://github.com/google-research/bert)  
- [Question Answering System In Python Using BERT NLP](https://www.pragnakalp.com/case-study/question-answering-system-in-python-using-bert-nlp/)  
- Jonathan Besomi, [A QA model to answer them all](https://www.kaggle.com/jonathanbesomi/a-qa-model-to-answer-them-all), on the [COVID-19 Open Research Dataset challenge](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/tasks)  
- Ranjan Satapathy (Aug 11, 2018); Question Answering in Natural Language Processing; [Medium](https://medium.com/lingvo-masino/question-and-answering-in-natural-language-processing-part-i-168f00291856)  
- Akshay Navalakha (May 15, 2019); NLP — Question Answering System using Deep Learning; [Medium](https://medium.com/@akshaynavalakha/nlp-question-answering-system-f05825ef35c8)  
- The Stanford Question Answering Dataset ([SQuAD](https://rajpurkar.github.io/SQuAD-explorer/))  
- Tweet Sentiment Extraction: Extract support phrases for sentiment labels; [Kaggle](https://www.kaggle.com/c/tweet-sentiment-extraction/discussion)  
- Yuriy Ostapov (18 Nov 2011); Question Answering in a Natural Language: Understanding System Based on Object–Oriented Semantics; [PDF](https://arxiv.org/pdf/1111.4343.pdf) 
- Yi Yang, Wen-tau Yih and Christopher Meek (2015); WIKIQA: A Challenge Dataset for Open-Domain Question Answering; [PDF](https://www.aclweb.org/anthology/D15-1237.pdf)  

## Necessary Libraries

In [1]:
import numpy as np
import pandas as pd 
from pathlib import Path, PurePath

import nltk
from nltk.corpus import stopwords
import re
import string
import torch

from rank_bm25 import BM25Okapi #Search engine

pd.options.display.max_colwidth = 100 #Display setting to show more characters in column

## Read the processed data

In [2]:
cord19_df = pd.read_csv('../data_processed/cord19_processed.csv',
                        dtype={'paper_id':str, 'title':str, 'authors':str, 'journal':str, 'abstract':str, 'body_text':str},
                        parse_dates=['publish_time'])
cord19_df.head()

Unnamed: 0,paper_id,title,authors,journal,publish_time,abstract,body_text
0,aecbc613ebdab36753235197ffb4f35734b5ca63,Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus,"Brunner, Carolyn M. et al.",The American Journal of Medicine,1973-08-31,"Middle-aged female identical twins, one of whom had systemic lupus erythematosus (SLE), were eva...","The patient (Fo, ) was a 58 year old mentally retarded white woman, born in a rural area of sout..."
1,212e990b378e8d267042753d5f9d4a64ea5e9869,Infectious diarrhea: Pathogenesis and risk factors,"Cantey, J.Robert",The American Journal of Medicine,1985-06-28,"Our understanding of the pathogenesis of infectious, especially bacterial, diarrhea has increase...","Pathogenesis and Risk Factors J. ROBERT CANTEY, M.D. Charleston, South Carolina Our understandin..."
2,bf5d344243153d58be692ceb26f52c08e2bd2d2f,New perspectives on the pathogenesis of rheumatoid arthritis,"Zvaifler, Nathan J.",The American Journal of Medicine,1988-10-14,"In the pathogenesis of rheumatoid arthritis, locally produced antibodies complex with an incitin...","In the pathogenesis of rheumatoid arthritis, locally produced antibodies complex with an incitin..."
3,ddd2ecf42ec86ad66072962081e1ce4594431f9c,Management of acute and chronic respiratory tract infections,"Ellner, Jerrold J.",The American Journal of Medicine,1988-09-16,"Pharyngitis, bronchitis, and pneumonia represent the most common respiratory tract infections. W...","Respiratory Tract Infections JERROLD J. ELLNER, M.D. Cleveland, CM Pharyngitis, bronchitis, and ..."
4,a55cb4e724091ced46b5e55b982a14525eea1c7e,Acute bronchitis: Results of U.S. and European trials of antibiotic therapy,"Dere, Willard H.",The American Journal of Medicine,1992-06-22,"Acute bronchitis, an illness frequently encountered by primary-care physicians, is an inflammati...","A cute bronchitis, an illness frequently encountered by primary-care physicians [1] , is an infl..."


## Preprocessing and Search Engine

In [3]:
# Stop words, extended with filler words that carry little meaning in this corpus
english_stopwords = stopwords.words('english') 
english_stopwords.extend(['_url_','_mention_','_hashtag_','figure','unmanned',
                          'also','use','say','subject','edu','would','know',
                          'good','go','get','done','try','many','nice','thank','think',
                          'see','rather','easy','easily','lot','lack','make','want','seem',
                          'run','need','even','right','line','may','take','come',
                          'year','time','hour','first','last','second','high','new','low'])

# Replace contractions with their longer forms
contraction_mapping = {"u.s.":"america", "u.s":"america", "usa":"america", "u.k.":"england", "u.k":"england", "e-mail":"email",
                       "can't": "cannot", "'cause": "because", "could've": "could have","he'd": "he would","he'll": "he will", 
                       "he's": "he is", 
                       "how'd": "how did", "how'd'y": "how do you", "how'll": "how will", "how's": "how is", "I'd": "I would", 
                       "I'd've": "I would have", "I'll": "I will", "I'll've": "I will have","I'm": "I am", "I've": "I have", 
                       "i'd": "i would", "i'd've": "i would have", "i'll": "i will",  "i'll've": "i will have","i'm": "i am",
                       "i've": "i have", "it'd": "it would", "it'd've": "it would have", "it'll": "it will", 
                       "it'll've": "it will have","let's": "let us", "ma'am": "madam", "mayn't": "may not", 
                       "might've": "might have","mightn't've": "might not have", "must've": "must have",
                       "mustn't've": "must not have", "needn't've": "need not have",
                       "o'clock": "of the clock", "oughtn't": "ought not", "oughtn't've": "ought not have",
                       "sha'n't": "shall not", "shan't've": "shall not have", "she'd": "she would", "she'd've": "she would have", 
                       "she'll": "she will", "she'll've": "she will have",
                       "shouldn't've": "should not have", "so've": "so have","so's": "so as", 
                       "this's": "this is","that'd": "that would", "that'd've": "that would have", "that's": "that is", 
                       "there'd": "there would", "there'd've": "there would have", "there's": "there is", "here's": "here is",
                       "they'd": "they would", "they'd've": "they would have", "they'll": "they will", "they'll've": "they will have", 
                       "they're": "they are", "they've": "they have", "to've": "to have", "we'd":"we would", 
                       "we'd've": "we would have", "we'll": "we will", "we'll've": "we will have", "we're":"we are", "we've":"we have",
                       "what'll": "what will", "what'll've": "what will have", "what're": "what are",
                       "what's": "what is", "what've": "what have", "when's": "when is", "when've": "when have", "where'd": "where did",
                       "where's": "where is", "where've":"where have", "who'll":"who will", "who'll've":"who will have", "who's":"who is", 
                       "who've": "who have", "why's": "why is", "why've": "why have", "will've": "will have",
                       "would've": "would have", "wouldn't've":"would not have", 
                       "y'all": "you all", "y'all'd": "you all would","y'all'd've": "you all would have","y'all're": "you all are",
                       "y'all've": "you all have","you'd've": "you would have"}

# URL, MENTION, HASHTAG (raw strings so that \w, \s and \( are not treated as string escapes)
giant_url_regex= r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
mention_regex  = r'@[\w\-]+'
hashtag_regex  = r'#[\w\-]+'
space_pattern  = r'\s+'
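
To see what these patterns do before they are used inside the search engine, here is a small standalone sketch. The helper name `mask_tokens` and the sample sentence are made up for illustration; the patterns are the ones defined above, written as raw strings.

```python
import re

# Same patterns as in the notebook, written as raw strings.
URL_RE = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
MENTION_RE = r'@[\w\-]+'
HASHTAG_RE = r'#[\w\-]+'
SPACE_RE = r'\s+'

def mask_tokens(text):
    """Collapse whitespace, then mask URLs, mentions and hashtags (hypothetical helper)."""
    text = re.sub(SPACE_RE, ' ', text)
    text = re.sub(URL_RE, '_URL_', text)
    text = re.sub(MENTION_RE, '_MENTION_', text)
    text = re.sub(HASHTAG_RE, '_HASHTAG_', text)
    return text

sample = "See  https://example.org/paper?id=1 by @who about #covid-19"
print(mask_tokens(sample))
```

Note that the placeholders contain underscores, which are part of `string.punctuation`, so any later step that strips punctuation will turn `_url_` into `url`.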

In [4]:
class CovidSearchEngine:
    """
    Simple CovidSearchEngine.
    """
    
    def text_process(self, text):
        
        # Deal with URL, MENTION, HASHTAG
        text = re.sub(space_pattern, ' ', text)
        text = re.sub(giant_url_regex, '_URL_', text)
        text = re.sub(mention_regex, '_MENTION_', text)
        text = re.sub(hashtag_regex, '_HASHTAG_', text)
        
        # Special characters: drop slashes and pad operators with spaces
        text = re.sub(r"\/"," ", text)
        text = re.sub(r"\^"," ^ ", text)
        text = re.sub(r"\+"," + ", text)
        text = re.sub(r"\-"," - ", text)
        text = re.sub(r"\="," = ", text)
        
        # contraction and punctuation
        text = text.lower()
        text = ' '.join([contraction_mapping[t] if t in contraction_mapping else t for t in nltk.word_tokenize(text)])
        text = text.replace(' .', '.').replace('( ', '(').replace(' )', ')')
        text = text.translate(str.maketrans('', '', string.punctuation))
        return text

    def text_tokenize(self, text):
        
        # tokenize text, then keep each distinct term once:
        # longer than 2 characters, not a stop word, not purely numeric
        words = nltk.word_tokenize(text)
        return list(set([word for word in words 
                         if len(word) > 2
                         and word not in english_stopwords
                         and not word.isnumeric() # if word.isalpha()
                        ]))
    
    def preprocess(self, text):
        # Clean and tokenize text input
        return self.text_tokenize(self.text_process(text.lower()))

    def __init__(self, corpus: pd.DataFrame):
        self.corpus = corpus
        self.columns = corpus.columns
        
        raw_search_str = self.corpus.abstract.fillna('') + ' ' \
                            + self.corpus.title.fillna('')
        
        self.index = raw_search_str.apply(self.preprocess).to_frame()
        self.index.columns = ['terms']
        self.index.index = self.corpus.index
        self.bm25 = BM25Okapi(self.index.terms.tolist())
    
    def search(self, query, num):
        """
        Return the top `num` results that best match the query
        """
        # obtain scores
        search_terms = self.preprocess(query) 
        doc_scores = self.bm25.get_scores(search_terms)
        
        # sort by scores
        ind = np.argsort(doc_scores)[::-1][:num] 
        
        # select the top results (copy so the score column is not written to a view of the corpus)
        results = self.corpus.iloc[ind][self.columns].copy()
        results['score'] = doc_scores[ind]
        results = results[results.score > 0]
        return results.reset_index()
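
The ranking above is delegated to `BM25Okapi`. As a rough illustration of what a BM25 score measures, here is a minimal pure-Python sketch. It is not the `rank_bm25` implementation: the function name `bm25_scores`, the toy documents, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for the example, and the library's own IDF handling may differ in detail.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` (illustrative sketch)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query_terms:
            if q not in tf:
                continue
            # Smoothed IDF (always positive); one of several common variants.
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            # Term frequency, dampened by k1 and normalized by document length.
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    ["virus", "transmission", "droplets"],
    ["incubation", "period", "virus"],
    ["hypertension", "risk", "factors"],
]
scores = bm25_scores(["transmission", "droplets"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking[0], scores)
```

Because `text_tokenize` returns each distinct term only once per document, every term frequency seen by the index is 1, so in this notebook the ranking is driven mainly by IDF and document length.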

In [5]:
%%time
cse = CovidSearchEngine(cord19_df)

CPU times: user 1min 23s, sys: 514 ms, total: 1min 23s
Wall time: 1min 24s


## Download pre-trained QA model

In [2]:
%%time

import torch
from transformers import BertTokenizer
from transformers import BertForQuestionAnswering

# Use GPU for computation, if available 
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Use the pre-trained model to get answer to a question 
BERT_SQUAD = 'bert-large-uncased-whole-word-masking-finetuned-squad'

model = BertForQuestionAnswering.from_pretrained(BERT_SQUAD)
tokenizer = BertTokenizer.from_pretrained(BERT_SQUAD)

model = model.to(torch_device)
model.eval()

print()


## Answer questions
### Get the answers

In [7]:
def answer_question(question, context):
    
    # Answer a question given the question and a context passage
    encoded_dict = tokenizer.encode_plus(
                        question, context,
                        add_special_tokens = True,
                        max_length = 256,
                        pad_to_max_length = True,
                        return_tensors = 'pt')
    
    input_ids = encoded_dict['input_ids'].to(torch_device)
    token_type_ids = encoded_dict['token_type_ids'].to(torch_device)
    
    # Inference only, so no gradients are needed. Indexing the output works
    # whether the model returns a plain tuple or an output object.
    with torch.no_grad():
        outputs = model(input_ids, token_type_ids=token_type_ids)
    start_scores, end_scores = outputs[0], outputs[1]

    # The answer is the token span between the most likely start and end positions
    all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
    start_index = torch.argmax(start_scores)
    end_index = torch.argmax(end_scores)
    
    answer = tokenizer.convert_tokens_to_string(all_tokens[start_index:end_index+1])
    answer = answer.replace('[CLS]', '')
    return answer
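
Taking the two `argmax` values independently can yield an end position that comes before the start position, in which case the slice is empty and the answer is an empty string. One possible alternative, sketched below on plain Python lists, is to search for the best pair with `start <= end`. The function name `best_span`, the `max_answer_len` limit and the toy scores are assumptions for illustration; this is not what the notebook or the `transformers` library does internally.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end) maximizing start_scores[s] + end_scores[e] with s <= e."""
    best = (0, 0)
    best_score = float('-inf')
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best

# Toy scores: independent argmax would pick start=3 and end=1 (an invalid span).
start = [0.1, 0.2, 0.1, 5.0, 0.3]
end = [0.1, 4.0, 0.2, 0.1, 3.0]

naive = (start.index(max(start)), end.index(max(end)))
print(naive, best_span(start, end))
```

With real model outputs the scores would first be converted to lists, for example with `start_scores[0].tolist()`.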

### Get the contexts

In [9]:
NUM_CONTEXT_FOR_EACH_QUESTION = 10

def get_all_context(query, num_results):
    """
    Return the abstracts of the `num_results` papers that best match the query
    """
    papers_df = cse.search(query, num_results)
    return papers_df['abstract'].tolist()

def get_all_answers(question, all_contexts):
    """
    Ask the same question to all contexts (all papers)
    """
    all_answers = []
    
    for context in all_contexts:
        all_answers.append(answer_question(question, context))
    return all_answers

def create_output_results(question, 
                          all_contexts, 
                          all_answers, 
                          summary_answer='', 
                          summary_context=''):
    """
    Return results in json format
    """
    
    def find_start_end_index_substring(context, answer):
        search_re = re.search(re.escape(answer.lower()), context.lower())
        if search_re:
            return search_re.start(), search_re.end()
        else:
            return 0, len(context)
        
    output = {}
    output['question'] = question
    output['summary_answer'] = summary_answer
    output['summary_context'] = summary_context
    
    results = []
    for c, a in zip(all_contexts, all_answers):
        span = {}
        span['context'] = c
        span['answer'] = a
        span['start_index'], span['end_index'] = find_start_end_index_substring(c,a)
        
        results.append(span)
    
    output['results'] = results
    
    return output
    
def get_results(question, 
                summarize=False, 
                num_results=NUM_CONTEXT_FOR_EACH_QUESTION,
                verbose=True):
    """
    Get the contexts and answers for a question
    """
    all_contexts = get_all_context(question, num_results)
    all_answers = get_all_answers(question, all_contexts)
    
    summary_answer, summary_context = '', ''
    if summarize:
        # Summarization is not implemented yet (no get_summary function exists)
        raise NotImplementedError('Summarization is not implemented yet.')
    
    return create_output_results(question, 
                                 all_contexts, 
                                 all_answers,
                                 summary_answer,
                                 summary_context)
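
The answers produced by the tokenizer are lowercased and have spaces around punctuation (for example `aerosols , droplets`), so an exact substring search in the original abstract can fail and fall back to the whole context. The sketch below shows one way to make the lookup tolerant of whitespace. The function name `find_answer_span` and the sample strings are hypothetical, and the approach is a suggestion rather than part of the original pipeline.

```python
import re

def find_answer_span(context, answer):
    """Locate `answer` in `context`, ignoring differences in whitespace and case."""
    pieces = answer.split()
    if not pieces:
        return None
    # Allow any amount of whitespace (including none) between the answer's pieces.
    pattern = r'\s*'.join(re.escape(p) for p in pieces)
    match = re.search(pattern, context, flags=re.IGNORECASE)
    return (match.start(), match.end()) if match else None

context = "Transmission is via aerosols, droplets, or fomites (contaminated surfaces)."
answer = "aerosols , droplets , or fomites ( contaminated surfaces )"

exact = re.search(re.escape(answer.lower()), context.lower())
span = find_answer_span(context, answer)
print(exact, context[span[0]:span[1]])
```

Returning `None` when nothing matches, instead of `(0, len(context))`, would also make the "answer not found" case explicit for the display code.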

### Tasks and Questions dictionary

In [11]:
cord19_kaggle_questions = {
    "data":[
        {
            "task": "What is known about transmission, incubation, and environmental stability?",
            "questions": [
                "Is the virus transmitted by aerisol, droplets, food, close contact, fecal matter, or water?",
                "How long is the incubation period for the virus?",
                "Can the virus be transmitted asymptomatically or during the incubation period?",
                "How do weather, heat, and humidity affect the transmission of 2019-nCoV?",
                "How long can the 2019-nCoV virus remain viable on common surfaces?"
            ]
        },
        {
            "task": "What do we know about COVID-19 risk factors?",
            "questions": [
                "What risk factors contribute to the severity of 2019-nCoV?",
                "How does hypertension affect patients?",
                "How does heart disease affect patients?",
                "How does copd affect patients?",
                "How does smoking affect patients?",
                "How does pregnancy affect patients?",
                "What is the fatality rate of 2019-nCoV?",
                "What public health policies prevent or control the spread of 2019-nCoV?"
            ]
        },
        {
            "task": "What do we know about virus genetics, origin, and evolution?",
            "questions": [
                "Can animals transmit 2019-nCoV?",
                "What animal did 2019-nCoV come from?",
                "What real-time genomic tracking tools exist?",
                "What geographic variations are there in the genome of 2019-nCoV?",
                "What efforts are being made in Asia to prevent further outbreaks?"
            ]
        },
        {
            "task": "What do we know about vaccines and therapeutics?",
            "questions": [
                "What drugs or therapies are being investigated?",
                "Are anti-inflammatory drugs recommended?"
            ]
        },
        {
            "task": "What do we know about non-pharmaceutical interventions?",
            "questions": [
                "Which non-pharmaceutical interventions limit transmission?",
                "What are the most important barriers to compliance?"
            ]
        },
        {
            "task": "What has been published about medical care?",
            "questions": [
                "How does extracorporeal membrane oxygenation affect 2019-nCoV patients?",
                "What telemedicine and cybercare methods are most effective?",
                "How is artificial intelligence being used in real time health delivery?",
                "What adjunctive or supportive methods can help patients?"
            ]
        },
        {
            "task": "What do we know about diagnostics and surveillance?",
            "questions": [
                "What diagnostic tests (tools) exist or are being developed to detect 2019-nCoV?"
            ]
        },
        {
            "task": "Other interesting questions",
            "questions": [
                "What is the immune system response to 2019-nCoV?",
                "Can personal protective equipment prevent the transmission of 2019-nCoV?",
                "Can 2019-nCoV infect patients a second time?"
            ]
        }
    ]
}


### Get the results

In [10]:
%%time
all_tasks = []

for i, t in enumerate(cord19_kaggle_questions['data']):
    print("Answering questions to task {}. ...".format(i+1))
    answers_to_question = []
    for q in t['questions']:
        answers_to_question.append(get_results(q, verbose=False))
    task = {}
    task['task'] = t['task']
    task['questions'] = answers_to_question
    
    all_tasks.append(task)

print("Hey, we're okay! All the questions of the tasks are answered.")
cord19_answers_json = {}
cord19_answers_json['data'] = all_tasks

Answering questions to task 1. ...
Answering questions to task 2. ...
Answering questions to task 3. ...
Answering questions to task 4. ...
Answering questions to task 5. ...
Answering questions to task 6. ...
Answering questions to task 7. ...
Answering questions to task 8. ...
Hey, we're okay! All the questions of the tasks are answered.
CPU times: user 25min 14s, sys: 1min 25s, total: 26min 39s
Wall time: 6min 43s


### To JSON format

In [17]:
# Save to JSON format
#import json
#with open('../data_processed/cord19_answers.json', 'w') as outfile:
#    json.dump(cord19_answers_json, outfile)
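
The commented-out cell above writes the answers to disk, and a later cell reads them back. The following sketch shows that round trip on a tiny stand-in dictionary. The `answers` content and the temporary directory are placeholders; the real notebook uses `cord19_answers_json` and `../data_processed/cord19_answers.json`.

```python
import json
import os
import tempfile

# Stand-in for cord19_answers_json, with the same nesting as the real output.
answers = {'data': [{'task': 'Example task',
                     'questions': [{'question': 'Example question?',
                                    'summary_answer': '',
                                    'summary_context': '',
                                    'results': []}]}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cord19_answers.json')
    # ensure_ascii=False keeps non-ASCII characters from abstracts readable.
    with open(path, 'w', encoding='utf-8') as outfile:
        json.dump(answers, outfile, ensure_ascii=False, indent=2)
    with open(path, 'r', encoding='utf-8') as infile:
        restored = json.load(infile)

print(restored == answers)
```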

### To CSV format
#### Answer reference (authors and year of publication)

In [245]:
authors_year = []
for j in range(cord19_df.shape[0]):
    authors_year.append(str(cord19_df.authors[j])+' ('+str(cord19_df.publish_time[j].year)+')')

In [246]:
j = 0
results_df = pd.DataFrame(columns=['task','question','context','answer','reference'])
for tasks in cord19_answers_json['data']:
    for questions in tasks['questions']:
        for question in questions['results']:
            results_df.loc[j,'task']    = tasks['task']
            results_df.loc[j,'question']= questions['question']
            results_df.loc[j,'context'] = question['context']
            results_df.loc[j,'answer']  = question['answer']
            j+=1

for abstract, ref in zip(cord19_df.abstract, authors_year):
    for j, context in enumerate(results_df.context):
        if context in abstract:
            results_df.loc[j,'reference'] = ref

results_df.head()

Unnamed: 0,task,question,context,answer,reference
0,"What is known about transmission, incubation, and environmental stability?","Is the virus transmitted by aerisol, droplets, food, close contact, fecal matter, or water?","Viral infections can be transmitted by various routes. At one extreme, airborne or droplet viral...","transmission is via aerosols , droplets , or fomites ( contaminated surfaces ) . environmental f...","Limeres Posse, Jacobo et al. (2017)"
1,"What is known about transmission, incubation, and environmental stability?","Is the virus transmitted by aerisol, droplets, food, close contact, fecal matter, or water?",Bacterial infections have a large impact on public health. Disease can occur at any body site an...,"air , water , food , or living vectors","Doron, S.; Gorbach, S.L. (2008)"
2,"What is known about transmission, incubation, and environmental stability?","Is the virus transmitted by aerisol, droplets, food, close contact, fecal matter, or water?",BACKGROUND: Respiratory infections are the leading cause of childhood deaths in Bangladesh. Prom...,"close contact with a sick person ' s breath , cough droplets , or spit","Nizame, Fosiul A et al. (2011)"
3,"What is known about transmission, incubation, and environmental stability?","Is the virus transmitted by aerisol, droplets, food, close contact, fecal matter, or water?",Emerging infectious diseases (EID) and reemerging infectious diseases are increasing globally. Z...,"direct contact or through food , water , and the environment","McArthur, Donna Behler (2019)"
4,"What is known about transmission, incubation, and environmental stability?","Is the virus transmitted by aerisol, droplets, food, close contact, fecal matter, or water?",Viruses are often transmitted via food and the environment. Contamination may be controlled eith...,viruses are often transmitted via food and the environment,"Cliver, Dean O. (2008)"
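
The reference lookup above compares every abstract with every context, which grows with the product of the two sizes. Since each context returned by the search engine is a full abstract, a dictionary lookup is a possible shortcut. The sketch below uses plain lists and dicts; the function name `attach_references` and the sample data are invented for illustration, and it assumes contexts match abstracts exactly rather than as substrings.

```python
def attach_references(rows, abstracts, references):
    """Add a 'reference' key to each row by exact lookup on its context."""
    lookup = {}
    for abstract, ref in zip(abstracts, references):
        # Skip missing abstracts (NaN values are floats, not strings).
        if isinstance(abstract, str):
            lookup[abstract] = ref  # on duplicates, the last reference wins
    return [dict(row, reference=lookup.get(row['context'])) for row in rows]

abstracts = ["Abstract A", float('nan'), "Abstract B"]
references = ["Smith (2019)", "Doe (2020)", "Lee (2020)"]
rows = [
    {'context': 'Abstract B', 'answer': 'x'},
    {'context': 'Unknown', 'answer': 'y'},
]

with_refs = attach_references(rows, abstracts, references)
print(with_refs)
```

Rows whose context is not found get `None` as reference, which makes unmatched contexts easy to spot.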


In [None]:
#results_df.to_csv('../data_processed/cord19_answers.csv', index=False)

## Visualisation

In [29]:
import json

with open('../data_processed/cord19_answers.json', mode='r') as file:
    cord19_answers_json = json.load(file)
        

In [33]:
tasks = [
    {'label': task['task'], 
     'value': task['task'],
    } 
    for task in cord19_answers_json['data']]

questions = [
    {
        'label': [q['question'] for q in tq['questions']],
        'value': [q['question'] for q in tq['questions']],
    }
    for tq in cord19_answers_json['data']]
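
In the `questions` list above, each entry stores a whole list under `label` and `value`. If these structures are meant to feed a task selector and a dependent question selector (an assumption; the notebook does not say so explicitly), a mapping from each task to its own flat list of options may be easier to use. The function name `build_dropdown_options` and the sample data are hypothetical.

```python
def build_dropdown_options(answers_json):
    """Return task options and a task -> question options mapping."""
    task_options = [{'label': t['task'], 'value': t['task']}
                    for t in answers_json['data']]
    question_options = {
        t['task']: [{'label': q['question'], 'value': q['question']}
                    for q in t['questions']]
        for t in answers_json['data']
    }
    return task_options, question_options

sample_json = {'data': [
    {'task': 'Task A', 'questions': [{'question': 'Q1?'}, {'question': 'Q2?'}]},
    {'task': 'Task B', 'questions': [{'question': 'Q3?'}]},
]}

task_options, question_options = build_dropdown_options(sample_json)
print(task_options)
print(question_options['Task A'])
```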


### HTML Layout Style

In [34]:
from IPython.display import display, Markdown, Latex, HTML

def layout_style():
    style = """
        div {
            color: black;
            }
        .single_answer {
            border-left: 3px solid green;
            padding-left: 10px;
            font-family: Arial;
            font-size: 16px;
            color: #777777;
            margin-left: 5px;
            }
        .answer{
            color: #dc7b15;
            }
        .ay{
            color: red;
            }
        .question_title {
            color: darkblue;
            display: block;
            text-transform: none;
            }
        .task_title {
            color: darkgreen;
            }
        div.output_scroll {
            height: auto;
            }
    """
    return "<style>" + style + "</style>"

def dm(x): display(Markdown(x))
def dh(x): display(HTML(layout_style() + x))

### Display the results

In [36]:
def display_single_context(context, start_index, end_index):
    
    before_answer = context[:start_index]
    answer = context[start_index:end_index]
    after_answer = context[end_index:]
    
    content = before_answer + "<span class='answer'>" + answer + "</span>" + after_answer
    
    return dh("""<div class="single_answer">{}</div>""".format(content))

def display_question_title(question):
    return dh("<h2 class='question_title'>{}</h2>".format(question)) #.capitalize()

def display_all_contexts(index, question):
    
    def answer_not_found(context, start_index, end_index):
        return (start_index == 0 and len(context) == end_index) or (start_index == 0 and end_index == 0)

    display_question_title(str(index + 1) + ". " + question['question'].capitalize())
    
    # display context
    for i in question['results']:
        for a, ay in zip(cord19_df.abstract, authors_year):
            if i['context'] in a:
                
                if answer_not_found(i['context'], i['start_index'], i['end_index']):
                    continue # skip not found questions
                # append the reference (authors and year) in bold black, closing the tags
                display_single_context(i['context']+'<br><strong><font color=black>'+ay+'</font></strong>',
                                       i['start_index'], i['end_index'])

def display_task_title(index, task):
    task_title = "Task " + str(index) + ": " + task
    return dh("<h1 class='task_title'>{}</h1>".format(task_title))

def display_single_task(index, task):
    
    display_task_title(index, task['task'])
    
    for i, question in enumerate(task['questions']):
        display_all_contexts(i, question)

In [37]:
task = int(input('Enter a task number'))

display_single_task(task, cord19_answers_json['data'][task-1])

Enter a task number 4


In [39]:
task = int(input('Enter a task number'))

display_single_task(task, cord19_answers_json['data'][task-1])

Enter a task number 8


In [66]:
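# Scratch cell: `ra` only exists inside set_display_children, so it must be defined before running this
output = {'Context'+str(j): {out} for j, out in enumerate(ra)}
#output[context]
#html.P(output['Context'+str(j)] for j, out in enumerate(ra))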



NameError: name 'ra' is not defined

In [267]:
# =========================================================================================================
# Reuses layout_style, dm, dh, display_single_context and display_question_title defined above.

def display_all_contexts(index, question):
    
    def answer_not_found(context, start_index, end_index):
        return (start_index == 0 and len(context) == end_index) or (start_index == 0 and end_index == 0)

    display_question_title(str(index + 1) + ". " + question['question'].capitalize())
    
    # display context; `results_answers` is the DataFrame read from cord19_answers.csv
    for i in question['results']:
        for context, ref in zip(results_answers.context, results_answers.reference):
            if i['context'] in context:
                
                if answer_not_found(i['context'], i['start_index'], i['end_index']):
                    continue # skip not found questions
                display_single_context(i['context']+'<br><strong><font color=black>'+ref+'</font></strong>',
                                       i['start_index'], i['end_index'])

def display_task_title(index, task):
    task_title = "Task " + str(index) + ": " + task
    return dh("<h1 class='task_title'>{}</h1>".format(task_title))

def display_single_task(index, task):
    
    display_task_title(index, task['task'])
    
    for i, question in enumerate(task['questions']):
        display_all_contexts(i, question)
        
task = 8

display_single_task(task, cord19_answers_json['data'][task-1])
# =========================================================================================================


In [255]:
q = {'data':{q['question'] for q in tq['questions']} for tq in cord19_answers_json['data']}
q['data']

{'Can 2019-nCoV infect patients a second time?',
 'Can personal protective equipment prevent the transmission of 2019-nCoV?',
 'What is the immune system response to 2019-nCoV?'}

In [5]:
import pandas as pd
results_answers = pd.read_csv('../data_processed/cord19_answers.csv')

results_answers

Unnamed: 0,task,question,context,answer,reference
0,"What is known about transmission, incubation, ...","Is the virus transmitted by aerisol, droplets,...",Viral infections can be transmitted by various...,"transmission is via aerosols , droplets , or f...","Limeres Posse, Jacobo et al. (2017)"
1,"What is known about transmission, incubation, ...","Is the virus transmitted by aerisol, droplets,...",Bacterial infections have a large impact on pu...,"air , water , food , or living vectors","Doron, S.; Gorbach, S.L. (2008)"
2,"What is known about transmission, incubation, ...","Is the virus transmitted by aerisol, droplets,...",BACKGROUND: Respiratory infections are the lea...,"close contact with a sick person ' s breath , ...","Nizame, Fosiul A et al. (2011)"
3,"What is known about transmission, incubation, ...","Is the virus transmitted by aerisol, droplets,...",Emerging infectious diseases (EID) and reemerg...,"direct contact or through food , water , and t...","McArthur, Donna Behler (2019)"
4,"What is known about transmission, incubation, ...","Is the virus transmitted by aerisol, droplets,...",Viruses are often transmitted via food and the...,viruses are often transmitted via food and the...,"Cliver, Dean O. (2008)"
...,...,...,...,...,...
295,Other interesting questions,Can 2019-nCoV infect patients a second time?,AbstractThere is concern about a new coronavir...,can 2019 - ncov infect patients a second time...,Domenico Benvenuto et al. (2020)
296,Other interesting questions,Can 2019-nCoV infect patients a second time?,BACKGROUND The outbreak of a novel coronavirus...,,Bo Diao et al. (2020)
297,Other interesting questions,Can 2019-nCoV infect patients a second time?,"AbstractOver the past 20 years, several corona...",human ace2 is the receptor,Michael Letko; Vincent Munster (2020)
298,Other interesting questions,Can 2019-nCoV infect patients a second time?,INTRODUCTION: The outbreak of the new Coronavi...,"march 18 , 2020","Panahi, Latif et al. (2020)"


In [None]:
def set_display_children(selected_task, selected_question, context):
    # Contexts retrieved for the selected task and question
    ra = results_answers[(results_answers.task == selected_task) &
                         (results_answers.question == selected_question)]['context']
    output = {'context'+str(j): t for j, t in enumerate(ra)}
    # Fall back to an empty string when the requested key is missing
    return output.get(context, '')



In [27]:
import dash_html_components as html

html.P(),
html.Div()

(P(None),)

In [63]:
t = []
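# regex=False matches the text literally ('?' would otherwise act as a regex quantifier);
# na=False treats missing values as non-matching
d = results_answers[results_answers.task.str.contains('Other interesting questions', regex=False, na=False) | 
                    results_answers.question.str.contains('Can 2019-nCoV infect patients a second time?', regex=False, na=False)]['context']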
d

270    Background China is running a national level a...
271    The recent outbreak of SARS-CoV-2 (2019-nCoV) ...
272    The 2019-nCoV is reported to share the same en...
273    The outcome of a viral infection within the ne...
274    AbstractA newly identified coronavirus, 2019-n...
275    Timely detection of novel coronavirus (2019-nC...
276    This paper focuses on the formulation of a det...
277    The 2019-nCoV infection that is caused by a no...
278    AbstractSince December 2019, a novel coronavir...
279    ABSTRACTThe respiratory syndrome caused by a n...
280    In late December 2019 a previous unidentified ...
281    We evaluated a personal protective equipment r...
282    ABSTRACT THE CENTERS FOR DISEASE CONTROL and P...
283    The severe acute respiratory syndrome (SARS) o...
284    Over the past several decades, we have witness...
285    A possible threat in the ophthalmology clinic ...
286    In late December 2019, a previous unidentified...
287    Healthcare settings can 

In [37]:
output = {}
for j, out in enumerate(d):
    # create the list for each key before appending to it
    output.setdefault('context'+str(j), []).append(out)
output

In [49]:
output = []
for j, out in enumerate(d):
    output.append('Context'+str(j)+': { '+out+' }')

output  # a plain list has no .todict() method; the next cell builds a dict instead

In [52]:
output = {'Context'+str(j): {out} for j, out in enumerate(d)}
output['Context0']

{'Background China is running a national level antivirus campaign against the novel coronavirus (2019-nCoV). Strict control measures are being enforced in either the populated areas and remote regions. While the virus is closed to be under control, tremendous economic loss has been caused. Methods and findings We assessed the pandemic risk of 2019-nCoV for all cities/regions in China using the random forest algorithm, taking into account the effect of five factors: the accumulative and increased numbers of confirmed cases, total population, population density, and GDP. We defined four levels of the risk, corresponding to the four response levels to public health emergencies in China. The classification system has good consistency among cities in China, as the error rate of the confusion matrix is 1.58%. Conclusions The pandemic risk of 2019-nCoV is dramatically different among the 442 cities/regions. We recommend to adopt proportionate control policy according to the risk level to redu

In [None]:
# Add an empty element to the dict so that the default value can be set to None or ''