## Introduction

  As everyone unfortunately knows, COVID-19 has brought the world to a standstill and is the worst pandemic in a century. Because of this, the world health community has devoted an unprecedented amount of resources towards the study, containment, and hopefully the eradication of this disease. However, this also means that an unprecedented amount of data has been produced, much more than researchers and journalists can sort through by hand. Thus, this contest was started by Kaggle to help develop new methods of sorting through the massive amount of papers being written on the subject. Our contribution is to make major improvements on the method developed by "dirktheeng" in their “Anserini+BERT-SQuAD for Semantic Corpus Search” notebook. The link to their notebook can be found here: https://www.kaggle.com/dirktheeng
  




## Methodology

Our methodology is very similar to the one used in the original notebook. The basic idea is to use the BERT model to perform basic question and answer tasks, i.e. the user can ask the model a question, and it will return what it thinks the best answer is from a database. In the case of the original notebook, this model was used to scan through numerous paper abstracts to find which abstracts best answer the question. To do this, the model would take in a question plus candidate abstract and output a start index, end index, and its confidence value. The start index is where the most relevant part of the text starts, the end index is where it ends, and the confidence value is how sure it is that’s the right answer. The model then does this for the rest of the abstracts, ranks them based on confidence values, and returns the answer it’s most confident of, along with the paper that it comes from. This is a perfectly fine model, however, there were a few key improvements that we were able to make.

First, we extended the model to search the entire paper, not just the abstract. The BERT model is only able to handle text shorter than 512 words. When factoring in the length of the question statement, this means that the original model could only handle chunks of text shorter than 500 words. Since abstracts are always shorter than that, the length constraints weren’t an issue for the original model. However, there is a lot of potentially useful information in the body of these papers, so it would better if the model could scan this as well. To do this, we simply broke up longer text into chucks less than 500 words, and ran each smaller chunk through the model separately. Thus, we were able to find which part of the paper was most relevant and return it. Also, this requires much more data than the original model’s database, which only included the abstracts. So, we added in a way to upload the body text, match it to the right abstract, merge the two, and pass the full text on to the model. This allows our improved model to handle much more data.

Second, there was a major bug in the original model where it could return nonsensical index ranges where the start index came after the end index. For example, if the model returns a range of [26,13], it’s impossible to reconstruct the answer from that. We came up with a rather ingenious solution to this problem. If the model returns a range such as [26,13], our new model fix that by breaking the original text chunk into two parts, say [0,13] and [26,n-1] (n is the length of the original text chunk). Then, it will find two new potential answer ranges, say [7,13] and [26,31], and return the one it’s more confident of. Thus, our new model fixes this bug and is more reliable. 





## Contribution
* 1.
* 2.
* 3.

## Setup Envirnment

Note, this is the line of code you need to run when using google Colab. On other platforms, this may be different.  

In [0]:
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"

In [0]:
!pip install pyserini
!pip install transformers

Collecting pyserini
[?25l  Downloading https://files.pythonhosted.org/packages/19/67/f11bd9c9afcef667b816864a38cea950d1245fc56e4617714530ba4fdccc/pyserini-0.9.0.0-py3-none-any.whl (57.7MB)
[K     |████████████████████████████████| 57.7MB 68kB/s 
[?25hCollecting pyjnius
[?25l  Downloading https://files.pythonhosted.org/packages/d8/50/098cb5fb76fb7c7d99d403226a2a63dcbfb5c129b71b7d0f5200b05de1f0/pyjnius-1.3.0-cp36-cp36m-manylinux2010_x86_64.whl (1.1MB)
[K     |████████████████████████████████| 1.1MB 37.8MB/s 
Installing collected packages: pyjnius, pyserini
Successfully installed pyjnius-1.3.0 pyserini-0.9.0.0
Collecting transformers
[?25l  Downloading https://files.pythonhosted.org/packages/a3/78/92cedda05552398352ed9784908b834ee32a0bd071a9b32de287327370b7/transformers-2.8.0-py3-none-any.whl (563kB)
[K     |████████████████████████████████| 573kB 5.2MB/s 
Collecting sacremoses
[?25l  Downloading https://files.pythonhosted.org/packages/99/50/93509f906a40bffd7d175f97fd75ea328ad9bd9

In [0]:
from transformers import BertForQuestionAnswering
from transformers import BertTokenizer
from transformers import BartTokenizer, BartForConditionalGeneration
import pandas as pd
from pyserini.search import pysearch
import numpy as np
from BERT_func import BERT_SQUAD_QA
import json
import tensorflow as tf
import tensorflow_hub as hub


Note: You will need to load in a file called "database.json" into your workspace for this notebook to work. 

In [0]:
#%%capture
!wget -O lucene.tar.gz https://www.dropbox.com/s/d6v9fensyi7q3gb/lucene-index-covid-2020-04-03.tar.gz?dl=0
!tar xvfz lucene.tar.gz
minDate = '2020/04/02'
luceneDir = 'lucene-index-covid-2020-04-03/'
torch_device = 'cpu'

In [0]:
#You can just run this cell if you ran the previous cell or already have 'lucene-index-covid-2020-04-03/' set up
torch_device = 'cpu'
minDate = '2020/04/02'
luceneDir = 'lucene-index-covid-2020-04-03/'

In [0]:
QA_MODEL = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
QA_TOKENIZER = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
QA_MODEL.to(torch_device)
QA_MODEL.eval()


Note, we need to actually get the data first. 

In [0]:
!mkdir /content/ # There is no need to run this if using google Colab, since the content folder should already exist

In [0]:
!mkdir /content/kaggle/
!mkdir /content/kaggle/working/
!mkdir /content/kaggle/working/sentence_wise_email/
!mkdir /content/kaggle/working/sentence_wise_email/module/
!mkdir /content/kaggle/working/sentence_wise_email/module/module_useT
# Download the module, and uncompress it to the destination folder. 
!curl -L "https://tfhub.dev/google/universal-sentence-encoder-large/3?tf-hub-format=compressed" | tar -zxvC /content/kaggle/working//sentence_wise_email/module/module_useT

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
./
./tfhub_module.pb
./variables/
./variables/variables.data-00000-of-00001
 92  745M   92  689M    0     0  52.8M      0  0:00:14  0:00:13  0:00:01 54.7M./variables/variables.index
./assets/
./saved_model.pb
100  745M  100  745M    0     0  53.2M      0  0:00:14  0:00:14 --:--:-- 56.5M


In [0]:
!mkdir /content/result/

## Embbeding Method

In [0]:
def embed_useT(module):
    with tf.Graph().as_default():
        sentences = tf.compat.v1.placeholder(tf.string)
        embed = hub.Module(module)
        embeddings = embed(sentences)
        session = tf.compat.v1.train.MonitoredSession()
    return lambda x: session.run(embeddings, {sentences: x})
embed_fn = embed_useT('/content/kaggle/working/sentence_wise_email/module/module_useT')

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


## Display the result

In [0]:
workingPath = '/content/kaggle/working'
import pandas as pd
from IPython.core.display import display, HTML

#from summarizer import Summarizer
#summarizerModel = Summarizer()
def displayResults(hit_dictionary, answers, question, abst):
    
    question_HTML = '<div style="font-family: Times New Roman; font-size: 28px; padding-bottom:28px"><b>Query</b>: '+question+'</div>'
    #all_HTML_txt = question_HTML
    confidence = list(answers.keys())
    confidence.sort(reverse=True)
    
    confidence = list(answers.keys())
    confidence.sort(reverse=True)
    

    for c in confidence:
        if c>0 and c <= 1 and len(answers[c]['answer']) != 0:
            rowData = []
#             idx = answers[c]['idx']
#             title = hit_dictionary[idx]['title']
#             authors = hit_dictionary[idx]['authors'] + ' et al.'

            
            full_abs = answers[c]['abstract_bert']
            bert_ans = answers[c]['answer']
            #print(full_abs)
            
            
            split_abs = full_abs.split(bert_ans)
            sentance_beginning = split_abs[0][split_abs[0].rfind('.')+1:]
            #print (sentance_beginning)
            if len(split_abs) == 1:
                sentance_end_pos = len(full_abs)
                sentance_end =''
            else:
                sentance_end_pos = split_abs[1].find('. ')+1
                if sentance_end_pos == 0:
                    sentance_end = split_abs[1]
                else:
                    sentance_end = split_abs[1][:sentance_end_pos]
                
            #sentance_full = sentance_beginning + bert_ans+ sentance_end
            answers[c]['full_answer'] = sentance_beginning+bert_ans+sentance_end
            answers[c]['sentence_beginning'] = sentance_beginning
            answers[c]['sentence_end'] = sentance_end
            #answers[c]['title'] = title
            #answers[c]['doi'] = doi
        else:
            answers.pop(c)
            
    #print(list(answers.keys()))
    
    ## now rerank based on semantic similarity of the answers to the question
    cList = list(answers.keys())
    allAnswers = [answers[c]['full_answer'] for c in cList]
    #print('all:', allAnswers)
    
    messages = [question]+allAnswers
    
    encoding_matrix = embed_fn(messages)
    similarity_matrix = np.inner(encoding_matrix, encoding_matrix)
    rankings = similarity_matrix[1:,0]
    
    for i,c in enumerate(cList):
        answers[rankings[i]] = answers.pop(c)
    
    ## now form pandas dv
    confidence = list(answers.keys())
    confidence.sort(reverse=True)
    pandasData = []
    ranked_aswers = []
    for c in confidence:
        rowData=[]
        title = answers[c]['title']
        author = answers[c]['author']
        doi = None
        #idx = answers[c]['idx']
        #rowData += [idx]            
        sentance_html = '<div>' +answers[c]['sentence_beginning'] + " <font color='red'>"+answers[c]['answer']+"</font> "+answers[c]['sentence_end']+'</div>'
        #print (sentance_html)
        rowData += [title,author, sentance_html, c]
        pandasData.append(rowData)
        ranked_aswers.append(' '.join([answers[c]['full_answer']]))
    
    pdata2 = pandasData
        
    
    display(HTML(question_HTML))
    
    df = pd.DataFrame(pdata2, columns = ['Title','Authors', 'BERT-SQuAD Answer with Highlights', 'Confidence'])
    tit = '_'.join(question.split(' '))
    if abst:
        df.to_csv('./result/Abs+' + tit + '.csv')
        print('Search with only Abstract')
    else:
        df.to_csv('./result/Full+' + tit + '.csv')
        print ('Search with full paper')
        
    display(HTML(df.to_html(render_links=True, escape=False)))
    


In [0]:
def Display_all(query, keywords, abst):
    
    #search with luceneDir database by anserini
    searcher = pysearch.SimpleSearcher(luceneDir)
    hits = searcher.search(query + '. ' + keywords)
    n_hits = len(hits)
    #finds the most relvent docs from the database
    
    #get database by ourselves, this is what database.json does
    with open('database.json', 'r') as fp:
        database = json.loads(fp.read())
        
    ID = []
    for i in range(0, n_hits):
        doc_json = json.loads(hits[i].raw)
        try:
            ID.append(doc_json['paper_id'])
        except:
            pass
        
    database_df = pd.DataFrame(database).T
    
    database_df['abs_text'] = database_df.abstract+ database_df['full-text']
    #this part adds in the full text

    
    #match with own database, this compares the two results
    ID_real = []
    for Id in ID:
        if abst:
            if Id in database and ~database_df.loc[Id].isna().abstract:
                #print(database_df.loc[Id].isna().abstract)
                ID_real.append(Id)
        else:
            if Id in database and ~database_df.loc[Id].isna()['full-text']:
                ID_real.append(Id)
            
    #print (ID_real)
    
    hit_dictionary = database_df.loc[ID_real].to_dict('index')
    
    QA_model = BERT_SQUAD_QA(QA_TOKENIZER, QA_MODEL)
    ans = QA_model.search_abstracts(hit_dictionary, query, abst)
    
    displayResults(hit_dictionary, ans, query, abst)
        

In [0]:
all_topics=[
    'What is known about transmission, incubation, and environmental stability?',
    'What do we know about COVID-19 risk factors?',
    'What do we know about virus genetics, origin, and evolution?',
    'What do we know about vaccines and therapeutics?',
    'What do we know about non-pharmaceutical interventions?',
    'What has been published about medical care?',
    'What do we know about diagnostics and surveillance?',
    'What has been published about information sharing and inter-sectoral collaboration?',
    'What has been published about ethical and social science considerations?'
]
topic_area = {}

#0
#What is known about transmission, incubation, and environmental stability?
question_list = []
kw_list = []
pm_kw_list = []
question_list.append("What is known about transmission, incubation, and environmental stability")
kw_list.append("2019-nCoV, COVID-19, coronavirus, person to person,touch,temperature, human to human, humidity, interpersonal contact,, transmission, shedding")



topic_area['What is known about transmission, incubation, and environmental stability?'] = list(zip(question_list,kw_list))



#1
#What do we know about COVID-19 risk factors?
question_list = []
kw_list = []

question_list.append("What risk factors contribute to the severity of 2019-nCoV")
kw_list.append("2019-nCoV, COVID-19, coronavirus, novel coronavirus, susceptible, neonates, pregnant, socio-economic, behavioral, age, elderly, young, old, children")


topic_area['What do we know about COVID-19 risk factors?'] = list(zip(question_list,kw_list))


#2
#What do we know about virus genetics, origin, and evolution?
question_list = []
kw_list = []


question_list.append("What animal did 2019-nCoV come from")
kw_list.append("2019-nCoV, SARS-CoV-2, COVID-19, coronavirus, novel coronavirus, animals, zoonotic, farm, spillover, animal to human, bats, snakes, exotic animals")


topic_area['What do we know about virus genetics, origin, and evolution?'] = list(zip(question_list,kw_list))

#3
#What do we know about vaccines and therapeutics?
question_list = []
kw_list = []
pm_kw_list = []
question_list.append("What drugs or therapies or antiviral are being investigated and recommended")
kw_list.append("2019-nCoV,  COVID-19, coronavirus, novel coronavirus, drug, antiviral, testing, clinical trial, study")


topic_area['What do we know about vaccines and therapeutics?'] = list(zip(question_list,kw_list))


#4
#What do we know about non-pharmaceutical interventions?
question_list = []
kw_list = []
question_list.append("Which non-pharmaceutical interventions limit tramsission")
kw_list.append("2019-nCoV, SARS-CoV-2, COVID-19, non-pharmaceutical interventions, npi")


topic_area['What do we know about non-pharmaceutical interventions?'] = list(zip(question_list,kw_list))

#5
#What has been published about medical care?
question_list = []
kw_list = []


question_list.append("What adjunctive or supportive methods can help patients")
kw_list.append("2019-nCoV, SARS-CoV-2, COVID-19, adjunctive, supportive, extracorporeal membrane oxygenation, ecmo")


topic_area['What has been published about medical care?'] = list(zip(question_list,kw_list))

#6
#What do we know about diagnostics and surveillance?
question_list = []
kw_list = []
question_list.append("What diagnostic tests (tools) exist or are being developed to detect 2019-nCoV")
kw_list.append("2019-nCoV, SARS-CoV-2, COVID-19, coronavirus, novel coronavirus, diagnosis, tools, detetion, testing, throughput")

topic_area['What do we know about diagnostics and surveillance?'] = list(zip(question_list,kw_list))



#7
#What has been published about information sharing and inter-sectoral collaboration?
question_list = []
kw_list = []

question_list.append('What collaborations are happening within the research community')
kw_list.append('inter-sectorial, international, collaboration, global, coronavirus, novel coronavirus, sharing')


topic_area['What has been published about information sharing and inter-sectoral collaboration?'] = list(zip(question_list,kw_list))


#8
#What has been published about ethical and social science considerations?
question_list = []
kw_list = []


question_list.append("What are the major ethical issues related pandemic outbreaks")
kw_list.append("ehtics, pandemic, caregivers, health care workers, social media")


topic_area['What has been published about ethical and social science considerations?'] = list(zip(question_list,kw_list))




In [0]:
i = all_topics[0]
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = True)
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = False)

 50%|█████     | 1/2 [00:02<00:02,  2.31s/it]

28 48 -2.045008


100%|██████████| 2/2 [00:04<00:00,  2.05s/it]

69 95 4.751896





Search with only Abstract


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,Isolation and identification of human coronavirus 229E from frequently touched environmental surfaces of a university classroom that is cleaned daily,Tania Bonny,our findings reinforce the notion that contact transmission may be possible for this virus cov-229e is relatively stable in the environment. our findings reinforce the notion that contact transmission may be possible for this virus.,0.637799
1,First known person-to-person transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in the USA,Isaac Ghinai,contacts were people with exposure to a patient with covid-19 on or after the patient ' s symptom onset date .,0.341198


  0%|          | 0/2 [00:00<?, ?it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (5675 > 512). Running this sequence through the model will result in indexing errors


268 278 3.5963469
7 15 -6.0997705
443 448 -0.6629781
81 88 -1.4098103
2 47 -11.257873
183 197 -0.683667
129 130 -4.745346
173 176 -0.2543293
3 3 -5.820122
117 124 -5.118868
69 222 1.0908566
89 373 -2.0283432


 50%|█████     | 1/2 [01:28<01:28, 88.64s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (1703 > 512). Running this sequence through the model will result in indexing errors


43 55 -0.37333322
69 79 2.3101263
378 389 3.9716048
304 305 -4.9994955


100%|██████████| 2/2 [01:55<00:00, 57.72s/it]

370 391 4.4934115





Search with full paper


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,First known person-to-person transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in the USA,Isaac Ghinai,we declare no competing interests substantial knowledge gaps remain regarding the transmissibility between humans,0.513984
1,Isolation and identification of human coronavirus 229E from frequently touched environmental surfaces of a university classroom that is cleaned daily,Tania Bonny,"cov-229e can remain infectious on environmental surfaces, and potentially poses a biohazard by contact transmission",0.361994


In [0]:
i = all_topics[1]
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = True)
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = False)

JSONDecodeError: ignored

In [0]:
i = all_topics[2]
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = True)
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = False)


  0%|          | 0/6 [00:00<?, ?it/s][A
 17%|█▋        | 1/6 [00:01<00:05,  1.09s/it][A
 33%|███▎      | 2/6 [00:01<00:03,  1.01it/s][A
 50%|█████     | 3/6 [00:02<00:02,  1.29it/s][AToken indices sequence length is longer than the specified maximum sequence length for this model (736 > 512). Running this sequence through the model will result in indexing errors

 67%|██████▋   | 4/6 [00:05<00:03,  1.67s/it][A
 83%|████████▎ | 5/6 [00:07<00:01,  1.72s/it][A
100%|██████████| 6/6 [00:08<00:00,  1.38s/it][A


Search with only Abstract


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,"Outbreak 2019-nCoV (Wuhan virus), a novel Coronavirus: human-to-human transmission, travel-related cases, and vaccine readiness",Robyn Ralph,"with a seemingly comparable chain of events as the origin of sars-cov, the initial infections with 2019-ncov appears to be linked to contact with animals in wet markets .",0.423477
1,Consensus statement The species Severe acute respiratory syndrome- related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2 Coronaviridae Study Group of the International Committee on Taxonomy of Viruses*,Info missed,"based on phylogeny, taxonomy and established practice, the csg recognizes this virus as forming a sister clade to the prototype human and bat severe acute respiratory syndrome corona -",0.380103
2,,Ping Liu,"the outbreak of 2019-ncov pneumonia in the city of wuhan, china has resulted in more than 70,000 laboratory confirmed cases, and recent studies showed that 2019-ncov (sars-cov-2) could be of bat origin but involve other potential intermediate hosts.",0.327245
3,"Emerging novel coronavirus (2019-nCoV)-current scenario, evolutionary perspective based on genome analysis and recent developments",Yashpal Singh Malik,"coronaviruses are the well-known cause of severe respiratory, enteric and systemic infections in a wide range of hosts including man, mammals, fish, and avian .",0.280161
4,Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding,Roujian Lu,"as of jan 26,2020, more than 2000 cases of 2019-ncov infection have been confirmed, most of which involved people living in or visiting wuhan, and human -to-",0.142267



  0%|          | 0/8 [00:00<?, ?it/s][AToken indices sequence length is longer than the specified maximum sequence length for this model (3822 > 512). Running this sequence through the model will result in indexing errors

 12%|█▎        | 1/8 [00:20<02:20, 20.01s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (6509 > 512). Running this sequence through the model will result in indexing errors

 25%|██▌       | 2/8 [00:54<02:25, 24.24s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (6860 > 512). Running this sequence through the model will result in indexing errors

 38%|███▊      | 3/8 [01:30<02:19, 27.83s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (4428 > 512). Running this sequence through the model will result in indexing errors

 50%|█████     | 4/8 [01:53<01:45, 26.40s/it][AToken indices sequence length i

Search with full paper


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,,Ping Liu,/ 2020 malayan pangolins,0.567853
1,RNA based mNGS approach identifies a novel human coronavirus from two individual pneumonia cases in 2019 Wuhan outbreak,Liangjun Chen,bats,0.386718
2,"Outbreak 2019-nCoV (Wuhan virus), a novel Coronavirus: human-to-human transmission, travel-related cases, and vaccine readiness",Robyn Ralph,"1), were obtained from genbank .",0.324047
3,Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding,Roujian Lu,"therefore, on the basis of current data, it seems likely that the 2019-ncov causing the wuhan outbreak might also be initially hosted by bats , and might have been transmitted to humans via currently unknown wild animal (s) sold at the huanan seafood market.",0.307604
4,"Emerging novel coronavirus (2019-nCoV)-current scenario, evolutionary perspective based on genome analysis and recent developments",Yashpal Singh Malik,"bats are considered as the natural reservoir hosts and play a crucial role in transmitting various viruses, including ebola, nipah, coronavirus and others (cui et al.",0.28442
5,Consensus statement The species Severe acute respiratory syndrome- related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2 Coronaviridae Study Group of the International Committee on Taxonomy of Viruses*,Info missed,human coronavirus 3,0.281045
6,The deadly coronaviruses: The 2003 SARS pandemic and the 2020 novel coronavirus epidemic in China,Yongshi Yang,civets,0.207842
7,"COVID-19: Epidemiology, Evolution, and Cross-Disciplinary Perspectives Trends in Molecular Medicine",Jiumeng Sun,xx 11 bats,0.19411


In [0]:
i = all_topics[3]
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = True)
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = False)

NameError: ignored

In [0]:
i = all_topics[4]
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = True)
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = False)

NameError: ignored

In [0]:
i = all_topics[5]
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = True)
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = False)


  0%|          | 0/6 [00:00<?, ?it/s][A
 17%|█▋        | 1/6 [00:01<00:05,  1.12s/it][A
 33%|███▎      | 2/6 [00:01<00:03,  1.06it/s][A
 50%|█████     | 3/6 [00:02<00:02,  1.21it/s][AToken indices sequence length is longer than the specified maximum sequence length for this model (535 > 512). Running this sequence through the model will result in indexing errors

 67%|██████▋   | 4/6 [00:05<00:02,  1.46s/it][A
 83%|████████▎ | 5/6 [00:06<00:01,  1.42s/it][A
100%|██████████| 6/6 [00:07<00:00,  1.21s/it][A


Search with only Abstract


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,Emergent severe acute respiratory distress syndrome caused by adenovirus type 55 in immunocompetent adults in 2013: a prospective observational study,Bing Sun,the clinical features and outcomes of the most critically ill patients with severe acute respiratory distress syndrome (ards) caused by hadv-55 requiring invasive mechanical ventilation (imv) and / or extracorporeal membrane oxygenation (ecmo) are lacking.,0.521152
1,Intravenous vitamin C as adjunctive therapy for enterovirus/rhinovirus induced acute respiratory distress syndrome,Alpha Fowler,"this report outlines the first use of high dose intravenous vitamin c as an interventional therapy for ards, resulting from enterovirus / rhinovirus respiratory infection.",0.458741
2,Extracorporeal membrane oxygenation for severe Middle East respiratory syndrome coronavirus,Mohammed Alshahrani,the objective of this study is to compare the outcomes of mers-cov patients before and after the availability of extracorporeal membrane oxygenation (ecmo) as a rescue therapy in severely hypoxemic patients who failed conventional strategies.,0.450459
3,Mobile ECMO team for inter-hospital transportation of patients with ARDS: a retrospective case series,Alberto Lucchini,"29 patients (69 %) were transported with extracorporeal membrane oxygenation support , while 13 patients (31 %) were transported with conventional ventilation.",0.324622
4,Application of extracorporeal membrane oxygenation in patients with severe acute respiratory distress syndrome induced by avian influenza A (H7N9) viral pneumonia: national data from the Chinese multicentre collaboration,Linna Huang,05) after 48 h on ecmo support .,0.315255



  0%|          | 0/7 [00:00<?, ?it/s][AToken indices sequence length is longer than the specified maximum sequence length for this model (5968 > 512). Running this sequence through the model will result in indexing errors

 14%|█▍        | 1/7 [00:31<03:07, 31.27s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (3546 > 512). Running this sequence through the model will result in indexing errors

 29%|██▊       | 2/7 [00:49<02:17, 27.43s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (3129 > 512). Running this sequence through the model will result in indexing errors

 43%|████▎     | 3/7 [01:05<01:36, 24.06s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (3535 > 512). Running this sequence through the model will result in indexing errors

 57%|█████▋    | 4/7 [01:24<01:07, 22.34s/it][AToken indices sequence length i

Search with full paper


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,Extracorporeal membrane oxygenation for severe Middle East respiratory syndrome coronavirus,Mohammed Alshahrani,adjunctive therapies,0.644751
1,Extracorporeal membrane oxygenation with prone position ventilation successfully rescues infantile pertussis: a case report and literature review,Jingyi Shi,lung protective strategies and a restrictive fluid strategy,0.449246
2,Emergent severe acute respiratory distress syndrome caused by adenovirus type 55 in immunocompetent adults in 2013: a prospective observational study,Bing Sun,invasive mechanical ventilation and / or extracorporeal membrane oxygenation (ecmo),0.447816
3,Mobile ECMO team for inter-hospital transportation of patients with ARDS: a retrospective case series,Alberto Lucchini,extracorporeal respiratory support,0.430143
4,Planning and provision of ECMO services for severe ARDS during the COVID-19 pandemic and other outbreaks of emerging infectious diseases Health-care Development,( Hospital,extracorporeal membrane oxygenation,0.364851
5,Application of extracorporeal membrane oxygenation in patients with severe acute respiratory distress syndrome induced by avian influenza A (H7N9) viral pneumonia: national data from the Chinese multicentre collaboration,Linna Huang,05) after ecmo support .,0.276846


In [0]:
i = all_topics[6]
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = True)
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = False)


  0%|          | 0/6 [00:00<?, ?it/s][A
 17%|█▋        | 1/6 [00:01<00:08,  1.75s/it][A
 33%|███▎      | 2/6 [00:02<00:06,  1.58s/it][A
 50%|█████     | 3/6 [00:05<00:05,  1.74s/it][A
 67%|██████▋   | 4/6 [00:06<00:03,  1.62s/it][A
 83%|████████▎ | 5/6 [00:08<00:01,  1.64s/it][A
100%|██████████| 6/6 [00:09<00:00,  1.56s/it][A


Search with only Abstract


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,"Potential Rapid Diagnostics, Vaccine and Therapeutics for 2019 Novel Coronavirus (2019-nCoV): A Systematic Review",Junxiong Pang,"however, serological assays as well as point-of-care testing kits have not been developed but are likely in the near future.",0.566219
1,Potential T-cell and B-cell Epitopes of 2019-nCoV,Ethan Fast,here we use computational tools from structural biology and machine learning to identify 2019-ncov t-cell and b-cell epitopes based on viral protein antigen presentation and antibody binding properties.,0.539724
2,Title: Genome Detective Coronavirus Typing Tool for rapid identification and characterization of novel coronavirus genomes Short title: Automated tool for phylogenetic and mutational analysis of coronaviruses genomes,Sara Cleemput,"the tool also allows tracking of new viral mutations as the outbreak expands globally, which may help to accelerate the development of novel diagnostics, drugs and vaccines .",0.518947
3,Genome Detective Coronavirus Typing Tool for rapid identification and characterization of novel coronavirus genomes,Sara Cleemput,"the tool also allows tracking of new viral mutations as the outbreak expands globally, which may help to accelerate the development of novel diagnostics, drugs and vaccines to stop the covid-19 disease.",0.507591
4,Rapid colorimetric detection of COVID-19 coronavirus using a reverse tran- scriptional loop-mediated isothermal amplification (RT-LAMP) diagnostic plat- form: iLACO,Lin Yu,"the accuracy, simplicity and versatility of the new developed method suggests that ilaco assays can be conveniently applied with for 2019-ncov threat control, even in those cases where specialized molecular biology equipment is not available .",0.48179



  0%|          | 0/6 [00:00<?, ?it/s][AToken indices sequence length is longer than the specified maximum sequence length for this model (3393 > 512). Running this sequence through the model will result in indexing errors

 17%|█▋        | 1/6 [00:18<01:31, 18.20s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (2579 > 512). Running this sequence through the model will result in indexing errors

 33%|███▎      | 2/6 [00:31<01:07, 16.85s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (1986 > 512). Running this sequence through the model will result in indexing errors

 50%|█████     | 3/6 [00:42<00:44, 14.99s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (2062 > 512). Running this sequence through the model will result in indexing errors

 67%|██████▋   | 4/6 [00:53<00:27, 13.81s/it][AToken indices sequence length i

Search with full paper


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,"Potential Rapid Diagnostics, Vaccine and Therapeutics for 2019 Novel Coronavirus (2019-nCoV): A Systematic Review",Junxiong Pang,"com / xxx / s1, table s1 : example of full search strategy in pubmed, table s2 : google search : 2019-ncov diagnostics, table s3 : summary of diagnostic assays developed for 2019-ncov, table s4 rapid diagnostics, vaccines and therapeutics",0.494263


In [0]:
i = all_topics[7]
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = True)
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = False)


  0%|          | 0/7 [00:00<?, ?it/s][A
 14%|█▍        | 1/7 [00:02<00:13,  2.18s/it][A
 29%|██▊       | 2/7 [00:04<00:10,  2.08s/it][A
 43%|████▎     | 3/7 [00:05<00:07,  1.80s/it][A
 57%|█████▋    | 4/7 [00:07<00:05,  1.94s/it][A
 71%|███████▏  | 5/7 [00:10<00:04,  2.18s/it][A
 86%|████████▌ | 6/7 [00:11<00:01,  1.82s/it][A
100%|██████████| 7/7 [00:12<00:00,  1.76s/it][A


Search with only Abstract


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,"C-ME: A 3D Community-Based, Real-Time Collaboration Tool for Scientific Research and Training",A Kolatkar,the need for effective collaboration tools is growing as multidisciplinary proteome-wide projects and distributed research teams become more common .,0.681837
1,R E V I E W partnership: experiences of co-learning and supporting the healthcare system in Uganda,Open Access,"training and research are a key focus of the partnership and have involved both staff and students of both institutions including guest lectures, seminars and conference presentations .",0.592211
2,Emerging respiratory tract infections 2 Emerging infectious diseases and pandemic potential: status quo and reducing risk of global spread,Brian Mccloskey,collaboration between countries should be encouraged in a way that acknowledges the benefi ts that derive from sharing biological material and establishing equitable collaborative research partnerships .,0.5885
3,Fogarty International Center collaborative networks in infectious disease modeling: Lessons learnt in research and capacity building,Martha Nelson,"due to a combination of ecological, political, and demographic factors, the emergence of novel pathogens has been increasingly observed in animals and humans in recent decades. enhancing global capacity to study and interpret infectious disease surveillance data, and to develop data-driven computational models to guide policy, represents one of the most cost-effective, and yet overlooked, ways to prepare for the next pandemic. epidemiological and behavioral data from recent pandemics and historic scourges have provided rich opportunities for validation of computational models, while new sequencing technologies and the ' big data ' revolution present new tools for studying the epidemiology of outbreaks in real time. for the past two decades, the division of international epidemiology and population studies (dieps) of the nih fogarty international center has spearheaded two synergistic programs to better understand and devise control strategies for global infectious disease threats. the multinational influenza seasonal mortality study (misms) has strengthened global capacity to study the epidemiology and evolutionary dynamics of influenza viruses in 80 countries by organizing international research activities and training workshops. the research and policy in infectious disease dynamics (rapidd) program and its precursor activities has established a network of global experts in infectious disease modeling operating at the research-policy interface, with collaborators in 78 countries. these activities have provided evidence-based recommendations for disease control, including during large-scale outbreaks of pandemic influenza, ebola and zika virus. together, these programs have coordinated international collaborative networks to advance the study of emerging disease threats and the field of computational epidemic modeling.",0.533397
4,A global bibliometric analysis of Plesiomonas- related research (1990 -2017),Temitope Ekundayoid,"here, we carried out a bibliometric survey that aimed to examine publication trends in plesiomonas-related research by time and place, international collaborative works , identify gaps and suggest directions for future research.",0.502882



  0%|          | 0/9 [00:00<?, ?it/s][AToken indices sequence length is longer than the specified maximum sequence length for this model (9063 > 512). Running this sequence through the model will result in indexing errors

 11%|█         | 1/9 [00:47<06:21, 47.63s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (3995 > 512). Running this sequence through the model will result in indexing errors

 22%|██▏       | 2/9 [01:08<04:37, 39.60s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (5146 > 512). Running this sequence through the model will result in indexing errors

 33%|███▎      | 3/9 [01:35<03:34, 35.78s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (5780 > 512). Running this sequence through the model will result in indexing errors

 44%|████▍     | 4/9 [02:05<02:50, 34.18s/it][AToken indices sequence length i

Search with full paper


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,Fogarty International Center collaborative networks in infectious disease modeling: Lessons learnt in research and capacity building,Martha Nelson,establishing strong international collaborative research networks,0.668296
1,"C-ME: A 3D Community-Based, Real-Time Collaboration Tool for Scientific Research and Training",A Kolatkar,82 mb swf) research teams are increasingly interdisciplinary and collaborative among laboratories in different departments and institutions located around the world,0.593848


In [0]:
i = all_topics[8]
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = True)
Display_all(topic_area[i][0][0], topic_area[i][0][1], abst = False)


  0%|          | 0/10 [00:00<?, ?it/s][A
 10%|█         | 1/10 [00:00<00:08,  1.03it/s][A
 20%|██        | 2/10 [00:02<00:09,  1.16s/it][A
 30%|███       | 3/10 [00:04<00:09,  1.37s/it][A
 40%|████      | 4/10 [00:05<00:07,  1.30s/it][A
 50%|█████     | 5/10 [00:06<00:05,  1.12s/it][A
 60%|██████    | 6/10 [00:06<00:03,  1.00it/s][A
 70%|███████   | 7/10 [00:07<00:02,  1.17it/s][A
 80%|████████  | 8/10 [00:09<00:02,  1.11s/it][A
 90%|█████████ | 9/10 [00:10<00:01,  1.20s/it][A
100%|██████████| 10/10 [00:12<00:00,  1.25s/it][A


Search with only Abstract


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,"Ethics for pandemics beyond influenza: Ebola, drug- resistant tuberculosis, and anticipating future ethical challenges in pandemic preparedness and response",Maxwell Smith,"the unprecedented outbreak of ebola virus disease (evd) in west africa has raised several novel ethical issues for global outbreak preparedness. it has also illustrated that familiar ethical issues in infectious disease management endure despite considerable efforts to understand and mitigate such issues in the wake of past outbreaks. to improve future global outbreak preparedness and response, we must examine these shortcomings and reflect upon the current state of ethical preparedness. to this end, we focus our efforts in this article on the examination of one substantial area : ethical guidance in pandemic plans. we argue that, due in part to their focus on considerations arising specifically in relation to pandemics of influenza origin, pandemic plans and their existing ethical guidance are ill-equipped to anticipate and facilitate the navigation of unique ethical challenges that may arise in other infectious disease pandemics. we proceed by outlining three reasons why this is so, and situate our analysis in the context of the evd outbreak and the threat posed by drug-resistant tuberculosis : (1) different infectious diseases have distinct characteristics that challenge anticipated or existing modes of pandemic prevention, preparedness, response, and recovery , (2) clear, transparent, context-specific ethical reasoning and justification within current influenza pandemic plans are lacking, and (3) current plans neglect the context of how other significant pandemics may manifest.",0.71084
1,Risk Management and Healthcare Policy Dovepress Critical role of ethics in clinical management and public health response to the West Africa Ebola epidemic,Morenike Folayan,ethical issues related to prevention and containment include the appropriateness and scope of quarantine and isolation within and outside affected countries .,0.68957
2,Special Issue Pandethics,M Selgelid,"this paper explains the ethical importance of infectious diseases, and reviews four major ethical issues associated with pandemic influenza : the obligation of individuals to avoid infecting others, healthcare workers ' ' duty to treat ', allocation of scarce resources, and coercive social distancing measures .",0.659091
3,The prospect of pandemic influenza: Why should the optometrist be concerned about a public health problem?,Gregory Hom,the ethical and legal issues surrounding control of a pandemic influenza and the prospect of telemedicine as a form of social distancing are also discussed.,0.639385
4,BMC Medical Ethics On pandemics and the duty to care: whose duty? who cares?,Carly Ruderman,"despite this challenge, professional codes of ethics are silent on the issue of duty to care during communicable disease outbreaks , thus providing no guidance on what is expected of hcps or how they ought to approach their duty to care in the face of risk.",0.615739
5,653-662 Lor et al,Aun Lor,"methods : we reviewed the meeting reports, notes and stories and mapped outcomes to the key ethical challenges for pandemic influenza response described in the world health organization ' s (who ' s) guidance, ethical considerations in developing a public health response to pandemic influenza : transparency and public engagement, allocation of resources, social distancing, obligations to and of healthcare workers, and international collaboration .",0.605742
6,The duty to care in an influenza pandemic: A qualitative study of Canadian public perspectives,Cécile Bensimon,"this study involved three townhall meetings held between february 2008 and may 2010 in three urban settings in canada in order to probe lay citizens ' views about ethical issues related to pandemic influenza, including issues surrounding the duty to care .",0.492084
7,Fight or Flight: The Ethics of Emergency Physician Disaster Response,Kenneth Iserson,"however, we need to ask : should they, and will they, work rather than flee ?",0.278741



  0%|          | 0/10 [00:00<?, ?it/s][AToken indices sequence length is longer than the specified maximum sequence length for this model (4292 > 512). Running this sequence through the model will result in indexing errors

 10%|█         | 1/10 [00:22<03:21, 22.44s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (6534 > 512). Running this sequence through the model will result in indexing errors

 20%|██        | 2/10 [00:56<03:27, 25.96s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (7944 > 512). Running this sequence through the model will result in indexing errors

 30%|███       | 3/10 [01:39<03:37, 31.01s/it][AToken indices sequence length is longer than the specified maximum sequence length for this model (11056 > 512). Running this sequence through the model will result in indexing errors

 40%|████      | 4/10 [02:38<03:57, 39.53s/it][AToken indices sequence le

Search with full paper


Unnamed: 0,Title,Authors,BERT-SQuAD Answer with Highlights,Confidence
0,The prospect of pandemic influenza: Why should the optometrist be concerned about a public health problem?,Gregory Hom,extreme measures that may be required to quickly control a deadly virus,0.532387
1,"Ethics for pandemics beyond influenza: Ebola, drug- resistant tuberculosis, and anticipating future ethical challenges in pandemic preparedness and response",Maxwell Smith,testing investigational agents in vaccine trials,0.507242
2,Special Issue Pandethics,M Selgelid,"the obligation of individuals to avoid infecting others, healthcare workers ' ' duty to treat ', allocation of scarce resources, and the use of coercive social distancing measures",0.487058
3,Ethics-sensitivity of the Ghana national integrated strategic response plan for pandemic influenza,Amos Laar,recurring tension in public health between the rights of individual liberties versus public health promotion,0.431597
4,653-662 Lor et al,Aun Lor,"low literacy level, poverty, and trust of and / or deference to health authorities",0.405956
5,Risk Management and Healthcare Policy Dovepress Critical role of ethics in clinical management and public health response to the West Africa Ebola epidemic,Morenike Folayan,the authors report no conflicts of interest in this work informed consent,0.278898
6,BMC Medical Ethics On pandemics and the duty to care: whose duty? who cares?,Carly Ruderman,physicians ' duty to care,0.276583
7,Fight or Flight: The Ethics of Emergency Physician Disaster Response,Kenneth Iserson,professional ethical statements about expected conduct establish important professional expectations and norms,0.275383
8,The duty to care in an influenza pandemic: A qualitative study of Canadian public perspectives,Cécile Bensimon,issues surrounding the duty to care,0.208956
9,Pandémie grippale A/H5N1 et niveau de préparation du Niger : une étude sur les connaissances des soignants et l'organisation générale des soins Preparedness for influenza A/H5N1 pandemic in Niger: a study on health care workers' knowledge and global organization of health activities,E D',"differences economiques et sanitaires sont essentielles, puisque plus encore qu ' auparavant, a l ' heure de la mondialisation, les pathogenes se jouent des frontieres",0.139903
