### Imports

In [1]:
!pip install transformers
!pip install datasets evaluate transformers[sentencepiece]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
! unzip "/content/drive/MyDrive/data"

Archive:  /content/drive/MyDrive/data.zip
replace data/consumer_questions_prelim.json? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace data/epic_qa_consumer_2020-06-10_v2.tar.gz? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace data/epic_qa_cord_2020-06-19_v3.tar.gz? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace data/expert_questions_prelim.json? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace data/prelim_judgments_corrected.json.gz? [y]es, [n]o, [A]ll, [N]one, [r]ename: n


In [4]:
! tar xvzf "/content/data/epic_qa_cord_2020-06-19_v3.tar.gz"

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
2020-06-19-clean/oi9j5o0n.json
2020-06-19-clean/fs2gwmg1.json
2020-06-19-clean/qzbm7k9h.json
2020-06-19-clean/vtgazwb0.json
2020-06-19-clean/vhxuvrk7.json
2020-06-19-clean/4iff3pix.json
2020-06-19-clean/3v9cp89p.json
2020-06-19-clean/gffflvs5.json
2020-06-19-clean/upgbpacz.json
2020-06-19-clean/wr8biaor.json
2020-06-19-clean/tfb1ve2p.json
2020-06-19-clean/efb10j3u.json
2020-06-19-clean/6osemfvk.json
2020-06-19-clean/6agdpff3.json
2020-06-19-clean/2zh87tmo.json
2020-06-19-clean/z325srgf.json
2020-06-19-clean/ua5564mx.json
2020-06-19-clean/fjflzjxc.json
2020-06-19-clean/b0tcrxbo.json
2020-06-19-clean/nmyyt51x.json
2020-06-19-clean/frrhqtw8.json
2020-06-19-clean/dig9h0i0.json
2020-06-19-clean/7ndk8muj.json
2020-06-19-clean/i0hizpho.json
2020-06-19-clean/e7r565y3.json
2020-06-19-clean/bjmhg6z4.json
2020-06-19-clean/g3oucxul.json
2020-06-19-clean/35jn14az.json
2020-06-19-clean/b9dbpn5a.json
2020-06-19-clean/63wqn2fg.json
2020-

In [5]:
import json
import os
import csv
import pandas as pd
import numpy as np

### Loading data

In [6]:
files = os.listdir('/content/2020-06-19-clean')
len(files)

129069

*Making a dataframe of relevant information*

In [7]:
doc_id = []
titles = []
contexts = []

for i in range(len(files)):
  file_path = os.path.join('/content/2020-06-19-clean', files[i])
  if file_path[-5:] == ".json":
    f = open(file_path, 'r')
    data = json.load(f)

    # Extract the relevant information
    context_text = ''
    document_id = data['document_id']
    title = data['metadata']['title']
    for dicts in data['contexts']:
      line = dicts['text']
      # basic cleaning (space removal)
      context_text += line.strip() + ' '
    context_text = context_text.strip()
    
    # making lists for dataframe
    doc_id.append(document_id)
    titles.append(title)
    contexts.append(context_text)

In [8]:
contexts[0]

'BACKGROUND: Percutaneous tracheostomy (PT) in patients with coronavirus disease (COVID‐19) included several critical steps associated with increased risk of aerosol generation. We reported a modified PT technique aiming to minimize the risk of aerosol generation and to increase the staff safety in COVID‐19 patients. METHODS: PT was performed with a modified technique including the use of a smaller endotracheal tube (ETT) cuffed at the carina during the procedure. RESULTS: The modified technique we proposed was successfully performed in three critically ill patients with COVID‐19. CONCLUSIONS: In COVID‐19 critically ill patients, a modified PT technique, including the use of a smaller ETT cuffed at the carina and fiber‐optic bronchoscope inserted between the tube and the inner surface of the trachea, may ensure a better airway management, respiratory function, patient comfort, and great safety for the staff. As the novel coronavirus (2019-nCov) globally spreads, the coronavirus disease

*Saving to csv for ease of further use*

In [9]:
context_df = pd.DataFrame(list(zip(doc_id, titles, contexts)), columns = ['doc_id', 'title', 'context_text'])
context_df.to_csv('expert_context.csv')
# the csv has been added to the submission folder

In [11]:
!cp expert_context.csv /content/drive/MyDrive

In [12]:
context_df.head()

Unnamed: 0,doc_id,title,context_text
0,dkyamm1i,Modified percutaneous tracheostomy in COVID‐19...,BACKGROUND: Percutaneous tracheostomy (PT) in ...
1,ebnljv7z,Evolving guidelines in the use of topical nons...,BACKGROUND Nonsteroidal anti-inflammatory drug...
2,gsh3hr16,Long-Term Clinical Outcomes Following Radiofre...,PURPOSE Microwave ablation (MWA) is a relative...
3,oi9vf7ss,Role of Fibrinolysis in the Nasal System,"In this chapter, we show the presence of tissu..."
4,59ghorzf,Accurate Prediction of COVID-19 using Chest X-...,According to the World Health Organization (WH...


### Retrieving the top 10 documents

In [13]:
!pip install rank_bm25
from rank_bm25 import BM25Okapi

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


* Okapi model used to find similarity scores between the query and the document titles
* Based on these scores, the top 10 documents titles are identified
* The corresponding document ids are retrieved from the dataframe made

In [14]:
# retrieving top 10 documents
def topk(query, documents):
    tokenized_docs = [doc.split(" ") for doc in documents]
    for i in range(len(tokenized_docs)):
      tokenized_docs[i] = [x.lower() for x in tokenized_docs[i]]
      
    # Create BM25 model
    bm25 = BM25Okapi(tokenized_docs)

    # Extract query terms
    tokenized_query = query.split(" ")

    # Calculate document scores
    doc_scores = bm25.get_scores(tokenized_query)
    # Rank documents
    ranked_docs = sorted(range(len(doc_scores)), key=lambda i: doc_scores[i], reverse=True)
    top_docs = [documents[i] for i in ranked_docs[:10]]
    ids = []
    for doc in top_docs:
      id = context_df.loc[context_df['title'] == doc, 'doc_id']
      ids.append(list(id)[0])
    return ids, top_docs

In [48]:
retrieved = topk('coronavirus superspreaders', list(context_df['title']))
retrieved

(['pzquq8mq',
  'bph4nuch',
  'mr4v0hkn',
  'm5ptobvz',
  '0gt21051',
  '70kfu3qu',
  'h7vqmlq9',
  '5906wju4',
  'oocco483',
  '64u656ji'],
 ['Are SARS Superspreaders Cloud Adults?',
  'Effects of superspreaders in spread of epidemic',
  'Fundamental difference between superblockers and superspreaders in networks',
  'Impact of Superspreaders on dissemination and mitigation of COVID-19',
  'Autonomous Targeting of Infectious Superspreaders Using Engineered Transmissible Therapies',
  'From superspreaders to disease hotspots: linking transmission across hosts and space',
  'Early detection of superspreaders by mass group pool testing can mitigate COVID-19 pandemic',
  'Do superspreaders generate new superspreaders? a hypothesis to explain the propagation pattern of COVID-19',
  'Do superspreaders generate new superspreaders? A hypothesis to explain the propagation pattern of COVID-19',
  'Quantify the role of superspreaders -opinion leaders- on COVID-19 information propagation in the C

In [17]:
results = topk('coronavirus immunity', list(context_df['title']))
texts = []
for title in results[0]:
     col = context_df['context_text'].loc[context_df['doc_id']  == title]
     text = list(col)[0]
     texts.append(text)

In [19]:
texts

["The emergence of the highly pathogenic SARS coronavirus (SARS-CoV) has reignited interest in coronavirus biology and pathogenesis. An emerging theme in coronavirus pathogenesis is that the interaction between specific viral genes and the host immune system, specifically the innate immune system, functions as a key determinant in regulating virulence and disease outcomes. Using SARS-CoV as a model, we will review the current knowledge of the interplay between coronavirus infection and the host innate immune system in vivo, and then discuss the mechanisms by which specific gene products antagonize the host innate immune response in cell culture models. Our data suggests that the SARS-CoV uses specific strategies to evade and antagonize the sensing and signaling arms of the interferon pathway. We summarize by identifying future points of consideration that will contribute greatly to our understanding of the molecular mechanisms governing coronavirus pathogenesis and virulence, and the d

In [20]:
# function to split sentences effectively
import re
def split_req(document):
  text = re.sub(r'\([^)]*\)', '', document)
  # Split the text into sentences using regular expressions
  sentences = re.split(r'(?<=[^A-Z].[.?]) +(?=[A-Z])', text)
  sentences = [x.strip() for x in sentences]
  return sentences

### Retrieving the top sentences from each document

* Using the Okapi model to score individual sentences from the top 10 documents retrieved earlier.
* Taking the top scored sentences
* Returning a list of these top scored sentences 

In [21]:
def top_sents(query, documents):
    #tokenized_docs = [doc.split(" ") for doc in documents]
    tokenized_docs = []
    for doc in documents:
      split_made = split_req(doc)
      tokenized_docs.append(split_made)
      
    for i in range(len(tokenized_docs)):
      tokenized_docs[i] = [x.lower() for x in tokenized_docs[i]]
    #tokenized_docs = [x.lower() for x in tokenized_docs]
        
    # Create BM25 model
    bm25 = BM25Okapi(tokenized_docs)
    # Extract query terms
    tokenized_query = query.split(" ")

    # Calculate document scores
    doc_scores = bm25.get_scores(tokenized_query)

    # Rank documents
    ranked_docs = sorted(range(len(doc_scores)), key=lambda i: doc_scores[i], reverse=True)
    if len(ranked_docs) > 50:
      top_docs = [documents[i] for i in ranked_docs[:50]]
    else:
      top_docs = [documents[i] for i in ranked_docs]
    return top_docs

In [22]:
results = topk('coronavirus immunity', list(context_df['title']))
texts = []
for title in results[0]:
 # referencing doc text based on the doc id
      col = context_df['context_text'].loc[context_df['doc_id']  == title]
      text = list(col)[0]
      split_made = split_req(text)
      texts += split_made
      texts.append(para)
sentences = top_sents('coronavirus immunity', texts)

In [23]:
sentences

['The emergence of the highly pathogenic SARS coronavirus (SARS-CoV) has reignited interest in coronavirus biology and pathogenesis',
 ' An emerging theme in coronavirus pathogenesis is that the interaction between specific viral genes and the host immune system, specifically the innate immune system, functions as a key determinant in regulating virulence and disease outcomes',
 ' Using SARS-CoV as a model, we will review the current knowledge of the interplay between coronavirus infection and the host innate immune system in vivo, and then discuss the mechanisms by which specific gene products antagonize the host innate immune response in cell culture models',
 ' Our data suggests that the SARS-CoV uses specific strategies to evade and antagonize the sensing and signaling arms of the interferon pathway',
 ' We summarize by identifying future points of consideration that will contribute greatly to our understanding of the molecular mechanisms governing coronavirus pathogenesis and viru

In [24]:
# takes given query and the made dataframe as input
def model_docs_init(query, data):
  results = topk(query, list(data['title']))
  texts = []
  for title in results[0]:
      col = context_df['context_text'].loc[context_df['doc_id']  == title]
      text = list(col)[0]
      split_made = split_req(text)
      texts += split_made
  sentences = top_sents(query, texts)
  cont = ""
  para = ""
  for sent in sentences:
    cont+=sent
  if len(cont.split(" ")) > 300:
    cont_new = cont.split()[:300]
    for word in cont_new:
      para += word + " "
  else:
    para = cont 
  return para

### Re-ranking the sentences

* BERT model used for BERT Re-ranking
* returns a list of tuples containing the ranks according to BERT and corresponding initial indices
* taking the top 30 sentences according to this
* concatenating these to make context that can be given to QA BERT model

In [25]:
import torch

In [26]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [27]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# TODO consider other reranking models
rerank_model_name = 'nboost/pt-bert-large-msmarco'
tokenizer = AutoTokenizer.from_pretrained(rerank_model_name)

model1 = AutoModelForSequenceClassification.from_pretrained(rerank_model_name)
model1.to(device)
model1.eval()

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 1024, padding_idx=0)
      (position_embeddings): Embedding(512, 1024)
      (token_type_embeddings): Embedding(2, 1024)
      (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-23): 24 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=1024, out_features=1024, bias=True)
              (key): Linear(in_features=1024, out_features=1024, bias=True)
              (value): Linear(in_features=1024, out_features=1024, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=1024, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,

In [28]:
def rerank(query, passages):
  batch_size = 8
  pairs = [(query, passage) for passage in passages]
  all_scores = []
  for b_idx in range(int(np.ceil(len(pairs)/batch_size))):
    batch = tokenizer.batch_encode_plus(
      # query, passage
      batch_text_or_text_pairs=pairs[b_idx * batch_size:(b_idx+1)*batch_size], 
      add_special_tokens=True,
      padding=True,
      return_tensors='pt'
    )

    scores = model1(
      input_ids=batch['input_ids'].to(device),
      token_type_ids=batch['token_type_ids'].to(device),
      attention_mask=batch['attention_mask'].to(device)
    )[0][:, 0].data.cpu().numpy()
    all_scores.extend(scores)
  
  passages_sorted = list(sorted(zip(all_scores, range(len(passages))), key=lambda x: x[0]))
  return passages_sorted


### Consolidated function

* takes query and dataframe made initially as input
* outputs the concatenated sentences

In [33]:
# takes given query and the made dataframe as input
def model_docs(query, data):
  results = topk(query, list(data['title']))
  texts = []
  for title in results[0]:
      col = context_df['context_text'].loc[context_df['doc_id']  == title]
      text = list(col)[0]
      split_made = split_req(text)
      texts += split_made
  sentences = top_sents(query, texts)
  reranked_sentences = rerank(query, sentences)
  concat = ''
  for i in range(1, 3):
    score, ind = reranked_sentences[0-i]
    concat += sentences[ind] + " "
  return sentences, concat

In [34]:
results = topk('origin of covid', list(context_df['title']))
texts = []
for title in results[0]:
    col = context_df['context_text'].loc[context_df['doc_id']  == title]
    text = list(col)[0]
    split_made = split_req(text)
    texts += split_made
sentences = top_sents('origin of covid', texts)
reranked_sentences = rerank('origin of covid', sentences)
concat = ''
for i in range(1, 5):
  score, ind = reranked_sentences[0-i]
  concat += sentences[ind] + " "

In [35]:
reranked_sentences

[(5.7006392, 13),
 (5.7033596, 21),
 (5.7487445, 20),
 (5.7595596, 41),
 (5.787584, 22),
 (5.7909713, 1),
 (5.8022265, 31),
 (5.8062234, 29),
 (5.817418, 23),
 (5.818586, 26),
 (5.8188825, 27),
 (5.8306746, 25),
 (5.8318458, 45),
 (5.8339353, 30),
 (5.8358655, 32),
 (5.836523, 2),
 (5.8406315, 46),
 (5.8429923, 19),
 (5.8439364, 15),
 (5.8456907, 37),
 (5.8466167, 9),
 (5.846663, 43),
 (5.847121, 38),
 (5.847347, 28),
 (5.847488, 35),
 (5.847573, 11),
 (5.8500247, 6),
 (5.850388, 24),
 (5.8510203, 18),
 (5.8529296, 16),
 (5.854554, 49),
 (5.855842, 17),
 (5.8560944, 36),
 (5.8562484, 8),
 (5.857493, 10),
 (5.8579955, 48),
 (5.8580775, 42),
 (5.858089, 5),
 (5.8583684, 0),
 (5.858782, 39),
 (5.858789, 44),
 (5.859657, 12),
 (5.86001, 34),
 (5.8605747, 47),
 (5.861891, 33),
 (5.863766, 3),
 (5.8703322, 7),
 (5.8710017, 14),
 (5.8739433, 40),
 (5.8873477, 4)]

In [36]:
sentences

['Bats have been recognized as the natural reservoirs of a large variety of viruses',
 ' Special attention has been paid to bat coronaviruses as the two emerging coronaviruses which have caused unexpected human disease outbreaks in the 21st century, Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV), are suggested to be originated from bats',
 ' Various species of horseshoe bats in China have been found to harbor genetically diverse SARS-like coronaviruses',
 ' Some strains are highly similar to SARS-CoV even in the spike protein and are able to use the same receptor as SARS-CoV for cell entry',
 ' On the other hand, diverse coronaviruses phylogenetically related to MERS-CoV have been discovered worldwide in a wide range of bat species, some of which can be classified to the same coronavirus species as MERS-CoV',
 ' Coronaviruses genetically related to human coronavirus 229E and NL63 have been detected in bats as well',


### Extracting answers

* taking a pre-trained model
* fitting it onto the given data

In [37]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import json
import torch
# Load the JSON file

# Load the pre-trained model and tokenizer
model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Define a function to extract answers from the documents
def extract_answer(contexts, question):
    # Tokenize the question and contexts
    inputs = tokenizer(question, contexts, add_special_tokens=True, return_tensors='pt')

    # Get the start and end positions of the answer
    start_scores, end_scores = model(**inputs).values()
    start_index = torch.argmax(start_scores)
    end_index = torch.argmax(end_scores) + 1

    # Decode the answer from the tokenized representation
    answer_tokens = inputs['input_ids'][0][start_index:end_index]
    answer = tokenizer.decode(answer_tokens)
    return answer


### Run on some sample queries

* cnt gives answer for Task A (paragraph/ detailed output)
* answer gives answer for Task B (phrase / accurate output)

In [38]:
cnt = model_docs('covid origin', context_df)
question = 'what is the origin of covid' 
contexts = concat
answer = extract_answer(contexts, question)
print(answer)

animals to humans


In [39]:
concat

' On the other hand, diverse coronaviruses phylogenetically related to MERS-CoV have been discovered worldwide in a wide range of bat species, some of which can be classified to the same coronavirus species as MERS-CoV  Based on the classification criteria of the the International Committee on Taxonomy of Viruses (ICTV), SARS-CoV and MERS-CoV represent two novel distinct coronavirus species in the genus Betacoronavirus ( Fig  We present an overview of current evidence for bat origin of these two viruses and also discuss how the spillover events of coronavirus from animals to humans may have happened  Understanding the bat origin of human coronaviruses is helpful for the prediction and prevention of another pandemic emergence in the future '

In [None]:
cnt = model_docs('covid origin', context_df)
question = 'what is the origin of covid' 
contexts = concat
answer = extract_answer(contexts, question)
print(answer)

bat species


In [44]:
cnt = model_docs('covid origin', context_df)
context = cnt[0]
cnt_ = cnt[1]
question = 'what is the origin of covid'
answer = extract_answer(cnt_, question)
print(answer)




In [None]:
cnt = model_docs('origin of covid', context_df)

In [None]:
cnt = model_docs('origin of covid', context_df)
question = 'what is the origin' 
contexts = cnt
answer = extract_answer(contexts, question)
print(answer)

bats have been recognized as the natural reservoirs of a large variety of viruses


In [None]:
cnt

'Bats have been recognized as the natural reservoirs of a large variety of viruses. Special attention has been paid to bat coronaviruses as the two emerging coronaviruses which have caused unexpected human disease outbreaks in the 21st century, Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV), are suggested to be originated from bats. Various species of horseshoe bats in China have been found to harbor genetically diverse SARS-like coronaviruses. Some strains are highly similar to SARS-CoV even in the spike protein and are able to use the same receptor as SARS-CoV for cell entry. On the other hand, diverse coronaviruses phylogenetically related to MERS-CoV have been discovered worldwide in a wide range of bat species, some of which can be classified to the same coronavirus species as MERS-CoV. Coronaviruses genetically related to human coronavirus 229E and NL63 have been detected in bats as well. Moreover, intermediate

In [None]:
cnt = model_docs('covid origin', context_df)
question = 'what is the origin of covid' 
contexts = 'On the other hand, diverse coronaviruses phylogenetically related to MERS-CoV have been discovered worldwide in a wide range of bat species, some of which can be classified to the same coronavirus species as MERS-CoV'
answer = extract_answer(contexts, question)
print(answer)

bat species


In [None]:
cnt

'Autophagy is a degradative transport route conserved among all eukaryotic organisms. During starvation, cytoplasmic components are randomly sequestered into large double‐membrane vesicles called autophagosomes and delivered into the lysosome/vacuole where they are destroyed. Cells are able to modulate autophagy in response to their needs, and under certain circumstances, cargoes, such as aberrant protein aggregates, organelles, and bacteria can be selectively and exclusively incorporated into autophagosomes. As a result, this pathway plays an active role in many physiological processes, and it is induced in numerous pathological situations because of its ability to rapidly eliminate unwanted structures. Despite the advances in understanding the functions of autophagy and the identification of several factors, named Atg proteins that mediate it, the mechanism that leads to autophagosome formation is still a mystery. A major challenge in unveiling this process arises from the fact that 

In [41]:
cnt = model_docs('coronavirus immunity', context_df)
question = 'are covid effected people immune' 
contexts = concat
answer = extract_answer(contexts, question)
print(answer)

spillover events of coronavirus from animals to humans


In [42]:
concat

' On the other hand, diverse coronaviruses phylogenetically related to MERS-CoV have been discovered worldwide in a wide range of bat species, some of which can be classified to the same coronavirus species as MERS-CoV  Based on the classification criteria of the the International Committee on Taxonomy of Viruses (ICTV), SARS-CoV and MERS-CoV represent two novel distinct coronavirus species in the genus Betacoronavirus ( Fig  We present an overview of current evidence for bat origin of these two viruses and also discuss how the spillover events of coronavirus from animals to humans may have happened  Understanding the bat origin of human coronaviruses is helpful for the prediction and prevention of another pandemic emergence in the future '

In [None]:
cnt = model_docs('coronavirus immunity', context_df)
question = 'are covid effected people immune' 
contexts = concat
answer = extract_answer(contexts, question)
print(answer)

In [None]:
cnt = model_docs('coronavirus weather', context_df)
question = "how does coronavirus weather changes"
contexts = cnt
answer = extract_answer(contexts, question)
print(answer)

as the weather warms, its distribution area expands and extends to higher latitudes of northern hemisphere


### Novelty

*Generate more questions from the text retrieved*

In [None]:
! git clone "https://github.com/AMontgomerie/question_generator.git"

fatal: destination path 'question_generator' already exists and is not an empty directory.


In [None]:
%cd "/content/question_generator"

/content/question_generator


In [None]:
%load questiongenerator.py
from questiongenerator import QuestionGenerator
from questiongenerator import print_qa

In [None]:
con = "COVID-19 is a novel coronavirus that emerged from Wuhan, China in December 2019, and within 3 months became a global pandemic. AREAS COVERED: PubMed search of published data on COVID-19, respiratory infections, and diabetes mellitus (DM). DM associates with impairments of both cellular and humoral immunity. Early emergent global data reveal that severity of clinical outcome from COVID-19 infection (including hospitalization and admission to Intensive Care Unit [ICU]), associate with co-morbidities, prominently DM. The key principles of management of COVID-19 in patients with DM include ongoing focused outpatient management (remotely where necessary) and maintenance of good glycemic control. EXPERT OPINION: We will remember the dawn of the third decade of the twenty-first century as a time when the world changed, the true scale and impact of which is hard for us to imagine. Like a phoenix from the ashes though, COVID-19 provides us with a great learning opportunity to renew insights into ourselves as individuals, our clinical teams, and the optimized provision of care for our patients. COVID-19 has re-shaped and re-focused our collective societal values, with a sea-changed shift from materialistic to human-centric, from self-centredness to altruism, ultimately for the betterment of patient care and the whole of society."

In [None]:
qg = QuestionGenerator()
qg.generate(con, num_questions=10)

Generating questions...





Evaluating QA pairs...



[{'question': 'What are the key principles of management of COVID-19 in patients with DM?',
  'answer': 'The key principles of management of COVID-19 in patients with DM include ongoing focused outpatient management (remotely where necessary) and maintenance of good glycemic control.'},
 {'question': 'What is the key principle of COVID-19?',
  'answer': 'Like a phoenix from the ashes though, COVID-19 provides us with a great learning opportunity to renew insights into ourselves as individuals, our clinical teams, and the optimized provision of care for our patients.'},
 {'question': 'What is the most common cause of DM?',
  'answer': 'DM associates with impairments of both cellular and humoral immunity.'},
 {'question': 'What is the impact of COVID-19 on society?',
  'answer': 'COVID-19 has re-shaped and re-focused our collective societal values, with a sea-changed shift from materialistic to human-centric, from self-centredness to altruism, ultimately for the betterment of patient car

In [None]:
con = "INTRODUCTION The aim of this analysis was to describe comprehensively the cross-sectional and longitudinal patterns of analgesic and nutraceutical medication use for knee osteoarthritis  in a contemporary US cohort and to investigate associated demographic and clinical factors. Overall there was no change in the proportion of participants frequently using prescription or over the counter  analgesics at 36 months, although most people had changed medication type; of those using a traditional analgesic at baseline approximately one third were still using the same type at 36 months . Familiarize yourself with the physiologic changes of aging, especially the respiratory system. 2, 3 Know, at minimum, that chief complaints and presentations are a lot less specific with age. How has age contributed to my patients' physiology? "

In [None]:
qg.generate(con, num_questions=3)

Generating questions...





Evaluating QA pairs...



[{'question': 'how many people are using analgesics at 36 months?',
  'answer': 'Overall there was no change in the proportion of participants frequently using prescription or over the counter analgesics at 36 months, although most people had changed medication type; of those using a traditional analgesic at baseline approximately one third were still using the same type at 36 months.'},
 {'question': 'What was the purpose of this study?',
  'answer': 'INTRODUCTION The aim of this analysis was to describe comprehensively the cross-sectional and longitudinal patterns of analgesic and nutraceutical medication use for knee osteoarthritis in a contemporary US cohort and to investigate associated demographic and clinical factors.'},
 {'question': 'How has age contributed to the physiology of my patients?',
  'answer': "How has age contributed to my patients' physiology?"}]