<p align="center">
 <img src="http://www.di.uoa.gr/themes/corporate_lite/logo_el.png" title="Department of Informatics and Telecommunications - University of Athens"/> </p>

---
<h1 align="center" style="font-style: italic;"> 
  Artificial Intelligence II
</h1>
<h1 align="center" > 
  Deep Learning for Natural Language Processing
</h1>
<h1 align="center" > 
  Homework <b>#4</b>
</h1>

---

<h3 align="center"> 
 <b>Winter semester 2020-2021</b>
</h3>
<h3 align="center"> 
 <b>Konstantinos Nikoletos</b>
</h3>
<h3 align="center"> 
 <b>sdi: 1115201700104</b>
</h3>



---
---

### __Task__ 
This exercise is about developing a document retrieval system to return titles of scientific
papers containing the answer to a given user question. You will use the first version of
the COVID-19 Open Research Dataset (CORD-19) in your work (articles in the folder
`comm_use_subset`).


For example, for the question “What are the coronaviruses?”, your system can return the
paper title “Distinct Roles for Sialoside and Protein Receptors in Coronavirus Infection”
since this paper contains the answer to the asked question.


To achieve the goal of this exercise, you will need first to read the paper Sentence-BERT:
Sentence Embeddings using Siamese BERT-Networks, in order to understand how you
can create sentence embeddings. In the related work of this paper, you will also find other
approaches for developing your model. For example, you can use GloVe embeddings,
etc. In this link, you can find the extended versions of this dataset to test your model, if
you want. You are required to:


<ol type="a">
  <li>Preprocess the provided dataset. You will decide which data of each paper is useful
to your model in order to create the appropriate embeddings. You need to explain
your decisions.</li>
  <li>Implement at least 2 different sentence embedding approaches (see the related work
of the Sentence-BERT paper), in order for your model to retrieve the titles of the
papers related to a given question.</li>
  <li>Compare your 2 models based on at least 2 different criteria of your choice. Explain
why you selected these criteria, your implementation choices, and the results. Some
questions you can pose are included here. You will need to provide the extra questions
you posed to your model and the results of all the questions as well.</li>
</ol>

### __Notebook__ 
This notebook uses the same implementation as the Sentence-BERT notebook, but adds a cross-encoder re-ranking step, since I read that cross-encoders perform even better.

---
---

__Import__ of essential libraries


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd 
import sys # only needed to determine Python version number
import matplotlib # only needed to determine Matplotlib version 
import nltk
from nltk.stem import WordNetLemmatizer
import pprint
import torch
import torch.nn as nn
import torch.optim as optim
from torchtext import data
import logging
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

Selecting device (GPU - CUDA if available)

In [None]:
# First checking if GPU is available
train_on_gpu=torch.cuda.is_available()

if(train_on_gpu):
    print('Training on GPU.')
else:
    print('No GPU available, training on CPU.')

Training on GPU.


# Loading data
---

In [None]:
# Opening data file
import io
from google.colab import drive
from os import listdir
from os.path import isfile, join
import json

drive.mount('/content/drive',force_remount=True)

Mounted at /content/drive


Loading the dictionary if it has been created

In [None]:
#@title Select number of papers that will be fed into the model { vertical-output: true, display-mode: "both" }
number_of_papers = "9000" #@param ["1000","3000", "6000","9000"]
import pickle

CORD19_Dataframe = r"/content/drive/My Drive/AI_4/CORD19_SentenceMap_"+number_of_papers+".pkl"
with open(CORD19_Dataframe, 'rb') as drivef:
  CORD19Dictionary = pickle.load(drivef)

Or loading the summarized version of the papers instead

In [None]:
#@title Select number of summarized papers that will be fed into the model { vertical-output: true, display-mode: "both" }
number_of_papers = "9000" #@param ["1000", "3000", "6000", "9000"]
import pickle

CORD19_Dataframe = r"/content/drive/My Drive/AI_4/CORD19_SentenceMap_Summarized_"+number_of_papers+".pkl"
with open(CORD19_Dataframe, 'rb') as drivef:
  CORD19Dictionary = pickle.load(drivef)
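
The `paperTitle` helper in the evaluation section reads `record[0]` as the paper id and `record[1]` as the paper title, which suggests that the pickled dictionary maps each sentence to the paper it came from. Below is a minimal sketch of how such a map could be built. The function `build_sentence_map`, the toy papers and their field names are hypothetical, and the naive split on `". "` only stands in for a real sentence tokenizer.

```python
def build_sentence_map(papers):
    """Map every sentence to the (paper_id, title) of the paper it came from."""
    sentence_map = {}
    for paper in papers:
        # Naive split for illustration only; a real tokenizer would be used in practice.
        for sentence in paper["text"].split(". "):
            sentence = sentence.strip()
            if sentence:
                # setdefault keeps the first paper if the same sentence occurs twice.
                sentence_map.setdefault(sentence, (paper["paper_id"], paper["title"]))
    return sentence_map


# Hypothetical toy input, not taken from CORD-19.
toy_papers = [
    {"paper_id": "p1", "title": "Toy paper A",
     "text": "Coronaviruses infect animals. They can also infect humans."},
    {"paper_id": "p2", "title": "Toy paper B",
     "text": "Vaccines take time to develop."},
]
toy_map = build_sentence_map(toy_papers)
corpus_sentences = list(toy_map.keys())  # same role as `corpus` in this notebook
```

With this layout the sentences are the retrieval units, and the title of the matching paper is recovered with a single dictionary lookup.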

## Queries
---

In [None]:
query_list = [
  'What are the coronoviruses?',
  'What was discovered in Wuhuan in December 2019?',
  'What is Coronovirus Disease 2019?',
  'What is COVID-19?',
  'What is caused by SARS-COV2?', 'How is COVID-19 spread?',
  'Where was COVID-19 discovered?','How does coronavirus spread?'
]

proposed_answers = [
  'Coronaviruses (CoVs) are common human and animal pathogens that can transmit zoonotically and cause severe respiratory disease syndromes. ',
  'In December 2019, a novel coronavirus, called COVID-19, was discovered in Wuhan, China, and has spread to different cities in China as well as to 24 other countries.',
  'Coronavirus Disease 2019 (COVID-19) is an emerging disease with a rapid increase in cases and deaths since its first identification in Wuhan, China, in December 2019.',
  'COVID-19 is a viral respiratory illness caused by a new coronavirus called SARS-CoV-2.',
  'Coronavirus disease (COVID-19) is caused by SARS-COV2 and represents the causative agent of a potentially fatal disease that is of great global public health concern.', 
  'First, although COVID-19 is spread by the airborne route, air disinfection of cities and communities is not known to be effective for disease control and needs to be stopped.',
  'In December 2019, a novel coronavirus, called COVID-19, was discovered in Wuhan, China, and has spread to different cities in China as well as to 24 other countries.',
  'The new coronavirus was reported to spread via droplets, contact and natural aerosols from human-to-human.'
]

myquery_list = [
  "How long can the coronavirus survive on surfaces?",
  "What means COVID-19?",
  "Is COVID19 worse than flue?",
  "When the vaccine will be ready?",
  "Whats the proteins that consist COVID-19?",
  "Whats the symptoms of COVID-19?",
  "How can I prevent COVID-19?",
  "What treatments are available for COVID-19?",
  "Is hand sanitizer effective against COVID-19?",
  "Am I at risk for serious complications from COVID-19 if I smoke cigarettes?",
  "Are there any FDA-approved drugs (medicines) for COVID-19?",
  "How are people tested?",
  "Why is the disease being called coronavirus disease 2019, COVID-19?",
  "Am I at risk for COVID-19 from mail, packages, or products?",
  "What is community spread?",
  "How can I protect myself?",
  "What is a novel coronavirus?",
  "Was Harry Potter a good magician?"
]

# Results dataframes

In [None]:
resultsDf = pd.DataFrame(columns=['Number of papers','Embeddings creation time'])

queriesDf = pd.DataFrame(columns=['Query','Proposed_answer','Model_answer','Cosine_similarity'])
queriesDf['Query'] = query_list
queriesDf['Proposed_answer'] = proposed_answers

myQueriesDf = pd.DataFrame(columns=['Query','Model_answer','Cosine_similarity'])
myQueriesDf['Query'] = myquery_list

queriesDf

Unnamed: 0,Query,Proposed_answer,Model_answer,Cosine_similarity
0,What are the coronoviruses?,Coronaviruses (CoVs) are common human and anim...,,
1,What was discovered in Wuhuan in December 2019?,"In December 2019, a novel coronavirus, called ...",,
2,What is Coronovirus Disease 2019?,Coronavirus Disease 2019 (COVID-19) is an emer...,,
3,What is COVID-19?,COVID-19 is a viral respiratory illness caused...,,
4,What is caused by SARS-COV2?,Coronavirus disease (COVID-19) is caused by SA...,,
5,How is COVID-19 spread?,"First, although COVID-19 is spread by the airb...",,
6,Where was COVID-19 discovered?,"In December 2019, a novel coronavirus, called ...",,
7,How does coronavirus spread?,The new coronavirus was reported to spread via...,,


# SBERT
---

In [None]:
!pip install -U sentence-transformers

Requirement already up-to-date: sentence-transformers in /usr/local/lib/python3.6/dist-packages (0.4.1.2)


# Selecting transformer and Cross Encoder

In [None]:
from sentence_transformers import SentenceTransformer, util, CrossEncoder
import torch
import time

encoder = SentenceTransformer('msmarco-distilbert-base-v2')
cross_encoder = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-6')

# Initializing corpus

In [None]:
corpus = list(CORD19Dictionary.keys())

# Creating the embeddings

Encoding the papers

In [None]:
%%time
corpus_embeddings = encoder.encode(corpus, convert_to_tensor=True, show_progress_bar=True,device='cuda')



CPU times: user 1min 48s, sys: 32.6 s, total: 2min 20s
Wall time: 2min 16s


# Saving corpus as tensors to drive

In [None]:
corpus_embeddings_path = r"/content/drive/My Drive/AI_4/corpus_embeddings_6000_CrossEncoder.pt"
torch.save(corpus_embeddings,corpus_embeddings_path)

# Loading the embeddings if they have already been created and saved



---

In [None]:
corpus_embeddings_path = r"/content/drive/My Drive/AI_4/corpus_embeddings_6000_CrossEncoder.pt"
with open(corpus_embeddings_path, 'rb') as f:
    corpus_embeddings = torch.load(f)
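
The first retrieval stage in the evaluation below ranks the corpus embeddings by their similarity to the query embedding. As far as I know, `util.semantic_search` uses cosine similarity by default. The sketch below shows the same idea in plain Python on toy vectors. The helper names and the vectors are hypothetical, and the real library works on batched tensors rather than on Python lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, corpus_vecs, k):
    """Return the k best hits as dicts with 'corpus_id' and 'score' keys."""
    scored = [
        {"corpus_id": idx, "score": cosine_similarity(query_vec, vec)}
        for idx, vec in enumerate(corpus_vecs)
    ]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:k]


# Hypothetical 2-dimensional "embeddings".
toy_corpus_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_hits = top_k_by_cosine([1.0, 0.1], toy_corpus_vecs, k=2)
```

Here the query points almost exactly along the first corpus vector, so that vector is ranked first and the diagonal vector second.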

# Evaluation
---


In [None]:
import re
from  nltk import tokenize
from termcolor import colored


def paperTitle(answer,SentenceMap):
  record = SentenceMap[answer]
  print("Paper title:",record[1])
  print("Paper id:   ",record[0])  
  
def evaluation(query_list,top_k,resultsDf):
  query_answers = []
  scores = []

  for query in query_list:
    # Encode the query with the bi-encoder and retrieve the top_k most similar corpus sentences
    start_time = time.time()
    question_embedding = encoder.encode(query, convert_to_tensor=True,device='cuda')
    hits = util.semantic_search(question_embedding, corpus_embeddings, top_k=top_k)
    hits = hits[0]  # Get the hits for the first (and only) query

    # Score every retrieved sentence against the query with the cross-encoder
    cross_inp = [[query, corpus[hit['corpus_id']]] for hit in hits]
    cross_scores = cross_encoder.predict(cross_inp)
  
    # Attach the cross-encoder scores to the hits and sort by them
    for idx in range(len(cross_scores)):
        hits[idx]['cross-score'] = cross_scores[idx]

    hits = sorted(hits, key=lambda x: x['cross-score'], reverse=True)
    end_time = time.time()

    # Output of the top_k hits
    print("\n\n======================\n\n")
    print("Query:",colored(query,'green') )
    
    print("Results (after {:.3f} seconds):".format(end_time - start_time))
    rank=0  # number of hits printed so far (avoids shadowing the built-in iter)
    for hit in hits[0:top_k]:
        print("\n-> ",rank+1)
        # Strip bracketed tokens such as citation markers from the sentence
        answer = ' '.join([re.sub(r"^\[.*\]", "", x) for x in corpus[hit['corpus_id']].split()])
        # Skip hits that contain at most one token
        if len(tokenize.word_tokenize(answer)) > 1:
          print("Score: {:.4f}".format(hit['cross-score']))
          
          paperTitle(corpus[hit['corpus_id']],CORD19Dictionary)
          print("Anser size: ",len(tokenize.word_tokenize(answer)))
          print("Anser: ")
          if rank==0:
            # Keep only the best valid hit per query for the results dataframe
            query_answers.append(answer)
            scores.append(hit['cross-score'].item())
          rank+=1
          print(colored(answer,'yellow'))

    if rank==0:
      # No hit passed the length filter: keep the columns aligned with the queries
      query_answers.append(None)
      scores.append(None)
          
  # Note: the 'Cosine_similarity' column stores the cross-encoder score of the best hit
  resultsDf['Model_answer'] = query_answers
  resultsDf['Cosine_similarity'] = scores
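
The two-stage idea used in `evaluation` (a cheap first-stage score over the whole corpus, then a more expensive score over the few retrieved candidates) can be sketched without any model. Everything below is a toy illustration: `word_overlap` and `overlap_ratio` are hypothetical stand-ins for the bi-encoder and the cross-encoder, and the sentences are made up.

```python
def retrieve_and_rerank(query, corpus, first_stage_score, rerank_score, top_k):
    """Keep the top_k candidates of a cheap scorer, then re-sort them with a second scorer."""
    # Stage 1: score the whole corpus cheaply and keep the best top_k indices.
    candidates = sorted(
        range(len(corpus)),
        key=lambda idx: first_stage_score(query, corpus[idx]),
        reverse=True,
    )[:top_k]
    # Stage 2: score only the candidates with the second scorer and re-sort.
    reranked = [
        {"corpus_id": idx, "cross-score": rerank_score(query, corpus[idx])}
        for idx in candidates
    ]
    reranked.sort(key=lambda hit: hit["cross-score"], reverse=True)
    return reranked


def word_overlap(query, sentence):
    # Hypothetical stand-in for the bi-encoder: number of shared lowercase words.
    return len(set(query.lower().split()) & set(sentence.lower().split()))


def overlap_ratio(query, sentence):
    # Hypothetical stand-in for the cross-encoder: shared words divided by sentence length.
    words = sentence.lower().split()
    return word_overlap(query, sentence) / len(words) if words else 0.0


toy_corpus = [
    "coronavirus can survive on surfaces but this long sentence also talks about many other unrelated things",
    "coronavirus can survive on surfaces",
    "dogs shed virus",
]
toy_result = retrieve_and_rerank(
    "how long can coronavirus survive on surfaces",
    toy_corpus, word_overlap, overlap_ratio, top_k=2,
)
```

In this toy case the first stage prefers the long sentence because it shares more words with the query, while the second stage moves the short, focused sentence to the top. This mirrors why the re-ranking step can change which answer is reported first.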


In [None]:
top_k = 3
evaluation(query_list,top_k,queriesDf)





Query: What are the coronoviruses?
Results (after 0.839 seconds):

->  1
Score: 0.0639
Paper title: Citation: Interactions Between Enteroviruses and the Inflammasome: New Insights Into Viral Pathogenesis
Paper id:    423e1f15afb86012057acacc26d0766aa4bc582a
Anser size:  7
Anser: 
Enteroviruses are the members of Picornaviridae.

->  2
Score: 0.0185
Paper title: Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm
Paper id:    ab98d1b125aa0704e63adef426b27abd32e935f0
Anser size:  14
Anser: 
Cosavirus is a new genus in the Picornaviridae family first described in 2008   .

->  3
Score: 0.0073
Paper title: Identification of diverse viruses in upper respiratory samples in dromedary camels from United Arab Emirates
Paper id:    04b5f15cca91a7b810216682780f8ea6e1ab3046
Anser size:  2
Anser: 
Orthonairoviruses.




Query: What was discovered i

In [None]:
top_k = 3
evaluation(myquery_list,top_k,myQueriesDf)





Query: How long can the coronavirus survive on surfaces?
Results (after 0.537 seconds):

->  1
Score: 0.9850
Paper title: Outbreak of Novel Coronavirus (SARS-Cov-2): First Evidences From International Scientific Literature and Pending Questions
Paper id:    7b7c71218f8d7ea1a1f8f702e4262b839bf7cc8a
Anser size:  15
Anser: 
On inanimate surfaces, human coronaviruses can remain infectious for up to 9 days.

->  2
Score: 0.7655
Paper title: Characterisation of the canine faecal virome in healthy dogs and dogs with acute diarrhoea using shotgun metagenomics
Paper id:    fcb1ba715b2516823fee057cbb0f8276c76d19d7
Anser size:  21
Anser: 
Canine coronavirus can be shed in faeces in high numbers for up to 156 days [44, 45] .

->  3
Score: 0.1069
Paper title: Human Coronaviruses: Insights into Environmental Resistance and Its Influence on the Development of New Antiseptic Strategies
Paper id:    d171f82b892a2afafc2bc8a5458219dc04c8fd8d
Anser size:  21
Anser: 
Hum

# Overall results

## 6000 papers with no summarization 
---

### Time needed for creating the embeddings: 
- CPU times: 
  - user: 13min 10s
  - sys: 5min 40s
  - total: 18min 51s
- Wall time: 18min 26s

### Remarks
These are the best results among the notebooks so far: almost 5/7 of the given questions are answered, and 7/17 of my own. I expected better results, since cross-encoders considerably improve the performance of Sentence-BERT.

__Top-k__ 

The top-2 and top-3 hits contain many answers, and I noticed that they are often better than the first one. Overall the results are good, and with some tuning they would be close to the desired ones.
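
The remark about lower-ranked hits can be turned into a number by judging each returned hit by hand and computing a hit rate at k, that is, the fraction of queries with at least one relevant answer among the first k hits. This is only a sketch: `hit_rate_at_k` is a hypothetical helper and the judgments below are invented for illustration, not taken from the runs in this notebook.

```python
def hit_rate_at_k(relevance_per_query, k):
    """Fraction of queries with at least one relevant hit among the first k hits."""
    if not relevance_per_query:
        return 0.0
    hits = sum(1 for flags in relevance_per_query if any(flags[:k]))
    return hits / len(relevance_per_query)


# Hypothetical manual judgments: one list per query, one flag per ranked hit.
toy_judgments = [
    [True, False, False],   # answered by the first hit
    [False, True, False],   # answered only by the second hit
    [False, False, True],   # answered only by the third hit
    [False, False, False],  # not answered at all
]
rate_at_1 = hit_rate_at_k(toy_judgments, 1)
rate_at_3 = hit_rate_at_k(toy_judgments, 3)
```

A large gap between the rate at 1 and the rate at 3 would indicate that the relevant sentences are retrieved but not ranked first.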




### Results

In [None]:
with pd.option_context('display.max_colwidth', None):
  display(queriesDf)

Unnamed: 0,Query,Proposed_answer,Model_answer,Cosine_similarity
0,What are the coronoviruses?,Coronaviruses (CoVs) are common human and animal pathogens that can transmit zoonotically and cause severe respiratory disease syndromes.,C oronaviruses (CoV) represent a diverse family of positivesense RNA viruses capable of causing respiratory and enteric disease in human and animal hosts.,0.099706
1,What was discovered in Wuhuan in December 2019?,"In December 2019, a novel coronavirus, called COVID-19, was discovered in Wuhan, China, and has spread to different cities in China as well as to 24 other countries.","A mysterious illness causing pneumonia in December 2019 in Wuhan, China is now growing into a potential pandemic.",0.919291
2,What is Coronovirus Disease 2019?,"Coronavirus Disease 2019 (COVID-19) is an emerging disease with a rapid increase in cases and deaths since its first identification in Wuhan, China, in December 2019.",CDC considered the 2019-nCoV as a possible pathogen causing the outbreak .,0.279708
3,What is COVID-19?,COVID-19 is a viral respiratory illness caused by a new coronavirus called SARS-CoV-2.,"For COVID-19, there is so far limited evidence for specific risk factors; we therefore assumed that at most 40% of travellers would be aware of a potential exposure.",0.827424
4,What is caused by SARS-COV2?,Coronavirus disease (COVID-19) is caused by SARS-COV2 and represents the causative agent of a potentially fatal disease that is of great global public health concern.,"The SARS outbreak of 2002-2003 was caused by SARS-CoV, a novel coronavirus.",0.951182
5,How is COVID-19 spread?,"First, although COVID-19 is spread by the airborne route, air disinfection of cities and communities is not known to be effective for disease control and needs to be stopped.","At the same time, there is great concern about potential public health consequences if COVID-19 spreads to developing countries that lack health infrastructure and resources to combat it effectively (de Salazar et al., 2020) .",0.971672
6,Where was COVID-19 discovered?,"In December 2019, a novel coronavirus, called COVID-19, was discovered in Wuhan, China, and has spread to different cities in China as well as to 24 other countries.","Three cases of COVID-19 were confirmed on 24 January, the first cases in Europe.",0.938341
7,How does coronavirus spread?,"The new coronavirus was reported to spread via droplets, contact and natural aerosols from human-to-human.","Corona viruses are transmitted via airborne zoonotic droplets, and viral replication occurs in the ciliated epithelium, resulting in cellular damage and inflammatory reactions at the site of infection [3, 4] .",0.893843


In [None]:
with pd.option_context('display.max_colwidth', None):
  display(myQueriesDf)

Unnamed: 0,Query,Model_answer,Cosine_similarity
0,How long can the coronavirus survive on surfaces?,"Despite the general view that enveloped viruses are fragile outside the host, several coronaviruses remain infective after drying on surfaces for more than 24 h as reviewed by Otter et al.",0.957455
1,What means COVID-19?,"By submitting their studies to ""COVID-19 Open,"" researchers can share their data while meeting their need to retain authorship, document precedence and facilitate international scientific cooperation in the response to this emergency.",0.399066
2,Is COVID19 worse than flue?,"In all types of influenza virus, mortality was higher in those treated with corticosteroids than in controls, although symptoms were more rapidly progressive patients and the risk of ARDS higher in patients infected with H7N9 [1, 2] .",0.008936
3,When the vaccine will be ready?,"If an FDA-approved live attenuated vaccine is required, it is recommended that it be given at least 4 weeks before the first or 4 weeks after the third study injection.",0.410219
4,Whats the proteins that consist COVID-19?,"Four structural proteins are encoded by the CoV genomes: spike (S), membrane (M), envelope (E), and nucleocapsid (N).",0.020674
5,Whats the symptoms of COVID-19?,"The novel pneumonia was named as Corona Virus by World Health Organization (WHO) ( the common symptoms of COVID-19 at illness onset were fever, fatigue, dry cough, myalgia, and dyspnea 3 .",0.939072
6,How can I prevent COVID-19?,"For COVID-19, there is so far limited evidence for specific risk factors; we therefore assumed that at most 40% of travellers would be aware of a potential exposure.",0.366725
7,What treatments are available for COVID-19?,Although treatment with neuraminidase inhibitors (oseltamivir or zanamivir) is recommended in all patients with suspected or confirmed influenza requiring hospitalization their use in non-severe influenza could be more harmful than beneficial because of the possibility of selection of resistant mutants .,0.001212
8,Is hand sanitizer effective against COVID-19?,"At the same time, there is great concern about potential public health consequences if COVID-19 spreads to developing countries that lack health infrastructure and resources to combat it effectively (de Salazar et al., 2020) .",0.014746
9,Am I at risk for serious complications from COVID-19 if I smoke cigarettes?,"For COVID-19, there is so far limited evidence for specific risk factors; we therefore assumed that at most 40% of travellers would be aware of a potential exposure.",0.164191


## 9000 papers with no summarization 
---

The session crashed because it ran out of RAM.
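
One possible way to lower peak memory would be to encode the corpus in chunks and store each chunk before moving on to the next one. I have not tested whether this would have avoided the crash, and the exact cause of the memory exhaustion is not known. In the sketch below, `iter_chunks`, `encode_and_store_in_chunks`, `toy_encode` and the in-memory `toy_store` are all hypothetical. In practice the encode function would wrap the sentence encoder and the store function would write each chunk to disk.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def encode_and_store_in_chunks(sentences, encode_fn, store_fn, chunk_size):
    """Encode one chunk at a time and hand the result to store_fn(chunk_index, embeddings)."""
    n_chunks = 0
    for chunk in iter_chunks(sentences, chunk_size):
        store_fn(n_chunks, encode_fn(chunk))
        n_chunks += 1
    return n_chunks


def toy_encode(chunk):
    # Hypothetical stand-in for the encoder: a 1-dimensional "embedding" per sentence.
    return [[float(len(sentence))] for sentence in chunk]


toy_store = {}  # stands in for files on disk


def toy_store_fn(chunk_index, embeddings):
    toy_store[chunk_index] = embeddings


n_stored = encode_and_store_in_chunks(
    ["a", "bb", "ccc", "dddd", "eeeee"], toy_encode, toy_store_fn, chunk_size=2
)
```

With five sentences and a chunk size of two, three chunks are produced, and only one chunk of embeddings is held by the loop at any time.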


## 6000 papers with paraphrase-distilroberta-base-v1 model and summarization 
---

### Time needed for creating the embeddings: 
- CPU times: 
  - user: 1min 18s
  - sys: 22.8 s
  - total: 1min 37s
- Wall time: 1min 37s

### Remarks
The results are not good. From these results I think that the BERT summarizer parameters were not appropriate and that I should experiment with them. The summarization should not have been so strict, and I may have over-summarized the papers.


__Top-k__ 

Not good.



### Results

In [None]:
with pd.option_context('display.max_colwidth', None):
  display(queriesDf)

Unnamed: 0,Query,Proposed_answer,Model_answer,Cosine_similarity
0,What are the coronoviruses?,Coronaviruses (CoVs) are common human and animal pathogens that can transmit zoonotically and cause severe respiratory disease syndromes.,CoV is a group of viruses that belong to the Coronaviridae family and the Nidovirales order.,0.08308
1,What was discovered in Wuhuan in December 2019?,"In December 2019, a novel coronavirus, called COVID-19, was discovered in Wuhan, China, and has spread to different cities in China as well as to 24 other countries.","2016 (Wu et al., ,",0.002247
2,What is Coronovirus Disease 2019?,"Coronavirus Disease 2019 (COVID-19) is an emerging disease with a rapid increase in cases and deaths since its first identification in Wuhan, China, in December 2019.",COVID-19: coronavirus disease 2019; PPE: personal protective equipment.,0.651608
3,What is COVID-19?,COVID-19 is a viral respiratory illness caused by a new coronavirus called SARS-CoV-2.,COVID-19: coronavirus disease 2019; PPE: personal protective equipment.,0.96314
4,What is caused by SARS-COV2?,Coronavirus disease (COVID-19) is caused by SARS-COV2 and represents the causative agent of a potentially fatal disease that is of great global public health concern.,We thus assumed that a SARS-related CoV is involved in the outbreak.,0.3216
5,How is COVID-19 spread?,"First, although COVID-19 is spread by the airborne route, air disinfection of cities and communities is not known to be effective for disease control and needs to be stopped.",COVID-19: coronavirus disease 2019; PPE: personal protective equipment.,0.042323
6,Where was COVID-19 discovered?,"In December 2019, a novel coronavirus, called COVID-19, was discovered in Wuhan, China, and has spread to different cities in China as well as to 24 other countries.",Strengthened surveillance of COVID-19 cases was implemented in France on 10 January 2020.,0.346553
7,How does coronavirus spread?,"The new coronavirus was reported to spread via droplets, contact and natural aerosols from human-to-human.",This may hint that coronavirus does not spread so widely in humans.,0.133488


In [None]:
with pd.option_context('display.max_colwidth', None):
  display(myQueriesDf)

Unnamed: 0,Query,Model_answer,Cosine_similarity
0,How long can the coronavirus survive on surfaces?,"Canine coronavirus can be shed in faeces in high numbers for up to 156 days [44, 45] .",0.765538
1,What means COVID-19?,COVID-19: coronavirus disease 2019; PPE: personal protective equipment.,0.927158
2,Is COVID19 worse than flue?,Corticosteroids could increase mortality in patients with influenza pneumonia.,0.002202
3,When the vaccine will be ready?,Current vaccine strategies take approximately 6 months for production.,0.210023
4,Whats the proteins that consist COVID-19?,"Of these proteins, 34 were shared across all strains ( Fig.",0.000441
5,Whats the symptoms of COVID-19?,Initial signs and symptoms include,0.000551
6,How can I prevent COVID-19?,Prevention of MERS-CoV transmission involves avoiding exposure.,0.002867
7,What treatments are available for COVID-19?,COVID-19: coronavirus disease 2019; PPE: personal protective equipment.,0.081656
8,Is hand sanitizer effective against COVID-19?,the effectiveness of handwashing and alcohol-based hand sanitizers.,0.065614
9,Am I at risk for serious complications from COVID-19 if I smoke cigarettes?,Tobacco/cigarette smoke exposure and influenza A virus infection.,0.001866


## 9000 papers with summarization 
---

### Time needed for creating the embeddings: 
- CPU times: 
  - user: 1min 48s
  - sys: 32.6 s
  - total: 2min 20s
- Wall time: 2min 16s

### Remarks
Again the results are not good, and this is due to my summarization tuning.

**Note:** Again, I didn't have the time to re-run and process the papers again.


### Results

In [None]:
with pd.option_context('display.max_colwidth', None):
  display(queriesDf)

Unnamed: 0,Query,Proposed_answer,Model_answer,Cosine_similarity
0,What are the coronoviruses?,Coronaviruses (CoVs) are common human and animal pathogens that can transmit zoonotically and cause severe respiratory disease syndromes.,Enteroviruses are the members of Picornaviridae.,0.063864
1,What was discovered in Wuhuan in December 2019?,"In December 2019, a novel coronavirus, called COVID-19, was discovered in Wuhan, China, and has spread to different cities in China as well as to 24 other countries.","An emergent pneumonia outbreak originated in Wuhan City, in the late December 2019 1 .",0.733632
2,What is Coronovirus Disease 2019?,"Coronavirus Disease 2019 (COVID-19) is an emerging disease with a rapid increase in cases and deaths since its first identification in Wuhan, China, in December 2019.",COVID-19: coronavirus disease 2019; PPE: personal protective equipment.,0.651608
3,What is COVID-19?,COVID-19 is a viral respiratory illness caused by a new coronavirus called SARS-CoV-2.,COVID-19: coronavirus disease 2019; PPE: personal protective equipment.,0.96314
4,What is caused by SARS-COV2?,Coronavirus disease (COVID-19) is caused by SARS-COV2 and represents the causative agent of a potentially fatal disease that is of great global public health concern.,We thus assumed that a SARS-related CoV is involved in the outbreak.,0.3216
5,How is COVID-19 spread?,"First, although COVID-19 is spread by the airborne route, air disinfection of cities and communities is not known to be effective for disease control and needs to be stopped.",The COVID-19 has then rapidly spread to all over China and the world.,0.97995
6,Where was COVID-19 discovered?,"In December 2019, a novel coronavirus, called COVID-19, was discovered in Wuhan, China, and has spread to different cities in China as well as to 24 other countries.","The epidemic of COVID-19 is caused by a novel virus first detected in Wuhan, China.",0.948981
7,How does coronavirus spread?,"The new coronavirus was reported to spread via droplets, contact and natural aerosols from human-to-human.",This may hint that coronavirus does not spread so widely in humans.,0.133488


In [None]:
with pd.option_context('display.max_colwidth', None):
  display(myQueriesDf)

Unnamed: 0,Query,Model_answer,Cosine_similarity
0,How long can the coronavirus survive on surfaces?,"On inanimate surfaces, human coronaviruses can remain infectious for up to 9 days.",0.984978
1,What means COVID-19?,COVID-19: coronavirus disease 2019; PPE: personal protective equipment.,0.927158
2,Is COVID19 worse than flue?,"In comparison, COVID-19 showed similar trends with SARS patients .",0.06445
3,When the vaccine will be ready?,Current vaccine strategies take approximately 6 months for production.,0.210023
4,Whats the proteins that consist COVID-19?,"CSFV contains 4 structural proteins: C, Erns, E1 and E2.",0.000411
5,Whats the symptoms of COVID-19?,This is particularly true for the COVID-19.,0.134529
6,How can I prevent COVID-19?,It remains to be seen if this will be the case for COVID-19 as well.,0.246872
7,What treatments are available for COVID-19?,"As effective drugs for SARS, hormones and interferons can also be used to treat COVID-19 .",0.956893
8,Is hand sanitizer effective against COVID-19?,"The use of alcohol-based hand sanitizers is also effective [54, 55] .",0.205787
9,Am I at risk for serious complications from COVID-19 if I smoke cigarettes?,reported that people who have not been exposed to SARS-CoV-2 are all susceptible to COVID-19 .,0.015109


# References

[1] https://colab.research.google.com/drive/1l6stpYdRMmeDBK_vw0L5NitdiAuhdsAr?usp=sharing#scrollTo=D_hDi8KzNgMM

[2] https://www.sbert.net/docs/package_reference/cross_encoder.html