This notebook aims to:
- Prepare a list of papers and their relevance to the task under consideration.
- Prepare a list and a webpage of the most relevant papers and their abstracts.
- Display the 10 most relevant papers and their abstracts.

In [None]:
zabt = "Papers on Ethical and Social Considerations"
znam = "Papers_on_Ethical_and_Social_Considerations"

Outside this notebook: take the task's specification; make a unique list of words; remove common words; and optionally sort them.

In [None]:
zwds = "access addressed adherence affects anxiety area arise articulate assessment barriers behaviors building burden capacity care closures connect considerations control coordinate covid-19 drivers duplication efforts embed enablers engage establish ethical ethics expanded fear frameworks fuel global identification identify immediate impacts information integrated issues local masks measures media minimize misinformation modification multidisciplinary networks operational outbreak oversight particularly patients physical platforms prevention principles providing psychological published qualitative rapid research response rumor salient school sciences secondary social srh standards stigma support surgical sustained systematically team thematic translate underlying uptake"
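
The keyword preparation described above happens outside the notebook, so its exact steps are not shown. As a minimal sketch, assuming a small hand-picked stop-word set (the STOP_WORDS set, the keywords_from_spec helper, and the sample sentence are all hypothetical), it could look like this:

```python
import re

# Hypothetical stop-word set; the author's actual list of common words is not given.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "of", "on", "the", "to", "with"}

def keywords_from_spec(spec, sort=True):
    """Return the unique, non-common words of a task specification as one string."""
    # Lowercase the text and keep word-like tokens, allowing internal hyphens (e.g. covid-19).
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", spec.lower())
    unique = {tok for tok in tokens if tok not in STOP_WORDS}
    words = sorted(unique) if sort else list(unique)
    return " ".join(words)

spec = "Efforts to identify the ethical issues and the social impacts of the COVID-19 response"
print(keywords_from_spec(spec))
```

The result is a single space-separated string, the same shape as zwds, so it could be passed straight to the nlp pipeline.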

Import the Python packages os, pandas, json, IPython, and spaCy.

In [None]:
import os
import pandas as pd
import json
from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports.
# !pip uninstall spacy # Uncomment this if installed version of spacy fails.
# !pip install spacy # Uncomment this if spacy package is not installed.
import spacy
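# !python -m spacy download en_core_web_sm # Uncomment this if the English model is not installed.
# spaCy 3 removed the 'en' shortcut, which pointed to en_core_web_sm. The small model has no
# static word vectors, so a larger model such as en_core_web_md may give more meaningful similarity scores.
nlp = spacy.load('en_core_web_sm')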



Run spaCy's nlp pipeline on the selected words to get a document that each abstract can be compared against.

In [None]:
zchk = nlp(zwds)
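
This parsed document is what each abstract is later compared against with similarity. By default spaCy computes that score as the cosine similarity between the two documents' vectors, where a document vector is the average of its token vectors. The sketch below illustrates the arithmetic in plain Python with made-up 3-dimensional vectors; the helper names and the numbers are hypothetical, and real models use vectors with far more dimensions.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up token vectors standing in for the keyword list and one abstract.
keyword_tokens = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
abstract_tokens = [[1.0, 1.0, 0.0], [1.0, 1.0, 2.0]]

score = cosine_similarity(mean_vector(keyword_tokens), mean_vector(abstract_tokens))
print(round(score, 3))
```

Because the score depends only on the angle between the averaged vectors, a long abstract and a short keyword list can still score highly when they point in a similar direction.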

Specify the location of files of papers provided by this challenge.

In [None]:
ztop = '/kaggle/input/CORD-19-research-challenge'

Make an empty dataframe, to populate later.

In [None]:
zdf0 = pd.DataFrame(columns=['Folder', 'File', 'Match'])

Go through each file, review the Abstract text contained in it, compute its relevance to the task, and add it to the dataframe.

In [None]:
%%capture

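zrows = []  # Collect one record per paper; DataFrame.append was removed in pandas 2.0.

for zsub, zdir, zfis in os.walk(ztop):

    for zfil in zfis:
        if zfil.endswith(".json"):
            
            # The with-block closes the file on exit, so no explicit close() is needed.
            with open(os.path.join(zsub, zfil)) as f:
                zout = json.load(f)
            
            # Join the abstract paragraphs and score them against the task keywords.
            zout = " ".join([part['text'] for part in zout['abstract']])
            zout = zchk.similarity(nlp(zout))
            
            zrows.append({'Folder': zsub.replace(ztop, ""), 'File': zfil, 'Match': zout})

# Build the dataframe once from the collected records.
zdf0 = pd.DataFrame(zrows, columns=['Folder', 'File', 'Match'])
print(zdf0.head(4))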
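
The loop above assumes every JSON file has an abstract key holding a list of paragraphs, each with a text field. A minimal sketch of a more defensive extraction step follows, assuming that some files may lack the key or leave it empty; the abstract_text helper and the sample records are hypothetical.

```python
def abstract_text(paper):
    """Join the paragraph texts under paper['abstract'], or return '' if there are none."""
    parts = paper.get("abstract") or []
    return " ".join(part.get("text", "") for part in parts).strip()

# Hypothetical records shaped like the parsed JSON files.
with_abstract = {"abstract": [{"text": "First paragraph."}, {"text": "Second paragraph."}]}
without_abstract = {"body_text": [{"text": "Only body text."}]}

print(repr(abstract_text(with_abstract)))
print(repr(abstract_text(without_abstract)))
```

A file that yields an empty string could then be skipped before scoring, since an empty document has no vector to compare.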

Export this dataframe as a csv file.

In [None]:
zdf0.to_csv(znam + '_Check.csv', index = False)

Make a subset dataframe of records with a similarity score above 0.6 (more than 60% relevance), sorted from most to least relevant.

In [None]:
zdf6 = zdf0[zdf0.Match > 0.6].sort_values(by=['Match'], axis=0, ascending=False, inplace=False)
print(zdf6.head(4))

Export this subset dataframe as another csv file.

In [None]:
zdf6.to_csv(znam + '_Relevant.csv', index = False)

Make an HTML webpage that lists the papers with more than 60% relevance, together with their abstracts.

In [None]:
%%capture

zht0 = "<html>\n<head>\n"
zht0 = zht0 + "<title>" + zabt + "</title>\n"  # Use this task's title rather than a hard-coded one.
zht0 = zht0 + "<script>\nfunction openPop(x) {\nei = document.getElementById('pop_' + x);\n"
zht0 = zht0 + "ei.style.display='block';\nec = document.getElementsByClassName('pip');\nvar i;\n"
zht0 = zht0 + "for (i = 0; i < ec.length; i++) {\nif ( ec[i] != ei) { ec[i].style.display='none'; }; }; }\n"
zht0 = zht0 + "function shutPop(x) { document.getElementById('pop_' + x).style.display='none'; }\n</script>\n"
zht0 = zht0 + "<style>table, th, td { border: 1px solid black; }</style>\n"
zht0 = zht0 + "</head>\n<body>\n"
zht0 = zht0 + "<h1>" + zabt + "</h1>\n"
zht0 = zht0 + "<p>The following is a list of relevant papers.</p><br />\n"
zht0 = zht0 + "<p>Click on a Title to pop up its Abstract.</p><br />\n"
zht0 = zht0 + "<table>\n<tbody>\n<tr><th>Title</th>\n<th>Abstract</th></tr>\n"

In [None]:
zht6 = zht0 # zht6 is to be saved later as a file.
zhtd = zht0 # zhtd is a smaller version of zht6, for displaying in this notebook.

for rank, (indx, cont) in enumerate(zdf6.iterrows()):
    
    # The with-block closes the file on exit, so no explicit close() is needed.
    with open(ztop + os.sep + cont['Folder'] + os.sep + cont['File']) as f:
        ztxt = json.load(f)
        
    ztxt = " ".join([part['text'] for part in ztxt['abstract']])
    
    zhta = "<tr><td><div onClick=openPop(" + str(indx) + ")>" + str(cont['File']) + "</div></td>\n"    
    zhta = zhta + "<td><div onClick=shutPop(" + str(indx) + ") class='pip' id='pop_" + str(indx) + "' style='display:none;'>" + ztxt + "</div></td></tr>\n"
    
    zht6 = zht6 + zhta
    # zdf6 is sorted by Match, so the first 10 rows are the 10 most relevant papers.
    # indx is the original row label and does not reflect that order.
    if rank < 10:
        zhtd = zhtd + zhta

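# Close the table opened in the page header before closing the body.
zht6 = zht6 + "</tbody>\n</table>\n</body>\n</html>"
zhtd = zhtd + "</tbody>\n</table>\n</body>\n</html>"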
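
The rows above insert the abstract text into the page as-is, so characters such as < or & in an abstract could be read as markup. A minimal sketch of building one row with the standard library's html.escape follows; the table_row helper and the sample values are hypothetical, while the openPop and shutPop names and the pip class match the page script above.

```python
import html

def table_row(key, title, abstract):
    """Build one table row: a clickable title cell and a hidden abstract cell."""
    # Escape &, <, > and quotes so the text is shown literally instead of parsed as HTML.
    safe_title = html.escape(title)
    safe_abstract = html.escape(abstract)
    return (
        f"<tr><td><div onClick='openPop({key})'>{safe_title}</div></td>\n"
        f"<td><div onClick='shutPop({key})' class='pip' id='pop_{key}' "
        f"style='display:none;'>{safe_abstract}</div></td></tr>\n"
    )

row = table_row(3, "paper_3.json", "Stigma & fear when R0 < 1")
print(row)
```

Escaping only changes how the text is encoded in the page source; the browser still shows the original characters to the reader.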

Save the webpage html as a file.

In [None]:
%%capture

# Mode "w" overwrites the file, so rerunning this cell does not append a second copy of the page.
with open(znam + "_Relevant_10.html", "w") as zout:
    zout.write(zht6)

Display the smaller html as a webpage here.

In [None]:
display(HTML(zhtd))