## COVID-19 Related Papers

In [221]:
%load_ext autoreload
%autoreload 2
MAX_COLWIDTH = 200
MAX_ROWS = 2000

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\dwight\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [222]:
import pandas as pd
pd.options.display.max_colwidth = MAX_COLWIDTH
pd.options.display.max_rows = MAX_ROWS
from cord.cord19 import ResearchPapers

In [224]:
research_papers = ResearchPapers.from_data_dir()
research_papers.save()
#research_papers = ResearchPapers.from_pickle()

Loading metadata from data\CORD-19-research-challenge
Cleaning metadata
Fixing dates that are a list e.g. ['2020-02-05', '2020-02']
Fixing dates with the seasons e.g. 2014 Autumn
Fix dates like 2016 Nov 9 Jan-Feb
Fix dates like 2012 Jan-Mar
Convert Dates like 2020 Apr 13
Converting Dates like 2020 Apr
Converting Dates like 2020
Converting Dates like 2020-01-21
Indexing research papers
Finished Indexing in 46.0 seconds
Saving to data\ResearchPapers.pickle
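
The log above lists the date formats that get cleaned while the metadata loads. The following is a minimal sketch of how such normalization could be done; it is not the `cord` library's implementation. The names `normalize_date`, `SEASON_MONTH`, `MONTH_NUMBER`, `ISO_DATE` and `TEXT_DATE` are hypothetical, and the mapping of seasons to months is an assumption.

```python
import re
import pandas as pd

# Assumption: a season maps to its first month (northern hemisphere).
SEASON_MONTH = {"spring": 3, "summer": 6, "autumn": 9, "fall": 9, "winter": 12}
MONTH_NUMBER = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

ISO_DATE = re.compile(r"^(\d{4})-(\d{1,2})(?:-(\d{1,2}))?")
TEXT_DATE = re.compile(r"^(\d{4})(?:\s+([A-Za-z]{3,}))?(?:\s+(\d{1,2}))?")


def normalize_date(raw):
    """Parse one messy publication date into a Timestamp, or NaT if unparseable."""
    text = str(raw).strip()
    # "['2020-02-05', '2020-02']" -> keep the first entry of the list
    if text.startswith("["):
        entries = re.findall(r"'([^']+)'", text)
        if not entries:
            return pd.NaT
        text = entries[0]
    iso = ISO_DATE.match(text)
    if iso:
        # "2020-01-21" or "2020-01"; a missing day defaults to 1
        year, month, day = iso.group(1), iso.group(2), iso.group(3) or 1
    else:
        # "2020", "2020 Apr", "2020 Apr 13", "2014 Autumn", "2012 Jan-Mar";
        # anything after the leading year / month / day is ignored
        textual = TEXT_DATE.match(text)
        if not textual:
            return pd.NaT
        year, word, day = textual.group(1), textual.group(2), textual.group(3) or 1
        word = (word or "jan").lower()
        month = SEASON_MONTH.get(word) or MONTH_NUMBER.get(word[:3], 1)
    try:
        return pd.Timestamp(year=int(year), month=int(month), day=int(day))
    except ValueError:
        # e.g. month 13 or day 31 in a 30-day month
        return pd.NaT


for raw in ["['2020-02-05', '2020-02']", "2014 Autumn", "2016 Nov 9 Jan-Feb",
            "2012 Jan-Mar", "2020 Apr 13", "2020 Apr", "2020", "2020-01-21"]:
    print(raw, "->", normalize_date(raw))
```

Missing months and days default to the first of the period in this sketch, which may explain why many papers in the corpus appear with dates such as `2020-01-01`.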


## When was SARS-CoV-2 first noticed?

In [225]:
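metadata = research_papers.metadata
# na=False counts papers with a missing abstract as non-matches, so the mask stays boolean
has_wuhan = metadata.abstract.str.contains('Wuhan', na=False)
before_nov19 = metadata.published < '2019-11-01'
cols = ['title', 'abstract', 'published', 'doi', 'sha']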

### Which papers mention Wuhan?

In [226]:
wuhan_papers = metadata.loc[has_wuhan & (~before_nov19), cols].sort_values(['published'])
wuhan_papers

| | title | abstract | published | doi | sha |
|---|---|---|---|---|---|
| 18583 | Coronaviruses: a paradigm of new emerging zoonotic diseases | A novel type of coronavirus (2019-nCoV) infecting humans appeared in Wuhan, China, at the end of December 2019. Since the identification of the outbreak the infection quickly spread involving in o... | 2019-12-01 | 10.1093/femspd/ftaa006 | |
| 19351 | Surgical management strategies for orthopedic trauma patients under epidemic of novel coronavirus pneumonia | With the outbreak of novel coronavirus pneumonia (NCP) induced by 2019 novel coronavirus (2019-nCoV) in Wuhan, Hubei Province in December 2019, more and more suspected or confirmed cases have been... | 2020-01-01 | | |
| 19237 | Analysis of 8 274 cases of new coronavirus nucleic acid detection and co-infection in Wuhan | Objective To investigate the positive rate for 2019-nCoV tests and co-infections in Wuhan district. Methods A total of 8 274 cases in Wuhan were enrolled in this cross-sectional study during Janua... | 2020-01-01 | | |
| 19233 | High resolution CT features of novel coronavirus pneumonia in children | Objective To investigate the high resolution CT (HRCT) features of novel coronavirus pneumonia (NCP) in children . Methods A retrospective analysis was performed on the chest HRCT findings of 22 c... | 2020-01-01 | | |
| 18772 | The 2019 novel coronavirus resource | An ongoing outbreak of a novel coronavirus infection in Wuhan, China since December 2019 has led to 31,516 infected persons and 638 deaths across 25 countries (till 16:00 on February 7, 2020). The... | 2020-01-01 | 10.16288/j.yczz.20-030 | |
| 18784 | Early Transmissibility Assessment of a Novel Coronavirus in Wuhan, China | Between December 1, 2019 and January 26, 2020, nearly 3000 cases of respiratory illness caused by a novel coronavirus originating in Wuhan, China have been reported. In this short analysis, we com... | 2020-01-01 | 10.2139/ssrn.3524675 | |
| 18791 | COVID-19: A New Virus as a Potential Rapidly Spreading in the Worldwide | Covid-19 is a novel virus with high affinity to spread in the community. In December 2019, it was first identified in Wuhan, China. The symptoms are non-specific, so fever, cough, dyspnea, are pro... | 2020-01-01 | 10.22038/jctm.2020.46924.1264 | |
| 19227 | The diagnostic value of joint detection of serum IgMand IgG antibodies to 2019-nCoV in 2019-nCoV infection | Objective To investigate the diagnostic value of immunoglobulin M (IgM) and immunoglobulin G(IgG) antibodies to 2019 Novel Coronavirus (2019-nCoV) in 2019-nCoV infection. Method This is a retrospe... | 2020-01-01 | | |
| 18807 | Expert Recommendations for Tracheal Intubation in Critically ill Patients with Noval Coronavirus Disease 2019 | Coronavirus Disease 2019 (COVID-19), caused by a novel coronavirus (SARS-CoV-2), is a highly contagious disease. It firstly appeared in Wuhan, Hubei province of China in December 2019. During the ... | 2020-01-01 | 10.24920/003724 | |
| 18808 | COVID-19 (Novel Coronavirus 2019) - recent trends | The World Health Organization (WHO) has issued a warning that, although the 2019 novel coronavirus (COVID-19) from Wuhan City (China), is not pandemic, it should be contained to prevent the global... | 2020-01-01 | 10.26355/eurrev_202002_20378 | |
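
The filter above combines a text mask with a negated date mask. The sketch below reproduces that pattern on a small made-up DataFrame (the titles, abstracts and dates are invented for illustration) to show two edge cases: missing abstracts, and missing publication dates. Comparisons against `NaT` evaluate to `False`, so negating `before_nov19` lets undated papers through, which is presumably why the notebook checks `published.isnull()` in the next cell.

```python
import pandas as pd

# Made-up rows, for illustration only
papers = pd.DataFrame({
    "title": ["Paper A", "Paper B", "Paper C", "Paper D"],
    "abstract": ["An outbreak in Wuhan, China", None,
                 "A Wuhan University study of bats", "Samples from a Wuhan market"],
    "published": pd.to_datetime(["2020-01-15", "2020-02-01", "2015-06-01", None]),
})

# na=False turns missing abstracts into False instead of NaN
mentions_wuhan = papers["abstract"].str.contains("Wuhan", na=False)
before_nov19 = papers["published"] < "2019-11-01"

# Same shape as the notebook's filter: undated Paper D slips through,
# because NaT < date is False and the negation makes it True
loose = papers.loc[mentions_wuhan & ~before_nov19].sort_values("published")

# Stricter variant: require a known date on or after the cutoff
strict = papers.loc[mentions_wuhan & (papers["published"] >= "2019-11-01")]

print(loose["title"].tolist())   # ['Paper A', 'Paper D']
print(strict["title"].tolist())  # ['Paper A']
```

Which variant is appropriate depends on whether undated papers should count as recent; the notebook's version keeps them.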


In [227]:
wuhan_papers[wuhan_papers.published.isnull()];

In [228]:
since_sars2 = research_papers.since_sarscov2()
sars2_index = since_sars2.index_tokens

In [229]:
sars2_index[:3]

1921    [health, agriculture, farming, sector, share, livestock, synopsis, oriented, problems, risks, solutions, impacts, disproportionate, require, include, huge, related, benefits, human, negative, posi...
1922    [compounds, structures, procedure, various, scientific, brown, antioxidant, potentially, effects, novel, antitumor, seaweeds, useful, nutraceuticals, show, protective, isolated, pharmaceuticals, m...
1928    [propaganda, continue, ability, regional, may, installed, business, heavily, although, subsidies, interests, control, rely, monopolistic, weakened, digital, problems, publishers, financial, houses...
Name: abstract, dtype: object

In [230]:
from collections import Counter, defaultdict

word_map = sars2_index.tolist()

In [231]:
from collections import Counter
from cord.text import clean, tokenize

def get_word_count(research_papers):
    """Count token occurrences across all abstracts, keeping words seen more than once."""
    index_tokens = research_papers.metadata.abstract.apply(clean).apply(tokenize)
    # Tally every token from every abstract
    counts = Counter(word for row in index_tokens for word in row)

    word_counts = pd.DataFrame({'word': list(counts.keys()),
                                'count': list(counts.values())})
    # Most frequent first; ties broken alphabetically
    word_counts = word_counts.sort_values(['count', 'word'], ascending=[False, True])
    return word_counts.query("count > 1").reset_index(drop=True)

post_sars_word_count = get_word_count(since_sars2).rename(columns={'count': 'after'})

In [232]:
post_sars_word_count

| | word | after |
|---|---|---|
| 0 | patients | 3418 |
| 1 | covid-19 | 3023 |
| 2 | coronavirus | 2460 |
| 3 | cases | 2459 |
| 4 | infection | 1886 |
| ... | ... | ... |
| 13930 | δ346-348 | 2 |
| 13931 | τinf | 2 |
| 13932 | τtrans | 2 |
| 13933 | •‒ | 2 |
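
The counting step can be tried without the `cord` package. The sketch below is a simplified stand-in: `simple_tokenize` is a hypothetical replacement for `cord.text.clean` and `tokenize` (whose exact behaviour, such as stop-word handling, is not shown here), and the three abstracts are made up.

```python
import re
from collections import Counter

import pandas as pd


def simple_tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase alphanumeric tokens, hyphens kept."""
    return re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())


def count_words(abstracts, min_count=2):
    """Return a word/count table sorted by count (descending), then word."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(simple_tokenize(abstract))
    table = pd.DataFrame(list(counts.items()), columns=["word", "count"])
    table = table.sort_values(["count", "word"], ascending=[False, True])
    # Keep only words that occur at least min_count times
    return table[table["count"] >= min_count].reset_index(drop=True)


# Made-up abstracts, for illustration only
abstracts = [
    "COVID-19 patients in Wuhan",
    "Patients with COVID-19 pneumonia",
    "Pneumonia cases in Wuhan patients",
]
result = count_words(abstracts)
print(result)
```

With this toy input, `patients` ranks first with 3 occurrences, and words seen only once (`with`, `cases`) are dropped, mirroring the `count > 1` filter in `get_word_count`.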


In [233]:
before_sars = research_papers.query("published < '2019-11-30' & published > '2018-11-30'")
pre_sars_word_count = get_word_count(before_sars).rename(columns={'count': 'before'})

In [241]:
since_sars2.covid_related()

| | title | abstract | journal | source | authors | has_text | published | when |
|---|---|---|---|---|---|---|---|---|
| 2812 | Severe Middle East Respiratory Syndrome (MERS) Pneumonia | Middle East Respiratory Syndrome (MERS) is a viral respiratory infection, which ranges from asymptomatic infection to severe pneumonia and multiorgan failure, caused by a novel coronavirus named ... | Reference Module in Biomedical Sciences | Elsevier | Alenazi, Thamer H.; Arabi, Yaseen M. | True | 2019-12-31 | 2 months ago |
| 5145 | Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: A meta-analysis | Background Coronavirus disease 2019 (COVID-19) is a novel infectious disease with lack of established laboratory markers available to evaluate illness severity. In this study, we investigate whet... | Clinica Chimica Acta | Elsevier | Lippi, Giuseppe; Plebani, Mario; Michael Henry, Brandon | True | 2020-03-13 | 1 week ago |
| 6831 | Attenuation of a virulent swine acute diarrhea syndrome coronavirus strain via cell culture passage | Swine acute diarrhea syndrome coronavirus (SADS-CoV) is a newly identified enteric alphacoronavirus that causes fatal diarrhea in newborn piglets in China. Here, we propagated a virulent strain S... | Virology | Elsevier | Sun, Y.; Cheng, J.; Luo, Y.; Yan, X.L.; Wu, Z.X.; He, L.L.; Tan, Y.R.; Zhou, Z.H.; Li, Q.N.; Zhou, L.; Wu, R.T.; Lan, T.; Ma, J.Y. | True | 2019-12-31 | 2 months ago |
| 6839 | Characterization and evaluation of the pathogenicity of a natural recombinant transmissible gastroenteritis virus in China | Porcine transmissible gastroenteritis virus (TGEV) is one of the major etiological agents of viral enteritis and fetal diarrhea in suckling piglets. In this study, a TGEV JS2012 strain was isolat... | Virology | Elsevier | Guo, Rongli; Fan, Baochao; Chang, Xinjian; Zhou, Jinzhu; Zhao, Yongxiang; Shi, Danyi; Yu, Zhengyu; He, Kongwang; Li, Bin | True | 2020-06-30 | in 3 months |
| 7218 | Chapter Four Structural insights into coronavirus entry | Coronaviruses (CoVs) have caused outbreaks of deadly pneumonia in humans since the beginning of the 21st century. The severe acute respiratory syndrome coronavirus (SARS-CoV) emerged in 2002 and ... | Advances in Virus Research | Elsevier | Tortorici, M. Alejandra; Veesler, David | True | 2019-12-31 | 2 months ago |
| 7949 | A novel coronavirus outbreak of global health concern | A novel coronavirus outbreak of global health concern | The Lancet | Elsevier | Wang, Chen; Horby, Peter W; Hayden, Frederick G; Gao, George F | True | 2020-02-21 | 1 month ago |
| 7951 | COVID-19: what is next for public health? | COVID-19: what is next for public health? | The Lancet | Elsevier | Heymann, David L; Shindo, Nahoko | True | 2020-02-28 | 3 weeks ago |
| 7954 | Preparedness is essential for malaria-endemic regions during the COVID-19 pandemic | Preparedness is essential for malaria-endemic regions during the COVID-19 pandemic | The Lancet | Elsevier | Wang, Jigang; Xu, Chengchao; Wong, Yin Kwan; He, Yingke; Adegnika, Ayôla A; Kremsner, Peter G; Agnandji, Selidji T; Sall, Amadou A; Liang, Zhen; Qiu, Chen; Liao, Fu Long; Jiang, Tingliang; Krishna... | True | 2020-03-17 | 7 days ago |
| 7956 | COVID-19 and Italy: what next? | The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has already taken on pandemic proportions, affecting over 100 countries in a matter of weeks. A global response to prepa... | The Lancet | Elsevier | Remuzzi, Andrea; Remuzzi, Giuseppe | True | 2020-03-13 | 1 week ago |
| 8336 | Clinical and computed tomographic imaging features of novel coronavirus pneumonia caused by SARS-CoV-2 | Purpose To investigate the clinical and imaging characteristics of computed tomography (CT) in novel coronavirus pneumonia (NCP) caused by SARS-CoV-2. Materials and methods A retrospective analys... | Journal of Infection | Elsevier | Xu, Yu-Huan; Dong, Jing-Hui; An, Wei-Min; Lv, Xiao-Yan; Yin, Xiao-Ping; Zhang, Jian-Zeng; Dong, Li; Ma, Xi; Zhang, Hong-Jie; Gao, Bu-Lang | True | 2020-02-25 | 4 weeks ago |


In [186]:
word_counts = post_sars_word_count.merge(pre_sars_word_count, on=['word'], how='left').fillna(0)
word_counts.before = word_counts.before.astype(int)
word_counts['before_pct'] = (word_counts.before / word_counts.before.sum()) * 100
word_counts['after_pct'] = (word_counts.after / word_counts.after.sum()) * 100
word_counts['pct_diff'] = word_counts.after_pct - word_counts.before_pct
word_counts = word_counts[word_counts.pct_diff > 0]


## Which words are common in SARS-CoV-2 research papers?

In [187]:
word_counts.sort_values(['pct_diff'], ascending=False).head(100)

| | word | after | before | before_pct | after_pct | pct_diff |
|---|---|---|---|---|---|---|
| 1 | covid-19 | 3023 | 0 | 0.0 | 0.944838 | 0.944838 |
| 0 | patients | 3418 | 1205 | 0.399274 | 1.068295 | 0.669022 |
| 2 | coronavirus | 2460 | 410 | 0.135852 | 0.768873 | 0.63302 |
| 3 | cases | 2459 | 613 | 0.203116 | 0.76856 | 0.565444 |
| 6 | china | 1771 | 291 | 0.096422 | 0.553526 | 0.457104 |
| 10 | 2019-ncov | 1388 | 0 | 0.0 | 0.433819 | 0.433819 |
| 14 | wuhan | 1213 | 4 | 0.001325 | 0.379123 | 0.377798 |
| 17 | sars-cov-2 | 1174 | 0 | 0.0 | 0.366933 | 0.366933 |
| 13 | outbreak | 1220 | 364 | 0.12061 | 0.381311 | 0.2607 |
| 11 | novel | 1380 | 515 | 0.170644 | 0.431319 | 0.260675 |
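
The ranking above comes from comparing each word's share of all counted words before and after the outbreak. The sketch below repeats that calculation on made-up counts so each step can be checked by hand; the function name `frequency_shift` and the numbers are hypothetical.

```python
import pandas as pd

# Made-up counts, for illustration only
after = pd.DataFrame({"word": ["covid-19", "patients", "wuhan"], "after": [30, 50, 20]})
before = pd.DataFrame({"word": ["patients", "vaccine"], "before": [40, 60]})


def frequency_shift(after, before):
    """Rank words by how much their share of all counted words grew."""
    # Left merge keeps every 'after' word; words unseen before get a count of 0
    merged = after.merge(before, on="word", how="left").fillna({"before": 0})
    merged["before"] = merged["before"].astype(int)
    # Each count as a percentage of its column total
    merged["before_pct"] = merged["before"] / merged["before"].sum() * 100
    merged["after_pct"] = merged["after"] / merged["after"].sum() * 100
    merged["pct_diff"] = merged["after_pct"] - merged["before_pct"]
    # Keep only words whose share increased, largest increase first
    return merged[merged["pct_diff"] > 0].sort_values("pct_diff", ascending=False)


shift = frequency_shift(after, before)
print(shift[["word", "pct_diff"]])
```

One property of this approach, shared with the notebook cell, is that the `before` total only includes words that survive the left merge. In the toy data, `vaccine` is dropped, so `patients` accounts for 100% of the remaining `before` counts and ends up with a negative `pct_diff`.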
