Introduction:
===========
**Coronavirus Disease 2019 (COVID-19)** was first identified in December 2019 in Wuhan, the capital of China's Hubei province, and has since spread to become a global pandemic.

What is COVID-19:
-----------------
COVID-19 is a respiratory illness in humans caused by a novel coronavirus, SARS-CoV-2.

How does it spread:
----------------------------
According to the World Health Organization, as of 15/04/2020: "This disease can spread from person to person through small droplets from the nose or mouth which are spread when a person with COVID-19 coughs or exhales. These droplets land on objects and surfaces around the person. Other people then catch COVID-19 by touching these objects or surfaces, then touching their eyes, nose or mouth. People can also catch COVID-19 if they breathe in droplets from a person with COVID-19 who coughs out or exhales droplets."

The question on everyone's mind is **"How could we have prevented the spread?"**
We hope the research can help answer this.

What can we do to stop it:
--------
*For the purpose of this challenge we will be focusing on Nonpharmaceutical Interventions in preventing the spread of COVID-19.*

As per the CDC, **Nonpharmaceutical Interventions (NPIs)** are actions, apart from getting vaccinated and taking medicine, that people and communities can take to help slow the spread of illnesses like pandemic influenza (flu).

NPIs are typically broken up into three categories, listed here with examples:

1. Personal:
    - Staying home when you are sick
    - Covering coughs and sneezes with a tissue
    - Washing hands with soap and water, or using hand sanitizer when soap and water are not available
2. Community:
    - Social distancing
    - Closures (child care centers, schools, places of worship, sporting events, etc.)
3. Environmental:
    - Routine surface cleaning

The goal of our submission is to analyze the research done on the implementation, efficacy and scaling of NPIs, isolate these papers from other COVID-19 research articles, and derive meaningful insights to help us mitigate the existing situation and prepare our systems for any future pandemic.



In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
import json

In [2]:
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 50)
pd.set_option('display.width', 100)
# None means "no truncation"; the older -1 value is deprecated in pandas.
pd.set_option('display.max_colwidth', None)


In [3]:
# Load the data CSV exported from the Kaggle notebook (instead of a pickle)
df = pd.read_csv('data.csv')

In [4]:
len(df)

1301

In [5]:
df.head()

Unnamed: 0.1,Unnamed: 0,paper_id,title,author_list,abstract,body_text,doi
0,0,b891efc6e1419713b05ff7d89b26d260478c28df,"6756 tuberculosis prevention in healthcare workers in china years after the severe acute respiratory syndrome pandemic\r\nName: title, dtype: object",[],"china has the world's second largest tuberculosis (tb) burden after india [1] . healthcare workers (hcws) in china's national tb control programme have a significantly increased level of exposure to tb disease and patients. after the severe acute respiratory syndrome pandemic in 2003, the chinese government made increased efforts to protect hcws from nosocomial tb infections and other respiratory infectious diseases, especially for those hcws working in chest hospitals and infectious disease hospitals. these efforts (although not universal throughout china) included the increased use of biosafety level 3 (bsl3) laboratories and respiratory isolation through negative pressure wards and rooms, and the practice of standardised biosafety protocols and procedures through continued training and education. systematic tb screening in hcws can improve their awareness of tb infection [2] . the tuberculin skin test (tst) has been widely used for the detection of tb infection for decades but due to cross-reactivity with the bacille calmette-guérin (bcg) vaccination and exposure to non-tb mycobacteria, the tst fails to provide sufficient specificity for the detection of tb infection [2] . in the past decade, interferon-γ (ifn-γ) release assays have been developed and commercialised to measure the in vitro release of ifn-γ from effector t-cells stimulated with the mycobacterium tuberculosis-specific antigens (the 6-kda early secretory antigenic target and 10-kda culture filtrate protein) as an aid to diagnosing tb infection. 
these new screening assays (quantiferon-tb gold in-tube (qiagen, venlo, the netherlands) and t-spot.tb (oxford immunotec, abingdon, uk)) have demonstrated higher specificity and at least comparable sensitivity for the diagnosis of tb infection compared with the tst [3].","the goal of the present study was to investigate and describe latent tb infection (ltbi) among hcws in china, and the effectiveness of infection control strategies in tb hospitals using the t-spot.tb assay and tst. this study was conducted in two major primary referral hospitals for tb patients and other respiratory infectious diseases in shandong province, china, in 2013: the shandong provincial chest hospital (spch), jinan; and linyi municipal chest hospital (lmch), linyi. approximately 3000 patients with active tb disease are treated in each hospital annually. since 2004, both hospitals have been equipped with bsl3 laboratories that are used for mycobacterial specimen testing (acid-fast bacillus (afb) smear test, culture and species identification) and respiratory isolation wards with negative pressure rooms for smear-positive tb patients. as recommended by china's national tuberculosis control programme, the tst was carried out by the mantoux method using a 5 tuberculin unit dose of bcg purified protein derivative (chengdu institute of biological products, chengdu, china). an induration of ⩾10 mm was defined as a positive tst. a blood sample for the t-spot.tb assay was drawn from each hcw before the tst was performed. t-spot.tb assays were carried out and interpreted according to the manufacturer's instructions. spot counts were analysed using an elispot reader (ctl-immunospot s5 core analyzer; cellular technology ltd, shaker heights, oh, usa). covariates (age, sex, working year and job category) associated with positive t-spot.tb results were analysed by univariate and multivariate analyses. univariate p-values were determined for each covariate by logistic regression. 
covariates with univariate p-values <0.20 were included in multivariate logistic regression models. statistical analysis was performed using stata/se version 13.1 (statacorp lp, college station, tx, usa). the study protocol was approved by the institutional review boards of spch and lmch. 1141 hcws (median age 35 years, range 19-70 years; physicians 18.1%, nurses 42.3%, laboratory staff 4.3%, medical technicians 10.9%, hospital logistic staff 12.7%, administrative staff 11.7%) out of 1334 total employees of both hospitals were enrolled and screened for tb infection in this study, after 66 were excluded due to age <18 years, pregnancy, tuberculin allergy or past tb disease history, and 127 refused to participate (figure 1). 935 (82.9%) out of 1141 hcws had bcg vaccination scars and 1071 (94.0%) reported having been exposured to active tb. of 1141 hcw participants, 70 (6.1%) were only given the t-spot.tb assay, 186 (16.3%) received the tst alone and six had an indeterminate or failed t-spot.tb assay (figure 1). a total of 879 hcws (median age 34 years, mean age 35.8 years; range 19-70 years) had both tst and t-spot.tb results available for analysis. the overall positivity rate of t-spot.tb testing was 13.9% (122 out of 879; 95% ci 11.7-16.3%), while the tst positivity rate was 49.3% (433 out of 879; 95% ci 45.9-52.5%). the agreement between tst and t-spot.tb test results was 51.9% (95% ci 48.6-55.2%; κ=0.027). results for univariate analysis showed that t-spot.tb positivity was only associated with older age and longer duration of healthcare service. in multivariate models that included both subject age and longer duration of healthcare service, only duration of healthcare service remained significantly associated with positive test results. specifically, subjects with >10 years of service had the highest odds of a positive test result (univariate or 2.25, 95% ci 1.53-3.31; >10 years versus ⩽10 years of service). 
our data suggests that the highly specific t-spot.tb assay had a significantly lower rate of positivity among hcws at the spch and lmch than the tst and when compared with other hospital-based studies in beijing, china (13.9% versus 33.6% at beijing chest hospital (bch) and 28.6% in a tertiary general hospital in beijing) [4, 5] . compared to the data from bch, both study hcw populations (spch/ lmch versus bch) share similarities in distribution of age and sex and occupational levels of exposure to m. tuberculosis. in addition, data from the 2011 chinese national tb surveillance summary suggest the reported prevalence of tb disease in shandong province (spch/lmch) and beijing (bch) are also similar at 15-44 cases per 100 000 population [6] . thus, the only major discernible difference between spch/lmch and bch is the lack of an iso15189-certified tb reference laboratory and respiratory isolation wards with negative pressure rooms for afb smear-positive tb patients at the bch. in conclusion, a proactive and comprehensive strategy for nosocomial infection control practices plays a pivotal role in protecting hcws from tb infection, which should be implemented in all chest hospital and infectious disease settings in china. the importance of effective measures against nosocomial tb infection has been clearly emphasised by the world health organization stop tb strategy and recent policy on infection control [7] . towards the goal of tb elimination, prevention and treatment of ltbi are essential elements of global tb control efforts",[['10.1183/23120541.00015-2015']]
1,1,7a5d731cd59597da7bcef171af936c0af4d7528c,epidemic and case investigations surveillance operation for the 141st confirmed case of middle east respiratory syndrome coronavirus in response to the patient's prior travel to jeju island,"['Open Access', 'Jong-Myon Bae']","the provincial government of jeju, south korea, was notified that a 42-year-old man infected with the middle east respiratory syndrome (mers) coronavirus had gone sightseeing in jeju island. although the visiting period might be interpreted as the incubation period of mers, the province decided to conduct active surveillance to prevent a worst-case scenario. based on the channel of movement of the patient, healthy isolation and active monitoring were conducted for persons who came in contact with the patient. during the active surveillance, none of the 56 persons in self-isolation and 123 persons under active monitoring became infected. this fact supports that mers is not contagious during the incubation period.","a 42-year-old man who was residing in seoul had a sudden fever at a body temperature of 38°c at dawn on june 10, 2015. after recalling that he visited the outpatient clinic of samsung medical center in seoul with his father on may 27, during which the 14th confirmed patient with middle east respiratory syndrome (mers), who was considered a super spreader, was staying at the hospital, he suspected having mers infection [1] and voluntarily reported to health authorities. he thus became the 141st patient with confirmed diagnosis of mers during the outbreak in south korea after may 20, 2015 ( figure 1 ). while conducting epidemiological investigation of the confirmed case according to the 2015 mers response guidelines [2] , disease prevention authorities became aware of two new facts. first, the patient visited a nearby family medicine clinic on may 29 and june 1 because of cold symptoms after may 23. 
second, from june 5 to june 8, 12 family members (8 adults and 4 children) from 4 families, including the confirmed patient's family, went sightseeing to jeju island on vacation. such epidemiological results can lead to two interpretations based on the 2-day to 14-day mers incubation period [3] . first, the patient started to experience symptoms 14 days, which is the maximum incubation period of mers, after the contact date, may 27. this means that he traveled to jeju island during the incubation period. second, the patient visited an outpatient family medicine clinic 2 and 5 days after the contact date, which can be considered as the minimum incubation period of mers. in other words, the 141st confirmed mers patient traveled to jeju island while he was contagious condition. the centers for disease control and prevention placed more weight on the latter interpretation and urgently alerted the jeju provincial government for proactive mers prevention on the midnight of june 18. upon notification, the jeju mers prevention headquarters (hereafter, ""the headquarters"") decided to respond proactively and preemptively in order to prevent the most serious of potential scenarios. as such, the following surveillance operations were conducted: 1) additional epidemiological investigation to determine the travel channel of the family members and other companions of the patient while the 141st confirmed patient was being isolated in seoul, and his family members and travel companions were being quarantined at home after being categorized as close contacts, the headquarters found out the full details of their travel channels from the time they entered jeju island to the time they left through wired communication. according to the information obtained, two actions were initiated. 
first, based on the fact that the patient entered the island at 4:00 pm on june 5 through an airport and left at 3:00 pm on june 8, the maximum incubation period of 14 days was from june 19 to june 22, which would be the focus surveillance period. however, the headquarters decided to conduct surveillance until the dawn of june 30 to address the possibility of a patient's delayed awareness of the infection owing to the weak clinical symptoms. second, through closed circuit television (cctv) recordings of their travel channel, the contacts were categorized into close and general contacts. 2) self-quarantine and active surveillance and monitoring of the contacts lodging and restaurant staffs who were confirmed to be close contacts through travel channel determination underwent selfquarantine. meanwhile, the general contacts who were confirmed based on cctv recordings underwent active monitoring. guests who stayed on the same floor of the same lodging facility underwent active monitoring as well. 3) encouragement of suspicion notification and active response meanwhile, through the local media in jeju island, the travel channel of the confirmed case was released to the public, and individuals were encouraged to immediately report if they were at the same place at the same time as the patient and experienced mers-related symptoms. mers confirming test was conducted 2 times with epidemiologic investigation for those who reported suspicion of being infected, along with the confirmation of the epidemiological link. on june 18, when the surveillance operation started, monitoring began with 179 individuals, including the 56 individuals who were designated to self-quarantine and the 123 individuals under active monitoring. however, all showed negative results by midnight of june 30, when the self-quarantine and monitoring ended. meanwhile, 91 cases of suspected infection were notified during the period, but all showed negative results in the mers test. 
therefore, it was concluded that none of the jeju citizens became infected after being in contact with the 141st confirmed patient with mers during incubation period. the 141st confirmed patient with mers visited jeju island during the incubation period, but none of the jeju citizens who came in contact with him was diagnosed with mers. in addition, the family members of the 141st confirmed patient and his travel companions, who were considered close contacts, all tested negative for mers and were subsequently informed to terminate home quarantine. such results are epidemiological evidence that mers is not contagious during its incubation period [4] . meanwhile, through wired communication to his family members and travel companions on the first day of the surveillance operation, facts emerged that supported that the 141st confirmed case of mers was in its incubation period when the patient visited jeju island. first, according to the testimony of his family members and travel companions, he recovered from the cold symptoms that started on may 23 and traveled to jeju island in a healthy state on june 3. second, he did not show any fever according to the fever detection device installed at the entrance of the airport and lodging facility. in particular, lodging staff testified that they did not observe any abnormalities in the patient. third, none of the family members and travel companions who made close contact with him was diagnosed with mers. therefore, in a press conference for jeju citizens on june 18, it was announced that the possibility of the patient being a source of infection was extremely low. although it was fortunate that none of the jeju citizens was diagnosed with mers, this incidence was reported to have caused jeju island a loss of 150 billion won in june [5] . 
considering the industrial economic structure and geographic environment of jeju island, the depicted mers incidence shows that operation of a center that can proactively and preemptively monitor external infectious diseases is urgently needed [6] .",[['10.4178/epih/e2015035']]
2,2,854545208a6bc7cafa8744604899e32237ff2047,highly pathogenic avian influenza a virus h5n1 non-structural protein 1 is associated with apoptotic activation of the intrinsic mitochondrial pathway,"['Qian Bian', 'Jing Lu', 'L I Zhang', 'Ying Chi', 'Yan Li', 'Hongxiong Guo']","non-structural protein 1 (ns1) is an essential virulence factor of the highly pathogenic h5n1 avian influenza virus and of the apoptosis associated with the pathogenesis of h5n1. previous studies have revealed that the ns1 protein is able to induce apoptosis via an extrinsic pathway. however, it remains unclear whether the intrinsic pathway is also associated with this apoptosis. the present study used a clone of the ns1 gene from avian influenza a/jiangsu/1/2007 and observed the localization of the ns1 protein and cytochrome c release from mitochondria and the change of mitochondrial membrane potential (mmp) in lung cancer cells. cytotoxicity was detected using an mtt assay and the number of apoptotic cells was counted using a flow cytometer. following the isolation of mitochondria, western blotting was performed to compare cytochrome c release from the mitochondria in cells before and after apoptosis. the change of mmp was detected using jc-1 staining. furthermore, the results reveal that the majority of the ns1 protein was localized in the cell nucleus, and that it may induce apoptosis of human lung epithelial cells. the apoptosis occurred with marked cytochrome c release from mitochondria and a change of the mmp. this indicated that the ns1 protein may be associated with apoptosis induced by an intrinsic mitochondrial pathway.","avian influenza a (h5n1) virus is a highly pathogenic contagious agent that causes severe impairment in poultry and humans, particularly limited person-to-person transmission (1) . 
the cumulative number of confirmed human cases of h5n1 from 14 countries between november 2003 and july 2014 reached 667, 393 of which were fatal according to a report issued by the world health organization (2). in total, 47 individuals infected with h5n1 were identified in china and 30 (63.8%) succumbed to the h5n1 infection (2). h5n1 causes primary viral pneumonia with rapid progression to lung failure following invasion of epithelial cells in the upper and lower respiratory tracts (3) . however, the exact mechanism for elucidating the severity of human h5n1 infection remains unclear. previously, apoptosis was not only observed in the alveolar epithelial cells of 2 patients who succumbed to h5n1 infection, but was also induced by h5n1 in numerous cell types in vivo and in vitro (4) (5) (6) . these results indicated that apoptosis may be important in h5n1 pathogenesis in the human body. at present, two main apoptotic pathways have been documented; the tumor necrosis factor (tnf) receptor-mediated extrinsic pathway and the intrinsic pathway mediated by mitochondria/cytochrome c (7) . the non-structural protein 1 (ns1), which is encoded by the influenza virus ns segment, is able to alter the host response and virulence of the virus in the case of reassortment without prior adaptation (8, 9) , and is associated with apoptosis regulation in mammalian cells. previous studies revealed that the h5n1 ns1 protein induced the tnf-mediated extrinsic apoptosis pathway in human alveolar basal epithelial cells (10) (11) (12) (13) . however, the susceptible cell lines of the various pathogenic avian influenza viruses are different, which causes them to vary in their responses to apoptosis (14, 15) . as mentioned above, apoptosis induced by the h5n1 ns1 protein may vary in various cell lines. to further investigate whether the other apoptotic pathways induced by h5n1 ns1 protein exist, h5n1 ns1 protein was used to induce the human lung epithelial cell line, nci-h292. 
construction of an ns1-expressing plasmid. highly pathogenic avian influenza a/jiangsu/1/2007 (h5n1) viral rna was extracted from supernatants of infected cell cultures for use as a polymerase chain reaction (pcr) template for amplifying the ns1 gene. total rna was extracted from cell lysate using the qiaamp viral rna mini kit (qiagen, hilden, germany) according to the manufacturer's instructions. the full-length ns1 gene was amplified using the superscript iii one-step reverse transcription-pcr (rt-pcr) system with platinum taq high-fidelity polymerase (invitrogen; thermo fisher scientific, inc., waltham, ma, usa) from h5n1 virus cdna. the sense and antisense primers used for ns1 (eu434690) were 5'-gtg ctc gag atg gat tcc aac act gtg tca-3' and 5'-cac ggt acc tca aac ttc tga ctc aat tgt-3', respectively. the pcr conditions were 95˚c for 15 min, followed by 34 cycles of 94˚c for 30 sec, 58˚c for 30 sec, and 72˚c for 30 sec. the cloning insert was ligated into the pmd18-t vector (d101a; takara biotechnology co., ltd., dalian, china) by quick ligase (m2200s; neb beijing ltd., beijing, china) incubating for 30 min at room temperature. pmd18-t-ns1 was subcloned into the expression plasmid pxj40-hemagglutinin (ha) (invitrogen; thermo fisher scientific, inc.) using xhoi (d1094a) and kpni (d1068a) (both from takara biotechnology co., ltd., dalian, china) sites to produce the recombinant ha-tagged construct, pxj40-ha-ns1. the construction of plasmid pxj40-ha-ns1 followed standard cloning procedures. competent escherichia coli top10 cells (invitrogen; thermo fisher scientific, inc.) were transformed using pxj40-ha-ns1 plasmids, and the plasmids were amplified and purified using a high-purity plasmid purification kit (invitrogen; thermo fischer scientific, inc.). clones were then screened by restriction enzyme digestion and sequence analysis using the version 3.1 bigdye terminator ready reaction cycle sequencing kit (applied biosystems; thermo fischer scientific, inc.) 
according to the manufacturer's instructions. cell line culture and transient transfection. non-small cell lung cancer cell lines, nci-h1299 and nci-h292 (cell bank; shanghai institutes for biological sciences, chinese academy of sciences; shanghai china), were separately grown as a monolayer in dulbecco's modified eagle's medium (dmem) (invitrogen; thermo fischer scientific, inc.) supplemented with 10% fetal bovine serum (invitrogen thermo fischer scientific, inc.) at 37˚c and in a 5% co 2 incubator. the two cell lines were used for different purposes, nci-h1299 was used to observe the localization of the ns1 protein in the cell whereas nci-h292 was used to confirm the extent of apoptosis induced by the ns1 protein. nci-h1299 and nci-h292 were transfected with pxj40-ha-ns1 or control plasmids (pxj40-ha-vector; invi trogen; thermo fischer scientific, inc.) using lipofectamine 2000 reagent according to the manufacturer's instructions (invitrogen; thermo fisher scientific, inc). after 4 h, lipofectamine 2000-dna complexes were removed, and the cell culture dmem medium was replaced with fresh dmem with or without 0.025 nm staurosporine (sts; sigma-aldrich; merk kgaa, darmstadt, germany) at 37˚c for 24 h. cells were collected after 24 h, washed with phosphate-buffered saline (pbs) and trypsinized with 0.125% trypsin/edta solution. immunofluorescence staining. nci-h1299 cells were fixed in 4% (w/v) paraformaldehyde at room temperature for 30 min and permeablized in 0.5% (w/v) triton x-100, followed by incubation with primary and secondary antibodies for 1 h at room temperature sequentially. anti-ha serum (ah158; beyotime institute of biotechnology, haimen, china) with 1:200 was used for the control and alexa 488-conjugated secondary antibody (1:500, a-11017; invitrogen; thermo fisher scientific, inc.) were used to probe for the ns1 protein at room temperature for 1 h. 
following protein staining, anti-cytochrome c monoclonal antibody (1:1,000, bd556432; bd biosciences, franklin lakes, nj, usa) and alexa 555-conjugated secondary antibody (1:500, a-21427; invitrogen; thermo fisher scientific, inc.) were utilized to probe the nci-h1299 cellular morphology. finally, 4',6-diamidino-2-phenylindole (dapi, 1:1,000, d1306; invitrogen; thermo fisher scientific, inc.) was used to dye the cell nucleus at room temperature for 1 h. triple-fluorescence stained cells were observed with a confocal microscope at a high-power magnification of x100 (fv10-asw, version 01.07.03.00; olympus corporation, tokyo, japan). cell viability assay. mtt cell viability assays were performed according to the manufacturer's instructions (sigma-aldrich; merck kgaa). briefly, 20 µl 5 mg/ml mtt was added to the culture medium. following incubation at 37˚c for 3 h, 100 µl acidic isopropanol (0.1 nm hcl in acidic isopropanol) was added to the nci-h292 cells and then the absorbance of each sample was measured at 570 nm with an automated plate reader (spectramax paradigm; molecular devices, llc, sunnyvale, ca, usa) compared to the same number of control cells. flow cytometric analysis. in order to determine the apoptotic rate, ~1x10 6 nci-h292 cells/ml were stained with fluorescein isothiocyanate (fitc)-conjugated annexin v and propidium iodide (556547, annexin v-fitc apoptosis detection kit, bd pharmingen; bd biosciences) in a volume of 100 µl on ice for 30 min in the dark. the cells were washed 3 times with pbs. finally, 400 µl binding buffer was added to the cells. additionally, the mixture was analyzed with a bd facscalibur flow cytometer and the percentage of apoptosis of 10,000 cells was determined. the data were analyzed using bd cellquest™ pro software (version 5.2.1; bd biosciences) and the percentage cells with apoptosis per group were calculated. determination of mitochondrial membrane potential (mmp). 
at 24 h following transfection, the culture medium of the nci-h292 cells was replaced with dmem that did not include phenol red and was supplemented with 5 mg/l jc-1 dye (beyotime institute of biotechnology) in the dark for 20 min at 37˚c. subsequently, the cells were washed twice with pbs and placed in fresh medium without serum. finally, mmp was analyzed by calculating the ratio of fluorescence intensity at 555-488 nm in triplicate. mitochondria isolation and calculation of cytochrome c release. cytosolic and mitochondrial isolation are performed as described by cheng et al (16) . the percentage of cytochrome c release was calculated using the following formula: the percentage of cytochrome c release that is equal to the amount of cytochrome c in the mitochondrial supernatant/the total amount of cytochrome c in mitochondrial supernatant and pellet. western blot analysis. monolayers of cells transfected with dna or untransfected cells were lysed with ice-cold lysis buffer (150 mm tris-hcl, ph 8.0, 50 mm nacl, 1 mm edta, 0.5% nonidet p-40, 1 tablet complete mini protein inhibitor mixture/10 ml buffer and 0.7 µg/ml pepstatin), and the total protein concentration was determined using the bicinchoninic acid assay. a total of 7 µl proteins with equivalent concentrations (1 µg/µl) were heated for 5 min at 100˚c in lysis buffer containing β-mercaptoethanol, and then resolved using 4-20% sds-page. the proteins were then transferred onto a polyvinylidene difluoride membrane and blocked with 1% powdered skimmed milk in tris-buffered saline with 0.1% tween-20 for 1 h at room temperature. anti-cytochrome c mab mouse antibody (1:200, bd556432; bd biosciences) was then used to probe for cytochrome c overnight at 4˚c. the membrane was also probed for gadph (1:1,000, a2066; sigma-aldrich; merck kgaa) was used as the loading control. membranes were subsequently washed with 150 mm pbs and incubated for 1 h at 4˚c in 10% dried milk in pbs. 
membranes were washed 5 times with pbs and subsequently incubated for 1 h at room temperature with anti-mouse immunoglobulin g horseradish peroxidase-conjugated secondary antibody (1:1,000, zb2307; zsgb bio; origene technologies, inc., rockville, md, usa)., followed by visualization of positive bands with the pierce (thermo fisher scientific, inc.) enhanced chemiluminescence procedure using kodak biomax film. blots were scanned and the protein ratios were calculated using the pdquest program (version 7.4.0; bio-rad laboratories, inc.). the results shown are representative of 3independent experiments. statistical analysis. in the cell viability assays and mmp experiments, all data were expressed as the mean ± standard deviation. statistical significance among groups was assessed using one-way analysis of variance and the tukey test. p<0.05 was considered to indicate a statistically significant difference. finally, statistical analyses were performed using graphpad prism 5.0 (graphpad software, inc., la jolla, ca, usa). the plasmid pxj40-ha-ns1 was transfected into nci-h1299 cells, and the expression and location of the ns1 protein was monitored. the plasmid pxj40-ns1-ha includes the full-length ns gene cloned from influenza h5n1. as demonstrated in fig. 1 , the ns1 protein began to express 24 h following transfection (fig. 1a) , and predominantly remained in the nucleus (fig. 1b-d) . previous studies have revealed that the ns1 protein of h5n1 is able to induce apoptosis in a549 cells and human airway epithelial cells (17, 18) . the present study attempted to clarify whether the expression of ns1 was sufficient to induce apoptosis with or without sts, which was a confirmed apoptosis inducer, and sts (0.025 nm) was used as a positive control. initially, ns1 protein and sts were used to induce cytotoxicity, and the cytotoxicity was then detected under various conditions using the mtt assay. 
compared with cells transfected with empty vector, the viability of ns1-transfected cells was markedly lower (fig. 2) . however, the addition of sts significantly decreased the viability of the cells compared with ns1-transfected cells and sts-treated cells, which suggested an increase in cytotoxicity (p<0.05). to further confirm cell apoptosis, which is associated with ns1 protein expression, the nci-h292 cells expressing ns1 protein were stained with annexin v-fitc and pi, and detected using flow cytometry (fig. 3) . compared with the empty vector-transfected cells (9.19% annexin v + /pi -, indicating early apoptosis, and 4.49% annexin v + /pi + , indicating late apoptosis), the ns1-transfected cells were 22.96% annexin v + /piand 2.62% annexin v + /pi + . following treatment with sts, ns1-transfected cells were 43.35% annexin v + /piand 2.98% annexin v + /pi + , whereas empty vector-transfected cells were 33.47% annexin v + /piand 8.16% annexin v + /pi + . compared with the empty vector-transfected group, the number of apoptotic cells in the ns1-plasmid transfected group was significantly increased (p<0.05). following combinational treatment of ns1 and sts, the number of apoptotic cells was significantly increased compared with the ns1 group (p<0.05). these results indicate that ns1 is associated with the apoptosis of nci-h292 cells, and that the effect of apoptosis induced by ns1 protein may be enhanced by sts. as demonstrated in fig. 4 , ns1 together with sts induced the release of cytochrome c from the mitochondria to the cytosol. it is known that cytochrome c release from the mitochondria to the cytosol is a signal for apoptosis (19) . this implies the possibility of ns1 protein being associated with apoptosis by activating the intrinsic pathway through cytochrome c. in order to confirm this, mmp was initially detected by jc-1 staining. 
normal cells stained with jc-1 emitted mitochondrial orange-red fluorescence with a slight green fluorescence, whereas the ratio of orange-red and green fluorescence was inverted when cell apoptosis was induced by ns1 protein. as shown in fig. 5, in comparison with the empty vector transfected cells group, ns1 transfected cells significantly initiated apoptosis with sts (the ratio of fluorescence intensity at 555/488 nm was ~1.00; p<0.05). however, without sts treatment, the ns1-expressing nci-h292 cells showed no significant difference from the empty vector-transfected cells. to further corroborate the effect of the mitochondria/cytochrome c on the intrinsic apoptosis pathway induced by the ns1 protein, western blotting was performed to detect the expression of cytochrome c in mitochondrial pellets and supernatants. the majority of cytochrome c expressed in mitochondrial pellets was isolated from the cells transfected with empty vector. it is noteworthy that the amount of cytochrome c was markedly decreased in the mitochondrial pellets following ns1-transfection for 24 h, whereas the amount of cytochrome c was markedly increased in the mitochondrial supernatant (fig. 4). these results indicate that ns1 may trigger the release of cytochrome c from mitochondria and that this effect is enhanced by sts. the results of the present study indicate that the ns1 protein of the h5n1 highly pathogenic avian influenza a virus strain is associated with apoptotic activation of nci-h1299 through the intrinsic mitochondrial pathway. the mtt assay and flow cytometric analysis revealed that the ns1 protein induced apoptosis of nci-h1299 cells and that its activity may be enhanced by sts. additionally, the ns1 protein together with sts was able to cause the release of cytochrome c from the mitochondria to the cytosol, and the change of mmp. 
during the intrinsic mitochondrial apoptosis process, the permeabilization of the mitochondrial outer membrane is the critical step, which results in the release of several apoptogenic factors from the intermembrane space of mitochondria (20) (21) (22) . cytochrome c is one of these factors, which binds to the adaptor apoptotic peptidase-activating factor 1 that subsequently recruits cytosolic pro-caspase-9 into a heptameric apoptosome (23) . the intrinsic mitochondrial apoptosis pathway is one of the major pathways during viral pathogenesis that include human immunodeficiency virus and severe acute respiratory syndrome coronavirus (24, 25) . although zhang et al (10) demonstrated that the h5n1 ns1 protein induced caspase-dependent apoptosis in human alveolar basal epithelial cells, neuman et al (20) revealed that the activation of caspase is the common end event during apoptosis. therefore, the study by zhang et al (10) did not have enough evidence to support that the ns1 of h5n1 induced apoptosis through an extrinsic pathway. in the present study, the efficiency of ns1 in inducing apoptosis is lower than that of sts, however, the increasing synergistic effect on inducing apoptosis between sts and ns1 were observed. in addition, in the present study, the plasmid with the ns1 gene was transfected with lipofectamine 2000 and the transfecting efficiency was lower than the penetrating ability of sts. at this point, although the capacity of inducing apoptosis of ns1 is weaker than sts in the present study, the importance of each one needs to be investigated further. previous studies have revealed that the ns1 protein of influenza a viruses is a multifunctional viral protein that modulates the virus replication cycle and viral protein synthesis, and ha and neuropilin-1 (np1) of h5n1 also induce apoptosis of airway epithelial cells (26, 27) . in vivo, there is a need to determine whether the synergetic effect on inducing apoptosis by np1, nucleoprotein and ha exists. 
in particular, these proteins induce apoptosis by either sharing the same pathway or not. in conclusion, the results of the present study reveal that the intrinsic mitochondrial apoptosis pathway is associated with the apoptosis induced by the ns1 protein of h5n1. therefore, this may be a novel mechanism in the ability of highly pathogenic avian influenza a virus h5n1 causing severe impairment in humans.",[['10.3892/etm.2017.5056']]
3,3,e8d81153777e9f92cbc2bc9ea03ed1f0d0b78543,"group b betacoronavirus in rhinolophid bats, japan","['Jin Suzuki', 'Ryota Sato', 'Tomoya Kobayashi', 'Toshiki Aoi', 'Ryô Harasawa']","we report group b betacoronavirus infection in little japanese horseshoe bats in iwate prefecture. we then used reverse-transcription pcr to look for the coronavirus rna-dependent rna polymerase gene in fecal samples collected from 27 little japanese horseshoe bats and found eight were provisionally positive. we successfully sequenced six of the eight positive samples and compared them with those of authentic coronaviruses. we found that these six samples were positive for coronavirus infection, and they belonged to the group b betacoronavirus by phylogenetic analysis. virus isolation using the vero cell culture was unsuccessful. pathogenic trait of these bat coronaviruses remained unexplored.","viruses in the genus betacoronavirus include pathogens that cause emerging pandemic diseases, such as severe acute respiratory syndrome (sars) and middle east respiratory syndrome (mers), in humans. it has been suggested that bats are major reservoirs for these zoonotic pathogens [9]. this justifies surveillance for coronaviruses (covs) in bats throughout the world. in japan, covs belonging to the genus alphacoronavirus have been detected in intestinal and fecal specimens of eastern bent-winged bats (miniopterus fuliginosus) in wakayama prefecture [8]. here, we report the first demonstration of group b betacoronaviruses in little japanese horseshoe bats (rhinolophus cornutus) in iwate prefecture, japan. a total of 38 fecal samples were collected individually from captured insectivorous bats, including 27 little japanese horseshoe bats (r. cornutus), six greater horseshoe bats (r. 
ferrumequinum), four japanese large-footed bats (myotis macrodactylus) and one ussuri tube-nosed bat (murina ussuriensis) from 2 separate caves in morioka, iwate prefecture with permission from the local administration, during 2013. we designated specimens as ""og"" or ""is"" based on the respective originating cave name, ogayu (39.6019n, 141.2508e) and isari (39.6771n, 141.0881e), and we also assigned them numbers. the ogayu cave is an abandoned gold mine called manju. the isari cave is located at the atago shrine, which is approximately 16 km away from the ogayu cave. feces were homogenized with sterile phosphate buffered-saline solution containing penicillin, streptomycin and amphotericin b. viral rna was extracted from feces homogenates by the single-step guanidinium isothiocyanate-phenol-chloroform method [1]. cdna synthesis and pcr were performed as described previously [3]. the pcr primers in-6 and in-7 were used to amplify the rna-dependent rna polymerase (rdrp) gene of cov as described elsewhere [5]. we also performed a second-step pcr using the in-6 primer and a reverse primer (5′-atcagatagaatcatcatagaga-3′) described previously [2]. pcr products of expected size were obtained in eight of 38 samples, after the second-step pcr. five of the eight positive samples were derived from bats roosting in the isari cave, and the rest were from the ogayu cave. these positive pcr products derived only from the samples of little japanese horseshoe bats were subjected to nucleotide sequencing as described previously [3]. cov rna was not detected in the other three bat species. virus isolation using the vero cell culture derived from an african green monkey was unsuccessful, probably due to inappropriate choice of cell culture. we successfully sequenced six of the eight positive samples and have deposited partial nucleotide sequences of the rdrp gene to the dna databases under the accession numbers ab889995 to ab890000. 
the 2 other samples unsuccessful in nucleotide sequencing were considered to be false-positive in pcr. the six sequences (426 nt) showed more than 98% nt identity to each other. comparison between the six covs from little japanese horseshoe bats and those from dna databases revealed that the nucleotide sequences had the highest identity (about 91% nt) to bat sars cov strain rp3 (genbank accession no. dq071615), which was isolated from fecal swabs of pearson's horseshoe bats (r. pearsonii) in guangxi province, china in 2005 [4] . nucleotide sequences aligned using clustal w [10] were analyzed by the neighbor-joining method [7] . the phylogenetic tree indicated that the six cov sequences detected in little japanese horseshoe bats have a monophyletic relationship to the sars cov in the same lineage of the group b betacoronavirus (fig. 1 ). bat sars cov or sars-like cov has been detected in rhinolophid bats on the eurasian continent [4, 6] . our results indicate that betacoronaviruses closely related to sars cov were prevalent among little japanese horseshoe bats in iwate prefecture, japan. currently, pathogenic trait of these betacoronaviruses is unknown. although there is no evidence of human infection by these bat covs, the existence of group b betacoronaviruses in bats may be predictive of the emergence of a disease, such as sars, since the isari cave is also used as a den for civets, which is another potential intermediate species. during this study, we marked the bats with aluminum bands to identify them for further investigation of their habitat. this may provide further information, so we can achieve a better understanding of the relationship between rhinolophid bat behavior and betacoronavirus ecology in the future.",[['10.1292/jvms.14-0012']]
4,4,a14b5655cb13ed64cb8cff7c806a7b58c858b8b7,feasibility of controlling covid-19 outbreaks by isolation of cases and contacts,"['J Hellewell', 'S Abbott', 'Jarvis Phd', 'T W Russell', 'J D Munday', 'A J Kucharski', 'W J Edmunds', 'S Funk', 'Rosalind M Eggo', 'Joel Hellewell', 'Sam Abbott', 'Amy Gimma', 'Nikos I Bosse', 'Christopher I Jarvis', 'Timothy W Russell', 'James D Munday', 'Adam J Kucharski', 'John Edmunds', 'Sebastian Funk']","background isolation of cases and contact tracing is used to control outbreaks of infectious diseases, and has been used for coronavirus disease 2019 (covid-19). whether this strategy will achieve control depends on characteristics of both the pathogen and the response. here we use a mathematical model to assess if isolation and contact tracing are able to control onwards transmission from imported cases of covid-19. we developed a stochastic transmission model, parameterised to the covid-19 outbreak. we used the model to quantify the potential effectiveness of contact tracing and isolation of cases at controlling a severe acute respiratory syndrome coronavirus 2 (sars-cov-2)-like pathogen. we considered scenarios that varied in the number of initial cases, the basic reproduction number (r 0 ), the delay from symptom onset to isolation, the probability that contacts were traced, the proportion of transmission that occurred before symptom onset, and the proportion of subclinical infections. we assumed isolation prevented all further transmission in the model. outbreaks were deemed controlled if transmission ended within 12 weeks or before 5000 cases in total. we measured the success of controlling outbreaks using isolation and contact tracing, and quantified the weekly maximum number of cases traced to measure feasibility of public health effort. 
findings simulated outbreaks starting with five initial cases, an r 0 of 1·5, and 0% transmission before symptom onset could be controlled even with low contact tracing probability; however, the probability of controlling an outbreak decreased with the number of initial cases, when r 0 was 2·5 or 3·5 and with more transmission before symptom onset. across different initial numbers of cases, the majority of scenarios with an r 0 of 1·5 were controllable with less than 50% of contacts successfully traced. to control the majority of outbreaks, for r 0 of 2·5 more than 70% of contacts had to be traced, and for an r 0 of 3·5 more than 90% of contacts had to be traced. the delay between symptom onset and isolation had the largest role in determining whether an outbreak was controllable when r 0 was 1·5. for r 0 values of 2·5 or 3·5, if there were 40 initial cases, contact tracing and isolation were only potentially feasible when less than 1% of transmission occurred before symptom onset. interpretation in most scenarios, highly effective contact tracing and case isolation is enough to control a new outbreak of covid-19 within 3 months. the probability of control decreases with long delays from symptom onset to isolation, fewer cases ascertained by contact tracing, and increasing transmission before symptoms. this model can be modified to reflect updated transmission characteristics and more specific definitions of outbreak control to assess the potential success of local response efforts.","as of feb 5, 2020, more than 24 550 cases of coronavirus disease 2019 (covid-19) had been confirmed, including more than 190 cases outside of china, and more than 490 reported deaths globally. 1 control measures have been implemented within china to try to contain the outbreak. 2 as people with the infection arrive in countries or areas without ongoing transmission, efforts are being made to halt transmission, and prevent potential outbreaks. 
3, 4 isolation of confirmed and suspected cases, and identification of contacts are a crucial part of these control efforts; however, whether these efforts will achieve control of transmission of covid-19 is unclear. isolation of cases and contact tracing becomes less effective if infectiousness begins before the onset of symptoms. 5, 6 for example, the severe acute respiratory syndrome (sars) outbreak that began in southern china in 2003, was eventually able to be controlled through tracing contacts of suspected cases and isolating confirmed cases because the majority of transmission occurred after symptom onset. 7 these interventions also play a major role in response to outbreaks where onset of symptoms and infectiousness are concurrent-eg, ebola virus disease, 8, 9 middle east respiratory syndrome (mers), 10, 11 and many other infections. 12, 13 the effectiveness of isolation and contact-tracing methods hinges on two key epidemiological parameters: the number of secondary infections generated by each new infection and the proportion of transmission that occurs before symptom onset. 5 in addition, successful contact tracing and reducing the delay between symptom onset and isolation are crucial, because, during this time, cases remain in the community where they can infect others until isolation. 6, 14 transmission before symptom onset could only be prevented by tracing contacts of confirmed cases and testing (and quarantining) those contacts. cases that do not seek care, potentially because of subclinical infection, are a further challenge to control. if covid-19 can be controlled by isolation and contact tracing, then public health efforts should be focused on this strategy; however, if this is not enough to control outbreaks, then additional resources might be needed for additional interventions. several key characteristics of the transmissibility and natural history of covid-19 are currently unknown-eg, whether transmission can occur before symptom onset. 
therefore, we explored a range of epidemiological scenarios that represent potential transmission properties based on current information about covid-19 transmission. we assessed the ability of isolation and contact tracing to control disease outbreaks in areas without widespread transmission using a mathematical model. 6, 15-17 by varying the efficacy of contact-tracing efforts, the size of the outbreak when detected, and the promptness of isolation after symptom onset, we show how viable it is for countries at risk of imported cases to use contact tracing and isolation as a containment strategy. we implemented a branching process model, in which the number of potential secondary cases produced by each individual is drawn from a negative binomial distribution with a mean equal to the reproduction number, and heterogeneity in the number of new infections produced by each individual. 6, 15, 17-19 each potential new infection was assigned a time of infection drawn from the serial interval distribution. secondary cases were only created if the person with the infection had not been isolated by the time of infection. as an example (figure 1), a person infected with the virus could potentially produce three secondary infections (because three is drawn from the negative binomial distribution), but only two transmissions might occur before the case is isolated. thus, in the model, a reduced delay from onset to isolation would reduce the average number of secondary cases. we initialised the branching process with five, 20, or 40 cases to represent a newly detected outbreak of varying size. initial symptomatic cases were then isolated after symptom onset with a delay drawn from the onset-to-isolation distribution (table). 
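the branching step described above can be sketched in a few lines. this is a minimal illustration under stated assumptions, not the authors' r implementation: the function names, the gamma-poisson construction of the negative binomial, the dispersion value and the uniform serial-interval stand-in are all hypothetical choices made for the example.

```python
import math
import random


def draw_poisson(rng, rate):
    # Knuth's multiplication method; adequate for the modest rates used here.
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_negative_binomial(rng, mean, dispersion):
    # Gamma-Poisson mixture: a gamma rate with shape = dispersion and
    # scale = mean / dispersion yields a negative binomial count with that mean.
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return draw_poisson(rng, rate)


def secondary_infection_times(rng, r0, dispersion, isolation_time, draw_serial_interval):
    # Draw the number of potential secondary cases, give each a time of
    # infection, and keep only those that occur before the infector is isolated.
    potential = draw_negative_binomial(rng, r0, dispersion)
    times = [draw_serial_interval(rng) for _ in range(potential)]
    return [t for t in times if t < isolation_time]


if __name__ == '__main__':
    rng = random.Random(2020)
    # Assumed placeholder values: dispersion 0.16 and a uniform serial interval.
    kept = secondary_infection_times(rng, 2.5, 0.16, 6.0, lambda r: r.uniform(0.0, 12.0))
    print(len(kept), 'transmissions occurred before isolation')
```

shortening the isolation time in this sketch can only remove entries from the returned list, which mirrors the statement that a reduced delay from onset to isolation lowers the average number of secondary cases.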
isolation was assumed to be 100% effective at preventing further transmission; therefore, in the model, failure to control the outbreak resulted from the incomplete contact tracing and the delays in isolating cases rather than the inability of isolation to prevent further transmission. either 100% or 90% of cases became symptomatic, and all symptomatic cases were eventually reported. evidence before this study contact tracing and isolation of cases is a common intervention for controlling infectious disease outbreaks. it can be effective but might require intensive public health effort and cooperation to effectively reach and monitor all contacts. previous work has shown that when the pathogen has infectiousness before symptom onset, control of outbreaks using contact tracing and isolation is more challenging. further introduction of coronavirus disease 2019 (covid-19) to new territories is likely in the coming days and weeks, and interventions to prevent an outbreak following these introductions are a key mitigating strategy. current planning is focused on tracing of contacts of introduced cases, and rapid isolation. these methods have been used previously for other novel outbreaks, but it is not clear if they will be effective for covid-19. we use a mathematical model to assess the feasibility of contact tracing and case isolation to control outbreaks of covid-19. we used disease transmission characteristics specific to the pathogen and give the best available evidence if contact tracing and isolation can achieve control of outbreaks. we simulated new outbreaks starting from 5, 20, or 40 introduced cases. contact tracing and isolation might not contain outbreaks of covid-19 unless very high levels of contact tracing are achieved. even in this case, if there is subclinical transmission, or a high fraction of transmission before onset of symptoms, this strategy might not achieve control within 3 months. 
the effectiveness of isolation of cases and contacts to control outbreaks of covid-19 depends on the precise characteristics of transmission, which remain unclear at the present time. using the current best understanding, around 80% of symptomatic contacts must be traced and isolated to control over 80% of outbreaks in the model. future research on the transmission characteristics could improve precision on control estimates. each newly infected case was identified through contact tracing with probability ρ. secondary cases that had been traced were isolated immediately on becoming symptomatic. cases that were missed by contact tracing (probability 1-ρ) were isolated when they became symptomatic, with a delay drawn from the onset-to-isolation distribution. in addition, each case had an independent probability of being subclinical, and was therefore not detected either by self-report or contact tracing. new secondary cases caused by a subclinical case were missed by contact tracing and could only be isolated on the basis of symptoms. the model included isolation of symptomatic individuals only-ie, no quarantine, so isolation could not prevent transmission before symptom onset. in the model, subclinical cases were never isolated, whereas symptomatic cases might transmit before symptoms appear, but were eventually isolated. quarantining contacts of cases (ie, individuals who are not yet symptomatic, and might not be infected) requires a considerable investment in public health resources, and has not been widely implemented for all contacts of cases. 3 however, some countries have adopted a quarantine or self-quarantine policy for airline travellers who have returned from countries with confirmed covid-19 transmission. 23 we ran 1000 simulations of each combination of r 0 , the proportion of transmission before symptom onset, onset-to-isolation delay, the number of initial cases, and the probability that a contact was traced (table). 
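the isolation rules stated above (traced contacts isolated at symptom onset, missed contacts isolated after a delay, subclinical cases never isolated) can be written as a small decision function. this is a sketch under assumptions: the function and parameter names are hypothetical, and times are measured in days on an arbitrary common clock.

```python
import math
import random


def isolation_time(onset_time, traced, subclinical, onset_to_isolation_delay):
    # Subclinical cases are never isolated in the model.
    if subclinical:
        return math.inf
    # Traced contacts are isolated immediately on becoming symptomatic.
    if traced:
        return onset_time
    # Missed contacts are isolated only after the onset-to-isolation delay.
    return onset_time + onset_to_isolation_delay


def assign_isolation(rng, onset_time, trace_prob, subclinical_prob,
                     infector_subclinical, draw_delay):
    # Each case is subclinical with an independent probability.
    subclinical = rng.random() < subclinical_prob
    # Contacts of a subclinical infector are always missed by tracing;
    # otherwise a contact is traced with probability trace_prob (rho).
    traced = (not infector_subclinical) and rng.random() < trace_prob
    return isolation_time(onset_time, traced, subclinical, draw_delay(rng))


if __name__ == '__main__':
    rng = random.Random(0)
    # Assumed placeholder: a fixed 3-day delay instead of a fitted distribution.
    print(assign_isolation(rng, 5.0, 0.8, 0.1, False, lambda r: 3.0))
```

returning infinity for subclinical cases lets the same comparison against infection times handle all three outcomes without a special case.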
[figure 1: after an incubation period, person a shows symptoms and is isolated at a time drawn from the delay distribution (table). a draw from the negative binomial distribution with mean reproduction number (r 0 ) and distribution parameter determines how many people person a potentially infects. for each of those, a serial interval is drawn. two of these exposures occur before the time person a is isolated. each contact is traced with probability ρ, with probability 1-ρ they are missed by contact tracing. person b is successfully traced, which means that they will be isolated without delay when they develop symptoms. they could, however, still infect others before they are isolated. person c is missed by contact tracing. this means that they are only detected if and when symptomatic, and are isolated after a delay from symptom onset. because person c was not traced, they infected two more people (e and f), in addition to person d, than if they had been isolated at symptom onset. a version with subclinical transmission is given in the appendix (p 12).] we explored two scenarios of delay (short and long) between symptom onset and isolation (figure 2). the short delay was estimated during the late stages of the 2003 sars outbreak in singapore, 18 and the long delay was an empirical distribution calculated from the early phase of the covid-19 outbreak in wuhan. 23 we varied the percentage of contacts traced from 0% to 100%, at 20% intervals, to quantify the effectiveness of contact tracing. the incubation period for each case was drawn from a weibull distribution. a corresponding serial interval for each case was then drawn from a skewed normal distribution with the mean parameter of the distribution set to the incubation period for that case, an sd of 2, and a skew parameter chosen such that a set proportion of serial intervals were shorter than the incubation period (meaning that a set proportion of transmission happened before symptom onset; figure 2). 
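the sampling scheme just described can be sketched as follows. several points are assumptions rather than details from the text: the mean parameter and sd are treated as the location and scale of the skew normal, the weibull shape and scale in the usage lines are placeholders, and all function names are hypothetical. the skew value is derived from the identity that a skew normal with shape a falls below its location with probability 1/2 - arctan(a)/pi.

```python
import math
import random


def draw_skew_normal(rng, location, scale, shape):
    # Standard construction: z = delta * |u0| + sqrt(1 - delta^2) * v,
    # with u0 and v independent standard normals.
    delta = shape / math.sqrt(1.0 + shape * shape)
    u0 = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    z = delta * abs(u0) + math.sqrt(1.0 - delta * delta) * v
    return location + scale * z


def shape_for_presymptomatic_fraction(fraction):
    # Invert P(serial interval < incubation) = 1/2 - arctan(shape) / pi.
    return math.tan(math.pi * (0.5 - fraction))


def draw_incubation_and_serial_interval(rng, weibull_shape, weibull_scale, skew):
    # random.weibullvariate takes the scale first, then the shape.
    incubation = rng.weibullvariate(weibull_scale, weibull_shape)
    # Location set to this case's incubation period, scale (sd) of 2.
    serial_interval = draw_skew_normal(rng, incubation, 2.0, skew)
    return incubation, serial_interval


if __name__ == '__main__':
    rng = random.Random(15)
    skew = shape_for_presymptomatic_fraction(0.15)
    # Assumed placeholder weibull parameters, for illustration only.
    print(draw_incubation_and_serial_interval(rng, 2.0, 6.0, skew))
```

because the serial interval is centred on each case's own incubation period, the two quantities are correlated by construction, and the skew value alone sets the share of transmission that precedes symptom onset.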
this sampling approach ensured that the serial interval and incubation period for each case was correlated, and prevented biologically implausible scenarios where a case could develop symptoms soon after exposure, but not become infectious until very late after exposure and vice versa. [figure 2 (caption, partial): the incubation distribution estimate fitted to data from the wuhan outbreak by backer and colleagues. 22 (c) an example of the method used to sample the serial interval for a case that has an incubation period of 5 days. each case has an incubation period drawn from the distribution in (b), their serial interval is then drawn from a skewed normal distribution with the mean set to the incubation period of the case. in (c), the incubation period was 5 days. the skew parameter of the skewed normal distribution controls the proportion of transmission that occurs before symptom onset; the three scenarios explored are less than 1%, 15%, and 30% of transmission before onset.] there are many estimates of the reproduction number for the early phase of the covid-19 outbreak in wuhan, china, 15, 17, 18, 21, 24-28 and therefore we used the values 1·5, 2·5, and 3·5, which span most of the range of current estimates (table). we used the secondary case distribution from the 2003 sars outbreak, 19 and tested the effect of lower heterogeneity in the number of secondary cases 29 as a sensitivity analysis (appendix pp 2-5). we calculated the effective reproduction number (r eff ) of the simulation as the average number of secondary cases produced by each infected person in the presence of isolation and contact tracing. we present results in relation to the baseline scenario of r 0 of 2·5, 21 20 initial cases, a short delay to isolation, 20 15% of transmission before symptom onset, 30 and 0% subclinical infection. 
31 values of the natural history represent the current best understanding of covid-19 transmission, and we used 20 index cases and a short delay to isolation to represent a relatively large influx into a setting of high awareness of possible infection. 23 outbreak control was defined as no new infections between 12 and 16 weeks after the initial cases. outbreaks that reached 5000 cumulative cases were assumed to be too large to control within 12-16 weeks, and were categorised as uncontrolled outbreaks. based on this definition, we reported the probability that an outbreak of a severe acute respiratory syndrome coronavirus 2-like pathogen would be controlled within 12 weeks for each scenario, assuming that the basic reproduction number remained constant and no other interventions were implemented. the probability that an outbreak is controlled gives a one-dimensional understanding of the difficulty of achieving control, because the model placed no constraints on the number of cases and contacts that could be traced and isolated. in reality, the feasibility of contact tracing and isolation is likely to be determined both by the probability of achieving control, and the resources needed to trace and isolate infected cases. 32 we therefore reported the weekly maximum number of cases undergoing contact tracing and isolation for each scenario that resulted in outbreak control. new cases require their contacts to be traced, and if these numbers are high, it can overwhelm the contact-tracing system and affect the quality of the contact-tracing effort. 33 it is likely that the upper limit on contacts to trace varies from country to country. all code is available as an r package. the funders of the study had no role in study design, data collection, data analysis, data interpretation, writing of the article, or the decision to submit for publication. 
for code, see https://github.com/cmmid/ringbp. all authors had full access to all the data in the study and were responsible for the decision to submit the article for publication. to achieve control of 90% of outbreaks, 80% of contacts needed to be traced and isolated for scenarios with a reproduction number of 2·5 (figure 3). the probability of control was higher at all levels of contact tracing when the reproduction number was 1·5, and fell rapidly for a reproduction number of 3·5. at a reproduction number of 1·5, the effect of isolation was coupled with the chance of stochastic extinction resulting from overdispersion, 19 which is why some outbreaks were controlled even at 0% contacts traced. isolation and contact tracing decreased transmission, as shown by a decrease in the effective reproduction number (figure 3). when the basic reproduction number was 1·5, the median estimate rapidly fell below 1, which indicated that control was probable. for the higher transmission scenarios, a higher level of contact tracing was needed to bring the median effective reproduction number below 1. the effect of isolation without contact tracing can be seen at 0%, where the effective reproduction number was lower than the simulated basic reproduction number because of rapid isolation (and ceasing transmission) of cases. the number of initial cases had a large effect on the probability of achieving control. with five initial cases, there was a greater than 50% chance of achieving control in 3 months, even at modest contact-tracing levels (figure 4). more than 40% of these outbreaks were controlled with no contact tracing because of the combined effects of isolation of symptomatic cases and stochastic extinction. the probability of control dropped as the number of initial cases increased-eg, for 40 initial cases, 80% contact tracing did not lead to 80% of simulations controlled within 3 months. 
the delay from symptom onset to isolation had a major role in achieving control of outbreaks ( figure 4) . at 80% of contacts traced, the probability of achieving control fell from 89% to 31%, with a long delay from onset to isolation. if no transmission occurred before symptom onset, then the probability of achieving control was higher for all values of contacts traced ( figure 4) . the difference between 15% and 30% of transmission before symptoms had a marked effect on probability to control. we found this effect in all scenarios tested (appendix p 5). in scenarios in which only 10% of cases were asymptomatic, the probability that simulations were controlled by isolation and contact tracing for all values of contact tracing decreased ( figure 4) . for 80% of contacts traced, only 37% of outbreaks were controlled, compared with 89% without subclinical infection. these figures show the effect of changing one model assumption at a time; all combinations are given in the appendix, in comparison to the baseline scenario (appendix pp 2-5). in many scenarios, between 25 and 100 symptomatic cases occurred in a week at the peak of the simulated outbreak ( figure 5 ). all of these cases, and their contacts, would need to be isolated. large numbers of new cases can overwhelm isolation facilities, and the more contacts that need to be traced, the greater the logistical task of following them up. in the 2014 ebola epidemic in liberia, each case reported between six and 20 contacts, 8 and the number of contacts seen in mers outbreaks is often higher than that. 10 20 contacts for each of 100 cases means 2000 contacts traced to achieve control. uncontrolled outbreaks typically had higher numbers of cases (appendix p 13). 
[figure 5: the maximum weekly cases requiring contact tracing and isolation in scenarios with 20 index cases that achieved control within 3 months. scenarios vary by reproduction number and the mean delay from onset to isolation. 15% of transmission occurred before symptom onset, and 0% subclinical infection. the percentage of simulations that achieved control is shown in the boxplot. this illustrates the potential size of the eventually controlled simulated outbreaks, which would need to be managed through contact tracing and isolation. *the interval extends out of the plotting region.] the maximum numbers of weekly cases (figure 5) might appear counterintuitive, because a lower maximum number of weekly cases is not associated with higher outbreak control. this occurs because with better contact tracing it becomes possible to control outbreaks with higher numbers of weekly cases. we determined conditions in which case isolation, contact tracing, and preventing transmission by contacts who are infected would be sufficient to control a new covid-19 outbreak in the absence of other control measures. we found that in some plausible scenarios, case isolation alone would be unlikely to control transmission within 3 months. case isolation was more effective when there was little transmission before symptom onset and when the delay from symptom onset to isolation was short. preventing transmission by tracing and isolating a larger proportion of contacts, thereby decreasing the effective reproduction number, improved the number of scenarios in which control was likely to be achieved. however, these outbreaks required a large number of cases to be contact traced and isolated each week, which is of concern when assessing the feasibility of this strategy. subclinical infection markedly decreased the probability of controlling outbreaks within 3 months. 
in scenarios in which the reproduction number was 2·5, 15% of transmission occurred before symptom onset, and there was a short delay to isolation, at least 80% of infected contacts needed to be traced and isolated to give a probability of control of 90% or more. this scenario echoes other suggestions that highly effective contact tracing will be necessary to control outbreaks in other countries. 16 in scenarios in which the delay from onset to isolation was long, similar to the delays in the early phase of the outbreak in wuhan, the same contact tracing success of 80% achieved a probability of containing an outbreak of less than 40%. higher presymptomatic transmission decreases the probability that the outbreaks were controlled, under all reproduction numbers and isolation delay distributions tested. our model did not include other control measures that might decrease the reproduction number and therefore also increase the probability of achieving control of an outbreak. at the same time, it assumed that isolation of cases and contacts is completely effective, and that all symptomatic cases are eventually reported. relaxing these assumptions would decrease the probability that control is achieved. we also make the assumption that contact is required for transmission between two individuals, but transmission via fomites might be possible. this type of transmission would make effective contact tracing challenging, and good respiratory and hand hygiene would be crucial to reduce this route of transmission, coupled with environmental decontamination in health-care settings. for this reason, we used contact-tracing percentage intervals of 20% to avoid indicating more precision in the corresponding probability of control than the model could support. 
we simplified our model to determine the effect of contact tracing and isolation on the control of outbreaks under different scenarios of transmission; however, as more data becomes available, the model can be updated or tailored to particular public health contexts. the robustness of control measures is likely to be affected both by differences in transmission between countries, but also by the concurrent number of cases that require contact tracing in each scenario. practically, there is likely to be an upper bound on the number of cases that can be traced, which might vary by country, and case isolation is likely to be imperfect. 34 we reported the maximum number of weekly cases during controlled outbreaks, but the capacity of response efforts might vary. in addition to the number of contacts, other factors could decrease the percentage of contacts that can be traced, such as cooperation of the community with the public health response. we explored a range of scenarios informed by the latest evidence on transmission of covid-19. similar analyses using branching models have already been used to analyse the wuhan outbreak to find plausible ranges for the initial exposure event size and the basic reproduction number. 15, 18 our analysis expands on this work by including infectiousness before the onset of symptoms, case isolation, explicit modelling of case incubation periods, and time to infectiousness. a key area of uncertainty is whether, and for how long, individuals are infectious before symptom onset, and whether subclinical infection occurs; both are likely to make the outbreak harder to control. 22 whether, and how much, transmission occurs before symptoms is difficult to quantify. under-reporting of prodromal symptoms, such as fatigue and mild fever, is possible; thus, transmission might not truly be occurring before symptoms, but before noticeable symptoms. there is evidence of transmission before onset, 30 and so we used 15%. 
increased awareness of prodromal symptoms, and therefore short delays until isolation-as seen in the sars outbreak in beijing in 2003 35 -would increase control of outbreaks in our model. if contact tracing included testing of non-symptomatic contacts, those contacts could be quarantined without symptoms, which would decrease transmission in the model. costs associated with additional testing might not be possible in all contexts. the model could be modified to include some transmission after isolation (such as in hospitals), which would decrease the probability of achieving control. in addition, we defined an outbreak as controlled if it reached extinction by 3 months, regardless of outbreak size or number of weekly cases. this definition might be narrowed where the goal is to keep the overall caseload of the outbreak low. this might be of concern to local authorities for reducing the health-care surges, and might limit geographical spread. our study indicates that in most plausible outbreak scenarios, case isolation and contact tracing alone is insufficient to control outbreaks, and that in some scenarios even near perfect contact tracing will still be insufficient, and further interventions would be required to achieve control. rapid and effective contact tracing can reduce the initial number of cases, which would make the outbreak easier to control overall. effective contact tracing and isolation could contribute to reducing the overall size of an outbreak or bringing it under control over a longer time period. contributors rme conceived the study. jh, ag, sa, wje, sf, and rme designed the model. cij, twr, and nib worked on statistical aspects of the study. jh, ag, sa, and nib programmed the model, and, with rme, made the figures. ajk and jdm consulted on the code. all authors interpreted the results, contributed to writing the article, and approved the final version for submission. we declare no competing interests. no data were used in this study. 
the r code for the work is available at https://github.com/cmmid/ringbp.",[['10.1002/acn3.727']]


In [6]:
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

In [7]:
class Preprocesser:
    def __init__(self, df):
        self.keywords = ['incident command system',
                         'emergency operations',
                         'joint information center',
                         'social distancing',
                         'childcare closers',
                         'travel advisory',
                         'travel warning',
                         'isolation',
                         'quarantine',
                         'mass gathering cancellations',
                         'school closures',
                         'facility closures',
                         'evacuation',
                         'relocation',
                         'restricting travel',
                         'travel ban',
                         'patient cohort',
                         'npi']
        self.occurances_minimum = 2
        self.df_full = df
        print(self.df_full.shape)
        self.key_slice()
        print(self.df_full.shape)
        self.npi_slice()
        print(self.df_full.shape)
    
    def key_slice(self):
        self.df_full = self.df_full[self.df_full['abstract'].str.contains('|'.join(self.keywords), na=False, regex=True)].reset_index(drop=True)
        
    def npi_slice(self):
        def get_count(row):
            return sum([row['abstract'].count(keyword) for keyword in self.keywords])
        self.df_full = self.df_full[self.df_full.apply(get_count, axis=1) >= self.occurances_minimum]
        
    def remove_stopwords(self,columns):
        # Use a set so each membership test is O(1) instead of a list scan.
        stop = set(stopwords.words('english'))
        for col in columns:
            self.df_full[col] = self.df_full[col].astype(str).apply(lambda x: ' '.join([word for word in x.split() if word not in stop]))

    def to_tfidf(self,columns):
        for col in columns:
            tfidfv = TfidfVectorizer()
            self.df_full[col + '_tfidf'] = list(tfidfv.fit_transform(self.df_full[col]).toarray())
            
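    def remove_punc(self, columns):
        for col in columns:
            # regex=True is needed on pandas >= 2.0, where str.replace defaults to literal matching.
            self.df_full[col] = self.df_full[col].str.replace(r'[^a-zA-Z\s]+', '', regex=True)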
        
def display_wordcloud(text):
    wordcloud = WordCloud(max_font_size=50, max_words=100, background_color='white').generate(text)
    plt.figure()
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.show()
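
The two filtering steps in `Preprocesser` (`key_slice`, then `npi_slice`) can be sketched on a toy frame. The abstracts and the shortened keyword list below are made up for illustration, and `re.escape` is an addition that the class above does not use:

```python
import re
import pandas as pd

# Toy abstracts (invented); the notebook filters the real `abstract` column.
toy = pd.DataFrame({
    "abstract": [
        "quarantine and isolation reduced spread; quarantine was strict",
        "school closures were studied",
        "a genomic analysis of the virus",
        None,
    ]
})
keywords = ["isolation", "quarantine", "school closures"]  # illustrative subset

# Step 1 (as in key_slice): keep rows that mention at least one keyword.
pattern = "|".join(re.escape(k) for k in keywords)
hits = toy[toy["abstract"].str.contains(pattern, na=False, regex=True)]

# Step 2 (as in npi_slice): keep rows with at least `minimum` total mentions.
minimum = 2
counts = hits["abstract"].apply(lambda text: sum(text.count(k) for k in keywords))
kept = hits[counts >= minimum].reset_index(drop=True)

print(counts.tolist())
print(kept["abstract"].tolist())
```

Only the first toy abstract survives both steps, because it is the only one with two or more keyword mentions.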

In [8]:
def pca_apply(df, columns, n_comp):
    new_df = df.copy()
    for col in columns:
        pca = PCA(n_components=n_comp, random_state=1)
        new_df[col+'_pca'] = list(pca.fit_transform(np.stack(df[col].to_numpy())))
    return new_df.reset_index(drop=True)

def apply_scaler(df, columns):
    new_df = df.copy()
    for col in columns:
        scaler = StandardScaler()
        new_df[col + '_scaled'] = list(scaler.fit_transform(np.stack(df[col].to_numpy())))
    return new_df.reset_index(drop=True)
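
Both helpers rely on each DataFrame cell holding one feature vector, which `np.stack` turns back into a 2-D matrix for scikit-learn. A minimal sketch of that round trip, using four invented documents rather than the notebook's data:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

# Invented mini-corpus, for illustration only.
docs = pd.DataFrame({"abstract": [
    "quarantine isolation contact tracing",
    "school closures social distancing",
    "quarantine travel ban isolation",
    "social distancing mass gathering",
]})

# One TF-IDF row vector per document, stored as an array in a single cell.
tfidf = TfidfVectorizer().fit_transform(docs["abstract"]).toarray()
docs["abstract_tfidf"] = list(tfidf)

# np.stack rebuilds the (n_documents, n_terms) matrix from the column of arrays.
matrix = np.stack(docs["abstract_tfidf"].to_numpy())
reduced = PCA(n_components=3, random_state=1).fit_transform(matrix)
scaled = StandardScaler().fit_transform(reduced)

print(matrix.shape, reduced.shape, scaled.shape)
```

Storing arrays per cell keeps each article's metadata and features in one row, at the cost of needing `np.stack` before every scikit-learn call.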

In [9]:
prepr = Preprocesser(df)

(1301, 7)
(1301, 7)
(428, 7)


In [10]:
prepr.remove_punc(['body_text','abstract'])
prepr.remove_stopwords(['body_text', 'abstract'])
prepr.to_tfidf(['body_text', 'abstract'])
pca_df = pca_apply(prepr.df_full, ['abstract_tfidf','body_text_tfidf'], 3)
scaled_df = apply_scaler(pca_df,['abstract_tfidf_pca','body_text_tfidf_pca'])
# clustered_df = cluster(scaled_df, ['abstract_tfidf_pca_scaled', 'body_text_tfidf_pca_scaled'], 10)
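
The commented-out `cluster(...)` call refers to a helper that is not defined in the cells shown here. A hypothetical version, assuming it follows the same pattern as `pca_apply` and `apply_scaler` and the `_clusterID` naming used later in `Cluster_Plot`, might look like this (the toy points stand in for `scaled_df`):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster(df, columns, n_clusters):
    """Hypothetical helper: add a `<col>_clusterID` label column per input column."""
    new_df = df.copy()
    for col in columns:
        features = np.stack(df[col].to_numpy())  # column of arrays -> 2-D matrix
        kmeans = KMeans(n_clusters=n_clusters, random_state=1, n_init=10)
        new_df[col + "_clusterID"] = kmeans.fit_predict(features)
    return new_df.reset_index(drop=True)


# Toy stand-in for `scaled_df`: two well-separated groups of 3-D points.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(-3, 0.1, (5, 3)), rng.normal(3, 0.1, (5, 3))])
toy = pd.DataFrame({"abstract_tfidf_pca_scaled": list(points)})

clustered = cluster(toy, ["abstract_tfidf_pca_scaled"], 2)
labels = clustered["abstract_tfidf_pca_scaled_clusterID"].tolist()
print(labels)
```

In the notebook itself the clustering is done inside `Cluster_Plot.create_cluster_df`, so this helper is only needed for clustering outside the dashboard.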

In [None]:
from dash.exceptions import PreventUpdate
import plotly.express as px
import plotly.graph_objects as go
import dash
import dash_html_components as html
import dash_core_components as dcc
from dash.dependencies import Input, Output

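def get_breaks(row, col, word_limit=45, break_char='<br>', colon=True):
    """Re-wrap row[col], inserting break_char once a line exceeds word_limit characters."""
    # NaN != NaN, so this check skips missing values.
    if row[col] == row[col]:
        words = row[col].split(' ')
        # Optionally prefix the text with the column name. data is always
        # initialised so the colon=False path no longer hits an undefined name.
        data = f'{col}: ' if colon else ''
        total_chars = 0

        # Start a new line once the running character count exceeds word_limit.
        for word in words:
            total_chars += len(word)
            if total_chars > word_limit:
                data += f'{break_char}{word}'
                total_chars = 0
            else:
                data += f' {word}'
        return data
    return row[col]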

class Cluster_Plot:
    def __init__(self, df, text_type, clust_nums):
        self.styles = {
            'pre': {
                'border': 'thin lightgrey solid',
                'overflowX': 'scroll'
            }
        }
        self.set_app()
        self.df = df
        self.set_text(text_type)
        self.clust_nums = clust_nums
        self.create_cluster_df()
        
        if self.app is not None and hasattr(self, 'callbacks'):
            self.callbacks(self.app)

    def run_process(self):
        self.set_app_layout()
        self.app.run_server()
        
    def set_app(self):
        self.app = dash.Dash(__name__,
                        external_stylesheets=["https://codepen.io/chriddyp/pen/bWLwgP.css", "../localstyles.css"])
        
    def set_app_layout(self):
        self.app.layout = html.Div(children=[
            html.Div(className='row', style={'background-color': '#142a57'}, children=[
                html.Div(className='container', style={'max-width': 'unset'}, children=[
                    html.Div(className='row', children=[
                        html.Div(className='nine columns', children=[
                            html.H1('NPI Cluster Analysis', style={ 'color': 'white', 'padding-top': '1%'})
                        ]),
                        html.Div(className='three columns', children=[
                            html.Img(
                                src='https://storage.googleapis.com/lab_assets/Jefferson_CorporateEnterpriseLightBlue-01.png',
                                style={ 'width': '60%', 'padding-top': '5%'}
                            )
                        ])
                    ])
                ])
            ]),
            html.Div(className='container', style={'max-width': 'unset'}, children=[
                html.Div(className='row', children=[
                    html.Div(className='three columns', children=[
                        html.H2('Cluster Control Panel'),
                        html.Label('Select to cluster on the article abstract or body'),
                        dcc.RadioItems(
                            id='abstract_or_body',
                            options=[{'label': 'abstract', 'value': 'abstract'},
                                     {'label': 'body', 'value': 'body_text'}],
                            value=self.text_type
                        ),
                        html.Label('Select number of Clusters'),
                        dcc.Dropdown(
                            id='cluster_num',
                            options=[{'label': i+1, 'value': i+1} for i in list(range(20))],
                            value=self.clust_nums
                        ),
                        html.Label('Select cluster to view'),
                        dcc.Dropdown(
                            id='cluster_id',
                            options=self.cluster_id_list,
                            value='all'
                        ),
                    ]),
                    html.Div(className='six columns', children=[
                        dcc.Graph(id="graph", style={"width": "90%", "display": "inline-block"})
                    ]),
                    html.Div(className='three columns', children=[
                        html.H2('Filter Articles'),
                        html.Label('Search keywords'),
                        dcc.Input(id='search',
                            value='',
                            type='text'
                        )
                    ]),
                ]),
                html.Div(className='row', children=[
                    html.Div([
                        dcc.Markdown("""
                            **Selected Article**

                            Click on values in the plot to select article.
                        """),
                        html.Pre(id='hover-data', style=self.styles['pre'])
                    ])
                ])
            ]),
            html.Div(className='row', style={'background-color': '#142a57', 'position': 'fixed', 'bottom': '0'}, children=[
                html.Div(style={'max-width': 'unset'}, children=[
                    html.Img(
                                src='https://storage.googleapis.com/lab_assets/DICE_PoweredByEnhancedWhite.png',
                                style={ 'width': '10%', 'padding-top': '1%', 'margin-left': '45%', 'margin-right': '45%'}
                            )
                ])
            ])
        ])
        
    def create_cluster_df(self):
        self.cluster_id_list = [{'label': i, 'value': i} for i in list(range(self.clust_nums))]
        self.cluster_id_list.append({'label': 'all', 'value': 'all'})
        new_df = self.df.copy()
        kmeans = KMeans(n_clusters = self.clust_nums, random_state=1)
        new_df[self.col_cluster_id] = list(kmeans.fit_predict(np.stack(new_df[self.col].to_numpy())))
        self.cluster_df = new_df.reset_index(drop=True)
        self.cluster_df['title'] = self.cluster_df.apply(get_breaks, args=('title',), axis=1)
        self.cluster_df[['x', 'y', 'z']] = pd.DataFrame(self.cluster_df[self.col].values.tolist(),
                                                        index = self.cluster_df.index)
    
    def set_text(self, text_type):
        self.text_type = text_type
        self.col = f'{self.text_type}_tfidf_pca_scaled'
        self.col_cluster_id = f'{self.text_type}_tfidf_pca_scaled_clusterID'
        
    def callbacks(self, app):
        @app.callback([Output('cluster_id', 'options'),
                       Output('graph', 'figure')],
                      [Input('abstract_or_body', 'value'),
                       Input('cluster_num', 'value'),
                       Input('cluster_id', 'value'),
                       Input('search', 'value')])
        def update_clusters(abstract_or_body, cluster_num, cluster_id, search_string):
            # A cleared dropdown sends None; skip the update rather than fail in KMeans.
            if cluster_num is None:
                raise PreventUpdate
            self.set_text(abstract_or_body)
            self.clust_nums = cluster_num
            self.create_cluster_df()
            options = self.cluster_id_list
            df = self.cluster_df.copy()
            show_scale = True
            if cluster_id is not None and cluster_id != 'all':
                show_scale = False
                df = df[df[self.col_cluster_id] == cluster_id]
            df = df[df[self.text_type].str.contains(search_string or '', na=False, regex=True)].reset_index(drop=True)
            # Keep the filtered frame so click indices map to the plotted points.
            self.plot_df = df
            fig = px.scatter_3d(df, x='x', y='y', z='z',
                                 color=self.col_cluster_id,
                                 hover_name='title',
                                 hover_data=['paper_id', 'doi'])
            fig.update_layout(scene = dict(
                                xaxis = dict(nticks=4, range=[-5,5],),
                                yaxis = dict(nticks=4, range=[-5,5],),
                                zaxis = dict(nticks=4, range=[-5,5],),),
                              hoverlabel=dict(
                                bgcolor='white', 
                                font_size=8, 
                                font_family='Rockwell'
                              ),
                              coloraxis=dict(
                                colorbar=dict(title='Cluster ID'),
                                showscale=show_scale
                              ))
            return options, fig
        
        @app.callback(Output("hover-data", "children"), [Input("graph", "clickData")])
        def display_click_data(clickData):
            string = None
            if clickData:
                print(clickData['points'][0])
                click_index = clickData['points'][0]['pointNumber']
                # pointNumber indexes the plotted (filtered) frame, not the full
                # cluster_df, so look the row up in the frame that was drawn.
                row = getattr(self, 'plot_df', self.cluster_df).iloc[click_index]
                string = f'{row["title"]}'
                item_list = ['abstract', 'body_text', 'author_list', 'paper_id', 'doi']
                for i in item_list:
                    formatted_data = get_breaks(row,
                                               i,
                                               word_limit=100,
                                               break_char='\n')
                    string += f'\n\n{formatted_data}'
            return string

c = Cluster_Plot(scaled_df, 'abstract', 10)
c.run_process()

 * Tip: There are .env or .flaskenv files present. Do "pip install python-dotenv" to use them.


 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
