<div align='left' style="width:29%;overflow:hidden;">
<a href='http://inria.fr'>
<img src='https://github.com/lmarti/jupyter_custom/raw/master/imgs/inr_logo_rouge.png' alt='Inria logo' title='Inria logo'/>
</a>
</div>

# Representations, LDA and topics in CORD-19

> This notebook computes representations of the papers from their text content. Topics are then modelled from these representations using Latent Dirichlet Allocation (LDA). Finally, the most relevant papers for each topic are determined using each paper's PageRank score.

In [None]:
# default_exp lda

In [1]:
!pip install -q -r requirements.txt

In [2]:
# export
from risotto.references import load_papers_from_metadata_file, build_papers_reference_graph, paper_as_markdown
from fastprogress.fastprogress import progress_bar

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

from pathlib import Path

import scispacy
import en_core_sci_sm
import networkx as nx
from collections import defaultdict
import numpy as np

Loading the paper dataset, then rebuilding the graph of paper references and computing the corresponding PageRank scores.

In [3]:
cord19_dataset_folder = "./datasets/CORD-19-research-challenge"

In [4]:
papers, _ = load_papers_from_metadata_file(cord19_dataset_folder)

In [5]:
G = build_papers_reference_graph(papers)

In [6]:
pageranks = nx.pagerank(G)
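
For intuition about what `nx.pagerank` returns, the following dependency-free sketch runs a plain power iteration on a tiny citation graph. The graph, the function name `pagerank_sketch`, and the iteration count are illustrative assumptions; the damping factor of 0.85 matches the NetworkX default, but this is not the NetworkX implementation.

```python
def pagerank_sketch(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {node: [cited nodes]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[src] / len(graph[src])
                for src in nodes
                if node in graph[src]
            )
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical toy graph: an edge A -> B means "paper A cites paper B".
toy_graph = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": [],
}
toy_ranks = pagerank_sketch(toy_graph)
most_cited = max(toy_ranks, key=toy_ranks.get)
print(most_cited, round(sum(toy_ranks.values()), 6))
```

The scores form a probability distribution over papers, and the paper cited by both others receives the largest share.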

## Paper representations

In order to build a representation for each paper, the following libraries will be used:

- spaCy: https://spacy.io/
- scispaCy: https://allenai.github.io/scispacy/

The `en_core_sci_sm` language model will be used. It was trained on a corpus of biomedical text and has a vocabulary of more than 100,000 words.
If a model with a larger vocabulary is needed, others are available.

Loading the biomedical language pipeline.

In [7]:
# export
nlp = en_core_sci_sm.load()

In [8]:
# Select a paper to showcase spacy's features
sample_paper = list(pageranks.keys())[0]
sample_text = "\n".join([ paragraph["text"] for paragraph in sample_paper._file_contents["body_text"]])
sample_text

doc = nlp(sample_text, disable=["tagger", "parser", "ner"])
doc[17].lemma_

'be'

The cell above tokenizes the sample document with the `spacy` pipeline and displays the lemma of one of its tokens.
A useful property of running `spacy` with a pretrained language model is that it automatically computes vector representations for documents and tokens.
Which language model architecture is used to compute those vectors remains to be determined.

A relevant aspect that influences downstream tasks is the number of out-of-vocabulary (OOV) tokens.
The following cell makes a quick inspection of a sample paper, iterating over its tokens to detect and count the OOV ones.

In [9]:
num_oov = 0
for token in doc:
    if token.is_oov and token.string != "\n":
        if token.string.endswith("virus"):
            print(token, "not found")
        num_oov += 1
    else:
        if token.string.endswith("virus"):
            print(token, "found")

virus found
virus found
virus found


In [10]:
print(f'Number of out of vocabulary tokens: {num_oov} ({100 * num_oov / len(doc)}%).')

Number of out of vocabulary tokens: 336 (4.801371820520148%).
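
The OOV rate is the share of tokens missing from the model vocabulary. The sketch below reproduces that computation without spaCy; the vocabulary set, the sentence, and the helper name `oov_rate` are made up for illustration, and a plain set lookup stands in for spaCy's `token.is_oov`.

```python
def oov_rate(tokens, vocabulary):
    """Return (count, percentage) of tokens missing from the vocabulary."""
    oov = [tok for tok in tokens if tok.lower() not in vocabulary]
    pct = 100 * len(oov) / len(tokens) if tokens else 0.0
    return len(oov), pct


# Hypothetical vocabulary and sentence, for illustration only.
toy_vocabulary = {"the", "virus", "infects", "cells", "coronavirus"}
toy_tokens = "The coronavirus infects epithelial cells".split()
num_oov_toy, pct_oov_toy = oov_rate(toy_tokens, toy_vocabulary)
print(f"{num_oov_toy} OOV tokens ({pct_oov_toy:.1f}%)")
```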


Note that relevant tokens, such as *coronavirus*, are included in the language model vocabulary.

Testing the mechanisms used to remove stop words, punctuation, and whitespace, and to extract each token's lemma.

In [11]:
tokens = {token for token in doc}

In [12]:
no_stop_word_tokens = {token for token in doc if not (token.is_stop or token.is_punct or token.is_space)}

In [13]:
len(tokens), len(no_stop_word_tokens)

(6998, 3544)

## Latent Dirichlet Allocation (LDA)

The following cells will perform topic modelling experiments using the LDA technique.
The `scikit-learn` implementation of this model will be used.
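
As a reminder of what LDA assumes (the standard formulation, stated here for context and not taken from this notebook): each of the $K$ topics is a distribution $\beta_k$ over the vocabulary, each document $d$ has a topic mixture $\theta_d$, and the word $w_{dn}$ at position $n$ is drawn from the topic $z_{dn}$ chosen for that position.

$$
\begin{aligned}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
z_{dn} &\sim \mathrm{Categorical}(\theta_d) \\
w_{dn} &\sim \mathrm{Categorical}(\beta_{z_{dn}})
\end{aligned}
$$

In scikit-learn, $K$ is `n_components`, $\alpha$ is `doc_topic_prior`, and $\eta$ is `topic_word_prior`; when the priors are left as `None`, both default to `1 / n_components`.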

First, let's process the texts of all documents.

In [14]:
# export
def process_papers_file_contents(papers):
    texts = []
    for paper in progress_bar(papers):
        # Full body text of the paper; built here but not used yet (see the note below).
        text = " \n ".join([paragraph["text"] for paragraph in paper._file_contents["body_text"]])
        # NB: for development speed, only the title and the abstract of each
        # document are considered for topic modelling. To include the body text
        # in other experiments, add `{text}` to the f-string below.
        texts.append(f"{paper.title} \n {paper.abstract}")
    return texts

In [15]:
docs = process_papers_file_contents(list(pageranks.keys()))

Peeking at the first 5 papers.

In [16]:
print('\n=====\n'.join(docs[:5]))

Sequence requirements for RNA strand transfer during nidovirus discontinuous subgenomic RNA synthesis 
 Nidovirus subgenomic mRNAs contain a leader sequence derived from the 5′ end of the genome fused to different sequences (‘bodies’) derived from the 3′ end. Their generation involves a unique mechanism of discontinuous subgenomic RNA synthesis that resembles copy-choice RNA recombination. During this process, the nascent RNA strand is transferred from one site in the template to another, during either plus or minus strand synthesis, to yield subgenomic RNA molecules. Central to this process are transcription-regulating sequences (TRSs), which are present at both template sites and ensure the fidelity of strand transfer. Here we present results of a comprehensive co-variation mutagenesis study of equine arteritis virus TRSs, demonstrating that discontinuous RNA synthesis depends not only on base pairing between sense leader TRS and antisense body TRS, but also on the primary sequence o

Vectors storing token occurrence counts will be used as document representations.
`tf-idf` vectors are purposefully not used: LDA is a generative model over word counts, so it expects raw term counts as input instead of reweighted frequencies.
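
To make the count representation concrete, here is a small sketch that builds the vocabulary and one count row per document by hand. The documents and the helper name `count_vectors` are hypothetical, and the rows are dense lists for readability, whereas `CountVectorizer` returns a sparse matrix.

```python
from collections import Counter


def count_vectors(tokenized_docs):
    """Build a sorted vocabulary and one dense count row per document."""
    vocabulary = sorted({tok for doc in tokenized_docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocabulary)}
    rows = []
    for doc in tokenized_docs:
        row = [0] * len(vocabulary)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        rows.append(row)
    return vocabulary, rows


# Hypothetical pre-tokenized documents.
toy_docs = [["virus", "protein", "virus"], ["patient", "virus"]]
toy_vocabulary, toy_matrix = count_vectors(toy_docs)
print(toy_vocabulary)
print(toy_matrix)
```

Each row has one entry per vocabulary term, so the matrix shape is (number of documents, vocabulary size).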

In [17]:
# export
def tokenizer(sentence):
    tokens = []
    for token in nlp(sentence, disable=["tagger", "parser", "ner"]):
        # Discard numbers, stop words, punctuation, whitespace, and single-character tokens
        if not (token.like_num or token.is_stop or token.is_punct
                or token.is_space or len(token) == 1):
            tokens.append(token.lemma_)
    return tokens
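
The filtering rules in `tokenizer` can be tried out without loading a spaCy model. The sketch below is only an approximation under stated assumptions: a regular expression replaces spaCy's tokenizer (which also drops punctuation and whitespace), `TOY_STOP_WORDS` is a made-up stop-word list, and no lemmatization is performed.

```python
import re

# Hypothetical, tiny stop-word list; spaCy ships a much larger one.
TOY_STOP_WORDS = {"the", "of", "in", "a", "and", "is", "to"}


def toy_tokenizer(sentence):
    """Keep lowercase word tokens that are not numbers, stop words, or 1 char long."""
    kept = []
    for token in re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", sentence.lower()):
        is_number = token.replace("-", "").isdigit()
        if is_number or token in TOY_STOP_WORDS or len(token) == 1:
            continue
        kept.append(token)
    return kept


filtered_tokens = toy_tokenizer("Detection of SARS-CoV in 38 samples, a study.")
print(filtered_tokens)
```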

In [18]:
count_vectorizer = CountVectorizer(tokenizer=tokenizer, lowercase=True)

In [19]:
vectorized_docs = count_vectorizer.fit_transform(docs)

A sparse matrix is built, with one row per document and one column per token.

In [20]:
vectorized_docs.shape

(44648, 164664)

In [21]:
len(count_vectorizer.vocabulary_)

164664

In [22]:
lda = LatentDirichletAllocation(n_components=10, verbose=2, n_jobs=-1)
lda = lda.fit(vectorized_docs)

[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:   46.8s remaining:   46.8s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:   49.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


iteration: 1 of max_iter: 10


[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:   32.2s remaining:   32.2s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:   34.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


iteration: 2 of max_iter: 10


[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:   27.9s remaining:   27.9s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:   30.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


iteration: 3 of max_iter: 10


[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:   34.8s remaining:   34.8s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:   40.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


iteration: 4 of max_iter: 10


[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:   21.5s remaining:   21.5s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:   24.6s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


iteration: 5 of max_iter: 10


[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed: 34.2min remaining: 34.2min
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed: 34.3min finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


iteration: 6 of max_iter: 10


[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:   38.1s remaining:   38.1s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:   40.9s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


iteration: 7 of max_iter: 10


[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:   34.7s remaining:   34.7s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:  7.5min finished


iteration: 8 of max_iter: 10


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:  1.2min remaining:  1.2min
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed: 24.6min finished


iteration: 9 of max_iter: 10


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:   47.0s remaining:   47.0s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:   49.3s finished


iteration: 10 of max_iter: 10


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:   54.8s remaining:   54.8s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:  1.0min finished


In [23]:
lda

LatentDirichletAllocation(batch_size=128, doc_topic_prior=None,
                          evaluate_every=-1, learning_decay=0.7,
                          learning_method='batch', learning_offset=10.0,
                          max_doc_update_iter=100, max_iter=10,
                          mean_change_tol=0.001, n_components=10, n_jobs=-1,
                          perp_tol=0.1, random_state=None,
                          topic_word_prior=None, total_samples=1000000.0,
                          verbose=2)

The following cells display the most relevant tokens for each identified topic.

In [24]:
# export
def print_topic_words(topic_model, vectorizer, num_words):
    feature_names = vectorizer.get_feature_names()
    for topic_idx, topic in enumerate(topic_model.components_):
        message = f"Topic {topic_idx}: " 
        message += " ".join(
            [feature_names[i] for i in topic.argsort()[:-num_words - 1:-1]])
        print(message)
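
The slice `topic.argsort()[:-num_words - 1:-1]` deserves a note: `argsort` orders indices by ascending weight, and the slice walks that ordering backwards from the end, stopping after `num_words` items. The pure-Python sketch below mirrors the same slice on made-up weights; `top_word_indices`, `toy_features`, and `toy_topic` are illustrative names.

```python
def top_word_indices(weights, num_words):
    """Indices of the num_words largest weights, largest first."""
    # Ascending order of weights, like numpy's argsort.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    return order[:-num_words - 1:-1]


# Hypothetical topic-word weights and vocabulary.
toy_features = ["cat", "protein", "virus", "patient", "model"]
toy_topic = [0.1, 2.5, 4.0, 0.7, 1.2]
top_words = [toy_features[i] for i in top_word_indices(toy_topic, 3)]
print(top_words)
```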

In [25]:
print_topic_words(lda, count_vectorizer, 20)

Topic 0: mers-cov vaccine virus coronavirus mouse middle east antibody strain syndrome high group ibv tgev effect show day study respiratory infection
Topic 1: virus sample respiratory sequence detection detect assay strain test coronavirus study infection analysis human pcr result method isolate positive gene
Topic 2: calve study diarrhea pedv pig group result high infection effect control day animal porcine air increase piglet sample herd abstract
Topic 3: health disease outbreak case model datum public epidemic covid-19 transmission pandemic infectious risk control study country china system result spread
Topic 4: patient respiratory infection clinical case pneumonia study covid-19 disease acute severe nan influenza hospital treatment de child age care symptom
Topic 5: protein bind activity structure de domain peptide membrane fusion acid protease receptor coronavirus sars-cov inhibitor et enzyme ace2 site residue
Topic 6: cat feline model influenza fip prediction result rate method

Each paper in the dataset is now classified into the previously modelled topics.

In [26]:
docs_classified = lda.transform(vectorized_docs)
docs_classified[:5]

[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed: 47.5min remaining: 47.5min
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed: 47.6min finished


array([[8.19751514e-04, 8.19851715e-04, 8.19785769e-04, 8.19772231e-04,
        8.19739964e-04, 8.19814598e-04, 8.19997258e-04, 9.92621681e-01,
        8.19767690e-04, 8.19838685e-04],
       [1.25023508e-03, 1.25015511e-03, 1.25002050e-03, 1.25010727e-03,
        1.25004174e-03, 8.04528155e-01, 1.25003536e-03, 1.85470640e-01,
        1.25031194e-03, 1.25029797e-03],
       [1.09919490e-03, 1.09924208e-03, 1.09941836e-03, 1.09920295e-03,
        4.52019223e-02, 7.13667213e-02, 1.09896932e-03, 7.16309239e-01,
        3.03368599e-02, 1.31289229e-01],
       [8.93201291e-04, 8.93060812e-04, 8.92943176e-04, 8.93036352e-04,
        8.92952591e-04, 8.69791018e-01, 3.50719363e-02, 8.88857918e-02,
        8.92976462e-04, 8.93082942e-04],
       [8.40432330e-04, 4.05902690e-02, 8.40452815e-04, 8.40620333e-04,
        8.40494481e-04, 8.40471730e-04, 8.40428417e-04, 9.52685855e-01,
        8.40433000e-04, 8.40542427e-04]])
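
Each row of this array is a probability distribution over the 10 topics, so a paper's dominant topic is the index of the largest entry. A quick check on the second row of the output above, using plain Python (the variable names are illustrative):

```python
# Second row of the `docs_classified` output shown above.
doc_topic_row = [
    1.25023508e-03, 1.25015511e-03, 1.25002050e-03, 1.25010727e-03,
    1.25004174e-03, 8.04528155e-01, 1.25003536e-03, 1.85470640e-01,
    1.25031194e-03, 1.25029797e-03,
]

# The entries of a row add up to (approximately) 1.
row_total = sum(doc_topic_row)

# The dominant topic is the index of the largest probability.
dominant_topic = max(range(len(doc_topic_row)), key=doc_topic_row.__getitem__)
print(dominant_topic, round(row_total, 6))
```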

Finally, for each identified topic, the top 5 papers belonging to it are displayed, sorted by their *PageRank*.
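
The grouping and ranking logic can be sketched on toy data first. The paper identifiers, topic assignments, scores, and the helper name `top_papers_per_topic` are all hypothetical; the real cell works on the paper objects and the `pageranks` dictionary.

```python
from collections import defaultdict


def top_papers_per_topic(dominant_topics, paper_ids, scores, k=5):
    """Group papers by dominant topic, then keep the k best-scored ones."""
    grouped = defaultdict(list)
    for paper_id, topic_id in zip(paper_ids, dominant_topics):
        grouped[topic_id].append(paper_id)
    return {
        topic_id: sorted(members, key=scores.get, reverse=True)[:k]
        for topic_id, members in sorted(grouped.items())
    }


# Hypothetical paper identifiers, topic assignments and PageRank scores.
toy_ids = ["p1", "p2", "p3", "p4"]
toy_topics = [0, 1, 0, 0]
toy_scores = {"p1": 0.10, "p2": 0.40, "p3": 0.30, "p4": 0.20}
toy_top = top_papers_per_topic(toy_topics, toy_ids, toy_scores, k=2)
print(toy_top)
```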

In [27]:
docs_topics = docs_classified.argmax(1)
topic_papers = defaultdict(list)
all_papers = list(pageranks.keys())
for idx, topic_id in enumerate(docs_topics):
    topic_papers[topic_id].append(all_papers[idx])
    
for topic_id, papers in sorted(topic_papers.items(), key=lambda t: t[0]):
    print(f"Topic ID {topic_id}")
    sorted_papers = sorted(papers, reverse=True, key=lambda p: pageranks[p])
    for paper in sorted_papers[:5]:
        paper_as_markdown(paper)
    #print("\n", end="")

Topic ID 0



- **Title:** Detection of hepatitis C virus in the nasal secretions of an intranasal drug-user
- **Authors:** McMahon, James M; Simm, Malgorzata; Milano, Danielle; Clatts, Michael
- **Publish date/time:** 2004-05-07
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: One controversial source of infection for hepatitis C virus (HCV) involves the sharing of contaminated implements, such as straws or spoons, used to nasally inhale cocaine and other powdered drugs. An essential precondition for this mode of transmission is the presence of HCV in the nasal secretions of intranasal drug users. METHODS: Blood and nasal secretion samples were collected from five plasma-positive chronic intranasal drug users and tested for HCV RNA using RT-PCR. RESULTS: HCV was detected in all five blood samples and in the nasal secretions of the subject with the highest serum viral load. CONCLUSIONS: This study is the first to demonstrate the presence of HCV in nasal secretions. This finding has implications for potential transmission of HCV through contact with contaminated nasal secretions.


- **Title:** The influence of locked nucleic acid residues on the thermodynamic properties of 2′-O-methyl RNA/RNA heteroduplexes
- **Authors:** Kierzek, Elzbieta; Ciesielska, Anna; Pasternak, Karol; Mathews, David H.; Turner, Douglas H.; Kierzek, Ryszard
- **Publish date/time:** 2005-09-09
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** The influence of locked nucleic acid (LNA) residues on the thermodynamic properties of 2′-O-methyl RNA/RNA heteroduplexes is reported. Optical melting studies indicate that LNA incorporated into an otherwise 2′-O-methyl RNA oligonucleotide usually, but not always, enhances the stabilities of complementary duplexes formed with RNA. Several trends are apparent, including: (i) a 3′ terminal U LNA and 5′ terminal LNAs are less stabilizing than interior and other 3′ terminal LNAs; (ii) most of the stability enhancement is achieved when LNA nucleotides are separated by at least one 2′-O-methyl nucleotide; and (iii) the effects of LNA substitutions are approximately additive when the LNA nucleotides are separated by at least one 2′-O-methyl nucleotide. An equation is proposed to approximate the stabilities of complementary duplexes formed with RNA when at least one 2′-O-methyl nucleotide separates LNA nucleotides. The sequence dependence of 2′-O-methyl RNA/RNA duplexes appears to be similar to that of RNA/RNA duplexes, and preliminary nearest-neighbor free energy increments at 37°C are presented for 2′-O-methyl RNA/RNA duplexes. Internal mismatches with LNA nucleotides significantly destabilize duplexes with RNA.


- **Title:** Ethnoveterinary medicines used for ruminants in British Columbia, Canada
- **Authors:** Lans, Cheryl; Turner, Nancy; Khan, Tonya; Brauer, Gerhard; Boepple, Willi
- **Publish date/time:** 2007-02-26
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: The use of medicinal plants is an option for livestock farmers who are not allowed to use allopathic drugs under certified organic programs or cannot afford to use allopathic drugs for minor health problems of livestock. METHODS: In 2003 we conducted semi-structured interviews with 60 participants obtained using a purposive sample. Medicinal plants are used to treat a range of conditions. A draft manual prepared from the data was then evaluated by participants at a participatory workshop. RESULTS: There are 128 plants used for ruminant health and diets, representing several plant families. The following plants are used for abscesses: Berberis aquifolium/Mahonia aquifolium Echinacea purpurea, Symphytum officinale, Bovista pila, Bovista plumbea, Achillea millefolium and Usnea longissima. Curcuma longa L., Salix scouleriana and Salix lucida are used for caprine arthritis and caprine arthritis encephalitis.Euphrasia officinalis and Matricaria chamomilla are used for eye problems. Wounds and injuries are treated with Bovista spp., Usnea longissima, Calendula officinalis, Arnica sp., Malva sp., Prunella vulgaris, Echinacea purpurea, Berberis aquifolium/Mahonia aquifolium, Achillea millefolium, Capsella bursa-pastoris, Hypericum perforatum, Lavandula officinalis, Symphytum officinale and Curcuma longa. Syzygium aromaticum and Pseudotsuga menziesii are used for coccidiosis. The following plants are used for diarrhea and scours: Plantago major, Calendula officinalis, Urtica dioica, Symphytum officinale, Pinus ponderosa, Potentilla pacifica, Althaea officinalis, Anethum graveolens, Salix alba and Ulmus fulva. Mastitis is treated with Achillea millefolium, Arctium lappa, Salix alba, Teucrium scorodonia and Galium aparine. Anethum graveolens and Rubus sp., are given for increased milk production.Taraxacum officinale, Zea mays, and Symphytum officinale are used for udder edema. 
Ketosis is treated with Gaultheria shallon, Vaccinium sp., and Symphytum officinale. Hedera helix and Alchemilla vulgaris are fed for retained placenta. CONCLUSION: Some of the plants showing high levels of validity were Hedera helix for retained placenta and Euphrasia officinalis for eye problems. Plants with high validity for wounds and injuries included Hypericum perforatum, Malva parviflora and Prunella vulgaris. Treatments with high validity against endoparasites included those with Juniperus communis and Pinus ponderosa. Anxiety and pain are well treated with Melissa officinalis and Nepeta caesarea.


- **Title:** Arthritis suppression by NADPH activation operates through an interferon-β pathway
- **Authors:** Olofsson, Peter; Nerstedt, Annika; Hultqvist, Malin; Nilsson, Elisabeth C; Andersson, Sofia; Bergelin, Anna; Holmdahl, Rikard
- **Publish date/time:** 2007-05-09
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: A polymorphism in the activating component of the nicotinamide adenine dinucleotide phosphate (NADPH) oxidase complex, neutrophil cytosolic factor 1 (NCF1), has previously been identified as a regulator of arthritis severity in mice and rats. This discovery resulted in a search for NADPH oxidase-activating substances as a potential new approach to treat autoimmune disorders such as rheumatoid arthritis (RA). We have recently shown that compounds inducing NCF1-dependent oxidative burst, e.g. phytol, have a strong ameliorating effect on arthritis in rats. However, the underlying molecular mechanism is still not clearly understood. The aim of this study was to use gene-expression profiling to understand the protective effect against arthritis of activation of NADPH oxidase in the immune system. RESULTS: Subcutaneous administration of phytol leads to an accumulation of the compound in the inguinal lymph nodes, with peak levels being reached approximately 10 days after administration. Hence, global gene-expression profiling on inguinal lymph nodes was performed 10 days after the induction of pristane-induced arthritis (PIA) and phytol administration. The differentially expressed genes could be divided into two pathways, consisting of genes regulated by different interferons. IFN-γ regulated the pathway associated with arthritis development, whereas IFN-β regulated the pathway associated with disease protection through phytol. Importantly, these two molecular pathways were also confirmed to differentiate between the arthritis-susceptible dark agouti (DA) rat, (with an Ncf-1(DA )allele that allows only low oxidative burst), and the arthritis-protected DA.Ncf-1(E3 )rat (with an Ncf1(E3 )allele that allows a stronger oxidative burst). CONCLUSION: Naturally occurring genetic polymorphisms in the Ncf-1 gene modulate the activity of the NADPH oxidase complex, which strongly regulates the severity of arthritis. 
We now show that the Ncf-1 allele that enhances oxidative burst and protects against arthritis is operating through an IFN-β-associated pathway, whereas the arthritis-driving allele operates through an IFN-γ-associated pathway. Treatment of arthritis-susceptible rats with an NADPH oxidase-activating substance, phytol, protects against arthritis. Interestingly, the treatment led to a restoration of the oxidative-burst effect and induction of a strikingly similar IFN-β-dependent pathway, as seen with the disease-protective Ncf1 polymorphism.


- **Title:** Rapid generation of an anthrax immunotherapeutic from goats using a novel non-toxic muramyl dipeptide adjuvant
- **Authors:** Kelly, Cassandra D; O'Loughlin, Chris; Gelder, Frank B; Peterson, Johnny W; Sower, Laurie E; Cirino, Nick M
- **Publish date/time:** 2007-10-22
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: There is a clear need for vaccines and therapeutics for potential biological weapons of mass destruction and emerging diseases. Anthrax, caused by the bacterium Bacillus anthracis, has been used as both a biological warfare agent and bioterrorist weapon previously. Although antibiotic therapy is effective in the early stages of anthrax infection, it does not have any effect once exposed individuals become symptomatic due to B. anthracis exotoxin accumulation. The bipartite exotoxins are the major contributing factors to the morbidity and mortality observed in acute anthrax infections. METHODS: Using recombinant B. anthracis protective antigen (PA83), covalently coupled to a novel non-toxic muramyl dipeptide (NT-MDP) derivative we hyper-immunized goats three times over the course of 14 weeks. Goats were plasmapheresed and the IgG fraction (not affinity purified) and F(ab')(2 )derivatives were characterized in vitro and in vivo for protection against lethal toxin mediated intoxication. RESULTS: Anti-PA83 IgG conferred 100% protection at 7.5 μg in a cell toxin neutralization assay. Mice exposed to 5 LD(50 )of Bacillus anthracis Ames spores by intranares inoculation demonstrated 60% survival 14 d post-infection when administered a single bolus dose (32 mg/kg body weight) of anti-PA83 IgG at 24 h post spore challenge. Anti-PA83 F(ab')(2 )fragments retained similar neutralization and protection levels both in vitro and in vivo. CONCLUSION: The protection afforded by these GMP-grade caprine immunotherapeutics post-exposure in the pilot murine model suggests they could be used effectively to treat post-exposure, symptomatic human anthrax patients following a bioterrorism event. These results also indicate that recombinant PA83 coupled to NT-MDP is a potent inducer of neutralizing antibodies and suggest it would be a promising vaccine candidate for anthrax. 
The ease of production, ease of covalent attachment, and immunostimulatory activity of the NT-MDP indicate it would be a superior adjuvant to alum or other traditional adjuvants in vaccine formulations.

Topic ID 1



- **Title:** Airborne rhinovirus detection and effect of ultraviolet irradiation on detection by a semi-nested RT-PCR assay
- **Authors:** Myatt, Theodore A; Johnston, Sebastian L; Rudnick, Stephen; Milton, Donald K
- **Publish date/time:** 2003-01-13
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Rhinovirus, the most common cause of upper respiratory tract infections, has been implicated in asthma exacerbations and possibly asthma deaths. Although the method of transmission of rhinoviruses is disputed, several studies have demonstrated that aerosol transmission is a likely method of transmission among adults. As a first step in studies of possible airborne rhinovirus transmission, we developed methods to detect aerosolized rhinovirus by extending existing technology for detecting infectious agents in nasal specimens. METHODS: We aerosolized rhinovirus in a small aerosol chamber. Experiments were conducted with decreasing concentrations of rhinovirus. To determine the effect of UV irradiation on detection of rhinoviral aerosols, we also conducted experiments in which we exposed aerosols to a UV dose of 684 mJ/m(2). Aerosols were collected on Teflon filters and rhinovirus recovered in Qiagen AVL buffer using the Qiagen QIAamp Viral RNA Kit (Qiagen Corp., Valencia, California) followed by semi-nested RT-PCR and detection by gel electrophoresis. RESULTS: We obtained positive results from filter samples that had collected at least 1.3 TCID(50 )of aerosolized rhinovirus. Ultraviolet irradiation of airborne virus at doses much greater than those used in upper-room UV germicidal irradiation applications did not inhibit subsequent detection with the RT-PCR assay. CONCLUSION: The air sampling and extraction methodology developed in this study should be applicable to the detection of rhinovirus and other airborne viruses in the indoor air of offices and schools. This method, however, cannot distinguish UV inactivated virus from infectious viral particles.


- **Title:** Relationship of SARS-CoV to other pathogenic RNA viruses explored by tetranucleotide usage profiling
- **Authors:** Yap, Yee Leng; Zhang, Xue Wu; Danchin, Antoine
- **Publish date/time:** 2003-09-20
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: The exact origin of the cause of the Severe Acute Respiratory Syndrome (SARS) is still an open question. The genomic sequence relationship of SARS-CoV with 30 different single-stranded RNA (ssRNA) viruses of various families was studied using two non-standard approaches. Both approaches began with the vectorial profiling of the tetra-nucleotide usage pattern V for each virus. In approach one, a distance measure of a vector V, based on correlation coefficient was devised to construct a relationship tree by the neighbor-joining algorithm. In approach two, a multivariate factor analysis was performed to derive the embedded tetra-nucleotide usage patterns. These patterns were subsequently used to classify the selected viruses. RESULTS: Both approaches yielded relationship outcomes that are consistent with the known virus classification. They also indicated that the genome of RNA viruses from the same family conform to a specific pattern of word usage. Based on the correlation of the overall tetra-nucleotide usage patterns, the Transmissible Gastroenteritis Virus (TGV) and the Feline CoronaVirus (FCoV) are closest to SARS-CoV. Surprisingly also, the RNA viruses that do not go through a DNA stage displayed a remarkable discrimination against the CpG and UpA di-nucleotide (z = -77.31, -52.48 respectively) and selection for UpG and CpA (z = 65.79,49.99 respectively). Potential factors influencing these biases are discussed. CONCLUSION: The study of genomic word usage is a powerful method to classify RNA viruses. The congruence of the relationship outcomes with the known classification indicates that there exist phylogenetic signals in the tetra-nucleotide usage patterns, that is most prominent in the replicase open reading frames.


- **Title:** Bioinformatics analysis of SARS coronavirus genome polymorphism
- **Authors:** Pavlović-Lažetić, Gordana M; Mitić, Nenad S; Beljanski, Miloš V
- **Publish date/time:** 2004-05-25
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: We have compared 38 isolates of the SARS-CoV complete genome. The main goal was twofold: first, to analyze and compare nucleotide sequences and to identify positions of single nucleotide polymorphism (SNP), insertions and deletions, and second, to group them according to sequence similarity, eventually pointing to phylogeny of SARS-CoV isolates. The comparison is based on genome polymorphism such as insertions or deletions and the number and positions of SNPs. RESULTS: The nucleotide structure of all 38 isolates is presented. Based on insertions and deletions and dissimilarity due to SNPs, the dataset of all the isolates has been qualitatively classified into three groups each having their own subgroups. These are the A-group with "regular" isolates (no insertions / deletions except for 5' and 3' ends), the B-group of isolates with "long insertions", and the C-group of isolates with "many individual" insertions and deletions. The isolate with the smallest average number of SNPs, compared to other isolates, has been identified (TWH). The density distribution of SNPs, insertions and deletions for each group or subgroup, as well as cumulatively for all the isolates is also presented, along with the gene map for TWH. Since individual SNPs may have occurred at random, positions corresponding to multiple SNPs (occurring in two or more isolates) are identified and presented. This result revises some previous results of a similar type. Amino acid changes caused by multiple SNPs are also identified (for the annotated sequences, as well as presupposed amino acid changes for non-annotated ones). Exact SNP positions for the isolates in each group or subgroup are presented. Finally, a phylogenetic tree for the SARS-CoV isolates has been produced using the CLUSTALW program, showing high compatibility with former qualitative classification. 
CONCLUSIONS: The comparative study of SARS-CoV isolates provides essential information for genome polymorphism, indication of strain differences and variants evolution. It may help with the development of effective treatment.


- **Title:** Moderate mutation rate in the SARS coronavirus genome and its implications
- **Authors:** Zhao, Zhongming; Li, Haipeng; Wu, Xiaozhuang; Zhong, Yixi; Zhang, Keqin; Zhang, Ya-Ping; Boerwinkle, Eric; Fu, Yun-Xin
- **Publish date/time:** 2004-06-28
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: The outbreak of severe acute respiratory syndrome (SARS) caused a severe global epidemic in 2003 which led to hundreds of deaths and many thousands of hospitalizations. The virus causing SARS was identified as a novel coronavirus (SARS-CoV) and multiple genomic sequences have been revealed since mid-April, 2003. After a quiet summer and fall in 2003, the newly emerged SARS cases in Asia, particularly the latest cases in China, are reinforcing a wide-spread belief that the SARS epidemic would strike back. With the understanding that SARS-CoV might be with humans for years to come, knowledge of the evolutionary mechanism of the SARS-CoV, including its mutation rate and emergence time, is fundamental to battle this deadly pathogen. To date, the speed at which the deadly virus evolved in nature and the elapsed time before it was transmitted to humans remains poorly understood. RESULTS: Sixteen complete genomic sequences with available clinical histories during the SARS outbreak were analyzed. After careful examination of multiple-sequence alignment, 114 single nucleotide variations were identified. To minimize the effects of sequencing errors and additional mutations during the cell culture, three strategies were applied to estimate the mutation rate by 1) using the closely related sequences as background controls; 2) adjusting the divergence time for cell culture; or 3) using the common variants only. The mutation rate in the SARS-CoV genome was estimated to be 0.80 – 2.38 × 10(-3 )nucleotide substitution per site per year which is in the same order of magnitude as other RNA viruses. The non-synonymous and synonymous substitution rates were estimated to be 1.16 – 3.30 × 10(-3 )and 1.67 – 4.67 × 10(-3 )per site per year, respectively. The most recent common ancestor of the 16 sequences was inferred to be present as early as the spring of 2002. 
CONCLUSIONS: The estimated mutation rates in the SARS-CoV using multiple strategies were not unusual among coronaviruses and moderate compared to those in other RNA viruses. All estimates of mutation rates led to the inference that the SARS-CoV could have been with humans in the spring of 2002 without causing a severe epidemic.


- **Title:** Base-By-Base: Single nucleotide-level analysis of whole viral genome alignments
- **Authors:** Brodie, Ryan; Smith, Alex J; Roper, Rachel L; Tcherepanov, Vasily; Upton, Chris
- **Publish date/time:** 2004-07-14
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. RESULTS: A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. CONCLUSION: Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.

Topic ID 2



- **Title:** A new paradigm in respiratory hygiene: increasing the cohesivity of airway secretions to improve cough interaction and reduce aerosol dispersion
- **Authors:** Zayas, Gustavo; Dimitry, John; Zayas, Ana; O'Brien, Darryl; King, Malcolm
- **Publish date/time:** 2005-09-02
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Infectious respiratory diseases are transmitted to non-infected subjects when an infected person expels pathogenic microorganisms to the surrounding environment when coughing or sneezing. When the airway mucus layer interacts with high-speed airflow, droplets are expelled as aerosol; their concentration and size distribution may each play an important role in disease transmission. Our goal is to reduce the aerosolizability of respiratory secretions while interfering only minimally with normal mucus clearance using agents capable of increasing crosslinking in the mucin glycoprotein network. METHODS: We exposed mucus simulants (MS) to airflow in a simulated cough machine (SCM). The MS ranged from non-viscous, non-elastic substances (water) to MS of varying degrees of viscosity and elasticity. Mucociliary clearance of the MS was assessed on the frog palate, elasticity in the Filancemeter and the aerosol pattern in a "bulls-eye" target. The sample loaded was weighed before and after each cough maneuver. We tested two mucomodulators: sodium tetraborate (XL "B") and calcium chloride (XL "C"). RESULTS: Mucociliary transport was close to normal speed in viscoelastic samples compared to non-elastic, non-viscous or viscous-only samples. Spinnability ranged from 2.5 ± 0.6 to 50.9 ± 6.9 cm, and the amount of MS expelled from the SCM increased from 47% to 96% adding 1.5 μL to 150 μL of XL "B". Concurrently, particles were inversely reduced to almost disappear from the aerosolization pattern. CONCLUSION: The aerosolizability of MS was modified by increasing its cohesivity, thereby reducing the number of particles expelled from the SCM while interfering minimally with its clearance on the frog palate. An unexpected finding is that MS crosslinking increased "expectoration".


- **Title:** The effects of injection of bovine vaccine into a human digit: a case report
- **Authors:** O'Neill, Jennifer K; Richards, Simon W; Ricketts, David M; Patterson, Marc H
- **Publish date/time:** 2005-10-11
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: The incidence of needlestick injuries in farmers and veterinary surgeons is significant and the consequences of such an injection can be serious. CASE PRESENTATION: We report accidental injection of bovine vaccine into the base of the little finger. This resulted in increased pressure in the flexor sheath causing signs and symptoms of ischemia. Amputation of the digit was required despite repeated surgical debridement and decompression. CONCLUSION: There have been previous reports of injection of oil-based vaccines into the human hand resulting in granulomatous inflammation or sterile abscess and causing morbidity and tissue loss. Self-injection with veterinary vaccines is an occupational hazard for farmers and veterinary surgeons. Injection of vaccine into a closed compartment such as the human finger can have serious sequelae including loss of the injected digit. These injuries are not to be underestimated. Early debridement and irrigation of the injected area with decompression is likely to give the best outcome. Frequent review is necessary after the first procedure because repeat operations may be required.


- **Title:** Correlation between the presence of neutralizing antibodies against porcine circovirus 2 (PCV2) and protection against replication of the virus and development of PCV2-associated disease
- **Authors:** Meerts, Peter; Misinzo, Gerald; Lefebvre, David; Nielsen, Jens; Bøtner, Anette; Kristensen, Charlotte S; Nauwynck, Hans J
- **Publish date/time:** 2006-01-30
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: In a previous study, it was demonstrated that high replication of Porcine circovirus 2 (PCV2) in a gnotobiotic pig was correlated with the absence of PCV2-neutralizing antibodies. The aim of the present study was to investigate if this correlation could also be found in SPF pigs in which PMWS was experimentally reproduced and in naturally PMWS-affected pigs. RESULTS: When looking at the total anti-PCV2 antibody titres, PMWS-affected and healthy animals seroconverted at the same time point, and titres in PMWS-affected animals were only slightly lower compared to those in healthy animals. In healthy animals, the evolution of PCV2-neutralizing antibodies coincided with that of total antibodies. In PMWS-affected animals, neutralizing antibodies could either not be found (sera from field studies) or were detected in low titres between 7 and 14 DPI only (sera from experimentally inoculated SPF pigs). Differences were also found in the evolution of specific antibody isotypes titres against PCV2. In healthy pigs, IgM antibodies persisted until the end of the study, whereas in PMWS-affected pigs they quickly decreased or remained present at low titres. The mean titres of other antibody isotypes (IgG1, IgG2 and IgA), were slightly lower in PMWS-affected pigs compared to their healthy group mates at the end of each study. CONCLUSION: This study describes important differences in the development of the humoral immune response between pigs that get subclinically infected with PCV2 and pigs that experience a high level of PCV2-replication which in 3 of 4 experiments led to the development of PMWS. These observations may contribute to a better understanding of the pathogenesis of a PCV2-infection.


- **Title:** How long do nosocomial pathogens persist on inanimate surfaces? A systematic review
- **Authors:** Kramer, Axel; Schwebke, Ingeborg; Kampf, Günter
- **Publish date/time:** 2006-08-16
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Inanimate surfaces have often been described as the source for outbreaks of nosocomial infections. The aim of this review is to summarize data on the persistence of different nosocomial pathogens on inanimate surfaces. METHODS: The literature was systematically reviewed in MedLine without language restrictions. In addition, cited articles in a report were assessed and standard textbooks on the topic were reviewed. All reports with experimental evidence on the duration of persistence of a nosocomial pathogen on any type of surface were included. RESULTS: Most gram-positive bacteria, such as Enterococcus spp. (including VRE), Staphylococcus aureus (including MRSA), or Streptococcus pyogenes, survive for months on dry surfaces. Many gram-negative species, such as Acinetobacter spp., Escherichia coli, Klebsiella spp., Pseudomonas aeruginosa, Serratia marcescens, or Shigella spp., can also survive for months. A few others, such as Bordetella pertussis, Haemophilus influenzae, Proteus vulgaris, or Vibrio cholerae, however, persist only for days. Mycobacteria, including Mycobacterium tuberculosis, and spore-forming bacteria, including Clostridium difficile, can also survive for months on surfaces. Candida albicans as the most important nosocomial fungal pathogen can survive up to 4 months on surfaces. Persistence of other yeasts, such as Torulopsis glabrata, was described to be similar (5 months) or shorter (Candida parapsilosis, 14 days). Most viruses from the respiratory tract, such as corona, coxsackie, influenza, SARS or rhino virus, can persist on surfaces for a few days. Viruses from the gastrointestinal tract, such as astrovirus, HAV, polio- or rota virus, persist for approximately 2 months. Blood-borne viruses, such as HBV or HIV, can persist for more than one week. Herpes viruses, such as CMV or HSV type 1 and 2, have been shown to persist from only a few hours up to 7 days. 
CONCLUSION: The most common nosocomial pathogens may well survive or persist on surfaces for months and can thereby be a continuous source of transmission if no regular preventive surface disinfection is performed.


- **Title:** A super-spreading ewe infects hundreds with Q fever at a farmers' market in Germany
- **Authors:** Porten, Klaudia; Rissland, Jürgen; Tigges, Almira; Broll, Susanne; Hopp, Wilfried; Lunemann, Mechthild; van Treeck, Ulrich; Kimmig, Peter; Brockmann, Stefan O; Wagner-Wiening, Christiane; Hellenbrand, Wiebke; Buchholz, Udo
- **Publish date/time:** 2006-10-06
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: In May 2003 the Soest County Health Department was informed of an unusually large number of patients hospitalized with atypical pneumonia. METHODS: In exploratory interviews patients mentioned having visited a farmers' market where a sheep had lambed. Serologic testing confirmed the diagnosis of Q fever. We asked local health departments in Germany to identify notified Q fever patients who had visited the farmers' market. To investigate risk factors for infection we conducted a case control study (cases were Q fever patients, controls were randomly selected Soest citizens) and a cohort study among vendors at the market. The sheep exhibited at the market, the herd from which it originated as well as sheep from herds held in the vicinity of Soest were tested for Coxiella burnetii (C. burnetii). RESULTS: A total of 299 reported Q fever cases was linked to this outbreak. The mean incubation period was 21 days, with an interquartile range of 16–24 days. The case control study identified close proximity to and stopping for at least a few seconds at the sheep's pen as significant risk factors. Vendors within approximately 6 meters of the sheep's pen were at increased risk for disease compared to those located farther away. Wind played no significant role. The clinical attack rate of adults and children was estimated as 20% and 3%, respectively; 25% of cases were hospitalized. The ewe that had lambed as well as 25% of its herd tested positive for C. burnetii antibodies. CONCLUSION: Due to its size and point source nature this outbreak permitted assessment of fundamental, but seldom studied epidemiological parameters. As a consequence of this outbreak, it was recommended that pregnant sheep not be displayed in public during the 3rd trimester and that animals in petting zoos be tested regularly for C. burnetii.

Topic ID 3



- **Title:** A new recruit for the army of the men of death
- **Authors:** Petsko, Gregory A
- **Publish date/time:** 2003-06-27
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** The army of the men of death, in John Bunyan's memorable phrase, has a new recruit, and fear has a new face: a face wearing a surgical mask.


- **Title:** A double epidemic model for the SARS propagation
- **Authors:** Ng, Tuen Wai; Turinici, Gabriel; Danchin, Antoine
- **Publish date/time:** 2003-09-10
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: An epidemic of a Severe Acute Respiratory Syndrome (SARS) caused by a new coronavirus has spread from the Guangdong province to the rest of China and to the world, with a puzzling contagion behavior. It is important both for predicting the future of the present outbreak and for implementing effective prophylactic measures, to identify the causes of this behavior. RESULTS: In this report, we show first that the standard Susceptible-Infected-Removed (SIR) model cannot account for the patterns observed in various regions where the disease spread. We develop a model involving two superimposed epidemics to study the recent spread of the SARS in Hong Kong and in the region. We explore the situation where these epidemics may be caused either by a virus and one or several mutants that changed its tropism, or by two unrelated viruses. This has important consequences for the future: the innocuous epidemic might still be there and generate, from time to time, variants that would have properties similar to those of SARS. CONCLUSION: We find that, in order to reconcile the existing data and the spread of the disease, it is convenient to suggest that a first milder outbreak protected against the SARS. Regions that had not seen the first epidemic, or that were affected simultaneously with the SARS suffered much more, with a very high percentage of persons affected. We also find regions where the data appear to be inconsistent, suggesting that they are incomplete or do not reflect an appropriate identification of SARS patients. Finally, we could, within the framework of the model, fix limits to the future development of the epidemic, allowing us to identify landmarks that may be useful to set up a monitoring system to follow the evolution of the epidemic. The model also suggests that there might exist a SARS precursor in a large reservoir, prompting for implementation of precautionary measures when the weather cools down.


- **Title:** A model of tripeptidyl-peptidase I (CLN2), a ubiquitous and highly conserved member of the sedolisin family of serine-carboxyl peptidases
- **Authors:** Wlodawer, Alexander; Durell, Stewart R; Li, Mi; Oyama, Hiroshi; Oda, Kohei; Dunn, Ben M
- **Publish date/time:** 2003-11-11
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Tripeptidyl-peptidase I, also known as CLN2, is a member of the family of sedolisins (serine-carboxyl peptidases). In humans, defects in expression of this enzyme lead to a fatal neurodegenerative disease, classical late-infantile neuronal ceroid lipofuscinosis. Similar enzymes have been found in the genomic sequences of several species, but neither systematic analyses of their distribution nor modeling of their structures have been previously attempted. RESULTS: We have analyzed the presence of orthologs of human CLN2 in the genomic sequences of a number of eukaryotic species. Enzymes with sequences sharing over 80% identity have been found in the genomes of macaque, mouse, rat, dog, and cow. Closely related, although clearly distinct, enzymes are present in fish (fugu and zebra), as well as in frogs (Xenopus tropicalis). A three-dimensional model of human CLN2 was built based mainly on the homology with Pseudomonas sp. 101 sedolisin. CONCLUSION: CLN2 is very highly conserved and widely distributed among higher organisms and may play an important role in their life cycles. The model presented here indicates a very open and accessible active site that is almost completely conserved among all known CLN2 enzymes. This result is somehow surprising for a tripeptidase where the presence of a more constrained binding pocket was anticipated. This structural model should be useful in the search for the physiological substrates of these enzymes and in the design of more specific inhibitors of CLN2.


- **Title:** Air pollution and case fatality of SARS in the People's Republic of China: an ecologic study
- **Authors:** Cui, Yan; Zhang, Zuo-Feng; Froines, John; Zhao, Jinkou; Wang, Hua; Yu, Shun-Zhang; Detels, Roger
- **Publish date/time:** 2003-11-20
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Severe acute respiratory syndrome (SARS) has claimed 349 lives with 5,327 probable cases reported in mainland China since November 2002. SARS case fatality has varied across geographical areas, which might be partially explained by air pollution level. METHODS: Publicly accessible data on SARS morbidity and mortality were utilized in the data analysis. Air pollution was evaluated by air pollution index (API) derived from the concentrations of particulate matter, sulfur dioxide, nitrogen dioxide, carbon monoxide and ground-level ozone. Ecologic analysis was conducted to explore the association and correlation between air pollution and SARS case fatality via model fitting. Partially ecologic studies were performed to assess the effects of long-term and short-term exposures on the risk of dying from SARS. RESULTS: Ecologic analysis conducted among 5 regions with 100 or more SARS cases showed that case fatality rate increased with the increment of API (case fatality = - 0.063 + 0.001 * API). Partially ecologic study based on short-term exposure demonstrated that SARS patients from regions with moderate APIs had an 84% increased risk of dying from SARS compared to those from regions with low APIs (RR = 1.84, 95% CI: 1.41–2.40). Similarly, SARS patients from regions with high APIs were twice as likely to die from SARS compared to those from regions with low APIs. (RR = 2.18, 95% CI: 1.31–3.65). Partially ecologic analysis based on long-term exposure to ambient air pollution showed the similar association. CONCLUSION: Our studies demonstrated a positive association between air pollution and SARS case fatality in Chinese population by utilizing publicly accessible data on SARS statistics and air pollution indices. Although ecologic fallacy and uncontrolled confounding effect might have biased the results, the possibility of a detrimental effect of air pollution on the prognosis of SARS patients deserves further investigation.


- **Title:** The Virus That Changed My World
- **Authors:** Poh Ng, Lisa Fong
- **Publish date/time:** 2003-12-22
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Personal account of a young virologist working in Singapore at the height of the 2003 SARS pandemic

Topic ID 4



- **Title:** Association of HLA class I with severe acute respiratory syndrome coronavirus infection
- **Authors:** Lin, Marie; Tseng, Hsiang-Kuang; Trejaut, Jean A; Lee, Hui-Lin; Loo, Jun-Hun; Chu, Chen-Chung; Chen, Pei-Jan; Su, Ying-Wen; Lim, Ken Hong; Tsai, Zen-Uong; Lin, Ruey-Yi; Lin, Ruey-Shiung; Huang, Chun-Hsiung
- **Publish date/time:** 2003-09-12
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: The human leukocyte antigen (HLA) system is widely used as a strategy in the search for the etiology of infectious diseases and autoimmune disorders. During the Taiwan epidemic of severe acute respiratory syndrome (SARS), many health care workers were infected. In an effort to establish a screening program for high-risk personnel, the distribution of HLA class I and II alleles in case and control groups was examined for the presence of an association to a genetic susceptibility or resistance to SARS coronavirus infection. METHODS: HLA-class I and II allele typing by PCR-SSOP was performed on 37 cases of probable SARS, 28 fever patients excluded later as probable SARS, and 101 non-infected health care workers who were exposed or possibly exposed to SARS coronavirus. An additional control set of 190 normal healthy unrelated Taiwanese was also used in the analysis. RESULTS: Woolf and Haldane Odds ratio (OR) and corrected P-value (Pc) obtained from two-tailed Fisher exact test were used to show susceptibility of HLA class I or class II alleles with coronavirus infection. At first, when analyzing infected SARS patients and high risk health care workers groups, HLA-B*4601 (OR = 2.08, P = 0.04, Pc = n.s.) and HLA-B*5401 (OR = 5.44, P = 0.02, Pc = n.s.) appeared as the most probable elements that may be favoring SARS coronavirus infection. After selecting only a "severe cases" patient group from the infected "probable SARS" patient group and comparing them with the high risk health care workers group, the severity of SARS was shown to be significantly associated with HLA-B*4601 (P = 0.0008 or Pc = 0.0279). CONCLUSIONS: Densely populated regions with genetically related southern Asian populations appear to be more affected by the spreading of SARS infection.
Up until recently, no probable SARS patients were reported among Taiwan indigenous peoples who are genetically distinct from the Taiwanese general population, have no HLA-B*4601 and have high frequency of HLA-B*1301. While increase of HLA-B*4601 allele frequency was observed in the "Probable SARS infected" patient group, a further significant increase of the allele was seen in the "Severe cases" patient group. These results appeared to indicate association of HLA-B*4601 with the severity of SARS infection in Asian populations. Independent studies are needed to test these results.


- **Title:** Pro/con clinical debate: Isolation precautions for all intensive care unit patients with methicillin-resistant Staphylococcus aureus colonization are essential
- **Authors:** Farr, Barry M; Bellingan, Geoffrey
- **Publish date/time:** 2004-02-19
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Antibiotic-resistant bacteria are an increasingly common problem in intensive care units (ICUs), and they are capable of impacting on patient outcome, the ICU's budget and bed availability. This issue, coupled with recent outbreaks of illnesses that pose a direct risk to ICU staff (such as SARS [severe acute respiratory syndrome]), has led to renewed emphasis on infection control measures and practitioners in the ICU. Infection control measures frequently cause clinicians to practice in a more time consuming way. As a result it is not surprising that ensuring compliance with these measures is not always easy, particularly when their benefit is not immediately obvious. In this issue of Critical Care, two experts face off over the need to isolate patients infected with methicillin-resistant Staphylococcus aureus.


- **Title:** Dynamic changes of serum SARS-Coronavirus IgG, pulmonary function and radiography in patients recovering from SARS after hospital discharge
- **Authors:** Xie, Lixin; Liu, Youning; Fan, Baoxing; Xiao, Yueyong; Tian, Qing; Chen, Liangan; Zhao, Hong; Chen, Weijun
- **Publish date/time:** 2005-01-08
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** OBJECTIVE: The intent of this study was to examine the recovery of individuals who had been hospitalized for severe acute respiratory syndrome (SARS) in the year following their discharge from the hospital. Parameters studied included serum levels of SARS coronavirus (SARS-CoV) IgG antibody, tests of lung function, and imaging data to evaluate changes in lung fibrosis. In addition, we explored the incidence of femoral head necrosis in some of the individuals recovering from SARS. METHODS: The subjects of this study were 383 clinically diagnosed SARS patients in Beijing, China. They were tested regularly for serum levels of SARS-CoV IgG antibody and lung function and were given chest X-rays and/or high resolution computerized tomography (HRCT) examinations at the Chinese PLA General Hospital during the 12 months that followed their release from the hospital. Those individuals who were found to have lung diffusion abnormities (transfer coefficient for carbon monoxide [D(L)CO] < 80% of predicted value [pred]) received regular lung function tests and HRCT examinations in the follow-up phase in order to document the changes in their lung condition. Some patients who complained of joint pain were given magnetic resonance imaging (MRI) examinations of their femoral heads. FINDINGS: Of all the subjects, 81.2% (311 of 383 patients) tested positive for serum SARS-CoV IgG. Of those testing positive, 27.3% (85 of 311 patients) were suffering from lung diffusion abnormities (D(L)CO < 80% pred) and 21.5% (67 of 311 patients) exhibited lung fibrotic changes. In the 12 month duration of this study, all of the 40 patients with lung diffusion abnormities who were examined exhibited some improvement of lung function and fibrosis detected by radiography. Of the individuals receiving MRI examinations, 23.1% (18 of 78 patients) showed signs of femoral head necrosis. 
INTERPRETATION: The lack of sero-positive SARS-CoV in some individuals suggests that there may have been some misdiagnosed cases among the subjects included in this study. Of those testing positive, the serum levels of SARS-CoV IgG antibody decreased significantly during the 12 months after hospital discharge. Additionally, we found that the individuals who had lung fibrosis showed some spontaneous recovery. Finally, some of the subjects developed femoral head necrosis.


- **Title:** Accuracy of parents in measuring body temperature with a tympanic thermometer
- **Authors:** Robinson, Joan L; Jou, Hsing; Spady, Donald W
- **Publish date/time:** 2005-01-11
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: It is now common for parents to measure tympanic temperatures in children. The objective of this study was to assess the diagnostic accuracy of these measurements. METHODS: Parents and then nurses measured the temperature of 60 children with a tympanic thermometer designed for home use (home thermometer). The reference standard was a temperature measured by a nurse with a model of tympanic thermometer commonly used in hospitals (hospital thermometer). A difference of ≥ 0.5 °C was considered clinically significant. A fever was defined as a temperature ≥ 38.5 °C. RESULTS: The mean absolute difference between the readings done by the parent and the nurse with the home thermometer was 0.44 ± 0.61 °C, and 33% of the readings differed by ≥ 0.5 °C. The mean absolute difference between the readings done by the parent with the home thermometer and the nurse with the hospital thermometer was 0.51 ± 0.63 °C, and 72% of the readings differed by ≥ 0.5 °C. Using the home thermometer, parents detected fever with a sensitivity of 76% (95% CI 50–93%), a specificity of 95% (95% CI 84–99%), a positive predictive value of 87% (95% CI 60–98%), and a negative predictive value of 91% (95% CI 79–98%). In comparing the readings the nurse obtained from the two different tympanic thermometers, the mean absolute difference was 0.24 ± 0.22 °C. Nurses detected fever with a sensitivity of 94% (95% CI 71–100%), a specificity of 88% (95% CI 75–96%), a positive predictive value of 76% (95% CI 53–92%), and a negative predictive value of 97% (95% CI 87–100%) using the home thermometer. The intraclass correlation coefficient for the three sets of readings was 0.80, and the consistency of readings was not affected by the body temperature.
CONCLUSIONS: The readings done by parents with a tympanic thermometer designed for home use differed a clinically significant amount from the reference standard (readings done by nurses with a model of tympanic thermometer commonly used in hospitals) the majority of the time, and parents failed to detect fever about one-quarter of the time. Tympanic readings reported by parents should be interpreted with great caution.


- **Title:** The immediate effects of the severe acute respiratory syndrome (SARS) epidemic on childbirth in Taiwan
- **Authors:** Lee, Cheng-Hua; Huang, Nicole; Chang, Hong-Jen; Hsu, Yea-Jen; Wang, Mei-Chu; Chou, Yiing-Jenq
- **Publish date/time:** 2005-04-04
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: When an emerging infectious disease like severe acute respiratory syndrome (SARS) strikes suddenly, many wonder whether the public's overwhelming fears of SARS may have deterred patients from seeking routine care from hospitals and/or interrupted patients' continuity of care. In this study, we sought to estimate the influence of pregnant women's fears of severe acute respiratory syndrome (SARS) on their choice of provider, mode of childbirth, and length of stay (LOS) for the delivery during and after the SARS epidemic in Taiwan. METHODS: The National Health Insurance data from January 01, 2002 to December 31, 2003 were used. A population-based descriptive analysis was conducted to assess the changes in volume, market share, cesarean rate, and average LOS for each of the 4 provider levels, before, during and after the SARS epidemic. RESULTS: Compared to the pre-SARS period, medical centers and regional hospitals dropped 5.2% and 4.1% in market share for childbirth services during the peak SARS period, while district hospitals and clinics increased 2.1% and 7.1%, respectively. For changes in cesarean rates, only a significantly larger increase was observed in medical centers (2.2%) during the peak SARS period. In terms of LOS, significant reductions in average LOS were observed in all hospital levels except for clinics. Average LOS was shortened by 0.21 days in medical centers (5.6%), 0.21 days in regional hospitals (5.8%), and 0.13 days in district hospitals (3.8%). CONCLUSION: The large number of patients shifting from the maternity wards of more advanced hospitals to those of less advanced hospitals, coupled with the substantial reduction in their length of maternity stay due to their fears of SARS, could also lead to serious concerns for quality of care, especially regarding a patient's accessibility to quality providers and continuity of care.

Topic ID 5



- **Title:** Crystal structure of murine sCEACAM1a[1,4]: a coronavirus receptor in the CEA family
- **Authors:** Tan, Kemin; Zelus, Bruce D.; Meijers, Rob; Liu, Jin-huan; Bergelson, Jeffrey M.; Duke, Norma; Zhang, Rongguang; Joachimiak, Andrzej; Holmes, Kathryn V.; Wang, Jia-huai
- **Publish date/time:** 2002-05-01
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** CEACAM1 is a member of the carcinoembryonic antigen (CEA) family. Isoforms of murine CEACAM1 serve as receptors for mouse hepatitis virus (MHV), a murine coronavirus. Here we report the crystal structure of soluble murine sCEACAM1a[1,4], which is composed of two Ig-like domains and has MHV neutralizing activity. Its N-terminal domain has a uniquely folded CC′ loop that encompasses key virus-binding residues. This is the first atomic structure of any member of the CEA family, and provides a prototypic architecture for functional exploration of CEA family members. We discuss the structural basis of virus receptor activities of murine CEACAM1 proteins, binding of Neisseria to human CEACAM1, and other homophilic and heterophilic interactions of CEA family members.


- **Title:** Structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra α-helical domain
- **Authors:** Anand, Kanchan; Palm, Gottfried J.; Mesters, Jeroen R.; Siddell, Stuart G.; Ziebuhr, John; Hilgenfeld, Rolf
- **Publish date/time:** 2002-07-01
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** The key enzyme in coronavirus polyprotein processing is the viral main proteinase, M(pro), a protein with extremely low sequence similarity to other viral and cellular proteinases. Here, the crystal structure of the 33.1 kDa transmissible gastroenteritis (corona)virus M(pro) is reported. The structure was refined to 1.96 Å resolution and revealed three dimers in the asymmetric unit. The mutual arrangement of the protomers in each of the dimers suggests that M(pro) self-processing occurs in trans. The active site, comprised of Cys144 and His41, is part of a chymotrypsin-like fold that is connected by a 16 residue loop to an extra domain featuring a novel α-helical fold. Molecular modelling and mutagenesis data implicate the loop in substrate binding and elucidate S1 and S2 subsites suitable to accommodate the side chains of the P1 glutamine and P2 leucine residues of M(pro) substrates. Interactions involving the N-terminus and the α-helical domain stabilize the loop in the orientation required for trans-cleavage activity. The study illustrates that RNA viruses have evolved unprecedented variations of the classical chymotrypsin fold.


- **Title:** Cloaked similarity between HIV-1 and SARS-CoV suggests an anti-SARS strategy
- **Authors:** Kliger, Yossef; Levanon, Erez Y
- **Publish date/time:** 2003-09-21
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Severe acute respiratory syndrome (SARS) is a febrile respiratory illness. The disease has been etiologically linked to a novel coronavirus that has been named the SARS-associated coronavirus (SARS-CoV), whose genome was recently sequenced. Since it is a member of the Coronaviridae, its spike protein (S2) is believed to play a central role in viral entry by facilitating fusion between the viral and host cell membranes. The protein responsible for viral-induced membrane fusion of HIV-1 (gp41) differs in length, and has no sequence homology with S2. RESULTS: Sequence analysis reveals that the two viral proteins share the sequence motifs that construct their active conformation. These include (1) an N-terminal leucine/isoleucine zipper-like sequence, and (2) a C-terminal heptad repeat located upstream of (3) an aromatic residue-rich region juxtaposed to the (4) transmembrane segment. CONCLUSIONS: This study points to a similar mode of action for the two viral proteins, suggesting that anti-viral strategy that targets the viral-induced membrane fusion step can be adopted from HIV-1 to SARS-CoV. Recently the FDA approved Enfuvirtide, a synthetic peptide corresponding to the C-terminal heptad repeat of HIV-1 gp41, as an anti-AIDS agent. Enfuvirtide and C34, another anti HIV-1 peptide, exert their inhibitory activity by binding to a leucine/isoleucine zipper-like sequence in gp41, thus inhibiting a conformational change of gp41 required for its activation. We suggest that peptides corresponding to the C-terminal heptad repeat of the S2 protein may serve as inhibitors for SARS-CoV entry.


- **Title:** Coronavirus 3CL(pro) proteinase cleavage sites: Possible relevance to SARS virus pathology
- **Authors:** Kiemer, Lars; Lund, Ole; Brunak, Søren; Blom, Nikolaj
- **Publish date/time:** 2004-06-06
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Despite the passing of more than a year since the first outbreak of Severe Acute Respiratory Syndrome (SARS), efficient counter-measures are still few and many believe that reappearance of SARS, or a similar disease caused by a coronavirus, is not unlikely. For other virus families like the picornaviruses it is known that pathology is related to proteolytic cleavage of host proteins by viral proteinases. Furthermore, several studies indicate that virus proliferation can be arrested using specific proteinase inhibitors supporting the belief that proteinases are indeed important during infection. Prompted by this, we set out to analyse and predict cleavage by the coronavirus main proteinase using computational methods. RESULTS: We retrieved sequence data on seven fully sequenced coronaviruses and identified the main 3CL proteinase cleavage sites in polyproteins using alignments. A neural network was trained to recognise the cleavage sites in the genomes obtaining a sensitivity of 87.0% and a specificity of 99.0%. Several proteins known to be cleaved by other viruses were submitted to prediction as well as proteins suspected relevant in coronavirus pathology. Cleavage sites were predicted in proteins such as the cystic fibrosis transmembrane conductance regulator (CFTR), transcription factors CREB-RP and OCT-1, and components of the ubiquitin pathway. CONCLUSIONS: Our prediction method NetCorona predicts coronavirus cleavage sites with high specificity and several potential cleavage candidates were identified which might be important to elucidate coronavirus pathology. Furthermore, the method might assist in design of proteinase inhibitors for treatment of SARS and possible future diseases caused by coronaviruses. It is made available for public use at our website: .


- **Title:** Proteomics computational analyses suggest that the carboxyl terminal glycoproteins of Bunyaviruses are class II viral fusion protein (beta-penetrenes)
- **Authors:** Garry, Courtney E; Garry, Robert F
- **Publish date/time:** 2004-11-15
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** The Bunyaviridae family of enveloped RNA viruses includes five genuses, orthobunyaviruses, hantaviruses, phleboviruses, nairoviruses and tospoviruses. It has not been determined which Bunyavirus protein mediates virion:cell membrane fusion. Class II viral fusion proteins (beta-penetrenes), encoded by members of the Alphaviridae and Flaviviridae, are comprised of three antiparallel beta sheet domains with an internal fusion peptide located at the end of domain II. Proteomics computational analyses indicate that the carboxyl terminal glycoprotein (Gc) encoded by Sandfly fever virus (SAN), a phlebovirus, has a significant amino acid sequence similarity with envelope protein 1 (E1), the class II fusion protein of Sindbis virus (SIN), an Alphavirus. Similar sequences and common structural/functional motifs, including domains with a high propensity to interface with bilayer membranes, are located collinearly in SAN Gc and SIN E1. Gc encoded by members of each Bunyavirus genus share several sequence and structural motifs. These results suggest that Gc of Bunyaviridae, and similar proteins of Tenuiviruses and a group of Caenorhabditis elegans retroviruses, are class II viral fusion proteins. Comparisons of divergent viral fusion proteins can reveal features essential for virion:cell fusion, and suggest drug and vaccine strategies.

Topic ID 6



- **Title:** Orientation determination by wavelets matching for 3D reconstruction of very noisy electron microscopic virus images
- **Authors:** Saad, Ali Samir
- **Publish date/time:** 2005-03-02
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: In order to perform a 3D reconstruction of electron microscopic images of viruses, it is necessary to determine the orientation (Euler angels) of the 2D projections of the virus. The projections containing high resolution information are usually very noisy. This paper proposes a new method, based on weighted-projection matching in wavelet space for virus orientation determination. In order to speed the retrieval of the best match between projections from a model and real virus particle, a hierarchical correlation matching method is also proposed. RESULTS: A data set of 600 HSV-1 capsid particle images in different orientations was used to test the proposed method. An initial model of about 40 Å resolutions was used to generate projections of an HSV-1 capsid. Results show that a significant improvement, in terms of accuracy and speed, is obtained for the initial orientation estimates of noisy herpes virus images. For the bacteriophage (P22), the proposed method gave the correct reconstruction compared to the model, while the classical method failed to resolve the correct orientations of the smooth spherical P22 viruses. CONCLUSION: This paper introduces a new method for orientation determination of low contrast images and highly noisy virus particles. This method is based on weighted projection matching in wavelet space, which increases the accuracy of the orientations. A hierarchical implementation of this method increases the speed of orientation determination. The estimated number of particles needed for a higher resolution reconstruction increased exponentially. For a 6 Å resolution reconstruction of the HSV virus, 50,000 particles are necessary. The results show that the proposed method reduces the amount of data needed in a reconstruction by at least 50 %. This may result in savings 2 to 3 man-years invested in acquiring images from the microscope and data processing. 
  Furthermore, the proposed method is able to determine orientations for some difficult particles like P22 with accuracy and consistency. Recently a low PH sindbis capsid was determined with the proposed method, where other methods based on the common line fail.


- **Title:** Incorporating indel information into phylogeny estimation for rapidly emerging pathogens
- **Authors:** Redelings, Benjamin D; Suchard, Marc A
- **Publish date/time:** 2007-03-14
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Phylogenies of rapidly evolving pathogens can be difficult to resolve because of the small number of substitutions that accumulate in the short times since divergence. To improve resolution of such phylogenies we propose using insertion and deletion (indel) information in addition to substitution information. We accomplish this through joint estimation of alignment and phylogeny in a Bayesian framework, drawing inference using Markov chain Monte Carlo. Joint estimation of alignment and phylogeny sidesteps biases that stem from conditioning on a single alignment by taking into account the ensemble of near-optimal alignments. RESULTS: We introduce a novel Markov chain transition kernel that improves computational efficiency by proposing non-local topology rearrangements and by block sampling alignment and topology parameters. In addition, we extend our previous indel model to increase biological realism by placing indels preferentially on longer branches. We demonstrate the ability of indel information to increase phylogenetic resolution in examples drawn from within-host viral sequence samples. We also demonstrate the importance of taking alignment uncertainty into account when using such information. Finally, we show that codon-based substitution models can significantly affect alignment quality and phylogenetic inference by unrealistically forcing indels to begin and end between codons. CONCLUSION: These results indicate that indel information can improve phylogenetic resolution of recently diverged pathogens and that alignment uncertainty should be considered in such analyses.


- **Title:** Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments
- **Authors:** Seemann, Stefan E.; Gorodkin, Jan; Backofen, Rolf
- **Publish date/time:** 2008-10-04
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Computational methods for determining the secondary structure of RNA sequences from given alignments are currently either based on thermodynamic folding, compensatory base pair substitutions or both. However, there is currently no approach that combines both sources of information in a single optimization problem. Here, we present a model that formally integrates both the energy-based and evolution-based approaches to predict the folding of multiple aligned RNA sequences. We have implemented an extended version of Pfold that identifies base pairs that have high probabilities of being conserved and of being energetically favorable. The consensus structure is predicted using a maximum expected accuracy scoring scheme to smoothen the effect of incorrectly predicted base pairs. Parameter tuning revealed that the probability of base pairing has a higher impact on the RNA structure prediction than the corresponding probability of being single stranded. Furthermore, we found that structurally conserved RNA motifs are mostly supported by folding energies. Other problems (e.g. RNA-folding kinetics) may also benefit from employing the principles of the model we introduce. Our implementation, PETfold, was tested on a set of 46 well-curated Rfam families and its performance compared favorably to that of Pfold and RNAalifold.


- **Title:** Predicting linear B-cell epitopes using string kernels
- **Authors:** EL-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
- **Publish date/time:** 2008-07-01
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** The identification and characterization of B-cell epitopes play an important role in vaccine design, immunodiagnostic tests, and antibody production. Therefore, computational tools for reliably predicting linear B-cell epitopes are highly desirable. We evaluated Support Vector Machine (SVM) classifiers trained utilizing five different kernel methods using fivefold cross-validation on a homology-reduced data set of 701 linear B-cell epitopes, extracted from Bcipep database, and 701 non-epitopes, randomly extracted from SwissProt sequences. Based on the results of our computational experiments, we propose BCPred, a novel method for predicting linear B-cell epitopes using the subsequence kernel. We show that the predictive performance of BCPred (AUC = 0.758) outperforms 11 SVM-based classifiers developed and evaluated in our experiments as well as our implementation of AAP (AUC = 0.7), a recently proposed method for predicting linear B-cell epitopes using amino acid pair antigenicity. Furthermore, we compared BCPred with AAP and ABCPred, a method that uses recurrent neural networks, using two data sets of unique B-cell epitopes that had been previously used to evaluate ABCPred. Analysis of the data sets used and the results of this comparison show that conclusions about the relative performance of different B-cell epitope prediction methods drawn on the basis of experiments using data sets of unique B-cell epitopes are likely to yield overly optimistic estimates of performance of evaluated methods. This argues for the use of carefully homology-reduced data sets in comparing B-cell epitope prediction methods to avoid misleading conclusions about how different methods compare to each other. Our homology-reduced data set and implementations of BCPred as well as the APP method are publicly available through our web-based server, BCPREDS, at: http://ailab.cs.iastate.edu/bcpreds/.


- **Title:** Identification of protein functions using a machine-learning approach based on sequence-derived properties
- **Authors:** Lee, Bum Ju; Shin, Moon Sun; Oh, Young Joon; Oh, Hae Seok; Ryu, Keun Ho
- **Publish date/time:** 2009-08-09
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities. RESULTS: A highly accurate prediction method capable of identifying protein function, based solely on protein sequence properties, is described. This method analyses and identifies specific features of the protein sequence that are highly correlated with certain protein functions and determines the combination of protein sequence features that best characterises protein function. Thirty-three features that represent subtle differences in local regions and full regions of the protein sequences were introduced. On the basis of 484 features extracted solely from the protein sequence, models were built to predict the functions of 11 different proteins from a broad range of cellular components, molecular functions, and biological processes. The accuracy of protein function prediction using random forests with feature selection ranged from 94.23% to 100%. The local sequence information was found to have a broad range of applicability in predicting protein function. CONCLUSION: We present an accurate prediction method using a machine-learning approach based solely on protein sequence properties. The primary contribution of this paper is to propose new PNPRD features representing global and/or local differences in sequences, based on positively and/or negatively charged residues, to assist in predicting protein function. In addition, we identified a compact and useful feature subset for predicting the function of various proteins. 
  Our results indicate that sequence-based classifiers can provide good results among a broad range of proteins, that the proposed features are useful in predicting several functions, and that the combination of our and traditional features may support the creation of a discriminative feature set for specific protein functions.

Topic ID 7



- **Title:** Sequence requirements for RNA strand transfer during nidovirus discontinuous subgenomic RNA synthesis
- **Authors:** Pasternak, Alexander O.; van den Born, Erwin; Spaan, Willy J.M.; Snijder, Eric J.
- **Publish date/time:** 2001-12-17
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Nidovirus subgenomic mRNAs contain a leader sequence derived from the 5′ end of the genome fused to different sequences (‘bodies’) derived from the 3′ end. Their generation involves a unique mechanism of discontinuous subgenomic RNA synthesis that resembles copy-choice RNA recombination. During this process, the nascent RNA strand is transferred from one site in the template to another, during either plus or minus strand synthesis, to yield subgenomic RNA molecules. Central to this process are transcription-regulating sequences (TRSs), which are present at both template sites and ensure the fidelity of strand transfer. Here we present results of a comprehensive co-variation mutagenesis study of equine arteritis virus TRSs, demonstrating that discontinuous RNA synthesis depends not only on base pairing between sense leader TRS and antisense body TRS, but also on the primary sequence of the body TRS. While the leader TRS merely plays a targeting role for strand transfer, the body TRS fulfils multiple functions. The sequences of mRNA leader–body junctions of TRS mutants strongly suggested that the discontinuous step occurs during minus strand synthesis.


- **Title:** Synthesis of a novel hepatitis C virus protein by ribosomal frameshift
- **Authors:** Xu, Zhenming; Choi, Jinah; Yen, T.S.Benedict; Lu, Wen; Strohecker, Anne; Govindarajan, Sugantha; Chien, David; Selby, Mark J.; Ou, Jing‐hsiung
- **Publish date/time:** 2001-07-16
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Hepatitis C virus (HCV) is an important human pathogen that affects ∼100 million people worldwide. Its RNA genome codes for a polyprotein, which is cleaved by viral and cellular proteases to produce at least 10 mature viral protein products. We report here the discovery of a novel HCV protein synthesized by ribosomal frameshift. This protein, which we named the F protein, is synthesized from the initiation codon of the polyprotein sequence followed by ribosomal frameshift into the −2/+1 reading frame. This ribosomal frameshift requires only codons 8–14 of the core protein‐coding sequence, and the shift junction is located at or near codon 11. An F protein analog synthesized in vitro reacted with the sera of HCV patients but not with the sera of hepatitis B patients, indicating the expression of the F protein during natural HCV infection. This unexpected finding may open new avenues for the development of anti‐HCV drugs.


- **Title:** Discontinuous and non-discontinuous subgenomic RNA transcription in a nidovirus
- **Authors:** van Vliet, A.L.W.; Smits, S.L.; Rottier, P.J.M.; de Groot, R.J.
- **Publish date/time:** 2002-12-01
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Arteri-, corona-, toro- and roniviruses are evolutionarily related positive-strand RNA viruses, united in the order Nidovirales. The best studied nidoviruses, the corona- and arteriviruses, employ a unique transcription mechanism, which involves discontinuous RNA synthesis, a process resembling similarity-assisted copy-choice RNA recombination. During infection, multiple subgenomic (sg) mRNAs are transcribed from a mirror set of sg negative-strand RNA templates. The sg mRNAs all possess a short 5′ common leader sequence, derived from the 5′ end of the genomic RNA. The joining of the non-contiguous ‘leader’ and ‘body’ sequences presumably occurs during minus-strand synthesis. To study whether toroviruses use a similar transcription mechanism, we characterized the 5′ termini of the genome and the four sg mRNAs of Berne virus (BEV). We show that BEV mRNAs 3–5 lack a leader sequence. Surprisingly, however, RNA 2 does contain a leader, identical to the 5′-terminal 18 residues of the genome. Apparently, BEV combines discontinuous and non-discontinous RNA synthesis to produce its sg mRNAs. Our findings have important implications for the understanding of the mechanism and evolution of nidovirus transcription.


- **Title:** Prokaryotic-style frameshifting in a plant translation system: conservation of an unusual single-tRNA slippage event
- **Authors:** Napthine, Sawsan; Vidakovic, Marijana; Girnary, Roseanne; Namy, Olivier; Brierley, Ian
- **Publish date/time:** 2003-08-01
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Ribosomal frameshifting signals are found in mobile genetic elements, viruses and cellular genes of prokaryotes and eukaryotes. Typically they comprise a slippery sequence, X XXY YYZ, where the frameshift occurs, and a stimulatory mRNA element. Here we studied the influence of host translational environment and the identity of slippery sequence-decoding tRNAs on the frameshift mechanism. By expressing candidate signals in Escherichia coli, and in wheatgerm extracts depleted of endogenous tRNAs and supplemented with prokaryotic or eukaryotic tRNA populations, we show that when decoding AAG in the ribosomal A-site, E.coli tRNA(Lys) promotes a highly unusual single-tRNA slippage event in both prokaryotic and eukaryotic ribosomes. This event does not appear to require slippage of the adjacent P-site tRNA, although its identity is influential. Conversely, asparaginyl-tRNA promoted a dual slippage event in either system. Thus, the tRNAs themselves are the main determinants in the selection of single- or dual-tRNA slippage mechanisms. We also show for the first time that prokaryotic tRNA(Asn) is not inherently ‘unslippery’ and induces efficient frameshifting when in the context of a eukaryotic translation system.


- **Title:** SseG, a virulence protein that targets Salmonella to the Golgi network
- **Authors:** Salcedo, Suzana P.; Holden, David W.
- **Publish date/time:** 2003-10-01
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Intracellular replication of the bacterial pathogen Salmonella enterica occurs in membrane-bound compartments called Salmonella-containing vacuoles (SCVs). Maturation of the SCV has been shown to occur by selective interactions with the endocytic pathway. We show here that after invasion of epithelial cells and migration to a perinuclear location, the majority of SCVs become surrounded by membranes of the Golgi network. This process is dependent on the Salmonella pathogenicity island 2 type III secretion system effector SseG. In infected cells, SseG was associated with the SCV and peripheral punctate structures. Only bacterial cells closely associated with the Golgi network were able to multiply; furthermore, mutation of sseG or disruption of the Golgi network inhibited intracellular bacterial growth. When expressed in epithelial cells, SseG co-localized extensively with markers of the trans-Golgi network. We identify a Golgi-targeting domain within SseG, and other regions of the protein that are required for localization of bacteria to the Golgi network. Therefore, replication of Salmonella in epithelial cells is dependent on simultaneous and selective interactions with both endocytic and secretory pathways.

Topic ID 8



- **Title:** The involvement of survival signaling pathways in rubella-virus induced apoptosis
- **Authors:** Cooray, Samantha; Jin, Li; Best, Jennifer M
- **Publish date/time:** 2005-01-04
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Rubella virus (RV) causes severe congenital defects when acquired during the first trimester of pregnancy. RV cytopathic effect has been shown to be due to caspase-dependent apoptosis in a number of susceptible cell lines, and it has been suggested that this apoptotic induction could be a causal factor in the development of such defects. Often the outcome of apoptotic stimuli is dependent on apoptotic, proliferative and survival signaling mechanisms in the cell. Therefore we investigated the role of phosphoinositide 3-kinase (PI3K)-Akt survival signaling and Ras-Raf-MEK-ERK proliferative signaling during RV-induced apoptosis in RK13 cells. Increasing levels of phosphorylated ERK, Akt and GSK3β were detected from 24–96 hours post-infection, concomitant with RV-induced apoptotic signals. Inhibition of PI3K-Akt signaling reduced cell viability, and increased the speed and magnitude of RV-induced apoptosis, suggesting that this pathway contributes to cell survival during RV infection. In contrast, inhibition of the Ras-Raf-MEK-ERK pathway impaired RV replication and growth and reduced RV-induced apoptosis, suggesting that the normal cellular growth is required for efficient virus production.


- **Title:** Recombinant Tula hantavirus shows reduced fitness but is able to survive in the presence of a parental virus: analysis of consecutive passages in a cell culture
- **Authors:** Plyusnina, Angelina; Plyusnin, Alexander
- **Publish date/time:** 2005-02-22
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Tula hantavirus carrying recombinant S RNA segment (recTULV) grew in a cell culture to the same titers as the original cell adapted variant but presented no real match to the parental virus. Our data showed that the lower competitiveness of recTULV could not be increased by pre-passaging in the cell culture. Nevertheless, the recombinant virus was able to survive in the presence of the parental virus during five consecutive passages. The observed survival time seems to be sufficient for transmission of newly formed recombinant hantaviruses in nature.


- **Title:** CXCR2 is critical for dsRNA-induced lung injury: relevance to viral lung infection
- **Authors:** Londhe, Vedang A; Belperio, John A; Keane, Michael P; Burdick, Marie D; Xue, Ying Ying; Strieter, Robert M
- **Publish date/time:** 2005-05-28
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: Respiratory viral infections are characterized by the infiltration of leukocytes, including activated neutrophils into the lung that can lead to sustained lung injury and potentially contribute to chronic lung disease. Specific mechanisms recruiting neutrophils to the lung during virus-induced lung inflammation and injury have not been fully elucidated. Since CXCL1 and CXCL2/3, acting through CXCR2, are potent neutrophil chemoattractants, we investigated their role in dsRNA-induced lung injury, where dsRNA (Poly IC) is a well-described synthetic agent mimicking acute viral infection. METHODS: We used 6–8 week old female BALB/c mice to intratracheally inject either single-stranded (ssRNA) or double-stranded RNA (dsRNA) into the airways. The lungs were then harvested at designated timepoints to characterize the elicited chemokine response and resultant lung injury following dsRNA exposure as demonstrated qualititatively by histopathologic analysis, and quantitatively by FACS, protein, and mRNA analysis of BAL fluid and tissue samples. We then repeated the experiments by first pretreating mice with an anti-PMN or corresponding control antibody, and then subsequently pretreating a separate cohort of mice with an anti-CXCR2 or corresponding control antibody prior to dsRNA exposure. RESULTS: Intratracheal dsRNA led to significant increases in neutrophil infiltration and lung injury in BALB/c mice at 72 h following dsRNA, but not in response to ssRNA (Poly C; control) treatment. Expression of CXCR2 ligands and CXCR2 paralleled neutrophil recruitment to the lung. Neutrophil depletion studies significantly reduced neutrophil infiltration and lung injury in response to dsRNA when mice were pretreated with an anti-PMN monoclonal Ab. Furthermore, inhibition of CXCR2 ligands/CXCR2 interaction by pretreating dsRNA-exposed mice with an anti-CXCR2 neutralizing Ab also significantly attenuated neutrophil sequestration and lung injury. 
  CONCLUSION: These findings demonstrate that CXC chemokine ligand/CXCR2 biological axis is critical during the pathogenesis of dsRNA-induced lung injury relevant to acute viral infections.


- **Title:** Persistence of lung inflammation and lung cytokines with high-resolution CT abnormalities during recovery from SARS
- **Authors:** Wang, Chun-Hua; Liu, Chien-Ying; Wan, Yung-Liang; Chou, Chun-Liang; Huang, Kuo-Hsiung; Lin, Horng-Chyuan; Lin, Shu-Min; Lin, Tzou-Yien; Chung, Kian Fan; Kuo, Han-Pin
- **Publish date/time:** 2005-05-11
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** BACKGROUND: During the acute phase of severe acute respiratory syndrome (SARS), mononuclear cells infiltration, alveolar cell desquamation and hyaline membrane formation have been described, together with dysregulation of plasma cytokine levels. Persistent high-resolution computed tomography (HRCT) abnormalities occur in SARS patients up to 40 days after recovery. METHODS: To determine further the time course of recovery of lung inflammation, we investigated the HRCT and inflammatory profiles, and coronavirus persistence in bronchoalveolar lavage fluid (BALF) of 12 patients at recovery at 60 and 90 days. RESULTS: At 60 days, compared to normal controls, SARS patients had increased cellularity of BALF with increased alveolar macrophages (AM) and CD8 cells. HRCT scores were increased and correlated with T-cell numbers and their subpopulations, and inversely with CD4/CD8 ratio. TNF-α, IL-6, IL-8, RANTES and MCP-1 levels were increased. Viral particles in AM were detected by electron microscopy in 7 of 12 SARS patients with high HRCT score. On day 90, HRCT scores improved significantly in 10 of 12 patients, with normalization of BALF cell counts in 6 of 12 patients with repeat bronchoscopy. Pulse steroid therapy and prolonged fever were two independent factors associated with delayed resolution of pneumonitis, in this non-randomized, retrospective analysis. CONCLUSION: Resolution of pneumonitis is delayed in some patients during SARS recovery and may be associated with delayed clearance of coronavirus, Complete resolution may occur by 90 days or later.


- **Title:** PRED(BALB/c): a system for the prediction of peptide binding to H2(d) molecules, a haplotype of the BALB/c mouse
- **Authors:** Zhang, Guang Lan; Srinivasan, Kellathur N.; Veeramani, Anitha; August, J. Thomas; Brusic, Vladimir
- **Publish date/time:** 2005-07-01
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** PRED(BALB/c) is a computational system that predicts peptides binding to the major histocompatibility complex-2 (H2(d)) of the BALB/c mouse, an important laboratory model organism. The predictions include the complete set of H2(d) class I (H2-K(d), H2-L(d) and H2-D(d)) and class II (I-E(d) and I-A(d)) molecules. The prediction system utilizes quantitative matrices, which were rigorously validated using experimentally determined binders and non-binders and also by in vivo studies using viral proteins. The prediction performance of PRED(BALB/c) is of very high accuracy. To our knowledge, this is the first online server for the prediction of peptides binding to a complete set of major histocompatibility complex molecules in a model organism (H2(d) haplotype). PRED(BALB/c) is available at .

Topic ID 9



- **Title:** Discovering human history from stomach bacteria
- **Authors:** Disotell, Todd R
- **Publish date/time:** 2003-04-28
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Recent analyses of human pathogens have revealed that their evolutionary histories are congruent with the hypothesized pattern of ancient and modern human population migrations. Phylogenetic trees of strains of the bacterium Helicobacter pylori and the polyoma JC virus taken from geographically diverse groups of human beings correlate closely with relationships of the populations in which they are found.


- **Title:** Viral Discovery and Sequence Recovery Using DNA Microarrays
- **Authors:** Wang, David; Urisman, Anatoly; Liu, Yu-Tsueng; Springer, Michael; Ksiazek, Thomas G; Erdman, Dean D; Mardis, Elaine R; Hickenbotham, Matthew; Magrini, Vincent; Eldred, James; Latreille, J. Phillipe; Wilson, Richard K; Ganem, Don; DeRisi, Joseph L
- **Publish date/time:** 2003-11-17
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Because of the constant threat posed by emerging infectious diseases and the limitations of existing approaches used to identify new pathogens, there is a great demand for new technological methods for viral discovery. We describe herein a DNA microarray-based platform for novel virus identification and characterization. Central to this approach was a DNA microarray designed to detect a wide range of known viruses as well as novel members of existing viral families; this microarray contained the most highly conserved 70mer sequences from every fully sequenced reference viral genome in GenBank. During an outbreak of severe acute respiratory syndrome (SARS) in March 2003, hybridization to this microarray revealed the presence of a previously uncharacterized coronavirus in a viral isolate cultivated from a SARS patient. To further characterize this new virus, approximately 1 kb of the unknown virus genome was cloned by physically recovering viral sequences hybridized to individual array elements. Sequencing of these fragments confirmed that the virus was indeed a new member of the coronavirus family. This combination of array hybridization followed by direct viral sequence recovery should prove to be a general strategy for the rapid identification and characterization of novel viruses and emerging infectious disease.


- **Title:** Virology on the Internet: the time is right for a new journal
- **Authors:** Garry, Robert F
- **Publish date/time:** 2004-08-26
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Virology Journal is an exclusively on-line, Open Access journal devoted to the presentation of high-quality original research concerning human, animal, plant, insect, bacterial, and fungal viruses. Virology Journal will establish a strategic alternative to the traditional virology communication process.


- **Title:** Locked nucleic acid (LNA) mediated improvements in siRNA stability and functionality
- **Authors:** Elmén, Joacim; Thonberg, Håkan; Ljungberg, Karl; Frieden, Miriam; Westergaard, Majken; Xu, Yunhe; Wahren, Britta; Liang, Zicai; Ørum, Henrik; Koch, Troels; Wahlestedt, Claes
- **Publish date/time:** 2005-01-14
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Therapeutic application of the recently discovered small interfering RNA (siRNA) gene silencing phenomenon will be dependent on improvements in molecule bio-stability, specificity and delivery. To address these issues, we have systematically modified siRNA with the synthetic RNA-like high affinity nucleotide analogue, Locked Nucleic Acid (LNA). Here, we show that incorporation of LNA substantially enhances serum half-life of siRNAs, which is a key requirement for therapeutic use. Moreover, we provide evidence that LNA is compatible with the intracellular siRNA machinery and can be used to reduce undesired, sequence-related off-target effects. LNA-modified siRNAs targeting the emerging disease SARS show improved efficiency over unmodified siRNA on certain RNA motifs. The results from this study emphasize LNA's promise in converting siRNA from a functional genomics technology to a therapeutic platform.


- **Title:** Detection and characterization of horizontal transfers in prokaryotes using genomic signature
- **Authors:** Dufraigne, Christine; Fertil, Bernard; Lespinats, Sylvain; Giron, Alain; Deschavanne, Patrick
- **Publish date/time:** 2005-01-13
- **Linked references:** 0
- **Linked referenced by:** 0
- **Abstract:** Horizontal DNA transfer is an important factor of evolution and participates in biological diversity. Unfortunately, the location and length of horizontal transfers (HTs) are known for very few species. The usage of short oligonucleotides in a sequence (the so-called genomic signature) has been shown to be species-specific even in DNA fragments as short as 1 kb. The genomic signature is therefore proposed as a tool to detect HTs. Since DNA transfers originate from species with a signature different from those of the recipient species, the analysis of local variations of signature along recipient genome may allow for detecting exogenous DNA. The strategy consists in (i) scanning the genome with a sliding window, and calculating the corresponding local signature (ii) evaluating its deviation from the signature of the whole genome and (iii) looking for similar signatures in a database of genomic signatures. A total of 22 prokaryote genomes are analyzed in this way. It has been observed that atypical regions make up ∼6% of each genome on the average. Most of the claimed HTs as well as new ones are detected. The origin of putative DNA transfers is looked for among ∼12 000 species. Donor species are proposed and sometimes strongly suggested, considering similarity of signatures. Among the species studied, Bacillus subtilis, Haemophilus Influenzae and Escherichia coli are investigated by many authors and give the opportunity to perform a thorough comparison of most of the bioinformatics methods used to detect HTs.
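
The listings above group papers by topic and rank them by PageRank. As a rough illustration of that selection step, the sketch below assigns each paper to its highest-weight topic and keeps the `k` best-ranked papers per topic. The function name `top_papers_per_topic`, the shape of `doc_topics` (paper id mapped to a list of topic weights) and the toy data are all assumptions made for this example; they are not the notebook's actual code or data.

```python
from collections import defaultdict


def top_papers_per_topic(doc_topics, pageranks, k=5):
    """Return the k highest-PageRank paper ids for each dominant topic.

    doc_topics: dict mapping paper id -> list of topic weights (assumed shape).
    pageranks:  dict mapping paper id -> PageRank score.
    """
    by_topic = defaultdict(list)
    for paper_id, weights in doc_topics.items():
        # The dominant topic is the index of the largest weight.
        dominant = max(range(len(weights)), key=weights.__getitem__)
        by_topic[dominant].append(paper_id)

    # Within each topic, sort by PageRank (missing scores count as 0.0).
    return {
        topic: sorted(ids, key=lambda p: pageranks.get(p, 0.0), reverse=True)[:k]
        for topic, ids in by_topic.items()
    }


# Toy data, purely illustrative.
doc_topics = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.7, 0.3], "d": [0.4, 0.6]}
pageranks = {"a": 0.10, "b": 0.40, "c": 0.30, "d": 0.20}
top = top_papers_per_topic(doc_topics, pageranks, k=1)
```

With real data, `doc_topics` could be built from the rows of a fitted LDA document-topic matrix, and `pageranks` is the dictionary returned by `nx.pagerank(G)`. When many papers have no linked references, their PageRank scores will be nearly identical, so the ordering inside a topic carries little information.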

---

In [1]:
# Tell nbdev to generate the library modules from the notebooks
from nbdev.export import *
notebook2script()

Converted 00_downloader.ipynb.
Converted 01_references.ipynb.
Converted 02_representations_and_lda.ipynb.
Converted 03_hierarchical_topic_modelling.ipynb.
Converted 99_risotto_gui.ipynb.
Converted index.ipynb.


In [5]:
# This cell is here for cosmetic reasons only: it renders a custom include file as HTML
from IPython.display import HTML
from urllib.request import urlopen
HTML(urlopen('https://raw.githubusercontent.com/lmarti/jupyter_custom/master/custom.include').read().decode('utf-8'))

---