
Cov phylogenetic tree quality by monophylicity

Robert Edgar edited this page Jul 4, 2020 · 3 revisions

Background

The goal is to assess agreement between a tree and Cov taxonomy by measuring the degree of monophylicity. A species is monophyletic in a tree if its sequences form a clade that contains no sequences from other species. In an optimal tree, all species would be monophyletic. Given a number of candidate trees, we would pick the tree with the most monophyletic species, or possibly a tree in which the more important species are monophyletic (e.g. SARS).
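To make the criterion concrete, here is a minimal Python sketch of a monophyly check on a toy rooted tree. It is not the code distributed with this page: the nested-tuple tree representation, the function names `clade_leaf_sets` and `monophyly_report`, and the leaf labels are all assumptions made for illustration. A species is counted as monophyletic when some clade of the tree has exactly that species' sequences as its leaves.

```python
from collections import defaultdict


def clade_leaf_sets(tree):
    """Collect the leaf set of every clade (subtree) of a rooted tree.

    `tree` is either a leaf name (str) or a tuple of child subtrees.
    This representation is hypothetical, chosen only for the sketch.
    """
    clades = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.add(leaves)
        return leaves

    walk(tree)
    return clades


def monophyly_report(tree, species_of):
    """Map each species to True if its leaves form exactly one clade."""
    clades = clade_leaf_sets(tree)
    members = defaultdict(set)
    for leaf, species in species_of.items():
        members[species].add(leaf)
    return {sp: frozenset(leaves) in clades for sp, leaves in members.items()}


# Toy example: species A and C are monophyletic, B is split across the root.
toy_tree = ((("a1", "a2"), "b1"), ("b2", "c1"))
toy_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
report = monophyly_report(toy_tree, toy_species)
n_mono = sum(report.values())
print(f"{len(report)} taxa, {n_mono} mono, {len(report) - n_mono} polyphyletic")
```

Counting the `True` values per candidate tree gives the kind of score described above; weighting particular species more heavily would be a straightforward extension.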

Example report

     TaxId   Seqs   Taxa  Mono  Names   
     28295    159      1  mono  Porcine epidemic diarrhea virus   
    694014    283      1  mono  Avian coronavirus   
...   
    694007      4      2  POLY  Tylonycteris bat coronavirus HKU4, Tylonycteris pachypus bat coronavirus HKU4-related   
   1335626     12      3  POLY  Middle East respiratory syndrome-related coronavirus, Bat coronavirus, Hypsugo bat coronavirus HKU25   
     11137      8      2  POLY  Human coronavirus 229E, Rousettus aegyptiacus bat coronavirus 229E-related   
51 taxa, 42 mono, 9 polyphyletic   
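When comparing several candidate trees, it can help to tally a report like the one above programmatically. The following Python sketch assumes the layout shown in the example: whitespace-separated TaxId, Seqs, Taxa and Mono columns followed by a comma-separated list of names, with the Mono column holding either `mono` or `POLY`. The function name `parse_report` and the dictionary keys are hypothetical, and the real output may differ in details not visible in the excerpt.

```python
def parse_report(text):
    """Parse report rows; skip the header, '...' lines and the summary line."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 4)  # names may contain spaces, so cap the split
        if len(fields) < 5:
            continue
        taxid, seqs, taxa, status, names = fields
        if not (taxid.isdigit() and seqs.isdigit() and taxa.isdigit()):
            continue
        if status not in ("mono", "POLY"):
            continue
        rows.append({
            "taxid": int(taxid),
            "seqs": int(seqs),
            "taxa": int(taxa),
            "mono": status == "mono",
            "names": [n.strip() for n in names.split(", ")],
        })
    return rows


# A few lines copied from the example report above.
sample = """\
     TaxId   Seqs   Taxa  Mono  Names
     28295    159      1  mono  Porcine epidemic diarrhea virus
    694014    283      1  mono  Avian coronavirus
...
     11137      8      2  POLY  Human coronavirus 229E, Rousettus aegyptiacus bat coronavirus 229E-related
51 taxa, 42 mono, 9 polyphyletic
"""
rows = parse_report(sample)
n_mono = sum(r["mono"] for r in rows)
print(f"{len(rows)} rows parsed, {n_mono} mono, {len(rows) - n_mono} POLY")
```

Running the same tally over the report for each candidate tree gives a simple basis for choosing among them.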

Data and code download

s3://serratus-public/rce/monophy/

Usage

See runme.bash for an example.

To run the analysis, you need a rooted tree in usearch tabbed format.

Root placement is not important. If you have an unrooted tree, you can root it with any convenient method. For example, with raxml:

raxml -f I -m GTRCAT -t $intree -n rooted

To convert a rooted Newick tree to usearch tabbed:

usearch -tree_cvt tree.newick -tabbedout tree.tsv
