A FAQ-like Introduction to RogueNaRok

Alexis Stamatakis edited this page Jul 8, 2017 · 4 revisions

RogueNaRok is an algorithm for the identification of rogue taxa in a tree set.

How do I install it?

Download the code and run the "make" command.

For a parallel version of the RogueNaRok algorithm, use "make mode=parallel". Note that the parallel version requires the pthreads library. Running any of the programs without arguments prints the help message.

What are Rogue Taxa?

Rogue taxa are wandering taxa that assume many different phylogenetic positions in a set of bootstrap (or Bayesian-sampled) trees. Usually, ambiguous or insufficient phylogenetic signal is the reason for this phenomenon. As a result, rogue taxa decrease resolution and/or support in the consensus tree. Pruning them from a tree set may produce a more informative consensus tree.
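The effect described above can be illustrated with a toy example. The following is a minimal sketch, not RogueNaRok's implementation: trees are written as nested Python tuples rather than Newick, the taxon names A to E and R are made up, and the helper names (leaves, bipartitions, majority_support) are hypothetical. It counts how often each non-trivial bipartition occurs in four trees, once with the wandering taxon R and once with R pruned.

```python
from collections import Counter

def leaves(node):
    """Return the set of leaf names below a nested-tuple node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def bipartitions(tree, keep):
    """Non-trivial bipartitions of `tree`, restricted to the taxa in `keep`."""
    keep = frozenset(keep)
    anchor = min(keep)  # used to store each bipartition in one canonical form
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return
        side = leaves(node) & keep
        other = keep - side
        if len(side) >= 2 and len(other) >= 2:
            # Always store the side that does not contain the anchor taxon.
            splits.add(other if anchor in side else side)
        for child in node:
            walk(child)

    walk(tree)
    return splits

def majority_support(trees, keep, threshold=0.5):
    """Map each bipartition found in more than `threshold` of the trees to its frequency."""
    counts = Counter()
    for tree in trees:
        counts.update(bipartitions(tree, keep))
    return {s: n / len(trees) for s, n in counts.items() if n / len(trees) > threshold}

# Four toy trees: the backbone ((A,B),C,(D,E)) is identical, only R moves around.
trees = [
    ((("A", "R"), "B"), "C", ("D", "E")),
    (("A", "B"), ("C", "R"), ("D", "E")),
    (("A", "B"), "C", (("D", "R"), "E")),
    (("A", "B"), "C", ("D", ("E", "R"))),
]
all_taxa = {"A", "B", "C", "D", "E", "R"}

with_rogue = majority_support(trees, all_taxa)
without_rogue = majority_support(trees, all_taxa - {"R"})
print(len(with_rogue), sum(with_rogue.values()))
print(len(without_rogue), sum(without_rogue.values()))
```

In this toy data set, only one bipartition reaches a majority while R is present (frequency 0.75). Once R is pruned, both bipartitions of the backbone occur in every tree, so the consensus gains both resolution and support.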

What is the input data?

The input needs to be a set of fully bifurcating, unrooted trees in Newick format, all contained in a single tree file.
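Before running the programs, it can help to check that a tree file meets this requirement. The sketch below is an illustrative check, not part of the suite, and the function names are hypothetical. It assumes that trees are separated by ";", that labels contain no quotes, brackets, parentheses or commas, and that an unrooted, fully bifurcating tree is written with three children at the outermost node and two children at every other internal node.

```python
def is_unrooted_binary(newick):
    """Check the child counts of all internal nodes in one Newick string."""
    stack = []   # child counts of the internal nodes that are currently open
    closed = []  # child counts of finished internal nodes; the outermost node comes last
    for ch in newick:
        if ch == "(":
            stack.append(1)       # an opened node has at least one child
        elif ch == ",":
            if not stack:
                return False      # comma outside any parentheses
            stack[-1] += 1        # one more child for the innermost open node
        elif ch == ")":
            if not stack:
                return False      # unbalanced closing parenthesis
            closed.append(stack.pop())
    if stack or not closed:
        return False              # unbalanced parentheses or no internal node at all
    *inner, outermost = closed
    return outermost == 3 and all(n == 2 for n in inner)

def failing_trees(text):
    """Return the 0-based indices of trees in `text` that fail the check."""
    trees = [t.strip() for t in text.split(";") if t.strip()]
    return [i for i, t in enumerate(trees) if not is_unrooted_binary(t)]

example = "(A,B,(C,D));\n((A,B),(C,D));\n(A:0.1,B:0.2,C:0.3,D:0.4);\n"
print(failing_trees(example))
```

For a real file, the text could be obtained with open(path).read(). In the example, the second tree is flagged because its outermost node has only two children (a rooted representation), and the third because it is an unresolved four-way node.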

What is RogueNaRok not?

It is important to emphasize what RogueNaRok does and does not detect. RogueNaRok identifies rogue taxa that fit the definition above: every taxon that has a detrimental effect on the support in a consensus tree is considered a rogue taxon and can therefore be detected from the bootstrap trees. Other classes of problematic taxa, such as those involved in long branch attraction or convergent evolution, will not be detected unless they also have a negative effect on the support.

What are specific features of your implementation/ algorithm?

  • Optimize either support or resolution of the resulting pruned consensus tree.
  • Optimize with respect to a minimum frequency threshold for the bipartitions in the pruned consensus tree, ranging from 50% (majority consensus, our default) to 100% (strict consensus). Alternatively, you can optimize the majority rule extended consensus (MRE) tree or the bipartition support of a tree collection drawn on a maximum likelihood estimate tree.
  • Explicitly exclude certain taxa from being considered for pruning as rogue taxa.
  • Dropset size: specify the number of taxa simultaneously considered for pruning in each iteration of the algorithm. The default is 1, since considering larger dropsets is particularly expensive. However, if you specify the dropset size as "number of taxa" - 1, then the algorithm will find the most informative (resp. optimal) pruned consensus tree with respect to the given parameters.
  • Ready for multi-core machines: expensive phases of the algorithm can be executed in parallel on shared-memory machines. This is advantageous for dropset sizes > 1, MRE tree optimization and data sets with more than 2,000 taxa.
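The frequency threshold and the two optimization targets in the list above can be made concrete with a small sketch. This is an illustration under stated assumptions, not RogueNaRok's scoring code: the bipartition counts are invented, the function names are hypothetical, a bipartition is assumed to enter the consensus when its frequency is strictly above the threshold or when it occurs in every tree, "support" is taken to be the sum of the frequencies in the consensus, and "resolution" the number of bipartitions in it.

```python
def consensus(counts, n_trees, threshold_percent=50):
    """Return the bipartitions that pass the threshold, mapped to their frequency."""
    kept = {}
    for split, count in counts.items():
        above = count * 100 > threshold_percent * n_trees  # integer comparison, no rounding
        in_all = count == n_trees                           # covers the strict (100%) case
        if above or in_all:
            kept[split] = count / n_trees
    return kept

def support(cons):
    """Assumed support score: sum of bipartition frequencies in the consensus."""
    return sum(cons.values())

def resolution(cons):
    """Assumed resolution score: number of bipartitions in the consensus."""
    return len(cons)

# Invented bipartition counts from 10 hypothetical bootstrap trees.
counts = {"AB|CDEF": 10, "ABC|DEF": 7, "EF|ABCD": 6, "DE|ABCF": 4}

for percent in (50, 75, 100):
    cons = consensus(counts, n_trees=10, threshold_percent=percent)
    print(percent, resolution(cons), round(support(cons), 2))
```

Raising the threshold from 50% towards 100% can only remove bipartitions from the consensus, which is why the chosen threshold changes which pruning candidates look beneficial.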

Relevant Publications

  • The RogueNaRok algorithm, as implemented here, was accepted in Systematic Biology:
Andre J. Aberer, Denis Krompaß, Alexandros Stamatakis: Pruning Rogue Taxa Improves Phylogenetic Accuracy: An Efficient Algorithm and Webservice. Systematic Biology. 2012.
  • It is an improved version of an algorithm published at IEEE BIBM 2011:
Andre J. Aberer, Alexandros Stamatakis: "A Simple and Accurate Method for Rogue Taxon Identification", IEEE BIBM 2011, Atlanta, Georgia, USA, November 2011.
  • The rogue taxon identification algorithm implemented in RAxML is published in TCBB:
N.D. Pattengale, A.J. Aberer, K.M. Swenson, A. Stamatakis, B.M.E. Moret: "Uncovering Hidden Phylogenetic Consensus in Large Datasets". IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011.

What programs does the suite comprise?

  • RogueNaRok: our algorithm, with several variants.
  • rnr-tii: the taxonomic instability index as known from Mesquite by Maddison.
  • rnr-lsi: the leaf stability index (all three measures for UNROOTED trees) by Thorley.
  • rnr-prune: a simple program to prune a tree collection and/or a single ML tree.
  • rnr-mast: computes an unrooted maximum agreement subtree of a tree set.

What is the meaning of the name?

The name is an allusion to Ragnarok, the twilight (or doom) of the Norse gods. In the mythology, a renewed and fertile world emerges from the catastrophe. This aspect reflects our hope that phylogenies pruned of rogues, as suggested by our algorithm, are more informative than before.

Acknowledgement

Some functions (and, even more importantly, many implementation concepts) are derived or included from RAxML, a program for phylogenetic tree inference under Maximum Likelihood by Alexandros Stamatakis.