
Hands On 2: Advanced Search Options

aberer edited this page Nov 24, 2011 · 10 revisions

Make sure you have downloaded the data from the first tutorial.


the exclude option

Species172 and Species099 are the first two rogue taxa found in the search on the 150.bs-dataset. Maybe you are specifically interested in the position of those two taxa and you want to avoid pruning them even if this leads to a worse consensus tree. If you provide an exclude file (e.g., exclude.txt) for RogueNaRok, it will not consider the taxa in this file for pruning. Write the taxa into the file, one taxon per line (watch out for superfluous whitespace). Invoke RogueNaRok like this:

    ./RogueNaRok  -i data/150.bs -x exclude.txt  -n runId
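One way to create the exclude file from the shell is sketched below. The file name and the two taxon names come from the text above; the whitespace check is only an extra safeguard added for this example and is not part of RogueNaRok.

```shell
# Write one taxon name per line; printf adds exactly one newline per name.
printf '%s\n' Species172 Species099 > exclude.txt

# Report empty lines and lines with leading or trailing whitespace.
if grep -n -E '^[[:space:]]|[[:space:]]$|^$' exclude.txt; then
    echo "exclude.txt contains superfluous whitespace" >&2
else
    echo "exclude.txt looks clean"
fi
```

If the check prints line numbers, fix those lines before passing the file to -x.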

searching for rogues in other constructs

The -c parameter sets the consensus threshold for bipartitions. Valid values range from -c 50 (majority rule, MR; the default) to -c 100 (strict consensus).

Alternatively, you can use RogueNaRok to determine rogue taxa that affect the support in the (greedily) extended majority rule (MRE) consensus tree with -c MRE. Note that such a rogue taxon search can become expensive quickly, because many MRE consensus trees have to be computed. Moreover, this procedure is debatable: it is unclear whether we find genuine rogue taxa or merely identify taxa that are disadvantageous for the greedy MRE algorithm.

It is quite common to draw the bipartition support from the bootstrap trees onto the maximum likelihood estimate (MLE) tree (as obtained with RAxML's "-f b" option). RogueNaRok can also be used to identify rogues that affect these support values. For this kind of rogue taxon search, provide an MLE tree with -t. Note that this search should not be combined with -c or -b.

Example calls for all three variants:

    ./RogueNaRok  -i data/150.bs -c 100 -n strict
    ./RogueNaRok  -i data/150.bs -c MRE -n mre
    ./RogueNaRok  -i data/150.bs -n mle -t data/150.tre

Of course, if you conduct this kind of rogue taxon search, you should use the matching method to construct a result from the pruned bootstrap trees (e.g., in RAxML use -J STRICT, -J MRE or -f b).
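The pairing between search variant and RAxML option can be kept in a small lookup, sketched here. The variant labels (strict, mre, mle) are hypothetical names chosen for this example; only the RAxML options themselves are taken from the text above.

```shell
# Map a search variant to the RAxML option named in this tutorial.
# The variant labels are made up for this sketch.
raxml_option_for() {
    case "$1" in
        strict) echo "-J STRICT" ;;
        mre)    echo "-J MRE" ;;
        mle)    echo "-f b" ;;
        *)      echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

raxml_option_for strict
```

Keeping the mapping in one place makes it harder to, say, search with -c MRE and then build a strict consensus by accident.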

dropset size (expensive)

For a default run (equivalent to -s 1), RogueNaRok searches for the most deleterious rogue taxon, prunes it, and then searches for the next most deleterious one. However, there may be rogue taxon constellations in which two or more taxa (a so-called dropset) must be pruned at the same time before the consensus tree support improves. That is exactly what the -s parameter is for. For instance, with -s 3, RogueNaRok assesses all dropsets with up to 3 taxa, and this search may yield a set of rogue taxa with a higher RBIC improvement (i.e., "better" rogues).

However, there are two downsides. Firstly, for values of -s beyond 3 or 4, these searches quickly become prohibitively expensive; how quickly depends strongly on the number of bipartitions in your bootstrap tree set. Secondly, a higher value for -s is not guaranteed to give a better result: because our algorithm is greedy, it can also yield a worse one. In our experience, for MR consensus trees (-c 50, default) a value of -s 2 is the best choice (although improvements are small compared to -s 1). The higher the threshold -c, the greater the need for larger dropsets: if you optimize a strict consensus, a -s of 3 or 4 may be advisable.

Example:

     ./RogueNaRok  -i data/150.bs -n mle -s 3 -c 100
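Because a larger -s does not always give a better result, it can help to run several dropset sizes under separate run IDs and compare them afterwards. The loop below is a minimal sketch; the run IDs (strict_s1, strict_s2, ...) are hypothetical, and the commands are only printed if no RogueNaRok binary is found in the current directory.

```shell
# Build the command line for a given maximum dropset size.
# The run ID pattern strict_s<N> is an assumption made for this sketch.
dropset_cmd() {
    echo "./RogueNaRok -i data/150.bs -c 100 -s $1 -n strict_s$1"
}

for s in 1 2 3; do
    cmd=$(dropset_cmd "$s")
    if [ -x ./RogueNaRok ]; then
        $cmd                      # run the search for this dropset size
    else
        echo "would run: $cmd"    # dry run when the binary is missing
    fi
done
```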

modifying the optimality criterion

You can optimize the resolution instead of the support in the consensus tree with -b. For an example of the difference between these two classes of optimality, see the RogueNaRok paper.

Furthermore, you can penalize the loss of taxa via pruning with -L "penalty". The optimality of a dropset then becomes

    "support change" - ("penalty" * "taxa in dropset")

If you want to optimize the RIC optimality criterion, use "-L 1.0 -b". For dropset size 1, this is equivalent to truncating the list of rogue taxa manually.
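To see how the penalty shifts the balance between small and large dropsets, the penalized formula can be evaluated by hand. The helper below is only an illustration of the arithmetic; all input numbers are made up and it does not reproduce RogueNaRok's internal scoring.

```shell
# Penalized optimality of a dropset:
#   support change - (penalty * taxa in dropset)
penalized_score() {
    # $1 = support change, $2 = penalty (-L), $3 = number of taxa in the dropset
    awk -v s="$1" -v p="$2" -v k="$3" 'BEGIN { printf "%.2f\n", s - p * k }'
}

penalized_score 2.5 0.5 1   # hypothetical single-taxon dropset
penalized_score 3.0 0.5 3   # hypothetical three-taxon dropset
```

With these made-up values, the three-taxon dropset has the larger raw support change but the lower penalized score, so the single taxon would be preferred.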

multithreaded RogueNaRok

For datasets with more than 1000 taxa or expensive options (like -s), it may be worthwhile to execute RogueNaRok in parallel. Compile a parallel version of RogueNaRok and execute it in parallel with the -T parameter.

    make mode=parallel clean && make mode=parallel


    ./RogueNaRok-parallel  -i data/2000.bs  -n tmp   -T 4
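Instead of hard-coding the thread count, it can be derived from the machine. The sketch below assumes that using one thread per online core is reasonable for your system, which is not stated in this tutorial; the command is only printed if the parallel binary is not present.

```shell
# Number of online cores; fall back to 1 if getconf cannot report it.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Run only if the parallel binary from the build step is present.
if [ -x ./RogueNaRok-parallel ]; then
    ./RogueNaRok-parallel -i data/2000.bs -n tmp -T "$threads"
else
    echo "would run: ./RogueNaRok-parallel -i data/2000.bs -n tmp -T $threads"
fi
```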
