Mismatch between tree and fasta internal node labels #16

Open
krdav opened this issue Jul 15, 2017 · 2 comments

krdav commented Jul 15, 2017

Dear PRANK developers,

As the title indicates, the internal node labels in the tree do not match those in the FASTA file containing the ancestral sequences. The following screenshot shows this:
[Screenshot, 2017-07-13]

These are the raw output files from PRANK; the only change I made was to replace the "#" characters flanking the internal node labels with "A" so that the tree could be viewed in FigTree. Notice "seq8", which is at a distance of 1e-04 from A21A (originally #21#) and is clearly a sampled ancestor. Yet searching for seq8's sequence shows that, contrary to expectation, it is not identical to A21A. What makes me think the labels are shuffled is that another ancestor, A18A, is identical to seq8, and that this recurs in similar examples.
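The identity check described above (which records share seq8's sequence?) can be scripted. This is a minimal sketch, assuming each FASTA record name is the first word after `>`, and that leaves and ancestors are either in one file or passed together as several files. `identical_to` is a hypothetical helper, not part of PRANK:

```shell
#!/usr/bin/env bash
# identical_to NAME [FILE...]
# Hypothetical helper: prints the names of all FASTA records whose sequence
# is exactly identical to that of record NAME (NAME itself is excluded).
# Reads stdin when no files are given; returns 1 if NAME is not found.
identical_to() {
  local query="$1"
  shift
  awk -v q="$query" '
    /^>/ {
      sub(/^>/, "")          # drop ">" so $1 is the record name
      name = $1
      order[++n] = name      # remember input order for stable output
      seq[name] = ""
      next
    }
    {
      gsub(/[ \t\r]/, "")    # ignore whitespace and CR line endings
      seq[name] = seq[name] $0
    }
    END {
      if (!(q in seq)) exit 1
      for (i = 1; i <= n; i++)
        if (order[i] != q && seq[order[i]] == seq[q]) print order[i]
    }
  ' "$@"
}
```

For example, `identical_to seq8 sequences.best.anc.fas` would list every other record matching seq8 exactly. The file name is an assumption based on the `.anc.dnd` naming used in this issue, so adjust it to your run's output. Sequences are compared character by character, gaps included, so this is only meaningful on aligned output.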

I have tried both v.170427 and v.150803, with the same result. Here are the commands I used:
prank -d=sequences.fasta -o=test -quiet -showtree -showanc -showevents -DNA -f=fasta
tr '#' 'A' < sequences.best.anc.dnd > sequences.best.anc.dnd.tree  # to enable viewing in FigTree
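A slightly more targeted variant of the `tr` step, sketched under the assumption that PRANK writes internal node labels as `#<number>#` (as in `#21#` above). It rewrites only those labels and leaves any other `#` untouched. `relabel_internal_nodes` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# relabel_internal_nodes [FILE...]
# Hypothetical helper: rewrites internal node labels of the form #<digits>#
# to A<digits>A (e.g. #21# -> A21A) so tree viewers such as FigTree accept them.
# Any other "#" characters are left as they are. Reads stdin if no files are given.
relabel_internal_nodes() {
  sed -E 's/#([0-9]+)#/A\1A/g' "$@"
}
```

Applying it to both the `.anc.dnd` tree and the ancestral FASTA keeps node names consistent between FigTree and a plain-text search, e.g. `relabel_internal_nodes sequences.best.anc.dnd > sequences.best.anc.dnd.tree`.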

Here are the input sequences:
sequences.fasta.zip

metasoarous commented

I can confirm this behavior.

joelarmstrong commented May 31, 2018

I think this may be a problem with PRANK's fallback inference, which is used when the bppancestor call fails. Unfortunately, the bppancestor call always fails silently, at least with my version, because the bppancestor command line that PRANK builds specifies the tree format as "NHX" instead of "Nhx".

As a really, really hacky workaround, first enable debug logging so you know whether bppancestor was used or not:

diff --git a/src/prank.h b/src/prank.h
index dd3ac81..8da123e 100644
--- a/src/prank.h
+++ b/src/prank.h
@@ -33,7 +33,7 @@ void printHelp(bool complete);

 int version;

-int NOISE = 0;
+int NOISE = 3;

 /********* input/output: **********/

You'll probably start to see something like:

BppAncestor: Tree format 'NHX' unknown.
BppAncestor not used. Inferring approximate ancestral sequences

Then this patch should get the bppancestor call to succeed, which should cause the sequences to stop getting swapped:

diff --git a/src/bppancestors.cpp b/src/bppancestors.cpp
index 60c5b95..0fbc31a 100644
--- a/src/bppancestors.cpp
+++ b/src/bppancestors.cpp
@@ -214,7 +214,7 @@ bool BppAncestors::inferAncestors(AncestralNode *root,map<string,string> *aseqs,

     stringstream command;
     command << bppdistpath<<"bppancestor input.sequence.file="<<f_name.str()<<" input.sequence.format=Fasta input.sequence.sites_to_use=all input.tree.file="<<t_name.str()<<
-            " input.tree.format=NHX input.sequence.max_gap_allowed=100% initFreqs=observed output.sequence.file="<<o_name.str()<<" output.sequence.format=Phylip";
+            " input.tree.format=Nhx input.sequence.max_gap_allowed=100% initFreqs=observed output.sequence.file="<<o_name.str()<<" output.sequence.format=Phylip";
     if(!isDna)
         command << " alphabet=Protein model=WAG01";
     else

This works for me with BPPSuite v. 2.4.0. If all goes well you should see:

BppAncestor: BppAncestor's done. Bye.
BppAncestor: Total execution time: 0.000000d, 0.000000h, 0.000000m, 0.000000s.
BppAncestor done
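With NOISE raised as above, the two outcomes can be distinguished automatically. This is a minimal sketch that scans a captured PRANK log for the exact messages quoted in this thread. It assumes those strings are stable across versions, and that the log captures both stdout and stderr (`2>&1`), because this thread does not say which stream PRANK writes them to. `bpp_status` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# bpp_status [LOGFILE...]
# Hypothetical helper: classifies a PRANK run from its debug output.
# Prints "fallback" if PRANK reported falling back to approximate inference,
# "bppancestor" if it reported a successful bppancestor run, else "unknown".
bpp_status() {
  awk '
    /BppAncestor not used/ { fallback = 1 }
    /^BppAncestor done/    { done = 1 }
    END {
      if (fallback)  print "fallback"
      else if (done) print "bppancestor"
      else           print "unknown"
    }
  ' "$@"
}
```

For example, `prank -d=sequences.fasta -o=test -showanc -DNA -f=fasta > prank.log 2>&1; bpp_status prank.log`. The original command's `-quiet` flag is left out here on the assumption that it might suppress these messages.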

I'll try to put a PR together if I can track down what's wrong with the "approximate" reconstruction code.
