Problem w/ branch len estimate with closely related leaves #134
Comments
Further testing gives me the impression that (1) this does not always occur given the same input and (2) it only occurred when I added the --confidence flag to treetime.
Is this run on a tree with four leaves with some identical dates/branch lengths? Then it is likely a numerical instability when trying to invert a singular matrix.
Yes, this is a larger tree (20+ leaves), but 3 of the leaves are identical in their SNV alignment while their dates differ. Is there a way around this instability besides manually clipping the corresponding branch values to 0? The dates should help resolve polytomies, right?
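For reference, the manual clipping mentioned above could be sketched roughly as follows. This is only a workaround sketch, not something TreeTime provides: the function name `clip_negative_branch_lengths` and the example tree are made up for illustration, and the regex assumes a plain Newick string without quoted labels or bracketed annotations (so it would need adapting for annotated .nexus output).

```python
import re

# Matches ":<number>" branch-length fields in a plain Newick string.
# Assumption: no quoted labels or [&...] annotations containing colons.
_BRANCH_LEN = re.compile(r":(-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)")


def clip_negative_branch_lengths(newick: str, floor: float = 0.0) -> str:
    """Replace every branch length below `floor` with `floor`."""

    def _clip(match: re.Match) -> str:
        value = float(match.group(1))
        if value >= floor:
            # Leave well-behaved branch lengths exactly as written.
            return match.group(0)
        return f":{floor:g}"

    return _BRANCH_LEN.sub(_clip, newick)


# Hypothetical tree with one huge negative branch length.
tree = "((A:0.001,B:-3.2e5):0.002,C:0.0005);"
clipped = clip_negative_branch_lengths(tree)
print(clipped)
```

Note that this only hides the symptom on the affected branch; it does not address whatever causes the negative value in the first place.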
Could you send me these data? I can't quite explain why this might happen, and it would be good to fix.
Which data do you need? The alignment, dates, and undated tree, or anything else?
Yes, those are what I would need.
I think I might be having a similar error (if not, I can open a new issue). When estimating date confidences using the marginal likelihood, some nodes will sporadically have very large intervals: rather than intervals in the range of hundreds of years, these nodes have confidence intervals of 100,000+ years. These large intervals are somewhat random, in that rerunning the analysis moves them around. Any thoughts on why this might be occurring and whether there's a solution?
Yes, this looks like a problem. My hunch is that it is a numerical accuracy issue.
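Since the affected nodes change between runs, it may help to flag them automatically after each run. A minimal sketch, assuming the per-node confidence intervals have already been loaded into a dict of `node name -> (lower, upper)` in numeric years; the function name, the threshold, and the example values are all made up for illustration and do not come from TreeTime output.

```python
def flag_wide_intervals(intervals, max_width_years=1000.0):
    """Return the sorted names of nodes whose interval is wider than the threshold."""
    return sorted(
        name
        for name, (lower, upper) in intervals.items()
        if upper - lower > max_width_years
    )


# Hypothetical intervals in numeric years; the values are invented.
example = {
    "NODE_0000001": (1850.2, 1990.7),
    "NODE_0000002": (-98000.0, 2015.3),
    "NODE_0000003": (1700.0, 1950.0),
}
flagged = flag_wide_intervals(example)
print(flagged)
```

Comparing the flagged set across reruns on the same input would show whether the same nodes are affected each time or whether the problem really moves around.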
I was thinking numerical accuracy too. This is a large phylogeny with many small branches (1e-8). Would there be any value in rescaling the branch lengths beforehand (e.g., multiplying them all by 1e4)?
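To make the rescaling idea concrete, here is a rough sketch of what multiplying all branch lengths by a constant could look like. Whether this actually helps TreeTime is not established in this thread. The function name and example tree are made up, and the regex assumes a plain Newick string without quoted labels or bracketed annotations.

```python
import re

# Matches ":<number>" branch-length fields in a plain Newick string.
# Assumption: no quoted labels or [&...] annotations containing colons.
_BRANCH_LEN = re.compile(r":(-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)")


def rescale_branch_lengths(newick: str, factor: float) -> str:
    """Multiply every branch length by `factor`, keeping 6 significant digits."""
    return _BRANCH_LEN.sub(
        lambda m: f":{float(m.group(1)) * factor:.6g}", newick
    )


# Hypothetical tree with very short branches.
tree = "((A:1e-8,B:2e-8):5e-8,C:1e-7);"
scaled = rescale_branch_lengths(tree, 1e4)
print(scaled)
```

Any inferred rate would be scaled by the same factor and would have to be converted back afterwards.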
I suppose this is a large genome? Does this use a SNP only alignment? Or a vcf file? TreeTime carries around an internal scale that is |
I am having this same (or a similar) issue on a SARS-CoV-2 dataset with roughly 5000 sequences using the flags; however, it occurs without the covariation or branch-length-mode flags as well:
I'm using a full alignment. The problem is random, and rerunning on the same dataset can generate reasonable confidence intervals, but it happens often enough that it is an issue. I'm using TreeTime v. 0.80 on Python v3.9. I've attached the treetime output as well as the ML tree and a list of accession numbers (I can't share the alignment because it is GISAID data).
Sorry, just started to pick this up again. All the numbers in the |
In the TreeTime .nexus output I get a huge negative branch length followed by another large one for the corresponding leaves:
Is this a bug or some numerical instability? How could I avoid this?
Thanks a lot!