Read Nexus tree or tree/matrix files #89
Comments
The attached ZIP contains a simple tree in Nexus format and a corresponding character matrix. We need to add support for reading this type of file to Arbor. Many existing packages output this format.
I know we have simple Nexus tree reading, but this format is complex. There is a very complete C++ implementation of the NEXUS spec available here. Maybe we can use it to parse into our intermediate tree representation:
Flow currently assumes that any file with a Nexus extension contains trees. This is not correct. Nexus is a file type that can (and often does) contain trees, matrices, or both in a single file. Multiple trees and multiple matrices can be stored in a single Nexus file. Reading Nexus successfully is fairly critical for widespread adoption of Arbor.
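To illustrate the point about not trusting the file extension, here is a minimal Python sketch that looks at which blocks a Nexus file actually declares. The function names (`strip_comments`, `nexus_block_names`, `classify_nexus`) are hypothetical and are not part of Flow or Arbor. It assumes that matrices live in CHARACTERS or DATA blocks and trees in TREES blocks, and it does not handle quoted tokens that happen to contain the word "begin", so it is a rough content sniffer rather than a parser.

```python
import re

def strip_comments(text):
    """Remove Nexus [bracketed] comments, including nested ones."""
    out, depth = [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)

# Block headers look like "BEGIN TREES;" and are case-insensitive.
BLOCK_RE = re.compile(r"\bbegin\s+(\w+)\s*;", re.IGNORECASE)

def nexus_block_names(text):
    """Return the upper-cased names of all declared blocks, in file order."""
    return [m.group(1).upper() for m in BLOCK_RE.finditer(strip_comments(text))]

def classify_nexus(text):
    """Count tree blocks and matrix-bearing blocks (assumed: CHARACTERS, DATA)."""
    names = nexus_block_names(text)
    return {
        "tree_blocks": names.count("TREES"),
        "matrix_blocks": names.count("CHARACTERS") + names.count("DATA"),
    }

sample = """#NEXUS
[comment mentioning begin trees; which must be ignored]
BEGIN TAXA; DIMENSIONS NTAX=2; TAXLABELS A B; END;
Begin characters; dimensions nchar=1; matrix A 0 B 1; end;
begin trees; tree t1 = (A,B); end;
"""
print(classify_nexus(sample))
```

A reader built along these lines could decide whether to offer a tree, a table, or both, based on the block counts rather than on the extension.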
I can take a look. It seems that a new "trees_tables" type is appropriate for Nexus files.
Thanks. This isn’t urgent, but I’d like to work on this over the next few weeks/months.
It is clear that a Nexus file can contain zero, one, or more trees, and that it can contain zero or one matrix. What is not clear is whether it can contain more than one matrix (or whether this ever happens in practice). This paper seems to document the Nexus format better than anything else I've seen: http://sysbio.oxfordjournals.org/content/46/4/590.full.pdf. To do this right, we should collect Nexus files of all shapes and sizes and test against all of them to ensure they are supported. If there can be any number of trees or tables, a few workflows might make sense. I prefer a
I agree with this approach of having the combined format and selector steps in a workflow. I am working with David Maddison this week. I'll ask him for samples and how many trees / matrices are allowed per file.
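As a rough illustration of the combined format plus selector steps, here is a Python sketch. Everything in it is hypothetical: the `TreesTables` class, the `select_tree` and `select_table` functions, and the choice to hold trees as Newick strings and tables as lists of row dictionaries are assumptions made for the example, not Arbor's actual types.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Table = List[Dict[str, str]]  # assumed representation: one dict per row

@dataclass
class TreesTables:
    """Hypothetical combined value: every tree and table read from one file."""
    trees: List[str] = field(default_factory=list)   # assumed: Newick strings
    tables: List[Table] = field(default_factory=list)

def select_tree(contents: TreesTables, index: int = 0) -> str:
    """Selector step: pick one tree out of the combined value."""
    if not 0 <= index < len(contents.trees):
        raise IndexError(f"file has {len(contents.trees)} tree(s), asked for {index}")
    return contents.trees[index]

def select_table(contents: TreesTables, index: int = 0) -> Table:
    """Selector step: pick one table out of the combined value."""
    if not 0 <= index < len(contents.tables):
        raise IndexError(f"file has {len(contents.tables)} table(s), asked for {index}")
    return contents.tables[index]

contents = TreesTables(
    trees=["(A,B);", "((A,B),C);"],
    tables=[[{"taxon": "A", "state": "0"}, {"taxon": "B", "state": "1"}]],
)
print(select_tree(contents, 1))
print(select_table(contents)[0]["taxon"])
```

Keeping the selectors separate from the reader means a file with no trees or no matrix still loads, and the failure only surfaces in the step that asks for the missing item.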
Nexus files are used often in phylogenetics. Instead of having to support our own parsers, we should adopt mature parsers if they exist. The parser below reads Nexus and Newick files into R more reliably than ape, and uses NCL (the NEXUS Class Library).
http://francoismichonneau.net/2014/12/rncl/