Discuss: TIGER implementations are confusing #103
I vote for option 1 as well. The Python/Cython code for tigger is available in a separate repository here: https://github.com/brettc/tigger in any case. This would be a good starting place for future Tiger/Python projects. A comment though: I think the reason to remove it is that the entropy stuff works better. I don't think a good reason to remove it is "users find compiling C hard". We already distribute compiled C code (raxml and phyml). If we didn't have the entropy solution, we'd simply have to distribute some more compiled C code (which, of course, would make our lives more difficult). I should add that compiling tigger is, in fact, very easy. You just need the right dependencies installed. The hard bit is making sure you have all the right dependencies installed (i.e. the reason Anaconda is so good). Cheers, On 19 May 2016, 9:55 AM +1200, roblanf notifications@github.com, wrote:
Fair enough. To continue the argument, I would say that the compiled C code we R On 19 May 2016 at 08:35, Brett Calcott notifications@github.com wrote:
Rob Lanfear phone: +61 (0)2 9850 8204
Yep!
Option 1. Whatever streamlines maintenance and support.
I think we definitely need to simplify things with the commands (maybe just Paul On Thu, May 19, 2016 at 11:46 AM April Wright notifications@github.com
I think Paul made a good suggestion. Best Am 20.05.2016 um 14:53 schrieb Paul Frandsen notifications@github.com:
Dr. Christoph Mayer Zoologisches Forschungsmuseum Alexander Koenig
Stiftung des öffentlichen Rechts; Direktor: Prof. J. W. Wägele
Hey @pbfrandsen do you have your pure python TIGER code running and tested yet? If so, can you pull in changes to master, then submit a pull request to master? Once we have it in there and working, I'll clean up the C code issues as follows:
Hi Rob, It's all implemented and running fine. Just need to write a couple of Paul sent from my mobile phone, apologies for any typos
Done. Entropy is now the only option for DNA and protein - TIGER is just too slow, especially since lots of people have datasets on the order of ~1m sites these days. Entropy also seems to just work better. Morphology can use entropy (default) or TIGER ('--kmeans tiger' at the command line).
Hey @brettc @pbfrandsen @wrightaprilm,
Here's something to figure out. Right now we have THREE implementations of TIGER, and this is confusing. They are:
Right now these can all be called from the commandline with options like `--kmeans tiger` or `--kmeans fast_tiger`. It's kinda confusing. My question is, what is the best thing for users here? We don't officially support TIGER for DNA or protein any more (entropy is just as good, and ~infinitely quicker), but TIGER might still be useful for morphology datasets, which are much smaller.
Here are two options, let me know which you prefer, or if there are others you think are better.
Option 1
Delete all C code, retain Paul's pure python implementation, and only have TIGER work with morphology (these datasets are small, and pure python is OK here). If users call --kmeans tiger from the commandline with DNA or protein, they get back an error.
Pros: v. simple for users, and for us to maintain.
Cons: users don't get to do TIGER rates on DNA/Protein any more (do they care?)
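To make the Option 1 behaviour concrete, here is a minimal sketch of how the check could look. It is not the project's actual code: the `--datatype` flag, the `ALLOWED_KMEANS` table, and the function names are all hypothetical, and only `--kmeans tiger` and the entropy default come from this thread.

```python
import argparse

# Hypothetical table of which --kmeans methods each data type may use.
# Under Option 1, TIGER is only allowed for morphology.
ALLOWED_KMEANS = {
    "DNA": {"entropy"},
    "protein": {"entropy"},
    "morphology": {"entropy", "tiger"},
}


def check_kmeans(datatype, kmeans):
    """Return kmeans if it is allowed for datatype, else raise ValueError."""
    allowed = ALLOWED_KMEANS[datatype]
    if kmeans not in allowed:
        raise ValueError(
            "--kmeans %s is not supported for %s data; choose from: %s"
            % (kmeans, datatype, ", ".join(sorted(allowed)))
        )
    return kmeans


def parse_args(argv):
    """Parse the (hypothetical) command line and reject unsupported combinations."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--datatype", choices=sorted(ALLOWED_KMEANS), default="DNA")
    parser.add_argument("--kmeans", default="entropy")
    args = parser.parse_args(argv)
    try:
        check_kmeans(args.datatype, args.kmeans)
    except ValueError as err:
        # parser.error prints the message and exits with status 2.
        parser.error(str(err))
    return args
```

With this sketch, `parse_args(["--datatype", "morphology", "--kmeans", "tiger"])` is accepted, while the same call with DNA or protein exits with an error message telling the user which methods are available.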
Option 2
Retain one of the C implementations (in which case Brett/Paul need to be willing to maintain it if there are bugs, which of course there aren't, and also answer inevitable installation/compilation questions on the forum), and use that for DNA/Protein, but pure python for morphology (so that we avoid problems in the most common use case).
My vote is for Option 1, because it's the simplest and doesn't require users to ever compile C code, which I think is a step too far for most users and will just put people off. We can keep the code in place for running fastTIGER and TIGGER, but just remove the options to do so, and any mention of them in the docs.
Thoughts?
R