This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

add pipeline for inserting OTU representative sequences into an existing tree #1499

Open
gregcaporaso opened this issue Apr 10, 2014 · 0 comments

Comments

@gregcaporaso
Contributor

This would be really useful, but the version that was in QIIME (insert_seqs_into_tree.py) wasn't sufficiently tested and wasn't widely used. We should rewrite it (or start from the code that was deleted) with sufficient documentation and testing.

From @clozupone by email:

I never use this code, but I often do this, except using ARB. I find it to be very useful functionality for downstream analysis. I actually kinda like using ARB for this because I can keep an ARB database that contains, e.g., the 16S from genomes that I would want to relate OTUs to, plus significant OTUs that I observed in other datasets. Then I can add my new OTUs to the same tree, see where they fit in, and visualize the results within the same program. (Incidentally, when I want to make a "pretty" tree displaying the results, like I did in my recent HIV Cell Host and Microbe paper, I have been exporting the tree from ARB and using TopiaryExplorer.) I never really tried using this code in QIIME for the insertion step because I have been using ARB for years and was used to doing it this way. The disadvantages of ARB, however, are that many people find it difficult to use and that the actual algorithm used in its "parsimony insertion" tool is a bit of a black box.

On another note, when I first started doing the sort of meta-analysis that we do now with the database, it was all done in ARB using parsimony insertion. Currently we just check whether a sequence matches a reference and throw it away if it does not match one closely, rather than keeping it and adding it to the tree with an appropriate branch, as insertion methods allow you to do. This has worked fine for well-sampled environments like the human gut, but it has limitations for other environments, where most of the diversity gets tossed because the sequences are too different from the references. Further development of this pipeline could allow meta-analysis to work better for such environments. I would see this as the strongest advantage of further developing this code and working it into QIIME/QIITA pipelines, if someone wanted to take this one. Speed may be a consideration. I messed around with this a bit in RAxML (years ago), and their insertion methods seemed prohibitively slow!

Just my 2 cents.
Cathy
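
To make the trade-off described in the email concrete, here is a minimal Python sketch of the "match a reference or throw it away" step. It is only an illustration under stated assumptions, not the code that was in QIIME: the function names, the 0.97 default threshold, and the use of simple positional identity on pre-aligned, equal-length sequences are all hypothetical choices. The `discarded` set is what an insertion pipeline would try to place in the tree instead of dropping.

```python
def percent_identity(a, b):
    """Fraction of matching positions between two pre-aligned sequences.

    Assumption: both sequences come from the same alignment, so they have
    equal length. Gap characters are compared like any other character.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def partition_by_reference(otus, references, threshold=0.97):
    """Split OTU representative sequences by their best reference match.

    otus, references: dicts mapping an ID to an aligned sequence string.
    Returns (kept, discarded):
      kept      maps OTU ID -> ID of the best-matching reference
      discarded maps OTU ID -> best identity found (below the threshold)
    """
    kept = {}
    discarded = {}
    for otu_id, seq in otus.items():
        best_ref, best_identity = None, 0.0
        for ref_id, ref_seq in references.items():
            identity = percent_identity(seq, ref_seq)
            if identity > best_identity:
                best_ref, best_identity = ref_id, identity
        if best_identity >= threshold:
            kept[otu_id] = best_ref
        else:
            # Tossed by reference matching; these are the candidates
            # for insertion into the existing tree.
            discarded[otu_id] = best_identity
    return kept, discarded


# Toy data (made up for illustration only).
references = {"ref1": "ACGTACGTAC"}
otus = {"otu1": "ACGTACGTAC", "otu2": "ACGTTTTTAC"}
kept, discarded = partition_by_reference(otus, references, threshold=0.9)
print(sorted(kept), sorted(discarded))
```

In a poorly sampled environment, most OTUs would end up in `discarded`, which is the loss of diversity the email describes.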

@gregcaporaso gregcaporaso added this to the QIIME 2.0.0 milestone Apr 10, 2014
gregcaporaso added a commit to gregcaporaso/qiime that referenced this issue Apr 10, 2014
… were only used by that code. see biocore#1499 for additional information.