This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

use Greengenes 85% aligned reference OTUs instead of (outdated) Greengenes core set as default PyNAST template #1770

Closed
gregcaporaso opened this issue Dec 10, 2014 · 6 comments

Comments

@gregcaporaso
Contributor

@wasade did some comparisons about a year ago, and I did some comparisons today (see here, which I'll be updating shortly to contain more test sequences, but I don't expect differences) that show that the trees generated using either the Greengenes 85% aligned reference OTU sequences or the (outdated) Greengenes aligned core set as the reference database are effectively identical.

So, we're going to update to use the Greengenes 13_8 85% OTU aligned sequences as the default template alignment for PyNAST, since they were updated more recently, we have a clearer understanding of how they were created, and the licensing is clearer.

If anyone has concerns about this, please reply here.

@wasade
Member

wasade commented Dec 10, 2014

👍

@gregcaporaso
Contributor Author

In my updated results looking at ~3000 representative sequences from the Moving Pictures dataset, the correlation coefficient is a bit lower, but still highly significant. We do lose fewer sequences to alignment failure with GG, so on that measure those results are better...

Thoughts on this?

@wasade
Member

wasade commented Dec 10, 2014

If you BLAST a handful of those seqs, do they hit anything that makes sense in nt?

@gregcaporaso
Contributor Author

I think you're misunderstanding - sequences that were failing to hit with the (ancient) core set are now hitting the GG 85% reference OTUs, so on the metric of minimizing sequences that fail to align with PyNAST, the GG 85% OTUs are doing better as a template alignment.

I'm asking whether anyone is concerned that the Mantel correlation between the tip-to-tip distance matrices from the resulting trees is only ~0.72.
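
For readers unfamiliar with the comparison, here is a minimal sketch of a Mantel test between two tip-to-tip distance matrices. It is not the code used for the results in this thread: the function names, the toy matrices, and the choice of Pearson correlation with a two-sided permutation test are all assumptions made for illustration. It also assumes both matrices list the same tips in the same order. scikit-bio provides a `mantel` function in `skbio.stats.distance` that would be the more practical choice for real data.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the strict upper triangle, optionally relabeling rows/columns by `order`."""
    n = len(matrix)
    idx = list(range(n)) if order is None else order
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel(dm1, dm2, permutations=999, seed=0):
    """Return (r, p) for a two-sided Mantel test on two symmetric distance matrices.

    Hypothetical helper: dm1 and dm2 are square lists of lists whose
    rows/columns refer to the same tips in the same order.
    """
    rng = random.Random(seed)
    flat2 = upper_triangle(dm2)
    r_obs = pearson(upper_triangle(dm1), flat2)
    order = list(range(len(dm1)))
    hits = 0
    for _ in range(permutations):
        # Permute the tip labels of dm1 (rows and columns together).
        rng.shuffle(order)
        r_perm = pearson(upper_triangle(dm1, order), flat2)
        if abs(r_perm) >= abs(r_obs):
            hits += 1
    # Count the observed statistic as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Toy matrices (made up for illustration, not data from this issue).
dm_core_set = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
dm_gg_85 = [[0, 1, 3, 3], [1, 0, 1, 2], [3, 1, 0, 2], [3, 2, 2, 0]]
r, p = mantel(dm_core_set, dm_gg_85, permutations=99)
```

Here `r` plays the role of the ~0.72 figure above: it measures how well pairwise tip distances in one tree track those in the other, while `p` indicates whether that agreement exceeds what relabeled tips would give by chance.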

@wasade
Member

wasade commented Dec 10, 2014

Oh, okay, that's awesome

@jairideout
Member

Fixed in #1777.

3 participants