dada2 16S/18S classifier vs 18S shows different taxonomic placement #860

Closed
betsyalf opened this issue Oct 15, 2019 · 6 comments

Comments

@betsyalf

Hi dada2 team,
We are using 18S to examine chickpea fields that are heavily infected with Phytophthora. I noticed that whether I use the dada2-formatted 16S/18S Silva classifier or the 18S-only one, the family/genus placement of the ASVs is more or less where I expect it to be (with some exceptions); however, the kingdom, phylum, class, and order placements differ between the two classifiers. I had assumed the 18S classifier was more or less the 16S/18S classifier with the 18S entries parsed into a new file, but this doesn't seem to be the case. Is there documentation on the origin of the silva_132.18s.99_rep_set.dada2.fa file?

16S/18S silva
taxa_Eth18_all <- assignTaxonomy(seqtabEth, "../../Resources/silva_nr_v132_train_set.fa", multithread=TRUE, tryRC=TRUE)

[Screenshot: taxonomy assignments from the 16S/18S Silva classifier]

18S silva
taxa_Eth18 <- assignTaxonomy(seqtabEth, "../../Resources/silva_132.18s.99_rep_set.dada2.fa", multithread=TRUE, tryRC=TRUE)
[Screenshot: taxonomy assignments from the 18S Silva classifier]
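The rank-by-rank disagreement described above can be quantified instead of read off screenshots. The following is a minimal sketch, assuming both `assignTaxonomy` results are character matrices with ASVs as rows and ranks as columns. The two small matrices and all of their labels are invented stand-ins for `taxa_Eth18_all` and `taxa_Eth18`, not real output.

```r
# Toy stand-ins for the two assignTaxonomy() results (all labels are invented).
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")
taxa_a <- matrix(
  c("K1", "P1", "C1", "O1", "F1", "G1",
    "K1", "P2", "C2", "O2", "F2", "G2"),
  nrow = 2, byrow = TRUE, dimnames = list(c("ASV1", "ASV2"), ranks)
)
taxa_b <- matrix(
  c("K1", "Px", "Cx", "Ox", "F1", "G1",
    "Kx", "Py", "Cy", "Oy", "F2", "G2"),
  nrow = 2, byrow = TRUE, dimnames = list(c("ASV1", "ASV2"), ranks)
)

# Compare only the ASVs and ranks present in both tables.
shared_asvs  <- intersect(rownames(taxa_a), rownames(taxa_b))
shared_ranks <- intersect(colnames(taxa_a), colnames(taxa_b))
same <- taxa_a[shared_asvs, shared_ranks, drop = FALSE] ==
        taxa_b[shared_asvs, shared_ranks, drop = FALSE]

# Fraction of ASVs with identical labels at each rank (NA assignments are skipped).
agree <- colMeans(same, na.rm = TRUE)
print(agree)
```

With the real matrices substituted in, low agreement at the upper ranks combined with high agreement at family and genus would match the pattern reported here.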

@benjjneb
Owner

The Silva 16S database we curate is derived from the mothur-formatted approximation of the Silva SEED database. You can see how this is created here: http://blog.mothur.org/2017/03/22/SILVA-v128-reference-files/ A key thing to realize is that the screening for this dataset is bacterial 16S-centric, i.e. it looks for bacterial primer sites when deciding which sequences to keep, and thus it is not an ideal option for eukaryotic 18S assignment.

We did not create the Silva 18S database; it was contributed by others. There is a bit of information on how it was constructed at its Zenodo deposition: https://zenodo.org/record/1447330#.XaZdzOdKiL8 Its construction focused on keeping eukaryotic entries, so this database may be more appropriate for eukaryotic 18S assignment, but I have to admit I haven't used it myself, so I can't guarantee anything there.

@betsyalf
Author

Hi Ben,
Thanks for getting back to me so quickly. I had checked out the Zenodo page, but the information provided wasn't detailed enough to explain why the higher-level taxonomic hierarchies are different (no applicable code is provided). Pat Schloss's blog might explain part of the answer. In the R code used by the mothur folks to collapse the Silva taxonomy down to Linnaean levels, the names that are pulled from the arb for phylum, class, and order are different from those in the dada2 Silva 18S file. Strangely enough, the dada2 18S, qiime2 silva-all, and qiime2 silva-18S-only classifiers all have consistent hierarchies (https://www.arb-silva.de/download/archive/qiime). I'll touch base with the qiime2 folks and compare their code to mothur's.

qiime2 16S/18S
[Screenshot: taxonomy assignments from the qiime2 16S/18S classifier]
qiime2 18S only
[Screenshot: taxonomy assignments from the qiime2 18S-only classifier]

@benjjneb
Owner

Great, feel free to update us as you find out more. You could also consider contacting the folks who contributed the DADA2-formatted 18S database to see if they could comment more on their approach. My guess is that whether or not the taxonomy was reduced to the Linnaean levels is a (the?) major factor.
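One quick way to probe that guess is to count how many taxonomic levels each reference header carries. This is a rough sketch, assuming the semicolon-delimited header style used by `assignTaxonomy` training files (`>Level1;Level2;...;`). The two header strings are hypothetical examples, not lines taken from either Silva file.

```r
# Hypothetical header lines in the assignTaxonomy training-fasta style.
headers <- c(
  ">K1;P1;C1;O1;F1;G1;",
  ">K1;P2;C2;O2;F2;G2;S2;"
)

# Drop the leading ">" and split on ";" (a trailing ";" adds no extra empty field).
lineages <- strsplit(sub("^>", "", headers), ";", fixed = TRUE)

# Number of taxonomic levels per reference entry, then a tally of those counts.
n_levels <- lengths(lineages)
print(table(n_levels))
```

For a real reference file, the headers could be pulled with `grep("^>", readLines(path), value = TRUE)`, where `path` points to the training fasta. Running this on both files would show whether they were collapsed to the same number of levels.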

@betsyalf
Author

Update: it looks like the difference is in how the contributors decided to drop down to 7 levels. The dada2 16S/18S classifier uses the mothur convention, while the 18S-only classifier uses the qiime convention, with extra annotations at the genus and species levels. It appears the mothur group chose to sample throughout the taxonomic hierarchy, while the qiime group focused on the very top and very bottom levels. Thus the difference in the middle of the hierarchy.
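To illustrate why two collapsing conventions can agree at the ends of a lineage and disagree in the middle, here is a toy sketch. Neither function is the actual mothur or qiime code. `collapse_spread` and `collapse_ends` are hypothetical helpers that only mimic the two strategies as described above, applied to a made-up 13-level lineage.

```r
# A made-up lineage with more levels than the 7 ranks a classifier reports.
lineage <- paste0("L", 1:13)

# Hypothetical strategy 1: pick levels spread evenly through the hierarchy.
collapse_spread <- function(x, k = 7) {
  x[round(seq(1, length(x), length.out = k))]
}

# Hypothetical strategy 2: keep the top few levels and the bottom few levels.
collapse_ends <- function(x, k = 7, top = 4) {
  c(head(x, top), tail(x, k - top))
}

spread <- collapse_spread(lineage)
ends   <- collapse_ends(lineage)

# Side-by-side view; 'differs' flags the positions where the two results disagree.
print(data.frame(position = 1:7, spread = spread, ends = ends,
                 differs = spread != ends))
```

In this toy case the first and last positions match and every position in between differs, even though both results were derived from the same underlying lineage.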

As more folks move away from mothur and qiime for the extra flexibility that standalone dada2 provides, it would be a good idea to document the differences between the 16S/18S and 18S-only classifiers on the dada2 webpage.

On a side note, the qiime folks recognize that the current way they collapse to 7 levels is awkward for eukaryotes and are actively seeking input on how to deal with this in the future:
https://forum.qiime2.org/t/silva-classifier-seven-level-code/12028

@benjjneb
Owner

Thanks, that's some useful investigation into what's going on there.

For now we have not imposed stringent reporting requirements on "contributed" reference training fastas like the Silva 18S data. Perhaps that should be revisited.

@benjjneb
Owner

benjjneb commented Oct 30, 2019

Closing, but will keep an eye on the updates from the Q2 team, which appears to be looking into this: https://forum.qiime2.org/t/silva-classifier-seven-level-code/12028/16
