dada2 16S/18S classifier vs 18S shows different taxonomic placement #860
Comments
The Silva 16S database we curate is derived from the mothur-formatted approximation of the Silva SEED database. You can see how this is created here: http://blog.mothur.org/2017/03/22/SILVA-v128-reference-files/ A key thing to realize is that the screening for this dataset is bacterial 16S-centric, i.e. it looks for bacterial primer sites to keep sequences, and thus it is not an ideal option for eukaryotic 18S assignment. We did not create the Silva 18S database; it was contributed by others, but there is a bit of information on how it was constructed at its Zenodo deposition: https://zenodo.org/record/1447330#.XaZdzOdKiL8 That construction focused on keeping eukaryotic entries, so this database may be more appropriate for Euk 18S assignment, but I have to admit I haven't used it myself so I can't guarantee anything there.
Hi Ben,
Great, feel free to update us as you find out more. You could also consider contacting the folks who contributed the DADA2-formatted 18S database to see if they could comment more on their approach. My guess is that the difference between reducing to the Linnaean levels or not is a (the?) major factor.
Update: it looks like the difference is in how the contributors decided to drop down to 7 levels. The dada2 16S/18S classifier uses the mothur convention, while the 18S-only classifier uses the qiime convention, with extra annotations at the genus and species levels. It appears the mothur group chose to sample throughout the taxonomic hierarchy, while the qiime group focused on the very top and very bottom levels, hence the difference in the middle ranks. As more folks are moving away from mothur and qiime for the extra flexibility that standalone dada2 provides, it would be a good idea to document the differences between the 16S/18S and the 18S-only classifiers on the dada2 webpage. On a side note, the qiime folks recognize that the current way they collapse to 7 levels is awkward for eukaryotes and are actively seeking input on how to deal with this in the future.
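For anyone who wants to check how many ranks each reference file actually carries, here is a minimal base-R sketch. It assumes the usual DADA2 training-set layout, where every header line looks like `>Rank1;Rank2;...;`. The helper name `count_header_ranks` and the toy headers are hypothetical and only for illustration; point the function at your own copies of the two reference fastas to compare them.

```r
# Tabulate how many semicolon-separated ranks each reference header carries.
# Assumes DADA2-style training headers of the form ">Rank1;Rank2;...;".
count_header_ranks <- function(fasta_path) {
  lines <- readLines(fasta_path)
  # Keep header lines only and drop the leading ">"
  headers <- sub("^>", "", lines[startsWith(lines, ">")])
  ranks <- strsplit(headers, ";", fixed = TRUE)
  # Count non-empty fields per header
  depth <- vapply(ranks, function(x) sum(nzchar(trimws(x))), integer(1))
  table(depth)
}

# Toy file with placeholder rank names (not real Silva entries)
toy <- tempfile(fileext = ".fa")
writeLines(c(
  ">Eukaryota;RankB;RankC;RankD;RankE;RankF;",
  "ACGTACGT",
  ">Eukaryota;RankB;RankC;RankD;RankE;RankF;RankG;",
  "ACGTTGCA",
  ">Eukaryota;RankB;RankC;RankD;RankE;RankF;",
  "ACGAACGT"
), toy)

depth_counts <- count_header_ranks(toy)
print(depth_counts)
```

If the two reference files were collapsed to 7 levels by different conventions, the depth tables may look alike even though the ranks in the middle positions differ, so it is also worth reading a few headers for a taxon you know well.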
Thanks, that's some useful investigation into what's going on there. For now we have not imposed stringent reporting requirements on "contributed" reference training fastas like the Silva 18S data. Perhaps that should be revisited.
Closing, but will keep an eye on the updates from the Q2 team, which appears to be looking into this: https://forum.qiime2.org/t/silva-classifier-seven-level-code/12028/16
Hi dada2 team,
We are using 18S to examine chickpea fields that are heavily infected with Phytophthora. I noticed that whether I use the dada2-formatted 16S/18S silva classifier or the 18S one, the family/genus placements of the ASVs are more or less where I expect them to be (with some exceptions); however, the kingdom, phylum, class, and order placements differ between the two classifiers. I had assumed the 18S classifier was more or less the 16S/18S classifier with the 18S entries parsed into a new file, but this doesn't seem to be the case. Is there documentation as to the origin of the silva_132.18s.99_rep_set.dada2.fa file?
16S/18S silva
taxa_Eth18_all <- assignTaxonomy(seqtabEth, "../../Resources/silva_nr_v132_train_set.fa", multithread=TRUE, tryRC=TRUE)
18S silva
taxa_Eth18 <- assignTaxonomy(seqtabEth, "../../Resources/silva_132.18s.99_rep_set.dada2.fa", multithread=TRUE, tryRC=TRUE)
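To put a number on where the two assignments diverge, a rough per-rank comparison can help. The sketch below is base R only and assumes both results are character matrices with ASVs as row names and ranks as columns, which is what `assignTaxonomy` returns. The helper `rank_agreement` and the toy matrices are hypothetical, and the toy values are placeholders rather than real results.

```r
# Fraction of shared ASVs that receive the same label at each rank.
# NA in both matrices counts as agreement; NA in only one counts as a mismatch.
rank_agreement <- function(taxa_a, taxa_b) {
  shared_asvs <- intersect(rownames(taxa_a), rownames(taxa_b))
  n_ranks <- min(ncol(taxa_a), ncol(taxa_b))
  a <- taxa_a[shared_asvs, seq_len(n_ranks), drop = FALSE]
  b <- taxa_b[shared_asvs, seq_len(n_ranks), drop = FALSE]
  same <- (a == b) | (is.na(a) & is.na(b))
  same[is.na(same)] <- FALSE
  colMeans(same)
}

# Toy stand-ins for the two assignTaxonomy() outputs (placeholder labels)
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")
toy_a <- matrix(c("K1", "P1", "C1", "O1", "F1", "G1",
                  "K1", "P2", "C2", "O2", "F2", "G2"),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("ASV1", "ASV2"), ranks))
toy_b <- matrix(c("K1", "Px", "Cx", "Ox", "F1", "G1",
                  "K1", "P2", "Cy", "Oy", "F2", NA),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("ASV1", "ASV2"), ranks))

agreement <- rank_agreement(toy_a, toy_b)
print(agreement)

# With the real objects from the calls above, this would be:
# rank_agreement(taxa_Eth18_all, taxa_Eth18)
```

Exact string matching is deliberately strict here: the same clade spelled or ranked differently in the two references will show up as a mismatch, which is the pattern being looked for.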