dada2 16S/18S classifier vs 18S shows different taxonomic placement #860

betsyalf · 2019-10-15T23:29:12Z

Hi dada2 team,
We are using 18S to examine chickpea fields that are heavily infected with Phytophthora. I noticed that whether I use the dada2-formatted 16S/18S Silva classifier or the 18S-only one, the family/genus placement of the ASVs is more or less where I expect it to be (with some exceptions); however, the kingdom, phylum, class, and order placements differ between the two classifiers. I had assumed the 18S classifier was more or less the 16S/18S one with the 18S sequences parsed into a new file, but this doesn't seem to be the case. Is there documentation on the origin of the silva_132.18s.99_rep_set.dada2.fa file?

16S/18S silva
taxa_Eth18_all <- assignTaxonomy(seqtabEth, "../../Resources/silva_nr_v132_train_set.fa", multithread=TRUE, tryRC=TRUE)

18S silva
taxa_Eth18 <- assignTaxonomy(seqtabEth, "../../Resources/silva_132.18s.99_rep_set.dada2.fa", multithread=TRUE, tryRC=TRUE)
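
To put a number on "different at the higher ranks", the two results can be compared column by column. This is only a minimal sketch: `rank_agreement` is a hypothetical helper (not part of dada2), and it assumes both objects are the character matrices returned by assignTaxonomy for the same sequence table, so that rows line up and rank columns share names.

```r
# Fraction of ASVs on which two taxonomy matrices agree, per rank.
# Assumes both matrices have the same rows (ASVs) in the same order.
rank_agreement <- function(taxa_a, taxa_b) {
  stopifnot(identical(rownames(taxa_a), rownames(taxa_b)))
  # Only compare ranks present in both results.
  ranks <- intersect(colnames(taxa_a), colnames(taxa_b))
  sapply(ranks, function(r) {
    a <- taxa_a[, r]
    b <- taxa_b[, r]
    # Two NAs count as agreement; a single NA counts as disagreement.
    same <- (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
    mean(same)
  })
}

# With the objects from this post:
# rank_agreement(taxa_Eth18_all, taxa_Eth18)
```

A low value at Phylum through Order together with a high value at Family and Genus would match the pattern described above.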


benjjneb · 2019-10-16T00:05:59Z

The Silva 16S database we curate is derived from the mothur-formatted approximation of the Silva SEED database. You can see how this is created here: http://blog.mothur.org/2017/03/22/SILVA-v128-reference-files/ A key thing to realize is that the screening for this dataset is bacterial 16S-centric, i.e. it looks for bacterial primer sites to decide which sequences to keep, and thus it is not an ideal option for eukaryotic 18S assignment.

We did not create the Silva 18S database; it was contributed by others, but there is a bit of information on how it was constructed at its Zenodo deposition: https://zenodo.org/record/1447330#.XaZdzOdKiL8 Its construction focused on keeping eukaryotic entries, so this database may be more appropriate for Euk 18S assignment, but I have to admit I haven't used it myself, so I can't guarantee anything there.

betsyalf · 2019-10-16T18:51:13Z

Hi Ben,
Thanks for getting back to me so quickly. I had checked out the Zenodo page, but the information provided wasn't detailed enough to explain why the higher-level taxonomic hierarchies are different (no applicable code provided). Pat Schloss's blog might explain part of the answer. In the R code used by the mothur folks to collapse the Silva taxonomy down to Linnean levels, the names that are pulled from the arb for phylum, class, and order are different from those in the dada2 silva-18S. Strangely enough, the dada2-18S, qiime2 silva-all, and qiime2 silva-18S-only all have consistent hierarchies (https://www.arb-silva.de/download/archive/qiime). I'll touch base with the qiime2 folks and compare their code to mothur.
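
One way to check this directly is to look at the header lines of the two training fastas. The sketch below assumes both files follow the dada2 training format, where each header is `>` followed by semicolon-separated levels; `header_depths` and `level_values` are hypothetical helper names, not dada2 functions.

```r
# Number of taxonomic levels in each header of a dada2-formatted training fasta.
# fasta_lines is a character vector, e.g. from readLines().
header_depths <- function(fasta_lines) {
  headers <- sub("^>", "", grep("^>", fasta_lines, value = TRUE))
  lengths(strsplit(headers, ";", fixed = TRUE))
}

# Distinct names that appear at a given level (1 = first level in the header).
level_values <- function(fasta_lines, level) {
  headers <- sub("^>", "", grep("^>", fasta_lines, value = TRUE))
  parts <- strsplit(headers, ";", fixed = TRUE)
  vals <- vapply(parts, function(p) {
    if (length(p) >= level) p[level] else NA_character_
  }, character(1))
  sort(unique(vals))  # sort() drops NA entries
}

# Possible usage with the files from this thread:
# ref_all <- readLines("../../Resources/silva_nr_v132_train_set.fa")
# ref_18s <- readLines("../../Resources/silva_132.18s.99_rep_set.dada2.fa")
# table(header_depths(ref_all)); table(header_depths(ref_18s))
# setdiff(level_values(ref_18s, 2), level_values(ref_all, 2))
```

Comparing the names found at levels 2 through 4 in each file should show whether the two references label the middle ranks differently.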

qiime2 16S/18S

qiime2 18S only

benjjneb · 2019-10-16T18:55:23Z

Great, feel free to update us as you find out more. You could also consider contacting the folks who contributed the DADA2-formatted 18S database to see if they could comment more on their approach. My guess is the difference between reducing to the Linnean levels or not is a (the?) major factor.

betsyalf · 2019-10-17T21:46:32Z

Update: it looks like the difference is in how the contributors decided to drop down to 7 levels. For the dada2 16S/18S classifier, the mothur convention was used, while for the 18S-only classifier the qiime convention was used, with extra annotations at the genus and species levels. It appears the mothur group chose to sample throughout the taxonomic hierarchy, while the qiime group focused on the very top and very bottom levels. Hence the differences in the middle of the hierarchy.
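
As a toy illustration of why the two conventions disagree mostly in the middle ranks, consider a lineage with more than 7 levels. Everything here is an assumption made for illustration: the placeholder level names, the two collapsing rules, and the `n_top = 4` split are hypothetical stand-ins and are not the actual mothur or qiime code.

```r
# Toy lineage with placeholder names; not a real SILVA path.
lineage <- paste0("Level", 1:11)

# Hypothetical "sample throughout" rule: keep n positions spread evenly
# along the path, always including the first and last level.
collapse_spread <- function(path, n = 7) {
  if (length(path) <= n) return(path)
  path[round(seq(1, length(path), length.out = n))]
}

# Hypothetical "top and bottom" rule: keep the first n_top levels and the
# last n - n_top levels, dropping everything in between.
collapse_ends <- function(path, n = 7, n_top = 4) {
  if (length(path) <= n) return(path)
  c(head(path, n_top), tail(path, n - n_top))
}

# Side-by-side view: rows are the 7 output ranks.
cbind(spread = collapse_spread(lineage), ends = collapse_ends(lineage))
```

In this toy case both rules keep the same first and last level, but several positions in between are filled from different parts of the original path, so the same sequence can carry different names at those ranks.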

As more folks are moving away from mothur and qiime for the extra flexibility that standalone dada2 provides, it would be a good idea to document the differences between the 16S/18S and 18S-only classifiers on the dada2 webpage.

On a side note, the qiime folks recognize that the current way they collapse to 7 levels is awkward for eukaryotes and are actively seeking input on how to deal with this in the future:
https://forum.qiime2.org/t/silva-classifier-seven-level-code/12028

benjjneb · 2019-10-21T16:03:14Z

Thanks, that's some useful investigation into what's going on there.

For now we have not imposed stringent reporting requirements on "contributed" reference training fastas like the Silva 18S data. Perhaps that should be revisited.

benjjneb · 2019-10-30T20:40:22Z

Closing, but will keep an eye on the updates from the Q2 team, which appears to be looking into this: https://forum.qiime2.org/t/silva-classifier-seven-level-code/12028/16

benjjneb closed this as completed Oct 30, 2019
