Dear Benjamin,
I am using DADA2 to denoise amplicons from multiplexed PCRs sequenced on an Illumina MiSeq (300PE), from a panel of markers used in mammals.
Although DADA2 has mostly been used for metabarcoding analyses, where (1 sample) => (1 locus) * (taxa in sample), my case is (1 sample) = (30 loci) * (1 taxon).
So far so good with DADA2. For each sample, I demultiplex by locus-specific primers with cutadapt and then run dada on each sample_i_locus_j.fastq file. Since my taxon is a diploid animal, the maximum number of real variants expected from each dada run is 2.
Under this expectation, I had to relax OMEGA_A to around 10^-20 to recover variants from fastq files with a low number of reads. When I instead run dada on all the samples for a given locus together, i.e. `dada2::dada(all_files_for_locus_j, pool = TRUE)`, the power to detect variants at low coverage is great even with the default OMEGA_A of 10^-40, with little risk of false positives.
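For concreteness, the two configurations compared above can be sketched roughly as follows. This is only a minimal illustration: the directory `demux/locus_j` is a hypothetical path, and learning the error model from the same per-locus files is an assumption, not necessarily the right choice. The guard keeps the snippet from failing when dada2 or the files are not available.

```r
# Hypothetical directory with the demultiplexed per-sample fastq files of ONE locus.
fns <- list.files("demux/locus_j", pattern = "\\.fastq(\\.gz)?$", full.names = TRUE)

# Only run when dada2 is installed and there are files to process.
if (requireNamespace("dada2", quietly = TRUE) && length(fns) > 1) {
  # Assumption: the error model is learned from this locus' reads only.
  err <- dada2::learnErrors(fns, multithread = TRUE)

  # (a) Sample-by-sample inference with a relaxed OMEGA_A (10^-20).
  dd_single <- dada2::dada(fns, err = err, pool = FALSE, OMEGA_A = 1e-20)

  # (b) Pooled inference across all samples of the locus, default OMEGA_A (10^-40).
  dd_pooled <- dada2::dada(fns, err = err, pool = TRUE)
}
```

Comparing the variants returned by (a) and (b) for the same low-coverage sample is what prompted the questions below.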
My questions are multiple:
dada2 probably needs many sequences to estimate reliable error matrices. Do you think that splitting the files this finely, instead of doing one pooled call with all samples and loci together, can negatively affect the determination of ASVs? Does it make sense to split by locus even if the number of reads per file ends up much lower?
I am using the 'clustering' element of the 'dada-class' output to estimate empirically a justifiable OMEGA_A for my dataset. It is working quite well. However, after reading through the forum and the documentation I could not find a definition for the columns "pval" and "birth_pval". They are also always "NA" for the first denoised variant; I guess that corresponds to an infinite value, as you show in Fig. 3 of Rosen et al. (2012). Based on those plots and on your suggestion to plot histograms (issue "value of p-value threshold (OMEGA_A) for divisive partitioning", #315), I have made the following plot to guide a justified decision on which OMEGA_A to apply to my data (a value of 10 on the axis stands for infinity). After playing with thresholds, I noticed that "birth_pval" seems to be the value compared against OMEGA_A. Is that right? If so, what is the definition of "pval"? And is OMEGA_C the same as OMEGA_R in Rosen et al. (2012)?
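A minimal sketch of the kind of summary described above, in base R. The toy data frame stands in for the `clustering` element of a dada-class object and its values are invented for illustration only. It assumes, as suspected above but not confirmed, that "birth_pval" is the quantity compared against OMEGA_A; filtering an existing table is also only an approximation, since re-running dada with a different threshold can change the partition itself. The cap of 200 is a hypothetical plotting ceiling.

```r
# Toy stand-in for dd$clustering; the numbers are invented for illustration.
clustering <- data.frame(
  abundance  = c(520, 410, 12, 3),
  birth_pval = c(NA, 1e-150, 1e-35, 1e-22)
)

# The first denoised variant has birth_pval = NA, so drop it before summarising.
born <- clustering[!is.na(clustering$birth_pval), ]

# Work on the -log10 scale; cap it so p-values that underflow to 0 stay plottable.
cap <- 200
born$neg_log10_p <- pmin(-log10(born$birth_pval), cap)

# Flag which variants would pass a candidate OMEGA_A (assumed rule: birth_pval < OMEGA_A).
omega_a <- 1e-40
born$passes <- born$birth_pval < omega_a

print(born)
# hist(born$neg_log10_p) would give the histogram used to pick a threshold.
```

With real data, collecting `birth_pval` across all sample-by-locus runs and marking candidate thresholds on the histogram makes the trade-off between expected variants (at most 2 per run here) and extra variants visible.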
Thanks a lot for your insights,
Miguel