Denoising amplicons from multilocus PCRs #1981

csmiguel · 2024-07-12T19:38:46Z

Dear Benjamin,
I am using DADA2 to denoise amplicons from multiplexed PCRs sequenced on an Illumina MiSeq (2×300 bp paired-end), using a panel of markers designed for mammals.
Although DADA2 has mostly been used for metabarcoding, where (1 sample) = (1 locus) × (taxa in sample), my case is (1 sample) = (30 loci) × (1 taxon).
So far so good with DADA2. For each sample, I demultiplex the reads by locus with cutadapt, using the locus-specific primers, and then run `dada` on each `sample_i_locus_j.fastq`. Since my taxon is a diploid animal, each `dada` run should yield at most 2 real variants.
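The diploid expectation can be checked after denoising. Below is a minimal Python sketch, assuming the ASV tables have been exported from R as hypothetical `(sample, locus, sequence, abundance)` records; the record layout, `flag_excess_variants` and `MAX_ALLELES` are illustrative names, not part of dada2.

```python
from collections import defaultdict

# Diploid organism: at most two real variants (alleles) per sample-locus run.
MAX_ALLELES = 2


def flag_excess_variants(records, max_alleles=MAX_ALLELES):
    """Return {(sample, locus): n_asvs} for runs with more ASVs than expected.

    `records` is an iterable of hypothetical (sample, locus, sequence, abundance)
    tuples, one per ASV inferred by a dada run.
    """
    counts = defaultdict(int)
    seen = set()
    for sample, locus, sequence, abundance in records:
        # count each distinct sequence with reads once per sample-locus run
        if abundance > 0 and (sample, locus, sequence) not in seen:
            seen.add((sample, locus, sequence))
            counts[(sample, locus)] += 1
    return {key: n for key, n in counts.items() if n > max_alleles}


# Hypothetical toy input: sample s1 has three ASVs at locus L1.
records = [
    ("s1", "L1", "ACGT", 120),
    ("s1", "L1", "ACGA", 110),
    ("s1", "L1", "ACTT", 4),
    ("s1", "L2", "GGCC", 90),
]
print(flag_excess_variants(records))  # {('s1', 'L1'): 3}
```

Any flagged run has more ASVs than a diploid genotype allows and deserves a closer look.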
Under this expectation, I had to tune OMEGA_A to around 10^-20 to detect variants in FASTQ files with few reads. When I instead run `dada` on all samples for a given locus, `dada2::dada(all_files_for_locus_j, pool = TRUE)`, the power to detect variants at low coverage is high, even with the default OMEGA_A of 10^-40, with little risk of false positives.
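Why low coverage calls for a looser OMEGA_A can be seen from the abundance p-value defined in the DADA2 paper (Callahan et al., Nature Methods 2016): for a sequence with `a` reads whose putative parent has `n` reads and an error rate `λ` towards it, p = P(X ≥ a | X > 0) with X ~ Poisson(nλ). The sketch below computes that formula in log space. It is not dada2's own code: the function name, the error rate and the read counts are hypothetical, and dada2's exact comparison against OMEGA_A (e.g. any multiple-testing correction) is not reproduced.

```python
import math


def log10_abundance_pvalue(a, n_parent, lam):
    """log10 of P(X >= a | X > 0), where X ~ Poisson(n_parent * lam).

    Sketch of the abundance p-value from the DADA2 paper; not dada2's code.
    a: reads of the candidate sequence
    n_parent: reads of the partition (parent) it may be an error of
    lam: assumed error rate from the parent to the candidate
    """
    if a < 1 or n_parent < 1 or lam <= 0:
        raise ValueError("need a >= 1, n_parent >= 1 and lam > 0")
    mu = n_parent * lam  # expected number of error copies of the candidate
    log_terms, best, k = [], -math.inf, a
    while True:
        # log of the Poisson pmf at k, computed with lgamma to avoid overflow
        term = -mu + k * math.log(mu) - math.lgamma(k + 1)
        log_terms.append(term)
        best = max(best, term)
        # once k exceeds the mean the terms only shrink; stop when negligible
        if k > mu and term < best - 40:
            break
        k += 1
    # log-sum-exp of the upper tail P(X >= a)
    log_tail = best + math.log(sum(math.exp(t - best) for t in log_terms))
    # condition on X > 0 by dividing by 1 - exp(-mu)
    log_p = log_tail - math.log(-math.expm1(-mu))
    return log_p / math.log(10)


# Hypothetical heterozygote: two alleles, equal reads, assumed lam = 1e-2.
low_cov = log10_abundance_pvalue(25, 25, 1e-2)     # 50 reads: about -39.7
high_cov = log10_abundance_pvalue(250, 250, 1e-2)  # 500 reads: far lower
for omega_a in (1e-40, 1e-20):
    print(omega_a, low_cov < math.log10(omega_a), high_cov < math.log10(omega_a))
```

With these assumed numbers, the 50-read heterozygote (log10 p ≈ -39.7) does not pass the default OMEGA_A of 1e-40 but passes 1e-20, while ten times the coverage drives the p-value far below either threshold. Pooling samples per locus raises the read counts in a similar way.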
I have several questions:

1. DADA2 probably needs many sequences to estimate the error matrices reliably. Do you think splitting the files this finely, instead of a single pooled call with all samples and loci together, could negatively affect ASV inference? Does it make sense to split by locus even if the number of reads per file ends up much lower?
2. I am using the `clustering` element of the `dada-class` output to estimate empirically a justifiable OMEGA_A for my dataset, and it is working quite well. After reading through the forum and the documentation, I could not find a definition of the columns `pval` and `birth_pval`. They are also always `NA` for the first denoised variant; I guess that corresponds to an infinite value, as in Fig. 3 of Rosen et al. (2012). Based on these plots and on your suggestion in #315 ("value of p-value threshold (OMEGA_A) for divisive partitioning") to plot histograms, I made the following plot to guide a justified choice of OMEGA_A for my data. After playing with thresholds, I noticed that `birth_pval` is the value compared against OMEGA_A; is that right? If so, what is the definition of `pval`? And is OMEGA_C the same as OMEGA_R from Rosen et al. (2012)? (In the plot, a value of 10 on the axis stands for infinity.)
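If, as suspected above, a new cluster is born only when its `birth_pval` falls below OMEGA_A, the `birth_pval` values from one run can be swept over candidate thresholds to see how many clusters each would keep. Below is a minimal Python sketch, assuming the `birth_pval` column of the `clustering` data frame has been exported as a list in which `NA` becomes `None` or `NaN`; the function names are hypothetical. Because re-running `dada` with another OMEGA_A can change the partition history, the counts are only an approximation.

```python
import math


def clusters_retained(birth_pvals, omega_a):
    """Count clusters whose birth p-value would pass a given OMEGA_A.

    The first cluster has no birth p-value (NA, here None or NaN) and is
    always kept; every other cluster is kept if its birth_pval < omega_a.
    """
    return sum(
        1 for p in birth_pvals
        if p is None or math.isnan(p) or p < omega_a
    )


def sweep_omega_a(birth_pvals, exponents=range(-80, 0, 10)):
    """Map each candidate log10(OMEGA_A) to the number of clusters retained."""
    return {e: clusters_retained(birth_pvals, 10.0 ** e) for e in exponents}
```

For a diploid sample-locus run, the range of exponents over which the retained count stays at 1 or 2 may help bracket a defensible OMEGA_A.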

Thanks a lot for your insights,
Miguel

csmiguel closed this as completed Aug 6, 2024