Denoising amplicons from multilocus PCRs #1981

Closed
csmiguel opened this issue Jul 12, 2024 · 0 comments

Dear Benjamin,
I am using DADA2 to denoise amplicons from multiplexed PCRs sequenced on an Illumina MiSeq (300PE), from a panel of markers used in mammals.
Although DADA2 has mostly been used for metabarcoding analyses, where (1 sample) = (1 locus) * (taxa in sample), my case is (1 sample) = (30 loci) * (1 taxon).
So far so good with DADA2. For each sample, I demultiplex by locus-specific primers with cutadapt and then run dada on each samplei_locusj.fastq. Since my taxon is a diploid animal, the maximum number of real variants expected in each dada run is 2.
Under this expectation, I had to relax OMEGA_A to around 10^-20 to recover variants from fastq files with a low number of reads. When I instead run dada on all the samples for a given locus, dada2::dada(all_files_for_locus_j, pool = TRUE), the power to detect variants at low coverage is great, even with the default OMEGA_A of 10^-40, and with little risk of false positives.
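A minimal sketch of the kind of threshold scan this implies: for each candidate OMEGA_A, count how many variants each sample/locus file would retain and flag files that exceed the diploid maximum of 2. It assumes a variant is retained when its birth p-value falls below OMEGA_A, which is my working assumption and not something the documentation confirms. It also ignores that re-running dada at a different threshold may repartition the reads. The file names and p-values are invented for illustration and are not results from my data.

```python
import math

# Hypothetical birth p-values per sample/locus file. NaN marks the first
# (founding) variant, which has no birth p-value in the clustering table.
# All numbers are invented for illustration only.
birth_pvals = {
    "sample1_locus1": [math.nan, 1e-55, 1e-25, 1e-8],
    "sample2_locus1": [math.nan, 1e-18],
    "sample3_locus1": [math.nan, 1e-45],
}

MAX_ALLELES = 2  # diploid expectation per sample and locus


def variants_called(pvals, omega_a):
    """Count variants kept at a threshold (assumed rule, see lead-in).

    The founding variant (NaN) always counts; any other variant counts
    when its birth p-value is below omega_a.
    """
    return sum(1 for p in pvals if math.isnan(p) or p < omega_a)


def scan_thresholds(table, thresholds):
    """Map each threshold to (per-file counts, files above MAX_ALLELES)."""
    results = {}
    for omega_a in thresholds:
        counts = {name: variants_called(p, omega_a) for name, p in table.items()}
        over = sorted(name for name, n in counts.items() if n > MAX_ALLELES)
        results[omega_a] = (counts, over)
    return results


if __name__ == "__main__":
    scan = scan_thresholds(birth_pvals, [1e-40, 1e-20, 1e-5])
    for omega_a, (counts, over) in scan.items():
        print(f"OMEGA_A={omega_a:g}: {counts} | more than {MAX_ALLELES} variants: {over}")
```

With these made-up values, the strictest threshold misses the second variant in the low-coverage file, while the more permissive ones push one file above two variants. That is the trade-off I am trying to balance.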
I have several questions:

  1. DADA2 probably needs many sequences to estimate reliable error matrices. Do you think that splitting the files this finely, instead of making a pooled call with all samples and loci together, could negatively affect the determination of ASVs? Does it make sense to split by locus even if the number of reads per file ends up much lower?
  2. I am using the 'clustering' element of the 'dada-class' output to estimate empirically a justifiable OMEGA_A for my dataset. It is working quite well. After reading through the forum and the documentation, I could not find a definition for the columns "pval" and "birth_pval". They are also always "NA" for the first denoised variant; I guess that corresponds to an infinite value, as you show in Fig. 3 of Rosen et al. (2012). Based on those plots and on your suggestion to plot histograms in "value of p-value threshold (OMEGA_A) for divisive partitioning" (#315), I have made the following plot to guide a justified decision on which OMEGA_A to apply to my data (a value of 10 on the axis stands for infinite). After playing with thresholds, I noticed that "birth_pval" seems to be the value compared against OMEGA_A. Is that right? If so, what is the definition of "pval"? And is OMEGA_C the same as OMEGA_R from Rosen et al. (2012)?
[image]

Thanks a lot for your insights,
Miguel

csmiguel closed this as completed Aug 6, 2024