DADA2 Not recovering known community members in mock community samples #1005

skelto3 · 2020-05-08T20:38:11Z

Hello,

I constructed simple mock communities composed of short synthetic genes that vary only at a 6 bp region in the middle, combined in various known concentrations (example sequences of one such mock community are pasted below). In every attempt so far, at least one of the known mock community members is not recovered after denoising, despite an abundance of perfect matches being present in the raw reads. I have included the known sequences as priors (forward and reverse complements, prior to merging), tried both pooling and no pooling, and tried selfConsist = T and F, all to no avail. In the example mock community below, based on the known concentrations of the different variants going into the mock, it appears that the first and second sequences are being assigned to the same ASV, which is given the sequence of the second mock member, and thus I recover zero perfect matches for the first mock member. This is particularly puzzling because the first mock member comprises about a third of the raw reads in some samples.

Is there anything else I can try to get DADA2 to discriminate among these similar sequences?

thank you.

Pmb.F.priors <- c("AGCTATTCTATTCCTAAATAATACATCCAACACTCCAACACTATTATTCCTAGCAACC",
"AGCTATTCTATTCCTAAATAATACTCTCAACACTCCAACACTATTATTCCTAGCAACC",
"AGCTATTCTATTCCTAAATAATAAGAGCAACACTCCAACACTATTATTCCTAGCAACC",
"AGCTATTCTATTCCTAAATAATAATGACAACACTCCAACACTATTATTCCTAGCAACC",
"AGCTATTCTATTCCTAAATAATATACACAACACTCCAACACTATTATTCCTAGCAACC")

benjjneb · 2020-05-09T20:17:02Z

To clarify, the sequences listed here are the mock community sequences you are trying to recover? And is it just these sequences being denoised, or are they part of a longer sequenced region?

Also, do "first" and "second" in your text correspond to the 1st and 2nd sequences in Pmb.F.priors?

skelto3 · 2020-05-10T02:45:07Z

Yes, the sequences listed are those that I am trying to recover, and they should be the only sequences present in the samples. These sequences are the complete amplicon (after removing primers); they are not part of a longer sequenced region. I realize it is not obvious, but I promise there are good reasons for metabarcoding such a tiny region. Yes, first and second correspond to the order in the Pmb.F.priors vector.

-- James Skelton Community Ecologist webpage: poetsworm.com email: skelto3@g <skelto3@vt.edu>mail.com

benjjneb · 2020-05-11T14:31:52Z

That is... strange. When I use the dada2 alignment from within the R package, these sequences are all clearly distinguished from one another, so what is going on?

unname(outer(Pmb.F.priors, Pmb.F.priors, nwhamming, vec=TRUE))

What version of the dada2 R package are you using? Could you share an example fastq file with me?
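As a cross-check that does not depend on the dada2 package, the ungapped (position-by-position) Hamming distances between the five priors can be computed in base R. This is only a minimal sketch, not the alignment-based `nwhamming` call above; `hamming` and `dists` are illustrative names that do not appear in the thread.

```r
# The five mock community sequences from the original post.
Pmb.F.priors <- c("AGCTATTCTATTCCTAAATAATACATCCAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATACTCTCAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATAAGAGCAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATAATGACAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATATACACAACACTCCAACACTATTATTCCTAGCAACC")

# Ungapped Hamming distance: count positions where two equal-length strings differ.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Pairwise distance matrix over the indices of the priors vector.
idx <- seq_along(Pmb.F.priors)
dists <- outer(idx, idx,
               Vectorize(function(i, j) hamming(Pmb.F.priors[i], Pmb.F.priors[j])))
print(dists)
```

Without gaps, every pair differs at 3 or 4 positions, and the first and second sequences differ at 3.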

skelto3 · 2020-05-11T16:13:42Z

I am using v1.14.0. I would be willing to share a fastq file privately. How may I do so?

benjjneb · 2020-05-11T16:22:31Z

You can email me: benjamin DOT j DOT callahan AT gmail DOT com

benjjneb · 2020-07-16T20:26:52Z

Did we get this figured out over email?

skelto3 · 2020-07-16T21:26:31Z

Yes. Changing gap_penalty to 20 resolved the issue. Thank you for checking back.
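One plausible reading of why the gap penalty matters for the first two mock members can be sketched with simple score arithmetic. The sketch assumes the alignment scores documented as dada2 defaults (MATCH = 5, MISMATCH = -4, GAP_PENALTY = -8) and assumes that the 20 above means a penalty of -20, since the option is expressed as a negative score. `drop_char` and `score` are illustrative helpers, not dada2 functions, and the real aligner may differ in details such as end gaps.

```r
s1 <- "AGCTATTCTATTCCTAAATAATACATCCAACACTCCAACACTATTATTCCTAGCAACC"
s2 <- "AGCTATTCTATTCCTAAATAATACTCTCAACACTCCAACACTATTATTCCTAGCAACC"

# Remove the character at position i from string s.
drop_char <- function(s, i) paste0(substr(s, 1, i - 1), substr(s, i + 1, nchar(s)))

# Deleting one base from each sequence makes them identical, so the pair can be
# aligned with 2 gaps and 0 mismatches instead of 0 gaps and 3 mismatches.
gapped_alignment_exists <- identical(drop_char(s1, 25), drop_char(s2, 27))

# Linear alignment score under assumed match/mismatch/gap scores.
score <- function(matches, mismatches, gaps, match = 5, mismatch = -4, gap = -8) {
  matches * match + mismatches * mismatch + gaps * gap
}

ungapped       <- score(55, 3, 0)             # 58 columns: 55 matches, 3 mismatches
gapped_default <- score(57, 0, 2, gap = -8)   # 59 columns: 57 matches, 2 gaps
gapped_strict  <- score(57, 0, 2, gap = -20)  # same alignment, harsher gap score

print(c(ungapped = ungapped, gapped_default = gapped_default,
        gapped_strict = gapped_strict))
```

Under these assumptions the gapped alignment outscores the ungapped one at -8 but not at -20. Because DADA2's error model is built on substitutions rather than indels, an indel-only alignment could make the two variants look like the same sequence, which would be consistent with the behaviour reported at the top of the thread. In dada2 the option can be passed through `dada(..., GAP_PENALTY = -20)` or set with `setDadaOpt(GAP_PENALTY = -20)`; the exact call used in the email exchange is not shown here.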

benjjneb closed this as completed Jul 16, 2020
