DADA2 Not recovering known community members in mock community samples #1005

Closed
skelto3 opened this issue May 8, 2020 · 7 comments

@skelto3

skelto3 commented May 8, 2020

Hello,

I constructed simple mock communities composed of short synthetic genes that vary only within a 6-bp region in the middle, combined at various known concentrations (example sequences from one such mock community are pasted below). In every attempt so far, at least one of the known mock community members is not recovered after denoising, even though the raw reads contain many perfect matches to it. I have supplied the known sequences as priors (forward sequences and their reverse complements, before merging), run with and without pooling, and tried selfConsist = TRUE and FALSE, all to no avail.

In the example mock community below, the known input concentrations of the different variants suggest that the first and second sequences are being assigned to the same ASV, which takes the sequence of the second mock member. As a result, I recover zero perfect matches for the first mock member. This is particularly puzzling because the first member makes up roughly a third of the raw reads in some samples.

Is there anything else I can try to get DADA2 to discriminate among these similar sequences?

Thank you.

Pmb.F.priors <- c("AGCTATTCTATTCCTAAATAATACATCCAACACTCCAACACTATTATTCCTAGCAACC",
"AGCTATTCTATTCCTAAATAATACTCTCAACACTCCAACACTATTATTCCTAGCAACC",
"AGCTATTCTATTCCTAAATAATAAGAGCAACACTCCAACACTATTATTCCTAGCAACC",
"AGCTATTCTATTCCTAAATAATAATGACAACACTCCAACACTATTATTCCTAGCAACC",
"AGCTATTCTATTCCTAAATAATATACACAACACTCCAACACTATTATTCCTAGCAACC")
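
One way to check the "perfect matches in the raw reads" observation independently of DADA2 is to count, for each known member, the reads that contain it exactly in either orientation. Below is a minimal base-R sketch, assuming the reads are already available as a character vector (loaded from the fastq files by whatever means you already use). The helper names `revcomp` and `count_exact` are hypothetical, and `toy_reads` is made-up input that exists only so the example runs; it is not real data.

```r
# Known mock-community members (forward orientation), from the post above.
Pmb.F.priors <- c("AGCTATTCTATTCCTAAATAATACATCCAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATACTCTCAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATAAGAGCAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATAATGACAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATATACACAACACTCCAACACTATTATTCCTAGCAACC")

# Hypothetical helper: reverse complement of each A/C/G/T string in a vector.
revcomp <- function(x) {
  vapply(x, function(s) {
    paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: for each reference, count reads containing it as an
# exact (literal) substring, separately for forward and reverse-complement.
count_exact <- function(reads, refs) {
  hits <- function(r) sum(grepl(r, reads, fixed = TRUE))
  data.frame(member  = seq_along(refs),
             forward = vapply(refs, hits, integer(1), USE.NAMES = FALSE),
             reverse = vapply(revcomp(refs), hits, integer(1), USE.NAMES = FALSE))
}

# Toy input only (not real data): member 1 with arbitrary flanking bases,
# member 2 as-is, and member 1 in reverse-complement orientation.
toy_reads <- c(paste0("GGGG", Pmb.F.priors[1], "TTTT"),
               Pmb.F.priors[2],
               revcomp(Pmb.F.priors[1]))
count_exact(toy_reads, Pmb.F.priors)
```

Because `fixed = TRUE` makes the match a literal substring test, bases flanking the amplicon in a raw read do not prevent a hit, while any read with a sequencing error inside the 58-nt region is deliberately not counted.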

@benjjneb
Owner

benjjneb commented May 9, 2020

To clarify, the sequences pasted here are the mock community sequences you are trying to recover? And is it just these sequences being denoised, or are they part of a longer sequenced region?

Also, do "first" and "second" in your text correspond to the 1st and 2nd sequences in Pmb.F.priors?

@skelto3
Author

skelto3 commented May 10, 2020 via email

@benjjneb
Owner

That is... strange. When I use the dada2 alignment from within the R package, these sequences are all clearly distinguished from one another, so what is going on?

unname(outer(Pmb.F.priors, Pmb.F.priors, nwhamming, vec=TRUE))
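
For readers without dada2 installed, the same sanity check can be sketched in base R. Because the five priors have equal length and differ only by substitutions, a plain position-by-position mismatch count is assumed here to be a reasonable stand-in for an alignment-based distance. `nwhamming` aligns the sequences first, so its numbers need not match this sketch in general. The variable names are illustrative only.

```r
Pmb.F.priors <- c("AGCTATTCTATTCCTAAATAATACATCCAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATACTCTCAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATAAGAGCAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATAATGACAACACTCCAACACTATTATTCCTAGCAACC",
                  "AGCTATTCTATTCCTAAATAATATACACAACACTCCAACACTATTATTCCTAGCAACC")

# One row per sequence, one column per position. Assumes equal-length
# sequences (all five are 58 nt) and ignores indels.
seq_mat <- do.call(rbind, strsplit(Pmb.F.priors, ""))

# Pairwise Hamming distance: number of positions at which two sequences differ.
n <- nrow(seq_mat)
ham <- matrix(0L, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    ham[i, j] <- sum(seq_mat[i, ] != seq_mat[j, ])
  }
}
ham

# Positions at which the five sequences do not all agree.
var_pos <- which(apply(seq_mat, 2, function(col) length(unique(col)) > 1))
var_pos
```

For these five sequences, every pair differs at 3 or 4 positions, all within columns 24 to 27 of the 58-nt sequence.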

What version of the dada2 R package are you using? Could you share an example fastq file with me?

@skelto3
Author

skelto3 commented May 11, 2020 via email

@benjjneb
Owner

You can email me: benjamin DOT j DOT callahan AT gmail DOT com

@benjjneb
Owner

Did we get this figured out over email?

@skelto3
Author

skelto3 commented Jul 16, 2020 via email


2 participants