Sequence Extraction #1321
Identifying the provenance of each denoised read is fairly straightforward up until merging (where it is still entirely possible, just a bit harder). See this issue and those linked from it for some guidance: #354
Thank you so much, I really appreciate it. I'm about to go into the field for a few days, so I may comment again next week if I am still having issues. Mark
Hello Ben, I wanted to clarify my request a little because I am still having a bit of trouble. Below is a piece of the summary table from the DADA2 pipeline. I would like to extract only the 5508 sequences shown under the nonchim category. These are the filtered, denoised, merged, and non-chimeric sequences that I would like to BLAST against GenBank. Is there a way to extract these 5508 successfully filtered sequences? Thank you so much for your help, I really appreciate you taking the time to answer questions like these. Mark
Ah, you just want to get the sequences in a format that is usable outside R, e.g. for BLAST?
And you should have those sequences in FASTA format.
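A minimal sketch of such an export, assuming `seqtab.nochim` is the usual sequence table with samples as rows and unique sequences as column names. A toy matrix stands in for the real table here, the `ASV1;size=...` headers and the output file name are hypothetical, and base R `writeLines` is used in place of `writeFasta` so the example runs without extra packages:

```r
# Toy stand-in for seqtab.nochim: rows are samples, columns are unique sequences.
seqtab.nochim <- matrix(
  c(10L, 0L, 3L,
     5L, 2L, 0L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sampleA", "sampleB"), c("ACGTAC", "GGCATT", "TTGACA"))
)

# The thread uses getSequences(seqtab.nochim) from dada2 for this step;
# for a sequence table, the sequences are its column names.
sq <- colnames(seqtab.nochim)

# Hypothetical header style: ASV index plus total reads across all samples.
headers <- paste0(">ASV", seq_along(sq), ";size=", colSums(seqtab.nochim))

# Interleave header and sequence lines, then write them as a FASTA file.
fasta_lines <- as.vector(rbind(headers, sq))
fasta_file <- file.path(tempdir(), "nochim_asvs.fasta")  # hypothetical path
writeLines(fasta_lines, fasta_file)
```

The resulting file holds one record per unique sequence, not one per read, so it is the natural input for a BLAST search.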
Hi Ben, When I ran sq <- getSequences(seqtab.nochim), it gave a result with only 621 sequences. Based on the table posted below, I would have expected to get 306238 sequences from all my samples combined in the nonchim column. Additionally, when I tried to use writeFasta, it told me it couldn't find the writeFasta function. Any help you could provide would be greatly appreciated. Thank you so much for continuing to answer my questions.
21-1-trnL 6242 5657 5542 5555 5509 5508
There is a difference between "reads" and "sequences" here. `getSequences` will extract all the "sequences", i.e. the unique DNA sequences in the sequence table. That number will not be the same as the number of reads, which is what the numbers in the results posted above show, because some sequences are represented by many reads.
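To make the distinction concrete, here is a small sketch on a toy matrix (not the data from this thread), assuming the usual layout of a sequence table with samples as rows and unique sequences as columns:

```r
# Toy stand-in for seqtab.nochim: entries are read counts.
seqtab.nochim <- matrix(
  c(10L, 0L, 3L,
     5L, 2L, 0L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sampleA", "sampleB"), c("ACGTAC", "GGCATT", "TTGACA"))
)

# Number of unique sequences: one per column.
n_sequences <- ncol(seqtab.nochim)

# Number of reads: the sum of all counts in the table.
n_reads_total <- sum(seqtab.nochim)

# Reads per sample: row sums, which is what a per-sample nonchim count reports.
reads_per_sample <- rowSums(seqtab.nochim)
```

In this toy table there are 3 unique sequences but 20 reads, in the same way that 621 unique sequences can account for several hundred thousand reads.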
My fault there, you need to load the "ShortRead" package as well as the "dada2" package to use writeFasta in that way.
Hi Ben,
It looks like I got everything to work, thank you so much for all your help
and for running this amazing package!
Mark
Hi Ben, I actually have one, hopefully last, question for you. Is there a way to see how many reads go into each unique sequence that is extracted? For example, I have 621 unique sequences; can I see whether one of them had only a single read versus one that has 10,000? Mark
In R you can just look at the entry in the sequence table corresponding to that sample and ASV. E.g. for the 12th ASV,
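A minimal sketch of that lookup, again on a toy table rather than the real data. The sample names and sequences below are hypothetical, and the layout (samples as rows, ASVs as columns) is assumed:

```r
# Toy sequence table with 12 hypothetical ASVs and 2 samples.
seqs <- paste0("ACGT", c("AAA", "AAC", "AAG", "AAT", "ACA", "ACC",
                         "ACG", "ACT", "AGA", "AGC", "AGG", "AGT"))
seqtab.nochim <- matrix(1:24, nrow = 2,
                        dimnames = list(c("sampleA", "sampleB"), seqs))

# Reads of the 12th ASV in one sample.
reads_12_sampleA <- seqtab.nochim["sampleA", 12]

# Reads of the 12th ASV in every sample, and summed over samples.
reads_12_by_sample <- seqtab.nochim[, 12]
reads_12_total <- sum(reads_12_by_sample)

# Total reads behind every unique sequence, most abundant first.
reads_per_asv <- sort(colSums(seqtab.nochim), decreasing = TRUE)

# Sequences supported by a single read overall (none in this toy table).
singletons <- names(reads_per_asv)[reads_per_asv == 1]
```

`reads_per_asv` is the vector to inspect when checking whether a given sequence is backed by one read or by thousands.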
Hello,
I am using dada2 for a metabarcoding project, and I was curious whether there is a way for me to extract the sequences identified during the tracking step of the dada2 pipeline. For instance, I would like to get the sequences that are under the nonchim category and then run a BLAST search on those filtered sequences. How can I extract them?
Any help would be appreciated!
Mark Johnson
sample    input filtered denoisedF denoisedR merged nonchim
21-1-trnL  6242     5657      5542      5555   5509    5508
21-2-trnL  8631     8104      8050      8062   8039    8012
21-3-trnL 11648    10953     10839     10851  10771   10605
21-4-trnL  9738     9120      8936      8986   8771    8731
21-5-trnL  7388     6925      6824      6866   6700    6535
21-6-trnL  7120     6721      6621      6634   6531    6269