quality score plot DADA2 #1988

Open
murheisa opened this issue Jul 24, 2024 · 3 comments

@murheisa

Hello, I'm currently working on 16S sequences from whale microbiomes. They were sequenced on Illumina, targeting the V4 region. I've only worked with V3V4 before, so I don't know whether this quality, especially at the beginning of the sequence, is normal. The tutorial mentions that the V4 region has sufficient overlap, so the trimming parameters can be guided by quality alone, but these samples seem a little odd to me. Can I trim 30 bp at the beginning?

[Two screenshots of quality profile plots, taken 2024-07-24 at 14:10:47 and 14:11:01]

@benjjneb
Owner

The red line shows the number of reads that extend at least that far. The lower quality at the start is due to a large fraction (80%?) of reads in your data that are only ~30 nt long.

How has this data been preprocessed, and do you have any idea why there is a very large number of reads that are both bad quality and only 30 nt long?

The solution would probably not be trimming the start of the reads, but removing these short reads.
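
To see how many reads are affected before choosing a fix, one option is to tally read lengths directly from a FASTQ file. The following is a minimal Python sketch, not part of dada2: the function names and the toy records are made up for illustration, and it assumes plain four-line FASTQ records.

```python
import io

def read_lengths(handle):
    """Return the sequence length of every record in a 4-line-per-record FASTQ stream."""
    lengths = []
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            lengths.append(len(line.strip()))
    return lengths

def fraction_reaching(lengths, position):
    """Fraction of reads that extend to at least `position` (1-based)."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= position) / len(lengths)

# Hypothetical toy data: four 30-nt all-N reads and one 60-nt read.
records = []
for k in range(4):
    records.append(f"@short{k}\n{'N' * 30}\n+\n{'#' * 30}\n")
records.append(f"@long0\n{'ACGT' * 15}\n+\n{'I' * 60}\n")
toy_fastq = "".join(records)

lengths = read_lengths(io.StringIO(toy_fastq))
print(fraction_reaching(lengths, 30))  # every read reaches position 30
print(fraction_reaching(lengths, 31))  # only the long read extends past 30
```

For a real gzipped file, the same functions could be fed a handle from `gzip.open(path, "rt")`. A sharp drop in the fraction just past position 30 would match the pattern described above.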

@murheisa
Author

murheisa commented Aug 1, 2024

The sequences are supposed to be raw reads, and all samples have the same issue. I checked with FastQC, and all of these short sequences consist only of N (NNNNNNNNNNN). For its own analysis, the sequencing company processed these raw reads as follows: primers were removed with fastx, quality values below 20 and sequences shorter than 130 bases were removed, the reads were merged with the FLASH script, and the result was used in QIIME. The reads were 2x300 bp.

I wonder: if I remove only these 30 bp NNNNNN reads with Trim Galore, could I use the sequences that are kept in dada2, and then remove primers and low-quality bases with filterAndTrim as usual?

@benjjneb
Owner

benjjneb commented Aug 2, 2024

> I wonder: if I remove only these 30 bp NNNNNN reads with Trim Galore, could I use the sequences that are kept in dada2, and then remove primers and low-quality bases with filterAndTrim as usual?

Yes, you can work with just the remaining reads. Short all-N reads will also be removed by filterAndTrim(..., minLen=100).
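
As a rough illustration of why a minimum-length filter disposes of these reads, here is a small Python sketch of the filtering logic. It is not dada2's implementation (in dada2 the work is done by filterAndTrim itself). The functions are hypothetical, and the parameter names `min_len` and `max_n` are only meant to echo filterAndTrim's `minLen` and `maxN` arguments.

```python
def keep_read(seq, min_len=100, max_n=0):
    """True if the read is at least `min_len` long and has at most `max_n` N bases."""
    return len(seq) >= min_len and seq.upper().count("N") <= max_n

def filter_records(records, min_len=100, max_n=0):
    """Keep the (header, sequence, quality) tuples whose sequence passes keep_read."""
    return [r for r in records if keep_read(r[1], min_len, max_n)]

# Hypothetical toy records: one short all-N read and one ordinary 240-nt read.
records = [
    ("@short_all_n", "N" * 30, "#" * 30),
    ("@good", "ACGT" * 60, "I" * 240),
]

kept = filter_records(records)
print([header for header, _, _ in kept])  # only the ordinary read remains
```

In this sketch the 30-nt all-N read fails both conditions, so either the length threshold or the N limit alone would be enough to drop it.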
