learnErrors #1730

Closed
Pablo-UY opened this issue Apr 21, 2023 · 3 comments

Comments

@Pablo-UY

Hi everyone.
I'm exploring a GBS metagenomics dataset of single-end Illumina reads, and I'm getting this error profile plot.
Reads have a wide range distribution from 40bp to 90bp, but mostly 90 bp.
I only trimmed by quality score and minimum length, and then ran:
errF <- learnErrors(filtered_R, multithread = TRUE)
Could this be the main cause of these results, or is it due to the nature of the sequences?

Thanks.
Pablo
[Image: Plot_errors]

@benjjneb
Owner

What sequencing instrument and read size are you using? (e.g. HiSeq 1x100)

> Reads have a wide range distribution from 40bp to 90bp, but mostly 90 bp.

Do the raw reads vary in size, or is this after some sort of trimming step?
What does plotQualityProfile look like for an example sample?
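
One way to answer the read-length question is to tabulate sequence lengths directly from the FASTQ records. The following is a minimal base R sketch, not part of dada2: `read_length_table` is a hypothetical helper, the toy records stand in for the lines of a real uncompressed FASTQ file, and the four-lines-per-record layout is an assumption.

```r
# Hypothetical helper: tabulate read lengths from the lines of an
# uncompressed FASTQ file, assuming exactly four lines per record
# (header, sequence, separator, quality).
read_length_table <- function(fastq_lines) {
  stopifnot(length(fastq_lines) > 0, length(fastq_lines) %% 4 == 0)
  # The sequence is the second line of every record.
  seqs <- fastq_lines[seq(2, length(fastq_lines), by = 4)]
  table(nchar(seqs))
}

# Toy records (made up); in practice these would come from readLines()
# on a raw FASTQ file.
toy <- c(
  "@r1", "ACGTACGTAC", "+", "FFFFFFFFFF",
  "@r2", "ACGTAC",     "+", "FFFFFF",
  "@r3", "ACGTACGTAC", "+", "FFFFFFFFFF"
)
len_tab <- read_length_table(toy)
print(len_tab)
```

If the raw file already shows more than one length, the variation was introduced before any user-side trimming.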

@Pablo-UY
Author

It was sequenced on a NovaSeq 1x91.
Yes. Raw reads vary in size (15-91) and I trimmed to keep sequences longer than 40 bp.
plotQualityProfile looks like this:
[Image: FASTQ_crudo y Filtrado (raw and filtered FASTQ)]

@benjjneb
Owner

Part of what you are seeing is probably the poorer fit of the error model to binned quality score data, such as NovaSeq produces; see #1307 for more.
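
Whether a file carries binned quality scores can be checked by counting the distinct Phred values in its quality strings. This is a rough base R sketch, not a dada2 function: `distinct_phred` is a hypothetical helper, Phred+33 encoding is assumed, and the toy quality strings are invented for illustration.

```r
# Hypothetical helper: list the distinct Phred scores found in a set of
# FASTQ quality strings, assuming Phred+33 (Sanger/Illumina 1.8+) encoding.
distinct_phred <- function(quality_strings) {
  codes <- unlist(lapply(quality_strings, utf8ToInt)) - 33L
  sort(unique(codes))
}

# Toy quality strings (made up) that use only four distinct characters.
toy_quals <- c("FFFF8FFF-F", "FF#FFF", "F8FFFFFFFF")
scores <- distinct_phred(toy_quals)
print(scores)
```

A handful of distinct values across a whole file suggests binning, whereas unbinned data usually spans many more scores.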

> Yes. Raw reads vary in size (15-91)

This is still concerning to me though. Typically Illumina raw reads are all of uniform length. If there is a step that is introducing random length variation (e.g. read-by-read truncation at a specific quality score threshold) then this will reduce the sensitivity of dada2 and hurt its ability to accurately infer the error model.
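
The difference between read-by-read quality truncation and truncation to a fixed length can be illustrated with a toy base R sketch. The quality values, thresholds, and helper names below are all invented for illustration. In dada2 itself the corresponding `filterAndTrim` arguments are `truncQ` (cut at the first low-quality base) and `truncLen` (cut every read to one length and discard shorter reads); whether the trimming in this thread was done that way is not stated.

```r
# Toy per-base Phred scores for three reads (made-up values).
quals <- list(
  r1 = c(37, 37, 37, 37, 37, 37, 37, 37),
  r2 = c(37, 37, 37, 12, 37, 37, 37, 37),
  r3 = c(37, 37, 37, 37, 37, 37, 2, 37)
)

# Read-by-read truncation: each read is cut just before its first score
# at or below the threshold, so output lengths depend on the read.
truncate_at_quality <- function(q, threshold) {
  low <- which(q <= threshold)
  if (length(low) == 0) length(q) else low[1] - 1
}

# Fixed-length truncation: reads with at least `len` bases are cut to `len`;
# shorter reads are dropped (NA here).
truncate_fixed <- function(q, len) {
  if (length(q) >= len) len else NA_integer_
}

per_read <- vapply(quals, truncate_at_quality, numeric(1), threshold = 20)
fixed <- vapply(quals, truncate_fixed, numeric(1), len = 6)

print(per_read)  # lengths differ from read to read
print(fixed)     # every retained read has the same length
```

With the quality threshold, the three toy reads end up with three different lengths; with the fixed length, all retained reads are identical in length, which is the uniform input described above.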
