Filtering reads based on low-quality base ratio #699
Would I personally feel a …
Is that the … I read https://www.drive5.com/usearch/manual/avgq.html back when I added …
That is exactly it. It is a bit more generic, as it does not depend on read length and therefore works well on variable-length sequences. The major mistake I made in fastq-filter is that there is also a median filter. That is quite terrible, because the median is not an informative metric. There is also a Q-score filter that correctly computes the average (it converts the scores to error rates internally), but I wish I hadn't added it, because Q-scores are massively confusing. Q10, Q20 and Q30 correspond to error rates of 10%, 1% and 0.1%, which are huge differences, and the Q-scores simply don't convey that. Error rate is much better, as it does what it says on the tin. It was a good exercise in pedantic performance optimization, though; the lookup table is really fast.
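To make the lookup-table idea concrete, here is a minimal sketch of an average-error-rate filter. It is not fastq-filter's actual code; the names `ERROR_RATE_LUT`, `average_error_rate` and `passes_filter` are hypothetical, and Phred+33 encoding is assumed. Every possible quality character is converted to its error probability once, so filtering a read is just table lookups and a sum.

```python
# Hypothetical sketch of an average-error-rate read filter (not fastq-filter's code).
# Assumes Phred+33 quality encoding (Sanger / Illumina 1.8+).
PHRED_OFFSET = 33

# Lookup table indexed by the ASCII code of a quality character.
# Each entry is the error probability 10 ** (-Q / 10) for that character.
# Codes below the offset are not valid Phred+33 and are clamped to Q0 (p = 1.0).
ERROR_RATE_LUT = [10 ** (-max(code - PHRED_OFFSET, 0) / 10) for code in range(128)]


def average_error_rate(qualities: str) -> float:
    """Return the mean per-base error probability of a quality string."""
    if not qualities:
        return 0.0  # Treat an empty read as error-free; a real tool might reject it.
    # encode() yields ASCII codes as ints, which index straight into the table.
    total = sum(ERROR_RATE_LUT[code] for code in qualities.encode("ascii"))
    return total / len(qualities)


def passes_filter(qualities: str, max_error_rate: float) -> bool:
    """Keep the read only if its mean error rate does not exceed the threshold."""
    return average_error_rate(qualities) <= max_error_rate


if __name__ == "__main__":
    read_quals = "I" * 9 + "+"  # nine Q40 bases and one Q10 base
    print(average_error_rate(read_quals))
    print(passes_filter(read_quals, max_error_rate=0.01))
```

In this example the single Q10 base dominates: the mean error rate is about 0.0101, so the read fails a 1% threshold even though nine of its ten bases are Q40. The threshold is length-independent, which is the property described above.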
Yes, exactly. I got it from https://gigabaseorgigabyte.wordpress.com/2017/06/26/averaging-basecall-quality-scores-the-right-way/. Once seen, this can't be unseen. I hate that almost every tool, including FastQC and FastP, does this the wrong way, and even after being informed, they haven't fixed it.
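The difference between the two ways of averaging is easy to demonstrate. The sketch below (function names are hypothetical) compares the naive arithmetic mean of Phred scores with the mean computed in error-probability space and converted back to a Phred score, which is the approach described in the linked post.

```python
import math


def naive_mean_q(phred_scores: list[int]) -> float:
    """Arithmetic mean of the Phred scores themselves (the common shortcut)."""
    return sum(phred_scores) / len(phred_scores)


def error_based_mean_q(phred_scores: list[int]) -> float:
    """Average in error-probability space, then convert back to a Phred score."""
    # Q -> p: p = 10 ** (-Q / 10)
    mean_error = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    # p -> Q: Q = -10 * log10(p)
    return -10 * math.log10(mean_error)


if __name__ == "__main__":
    scores = [40, 10]  # one very good base, one poor base
    print(naive_mean_q(scores))        # 25.0
    print(error_based_mean_q(scores))  # roughly 13.0
```

Because Phred scores are logarithmic, the naive mean (Q25 here) overstates the quality of a read containing a poor base; averaging error probabilities gives about Q13, reflecting that the Q10 base contributes nearly all of the expected errors.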
Hi, I tried to read the information in the links you provided (e.g., on the --max-expected-errors option), but I am still confused about how to actually use it (which threshold to use). Many thanks,
I am not familiar with this ratio. Is this something that is used in other tools? In any case, it is not directly supported in Cutadapt. You can choose the … Let’s assume you have a read length of … So for this read length, using … That said, a threshold of 15.8 appears to be very large anyway: https://www.drive5.com/usearch/manual/faq_emax.html recommends using a threshold of 1.
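For choosing a `--max-expected-errors` threshold, it may help to see the arithmetic. The expected number of errors in a read is the sum of its per-base error probabilities (as described on the usearch pages linked in this thread), and the average error rate is that sum divided by read length. So, for one fixed read length L, "mean error rate at most r" is the same condition as "expected errors at most r × L". The helper names below are hypothetical; this is plain arithmetic, not a Cutadapt API.

```python
def expected_errors(phred_scores: list[int]) -> float:
    """Expected number of errors in a read: the sum of per-base error probabilities."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def ee_threshold_for_rate(read_length: int, max_error_rate: float) -> float:
    """Expected-errors cutoff equivalent to a mean error rate at one fixed read length.

    Mean error rate = EE / L, so (EE / L <= r) is the same condition as (EE <= r * L).
    """
    return read_length * max_error_rate


if __name__ == "__main__":
    print(expected_errors([20] * 100))       # 100 bases at Q20: about 1.0 expected error
    print(ee_threshold_for_rate(150, 0.01))  # 1% per base over 150 bases: 1.5
```

Note that a single `--max-expected-errors` value applies to every read, so it implies a stricter per-base rate for long reads than for short ones. That is why an average-error-rate criterion behaves more consistently on variable-length (e.g. trimmed) reads.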
Is there a way to filter reads based on low-quality base ratio of the read?
Many thanks,
Gil
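As noted in this thread, Cutadapt does not compute this ratio directly. For anyone who wants to apply it outside Cutadapt, a minimal sketch is below: count the bases whose Phred score falls below a cutoff and divide by read length. The function names, the Q20 cutoff and the 10% limit are arbitrary assumptions for illustration, and Phred+33 encoding is assumed.

```python
def low_quality_ratio(qualities: str, min_quality: int = 20, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is below min_quality."""
    if not qualities:
        return 0.0
    low = sum(1 for code in qualities.encode("ascii") if code - offset < min_quality)
    return low / len(qualities)


def keep_read(qualities: str, max_low_ratio: float = 0.1, min_quality: int = 20) -> bool:
    """Keep the read if at most max_low_ratio of its bases are low quality."""
    return low_quality_ratio(qualities, min_quality) <= max_low_ratio


if __name__ == "__main__":
    print(low_quality_ratio("IIII++"))  # 2 of 6 bases are below Q20
    print(keep_read("IIII++"))          # False with the default 10% limit
```

One design caveat: this metric counts a Q19 base and a Q2 base the same, whereas an average error rate weights each base by how likely it is to be wrong.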