poly-A before poly-G in NextSeq reads #36

Closed
PAMorin opened this issue Mar 9, 2018 · 8 comments


PAMorin commented Mar 9, 2018

Hi,
I am seeing a large proportion of NextSeq reads with a poly-G tail, and have successfully trimmed it off with fastp. Most sequences also have a poly-A run just before the poly-G tail, which is apparently due to a reduction in signal strength (lower quality) from clusters before they fail altogether (see https://sequencing.qcfail.com/articles/illumina-2-colour-chemistry-can-overcall-high-confidence-g-bases/). I tried to use the -x option in the same trimming command, but it doesn't work (presumably because the poly-A run isn't at the end of the original read). I suppose I could run it as a second trimming pass, but I wanted to know whether this is a common issue for people with poly-G reads from NextSeq data and, if so, whether it could be incorporated into the options.
The data look like this after trimming:
@NS500704:337:HGG2HBGX3:1:11101:10170:2997 1:N:0:GACGAGG+CGGAAT
GCAAGGTCTTAATCAAATTTTGTCAGCTGCAAGATCGAAGAGCACACGTCTGAACTCCAGTCACGACGAGGATCTCGTATGCCGTCTTCTGCGTGAAAAAAAAAA
+
AAAAAEEEEEAEEEE6EEAEEEEEEEEEEEEEEEEEEE/EEEEEEEEAEEEEE/EEEEEEEEEEEAEE/EEEA</AE/AEE<E</6<E//EE/EE/<AAEEEEA/
@NS500704:337:HGG2HBGX3:1:11101:8528:3344 1:N:0:GACGAGG+CGGAAT
ACAGAAACAGGTGCACAGTTCCCCATCAAGATCGGAAGACACACGTCTGAACTCCAGTCACGACGAGGCTCTCGTATGCCGTCTTCTGCATGAAAAAAAAAA
+
AAAAAAEEEEEE6E/AEEE/EEEEEEEEEE/EEEEEEEEE/EAEEEEEE/AE/EAE/AEEEEEEA/EE/AE<///A<E<E//<EE//////A/EEEEEE///

Thanks,
Phil Morin (phillip.morin@noaa.gov)
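
One way to gauge how common the pattern described above is in a given file is to tally the homopolymer run at the 3' end of every read. The following is a minimal sketch, not part of fastp: the function names (`read_fastq`, `tail_run`, `count_tails`) and the 10-base threshold are assumptions chosen for illustration, and it only handles plain four-line FASTQ records with exact matching.

```python
from collections import Counter
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(handle)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        yield record[0], record[1], record[3]


def tail_run(seq):
    """Return (base, length) of the homopolymer run at the 3' end."""
    if not seq:
        return "", 0
    base = seq[-1]
    return base, len(seq) - len(seq.rstrip(base))


def count_tails(handle, min_len=10):
    """Count reads whose 3' homopolymer run is at least min_len long,
    grouped by base. min_len=10 is an illustrative threshold."""
    counts = Counter()
    for _, seq, _ in read_fastq(handle):
        base, n = tail_run(seq)
        if n >= min_len:
            counts[base] += 1
    return counts
```

Called as `count_tails(open("trimmed.fastq"))` (a hypothetical file name), a large count under `"A"` after poly-G trimming would suggest the file shows the same poly-A-before-poly-G pattern.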

Member

sfchen commented Mar 9, 2018

Thanks Phil.

In the current design, if polyG and polyX trimming are both enabled, polyG trimming is skipped, since I considered polyG to be a special case of polyX.

Please give me about 10 minutes. I will push an update that runs polyG trimming first and then polyX trimming when both are enabled.

Thanks
Shifu
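
The ordering described above (poly-G first, then poly-X on what remains) can be sketched as follows. This is an illustration under stated assumptions, not fastp's implementation: the helper names (`trim_tail`, `trim_poly_g_then_x`) are hypothetical, the 10-base minimum is an assumed threshold, and the matching is exact, whereas a real trimmer may tolerate mismatches inside the run.

```python
def trim_tail(seq, qual, bases, min_len=10):
    """Trim the 3' homopolymer run if its base is in `bases` and the run
    is at least min_len long. Exact matching only (an assumption)."""
    if seq and seq[-1] in bases:
        kept = len(seq.rstrip(seq[-1]))
        if len(seq) - kept >= min_len:
            return seq[:kept], qual[:kept]
    return seq, qual


def trim_poly_g_then_x(seq, qual, min_len=10):
    """Two-stage trimming: poly-G first, then any homopolymer (poly-X)."""
    seq, qual = trim_tail(seq, qual, "G", min_len)   # stage 1: poly-G tail
    return trim_tail(seq, qual, "ACGT", min_len)     # stage 2: poly-X tail


# Synthetic read: insert, then a poly-A run, then a poly-G tail.
seq = "ACGTTGCT" + "A" * 12 + "G" * 15
qual = "E" * len(seq)
trimmed_seq, trimmed_qual = trim_poly_g_then_x(seq, qual)
print(trimmed_seq)  # ACGTTGCT
```

The poly-A run only becomes the 3' end once the poly-G tail is gone, which is why the second stage has to run on the output of the first rather than on the original read.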

Member

sfchen commented Mar 9, 2018

Hi @PAMorin

I just pushed the update.

Please build fastp from the latest code on master, or download the pre-compiled binary from http://opengene.org/fastp/fastp, and test again.

Thanks
Shifu

Author

PAMorin commented Mar 9, 2018 via email

Member

sfchen commented Mar 9, 2018

Hi Phil,

Your system's gcc compiler is too old (from before 2011), so it doesn't support C++11. You can either:

  1. use the new pre-compiled binary of fastp, which can be downloaded from http://opengene.org/fastp/fastp
  2. or ask your system administrator to update the gcc compiler.


mblue9 commented Mar 10, 2018

Hi @sfchen, I think I may have this issue too and would love to try this option. Will there be a new Bioconda version of fastp that includes it?

Member

sfchen commented Mar 10, 2018

Hi @mblue9

A new version of fastp will be released once these new features are well tested. The Bioconda version will be updated soon afterwards (usually with a one-day delay).


mblue9 commented Mar 10, 2018

Great, thanks!

Member

sfchen commented Mar 19, 2018

Implemented and fixed.

@mblue9 the new version is already in Bioconda.

I am closing this issue.

@sfchen sfchen closed this as completed Mar 19, 2018