Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple entries in remap.fq*.gz for single read pair #18

Closed
cdeboever3 opened this issue Jun 18, 2015 · 9 comments · Fixed by #25
Closed

Multiple entries in remap.fq*.gz for single read pair #18

cdeboever3 opened this issue Jun 18, 2015 · 9 comments · Fixed by #25

Comments

@cdeboever3
Copy link
Contributor

I've used the mapping part of WASP successfully before, but for some reason I've started seeing what seems to be a bug where the sequence for a single read pair is written to the remap.fq*.gz files multiple times:

$ zcat test.remap.fq1.gz 
@1:chr12:6643710:6643710:3
TCCGATCTGCCGCATCTTCTTTTGCGTCGCCAGCCGAGCCACATCGCTCAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGTATTGG
+1:chr12:6643710:6643710:3
FFB/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
@1:chr12:6643710:6643710:3
TCCGATCTGCCGCATCTTCTTTTGCGTCGCCAGCCGAGCCACATCGCTGAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGTATTGG
+1:chr12:6643710:6643710:3
FFB/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
@1:chr12:6643710:6643710:3
TCCGATCTGCCGCATCTTCTTTTGCGTCGCCAGCCGAGCCACATCGCTGAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGTATTGG
+1:chr12:6643710:6643710:3
FFB/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB

I've made a zip with files to reproduce the bug. I used the latest version of find_intersecting_snps.py from master.

https://dl.dropboxusercontent.com/u/3886457/wasp_bug.zip

@cdeboever3
Copy link
Contributor Author

Looking at my older runs of WASP, it seems that reads were output multiple times in the fastq files in the past as well. I guess the bug here may be that this read pair doesn't make it through the filtering step even though it aligns to the same spot. I've added files to the zip that show this.

@gmcvicker
Copy link
Collaborator

Hi Chris, thanks for the bug report, looking into this now...

@gmcvicker
Copy link
Collaborator

I think I found the problem and have committed a fix here:
b1e8219

Thanks again for the bug report and let us know if you have any further issues.

@cdeboever3
Copy link
Contributor Author

Thanks Graham. It seems the results (e.g. whether a read pair is kept or not) from the mapping pipeline weren't affected by this bug right?

@gmcvicker
Copy link
Collaborator

I am not 100% certain, but unfortunately I think that it could have
affected which paired end reads are filtered. I think that some PE reads
may have dropped out of the pipeline even though they could have been kept.

The other outstanding issue is that Step #5 (rmdup) does not currently
support PE reads. I am working to fix this now.

On Thu, Jun 25, 2015 at 12:50 PM, Christopher DeBoever <
notifications@github.com> wrote:

Thanks Graham. It seems the results (e.g. whether a read pair is kept or
not) from the mapping pipeline weren't affected by this bug right?


Reply to this email directly or view it on GitHub
#18 (comment).

@gmcvicker gmcvicker reopened this Jul 27, 2015
@gmcvicker
Copy link
Collaborator

It turns out the 'fix' I made was not correct and has created some issues with the PE reads. I have reverted to the old version and I am working on fixing the original issue (which was minor by comparison).

@cdeboever3
Copy link
Contributor Author

Sounds good, I was actually looking at the code last week although so far
I've mostly just added comments. I'm hoping to start refactoring a bit
tomorrow and adding in some unit tests.

On Mon, Jul 27, 2015 at 2:39 PM, Graham McVicker notifications@github.com
wrote:

It turns out the 'fix' I made was not correct and has created some issues
with the PE reads. I have reverted to the old version and I am working on
fixing the original issue (which was minor by comparison).


Reply to this email directly or view it on GitHub
#18 (comment).

@cdeboever3
Copy link
Contributor Author

I've been able to clean up the code a bit and add a lot of documentation and some tests (0170a01). I actually looked into this bug and it turns out it's not a bug. The two reads both overlap the SNP so the three possible read pairs are output. I added a test for the data I provided initially.

I can make a pull request, but I was also wondering if we could add an option to specify that the input bam file is already coordinate sorted? I can add that in before I make the pull request.

@gmcvicker
Copy link
Collaborator

Hi Chris,

That changes and test look great. You are welcome to add an option to
indicate that the input bam is already sorted. Once you are ready to make a
pull request we can accept it.

Thanks a lot for your help!

Graham

On Mon, Aug 3, 2015 at 3:17 PM, Christopher DeBoever <
notifications@github.com> wrote:

I've been able to clean up the code a bit and add a lot of documentation
and some tests (0170a01
0170a01).
I actually looked into this bug and it turns out it's not a bug. The two
reads both overlap the SNP so the three possible read pairs are output. I
added a test for the data I provided initially.

I can make a pull request, but I was also wondering if we could add an
option to specify that the input bam file is already coordinate sorted? I
can add that in before I make the pull request.


Reply to this email directly or view it on GitHub
#18 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants