Multiple entries in remap.fq*.gz for single read pair #18

cdeboever3 · 2015-06-18T19:44:51Z

I've used the mapping part of WASP successfully before, but for some reason I've started seeing what seems to be a bug where the sequence for a single read pair is written to the remap.fq*.gz files multiple times:

$ zcat test.remap.fq1.gz 
@1:chr12:6643710:6643710:3
TCCGATCTGCCGCATCTTCTTTTGCGTCGCCAGCCGAGCCACATCGCTCAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGTATTGG
+1:chr12:6643710:6643710:3
FFB/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
@1:chr12:6643710:6643710:3
TCCGATCTGCCGCATCTTCTTTTGCGTCGCCAGCCGAGCCACATCGCTGAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGTATTGG
+1:chr12:6643710:6643710:3
FFB/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
@1:chr12:6643710:6643710:3
TCCGATCTGCCGCATCTTCTTTTGCGTCGCCAGCCGAGCCACATCGCTGAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGTATTGG
+1:chr12:6643710:6643710:3
FFB/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB

I've made a zip with files to reproduce the bug. I used the latest version of find_intersecting_snps.py from master.

https://dl.dropboxusercontent.com/u/3886457/wasp_bug.zip

cdeboever3 · 2015-06-18T21:14:27Z

Looking at my older runs of WASP, it seems that reads were output multiple times in the fastq files in the past as well. I guess the bug here may be that this read pair doesn't make it through the filtering step even though it aligns to the same spot. I've added files to the zip that show this.

gmcvicker · 2015-06-25T16:14:17Z

Hi Chris, thanks for the bug report. Looking into this now...

gmcvicker · 2015-06-25T16:48:00Z

I think I found the problem and have committed a fix here:
b1e8219

Thanks again for the bug report and let us know if you have any further issues.

cdeboever3 · 2015-06-25T16:50:39Z

Thanks, Graham. It seems the results from the mapping pipeline (e.g., whether a read pair is kept or not) weren't affected by this bug, right?

gmcvicker · 2015-06-25T17:25:56Z

I am not 100% certain, but unfortunately I think that it could have affected which paired-end reads are filtered. I think that some PE reads may have dropped out of the pipeline even though they could have been kept.

The other outstanding issue is that Step #5 (rmdup) does not currently support PE reads. I am working to fix this now.

gmcvicker · 2015-07-27T21:39:26Z

It turns out the 'fix' I made was not correct and has created some issues with the PE reads. I have reverted to the old version and I am working on fixing the original issue (which was minor by comparison).

cdeboever3 · 2015-07-27T21:42:48Z

Sounds good. I was actually looking at the code last week, although so far I've mostly just added comments. I'm hoping to start refactoring a bit tomorrow and adding some unit tests.

cdeboever3 · 2015-08-03T19:17:56Z

I've been able to clean up the code a bit and add a lot of documentation and some tests (0170a01). I actually looked into this bug, and it turns out it's not a bug: both reads of the pair overlap the SNP, so all three alternative read pairs (every combination of alleles except the one originally observed) are output. I added a test for the data I provided initially.
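The counting behind this can be illustrated with a small Python sketch. It assumes a single biallelic SNP overlapped once by each read, both sequences given in reference orientation, and that the originally observed allele combination is not written out again. `flip_allele` and `alternative_pairs` are hypothetical helpers, not WASP's actual code. With two reads and two alleles there are 2 × 2 = 4 combinations; dropping the observed one leaves three pairs.

```python
from itertools import product


def flip_allele(seq, offset, allele):
    """Return seq with the base at 0-based offset replaced by allele."""
    return seq[:offset] + allele + seq[offset + 1:]


def alternative_pairs(read1, read2, off1, off2, ref, alt):
    """Enumerate allele-swapped versions of a read pair for remapping.

    Hypothetical helper: each read overlaps one biallelic SNP (ref/alt)
    at 0-based offsets off1/off2, with both reads in reference orientation.
    Every allele combination is generated, and the one identical to the
    observed pair is skipped.
    """
    observed = (read1, read2)
    pairs = []
    for a1, a2 in product((ref, alt), repeat=2):
        pair = (flip_allele(read1, off1, a1), flip_allele(read2, off2, a2))
        if pair != observed:
            pairs.append(pair)
    return pairs
```

For example, `alternative_pairs("ACGTA", "TTCAA", 2, 2, "G", "C")` returns three pairs. Read 1 carries the unobserved allele in two of them and its observed allele in one, which is consistent with the one `...TCAGACAC...` and two `...TGAGACAC...` sequences in test.remap.fq1.gz above.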

I can make a pull request, but I was also wondering whether we could add an option to specify that the input BAM file is already coordinate-sorted. I can add that before I make the pull request.
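One way such an option could be combined with the BAM header is sketched below in Python. The SAM specification records sort order in the `SO` field of the `@HD` header line. The helper names and the `user_says_sorted` parameter are hypothetical, and this is not how WASP implements the option. The header dict is assumed to come from `pysam.AlignmentFile(path, "rb").header.to_dict()` in recent pysam releases.

```python
def is_coordinate_sorted(header):
    """Return True if a SAM/BAM header dict declares coordinate sort order.

    `header` is assumed to be the dict form of the header, with the @HD
    line under the "HD" key. SO only records what the writing tool
    claimed; it is not verified against the actual records.
    """
    return header.get("HD", {}).get("SO") == "coordinate"


def needs_sorting(header, user_says_sorted=False):
    """Decide whether to sort the input BAM before processing.

    Hypothetical logic: an explicit user option wins; otherwise fall back
    to the @HD SO tag.
    """
    if user_says_sorted:
        return False
    return not is_coordinate_sorted(header)
```

Trusting an explicit user option first keeps the behavior predictable, since the `SO` tag may be missing or stale.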

gmcvicker · 2015-08-03T19:38:10Z

Hi Chris,

The changes and tests look great. You are welcome to add an option to indicate that the input BAM is already sorted. Once you are ready to make a pull request, we can accept it.

Thanks a lot for your help!

Graham

gmcvicker closed this as completed Jun 25, 2015

gmcvicker reopened this Jul 27, 2015

cdeboever3 mentioned this issue Aug 4, 2015

Mapping code refactoring and tests. #25

Merged

gmcvicker closed this as completed in #25 Aug 4, 2015
