Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: off-by-one bug on bam reads #75

Open
andypexai opened this issue Mar 3, 2024 · 1 comment
Open

Bug: off-by-one bug on bam reads #75

andypexai opened this issue Mar 3, 2024 · 1 comment

Comments

@andypexai
Copy link

I'll make a genome of 30 bases, and my BAM has one read, 21 bases in length, that spans from bases 5-25 (1-based start) or 4-25 (0-based start):

    ██████FAKE_READ██████
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
         1         2         3
123456789012345678901234567890

It has this as a BAM:

$ samtools view -h test.bam
@HD     VN:1.5  SO:coordinate
@SQ     SN:chr1 LN:30
@PG     ID:samtools     PN:samtools     VN:1.19.2       CL:samtools view -h test.bam
FAKE_READ       0       chr1    5       1       21M     *       0       0       *       *       AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:21 YT:Z:UU

It should be 4 bases depth 0, followed by 21 bases depth 1, followed by 5 bases at depth 0. See the samtools output below:

$ samtools depth -a test.bam
chr1    1       0
chr1    2       0
chr1    3       0
chr1    4       0
chr1    5       1
chr1    6       1
chr1    7       1
chr1    8       1
chr1    9       1
chr1    10      1
chr1    11      1
chr1    12      1
chr1    13      1
chr1    14      1
chr1    15      1
chr1    16      1
chr1    17      1
chr1    18      1
chr1    19      1
chr1    20      1
chr1    21      1
chr1    22      1
chr1    23      1
chr1    24      1
chr1    25      1
chr1    26      0
chr1    27      0
chr1    28      0
chr1    29      0
chr1    30      0

And just to provide a little more info, here are a couple bedtools operations:

$ bedtools bamtobed -i test.bam
chr1    4       25      FAKE_READ       1       +
$ printf "chr1\t0\t30\n" > chr1.bed
$ bedtools coverage -a test.bam -b chr1.bed    
chr1    4       25      FAKE_READ       1       +       4       25      0,0,0   1       21,     0,      1       21      21      1.0000000

Everything looks as expected to me. Now I'll make the d4 file:

$ d4tools create -q=1 test.bam test.d4
$ d4tools view test.d4
chr1    0       4       0
chr1    4       26      1
chr1    26      30      0

Here it appears that the d4tools believes my read is 22 bases long, so the end coordinate is off by one, more specifically it's +1 more than it should be.

@arq5x
Copy link
Collaborator

arq5x commented Mar 5, 2024

@38 this does look like a bug. Do you possibly have time to have a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants