
stats from repeated regions behave strangely #8

Closed
snystrom opened this issue Oct 25, 2020 · 8 comments

snystrom commented Oct 25, 2020

There is some odd behavior when running stat with a bed file that contains duplicated regions. The stat is computed correctly for the first occurrence of a region, but every subsequent occurrence is reported as 0. This happens for mean, median, and percentile, but not for hist.

Example bed file (regions.bed):

chr3R   16681897        16724143
chr3R   16681897        16724143
chr3R   16681897        16724143
chr3R   16681897        16724143
chr3R   16681897        16724143

Example of unexpected behavior

This happens with mean, median, and percentile.

d4utils stat test.d4 -r regions.bed

chr3R   16681897        16724143        39.93615963641528
chr3R   16681897        16724143        0
chr3R   16681897        16724143        0
chr3R   16681897        16724143        0
chr3R   16681897        16724143        0

In the above example, I would expect the following output:

chr3R   16681897        16724143        39.93615963641528
chr3R   16681897        16724143        39.93615963641528
chr3R   16681897        16724143        39.93615963641528
chr3R   16681897        16724143        39.93615963641528
chr3R   16681897        16724143        39.93615963641528
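To make the expected semantics concrete, here is a minimal Python sketch (not the d4tools implementation) in which every region is evaluated independently, so a duplicated region necessarily gets the same mean each time. The in-memory `track` dict, the contig name `chrT`, and the function `region_means` are hypothetical stand-ins for a .d4 file and its reader.

```python
from typing import Dict, List, Tuple

# BED-style interval: (chrom, start, end), 0-based and half-open [start, end).
Region = Tuple[str, int, int]


def region_means(track: Dict[str, List[float]], regions: List[Region]) -> List[float]:
    """Return the mean value of each region, computed independently.

    Each region slices its own values out of the track, so nothing read for an
    earlier region (such as a shared cursor) can affect a later one, and
    duplicated regions get identical results.
    """
    means = []
    for chrom, start, end in regions:
        values = track[chrom][start:end]
        # Empty region: report 0.0 (an arbitrary choice for this sketch).
        means.append(sum(values) / len(values) if values else 0.0)
    return means


if __name__ == "__main__":
    # Hypothetical in-memory track standing in for a .d4 file.
    track = {"chrT": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]}
    regions = [("chrT", 2, 6)] * 3  # the same region listed three times
    for (chrom, start, end), m in zip(regions, region_means(track, regions)):
        print(f"{chrom}\t{start}\t{end}\t{m}")
```

With this design, the reported value depends only on the region's own coordinates, which is the behavior the expected output above assumes.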

hist behaves as expected.

# Run w/ single-copy of region
d4utils stat test_bw.d4 -r chr3R:16681897-16724143 -s hist 
<0	0
0	10
1	14
2	0
3	0
4	0
5	0
6	29
# snip

# Run w/ multiple copies of region
d4utils stat test_bw.d4 -r regions.bed -s hist 
<0	0
0	50
1	70
2	0
3	0
4	0
5	0
6	145
# snip
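In the hist output, the counts for five copies are exactly five times the single-copy counts (10 → 50, 14 → 70, 29 → 145), i.e. each duplicated region is counted on its own. A minimal Python sketch of that accumulation follows; `region_hist` and the toy `track` are hypothetical, and the sketch ignores the `<0` row and any overflow row of the real output.

```python
from collections import Counter
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def region_hist(track: Dict[str, List[int]], regions: List[Region]) -> Counter:
    """Count how many bases have each depth, summed over all regions.

    Every base of every listed region contributes one count, duplicates
    included, so N copies of a region give N times the single-copy counts.
    """
    counts: Counter = Counter()
    for chrom, start, end in regions:
        # Counter.update with an iterable adds one count per element.
        counts.update(track[chrom][start:end])
    return counts


if __name__ == "__main__":
    # Hypothetical per-base depths standing in for a .d4 file.
    track = {"chrT": [0, 0, 1, 6, 6, 6]}
    single = region_hist(track, [("chrT", 0, 6)])
    five = region_hist(track, [("chrT", 0, 6)] * 5)
    # Columns: depth, single-copy count, five-copy count.
    for depth in sorted(single):
        print(f"{depth}\t{single[depth]}\t{five[depth]}")
```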
38 closed this as completed Nov 13, 2020
snystrom (Author) commented Nov 25, 2020

Hi,

I think this error persists, with slightly different behavior: now when I run stat I get 0 for every region. Again, the expected value is 39.93615963641528 for all regions. (This also happens with median.)

d4tools stat test.d4 -r regions.bed

chr3R   16681897        16724143        0
chr3R   16681897        16724143        0
chr3R   16681897        16724143        0
chr3R   16681897        16724143        0
chr3R   16681897        16724143        0

I know this output is wrong because when I view the region, all entries are >0, so there's no way the mean can be 0. Also, the file it was created from (in this case, a bigwig) has high signal at this locus. I can provide example files to reproduce this if needed.

snystrom (Author)

I should also mention that running stat on just a single instance of this region also returns 0, so the problem is now worse.

arq5x (Collaborator) commented Nov 29, 2020

Hi @snystrom, would you mind providing a minimal file to reproduce this?

snystrom (Author)

Sure thing. Here's a bigwig file & a bash script to reproduce the error. Let me know if there's anything else you need.

issue-8-example.tar.gz

md5 checksum:
64d11f4f7b0c37e4dfa0c82ea791d940  issue-8-example.tar.gz

38 (Owner) commented Nov 30, 2020

Thanks for reporting this. This is a different bug from the original one, and I have published a new version with a fix for it.

snystrom (Author) commented Nov 30, 2020

Some more details about the d4 file I'm using: in the example I uploaded I start from a bigwig, but I've also built the d4 from a bam file and got the same result (even when setting -q 0 for the bam). So I don't think the input file is the problem, but I'd be happy to be proven wrong.

38 (Owner) commented Nov 30, 2020

Yep, I can confirm this bug; it should be fixed in version 0.1.16.

snystrom (Author)

Just messed around with the new version & can confirm it's fixed! Thanks!
