Bug in scanTabix #8

Open
timoast opened this issue Jul 24, 2019 · 4 comments

Comments

@timoast

timoast commented Jul 24, 2019

Hi,

I came across a bug in scanTabix where no data is returned when requesting regions on double-digit chromosomes (i.e., chr10 and above). This appears to be an issue only on Windows, and only when the tabix file is above a certain size.

Here is a tabix file and index that will reproduce the issue. Apologies for the huge file; I tried downsampling, but the bug only seems to occur with the larger file.

Reproducible example:

library(Rsamtools)
library(GenomicRanges)
library(IRanges)

tbx.file <- "fragments.tsv.gz"
range.chr14 <- GRanges(seqnames = 'chr14', ranges = IRanges(start = 99635624, end = 99737861))
tbx <- TabixFile(file = tbx.file)
scanTabix(file = tbx, param = range.chr14)

This code returns data on macOS or Linux but an empty vector on Windows (I tested on Windows 7 with R 3.6.1 and the current version of Rsamtools).
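To narrow down a report like this, it can help to check whether the sequence name is present in the index and how many records the query returns for it. The following is only a sketch: `diagnose_region` is a hypothetical helper, not part of Rsamtools, and it assumes the tabix file and its index sit next to each other on disk.

```r
# Hypothetical helper (not part of Rsamtools): summarise what the index and
# the query report for a single region.
diagnose_region <- function(tbx.file, seqname, start, end) {
  tbx <- Rsamtools::TabixFile(file = tbx.file)
  region <- GenomicRanges::GRanges(
    seqnames = seqname,
    ranges = IRanges::IRanges(start = start, end = end)
  )
  records <- Rsamtools::scanTabix(file = tbx, param = region)
  list(
    # Is the sequence name listed in the tabix index at all?
    in.index = seqname %in% Rsamtools::seqnamesTabix(tbx),
    # Number of records returned for the requested region
    n.records = lengths(records, use.names = FALSE)
  )
}
```

Calling `diagnose_region("fragments.tsv.gz", "chr14", 99635624, 99737861)` on both Windows and Linux, and repeating it for a region on a single-digit chromosome, would show whether the name is found in the index on both platforms and where the record counts diverge.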

@bschilder

Was this resolved? I'm wondering if perhaps some of the other errors I'm experiencing are related to this (will post those soon).

@mtmorgan
Contributor

I think this is likely an integer overflow on Windows; I wonder whether it occurs under the 64-bit build, especially under R-devel? This seems to be a regression introduced when we moved to using Rhtslib, but that transition is now quite old, and the right thing to do seems to be to update Rhtslib and then Rsamtools. Unfortunately, that is likely to be a moderate-to-big project, so in the short to intermediate term the solution is likely to be to use Linux or macOS, e.g., via the Windows Subsystem for Linux, your local compute cluster, or a cloud provider.
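For readers unfamiliar with this class of bug, here is a generic illustration of a signed 32-bit overflow, written in base R. It is not taken from Rhtslib or HTSlib, and whether this is what actually happens inside scanTabix is unconfirmed. The offset value and the `wrap_int32` helper are made up for the illustration. One relevant platform fact: a C `long` is 32 bits on Windows even in 64-bit builds, whereas it is 64 bits on 64-bit Linux and macOS.

```r
# R's own integer type is a signed 32-bit value, so it has the same ceiling.
int32.max <- .Machine$integer.max   # 2147483647, i.e. 2^31 - 1

# Hypothetical position past the 2 GiB mark (stored as a double, so exact).
offset <- 3 * 2^30

# Simulate what a signed 32-bit slot would hold after two's-complement
# wraparound. Illustrative helper only.
wrap_int32 <- function(x) {
  r <- x %% 2^32
  ifelse(r >= 2^31, r - 2^32, r)
}

offset > int32.max     # TRUE: the value does not fit in 32 bits
wrap_int32(offset)     # -1073741824: the wrapped value is negative
```

A lookup that receives a wrapped, negative value like this could plausibly find nothing, which would fit the observation that only larger files are affected, but that link is a guess.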

@bschilder

bschilder commented Mar 19, 2022

Thanks for the reply @mtmorgan, that's quite understandable.

Along those lines, an intermediate solution might be to use the Bioconductor Docker container, which is Linux-based and includes an RStudio interface. We use this as a base for most of our Docker containers.

@hpages
Contributor

hpages commented Mar 23, 2022

Reminds me of this Rhtslib Windows-specific bug from 2.5 years ago: https://support.bioconductor.org/p/124568/

Yes, Rhtslib still contains HTSlib 1.7, which lags 4 years behind the latest HTSlib (version 1.15). The right thing to do at this point would be to update Rhtslib. Hopefully that Windows-specific Tabix bug is gone in HTSlib 1.15. However, as Martin said, this is a major endeavor. Not before BioC 3.16.

H.
