kmerize with hg19 complains of "long vectors not supported yet" #8
Yeah, I need to architecturally overhaul how the storage is managed to work around R's integer limit. This might be similar to #6. At the minute, hg19 can't be kmerized because R has a hard limit on maximum vector length of 2^31 - 1 (about 2.147e+9) elements (see https://stackoverflow.com/a/21142236). As the Stack Overflow answer explains, when you access vectors using bracket notation, the indexes are integers under the covers, but hg19 is over 3 billion base pairs:

> suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg19))
> message(format(2^31-1, big.mark = ","))
2,147,483,647
> message(format(sum(as.numeric(elementNROWS(BSgenome.Hsapiens.UCSC.hg19))), big.mark = ","))
3,137,161,264
> sum(elementNROWS(BSgenome.Hsapiens.UCSC.hg19))
[1] NA
Warning message:
In sum(elementNROWS(BSgenome.Hsapiens.UCSC.hg19)) :
integer overflow - use sum(as.numeric(.))

Thanks for the feedback!
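The overflow in the session above can be reproduced in base R without loading any genome. In this sketch the two chromosome lengths are hypothetical values, chosen only so that their total exceeds 2^31 - 1. It shows that an integer sum returns NA, while the same sum in double precision is exact:

```r
# Largest value R's 32-bit signed integer type can hold: 2^31 - 1
int_max <- .Machine$integer.max

# Hypothetical chromosome lengths stored as integers (illustrative, not hg19)
lens <- c(chrA = 2000000000L, chrB = 1000000000L)

# Summing integers overflows: R returns NA (its warning is suppressed here)
total_int <- suppressWarnings(sum(lens))

# Converting to double first, as the warning suggests, gives the exact total
total_dbl <- sum(as.numeric(lens))

# The hg19 total reported in the session above is also beyond the integer range
hg19_total <- 3137161264
too_long_for_int <- hg19_total > int_max

print(c(int_max = int_max, total_dbl = total_dbl))
print(total_int)
print(too_long_for_int)
```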
Actually, the architectural bug about long vectors was fixed upstream. I did a bit more searching for "long vectors bioconductor" and found that this issue was fixed in R-3.4 devel. However, Bioconductor devel isn't available for my R 3.4.2:

> version$version.string
[1] "R version 3.4.2 (2017-09-28)"
> BiocInstaller::useDevel()
Error: 'devel' version not available

This leaves upgrading to R 3.5:

# In R 3.5:
source("https://bioconductor.org/biocLite.R")
biocLite(c("devtools", "BSgenome.Hsapiens.UCSC.hg19"))
devtools::install_github("coregenomics/kmap", repos = BiocInstaller::biocinstallRepos())
library(kmap)
library(BiocParallel)
mappable_regions <- mappable("hg19", kmer = 50, BPPARAM = SerialParam())

I haven't tested this yet because my lab machine keeps running out of memory, even with a single core. I'm in the process of installing R 3.5 on our university cluster to try kmap there, and will let you know.
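BiocInstaller::useDevel() fails on R 3.4.2, so an install script could check the running R version before going any further. The helper `r_at_least()` below is hypothetical and is not part of kmap or BiocInstaller. It relies only on base R's `numeric_version` comparison:

```r
# Hypothetical helper: is the given R version at least `required`?
# Defaults to the running session's version via getRversion().
r_at_least <- function(required = "3.5.0", current = getRversion()) {
  as.numeric_version(current) >= as.numeric_version(required)
}

# The R 3.4.2 session from this thread would not qualify
print(r_at_least(current = "3.4.2"))

# Guard an install script on the running session
if (!r_at_least("3.5.0")) {
  stop("R >= 3.5.0 is needed to follow the R 3.5 install steps")
}
```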
kmerize():
- slidingWindows() in R 3.5.0 / Bioconductor 3.7 natively produces a GRangesList from GRanges input, so higher-level BPPARAM parallelization is no longer required.

gr_masked():
- Coercion to RangesList is no longer supported or necessary; coerce directly to IRangesList instead.
- Drop the esoteric ir2gr() function in favor of expand_rle(), which has a lower memory footprint.
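Here is a concrete picture of "chopping into k-mers" as kmerize() uses the term. With step 1, a range yields one window per start position. The sketch below uses a hypothetical base-R stand-in, `kmer_windows()`, for slidingWindows() on a single range. It is not the IRanges implementation, and it assumes that only full-width windows are kept. Because every window start and end is materialized, memory grows linearly with the range length:

```r
# Hypothetical base-R stand-in for chopping one range [start, end] into
# k-mers with step 1. Only full-width windows are returned.
kmer_windows <- function(start, end, k) {
  n_windows <- (end - start + 1) - k + 1
  if (n_windows <= 0) {
    # Range shorter than k: no full-width k-mer fits
    return(data.frame(start = integer(0), end = integer(0)))
  }
  starts <- start + seq_len(n_windows) - 1L
  data.frame(start = starts, end = starts + k - 1L)
}

# A 10 bp range chopped into 4-mers gives 10 - 4 + 1 = 7 windows
w <- kmer_windows(1L, 10L, 4L)
print(w)
```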
The fault persists with R 3.5:

INFO [2018-06-11 23:24:38] Removing non-standard DNA bases
INFO [2018-06-11 23:33:29] Chopping into 50-mers
Error in .Call2("Ranges_validate", x_start, x_end, x_width, PACKAGE = "IRanges") :
long vectors not supported yet: memory.c:3486
Calls: mappable ... anyStrings -> isTRUE -> validityMethod -> valid.func -> .Call2

Will post to the bioc-devel mailing list.
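The error appears right after the "Chopping into 50-mers" step. That timing is consistent with the integer limit discussed earlier: with step 1, a sequence of length n yields n - k + 1 k-mers, so the number of 50-mer windows is on the order of the genome length. The hypothetical helper `kmer_count()` below computes that upper bound in double precision. Two simplifying assumptions apply, for illustration only: hg19 is treated as a single 3,137,161,264 bp sequence, and masking of non-standard bases is ignored:

```r
# Upper bound on the number of k-mers (step 1) across sequences of the given
# lengths, computed as doubles so the total cannot overflow R's integer type.
# Assumption: masking of non-standard bases is ignored, so the real count is lower.
kmer_count <- function(seq_lengths, k) {
  n <- as.numeric(seq_lengths)
  sum(pmax(n - k + 1, 0))
}

# Whole-genome total from the earlier session, treated as one sequence
n_50mers <- kmer_count(3137161264, 50)

# Compare against .Machine$integer.max (2^31 - 1)
exceeds_int <- n_50mers > .Machine$integer.max
print(n_50mers)
print(exceeds_int)
```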
Greetings,
I am running the mappable() function on the hg19 genome:
library(kmap)
library(BiocParallel)
# either
mappable.regions <- mappable("hg19", kmer = 50, BPPARAM = MulticoreParam(workers = 1))
# or
mappable.regions <- mappable("hg19", kmer = 50)
In both cases, I eventually get the following error:
INFO [2018-06-01 15:09:12] Removing non-standard DNA bases
INFO [2018-06-01 15:26:38] Chopping into 50-mers
Error in .Call2("valid_Ranges", x_start, x_end, x_width, PACKAGE = "IRanges") :
long vectors not supported yet: memory.c:3451
Any idea what might have caused this error?