memory allocation problems: "'Realloc' could not re-allocate memory" #2777

Closed
jsams opened this issue Apr 20, 2018 · 8 comments

@jsams commented Apr 20, 2018

I have a data.table with a large number of rows, bumping up against the MAXINT row limit (2,147,483,647). Reducing the table to one row per key tries to allocate far too much memory.

example

> affils = readRDS(file=sprintf(ALL_DATA, 'affils_all.rds'))
> setDT(affils)
> sapply(affils, class)
$fan_id
[1] "integer"

$contact_id
[1] "integer"

$is_first
[1] "integer"

$is_second
[1] "integer"

$created_at
[1] "POSIXct" "POSIXt" 

> nrow(affils)
[1] 2127968526
> key(affils)
[1] "fan_id"     "contact_id"
> affils = affils[, .(is_first=max(is_first), is_second=max(is_second),
+                     created_at=min(created_at)),
+                 keyby=.(fan_id, contact_id)]
Error in uniqlist(byval) : 
  'Realloc' could not re-allocate memory (18446744065119617024 bytes)

Enter a frame number, or 0 to exit   

1: affils[, .(is_first = max(is_first), is_second = max(is_second), created_at = min(created_at)), keyby = .(fan_id, contact_id)]
2: `[.data.table`(affils, , .(is_first = max(is_first), is_second = max(is_second), created_at = min(created_at)), keyby = .(fan_id, contact_id))
3: uniqlist(byval)

I really don't think it should be necessary to allocate that much memory for this operation. There are tens of millions of unique users and tens of millions of unique contacts. It is probable that there are in fact no duplicate values in that table; I was really just running this as a sanity check.

Potentially related, I think I am seeing memory count overflows (i.e. attempts to allocate a negative amount of memory) in rbind and/or forderv. Unfortunately I don't have the output, as I had to kill the screen window those R sessions were in. Basically, I had tables structurally similar to the one above but with far fewer rows; I was rbind'ing them and then running the same unique operation as above. I had checked that the total number of rows was less than MAXINT. I do have part of the error from my search history:

failed to realloc working memory stack data.table

None of this should be constrained by the machine's memory: the process was only using about 10-15% of the total available RAM.

sessionInfo

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/libf77blas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-12       lubridate_1.7.1     fasttime_1.0-2      data.table_1.10.4-3

loaded via a namespace (and not attached):
[1] compiler_3.4.2  magrittr_1.5    tools_3.4.2     Rcpp_0.12.14    stringi_1.1.6   grid_3.4.2      stringr_1.2.0   lattice_0.20-35

@mattdowle commented Apr 20, 2018

Thanks for the good report. Agreed: it isn't really trying to request that amount of RAM (16 EB). It seems like a type cast problem at C level.

I can see a potential cast problem in that call to Realloc() in uniqlist.c. Before I attempt to fix it, please confirm you see the same problem with the dev version: install dev. I'm going to have to rely on you to confirm it's fixed, as reproducing this locally or adding it to the test suite is going to be tricky at this size. If it is a cast problem, I'm surprised it got past the compiler; we aren't doing any type punning in this file as far as I can see. Oh ... it's just an overflow to negative followed by a cast to size_t, which is why the compiler was ok with it. Should be quick to fix, but please first confirm dev fails (it should).
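The number in the error message fits that diagnosis exactly: 2^64 - 18446744065119617024 = 8589934592 = 2^33 = 2^31 × 4, i.e. it is what a count of INT_MIN (-2^31) ints times sizeof(int) = 4 becomes once converted to a 64-bit size_t. The C sketch below reproduces the figure. It assumes (not confirmed from data.table's source) that the length handed to Realloc had already wrapped to INT_MIN, and that the byte count is formed as n * sizeof(type), the pattern used by R's Realloc(p, n, type) macro. The helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte count for n ints, written in the usual
   n * sizeof(type) style. sizeof yields a size_t, so a negative n is
   converted to size_t (wrapping modulo 2^64 on 64-bit platforms)
   before the multiply. */
size_t bytes_for_ints(int n) {
    return n * sizeof(int);
}

/* The same request in signed 64-bit arithmetic: a negative byte count. */
int64_t signed_bytes_for_ints(int n) {
    return (int64_t)n * (int64_t)sizeof(int);
}

/* Hypothetical guarded variant: reject a negative count instead of letting
   it become an enormous unsigned size. Returns 0 for invalid input. */
size_t checked_bytes_for_ints(int n) {
    if (n < 0) return 0;
    return (size_t)n * sizeof(int);
}

/* Usage check; the figures assume a 64-bit size_t. */
void check_example(void) {
    if (sizeof(size_t) != 8) return;
    assert(signed_bytes_for_ints(INT_MIN) == -8589934592LL);      /* -2^33 */
    assert(bytes_for_ints(INT_MIN) == 18446744065119617024ULL);  /* 2^64 - 2^33 */
    assert(checked_bytes_for_ints(INT_MIN) == 0);
}
```

Signed overflow itself is undefined behaviour in C, so the sketch starts from INT_MIN rather than overflowing an int; under the wrapping behaviour typical of two's-complement builds, an int doubled from 2^30 lands exactly on INT_MIN. The guarded variant shows the usual remedy: reject the negative count, or keep lengths in size_t or a 64-bit type, before it reaches the allocator.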

@mattdowle mattdowle added this to the v1.10.6 milestone Apr 20, 2018
@mattdowle mattdowle added the bug label Apr 20, 2018
mattdowle added a commit that referenced this issue Apr 21, 2018

@mattdowle commented Apr 21, 2018

@jsams Fix now merged to master. Please test and confirm it works now ok. I'll wait to hear from you before closing.


@jsams commented Apr 23, 2018

Thanks Matt! I have some stuff running that I need to wait on first. I'm not sure how to safely have the -dev and stable versions installed at the same time, and I don't want those other processes to run off -dev.


@jangorecki commented Apr 23, 2018

@jsams use R CMD INSTALL --library=LIB or install.packages("data.table", lib=path) with any path you like. R has great built-in support for running different versions of a package. The only limitation is that a single R process cannot load two different versions of a package. Nothing stops you from running two R processes: the default one uses library(data.table) and the new one library(data.table, lib.loc=path). The canonical information is in a short paragraph of the R Installation and Administration manual.


@mattdowle commented Apr 23, 2018

@jangorecki It's more that he's telling us something else is running, possibly critical. In those circumstances I would leave the running process be and not risk anything on that box until it finishes. The test we'd like him to run is large, so it could, for example, exhaust RAM and attract the OOM killer to the critical process.
(If anyone watching is wondering why I'm assuming jsams is male, I'm not assuming, we've liaised before and it's James Sams, iirc.)


@jsams commented Apr 25, 2018

Looks like success on "data.table 1.10.5 IN DEVELOPMENT built 2018-04-23 00:47:20 UTC; travis". Again, thanks for the quick fix Matt!

Unfortunately, I deleted uncommitted code, so I wasn't able to replicate the other crash, but I believe there was also an overflow around here. Between the error message and the "*4bytes", I think that's what I saw. It didn't seem like this fix applied to that case, but I can't be sure.

Of course, that's not much to go on; so, if you want to ignore it until it pops up again, I understand.


@mattdowle commented Apr 28, 2018

@jsams Thanks for confirming. And thanks for the new clue. That is indeed enough to go on and I see the problem in forder.c you highlighted. Will fix ...
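The general remedy for this class of bug is to do size arithmetic in size_t (or another 64-bit type) and check for overflow before calling realloc. The helper below is a hypothetical sketch of that pattern; grow_int_buffer and its max_elems cap are illustrative names, not data.table's API or its actual fix for forder.c. It doubles a capacity counted in ints, clamps it to a caller-supplied limit such as INT_MAX, and refuses to proceed rather than wrap.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growth helper for an int buffer.
   Doubles *cap (starting at 1024 if empty), clamps the result to max_elems,
   and checks each step for overflow before calling realloc.
   Returns 0 on success, -1 if the buffer cannot grow without overflowing or
   exceeding max_elems, -2 if realloc fails. On failure *buf and *cap are left
   unchanged (realloc does not free the old block when it returns NULL). */
int grow_int_buffer(int **buf, size_t *cap, size_t max_elems) {
    size_t new_cap;
    if (*cap == 0) {
        new_cap = 1024;
    } else if (*cap > SIZE_MAX / 2) {
        return -1;                                   /* doubling would wrap size_t */
    } else {
        new_cap = *cap * 2;
    }
    if (new_cap > max_elems) new_cap = max_elems;    /* e.g. INT_MAX rows */
    if (new_cap <= *cap) return -1;                  /* already at the cap */
    if (new_cap > SIZE_MAX / sizeof(int)) return -1; /* byte count would wrap */

    int *p = realloc(*buf, new_cap * sizeof(int));
    if (p == NULL) return -2;
    *buf = p;
    *cap = new_cap;
    return 0;
}
```

Returning an error instead of wrapping means a request that really is too large surfaces as a clear failure rather than as an impossible byte count like the one in this report.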

@mattdowle mattdowle removed this from the v1.11.0 milestone Apr 29, 2018
@mattdowle mattdowle added this to the v1.11.2 milestone Apr 29, 2018
@mattdowle mattdowle removed this from the v1.11.4 milestone May 1, 2018
@mattdowle mattdowle added this to the 1.11.2 milestone May 1, 2018

@mattdowle commented May 7, 2018

We should really have created a follow-up issue since the original one is fixed. Anyway, postponing to 1.11.4 because 1.11.2 is a quick follow-up release to CRAN and pressing.
