Data.table crashes R session on rbindlist #2340

peekxc · 2017-09-07T20:04:44Z

A test data set that should reproduce the issue is available on GitHub.

On a clean R session, the following completely crashes R with a memory violation:

    load(file = "test.rdata")
    data.table::rbindlist(test, idcol = "tid")

I'm running R version 3.4.1.

The issue seems to occur when 'idcol' is used and there are empty data.tables in the list to be rbind'ed. The following workaround seems to produce the expected behaviour:

    test2 <- Filter(function(dt) nrow(dt) != 0, test)
    data.table::rbindlist(test2, idcol = "tid")

But it would be great if data.table handled this.


franknarf1 · 2017-09-07T20:10:56Z

The FAQ, vignette("datatable-faq"), says:

Reading data.table from RDS or RData file

*.RDS and *.RData are file types which can store in-memory R objects on disk efficiently. However, storing data.table into the binary file loses its column over-allocation. This isn't a big deal -- your data.table will be copied in memory on the next by reference operation and throw a warning. Therefore it is recommended to call alloc.col() on each data.table loaded with readRDS() or load() calls.

Does rbindlist(lapply(test, alloc.col), idcol = "tid") also crash?

peekxc · 2017-09-07T20:28:04Z

Hmm I did not notice this in the FAQ. That being said, it does still crash the session.

franknarf1 · 2017-09-07T21:04:29Z

Oh, hadn't noticed your example was in a gist. No repro here on R 3.3.3:

    library(data.table)
    # data.table 1.10.5 IN DEVELOPMENT built 2017-09-06 22:02:57 UTC; travis
    load("C:\\Users\\ferickson\\Downloads\\test.rdata")
    rbindlist(test, idcol = "tid")
    # typical results

    sessionInfo()
    # R version 3.3.3 (2017-03-06)
    # Platform: x86_64-w64-mingw32/x64 (64-bit)
    # Running under: Windows 7 x64 (build 7601) Service Pack 1

Btw, I guess you are testing on the devel version. If not, see https://github.com/Rdatatable/data.table/wiki/Support

peekxc · 2017-09-07T21:47:11Z

Interesting that it doesn't seem to reproduce.

I don't think I'm using a dev version?

    sessionInfo()
    # R version 3.4.1 (2017-06-30)
    # Platform: x86_64-apple-darwin15.6.0 (64-bit)
    # Running under: OS X El Capitan 10.11.6
    library("data.table")
    # data.table 1.10.4

arunsrinivasan · 2017-09-20T11:10:30Z

Just got caught with this as well. Here's a simpler example... Run it a couple of times to get a segfault (on Windows with 1.10.4 at the moment):

    require(data.table)
    ll <- list(data.table(x=1, y=2), data.table(), data.table(x=3, y=4))
    dt <- data.table(bla=1:3, ll)

    # run multiple times and you should get a segfault
    dt[, rbindlist(ll, idcol=".id")]
    dt[, rbindlist(ll, idcol=".id")]
    dt[, rbindlist(ll, idcol=".id")]
    dt[, rbindlist(ll, idcol=".id")]

jsams · 2017-09-21T05:23:15Z

I'm not sure this is related only to the use of idcol. I'm getting it with some real world data, but I don't have a replicable use case. My setup reads 1000 data.tables from disk (in 1000 rds files), selects some rows and aggregates, and then rbinds the results.

(Actually, 40 files are read from disk and the selection and aggregation are applied; rbindlist is called on that result, which is stored in a list via lapply. Then rbindlist is called again on the resulting list. This is the result of that process.)

I've verified that all items in the list are of class data.table and have a nonzero number of rows. The list is 30.1 GB as reported by pryr::object_size on a machine with 1.5 TB of RAM.

    uscd = rbindlist(user_song_count_dtlist)

    *** caught segfault ***
    address 0x7f27442b92c4, cause 'memory not mapped'

    Traceback:
    1: rbindlist(user_song_count_dtlist)

(sorry for the delete/repost, wanted to use this account, edit on reading from disk turned out to be inaccurate)

jsams · 2017-09-22T17:10:03Z

Apologies, after attempting some workarounds, I've discovered that a data.table apparently can't have more than MAXINT rows. I didn't realize that was a limitation, and it is probably what was causing my issue.

mattdowle · 2017-11-08T23:28:19Z

I can reproduce using @arunsrinivasan's example with current CRAN (1.10.4-3) on Ubuntu.
It looks fixed in dev though. Bug fix 5 in news for 1.10.5:

Seg fault in rbindlist() when one or more items are empty, #2019. Thanks Michael Lang for the pull request.

Thanks to @mllg's PR #2077 merged in May 2017.

@jsams Given the above, I doubt it's related to MAXINT in your case. Can you confirm dev works fine for you? But MAXINT is a good side issue. The rbindlist.c source accumulates n_rows in type size_t, so that looks correct and won't overflow, but I can't see a check that n_rows < MAXINT. A test could construct a list() with multiple references to the same DT, so that the "big" test runs and checks that it fails gracefully (since the result would be > 2bn rows) without actually needing a lot of RAM. [Update: yes there was a segfault there. Fixed and test added.]
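As a rough illustration of the idea above, here is a minimal sketch of a pre-check that could be run before calling rbindlist on a very large list. It deliberately does not call rbindlist, so it is safe on any version. The helper names `total_rows` and `fits_in_one_table` are hypothetical and not part of data.table, and the sketch assumes that "MAXINT" in this thread corresponds to R's `.Machine$integer.max`. Base data.frames are used only to keep the example dependency-free; the same helpers work on a list of data.tables.

```r
# Hypothetical helper (not part of data.table): total row count of a list of
# tables, accumulated as double so the sum itself cannot overflow an integer.
total_rows <- function(tables) {
  sum(vapply(tables, function(tb) as.numeric(nrow(tb)), numeric(1)))
}

# Hypothetical helper: TRUE when the combined row count fits in an R integer.
fits_in_one_table <- function(tables) {
  total_rows(tables) <= .Machine$integer.max
}

# One modest table referenced many times: the list describes about 3 billion
# rows, but only one copy of the table's data is held in memory.
small <- data.frame(x = integer(1e6))
big_list <- rep(list(small), 3000L)

total_rows(big_list)         # 3e9, above .Machine$integer.max
fits_in_one_table(big_list)  # FALSE, so binding this list should be avoided
```

If `fits_in_one_table()` returns FALSE, one option is to aggregate or split the list before binding rather than relying on rbindlist to reject the input.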


arunsrinivasan added bug segfault labels Sep 20, 2017

arunsrinivasan added this to the v1.10.6 milestone Nov 3, 2017

mattdowle closed this as completed in 18790e0 Nov 9, 2017

mattdowle added a commit that referenced this issue Nov 9, 2017

Tidied format specifier from %lld to %d in rbindlist error message, #…

72e3658

…2340

AmyMikhail mentioned this issue Mar 14, 2018

RStudio and R crashes with fatal error linked to data.table operation #2672

Closed

MichaelChirico mentioned this issue Apr 16, 2018

Creating data.table with long vectors does not produce error at creation #2751

Closed

renkun-ken mentioned this issue Oct 10, 2019

Need long-vector support #3957

Open
