unique gives segfault from C stack overflow #4300
Comments
Would you mind calling setDT(data) before the unique call and reporting the results back here? Just to check whether it's due to the null self pointer.
I tried calling
Oops, having your data would be necessary then.
Please find the data set here
Thanks for the data, I can reproduce this. UPDATE: the error is thrown from Line 736 in b1b1832
OK, I created a smaller data set (23MB) that can reproduce this. The cause, I believe, is the recursive function. I don't know why this happens for this specific dataset, as I can't reproduce it with randomly generated data.
Code
library(data.table)
dt <- readRDS('~/Downloads/data_debug_tan.rds')
dt2 <- data.table:::duplicated.data.table(dt)
Data
Debugged message
It's reproducible with the following setting:
As @shrektan pointed out it's
Example
x = matrix(rnorm(5000), nrow=10)
idx = sample(10, 20, TRUE)
DT = as.data.table(x[idx,])
forderv(DT, by=names(DT), sort=FALSE, retGrp=TRUE)
# Error: segfault from C stack overflow
Calling it a second time also changes the error to
x = matrix(rnorm(5000), nrow=10)
idx = sample(10, 20, TRUE)
DT = as.data.table(x[idx,])
forderv(DT, by=names(DT), sort=FALSE, retGrp=TRUE)
# Error in colnamesInt(x, by, check_dups = FALSE) :
# Internal error: savetl_init checks failed (0 100 0x556d9d49d540 0x556d9e05f430). please report to data.table issue tracker.
Error seems to have been introduced by 05c0d45, no segfault at 092fec3
edit:
I think I'm running into this with a data.table that is 18k columns, but I'm not getting any errors in R, it just crashes. If a fix is on the roadmap, is there a recommended way to de-duplicate wide data.tables in the meantime?
You should at least get the
AFAIA nobody is working on this fix yet. If the method which is failing is
edit: Another way to avoid this is to enlarge your stack size. Under Linux you can do this with
Cstack_info()["size"]
#> size
#> NA
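For the question above about de-duplicating wide tables in the meantime, one possible stopgap is sketched below. It is an assumption, not something proposed by the maintainers in this thread. The idea is to avoid ordering on every column by collapsing each row into a single string key and calling base `duplicated()` on that key. `unique_by_row_key` is a hypothetical helper name, not part of data.table. Two caveats: `paste()` formats doubles with up to 15 significant digits, so values that differ only beyond that precision are treated as equal, and a missing value collapses to the same text as the literal string "NA".

```r
library(data.table)

# Hypothetical helper (not part of data.table): drop duplicate rows by
# collapsing each row into one string key, so that no multi-column
# ordering is needed.
unique_by_row_key <- function(DT, sep = "\r") {
  # unname() keeps column names such as "sep" from being matched as
  # arguments of paste().
  row_key <- do.call(paste, c(unname(as.list(DT)), sep = sep))
  keep_rows <- !duplicated(row_key)  # base R duplicated() on a character vector
  DT[keep_rows]
}

# Same shape as the reproducer in this thread: 20 rows drawn from
# 10 distinct rows, 500 columns.
set.seed(1)
x   <- matrix(rnorm(5000), nrow = 10)
idx <- sample(10, 20, TRUE)
DT  <- as.data.table(x[idx, ])

res <- unique_by_row_key(DT)
nrow(res) == length(unique(idx))
```

The key vector costs one string per row, so memory use grows with the number of columns. On a very wide table it may be worth building the key from a subset of columns known to identify a row.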
# Minimal reproducible example
crashes RStudio, or in the R console:
I can provide data_debug.rds (~300MB) if desired.
# Output of sessionInfo()
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.8
loaded via a namespace (and not attached):
[1] compiler_3.6.2