rbindlist raises malformed factor #3662

Closed
jangorecki opened this issue Jun 26, 2019 · 3 comments · Fixed by #3894

@jangorecki
Member

jangorecki commented Jun 26, 2019

On recent devel:

rbindlist(list(data.table(a=as.factor(1:2), b=as.factor(2:3)),
          list(a=as.factor(3L), b=as.factor(4:5))))
#Error in as.character.factor(x) : malformed factor

The error message could be improved.

On 1.12.2 it segfaults.

jangorecki added this to the 1.12.4 milestone Jun 26, 2019
@MichaelChirico
Member

MichaelChirico commented Jul 7, 2019

Actually this still segfaults for me:

library(data.table)
# data.table 1.12.3 IN DEVELOPMENT built 2019-07-07 12:06:15 UTC using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
rbindlist(list(data.table(a=as.factor(1:2), b=as.factor(2:3)), list(a=as.factor(3L), b=as.factor(4:5))))
 *** caught segfault ***
address 0x0, cause 'memory not mapped'

Traceback:
 1: rbindlist(list(data.table(a = as.factor(1:2), b = as.factor(2:3)),     list(a = as.factor(3L), b = as.factor(4:5))))
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.0

@MichaelChirico
Member

MichaelChirico commented Sep 15, 2019

The segfault is caused here:

data.table/src/rbindlist.c

Lines 444 to 453 in 0ba9144

for (int k=0; k<n; ++k) {
  SEXP s = thisColStrD[k];
  if (s!=NA_STRING && -TRUELENGTH(s)!=k+1) { nohop=false; break; }
}
if (nohop) memcpy(targetd+ansloc, INTEGER(thisCol), thisnrow*SIZEOF(thisCol));
else {
  int *id = INTEGER(thisCol);
  for (int r=0; r<thisnrow; r++)
    targetd[ansloc+r] = id[r]==NA_INTEGER ? NA_INTEGER : -TRUELENGTH(thisColStrD[id[r]-1]);
}

I don't 100% follow the logic of what's going on here, but I believe the break is meant to break out of a different loop than it's breaking (or there's meant to be two breaks).

If I change the memcpy section to also break, we still get the malformed factor error, but no segfault:

if (nohop) {
  memcpy(targetd+ansloc, INTEGER(thisCol), thisnrow*SIZEOF(thisCol));
  break; // <- new break
}

Edit: the explanation above is a red herring, but that line is indeed where the segfault is triggered.
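
For readers who, like the comment above, do not fully follow the quoted snippet: it appears to choose between a bulk copy and a per-row remap of factor codes, depending on whether this column's levels already sit at the same positions in the combined level set. The following is a minimal standalone sketch of that idea in plain C, not data.table's code. The names `levels_already_aligned`, `append_codes` and `level_pos` are hypothetical, a plain `int` array stands in for the `-TRUELENGTH(...)` lookups, `INT_MIN` stands in for `NA_INTEGER`, and NA levels are ignored for brevity.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NA_CODE INT_MIN  /* stand-in for R's NA_INTEGER */

/* level_pos[k] is the 1-based position of this column's k-th level
   in the combined level set (hypothetical representation). */
static bool levels_already_aligned(const int *level_pos, size_t nlevels)
{
    for (size_t k = 0; k < nlevels; k++)
        if (level_pos[k] != (int)k + 1) return false;  /* a "hop" is needed */
    return true;
}

/* Append `len` factor codes to `target`, remapping them if required. */
static void append_codes(int *target, const int *codes, size_t len,
                         const int *level_pos, size_t nlevels)
{
    assert(target != NULL && codes != NULL && level_pos != NULL);
    if (levels_already_aligned(level_pos, nlevels)) {
        /* Fast path: codes are already valid in the combined level set. */
        memcpy(target, codes, len * sizeof *codes);
    } else {
        /* Slow path: translate each code through the position table. */
        for (size_t r = 0; r < len; r++)
            target[r] = (codes[r] == NA_CODE) ? NA_CODE
                                              : level_pos[codes[r] - 1];
    }
}
```

Both paths read `len` elements from `codes`, so either one reads past the end of the buffer if the source column is shorter than the number of rows being written.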

@mattdowle
Member

mattdowle commented Sep 18, 2019

Recycling is missing in that section. Needed to recycle the length-1 as.factor(3L) in this example. Will fix ...
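
As an illustration of what recycling means for this case, here is a minimal standalone sketch in plain C. It is not the actual fix that landed in data.table. The helper `remap_with_recycling` and the `level_pos` table are hypothetical, and `INT_MIN` stands in for `NA_INTEGER`. The idea is that a length-1 source column is read at index 0 for every output row instead of being indexed by row number.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NA_CODE INT_MIN  /* stand-in for R's NA_INTEGER */

/* Hypothetical helper: write `thisnrow` remapped factor codes into
   target[ansloc ...], recycling the source when it has length 1.
   level_pos[c-1] is the position of source level c in the combined levels. */
static void remap_with_recycling(int *target, size_t ansloc,
                                 const int *codes, size_t ncodes,
                                 size_t thisnrow, const int *level_pos)
{
    /* Assumption of this sketch: only length-1 or full-length columns. */
    assert(ncodes == 1 || ncodes == thisnrow);
    for (size_t r = 0; r < thisnrow; r++) {
        int code = codes[ncodes == 1 ? 0 : r];  /* recycle length-1 input */
        target[ansloc + r] = (code == NA_CODE) ? NA_CODE
                                               : level_pos[code - 1];
    }
}
```

In the example from this issue, `a=as.factor(3L)` has one element while `b=as.factor(4:5)` has two, so the item contributes two rows and the single code for `a` has to be written twice.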
