Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use proper NA for integer64 on assign #3724

Merged
merged 7 commits into from
Aug 24, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -230,6 +230,8 @@

28. From v1.12.0 `between()` and `%between%` interpret missing values in `lower=` or `upper=` as unlimited bounds. A new parameter `NAbounds` has been added to achieve the old behaviour of returning `NA`, [#3522](https://github.com/Rdatatable/data.table/issues/3522). Thanks @cguill95 for reporting. This is now consistent for character input, [#3667](https://github.com/Rdatatable/data.table/issues/3667) (thanks @AnonymousBoba), and class `nanotime` is now supported too.

29. `integer64` defined on a subset of a new column would leave "gibberish" on the remaining rows, [#3723](https://github.com/Rdatatable/data.table/issues/3723). A bug in `rbindlist` with the same root cause was also fixed, [#1459](https://github.com/Rdatatable/data.table/issues/1459). Thanks @shrektan and @jangorecki for the reports.

#### NOTES

1. `rbindlist`'s `use.names="check"` now emits its message for automatic column names (`"V[0-9]+"`) too, [#3484](https://github.com/Rdatatable/data.table/pull/3484). See news item 5 of v1.12.2 below.
Expand Down
8 changes: 8 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -15730,6 +15730,14 @@ test(2082.02, between(as.double(1:3), c(2,3,4), 2), error="Item 2 of lower [(]3.
if (test_bit64) test(2082.03, between(as.integer64(1:3), as.integer64(2), as.integer64(1)), error="Item 1 of lower (2) is greater than item 1 of upper (1)")
test(2082.04, between(letters[1:2], c("foo","bar"), c("bar")), error="Item 1 of lower ('foo') is greater than item 1 of upper ('bar')")

# partial instantiation of integer64 column was creating NA_REAL, not INT64_MIN
if (test_bit64) {
# sub-assign from #3723
test(2083.1, data.table(x=1:2)[1, y := as.integer64(0L)]$y, as.integer64(c(0L, NA)))
# rbindlist(fill=TRUE) from #1459
test(2083.2, rbind(data.table(a=1:2, b=as.integer64(c(1,NA))), data.table(a=3L), fill=TRUE)$b, as.integer64(c(1, NA, NA)))
}


###################################
# Add new tests above this line #
Expand Down
14 changes: 12 additions & 2 deletions src/assign.c
Original file line number Diff line number Diff line change
Expand Up @@ -489,7 +489,7 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values)
SEXP RHS;

if (coln+1 > oldncol) { // new column
SET_VECTOR_ELT(dt, coln, newcol=allocNAVector(TYPEOF(thisvalue), nrow));
SET_VECTOR_ELT(dt, coln, newcol=allocNAVectorLike(thisvalue, nrow));
// initialize with NAs for when 'rows' is a subset and it doesn't touch
// do not try to save the time to NA fill (contiguous branch free assign anyway) since being
// sure all items will be written to (isNull(rows), length(rows), vlen<1, targetlen) is not worth the risk.
Expand Down Expand Up @@ -1003,7 +1003,7 @@ void writeNA(SEXP v, const int from, const int n)
for (int i=from; i<=to; ++i) vd[i] = NA_INTEGER;
} break;
case REALSXP : {
if (INHERITS(v, char_integer64)) {
if (Rinherits(v, char_integer64)) { // Rinherits covers nanotime too which inherits from integer64 via S4 extends
int64_t *vd = (int64_t *)REAL(v);
for (int i=from; i<=to; ++i) vd[i] = INT64_MIN;
} else {
Expand Down Expand Up @@ -1041,6 +1041,16 @@ SEXP allocNAVector(SEXPTYPE type, R_len_t n)
return(v);
}

SEXP allocNAVectorLike(SEXP x, R_len_t n) {
mattdowle marked this conversation as resolved.
Show resolved Hide resolved
// writeNA needs the attribute retained to write NA_INTEGER64, #3723
// TODO: remove allocNAVector above when usage in fastmean.c, fcast.c and fmelt.c can be adjusted; see comments in PR3724
SEXP v = PROTECT(allocVector(TYPEOF(x), n));
copyMostAttrib(x, v);
writeNA(v, 0, n);
UNPROTECT(1);
return(v);
}

static SEXP *saveds=NULL;
static R_len_t *savedtl=NULL, nalloc=0, nsaved=0;

Expand Down
1 change: 1 addition & 0 deletions src/data.table.h
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,7 @@ SEXP SelfRefSymbol;

// assign.c
SEXP allocNAVector(SEXPTYPE type, R_len_t n);
SEXP allocNAVectorLike(SEXP x, R_len_t n);
void writeNA(SEXP v, const int from, const int n);
void savetl_init(), savetl(SEXP s), savetl_end();
int checkOverAlloc(SEXP x);
Expand Down
4 changes: 2 additions & 2 deletions src/dogroups.c
Original file line number Diff line number Diff line change
Expand Up @@ -305,13 +305,13 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
if (isNull(target)) {
// first time adding to new column
if (TRUELENGTH(dt) < INTEGER(lhs)[j]) error("Internal error: Trying to add new column by reference but tl is full; alloc.col should have run first at R level before getting to this point in dogroups"); // # nocov
tmp = PROTECT(allocNAVector(TYPEOF(RHS), LENGTH(VECTOR_ELT(dt,0))));
tmp = PROTECT(allocNAVectorLike(RHS, LENGTH(VECTOR_ELT(dt,0))));
// increment length only if the allocation passes, #1676
SETLENGTH(dtnames, LENGTH(dtnames)+1);
SETLENGTH(dt, LENGTH(dt)+1);
SET_VECTOR_ELT(dt, INTEGER(lhs)[j]-1, tmp);
UNPROTECT(1);
// Even if we could know reliably to switch from allocNAVector to allocVector for slight speedup, user code could still contain a switched halt, and in that case we'd want the groups not yet done to have NA rather than uninitialized or 0.
// Even if we could know reliably to switch from allocNAVectorLike to allocVector for slight speedup, user code could still contain a switched halt, and in that case we'd want the groups not yet done to have NA rather than uninitialized or 0.
// dtnames = getAttrib(dt, R_NamesSymbol); // commented this here and added it on the beginning to fix #4990
SET_STRING_ELT(dtnames, INTEGER(lhs)[j]-1, STRING_ELT(newnames, INTEGER(lhs)[j]-origncol-1));
target = VECTOR_ELT(dt,INTEGER(lhs)[j]-1);
Expand Down