Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coerce 01 #3909

Merged
merged 43 commits into from
Sep 28, 2019
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
415689b
interim
mattdowle Sep 25, 2019
1b07fff
interim
mattdowle Sep 25, 2019
17d82df
interim
mattdowle Sep 25, 2019
6350d5a
interim
mattdowle Sep 25, 2019
92056d1
interim
mattdowle Sep 25, 2019
16e65e1
interim
mattdowle Sep 25, 2019
ce317ea
interim
mattdowle Sep 25, 2019
86a4c9a
interim
mattdowle Sep 26, 2019
d0b7ba7
coerce to logical warnings working again
mattdowle Sep 26, 2019
45f66cc
interim
mattdowle Sep 26, 2019
7a569ed
interim
mattdowle Sep 26, 2019
fc72fb6
interim
mattdowle Sep 26, 2019
ef1c6fd
interim
mattdowle Sep 26, 2019
78733d9
interim; passing
mattdowle Sep 26, 2019
b46e173
interm; passing
mattdowle Sep 26, 2019
8c3e064
tidy
mattdowle Sep 26, 2019
8497d02
extra test; passes all revdeps
mattdowle Sep 26, 2019
6a2519d
news item tweaked
mattdowle Sep 26, 2019
f5779c0
allNA coverage via :=
mattdowle Sep 26, 2019
328cefd
allNA used in bmerge, and covered
mattdowle Sep 26, 2019
94ffa0e
allNA empty tests
mattdowle Sep 26, 2019
2374b32
coverage
mattdowle Sep 26, 2019
60b3757
tidy
mattdowle Sep 26, 2019
510da29
don't create NA level when assigning NA_character_ to factor
mattdowle Sep 26, 2019
2be417d
rbindlist malformed NA factor level, #3915
mattdowle Sep 26, 2019
4d84982
coverage
mattdowle Sep 26, 2019
4ec649d
news item extra nx= fixed
mattdowle Sep 26, 2019
c0bad1a
snprintf
mattdowle Sep 26, 2019
b83d133
memcpy
mattdowle Sep 26, 2019
2525f9b
coverage
mattdowle Sep 27, 2019
a415095
NEWS improvements
jangorecki Sep 27, 2019
ba4b0cb
more verbose
jangorecki Sep 27, 2019
c8ce5a7
allNA now does not return false by default
jangorecki Sep 27, 2019
7cad29b
setnames news item
mattdowle Sep 27, 2019
9f78001
interim
mattdowle Sep 27, 2019
0e9e13f
saved coerce from NA to NA_character_
mattdowle Sep 27, 2019
8d871cc
assigning factor numbers works again
mattdowle Sep 27, 2019
6b521e1
type complex cases and tests
mattdowle Sep 27, 2019
475414a
added verbose message when coerce, more cases and tests
mattdowle Sep 28, 2019
47ad56e
more integer64 cases (expecting some lack of coverage to cover next),…
mattdowle Sep 28, 2019
598752a
any->all in comments
MichaelChirico Sep 28, 2019
5b799ea
Added CHECK_RANGE macro
mattdowle Sep 28, 2019
7f34680
code formatting only
mattdowle Sep 28, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 9 additions & 7 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -193,16 +193,16 @@
set.seed(108)
x = rnorm(1e6); n = 1e3
rollfun = function(x, n, FUN) {
ans = rep(NA_real_, nx<-length(x))
nx = nx = length(x)
mattdowle marked this conversation as resolved.
Show resolved Hide resolved
ans = rep(NA_real_, nx)
for (i in n:nx) ans[i] = FUN(x[(i-n+1):i])
ans
}
system.time(ans1<-rollfun(x, n, mean))
system.time(ans2<-zoo::rollapplyr(x, n, function(x) mean(x), fill=NA))
system.time(ans3<-zoo::rollmeanr(x, n, fill=NA))
system.time(ans4<-frollapply(x, n, mean))
system.time(ans5<-frollmean(x, n))
sapply(list(ans2,ans3,ans4,ans5), all.equal, ans1)
system.time(rollfun(x, n, mean))
system.time(zoo::rollapplyr(x, n, function(x) mean(x), fill=NA))
system.time(zoo::rollmeanr(x, n, fill=NA))
system.time(frollapply(x, n, mean))
system.time(frollmean(x, n))

### fun mean sum median
# base rollfun 8.815 5.151 60.175
Expand All @@ -224,6 +224,8 @@
# [1] "A" "b" "c"
mattdowle marked this conversation as resolved.
Show resolved Hide resolved
```

29. `DT[<condition>, integerColumn:=0]` and `set(DT,i,j,0)` no longer warns about the `0` being the wrong type (`numeric` instead of `integer`). The long warning was focussed on efficiency to help the user avoid the necessary coercion from `0` to `0L`. Although the time and space for this coercion in a single call is unmeasurably small, when placed in a loop the small overhead of any allocation on R's heap could start to become noticeable (more so for `set()` whose purpose it be low-overhead for looping). The coercion warning and its advice could be too much for new users ("what and why L?"), but advanced users appreciate the warning. Further, when assigning a value across many columns, it could be inconvenient to supply the correct type (where the columns vary in type) and frustating if a simple `0` would be clear. To balance these requirements from different types of users, the coercion is now avoided by doing the coercion ourselves in internal code without allocating an object on R's heap: an internal optimization that we will now refer to as a "punned-coercion". As there is no time or space penalty for a punned-coerce, advanced users don't need to worry about it either. Where a punned-coerce discards fractional data (e.g. 3.14 assigned to an integer column) there will still be a warning. Punned-coerce applies to length>1 vectors as well as length-1 vectors including recycling, in all cases of `:=` and `set()`.

## BUG FIXES

1. `first`, `last`, `head` and `tail` by group no longer error in some cases, [#2030](https://github.com/Rdatatable/data.table/issues/2030) [#3462](https://github.com/Rdatatable/data.table/issues/3462). Thanks to @franknarf1 for reporting.
Expand Down
10 changes: 5 additions & 5 deletions R/test.data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -282,9 +282,9 @@ test = function(num,x,y=TRUE,error=NULL,warning=NULL,output=NULL,notOutput=NULL,
if (!missing(error) && !missing(y))
stop("Test ",numStr," is invalid: when error= is provided it does not make sense to pass y as well") # nocov

string_match = function(x, y) {
length(grep(x,y,fixed=TRUE)) || # try treating x as literal first; useful for most messages containing ()[]+ characters
length(tryCatch(grep(x,y), error=function(e)NULL)) # otherwise try x as regexp
string_match = function(x, y, ignore.case=FALSE) {
length(grep(x, y, fixed=TRUE)) || # try treating x as literal first; useful for most messages containing ()[]+ characters
length(tryCatch(grep(x, y, ignore.case=ignore.case), error=function(e)NULL)) # otherwise try x as regexp
}

xsub = substitute(x)
Expand Down Expand Up @@ -364,10 +364,10 @@ test = function(num,x,y=TRUE,error=NULL,warning=NULL,output=NULL,notOutput=NULL,
fail = TRUE
# nocov end
}
if (length(notOutput) && string_match(notOutput, out)) {
if (length(notOutput) && string_match(notOutput, out, ignore.case=TRUE)) {
# nocov start
cat("Test",numStr,"produced output but should not have:\n")
cat("Expected absent: <<",gsub("\n","\\\\n",notOutput),">>\n",sep="")
cat("Expected absent (case insensitive): <<",gsub("\n","\\\\n",notOutput),">>\n",sep="")
cat("Observed: <<",gsub("\n","\\\\n",out),">>\n",sep="")
fail = TRUE
# nocov end
Expand Down
71 changes: 37 additions & 34 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -869,17 +869,17 @@ test(299.03, truelength(DT)>length(DT)) # the := over-allocated, by 100 by def
# FR #2551 - old 299.3 and 299.5 are changed to include length(RHS) > 1 to issue the warning
DT[,c:=rep(42L,.N)] # plonk
test(299.04, DT, data.table(a=1:3, c=42L))
test(299.05, DT[2:3,c:=c(42, 42)], data.table(a=1:3,c=42L), warning="Coerced double RHS to integer.*column 2 named 'c'.*RHS.*no fractions.*more efficiently.*integer.*Consider.*L")
test(299.05, DT[2:3,c:=c(43, 44)], data.table(a=1:3,c=42:44))
# FR #2551 - length(RHS) = 1 - no warning for type conversion
test(299.06, DT[2,c:=42], data.table(a=1:3,c=42L))
test(299.06, DT[2,c:=42], data.table(a=1:3,c=INT(42,42,44)))
# also see tests 302 and 303. (Ok, new test file for fast assign would be tidier).
test(299.07, DT[,c:=rep(FALSE,nrow(DT))], data.table(a=1:3,c=FALSE)) # replace c column with logical
test(299.08, DT[2:3,c:=c(42,0)], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced double RHS to logical.*column 2 named 'c'.*If the target column's type logical is correct")
test(299.08, DT[2:3,c:=c(42,0)], error="Cannot assign 42.0.* to a logical column")
test(299.09, DT[2:3,c:=c(1,0)], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)))
MichaelChirico marked this conversation as resolved.
Show resolved Hide resolved
# FR #2551 is now changed to fit in / fix bug #5442. Stricter warnings are in place now. Check tests 1294.1-34 below.
test(299.09, DT[2,c:=42], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced double RHS to logical to match")
test(299.11, DT[2,c:=42L], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced integer RHS to logical to match")
test(299.12, DT[2:3,c:=c(0L, 0L)], data.table(a=1:3,c=FALSE), warning="Coerced integer RHS to logical to match the type of the target column.*If the target column's type logical is correct")

test(299.10, DT[2,c:=42], error="Cannot assign 42.0.* to a logical column")
test(299.11, DT[2,c:=42L], error="Cannot assign 42 to a logical column")
test(299.12, DT[2:3,c:=c(0L, 0L)], data.table(a=1:3,c=FALSE))

# Test bug fix #1468, combining i and by.
DT = data.table(a=1:3,b=1:9,v=1:9,key="a,b")
Expand Down Expand Up @@ -969,8 +969,7 @@ test(335, DT[,2:1]<-NULL, error="Attempt to assign to column")

DT = data.table(a=1:2, b=1:6)
test(336, DT[,z:=a/b], data.table(a=1:2,b=1:6,z=(1:2)/(1:6)))
test(337, DT[3:4,z:=a*b], data.table(a=1:2,b=1:6,z=c(1,1,3,8,1/5,2/6)), warning="Coerced integer RHS to double to match")

test(337, DT[3:4,z:=a*b], data.table(a=1:2,b=1:6,z=c(1,1,3,8,1/5,2/6)))

# test eval of LHS of := (using with=FALSE gives a warning here from v1.9.3)
DT = data.table(a=1:3, b=4:6)
Expand Down Expand Up @@ -1024,8 +1023,7 @@ test(355, DT[11:2010,f:=newlevels], data.table(f=factor(c(rep("000",10),newlevel
DT = data.table(f=c("a","b"),x=1:4)
# Test coercing factor to character column
test(355.5, DT[3,f:=factor("foo")], data.table(f=c("a","b","foo","b"),x=1:4))
test(355.6, DT[4,f:=factor("bar"),verbose=TRUE], data.table(f=c("a","b","foo","bar"),x=1:4), output="Coerced factor RHS to character to match the column")

test(355.6, DT[4,f:=factor("bar"),verbose=TRUE], data.table(f=c("a","b","foo","bar"),x=1:4), notOutput="coerce")

# See datatable-help post and NEWS item for 1.6.7
DT = data.table(X=factor(letters[1:10]), Y=1:10)
Expand Down Expand Up @@ -1218,16 +1216,14 @@ test(423, truelength(DT), 1028L)
DT <- data.table(a = factor(c("A", "Z")), b = 1:4)
DT[1,1] <- "Z"
test(424, DT, data.table(a=factor(c("Z","Z","A","Z")),b=1:4))
test(425, DT[1,1] <- 1, 1, warning="Coerced 'double' RHS to 'integer'")
test(426, DT, data.table(a=factor(c("A","Z")),b=1:4))
DT[1,1] <- 2L
test(425, DT[1,1] <- 1, error="Cannot assign 'double' to 'factor'")
test(426, DT[1,1] <- 2L, error="Cannot assign 'integer' to 'factor'")
mattdowle marked this conversation as resolved.
Show resolved Hide resolved
test(427, DT, data.table(a=factor(c("Z","Z","A","Z")),b=1:4))
DT[1,a:="A"]
test(428, DT, data.table(a=factor(c("A","Z","A","Z")),b=1:4))
DT[1,a:=2L]
test(429, DT, data.table(a=factor(c("Z","Z","A","Z")),b=1:4))
test(430, DT[1,1]<- 3L, 3L, warning="RHS contains 3 which is outside the levels range.*1,2.*of column 1, NAs generated")
test(431, DT[1,1:=4L], data.table(a=factor(c(NA,"Z","A","Z")),b=1:4), warning="RHS contains 4 which is outside the levels range.*1,2.*of column 1, NAs generated")
test(429, DT[1,a:=2L], error="Cannot assign 'integer' to 'factor'. Factor columns can only be assigned factor or character values, or NA in any type")
test(430, DT, data.table(a=factor(c("A","Z","A","Z")),b=1:4))
test(431, DT[1,1:=NA], data.table(a=factor(c(NA,"Z","A","Z")),b=1:4))

old = getOption("datatable.alloccol") # Test that unsetting datatable.alloccol is caught, #2014
options(datatable.alloccol=NULL) # In this =NULL case, options() in R 3.0.0 returned TRUE rather than the old value. This R bug was fixed in R 3.1.1.
Expand Down Expand Up @@ -4850,34 +4846,34 @@ test(1294.02, dt[, a := 1.5]$a, rep(1L, 3L), warning="Coerced double RHS to inte
test(1294.03, dt[, a := NA]$a, rep(NA_integer_, 3L))
test(1294.04, dt[, a := "a"]$a, rep(NA_integer_, 3L),
warning=c("NAs introduced by coercion",
"Coerced character RHS to integer.*create the RHS as type integer.*with as.integer.. to avoid this warning.*DT., `a`:=as.character.`a`.*"))
"Coerced character RHS to integer.*column 1 named 'a'"))
test(1294.05, dt[, a := list(list(1))]$a, rep(1L, 3L), warning="Coerced list RHS to integer to match.*column 1 named 'a'")
test(1294.06, dt[, a := list(1L)]$a, rep(1L, 3L))
test(1294.07, dt[, a := list(1)]$a, rep(1L, 3L))
test(1294.08, dt[, a := TRUE]$a, rep(1L, 3L), warning="Coerced logical RHS to integer")
test(1294.08, dt[, a := TRUE]$a, rep(1L, 3L))
test(1294.09, dt[, b := 1L]$b, rep(1,3))
test(1294.10, dt[, b := NA]$b, rep(NA_real_,3))
test(1294.11, dt[, b := "bla"]$b, rep(NA_real_, 3),
warning=c("NAs introduced by coercion",
"Coerced character RHS to double to match.*column 2 named 'b'.*DT[, `b`:=as.character(`b`)]"))
"Coerced character RHS to double to match.*column 2 named 'b'"))
test(1294.12, dt[, b := list(list(1))]$b, rep(1,3), warning="Coerced list RHS to double")
test(1294.13, dt[, b := TRUE]$b, rep(1,3), warning="Coerced logical RHS to double")
test(1294.13, dt[, b := TRUE]$b, rep(1,3))
test(1294.14, dt[, b := list(1)]$b, rep(1,3))
test(1294.15, dt[, c := 1]$c, rep(TRUE, 3), warning="Coerced double RHS to logical")
test(1294.16, dt[, c := 1L]$c, rep(TRUE, 3), warning="Coerced integer RHS to logical")
test(1294.15, dt[, c := 1]$c, rep(TRUE, 3))
test(1294.16, dt[, c := 1L]$c, rep(TRUE, 3))
test(1294.17, dt[, c := NA]$c, rep(NA, 3))
test(1294.18, dt[, c := list(1)]$c, rep(TRUE, 3), warning="Coerced double RHS to logical")
test(1294.18, dt[, c := list(1)]$c, rep(TRUE, 3))
test(1294.19, dt[, c := list(list(1))]$c, rep(TRUE, 3), warning="Coerced list RHS to logical")
test(1294.20, dt[, c := "bla"]$c, rep(NA, 3), warning="Coerced character RHS to logical")
test(1294.21, dt[, d := 1]$d, rep(list(1), 3))
test(1294.22, dt[, d := 1L]$d, rep(list(1L), 3))
test(1294.23, dt[, d := TRUE]$d, rep(list(TRUE), 3))
test(1294.24, dt[, d := "bla"]$d, rep(list("bla"), 3))
test(1294.25, dt[, d := list(list(1))]$d, rep(list(1), 3))
test(1294.26, dt[, e := 1]$e, rep("1", 3), warning="Coerced double RHS to character")
test(1294.27, dt[, e := 1L]$e, rep("1", 3), warning="Coerced integer RHS to character")
test(1294.28, dt[, e := TRUE]$e, rep("TRUE", 3), warning="Coerced logical RHS to character")
test(1294.29, dt[, e := list(list(1))]$e, rep("1", 3), warning="Coerced list RHS to character")
test(1294.26, dt[, e := 1]$e, rep("1", 3))
test(1294.27, dt[, e := 1L]$e, rep("1", 3))
test(1294.28, dt[, e := TRUE]$e, rep("TRUE", 3))
test(1294.29, dt[, e := list(list(1))]$e, rep("1", 3))
test(1294.30, dt[, e := "bla"]$e, rep("bla", 3))
test(1294.31, dt[, e := list("bla2")]$e, rep("bla2", 3))

Expand Down Expand Up @@ -9815,7 +9811,7 @@ test(1674, forderv(c(2147483645L, 2147483646L, 2147483647L, 2147483644L), order=
A = data.table(foo = c(1, 2, 3), bar = c(4, 5, 6))
A[, bar := factor(bar, levels = c(4, 5), labels = c("Boop", "Beep"), exclude = 6)]
B = data.table(foo = c(1, 2, 3, 4, 5, 6), bar = structure(c(3L, 3L, 3L, 1L, 2L, NA), .Label=c("Boop","Beep",NA), class="factor"))
test(1675.1, as.integer(B[A, bar := i.bar, on="foo"]$bar), c(1:3,1:2,NA))
test(1675.1, as.integer(B[A, bar := i.bar, on="foo"]$bar), c(1:2,NA,1:2,NA)) # remove the NA level is change in v1.12.4
A = data.table(foo = c(1, 2, 3), bar = c(4, 5, 6))
B = data.table(foo = c(1, 2, 3, 4, 5, 6), bar = c(NA, NA, NA, 4, 5, 6))
A[, bar := factor(bar, levels = c(4, 5), labels = c("Boop", "Beep"), exclude = 6)]
Expand Down Expand Up @@ -12927,9 +12923,9 @@ test(1944.6, DT[flag == 1, sum(x), keyby = group], data.table(group=1:4, V1=INT(
# assigning an int greater than length(levels) corruption of int, #2984
DT <- data.table(a = factor(c("A", "Z")), b = 1:4)
c <- 3L
test(1945.1, DT[1, a:=c][1,a], factor(NA, levels=c("A","Z")), warning="RHS.*outside the levels range.*NAs generated")
test(1945.1, DT[1, a:=c], error="Cannot assign 'integer' to 'factor'") # [1,a], factor(NA, levels=c("A","Z")), warning="RHS.*outside the levels range.*NAs generated")
test(1945.2, c, 3L)
test(1945.3, DT[2,1] <- c, 3L, warning="RHS.*outside the levels range.*NAs generated")
test(1945.3, DT[2,1] <- c, error="Cannot assign 'integer' to 'factor'") # 3L, warning="RHS.*outside the levels range.*NAs generated")
test(1945.4, c, 3L)

# subset a data.table containing an altrep derived from ]<-, ]]<- etc, #3051
Expand Down Expand Up @@ -14120,8 +14116,8 @@ test(2005.3, set(DT, 3L, 8i, NA), error="j is type 'complex'. Must be integer, c
test(2005.4, set(DT, 1L, 2L, expression(x+2)), error="cannot be coerced to.*integer") # R's error message same as returned by as.integer(expression(x+2))
DT[,foo:=factor(c("a","b","c"))]
test(2005.5, DT[2, foo:=8i], error="Can't assign to column 'foo' (type 'factor') a value of type 'complex' (not character, factor, integer or numeric)")
test(2005.6, DT[2, a:=9, verbose=TRUE], output="Coerced length-1 RHS from double to integer to match column's type. No precision was lost. If this")
test(2005.7, DT[2, a:=NA, verbose=TRUE], output="Coerced length-1 RHS from logical to integer to match column's type. If this")
test(2005.6, DT[2, a:=9, verbose=TRUE], notOutput="Coerced")
test(2005.7, DT[2, a:=NA, verbose=TRUE], notOutput="Coerced")
test(2005.8, DT[2, a:=9.9]$a, INT(1,9,3), warning="Coerced double RHS to integer.*One or more RHS values contain fractions which have been lost.*9.9.*has been truncated to 9")

# rbindlist raw type, #2819
Expand Down Expand Up @@ -16166,6 +16162,13 @@ DT = data.table(x = rep(letters[c(3, 1, 2)], each = 2))
test(2114.7, DT[, `:=`(g=.GRP, f=factor(.GRP)), by = x],
data.table(x=rep(c("c","a","b"),each=2), g=rep(1:3,each=2), f=factor(rep(as.character(1:3),each=2))))

# extra tests from #996 for completeness; no warning no-alloc coerce here of 0 and 1 numerics
DT = data.table(a=1:4, b=c(FALSE, TRUE, NA, FALSE))
test(2115.1, set(DT,3L,1L,0), data.table(a=INT(1,2,0,4), b=c(FALSE, TRUE, NA, FALSE)))
test(2115.2, set(DT,3L,2L,0), data.table(a=INT(1,2,0,4), b=c(FALSE, TRUE, FALSE, FALSE)))
DT = data.table(code=c("c","b","c","a"), val=10:13)
test(2115.3, DT[code=="c", val := val+1], data.table(code=c("c","b","c","a"), val=INT(11,11,13,13)))


###################################
# Add new tests above this line #
Expand Down
Loading