Skip to content

Commit

Permalink
replacement can be shut off with optim level
Browse files Browse the repository at this point in the history
  • Loading branch information
Michael Chirico committed Aug 31, 2019
1 parent f3c1827 commit 6d99792
Show file tree
Hide file tree
Showing 3 changed files with 23 additions and 14 deletions.
2 changes: 1 addition & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -136,7 +136,7 @@

20. `setkey`, `[key]by=` and `on=` in verbose mode (`options(datatable.verbose=TRUE)`) now detect any columns inheriting from `Date` which are stored as 8 byte double, test if any fractions are present, and if not suggest using a 4 byte integer instead (such as `data.table::IDate`) to save space and time, [#1738](https://github.com/Rdatatable/data.table/issues/1738). In future this could be upgraded to `message` or `warning` depending on feedback.

21. New function `fifelse(test, yes, no)` has been implemented in C by Morgan Jacob, [#3657](https://github.com/Rdatatable/data.table/issues/3657). It is comparable to `base::ifelse`, `dplyr::if_else`, `hutils::if_else`, and (forthcoming) [`vctrs::if_else()`](https://vctrs.r-lib.org/articles/stability.html#ifelse). It returns a vector of the same length as `test` but unlike `base::ifelse` the output type is consistent with those of `yes` and `no`. Because we believe `fifelse` to be strictly superior to `base::ifelse`, `ifelse` usage in `j` is now detected and replaced with `fifelse` (with warning), [#2677](https://github.com/Rdatatable/data.table/issues/2677). Please see `?data.table::fifelse` for more details.
21. New function `fifelse(test, yes, no)` has been implemented in C by Morgan Jacob, [#3657](https://github.com/Rdatatable/data.table/issues/3657). It is comparable to `base::ifelse`, `dplyr::if_else`, `hutils::if_else`, and (forthcoming) [`vctrs::if_else()`](https://vctrs.r-lib.org/articles/stability.html#ifelse). It returns a vector of the same length as `test` but unlike `base::ifelse` the output type is consistent with those of `yes` and `no`. Because we believe `fifelse` to be superior to `base::ifelse`, `ifelse` usage in `j` is now detected and replaced with `fifelse` (with warning), [#2677](https://github.com/Rdatatable/data.table/issues/2677). This can also be deactivated by lowering the optimization level: `options(datatable.optimize = 2L)`. Please see `?data.table::fifelse` for more details.

```R
# default 4 threads on a laptop with 16GB RAM and 8 logical CPU
Expand Down
24 changes: 14 additions & 10 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -301,6 +301,7 @@ replace_order = function(isub, verbose, env) {
# setdiff removes duplicate entries, which'll create issues with duplicated names. Use %chin% instead.
dupdiff = function(x, y) x[!x %chin% y]

current_optim = getOption('datatable.optimize')
if (!missing(i)) {
xo = NULL
isub = substitute(i)
Expand Down Expand Up @@ -353,7 +354,7 @@ replace_order = function(isub, verbose, env) {
# optimize here so that we can switch it off if needed
check_eval_env = environment()
check_eval_env$eval_forder = FALSE
if (getOption("datatable.optimize") >= 1L) {
if (current_optim >= 1L) {
isub = replace_order(isub, verbose, check_eval_env)
}
if (check_eval_env$eval_forder) {
Expand Down Expand Up @@ -1160,13 +1161,16 @@ replace_order = function(isub, verbose, env) {
}

# 2677 -- browbeat about ifelse usage in j
SDenv$ifelse = function(test, yes, no) {
if (length(test) == 1L) {
warning("ifelse() is a vector operation, but input has length(test)=1; if (test) yes else no is the more efficient syntax for this case; replacing")
return(if (test) yes else no)
# TODO: phase this in to be the default (i.e., regardless of optimization level)
if ( current_optim >= 3L) {
SDenv$ifelse = function(test, yes, no) {
if (length(test) == 1L) {

This comment has been minimized.

Copy link
@jangorecki

jangorecki Aug 31, 2019

Member

I am very much against this. It will popup unexpectedly when a group in data has 1 row only. Or when data is 1 row only, not even by group.

This comment has been minimized.

Copy link
@MichaelChirico

MichaelChirico Aug 31, 2019

Member

Yes it's true... was thinking it's OK because it's only when user is doing ifelse which we encourage against. But yes more harm than good. Will remove.

warning("ifelse() is a vector operation, but input has length(test)=1; if (test) yes else no is the more efficient syntax for this case; replacing")
return(if (test) yes else no)
}
warning("ifelse() usage detected and converted to data.table's fifelse() which is more efficient and won't strip attributes. If you have a use case for ifelse that is not handled well by fifelse, please let us know; in the meantime, you can call base::ifelse explicitly. Please also be wary of nested ifelse constructs like ifelse(ifelse(ifelse(l1, a, ifelse(l2, b)), c, d), e) which are in general quite expensive and almost always better formulated as a join.")
fifelse(test, yes, no)
}
warning("ifelse() usage detected and converted to data.table's fifelse() which is more efficient and won't strip attributes. If you have a use case for ifelse that is not handled well by fifelse, please let us know; in the meantime, you can call base::ifelse explicitly. Please also be wary of nested ifelse constructs like ifelse(ifelse(ifelse(l1, a, ifelse(l2, b)), c, d), e) which are in general quite expensive and almost always better formulated as a join.")
fifelse(test, yes, no)
}

syms = all.vars(jsub)
Expand Down Expand Up @@ -1472,7 +1476,7 @@ replace_order = function(isub, verbose, env) {
lockBinding(".iSD",SDenv)

GForce = FALSE
if ( getOption("datatable.optimize")>=1L && (is.call(jsub) || (is.name(jsub) && as.character(jsub)[[1L]] %chin% c(".SD",".N"))) ) { # Ability to turn off if problems or to benchmark the benefit
if ( current_optim>=1L && (is.call(jsub) || (is.name(jsub) && as.character(jsub)[[1L]] %chin% c(".SD",".N"))) ) { # Ability to turn off if problems or to benchmark the benefit
# Optimization to reduce overhead of calling lapply over and over for each group
oldjsub = jsub
funi = 1L # Fix for #985
Expand Down Expand Up @@ -1628,7 +1632,7 @@ replace_order = function(isub, verbose, env) {
dotN = function(x) is.name(x) && x==".N" # For #5760. TODO: Rprof() showed dotN() may be the culprit if iterated (#1470)?; avoid the == which converts each x to character?
# FR #971, GForce kicks in on all subsets, no joins yet. Although joins could work with
# nomatch=0L even now.. but not switching it on yet, will deal it separately.
if (getOption("datatable.optimize")>=2L && !is.data.table(i) && !byjoin && length(f__) && !length(lhs)) {
if (current_optim>=2L && !is.data.table(i) && !byjoin && length(f__) && !length(lhs)) {
if (!length(ansvars) && !use.I) {
GForce = FALSE
if ( (is.name(jsub) && jsub == ".N") || (is.call(jsub) && length(jsub)==2L && length(as.character(jsub[[1L]])) && as.character(jsub[[1L]])[1L] == "list" && length(as.character(jsub[[2L]])) && as.character(jsub[[2L]])[1L] == ".N") ) {
Expand Down Expand Up @@ -1706,7 +1710,7 @@ replace_order = function(isub, verbose, env) {
# when fastmean can do trim.
}
} else if (verbose) {
if (getOption("datatable.optimize")<1L) cat("All optimizations are turned off\n")
if (current_optim<1L) cat("All optimizations are turned off\n")
else cat("Optimization is on but left j unchanged (single plain symbol): '",deparse(jsub, width.cutoff=200L, nlines=1L),"'\n",sep="")
}
if (byjoin) {
Expand Down
11 changes: 8 additions & 3 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -15864,12 +15864,17 @@ test(2095, any(grepl('override', capture.output(dcast(DT, a~b, fun.aggregate=len
# gmean intermediate can overflow integers without warning, #986
test(2096, data.table(a=c(1L,1L), v=c(2e9L, 2e9L))[, mean(v), a], data.table(a=1L, V1=2e9))

# #2677 -- emit warnings when ifelse is used in j
# #2677 -- emit warnings when ifelse is used in j (only for high optim _for now_)
optim = options(datatable.optimize=0L)
DT = data.table(a=1:10)
test(2097.1, DT[ , ifelse(a > 5, 1, -1)], rep(c(-1, 1), each=5L),
test(2097.1, DT[ , ifelse(a > 5, 1, -1)], rep(c(-1, 1), each=5L))
test(2097.2, DT[1L, ifelse(a > 5, 1, -1)], -1)
options(datatable.optimize=3L)
test(2097.3, DT[ , ifelse(a > 5, 1, -1)], rep(c(-1, 1), each=5L),
warning="ifelse() usage detected and converted")
test(2097.2, DT[1L, ifelse(a > 5, 1, -1)], -1,
test(2097.4, DT[1L, ifelse(a > 5, 1, -1)], -1,
warning="ifelse() is a vector operation")
options(optim)

###################################
# Add new tests above this line #
Expand Down

0 comments on commit 6d99792

Please sign in to comment.