diff --git a/NAMESPACE b/NAMESPACE
Lines changed: 1 addition & 0 deletions
diff --git a/NEWS.md b/NEWS.md
Lines changed: 52 additions & 0 deletions
diff --git a/R/froll.R b/R/froll.R
Lines changed: 127 additions & 12 deletions
@@ -54,6 +54,7 @@ S3method(cube, data.table)
 S3method(rollup, data.table)
 export(frollmean)
 export(frollsum)
+export(frollmax)
 export(frollapply)
 export(nafill)
 export(setnafill)
 
@@ -4,6 +4,18 @@
 
 ## data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35)  (in development)
 
+### BREAKING CHANGE
+
+1. Rolling functions `frollmean` and `frollsum` now distinguish `Inf`/`-Inf` from `NA` when `algo="fast"`, following the same rules as base R (previously they were treated the same). If your input to those functions contains `Inf` or `-Inf`, you will be affected by this change. As a result, the argument that controls the handling of `NA`s has been renamed from `hasNA` to `has.nf` (_has non-finite_). `hasNA` continues to work with a warning, for now.
+    ```r
+    ## before
+    frollsum(c(1,2,3,Inf,5,6), 2)
+    #[1] NA  3  5 NA NA 11
+
+    ## now
+    frollsum(c(1,2,3,Inf,5,6), 2)
+    #[1]  NA   3   5 Inf Inf  11
+    ```
+
 ### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES 
 
 1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
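The `## now` output in the BREAKING CHANGE entry of this hunk can be cross-checked without data.table. The following is a minimal base R sketch, not the package's implementation: `roll_sum_base` is a hypothetical name, and the loop simply applies `sum()` to each complete right-aligned window of width 2.

```r
# Input taken from the NEWS example above.
x = c(1, 2, 3, Inf, 5, 6)
n = 2L

# Naive right-aligned rolling sum in base R (illustration only).
# Positions without a complete window are filled with NA.
roll_sum_base = vapply(seq_along(x), function(i) {
  if (i < n) NA_real_ else sum(x[(i - n + 1L):i])
}, numeric(1))

print(roll_sum_base)
# [1]  NA   3   5 Inf Inf  11

# Inf is non-finite but it is not NA, which is why base R keeps it.
print(c(is.na(Inf), is.finite(Inf)))
# [1] FALSE FALSE
```

Base R propagates `Inf` through `sum()` rather than turning it into `NA`, which is the behaviour the entry says `algo="fast"` now follows.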
@@ -75,6 +87,46 @@
 
 15. New function `isoyear()` has been implemented as a complement to `isoweek()`, returning the ISO 8601 year corresponding to a given date, [#7154](https://github.com/Rdatatable/data.table/issues/7154). Thanks to @ben-schwen and @MichaelChirico for the suggestion and @venom1204 for the implementation.
 
+16. Multiple improvements have been added to rolling functions. The request came from @gpierard, who needed a left-aligned, adaptive rolling max, [#5438](https://github.com/Rdatatable/data.table/issues/5438). There was no `frollmax` function yet, adaptive rolling functions did not support `align="left"`, and `frollapply` did not support `adaptive=TRUE`. The available alternatives were base R `mapply` or a self-join using `max` and grouping `by=.EACHI`. As a follow-up to this request, the following features have been added:
+    - new function `frollmax`, which applies `max` over a rolling window.
+    - support for `align="left"` in adaptive rolling functions.
+    - support for `adaptive=TRUE` in `frollapply`.
+    - `partial` argument to trim the window width to the available observations rather than returning `NA` whenever the window is incomplete.
+    - `give.names` argument that can be used to automatically name the results based on the names of `x` and `n`.
+    - `frollmean` and `frollsum` no longer treat `Inf` and `-Inf` as `NA`s, as they used to for `algo="fast"` (breaking change).
+    - `hasNA` argument has been renamed to `has.nf` to convey that it relates not only to `NA/NaN` but to other non-finite values (`Inf/-Inf`) as well.
+
+    Thanks to @jangorecki for the implementation and @MichaelChirico and others for work on splitting into smaller PRs and reviews.
+    For a comprehensive description of all available features, see the `?froll` manual.
+
+    Adaptive `frollmax` has been observed to be around 80 times faster than the fastest previously available solution (a data.table self-join using `max` and grouping `by=.EACHI`). Note that the width of the rolling window is an important factor in performance. The benchmark code below is taken from [this SO answer](https://stackoverflow.com/a/73408459/2490497).
+    ```r
+    set.seed(108)
+    setDTthreads(16)
+    x = data.table(
+      value = cumsum(rnorm(1e6, 0.1)),
+      end_window = 1:1e6 + sample(50:500, 1e6, TRUE),
+      row = 1:1e6
+    )[, "end_window" := pmin(end_window, .N)
+      ][, "len_window" := end_window-row+1L]
+    baser = function(x) x[, mapply(function(from, to) max(value[from:to]), row, end_window)]
+    sj = function(x) x[x, max(value), on=.(row >= row, row <= end_window), by=.EACHI]$V1
+    frmax = function(x) x[, frollmax(value, len_window, adaptive=TRUE, align="left", has.nf=FALSE)]
+    frapply = function(x) x[, frollapply(value, len_window, max, adaptive=TRUE, align="left")]
+    microbenchmark::microbenchmark(
+      baser(x), sj(x), frmax(x), frapply(x),
+      times=10, check="identical"
+    )
+    #Unit: milliseconds
+    #       expr        min         lq       mean     median         uq        max neval
+    #   baser(x) 3094.88357 3097.84966 3186.74832 3163.58050 3251.66753 3370.33785    10
+    #      sj(x) 2221.55456 2255.12083 2306.61382 2303.47883 2346.70293 2412.62975    10
+    #   frmax(x)   17.45124   24.16809   28.10062   28.58153   32.79802   34.83941    10
+    # frapply(x)  272.07830  316.47060  366.94771  396.23566  416.06699  421.38701    10
+    ```
+
+    As of now, adaptive rolling max has no _on-line_ implementation (`algo="fast"`); it uses a naive approach (`algo="exact"`). Further speed-up is therefore still possible if `algo="fast"` gets implemented.
+
 ### BUG FIXES
 
 1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR.
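What a left-aligned adaptive rolling max computes, as requested in item 16 of this hunk, can be illustrated on a small vector. This sketch mirrors the `baser()` approach from the benchmark using only base R. The data values, the offsets, and the name `left_adaptive_max` are made up for illustration.

```r
# Toy data: each row has its own window [row, end_window], as in the benchmark.
value = c(3, 1, 4, 1, 5, 9, 2, 6)
row = seq_along(value)
end_window = pmin(row + c(2L, 1L, 3L, 0L, 2L, 1L, 4L, 2L), length(value))
len_window = end_window - row + 1L   # per-row window width (the adaptive 'n')

# Base R reference: max over value[from:to] for every row.
left_adaptive_max = mapply(function(from, to) max(value[from:to]), row, end_window)

print(len_window)
# [1] 3 2 4 1 3 2 2 1
print(left_adaptive_max)
# [1] 4 4 9 1 9 9 6 6

# Per the benchmark in the NEWS entry, the call this PR is meant to enable is
# frollmax(value, len_window, adaptive=TRUE, align="left"); it is not run here
# because it needs the development version of data.table.
```

The window starts at the current row and extends forward, so the width vector must shrink towards the end of the data, which is what `pmin(end_window, .N)` does in the benchmark.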
 
@@ -1,21 +1,136 @@
-froll = function(fun, x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
-  stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
-  algo = match.arg(algo)
+# helpers for partial2adaptive
+trimn = function(n, len, align) {
+  n = min(n, len) ## so frollsum(1:2, 3, partial=TRUE) works
+  if (align=="right")
+    c(seq_len(n), rep.int(n, len-n))
+  else
+    c(rep.int(n, len-n), rev(seq_len(n)))
+}
+trimnadaptive = function(n, align) {
+  if (align=="right")
+    pmin(n, seq_along(n))
+  else
+    pmin(n, rev(seq_along(n)))
+}
+
+# partial2adaptive helper function
+## translate partial=TRUE into adaptive=TRUE by preparing an adaptive 'n', as shown in ?froll examples
+# partial2adaptive(1:4, 2, "right", adaptive=FALSE)
+# partial2adaptive(1:4, 2:3, "right", adaptive=FALSE)
+# partial2adaptive(list(1:4, 2:5), 2:3, "right", adaptive=FALSE)
+# frollsum(1:4, 2, partial=FALSE, adaptive=FALSE)
+# frollsum(1:4, 2, partial=TRUE, adaptive=FALSE)
+# frollsum(1:4, 2:3, partial=FALSE, adaptive=FALSE)
+# frollsum(1:4, 2:3, partial=TRUE, adaptive=FALSE)
+# frollsum(list(1:4, 2:5), 2:3, partial=FALSE, adaptive=FALSE)
+# frollsum(list(1:4, 2:5), 2:3, partial=TRUE, adaptive=FALSE)
+partial2adaptive = function(x, n, align, adaptive) {
+  if (!length(n))
+    stopf("n must be non 0 length")
+  if (align=="center")
+    stopf("'partial' cannot be used together with align='center'")
+  if (is.list(x) && length(unique(lengths(x))) != 1L)
+    stopf("'partial' does not support variable length of columns in 'x'")
+  len = if (is.list(x)) length(x[[1L]]) else length(x)
+  verbose = getOption("datatable.verbose")
+  if (!adaptive) {
+    if (is.list(n))
+      stopf("n must be an integer, list is accepted for adaptive TRUE")
+    if (!is.numeric(n))
+      stopf("n must be an integer vector or a list of integer vectors")
+    if (verbose)
+      catf("partial2adaptive: froll partial=TRUE trimming 'n' and redirecting to adaptive=TRUE\n")
+    if (length(n) > 1L) {
+      ## c(2,3) -> list(c(1,2,2,2),c(1,2,3,3)) ## for x=1:4
+      lapply(n, len, align, FUN=trimn)
+    } else {
+      ## 3 -> c(1,2,3,3) ## for x=1:4
+      trimn(n, len, align)
+    }
+  } else {
+    if (!(is.numeric(n) || (is.list(n) && all(vapply_1b(n, is.numeric)))))
+      stopf("n must be an integer vector or a list of integer vectors")
+    if (length(unique(lengths(n))) != 1L)
+      stopf("adaptive window provided in 'n' must not have different lengths")
+    if (is.numeric(n) && length(n) != len)
+      stopf("length of 'n' argument must be equal to number of observations provided in 'x'")
+    if (is.list(n) && length(n[[1L]]) != len)
+      stopf("length of vectors in 'x' must match to length of adaptive window in 'n'")
+    if (verbose)
+      catf("partial2adaptive: froll adaptive=TRUE and partial=TRUE trimming 'n'\n")
+    if (is.numeric(n)) {
+      ## c(3,3,3,2) -> c(1,2,3,2) ## for x=1:4
+      trimnadaptive(n, align)
+    } else {
+      ## list(c(3,3,3,2),c(4,2,3,3)) -> list(c(1,2,3,2),c(1,2,3,3)) ## for x=1:4
+      lapply(n, align, FUN = trimnadaptive)
+    }
+  }
+}
+
+froll = function(fun, x, n, fill=NA, algo, align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, FUN, rho, give.names=FALSE) {
   align = match.arg(align)
-  ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, hasNA, adaptive)
+  if (isTRUE(give.names))
+    orig = list(n=n, adaptive=adaptive)
+  if (isTRUE(partial)) {
+    n = partial2adaptive(x, n, align, adaptive)
+    adaptive = TRUE
+  }
+  leftadaptive = isTRUE(adaptive) && align=="left"
+  if (leftadaptive) {
+    verbose = getOption("datatable.verbose")
+    rev2 = function(x) if (is.list(x)) lapply(x, rev) else rev(x)
+    if (verbose)
+      catf("froll: adaptive=TRUE && align='left' pre-processing for align='right'\n")
+    x = rev2(x)
+    n = rev2(n)
+    align = "right"
+  } ## support for left adaptive added in #5441
+  if (missing(FUN))
+    ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, has.nf, adaptive)
+  else
+    ans = .Call(CfrollapplyR, FUN, x, n, fill, align, adaptive, rho)
+  if (leftadaptive) {
+    if (verbose)
+      catf("froll: adaptive=TRUE && align='left' post-processing from align='right'\n")
+    ans = rev2(ans)
+  }
+  if (isTRUE(give.names) && is.list(ans)) {
+    n = orig$n
+    adaptive = orig$adaptive
+    nx = names(x)
+    nn = names(n)
+    if (is.null(nx)) nx = paste0("V", if (is.atomic(x)) 1L else seq_along(x))
+    if (is.null(nn)) nn = if (adaptive) paste0("N", if (is.atomic(n)) 1L else seq_along(n)) else paste("roll", as.character(n), sep="_")
+    setattr(ans, "names",  paste(rep(nx, each=length(nn)), nn, sep="_"))
+  }
   ans
 }
 
-frollmean = function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
-  froll(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
+frollfun = function(fun, x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
+  if (!missing(hasNA)) {
+    if (!is.na(has.nf))
+      stopf("hasNA is deprecated, use has.nf instead")
+    warningf("hasNA is deprecated, use has.nf instead")
+    has.nf = hasNA
+  } # remove check on next major release
+  algo = match.arg(algo)
+  froll(fun=fun, x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, give.names=give.names)
+}
+
+frollmean = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  frollfun(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
+}
+frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  frollfun(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
 }
-frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
-  froll(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
+frollmax = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE) {
+  frollfun(fun="max", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names)
 }
-frollapply = function(x, n, FUN, ..., fill=NA, align=c("right", "left", "center")) {
+
+frollapply = function(x, n, FUN, ..., fill=NA, align=c("right","left","center"), adaptive=FALSE, partial=FALSE, give.names=FALSE) {
   FUN = match.fun(FUN)
-  align = match.arg(align)
   rho = new.env()
-  ans = .Call(CfrollapplyR, FUN, x, n, fill, align, rho)
-  ans
+  froll(FUN=FUN, rho=rho, x=x, n=n, fill=fill, align=align, adaptive=adaptive, partial=partial, give.names=give.names)
 }
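The two pre-processing steps in `froll()` above, turning `partial=TRUE` into an adaptive width vector and handling left-aligned adaptive windows by reversal, can be tried in isolation. The sketch below copies `trimn` and `trimnadaptive` from the diff so it runs without data.table. `naive_adaptive_right` is a hypothetical stand-in for the C routine, written only to check that the reversal trick gives the same answer as a direct left-aligned computation.

```r
# Helpers copied from the diff above.
trimn = function(n, len, align) {
  n = min(n, len)
  if (align=="right")
    c(seq_len(n), rep.int(n, len-n))
  else
    c(rep.int(n, len-n), rev(seq_len(n)))
}
trimnadaptive = function(n, align) {
  if (align=="right")
    pmin(n, seq_along(n))
  else
    pmin(n, rev(seq_along(n)))
}

# Hypothetical naive right-aligned adaptive roll (NOT the C implementation):
# the window at position i covers x[(i-n[i]+1):i]; NA if it would start before 1.
naive_adaptive_right = function(x, n, f) {
  vapply(seq_along(x), function(i) {
    if (n[i] > i) NA_real_ else f(x[(i - n[i] + 1L):i])
  }, numeric(1))
}

x = c(1, 5, 2, 4)

# partial=TRUE with a fixed width of 3 becomes an adaptive width vector.
w_right = trimn(3L, length(x), "right")   # 1 2 3 3
w_left  = trimn(3L, length(x), "left")    # 3 3 2 1

# An already adaptive 'n' is capped by the number of available observations.
w_adapt = trimnadaptive(c(3L, 3L, 3L, 2L), "right")   # 1 2 3 2

# Left-aligned adaptive via reversal: reverse inputs, roll right, reverse output.
left_via_rev = rev(naive_adaptive_right(rev(x), rev(w_left), max))

# Direct left-aligned computation for comparison.
left_direct = vapply(seq_along(x), function(i) {
  max(x[i:(i + w_left[i] - 1L)])
}, numeric(1))

print(left_via_rev)
# [1] 5 5 4 4
print(identical(left_via_rev, left_direct))
# [1] TRUE
```

Reversing both `x` and `n` lets a single right-aligned adaptive routine serve the left-aligned case, at the cost of extra copies of the input and output.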