Performance regression with `.N` and `:=` #5424

svraka · 2022-07-25T22:19:23Z

On the latest Github master dt[, N = .N, by = id] became much slower than on the latest CRAN version. The output below is from an M1 Mac but the issue can be reproduced on x64 Linux and Windows as well. I included tests for all optimization levels. I haven't found similar issues with other optimized functions.

CRAN (1.14.2)

library(data.table)
options(datatable.verbose = TRUE)

N <- 1e8
n <- 1e6

set.seed(1L)
dt <- data.table(g = sample(seq_len(n), N, TRUE),
                 x = runif(N),
                 key = "g")
#> forder.c received 100000000 rows and 2 columns
#> forder took 2.256 sec
#> reorder took 1.934s elapsed (4.338s cpu)

options(datatable.optimize = 0L)
dt_mod <- copy(dt)
dt_mod[, N := .N, by = g]
#> Detected that j uses these columns: <none> 
#> Finding groups using uniqlist on key ... 0.115s elapsed (0.097s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.004s elapsed (0.003s cpu) 
#> All optimizations are turned off
#> Making each group and running j (GForce FALSE) ... 
#>   memcpy contiguous groups took 0.310s for 1000000 groups
#>   eval(j) took 0.300s for 1000000 calls
#> 1.412s elapsed (0.630s cpu)

options(datatable.optimize = 1L)
dt_mod <- copy(dt)
dt_mod[, N := .N, by = g]
#> Detected that j uses these columns: <none> 
#> Finding groups using uniqlist on key ... 0.112s elapsed (0.099s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.003s elapsed (0.003s cpu) 
#> lapply optimization is on, j unchanged as '.N'
#> Making each group and running j (GForce FALSE) ... 
#>   memcpy contiguous groups took 0.312s for 1000000 groups
#>   eval(j) took 0.300s for 1000000 calls
#> 1.767s elapsed (0.610s cpu)

options(datatable.optimize = 2L)
dt_mod <- copy(dt)
dt_mod[, N := .N, by = g]
#> Detected that j uses these columns: <none> 
#> Finding groups using uniqlist on key ... 0.104s elapsed (0.096s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.004s elapsed (0.003s cpu) 
#> lapply optimization is on, j unchanged as '.N'
#> Making each group and running j (GForce FALSE) ... 
#>   memcpy contiguous groups took 0.311s for 1000000 groups
#>   eval(j) took 0.298s for 1000000 calls
#> 1.442s elapsed (0.613s cpu)

options(datatable.optimize = 3L)
dt_mod <- copy(dt)
dt_mod[, N := .N, by = g]
#> Detected that j uses these columns: <none> 
#> Finding groups using uniqlist on key ... 0.114s elapsed (0.098s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.003s elapsed (0.003s cpu) 
#> lapply optimization is on, j unchanged as '.N'
#> Making each group and running j (GForce FALSE) ... 
#>   memcpy contiguous groups took 0.308s for 1000000 groups
#>   eval(j) took 0.296s for 1000000 calls
#> 1.364s elapsed (0.595s cpu)

^{Created on 2022-07-25 by the reprex package (v2.0.1)}

Session info

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Monterey 12.5
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.29   withr_2.5.0     magrittr_2.0.3  reprex_2.0.1   
#>  [5] evaluate_0.15   highr_0.9       stringi_1.7.8   rlang_1.0.4    
#>  [9] cli_3.3.0       fs_1.5.2        rmarkdown_2.14  tools_4.2.1    
#> [13] stringr_1.4.0   glue_1.6.2      xfun_0.31       yaml_2.3.5     
#> [17] fastmap_1.1.0   compiler_4.2.1  htmltools_0.5.3 knitr_1.39

Github (master)

library(data.table)
options(datatable.verbose = TRUE)

N <- 1e8
n <- 1e6

set.seed(1L)
dt <- data.table(g = sample(seq_len(n), N, TRUE),
                 x = runif(N),
                 key = "g")
#> forder.c received 100000000 rows and 2 columns
#> forder took 2.672 sec
#> reorder took 1.965s elapsed (4.403s cpu)

options(datatable.optimize = 0L)
dt_mod <- copy(dt)
dt_mod[, N := .N, by = g]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: <none>
#> Finding groups using uniqlist on key ... 0.174s elapsed (0.105s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.014s elapsed (0.003s cpu) 
#> All optimizations are turned off
#> Making each group and running j (GForce FALSE) ... 
#>   memcpy contiguous groups took 0.260s for 1000000 groups
#>   eval(j) took 0.249s for 1000000 calls
#> 2.103s elapsed (1.489s cpu)

options(datatable.optimize = 1L)
dt_mod <- copy(dt)
dt_mod[, N := .N, by = g]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: <none>
#> Finding groups using uniqlist on key ... 0.116s elapsed (0.099s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.004s elapsed (0.003s cpu) 
#> lapply optimization is on, j unchanged as '.N'
#> Making each group and running j (GForce FALSE) ... 
#>   memcpy contiguous groups took 0.263s for 1000000 groups
#>   eval(j) took 0.249s for 1000000 calls
#> 2.294s elapsed (1.479s cpu)

options(datatable.optimize = 2L)
dt_mod <- copy(dt)
dt_mod[, N := .N, by = g]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: <none>
#> Finding groups using uniqlist on key ... 0.096s elapsed (0.094s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.004s elapsed (0.003s cpu) 
#> lapply optimization is on, j unchanged as '.N'
#> Making each group and running j (GForce FALSE) ... 
#>   memcpy contiguous groups took 0.262s for 1000000 groups
#>   eval(j) took 0.249s for 1000000 calls
#> 2.096s elapsed (1.468s cpu)

options(datatable.optimize = 3L)
dt_mod <- copy(dt)
dt_mod[, N := .N, by = g]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: <none>
#> Finding groups using uniqlist on key ... 0.114s elapsed (0.098s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.003s elapsed (0.004s cpu) 
#> lapply optimization is on, j unchanged as '.N'
#> Making each group and running j (GForce FALSE) ... 
#>   memcpy contiguous groups took 0.261s for 1000000 groups
#>   eval(j) took 0.247s for 1000000 calls
#> 2.048s elapsed (1.464s cpu)

^{Created on 2022-07-26 by the reprex package (v2.0.1)}

Session info

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Monterey 12.5
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.29   withr_2.5.0     magrittr_2.0.3  reprex_2.0.1   
#>  [5] evaluate_0.15   highr_0.9       stringi_1.7.8   rlang_1.0.4    
#>  [9] cli_3.3.0       fs_1.5.2        rmarkdown_2.14  tools_4.2.1    
#> [13] stringr_1.4.0   glue_1.6.2      xfun_0.31       yaml_2.3.5     
#> [17] fastmap_1.1.0   compiler_4.2.1  htmltools_0.5.3 knitr_1.39

The text was updated successfully, but these errors were encountered:

tdhock · 2022-07-26T22:14:26Z

I did a git bisect and I found that the slowdown started after merging #4491 which was done by @jangorecki so maybe you have some idea how to fix?
In detail here are the commits near where the slowdown occured

and here are the timings,

The plot above shows that #4491 is just as slow as master, whereas the two commits immediate before that are significantly faster, which implies that PR is responsible for the slowdown.

ben-schwen · 2022-07-27T07:48:41Z

Well if it was introduced by #4491 then it might be the same issue with snprintf() as in #5371

jangorecki · 2022-07-27T14:17:56Z

I think it is exactly what Ben noticed. Fix is simple because this part of code is responsible for verbose output, not the actual computation.

tdhock · 2022-07-27T16:47:50Z

Is there a commit with that supposed simple fix that I can test?
To generate that timings figure I did not use verbose output, which suggests the problem is something else.

jangorecki · 2022-07-27T18:54:01Z

Those messages are generated always and just displayed with verbose output. It may have been introduced in e793f53
In the other issue I asked Matt about preferred solution from few options proposed there.

tdhock · 2022-07-27T19:34:49Z

I did another git bisect, using only the commits in #4491.

The figure above confirms that the slowdown started happening in that commit, e793f53 which was attributed to @mattdowle
BTW these figures are made by timing the following code for different versions and different values of N,

  N = 10^6,
    n <- N/100
    set.seed(1L)
    dt <- data.table(
      g = sample(seq_len(n), N, TRUE),
      x = runif(N),
      key = "g")
    dt_mod <- copy(dt)
    data.table:::`[.data.table`(dt_mod, , N := .N, by = g)

jangorecki added this to the 1.14.3 milestone Jul 26, 2022

jangorecki added the dev label Jul 26, 2022

jangorecki mentioned this issue Sep 14, 2022

memrecycle no snprintf overhead #5463

Merged

mattdowle closed this as completed in #5463 Oct 14, 2022

tdhock mentioned this issue Dec 6, 2022

dcast only computes default fill if necessary #5549

Merged

jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression with `.N` and `:=` #5424

Performance regression with `.N` and `:=` #5424

svraka commented Jul 25, 2022

tdhock commented Jul 26, 2022 •

edited

ben-schwen commented Jul 27, 2022

jangorecki commented Jul 27, 2022

tdhock commented Jul 27, 2022

jangorecki commented Jul 27, 2022

tdhock commented Jul 27, 2022 •

edited

Performance regression with .N and := #5424

Performance regression with .N and := #5424

Comments

svraka commented Jul 25, 2022

CRAN (1.14.2)

Github (master)

tdhock commented Jul 26, 2022 • edited

ben-schwen commented Jul 27, 2022

jangorecki commented Jul 27, 2022

tdhock commented Jul 27, 2022

jangorecki commented Jul 27, 2022

tdhock commented Jul 27, 2022 • edited

Performance regression with `.N` and `:=` #5424

Performance regression with `.N` and `:=` #5424

tdhock commented Jul 26, 2022 •

edited

tdhock commented Jul 27, 2022 •

edited