Performance regression with .N and := #5424
I did a git bisect and found that the slowdown started after merging #4491, which was done by @jangorecki, so maybe you have some idea how to fix it?
I think it is exactly what Ben noticed. The fix is simple because this part of the code is responsible for verbose output, not the actual computation.
Is there a commit with that supposed simple fix that I can test?
Those messages are always generated and only displayed with verbose output. This may have been introduced in e793f53.
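To illustrate the kind of overhead described above, here is a minimal base R sketch. It is not data.table's actual code: `count_eager` and `count_guarded` are hypothetical names, and the only point is that building a diagnostic string for every group costs time even when it is never printed, while guarding the construction with `if (verbose)` avoids that cost.

```r
# Hypothetical illustration only; this is not data.table's implementation.

# Builds the diagnostic string for every group, even when verbose = FALSE.
count_eager <- function(g, verbose = FALSE) {
  groups <- split(seq_along(g), g)
  out <- integer(length(groups))
  for (i in seq_along(groups)) {
    msg <- sprintf("group %s has %d rows", names(groups)[i], length(groups[[i]]))
    if (verbose) message(msg)
    out[i] <- length(groups[[i]])
  }
  out
}

# Builds the diagnostic string only when it will actually be shown.
count_guarded <- function(g, verbose = FALSE) {
  groups <- split(seq_along(g), g)
  out <- integer(length(groups))
  for (i in seq_along(groups)) {
    if (verbose) {
      message(sprintf("group %s has %d rows", names(groups)[i], length(groups[[i]])))
    }
    out[i] <- length(groups[[i]])
  }
  out
}

set.seed(1L)
g <- sample(seq_len(1e4), 1e5, TRUE)

# Both versions compute the same counts; only the message handling differs.
stopifnot(identical(count_eager(g), count_guarded(g)))

system.time(count_eager(g))
system.time(count_guarded(g))
```

Whether the regression discussed here has exactly this shape is an assumption; the sketch only shows why moving message construction behind a verbosity check can be a cheap fix.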
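I did another git bisect, using only the commits in #4491.
library(data.table)
N <- 10^6
n <- N/100
set.seed(1L)
dt <- data.table(
  g = sample(seq_len(n), N, TRUE),
  x = runif(N),
  key = "g")
# := modifies by reference, so benchmark on a copy
dt_mod <- copy(dt)
# Call the method directly to add the per-group row count
data.table:::`[.data.table`(dt_mod, , N := .N, by = g)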
On the latest GitHub master,
dt[, N := .N, by = id]
became much slower than on the latest CRAN version. The output below is from an M1 Mac, but the issue can be reproduced on x64 Linux and Windows as well. I included tests for all optimization levels. I haven't found similar issues with other optimized functions.
CRAN (1.14.2)
Created on 2022-07-25 by the reprex package (v2.0.1)
Github (master)
Created on 2022-07-26 by the reprex package (v2.0.1)
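A minimal sketch of how the timing across optimization levels could be reproduced, assuming data.table is installed. It uses the `datatable.optimize` option; trying the values 0 through 3 is an assumption about which levels are meaningful in a given version. The helper `time_level`, the name `n_rows`, and the reduced row count are choices made for this sketch, not taken from the report.

```r
library(data.table)

# Fewer rows than in the report so the sketch runs quickly; increase to taste.
n_rows <- 1e5
n_groups <- n_rows / 100
set.seed(1L)
dt <- data.table(
  g = sample(seq_len(n_groups), n_rows, TRUE),
  x = runif(n_rows),
  key = "g"
)

# Per-group counts computed without :=, used as a correctness check.
expected <- dt[, .N, by = g]

# Hypothetical helper: time `N := .N` by group at one optimization level.
time_level <- function(level) {
  old <- options(datatable.optimize = level)
  on.exit(options(old))
  dt_mod <- copy(dt)  # := modifies by reference, so work on a copy
  elapsed <- system.time(dt_mod[, N := .N, by = g])[["elapsed"]]
  # Every row of a group must carry that group's row count.
  stopifnot(identical(unique(dt_mod[, .(g, N)])$N, expected$N))
  elapsed
}

levels_to_try <- c(0L, 1L, 2L, 3L)
timings <- vapply(levels_to_try, time_level, numeric(1))
names(timings) <- paste0("optimize=", levels_to_try)
print(timings)
```

Running the same script against the CRAN release and against a development build gives directly comparable numbers, since the seed and data are fixed.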