-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
apply() faster than colQuantiles() #153
Comments
Not near a computer for a few days, but I suspect the |
Forgot to say, |
Thanks. Any idea at what point colQuantiles() will be optimized? |
Since it's most likely Note, matrixStats is designed for matrices (2d arrays) and some vectors (1d arrays). Your example uses a 3d array. |
Alright, thanks. I guess I will have to find a faster version of drop(), or wait until an 'arrayStats' package is released... |
Now at a computer. My guess above was incorrect; it's not library(matrixStats)
x <- array(1:(10000*1000), dim = c(10000, 1000, 1))
xd <- drop(x)
y0 <- apply(x, MARGIN = 2L, FUN = quantile, probs = 0.975)
y1 <- colQuantiles(drop(x), probs = 0.975)
y2 <- colQuantiles(xd, probs = 0.975)
stopifnot(identical(y1, y0), identical(y2, y0))
stats <- microbenchmark::microbenchmark(
apply(x, MARGIN = 2L, FUN = quantile, probs = 0.975),
colQuantiles(drop(x), probs = 0.975),
colQuantiles(xd, probs = 0.975),
drop(x),
unit = "ms",
times = 10L
)
print(stats)
## Unit: milliseconds
## expr min lq
## apply(x, MARGIN = 2L, FUN = quantile, probs = 0.975) 142.430746 143.457675
## colQuantiles(drop(x), probs = 0.975) 286.959245 363.392250
## colQuantiles(xd, probs = 0.975) 264.892077 303.891283
## drop(x) 0.000419 0.000782
## mean median uq max neval cld
## 165.3461039 150.036045 170.998975 267.598677 10 b
## 399.3671685 384.558625 457.812438 481.906615 10 d
## 345.1908052 364.407124 384.364789 389.169873 10 c
## 0.0029067 0.002321 0.002681 0.011392 10 a In other words, there's indeed room for improvement. |
Thanks again! My original code has been released to CRAN as part of an R package. It's sufficiently fast for now, but whenever you get around to optimizing colQuantiles(), I will use that instead of stats::quantile(), which can be improved based on code profiling. |
Benchmarking with library(matrixStats)
options(width=120)
X <- matrix(1:(10000*1000), nrow=10000L, ncol=1000L)
y0 <- apply(X, MARGIN=2L, FUN=quantile, probs=0.975)
y1 <- colQuantiles(X, probs=0.975)
stopifnot(identical(y1, y0))
stats <- bench::mark(
apply(X, MARGIN=2L, FUN=quantile, probs=0.975),
colQuantiles(X, probs=0.975),
min_iterations=10L
) It does not like replacing generic ## matrixStats 0.55.0
print(stats)
## # A tibble: 2 x 13
## expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
## <bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
## 1 apply(X, MARGIN = 2L, FUN = quantile, probs = 0.975) 146ms 155ms 6.03 191MB 19.9 10 33 1.66s
## 2 colQuantiles(X, probs = 0.975) 286ms 310ms 3.18 496MB 20.1 10 63 3.14s
## # … with 4 more variables: result <list>, memory <list>, time <list>, gc <list> ## matrixStats 0.55.0-9000 (with sort.int() instead of generic sort())
print(stats)
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 apply(X, MARGIN = 2L, FUN = quantile, probs = 0.975) 151ms 174ms 5.38 191MB 17.7 10 33 1.86s
2 colQuantiles(X, probs = 0.975) 292ms 333ms 2.89 496MB 23.4 10 81 3.46s
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list> Now, looking at the above benchmark results, it's clear that The dominant memory allocations are: > p1 <- profmem::profmem(y1 <- colQuantiles(X, probs=0.975))
> subset(p1, bytes > 100e3)
Rprofmem memory profiling of:
y1 <- colQuantiles(X, probs = 0.975)
Memory allocations:
what bytes calls
4 alloc 40000048 colQuantiles() -> apply() -> aperm() -> aperm.default()
4042 alloc 40000048 colQuantiles() -> apply() -> unlist()
4043 alloc 40000048 colQuantiles() -> apply() -> array()
4044 alloc 40000048 colQuantiles() -> apply() -> aperm() -> aperm.default()
6051 alloc 40000048 colQuantiles() -> apply() -> aperm() -> aperm.default()
total 200000240 |
Great news: In matrixStats 0.55.0-9000 (develop branch) we now have:
For the example in this issue, the speedup is ~7 times (compared to #153 (comment)), which means > library(matrixStats)
> options(width=120)
>
> X <- matrix(1:(10000*1000), nrow=10000L, ncol=1000L)
> y0 <- apply(X, MARGIN=2L, FUN=quantile, probs=0.975)
> y1 <- colQuantiles(X, probs=0.975)
> stopifnot(identical(y1, y0))
>
> stats <- bench::mark(
+ apply(X, MARGIN=2L, FUN=quantile, probs=0.975),
+ colQuantiles(X, probs=0.975),
+ min_iterations=10L
+ )
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled.
> stats
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
<bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl> <int> <dbl>
1 apply(X, MARGIN = 2L, FUN = quantile, probs = 0.975) 209.6ms 218.6ms 4.56 191MB 20.5 10 45
2 colQuantiles(X, probs = 0.975) 68.5ms 70.4ms 13.7 115MB 28.8 10 21
# … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>, time <list>, gc <list> This was achieved by avoid lots of large memory allocation. Comparing to #153 (comment), there large allocations are no longer done: > p1 <- profmem::profmem(y1 <- colQuantiles(X, probs=0.975))
> subset(p1, bytes > 50e3)
Rprofmem memory profiling of:
y1 <- colQuantiles(X, probs = 0.975)
Memory allocations:
what bytes calls
total 0 PS. There's still some room for improvements, but that's minor to what was achieved here. |
Thanks! Do you know when you plan to submit an updated package to CRAN? |
No guesstimate. matrixStats 0.55.0 was just released and it was a multi-day effort to check it across platforms and run all of the 244 reverse dependency check (I have access to computer cluster so that helped). So, releasing matrixStats is a bit of a time commitment. |
FYI, matrixStats 0.56.0 with this speedup is now on CRAN. |
Excellent! Thanks Henrik! |
Computing quantiles via apply() across columns (MARGIN = 2) appears to be faster than using colQuantiles(). How can this be? Since colQuantiles is written in pure C, it should be much faster than the pure R apply(), but it's actually the other way around...
library(matrixStats)
library(microbenchmark)
x <- array(1:(10000 * 1000), dim = c(10000, 1000, 1)) # example array
microbenchmark(apply(x, 2, function(x) quantile(x, 0.975)),
colQuantiles(drop(x), probs = 0.975)) # timing each function
Unit: milliseconds
expr min lq mean median
apply(x, 2, function(x) quantile(x, 0.975)) 252.7643 263.4260 278.0485 268.7132
colQuantiles(drop(x), probs = 0.975) 447.6436 514.2717 537.6342 526.2200
uq max neval
281.7958 356.3389 100
573.9136 615.9201 100
identical(apply(x, 2, function(x) quantile(x, 0.975)), colQuantiles(drop(x), probs = 0.975))
[1] TRUE
Can someone explain?
The text was updated successfully, but these errors were encountered: