Update call to RcppArmadillo_fastLm_impl #56
Conversation
- RcppArmadillo's underlying implementation of fastLm was renamed to fastLm_impl in RcppCore/RcppArmadillo#164 (mid-August 2017)
- Call the RcppAttributes-generated C++ function directly
@@ -37,7 +37,7 @@ exprs$QR <- expression(.Call("RcppEigen_fastLm_Impl", mm, y, 1L, PACKAGE="RcppEi
 exprs$LLt <- expression(.Call("RcppEigen_fastLm_Impl", mm, y, 3L, PACKAGE="RcppEigen"))

 if (suppressMessages(require("RcppArmadillo", character=TRUE, quietly=TRUE))) {
-    exprs$arma <- expression(.Call("RcppArmadillo_fastLm", mm, y, PACKAGE="RcppArmadillo"))
+    exprs$arma <- expression(.Call("_RcppArmadillo_fastLm_impl", mm, y, PACKAGE="RcppArmadillo"))
eddelbuettel
May 30, 2018
Member
Bigger question is why we even call this way by .Call() anymore ...
michaelweylandt
May 30, 2018
Author
Contributor
Minimizing non-C++ overhead?
Happy to rewrite everything to call Rcpp{Eigen,Armadillo,GSL}::fastLmPure if you'd rather that.
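For concreteness, that rewrite would swap the registered-symbol calls for the exported R wrappers, roughly like this (a sketch against the variables lmBenchmark.R already defines; the method code for RcppEigen is the one this script uses for LLt and should be checked against the package documentation):

# Same fits expressed through the exported wrappers rather than .Call();
# mm and y are the model matrix and response built earlier in the script.
exprs$LLt  <- expression(RcppEigen::fastLmPure(mm, y, 3L))
exprs$arma <- expression(RcppArmadillo::fastLmPure(mm, y))
exprs$GSL  <- expression(RcppGSL::fastLmPure(mm, y))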
eddelbuettel
May 30, 2018
Member
I was mostly just thinking out aloud but if you wouldn't mind running a quick check?
Now that we have registration on, there should be little overhead. And yes, I would take an enlarged PR ...
michaelweylandt
May 30, 2018
Author
Contributor
I took a quick look at this. The results from calling fastLmPure are basically the same, and might even be a bit faster than the current approach.
From playing around with it, the fastLmPure approach appears to be slightly noisier (since it does a bit more work), but the additional variability is on the same order of magnitude as the noise in the benchmarks themselves.
Unless it's important to remove all additional sources of noise from the benchmarks, I'd go with fastLmPure. If getting optimal benchmarks is important, using the higher-precision timers from microbenchmark or bench might be a good idea.
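If higher-resolution timing were wanted, a comparison along these lines would do it (a sketch only, not code from this PR; it assumes mm and y from lmBenchmark.R and that microbenchmark is installed):

# Time the two call styles for the LLt fit with sub-millisecond timers.
library(microbenchmark)
microbenchmark(
    LLt  = .Call("RcppEigen_fastLm_Impl", mm, y, 3L, PACKAGE = "RcppEigen"),
    LLt2 = RcppEigen::fastLmPure(mm, y, 3L),
    times = 100L
)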
Here are three runs (100 replicates per run) of just the two Cholesky approaches:
> source("lmBenchmark-short.R")
lm benchmark for n = 100000 and p = 40: nrep = 100
test relative elapsed user.self sys.self
2 LDLt2 1.000 1.683 1.683 0.000
4 LLt2 1.011 1.701 1.701 0.000
1 LDLt 1.014 1.706 1.706 0.000
3 LLt 1.017 1.712 1.708 0.004
> source("lmBenchmark-short.R")
lm benchmark for n = 100000 and p = 40: nrep = 100
test relative elapsed user.self sys.self
2 LDLt2 1.000 1.580 1.544 0.036
4 LLt2 1.001 1.582 1.546 0.036
3 LLt 1.005 1.588 1.553 0.035
1 LDLt 1.007 1.591 1.560 0.032
> source("lmBenchmark-short.R")
lm benchmark for n = 100000 and p = 40: nrep = 100
test relative elapsed user.self sys.self
3 LLt 1.000 1.710 1.643 0.066
1 LDLt 1.019 1.742 1.738 0.004
2 LDLt2 1.037 1.774 1.773 0.000
4 LLt2 1.039 1.777 1.778 0.000
The "2" versions are the ones calling the appropriate fastLmPure instead of .Call directly.
Here are the results of three runs on all methods:
> source("lmBenchmark-mod.R")
lm benchmark for n = 100000 and p = 40: nrep = 20
test relative elapsed user.self sys.self
4 LDLt 1.000 0.325 0.324 0.000
15 LLt2 1.006 0.327 0.327 0.000
14 LLt 1.012 0.329 0.329 0.000
5 LDLt2 1.015 0.330 0.331 0.000
11 SymmEig2 2.222 0.722 0.723 0.000
10 SymmEig 2.255 0.733 0.723 0.009
13 QR2 7.163 2.328 2.328 0.000
12 QR 7.172 2.331 2.331 0.000
2 PivQR 7.455 2.423 2.413 0.011
3 PivQR2 7.458 2.424 2.419 0.005
1 lm.fit 14.597 4.744 4.641 0.102
17 arma2 17.588 5.716 5.716 0.000
16 arma 17.649 5.736 5.735 0.001
6 GESDD 31.440 10.218 9.974 0.243
7 GESDD2 31.658 10.289 10.034 0.254
8 SVD 54.985 17.870 17.356 0.511
9 SVD2 55.018 17.881 17.366 0.513
19 GSL2 108.926 35.401 35.273 0.124
18 GSL 111.711 36.306 36.186 0.117
> source("lmBenchmark-mod.R")
lm benchmark for n = 100000 and p = 40: nrep = 20
test relative elapsed user.self sys.self
15 LLt2 1.000 0.312 0.311 0.001
14 LLt 1.006 0.314 0.313 0.000
5 LDLt2 1.026 0.320 0.320 0.000
4 LDLt 1.048 0.327 0.327 0.000
11 SymmEig2 2.237 0.698 0.696 0.002
10 SymmEig 2.263 0.706 0.697 0.009
13 QR2 7.288 2.274 2.272 0.001
12 QR 7.304 2.279 2.276 0.002
2 PivQR 7.667 2.392 2.392 0.000
3 PivQR2 7.679 2.396 2.396 0.000
1 lm.fit 14.019 4.374 4.368 0.006
17 arma2 16.888 5.269 5.267 0.002
16 arma 17.119 5.341 5.338 0.003
7 GESDD2 30.394 9.483 9.448 0.034
6 GESDD 30.532 9.526 9.491 0.034
8 SVD 56.388 17.593 17.309 0.283
9 SVD2 59.721 18.633 18.320 0.312
19 GSL2 110.465 34.465 34.346 0.117
18 GSL 111.256 34.712 34.588 0.120
> source("lmBenchmark-mod.R")
lm benchmark for n = 100000 and p = 40: nrep = 20
test relative elapsed user.self sys.self
4 LDLt 1.000 0.308 0.306 0.002
5 LDLt2 1.010 0.311 0.311 0.000
14 LLt 1.019 0.314 0.314 0.000
15 LLt2 1.036 0.319 0.319 0.000
10 SymmEig 2.221 0.684 0.683 0.001
11 SymmEig2 2.237 0.689 0.688 0.000
12 QR 7.292 2.246 2.246 0.000
13 QR2 7.315 2.253 2.253 0.000
2 PivQR 7.500 2.310 2.307 0.002
3 PivQR2 7.506 2.312 2.311 0.001
1 lm.fit 14.221 4.380 4.352 0.028
16 arma 17.237 5.309 5.308 0.001
17 arma2 17.354 5.345 5.344 0.000
6 GESDD 30.114 9.275 9.268 0.007
7 GESDD2 30.253 9.318 9.315 0.002
8 SVD 55.571 17.116 16.872 0.242
9 SVD2 57.045 17.570 17.320 0.249
18 GSL 111.227 34.258 34.139 0.115
19 GSL2 114.195 35.172 35.049 0.120
Pushed a new version which i) uses microbenchmark and ii) calls the fastLmPure wrappers directly. Here's example output:
- Use `microbenchmark` instead of `rbenchmark`; report relative and absolute timings
- Use `fastLmPure` functions instead of calling C++ directly
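The relative and absolute columns could be derived from microbenchmark's summary roughly as follows (a sketch of the idea, not the exact code in the revised script; `mb` stands for a microbenchmark result):

# Summarise a microbenchmark result and add a relative column
# (each expression's median time divided by the fastest median).
res <- summary(mb)                      # columns include expr, median, neval, ...
res$relative <- res$median / min(res$median)
res[order(res$relative), c("expr", "relative", "median")]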
@@ -5,7 +5,7 @@
 ## This file is part of RcppEigen.

 require("stats", character=TRUE, quietly=TRUE)
-require("rbenchmark", character=TRUE, quietly=TRUE)
+require("microbenchmark", character=TRUE, quietly=TRUE)
eddelbuettel
May 30, 2018
Member
We need to add it to Suggests, and we cannot "just" require it. Please make the whole block below conditional on microbenchmark being present.
michaelweylandt
May 30, 2018
Author
Contributor
Are you sure? rbenchmark wasn't require()'d previously and the below scripts weren't conditional. Even though this is in inst/examples, R CMD check doesn't register it as an "example" (in the sense of getting checked automatically) so this wasn't a problem. I'm happy to add it, just want to make sure you're ok with adding another soft-dependency first.
eddelbuettel
May 30, 2018
Member
I am not; I just noticed it was not in DESCRIPTION but then again inst/examples/ may not get parsed.
But I also used to use this wrong code idiom of just require(), which does NOT reliably fail / skip in the absence of the tested package. I have since changed this in some places.
So with that, I would still like you to condition on microbenchmark being present as it is a conditional dependency via Suggests. See what we do here in tests/doRUnit.R:
if(require("RUnit", quietly = TRUE)) {
# ...
That is the only correct way for Suggests.
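Applied to the benchmark script, that pattern would look roughly like this (a sketch of the structure, not the exact diff in the PR):

# Only run the benchmark when the suggested package is available,
# so the script degrades gracefully on systems without microbenchmark.
if (require("microbenchmark", quietly = TRUE)) {
    ## ... build the expression list and run the benchmark here ...
}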
michaelweylandt
May 30, 2018
Author
Contributor
Ok, done.
I left RcppArmadillo and RcppGSL out of "Suggests" unless you want them included also.
Thank you! Almost there -- the
Now you must add
- Move benchmark script into an `if (require("microbenchmark"))` block so the script works even if microbenchmark is not installed
- Add microbenchmark to `Suggests`
- Add r-cran-microbenchmark to .travis.yml
Sorry about that - I didn't get a build-failed email from Travis. Done now -- Travis passes with a
Yes, they are constants. In the setup I prefer at Travis, vignettes are skipped, and you get an idea about directory sizes.
The RcppEigen benchmark script calls a C++ function which is no longer provided by RcppArmadillo (since mid-August 2017). This updates the benchmark script to call the RcppAttributes-generated entry point `_RcppArmadillo_fastLm_impl` instead.