Update call to RcppArmadillo_fastLm_impl #56

Merged
merged 3 commits into RcppCore:master from michaelweylandt:mw/update_armadillo_call_in_lmBenchmark on May 31, 2018

Conversation

@michaelweylandt
Contributor

michaelweylandt commented May 30, 2018

The RcppEigen benchmark script calls a C++ function that RcppArmadillo no longer provides (removed in mid-August 2017). This updates the benchmark script to call the RcppAttributes-generated entry point _RcppArmadillo_fastLm_impl instead.
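
As a quick way to confirm the rename, one can list the .Call entry points RcppArmadillo actually registers; a minimal sketch, assuming RcppArmadillo is installed and loaded:

library(RcppArmadillo)
## names registered for use with .Call(); the attributes-generated entry
## point appears as "_RcppArmadillo_fastLm_impl", while the old
## "RcppArmadillo_fastLm" symbol is no longer listed
names(getDLLRegisteredRoutines("RcppArmadillo")$.Call)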

$ R
The following objects are masked from package:base:

    F, pi, T

> # Before
> source("https://raw.githubusercontent.com/RcppCore/RcppEigen/master/inst/examples/lmBenchmark.R")
lm benchmark for n = 100000 and p = 40: nrep = 20
Error in .Call("RcppArmadillo_fastLm", mm, y, PACKAGE = "RcppArmadillo") :
  "RcppArmadillo_fastLm" not available for .Call() for package "RcppArmadillo"
Timing stopped at: 0.002 0 0.001

> # After
> source("https://raw.githubusercontent.com/michaelweylandt/RcppEigen/mw/update_armadillo_call_in_lmBenchmark/inst/examples/lmBenchmark.R")
lm benchmark for n = 100000 and p = 40: nrep = 20
      test relative elapsed user.self sys.self
3     LDLt    1.000   0.321     0.320    0.000
8      LLt    1.012   0.325     0.325    0.000
6  SymmEig    2.246   0.721     0.715    0.006
7       QR    7.255   2.329     2.329    0.000
2    PivQR    7.695   2.470     2.463    0.000
1   lm.fit   13.838   4.442     4.434    0.007
9     arma   18.209   5.845     5.843    0.000
4    GESDD   31.931  10.250    10.221    0.023
5      SVD   56.106  18.010    17.743    0.266
10     GSL  115.374  37.035    36.920    0.112
Update call to RcppArmadillo_fastLm_impl
- RcppArmadillo's underlying implementation of
  fastLm was renamed to fastLm_impl in
  RcppCore/RcppArmadillo#164 (mid-August 2017)
- Call the RcppAttributes generated C++
  function directly
@@ -37,7 +37,7 @@ exprs$QR <- expression(.Call("RcppEigen_fastLm_Impl", mm, y, 1L, PACKAGE="RcppEi
 exprs$LLt <- expression(.Call("RcppEigen_fastLm_Impl", mm, y, 3L, PACKAGE="RcppEigen"))
 if (suppressMessages(require("RcppArmadillo", character=TRUE, quietly=TRUE))) {
-exprs$arma <- expression(.Call("RcppArmadillo_fastLm", mm, y, PACKAGE="RcppArmadillo"))
+exprs$arma <- expression(.Call("_RcppArmadillo_fastLm_impl", mm, y, PACKAGE="RcppArmadillo"))

@eddelbuettel

eddelbuettel May 30, 2018

Member

Bigger question is why we even call it this way via .Call() anymore ...

@michaelweylandt

michaelweylandt May 30, 2018

Contributor

Minimizing non-C++ overhead?

Happy to rewrite everything to call Rcpp{Eigen,Armadillo,GSL}::fastLmPure if you'd rather that.
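
For concreteness, a sketch of what such a rewrite could look like, reusing the integer method codes already in the script (the "*2" names are placeholders, and RcppArmadillo's fastLmPure takes no method argument):

## hypothetical fastLmPure-based counterparts of the existing .Call() lines
exprs$QR2   <- expression(RcppEigen::fastLmPure(mm, y, 1L))
exprs$LLt2  <- expression(RcppEigen::fastLmPure(mm, y, 3L))
exprs$arma2 <- expression(RcppArmadillo::fastLmPure(mm, y))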

@eddelbuettel

eddelbuettel May 30, 2018

Member

I was mostly just thinking out loud, but would you mind running a quick check?

Now that we have registration on, there should be little overhead. And yes, I would take an enlarged PR ...

@michaelweylandt

michaelweylandt May 30, 2018

Contributor

I took a quick look at this. The results from calling fastLmPure are basically the same, and it might even be a bit faster than the current approach.

From playing around with it, the fastLmPure approach appears to be a bit more variable (since it does a bit more work), but the additional noise is on the same order of magnitude as the noise in the benchmarks themselves.

Unless it's important to remove all additional sources of noise from the benchmarks, I'd go with fastLmPure. If getting optimal benchmark numbers is important, using the higher-precision timers from microbenchmark or bench might be a good idea.

Here are three runs (100 replicates per run) of just the two Cholesky approaches:

> source("lmBenchmark-short.R")
lm benchmark for n = 100000 and p = 40: nrep = 100
   test relative elapsed user.self sys.self
2 LDLt2    1.000   1.683     1.683    0.000
4  LLt2    1.011   1.701     1.701    0.000
1  LDLt    1.014   1.706     1.706    0.000
3   LLt    1.017   1.712     1.708    0.004
> source("lmBenchmark-short.R")
lm benchmark for n = 100000 and p = 40: nrep = 100
   test relative elapsed user.self sys.self
2 LDLt2    1.000   1.580     1.544    0.036
4  LLt2    1.001   1.582     1.546    0.036
3   LLt    1.005   1.588     1.553    0.035
1  LDLt    1.007   1.591     1.560    0.032
> source("lmBenchmark-short.R")
lm benchmark for n = 100000 and p = 40: nrep = 100
   test relative elapsed user.self sys.self
3   LLt    1.000   1.710     1.643    0.066
1  LDLt    1.019   1.742     1.738    0.004
2 LDLt2    1.037   1.774     1.773    0.000
4  LLt2    1.039   1.777     1.778    0.000 

The "2" versions are the ones calling the appropriate fastLmPure instead of .Call directly.

Here are the results of three runs on all methods:

> source("lmBenchmark-mod.R")
lm benchmark for n = 100000 and p = 40: nrep = 20
       test relative elapsed user.self sys.self
4      LDLt    1.000   0.325     0.324    0.000
15     LLt2    1.006   0.327     0.327    0.000
14      LLt    1.012   0.329     0.329    0.000
5     LDLt2    1.015   0.330     0.331    0.000
11 SymmEig2    2.222   0.722     0.723    0.000
10  SymmEig    2.255   0.733     0.723    0.009
13      QR2    7.163   2.328     2.328    0.000
12       QR    7.172   2.331     2.331    0.000
2     PivQR    7.455   2.423     2.413    0.011
3    PivQR2    7.458   2.424     2.419    0.005
1    lm.fit   14.597   4.744     4.641    0.102
17    arma2   17.588   5.716     5.716    0.000
16     arma   17.649   5.736     5.735    0.001
6     GESDD   31.440  10.218     9.974    0.243
7    GESDD2   31.658  10.289    10.034    0.254
8       SVD   54.985  17.870    17.356    0.511
9      SVD2   55.018  17.881    17.366    0.513
19     GSL2  108.926  35.401    35.273    0.124
18      GSL  111.711  36.306    36.186    0.117
> source("lmBenchmark-mod.R")
lm benchmark for n = 100000 and p = 40: nrep = 20
       test relative elapsed user.self sys.self
15     LLt2    1.000   0.312     0.311    0.001
14      LLt    1.006   0.314     0.313    0.000
5     LDLt2    1.026   0.320     0.320    0.000
4      LDLt    1.048   0.327     0.327    0.000
11 SymmEig2    2.237   0.698     0.696    0.002
10  SymmEig    2.263   0.706     0.697    0.009
13      QR2    7.288   2.274     2.272    0.001
12       QR    7.304   2.279     2.276    0.002
2     PivQR    7.667   2.392     2.392    0.000
3    PivQR2    7.679   2.396     2.396    0.000
1    lm.fit   14.019   4.374     4.368    0.006
17    arma2   16.888   5.269     5.267    0.002
16     arma   17.119   5.341     5.338    0.003
7    GESDD2   30.394   9.483     9.448    0.034
6     GESDD   30.532   9.526     9.491    0.034
8       SVD   56.388  17.593    17.309    0.283
9      SVD2   59.721  18.633    18.320    0.312
19     GSL2  110.465  34.465    34.346    0.117
18      GSL  111.256  34.712    34.588    0.120
> source("lmBenchmark-mod.R")
lm benchmark for n = 100000 and p = 40: nrep = 20
       test relative elapsed user.self sys.self
4      LDLt    1.000   0.308     0.306    0.002
5     LDLt2    1.010   0.311     0.311    0.000
14      LLt    1.019   0.314     0.314    0.000
15     LLt2    1.036   0.319     0.319    0.000
10  SymmEig    2.221   0.684     0.683    0.001
11 SymmEig2    2.237   0.689     0.688    0.000
12       QR    7.292   2.246     2.246    0.000
13      QR2    7.315   2.253     2.253    0.000
2     PivQR    7.500   2.310     2.307    0.002
3    PivQR2    7.506   2.312     2.311    0.001
1    lm.fit   14.221   4.380     4.352    0.028
16     arma   17.237   5.309     5.308    0.001
17    arma2   17.354   5.345     5.344    0.000
6     GESDD   30.114   9.275     9.268    0.007
7    GESDD2   30.253   9.318     9.315    0.002
8       SVD   55.571  17.116    16.872    0.242
9      SVD2   57.045  17.570    17.320    0.249
18      GSL  111.227  34.258    34.139    0.115
19     GSL2  114.195  35.172    35.049    0.120
@michaelweylandt

Contributor

michaelweylandt commented May 30, 2018

Pushed a new version which (i) uses the fastLmPure functions and (ii) switches to microbenchmark.

Here's example output:

> source("lmBenchmark.R")
lm benchmark for n = 100000 and p = 40: nrep = 20
    Method   Relative Elapsed (milliseconds)
4     LDLt   1.000000               16.59547
5      LLt   1.001358               16.61801
10 SymmEig   2.177705               36.14004
8       QR   7.230222              119.98892
7    PivQR   7.772614              128.99018
6   lm.fit  13.522861              224.41823
1     arma  17.717057              294.02287
2    GESDD  31.514008              522.98975
9      SVD  56.295458              934.24953
3      GSL 112.837376             1872.58917
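
For reference, a minimal sketch of how the Relative/Elapsed columns can be derived from microbenchmark results, assuming `exprs` holds the benchmark expressions and `nrep` the replicate count (the actual script may differ line for line):

library(microbenchmark)
mb  <- microbenchmark(list=exprs, times=nrep)
res <- summary(mb, unit="ms")          # median timings in milliseconds
out <- data.frame(Method=res$expr,
                  Relative=res$median / min(res$median),
                  Elapsed=res$median)
print(out[order(out$Relative), ])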
Update benchmark script
- Use `microbenchmark` instead of `rbenchmark`

  Report relative and absolute timings

- Use `fastLmPure` functions instead of calling
  C++ directly
@@ -5,7 +5,7 @@
 ## This file is part of RcppEigen.
 require("stats", character=TRUE, quietly=TRUE)
-require("rbenchmark", character=TRUE, quietly=TRUE)
+require("microbenchmark", character=TRUE, quietly=TRUE)

@eddelbuettel

eddelbuettel May 30, 2018

Member

We need to add it to Suggests, and we cannot "just" require it. Please make the whole block below conditional on microbenchmark being present.

@michaelweylandt

michaelweylandt May 30, 2018

Contributor

Are you sure? rbenchmark wasn't require()'d previously, and the scripts below it weren't conditional. Even though this is in inst/examples, R CMD check doesn't register it as an "example" (in the sense of being checked automatically), so this wasn't a problem. I'm happy to add it; I just want to make sure you're ok with adding another soft dependency first.

@eddelbuettel

eddelbuettel May 30, 2018

Member

I am not; I just noticed it was not in DESCRIPTION, but then again inst/examples/ may not get parsed.

But I also used to use the wrong idiom of a bare require(), which does NOT reliably fail or skip in the absence of the tested package. I have since changed this in some places.

So with that, I would still like you to condition on microbenchmark being present, as it is an optional dependency via Suggests. See what we do here in tests/doRUnit.R:

if(require("RUnit", quietly = TRUE)) {
    # ...

That is the only correct way for Suggests.

@michaelweylandt

michaelweylandt May 30, 2018

Contributor

Ok, done.

I left RcppArmadillo and RcppGSL out of "Suggests"; let me know if you want them included as well.

@eddelbuettel

Member

eddelbuettel commented May 30, 2018

Thank you! Almost there -- the use of microbenchmark requires a bit more care. Maybe tarball it at your end and run R CMD check --as-cran on it for good measure.

@eddelbuettel

Member

eddelbuettel commented May 31, 2018

Now you must add r-cran-microbenchmark here to appease Travis.

Condition benchmarking on microbenchmark package
- Move the benchmark script into an
  `if(require("microbenchmark"))` block
  so the script works even if microbenchmark
  is not installed
- Add microbenchmark to `Suggests`
- Add r-cran-microbenchmark to .travis.yml
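
Following the doRUnit.R idiom quoted above, the script ends up structured roughly like this (a sketch, not the exact diff):

if (require("microbenchmark", character=TRUE, quietly=TRUE)) {
    ## build the `exprs` list and run the benchmark as before
    ## ...
} else {
    cat("microbenchmark not available -- skipping benchmark\n")
}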
@michaelweylandt

Contributor

michaelweylandt commented May 31, 2018

Sorry about that - I didn't get a build-failed email from Travis.

Done now -- Travis passes with a WARNING about missing vignette source and a NOTE about large directories. Both appear to be unrelated to this change.

@eddelbuettel

Member

eddelbuettel commented May 31, 2018

Yes, those are constant. In the setup I prefer on Travis, vignettes are skipped, and you get an idea of the directory sizes.

@eddelbuettel eddelbuettel merged commit 19ce275 into RcppCore:master May 31, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@michaelweylandt michaelweylandt deleted the michaelweylandt:mw/update_armadillo_call_in_lmBenchmark branch Jun 7, 2018
