# Accessing and extending the timings

This repository is an `R` package that can be installed using

In [1]:
devtools::install_github("Stat990-033/Timings")

Downloading github repo Stat990-033/Timings@master
Installing Timings
Skipping 1 packages ahead of CRAN: lme4
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD  \
  INSTALL '/tmp/Rtmp5T7A6q/devtools64a3e2641c8/Stat990-033-Timings-2958748'  \
  --library='/home/bates/R/x86_64-unknown-linux-gnu-library/3.2'  \
  --install-tests 



The timing data are stored as [JSON](http://json.org) (JavaScript Object Notation) files that can be read into `R` using the [jsonlite](http://cran.rstudio.com/web/packages/jsonlite/index.html) package and into [Julia](https://julialang.org) using the [JSON](https://github.com/JuliaLang/JSON.jl) package.  In `R` the `.json` files are in the package's `JSON` directory, whose contents can be listed with

In [2]:
(flist <- list.files(system.file("JSON",package="Timings")))

The `extractor` function in this package returns a summary of the model fits as a list of data frames, one per model.  The rows of each data frame are sorted first by the minimum deviance achieved by the optimizer (rounded to 2 decimal places, because we assume that differences of less than 0.01 in the deviance are negligible) and then by elapsed time within each deviance value.

In [5]:
library(Timings)
bs10times <- 
  extractor(system.file("JSON","BS10.json",package="Timings"))
bs10times[[1]]

| | dev | optimizer | func | time | feval |
|---|---|---|---|---|---|
| 20 | 1030.96 | LD_LBFGS | lmm | 0.4845596 | 25.0 |
| 26 | 1030.96 | LD_VAR2 | lmm | 0.5328337 | 28.0 |
| 25 | 1030.96 | LD_VAR1 | lmm | 0.5457067 | 28.0 |
| 24 | 1030.96 | LD_TNEWTON_PRECOND_RESTART | lmm | 0.8387316 | 44.0 |
| 23 | 1030.96 | LD_TNEWTON_RESTART | lmm | 0.8753519 | 46.0 |
| 22 | 1030.96 | LD_TNEWTON_PRECOND | lmm | 0.9104886 | 48.0 |
| 21 | 1030.96 | LD_TNEWTON | lmm | 1.426317 | 76.0 |
| 19 | 1030.96 | LD_SLSQP | lmm | 1.878597 | 181.0 |
| 12 | 1030.96 | LN_BOBYQA | lmm | 1.917175 | 289.0 |
| 17 | 1030.96 | LD_MMA | lmm | 2.200292 | 56.0 |
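
The row ordering described for `extractor` can be reproduced in base `R`. The following is only a minimal sketch of that ordering rule, not the package's own code; the data frame `fits`, its optimizer labels, and its values are made up for illustration.

```r
# Toy summary with illustrative values (not taken from the package's JSON files)
fits <- data.frame(
  optimizer = c("opt_a", "opt_b", "opt_c", "opt_d"),
  dev       = c(1030.9612, 1030.9649, 1031.2501, 1030.9583),
  time      = c(0.90, 0.48, 0.30, 1.43),
  stringsAsFactors = FALSE
)

# Round the deviance to 2 decimal places so that values agreeing
# to that precision are treated as ties
fits$dev <- round(fits$dev, 2)

# Sort by rounded deviance first, then by elapsed time within each deviance value
sorted <- fits[order(fits$dev, fits$time), ]
print(sorted)
```

Here `opt_c` is the fastest fit but lands last because its deviance is larger; the three fits that tie on deviance are ranked by time.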


The `func` column indicates whether the _Julia_ function `lmm` or the _R_ function `lmer` was used to fit the model.  The optimizer names for `lmm` are those from the [NLopt](http://github.com/JuliaOpt/NLopt.jl) package.  Names beginning with `LD_` are local, derivative-based optimizers.  Those beginning with `LN_` are local, derivative-free optimizers.  For `lmer` the names beginning with `NLOPT_LN_` are the same derivative-free optimizers, accessed through the [nloptr](http://cran.rstudio.com/web/packages/nloptr/index.html) package.  The _Julia_ function uses stricter convergence criteria, which is why it requires more function evaluations than the corresponding optimization in _R_.

The default optimizers in `lmer`, `bobyqa` and `Nelder_Mead`, failed to converge to the global optimum on this example, as did `LN_PRAXIS` in _Julia_ and in _R_.  The `LN_NELDERMEAD` optimizer also failed to converge to the global optimum, but its deviance was closer to the best value achieved by other optimizers.  Interestingly, `LN_BOBYQA` converged whereas Powell's original implementation, available as `bobyqa` or `optimx:bobyqa`, did not converge to the global optimum.  The `Nelder_Mead` optimizer was terrible (I can say that because I wrote that implementation).
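
The naming convention can be turned into a small lookup when summarizing results by optimizer type. This is a sketch only: `classify_optimizer` is a hypothetical helper, not a function in the `Timings` package, and it covers only the prefixes mentioned above.

```r
# Hypothetical helper: classify an optimizer name by its prefix
classify_optimizer <- function(name) {
  # Names reported for lmer via nloptr carry an extra "NLOPT_" prefix; drop it
  core <- sub("^NLOPT_", "", name)
  ifelse(grepl("^LD_", core), "local, derivative-based",
  ifelse(grepl("^LN_", core), "local, derivative-free",
         "other"))
}

kinds <- classify_optimizer(c("LD_LBFGS", "LN_BOBYQA", "NLOPT_LN_BOBYQA", "bobyqa"))
print(kinds)
```

Names without an NLopt-style prefix, such as `bobyqa` or `Nelder_Mead`, fall into the `"other"` group.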

The first model fit to these data is a "maximal" model in the sense of Barr et al. (2012) and, like most such models, is overparameterized.  The second is a "zero correlation parameter" model as described by Kliegl.  Even though it converges to a value on the boundary (some of the variance components are estimated as zero), it is much less problematic.

In [7]:
bs10times[[2]]

| | dev | optimizer | func | time | feval |
|---|---|---|---|---|---|
| 12 | 1080.08 | LN_BOBYQA | lmm | 0.2159726 | 124.0 |
| 16 | 1080.08 | LN_SBPLX | lmm | 0.2508537 | 340.0 |
| 20 | 1080.08 | LD_LBFGS | lmm | 0.2816979 | 17.0 |
| 25 | 1080.08 | LD_VAR1 | lmm | 0.2966682 | 18.0 |
| 26 | 1080.08 | LD_VAR2 | lmm | 0.3037488 | 18.0 |
| 17 | 1080.08 | LD_MMA | lmm | 0.3877286 | 23.0 |
| 14 | 1080.08 | LN_PRAXIS | lmm | 0.5777748 | 785.0 |
| 3 | 1080.08 | NLOPT_LN_BOBYQA | lmer | 0.784 | 110.0 |
| 13 | 1080.08 | LN_COBYLA | lmm | 0.8825582 | 1205.0 |
| 1 | 1080.08 | bobyqa | lmer | 1.146 | 230.0 |


## Extending the timings

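To re-run the timings on your own machine, call `retime` with the path to the JSON file.  For comparison, the stored timings for the second `InstEval` model, before re-running, are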

In [8]:
extractor(system.file("JSON","InstEval.json",package="Timings"))[[2]]

| | dev | optimizer | func | time | feval |
|---|---|---|---|---|---|
| 12 | 237721.8 | LN_BOBYQA | lmm | 4.877744 | 115.0 |
| 14 | 237721.8 | LN_PRAXIS | lmm | 5.469709 | 132.0 |
| 16 | 237721.8 | LN_SBPLX | lmm | 7.514635 | 194.0 |
| 15 | 237721.8 | LN_NELDERMEAD | lmm | 8.367049 | 216.0 |
| 1 | 237721.8 | bobyqa | lmer | 12.611 | 59.0 |
| 3 | 237721.8 | NLOPT_LN_BOBYQA | lmer | 17.607 | 79.0 |
| 5 | 237721.8 | NLOPT_LN_PRAXIS | lmer | 18.767 | 87.0 |
| 6 | 237721.8 | NLOPT_LN_NELDERMEAD | lmer | 22.265 | 116.0 |
| 2 | 237721.8 | Nelder_Mead | lmer | 23.638 | 123.0 |
| 8 | 237721.8 | optimx:L-BFGS-B | lmer | 26.95 | 26.0 |


In [9]:
retime(system.file("JSON","InstEval.json",package="Timings"))
extractor(system.file("JSON","InstEval.json",package="Timings"))

In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 0.00359818 (tol = 0.002, component 1)

| | dev | optimizer | func | time | feval |
|---|---|---|---|---|---|
| 13 | 237585.5 | LN_COBYLA | lmm | 3.561462 | 56.0 |
| 14 | 237585.5 | LN_PRAXIS | lmm | 3.966381 | 80.0 |
| 15 | 237585.5 | LN_NELDERMEAD | lmm | 4.364728 | 89.0 |
| 12 | 237585.5 | LN_BOBYQA | lmm | 4.459778 | 47.0 |
| 16 | 237585.5 | LN_SBPLX | lmm | 6.35131 | 136.0 |
| 1 | 237585.5 | bobyqa | lmer | 10.533 | 25.0 |
| 3 | 237585.5 | NLOPT_LN_BOBYQA | lmer | 12.689 | 33.0 |
| 4 | 237585.5 | NLOPT_LN_COBYLA | lmer | 13.604 | 37.0 |
| 2 | 237585.5 | Nelder_Mead | lmer | 22.831 | 77.0 |
| 11 | 237585.5 | optimx:bobyqa | lmer | 23.015 | |

| | dev | optimizer | func | time | feval |
|---|---|---|---|---|---|
| 12 | 237721.8 | LN_BOBYQA | lmm | 4.877744 | 115.0 |
| 14 | 237721.8 | LN_PRAXIS | lmm | 5.469709 | 132.0 |
| 16 | 237721.8 | LN_SBPLX | lmm | 7.514635 | 194.0 |
| 15 | 237721.8 | LN_NELDERMEAD | lmm | 8.367049 | 216.0 |
| 1 | 237721.8 | bobyqa | lmer | 20.201 | 61.0 |
| 5 | 237721.8 | NLOPT_LN_PRAXIS | lmer | 22.262 | 84.0 |
| 3 | 237721.8 | NLOPT_LN_BOBYQA | lmer | 22.577 | 79.0 |
| 6 | 237721.8 | NLOPT_LN_NELDERMEAD | lmer | 28.244 | 116.0 |
| 2 | 237721.8 | Nelder_Mead | lmer | 29.552 | 123.0 |
| 7 | 237721.8 | NLOPT_LN_SBPLX | lmer | 37.536 | 157.0 |
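
To see how much the local machine differs from the stored results, the two sets of `lmer` times for the second `InstEval` model can be lined up by optimizer. The sketch below copies five `lmer` rows from the output before and after `retime`, assuming the earlier table holds the stored timings; the data frame and column names are illustrative, not part of the package.

```r
# lmer times (seconds) copied from the output shown before calling retime
stored <- data.frame(
  optimizer = c("bobyqa", "NLOPT_LN_BOBYQA", "NLOPT_LN_PRAXIS",
                "NLOPT_LN_NELDERMEAD", "Nelder_Mead"),
  time = c(12.611, 17.607, 18.767, 22.265, 23.638),
  stringsAsFactors = FALSE
)

# The same optimizers, copied from the output shown after calling retime
local <- data.frame(
  optimizer = c("bobyqa", "NLOPT_LN_PRAXIS", "NLOPT_LN_BOBYQA",
                "NLOPT_LN_NELDERMEAD", "Nelder_Mead"),
  time = c(20.201, 22.262, 22.577, 28.244, 29.552),
  stringsAsFactors = FALSE
)

# Match rows by optimizer name and compute the local/stored time ratio
cmp <- merge(stored, local, by = "optimizer", suffixes = c("_stored", "_local"))
cmp$ratio <- cmp$time_local / cmp$time_stored
print(cmp[order(cmp$ratio), ])
```

A ratio above 1 means the local run was slower than the stored one for that optimizer.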


To run the `lmm` timings, install `julia`, version 0.3.8 or later, and add the `MixedModels` package.  The file `retime.jl` in the package's `julia` directory can be displayed with

In [None]:
file.show(system.file("julia","retime.jl",package="Timings"),
          pager="cat",title="")

### Adding another data set/model combination

As it is currently set up, the `retime` function tries to access the data frame in the `Timings` package itself.  If you want to do timings on data that you can release, create a GitHub pull request to add the data to the `Timings` package.  Also add a JSON file for the data set with the models that are to be fit.  The easiest way to do this is to copy another JSON file to the new name and edit the `dsname` and `formula` entries.  You don't need to take out the existing timings because they will be overwritten when you run `retime`.
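
The copy-and-edit step can also be scripted with `jsonlite`. The sketch below works on a stand-in file in a temporary directory. Only the `dsname` and `formula` entry names come from the description above; the `models` key, the nesting, and the data set and formula values are assumptions made for illustration, so check an actual file from the package's `JSON` directory before adapting it.

```r
library(jsonlite)

# Stand-in for an existing timing file; the real files may be laid out differently
old_path <- file.path(tempdir(), "OldData.json")
write_json(
  list(dsname = "OldData",
       models = list(list(formula = "y ~ 1 + (1|g)"))),  # hypothetical layout
  old_path, auto_unbox = TRUE, pretty = TRUE
)

# Copy the existing file to the new name
new_path <- file.path(tempdir(), "NewData.json")
file.copy(old_path, new_path, overwrite = TRUE)

# Read the copy as nested lists, edit the two entries, and write it back
spec <- read_json(new_path)
spec$dsname <- "NewData"
spec$models[[1]]$formula <- "resp ~ 1 + x + (1 + x | subj)"
write_json(spec, new_path, auto_unbox = TRUE, pretty = TRUE)
```

Editing the copy by hand in a text editor achieves the same result; scripting is mainly useful when several files have to be created.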

For proprietary or confidential data either add the data set to the package locally (by cloning a copy of the repository and running `R CMD build` and `R CMD INSTALL` after modifying the repository) or submit a pull request to make the `retime` function more flexible.