
Inclusion of PRIMA solvers in lme4 #744

Open
zaikunzhang opened this issue Sep 20, 2023 · 19 comments


zaikunzhang commented Sep 20, 2023

Dear lme4 maintainers,

This is Dr. Zaikun Zhang from The Hong Kong Polytechnic University. Together with Professor N.I.M. Gould, I am responsible for maintaining the renowned derivative-free optimization solvers of the late Professor M.J.D. Powell, namely COBYLA, UOBYQA, NEWUOA, BOBYQA, and LINCOA. I am the author of PRIMA, which provides the reference implementation for these solvers. They are widely used by engineers and scientists. For instance, see Section 1 of a recent paper on Powell's solvers, as well as Google searches for COBYLA and BOBYQA.

I note that lme4 provides COBYLA and BOBYQA. However, they are based on the old Fortran 77 implementation of these solvers, which is no longer maintained. Even though the original Fortran 77 implementation is truly a masterpiece, it contains many bugs (mostly due to the language itself), which can lead to segmentation faults or infinite loops; for example, see Section 4.4 of the above paper and many GitHub issues. Using the Fortran 77 version of these solvers is strongly discouraged.

That said, it might be desirable to include the PRIMA implementation of these solvers in lme4 via an R interface, in order to make these renowned solvers available to the R community. I will be happy to assist on the Fortran side if you would like to do so.

Thanks and regards,
Zaikun ZHANG
Ph.D. and Assistant Professor
Dept. App. Math., Hong Kong Polytechnic University

bbolker (Member) commented Sep 21, 2023

This is a really interesting idea. Quick question: are the versions of these solvers that are available through the NLopt library also based on old code, or do they include your updates?

zaikunzhang (Author) commented:

Hi @bbolker,

Thank you for the response. The NLopt library is based on a C translation (by f2c) of the original Fortran 77 implementation, i.e., the old code by Professor Powell.

Thanks,
Zaikun

bbolker (Member) commented Sep 21, 2023

Thanks. I definitely like the idea of wrapping the PRIMA functions to make them available to the R community, although I will say in advance that it won't necessarily be easy (e.g., interfacing R objective functions with the Fortran code, making sure that the Fortran code is compliant with all the standards insisted on by the CRAN (R package archive) maintainers, etc.).

zaikunzhang (Author) commented Sep 21, 2023

I fully understand and agree. The objective of PRIMA is to make these powerful solvers available to everyone in their favorite languages. As you mentioned, this objective is nontrivial, but I will keep working toward it. Since I cannot take care of everything myself, I will also try to communicate with major players and leaders in the respective communities (R, Python, Julia, ...) like you, to see whether there is interest on your side. You are the experts anyway.

BTW, I personally do not think the old Fortran 77 code or its machine-generated C translation is in better shape than the modernized PRIMA implementation. If the former can meet the standards, then the latter likely can as well. Of course, I am biased :)

dmbates (Member) commented Feb 3, 2024

@zaikunzhang I would note that I have tried the PRIMA version of BOBYQA in the MixedModels.jl package for Julia using your PRIMA.jl package to replace the NLopt version. It is much easier to configure the optimization problem with PRIMA than with the NLopt version. I did find that the PRIMA implementation of BOBYQA took more iterations to converge but that could be a matter of the convergence tolerances, etc.

zaikunzhang (Author) commented Feb 4, 2024

> @zaikunzhang I would note that I have tried the PRIMA version of BOBYQA in the MixedModels.jl package for Julia using your PRIMA.jl package to replace the NLopt version. It is much easier to configure the optimization problem with PRIMA than with the NLopt version.

Thank you @dmbates for your comments. I am glad that you found PRIMA.jl easy to use.

> I did find that the PRIMA implementation of BOBYQA took more iterations to converge but that could be a matter of the convergence tolerances, etc.

Would it be possible for you to show us how you called PRIMA, and what the exact output was? Indeed, we never report the number of "iterations", because it is not a meaningful concept in derivative-free optimization. We are only concerned with the number of function evaluations and the quality of the solution. See https://github.com/orgs/libprima/discussions/145 for more elaboration. I am curious about what you actually observed. Thank you.

dmbates (Member) commented Feb 5, 2024

> Thanks. I definitely like the idea of wrapping the PRIMA functions to make them available to the R community, although I will say in advance that it won't necessarily be easy (e.g., interfacing R objective functions with the Fortran code, making sure that the Fortran code is compliant with all the standards insisted on by the CRAN (R package archive) maintainers, etc.).

I'm not sure if this came up in other parts of this discussion, but there is a C API for libprima, which is what was used in PRIMA.jl. As a result, the interfacing part of the PRIMA.jl code is reasonably straightforward. The only unnatural part of the C API is that the objective function is defined to return void rather than the value (the value is instead written to a pointer passed as an argument).
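For illustration, here is a minimal Julia sketch of that callback shape; the names and the simplified two-argument signature are hypothetical, not the actual prima.h or PRIMA.jl definitions. The point is only that the objective writes its value through a pointer and returns nothing.

# Hypothetical sketch: an objective in the void-returning, write-through-a-pointer
# shape described above, made C-callable with @cfunction.
const N = 2  # problem dimension, assumed known to the wrapper

function objective_callback(x_ptr::Ptr{Cdouble}, f_ptr::Ptr{Cdouble})::Cvoid
    x = unsafe_wrap(Vector{Cdouble}, x_ptr, N)  # view the C array as a Julia vector
    unsafe_store!(f_ptr, sum(abs2, x))          # write f(x) into *f_ptr instead of returning it
    return nothing
end

cb = @cfunction(objective_callback, Cvoid, (Ptr{Cdouble}, Ptr{Cdouble}))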

dmbates (Member) commented Feb 5, 2024

I have been experimenting with using the optimizers from PRIMA.jl instead of the NLopt.jl optimizers in the db/PRIMA branch of MixedModels.jl.

The results are encouraging.

For linear mixed models, the more difficult model fits often require considerably fewer objective evaluations with PRIMA. The differences in the objective at the minimum for the two implementations of BOBYQA are either negligible (less than 0.001 for this objective) or favor PRIMA. In only one case did the PRIMA result exceed the NLopt result by more than 0.001 (column fdiff in the table below, expressed as NLopt objective - PRIMA objective).
The ediff column is the difference in the number of evaluations of the objective (also expressed as NLopt - PRIMA, so positive values indicate that PRIMA is doing better).

You will need to look at the original code to see the data sets and the models being fit.

% julia -t6 --project ./test/PRIMA.jl
17×6 DataFrame
 Row │ ds          frm    Pfinal         fdiff         Peval  ediff 
     │ Symbol      Int64  Float64        Float64       Int64  Int64 
─────┼──────────────────────────────────────────────────────────────
   1 │ sleepstudy      2   1752.0         1.68939e-9      39     -9
   2 │ sleepstudy      3   1752.0         1.68939e-9      39     -9
   3 │ sleepstudy      4   1751.94       -5.84648e-8      61     -4
   4 │ kb07            1  28823.1        -1.53887e-9      31     -3
   5 │ sleepstudy      1   1794.08       -5.68434e-12     16     -1
   6 │ dyestuff        1    327.327       1.3813e-11      18      0
   7 │ penicillin      1    332.188       3.48973e-9      44      0
   8 │ kb07            2  28658.5         1.05138e-8      80      1
   9 │ pastes          1    248.402       2.09809e-10     16      3
  10 │ pastes          2    247.994       3.52514e-10     29      6
  11 │ insteval        2      2.37586e5  -1.45409e-6      39      9
  12 │ dyestuff2       1    162.873       0.0              8     10
  13 │ oxide           2    453.227       2.04803e-8      87     14
  14 │ oxide           1    458.677      -2.27972e-8      26     26
  15 │ insteval        1      2.37722e5  -2.48838e-7      61     57
  16 │ kb07            3  28637.1         0.847336       555    338
  17 │ d3              1      8.84958e5  -0.00354616     720    552

Even more interesting is some experimentation with the optimization for Generalized Linear Mixed Models, as fit by lme4::glmer. The optimization in the more accurate versions of the model fits is with respect to an extended parameter vector involving both covariance parameters for the random effects and coefficients of the fixed effects. I discovered that the optional parameter scale in the PRIMA.jl optimizers is very helpful in these cases. The covariance parameters have a natural scale but the coefficients don't. However, one has a prior guess at the coefficients (from the GLM without random effects) that allows scaling the problem. That is very helpful in obtaining faster, stable convergence.
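For illustration, a sketch of that scaling trick with PRIMA.jl: the scale keyword is the one mentioned above; the xl bound keyword, the (x, info) return convention, the numbers, and the objective are placeholder assumptions rather than the actual MixedModels.jl code.

using PRIMA

θ0 = ones(3)                  # covariance parameters: natural scale ≈ 1, must stay ≥ 0
β0 = [10.0, -0.5, 120.0]      # prior guess for the coefficients, from the GLM w/o random effects
x0 = vcat(θ0, β0)             # extended parameter vector

scl = vcat(ones(length(θ0)), abs.(β0))                 # typical magnitude of each coordinate
lb  = vcat(zeros(length(θ0)), fill(-Inf, length(β0)))  # θ ≥ 0, β unconstrained

objective(x) = sum(abs2, (x .- x0) ./ scl)             # placeholder for the GLMM deviance

x, info = bobyqa(objective, x0; xl = lb, scale = scl)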

zaikunzhang (Author) commented Feb 5, 2024

Hi @dmbates , thank you very much for these very encouraging experiments!

What kind of constraints do your problems have? I suppose they only have bound constraints.

cc @emmt and @amontoison, who coded the Julia binding for the Fortran backend of PRIMA.

dmbates (Member) commented Feb 6, 2024

@zaikunzhang The constraints in this type of problem are simple: a subset of the parameters are required to be non-negative. I re-did the table adding k, the dimension of the parameter vector, and knn, the number of parameters that must be non-negative. (A sketch of this bound setup follows the table.)

% julia --project ./test/PRIMA.jl 
17×8 DataFrame
 Row │ ds          frm    k      knn    Pfinal         fdiff         Peval  ediff 
     │ Symbol      Int64  Int64  Int64  Float64        Float64       Int64  Int64 
─────┼────────────────────────────────────────────────────────────────────────────
   1 │ sleepstudy      2      2      2   1752.0         1.68939e-9      39     -9
   2 │ sleepstudy      3      2      2   1752.0         1.68939e-9      39     -9
   3 │ sleepstudy      4      3      2   1751.94       -5.84648e-8      61     -4
   4 │ kb07            1      2      2  28823.1        -1.53887e-9      31     -3
   5 │ sleepstudy      1      1      1   1794.08       -5.68434e-12     16     -1
   6 │ dyestuff        1      1      1    327.327       1.3813e-11      18      0
   7 │ penicillin      1      2      2    332.188       3.48973e-9      44      0
   8 │ kb07            2      4      3  28658.5         1.05138e-8      80      1
   9 │ pastes          1      1      1    248.402       2.09809e-10     16      3
  10 │ pastes          2      2      2    247.994       3.52514e-10     29      6
  11 │ insteval        2      2      2      2.37586e5  -1.45409e-6      39      9
  12 │ dyestuff2       1      1      1    162.873       0.0              8     10
  13 │ oxide           2      6      4    453.227       2.04803e-8      87     14
  14 │ oxide           1      2      2    458.677      -2.27972e-8      26     26
  15 │ insteval        1      3      3      2.37722e5  -2.48838e-7      61     57
  16 │ kb07            3     20      8  28637.1         0.847336       555    338
  17 │ d3              1      9      6      8.84958e5  -0.00354616     720    552

The first two rows are identical because the models are identical but generated from slightly different formulas.

Many of these problems are trivial as optimization problems but some toward the bottom of the table are of interest.
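For concreteness, a sketch of how such bounds could be expressed for the kb07 formula-3 row (k = 20, knn = 8), assuming PRIMA.jl's bobyqa takes lower bounds via an xl keyword and assuming, purely for illustration, that the non-negative parameters come first:

k, knn = 20, 8
lb = vcat(zeros(knn), fill(-Inf, k - knn))   # first knn parameters ≥ 0, the rest free
# x, info = bobyqa(objective, x0; xl = lb)   # objective and x0 come from the model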

zaikunzhang (Author) commented Feb 6, 2024

Thank you @dmbates. Are the non-negativity constraints relaxable, in the sense that the objective function is still well defined if these constraints are violated? If yes, you may try LINCOA as well. It is an infeasible method, but it may use fewer function evaluations.

Thank you.

bbolker (Member) commented Feb 6, 2024

Generally not relaxable ...

dmbates (Member) commented Feb 6, 2024

@zaikunzhang It is possible to evaluate the objective when the non-negativity constraints are violated but such a case corresponds to another set of parameters that do not violate the constraints. In the straightforward cases we are optimizing an objective with respect to the elements in the lower triangle of the Cholesky factor of a positive semi-definite symmetric matrix. For definiteness we require the elements on the diagonal of the factor to be non-negative. There are occasions where models will converge to a singular Cholesky factor and we want to know that.

We could remove the non-negativity constraints on the diagonal elements of the factor, but the objective for parameters providing negative diagonal elements in a Cholesky factor is the same as for the factor with all the signs in those column(s) flipped.

In statistical terms, we are estimating one or more covariance matrices parameterized by their Cholesky factor. It is the multivariate analogue of estimating a variance parameterized by its standard deviation, which must be non-negative.
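The sign-flip equivalence is easy to check numerically. In the minimal Julia sketch below (illustrative numbers), flipping the signs of an entire column of a lower-triangular factor leaves the implied covariance matrix, and hence the objective, unchanged:

using LinearAlgebra

L = LowerTriangular([1.0  0.0  0.0;
                     0.5 -2.0  0.0;
                     0.3  0.7  1.5])
D = Diagonal([1.0, -1.0, 1.0])    # flip every sign in column 2 of L
Lflip = L * D                     # a factor with a negative diagonal element
@assert Lflip * Lflip' ≈ L * L'   # same covariance matrix L*L'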

dmbates (Member) commented Feb 6, 2024

@zaikunzhang Did you mean to suggest NEWUOA for the unconstrained problem when you wrote LINCOA?

dmbates (Member) commented Feb 6, 2024

@zaikunzhang Here is the table using NEWUOA instead of BOBYQA with the non-negativity constraints. As you can see, the difficult problems (kb07 formula 3, oxide formula 2, d3 formula 1) get to a comparable optimum but take many more evaluations.

% julia -t4 --project ./test/PRIMA.jl 
17×8 DataFrame
 Row │ ds          frm    k      knn    Pfinal         fdiff         Peval  ediff 
     │ Symbol      Int64  Int64  Int64  Float64        Float64       Int64  Int64 
─────┼────────────────────────────────────────────────────────────────────────────
   1 │ kb07            3     20      8  28637.1         0.848324      1047   -154
   2 │ oxide           2      6      4    453.227       1.95279e-8     126    -25
   3 │ sleepstudy      2      2      2   1752.0        -1.04978e-9      45    -15
   4 │ sleepstudy      3      2      2   1752.0        -1.04978e-9      45    -15
   5 │ kb07            1      2      2  28823.1        -5.45697e-9      35     -7
   6 │ sleepstudy      1      1      1   1794.08       -4.3201e-12      17     -2
   7 │ sleepstudy      4      3      2   1751.94        1.0607e-9       59     -2
   8 │ pastes          1      1      1    248.402       1.61265e-10     18      1
   9 │ kb07            2      4      3  28658.5        -2.11552e-7      79      2
  10 │ dyestuff        1      1      1    327.327       2.95586e-12     16      2
  11 │ pastes          2      2      2    247.994       5.64711e-10     32      3
  12 │ penicillin      1      2      2    332.188       3.43192e-9      40      4
  13 │ dyestuff2       1      1      1    162.873       0.0             13      5
  14 │ insteval        2      2      2      2.37586e5  -5.50062e-9      41      7
  15 │ oxide           1      2      2    458.677      -5.72498e-9      31     21
  16 │ insteval        1      3      3      2.37722e5  -2.31026e-7      79     39
  17 │ d3              1      9      6      8.84958e5  -0.00299697     898    374

The dyestuff2 example does converge on the boundary using BOBYQA and provides the same fit with NEWUOA.

bbolker (Member) commented Feb 6, 2024

@dmbates, thanks for the clarification above - it made me realize I've been thinking about this too loosely, and that we could think of points where any of the diagonal elements of the Cholesky factor are zero not as being on the boundary of the feasible space, but rather as being at points where (unless the first derivatives happen to go to zero there) the maximum has a cusp rather than a well-defined curvature ...

zaikunzhang (Author) commented:

> @zaikunzhang Did you mean to suggest NEWUOA for the unconstrained problem when you wrote LINCOA?

I did mean LINCOA. It can handle bound constraints (as well as linear constraints), but the iterates may violate the bounds, which are expected to be satisfied asymptotically.

NEWUOA cannot handle constraints. NLopt extends it to bound-constrained problems by truncating the variables when they violate the bounds (if my memory is correct), which seems quite heuristic.
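For illustration, a sketch of a LINCOA call under the assumption that lincoa in PRIMA.jl accepts the same xl bound keyword as bobyqa; since the iterates may violate the bounds, the (placeholder) objective is written to stay well defined at infeasible points:

using PRIMA

f(x) = sum(abs2, sqrt.(max.(x, 0.0)) .- 1.0)   # clamping keeps f well defined if x[i] < 0
x0 = [0.5, 2.0]
x, info = lincoa(f, x0; xl = zeros(2))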

zaikunzhang (Author) commented:

> Here is the table using NEWUOA instead of BOBYQA with the non-negativity constraints.

How did you do this? I suppose you are using NLopt, right?

PRIMA does not and will not make NEWUOA handle bounds.

dmbates (Member) commented Feb 7, 2024

@zaikunzhang I misunderstood your initial suggestion to mean that I should try removing the bounds altogether and apply NEWUOA to the unconstrained problem. Please ignore that table.

Thank you for your clarification about LINCOA handling the constraints differently and possibly achieving convergence in fewer evaluations. It turns out that LINCOA does significantly worse than the PRIMA implementation of BOBYQA in some of these cases.

% julia -t4 --project ./test/PRIMA.jl
17×8 DataFrame
 Row │ ds          frm    k      knn    Pfinal         fdiff         Peval  ediff 
     │ Symbol      Int64  Int64  Int64  Float64        Float64       Int64  Int64 
─────┼────────────────────────────────────────────────────────────────────────────
   1 │ sleepstudy      2      2      2   1752.0         1.68143e-9      52    -22
   2 │ sleepstudy      3      2      2   1752.0         1.68143e-9      52    -22
   3 │ oxide           2      6      4    453.227       2.10183e-8     121    -20
   4 │ kb07            1      2      2  28823.1         1.81899e-11     38    -10
   5 │ sleepstudy      4      3      2   1751.94       -2.01114e-8      66     -9
   6 │ sleepstudy      1      1      1   1794.08       -7.50333e-12     17     -2
   7 │ pastes          1      1      1    248.402       3.02748e-10     18      1
   8 │ penicillin      1      2      2    332.188       3.65469e-9      43      1
   9 │ dyestuff        1      1      1    327.327       1.08002e-12     16      2
  10 │ pastes          2      2      2    247.994       3.45949e-10     32      3
  11 │ dyestuff2       1      1      1    162.873       0.0             12      6
  12 │ insteval        2      2      2      2.37586e5  -4.57512e-8      40      8
  13 │ kb07            2      4      3  28658.5         1.04083e-8      72      9
  14 │ oxide           1      2      2    458.677      -1.53373e-8      31     21
  15 │ insteval        1      3      3      2.37722e5  -7.48842e-7      62     56
  16 │ d3              1      9      6      8.84958e5  -0.029627       900    372
  17 │ kb07            3     20      8  28641.4        -3.40845        401    492

The last line, where LINCOA declares convergence at an objective value 3.408 greater than that from NLopt's bobyqa, means that we would not want to use this approach.

My thanks for your taking the time to respond on this issue. We look forward to using libprima and PRIMA.jl in our MixedModels.jl package.
