
Inclusion of PRIMA solvers in lme4 #744

Open
zaikunzhang opened this issue Sep 20, 2023 · 19 comments


zaikunzhang commented Sep 20, 2023

Dear lme4 maintainers,

This is Dr. Zaikun Zhang from The Hong Kong Polytechnic University. Together with Professor N.I.M. Gould, I am responsible for maintaining the renowned derivative-free optimization solvers of the late Professor M.J.D. Powell, namely COBYLA, UOBYQA, NEWUOA, BOBYQA, and LINCOA. I am the author of PRIMA, which provides the reference implementation for these solvers. They are widely used by engineers and scientists. For instance, see Section 1 of a recent paper on Powell's solvers, as well as Google searches for COBYLA and BOBYQA.

I note that lme4 provides COBYLA and BOBYQA. However, they are based on the old Fortran 77 implementation of these solvers, which is no longer maintained. Even though the original Fortran 77 implementation is truly a masterpiece, it contains many bugs (mostly due to the language itself), which can lead to segmentation faults or infinite loops; for example, see Section 4.4 of the above paper and many GitHub issues. Using the Fortran 77 version of these solvers is strongly discouraged.

That said, it might be desirable to include the PRIMA implementation of these solvers in lme4 via an R interface, in order to make these renowned solvers available to the R community. I will be happy to assist on the Fortran side if you would like to do so.

Thanks and regards,
Zaikun ZHANG
Ph.D. and Assistant Professor
Dept. App. Math., Hong Kong Polytechnic University

bbolker (Member) commented Sep 21, 2023

This is a really interesting idea. Quick question: are the versions of these solvers that are available through the NLopt library also based on old code, or do they include your updates?

zaikunzhang (Author) commented:

Hi @bbolker,

Thank you for the response. The NLopt library is based on a C translation (by f2c) of the original Fortran 77 implementation, i.e., the old code by Professor Powell.

Thanks,
Zaikun

bbolker (Member) commented Sep 21, 2023

Thanks. I definitely like the idea of wrapping the PRIMA functions to make them available to the R community, although I will say in advance that it won't necessarily be easy (e.g., interfacing R objective functions with the Fortran code, making sure that the Fortran code is compliant with all the standards insisted on by the CRAN (R package archive) maintainers, etc.).

zaikunzhang (Author) commented Sep 21, 2023

I fully understand and agree. The objective of PRIMA is to make these powerful solvers available to everyone in their favorite languages. As you mentioned, this objective is nontrivial, but I will keep working toward it. Since I cannot take care of everything myself, I will also try to communicate with major players and leaders in the respective communities (R, Python, Julia, ...) like you, to see whether there is interest on your side. You are the experts anyway.

BTW, I personally do not think the old Fortran 77 code or its machine-generated C translation is in better shape than the modernized PRIMA implementation. If the former can meet the standards, then the latter likely can as well. Of course, I am biased :)

dmbates (Member) commented Feb 3, 2024

@zaikunzhang I would note that I have tried the PRIMA version of BOBYQA in the MixedModels.jl package for Julia using your PRIMA.jl package to replace the NLopt version. It is much easier to configure the optimization problem with PRIMA than with the NLopt version. I did find that the PRIMA implementation of BOBYQA took more iterations to converge but that could be a matter of the convergence tolerances, etc.

zaikunzhang (Author) commented Feb 4, 2024

> @zaikunzhang I would note that I have tried the PRIMA version of BOBYQA in the MixedModels.jl package for Julia using your PRIMA.jl package to replace the NLopt version. It is much easier to configure the optimization problem with PRIMA than with the NLopt version.

Thank you @dmbates for your comments. I am glad that you found PRIMA.jl easy to use.

> I did find that the PRIMA implementation of BOBYQA took more iterations to converge but that could be a matter of the convergence tolerances, etc.

Would it be possible for you to show us how you called PRIMA, and what the exact output was? Indeed, we never report the number of "iterations", because it is not a meaningful concept in derivative-free optimization. We are only concerned with the number of function evaluations and the quality of the solution. See https://github.com/orgs/libprima/discussions/145 for more elaboration. I am curious about what you actually observed. Thank you.

dmbates (Member) commented Feb 5, 2024

> Thanks. I definitely like the idea of wrapping the PRIMA functions to make them available to the R community, although I will say in advance that it won't necessarily be easy (e.g., interfacing R objective functions with the Fortran code, making sure that the Fortran code is compliant with all the standards insisted on by the CRAN (R package archive) maintainers, etc.).

I'm not sure if this came up in other parts of this discussion, but there is a C API for libprima, which is what was used in PRIMA.jl. As a result, the interfacing part of the PRIMA.jl code is reasonably straightforward. The only unnatural part of the C API is that the objective function is defined to return void rather than the value (the value is instead written to a pointer passed as an argument).
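For illustration, here is a minimal Julia sketch of that callback shape; the names and the simplified two-argument signature are hypothetical, not the actual prima.h or PRIMA.jl definitions. The point is only that the objective writes its value through a pointer and returns nothing.

# Hypothetical sketch: an objective in the void-returning, write-through-a-pointer
# shape described above, made C-callable with @cfunction.
const N = 2  # problem dimension, assumed known to the wrapper

function objective_callback(x_ptr::Ptr{Cdouble}, f_ptr::Ptr{Cdouble})::Cvoid
    x = unsafe_wrap(Vector{Cdouble}, x_ptr, N)  # view the C array as a Julia vector
    unsafe_store!(f_ptr, sum(abs2, x))          # write f(x) into *f_ptr instead of returning it
    return nothing
end

cb = @cfunction(objective_callback, Cvoid, (Ptr{Cdouble}, Ptr{Cdouble}))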

dmbates (Member) commented Feb 5, 2024

I have been experimenting with using the optimizers from PRIMA.jl instead of the NLopt.jl optimizers in the db/PRIMA branch of MixedModels.jl.

The results are encouraging.

For linear mixed models, the more difficult model fits often require considerably fewer objective evaluations with PRIMA. The differences in the objective at the minimum for the two implementations of BOBYQA are either negligible (less than 0.001 for this objective) or favor PRIMA. In only one case did the PRIMA result exceed the NLopt result by more than 0.001 (column fdiff in the table below, expressed as NLopt objective - PRIMA objective).
The ediff column is the difference in the number of evaluations of the objective (also expressed as NLopt - PRIMA, so positive values indicate that PRIMA is doing better).

You will need to look at the original code to see the data sets and the models being fit.

% julia -t6 --project ./test/PRIMA.jl
17×6 DataFrame
 Row │ ds          frm    Pfinal         fdiff         Peval  ediff 
     │ Symbol      Int64  Float64        Float64       Int64  Int64 
─────┼──────────────────────────────────────────────────────────────
   1 │ sleepstudy      2   1752.0         1.68939e-9      39     -9
   2 │ sleepstudy      3   1752.0         1.68939e-9      39     -9
   3 │ sleepstudy      4   1751.94       -5.84648e-8      61     -4
   4 │ kb07            1  28823.1        -1.53887e-9      31     -3
   5 │ sleepstudy      1   1794.08       -5.68434e-12     16     -1
   6 │ dyestuff        1    327.327       1.3813e-11      18      0
   7 │ penicillin      1    332.188       3.48973e-9      44      0
   8 │ kb07            2  28658.5         1.05138e-8      80      1
   9 │ pastes          1    248.402       2.09809e-10     16      3
  10 │ pastes          2    247.994       3.52514e-10     29      6
  11 │ insteval        2      2.37586e5  -1.45409e-6      39      9
  12 │ dyestuff2       1    162.873       0.0              8     10
  13 │ oxide           2    453.227       2.04803e-8      87     14
  14 │ oxide           1    458.677      -2.27972e-8      26     26
  15 │ insteval        1      2.37722e5  -2.48838e-7      61     57
  16 │ kb07            3  28637.1         0.847336       555    338
  17 │ d3              1      8.84958e5  -0.00354616     720    552

Even more interesting is some experimentation with the optimization for Generalized Linear Mixed Models, as fit by lme4::glmer. The optimization in the more accurate versions of the model fits is with respect to an extended parameter vector involving both covariance parameters for the random effects and coefficients of the fixed effects. I discovered that the optional parameter scale in the PRIMA.jl optimizers is very helpful in these cases. The covariance parameters have a natural scale but the coefficients don't. However, one has a prior guess at the coefficients (from the GLM without random effects) that allows scaling the problem. That is very helpful in obtaining faster, stable convergence.
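For illustration, a sketch of that scaling trick with PRIMA.jl: the scale keyword is the one mentioned above; the xl bound keyword, the (x, info) return convention, the numbers, and the objective are placeholder assumptions rather than the actual MixedModels.jl code.

using PRIMA

θ0 = ones(3)                  # covariance parameters: natural scale ≈ 1, must stay ≥ 0
β0 = [10.0, -0.5, 120.0]      # prior guess for the coefficients, from the GLM w/o random effects
x0 = vcat(θ0, β0)             # extended parameter vector

scl = vcat(ones(length(θ0)), abs.(β0))                 # typical magnitude of each coordinate
lb  = vcat(zeros(length(θ0)), fill(-Inf, length(β0)))  # θ ≥ 0, β unconstrained

objective(x) = sum(abs2, (x .- x0) ./ scl)             # placeholder for the GLMM deviance

x, info = bobyqa(objective, x0; xl = lb, scale = scl)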

zaikunzhang (Author) commented Feb 5, 2024

Hi @dmbates , thank you very much for these very encouraging experiments!

What kind of constraints do your problems have? I suppose they only have bound constraints.

cc @emmt and @amontoison, who coded the Julia binding for the Fortran backend of PRIMA.

dmbates (Member) commented Feb 6, 2024

@zaikunzhang The constraints in this type of problem are simple: a subset of the parameters are required to be non-negative. I re-did the table adding k, the dimension of the parameter vector, and knn, the number of parameters that must be non-negative. (A sketch of this bound setup follows the table.)

% julia --project ./test/PRIMA.jl 
17×8 DataFrame
 Row │ ds          frm    k      knn    Pfinal         fdiff         Peval  ediff 
     │ Symbol      Int64  Int64  Int64  Float64        Float64       Int64  Int64 
─────┼────────────────────────────────────────────────────────────────────────────
   1 │ sleepstudy      2      2      2   1752.0         1.68939e-9      39     -9
   2 │ sleepstudy      3      2      2   1752.0         1.68939e-9      39     -9
   3 │ sleepstudy      4      3      2   1751.94       -5.84648e-8      61     -4
   4 │ kb07            1      2      2  28823.1        -1.53887e-9      31     -3
   5 │ sleepstudy      1      1      1   1794.08       -5.68434e-12     16     -1
   6 │ dyestuff        1      1      1    327.327       1.3813e-11      18      0
   7 │ penicillin      1      2      2    332.188       3.48973e-9      44      0
   8 │ kb07            2      4      3  28658.5         1.05138e-8      80      1
   9 │ pastes          1      1      1    248.402       2.09809e-10     16      3
  10 │ pastes          2      2      2    247.994       3.52514e-10     29      6
  11 │ insteval        2      2      2      2.37586e5  -1.45409e-6      39      9
  12 │ dyestuff2       1      1      1    162.873       0.0              8     10
  13 │ oxide           2      6      4    453.227       2.04803e-8      87     14
  14 │ oxide           1      2      2    458.677      -2.27972e-8      26     26
  15 │ insteval        1      3      3      2.37722e5  -2.48838e-7      61     57
  16 │ kb07            3     20      8  28637.1         0.847336       555    338
  17 │ d3              1      9      6      8.84958e5  -0.00354616     720    552

The first two rows are identical because the models are identical but generated from slightly different formulas.

Many of these problems are trivial as optimization problems but some toward the bottom of the table are of interest.
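For concreteness, a sketch of how such bounds could be expressed for the kb07 formula-3 row (k = 20, knn = 8), assuming PRIMA.jl's bobyqa takes lower bounds via an xl keyword and assuming, purely for illustration, that the non-negative parameters come first:

k, knn = 20, 8
lb = vcat(zeros(knn), fill(-Inf, k - knn))   # first knn parameters ≥ 0, the rest free
# x, info = bobyqa(objective, x0; xl = lb)   # objective and x0 come from the model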

zaikunzhang (Author) commented Feb 6, 2024

Thank you @dmbates. Are the non-negativity constraints relaxable, in the sense that the objective function is still well defined if these constraints are violated? If yes, you may try LINCOA as well. It is an infeasible method, but it may use fewer function evaluations.

Thank you.

bbolker (Member) commented Feb 6, 2024

Generally not relaxable ...

dmbates (Member) commented Feb 6, 2024

@zaikunzhang It is possible to evaluate the objective when the non-negativity constraints are violated but such a case corresponds to another set of parameters that do not violate the constraints. In the straightforward cases we are optimizing an objective with respect to the elements in the lower triangle of the Cholesky factor of a positive semi-definite symmetric matrix. For definiteness we require the elements on the diagonal of the factor to be non-negative. There are occasions where models will converge to a singular Cholesky factor and we want to know that.

We could remove the non-negativity constraints on the diagonal elements of the factor, but the objective for parameters providing negative diagonal elements in a Cholesky factor is the same as for the factor with all the signs in those column(s) flipped.

In statistical terms, we are estimating one or more covariance matrices parameterized by their Cholesky factor. It is the multivariate analogue of estimating a variance parameterized by its standard deviation, which must be non-negative.
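The sign-flip equivalence is easy to check numerically. In the minimal Julia sketch below (illustrative numbers), flipping the signs of an entire column of a lower-triangular factor leaves the implied covariance matrix, and hence the objective, unchanged:

using LinearAlgebra

L = LowerTriangular([1.0  0.0  0.0;
                     0.5 -2.0  0.0;
                     0.3  0.7  1.5])
D = Diagonal([1.0, -1.0, 1.0])    # flip every sign in column 2 of L
Lflip = L * D                     # a factor with a negative diagonal element
@assert Lflip * Lflip' ≈ L * L'   # same covariance matrix L*L'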

dmbates (Member) commented Feb 6, 2024

@zaikunzhang Did you mean to suggest NEWUOA for the unconstrained problem when you wrote LINCOA?

dmbates (Member) commented Feb 6, 2024

@zaikunzhang Here is the table using NEWUOA instead of BOBYQA with the non-negativity constraints. As you can see, the difficult problems (kb07 formula 3, oxide formula 2, d3 formula 1) get to a comparable optimum but take many more evaluations.

% julia -t4 --project ./test/PRIMA.jl 
17×8 DataFrame
 Row │ ds          frm    k      knn    Pfinal         fdiff         Peval  ediff 
     │ Symbol      Int64  Int64  Int64  Float64        Float64       Int64  Int64 
─────┼────────────────────────────────────────────────────────────────────────────
   1 │ kb07            3     20      8  28637.1         0.848324      1047   -154
   2 │ oxide           2      6      4    453.227       1.95279e-8     126    -25
   3 │ sleepstudy      2      2      2   1752.0        -1.04978e-9      45    -15
   4 │ sleepstudy      3      2      2   1752.0        -1.04978e-9      45    -15
   5 │ kb07            1      2      2  28823.1        -5.45697e-9      35     -7
   6 │ sleepstudy      1      1      1   1794.08       -4.3201e-12      17     -2
   7 │ sleepstudy      4      3      2   1751.94        1.0607e-9       59     -2
   8 │ pastes          1      1      1    248.402       1.61265e-10     18      1
   9 │ kb07            2      4      3  28658.5        -2.11552e-7      79      2
  10 │ dyestuff        1      1      1    327.327       2.95586e-12     16      2
  11 │ pastes          2      2      2    247.994       5.64711e-10     32      3
  12 │ penicillin      1      2      2    332.188       3.43192e-9      40      4
  13 │ dyestuff2       1      1      1    162.873       0.0             13      5
  14 │ insteval        2      2      2      2.37586e5  -5.50062e-9      41      7
  15 │ oxide           1      2      2    458.677      -5.72498e-9      31     21
  16 │ insteval        1      3      3      2.37722e5  -2.31026e-7      79     39
  17 │ d3              1      9      6      8.84958e5  -0.00299697     898    374

The dyestuff2 example does converge on the boundary using BOBYQA and provides the same fit with NEWUOA.

bbolker (Member) commented Feb 6, 2024

@dmbates, thanks for the clarification above - it made me realize I've been thinking about this too loosely, and that we could think of points where any of the diagonal elements of the Cholesky factor are zero not as being on the boundary of the feasible space, but rather as being at points where (unless the first derivatives happen to go to zero there) the maximum has a cusp rather than a well-defined curvature ...

zaikunzhang (Author) commented:

> @zaikunzhang Did you mean to suggest NEWUOA for the unconstrained problem when you wrote LINCOA?

I did mean LINCOA. It can handle bound constraints (as well as linear constraints), but the iterates may violate the bounds, which are expected to be satisfied asymptotically.

NEWUOA cannot handle constraints. NLopt extends it to bound-constrained problems by truncating the variables when they violate the bounds (if my memory is correct), which seems quite heuristic.
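For illustration, a sketch of a LINCOA call under the assumption that lincoa in PRIMA.jl accepts the same xl bound keyword as bobyqa; since the iterates may violate the bounds, the (placeholder) objective is written to stay well defined at infeasible points:

using PRIMA

f(x) = sum(abs2, sqrt.(max.(x, 0.0)) .- 1.0)   # clamping keeps f well defined if x[i] < 0
x0 = [0.5, 2.0]
x, info = lincoa(f, x0; xl = zeros(2))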

zaikunzhang (Author) commented:

> Here is the table using NEWUOA instead of BOBYQA with the non-negativity constraints.

How did you do this? I suppose you are using NLopt, right?

PRIMA does not and will not make NEWUOA handle bounds.

dmbates (Member) commented Feb 7, 2024

@zaikunzhang I misunderstood your initial suggestion to mean that I should try removing the bounds altogether and apply NEWUOA to the unconstrained problem. Please ignore that table.

Thank you for your clarification about LINCOA handling the constraints differently and possibly achieving convergence in fewer evaluations. It turns out that LINCOA does significantly worse than the PRIMA implementation of BOBYQA in some of these cases.

% julia -t4 --project ./test/PRIMA.jl
17×8 DataFrame
 Row │ ds          frm    k      knn    Pfinal         fdiff         Peval  ediff 
     │ Symbol      Int64  Int64  Int64  Float64        Float64       Int64  Int64 
─────┼────────────────────────────────────────────────────────────────────────────
   1 │ sleepstudy      2      2      2   1752.0         1.68143e-9      52    -22
   2 │ sleepstudy      3      2      2   1752.0         1.68143e-9      52    -22
   3 │ oxide           2      6      4    453.227       2.10183e-8     121    -20
   4 │ kb07            1      2      2  28823.1         1.81899e-11     38    -10
   5 │ sleepstudy      4      3      2   1751.94       -2.01114e-8      66     -9
   6 │ sleepstudy      1      1      1   1794.08       -7.50333e-12     17     -2
   7 │ pastes          1      1      1    248.402       3.02748e-10     18      1
   8 │ penicillin      1      2      2    332.188       3.65469e-9      43      1
   9 │ dyestuff        1      1      1    327.327       1.08002e-12     16      2
  10 │ pastes          2      2      2    247.994       3.45949e-10     32      3
  11 │ dyestuff2       1      1      1    162.873       0.0             12      6
  12 │ insteval        2      2      2      2.37586e5  -4.57512e-8      40      8
  13 │ kb07            2      4      3  28658.5         1.04083e-8      72      9
  14 │ oxide           1      2      2    458.677      -1.53373e-8      31     21
  15 │ insteval        1      3      3      2.37722e5  -7.48842e-7      62     56
  16 │ d3              1      9      6      8.84958e5  -0.029627       900    372
  17 │ kb07            3     20      8  28641.4        -3.40845        401    492

The last line, where LINCOA declares convergence at an objective value 3.408 greater than that from NLopt's bobyqa, means that we would not want to use this approach.

My thanks for your taking the time to respond on this issue. We look forward to using libprima and PRIMA.jl in our MixedModels.jl package.
