
CMA-ES #373

Merged: 22 commits, merged into master on Apr 6, 2024
Conversation

@mateuszbaran (Member) commented Mar 29, 2024

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm, adapted to the Riemannian setting.
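As a rough illustration of the intended call pattern, a minimal hypothetical sketch (the cost function and seed here are illustrative, not part of this PR):

```julia
using Manopt, Manifolds, Random

# Minimize a simple smooth cost on the 2-sphere; minima are at the poles ±e₃.
M = Sphere(2)
f(M, p) = 1 - p[3]^2

Random.seed!(42)           # reproducible start point
p = cma_es(M, f, rand(M))  # the usual Manopt (manifold, cost, start) pattern
```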

TODO:

  • Stopping criteria (NoEffectAxis, NoEffectCoord, EqualFunValues, Stagnation, TolXUp, TolFun, TolX).
  • More tests.
  • Check performance.
  • I'm not sure about the in-place convenience wrapper.

TODO beyond this PR:

  • spd_matrix_transport_to is useful more generally.
  • Maybe support vector transport in coordinates.
  • Parallelization of objective evaluation.

@mateuszbaran mateuszbaran added the WIP Work in Progress (for a pull request) label Mar 29, 2024
codecov bot commented Mar 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.74%. Comparing base (1ea2ac6) to head (8c39dca).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #373      +/-   ##
==========================================
+ Coverage   99.73%   99.74%   +0.01%     
==========================================
  Files          73       74       +1     
  Lines        6876     7167     +291     
==========================================
+ Hits         6858     7149     +291     
  Misses         18       18              


@kellertuer (Member) commented Mar 30, 2024

Until now I have just one minor remark: for all algorithms I tried to find speaking names. CPPA is cyclic_proximal_point, and ALM and EPM also enjoy their long names.
On the other hand, I do see that the five-letter acronym here would really be a bit long if written out as a function name. Still, this would be the first shortened one, right? Maybe we can think about a tradeoff?

@mateuszbaran (Member, Author)

Yes, I see. Unfortunately the five word name seems a bit too long and I don't have a better idea.

@kellertuer (Member)

I totally understand this; that is also why I tried to phrase it as carefully as possible.
While for documentation I am fine with mathematical variable letters, in code (unless it is a function-internal variable) I do prefer somewhat more speaking names, which depend a bit less on "the user already knows". But for this case I also do not yet see how we could do something a bit longer than cma_es that helps beginners without confusing experts (who do understand what cma_es is).

@mateuszbaran (Member, Author)

I think those NoEffect stopping criteria aren't particularly important, and we already have a lot of stopping-criterion code here. @kellertuer, it would be nice to export all of them, but I don't want to pollute the namespace too much. At least TolX and TolFun are things a user may want to adjust, though they would likely want to keep the other criteria too. Stagnation and EqualFunValues are somewhat generic: in theory they can be applied to any evolutionary optimization algorithm, like the particle swarm we already have here, at least if we adjusted the particle swarm to track some population heuristics. Not something I'd like to do in this PR, but it could be an idea for the future.

@kellertuer (Member) left a review comment


Hm, could some of the stopping criteria maybe be combined into one? Also, they currently have somewhat cryptic names, but I can check them out later once they have a bit more description.
For example, EqualFunValuesCondition without a reference to CMA-ES (that is, out of context) sounds quite random and might even be mistaken as fitting other solvers. I tried to use StopWhenX names for stopping criteria; then exporting all of them is also not that bad, and the code stays relatively readable.
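For reference, a hedged sketch of how such exported StopWhenX criteria compose in Manopt via `|` (the CMA-ES-specific criterion name and its arguments are illustrative, since the names are exactly what is under discussion here):

```julia
# Illustrative only: an iteration cap combined with a hypothetical
# population-concentration criterion, using M and f from the sketch above.
sc = StopAfterIteration(5_000) | StopWhenPopulationConcentrated(1e-12, 1e-12)
p = cma_es(M, f, rand(M); stopping_criterion=sc)
```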

Review thread on docs/src/solvers/cma_es.md (outdated, resolved).
@kellertuer (Member)

At least TolX and TolFun

Do you mean stopping criteria like StopWhenChangeLess and StopWhenFunctionDecreaseLess (the second does not exist yet, I think)? Sure, those could have tolerances a user can also set directly via a keyword, if you feel that is reasonable. For 1-2 stopping criteria in the default I usually did not pass keywords to them, but if we have 4-5 it might be a good idea to accept keyword parameters that set the stopping-criterion tolerances.

@mateuszbaran (Member, Author) commented Mar 31, 2024

Hm, could some of the stopping criteria maybe be combined into one? Also, they currently have somewhat cryptic names, but I can check them out later once they have a bit more description.
For example, EqualFunValuesCondition without a reference to CMA-ES (that is, out of context) sounds quite random and might even be mistaken as fitting other solvers. I tried to use StopWhenX names for stopping criteria; then exporting all of them is also not that bad, and the code stays relatively readable.

I used the names from appendix B.3 of Hansen's paper. We could surely find better names, but these are fairly complex criteria that don't have nice StopWhenX names. StopWhenSigmaTimesMaximumEigenvalueOfCovarianceMatrixGreaterThan?

Do you mean stopping criteria like StopWhenChangeLess and StopWhenFunctionDecreaseLess (the second does not exist yet, I think)?

No, I mean those from Hansen's paper. TolX is specific to CMA-ES and TolFun is specific to evolutionary algorithms.
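For context, the TolFun idea from Hansen's appendix B.3 in a minimal Euclidean-style sketch (the window length and function name are illustrative, not this PR's implementation):

```julia
# Stop when the range of best objective values over a trailing window of
# generations falls below a tolerance.
function tol_fun_satisfied(best_values::Vector{<:Real}, tol::Real; window::Int=20)
    length(best_values) < window && return false
    recent = @view best_values[(end - window + 1):end]
    return maximum(recent) - minimum(recent) < tol
end
```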

@mateuszbaran
Copy link
Member Author

For 1-2 stopping criteria in the default I usually did not pass keywords to them, but if we have 4-5 it might be a good idea to accept keyword parameters that set the stopping-criterion tolerances.

Sure, that makes sense.

@kellertuer (Member)

I used the names from appendix B.3 of Hansen's paper. We could surely find better names, but these are fairly complex criteria that don't have nice StopWhenX names. StopWhenSigmaTimesMaximumEigenvalueOfCovarianceMatrixGreaterThan?

No. That is too short, we would need something a bit longer I think ;)

We could maybe look for an interpreting name? What does it mean that sigma times the largest eigenvalue of the covariance matrix exceeds some value?

Do you mean stopping criteria like StopWhenChangeLess and StopWhenFunctionDecreaseLess (the second does not exist yet, I think)?

No, I mean those from Hansen's paper. TolX is specific to CMA-ES and TolFun is specific to evolutionary algorithms.

Ah, OK. We should then check that paper for more details and (as above) for interpreting, not-too-long names.
I was not able to work the last few days (caught a cold, but getting better already), so tomorrow I will first have to catch up on teaching preparations, but besides that I can surely help looking for nice names (as you might have noticed, I like doing that anyway).

@kellertuer (Member)

Here are a few ideas for naming:

  • NoEffectAxis, NoEffectCoord – in the appendix he is a bit brief about what m actually does, but since this method is basically mean-based, we could use StopWhenMeanX names for these two instead of population ones? I would have to read the paper more thoroughly for a more detailed proposal here.
  • EqualFunValues – for me this is a stronger statement than TolFun, so basically super-concentration?
  • Stagnation – maybe StopWhenCostStagnates?
  • TolXUp – when including a warning about a possibly too-small sigma, this could also be called StopWhenPopulationDiverges?
  • TolFun – this is similar, but with a longer observation phase than a StopWhenCostDecreaseLess? Or even StopWhenCostLess? Hansen speaks about the range of function values, so one could say StopWhenPopulationCostConcentrated?
  • TolX – I think this is basically StopWhenPopulationConcentrated?

The overview paper, though Euclidean, is great, but we could also mention https://ieeexplore.ieee.org/document/5299260.
Also, the bib entry can be improved by removing the DOI and changing the URL to eprint/eprinttype, but I can do that sometime later as well. Of course there is also Dreisigmeyer (already in the bib), https://optimization-online.org/wp-content/uploads/2017/03/5916.pdf – I would still have to check for the right reference.

@mateuszbaran (Member, Author)

  • NoEffectAxis, NoEffectCoord – in the appendix he is a bit brief about what m actually does, but since this method is basically mean-based, we could use StopWhenMeanX names for these two instead of population ones? I would have to read the paper more thoroughly for a more detailed proposal here.

I'd actually skip those two for this PR. It seems they just terminate a bit earlier than the other conditions would when the search stagnates.

  • EqualFunValues – for me this is a stronger statement than TolFun, so basically super-concentration?

Yes, this is correct.

  • Stagnation – maybe StopWhenCostStagnates?

I think we usually say that the algorithm stagnates, not the cost. Maybe StopWhenEvolutionStagnates?

We could maybe look for an interpreting name? What does it mean that sigma times the largest eigenvalue of the covariance matrix exceeds some value?

  • TolXUp – when including a warning about a possibly too-small sigma, this could also be called StopWhenPopulationDiverges?

That's a good idea.

  • TolFun – this is similar, but with a longer observation phase than a StopWhenCostDecreaseLess? Or even StopWhenCostLess? Hansen speaks about the range of function values, so one could say StopWhenPopulationCostConcentrated?

I like StopWhenPopulationCostConcentrated.

  • TolX – I think this is basically StopWhenPopulationConcentrated?

TolX is a bit stronger, because it also demands that the rank-one update of the covariance matrix is small, so it's more "the population is concentrated and unlikely to get de-concentrated".
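To make that concrete, a hedged Euclidean-style sketch of the TolX-type check following Hansen's appendix B.3 (names and exact scaling are illustrative):

```julia
using LinearAlgebra

# "Concentrated and unlikely to get de-concentrated": both the sampling spread
# σ·sqrt(Cᵢᵢ) and the rank-one evolution path σ·p_c must be below the tolerance.
function tol_x_satisfied(σ::Real, C::AbstractMatrix, p_c::AbstractVector, tol::Real)
    return all(σ .* sqrt.(diag(C)) .< tol) && all(σ .* abs.(p_c) .< tol)
end
```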

The overview paper, though Euclidean, is great, but we could also mention https://ieeexplore.ieee.org/document/5299260.
Also, the bib entry can be improved by removing the DOI and changing the URL to eprint/eprinttype, but I can do that sometime later as well. Of course there is also Dreisigmeyer (already in the bib), https://optimization-online.org/wp-content/uploads/2017/03/5916.pdf – I would still have to check for the right reference.

Thanks for the link to Colutto's paper; I must have missed it. They seem to skip the parallel transport of the covariance matrix and base their work on an older variant of the Euclidean CMA-ES, but otherwise it's more or less the same thing. Dreisigmeyer's paper doesn't mention CMA-ES, but it's also about direct optimization on manifolds, so maybe it could be mentioned somewhere.

@kellertuer (Member)

I think we usually say that the algorithm stagnates, not the cost. Maybe StopWhenEvolutionStagnates?

That is also fine with me; I just personally thought it could also be a very flat area, where the evolution continues but the cost stagnates.

So both EqualFunValues and its weaker cousin TolX still need a good name that is stronger than "population concentrated", hm. I do not yet have a good name here, but I like the others so far.

For the Colutto paper, I have not yet checked it too closely, but it might still be fair to mention it.
I haven't had the time to check too much against Dreisigmeyer, but my student will work on his mesh version next semester. I slightly misstated the paper in my last post; I meant to mention https://optimization-online.org/wp-content/uploads/2007/08/1742.pdf – and LTMADS is a project for said student (but sure, that is more grid-based).

@mateuszbaran (Member, Author)

I've checked performance. For non-trivial examples it's bound by either objective evaluation or eigendecomposition, so I've reworked the code a bit to make sure only one eigendecomposition is performed per iteration. A standard trick in Euclidean CMA-ES is updating the decomposition every few iterations rather than every single one, but to make that work here we'd still need fast decomposition transport. It could be a fun follow-up project, but for this PR I think it's fast enough.
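Roughly, one eigendecomposition can serve both the sampling step and the inverse square root needed for the step-size path; a minimal sketch of the idea (not the PR's actual code):

```julia
using LinearAlgebra

C = [2.0 0.5; 0.5 1.0]            # example covariance matrix
vals, vecs = eigen(Symmetric(C))  # the one decomposition per iteration
sqrt_C = vecs * Diagonal(sqrt.(vals)) * vecs'            # for sampling new points
inv_sqrt_C = vecs * Diagonal(inv.(sqrt.(vals))) * vecs'  # for the σ-path update
```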

Review threads: docs/src/solvers/cma_es.md (outdated, resolved); src/plans/stopping_criterion.jl (resolved); src/solvers/cma_es.jl (resolved).
@mateuszbaran (Member, Author)

@kellertuer the latest failure is due to the convex bundle method; maybe something you'd like to take a look at?

┌ Warning: WolfePowellLinesearch
│   caller = ip:0x0
└ @ Core :-1
┌ Warning: The Lagrange multiplier is positive.
│ At iteration #11 the negative of the Lagrange multiplier, -ξ, became negative.
│ 
│ Consider increasing either the `diameter` keyword argument, or changing
│ one of the parameters involved in the estimation of the sectional curvature, such as
│ `k_max`, or `ϱ` in the `convex_bundle_method` call.
└ @ Manopt ~/work/Manopt.jl/Manopt.jl/src/solvers/convex_bundle_method.jl:587
[… the same warning repeats for iterations #12 through #20 …]
A simple median run: Test Failed at /Users/runner/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:178
  Expression: distance(M, q2, m) < 0.01
   Evaluated: 0.013226760924468948 < 0.01

Stacktrace:
 [1] macro expansion
   @ ~/hostedtoolcache/julia/1.9.4/x64/share/julia/stdlib/v1.9/Test/src/Test.jl:478 [inlined]
 [2] macro expansion
   @ ~/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:178 [inlined]
 [3] macro expansion
   @ ~/hostedtoolcache/julia/1.9.4/x64/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [4] macro expansion
   @ ~/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:150 [inlined]
 [5] macro expansion
   @ ~/hostedtoolcache/julia/1.9.4/x64/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [6] top-level scope
   @ ~/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:6
WARNING: using JuMP.VectorConstraint in module Main conflicts with an existing identifier.

@kellertuer (Member)

Increase the tolerance; we are reworking that algorithm a bit currently anyway, since those warnings and errors appeared more often than we expected.

@mateuszbaran (Member, Author)

Increase the tolerance; we are reworking that algorithm a bit currently anyway, since those warnings and errors appeared more often than we expected.

Sure, that's good to know.

@mateuszbaran mateuszbaran added Ready-for-Review A label for pull requests that are feature-ready and removed WIP Work in Progress (for a pull request) labels Apr 2, 2024
@mateuszbaran mateuszbaran marked this pull request as ready for review April 2, 2024 08:49
@mateuszbaran (Member, Author)

CMA-ES seems to handle this problem (JuliaManifolds/ManoptExamples.jl#13) fairly well, and the performance looks competitive with Evolutionary.jl, so I'd say this can be reviewed now.

@kellertuer (Member) left a review comment


Thanks for this nice contribution; here are a few comments. I have not yet looked at the details of the stopping criteria, but they could probably also have longer docstrings, I think.

Review threads: docs/src/references.bib (outdated, resolved); docs/src/solvers/cma_es.md (outdated, resolved); src/solvers/cma_es.jl (eight threads, two outdated, all resolved).
@mateuszbaran (Member, Author)

I think I've addressed all your points.

@kellertuer (Member)

I would still prefer a nicer constructor for the state, with (a) the manifold as the first argument (to fill defaults properly) and (b) initialization of all internal/copy things. The idea is that cma_es can call this one, and it is then easier to read in the code itself.

The current constructor does have M first, but the retraction and vector transport, for example, can be given nice defaults (and become kwargs), and so can the basis and rng, maybe even stop.

The rest looks fine so far, I think.
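Something like the following pattern, perhaps; a hedged sketch of the suggestion with hypothetical names, not the merged code:

```julia
using Manopt, ManifoldsBase, Random

# Manifold-first state constructor: each tunable becomes a keyword with a
# manifold-derived default, so the solver call only has to forward kwargs.
function cma_es_state_sketch(
    M::AbstractManifold;
    p_mean=rand(M),
    σ::Real=1.0,
    retraction_method=default_retraction_method(M, typeof(p_mean)),
    vector_transport_method=default_vector_transport_method(M, typeof(p_mean)),
    basis=DefaultOrthonormalBasis(),
    stopping_criterion=StopAfterIteration(5_000),
    rng=Random.default_rng(),
)
    # ... initialize covariance, evolution paths, and population buffers here ...
    return (; p_mean, σ, retraction_method, vector_transport_method, basis,
            stopping_criterion, rng)
end
```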

@mateuszbaran (Member, Author)

I've added defaults for some arguments, but many of them could potentially be changed if someone wants to experiment. There is quite a lot of logic in cma_es! because it reflects Hansen's paper, and providing those defaults in the state constructor would be a bit messy.

@kellertuer (Member)

Yes, I saw that part of the logic.
You now changed at least the manifold part to have nice defaults, which is in line with the other solvers. That's nice.

Review threads: docs/src/solvers/cma_es.md (outdated, resolved); src/solvers/cma_es.jl (two threads, outdated, resolved).
@kellertuer (Member) left a review comment


Thanks for the thorough work; I approve this for now. A tutorial for each solver would maybe be nice in the long run.

@mateuszbaran mateuszbaran merged commit 7f8d7c6 into master Apr 6, 2024
15 checks passed
@mateuszbaran (Member, Author)

A tutorial could definitely be useful for the more involved solvers; this one is actually fairly straightforward to use.

I think we can wait with registering a new Manopt version until #376 is merged.

@kellertuer (Member)

Still, a small tutorial for most solvers might be nice – maybe even just one tutorial for all "derivative-free" ones, since their calls are really similar.

And sure, let's wait with registration for that other PR.

@kellertuer kellertuer deleted the mbaran/cma-es branch May 4, 2024 17:30