Skip to content

Commit

Permalink
Add a structured list of solvers (#380)
Browse files Browse the repository at this point in the history
* Start sketich the list, keeping a temp which ones to sort in still.
* Finish a list of  algorithms grouped by features.
* Adds two symbols as indicators for state-of-the-art and robust.
* Add a missing description, change menu entry for the list of solvers.

---------

Co-authored-by: Mateusz Baran <mateuszbaran89@gmail.com>
  • Loading branch information
kellertuer and mateuszbaran committed Apr 16, 2024
1 parent 462f8f0 commit 1856a26
Show file tree
Hide file tree
Showing 5 changed files with 115 additions and 34 deletions.
1 change: 1 addition & 0 deletions Changelog.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

* Tests now also use `Aqua.jl` to spot problems in the code, e.g. ambiguities.
* introduce a feature-based list of solvers and reduce the details in the alphabetical list

### Fixed

Expand Down
2 changes: 1 addition & 1 deletion docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,7 @@ makedocs(;
"About" => "about.md",
(tutorials_in_menu ? [tutorials_menu] : [])...,
"Solvers" => [
"Introduction" => "solvers/index.md",
"List of Solvers" => "solvers/index.md",
"Adaptive Regularization with Cubics" => "solvers/adaptive-regularization-with-cubics.md",
"Alternating Gradient Descent" => "solvers/alternating_gradient_descent.md",
"Augmented Lagrangian Method" => "solvers/augmented_Lagrangian_method.md",
Expand Down
140 changes: 110 additions & 30 deletions docs/src/solvers/index.md
Original file line number Diff line number Diff line change
@@ -1,43 +1,123 @@

# List of available solvers
# Available solvers in Manopt.jl

```@meta
CurrentModule = Manopt
```

Solvers can be applied to [`AbstractManoptProblem`](@ref)s with solver
specific [`AbstractManoptSolverState`](@ref).
Optimisation problems can be classified with respect to several criteria.
In the following we provide a grouping of the algorithms with respect to the “information”
available about your optimisation problem

# List of algorithms
```math
\operatorname*{arg\,min}_{p∈\mathbb M} f(p)
```

Within the groups we provide short notes on advantages of the individual solvers, pointing our properties the cost ``f`` should have.
We use 🏅 to indicate state-of-the-art solvers, that usually perform best in their corresponding group and 🫏 for a maybe not so fast, maybe not so state-of-the-art method, that nevertheless gets the job done most reliably.

## Derivative Free

For derivative free only function evaluations of ``f`` are used.

* [Nelder-Mead](NelderMead.md) a simplex based variant, that is using ``d+1`` points, where ``d`` is the dimension of the manifold.
* [Particle Swarm](particle_swarm.md) 🫏 use the evolution of a set of points, called swarm, to explore the domain of the cost and find a minimizer.
* [CMA-ES](cma_es.md) uses a stochastic evolutionary strategy to perform minimization robust to local minima of the objective.

## First Order

### Gradient

* [Gradient Descent](gradient_descent.md) uses the gradient from ``f`` to determine a descent direction. Here, the direction can also be changed to be Averaged, Momentum-based, based on Nesterovs rule.
* [Conjugate Gradient Descent](conjugate_gradient_descent.md) uses information from the previous descent direction to improve the current (gradient-based) one including several such update rules.
* The [Quasi-Newton Method](quasi_Newton.md) 🏅 uses gradient evaluations to approximate the Hessian, which is then used in a Newton-like scheme, where both a limited memory and a full Hessian approximation are available with several different update rules.
* [Steihaug-Toint Truncated Conjugate-Gradient Method](@ref tCG) a solver for a constrained problem defined on a tangent space.

### Subgradient

The following methods require the Riemannian subgradient ``∂f`` to be available.
While the subgradient might be set-valued, the function should provide one of the subgradients.

* The [Subgradient Method](subgradient.md) takes the negative subgradient as a step direction and can be combined with a step size.
* The [Convex Bundle Method](convex_bundle_method.md) (CBM) uses a former collection of sub gradients at the previous iterates and iterate candidates to solve a local approximation to `f` in every iteration by solving a quadratic problem in the tangent space.
* The [Proximal Bundle Method](proximal_bundle_method.md) works similar to CBM, but solves a proximal map-based problem in every iteration.

## Second Order

* [Adaptive Regularisation with Cubics](adaptive-regularization-with-cubics.md) 🏅 locally builds a cubic model to determine the next descent direction.
* The [Riemannian Trust-Regions Solver](trust_regions.md) builds a quadratic model within a trust region to determine the next descent direction.

## Splitting based

For splitting methods, the algorithms are based on splitting the cost into different parts, usually in a sum of two or more summands.
This is usually very well tailored for non-smooth objectives.

### Smooth

The following methods require that the splitting, for example into several summands, is smooth in the sense that for every summand of the cost, the gradient should still exist everywhere

* [Levenberg-Marquardt](LevenbergMarquardt.md) minimizes the square norm of ``f: \mathcal M→ℝ^d`` provided the gradients of the component functions, or in other words the Jacobian of ``f``.
* [Stochastic Gradient Descent](stochastic_gradient_descent.md) is based on a splitting of ``f`` into a sum of several components ``f_i`` whose gradients are provided. Steps are performed according to gradients of randomly selected components.
* The [Alternating Gradient Descent](@ref solver-alternating-gradient-descent) alternates gradient descent steps on the components of the product manifold. All these components should be smooth aso the gradient exists, and (locally) convex.

### Nonsmooth

If the gradient does not exist everywhere, that is if the splitting yields summands that are nonsmooth, usually methods based on proximal maps are used.

* The [Chambolle-Pock](ChambollePock.md) algorithm uses a splitting ``f(p) = F(p) + G(Λ(p))``,
where ``G`` is defined on a manifold ``\mathcal N`` and we need the proximal map of its Fenchel dual. Both these functions can be non-smooth.
* The [Cyclic Proximal Point](cyclic_proximal_point.md) 🫏 uses proximal maps of the functions from splitting ``f`` into summands ``f_i``
* [Difference of Convex Algorithm](@ref solver-difference-of-convex) (DCA) uses a splitting of the (nonconvex) function ``f = g - h`` into a difference of two functions; for each of these we require the gradient of ``g`` and the subgradient of ``h`` to state a sub problem in every iteration to be solved.
* [Difference of Convex Proximal Point](@ref solver-difference-of-convex-proximal-point) uses a splitting of the (nonconvex) function ``f = g - h`` into a difference of two functions; provided the proximal map of ``g`` and the subgradient of ``h``, the next iterate is computed. Compared to DCA, the correpsonding sub problem is here written in a form that yields the proximal map.
* [Douglas—Rachford](DouglasRachford.md) uses a splitting ``f(p) = F(x) + G(x)`` and their proximal maps to compute a minimizer of ``f``, which can be non-smooth.
* [Primal-dual Riemannian semismooth Newton Algorithm](@ref solver-pdrssn) extends Chambolle-Pock and requires the differentials of the proximal maps additionally.

## Constrained

Constrained problems of the form

```math
\begin{align*}
\operatorname*{arg\,min}_{p∈\mathbb M}& f(p)\\
\text{such that } & g(p) \leq 0\\&h(p) = 0
\end{align*}
```

For these you can use

* The [Augmented Lagrangian Method](augmented_Lagrangian_method.md) (ALM), where both `g` and `grad_g` as well as `h` and `grad_h` are keyword arguments, and one of these pairs is mandatory.
* The [Exact Penalty Method](exact_penalty_method.md) (EPM) uses a penalty term instead of augmentation, but has the same interface as ALM.
* [Frank-Wolfe algorithm](FrankWolfe.md), where besides the gradient of ``f`` either a closed form solution or a (maybe even automatically generated) sub problem solver for ``\operatorname*{arg\,min}_{q ∈ C} ⟨\operatorname{grad} f(p_k), \log_{p_k}q⟩`` is required, where ``p_k`` is a fixed point on the manifold (changed in every iteration).

The following algorithms are currently available
# Alphabetical list List of algorithms

| Solver | Function & State | Objective |
| Solver | Function | State |
|:---------|:----------------|:---------|
[Alternating Gradient Descent](@ref solver-alternating-gradient-descent) | [`alternating_gradient_descent`](@ref) [`AlternatingGradientDescentState`](@ref) | ``f=(f_1,\ldots,f_n)``, ``\operatorname{grad} f_i`` |
[Augmented Lagrangian Method](augmented_Lagrangian_method.md) | [`augmented_Lagrangian_method`](@ref), [`AugmentedLagrangianMethodState`](@ref) | ``f``, ``\operatorname{grad} f``, ``g``, ``\operatorname{grad} g_i``, ``h``, ``\operatorname{grad} h_j`` |
[Chambolle-Pock](ChambollePock.md) | [`ChambollePock`](@ref), [`ChambollePockState`](@ref) (using [`TwoManifoldProblem`](@ref)) | ``f=F+G(Λ\cdot)``, ``\operatorname{prox}_{σ F}``, ``\operatorname{prox}_{τ G^*}``, ``Λ`` |
[Conjugate Gradient Descent](conjugate_gradient_descent.md) | [`conjugate_gradient_descent`](@ref), [`ConjugateGradientDescentState`](@ref) | ``f``, ``\operatorname{grad} f``
[Convex Bundle Method](convex_bundle_method.md) | [`convex_bundle_method`](@ref), [`ConvexBundleMethodState`](@ref) | ``f``, ``\partial f``
[Cyclic Proximal Point](cyclic_proximal_point.md) | [`cyclic_proximal_point`](@ref), [`CyclicProximalPointState`](@ref) | ``f=\sum f_i``, ``\operatorname{prox}_{\lambda f_i}`` |
[Difference of Convex Algorithm](@ref solver-difference-of-convex) | [`difference_of_convex_algorithm`](@ref), [`DifferenceOfConvexState`](@ref) | ``f=g-h``, ``∂h``, and for example ``g``, ``\operatorname{grad} g`` |
[Difference of Convex Proximal Point](@ref solver-difference-of-convex-proximal-point) | [`difference_of_convex_proximal_point`](@ref), [`DifferenceOfConvexProximalState`](@ref) | ``f=g-h``, ``∂h``, and for example ``g``, ``\operatorname{grad} g`` |
[Douglas—Rachford](DouglasRachford.md) | [`DouglasRachford`](@ref), [`DouglasRachfordState`](@ref) | ``f=\sum f_i``, ``\operatorname{prox}_{\lambda f_i}`` |
[Exact Penalty Method](exact_penalty_method.md) | [`exact_penalty_method`](@ref), [`ExactPenaltyMethodState`](@ref) | ``f``, ``\operatorname{grad} f``, ``g``, ``\operatorname{grad} g_i``, ``h``, ``\operatorname{grad} h_j`` |
[Frank-Wolfe algorithm](FrankWolfe.md) | [`Frank_Wolfe_method`](@ref), [`FrankWolfeState`](@ref) | sub-problem solver |
[Gradient Descent](gradient_descent.md) | [`gradient_descent`](@ref), [`GradientDescentState`](@ref) | ``f``, ``\operatorname{grad} f`` |
[Levenberg-Marquardt](LevenbergMarquardt.md) | [`LevenbergMarquardt`](@ref), [`LevenbergMarquardtState`](@ref) | ``f = \sum_i f_i`` ``\operatorname{grad} f_i`` (Jacobian)|
[Nelder-Mead](NelderMead.md) | [`NelderMead`](@ref), [`NelderMeadState`](@ref) | ``f``
[Particle Swarm](particle_swarm.md) | [`particle_swarm`](@ref), [`ParticleSwarmState`](@ref) | ``f`` |
[Primal-dual Riemannian semismooth Newton Algorithm](@ref solver-pdrssn) | [`primal_dual_semismooth_Newton`](@ref), [`PrimalDualSemismoothNewtonState`](@ref) (using [`TwoManifoldProblem`](@ref)) | ``f=F+G(Λ\cdot)``, ``\operatorname{prox}_{σ F}`` & diff., ``\operatorname{prox}_{τ G^*}`` & diff., ``Λ``
[Proximal Bundle Method](proximal_bundle_method.md) | [`proximal_bundle_method`](@ref), [`ProximalBundleMethodState`](@ref) | ``f``, ``\partial f``
[Quasi-Newton Method](quasi_Newton.md) | [`quasi_Newton`](@ref), [`QuasiNewtonState`](@ref) | ``f``, ``\operatorname{grad} f`` |
[Steihaug-Toint Truncated Conjugate-Gradient Method](@ref tCG) | [`truncated_conjugate_gradient_descent`](@ref), [`TruncatedConjugateGradientState`](@ref) | ``f``, ``\operatorname{grad} f``, ``\operatorname{Hess} f`` |
[Subgradient Method](subgradient.md) | [`subgradient_method`](@ref), [`SubGradientMethodState`](@ref) | ``f``, ``∂ f`` |
[Stochastic Gradient Descent](stochastic_gradient_descent.md) | [`stochastic_gradient_descent`](@ref), [`StochasticGradientDescentState`](@ref) | ``f = \sum_i f_i``, ``\operatorname{grad} f_i`` |
[The Riemannian Trust-Regions Solver](trust_regions.md) | [`trust_regions`](@ref), [`TrustRegionsState`](@ref) | ``f``, ``\operatorname{grad} f``, ``\operatorname{Hess} f`` |

Note that the solvers (their [`AbstractManoptSolverState`](@ref), to be precise) can also be decorated to enhance your algorithm by general additional properties, see [debug output](@ref sec-debug) and [recording values](@ref sec-record). This is done using the `debug=` and `record=` keywords in the function calls. Similarly, since Manopt.jl 0.4 a (simple) [caching of the objective function](@ref subsection-cache-objective) using the `cache=` keyword is available in any of the function calls..
| [Adaptive Regularisation with Cubics](adaptive-regularization-with-cubics.md) | [`adaptive_regularization_with_cubics`](@ref) | [`AdaptiveRegularizationState`](@ref) |
| [Augmented Lagrangian Method](augmented_Lagrangian_method.md) | [`augmented_Lagrangian_method`](@ref) | [`AugmentedLagrangianMethodState`](@ref) |
| [Chambolle-Pock](ChambollePock.md) | [`ChambollePock`](@ref) | [`ChambollePockState`](@ref) |
| [Conjugate Gradient Descent](conjugate_gradient_descent.md) | [`conjugate_gradient_descent`](@ref) | [`ConjugateGradientDescentState`](@ref) |
| [Convex Bundle Method](convex_bundle_method.md) | [`convex_bundle_method`](@ref) | [`ConvexBundleMethodState`](@ref) |
| [Cyclic Proximal Point](cyclic_proximal_point.md) | [`cyclic_proximal_point`](@ref) | [`CyclicProximalPointState`](@ref) |
| [Difference of Convex Algorithm](@ref solver-difference-of-convex) | [`difference_of_convex_algorithm`](@ref) | [`DifferenceOfConvexState`](@ref) |
| [Difference of Convex Proximal Point](@ref solver-difference-of-convex-proximal-point) | [`difference_of_convex_proximal_point`](@ref) | [`DifferenceOfConvexProximalState`](@ref) |
| [Douglas—Rachford](DouglasRachford.md) | [`DouglasRachford`](@ref) | [`DouglasRachfordState`](@ref) |
| [Exact Penalty Method](exact_penalty_method.md) | [`exact_penalty_method`](@ref) | [`ExactPenaltyMethodState`](@ref) |
| [Frank-Wolfe algorithm](FrankWolfe.md) | [`Frank_Wolfe_method`](@ref) | [`FrankWolfeState`](@ref) |
| [Gradient Descent](gradient_descent.md) | [`gradient_descent`](@ref) | [`GradientDescentState`](@ref) |
| [Levenberg-Marquardt](LevenbergMarquardt.md) | [`LevenbergMarquardt`](@ref) | [`LevenbergMarquardtState`](@ref) | ``f = \sum_i f_i`` ``\operatorname{grad} f_i`` (Jacobian)|
| [Nelder-Mead](NelderMead.md) | [`NelderMead`](@ref) | [`NelderMeadState`](@ref) |
| [Particle Swarm](particle_swarm.md) | [`particle_swarm`](@ref) | [`ParticleSwarmState`](@ref) |
[Primal-dual Riemannian semismooth Newton Algorithm](@ref solver-pdrssn) | [`primal_dual_semismooth_Newton`](@ref) | [`PrimalDualSemismoothNewtonState`](@ref) |
| [Proximal Bundle Method](proximal_bundle_method.md) | [`proximal_bundle_method`](@ref) | [`ProximalBundleMethodState`](@ref) |
| [Quasi-Newton Method](quasi_Newton.md) | [`quasi_Newton`](@ref) | [`QuasiNewtonState`](@ref) |
| [Steihaug-Toint Truncated Conjugate-Gradient Method](@ref tCG) | [`truncated_conjugate_gradient_descent`](@ref) | [`TruncatedConjugateGradientState`](@ref) |
| [Subgradient Method](subgradient.md) | [`subgradient_method`](@ref) | [`SubGradientMethodState`](@ref) |
| [Stochastic Gradient Descent](stochastic_gradient_descent.md) | [`stochastic_gradient_descent`](@ref) | [`StochasticGradientDescentState`](@ref) |
| [Riemannian Trust-Regions](trust_regions.md) | [`trust_regions`](@ref) | [`TrustRegionsState`](@ref) |


Note that the solvers (their [`AbstractManoptSolverState`](@ref), to be precise) can also be decorated to enhance your algorithm by general additional properties, see [debug output](@ref sec-debug) and [recording values](@ref sec-record). This is done using the `debug=` and `record=` keywords in the function calls. Similarly, a `cache=` keyword is available in any of the function calls, that wraps the [`AbstractManoptProblem`](@ref) in a cache for certain parts of the objective.

## Technical details

Expand Down
4 changes: 2 additions & 2 deletions src/solvers/FrankWolfe.jl
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ It comes in two forms, depending on the realisation of the `subproblem`.
The sub task requires a method to solve
```math
\operatorname*{argmin}_{q\mathcal M} ⟨X, \log_p q⟩,\qquad \text{ where }X=\operatorname{grad} f(p)
\operatorname*{arg\,min}_{q ∈ C} ⟨\operatorname{grad} f(p_k), \log_{p_k}q⟩.
```
# Constructor
Expand Down Expand Up @@ -135,7 +135,7 @@ where the main step is a constrained optimisation is within the algorithm,
that is the sub problem (Oracle)
```math
q_k = \operatorname{arg\,min}_{q ∈ C} ⟨\operatorname{grad} F(p_k), \log_{p_k}q⟩.
q_k = \operatorname*{arg\,min}_{q ∈ C} ⟨\operatorname{grad} f(p_k), \log_{p_k}q⟩.
```
for every iterate ``p_k`` together with a stepsize ``s_k≤1``, by default ``s_k = \frac{2}{k+2}``.
Expand Down
2 changes: 1 addition & 1 deletion src/solvers/LevenbergMarquardt.jl
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
Solve an optimization problem of the form
```math
\operatorname{arg\,min}_{p ∈ \mathcal M} \frac{1}{2} \lVert f(p) \rVert^2,
\operatorname*{arg\,min}_{p ∈ \mathcal M} \frac{1}{2} \lVert f(p) \rVert^2,
```
where ``f: \mathcal M → ℝ^d`` is a continuously differentiable function,
Expand Down

0 comments on commit 1856a26

Please sign in to comment.