-
Interesting.
this yields exactly the same numbers as your gradient, but on manifolds other than R2 (your manifold is essentially R2). Then, to see a little more of what is happening (though I saw directly what the reason is), you can do

```julia
xMean = gradient_descent(M, F, gradF, x0; debug=[:Iteration, :Cost, "\n", 10])
```

to print a line with the iteration and the cost (followed by a newline) every 10th iteration. If you then look at the default values of gradient_descent, you can see that it works with a constant step size by default (machine learning people call that a learning rate), which is not guaranteed to converge (ever). But you can exchange that, see https://manoptjl.org/stable/plans/index.html#Stepsize-1, for a step size that might fit better, for example the Armijo step size rule; its defaults are chosen to be just fine:

```julia
julia> xMean = gradient_descent(M, F, gradF, x0; stepsize=ArmijoLinesearch(), debug=[:Iteration, :Cost, "\n", 10])
Initial  F(x): 2.831969279439222
# 10     F(x): 0.2578005738010819
# 20     F(x): 0.2000672051680747
# 30     F(x): 0.20000006190528008
# 40     F(x): 0.20000000005719792
# 50     F(x): 0.2000000000000529
# 60     F(x): 0.20000000000000012
2-element MVector{2, Float64} with indices SOneTo(2):
 1.0000197125126835
 0.9999999998505151
```

Is this close enough? If not, you can tweak the stopping criterion, see https://manoptjl.org/stable/solvers/index.html#StoppingCriteria-1, and note that you can combine stopping criteria just with `|`:

```julia
julia> xMean = gradient_descent(M, F, gradF, x0;
           stepsize=ArmijoLinesearch(),
           stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(10.0^-12),
           debug=[:Iteration, :Cost, "\n", 25]
       )
Initial  F(x): 2.831969279439222
# 25     F(x): 0.2000020367614238
# 50     F(x): 0.2000000000000529
2-element MVector{2, Float64} with indices SOneTo(2):
 1.0000197125126835
 0.9999999999999954
```

Let me know if this solves your problem. One could, for example, change the default to use Armijo, though that per default takes quite a few function evaluations to determine the step size.
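For reference, here is a minimal, self-contained sketch of the kind of setup the calls above assume. The manifold, cost, gradient, and data below are hypothetical (not the MWE from this thread), and the sketch assumes the `F(M, x)` / `gradF(M, x)` signature used later in this discussion; depending on the Manopt.jl version the manifold argument may not be part of the signature.

```julia
using Manifolds, Manopt

# hypothetical data: minimise the mean of squared distances to a few points in the plane
M   = Euclidean(2)
pts = [[1.0, 2.0], [3.0, 0.0], [-1.0, 1.0]]
n   = length(pts)

F(M, x)     = sum(distance(M, x, p)^2 for p in pts) / (2n)
gradF(M, x) = -sum(log(M, x, p) for p in pts) / n   # Riemannian gradient of F

x0 = [0.0, 0.0]
xMean = gradient_descent(M, F, gradF, x0;
    stepsize=ArmijoLinesearch(),
    stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(10.0^-12),
    debug=[:Iteration, :Cost, "\n", 25],
)
```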
-
Addendum, after clicking on your link – it seems the allocate in ManifoldsBase does something strange in NelderMead!(M.manifold, F, [random_point(M) for _=1:3]). Note that Nelder-Mead is not a very efficient algorithm and should only be used if you do not have any gradient. edit: sorry, my bad :) I do not work with NelderMead usually. NelderMead(M.manifold, F, [random_point(M) for _=1:3]) just works fine, too, as does NelderMead(M.manifold, F). But note that its convergence is veeeeeery slow (the default 2000 iterations are not enough).
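For completeness, a short sketch of the two call patterns mentioned above, with a hypothetical manifold and cost (it uses random_point, the point sampler available in the package versions of this thread, and the cost signature may differ between Manopt versions):

```julia
using Manifolds, Manopt

M = Sphere(2)                                  # a 2-dimensional example manifold
F(M, p) = distance(M, p, [0.0, 0.0, 1.0])^2    # hypothetical cost: distance to the north pole

population = [random_point(M) for _ in 1:3]    # one point per simplex vertex (dimension + 1)
x1 = NelderMead(M, F, population)              # explicit population
x2 = NelderMead(M, F)                          # default: a random population is generated
```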
-
Hi Ronny Johan, thanks, this helps me follow better too.
hmm, makes me curious about #64 again. I'll check when I get a chance and comment there if something shows up.
-
No, that was my mistake at first glance – the error is that you have to provide a population, not a starting point. Then everything works :)
-
Oh, and a final look, since you used BFGS in Optim. We have that on manifolds, too, just use

```julia
julia> quasi_Newton(M, F, gradF, x0)
2-element MVector{2, Float64} with indices SOneTo(2):
 1.0204551069445331
 1.000000000000014
```

whose default is the (inverse) BFGS here, too.
-
Ah, got it, thanks!
-
Perfect. Let me know when you think a default should be adapted or the docs should be changed (for example, at first glance, Stopping Criteria and Step Size rules could be made a little more prominent).
-
Maybe just a
-
Where? The default step size is mentioned for every algorithm, as are the default stopping criteria.
-
I made a PR on where I would add it... Feel free to ignore it if that does not fit/work. The reason for putting it there is that, when following a design template example for the first time, one expects the defaults to work (regardless of performance). When I get stuck on the first example, I find myself scrolling to the bottom of the "Getting Started" page hoping to find the obvious stumbling blocks. A FAQ is the second-best place (but can be a discouraging search). Once users see their own first example working (not just copy-paste), they start to invest more time reading deeper into all the docs. Training wheels first, with the absolute minimum of details to get going, then scale up on details from there.
-
Manopt.jl does not have a FAQ.
-
Thanks a lot, I did find the stopping criteria and played around with them. However, I expected the default to converge in such a simple example. There are quite a few stumbling blocks for me, but I can already see the potential of Manopt. I know it's a relatively new package and perhaps we can help build out examples.
So, I'm in agreement with this, and would prefer defaults that just work (at the possible cost of performance, with a note or something), especially if the user comes from other packages like Optim or JuMP, where one can be a user without understanding the underlying algorithm. Perhaps something like this, from the Optim.jl docs, would help a new user like me a lot:
-
It's perhaps not part of this issue, but let me list some problems I encounter:

```julia
julia> Manopt.NelderMead(MP, F, xp)
ERROR: MethodError: no method matching NelderMeadOptions(::ProductRepr{Tuple{ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}}}, ::StopAfterIteration; α=1.0, γ=2.0, ρ=0.5, σ=0.5, retraction_method=ExponentialRetraction(), inverse_retraction_method=LogarithmicInverseRetraction())
```

or perhaps other errors, e.g.:

```julia
julia> xMean = quasi_Newton(M, F_prior, gradF_prior, x; debug=[:Iteration, :Cost, "\n", 10])
Initial  F(x): 1.4945569745366585
ERROR: vector_transport_to! not implemented on ProductManifold(TranslationGroup(2; field = ℝ), SpecialOrthogonal(2)) for arguments ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}} and ParallelTransport.
```

```julia
julia> particle_swarm(M, F_prior; n, x0)
ERROR: ArgumentError: broadcasting requires an assigned BroadcastStyle
```

```julia
julia> gradF(MP, xp)
ERROR: MethodError: no method matching fill!(::ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ::Int64)
```

The other issues are more to do with a lack of knowledge coming from "I just want to optimize my cost function", such as:

I hope this helps improve the quality of this great package; let me know if there is anywhere I can help (although I still have a lot to learn).
-
For the first and simple example, there is a certain tradeoff, especially on manifolds. We can still, for sure, switch to Armijo; I don't have a strong preference there. The summary sounds like a good idea, and holds here completely analogously for Newton with TR, and LBFGS (BFGS uses https://manoptjl.org/stable/solvers/quasi_Newton.html#Manopt.AbstractQuasiNewtonDirectionUpdate; switch to https://manoptjl.org/stable/solvers/quasi_Newton.html#Manopt.QuasiNewtonLimitedMemoryDirectionUpdate for the LBFGS one, which might also be good to have a shorter constructor for). Additionally, for non-smooth functions, if you have proxes, use CyclicProximalPoint or DouglasRachford (Optim does not have those, but then this package also has a little different focus, namely to include non-smooth optimisation on manifolds), or, if you are adventurous, the ChambollePock algorithm from the most recent paper. For the summary –
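To make the non-smooth part concrete, here is a rough sketch computing a Riemannian median, a case where proximal maps are available but the cost is not differentiable at the data points. The data and manifold are hypothetical, and it assumes Manopt's prox_distance and the cyclic_proximal_point interface as used in the package tutorial, plus the F(M, x) cost signature used elsewhere in this thread:

```julia
using Manifolds, Manopt

M    = Sphere(2)
data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
n    = length(data)

F(M, x) = sum(distance(M, x, d) for d in data) / n                 # sum of (unsquared) distances
proxes  = Function[(M, λ, x) -> prox_distance(M, λ / n, d, x, 1) for d in data]

xMedian = cyclic_proximal_point(M, F, proxes, data[1])
```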
-
Thanks! A summary like that is perfect. I was actually looking for LBFGS, but didn't recognize it until you made it obvious just now 🤦
-
Ahh, thanks. I was working from our factors and missed it, since it's done with the Mahalanobis distance in our case. It should be distance squared. I still can't get a satisfactory result, though. Perhaps I should take a step back and rather ask how to solve on manifolds such as SE(2), SE(3), SO(2), SO(3). Perhaps start with SE(2) for a POC.

```julia
using Manifolds

# M - manifold
# p and q are points on the manifold
# m is a measurement from point p
# meas is the prior measurement
function PointPoint(M, m, p, q)
    q̂ = compose(M, p, m)
    return distance(M, q, q̂)^2
end

function Prior(M, meas, p)
    return distance(M, meas, p)^2
end
```

The factors are combined into cost functions that depend on the structure of the factor graph.
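As a quick sanity check of the factors above, a hypothetical usage on SE(2) (SpecialEuclidean(2) from Manifolds.jl, with the same ProductRepr layout of translation and rotation that shows up in the error messages earlier in this thread; the concrete values are illustrative only):

```julia
using Manifolds

M = SpecialEuclidean(2)
p = ProductRepr([0.0, 0.0], [1.0 0.0; 0.0 1.0])                  # starting pose (identity)
θ = 0.1
m = ProductRepr([1.0, 0.0], [cos(θ) -sin(θ); sin(θ) cos(θ)])     # odometry measurement from p
q = compose(M, p, m)                                             # the pose the measurement predicts

PointPoint(M, m, p, q)   # ≈ 0.0, since q equals the predicted pose q̂
Prior(M, p, q)           # > 0, the squared distance between the two poses
```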
-
Distance squared is better, that is geodesically convex (so locally it is convex). For distance squared, keep in mind that globally this might still be non-unique. For example, on the sphere, if you minimise with respect to x the function distance(M, n, x)^2 + distance(M, s, x)^2, where n and s are the North and South Pole, respectively, then any x on the equator is a minimiser of the function. Your last sentence I again can't follow: what factors? What is to combine here? Which graph? What is its structure?
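A quick numerical illustration of that example (the specific points are arbitrary choices):

```julia
using Manifolds, LinearAlgebra

M = Sphere(2)
n, s = [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]      # north and south pole
f(x) = distance(M, n, x)^2 + distance(M, s, x)^2

f([1.0, 0.0, 0.0])                # equator point: 2 * (π/2)^2 ≈ 4.93
f([0.0, 1.0, 0.0])                # another equator point: same value
f(normalize([1.0, 0.0, 1.0]))     # off the equator: (π/4)^2 + (3π/4)^2 ≈ 6.17, larger
```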
-
Hi, I found something interesting. If I set
-
That is good to know :) Retractions should locally be as fine as exp (inverse retractions as fine as log), but are usually cheaper or numerically more stable. So good that you checked.
-
That is where IncrementalInference.jl comes in; in short, you will get the ring around the equator as a belief if you set it up that way. For example, Figure 1 in https://marinerobotics.mit.edu/sites/default/files/fourie_iros19_manifolds.pdf has a similar case on a circular manifold.
-
Sorry, an explanation will be too much for a comment. Perhaps a basic example:

```julia
# for a user-defined graph that looks like this:
#
#   p -- x1 -- f1 -- x2 -- f2 -- x3
#         \                 /
#          ------- f3 ------
#
# the cost would look like this:
cost(M, x) = Prior(M, p, x[1])^2 + PointPoint(M, f1, x[1], x[2])^2 +
             PointPoint(M, f2, x[2], x[3])^2 + PointPoint(M, f3, x[1], x[3])^2
```
-
Note that I just turned this into a discussion, since it is not an issue (yes, I also just activated discussions). I hope that is ok.
-
I found the mistake in Manopt. I'll see if I can fix it in a PR.
-
BTW, with the fix, my basic
-
Slightly related to this, but I can also open an issue in Manifolds.jl if it fits better there: I'm looking into using the Mahalanobis distance in the cost function, following (Pennec [15]). It led me to take a closer look at the choice of the inner product (according to the docs, the metric from the embedding for SO(n)) and the distance function in Manifolds.jl (on SO(n)):

```julia
using Manifolds, StaticArrays

M = Rotations(2)
ϵSO2 = SA[1. 0; 0 1]
p = ϵSO2
X = hat(M, ϵSO2, [pi/4])
q = exp(M, ϵSO2, X)
dR = distance(M, p, q)
# 1.1107207345395915

M = Circle()
p = 0
q = pi/4
dC = distance(M, p, q)
# 0.7853981633974483

dR * 1/sqrt(2) == dC
```
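Continuing the Rotations(2) snippet above, a short check of where the factor √2 comes from: hat(Rotations(2), ϵSO2, [θ]) is a skew-symmetric matrix with ±θ off the diagonal, whose Frobenius norm is √2·θ, and the numbers above are consistent with distance on Rotations(2) being that Frobenius norm of the Lie-algebra logarithm, while Circle() measures the angle θ itself:

```julia
using LinearAlgebra

norm(X)          # Frobenius norm of the hat of π/4: √2 * π/4 ≈ 1.1107…, i.e. dR above
sqrt(2) * pi/4   # same value, so dR == √2 * dC
```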
-
Hi, I was just curious as to how much work it would be to make our own data structure (Factor Graphs) work with Manopt.jl. The alternative looks like it would be to copy all the points and their manifold into a new ProductManifold and ProductRepresentation. (This is similar to how we currently work with Optim.jl.)
-
Ah, with that example it gets clearer. So what you have is that the factors are single real-valued functions, and what you want to minimise is their sum? Then one could write their gradients as living on said product manifold (with corresponding zero tangent vectors for variables that do not appear). If the graph is very sparse, one could think of trying to do this sparsely, too.
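To sketch that idea with hypothetical names (variables on Sphere(2) for simplicity, and only a prior-type factor, since a PointPoint factor would additionally need the differential of compose; zero_tangent_vector is the name in the Manifolds.jl version of this thread, zero_vector in newer ones):

```julia
using Manifolds

M  = ProductManifold(Sphere(2), Sphere(2), Sphere(2))
a1 = [1.0, 0.0, 0.0]          # a prior measurement attached to the first variable only

# gradient of distance(M1, a1, x1)^2 w.r.t. x1 is -2 * log(M1, x1, a1);
# variables that no factor touches keep a zero tangent vector
function gradcost(M, x)
    M1, M2, M3 = M.manifolds
    x1, x2, x3 = submanifold_components(M, x)
    return ProductRepr(
        -2 * log(M1, x1, a1),
        zero_tangent_vector(M2, x2),
        zero_tangent_vector(M3, x3),
    )
end

x = ProductRepr([0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0])
gradcost(M, x)
```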
-
From Slack discussion on metrics for SE(n):
I don't think our functions are written to work with the group exponent (log_lie).
-
Discussion on AD, quoted from Slack (messages by Mateusz Baran and Johannes Terblanche).
-
Just to give an update on our ongoing upgrade to Manifolds.jl and Manopt.jl: I chose Levenberg-Marquardt to start with, as it's the most familiar to me and I could use a sparse Jacobian. Related questions: do you have any recommendations for a Manopt solver to try next? I want to have more than one to choose from, and it looks like if I upgrade to support one of the gradient-based methods, a few will work in the same way.
-
Hi, I'm new to Manopt and I'm trying to figure out how to integrate Manopt into IncrementalInference (also see JuliaRobotics/IncrementalInference.jl#1276 for more complex attempts).
The optimization is formulated as a factor graph, and this example represents a simple factor graph with one variable and 2 priors (so basically the mean of 2 points). I don't get the results I expect, and I made this MWE:
cc @dehann
EDIT: I'm on

```
[1cead3c2] Manifolds v0.5.4
[0fc0a36d] Manopt v0.3.9
```
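Since the MWE itself is not reproduced above, here is a purely hypothetical sketch of the described setup: one variable with two prior factors, placed on TranslationGroup(2), which matches the 2-element results and the "essentially R2" remark in the replies. All values are illustrative, and the F(M, x) signature may differ between Manopt versions:

```julia
using Manifolds, Manopt, StaticArrays

M  = TranslationGroup(2)
z1 = MVector(1.0, 0.8)          # first prior measurement (illustrative)
z2 = MVector(1.0, 1.2)          # second prior measurement (illustrative)

F(M, x)     = distance(M, z1, x)^2 + distance(M, z2, x)^2
gradF(M, x) = -2 * log(M, x, z1) - 2 * log(M, x, z2)

x0    = MVector(0.0, 0.0)
xMean = gradient_descent(M, F, gradF, x0; stepsize=ArmijoLinesearch())
```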