fix the derivative of the projection on exponential cones #59

AxelBreuer · 2023-05-03T15:07:20Z

this fix corrects the wrong differentials reported in
#57
#58
cvxgrp/cvxpylayers#135

akshayka · 2023-05-03T17:16:01Z

Hey @AxelBreuer, thanks for making this PR. It's great this makes the tests pass. Can you point to the math that motivates this change?

It's been a while since I've looked at the math (and this code). But I think t, i.e. the optimal dual variable associated with the projection problem, should appear in J_inv.

I'm looking at pages 23-24 of https://arxiv.org/pdf/1705.00772.pdf. With the caveat that I haven't looked at this code or math in a long time, I believe J_inv should be equal to the matrix D in equation (S.6), except for some reason we've permuted the rows in our code. Converting to their notation, rs equals [z1*, z2*, z3*], x = [z1, z2, z3], and t = nu*.

If that's correct, then I think (one) of the fixes is to just set l = t (or in other words eliminate l).

~~I have an additional question:~~

~~Is the third equation in (S.6) off by a sign error? Shouldn't the differential of (S.4) be dz3* + dv* = dz3, so shouldn't the coefficients be [0, 0, 1, 1], not [0, 0, 1, -1]?~~

For all of my points, I might be totally wrong. But it would be great if someone could double check ...

@sbarratt @bamos

AxelBreuer · 2023-05-03T20:31:04Z

Hi Akshay, thanks for your quick reply !

To be honest I haven't read https://arxiv.org/pdf/1705.00772.pdf as I worked with the formulas found in https://arxiv.org/pdf/1811.02157.pdf. In fact, thanks to your reference, I understand better now the naming convention [r,s,t] found in in the C++ code (the naming convention in my source article was [x,y,z]).

It is funny that you mention the, more elegant, solution of using l=t: In fact, prior to my pull request, this is exactly the formula I started with, but it failed: the gradient estimation worsened. It was really as a last attempt, and without much belief, that I tried l = rs[2] - x_i[2] instead of l = t - x_i[2] (which should be equivalent to l=t according to my understanding of _proj_exp_cone())... and it worked ! Basically, I implemented the third equality of equations (25) in https://arxiv.org/pdf/1811.02157.pdf to estimate the lagrange multiplier $\mu^*$. But I still cannot figure out why l=t didn't work out in the first place...

The most confusing part in the existing C++ code, is that vector rs should be named rst and t should be named mu or rho. At a later stage, it might be worth to modify these variable names for a better readability and comparison to equations in articles.

Apologies, but I do not understand your notation (S.4) and (S.6) hence I do not know what you are referring to (an equation in an article ? an equality in the C++ code ?). Could you give me some more explanations ?

akshayka · 2023-05-03T21:05:22Z

Basically, I implemented the third equality of equations (25) in https://arxiv.org/pdf/1811.02157.pdf to estimate the lagrange multiplier . But I still cannot figure out why l=t didn't work out in the first place...

Oh, interesting! Yes, you're right, the two should be similar (mathematically, they should be the same).
Did you try comparing the values of t and rs[2] - x_i[2], e.g. by printing them, for the problems you were testing?

I wonder if increasing EXP_CONE_MAX_ITERS in https://github.com/cvxgrp/diffcp/blob/master/cpp/src/cones.cpp would change anything ...

Apologies, but I do not understand your notation (S.4) and (S.6) hence I do not know what you are referring to (an equation in an article ? an equality in the C++ code ?). Would you give me some more explanations ?

Sorry about that. Actually I was totally mistaken in my additional question, there is no sign error, the coefficients are fine, so never mind that comment.

AxelBreuer · 2023-05-03T21:48:53Z

Did you try comparing the values of t and rs[2] - x_i[2], e.g. by printing them, for the problems you were testing?

No, I did not, until you asked :-)

Below are the values for the test case #57:

x_pert-x = [-2.93045271e-07 -2.07199327e-07 5.00255416e-07]
dx = [-2.92893024e-07 -2.07106723e-07 4.99999648e-07]
analytical = [-2.92893219e-07]

-> So, in this particular test case, rs[2] - x_i[2] and t are signifcantly different.

I wonder if increasing EXP_CONE_MAX_ITERS in https://github.com/cvxgrp/diffcp/blob/master/cpp/src/cones.cpp would change anything ...

Below the same test case as above, but this time with EXP_CONE_MAX_ITERS = 2000 (instead of 200 previously):

x_pert-x = [-2.93045271e-07 -2.07199327e-07 5.00255416e-07]
dx = [-2.92893024e-07 -2.07106723e-07 4.99999648e-07]
analytical = [-2.92893219e-07]

-> So, in this particular test case, EXP_CONE_MAX_ITERS does not improve things.

AxelBreuer · 2023-05-03T23:30:23Z

Regarding the two last observations:

...signifcantly different...

and

... not improve things ...

maybe this indicates that there is still a living bug in _proj_exp_cone() .

Do you have an article explaining the details of the 1D-Newton-based algorithm behind _proj_exp_cone() ? (Which is clearly not the Newton-based algorithm described in section 6.3.4 of https://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf#page65)

Never the less, would it be possible for you to validate my pull request for now ? or is this potential existence of a bug in the 1D-Newton based implementation blocking ?

AxelBreuer · 2023-05-04T13:53:47Z

A few remarks concerning the technical details of the algorithm behind _proj_exp_cone() :

I found some interesting details in https://github.com/matbesancon/MathOptSetDistances.jl/blob/master/src/projections.jl:
(line 293) function _exp_cone_proj_case_4(v::AbstractVector{T}; tol=1e-8) where {T}
which refers to 2 articles:

https://docs.mosek.com/whitepapers/expcone-proj.pdf (for a heuristic)
https://docs.mosek.com/slides/2018/ismp2018/ismp-friberg.pdf, p47-48 (when the heuristic fails)

Furthermore, I read on the recent release note of scs https://github.com/cvxgrp/scs/releases/tag/3.2.3

Exponential cone projection now about 50x faster!

I wanted to share these sources, as they could be useful for an eventual re-coding of _proj_exp_cone() (in the case of a potential bug in that function)

bodono · 2023-05-04T14:37:23Z

It looks like _proj_exp_cone() is using the old SCS exponential cone routine based on bisection on a dual variable. It's slow but pretty robust. SCS recently switched over to the Friberg method, which is much faster and from my experiments is also robust, and importantly is easier to get higher accuracy solutions from.

Original routine here.
New routine here.

AxelBreuer · 2023-05-04T15:04:51Z

@bodono thanks for your helpful feedback !

By any chance, do you remember why the returned value of rho, a.k.a t, is not the lagrange multipier ? is rho supposed to be the lagrange multiplier at all ?

(Btw, I have noticed that _proj_exp_cone() in original-scs has one more input argument w than _proj_exp_cone() in diffcp: the original-scs code was copied and modified, probably by setting w=0 : maybe that is the crux)

bodono · 2023-05-04T16:00:36Z

By any chance, do you remember why the returned value of rho, a.k.a t, is

not the lagrange multipier ? is rho supposed to be the lagrange multiplier at all ? Which function do you mean? I'm pretty sure it is finding *a* dual variable, but the question is which dual variable.

(Btw, I have noticed that _proj_exp_cone() in original-scs has one more

input argument w than _proj_exp_cone() in diffcp: the original-scs code was copied and modified, probably by setting w=0 : maybe that is the crux) Probably it was just forked from an older version without `w`. I think `w` is just a warm-start (guess) of the solution, which shouldn't affect correctness (if you're talking about `exp_newton_one_d`, I don't see a `w` in `_proj_exp_cone`). By the way, if you want to test if the old projection is matching the new one (which is totally different), you can simply update this [test]( https://github.com/cvxgrp/scs/blob/master/test/problems/test_exp_cone.h#L51) to add new test points and see if the outputs match the output of diffcp. To run the test simply `To test, run 'make test' and then '$(OUT)/run_tests_direct'`.

…

On Thu, May 4, 2023 at 4:05 PM Axel Breuer ***@***.***> wrote: @bodono <https://github.com/bodono> thanks for your helpful feedback ! By any chance, do you remember why the returned value of t is not the lagrange multipier ? is t supposed to be the lagrange multiplier at all ? I am curious, because I did some extra printing: - First, the iterations in exp_get_rho_ub() for the initial lower and upper boundary: exp_calc_grad(v, x, *lb) = 1.92585| exp_calc_grad(v, x, *ub) = 1.44511 exp_calc_grad(v, x, *lb) = 1.44511| exp_calc_grad(v, x, *ub) = 0.817791 exp_calc_grad(v, x, *lb) = 0.817791| exp_calc_grad(v, x, *ub) = 2.62944e-07 exp_calc_grad(v, x, *lb) = 2.62944e-07| exp_calc_grad(v, x, *ub) = -1.24797 As we start with lb=0 and ub = 0.125, at the end of the 4th iteration, we have lb=1 and lu=2. - Second, the iterations on i in _proj_exp_cone(): lb = 1 | ub = 2 | rho =1.5 lb = 1 | ub = 1.5 | rho =1.25 lb = 1 | ub = 1.25 | rho =1.125 lb = 1 | ub = 1.125 | rho =1.0625 lb = 1 | ub = 1.0625 | rho =1.03125 lb = 1 | ub = 1.03125 | rho =1.01562 lb = 1 | ub = 1.01562 | rho =1.00781 lb = 1 | ub = 1.00781 | rho =1.00391 lb = 1 | ub = 1.00391 | rho =1.00195 lb = 1 | ub = 1.00195 | rho =1.00098 lb = 1 | ub = 1.00098 | rho =1.00049 lb = 1 | ub = 1.00049 | rho =1.00024 lb = 1 | ub = 1.00024 | rho =1.00012 lb = 1 | ub = 1.00012 | rho =1.00006 lb = 1 | ub = 1.00006 | rho =1.00003 lb = 1 | ub = 1.00003 | rho =1.00002 lb = 1 | ub = 1.00002 | rho =1.00001 lb = 1 | ub = 1.00001 | rho =1 lb = 1 | ub = 1 | rho =1 lb = 1 | ub = 1 | rho =1 lb = 1 | ub = 1 | rho =1 lb = 1 | ub = 1 | rho =1 lb = 1 | ub = 1 | rho =1 lb = 1 | ub = 1 | rho =1 lb = 1 | ub = 1 | rho =1 lb = 1 | ub = 1 | rho =1 lb = 1 | ub = 1 | rho =1 lb = 1 | ub = 1 | rho =1 which shows that lb remains equal to 1 but ub converges to 1: hence rho=(lb+ub)/2 converges to 1. This numerical behavior seems a bit odd, as we know that the gradient for rho=1 is worth 2.62944e-07 (see the first print out when the algo initialized ul and ub) but not 0.0. I would hence have expected rho = 1 - epsilon. It is precisely for that expectation, that I was wondering if there is a bug in _proj_exp_cone() — Reply to this email directly, view it on GitHub <#59 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAEHQP3J5KGFPOIRVEOCZBTXEPAR3ANCNFSM6AAAAAAXURQDNE> . You are receiving this because you were mentioned.Message ID: ***@***.***>

-- Brendan O'Donoghue http://bodono.github.io/

AxelBreuer · 2023-05-04T16:22:15Z

Which function do you mean? I'm pretty sure it is finding a dual
variable, but the question is which dual variable.

Excellent question !

In https://web.stanford.edu/~boyd/papers/pdf/cone_prog_refine.pdf equation (13), the projection on the exponential cone is expressed as a least squares problem with 1 equality constraint. The associated lagrangian formulation is written just above equation (25). The lagrangian multiplier I am talking about since the beginning is that mu^star.

I thought that rho returned by _proj_exp_cone() was precisely that mu^star (maybe @akshayka was also thinking the same way).

But I might be completly wrong ! (which would be an excellent news for my pr, and a bad news for my rusty math skills)

akshayka · 2023-05-04T16:51:23Z

I thought that rho returned by _proj_exp_cone() was precisely that mu^star (maybe @akshayka was also thinking the same way).

Yes, I think that's what Shane and I thought at the time ... we could have been wrong.

@AxelBreuer , I suspect your change is correct ... Thanks for looking into this so deeply. Would it be possible for you to add a test to https://github.com/cvxgrp/diffcp/blob/master/tests/test_cone_prog_diff.py that exercises your change? That is, a test that fails without your change, and passes with it.

AxelBreuer · 2023-05-04T18:02:22Z

Great ! I will add the test tomorrow: here in Paris it's the end of day.

AxelBreuer · 2023-05-04T21:24:33Z

Couldn't wait: I have added the test

AxelBreuer · 2023-05-05T06:37:25Z

@bodono Out of curiosity, by any chance, do you know which constraint exactly rho is the dual of ?

bodono · 2023-05-05T09:12:48Z

The calculation of the gradient here shows that the cone has been reformulated to the relative-entropy cone in order to do the projection:

In other words, rather than the constraint in the projection being z = y exp(x/y) it's x = y log (z/y). This is entirely equivalent and makes the projection more numerically stable (if I recall correctly), but it changes the dual variable since the constraint is different. Since the dual variable is just an internal variable it doesn't matter to SCS, so long as the projection is correct.

AxelBreuer · 2023-05-05T09:15:36Z

Enligthening answer Brendan ! Thank you very much.

AxelBreuer · 2023-05-05T17:15:01Z

Hi, I found our discussion on the exponential cone projection so interesting that I had to write a note which sums up all details of the numerical implementaion. It is a quick note I made for myself but I sought you could find it useful (who knows, maybe another user is intrigued by your projection algorithm and you could just forward my note). Best, [projection.pdf](https://github.com/cvxgrp/diffcp/files/11408670/projection.pdf) Axel

bamos · 2023-05-05T17:18:00Z

Hi, I found our discussion on the exponential cone projection so interesting that I had to write a note which sums up all details of the numerical implementaion. It is a quick note I made for myself but I sought you could find it useful (who knows, maybe another user is intrigued by your projection algorithm and you could just forward my note).

Yes, it's been great to follow along! I can't see the attachment here on github, can you try uploading it again?

AxelBreuer · 2023-05-05T17:30:08Z

just did (I re-edited my previous message), I hope it worked this time.

akshayka

LGTM! Thanks so much @AxelBreuer

akshayka · 2023-05-05T18:15:10Z

@AxelBreuer and @bodono , thanks so much for helping us fix this bug. It's been open for far too long.

The change looks good to me!

bodono · 2023-05-07T11:31:04Z

It is a quick note I made for myself but I sought you could find it useful (who knows, maybe another user is intrigued by your projection algorithm and you could just forward my note). Best, projection.pdf

Oh nice, thanks for writing this up! Yes, that's basically the algorithm I implemented. The only thing I would add is that the the dual variable is bisection on the gradient of a concave function, the solution is guaranteed to exist and bisection is guaranteed to work once we have upper and lower bounds.

I should say that I tried a lot of different approaches, many different equivalent reformulations of the cone and many, many different Newton methods (2d, 3d, 4d etc). This approach was the most robust by far, even if it was slower than the Newton methods when they worked. I only recently updated it to the Friberg algorithm, which appears to be faster and at least as robust.

fix the derivative of the projection on exponential cones

5a79c79

add test_dprojection_exp()

70a68f0

akshayka approved these changes May 5, 2023

View reviewed changes

akshayka merged commit fc69c3f into cvxgrp:master May 5, 2023
16 checks passed

agramfort mentioned this pull request May 7, 2023

Large errors on hypergradient of logistic regression cvxgrp/cvxpylayers#85

Closed

This was referenced May 9, 2023

Inaccurate gradient estimation for log penalization #57

Closed

Inaccurate gradient estimation for log penalization cvxgrp/cvxpylayers#135

Closed

currymj mentioned this pull request May 9, 2023

Apparently wrong gradients for simple softmax-as-optimization formulation cvxgrp/cvxpylayers#145

Closed

AxelBreuer mentioned this pull request May 9, 2023

Inaccurate gradient estimation for Gumbel-Sinkhorn cvxgrp/cvxpylayers#120

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix the derivative of the projection on exponential cones #59

fix the derivative of the projection on exponential cones #59

AxelBreuer commented May 3, 2023

akshayka commented May 3, 2023 •

edited

AxelBreuer commented May 3, 2023 •

edited

akshayka commented May 3, 2023

AxelBreuer commented May 3, 2023

AxelBreuer commented May 3, 2023 •

edited

AxelBreuer commented May 4, 2023 •

edited

bodono commented May 4, 2023

AxelBreuer commented May 4, 2023 •

edited

bodono commented May 4, 2023 via email

AxelBreuer commented May 4, 2023 •

edited

akshayka commented May 4, 2023

AxelBreuer commented May 4, 2023

AxelBreuer commented May 4, 2023

AxelBreuer commented May 5, 2023 •

edited

bodono commented May 5, 2023

AxelBreuer commented May 5, 2023

AxelBreuer commented May 5, 2023 via email •

edited

bamos commented May 5, 2023

AxelBreuer commented May 5, 2023 •

edited

akshayka left a comment

akshayka commented May 5, 2023

bodono commented May 7, 2023

fix the derivative of the projection on exponential cones #59

fix the derivative of the projection on exponential cones #59

Conversation

AxelBreuer commented May 3, 2023

akshayka commented May 3, 2023 • edited

AxelBreuer commented May 3, 2023 • edited

akshayka commented May 3, 2023

AxelBreuer commented May 3, 2023

AxelBreuer commented May 3, 2023 • edited

AxelBreuer commented May 4, 2023 • edited

bodono commented May 4, 2023

AxelBreuer commented May 4, 2023 • edited

bodono commented May 4, 2023 via email

AxelBreuer commented May 4, 2023 • edited

akshayka commented May 4, 2023

AxelBreuer commented May 4, 2023

AxelBreuer commented May 4, 2023

AxelBreuer commented May 5, 2023 • edited

bodono commented May 5, 2023

AxelBreuer commented May 5, 2023

AxelBreuer commented May 5, 2023 via email • edited

bamos commented May 5, 2023

AxelBreuer commented May 5, 2023 • edited

akshayka left a comment

Choose a reason for hiding this comment

akshayka commented May 5, 2023

bodono commented May 7, 2023

akshayka commented May 3, 2023 •

edited

AxelBreuer commented May 3, 2023 •

edited

AxelBreuer commented May 3, 2023 •

edited

AxelBreuer commented May 4, 2023 •

edited

AxelBreuer commented May 4, 2023 •

edited

AxelBreuer commented May 4, 2023 •

edited

AxelBreuer commented May 5, 2023 •

edited

AxelBreuer commented May 5, 2023 via email •

edited

AxelBreuer commented May 5, 2023 •

edited