Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix the derivative of the projection on exponential cones #59

Merged
merged 2 commits into from
May 5, 2023

Conversation

AxelBreuer
Copy link
Contributor

this fix corrects the wrong differentials reported in
#57
#58
cvxgrp/cvxpylayers#135

@akshayka
Copy link
Member

akshayka commented May 3, 2023

Hey @AxelBreuer, thanks for making this PR. It's great this makes the tests pass. Can you point to the math that motivates this change?

It's been a while since I've looked at the math (and this code). But I think t, i.e. the optimal dual variable associated with the projection problem, should appear in J_inv.

I'm looking at pages 23-24 of https://arxiv.org/pdf/1705.00772.pdf. With the caveat that I haven't looked at this code or math in a long time, I believe J_inv should be equal to the matrix D in equation (S.6), except for some reason we've permuted the rows in our code. Converting to their notation, rs equals [z1*, z2*, z3*], x = [z1, z2, z3], and t = nu*.

If that's correct, then I think (one) of the fixes is to just set l = t (or in other words eliminate l).

I have an additional question:

  • Is the third equation in (S.6) off by a sign error? Shouldn't the differential of (S.4) be dz3* + dv* = dz3, so shouldn't the coefficients be [0, 0, 1, 1], not [0, 0, 1, -1]?

For all of my points, I might be totally wrong. But it would be great if someone could double check ...

@sbarratt @bamos

@AxelBreuer
Copy link
Contributor Author

AxelBreuer commented May 3, 2023

Hi Akshay, thanks for your quick reply !

To be honest I haven't read https://arxiv.org/pdf/1705.00772.pdf as I worked with the formulas found in https://arxiv.org/pdf/1811.02157.pdf. In fact, thanks to your reference, I understand better now the naming convention [r,s,t] found in in the C++ code (the naming convention in my source article was [x,y,z]).

It is funny that you mention the, more elegant, solution of using l=t: In fact, prior to my pull request, this is exactly the formula I started with, but it failed: the gradient estimation worsened. It was really as a last attempt, and without much belief, that I tried l = rs[2] - x_i[2] instead of l = t - x_i[2] (which should be equivalent to l=t according to my understanding of _proj_exp_cone())... and it worked ! Basically, I implemented the third equality of equations (25) in https://arxiv.org/pdf/1811.02157.pdf to estimate the lagrange multiplier $\mu^*$. But I still cannot figure out why l=t didn't work out in the first place...

The most confusing part in the existing C++ code, is that vector rs should be named rst and t should be named mu or rho. At a later stage, it might be worth to modify these variable names for a better readability and comparison to equations in articles.

Apologies, but I do not understand your notation (S.4) and (S.6) hence I do not know what you are referring to (an equation in an article ? an equality in the C++ code ?). Could you give me some more explanations ?

@akshayka
Copy link
Member

akshayka commented May 3, 2023

Basically, I implemented the third equality of equations (25) in https://arxiv.org/pdf/1811.02157.pdf to estimate the lagrange multiplier . But I still cannot figure out why l=t didn't work out in the first place...

Oh, interesting! Yes, you're right, the two should be similar (mathematically, they should be the same).
Did you try comparing the values of t and rs[2] - x_i[2], e.g. by printing them, for the problems you were testing?

I wonder if increasing EXP_CONE_MAX_ITERS in https://github.com/cvxgrp/diffcp/blob/master/cpp/src/cones.cpp would change anything ...

Apologies, but I do not understand your notation (S.4) and (S.6) hence I do not know what you are referring to (an equation in an article ? an equality in the C++ code ?). Would you give me some more explanations ?

Sorry about that. Actually I was totally mistaken in my additional question, there is no sign error, the coefficients are fine, so never mind that comment.

@AxelBreuer
Copy link
Contributor Author

Did you try comparing the values of t and rs[2] - x_i[2], e.g. by printing them, for the problems you were testing?

No, I did not, until you asked :-)

Below are the values for the test case #57:

x_pert-x = [-2.93045271e-07 -2.07199327e-07 5.00255416e-07]
dx = [-2.92893024e-07 -2.07106723e-07 4.99999648e-07]
analytical = [-2.92893219e-07]

rs[2] - x_i[2] = 0.707107 | t = 1
rs[2] - x_i[2] = 1.70711 | t = 1
rs[2] - x_i[2] = 0.707107 | t = 1
rs[2] - x_i[2] = 1.70711 | t = 1
rs[2] - x_i[2] = 0.707108 | t = 1
rs[2] - x_i[2] = 1.70711 | t = 1
rs[2] - x_i[2] = 0.707108 | t = 1
rs[2] - x_i[2] = 1.70711 | t = 1

-> So, in this particular test case, rs[2] - x_i[2] and t are signifcantly different.

I wonder if increasing EXP_CONE_MAX_ITERS in https://github.com/cvxgrp/diffcp/blob/master/cpp/src/cones.cpp would change anything ...

Below the same test case as above, but this time with EXP_CONE_MAX_ITERS = 2000 (instead of 200 previously):

x_pert-x = [-2.93045271e-07 -2.07199327e-07 5.00255416e-07]
dx = [-2.92893024e-07 -2.07106723e-07 4.99999648e-07]
analytical = [-2.92893219e-07]

rs[2] - x_i[2] = 0.707107 | t = 1
rs[2] - x_i[2] = 1.70711 | t = 1
rs[2] - x_i[2] = 0.707107 | t = 1
rs[2] - x_i[2] = 1.70711 | t = 1
rs[2] - x_i[2] = 0.707108 | t = 1
rs[2] - x_i[2] = 1.70711 | t = 1
rs[2] - x_i[2] = 0.707108 | t = 1
rs[2] - x_i[2] = 1.70711 | t = 1

-> So, in this particular test case, EXP_CONE_MAX_ITERS does not improve things.

@AxelBreuer
Copy link
Contributor Author

AxelBreuer commented May 3, 2023

Regarding the two last observations:

...signifcantly different...

and

... not improve things ...

maybe this indicates that there is still a living bug in _proj_exp_cone() .

Do you have an article explaining the details of the 1D-Newton-based algorithm behind _proj_exp_cone() ? (Which is clearly not the Newton-based algorithm described in section 6.3.4 of https://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf#page65)

Never the less, would it be possible for you to validate my pull request for now ? or is this potential existence of a bug in the 1D-Newton based implementation blocking ?

@AxelBreuer
Copy link
Contributor Author

AxelBreuer commented May 4, 2023

A few remarks concerning the technical details of the algorithm behind _proj_exp_cone() :

I found some interesting details in https://github.com/matbesancon/MathOptSetDistances.jl/blob/master/src/projections.jl:
(line 293) function _exp_cone_proj_case_4(v::AbstractVector{T}; tol=1e-8) where {T}
which refers to 2 articles:

  1. https://docs.mosek.com/whitepapers/expcone-proj.pdf (for a heuristic)
  2. https://docs.mosek.com/slides/2018/ismp2018/ismp-friberg.pdf, p47-48 (when the heuristic fails)

Furthermore, I read on the recent release note of scs https://github.com/cvxgrp/scs/releases/tag/3.2.3

Exponential cone projection now about 50x faster!

I wanted to share these sources, as they could be useful for an eventual re-coding of _proj_exp_cone() (in the case of a potential bug in that function)

@bodono
Copy link
Member

bodono commented May 4, 2023

It looks like _proj_exp_cone() is using the old SCS exponential cone routine based on bisection on a dual variable. It's slow but pretty robust. SCS recently switched over to the Friberg method, which is much faster and from my experiments is also robust, and importantly is easier to get higher accuracy solutions from.

Original routine here.
New routine here.

@AxelBreuer
Copy link
Contributor Author

AxelBreuer commented May 4, 2023

@bodono thanks for your helpful feedback !

By any chance, do you remember why the returned value of rho, a.k.a t, is not the lagrange multipier ? is rho supposed to be the lagrange multiplier at all ?

(Btw, I have noticed that _proj_exp_cone() in original-scs has one more input argument w than _proj_exp_cone() in diffcp: the original-scs code was copied and modified, probably by setting w=0 : maybe that is the crux)

@bodono
Copy link
Member

bodono commented May 4, 2023 via email

@AxelBreuer
Copy link
Contributor Author

AxelBreuer commented May 4, 2023

Which function do you mean? I'm pretty sure it is finding a dual
variable, but the question is which dual variable.

Excellent question !

In https://web.stanford.edu/~boyd/papers/pdf/cone_prog_refine.pdf equation (13), the projection on the exponential cone is expressed as a least squares problem with 1 equality constraint. The associated lagrangian formulation is written just above equation (25). The lagrangian multiplier I am talking about since the beginning is that mu^star.

I thought that rho returned by _proj_exp_cone() was precisely that mu^star (maybe @akshayka was also thinking the same way).

But I might be completly wrong ! (which would be an excellent news for my pr, and a bad news for my rusty math skills)

@akshayka
Copy link
Member

akshayka commented May 4, 2023

I thought that rho returned by _proj_exp_cone() was precisely that mu^star (maybe @akshayka was also thinking the same way).

Yes, I think that's what Shane and I thought at the time ... we could have been wrong.

@AxelBreuer , I suspect your change is correct ... Thanks for looking into this so deeply. Would it be possible for you to add a test to https://github.com/cvxgrp/diffcp/blob/master/tests/test_cone_prog_diff.py that exercises your change? That is, a test that fails without your change, and passes with it.

@AxelBreuer
Copy link
Contributor Author

Great ! I will add the test tomorrow: here in Paris it's the end of day.

@AxelBreuer
Copy link
Contributor Author

Couldn't wait: I have added the test

@AxelBreuer
Copy link
Contributor Author

AxelBreuer commented May 5, 2023

@bodono Out of curiosity, by any chance, do you know which constraint exactly rho is the dual of ?

@bodono
Copy link
Member

bodono commented May 5, 2023

The calculation of the gradient here shows that the cone has been reformulated to the relative-entropy cone in order to do the projection:

Screenshot 2023-05-05 at 10 11 38

In other words, rather than the constraint in the projection being z = y exp(x/y) it's x = y log (z/y). This is entirely equivalent and makes the projection more numerically stable (if I recall correctly), but it changes the dual variable since the constraint is different. Since the dual variable is just an internal variable it doesn't matter to SCS, so long as the projection is correct.

@AxelBreuer
Copy link
Contributor Author

Enligthening answer Brendan ! Thank you very much.

@AxelBreuer
Copy link
Contributor Author

AxelBreuer commented May 5, 2023 via email

@bamos
Copy link
Collaborator

bamos commented May 5, 2023

Hi, I found our discussion on the exponential cone projection so interesting that I had to write a note which sums up all details of the numerical implementaion. It is a quick note I made for myself but I sought you could find it useful (who knows, maybe another user is intrigued by your projection algorithm and you could just forward my note).

Yes, it's been great to follow along! I can't see the attachment here on github, can you try uploading it again?

@AxelBreuer
Copy link
Contributor Author

AxelBreuer commented May 5, 2023

just did (I re-edited my previous message), I hope it worked this time.

Copy link
Member

@akshayka akshayka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks so much @AxelBreuer

@akshayka
Copy link
Member

akshayka commented May 5, 2023

@AxelBreuer and @bodono , thanks so much for helping us fix this bug. It's been open for far too long.

The change looks good to me!

@akshayka akshayka merged commit fc69c3f into cvxgrp:master May 5, 2023
16 checks passed
@bodono
Copy link
Member

bodono commented May 7, 2023

It is a quick note I made for myself but I sought you could find it useful (who knows, maybe another user is intrigued by your projection algorithm and you could just forward my note). Best, projection.pdf

Oh nice, thanks for writing this up! Yes, that's basically the algorithm I implemented. The only thing I would add is that the the dual variable is bisection on the gradient of a concave function, the solution is guaranteed to exist and bisection is guaranteed to work once we have upper and lower bounds.

I should say that I tried a lot of different approaches, many different equivalent reformulations of the cone and many, many different Newton methods (2d, 3d, 4d etc). This approach was the most robust by far, even if it was slower than the Newton methods when they worked. I only recently updated it to the Friberg algorithm, which appears to be faster and at least as robust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants