-
Notifications
You must be signed in to change notification settings - Fork 25
Merge LPGD into diffcp #67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is great! I'm not sure what the failing builds are about. |
|
The CI issues are a symptom of the master branch CI being broken. Fixing it is on my goals for the long weekend. I'd love a chance to review this PR carefully before merging, but on a quick pass it looks great Anselm! Is there any timeline you need this merged by? |
|
Glad to hear you like it! There is no rush to merge it from my side, please take your time to carefully review it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! A few small questions. Sorry for the long delay on the review.
diffcp/cone_program.py
Outdated
|
|
||
| return dA, db, dc | ||
|
|
||
| def derivative_lpgd(dA, db, dc, tau, rho): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have quadratic objective support? I noticed you weren't banning them in construction above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is probably good to discuss. The LPGD method supports derivatives and adjoint derivatives for all parameters, including the quadratic P term here. The reason I didn't include it was to not break some compatibility by changing the arguments/outputs of the exposed derivative and adjoint_derivative methods (and their batched versions). I guess the derivative would be simple to resolve by just adding an optional dP=None argument, but for the adjoint derivative one would need to change the number of outputs. I was thinking about adding a request_dP flag? If you think this makes sense I can come up with some proposal changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the differentiation w.r.t. P now, see the latest commits. I added two examples but have not tested this extensively
…ion wrt P is possible.
* Fix: update the toy example in README.md (cvxgrp#74) * Fix PSD Constraint Ordering for Clarabel (cvxgrp#73) * Reorder PSD Constraint Rows for Clarabel Diffcp requires that the `A` matrix conforms to the SCS format, including that PSD constraints refer to the columnwise stacking of the lower triangle. Clarabel expects that the PSD constraints refer to the upper triangle, leading to an incorrect result when using Clarabel with PSD constraints. A unit test showing the issue was added to test_clarabel. In this test, X is a 3x3 PSD matrix and we wish to minimize X00 - X22 while constraining the trace to be equal to 1. Before this change, Clarabel reports success, but the resulting matrix isn't even PSD. After this change, we get the expected result. * move pytest to existing dependency group * test on larger matrix and fix bug * permute rows of b as well * fix shuffling of b matrix * Merge LPGD into diffcp (cvxgrp#67) * Update README.md * Add LPGD * Update Readme * simplify argument passing * add more lpgd examples * Catch error if perturbed problem infeasible * Give better error message on infeasibility * fix docstrings * minor changes * cosmetic changes * update readme * move perturbed solution computation to utils * cosmetics * resolve domments on pull request + minor edits * add differentiation w.r.t. P * add examples for differentiating w.r.t. P * minor cosmetic change * mention in docstring that return_dP is currently only possible in LPGD mode * Update readme for for quadratic objectives, specify when differentiation wrt P is possible. * Testing merged version, examples run but remove some ununsed imports --------- Co-authored-by: Yuwen Chen <37250191+yuwenchen95@users.noreply.github.com> Co-authored-by: Erick Fuentes <fuentes.erick@gmail.com> Co-authored-by: Anselm Paulus <40758879+a-paulus@users.noreply.github.com>
Pull request for enabling LPGD differentiation of the conic program in diffcp.
LPGD info
LPGD computes informative replacements for the true derivatives in degenerate cases as efficient finite differences.
For the forward derivatives this implementation just computes standard finite differences (with an additional optional regularization term).
For adjoint derivatives we compute finite differences between gradients of the conic program Lagrangian, evaluated at the original solution and a perturbed solution, requiring only one (two if double-sided) additional solver evaluations. See the paper for a detailed derivation of the LPGD adjoint derivatives as the gradient of an envelope function to the linearized loss.
Note that in the limit of small perturbations
tau, LPGD computes the true derivatives (if they exist). For largertauthe computed derivatives do not match the true derivatives but can provide more informative signal.Code
LPGD can be enabled with the
mode=LPGDargument ofsolve_and_derivative. It also requires passing the perturbation strengthtau(and optionally the regularization strengthrho) withderivative_kwargs=dict(tau=0.1, rho=0.1). Alternatively the derivative kwargs can be passed directly, e.g.adjoint_derivative(dx, dy, ds, tau=0.1, rho=0.1)In the code the main addition are the methods
derivative_lpgd/adjoint_derivative_lpgdincone_program.py. These methods internally callcompute_perturbed_solution/compute_adjoint_perturbed_solutionto get the solution to a perturbed optimization problem, and then return the derivatives as finite differences.For testing, the existing diffcp examples are included as modified versions using LPGD differentiation.
Note on implementation: If activated, the optional regularization requires solving a quadratic cone problem, i.e. setting
P!=0. For this reason we added an optionalP=Nonekwarg tosolve_internalwhich is passed to the solver if quadratic objectives are supported.