Add dense BFGS and compact LBFGS algorithms #221
Conversation
I am currently not able to reproduce the failure in the CI locally. Investigating what's going on.
Codecov Report

```
@@            Coverage Diff             @@
##           master     #221      +/-   ##
==========================================
+ Coverage   74.03%   75.09%   +1.05%
==========================================
  Files          38       39       +1
  Lines        3871     4079     +208
==========================================
+ Hits         2866     3063     +197
- Misses       1005     1016      +11
```
src/quasi_newton.jl
```diff
@@ -262,24 +267,24 @@ function update!(qn::CompactLBFGS{T, VT, MT}, Bk, sk, yk) where {T, VT, MT}
     # [ U₂ ]          [ U₂ ]

     # Step 1: σₖ I
-    sigma = dot(sk, yk) / dot(sk, sk)                 # σₖ
+    sigma = curvature(Val(qn.init_strategy), sk, yk)  # σₖ
     Bk .= sigma                                       # Hₖ .= σₖ I (diagonal Hessian approx.)
```
This will likely cause a dynamic dispatch because the value of `qn.init_strategy` is not known at compile time. I suggest storing `Val(qn.init_strategy)` directly in the `qn` type and giving it a type parameter.
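A minimal sketch of the suggested pattern, with hypothetical enum and struct names (the PR's actual types and fields may differ): wrapping the strategy in `Val` at construction time moves the branch from runtime to dispatch.

```julia
using LinearAlgebra

# Hypothetical strategy tags; the PR's actual enum values may differ.
@enum InitStrategy SCALAR1 SCALAR2

# Store Val(strategy) in a type parameter so `curvature` dispatches statically.
struct QNConfig{IS}
    init_strategy::IS
end
QNConfig(s::InitStrategy) = QNConfig(Val(s))  # wrap once, at construction

curvature(::Val{SCALAR1}, sk, yk) = dot(yk, yk) / dot(sk, yk)  # σₖ = yᵀy / sᵀy
curvature(::Val{SCALAR2}, sk, yk) = dot(sk, yk) / dot(sk, sk)  # σₖ = sᵀy / sᵀs

qn = QNConfig(SCALAR1)
sigma = curvature(qn.init_strategy, rand(3), rand(3))  # no runtime branch on a field value
```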
I don't claim to understand everything in this PR, but I am happy to test it once it's ready.
FYI, here is a benchmark of MadNLP's LBFGS against Ipopt's LBFGS algorithm (using SCALAR1 init and mem_size=6) on a subset of the CUTEst benchmark: https://web.cels.anl.gov/~fpacaud/result_lbfgs.txt. Overall, Ipopt's LBFGS performs better than MadNLP's. I am currently working on improving MadNLP's LBFGS to match Ipopt's performance.
- quasi-Newton approximation of the Lagrangian's Hessian, implemented as a dense KKT system
- AbstractKKTSystem is now parameterized by a HessianApproximation
- add two options: DENSE_BFGS and DENSE_DAMPED_BFGS
- add support for DENSE_CONDENSED_BFGS
- add jtprod! and jprod! functions for HS15Model and DummyQPModel in MadNLPTests
reference: Byrd, Richard H., Jorge Nocedal, and Robert B. Schnabel. "Representations of quasi-Newton matrices and their use in limited memory methods." Mathematical Programming 63, no. 1 (1994): 129-156.
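For context, the compact representation from that paper (transcribed here for convenience; see Theorem 2.3 of the reference for the authoritative statement) expresses the BFGS matrix after k updates as:

```latex
% S_k = [s_0, ..., s_{k-1}],  Y_k = [y_0, ..., y_{k-1}],
% D_k = diag(s_0^T y_0, ..., s_{k-1}^T y_{k-1}),
% L_k = strictly lower triangular part of S_k^T Y_k.
B_k = \sigma_k I -
\begin{bmatrix} \sigma_k S_k & Y_k \end{bmatrix}
\begin{bmatrix}
  \sigma_k S_k^\top S_k & L_k \\
  L_k^\top & -D_k
\end{bmatrix}^{-1}
\begin{bmatrix} \sigma_k S_k^\top \\ Y_k^\top \end{bmatrix}
```

The matrix `Mₖ = σₖ SₖᵀSₖ + Lₖ Dₖ⁻¹ Lₖᵀ` appearing in the code below arises from block elimination of the `−Dₖ` block of the middle matrix.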
Looks good, @frapac! I have made just a few comments.

To make it possible to use BFGS with JuMP, I guess we need to implement `jtprod!` for `MOIModel`. I'd suggest implementing it in a separate PR, and then we can do a major release.
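Not part of this PR, but for the record, a rough sketch of what `jtprod!` for `MOIModel` could look like, accumulating `Jᵀv` from the sparse Jacobian triplets. The `nlp.model` field and the exact `MOIModel` layout are assumptions; only the `MOI.jacobian_structure` / `MOI.eval_constraint_jacobian` evaluator calls are standard.

```julia
import MathOptInterface as MOI
import NLPModels

# Hedged sketch: compute jtv = J(x)ᵀ v from the MOI sparse Jacobian triplets.
function NLPModels.jtprod!(nlp::MOIModel, x::AbstractVector, v::AbstractVector, jtv::AbstractVector)
    structure = MOI.jacobian_structure(nlp.model)     # Vector of (row, col) pairs
    vals = similar(x, length(structure))
    MOI.eval_constraint_jacobian(nlp.model, vals, x)  # nonzeros in triplet order
    fill!(jtv, zero(eltype(jtv)))
    for (k, (i, j)) in enumerate(structure)
        jtv[j] += vals[k] * v[i]                      # accumulate Jᵀv entrywise
    end
    return jtv
end
```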
```diff
@@ -12,6 +12,7 @@ import .CUBLAS: handle, CUBLAS_DIAG_NON_UNIT,
 import KernelAbstractions: @kernel, @index, wait, Event
 import CUDAKernels: CUDADevice

+import MadNLP: NLPModels
```
we can do

```julia
import MadNLP:
    MadNLP, NLPModels, @kwdef, MadNLPLogger, @debug, @warn, @error,
    AbstractOptions, AbstractLinearSolver, AbstractNLPModel, set_options!,
    SymbolicException, FactorizationException, SolveException, InertiaException,
    introduce, factorize!, solve!, improve!, is_inertia, inertia, tril_to_full!,
    LapackOptions, input_type, is_supported, default_options, symul!
```
```diff
@@ -51,6 +51,26 @@ function NLPModels.jac_coord!(nlp::HS15Model, x::AbstractVector, J::AbstractVect
     return J
 end

+function NLPModels.jprod!(nlp::HS15Model, x::AbstractVector, v::AbstractVector, jv::AbstractVector)
```
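The hunk is truncated here; a sketch of what the bodies could look like, assuming MadNLPTests' `HS15Model` uses the standard HS15 constraints `c(x) = (x₁x₂, x₁ + x₂²)`, so that the Jacobian is `J = [x₂ x₁; 1 2x₂]` (constraint ordering is an assumption):

```julia
import NLPModels

# Hedged sketch of jv = J(x) v for HS15.
function NLPModels.jprod!(nlp::HS15Model, x::AbstractVector, v::AbstractVector, jv::AbstractVector)
    jv[1] = x[2] * v[1] + x[1] * v[2]   # ∇(x₁x₂)ᵀ v
    jv[2] = v[1] + 2.0 * x[2] * v[2]    # ∇(x₁ + x₂²)ᵀ v
    return jv
end

# And its transpose counterpart, jtv = J(x)ᵀ v.
function NLPModels.jtprod!(nlp::HS15Model, x::AbstractVector, v::AbstractVector, jtv::AbstractVector)
    jtv[1] = x[2] * v[1] + v[2]
    jtv[2] = x[1] * v[1] + 2.0 * x[2] * v[2]
    return jtv
end
```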
Are we currently using this for testing? I think it would be good to use this instance for testing the BFGS approximations.
I added a test for `HS15Model` and LBFGS. The problem is that this instance is nonconvex, so MadNLP with the exact Hessian returns a different solution than MadNLP with LBFGS/BFGS. Maybe we should use HS71 instead?
src/quasi_newton.jl
```julia
qn.Mk .= qn.SdotS                           # Mₖ = Sₖᵀ Sₖ
mul!(qn.Mk, qn.Lk, qn.DkLk, one(T), sigma)  # Mₖ = σₖ Sₖᵀ Sₖ + Lₖ Dₖ⁻¹ Lₖᵀ
symmetrize!(qn.Mk)
Jk = cholesky(qn.Mk).L                      # Mₖ = Jₖᵀ Jₖ (factorization)
```
It is likely that we can reduce allocations by using `cholesky!` (we might be able to optimize further by directly calling CHOLMOD, but that might be a bit too much work). I'd recommend storing the factorization as `qn.MkF` and calling `cholesky!(qn.MkF, qn.Mk)`.
This has been implemented! But I agree that, in the medium term, it would be better to use our own wrapper for CHOLMOD.
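Note that the stdlib `LinearAlgebra.cholesky!` overwrites its argument in place rather than taking a destination factorization, so one way to avoid per-update allocation is to factorize a preallocated buffer. A minimal sketch (the buffer name is hypothetical):

```julia
using LinearAlgebra

Mk = [4.0 1.0; 1.0 3.0]              # stand-in for qn.Mk
Mk_buffer = similar(Mk)              # allocated once, reused at every update

copyto!(Mk_buffer, Mk)               # keep Mₖ intact for later reuse
F = cholesky!(Symmetric(Mk_buffer))  # factorizes in place, no new matrix
Jk = F.L                             # lower-triangular Cholesky factor of Mₖ
```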
src/quasi_newton.jl

```julia
# Step 1: σₖ I
sigma = curvature(Val(qn.init_strategy), sk, yk)  # σₖ
Bk .= sigma                                       # Hₖ .= σₖ I (diagonal Hessian approx.)
```
no `Bk[diagind(Bk)]` here?
No! We use LBFGS only in sparse mode, so `Bk` is always a vector here.
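In other words, in sparse mode only the diagonal of the approximation is stored, so the broadcast already fills the whole diagonal (an illustrative sketch, not the PR's code):

```julia
n, sigma = 4, 0.5
Bk = zeros(n)   # diagonal of Bₖ stored as a vector, not an n×n matrix
Bk .= sigma     # fills every diagonal entry; the dense analogue would be
                # Bdense[diagind(Bdense)] .= sigma
```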
Apologies for the long wait, @frapac! I have only very minor comments, and it looks good!
src/KKT/dense.jl
```julia
hess::MT
jac::MT
qn::QN
```
I'd suggest naming `qn` as `qnewton` or something more indicative.
src/KKT/sparse.jl
```julia
hess_I = Vector{Int32}(undef, get_nnzh(nlp.meta))
hess_J = Vector{Int32}(undef, get_nnzh(nlp.meta))
hess_structure!(nlp, hess_I, hess_J)
if QN <: ExactHessian
```
Can we use multiple dispatch here instead? And I wonder if `hess_I` and `hess_J` are ever used for quasi-Newton.
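A rough sketch of the dispatch-based alternative; `build_hessian_structure` and `AbstractQuasiNewton` are hypothetical names for illustration (only `ExactHessian` appears in the PR), while the `get_nnzh`/`get_nvar`/`hess_structure!` calls are standard NLPModels API:

```julia
import NLPModels: get_nnzh, get_nvar, hess_structure!

# Exact Hessian: query the model's native sparsity pattern.
function build_hessian_structure(nlp, ::Type{<:ExactHessian})
    hess_I = Vector{Int32}(undef, get_nnzh(nlp.meta))
    hess_J = Vector{Int32}(undef, get_nnzh(nlp.meta))
    hess_structure!(nlp, hess_I, hess_J)
    return hess_I, hess_J
end

# Quasi-Newton: only the diagonal is stored, so the pattern is just (i, i).
function build_hessian_structure(nlp, ::Type{<:AbstractQuasiNewton})
    n = get_nvar(nlp.meta)
    return collect(Int32, 1:n), collect(Int32, 1:n)
end
```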
@sshin23 Thank you for the comments! I have updated the PR accordingly. Let me know if you want me to modify anything else.
Indeed, we have to define the sparsity pattern of the diagonal elements for the quasi-Newton method, in order to store the elements associated to the diagonal approximation.
Right now BFGS is implemented as a `BFGSKKTSystem`. I am not sure this is the right abstraction, as it leads to a lot of duplicated code. I am currently working on another abstraction that would allow supporting BFGS directly at the condensed level.

Remaining todos:
- `DenseCondensedKKTSystem` with BFGS
- `DampedBFGS`