
Add dense BFGS and compact LBFGS algorithms #221

Merged · 21 commits into master · Mar 23, 2023

Conversation

@frapac (Collaborator) commented on Sep 8, 2022

Right now BFGS is implemented as a BFGSKKTSystem. I am not sure this is the right abstraction, as it leads to a lot of duplicated code. I am currently working on another abstraction that would allow supporting BFGS directly at the condensed level.

Remaining todos:

  • Support DenseCondensedKKTSystem with BFGS
  • Add proper interface to use DampedBFGS
  • Add support for GPU

@frapac requested a review from sshin23 on September 8, 2022
@frapac marked this pull request as ready for review on September 8, 2022
@frapac (Collaborator, Author) commented on Sep 9, 2022

I am currently not able to reproduce the failure in the CI locally. Investigating what's going on.

@codecov-commenter commented on Sep 9, 2022

Codecov Report

Merging #221 (9d29075) into master (15c227e) will increase coverage by 1.05%.
The diff coverage is 97.79%.

@@            Coverage Diff             @@
##           master     #221      +/-   ##
==========================================
+ Coverage   74.03%   75.09%   +1.05%     
==========================================
  Files          38       39       +1     
  Lines        3871     4079     +208     
==========================================
+ Hits         2866     3063     +197     
- Misses       1005     1016      +11     
Impacted Files Coverage Δ
src/KKT/KKTsystem.jl 97.36% <ø> (ø)
src/MadNLP.jl 0.00% <ø> (ø)
src/options.jl 94.11% <80.00%> (-2.55%) ⬇️
src/quasi_newton.jl 95.49% <95.49%> (ø)
src/IPM/IPM.jl 98.95% <100.00%> (+0.08%) ⬆️
src/IPM/callbacks.jl 93.80% <100.00%> (+1.85%) ⬆️
src/IPM/factorization.jl 100.00% <100.00%> (ø)
src/KKT/dense.jl 97.20% <100.00%> (-2.80%) ⬇️
src/KKT/sparse.jl 99.36% <100.00%> (+0.02%) ⬆️
src/LinearSolvers/linearsolvers.jl 69.23% <100.00%> (+49.23%) ⬆️
... and 2 more


@frapac changed the title from "Add dense BFGS algorithm" to "Add dense BFGS and compact LBFGS algorithms" on Oct 5, 2022
@@ -262,24 +267,24 @@ function update!(qn::CompactLBFGS{T, VT, MT}, Bk, sk, yk) where {T, VT, MT}
    # [ U₂ ]

    # Step 1: σₖ I
-   sigma = dot(sk, yk) / dot(sk, sk)                 # σₖ
+   sigma = curvature(Val(qn.init_strategy), sk, yk)  # σₖ
    Bk .= sigma                                       # Hₖ .= σₖ I (diagonal Hessian approx.)


This will likely cause a dynamic dispatch because the value of qn.init_strategy is not known at compile time. I suggest storing the Val(qn.init_strategy) directly in the qn type and giving it a type parameter.
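A minimal sketch of that suggestion (struct and helper names here are hypothetical, not MadNLP's actual API): storing the Val in the struct makes the strategy part of the type, so curvature can be resolved at compile time. The SCALAR1 formula below is the one from the diff; other strategies would simply add their own methods.

```julia
using LinearAlgebra: dot

# Hypothetical sketch, not the final MadNLP implementation: the Val() is
# stored in the struct, so the init strategy becomes a type parameter.
struct QuasiNewtonSketch{VT<:AbstractVector, I}
    sk::VT
    yk::VT
    init_strategy::I   # e.g. Val(:SCALAR1), captured in the type
end

curvature(::Val{:SCALAR1}, sk, yk) = dot(sk, yk) / dot(sk, sk)

# update! can then call curvature(qn.init_strategy, sk, yk); the method is
# selected statically from the type parameter, with no runtime dispatch.
```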

@mohamed82008 commented

I don't claim to understand everything in this PR but I am happy to test it once it's ready.

@frapac (Collaborator, Author) commented on Oct 11, 2022

FYI, here is a benchmark comparing MadNLP's LBFGS against Ipopt's LBFGS (using the SCALAR1 initialization and mem_size=6) on a subset of the CUTEst benchmark.

https://web.cels.anl.gov/~fpacaud/result_lbfgs.txt

Overall, Ipopt's LBFGS performs better than MadNLP's. I am currently working on improving MadNLP's LBFGS to match Ipopt's performance.

- quasi-Newton approximation of Lagrangian's Hessian
- implemented as a dense KKT system
- AbstractKKTSystem is now parameterized by a HessianApproximation
- add two options DENSE_BFGS and DENSE_DAMPED_BFGS
- add support for DENSE_CONDENSED_BFGS
- add jtprod! and jprod! functions for HS15Model and DummyQPModel in
  MadNLPTests
reference: Byrd, Richard H., Jorge Nocedal, and Robert B. Schnabel.
"Representations of quasi-Newton matrices and their use in limited
memory methods." Mathematical Programming 63, no. 1 (1994): 129-156.
@sshin23 (Member) left a comment

Looks good, @frapac! I have made just a few comments.

To make it possible to use BFGS with JuMP, I guess we need to implement jtprod! for MOIModel. I'd suggest implementing it in a separate PR, and we can do a major release.

@@ -12,6 +12,7 @@ import .CUBLAS: handle, CUBLAS_DIAG_NON_UNIT,
import KernelAbstractions: @kernel, @index, wait, Event
import CUDAKernels: CUDADevice

import MadNLP: NLPModels

We can do:

import MadNLP:
    MadNLP, NLPModels, @kwdef, MadNLPLogger, @debug, @warn, @error,
    AbstractOptions, AbstractLinearSolver, AbstractNLPModel, set_options!,
    SymbolicException,FactorizationException,SolveException,InertiaException,
    introduce, factorize!, solve!, improve!, is_inertia, inertia, tril_to_full!,
    LapackOptions, input_type, is_supported, default_options, symul!

@@ -51,6 +51,26 @@ function NLPModels.jac_coord!(nlp::HS15Model, x::AbstractVector, J::AbstractVect
return J
end

function NLPModels.jprod!(nlp::HS15Model, x::AbstractVector, v::AbstractVector, jv::AbstractVector)

Are we currently using this for testing? I think it would be useful to use this instance for testing BFGS approximations.

@frapac (Collaborator, Author) replied:

I added a test for HS15Model and LBFGS. The problem is that this instance is nonconvex, and MadNLP + exact Hessian is returning a different solution than MadNLP + LBFGS/BFGS. Maybe we should use HS71 instead?
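For context, here is a minimal sketch of what these products look like for the textbook HS15 constraints c(x) = (x₁x₂, x₁ + x₂²), whose Jacobian is J = [x₂ x₁; 1 2x₂]; the actual MadNLPTests implementation may differ in details (counters, length checks).

```julia
using MadNLPTests: HS15Model   # assumes the test model shown in the diff above
import NLPModels

# Sketch only: Jacobian-vector product jv = J(x) * v for the assumed HS15 constraints.
function NLPModels.jprod!(nlp::HS15Model, x::AbstractVector, v::AbstractVector, jv::AbstractVector)
    jv[1] = x[2] * v[1] + x[1] * v[2]    # row of ∂(x₁x₂)/∂x times v
    jv[2] = v[1] + 2.0 * x[2] * v[2]     # row of ∂(x₁ + x₂²)/∂x times v
    return jv
end

# Sketch only: transposed product jv = J(x)ᵀ * v.
function NLPModels.jtprod!(nlp::HS15Model, x::AbstractVector, v::AbstractVector, jv::AbstractVector)
    jv[1] = x[2] * v[1] + v[2]
    jv[2] = x[1] * v[1] + 2.0 * x[2] * v[2]
    return jv
end
```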

qn.Mk .= qn.SdotS # Mₖ = Sₖᵀ Sₖ
mul!(qn.Mk, qn.Lk, qn.DkLk, one(T), sigma) # Mₖ = σₖ Sₖᵀ Sₖ + Lₖ Dₖ⁻¹ Lₖᵀ
symmetrize!(qn.Mk)
Jk = cholesky(qn.Mk).L # Mₖ = Jₖᵀ Jₖ (factorization)

It is likely that we can reduce allocation by using cholesky! (we might be able to optimize further by directly calling CHOLMOD, but that might be a bit too much work). I'd recommend storing the factorization as qn.MkF and calling cholesky!(qn.MkF, qn.Mk).
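A minimal sketch of the in-place variant (variable names hypothetical; Mk stands in for the small matrix factorized at each update):

```julia
using LinearAlgebra

Mk = [4.0 1.0; 1.0 3.0]             # stands in for qn.Mk after symmetrize!
F  = cholesky!(Symmetric(Mk, :L))   # factorizes in place, reusing Mk's storage
Jk = F.L                            # lower-triangular factor: Mₖ = Jₖ Jₖᵀ, no extra matrix allocated
```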

@frapac (Collaborator, Author) replied:

This has been implemented! But I agree that in the medium term it would be better to use our own wrapper for CHOLMOD.


# Step 1: σₖ I
sigma = curvature(Val(qn.init_strategy), sk, yk) # σₖ
Bk .= sigma # Hₖ .= σₖ I (diagonal Hessian approx.)

no Bk[diagind(Bk)] here?

@frapac (Collaborator, Author) replied:

No! We use LBFGS only in sparse mode, so Bk is always a vector here.
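A small sketch of how the two storage layouts could be handled by dispatch (hypothetical helper name; the compact LBFGS path stores only the diagonal as a vector, the dense BFGS path a full matrix):

```julia
using LinearAlgebra: diagind

# Sketch: σₖI initialization for the two representations of Bₖ.
init_scaling!(Bk::AbstractVector, sigma) = (Bk .= sigma)               # Bk holds the diagonal
init_scaling!(Bk::AbstractMatrix, sigma) = (Bk[diagind(Bk)] .= sigma)  # set only the diagonal
```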

@sshin23 (Member) left a comment

Thanks for your patience, @frapac! I only have very minor comments, and it looks good!

src/KKT/dense.jl
hess::MT
jac::MT
qn::QN

I'd suggest naming qn as qnewton or something more indicative

hess_I = Vector{Int32}(undef, get_nnzh(nlp.meta))
hess_J = Vector{Int32}(undef, get_nnzh(nlp.meta))
hess_structure!(nlp,hess_I,hess_J)
if QN <: ExactHessian

Can we use multiple dispatch here instead? Also, I wonder whether hess_I and hess_J are ever used for quasi-Newton.

@frapac (Collaborator, Author) commented on Mar 16, 2023

@sshin23 Thank you for the comments! I have updated the PR accordingly, let me know if you want me to modify anything else.

And I wonder whether hess_I and hess_J are ever used for quasi-Newton

Indeed, we have to define the sparsity pattern of the diagonal elements for the quasi-Newton method, in order to store the elements associated with ξI in the compact formulation.
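A rough sketch of what that could look like with dispatch (the helper name and the AbstractQuasiNewton supertype are assumptions, not necessarily the names used in the PR):

```julia
import NLPModels: get_nnzh, get_nvar, hess_structure!

# Exact Hessian: use the sparsity pattern reported by the model.
function build_hessian_structure(nlp, ::Type{<:ExactHessian})
    hess_I = Vector{Int32}(undef, get_nnzh(nlp.meta))
    hess_J = Vector{Int32}(undef, get_nnzh(nlp.meta))
    hess_structure!(nlp, hess_I, hess_J)
    return hess_I, hess_J
end

# Quasi-Newton: only the diagonal pattern is needed, to hold the ξI term
# of the compact formulation.
function build_hessian_structure(nlp, ::Type{<:AbstractQuasiNewton})
    n = get_nvar(nlp)
    return Vector{Int32}(1:n), Vector{Int32}(1:n)
end
```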

@frapac merged commit f08c196 into master on Mar 23, 2023
@frapac deleted the fp/bfgs branch on May 31, 2023