
Add AbstractKKTSystem structure #58

Merged (10 commits, Aug 29, 2021)

Conversation

frapac (Collaborator) commented Aug 20, 2021

First step towards supporting dense NLPs.

  • implement SparseReducedKKTSystem and SparseAugmentedKKTSystem (a rough sketch of the abstraction is given after this list)
  • move Options to a new file, src/options.jl
  • Solver: move all KKT structures into a new field, kkt
  • Solver: remove closures and implement everything as proper functions
  • Solver: callbacks are now implemented as proper Julia functions (better for the compiler, and allows the use of multiple dispatch in the future)
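
For orientation, a minimal sketch of the abstraction this PR introduces. Only the names AbstractKKTSystem, SparseReducedKKTSystem, SparseAugmentedKKTSystem, and the kkt field come from the PR itself; the bodies below are illustrative placeholders, not the actual definitions in src/kktsystem.jl.

# Illustrative sketch only.
abstract type AbstractKKTSystem end

# Sparse KKT system with the bound blocks condensed out (the default).
struct SparseReducedKKTSystem <: AbstractKKTSystem
    # e.g. jac, jac_raw, pr_diag, jacobian_scaling, ...
end

# Full sparse augmented ("unreduced") KKT system.
struct SparseAugmentedKKTSystem <: AbstractKKTSystem
    # the block system kept explicitly
end

# The interior-point Solver now stores all KKT data behind a single field:
#     mutable struct Solver
#         kkt::AbstractKKTSystem
#         ...
#     end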

frapac requested a review from sshin23 on August 20, 2021 13:47
src/kktsystem.jl
function get_coo_to_dense(coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer}
    dense = Matrix{Float64}(undef, coo.m, coo.n)
    return dense, () -> copyto!(dense, coo)
end

copyto!(dense::Matrix{Tv}, coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer} = _copyto!(dense, coo.I, coo.J, coo.V)
function Matrix{Tv}(coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer}
frapac (Collaborator, Author):

I added a proper constructor to build a dense matrix from a SparseMatrixCOO.

sshin23 (Member):

Isn't the expected behavior of constructors to "copy" the argument with an appropriate change of type? The SparseMatrixCSC and Matrix constructors seem to just create an object with matching size/sparsity.
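
For reference, a minimal sketch of what a value-copying constructor could look like, using the I, J, V, m, n fields of SparseMatrixCOO quoted later in this thread. This is an illustration of the convention being discussed, not the PR's actual implementation.

# Illustrative only: follows Base's convention that a constructor copies its argument.
function Matrix{Tv}(coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer}
    dense = zeros(Tv, coo.m, coo.n)
    for k in 1:length(coo.V)
        dense[coo.I[k], coo.J[k]] += coo.V[k]  # accumulate duplicate COO entries
    end
    return dense
end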

codecov bot commented Aug 20, 2021

Codecov Report

Merging #58 (bc3e94a) into master (d31692a) will decrease coverage by 0.48%.
The diff coverage is 92.25%.


@@            Coverage Diff             @@
##           master      #58      +/-   ##
==========================================
- Coverage   87.30%   86.82%   -0.49%     
==========================================
  Files          28       30       +2     
  Lines        3018     3150     +132     
==========================================
+ Hits         2635     2735     +100     
- Misses        383      415      +32     
Impacted Files Coverage Δ
lib/MadNLPHSL/src/MadNLPHSL.jl 80.00% <ø> (ø)
lib/MadNLPHSL/src/ma27.jl 76.19% <ø> (ø)
lib/MadNLPHSL/src/ma77.jl 88.15% <ø> (ø)
src/MadNLP.jl 62.50% <ø> (-25.00%) ⬇️
src/matrixtools.jl 82.60% <77.27%> (-17.40%) ⬇️
src/Interfaces/MOI_interface.jl 84.96% <85.71%> (-0.14%) ⬇️
src/options.jl 85.71% <85.71%> (ø)
src/interiorpointsolver.jl 93.48% <90.11%> (+0.03%) ⬆️
src/Interfaces/NLPModels_interface.jl 92.85% <90.90%> (-7.15%) ⬇️
src/kktsystem.jl 100.00% <100.00%> (ø)
... and 15 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d31692a...bc3e94a.

sshin23 (Member) left a comment:

Hey @frapac, the PR looks awesome! Indeed, AbstractKKTSystem can significantly enhance the modularity.

My only concern is the treatment of the "unreduced" system. I think treating it as an alternative KKT system formulation is the right approach, but formulating it as a block 3x3 unreduced system requires some extra handling. So, I'd suggest the following:

  • Remove the reduced_system option and let this be handled entirely by the kkt_system option.
  • We can make anything that ends with _reduced! (e.g., set_aug_reduced!) the default method.
  • Anything that ends with _unreduced! can become a specialization for the unreduced system. For example, we can turn
    set_aug_unreduced!(...) into set_aug_reduced!(::SparseUnreducedKKTSystem, ...), as sketched after this list.
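
A minimal sketch of the suggested dispatch pattern. The set_aug_reduced! name, the SparseUnreducedKKTSystem type, and the kkt_system option come from this review; the signatures and bodies below are illustrative.

# Default method: the "reduced" treatment, shared by all KKT systems.
function set_aug_reduced!(kkt::AbstractKKTSystem, args...)
    # ... default reduced update
end

# Specialization replacing the former set_aug_unreduced!: the block 3x3
# unreduced system gets its own method instead of a separate reduced_system flag.
function set_aug_reduced!(kkt::SparseUnreducedKKTSystem, args...)
    # ... unreduced update
end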


"Set scaling of Jacobian"
function set_jacobian_scaling! end

sshin23 (Member):

💯


Commits added:
  • implement SparseReducedKKTSystem and SparseAugmentedKKTSystem
  • refactor Solver
sshin23 (Member) left a comment:

Hey @frapac, I found a few more things. Hope we can figure out the cause of the slowdown soon 💪

src/MadNLP.jl
include("nonlinearprogram.jl")
include("matrixtools.jl")
include(joinpath("LinearSolvers","linearsolvers.jl"))
include(joinpath("kktsystem.jl"))
include(joinpath("interiorpointsolver.jl"))
sshin23 (Member):

Maybe we can

include("kktsystem.jl")
include("interiorpointsolver.jl")

frapac (Collaborator, Author):

Good catch!

src/kktsystem.jl

function set_jacobian_scaling!(kkt::AbstractSparseKKTSystem, constraint_scaling::AbstractVector)
    nnzJ = length(kkt.jac)
    for i in 1:nnzJ
sshin23 (Member):

Maybe this for loop should be split out into a separate function, since the function arguments are not concrete types and many internal variables are referenced inside the loop.
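
A minimal sketch of that suggestion, using the function-barrier pattern. The helper name _set_jacobian_scaling! is hypothetical; the field names follow the snippet above and the allocation listing in the next comment.

function set_jacobian_scaling!(kkt::AbstractSparseKKTSystem, constraint_scaling::AbstractVector)
    # Function barrier: the loop below is compiled against the concrete types
    # of the extracted fields, even if the kkt fields themselves are abstract.
    _set_jacobian_scaling!(kkt.jacobian_scaling, kkt.jac_raw.I, constraint_scaling)
end

function _set_jacobian_scaling!(scaling, rows, constraint_scaling)
    for i in eachindex(scaling)
        scaling[i] = constraint_scaling[rows[i]]
    end
    return scaling
end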

frapac (Collaborator, Author):

Normally, if the types are properly specified in kkt, this should not be an issue, as the compiler is able to infer everything properly.
If we look at the allocations reported for this function, we can see that we do indeed have a problem:

        - function set_jacobian_scaling!(kkt::AbstractSparseKKTSystem, constraint_scaling::AbstractVector)
        0     nnzJ = length(kkt.jac)
      128     for i in 1:nnzJ
        0         index = kkt.jac_raw.I[i]
      704         kkt.jacobian_scaling[i] = constraint_scaling[index]
        -     end
        - end

If we look at it more closely, it appears that the types in SparseMatrixCOO are not concrete:

mutable struct SparseMatrixCOO{Tv,Ti<:Integer} <: AbstractSparseMatrixCOO{Tv,Ti}
    m::Int
    n::Int
    I::AbstractArray{Ti,1}
    J::AbstractArray{Ti,1}
    V::AbstractArray{Tv,1}
end

hence the compiler has trouble properly inferring the type of kkt.jac_raw.I ...
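
For illustration, one possible way to make these fields concrete is to parameterize the struct on the array types. This is a sketch of the general technique, not the fix adopted in this PR.

# Illustrative sketch: concrete array-type parameters let the compiler infer
# kkt.jac_raw.I and friends without dynamic dispatch.
mutable struct SparseMatrixCOO{Tv, Ti<:Integer, VI<:AbstractVector{Ti}, VT<:AbstractVector{Tv}} <: AbstractSparseMatrixCOO{Tv,Ti}
    m::Int
    n::Int
    I::VI   # concrete vector type, e.g. Vector{Int32}
    J::VI
    V::VT   # concrete vector type, e.g. Vector{Float64}
end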

@@ -28,6 +29,15 @@ function findIJ(S::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
return I,J
end

get_mapping(dest::AbstractMatrix, src::SparseMatrixCOO) = nothing

function diag!(dest::AbstractVector{T}, src::AbstractMatrix{T}) where T
sshin23 (Member):

Is diag! used somewhere?

frapac (Collaborator, Author):

It is used in the dense implementation (which lives in a separate branch).

sshin23 (Member) left a comment:

Hey @frapac, I think I found the issue! Please check the comments and see if the suggested changes reduce the memory allocations.

# KKT system updates -------------------------------------------------------
# Set diagonal
function set_aug_diagonal!(kkt::AbstractKKTSystem, ips::Solver)
    copyto!(kkt.pr_diag, ips.zl./(ips.x.-ips.xl) .+ ips.zu./(ips.xu.-ips.x))
sshin23 (Member):

Since Julia evaluates ips.zl./(ips.x.-ips.xl) .+ ips.zu./(ips.xu.-ips.x) first, it allocates memory for the intermediate result and then copies it into kkt.pr_diag. I recommend using .= instead (which doesn't allocate).
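
For illustration, the two forms side by side, using the same field names as the snippet above:

# Allocating: the right-hand side is materialized into a temporary vector,
# which is then copied into kkt.pr_diag.
copyto!(kkt.pr_diag, ips.zl ./ (ips.x .- ips.xl) .+ ips.zu ./ (ips.xu .- ips.x))

# Non-allocating: .= fuses the whole broadcast and writes directly into pr_diag.
kkt.pr_diag .= ips.zl ./ (ips.x .- ips.xl) .+ ips.zu ./ (ips.xu .- ips.x)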

frapac (Collaborator, Author):

Good catch! My bad: I originally did that only to support a pr_diag array allocated on the GPU; I should have reverted the change in this PR.
Indeed, fixing that reduces the allocations. Testing on hs033.nl (using the script test/nlpmodel_tests.jl), I get:

  • Before
  0.001986 seconds (2.94 k allocations: 609.129 KiB)
  • Now
  0.001987 seconds (2.94 k allocations: 413.418 KiB)
  • Master (reference)
  0.001582 seconds (2.78 k allocations: 389.809 KiB)

ips._w1zl.= 0.
ips._w1zu.= 0.
end

sshin23 (Member):

It seems like the functions in this section may allocate more memory than the original ones. The reason is that these functions now take non-concrete types (Solver and AbstractKKTSystem) as inputs, and the Julia compiler cannot generate code as efficient as it does for concrete types. I recommend introducing another layer; e.g.,

set_aug_rhs_ifr!(ips::Solver, kkt::AbstractKKTSystem) = set_aug_rhs_ifr_reduced!(ips._w1x, ips._w1l, ips.c)
function set_aug_rhs_ifr_reduced!(_w1x, _w1l, c)
    _w1x .= 0.0
    _w1l .= .-c
end

frapac (Collaborator, Author):

That's a good catch; I understand my mistake now. Almost all the attributes in Solver have non-concrete types ... (in fact, Cthulhu reports a lot of inference errors even on master). As the types are non-concrete, the compiler is forced to allocate in these kinds of functions, since we manipulate the attributes directly here.

Previously, we didn't have that issue, as the non-concrete values were passed to auxiliary functions, which then specialized on them.

I would suggest the following:

  1. For this PR, we implement the specialization using auxiliary functions, as you suggested.
  2. Afterwards, I think it would be worthwhile to revisit the type system in MadNLP (i.e., use concrete types in all structures): that would help the compiler a lot, and IMO we could significantly reduce the time to first solve.

sshin23 (Member):

Yes... that's my bad. When I implemented MadNLP, I didn't know this would have these consequences 😢

I agree with your approach. For now, let's keep the auxiliary functions, and we can do a thorough revisit of the type definitions in MadNLP later. I can work on it once we merge this PR.

frapac (Collaborator, Author):

I think it's a common pitfall in Julia: writing type-stable code is definitely non-trivial. I am not sure the two-language problem has really been solved by Julia ... Fortunately, the ecosystem is maturing to address this issue. Cthulhu is already quite useful for tracking down type instabilities. I also think JET.jl is very promising: https://github.com/aviatesk/JET.jl
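
For reference, a hypothetical usage sketch of these tools on a deliberately type-unstable toy function (not MadNLP code):

using InteractiveUtils, Cthulhu, JET

unstable(flag) = flag ? 1 : 1.0     # return type is Union{Int, Float64}

@code_warntype unstable(true)       # built-in: highlights the non-concrete return type
JET.@report_call unstable(true)     # JET: static analysis of the call
Cthulhu.@descend unstable(true)     # Cthulhu: interactively descend into inferred code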

sshin23 (Member):

Cool, I guess we can start from the Cthulhu.jl and JET.jl reports

frapac (Collaborator, Author) commented Aug 27, 2021

I have updated the code to force specialization on the faulty functions, to avoid unneeded allocations.

On the instance hs033.jl, we have on master:

# Reduced KKT
  0.001334 seconds (2.78 k allocations: 390.355 KiB)
# Unreduced KKT
 0.004283 seconds (5.07 k allocations: 1.046 MiB)

and we have now on this branch:

# Reduced KKT
   0.001709 seconds (2.58 k allocations: 383.055 KiB)  
# Unreduced KKT
  0.003202 seconds (4.74 k allocations: 1.036 MiB) 

(Only the allocation counts are directly comparable, as the timings were measured on different computers.)

In the medium term, I would be keen to revisit the type safety inside MadNLP. I think we could get some easy performance gains.

sshin23 (Member) commented Aug 27, 2021

@frapac 🎉 yes, that would be very nice

frapac (Collaborator, Author) commented Aug 27, 2021

So, it appears that I broke the tests; I'm currently investigating what I did wrong ^^

sshin23 (Member) commented Aug 28, 2021

cnt.eval_function_time += @elapsed nlp.obj_grad!(∇f,view(x,1:nlp.n))
∇f.*=ipp.obj_scale[]
cnt.eval_function_time += @elapsed nlp.obj_grad!(f,view(x,1:nlp.n))
f.*=ipp.obj_scale[]
cnt.obj_grad_cnt+=1
sshin23 (Member):

The issue was here; for some reason, passing ∇f caused some extra allocations. Maybe the reason comes from CUTEst.jl. If the gradient callback checks the dimension of f, the current change can be problematic.

sshin23 (Member), Aug 28, 2021:

For a similar reason, if we replace view(x,1:nlp.n) with x, allocations are significantly reduced. But if the gradient callback checks the input size, this can cause issues (and if the callbacks are not properly implemented, it may corrupt f and x).

sshin23 (Member), Aug 28, 2021:

Indeed, the issue was here:
https://github.com/JuliaSmoothOptimizers/CUTEst.jl/blob/main/src/julia_interface.jl#L105-L110

In order to keep the callbacks compatible with the C functions, CUTEst.jl internally converts the inputs to Vector{Float64}. For "EIGMINA" in CUTEst, after allowing StrideOneVector{Float64} inputs, the allocations drop dramatically: from 1.213 MiB to 759.922 KiB.

I'll create a PR for CUTEst.jl. I think the benchmark results will look a lot better after this...
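
To illustrate the allocation being discussed, a toy example (not the CUTEst.jl code):

x  = rand(10)
xv = view(x, 1:5)            # unit-stride view, no copy (what MadNLP passes to the callback)
xc = Vector{Float64}(xv)     # what the wrapper did internally: allocates a fresh copy on every call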

frapac (Collaborator, Author):

That's very interesting. Good catch on the issue with CUTEst.jl. Indeed, I think there is no need to create a new array if the input x is one-strided ... I made this change because of the custom wrapper we developed for ExaPF: there, we need a gradient with the correct dimension (as we use a mul! operation afterwards to compute the adjoint). I don't know what the default should be: should we pass all arrays with the correct dimension to the callbacks, or should the callbacks do all the checks internally?

sshin23 (Member):

I think the latter would be ideal, but the performance of MadNLP shouldn't degrade too much even if the callback requires the correct dimension. I added a "buffered" option to the NLPModels.jl interface, so that there is a buffer vector that exactly matches the input dimensions of the callbacks. The NLPModels.jl interface handles copying to/from the buffer vector. But if the user knows there won't be a problem, they can turn this off.

I made a PR for CUTEst.jl (JuliaSmoothOptimizers/CUTEst.jl#269) to allow view inputs. If this gets merged, we can turn off the buffered option for the benchmark.
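
A rough sketch of the buffered idea described above; the name and signature are illustrative, not the actual NLPModels.jl interface code.

# Copy the (possibly strided) inputs into plain Vector buffers of the exact
# callback dimensions, call the user callback, then copy the result back.
function buffered_grad!(obj_grad!, buf_x::Vector{Float64}, buf_g::Vector{Float64},
                        g::AbstractVector, x::AbstractVector)
    copyto!(buf_x, x)          # buffer with the exact input dimension
    obj_grad!(buf_g, buf_x)    # user callback sees plain Vector{Float64}
    copyto!(g, buf_g)          # copy the result back into MadNLP's array
    return g
end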

sshin23 (Member) commented Aug 29, 2021

Benchmark results: [time and memory plots attached]

sshin23 merged commit 0a471d4 into MadNLP:master on Aug 29, 2021
frapac (Collaborator, Author) commented Aug 29, 2021

Thanks for the help in merging this PR, @sshin23 !
