
Allow to instantiate DenseKKTSystem on CUDA GPU #73

Merged · 5 commits into MadNLP:master on Oct 8, 2021

Conversation

@frapac (Collaborator) commented Sep 9, 2021

This is the final step: we can now solve dense problems on the GPU with MadNLP, without transferring the Hessian and Jacobian matrices to the host.

This PR adds a few structural changes:

  • MadNLPGPU now depends on KernelAbstractions.jl. The module now implements a few kernel functions to assemble and analyse the KKT system directly on the GPU.
  • The signature of MadNLPLapackGPU.Solver has been updated to MadNLPLapackGPU.Solver{MT<:AbstractMatrix}, so we can now instantiate solver.dense on the GPU to avoid expensive transfer between the host and the device.
  • The signature of MadNLP.Solver has been updated to take a KKT system as input (nothing by default). This clashes with the previous approach of specifying the desired KKT system in the options; I think we should figure out a better API for this.
  • We check the correctness of the results on the GPU directly in MadNLPGPU/test/runtests.jl, by checking that we get exactly the same results as on the CPU.
  • build_qp_tests has been moved to MadNLPTests.
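The Solver{MT<:AbstractMatrix} parameterization described above can be sketched as follows. This is an illustrative toy, not MadNLP's actual implementation: typing the dense workspace by the matrix type lets the same code path keep its storage on the host or on the device, depending on what the caller passes in. The commented GPU line assumes CUDA.jl is available.

```julia
# Illustrative sketch (hypothetical type, not MadNLP's code): the solver's
# dense buffer is typed by MT, so the storage lives wherever MT lives.
struct LapackSolver{MT<:AbstractMatrix}
    dense::MT   # factorization workspace, host or device depending on MT
end

# Host version: a plain Julia Array.
cpu_solver = LapackSolver(zeros(Float64, 4, 4))
println(typeof(cpu_solver))   # LapackSolver{Matrix{Float64}}

# Device version (requires CUDA.jl): same code path, storage on the GPU.
# using CUDA
# gpu_solver = LapackSolver(CUDA.zeros(Float64, 4, 4))
```

Because the buffer type is a parameter rather than a fixed `Matrix{Float64}` field, no host-device copies are needed when the factorization runs on the GPU.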

@frapac frapac requested a review from sshin23 September 9, 2021 20:31
@codecov codecov bot commented Sep 9, 2021

Codecov Report

Merging #73 (9177f41) into master (72d55de) will decrease coverage by 0.69%.
The diff coverage is 73.33%.


@@            Coverage Diff             @@
##           master      #73      +/-   ##
==========================================
- Coverage   87.71%   87.02%   -0.70%     
==========================================
  Files          28       29       +1     
  Lines        3289     3375      +86     
==========================================
+ Hits         2885     2937      +52     
- Misses        404      438      +34     
Impacted Files Coverage Δ
lib/MadNLPGPU/src/kernels.jl 65.00% <65.00%> (ø)
lib/MadNLPGPU/src/lapackgpu.jl 81.03% <100.00%> (ø)
src/interiorpointsolver.jl 93.45% <100.00%> (+0.01%) ⬆️
src/kktsystem.jl 98.73% <100.00%> (+0.01%) ⬆️
lib/MadNLPHSL/src/ma97.jl 80.00% <0.00%> (-4.00%) ⬇️
lib/MadNLPHSL/src/ma86.jl 81.13% <0.00%> (-3.78%) ⬇️
lib/MadNLPHSL/src/ma77.jl 85.52% <0.00%> (-2.64%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 72d55de...9177f41.

@sshin23 (Member) left a comment

This looks like a major step forward for MadNLP! I think we can release v0.3 after merging this. This looks good for now, but as you said, in the future we need to decide what would be the right API for specifying the AbstractKKTSystem. For casual users, I think inferring the KKTSystem type from linear_solver would be sufficient, but we may want to allow some more flexibility for power users.

Review threads (resolved):
  • lib/MadNLPGPU/src/MadNLPGPU.jl
  • lib/MadNLPGPU/src/kernels.jl (2 threads)
  • lib/MadNLPGPU/src/lapackgpu.jl
  • lib/MadNLPTests/src/MadNLPTests.jl
@@ -314,7 +314,7 @@ function eval_lag_hess_wrapper!(ipp::Solver, kkt::AbstractKKTSystem, x::Vector{F
return hess
end

-function Solver(nlp::NonlinearProgram;
+function Solver(nlp::NonlinearProgram, kkt=nothing;
@sshin23 (Member):

What's the use case for this? I can't imagine under which circumstances kkt would be given by the user.

@frapac (Collaborator, Author):

I think this is the main problem of this PR ^^
I have not been able to figure out a proper API for passing a DenseKKTSystem on the GPU to MadNLP. I don't want MadNLP to depend on CUDA, so implementing a new option DENSE_KKT_SYSTEM_GPU does not seem like a solution...
We have the same problem with the AuglagKKTSystem. The question for me is: how do we pass the KKT system to the solver in a modular way?

@sshin23 (Member):

Hmm... I see. How about directly passing the KKTSystemType to the constructor? Say, kkt_system=DenseKKTSystem?
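The suggestion above can be sketched with hypothetical types (this is not the API MadNLP ended up with): accept the KKT system type as a keyword argument, then immediately forward it into the constructor's type parameter so the solver's field stays concretely typed.

```julia
# Hypothetical sketch of "kkt_system=DenseKKTSystem": the keyword carries a
# *type*, which flows into the struct's type parameter.
abstract type AbstractKKTSystem end
struct DenseKKTSystem  <: AbstractKKTSystem end
struct SparseKKTSystem <: AbstractKKTSystem end

struct Solver{KKT<:AbstractKKTSystem}
    kkt::KKT   # concretely typed field, no abstract-type storage
end

# Keyword entry point: forward the type into the parameter right away.
Solver(; kkt_system::Type{<:AbstractKKTSystem}=SparseKKTSystem) =
    Solver{kkt_system}(kkt_system())

s = Solver(kkt_system=DenseKKTSystem)
println(typeof(s))   # Solver{DenseKKTSystem}
```

The keyword call itself is a dynamic-dispatch boundary, but everything inside the returned Solver{DenseKKTSystem} is fully inferable afterwards.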

* add dedicated GPU kernels for KKT operations
* add proper tests for DenseKKTSystem on GPU
* move build_qp_dense function to MadNLPTests
@frapac (Collaborator, Author) commented Oct 7, 2021

I tried to implement a workaround for passing a custom KKTSystem to MadNLP. The latest commit adds an additional constructor to MadNLP, parameterized by the type of the KKT system we want. In addition to the classical constructor:

ipp = MadNLP.InteriorPointSolver(nlp; option_dict=some_options)

we have a new constructor, allowing finer-grained control over the KKT system:

options = MadNLP.Options(some_options) # need to instantiate the options here
KKT_TYPE = MadNLP.DenseKKTSystem{Float64, CuVector{Float64}, CuMatrix{Float64}}
ipp = MadNLP.InteriorPointSolver{KKT_TYPE}(nlp, options; linear_option_dict=some_options_for_linear_solver)

(we assume that all KKTSystems share the same signature for their constructors).

What do you think of this workaround?

Apart from that, I think we need to revisit the templating in MadNLP a bit, to get more type stability in the code. Implementing the AbstractKKTRHS is a good opportunity for this task.

@frapac (Collaborator, Author) commented Oct 7, 2021

There was a problem with the CI on moonshot; I am currently relaunching the CI to test the GPU implementation.

@frapac (Collaborator, Author) commented Oct 7, 2021

@sshin23 it looks like the runner on moonshot is not working. I think we may have run out of memory (it's a common issue we have with ExaPF). Do you know if we can restart the runner (normally, it should fix the issue straight away)?

@sshin23 (Member) commented Oct 8, 2021

@frapac The issue probably was that I didn't start the action after moonshot's reboot. I wasn't able to find a way to automatically restart after reboot 😂

end
return InteriorPointSolver{KKTSystem}(nlp, opt; option_linear_solver=option_dict)
end

@sshin23 (Member):

Looks good for now, but let's keep track of this maybe in a separate issue. Eventually I think we want to be able to pass the KKT constructor as an option.

@frapac (Collaborator, Author):

That makes sense, indeed. I just need to figure out how to do that cleanly: right now, if we pass the constructor as an option, Julia is not able to dispatch on the type, and that will lead to some type instability. But I may be missing something...
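The instability mentioned above can be illustrated with a small self-contained example (hypothetical types, not MadNLP's). When the KKT constructor sits in an untyped option dictionary, the compiler only sees `Any` at the call site; when it arrives as a type parameter, the result type is known statically.

```julia
# Hypothetical types, for illustration only.
struct DenseKKT end
struct SparseKKT end

# The dict value has type Any, so this call is dynamically dispatched and
# its return type cannot be inferred.
build_from_options(opts::Dict{Symbol,Any}) = opts[:kkt_type]()

# Here T is a compile-time parameter, so the return type is inferred as T.
build_from_param(::Type{T}) where {T} = T()

opts = Dict{Symbol,Any}(:kkt_type => DenseKKT)
@assert build_from_options(opts) isa DenseKKT   # works, but via dynamic dispatch
@assert build_from_param(DenseKKT) isa DenseKKT # statically inferable
```

Running `@code_warntype build_from_options(opts)` would show an `Any` return type, while `build_from_param(DenseKKT)` infers `DenseKKT`, which is why the type-parameter route avoids the instability.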

@sshin23 (Member) left a comment

Thanks for working on this @frapac! Other than the KKT system constructor issue, it looks awesome. I'll merge this PR once all the tests pass.

@frapac (Collaborator, Author) commented Oct 8, 2021

Thanks for rebooting the job :-)

I wasn't able to find a way to automatically restart after reboot

Do you think we can schedule a cron job for that?

@sshin23 sshin23 merged commit 86eba3d into MadNLP:master Oct 8, 2021
@sshin23 (Member) commented Oct 8, 2021

Merging it to master. Thanks for the great work.
