Any example for parallelizing with GPU? #67

Open
Guowu-Mcgill opened this issue Mar 2, 2022 · 2 comments

@Guowu-Mcgill

I have a large-scale nonlinear optimization problem to solve, and I am interested in seeing how a GPU can accelerate the optimization! Are there any examples in Optizelle for me to start from?

@josyoun (Member) commented Mar 4, 2022

Thanks for the inquiry! Yes, this is possible, but there is some nuance as to how to do it effectively. Unfortunately, there are no examples showing how, but I can leave that as a feature request.

In short, there are three areas that can be effectively parallelized with either a GPU or any other form of parallelism:

  1. Parallelizing the vector algebra. There's an example of how to define a new algebra in the rosenbrock_advanced_api example. Essentially, these operations would need to be replaced with something from cuBLAS or the like (see the first sketch after this list). This will help, but the vector algebra is generally not a performance bottleneck.

  2. Parallelizing the function and derivative calculations. The code places no restriction on how these functions are evaluated, so if yours can be computed on a GPU, it will work fine. This is probably your largest performance increase. The catch is that there's often good literature on computing something like the objective or forward problem on a GPU, but not on computing the gradient, which often involves an adjoint computation (the second sketch after this list illustrates this). It's almost always possible, but not easy in all cases, and it's also difficult to document. The inverse-problems literature discusses this.

  3. Parallelizing the preconditioner solves. This can be done, but it has traditionally been difficult on GPUs because there tends to be a lot of interprocess communication. That said, I checked the other day and it appears that NVIDIA has made good progress in adding factorizations to their code (the third sketch after this list uses one). If your problem has equality constraints, this would be a good performance increase.
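
To make item 1 concrete, here is a minimal sketch of a GPU-backed vector space using CuPy, whose array operations call cuBLAS under the hood. It assumes the init/copy/scal/zero/axpy/innr interface that rosenbrock_advanced_api uses for a custom algebra; check that example for the exact signatures Optizelle expects.

```python
# A minimal sketch of a GPU-backed vector space.  Vectors are CuPy
# arrays, so all of the algebra below runs on the device.  The method
# names assume the interface from rosenbrock_advanced_api.
import cupy

class GpuVS(object):
    @staticmethod
    def init(x):
        # Allocate a new vector of the same shape on the device
        return x.copy()

    @staticmethod
    def copy(x, y):
        # y <- x
        y[:] = x

    @staticmethod
    def scal(alpha, x):
        # x <- alpha x
        x *= alpha

    @staticmethod
    def zero(x):
        # x <- 0
        x.fill(0.)

    @staticmethod
    def axpy(alpha, x, y):
        # y <- alpha x + y (maps to a cuBLAS axpy)
        y += alpha * x

    @staticmethod
    def innr(x, y):
        # <x,y>, returned to the host as a Python float
        return float(cupy.dot(x, y))
```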
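
For item 2, here is a hedged sketch of evaluating an objective and its gradient on the GPU. The problem is a made-up least-squares functional f(x) = 1/2 ||Ax - b||^2, chosen because its gradient, A^T (Ax - b), is exactly the kind of adjoint computation mentioned above. The class and method names are illustrative, not Optizelle's API; in Optizelle this logic would live inside the user-supplied scalar-valued function.

```python
# A sketch of an objective and its adjoint-based gradient on the GPU,
# for the hypothetical functional f(x) = 1/2 ||Ax - b||^2.
import cupy

class GpuLeastSquares(object):
    def __init__(self, A, b):
        # Move the problem data to the device once, up front
        self.A = cupy.asarray(A)
        self.b = cupy.asarray(b)

    def eval(self, x):
        # f(x) = 1/2 ||Ax - b||^2, computed entirely on the GPU
        r = self.A @ x - self.b
        return 0.5 * float(cupy.dot(r, r))

    def grad(self, x, grad):
        # grad f(x) = A^T (Ax - b): the forward operator followed by
        # its adjoint (here just the transpose) applied to the residual
        r = self.A @ x - self.b
        grad[:] = self.A.T @ r
```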
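
And for item 3, a sketch of a preconditioner built from a one-time GPU factorization, assuming a symmetric positive definite matrix K (hypothetical data). cupy.linalg.cholesky factors on the device, and each application then costs only two triangular solves; in Optizelle this would sit inside the user-supplied preconditioner operator, whose exact interface differs from the illustrative class below.

```python
# A sketch of a GPU preconditioner: factor K = L L^T once, then apply
# K^{-1} with two triangular solves per call.  K must be SPD.
import cupy
from cupyx.scipy.linalg import solve_triangular

class GpuCholeskyPrec(object):
    def __init__(self, K):
        # One-time Cholesky factorization on the device
        self.L = cupy.linalg.cholesky(cupy.asarray(K))

    def eval(self, r):
        # Apply K^{-1} r: solve L y = r, then L^T z = y
        y = solve_triangular(self.L, r, lower=True)
        return solve_triangular(self.L, y, lower=True, trans='T')
```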

Anyway, it would be great to have some examples. I can keep that in mind for future documentation, but I've nothing to share at the moment. Thanks for the request, though.

@Guowu-Mcgill (Author)

Hi josyoun,

Many thanks for this detailed and kind explanation! I appreciate it! This actually resolved my concerns!
