
GPU support for sparse linear algebra#60

Draft
pelesh wants to merge 10 commits into develop from olcf-hackathon-2026-dev

@pelesh pelesh commented May 22, 2026

Merge request type

  • New feature
  • Resolves bug
  • Documentation
  • Other

Relates to

  • OPFLOW
  • SOPFLOW
  • SCOPFLOW
  • TCOPFLOW
  • CMake build system
  • Spack configuration
  • Manual
  • Web docs
  • Other

This MR updates

  • Header files
  • Source code
  • CMake build system
  • Spack configuration
  • Web docs
  • Manual
  • Other

Summary

Evaluation of constraint Jacobians (equality and inequality) on GPUs. This is still a work in progress and some code cleanup is needed. The stretch goal is to add sparse Hessian evaluation as well.

Linked Issue(s)

@pelesh pelesh added this to the Release 2.1 milestone May 22, 2026
@pelesh pelesh added the enhancement (New feature or request), testing, cuda (Support for NVIDIA accelerators), and hip (Support for AMD accelerators) labels May 22, 2026
@pelesh pelesh marked this pull request as draft May 22, 2026 21:47
@pelesh pelesh mentioned this pull request May 22, 2026
20 tasks
@nkoukpaizan nkoukpaizan force-pushed the olcf-hackathon-2026-dev branch from b332272 to d023534 Compare May 28, 2026 21:12
nkoukpaizan and others added 10 commits June 2, 2026 08:36
* Fix line constraint bug in HiOp MDS module.
---------

Co-authored-by: pelesh <pelesh@users.noreply.github.com>
…ns (#31)

* test_acopf with PBPOLRAJAHIOPSPARSE (verify equality constraint Jacobian) + bug fix for PBPOL.

Co-authored-by: William A Perkins <william.perkins@pnnl.gov>

* Cleanup changes to RAJAHIOPSPARSE nnz computation.

* Workaround for RAJAHIOPSPARSE inequality constraint verification.

* Handling cases where IPOPT or HIOP are not available in test_acopf + bug fixes.

* HiOp-compatible checks for constraint Jacobian allocations. Will need a
smarter way to verify the stacked inequality constraints.

* Fix for inequality constraint offsets.

---------

Co-authored-by: William A Perkins <william.perkins@pnnl.gov>
Co-authored-by: nkoukpaizan <nkoukpaizan@users.noreply.github.com>
* Add more comments to HIOPRAJASPARSE implementation.

* [skip ci] Add headers to the matpower file for Jacobian test.

---------

Co-authored-by: pelesh <pelesh@users.noreply.github.com>
* I am not going to lie, Cursor agent heavily helped me with this.

Replace PETSc-based inequality Jacobian with GPU RAJA kernels

Move the inequality constraint Jacobian computation for the HiOp sparse
GPU solver entirely to the device, eliminating the per-iteration host
round trip (copy to host, PETSc compute, MatGetRow extraction, values
copy back to device). Eliminate PETSc use from this part of the code.

Three RAJA kernels now compute directly into device memory:
- Generator set-point constraints (AGC)
- Voltage-reactive-power bounds (FIXED_WITHIN_QBOUNDS)
- Line flow limits (Sf^2/St^2 derivatives + slack variables)

Supporting changes:
- Analytical NNZ counting replaces PETSc MatGetInfo at solver setup
- New device-side parameter fields (apf, vs, xpdevidx, xslackidx,
  bus-to-gen mapping) added to *ParamsRajaHiop structs
- Sparse position indices assigned at model setup for all three
  contribution types
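
One way to picture the analytical counting and position assignment described above is an exclusive prefix sum over per-row nonzero counts. The sketch below is a host-side illustration only, not the code in this PR: the `Contribution` struct and `assign_positions` function are hypothetical names, and the per-row counts are assumed to be known from the model at setup.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container: nonzero counts per constraint row for one
// contribution type (e.g. set-point, Q-bound, or line-flow rows).
struct Contribution {
  std::vector<int> row_nnz;
};

// Exclusive prefix sum over the rows of all contributions, in order.
// start[r] is where row r begins in the flat values array; the last
// element is the total nonzero count.
std::vector<int> assign_positions(const std::vector<Contribution>& parts) {
  std::vector<int> start{0};
  for (const Contribution& c : parts) {
    for (int n : c.row_nnz) {
      assert(n >= 0);
      start.push_back(start.back() + n);
    }
  }
  return start;
}
```

With positions fixed at setup, each kernel can write into its own disjoint range of the values array, and the total count is available without querying an assembled matrix.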

Includes validation test (test_ineqjac_gpu) that solves with IPOPT,
then compares PETSc and GPU Jacobian values at the converged solution.
Optional -benchmark flag for performance comparison.
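
A comparison of this kind could be sketched as follows. This is an assumed form of the check, not the logic of test_ineqjac_gpu itself: the function names and the scaling choice are hypothetical, and both arrays are assumed to be on the host and in the same nonzero ordering.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest difference between two value arrays, each difference scaled
// by max(1, |reference value|) so small entries are compared absolutely
// and large entries relatively.
double max_scaled_diff(const std::vector<double>& ref,
                       const std::vector<double>& test) {
  assert(ref.size() == test.size());
  double worst = 0.0;
  for (std::size_t k = 0; k < ref.size(); ++k) {
    const double scale = std::fmax(1.0, std::fabs(ref[k]));
    worst = std::fmax(worst, std::fabs(ref[k] - test[k]) / scale);
  }
  return worst;
}

// True when every entry agrees within the given tolerance.
bool jacobians_match(const std::vector<double>& ref,
                     const std::vector<double>& test, double tol) {
  return max_scaled_diff(ref, test) <= tol;
}
```

If the two implementations order their nonzeros differently, the values would have to be permuted into a common order before a check like this is meaningful.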

Made-with: Cursor


---------

Co-authored-by: kswirydo <kasia.swirydowicz@gmail.com>
Co-authored-by: pelesh <pelesh@users.noreply.github.com>
Co-authored-by: Nicholson Koukpaizan <koukpaizannk@ornl.gov>
* Port equality constraint Jacobian from PETSc to RAJA GPU kernels

Replace the PETSc-based equality constraint Jacobian computation in the
PBPOLRAJAHIOPSPARSE model with direct GPU kernels using RAJA, eliminating
the D2H-compute-H2D round trip. The sparsity pattern is now computed on
the host during setup and the values are computed entirely on device.

Key changes:
- Add ComputeEqJacValuesGPU_PBPOLRAJAHIOPSPARSE in new gpu.cpp/hpp files
- Add device arrays for flat-array indices (bus eqjacsp_selfidx, line
  eqjacsp_idx/eqjacsp_diag_idx/isdcline, gen xpdevidx/xpsetidx)
- Fix nnz counting bugs (missing gen/load entries, off-by-one in line
  loop) and populate flat-array indices during model setup
- Replace PETSc MatGetRow extraction in sparsity and values phases
- Handle parallel lines by sharing off-diagonal positions with atomicAdd
- Use pre-computed nnz in get_sparse_blocks_info instead of PETSc query
- Add correctness test (test_eqjac_compare) and performance benchmark
  (test_eqjac_perf)
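
The parallel-line handling can be illustrated with a serial host-side sketch. Everything here is hypothetical (the `Line` struct, the scalar contribution `y`, and both function names); it only shows why lines that share a bus pair must share one slot and accumulate into it. In a GPU kernel the `+=` would need to be an atomic update, since several threads may target the same slot.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical line record: end buses plus one scalar contribution.
struct Line { int from; int to; double y; };

// Setup phase (host): one slot per distinct (from, to) pair, so
// parallel lines receive the same slot index.
std::vector<int> assign_offdiag_slots(const std::vector<Line>& lines,
                                      int& nslots) {
  std::map<std::pair<int, int>, int> slot_of;
  std::vector<int> idx;
  idx.reserve(lines.size());
  for (const Line& l : lines) {
    auto it = slot_of.emplace(std::make_pair(l.from, l.to),
                              static_cast<int>(slot_of.size())).first;
    idx.push_back(it->second);
  }
  nslots = static_cast<int>(slot_of.size());
  return idx;
}

// Values phase: zero the slots, then accumulate each line's contribution.
std::vector<double> accumulate_values(const std::vector<Line>& lines,
                                      const std::vector<int>& idx,
                                      int nslots) {
  assert(lines.size() == idx.size());
  std::vector<double> vals(nslots, 0.0);
  for (std::size_t i = 0; i < lines.size(); ++i) {
    vals[idx[i]] += lines[i].y; // atomic add when run in parallel
  }
  return vals;
}
```

Because values are accumulated rather than assigned, the slots have to be reset to zero before every evaluation.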

Made-with: Cursor

* Add comments and annotations to eq Jacobian GPU port

* Sort equality constraints Jacobian entries for hiopsparse_gpu (#53)

* Reset pbpolrajahiopsparse equality constraint function pointer.

* Sort equality constraint indices on host before copying to device.

* _selfidx --> _idx.

* Store equality constraints Jacobian permutation.

* Equality constraints Jacobian index permutation on device and use in setting the values.

* Need to store the reverse permutation map.
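
The sorting and permutation steps above can be sketched on the host as follows. This is a minimal illustration assuming COO-style row and column index arrays sorted by row, then column; the function names are hypothetical and not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// perm[k] is the original position of the entry that ends up at sorted
// position k when entries are ordered by (row, column).
std::vector<int> sort_permutation(const std::vector<int>& rows,
                                  const std::vector<int>& cols) {
  assert(rows.size() == cols.size());
  std::vector<int> perm(rows.size());
  std::iota(perm.begin(), perm.end(), 0);
  std::stable_sort(perm.begin(), perm.end(), [&](int a, int b) {
    return rows[a] != rows[b] ? rows[a] < rows[b] : cols[a] < cols[b];
  });
  return perm;
}

// inv[i] is the sorted position of original entry i (the reverse map).
std::vector<int> invert_permutation(const std::vector<int>& perm) {
  std::vector<int> inv(perm.size());
  for (std::size_t k = 0; k < perm.size(); ++k) {
    inv[perm[k]] = static_cast<int>(k);
  }
  return inv;
}
```

The reverse map is the one a values kernel would use: a thread that computes the value of original entry `i` writes it to position `inv[i]` of the sorted values array.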

---------

Co-authored-by: nkoukpaizan <nkoukpaizan@users.noreply.github.com>
Co-authored-by: kswirydo <kswirydo@users.noreply.github.com>
* Hessian --> hessian in unit test folders

* ConstraintHessian --> ConstraintsHessian in hiopkernels.

* UNIT_TEST_ --> UNIT_TESTS_.

* Capitalize test names for consistency.

* EQJAC --> EQUALITY_CONSTRAINT_JACOBIAN in test name.

* Include OPFLOW_ in related objective and constraints Jacobian unit test names.

* Cleanup a few CMake guards.

* Reorganize opflow unit tests.

* Remove  option from test_ineqjac_gpu, i.e., always evaluate performance.

* Minor edits to comments and print statements.

* Fix test_eqjac NETFILES and remove unnecessary command line option.

* Consolidate test_eqjac_gpu and test_eqjac_perf.

* Minor comment/print statement fix.

* Remove test_eqjac_perf from CMakeList.

---------

Co-authored-by: nkoukpaizan <nkoukpaizan@users.noreply.github.com>
@nkoukpaizan nkoukpaizan force-pushed the olcf-hackathon-2026-dev branch from d023534 to f7e07c6 Compare June 2, 2026 12:48

Labels

cuda (Support for NVIDIA accelerators), enhancement (New feature or request), hip (Support for AMD accelerators), testing

4 participants