Skip to content

Refactor GPU support and improve solvers with CUDA enhancements#203

Merged
jameslehoux merged 30 commits intoworkingfrom
master
Mar 29, 2026
Merged

Refactor GPU support and improve solvers with CUDA enhancements#203
jameslehoux merged 30 commits intoworkingfrom
master

Conversation

@jameslehoux
Copy link
Copy Markdown

No description provided.

jameslehoux and others added 30 commits March 22, 2026 22:06
Major architecture overhaul: modular solvers, Fortran→C++ migration, GPU acceleration
- Fix GPU wheel CI: use sameli/manylinux_2_34_x86_64_cuda_12.6 (the
  manylinux_2_28 variant with CUDA 12.6 does not exist on Docker Hub)
- Update dnf repo from 'powertools' to 'crb' for RHEL 9-based image
- Add tFloodFill integration test: validates parallelFloodFill and
  collectBoundarySeeds with 4 test cases (full flood, partial flood,
  boundary seeds, multi-label)
- Add tTortuosityMLMG integration test: validates MLMG matrix-free
  solver against analytical tau=(N-1)/N on uniform block, with
  directional symmetry test (Y direction)

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9

Fix GPU wheel Docker image and add FloodFill/MLMG solver tests
MLMG applies Dirichlet BCs at domain faces (external), not at cell
centers like HYPRE, so tau=1.0 for a uniform block (not (N-1)/N).

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9

Fix MLMG test expected tau and apply clang-format
The GPU workflow now patches pyproject.toml to set the distribution
name to 'openimpala-cuda' before building. The import name stays
'openimpala' — only the pip package name differs, so users choose
between `pip install openimpala` (CPU) and `pip install openimpala-cuda`
(CUDA GPU). __init__.py resolves version from either package.

All tutorials and notebooks updated to install openimpala-cuda to
leverage Colab T4 GPUs.

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
The manylinux_2_34 container ships GCC 14+ which is unsupported by
CUDA 12.6's nvcc (max GCC 13). Install gcc-toolset-13 and set CC/CXX/FC
to GCC 13 for all dependency builds and wheel compilation.

Also fixes:
- Quote semicolons in CMAKE_CUDA_ARCHITECTURES to prevent shell splitting
  (was causing "sh: line 1: 70: command not found" errors)
- Pass CUDAFLAGS="-allow-unsupported-compiler" to HYPRE build as safety net
- Set CUDAHOSTCXX and CMAKE_CUDA_HOST_COMPILER for AMReX and OpenImpala
- Bump cache key to force rebuild with new toolchain

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9

Publish GPU wheel as separate openimpala-cuda PyPI package
AMReX built with CUDA defines AMREX_GPU_HOST_DEVICE as __host__ __device__
in all its headers. Any .cpp file that includes AMReX headers must be
compiled by nvcc, not the regular C++ compiler, or the CUDA keywords
cause "does not name a type" errors.

Fix by setting LANGUAGE CUDA on all source files when GPU_BACKEND=CUDA:
- src/CMakeLists.txt: IO_SOURCES, PROPS_SOURCES, Diffusion.cpp
- python/CMakeLists.txt: BINDING_SOURCES
- tests/CMakeLists.txt: test source files via openimpala_add_test()

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9

Fix CUDA build: compile all sources as CUDA when GPU backend is enabled
CUDA's atomicAdd supports int, unsigned int, unsigned long long, float,
and double — but NOT signed long long. When compiling with nvcc, the
Gpu::Atomic::Add(&long_long_ptr, 1LL) calls fail.

Switch DeviceVector counters from long long to int in:
- ConnectedComponents.cpp: component volume counting
- ThroughThicknessProfile.cpp: per-slice phase counting

int is sufficient for voxel counts (max ~2 billion cells per component).
The host-side m_volumes (vector<long long>) is populated via .assign()
which widens int→long long safely.

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9

Fix CUDA atomicAdd: replace long long with int for GPU atomic counters
CUDA's extended __device__ lambdas cannot appear inside private or
protected member functions. The error:
  "The enclosing parent function for an extended __device__ lambda
   cannot have private or protected access within its class"

Move methods containing AMREX_GPU_DEVICE lambdas from private/protected
to public in:
- ConnectedComponents.H: run(), findNextUnlabeled()
- PercolationCheck.H: run()
- TortuositySolverBase.H: buildDiffusionCoeffField(),
  generateActivityMask(), globalFluxes(), computePlaneFluxes(),
  solve(), preconditionPhaseFab(), parallelFloodFill(),
  writeSolutionPlotfile()
- TortuosityHypre.H: solve(), setupMatrixEquation(),
  preconditionPhaseFab(), generateActivityMask(), global_fluxes(),
  computePlaneFluxes()

Data members remain private/protected. This is the standard pattern
for AMReX-based GPU codes.

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9

Fix CUDA: make methods with __device__ lambdas publicly accessible
Same nvcc restriction: __device__ lambdas cannot be inside private
or protected member functions.

- HDF5Reader.H: move readAndThresholdFab() to public
- TortuosityDirect.H: move solve(), advance(), and other methods
  containing AMREX_GPU_DEVICE lambdas to public

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9

Fix CUDA: make remaining private methods with GPU lambdas public
… and fix setVal overload

NVCC forbids extended __device__ lambdas inside constructors. Extract
ParallelFor loops from TortuosityHypre and EffectiveDiffusivityHypre
constructors into separate member functions (initializeDiffCoeff,
buildTraversableMask). Also fix setVal overload resolution by using
amrex::DestComp/NumComps wrapper types instead of raw int arguments.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM

Fix CUDA compilation errors: extract device lambdas from constructors and fix setVal overload
…eduction

NVCC also requires that enclosing functions for __device__ lambdas have
public access within their class. Move initializeDiffCoeff and
buildTraversableMask to public sections. For setVal, explicitly specify
the RunOn::Host template parameter to resolve deduction failure.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM

Fix CUDA: make device-lambda methods public and fix setVal template deduction
amrex::Gpu::DeviceScalar has a deleted move-assignment operator. Use
htod_memcpy to reset the device flag to zero instead of constructing
a new DeviceScalar each iteration.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM

Fix CUDA: replace deleted DeviceScalar move-assignment with htod_memcpy
…n REVStudy

NVCC cannot deduce the run_on template parameter for the 6-argument
BaseFab::copy overload. Explicitly specify RunOn::Host.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM

Fix CUDA: explicit RunOn::Host for BaseFab::copy template deduction in REVStudy
NVCC forbids extended __device__ lambdas in constructors. Move the
ParallelFor kernel into a public compute() method called from the
constructor.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM

Fix CUDA: extract device lambda from ThroughThicknessProfile constructor
NVCC requires functions containing __device__ lambdas to have public
access within their class.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM

Fix CUDA: move TortuosityMLMG::solve() from protected to public
…ild-yptxM

Fix clang-format violations in ThroughThicknessProfile and EffectiveDiffusivityHypre
@jameslehoux jameslehoux merged commit af25f38 into working Mar 29, 2026
14 of 15 checks passed
@github-actions
Copy link
Copy Markdown

Code Coverage Report

------------------------------------------------------------------------------
                           GCC Code Coverage Report
Directory: .
------------------------------------------------------------------------------
File                                       Lines     Exec  Cover   Missing
------------------------------------------------------------------------------
src/io/CathodeWrite.cpp                       95       83    87%   40-41,97-100,115-116,182-185
src/io/CathodeWrite.H                          1        1   100%
src/io/DatReader.cpp                         135      105    77%   26-27,30,35,92-93,99-100,107-109,135-137,141,144-148,152-155,162,164,208-209,242,245
src/io/DatReader.H                             1        1   100%
src/io/HDF5Reader.cpp                        344       84    24%   40-41,43-44,46-49,52,54-56,58-59,62,64-66,68-74,92-93,126-128,144-145,154-157,174-180,182-187,204,213-215,217,219-228,230-233,236-238,240-251,253-258,266,266,266,266,266,266,266,270,270,270,270,270,270,270,274,276,278,280,282,288,290,297,297,297,297,297,297,297,301,301,301,301,301,301,301,305,305,305,305,305,305,305-306,306,306,306,306,306,306,309,309,309,309,309,309,309-310,310,310,310,310,310,310-311,311,311,311,311,311,311,313,313,313,313,313,313,313-314,314,314,314,314,314,314-315,315,315,315,315,315,315,319,319,319,319,319,319,319,324,324,324,324,324,324,324-325,325,325,325,325,325,325-326,326,326,326,326,326,326-327,327,327,327,327,327,327,332,332,332,332,332,332,332,337,337,337,337,337,337,337-338,338,338,338,338,338,338,343,343,343,343,343,343,343,350,350,350,350,350,350,350,357-358,432-435,437-440
src/io/HDF5Reader.H                            3        3   100%
src/io/ImageLoader.cpp                        61       42    68%   25,38,48,60-62,64-70,72,77,89-90,92,94
src/io/RawReader.cpp                         266      116    43%   49-50,89-90,111-112,115-117,120-121,140-142,155-157,166-168,174-177,185-186,192-196,200-204,209-212,219-224,231-237,259,263-264,270-271,273-274,276,283-284,301,312,314,318,325,327,331-334,338,346-347,353-355,361-363,365-366,369,372,374,377-380,382-384,386,388-389,391,393-394,396,398-399,401,403-404,406,410-411,413,417-418,420,425,457,463-465,471-472,521-524,526,528-530,532,534-536,538,540-542,544,546-548,550,554-556,558,562-564,566,588
src/io/RawReader.H                             1        1   100%
src/io/TiffReader.cpp                        384      130    33%   59-65,67-69,71-73,75-77,79-80,82-84,86-88,90-92,94-96,98-99,101-103,106-108,111-112,114-117,119,122,124-127,143-144,148-150,152-158,160,186,210,217,226,228-231,240,242-245,248,255,288-293,306,309-317,319-320,323-327,331-335,338-342,344-348,351-357,359-363,367,369,375-377,379-393,396,398-402,404-409,413-418,420-425,428-429,432-434,555-575,577-578,581-588,590,593-609,612-614,670,673-674,677-683,685,689-700,702-703
src/io/TiffReader.H                            5        5   100%
src/props/BoundaryCondition.H                131       74    56%   63,68,70,216,224-229,233-236,238-244,247-249,252-253,255,258-261,264-265,271-272,274-279,285-287,290-296,299,303,365-366,371,373
src/props/ConnectedComponents.cpp             69       67    97%   94-95
src/props/ConnectedComponents.H                4        4   100%
src/props/DeffTensor.cpp                      62       59    95%   122,128-129
src/props/Diffusion.cpp                      510      245    48%   69,73,75-80,82-87,89-90,93-94,97-98,103-104,106-116,118,123-132,134-141,144-150,153-157,159-163,165,168-173,175-177,179,182-184,186-187,190-191,193,195-198,200,202-203,205,288-289,297-298,300,349,359-360,368-371,373-375,404-413,415,453,461,465-467,475,482-483,487-488,490-493,496-497,499-504,507-509,513-516,518-522,524-527,529-537,539,543,545,547,551,553-555,559-564,567,571,573-574,576,578-581,584-591,594,597-601,603-607,609-610,614,617-618,620-622,624-625,627-633,636,638,642-644,646,648-649,735-736,739-740,757-760,771-772,774,786-787,789,791-792,794-801,803-804,818-819,824
src/props/EffDiffFillMtx.H                   120      106    88%   58,216-217,221-225,229,231-235
src/props/EffectiveDiffusivityHypre.cpp      389      347    89%   189-191,193-197,305,367-370,479,612-615,617-619,621-624,633-636,643,672,684-687,689-691,693,705,716,718
src/props/EffectiveDiffusivityHypre.H          7        7   100%
src/props/FloodFill.cpp                       84       82    97%   94-95
src/props/HypreStructSolver.cpp              343      210    61%   87-88,121,133-134,145,299,309,311,314,346,356,358,361,367-370,372-376,378-379,381-385,388-389,391-392,394,397-398,401-402,404-407,409-413,415-416,418-422,425-426,428-429,431,434-435,438-439,441-443,445-451,453-457,460-461,463-464,466,469-470,473,475-477,479-485,487-491,494-495,497-498,500,503-504,507,509-511,513-516,518-522,525-526,528-529,531,534-535,538,541-542,555
src/props/HypreStructSolver.H                  6        6   100%
src/props/MacroGeometry.H                     17       17   100%
src/props/ParticleSizeDistribution.cpp        11       11   100%
src/props/ParticleSizeDistribution.H           6        6   100%
src/props/PercolationCheck.cpp                53       46    86%   32-33,49-51,68,73
src/props/PercolationCheck.H                   4        4   100%
src/props/PhysicsConfig.H                     90       89    98%   150
src/props/ResultsJSON.H                      225      222    98%   242,395,416
src/props/REVStudy.cpp                       151        0     0%   27,32,34-40,42-44,47,49,53-56,58,62,64-65,67-72,74-75,79-80,82-91,93-96,100-101,103,106-109,111-114,120-124,127-136,139-141,143,146-147,149,152-156,158-159,161,163,165-173,175,177-186,188-191,195-197,201,203-204,206,209-222,225-231,233-234,236-237,239-241,243,246-247,249
src/props/SolverConfig.H                      32       20    62%   30,32,37-44,75-76
src/props/SpecificSurfaceArea.cpp             56       55    98%   59
src/props/SpecificSurfaceArea.H                6        6   100%
src/props/ThroughThicknessProfile.cpp         38       38   100%
src/props/ThroughThicknessProfile.H            5        5   100%
src/props/Tortuosity.H                         2        2   100%
src/props/TortuosityDirect.cpp               219      191    87%   81-83,86,100-106,113-114,125,134,140,202-209,226,394,424,433
src/props/TortuosityDirect.H                   5        5   100%
src/props/TortuosityHypre.cpp                784      563    71%   148-149,154-155,239-242,245-247,310,334-336,339-340,342,352-354,357-359,389-392,572,596,600,621,637-638,640-642,644-653,655,658-662,666-668,671-678,680-684,688-690,692-694,696-705,707-711,713-724,726-729,731,741,747-750,752-754,763-766,768-770,786,789-790,813-818,829-832,834,871,876-879,882-884,888-891,893,895-898,900,905-907,909,958,967,972,975-980,996-999,1013-1017,1022-1027,1037-1041,1046-1051,1056-1060,1063-1066,1073-1076,1087,1096,1098,1102,1104,1126,1157-1158,1244-1246,1372-1375
src/props/TortuosityHypre.H                   15       15   100%
src/props/TortuosityHypreFill.H              127       98    77%   85,203,205-212,237-239,241-245,247-248,250,252,255-256,258-262
src/props/TortuosityKernels.H                 97       53    54%   52,56-60,62-65,69-74,76-80,84-85,90,129,143,157,243,245-248,250-253,257-260,262-265
src/props/TortuosityMLMG.cpp                  96       88    91%   153,174-176,178-179,186,199
src/props/TortuosityMLMG.H                     1        1   100%
src/props/TortuositySolverBase.cpp           301      237    78%   70-72,74-75,94-101,104,106,142-145,200,203,205,255,280,298,327,391,394-396,398,406-409,411-417,422,427-429,435-436,438-440,454,460,464-465,467,478,492,496-498,500,502,506
src/props/TortuositySolverBase.H              13       13   100%
src/props/VolumeFraction.cpp                  25       25   100%
src/props/VolumeFraction.H                     4        4   100%
------------------------------------------------------------------------------
TOTAL                                       5404     3592    66%
------------------------------------------------------------------------------


Generated by CI — coverage data from gcovr

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 29, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants