Refactor GPU support and improve solvers with CUDA enhancements #203
Merged
jameslehoux merged 30 commits into working · Mar 29, 2026
Conversation
Major architecture overhaul: modular solvers, Fortran→C++ migration, GPU acceleration
- Fix GPU wheel CI: use sameli/manylinux_2_34_x86_64_cuda_12.6 (the manylinux_2_28 variant with CUDA 12.6 does not exist on Docker Hub)
- Update dnf repo from 'powertools' to 'crb' for the RHEL 9-based image
- Add tFloodFill integration test: validates parallelFloodFill and collectBoundarySeeds with 4 test cases (full flood, partial flood, boundary seeds, multi-label)
- Add tTortuosityMLMG integration test: validates the MLMG matrix-free solver against the analytical tau=(N-1)/N on a uniform block, with a directional symmetry test (Y direction)

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9 Fix GPU wheel Docker image and add FloodFill/MLMG solver tests
MLMG applies Dirichlet BCs at domain faces (external), not at cell centers like HYPRE, so tau=1.0 for a uniform block (not (N-1)/N). https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9 Fix MLMG test expected tau and apply clang-format
The GPU workflow now patches pyproject.toml to set the distribution name to 'openimpala-cuda' before building. The import name stays 'openimpala' — only the pip package name differs, so users choose between `pip install openimpala` (CPU) and `pip install openimpala-cuda` (CUDA GPU). __init__.py resolves version from either package. All tutorials and notebooks updated to install openimpala-cuda to leverage Colab T4 GPUs. https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
The manylinux_2_34 container ships GCC 14+, which is unsupported by CUDA 12.6's nvcc (max GCC 13). Install gcc-toolset-13 and set CC/CXX/FC to GCC 13 for all dependency builds and wheel compilation. Also fixes:
- Quote semicolons in CMAKE_CUDA_ARCHITECTURES to prevent shell splitting (was causing "sh: line 1: 70: command not found" errors)
- Pass CUDAFLAGS="-allow-unsupported-compiler" to the HYPRE build as a safety net
- Set CUDAHOSTCXX and CMAKE_CUDA_HOST_COMPILER for AMReX and OpenImpala
- Bump the cache key to force a rebuild with the new toolchain

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9 Publish GPU wheel as separate openimpala-cuda PyPI package
AMReX built with CUDA defines AMREX_GPU_HOST_DEVICE as __host__ __device__ in all its headers. Any .cpp file that includes AMReX headers must be compiled by nvcc, not the regular C++ compiler, or the CUDA keywords cause "does not name a type" errors. Fix by setting LANGUAGE CUDA on all source files when GPU_BACKEND=CUDA:
- src/CMakeLists.txt: IO_SOURCES, PROPS_SOURCES, Diffusion.cpp
- python/CMakeLists.txt: BINDING_SOURCES
- tests/CMakeLists.txt: test source files via openimpala_add_test()

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9 Fix CUDA build: compile all sources as CUDA when GPU backend is enabled
CUDA's atomicAdd supports int, unsigned int, unsigned long long, float, and double, but NOT signed long long. When compiling with nvcc, the Gpu::Atomic::Add(&long_long_ptr, 1LL) calls fail. Switch DeviceVector counters from long long to int in:
- ConnectedComponents.cpp: component volume counting
- ThroughThicknessProfile.cpp: per-slice phase counting

int is sufficient for voxel counts (max ~2 billion cells per component). The host-side m_volumes (vector<long long>) is populated via .assign(), which widens int→long long safely.

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9 Fix CUDA atomicAdd: replace long long with int for GPU atomic counters
CUDA's extended __device__ lambdas cannot appear inside private or protected member functions. The error: "The enclosing parent function for an extended __device__ lambda cannot have private or protected access within its class"

Move methods containing AMREX_GPU_DEVICE lambdas from private/protected to public in:
- ConnectedComponents.H: run(), findNextUnlabeled()
- PercolationCheck.H: run()
- TortuositySolverBase.H: buildDiffusionCoeffField(), generateActivityMask(), globalFluxes(), computePlaneFluxes(), solve(), preconditionPhaseFab(), parallelFloodFill(), writeSolutionPlotfile()
- TortuosityHypre.H: solve(), setupMatrixEquation(), preconditionPhaseFab(), generateActivityMask(), global_fluxes(), computePlaneFluxes()

Data members remain private/protected. This is the standard pattern for AMReX-based GPU codes.

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9 Fix CUDA: make methods with __device__ lambdas publicly accessible
Same nvcc restriction: __device__ lambdas cannot be inside private or protected member functions.
- HDF5Reader.H: move readAndThresholdFab() to public
- TortuosityDirect.H: move solve(), advance(), and other methods containing AMREX_GPU_DEVICE lambdas to public

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9 Fix CUDA: make remaining private methods with GPU lambdas public
NVCC forbids extended __device__ lambdas inside constructors. Extract the ParallelFor loops from the TortuosityHypre and EffectiveDiffusivityHypre constructors into separate member functions (initializeDiffCoeff, buildTraversableMask). Also fix setVal overload resolution by using amrex::DestComp/NumComps wrapper types instead of raw int arguments. https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM Fix CUDA compilation errors: extract device lambdas from constructors and fix setVal overload
NVCC also requires that enclosing functions for __device__ lambdas have public access within their class. Move initializeDiffCoeff and buildTraversableMask to public sections. For setVal, explicitly specify the RunOn::Host template parameter to resolve the deduction failure. https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM Fix CUDA: make device-lambda methods public and fix setVal template deduction
amrex::Gpu::DeviceScalar has a deleted move-assignment operator. Use htod_memcpy to reset the device flag to zero instead of constructing a new DeviceScalar each iteration. https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM Fix CUDA: replace deleted DeviceScalar move-assignment with htod_memcpy
NVCC cannot deduce the run_on template parameter for the 6-argument BaseFab::copy overload. Explicitly specify RunOn::Host. https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM Fix CUDA: explicit RunOn::Host for BaseFab::copy template deduction in REVStudy
NVCC forbids extended __device__ lambdas in constructors. Move the ParallelFor kernel into a public compute() method called from the constructor. https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM Fix CUDA: extract device lambda from ThroughThicknessProfile constructor
NVCC requires functions containing __device__ lambdas to have public access within their class. https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
…ild-yptxM Fix CUDA: move TortuosityMLMG::solve() from protected to public
…ild-yptxM Fix clang-format violations in ThroughThicknessProfile and EffectiveDiffusivityHypre
Code Coverage Report (generated by CI; coverage data from gcovr)