Fix wheel builds and publish GPU wheels to PyPI #259

Merged
jameslehoux merged 4 commits into master from claude/upbeat-mccarthy-f1mNN
Apr 21, 2026

@jameslehoux

Now that openimpala-cuda is published to PyPI (previous commit switched the
GPU wheel workflow), the install collapses from

pip install openimpala-cuda --find-links \
  https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 \
  nvidia-cuda-runtime-cu12 nvidia-cublas-cu12 nvidia-cusparse-cu12 \
  nvidia-curand-cu12

down to plain

pip install openimpala-cuda

The nvidia-*-cu12 packages were only needed because the --find-links index
didn't carry them; PyPI's resolver will pull whatever the wheel actually
declares. Updates every call site that showed the old incantation:

  • README.md, docs/getting-started.md, docs/user-guide/gpu.md — advanced
    install sections
  • paper.md — corrects "via GitHub Releases" wording for the JOSS draft
  • notebooks/visualization_yt.ipynb — §0 install cell
  • tutorials/02_digital_twin.ipynb — install cell
  • tutorials/04_multiphase_and_fields.ipynb — install cell
  • tutorials/07_hpc_scaling.ipynb — §6 install cell

Also fixes a malformed .sif wget URL in docs/getting-started.md (a stray
concatenation of expanded_assets/v4.0.6 with the filename) by switching to
a vX.Y.Z placeholder to match the pattern already used in tutorial 7.

James Le Houx added 4 commits April 21, 2026 11:22
CPU wheel error ("gmake: *** No rule to make target '_core'") traced to
the SKBUILD_CMAKE_ARGS env var interfering with scikit-build-core's
cmake.args merge. The OPENIMPALA_ENABLE_TINY_PROFILE option was
redundant anyway — when AMReX is built with AMReX_TINY_PROFILE=ON, it
sets AMREX_TINY_PROFILE in its installed AMReX_Config.H header, which
every file including AMReX.H picks up automatically. Removed the option
and the env var; kept the AMReX-side build flag.

GPU wheel error ("CUDA::nvToolsExt target not found") is AMReX 25.03 vs.
CUDA 12 — libnvToolsExt was removed in CUDA 12 in favour of NVTX3
(header-only). Patch AMReX 25.03's CMake to use CUDA::nvtx3 instead,
applied via sed before configure. CMake 3.25+ (we have 3.28) exposes
CUDA::nvtx3 from CUDAToolkit, so this is drop-in.
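
The substitution described above can be sketched in Python for illustration. This is a minimal sketch only: the commit applies the change with sed, and the function name `patch_nvtx_target` and the sample CMake line are hypothetical, not taken from AMReX.

```python
import re


def patch_nvtx_target(cmake_text: str) -> str:
    """Swap the legacy CUDA::nvToolsExt imported target for CUDA::nvtx3."""
    # The trailing \b leaves longer identifiers that merely share the prefix untouched.
    return re.sub(r"CUDA::nvToolsExt\b", "CUDA::nvtx3", cmake_text)


# Hypothetical CMake line, used only to exercise the substitution.
sample = "target_link_libraries(amrex PUBLIC CUDA::nvToolsExt)\n"
patched = patch_nvtx_target(sample)
print(patched, end="")
```

Running the function a second time on its own output changes nothing, so the patch step would be safe to repeat on a cached source tree.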

Cache keys bumped (CPU v5, GPU nvtx3-v4) to force a fresh dep rebuild.

https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
Now that openimpala-cuda has been granted the 320 MiB per-file PyPI limit,
the GPU wheels fit and can be installed via `pip install openimpala-cuda`
like any other package. Mirror the CPU workflow's publish job: use the
pypi trusted-publisher flow (environment: pypi, id-token: write) via
pypa/gh-action-pypi-publish. Gate on github.event_name == 'release' so
workflow_dispatch runs still produce artifacts for manual inspection
without touching the index.

https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
Now that openimpala-cuda is published to PyPI (previous commit switched the
GPU wheel workflow), the install collapses from

    pip install openimpala-cuda --find-links \
      https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 \
      nvidia-cuda-runtime-cu12 nvidia-cublas-cu12 nvidia-cusparse-cu12 \
      nvidia-curand-cu12

down to plain

    pip install openimpala-cuda

The nvidia-*-cu12 packages were only needed because the --find-links index
didn't carry them; PyPI's resolver will pull whatever the wheel actually
declares. Updates every call site that showed the old incantation:

- README.md, docs/getting-started.md, docs/user-guide/gpu.md — advanced
  install sections
- paper.md — corrects "via GitHub Releases" wording for the JOSS draft
- notebooks/visualization_yt.ipynb — §0 install cell
- tutorials/02_digital_twin.ipynb — install cell
- tutorials/04_multiphase_and_fields.ipynb — install cell
- tutorials/07_hpc_scaling.ipynb — §6 install cell

Also fixes a malformed .sif wget URL in docs/getting-started.md (a stray
concatenation of expanded_assets/v4.0.6 with the filename) by switching to
a vX.Y.Z placeholder to match the pattern already used in tutorial 7.

https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
auditwheel repair --exclude drops libcudart / libcublas / libcusparse /
libcurand / libnvJitLink from the openimpala-cuda wheel payload, which
means the wheel only works on machines that already have the CUDA 12
toolkit installed — driver-only Colab/Kaggle runtimes have the libraries,
but a bare Python venv on an NVIDIA workstation does not.

Declare the nvidia-*-cu12 PyPI packages as runtime deps so pip pulls them
automatically. Keep them commented out in pyproject.toml with clear
markers so the CPU wheel (which uses the same file) doesn't grow a 1-2 GB
dep tree. The GPU workflow's existing sed step already rewrites `name =
"openimpala"` to `"openimpala-cuda"`; a second sed uncomments the
`#"nvidia-..."` lines in the same pass.

Verified with python3 -m tomllib that both variants produce valid TOML
and the expected dependency lists:

  CPU: ['numpy', 'scipy>=1.7']
  GPU: ['numpy', 'scipy>=1.7', 'nvidia-cuda-runtime-cu12',
        'nvidia-cublas-cu12', 'nvidia-cusparse-cu12',
        'nvidia-curand-cu12', 'nvidia-nvjitlink-cu12']

https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
@jameslehoux jameslehoux merged commit 8d155b7 into master Apr 21, 2026
6 checks passed
@github-actions github-actions bot added the devops, documentation, and gpu labels Apr 21, 2026
@github-actions

Performance Benchmark Results

| Size | Solver | Wall Time (s) | Tortuosity | Expected | Rel. Error | Iters | Status |
|------|--------|---------------|------------|----------|------------|-------|--------|
| 64³ | pcg | 0.7091 | 0.984375 | 0.984375 | 0.00e+00 | 1 | PASS |
| 64³ | flexgmres | 0.4182 | 0.984375 | 0.984375 | 0.00e+00 | N/A | PASS |
| 64³ | bicgstab | 0.4105 | 0.984375 | 0.984375 | 0.00e+00 | N/A | PASS |
| 64³ | gmres | 0.4124 | 0.984375 | 0.984375 | 0.00e+00 | N/A | PASS |
| 128³ | pcg | 8.4434 | 0.992188 | 0.992188 | 0.00e+00 | 1 | PASS |
| 128³ | flexgmres | 5.6633 | 0.992188 | 0.992188 | 0.00e+00 | N/A | PASS |
| 128³ | bicgstab | 5.4838 | 0.992188 | 0.992188 | 0.00e+00 | N/A | PASS |
| 128³ | gmres | 5.5678 | 0.992188 | 0.992188 | 0.00e+00 | N/A | PASS |

Fastest solver: bicgstab at 64³ (0.4105s)

Benchmark: uniform block (analytical τ = (N-1)/N)
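
The Expected column can be reproduced from the analytical formula quoted above. The sketch below is illustrative only: the function names are hypothetical, and the relative-error definition is an assumption rather than the benchmark script's own code.

```python
def expected_tortuosity(n: int) -> float:
    """Analytical value quoted in the benchmark: tau = (N - 1) / N."""
    return (n - 1) / n


def relative_error(measured: float, expected: float) -> float:
    # Assumed definition: absolute difference scaled by the expected value.
    return abs(measured - expected) / abs(expected)


for n in (64, 128):
    print(f"{n}^3: tau = {expected_tortuosity(n):.6f}")
```

For N = 64 this gives 63/64 = 0.984375, and for N = 128 it gives 127/128 = 0.9921875, which appears as 0.992188 at six decimal places.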

@github-actions

Code Coverage Report

------------------------------------------------------------------------------
                           GCC Code Coverage Report
Directory: .
------------------------------------------------------------------------------
File                                       Lines     Exec  Cover   Missing
------------------------------------------------------------------------------
src/io/CathodeWrite.cpp                       95       83    87%   40-41,97-100,115-116,182-185
src/io/CathodeWrite.H                          1        1   100%
src/io/DatReader.cpp                         135      105    77%   26-27,30,35,92-93,99-100,107-109,135-137,141,144-148,152-155,162,164,208-209,242,245
src/io/DatReader.H                             1        1   100%
src/io/HDF5Reader.cpp                        344       84    24%   40-41,43-44,46-49,52,54-56,58-59,62,64-66,68-74,92-93,126-128,144-145,154-157,174-180,182-187,204,213-215,217,219-228,230-233,236-238,240-251,253-258,266,266,266,266,266,266,266,270,270,270,270,270,270,270,274,276,278,280,282,288,290,297,297,297,297,297,297,297,301,301,301,301,301,301,301,305,305,305,305,305,305,305-306,306,306,306,306,306,306,309,309,309,309,309,309,309-310,310,310,310,310,310,310-311,311,311,311,311,311,311,313,313,313,313,313,313,313-314,314,314,314,314,314,314-315,315,315,315,315,315,315,319,319,319,319,319,319,319,324,324,324,324,324,324,324-325,325,325,325,325,325,325-326,326,326,326,326,326,326-327,327,327,327,327,327,327,332,332,332,332,332,332,332,337,337,337,337,337,337,337-338,338,338,338,338,338,338,343,343,343,343,343,343,343,350,350,350,350,350,350,350,357-358,432-435,437-440
src/io/HDF5Reader.H                            3        3   100%
src/io/ImageLoader.cpp                        61       42    68%   25,38,48,60-62,64-70,72,77,89-90,92,94
src/io/RawReader.cpp                         266      135    50%   49-50,89-90,111-112,115-117,120-121,140-142,155-157,166-168,174-177,185-186,192-196,200-204,209-212,219-224,231-237,271,273-274,276,283-284,301,312,314,318,325,327,331-334,338,346-347,353-355,361-363,365-366,369,372,374,377-380,382-384,386,388-389,391,393-394,396,398-399,401,403-404,406,410-411,413,417-418,420,425,465,471-472,521-524,538,540-542,544,546-548,558,562-564,566,588
src/io/RawReader.H                             1        1   100%
src/io/TiffReader.cpp                        384      130    33%   59-65,67-69,71-73,75-77,79-80,82-84,86-88,90-92,94-96,98-99,101-103,106-108,111-112,114-117,119,122,124-127,143-144,148-150,152-158,160,186,210,217,226,228-231,240,242-245,248,255,288-293,306,309-317,319-320,323-327,331-335,338-342,344-348,351-357,359-363,367,369,375-377,379-393,396,398-402,404-409,413-418,420-425,428-429,432-434,555-575,577-578,581-588,590,593-609,612-614,670,673-674,677-683,685,689-700,702-703
src/io/TiffReader.H                            5        5   100%
src/props/BoundaryCondition.H                131       74    56%   63,68,70,216,224-229,233-236,238-244,247-249,252-253,255,258-261,264-265,271-272,274-279,285-287,290-296,299,303,365-366,371,373
src/props/ConnectedComponents.cpp             69       67    97%   94-95
src/props/ConnectedComponents.H                4        4   100%
src/props/DeffTensor.cpp                      62       59    95%   122,128-129
src/props/Diffusion.cpp                      510      378    74%   93-94,97-98,103-104,106-116,118,123-132,134-141,144-150,153-157,159-163,165,168-173,175-177,179,182-184,186-187,190-191,193,195-198,200,202-203,288-289,297-298,300,349,359-360,368-371,373-375,404-413,415,453,461,465-467,526-527,533,535,539,547,581,610,638,646,735-736,739-740,757-760,771-772,774,824
src/props/EffDiffFillMtx.H                   120      106    88%   58,216-217,221-225,229,231-235
src/props/EffectiveDiffusivityHypre.cpp      389      347    89%   189-191,193-197,305,367-370,479,612-615,617-619,621-624,633-636,643,672,684-687,689-691,693,705,716,718
src/props/EffectiveDiffusivityHypre.H          7        7   100%
src/props/FloodFill.cpp                       84       81    96%   94-95,203
src/props/HypreStructSolver.cpp              343      210    61%   87-88,121,133-134,145,299,309,311,314,346,356,358,361,367-370,372-376,378-379,381-385,388-389,391-392,394,397-398,401-402,404-407,409-413,415-416,418-422,425-426,428-429,431,434-435,438-439,441-443,445-451,453-457,460-461,463-464,466,469-470,473,475-477,479-485,487-491,494-495,497-498,500,503-504,507,509-511,513-516,518-522,525-526,528-529,531,534-535,538,541-542,555
src/props/HypreStructSolver.H                  6        6   100%
src/props/MacroGeometry.H                     17       17   100%
src/props/ParticleSizeDistribution.cpp        11       11   100%
src/props/ParticleSizeDistribution.H           6        6   100%
src/props/PercolationCheck.cpp                53       46    86%   32-33,49-51,68,73
src/props/PercolationCheck.H                   4        4   100%
src/props/PhysicsConfig.H                     90       89    98%   150
src/props/ResultsJSON.H                      225      222    98%   242,395,416
src/props/REVStudy.cpp                       151      128    84%   72,83-91,159,170-173,175,183-186,188-190
src/props/SolverConfig.H                      32       20    62%   30,32,37-44,75-76
src/props/SpecificSurfaceArea.cpp             56       55    98%   59
src/props/SpecificSurfaceArea.H                6        6   100%
src/props/ThroughThicknessProfile.cpp         38       38   100%
src/props/ThroughThicknessProfile.H            5        5   100%
src/props/Tortuosity.H                         2        2   100%
src/props/TortuosityDirect.cpp               219      191    87%   81-83,86,100-106,113-114,125,134,140,202-209,226,394,424,433
src/props/TortuosityDirect.H                   5        5   100%
src/props/TortuosityHypre.cpp                784      563    71%   149-150,155-156,240-243,246-248,311,335-337,340-341,343,353-355,358-360,390-393,573,597,601,622,639-640,642-644,646-655,657,660-664,668-670,673-680,682-686,690-692,694-696,698-707,709-713,715-726,728-731,733,743,749-752,754-756,765-768,770-772,788,791-792,815-820,831-834,836,873,878-881,884-886,890-893,895,897-900,902,907-909,911,960,969,974,977-982,998-1001,1015-1019,1024-1029,1039-1043,1048-1053,1058-1062,1065-1068,1075-1078,1089,1098,1100,1104,1106,1128,1159-1160,1246-1248,1374-1377
src/props/TortuosityHypre.H                   15       15   100%
src/props/TortuosityHypreFill.H              127       98    77%   85,203,205-212,237-239,241-245,247-248,250,252,255-256,258-262
src/props/TortuosityKernels.H                 97       53    54%   52,56-60,62-65,69-74,76-80,84-85,90,129,143,157,243,245-248,250-253,257-260,262-265
src/props/TortuosityMLMG.cpp                  99       91    91%   160,181-183,185-186,193,206
src/props/TortuosityMLMG.H                     1        1   100%
src/props/TortuositySolverBase.cpp           301      237    78%   70-72,74-75,94-101,104,106,142-145,200,203,205,255,280,298,327,391,394-396,398,406-409,411-417,422,427-429,435-436,438-440,454,460,464-465,467,478,492,496-498,500,502,506
src/props/TortuositySolverBase.H              13       13   100%
src/props/VolumeFraction.cpp                  25       25   100%
src/props/VolumeFraction.H                     4        4   100%
------------------------------------------------------------------------------
TOTAL                                       5407     3874    71%
------------------------------------------------------------------------------


Generated by CI — coverage data from gcovr

@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
