Fix wheel builds and publish GPU wheels to PyPI #259
Merged
jameslehoux merged 4 commits into master on Apr 21, 2026
Conversation
added 4 commits on April 21, 2026 at 11:22
CPU wheel error ("gmake: *** No rule to make target '_core'") traced to
the SKBUILD_CMAKE_ARGS env var interfering with scikit-build-core's
cmake.args merge. The OPENIMPALA_ENABLE_TINY_PROFILE option was
redundant anyway — when AMReX is built with AMReX_TINY_PROFILE=ON, it
sets AMREX_TINY_PROFILE in its installed AMReX_Config.H header, which
every file including AMReX.H picks up automatically. Removed the option
and the env var; kept the AMReX-side build flag.
GPU wheel error ("CUDA::nvToolsExt target not found") is AMReX 25.03 vs.
CUDA 12 — libnvToolsExt was removed in CUDA 12 in favour of NVTX3
(header-only). Patch AMReX 25.03's CMake to use CUDA::nvtx3 instead,
applied via sed before configure. CMake 3.25+ (we have 3.28) exposes
CUDA::nvtx3 from CUDAToolkit, so this is drop-in.
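The pre-configure sed patch could look like the sketch below; the AMReX file path and the CMake fragment are stand-ins for illustration, not the exact patch used in CI:

```shell
# Stand-in for the AMReX 25.03 CMake fragment that still links the
# removed CUDA::nvToolsExt target (the real path in the tree may differ).
mkdir -p amrex-25.03/Tools/CMake
printf 'target_link_libraries(amrex PUBLIC CUDA::nvToolsExt)\n' \
    > amrex-25.03/Tools/CMake/AMReXParallelBackends.cmake

# CUDA 12 dropped libnvToolsExt; CMake 3.25+ exposes the header-only
# replacement as CUDA::nvtx3, so a textual swap before configure suffices.
sed -i 's/CUDA::nvToolsExt/CUDA::nvtx3/g' \
    amrex-25.03/Tools/CMake/AMReXParallelBackends.cmake

grep CUDA::nvtx3 amrex-25.03/Tools/CMake/AMReXParallelBackends.cmake
```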
Cache keys bumped (CPU v5, GPU nvtx3-v4) to force a fresh dep rebuild.
https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
Now that openimpala-cuda has been granted the 320 MiB per-file PyPI limit, the GPU wheels fit and can be installed via `pip install openimpala-cuda` like any other package. Mirror the CPU workflow's publish job: use the pypi trusted-publisher flow (environment: pypi, id-token: write) via pypa/gh-action-pypi-publish. Gate on github.event_name == 'release' so workflow_dispatch runs still produce artifacts for manual inspection without touching the index. https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
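A publish job mirroring the CPU workflow might be sketched as follows; the job and artifact names are placeholders, and only the trusted-publisher pieces (environment: pypi, id-token: write, the release gate) come from the description above:

```yaml
publish-pypi:
  # Only the release event publishes; workflow_dispatch still produces
  # artifacts for manual inspection without touching the index.
  if: github.event_name == 'release'
  needs: build-gpu-wheel          # placeholder job name
  runs-on: ubuntu-latest
  environment: pypi               # PyPI trusted-publisher environment
  permissions:
    id-token: write               # OIDC token for the trusted-publisher flow
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: gpu-wheel           # placeholder artifact name
        path: dist
    - uses: pypa/gh-action-pypi-publish@release/v1
```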
Now that openimpala-cuda is published to PyPI (previous commit switched the
GPU wheel workflow), the install collapses from
pip install openimpala-cuda --find-links \
https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 \
nvidia-cuda-runtime-cu12 nvidia-cublas-cu12 nvidia-cusparse-cu12 \
nvidia-curand-cu12
down to plain
pip install openimpala-cuda
The nvidia-*-cu12 packages were only needed because the --find-links index
didn't carry them; PyPI's resolver will pull whatever the wheel actually
declares. Updates every call site that showed the old incantation:
- README.md, docs/getting-started.md, docs/user-guide/gpu.md — advanced
install sections
- paper.md — corrects "via GitHub Releases" wording for the JOSS draft
- notebooks/visualization_yt.ipynb — §0 install cell
- tutorials/02_digital_twin.ipynb — install cell
- tutorials/04_multiphase_and_fields.ipynb — install cell
- tutorials/07_hpc_scaling.ipynb — §6 install cell
Also fixes a malformed .sif wget URL in docs/getting-started.md (a stray
concatenation of expanded_assets/v4.0.6 with the filename) by switching to
a vX.Y.Z placeholder to match the pattern already used in tutorial 7.
https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
auditwheel repair --exclude drops libcudart / libcublas / libcusparse /
libcurand / libnvJitLink from the openimpala-cuda wheel payload, which
means the wheel only works on machines that already have the CUDA 12
toolkit installed — driver-only Colab/Kaggle runtimes have the libraries,
but a bare Python venv on an NVIDIA workstation does not.
Declare the nvidia-*-cu12 PyPI packages as runtime deps so pip pulls them
automatically. Keep them commented out in pyproject.toml with clear
markers so the CPU wheel (which uses the same file) doesn't grow a 1-2 GB
dep tree. The GPU workflow's existing sed step already rewrites `name =
"openimpala"` to `"openimpala-cuda"`; a second sed uncomments the
`#"nvidia-..."` lines in the same pass.
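The two-sed pass could be sketched like this; the stand-in pyproject.toml and the exact comment-marker format (a bare `#` in front of each `"nvidia-..."` line) are assumptions for illustration:

```shell
# Stand-in pyproject.toml with the GPU-only deps commented out.
cat > pyproject.toml <<'EOF'
[project]
name = "openimpala"
dependencies = [
    "numpy",
    "scipy>=1.7",
    #"nvidia-cuda-runtime-cu12",
    #"nvidia-cublas-cu12",
]
EOF

# GPU wheel build: uncomment the nvidia deps in the same pass that
# rewrites the package name to openimpala-cuda.
sed -i -e 's/name = "openimpala"/name = "openimpala-cuda"/' \
       -e 's/^\([[:space:]]*\)#\("nvidia-\)/\1\2/' pyproject.toml

grep -c '^[[:space:]]*"nvidia-' pyproject.toml
```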
Verified with python3 -m tomllib that both variants produce valid TOML
and the expected dependency lists:
CPU: ['numpy', 'scipy>=1.7']
GPU: ['numpy', 'scipy>=1.7', 'nvidia-cuda-runtime-cu12',
'nvidia-cublas-cu12', 'nvidia-cusparse-cu12',
'nvidia-curand-cu12', 'nvidia-nvjitlink-cu12']
https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
Performance Benchmark Results
Fastest solver: bicgstab at 64³ (0.4105 s). Benchmark: uniform block (analytical τ = (N-1)/N).
Code Coverage Report — generated by CI; coverage data from gcovr.
Codecov Report: all modified and coverable lines are covered by tests.