[no-ci] toolshed: modernize conda_create_for_pathfinder_testing scripts#2013
Conversation
Bring the bash and PowerShell helpers used to spin up CUDA-pathfinder
test environments into line with how cuda_pathfinder/pyproject.toml is
now organized, and harden them so common shell mistakes surface
immediately.
Highlights:
- Add ``set -euo pipefail`` (bash) and ``Set-StrictMode -Version Latest``
(PowerShell) so unset variables and non-zero exit codes fail loudly.
- Pick the Python version from the CUDA major: ``12.x`` -> 3.12,
``13.x`` -> 3.14. Reject other CUDA majors with a clear error
instead of silently installing 3.13.
- Bash: wrap ``conda activate`` and ``conda install`` in ``set +u`` /
``set -u`` because conda's shims dereference unset variables, which
otherwise abort the script under ``set -u``.
- Realign the conda package lists with the platform-scoped groups in
``cuda_pathfinder/pyproject.toml``:
- Add ``cutlass`` (Linux + Windows) and ``libcusolvermp-dev`` (Linux).
- Drop ``libnvpl-fft-dev`` from the unconditional Linux list and from
Windows entirely.
- On Linux, install ``libnvpl-fft-dev`` only on ``aarch64`` and add
``libcudla-dev`` for the ``cu13``/``aarch64`` combination.
- Drop ``libcublasmp-dev``, ``libcufftmp-dev``, ``libnvshmem3``, and
``libnvshmem-dev`` from the Windows list to match what ``pyproject``
installs there.
- Refresh SPDX copyright to ``2025-2026``.
Co-authored-by: Cursor <cursoragent@cursor.com>
| ) | ||
|
|
||
| $ErrorActionPreference = "Stop" | ||
| Set-StrictMode -Version Latest |
There was a problem hiding this comment.
It would be good to spell out more what "modernization's" you are doing. I'm not sure what you are modernizing based on the code.
There was a problem hiding this comment.
Done: See the massively expanded PR description.
| exit 1 | ||
| fi | ||
|
|
||
| cuda_version="$1" |
There was a problem hiding this comment.
Consider: Refactoring these two scripts into a single python script to reduce maintenance headaches.
There was a problem hiding this comment.
See last paragraph in the new expanded PR description.
I more-or-less went into the opposite direction: this feedback prompted me to simplify the scripts, by removing the "derive Python version from CUDA version" logic and simply requiring the Python version to be passed in.
There was a problem hiding this comment.
Consider: Using batch install rather then running the resolve N times.
"This runs the conda solver N times instead of once, which is both much slower and can produce a different (worse) environment than solving all packages together — each install can independently downgrade/upgrade shared deps. A single conda install -y -c conda-forge "${cpkgs[@]}" is faster and more consistent."
Replace the per-package ``conda install`` loop in both the bash and
PowerShell helpers with a single batched call that hands the full
package list to one solver invocation.
A single ``conda install -y -c conda-forge "${cpkgs[@]}"`` (bash) /
``conda install -y -c conda-forge @cpkgs`` (PowerShell) is both faster
and more consistent than calling ``conda install`` N times: the loop
form runs the solver once per package, and each independent solve can
downgrade or upgrade shared dependencies that the prior step just
installed.
The ``set +u`` / ``set -u`` shim around the bash call is preserved
because conda's shell wrapper still dereferences unset variables
regardless of how many packages are passed.
Not yet exercised against a real conda environment; the next
end-to-end run will use this batched form.
Addresses review feedback from @rparolin on PR NVIDIA#2013.
…aming
Make the Python version a required argument to both helpers instead of
deriving it from the CUDA major. The previous case/switch over the
CUDA major was a brittle source of truth: every time a new Python or
CUDA line was supported, the helper would silently install the wrong
Python until it was updated. Pushing the choice to the caller keeps
the helpers focused on the conda environment they manage and lets the
caller pick whatever Python they actually want to test against.
New usage on both platforms is parallel:
conda_create_for_pathfinder_testing.sh 3.12 12.9.2
conda_create_for_pathfinder_testing.ps1 3.12 12.9.2
Variable / parameter names now match the usage line and read
identically across the two scripts:
- bash: ``python_major_minor`` / ``cuda_major_minor_patch``
- PowerShell: ``$PythonMajorMinor`` / ``$CudaMajorMinorPatch``
The PowerShell ``param()`` block uses ``Mandatory + Position 0/1`` so
the script accepts both positional invocation (matching the bash style
above) and the named ``-PythonMajorMinor`` / ``-CudaMajorMinorPatch``
form.
Switch the ``libcudla-dev`` gate from ``cuda_major == "13"`` to
``(( cuda_major >= 13 ))`` so future CUDA majors fall through to the
correct branch automatically. ``(( ))`` is the idiomatic bash form for
integer comparison; the script is already firmly bash-only
(``#!/bin/bash``, ``set -euo pipefail``), so the syntax is in keeping
with the rest of the file.
|
Re-tested locally after the consolidating changes on this PR. Linux: Windows (PowerShell): All four runs completed cleanly: the explicit The asymmetric Python pick — 3.13 on Linux vs. 3.14 on Windows for the CUDA 13 run — is deliberate: 3.14 paired with |
This comment has been minimized.
This comment has been minimized.
2 similar comments
This comment has been minimized.
This comment has been minimized.
|
Modernizations as used for manually testing changes under PR #1977 (xref issue #1038) with Conda Linux and Windows:
Test results: #1977 (comment)
This is a separate PR to remove a distraction from the bigger set of changes.
What this PR modernizes
The two scripts had drifted from their de-facto source of truth — the Linux- and Windows-installable subsets of
cuda_pathfinder/pyproject.toml. This PR brings them back in sync and tightens a few rough edges:pyproject.tomldeclares.conda installerrors for packages that don't have win-64 builds. The new list contains only the win-64-installable subset, so a non-zero exit now means something is actually wrong.python_major_minoras an explicit argument instead of hard-coding it. The trigger was thatpython=3.14withcuda-toolkit=12.9.xhad stopped resolving on Windows, so the fixed mapping was no longer adequate.set -euo pipefail(bash) andSet-StrictMode -Version Latestplus$ErrorActionPreference = "Stop"(PowerShell) so unset variables and non-zero exits surface immediately instead of silently producing partial environments.conda install(per separate review feedback on this PR). Both scripts now hand the full package list to oneconda installinvocation, replacing the prior per-package loop. Faster, and more importantly: each independent solve in the old loop could downgrade or upgrade shared deps placed by a previous solve, leaving the final environment in an order-dependent state.On
toolshed/in generaltoolshed/is intentionally a shared scratchpad for scripts that are useful intermittently — environment bootstrap helpers, ad-hoc reproducers, glue around manual workflows — but don't warrant CI coverage, unit tests, or the rest of the production-code apparatus. It is accepted that entries bitrot between uses; having them in a shared, versioned location just means the next person who needs the same workflow has a better starting point than starting from scratch.That framing is also why these helpers stay as two thin shell scripts rather than getting consolidated into a Python module: for the current cadence of use, parallel shell idioms in bash and PowerShell are a smaller maintenance surface than a third script (Python) that would in turn need its own subprocess handling, error model, and entry-point story. Worth revisiting if/when the scripts grow further.