Skip to content

[no-ci] toolshed: modernize conda_create_for_pathfinder_testing scripts#2013

Merged
rwgk merged 4 commits into
NVIDIA:mainfrom
rwgk:toolshed_conda_create_for_pathfinder_modernizations
May 17, 2026
Merged

[no-ci] toolshed: modernize conda_create_for_pathfinder_testing scripts#2013
rwgk merged 4 commits into
NVIDIA:mainfrom
rwgk:toolshed_conda_create_for_pathfinder_modernizations

Conversation

@rwgk
Copy link
Copy Markdown
Contributor

@rwgk rwgk commented May 4, 2026

Modernizations as used for manually testing changes under PR #1977 (xref issue #1038) with Conda Linux and Windows:

Test results: #1977 (comment)

This is a separate PR to remove a distraction from the bigger set of changes.

What this PR modernizes

The two scripts had drifted from their de-facto source of truth — the Linux- and Windows-installable subsets of cuda_pathfinder/pyproject.toml. This PR brings them back in sync and tightens a few rough edges:

  • Track the current pyproject.toml package lists. The previous scripts predate several additions, removals, and renamings in the pathfinder dependency groups; the package arrays in both shells now mirror the platform-scoped subsets that pyproject.toml declares.
  • The Windows list is properly platform-scoped instead of a copied-from-Linux superset. The Windows script previously ran with the Linux list and silently swallowed conda install errors for packages that don't have win-64 builds. The new list contains only the win-64-installable subset, so a non-zero exit now means something is actually wrong.
  • Caller-chosen Python version. Both scripts now take python_major_minor as an explicit argument instead of hard-coding it. The trigger was that python=3.14 with cuda-toolkit=12.9.x had stopped resolving on Windows, so the fixed mapping was no longer adequate.
  • Harden the script invariants. set -euo pipefail (bash) and Set-StrictMode -Version Latest plus $ErrorActionPreference = "Stop" (PowerShell) so unset variables and non-zero exits surface immediately instead of silently producing partial environments.
  • Single-solve conda install (per separate review feedback on this PR). Both scripts now hand the full package list to one conda install invocation, replacing the prior per-package loop. Faster, and more importantly: each independent solve in the old loop could downgrade or upgrade shared deps placed by a previous solve, leaving the final environment in an order-dependent state.

On toolshed/ in general

toolshed/ is intentionally a shared scratchpad for scripts that are useful intermittently — environment bootstrap helpers, ad-hoc reproducers, glue around manual workflows — but don't warrant CI coverage, unit tests, or the rest of the production-code apparatus. It is accepted that entries bitrot between uses; having them in a shared, versioned location just means the next person who needs the same workflow has a better starting point than starting from scratch.

That framing is also why these helpers stay as two thin shell scripts rather than getting consolidated into a Python module: for the current cadence of use, parallel shell idioms in bash and PowerShell are a smaller maintenance surface than a third script (Python) that would in turn need its own subprocess handling, error model, and entry-point story. Worth revisiting if/when the scripts grow further.

Bring the bash and PowerShell helpers used to spin up CUDA-pathfinder
test environments into line with how cuda_pathfinder/pyproject.toml is
now organized, and harden them so common shell mistakes surface
immediately.

Highlights:

- Add ``set -euo pipefail`` (bash) and ``Set-StrictMode -Version Latest``
  (PowerShell) so unset variables and non-zero exit codes fail loudly.
- Pick the Python version from the CUDA major: ``12.x`` -> 3.12,
  ``13.x`` -> 3.14. Reject other CUDA majors with a clear error
  instead of silently installing 3.13.
- Bash: wrap ``conda activate`` and ``conda install`` in ``set +u`` /
  ``set -u`` because conda's shims dereference unset variables, which
  otherwise abort the script under ``set -u``.
- Realign the conda package lists with the platform-scoped groups in
  ``cuda_pathfinder/pyproject.toml``:
  - Add ``cutlass`` (Linux + Windows) and ``libcusolvermp-dev`` (Linux).
  - Drop ``libnvpl-fft-dev`` from the unconditional Linux list and from
    Windows entirely.
  - On Linux, install ``libnvpl-fft-dev`` only on ``aarch64`` and add
    ``libcudla-dev`` for the ``cu13``/``aarch64`` combination.
  - Drop ``libcublasmp-dev``, ``libcufftmp-dev``, ``libnvshmem3``, and
    ``libnvshmem-dev`` from the Windows list to match what ``pyproject``
    installs there.
- Refresh SPDX copyright to ``2025-2026``.

Co-authored-by: Cursor <cursoragent@cursor.com>
@rwgk rwgk added this to the cuda.pathfinder next milestone May 4, 2026
@rwgk rwgk self-assigned this May 4, 2026
@rwgk rwgk added P1 Medium priority - Should do test Improvements or additions to tests cuda.pathfinder Everything related to the cuda.pathfinder module labels May 4, 2026
@rwgk rwgk requested a review from rparolin May 4, 2026 17:30
)

$ErrorActionPreference = "Stop"
Set-StrictMode -Version Latest
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to spell out more what "modernization's" you are doing. I'm not sure what you are modernizing based on the code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: See the massively expanded PR description.

exit 1
fi

cuda_version="$1"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider: Refactoring these two scripts into a single python script to reduce maintenance headaches.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See last paragraph in the new expanded PR description.

I more-or-less went into the opposite direction: this feedback prompted me to simplify the scripts, by removing the "derive Python version from CUDA version" logic and simply requiring the Python version to be passed in.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider: Using batch install rather then running the resolve N times.

"This runs the conda solver N times instead of once, which is both much slower and can produce a different (worse) environment than solving all packages together — each install can independently downgrade/upgrade shared deps. A single conda install -y -c conda-forge "${cpkgs[@]}" is faster and more consistent."

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 76d2524

rwgk added 3 commits May 17, 2026 14:16
Replace the per-package ``conda install`` loop in both the bash and
PowerShell helpers with a single batched call that hands the full
package list to one solver invocation.

A single ``conda install -y -c conda-forge "${cpkgs[@]}"`` (bash) /
``conda install -y -c conda-forge @cpkgs`` (PowerShell) is both faster
and more consistent than calling ``conda install`` N times: the loop
form runs the solver once per package, and each independent solve can
downgrade or upgrade shared dependencies that the prior step just
installed.

The ``set +u`` / ``set -u`` shim around the bash call is preserved
because conda's shell wrapper still dereferences unset variables
regardless of how many packages are passed.

Not yet exercised against a real conda environment; the next
end-to-end run will use this batched form.

Addresses review feedback from @rparolin on PR NVIDIA#2013.
…aming

Make the Python version a required argument to both helpers instead of
deriving it from the CUDA major. The previous case/switch over the
CUDA major was a brittle source of truth: every time a new Python or
CUDA line was supported, the helper would silently install the wrong
Python until it was updated. Pushing the choice to the caller keeps
the helpers focused on the conda environment they manage and lets the
caller pick whatever Python they actually want to test against.

New usage on both platforms is parallel:

    conda_create_for_pathfinder_testing.sh  3.12 12.9.2
    conda_create_for_pathfinder_testing.ps1 3.12 12.9.2

Variable / parameter names now match the usage line and read
identically across the two scripts:

- bash:       ``python_major_minor`` / ``cuda_major_minor_patch``
- PowerShell: ``$PythonMajorMinor``  / ``$CudaMajorMinorPatch``

The PowerShell ``param()`` block uses ``Mandatory + Position 0/1`` so
the script accepts both positional invocation (matching the bash style
above) and the named ``-PythonMajorMinor`` / ``-CudaMajorMinorPatch``
form.

Switch the ``libcudla-dev`` gate from ``cuda_major == "13"`` to
``(( cuda_major >= 13 ))`` so future CUDA majors fall through to the
correct branch automatically. ``(( ))`` is the idiomatic bash form for
integer comparison; the script is already firmly bash-only
(``#!/bin/bash``, ``set -euo pipefail``), so the syntax is in keeping
with the rest of the file.
@rwgk
Copy link
Copy Markdown
Contributor Author

rwgk commented May 17, 2026

Re-tested locally after the consolidating changes on this PR.

Linux:

toolshed/conda_create_for_pathfinder_testing.sh 3.12 12.9.2
toolshed/conda_create_for_pathfinder_testing.sh 3.13 13.2.1

Windows (PowerShell):

toolshed\conda_create_for_pathfinder_testing.ps1 3.12 12.9.2
toolshed\conda_create_for_pathfinder_testing.ps1 3.14 13.2.1

All four runs completed cleanly: the explicit python_major_minor argument was honored, the platform-scoped package list installed in a single conda install solve, and CUDA-specific resolutions picked the expected cu12/cu13 builds.

The asymmetric Python pick — 3.13 on Linux vs. 3.14 on Windows for the CUDA 13 run — is deliberate: 3.14 paired with cuda-toolkit=12.9.x is the combination that originally stopped resolving on Windows and motivated taking the Python version as an explicit argument, so this exercises that the newer Python is now reachable when paired with a matching CUDA toolkit.

@rwgk rwgk merged commit f401388 into NVIDIA:main May 17, 2026
20 checks passed
@rwgk rwgk deleted the toolshed_conda_create_for_pathfinder_modernizations branch May 17, 2026 22:29
@github-actions

This comment has been minimized.

2 similar comments
@github-actions

This comment has been minimized.

@github-actions
Copy link
Copy Markdown

Doc Preview CI
Preview removed because the pull request was closed or merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.pathfinder Everything related to the cuda.pathfinder module P1 Medium priority - Should do test Improvements or additions to tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants