Add cuTENSOR support & bug fixes discovered while working on conda testing #1194

rwgk · 2025-10-28T06:01:03Z

Closes #1144, #1116

Bump cuda-pathfinder version to 1.3.2

TODOs:

Paste all outputs: (site-packages, conda, standard-CTK) x (cu12, cu13) x (linux-64, linux-aarch64, win-64)

Main changes:

find_nvidia_headers.py generalization for non-CTK headers, introducing a new family of SUPPORTED_*NON_CTK* variables in supported_nvidia_headers.py.
test_load_nvidia_dynamic_lib.py now loops consistently over all SUPPORTED_LINUX_SONAMES or SUPPORTED_WINDOWS_DLLS, depending on the platform.
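The platform-dependent loop described above might look roughly like this minimal sketch. The two tables are tiny hypothetical stand-ins for the real ones in supported_nvidia_libs.py (the value shapes are assumed), and `libnames_for_this_platform` is an illustrative helper, not a function from cuda-pathfinder.

```python
import sys

# Hypothetical stand-ins for the real tables; only the table names come from the PR.
SUPPORTED_LINUX_SONAMES = {"cutensor": ("libcutensor.so.2",)}
SUPPORTED_WINDOWS_DLLS = {"cutensor": ("cutensor.dll",)}

IS_WINDOWS = sys.platform == "win32"


def libnames_for_this_platform(is_windows=IS_WINDOWS):
    """Return the sorted libnames to test on the current (or given) platform."""
    table = SUPPORTED_WINDOWS_DLLS if is_windows else SUPPORTED_LINUX_SONAMES
    return tuple(sorted(table))


if __name__ == "__main__":
    # A test would typically parametrize over this tuple, one case per libname.
    for libname in libnames_for_this_platform():
        print(libname)
```

Selecting the table once and looping over its keys keeps the test coverage in step with whatever the table declares as supported on that platform.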

Piggy-backed changes:

Fix [BUG]: cublasmp dependencies are not reflected in supported_nvidia_libs.py #1116
Add SITE_PACKAGES_LIBDIRS_WINDOWS_OTHER cudss paths in supported_nvidia_libs.py.
Handle the cccl IS_WINDOWS conda anomaly (see example below) in find_nvidia_headers.py. This was overlooked before due to the lack of automatic testing.
commit 0d71a85 — Add nvidia-cufftmp-cu13 data in supported_nvidia_libs.py

Example for cccl IS_WINDOWS conda anomaly (note targets\x64 after include\):

INFO test_find_ctk_headers[cccl]: hdr_dir='C:\\Users\\rgrossekunst\\AppData\\Local\\miniforge3\\envs\\pathfinder_testing_cu12.9.1\\Library\\include\\targets\\x64'


copy-pr-bot · 2025-10-28T06:01:06Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk · 2025-10-28T06:02:45Z

/ok to test

github-actions · 2025-10-28T06:12:29Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1194/
https://nvidia.github.io/cuda-python/pr-preview/pr-1194/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1194/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1194/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.


rwgk · 2025-10-28T16:46:49Z

/ok to test


copy-pr-bot · 2025-10-28T19:23:40Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk · 2025-10-28T19:23:44Z

/ok to test

greptile-apps

Greptile Overview


Sequence Diagram

sequenceDiagram
    participant User
    participant find_nvidia_header_directory
    participant find_nvidia_headers.py
    participant supported_nvidia_headers.py
    participant load_nvidia_dynamic_lib
    participant supported_nvidia_libs.py
    participant Test Suite

    User->>find_nvidia_header_directory: "Request header for libname (e.g., 'cutensor')"
    find_nvidia_header_directory->>supported_nvidia_headers.py: "Check if libname in SUPPORTED_HEADERS_CTK"
    alt CTK Library
        supported_nvidia_headers.py-->>find_nvidia_header_directory: "Found in CTK"
        find_nvidia_header_directory->>find_nvidia_headers.py: "Call _find_ctk_header_directory()"
    else Non-CTK Library (e.g., cutensor)
        supported_nvidia_headers.py-->>find_nvidia_header_directory: "Check SUPPORTED_HEADERS_NON_CTK"
        find_nvidia_header_directory->>supported_nvidia_headers.py: "Get canonical header basename"
        supported_nvidia_headers.py-->>find_nvidia_header_directory: "Return 'cutensor.h'"
        find_nvidia_header_directory->>find_nvidia_headers.py: "Search site-packages"
        find_nvidia_headers.py->>find_nvidia_headers.py: "Check SUPPORTED_SITE_PACKAGE_HEADER_DIRS_NON_CTK"
        find_nvidia_headers.py->>find_nvidia_headers.py: "_find_based_on_conda_layout() for non-CTK"
        alt Windows Conda Anomaly (cccl)
            find_nvidia_headers.py->>find_nvidia_headers.py: "Handle targets/x64 path anomaly"
        end
        find_nvidia_headers.py->>find_nvidia_headers.py: "Check SUPPORTED_INSTALL_DIRS_NON_CTK"
    end
    find_nvidia_header_directory-->>User: "Return header directory path or None"

    User->>load_nvidia_dynamic_lib: "Load dynamic library (e.g., 'cutensor')"
    load_nvidia_dynamic_lib->>supported_nvidia_libs.py: "Query SUPPORTED_LINUX_SONAMES or SUPPORTED_WINDOWS_DLLS"
    alt CTK Library
        supported_nvidia_libs.py-->>load_nvidia_dynamic_lib: "Return from SUPPORTED_*_SONAMES_CTK"
    else Non-CTK Library (cutensor)
        supported_nvidia_libs.py-->>load_nvidia_dynamic_lib: "Return from SUPPORTED_*_SONAMES_OTHER"
        Note over supported_nvidia_libs.py: "Added: cutensor -> libcutensor.so.2 (Linux)<br/>cutensor -> cutensor.dll (Windows)"
    end
    load_nvidia_dynamic_lib->>supported_nvidia_libs.py: "Check SITE_PACKAGES_LIBDIRS_*"
    supported_nvidia_libs.py-->>load_nvidia_dynamic_lib: "Return site-packages paths"
    load_nvidia_dynamic_lib-->>User: "Return LoadedDL object or DynamicLibNotFoundError"

    User->>Test Suite: "Run test_find_nvidia_headers"
    Test Suite->>find_nvidia_header_directory: "Test all SUPPORTED_HEADERS_NON_CTK.keys()"
    Test Suite->>Test Suite: "Loop over cutensor, nvshmem, etc."
    Test Suite-->>User: "Report test results"

    User->>Test Suite: "Run test_load_nvidia_dynamic_lib"
    Test Suite->>load_nvidia_dynamic_lib: "Test all SUPPORTED_LINUX_SONAMES or SUPPORTED_WINDOWS_DLLS"
    Test Suite->>Test Suite: "Run in spawned child process for isolation"
    Test Suite-->>User: "Report test results with abs_path or 'Not found'"

14 files reviewed, 12 comments

cuda_pathfinder/tests/local_helpers.py

cuda_pathfinder/docs/source/release/1.3.2-notes.rst

toolshed/conda_create_for_pathfinder_testing.ps1

toolshed/conda_create_for_pathfinder_testing.sh

cuda_pathfinder/tests/test_find_nvidia_headers.py

cuda_pathfinder/cuda/pathfinder/_headers/find_nvidia_headers.py

rwgk · 2025-10-28T19:28:13Z

@ZzEeKkAa I believe this is ready for review. It'd be great if you could take a look.

The only thing left to do is go through the manual testing systematically.

rwgk · 2025-10-28T20:08:25Z

CI ran successfully at commit b079c84:

https://github.com/NVIDIA/cuda-python/actions/runs/18886735456/job/53903916554?pr=1194

greptile-apps

Greptile Overview

Greptile Summary

This incremental review covers only the changes made since the last review, not the entire PR. The developer has addressed previous feedback by applying the _abs_norm() wrapper to all remaining return paths in find_nvidia_header_directory() (lines 150, 153, 159) and fixed the PowerShell syntax error in the conda setup script. The _abs_norm() helper (lines 15-18) normalizes path separators and converts relative paths to absolute paths, ensuring consistent path format across all return points—critical for the new cuTENSOR/non-CTK library support that can be installed in diverse locations (site-packages, conda, standard directories). The PowerShell script now includes 'cutensor' in the package list for testing and corrects the missing comma after "libnvshmem-dev". These changes ensure that header discovery returns predictable, normalized paths regardless of installation method or platform, addressing Windows conda path anomalies mentioned in the PR description.

Important Files Changed

Filename	Score	Overview
cuda_pathfinder/cuda/pathfinder/_headers/find_nvidia_headers.py	5/5	Applied `_abs_norm()` wrapper to all non-CTK header return paths for consistent path normalization
toolshed/conda_create_for_pathfinder_testing.ps1	5/5	Added 'cutensor' package and added the missing comma in the package list

Confidence score: 5/5

This PR is safe to merge with minimal risk as the changes are targeted fixes addressing specific previous review feedback
Score reflects that all previously identified issues have been resolved: path normalization is now consistent across all return paths, and the PowerShell syntax error has been corrected
No files require special attention; both changes are straightforward defensive improvements that enhance cross-platform reliability
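For readers unfamiliar with the `_abs_norm()` helper mentioned in the summary above, here is a minimal sketch of what such a helper could look like. The body is an assumption based only on the summary's description (normalize separators, make paths absolute); the actual implementation in find_nvidia_headers.py is not shown in this conversation, and the name `abs_norm_sketch` is hypothetical.

```python
import os


def abs_norm_sketch(path):
    """Return an absolute, normalized version of path, or None if path is None.

    Assumed behavior, not the real _abs_norm(): the None pass-through is a guess
    that lets "not found" results flow through unchanged.
    """
    if path is None:
        return None
    # abspath() resolves relative paths against the current working directory;
    # normpath() collapses "..", "." and redundant separators.
    return os.path.normpath(os.path.abspath(path))


if __name__ == "__main__":
    print(abs_norm_sketch(os.path.join("include", "..", "include")))
```

Applying one helper at every return point means callers can compare or log results without caring which search branch produced them.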

2 files reviewed, no comments

leofang · 2025-10-28T21:53:32Z

cuda_pathfinder/docs/source/api.rst

   SUPPORTED_HEADERS_CTK
+   SUPPORTED_HEADERS_NON_CTK


I think we'll have to document the OS-dependent flavors. The other day I was looking at this doc
https://nvidia.github.io/cuda-python/cuda-pathfinder/latest/generated/cuda.pathfinder.SUPPORTED_NVIDIA_LIBNAMES.html#cuda.pathfinder.SUPPORTED_NVIDIA_LIBNAMES
and noticed that nccl is not on the list, but we clearly support nccl.

I created issue #1197 to track follow-on work.

This (SUPPORTED_HEADERS_NON_CTK) unfortunately needs to be decided now, no matter how hard we want to push out a release. Once this is documented, it becomes a public API, and there is no turning back without breaking the major version. Let's make sure we reach a conclusion before cutting a new (patch) release, if not in this PR.

cuda_pathfinder/cuda/pathfinder/__init__.py

leofang · 2025-10-28T22:02:43Z

cuda_pathfinder/cuda/pathfinder/_headers/find_nvidia_headers.py

+            # conda has this anomaly
+            cdir_ctk12 = os.path.join(idir, "targets", "x64")
+            cdir_ctk13 = os.path.join(cdir_ctk12, "cccl")
+            if _joined_isfile(cdir_ctk13, h_basename):
+                return cdir_ctk13
+            if _joined_isfile(cdir_ctk12, h_basename):
+                return cdir_ctk12


I don't understand this comment. The difference between CUDA 12 & 13 is universal (differ by a cccl subdir) and not limited to conda?

CUDA 12 Windows path is %CONDA_PREFIX%\Library\include\targets\x64\ (link)

CUDA 13 Windows path is %CONDA_PREFIX%\Library\include\targets\x64\cccl (link)

Going through systematically:

cu13 = 13.0.2
cu12 = 12.9.1

local-ctk cu13: INFO test_find_ctk_headers[cccl]: hdr_dir='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.0\\include\\cccl'

local-ctk cu12: INFO test_find_ctk_headers[cccl]: hdr_dir='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.9\\include'

conda-ctk cu13: INFO test_find_ctk_headers[cccl]: hdr_dir='C:\\Users\\rgrossekunst\\AppData\\Local\\miniforge3\\envs\\pathfinder_testing_cu13.0.2\\Library\\include\\targets\\x64\\cccl'

conda-ctk cu12: INFO test_find_ctk_headers[cccl]: hdr_dir='C:\\Users\\rgrossekunst\\AppData\\Local\\miniforge3\\envs\\pathfinder_testing_cu12.9.1\\Library\\include\\targets\\x64'

I moved the comment to the end of the line, to make it more clear that "anomaly" applies to the targets\x64 part (commit 4d5a41a).

So by "anomaly" you meant x64 shows up only in CCCL's path?

I had the whole thing in mind: targets\x64 (appears only specifically if conda && windows && cccl)
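To make the layouts listed above concrete, the following sketch probes the two conda-on-Windows candidates in the same order as the diff (the CUDA 13 `cccl` subdirectory first, then the CUDA 12 `targets\x64` directory). It is an illustration under assumptions: `find_cccl_dir_sketch` and the `marker.h` file name are hypothetical, and `_joined_isfile` is re-implemented here from its apparent meaning rather than copied from the PR.

```python
import os
import tempfile


def _joined_isfile(dirpath, basename):
    # Assumed meaning of the helper used in the diff: does dirpath/basename exist?
    return os.path.isfile(os.path.join(dirpath, basename))


def find_cccl_dir_sketch(include_dir, h_basename):
    """Return the conda-style cccl header dir under include_dir, or None."""
    cdir_ctk12 = os.path.join(include_dir, "targets", "x64")
    cdir_ctk13 = os.path.join(cdir_ctk12, "cccl")
    for cdir in (cdir_ctk13, cdir_ctk12):  # most specific candidate first
        if _joined_isfile(cdir, h_basename):
            return cdir
    return None  # caller falls through to the standard CTK locations


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as include_dir:
        cu12_dir = os.path.join(include_dir, "targets", "x64")
        os.makedirs(cu12_dir)
        open(os.path.join(cu12_dir, "marker.h"), "w").close()
        print(find_cccl_dir_sketch(include_dir, "marker.h"))
```

Returning None instead of raising keeps the fall-through behavior that a standard (non-conda) CTK installation relies on.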

Thanks for spotting it. I now recall we documented this
https://github.com/conda-forge/cccl-feedstock/blob/1fc3562e90c7d81203e62826779c8d45fe66191b/recipe/recipe.yaml#L1-L4
However, I don't think it is Windows only. This is the path on Linux (as documented):

https://github.com/conda-forge/cuda-cccl-feedstock/blob/19281e6b2bbe5b2a2332795e7bac85e02e1421ca/recipe/build.sh#L20

https://github.com/conda-forge/cuda-cccl-feedstock/blob/19281e6b2bbe5b2a2332795e7bac85e02e1421ca/recipe/meta.yaml#L91

Could you check again?

leofang · 2025-10-28T22:14:50Z

cuda_pathfinder/cuda/pathfinder/_headers/supported_nvidia_headers.py

+}
+SUPPORTED_HEADERS_NON_CTK_LINUX = SUPPORTED_HEADERS_NON_CTK_COMMON | SUPPORTED_HEADERS_NON_CTK_LINUX_ONLY
+SUPPORTED_HEADERS_NON_CTK_WINDOWS = SUPPORTED_HEADERS_NON_CTK_COMMON
+SUPPORTED_HEADERS_NON_CTK_ALL = SUPPORTED_HEADERS_NON_CTK_COMMON | SUPPORTED_HEADERS_NON_CTK_LINUX_ONLY


Q: What's the difference between SUPPORTED_HEADERS_NON_CTK_ALL and SUPPORTED_HEADERS_NON_CTK_LINUX? They look the same to me.

In supported_nvidia_libs.py there is also SUPPORTED_..._WINDOWS_ONLY. It doesn't exist here, but I wanted to follow the more general form.

We can maybe look at this some more under issue #1197, although I think it's a useful pattern.

I am not sure I follow. Shouldn't it be

Suggested change

-SUPPORTED_HEADERS_NON_CTK_ALL = SUPPORTED_HEADERS_NON_CTK_COMMON | SUPPORTED_HEADERS_NON_CTK_LINUX_ONLY
+SUPPORTED_HEADERS_NON_CTK_ALL = SUPPORTED_HEADERS_NON_CTK_COMMON | SUPPORTED_HEADERS_NON_CTK_LINUX_ONLY | SUPPORTED_HEADERS_NON_CTK_WINDOWS_ONLY

?

Maybe keep an empty dict here so that we express the intent while leaving room for future extension?
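A minimal sketch of the pattern under discussion, with an empty WINDOWS_ONLY dict as a placeholder. The dict entries are illustrative assumptions (only the variable names come from this PR), and the `|` dict union requires Python 3.9 or newer.

```python
# Illustrative entries only; the real tables live in supported_nvidia_headers.py.
SUPPORTED_HEADERS_NON_CTK_COMMON = {"cutensor": "cutensor.h"}
SUPPORTED_HEADERS_NON_CTK_LINUX_ONLY = {"nvshmem": "nvshmem.h"}  # assumed entry
SUPPORTED_HEADERS_NON_CTK_WINDOWS_ONLY = {}  # empty for now, expresses intent

SUPPORTED_HEADERS_NON_CTK_LINUX = (
    SUPPORTED_HEADERS_NON_CTK_COMMON | SUPPORTED_HEADERS_NON_CTK_LINUX_ONLY
)
SUPPORTED_HEADERS_NON_CTK_WINDOWS = (
    SUPPORTED_HEADERS_NON_CTK_COMMON | SUPPORTED_HEADERS_NON_CTK_WINDOWS_ONLY
)
SUPPORTED_HEADERS_NON_CTK_ALL = (
    SUPPORTED_HEADERS_NON_CTK_COMMON
    | SUPPORTED_HEADERS_NON_CTK_LINUX_ONLY
    | SUPPORTED_HEADERS_NON_CTK_WINDOWS_ONLY
)

if __name__ == "__main__":
    print(sorted(SUPPORTED_HEADERS_NON_CTK_ALL))
```

With the placeholder in place, ALL and LINUX are equal today but diverge automatically as soon as a Windows-only header is added.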

rwgk · 2025-10-29T00:48:08Z

/ok to test

cuda_pathfinder/tests/test_find_nvidia_headers.py

cuda_pathfinder/tests/test_load_nvidia_dynamic_lib.py

rwgk added 10 commits October 26, 2025 22:49

Add support for cutensor. Still works for existing libnames, cutensor testing is INCOMPLETE.

2d1d8fe

Generalize tests/test_find_nvidia_headers.py to also cover cutensor

986a83f

test_find_nvidia_headers.py conda testing and fix

0583393

test_load_nvidia_dynamic_lib.py fix conda testing

1d1f534

Add conda_create_for_pathfinder_testing.ps1

e0a4ca6

Bug fix: SITE_PACKAGES_LIBDIRS_WINDOWS_OTHER cutensor, cutensorMg paths

25dd364

Add cudss paths to SITE_PACKAGES_LIBDIRS_WINDOWS_OTHER

cdab969

Add SUPPORTED_HEADERS_NON_CTK_ALL to fix Windows site-packages tests

6d490f5

Bug fix (existing code): conda cccl header directory

28e7206

test_find_nvidia_headers.py: refer to toolshed/conda_create_for_pathfinder_testing.*

d36a62d

rwgk added 3 commits October 28, 2025 09:09

nvidia-libmathdx-... only exists for cu12: tolerate abs_path=None in test_load_nvidia_dynamic_lib.py

0734ac7

find_nvidia_headers.py cccl IS_WINDOWS: fall-through after checking for conda anomaly, to restore proper functioning for standard CTK installations

3b156b7

Add cublasmp DIRECT_DEPENDENCIES (closes NVIDIA#1116)

7aa6679

rwgk added 4 commits October 28, 2025 10:57

Add SUPPORTED_HEADERS_NON_CTK to cuda_pathfinder/docs/source/api.rst

9bfef5f

Add 1.3.2-notes.rst

3a44b6a

Add nvidia-cufftmp-cu13 data in supported_nvidia_libs.py

0d71a85

Manually tested with: pip install nvidia-cufftmp-cu13==12.1.3.1 (that wheel was yanked, therefore it is not added to pyproject.toml)

Merge branch 'main' into cutensor_support

b079c84

rwgk marked this pull request as ready for review October 28, 2025 19:23

greptile-apps bot reviewed Oct 28, 2025


rwgk added 2 commits October 28, 2025 12:56

Add missing comma in toolshed/conda_create_for_pathfinder_testing.ps1

90ed27a

Systematically add _abs_norm() in find_nvidia_header_directory()

9671b24

greptile-apps bot reviewed Oct 28, 2025


leofang assigned leofang and rwgk and unassigned leofang Oct 28, 2025

leofang added the enhancement, P0, and cuda.pathfinder labels Oct 28, 2025

leofang reviewed Oct 28, 2025

cuda_pathfinder/cuda/pathfinder/__init__.py (resolved)

leofang reviewed Oct 28, 2025


rwgk force-pushed the cutensor_support branch from ab4c8f3 to 9671b24 Compare October 28, 2025 22:13

leofang reviewed Oct 28, 2025


rwgk added 2 commits October 28, 2025 17:29

Move "conda has this anomaly" comment to the end of the line

4d5a41a

Merge branch 'main' into cutensor_support

75d8874

rwgk mentioned this pull request Oct 29, 2025

[ENH]: Clean up SUPPORTED_... variables between load libs and find header dirs #1197

Open

leofang reviewed Oct 29, 2025

cuda_pathfinder/tests/test_find_nvidia_headers.py (resolved)

leofang reviewed Oct 29, 2025

cuda_pathfinder/tests/test_load_nvidia_dynamic_lib.py (resolved)
