Skip to content

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Aug 19, 2025

Closes #776, #823, #856

Bump cuda-pathfinder version to 1.1.1a3

Adds support for non-CTK Nvidia libraries, as needed for nvmath:

CUDA-related:

Non-CUDA:

  • nvpl_fftw

A general approach for finding .so/.dll files under site-packages was implemented in these files:

  • cuda/pathfinder/_utils/find_site_packages_dll.py
  • cuda/pathfinder/_utils/find_site_packages_so.py

However, this turned out to be relatively slow (~1.9 seconds in a certain environment with a network drive), even though only RECORD files are inspected, via the stdlib importlib.metadata API. (An initial version that simply walked all site-packages directories was prohibitively slow, ~100 seconds in the same environment.)

To maximize performance, and to make supporting more libraries more intutive at the same time, the approach to searching site-packages was changed to rely on new dictionaries added in supported_nvidia_libs.py:

  • SITE_PACKAGES_LIBDIRS_LINUX
  • SITE_PACKAGES_LIBDIRS_WINDOWS

These dictionaries are based on the output from two pairs of helper scripts added under the toolshed/ directory:

  • Linux: collect_site_packages_so_files.sh, make_site_packages_libdirs_linux.py
  • Windows: collect_site_packages_dll_files.ps1, make_site_packages_libdirs_windows.py

With this, the same test that took 1.9 seconds to complete with the slower importlib.metadata-based approach completes in ~0.2 seconds (similar to status quo). — The slower approach is now used only from test_load_nvidia_dynamic_lib.py, as an easy way to determine dynamically which wheels are available on each platform.

A simple LIBNAMES_REQUIRING_RTLD_DEEPBIND feature was added to support cufftMp.

Piggy-backed: tests/child_load_nvidia_dynamic_lib_helper.py was factored out of tests/test_load_nvidia_dynamic_lib.py to improve test performance, especially on Windows (factor 2x).

Copy link
Contributor

copy-pr-bot bot commented Aug 19, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk rwgk self-assigned this Aug 19, 2025
@rwgk rwgk added the cuda.pathfinder Everything related to the cuda.pathfinder module label Aug 19, 2025
@rwgk rwgk added this to the pathfinder-nvmath-support milestone Aug 19, 2025
@rwgk
Copy link
Collaborator Author

rwgk commented Aug 19, 2025

/ok to test

This comment has been minimized.

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 20, 2025

/ok to test

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 20, 2025

/ok to test

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 20, 2025

/ok to test

@rwgk rwgk requested a review from kkraus14 August 22, 2025 02:27
@rwgk rwgk changed the title Find all dynamic libs under all site packages Support non-CTK Nvidia libraries, add general fallback for unsupported libs under site-packages Aug 22, 2025
Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a quick pass. Will resume tomorrow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the two make_site_packages_libdirs_*.py files can be merged into one. You already handled the directory separators (/ vs \ or \\) properly, and always use the former in supported_nvidia_libs.py.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Could you please merge the newly added toolshed/make_site_packages_libdirs_*.py scripts into one?

(These scripts were both auto-generated by ChatGPT (with a very minimal effort), I decided it's not worth a manual effort consolidating them.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copilot doesn't listen here. I tried via the Spaces feature, but that's only GPT 4.1, and it prompted me to attach files manually. So I went back to ChatGPT 5 Pro, the thinking version, that I mostly use.

I'm attaching what it generated. I'll have time to try it out only later. The line counts:

      76 make_site_packages_libdirs_linux.py
     110 make_site_packages_libdirs_windows.py
     155 make_site_packages_libdirs.py

Do you think it's worth more effort? The original scripts were generated in ~5 minutes each, including writing the prompts.

Throwing them away seems wrong.

Working on the more, I'm not sure.

make_site_packages_libdirs.py.txt

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've manually consolidated the two script now: commit 69d7c98

Even ChatGPT 5 Pro didn't get it right the first time through, two iterations later it was a pretty big mess, then I just did it manually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to make copilot as an assignee before we can "at" it, but we probably can't do so in an existing PR, only before a PR is created.

@github-project-automation github-project-automation bot moved this from Todo to In Progress in CCCL Aug 25, 2025
@rwgk
Copy link
Collaborator Author

rwgk commented Aug 27, 2025

As discussed in our meeting today: I removed the fallback code paths in find_nvidia_dynamic_lib.py

commit 5711602

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 27, 2025

/ok to test

@rwgk rwgk force-pushed the find_all_dynamic_libs_under_all_site_packages branch from 901dc47 to 69d7c98 Compare August 27, 2025 21:36
@rwgk
Copy link
Collaborator Author

rwgk commented Aug 27, 2025

@leofang I'm holding off rerunning the tests, pending your full review. (I think the delta between the last time the full tests ran is small.)

Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks a lot for the hard and thorough work Ralf!


LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY = (
"cufft",
"nvrtc",
)

LIBNAMES_REQUIRING_RTLD_DEEPBIND = ("cufftMp",)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: is deepbind needed so as to avoid symbol collision with libcufft?

Copy link
Member

@leofang leofang Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but would be nice to revisit after the dust is settled.

I believe this is another file that can be merged with find_site_packages_so.py. If the names dll vs so are bothering you, just refer to them as, say, dso (dynamic shared libraries).

Even on Windows, we also have the same version "suffix" concept despite they are not literally the suffices. Taking cuBLASLt as example: cublasLt64_12.dll is the Windows-equivalent of libcublasLt.so.12 on Linux, and 12 is the "suffix" . So we could have a split_dso_version_suffix that handles both platforms, and then find_all_dso_files_via_metadata is unified.

The same reasoning that find_all_so_files_via_metadata returns a dict of dicts, strictly speaking, also applies to Windows, so a later unification of these two files can also make our treatment more robust.

"nvfatbin": ("nvidia/cu13/lib", "nvidia/nvfatbin/lib"),
"nvjpeg": ("nvidia/cu13/lib", "nvidia/nvjpeg/lib"),
"nvrtc": ("nvidia/cu13/lib", "nvidia/cuda_nvrtc/lib"),
"nvvm": ("nvidia/cu13/lib", "nvidia/cuda_nvcc/nvvm/lib64"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh man, so in CUDA 13 the nvvm subdir still exists, but only libdevice is still kept there while everything else is moved to bin/include/lib... 😞

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is annoying... so the wheel layout for NVVM is changed, but not the system CTK which still has $CUDA_PATH/nvvm...

$ tree /usr/local/cuda-13.0/nvvm/
/usr/local/cuda-13.0/nvvm/
├── bin
│   └── cicc
├── include
│   └── nvvm.h
├── lib64
│   ├── libnvvm.so -> libnvvm.so.4
│   ├── libnvvm.so.4 -> libnvvm.so.4.0.0
│   └── libnvvm.so.4.0.0
└── libdevice
    └── libdevice.10.bc

4 directories, 6 files

@github-project-automation github-project-automation bot moved this from In Progress to In Review in CCCL Aug 28, 2025
@leofang
Copy link
Member

leofang commented Aug 28, 2025

/ok to test 69d7c98

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 28, 2025

Thanks Leo!

@rwgk rwgk merged commit 1137e15 into NVIDIA:main Aug 28, 2025
50 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Aug 28, 2025
@rwgk rwgk deleted the find_all_dynamic_libs_under_all_site_packages branch August 28, 2025 01:53
@leofang leofang linked an issue Aug 28, 2025 that may be closed by this pull request
1 task
Copy link

Doc Preview CI
Preview removed because the pull request was closed or merged.

@rwgk rwgk changed the title Support non-CTK Nvidia libraries, add general fallback for unsupported libs under site-packages Support non-CTK Nvidia libraries Aug 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda.pathfinder Everything related to the cuda.pathfinder module
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA]: pathfinder support for cufftMp wheel [FEA]: Support for finding libmathdx through pathfinder
4 participants