Skip to content

Conversation

brandon-b-miller
Copy link
Contributor

WIP, just sketching things out currently

Closes #759

Copy link
Contributor

copy-pr-bot bot commented Sep 29, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

return LoadedDL(abs_path, False, handle._handle, Distribution("system", None))
return None

def find_with_system_search_linux(libname: str) -> Optional[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things stand out here:

  • This function is an almost exact copy of load_with_system_search() and could be implemented as return load_with_system_search(libname).abs_path

  • But I wouldn't want to give this function a name that starts with "find" because it is actually loading the library, which is a side-effect with very strong consequences. — See also Add find_nvidia_dynamic_library to find the DSO location without loading #757

To maintain that the code reflects actual (inconvenient) realities, it'd be great to avoid adding the find_with_system_search_* functions and try_with_system_search() in find_nvidia_dynamic_lib.py‎, even if that means extra wrinkles elsewhere.

Copy link
Contributor Author

@brandon-b-miller brandon-b-miller Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we switch to a version of find_with_system_search that doesn't load, but does some kind of existence check on the file? That would allow us to have a really clean separation between "finding" and "loading"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not, beyond the subprocess idea that's tracked already under #757.

To double-check, I just asked ChatGPT 5 Pro (which didn't exist last time I was working on it):

This was my prompt:


I believe I asked before, but could you please give this another very thorough look?

Platform: Linux only (please ignore any other platforms)

Python versions: 3.13 (please ignore any older Python versions)

Consider this example code:

import ctypes
import os


soname = "libcublas.so.13"
cdll_mode = os.RTLD_NOW | os.RTLD_GLOBAL
try:
    ctypes.CDLL(soname, cdll_mode)
except OSError:
    print(f"Failure loading {soname}")
else:
    print(f"Success loading {soname}")

On my workstation it prints Success loading libcublas.so.13

My question: Is there any way to implement

  • find the dynamic library (libcublas.so.13 in the example) in exactly the same way,

  • but do not actually load it,

  • WITHOUT reverse engineering the search for the file to load?


It reasoned for 15 minutes!

For the full answer, please see https://chatgpt.com/share/68dc09d0-c928-8008-9a55-a5aede0dbbf8

Bottom line

  • In‑process, no‑load, exact resolution: there isn’t a public API.

  • Practical solution that’s exact but safe: spawn a one‑shot helper ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To track some pointers here:

When asking ChatGPT about a brute-force idea, surprisingly suggested a new approach:

The interesting part from there:


A practical, non‑subprocess alternative (doesn’t load the target)

If you’re willing to slightly relax “no reverse‑engineering,” there is a clean way that doesn’t load the library and doesn’t guess the search order:

Ask the dynamic linker for the exact directory search list it would use for a dlopen() from your process, then walk that list in order to find the first directory containing libcublas.so.13.

On glibc, this is what dlinfo(..., RTLD_DI_SERINFO, ...) is for: it returns the actual run‑time search path the linker will follow (including LD_LIBRARY_PATH, RUNPATH/RPATH, $ORIGIN expansions, and system directories configured via ld.so.conf). You then check those directories in order for the file name. That way, you’re not re‑implementing the rules—you’re reading the linker’s own computed search list—and you never map the target library.


I then asked it to generate the code, which it did in a couple minutes:

But running that code, it turns out:

RTLD_DI_SERINFO returns the directory walk order the dynamic linker would use when it actually needs to scan directories (e.g., from LD_LIBRARY_PATH or a DT_RPATH/DT_RUNPATH on the handle you pass). It does not dump everything from /etc/ld.so.conf.d/*. For most system/extra dirs (like your CUDA entry), glibc doesn’t “search” them at runtime; it consults the binary cache /etc/ld.so.cache to map the SONAME → full path directly. Because there’s no directory walk involved, those directories don’t appear in the SERINFO list.

The only non-reverse-engineering way to get at the binary cache is to run ldconfig -p. So we need a subprocess and string parsing — not great IMO.

I then asked it about the equivalent on Windows, for which it also quickly generated code, but there are also caveats.

I decided to stop here:

  • Implementing find_nvidia_dynamic_lib() that's mostly equivalent to load_nvidia_dynamic_lib() seems feasible now, but it's far from straightforward.

  • I'm really not sure if the value of having that independent find function justifies dealing with the limitations and extra complications.

return dl

def load_with_site_packages(self) -> Optional[LoadedDL]:
return self._load_simple(self.finder.try_site_packages, "site-packages")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this changes the behavior in a subtle but deep way: loading of dependencies should happen in all cases. The interactions are tricky, but that's just a consequence of the realities. The existing order

  1. find specific lib
  2. load_dependencies unconditionally (no matter what was found!)
  3. load specific lib

is very intentional. See the existing comments in existing lines 39-39.

I guess we could change the order to

  1. load_dependencies unconditionally
  2. find specific lib
  3. load specific lib

but I didn't do that because there could be exceptions in the "find specific lib" step, and we'd have the side-effects from loading the dependencies already. It's not clear-cut if that's better or worse; I decided side-effects later is probably better.

I'm thinking it'd be better to be more conservative with the code organization when adding in the Distribution code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. lets unpack the four ways of loading things back out into their own methods then. Updates to follow.

return self._load_simple(self.finder.try_with_conda_prefix, "conda")

def load_with_system_search(self) -> Optional[LoadedDL]:
return self._load_with_dependencies(self.finder.try_with_system_search, "system")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, if we say "system" here, it's not actually so much about "Distribution" but our chosen approach to searching.

Two ideas to explain what I have in mind, for discussion:

  • Instead of Distribution we could call the new member FoundVia. Technically that's more to the point.

  • Or we could analyze what we found and say aha, that's in a CTK package installed from a runfile, or apt, or ... distribution.

The aha part is way more tricky. Is there any value to it, for the purposes of numba-cuda?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's enough for numba-cuda to report the source and the path. Right now we also report the name of the package the library was found from (nvcc in this case) but if the path is reported I think it's less critical to have that as well.

Finding nvvm from NVIDIA NVCC Wheel                                                                                                                                                                                                                                                                                        
        Located at /raid/mambaforge/envs/numba-cuda/lib/python3.12/site-packages/nvidia/cuda_nvcc/nvvm/lib64/libnvvm.so  

@leofang leofang added P1 Medium priority - Should do feature New feature or request cuda.pathfinder Everything related to the cuda.pathfinder module labels Sep 30, 2025
@leofang leofang added this to the cuda-pathfinder parking lot milestone Sep 30, 2025
self.error_messages: list[str] = []
self.attachments: list[str] = []
self.abs_path: Optional[str] = None
self.distribution: Optional[str] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not used anymore?

@dataclass
class FoundVia:
name: str
version: Optional[str] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently not used: I'd lean towards not adding that in this PR, but only when we have an actual use case, which might give us ideas for a different name (or names).

abs_path: Optional[str]
was_already_loaded_from_elsewhere: bool
_handle_uint: int # Platform-agnostic unsigned pointer value
foundvia: Optional[FoundVia] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found_via highly preferred (reads better, and seems to be most common when mapping from camel case to snake case).

finder.raise_not_found_error()

return load_with_abs_path(libname, abs_path)
def _load_lib_no_cache(libname: str) -> LoadedDL:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to move this below the class _LoadNvidiaDynamicLib code (it's more in line with how I organized the rest of the pathfinder code).

Also, this could be a one-liner:

    return _LoadNvidiaDynamicLib(libname).load_lib()

Comment on lines 154 to 157
loaded = load_with_system_search(self.libname)
if loaded is not None:
return loaded
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        return load_with_system_search(self.libname)

self.finder = _FindNvidiaDynamicLib(libname)
self.libname = self.finder.libname

def _maybe_return_loaded(self, check_if_loaded_from_elsewhere: bool = True) -> Optional[LoadedDL]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should run only once. Currently it's run up to four times.

check_if_loaded_from_elsewherehave_abs_path

have_abs_path should depend only on the outcome of _FindNvidiaDynamicLib(libname)

I realize this is all a bit twisty, but it's a consequence of all the exceedingly twisty logic that is provided by the operating systems.

I spent another hour or so today looking into find_nvidia_dynamic_lib() as the primitive, which would make most of the logic here obsolete, but — again unfortunately — it turned out to be gnarly. Let's discuss tomorrow.

Comment on lines 134 to 135
if loaded := self._maybe_return_loaded(True):
return loaded
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove in this group of four methods and call once at the top of load_lib().

Comment on lines 127 to 131
load_dependencies(self.libname, load_nvidia_dynamic_lib)

if loaded is not None:
return loaded
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        return load_dependencies(self.libname, load_nvidia_dynamic_lib)

seems equivalent.

return dl
if dl := self.load_with_cuda_home():
dl.foundvia = FoundVia("CUDA_HOME")
return dl
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the unifying group of four methods, you could use a loop like this:

        if loaded := self._maybe_return_loaded(self.finder.abs_path is not None):
            return loaded
        for load_meth, found_via in (
            (self.load_with_site_packages, "site-packages"),
            (self.load_with_conda_prefix, "conda"),
            (self.load_with_system_search, "system-search"),
            (self.load_with_cuda_home, "CUDA_HOME"),
        ):
            if loaded := load_meth():
                loaded.foundvia = FoundVia(found_via)
                return loaded


from typing import Optional

def _load_lib_no_cache(libname: str) -> LoadedDL:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it'll be simpler (less code) to wrap this function in a helper that adds found_via into LoadedDL.

def _load_lib_no_cache_impl(libname: str) -> LoadedDL, str
    existing code, but we also keep track of `found_via` and return (loaded, found_via)
def _load_lib_no_cache(libname: str) -> LoadedDL:
    loaded, found_via = _load_lib_no_cache_impl(libname)
    loaded.found_via = FoundVia(found_via)
    return loaded

Alternatively, we could pass found_via down along with abs_path, so that the LoadedDL instances can be constructed with it, rather than retrofitting it in later. That's a bit more surgical, not sure how it'll shake out, but it would seem cleanest.

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great direction and just needs some discussion of details and minimal testing (meeting briefly might be most efficient).

Thoughts:

  • LoadedDL.found_via could just be a string, I'd try if we can achieve "not optional"?
  • One additional found_via value would be "was_already_loaded_from_elsewhere", which means the existing bool is basically redundant, but we need to keep it to avoid an API breakage.
  • Minor but nice I think: found_via = "system-search" instead of "system"
  • The version we can handle separately later. Coincidentally I did a little bit of related work under PR #1085 (_get_cuda_version_from_cuda_h). However, in general this will be quite tricky, e.g. conda and site-packages could have libs and/or headers from more than one CTK version. Maybe the best way to handle this will actually be in the context of a "scoped search" as outlined under #1038.

@brandon-b-miller brandon-b-miller marked this pull request as ready for review October 8, 2025 18:53
Copy link
Contributor

copy-pr-bot bot commented Oct 8, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda.pathfinder Everything related to the cuda.pathfinder module feature New feature or request P1 Medium priority - Should do
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA]: Support LoadedDL.from_distribution
3 participants