Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Runtime Error
Component
cuda.pathfinder
Describe the bug
Forwarding from cupy/cupy#9803. If everything else fails, e.g. for header discovery, cuda-pathfinder falls back to using:
- _resolve_system_loaded_abs_path_in_subprocess
- run_in_spawned_child_process
However, if this path is reached from inside a script (i.e. one without an if __name__ == "__main__" guard), the process spawn is problematic, because it causes the script itself to be executed again while the child is bootstrapping.
That is, there are (apparently) two possible outcomes here:
- Python raises an error pointing out that if __name__ == "__main__" seems to be missing (I got this when calling _resolve_system_loaded_abs_path_in_subprocess() directly).
- In the OP, the spawn just triggered a second run of the script, with the first run hitting the 10 second time-out if the script runs for more than 10 seconds.
(I am not sure why the nested call succeeds spawning, but I doubt it matters.)
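The first outcome can be reproduced without cuda-pathfinder at all. Below is a minimal sketch using only the standard library: it writes an unguarded script to a temporary file and runs it in a fresh interpreter. The names UNGUARDED_SCRIPT and run_unguarded_script are made up for this illustration and are not part of cuda-pathfinder.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# A script with no `if __name__ == "__main__":` guard. Under the "spawn"
# start method the child re-imports this file as its main module and reaches
# p.start() again while it is still bootstrapping.
UNGUARDED_SCRIPT = textwrap.dedent('''
    import multiprocessing as mp

    def child():
        print("child ran")

    p = mp.get_context("spawn").Process(target=child)
    p.start()
    p.join()
''')


def run_unguarded_script() -> subprocess.CompletedProcess:
    """Run the unguarded script in a fresh interpreter and capture its output."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "unguarded.py"
        script.write_text(UNGUARDED_SCRIPT)
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=60,
        )


if __name__ == "__main__":
    result = run_unguarded_script()
    # The spawned child fails with a RuntimeError that mentions the
    # missing main-module guard; the target function never runs.
    print(result.stderr)
```

In this sketch the child process dies before the target function is called, so the parent only sees a failed child. A library that waits for a result from that child would then run into its own time-out.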
Not sure if this is a priority, because I am not sure how likely it is for cuda-pathfinder to reach this path. The work-around of setting CUDA_PATH= is also straight-forward. But it limits the reliability of this fallback.
How to Reproduce
I am not sure what needs to be unusual about the environment for cuda-pathfinder to go to such lengths to find the right paths.
However, calling:
cuda.pathfinder._headers.find_nvidia_headers._find_ctk_header_directory_via_canary("nvrtc", "nvrtc.h")
in a Python script reproduces the first error for me (on an older cuda-pathfinder version; I think main requires a different signature there).
Expected behavior
Maybe rather than spawning, this needs to use a subprocess to avoid issues with scripts?
I.e. execute something like:
subprocess.check_output([sys.executable, "-c", f"from cuda.pathfinder import ...; probe_canary_abs_path_and_print_json('{libname}')"])
or maybe nicer and safer, via:
[sys.executable, "-m", cuda.pathfinder._something, libname]
(I am not sure if there are subtleties around subprocess, but I guess spawn must be similar already?).
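A rough sketch of the subprocess idea, to show the shape rather than a finished implementation. PROBE_CODE and probe_in_subprocess are hypothetical names, and the probe body is only a placeholder that echoes its argument: the real version would import and call the actual cuda.pathfinder probe function, which is not filled in here. The library name is passed via sys.argv instead of being interpolated into the code string, which avoids quoting problems. The 10 second default mirrors the time-out mentioned above.

```python
import json
import subprocess
import sys

# Placeholder probe (an assumption for illustration): the real code would
# call the cuda.pathfinder probe here and print its result as JSON.
PROBE_CODE = """
import json, sys
libname = sys.argv[1]
print(json.dumps({"libname": libname, "abs_path": None}))
"""


def probe_in_subprocess(libname: str, timeout: float = 10.0) -> dict:
    """Run the probe in a fresh interpreter and parse its JSON output.

    Using `sys.executable -c` starts a new interpreter that never imports
    the caller's main script, so no `if __name__ == "__main__":` guard is
    needed in user code.
    """
    out = subprocess.check_output(
        [sys.executable, "-c", PROBE_CODE, libname],
        timeout=timeout,
        text=True,
    )
    return json.loads(out)


if __name__ == "__main__":
    print(probe_in_subprocess("nvrtc"))
```

The -m variant would look the same, with the probe living in an importable module instead of a code string.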
Operating System
No response
nvidia-smi output
No response