[BUG]: run_in_spawned_child_process fails if used as script #1767

@seberg

Type of Bug

Runtime Error

Component

cuda.pathfinder

Describe the bug

Forwarding from cupy/cupy#9803. If everything else fails, e.g. during header discovery, cuda-pathfinder falls back to using:

  • _resolve_system_loaded_abs_path_in_subprocess
  • and: run_in_spawned_child_process

However, if this is used inside a script (i.e. one missing the if __name__ == "__main__" guard), spawning the process is problematic, because the script itself gets executed again while the child bootstraps.
There are (apparently) two possible outcomes here:

  • Python raises an error pointing out that if __name__ == "__main__" seems to be missing (I got this when calling _resolve_system_loaded_abs_path_in_subprocess() directly).
  • In the OP, spawning the process just triggered a second run of the script, and the first run hits the 10-second timeout if the script runs for more than 10 seconds.
    (I am not sure why the nested call succeeds in spawning, but I doubt it matters.)
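
For reference, the first outcome can be reproduced without cuda-pathfinder at all. The following is a minimal sketch using only the standard library: it writes a small unguarded script to a temporary file and runs it. The script contents and the helper name run_unguarded_script are made up for illustration and are not part of cuda-pathfinder.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Illustrative script with no `if __name__ == "__main__":` guard.
# With the "spawn" start method, the child re-runs this file while
# bootstrapping and reaches proc.start() again.
UNGUARDED_SCRIPT = textwrap.dedent(
    """
    import multiprocessing

    ctx = multiprocessing.get_context("spawn")
    proc = ctx.Process(target=print, args=("hello from child",))
    proc.start()
    proc.join()
    """
)


def run_unguarded_script() -> subprocess.CompletedProcess:
    """Run the unguarded script as a real script file and capture its output."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "unguarded.py"
        script.write_text(UNGUARDED_SCRIPT)
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=60,
        )


if __name__ == "__main__":
    result = run_unguarded_script()
    # The spawned child's traceback (a RuntimeError about bootstrapping)
    # ends up on stderr.
    print(result.stderr)
```

The RuntimeError comes from the spawned child, not from the process that ran the script first, which is why the failure can be easy to miss.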

Not sure if this is a priority, because I am not sure how likely it is for cuda-pathfinder to reach this path. The workaround of setting CUDA_PATH= is also straightforward. But the problem does limit the reliability of this fallback.

How to Reproduce

I am not sure what needs to be unusual about the environment for cuda-pathfinder to go to such lengths to find the right paths.
However, calling:

cuda.pathfinder._headers.find_nvidia_headers._find_ctk_header_directory_via_canary("nvrtc", "nvrtc.h")

in a Python script reproduces the first error for me (on an older cuda-pathfinder version; I think main requires a different signature there).

Expected behavior

Maybe rather than spawning, this needs to use a subprocess to avoid issues with scripts?
I.e. execute something like:

subprocess.check_output([sys.executable, "-c", f"from cuda.pathfinder import ...; probe_canary_abs_path_and_print_json('{libname}')"])

or maybe nicer and safer, via:

[sys.executable, "-m", "cuda.pathfinder._something", libname]

(I am not sure if there are subtleties around subprocess, but I guess spawn must already be similar?)
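
To make the idea concrete, here is a rough sketch of the subprocess-based approach, assuming the child reports its result as JSON on stdout. CHILD_CODE and probe_in_subprocess are hypothetical stand-ins, not existing cuda-pathfinder APIs; the child here only echoes the library name instead of doing a real probe. The library name is passed via argv rather than interpolated into the code string, to avoid quoting problems.

```python
import json
import subprocess
import sys

# Hypothetical child program: a real probe would load the library and
# report its absolute path. Here it only echoes the name back as JSON.
CHILD_CODE = """
import json, sys
libname = sys.argv[1]
print(json.dumps({"libname": libname, "abs_path": None}))
"""


def probe_in_subprocess(libname: str, timeout: float = 10.0) -> dict:
    """Run the probe in a fresh interpreter and parse its JSON output."""
    out = subprocess.check_output(
        [sys.executable, "-c", CHILD_CODE, libname],
        text=True,
        timeout=timeout,
    )
    return json.loads(out)


if __name__ == "__main__":
    print(probe_in_subprocess("nvrtc"))
```

Because the child is started with -c (or -m), it never re-imports the caller's __main__ module, so this should work the same whether or not the calling script has an if __name__ == "__main__" guard.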

Operating System

No response

nvidia-smi output

No response
