
fix(pathfinder): keep canary probes script-safe #1768

Merged
cpcloud merged 4 commits into NVIDIA:main from cpcloud:issue-1767-script-safe-canary-probe on Mar 16, 2026

Conversation

cpcloud (Contributor) commented Mar 16, 2026

Closes #1767.

Summary

  • Run the CTK canary probe via python -m cuda.pathfinder._dynamic_libs.canary_probe_subprocess instead of multiprocessing spawn, and add a regression test showing a plain caller script is executed only once.
  • Preserve the original isolation requirement behind PR #1595 (feat(pathfinder): add CTK root canary probe for non-standard-path libs): the canary still runs in a fresh Python interpreter with independent loader state, but it no longer bootstraps through the caller's __main__, so script entrypoints without an if __name__ == "__main__" guard do not recurse or fail during multiprocessing startup.
  • Anchor the canary subprocess to the parent package's import root so the child resolves the same cuda.pathfinder distribution as the parent. This avoids wheel-based CI importing the checkout tree copy, which can be missing generated files such as cuda.pathfinder._version, and keeps source-vs-wheel behavior consistent.
  • Update the header-discovery/release-note wording so the canary path is described as a dedicated subprocess while leaving the shared spawned_process_runner in place for the callable-based isolation paths and real-loading tests that still need it.
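The "executed only once" property from the first bullet can be sketched as a small check. This is an illustration rather than the regression test added in this PR: the `caller.py` script, the marker file, and the use of the stdlib `platform` module as a stand-in for the canary probe module are all assumptions made for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical caller script: it deliberately has no main guard, records each
# time it is executed, and then launches a child with `python -m`.
CALLER_SOURCE = """\
import subprocess
import sys
from pathlib import Path

# Deliberately no `if __name__ == "__main__":` guard.
with Path(sys.argv[1]).open("a") as marker:
    marker.write("ran\\n")

# Stand-in for the probe: a stdlib module run in a fresh interpreter.
subprocess.run([sys.executable, "-m", "platform"], check=True, capture_output=True)
"""


def count_caller_executions() -> int:
    """Run the unguarded caller script once and count how often its body ran."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "caller.py"
        marker = Path(tmp) / "marker.txt"
        script.write_text(CALLER_SOURCE)
        subprocess.run([sys.executable, str(script), str(marker)], check=True)
        return len(marker.read_text().splitlines())
```

Because the child is started from a dedicated module entrypoint, it never imports `caller.py`, so the marker file should contain a single line. A child started through multiprocessing spawn would instead re-import the caller's main module during startup.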

Reviewer context

  • PR #1595 (feat(pathfinder): add CTK root canary probe for non-standard-path libs) originally introduced the canary probe as a dedicated python -m ... subprocess.
  • During review it was switched to multiprocessing.get_context("spawn") to avoid a forked child inheriting preloaded CUDA libraries from the parent and to keep the probe isolated/predictable.
  • This change keeps that same guarantee because the canary still runs in a dedicated module entrypoint in a fresh interpreter; the only difference is that it no longer re-imports the user's script as __main__, which is the failure mode reported in issue #1767 ([BUG]: run_in_spawned_child_process fails if used as script).
  • While validating the change in wheel CI, the child python -m ... process was found to be resolving cuda.pathfinder from the checkout tree because run-tests pathfinder runs from ./cuda_pathfinder. In that environment the checkout copy can lack generated files such as cuda.pathfinder._version, so the child fails before probing.
  • Setting the child cwd to the already-imported parent package's import root makes the subprocess import the same distribution root as the parent, whether the parent came from source or from an installed wheel.
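The import-root anchoring described in the last two bullets can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged here: `import_root_of` and `run_module_from_import_root` are hypothetical helper names, and the sketch relies on `python -m` placing the current working directory first on `sys.path`, which holds by default but not when safe-path mode (`-P` or `PYTHONSAFEPATH`) is enabled.

```python
import subprocess
import sys
from pathlib import Path
from types import ModuleType


def import_root_of(module: ModuleType) -> Path:
    """Return the directory from which the already-imported `module` was resolved."""
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        # Namespace packages and some built-ins have no __file__.
        raise ValueError(f"{module.__name__} has no __file__ to anchor on")
    depth = len(module.__name__.split("."))
    file_path = Path(module_file).resolve()
    # A package lives at <root>/a/b/__init__.py, a plain module at <root>/a/b.py.
    if hasattr(module, "__path__"):
        return file_path.parents[depth]
    return file_path.parents[depth - 1]


def run_module_from_import_root(
    parent_module: ModuleType, child_module_name: str, *args: str
) -> subprocess.CompletedProcess:
    """Run `python -m child_module_name` with cwd set to the parent's import root."""
    return subprocess.run(
        [sys.executable, "-m", child_module_name, *args],
        cwd=import_root_of(parent_module),
        capture_output=True,
        text=True,
        check=False,
    )
```

With an installed wheel the root would be the site-packages directory, and with a source checkout it would be the directory containing the `cuda` tree, so in both cases the child would look in the same place the parent imported from. The arguments the real probe module expects are not shown in this thread and are left out here.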

Made with Cursor

cpcloud added 2 commits March 16, 2026 13:07
Run the CTK canary probe via a dedicated module subprocess so it still gets a fresh interpreter and independent loader state without re-entering the caller's script as __main__.

Made-with: Cursor
Mark the intentional subprocess launches as trusted inputs for Ruff's bandit checks and keep the affected files in hook-compliant import/format order.

Made-with: Cursor
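
As an illustration of the commit above, a subprocess launch can be marked as trusted for Ruff's flake8-bandit checks with an inline suppression. The rule code `S603` (subprocess call without `shell=True`) is a real Ruff rule, but which codes this PR actually suppresses is not shown in the thread, so treat the choice of code and the `run_trusted_module` helper as assumptions.

```python
import subprocess
import sys


def run_trusted_module(module_name: str) -> subprocess.CompletedProcess:
    """Launch `python -m module_name` with an argv built without a shell."""
    # The argv contains only the current interpreter path and a module name
    # chosen by the calling code, so the launch is treated as trusted input.
    argv = [sys.executable, "-m", module_name]
    return subprocess.run(argv, capture_output=True, text=True, check=False)  # noqa: S603
```

Keeping the suppression on the single line that makes the call limits its scope to that one launch.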
cpcloud requested a review from rwgk March 16, 2026 17:22

Run the canary subprocess from the parent package's import root so wheel-based CI tests do not accidentally resolve `cuda.pathfinder` from the checkout tree and fail on missing generated files.

Made-with: Cursor

rwgk (Collaborator) left a comment

I think this is good to merge with the two tiny suggested changes.

As a follow-on: could it be better to eliminate spawned_process_runner.py, and instead use the exact same approach for creating a fully isolated subprocess also for the tests?

Use the requested fully-isolated subprocess wording, restore the historical 1.4.0 release note text, and add new 1.4.3 release notes for the script-safety and import-root fixes instead.

Made-with: Cursor

rwgk (Collaborator) left a comment

Awesome, thanks so much!

cpcloud enabled auto-merge (squash) March 16, 2026 19:10
cpcloud merged commit 15cfd84 into NVIDIA:main Mar 16, 2026
86 checks passed

seberg commented Mar 17, 2026

That was quick, thanks!


Linked issue: [BUG]: run_in_spawned_child_process fails if used as script (#1767)

3 participants