
Only enabled PDL for PTX/SASS supporting it #9163

Merged
bernhardmgruber merged 10 commits into NVIDIA:main from bernhardmgruber:pdl_fix on May 29, 2026


@bernhardmgruber
Contributor

@bernhardmgruber bernhardmgruber commented May 28, 2026

This PR adds assertions that PDL (programmatic dependent launch) is only enabled when the PTX/SASS supports it. No CI run triggers these assertions, but I can hit them locally, e.g. by compiling for SM80 and running on SM120 (cub.test.device.transform.lid_0).

It then proposes a fix: every use of PDL is guarded by the compute capability of the PTX used for the launch.

Fixes: #9134

@bernhardmgruber bernhardmgruber requested a review from a team as a code owner May 28, 2026 10:08
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 28, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 28, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

important: Two launcher factories add __assert_pdl_allowed checks that assert dependent_launch is permitted only on SM90+; multiple dispatch paths now pass dependent_launch conditionally (ptx_version >= 900 or cc >= 9.0) instead of unconditionally.

PDL Support Validation and Dispatch Threading

| Layer | File(s) | Summary |
| --- | --- | --- |
| PDL assertion helpers in launcher factories | cub/cub/detail/launcher/cuda_driver.cuh, cub/cub/detail/launcher/cuda_runtime.cuh | Added __assert_pdl_allowed to both factories and invoked it from operator() to assert SM90+ when dependent_launch is requested. |
| MergeSort conditional PDL threading | cub/cub/device/dispatch/dispatch_merge_sort.cuh | The BlockSort, Partition, and Merge kernel launcher calls now pass dependent_launch as ptx_version >= 900 (PTX path) or cc >= compute_capability{9,0} (compute-capability path) instead of true. |
| Scan conditional PDL threading | cub/cub/device/dispatch/dispatch_scan.cuh | DispatchScan init/scan launches (Invoke, warpspeed, lookback) now pass dependent_launch as ptx_version >= 900 instead of always setting use_pdl = true. |
| Transform threading of compute capability | cub/cub/device/dispatch/dispatch_transform.cuh | Threaded ::cuda::compute_capability cc into the async/prefetch/vectorized helpers and updated their signatures to accept cc; those call sites use cc >= {9,0} to choose dependent_launch. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
cub/cub/detail/launcher/cuda_runtime.cuh (1)

Line 33: ⚡ Quick win

suggestion: Provide a descriptive assertion message for the PtxComputeCap query failure.

The empty string "" doesn't communicate why the assertion failed. A message like "Failed to query PTX compute capability" would aid debugging.

-_CCCL_ASSERT(PtxComputeCap(cc) == cudaSuccess, "");
+_CCCL_ASSERT(PtxComputeCap(cc) == cudaSuccess, "Failed to query PTX compute capability");

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ba4ba4ae-9074-481b-8696-03db2f9f78a4

📥 Commits

Reviewing files that changed from the base of the PR and between 5e3f881 and c3d10ba.

📒 Files selected for processing (2)
  • cub/cub/detail/launcher/cuda_driver.cuh
  • cub/cub/detail/launcher/cuda_runtime.cuh

Comment thread cub/cub/detail/launcher/cuda_driver.cuh
Comment thread cub/cub/detail/launcher/cuda_runtime.cuh
@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

2 similar comments

@github-actions

This comment has been minimized.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6bb42824-b1a0-4633-9292-2594ddebd58a

📥 Commits

Reviewing files that changed from the base of the PR and between 11a7f45 and 9d8dc7e.

📒 Files selected for processing (3)
  • cub/cub/device/dispatch/dispatch_merge_sort.cuh
  • cub/cub/device/dispatch/dispatch_scan.cuh
  • cub/cub/device/dispatch/dispatch_transform.cuh

Comment thread cub/cub/device/dispatch/dispatch_scan.cuh
@bernhardmgruber bernhardmgruber changed the title Assert PDL is only enabled with PTX/SASS supporting it Only enabled PDL for PTX/SASS supporting it May 28, 2026
@bernhardmgruber
Contributor Author

Let's see how far back we can backport this.

@github-actions

This comment has been minimized.

Comment thread cub/cub/detail/launcher/cuda_driver.cuh Outdated
Comment thread cub/cub/detail/launcher/cuda_runtime.cuh Outdated
active_policy.threads_per_block,
0,
stream,
/* dependent_launch */ cc >= ::cuda::compute_capability{9, 0})
Contributor


Can we rather introduce:

enum class enable_dependent_launch : bool {};

and do:

Suggested change
-/* dependent_launch */ cc >= ::cuda::compute_capability{9, 0})
+enable_dependent_launch{cc >= ::cuda::compute_capability{9, 0}})

I hate that we need the comment

Contributor Author


I just tried that and found it a bit cumbersome. Where should we put the definition of this enum? It's needed in Thrust and CUB.

@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 2h 27m: Pass: 100%/285 | Total: 11d 18h | Max: 2h 27m | Hits: 13%/1266121

See results here.

@bernhardmgruber bernhardmgruber merged commit 52931dd into NVIDIA:main May 29, 2026
305 of 306 checks passed
@bernhardmgruber bernhardmgruber deleted the pdl_fix branch May 29, 2026 12:30
@github-actions
Contributor

Backport failed for branch/3.3.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin branch/3.3.x
git worktree add -d .worktree/backport-9163-to-branch/3.3.x origin/branch/3.3.x
cd .worktree/backport-9163-to-branch/3.3.x
git switch --create backport-9163-to-branch/3.3.x
git cherry-pick -x 52931dd9a79f2159c8c30b7852898636f8d17f0f

@github-actions
Contributor

Successfully created backport PR for branch/3.4.x:

@bernhardmgruber
Contributor Author

Manual backport to 3.3: #9188

davebayer added a commit to davebayer/cccl that referenced this pull request May 29, 2026
Fixes: NVIDIA#9134

Co-authored-by: David Bayer <48736217+davebayer@users.noreply.github.com>

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[BUG] Unconditionally enabling PDL for any kernel is a bug

2 participants