Apply cuda::barrier and elect_one feedback #6344

Merged: bernhardmgruber merged 2 commits into NVIDIA:main from bernhardmgruber:barrier_fixes on Oct 30, 2025

@bernhardmgruber
Contributor

This is a follow-up to #6329 after feedback from @ahendriksen

@copy-pr-bot
Contributor

copy-pr-bot bot commented Oct 27, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Oct 27, 2025
@bernhardmgruber bernhardmgruber changed the title Small cuda::barrier and elect_one fixes Apply cuda::barrier and elect_one feedback Oct 27, 2025
@bernhardmgruber bernhardmgruber marked this pull request as ready for review October 27, 2025 12:55
@bernhardmgruber bernhardmgruber requested review from a team as code owners October 27, 2025 12:55
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Oct 27, 2025
const auto uniform_warp_id = __shfl_sync(~0, warp_id, 0); // broadcast from lane 0
return uniform_warp_id == 0 && cuda::ptx::elect_sync(~0); // elect a leader thread among warp 0
),
(::cuda::device::__cuda_elect_sync_is_not_supported_before_SM_90__(); _CCCL_UNREACHABLE();));
Contributor

Shouldn't the

  return threadIdx.x == 0;

go in the else here?

AFAIK PTX ISA 8.0 shipped with CTK 12.0. CCCL supports 12.0 and up, so there is no need for the __cccl_ptx_isa ifdef.

Contributor Author

Then I would need to add it twice: once as the else branch of the NV_IF_TARGET, and once as the else branch of the _CCCL_CUDA_COMPILATION() && __cccl_ptx_isa >= 800 check.

Regarding PTX ISA: I don't know whether the clang CUDA versions we test already support __cccl_ptx_isa >= 800.
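
To illustrate the duplication described in this comment, here is a minimal host-side C++ sketch. It is not the CCCL code. Every name in it (SKETCH_PTX_ISA, fallback_leader, hardware_leader, elect_one_sketch) is a hypothetical stand-in: the preprocessor check plays the role of the __cccl_ptx_isa >= 800 guard, and the runtime flag plays the role of the NV_IF_TARGET dispatch.

```cpp
// Hypothetical stand-in for the PTX ISA version macro; not a CCCL name.
#ifndef SKETCH_PTX_ISA
#  define SKETCH_PTX_ISA 0 // assume a toolchain older than PTX ISA 8.0
#endif

// Fallback suggested in the review: thread 0 is the leader.
inline bool fallback_leader(unsigned thread_idx_x)
{
  return thread_idx_x == 0;
}

// Stand-in for the hardware election path. The real instruction may pick
// any lane; this sketch assumes it picks thread 0 to stay deterministic.
inline bool hardware_leader(unsigned thread_idx_x)
{
  return thread_idx_x == 0;
}

// target_is_sm90 models the architecture dispatch as a runtime flag.
inline bool elect_one_sketch(unsigned thread_idx_x, bool target_is_sm90)
{
#if SKETCH_PTX_ISA >= 800
  if (target_is_sm90)
  {
    return hardware_leader(thread_idx_x);
  }
  // First copy of the fallback: else branch of the target dispatch.
  return fallback_leader(thread_idx_x);
#else
  (void) target_is_sm90; // unused when the guard is off
  // Second copy of the fallback: else branch of the preprocessor guard.
  return fallback_leader(thread_idx_x);
#endif
}
```

If the preprocessor guard can be dropped, as the review suggests, only the first copy of the fallback remains.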

Contributor Author

ok, let's just try

@bernhardmgruber bernhardmgruber enabled auto-merge (squash) October 30, 2025 18:06
@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 7h 16m: Pass: 100%/134 | Total: 6d 15h | Max: 5h 10m | Hits: 51%/265638


@bernhardmgruber bernhardmgruber merged commit c8cd7bc into NVIDIA:main Oct 30, 2025
288 of 291 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Oct 30, 2025
@bernhardmgruber bernhardmgruber deleted the barrier_fixes branch October 30, 2025 20:57
3 participants