Skip to content

Use block-scoped atomics in BlockHistogramAtomic on SM 60+#8162

Open
NitishNaineni wants to merge 3 commits intoNVIDIA:mainfrom
NitishNaineni:histogram-atomic-scope
Open

Use block-scoped atomics in BlockHistogramAtomic on SM 60+#8162
NitishNaineni wants to merge 3 commits intoNVIDIA:mainfrom
NitishNaineni:histogram-atomic-scope

Conversation

@NitishNaineni
Copy link

@NitishNaineni NitishNaineni commented Mar 25, 2026

Closes #3357

Summary

  • Replaces atomicAdd with atomicAdd_block in BlockHistogramAtomic::Composite on SM 60+ using NV_IF_TARGET, with fallback to atomicAdd on older architectures.
  • This matches the pattern already used in agent_histogram.cuh (lines 312, 325, 350).

Codegen verification

SASS confirms the scope change on SM 86 (RTX 3080):

  • Before: RED.E.ADD.STRONG.GPU
  • After: RED.E.ADD.STRONG.SM

Performance was identical on Ampere in my benchmarks (1024 threads × 16 items, per-block global histograms, various bin counts and grid sizes).

Test plan

  • cub.test.block.histogram.bins_32 — all 1512 assertions passed
  • cub.test.block.histogram.bins_256 — all 1512 assertions passed
  • cub.test.block.histogram.bins_1024 — all 1512 assertions passed

@NitishNaineni NitishNaineni requested a review from a team as a code owner March 25, 2026 01:02
@github-project-automation github-project-automation bot moved this to Todo in CCCL Mar 25, 2026
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Mar 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-project-automation github-project-automation bot moved this from Todo to In Progress in CCCL Mar 25, 2026
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Mar 25, 2026
@NitishNaineni NitishNaineni force-pushed the histogram-atomic-scope branch from 784ceca to 652f279 Compare March 25, 2026 14:49
simplify to just atomicAdd_block since CCCL only supports SM 75+

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
@NitishNaineni NitishNaineni requested a review from miscco March 25, 2026 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

[FEA]: Reduce scope of histogram atomics

3 participants