
Rename block_threads -> threads_per_block #8836

Merged
bernhardmgruber merged 3 commits into NVIDIA:main from bernhardmgruber:rename_block_threads2 on May 6, 2026

@bernhardmgruber
Contributor

@bernhardmgruber bernhardmgruber commented May 5, 2026

@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@bernhardmgruber bernhardmgruber force-pushed the rename_block_threads2 branch from 536e889 to 9993ba5 on May 6, 2026 07:44
@bernhardmgruber bernhardmgruber marked this pull request as ready for review May 6, 2026 07:44
@bernhardmgruber bernhardmgruber requested review from a team as code owners May 6, 2026 07:44
@cccl-authenticator-app cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL May 6, 2026
return {rp,
cub::warp_reduce_policy{rp.block_threads, TUNE_S_THREADS_PER_WARP, s_items, rp.vec_size, rp.load_modifier},
cub::warp_reduce_policy{rp.block_threads, TUNE_M_THREADS_PER_WARP, m_items, rp.vec_size, rp.load_modifier}};
return {
Contributor

Reminder: This will generate merge issues with #8840

@bernhardmgruber
Contributor Author

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai Bot commented May 6, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Contributor

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Refactor
    • Standardized naming convention across kernel launch and tuning policies, renaming the thread configuration parameter from block_threads to threads_per_block for consistency and clarity in the library APIs.

Walkthrough

This PR systematically renames the block_threads field to threads_per_block across all CUB policy structures, agent definitions, dispatch kernels, and tuning configurations. The rename standardizes per-block thread-count naming throughout the library without changing underlying algorithms or launch behavior.
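To illustrate the claim that the rename changes names but not launch behavior, here is a minimal sketch under stated assumptions. The types `policy_before` and `policy_after`, the `items_per_thread` member, and the helpers `tile_items()`, `migrate()`, and `checked_tile_items()` are all hypothetical and are not CUB's actual definitions. The sketch only shows that a derived quantity such as the tile size (threads per block times items per thread) comes out the same under either member name.

```cpp
#include <cassert>

// Hypothetical stand-in for a tuning policy before the rename.
struct policy_before
{
  int block_threads;    // threads per CUDA block (old name)
  int items_per_thread; // items processed by each thread
  constexpr int tile_items() const { return block_threads * items_per_thread; }
};

// The same hypothetical policy after the rename; only the member name differs.
struct policy_after
{
  int threads_per_block; // threads per CUDA block (new name)
  int items_per_thread;
  constexpr int tile_items() const { return threads_per_block * items_per_thread; }
};

// Hypothetical migration helper: copies the old field into the renamed one.
constexpr policy_after migrate(policy_before p)
{
  return policy_after{p.block_threads, p.items_per_thread};
}

// Hypothetical runtime guard, e.g. before a launch: block size and work per thread must be positive.
inline int checked_tile_items(const policy_after& p)
{
  assert(p.threads_per_block > 0 && p.items_per_thread > 0);
  return p.tile_items();
}

// Compile-time check that the rename leaves the derived tile size unchanged.
static_assert(migrate(policy_before{256, 16}).tile_items() == 4096, "tile size must be unchanged");
```

Under these assumptions, a policy of 256 threads per block and 16 items per thread yields 4096 items per tile with either naming, which the `static_assert` checks at compile time.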

Changes

Policy Field Rename: block_threads → threads_per_block

Layer / File(s) Summary
Policy & Utility Type Definitions
cub/cub/util_arch.cuh, cub/cub/util_device.cuh
Core scaling-result and kernel-config types updated to expose threads_per_block instead of block_threads. Parameter and member-variable names in helpers such as scale_reg_bound, RegBoundScaling, MemBoundScaling, and MaxSmOccupancy renamed accordingly.
Agent & Block Primitives
cub/cub/agent/agent_*.cuh, cub/cub/block/block_*.cuh, cub/cub/warp/warp_*.cuh
Agent classes (agent_batched_topk, agent_for, agent_histogram, agent_merge, agent_merge_sort, agent_topk) and block/warp primitives updated to use threads_per_block constant and type aliases. Internal block-operation typedefs (BlockLoad, BlockStore, BlockScan) parameterized by the renamed field. Documentation examples in warp headers updated.
Tuning Policy Structures
cub/cub/device/dispatch/tuning/tuning_*.cuh
All policy structs (adjacent_difference, batch_memcpy, batched_topk, find, for, histogram, merge, merge_sort, radix_sort, reduce, reduce_by_key, reduce_deterministic, rle_non_trivial_runs, scan, scan_by_key, segmented_reduce, segmented_scan, segmented_sort, select_if, three_way_partition, topk, transform, unique_by_key) renamed member from block_threads to threads_per_block; equality and streaming operators updated accordingly.
Device Dispatch Paths
cub/cub/device/dispatch/dispatch_*.cuh
Dispatch logic for all algorithms (adjacent_difference, batch_memcpy, batched_topk, find, for, histogram, merge, merge_sort, radix_sort, reduce, reduce_by_key, reduce_deterministic, reduce_nondeterministic, rle, scan, scan_by_key, segmented_radix_sort, segmented_reduce, segmented_scan, segmented_sort, select_if, streaming_reduce_by_key, three_way_partition, topk, transform, unique_by_key) updated to read and propagate threads_per_block from active policies; tile sizing, occupancy calculations, and kernel launch bounds use the renamed field.
Kernel Declarations & Launch Configuration
cub/cub/device/dispatch/kernels/kernel_*.cuh
Kernel __launch_bounds__ attributes and internal policy instantiations (AgentPolicy, AgentReduce, etc.) switched from block_threads to threads_per_block; launch-specific typedefs updated to parameterize on the renamed field; internal TILE_ITEMS and related constants recomputed using threads_per_block.
Benchmark Tuning
cub/benchmarks/bench/*/policy_selector.h
Policy selector constructors in radix_sort, segmented_reduce, and transform benchmarks updated to use scaled.threads_per_block instead of scaled.block_threads.
CUDA Execution & CUDAX
cudax/include/cuda/experimental/__execution/*/
Launch configuration in bulk execution, stream adaptor, and scheduler refactored to derive and use a local __threads_per_block variable; kernel launch parameters updated accordingly.
Thrust Integration
thrust/thrust/system/cuda/detail/core/
AgentPlan struct and AgentLauncher updated to use threads_per_block member; max_blocks_per_sm_impl parameter renamed; launch logging and invocation updated.
Tests & Documentation
cub/test/catch2_test_*.cu, libcudacxx/test/*/tune.pass.cpp, python/cuda_cccl/tests/coop/test_*.py, docs/cub/*
Test assertions, environment tuning verification, and documentation examples updated to read and verify threads_per_block field; example code snippets in RST and Python tests use the new naming.
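The Tuning Policy Structures entry above notes that equality and streaming operators were updated together with the member. A minimal sketch of why, assuming a hypothetical `example_policy` struct with hypothetical `to_string()` and `validate()` helpers (none of these are CUB's actual API): both operators access the field by name, so they stop compiling once the member is renamed, and the label printed by the streaming operator is plain text that has to be updated by hand.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical tuning policy; the name and members are illustrative, not CUB's definitions.
struct example_policy
{
  int threads_per_block; // renamed from block_threads
  int items_per_thread;

  // Equality names every member explicitly, so it has to follow the rename.
  friend constexpr bool operator==(const example_policy& a, const example_policy& b)
  {
    return a.threads_per_block == b.threads_per_block && a.items_per_thread == b.items_per_thread;
  }

  friend constexpr bool operator!=(const example_policy& a, const example_policy& b)
  {
    return !(a == b);
  }

  // Streaming spells out the field name, so the printed text changes with the rename.
  friend std::ostream& operator<<(std::ostream& os, const example_policy& p)
  {
    return os << "example_policy{.threads_per_block = " << p.threads_per_block
              << ", .items_per_thread = " << p.items_per_thread << "}";
  }
};

// Renders a policy via operator<<, e.g. for a test failure message.
inline std::string to_string(const example_policy& p)
{
  std::ostringstream ss;
  ss << p;
  return ss.str();
}

// Hypothetical guard: a policy used for a launch needs a positive block size and work per thread.
inline void validate(const example_policy& p)
{
  assert(p.threads_per_block > 0 && p.items_per_thread > 0);
}
```

With this sketch, streaming `example_policy{256, 16}` produces `example_policy{.threads_per_block = 256, .items_per_thread = 16}`.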

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ede3f459-7abf-473a-be5e-7acb9a5926f7

📥 Commits

Reviewing files that changed from the base of the PR and between e688643 and 9993ba5.

📒 Files selected for processing (100)
  • cub/benchmarks/bench/radix_sort/policy_selector.h
  • cub/benchmarks/bench/segmented_reduce/base.cuh
  • cub/benchmarks/bench/transform/common.h
  • cub/cub/agent/agent_batched_topk.cuh
  • cub/cub/agent/agent_for.cuh
  • cub/cub/agent/agent_histogram.cuh
  • cub/cub/agent/agent_merge.cuh
  • cub/cub/agent/agent_merge_sort.cuh
  • cub/cub/agent/agent_topk.cuh
  • cub/cub/block/block_load_to_shared.cuh
  • cub/cub/block/block_radix_rank.cuh
  • cub/cub/block/specializations/block_reduce_warp_reductions.cuh
  • cub/cub/block/specializations/block_topk_air.cuh
  • cub/cub/device/dispatch/dispatch_adjacent_difference.cuh
  • cub/cub/device/dispatch/dispatch_batch_memcpy.cuh
  • cub/cub/device/dispatch/dispatch_batched_topk.cuh
  • cub/cub/device/dispatch/dispatch_find.cuh
  • cub/cub/device/dispatch/dispatch_for.cuh
  • cub/cub/device/dispatch/dispatch_histogram.cuh
  • cub/cub/device/dispatch/dispatch_merge.cuh
  • cub/cub/device/dispatch/dispatch_merge_sort.cuh
  • cub/cub/device/dispatch/dispatch_radix_sort.cuh
  • cub/cub/device/dispatch/dispatch_reduce.cuh
  • cub/cub/device/dispatch/dispatch_reduce_by_key.cuh
  • cub/cub/device/dispatch/dispatch_reduce_deterministic.cuh
  • cub/cub/device/dispatch/dispatch_reduce_nondeterministic.cuh
  • cub/cub/device/dispatch/dispatch_rle.cuh
  • cub/cub/device/dispatch/dispatch_scan.cuh
  • cub/cub/device/dispatch/dispatch_scan_by_key.cuh
  • cub/cub/device/dispatch/dispatch_segmented_radix_sort.cuh
  • cub/cub/device/dispatch/dispatch_segmented_reduce.cuh
  • cub/cub/device/dispatch/dispatch_segmented_scan.cuh
  • cub/cub/device/dispatch/dispatch_segmented_sort.cuh
  • cub/cub/device/dispatch/dispatch_select_if.cuh
  • cub/cub/device/dispatch/dispatch_streaming_reduce_by_key.cuh
  • cub/cub/device/dispatch/dispatch_three_way_partition.cuh
  • cub/cub/device/dispatch/dispatch_topk.cuh
  • cub/cub/device/dispatch/dispatch_transform.cuh
  • cub/cub/device/dispatch/dispatch_unique_by_key.cuh
  • cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
  • cub/cub/device/dispatch/kernels/kernel_for_each.cuh
  • cub/cub/device/dispatch/kernels/kernel_histogram.cuh
  • cub/cub/device/dispatch/kernels/kernel_merge_sort.cuh
  • cub/cub/device/dispatch/kernels/kernel_radix_sort.cuh
  • cub/cub/device/dispatch/kernels/kernel_reduce.cuh
  • cub/cub/device/dispatch/kernels/kernel_reduce_deterministic.cuh
  • cub/cub/device/dispatch/kernels/kernel_scan.cuh
  • cub/cub/device/dispatch/kernels/kernel_segmented_radix_sort.cuh
  • cub/cub/device/dispatch/kernels/kernel_segmented_reduce.cuh
  • cub/cub/device/dispatch/kernels/kernel_segmented_scan.cuh
  • cub/cub/device/dispatch/kernels/kernel_segmented_sort.cuh
  • cub/cub/device/dispatch/kernels/kernel_three_way_partition.cuh
  • cub/cub/device/dispatch/kernels/kernel_transform.cuh
  • cub/cub/device/dispatch/kernels/kernel_unique_by_key.cuh
  • cub/cub/device/dispatch/tuning/tuning_adjacent_difference.cuh
  • cub/cub/device/dispatch/tuning/tuning_batch_memcpy.cuh
  • cub/cub/device/dispatch/tuning/tuning_batched_topk.cuh
  • cub/cub/device/dispatch/tuning/tuning_find.cuh
  • cub/cub/device/dispatch/tuning/tuning_for.cuh
  • cub/cub/device/dispatch/tuning/tuning_histogram.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh
  • cub/cub/device/dispatch/tuning/tuning_radix_sort.cuh
  • cub/cub/device/dispatch/tuning/tuning_reduce.cuh
  • cub/cub/device/dispatch/tuning/tuning_reduce_by_key.cuh
  • cub/cub/device/dispatch/tuning/tuning_reduce_deterministic.cuh
  • cub/cub/device/dispatch/tuning/tuning_rle_non_trivial_runs.cuh
  • cub/cub/device/dispatch/tuning/tuning_scan.cuh
  • cub/cub/device/dispatch/tuning/tuning_scan_by_key.cuh
  • cub/cub/device/dispatch/tuning/tuning_segmented_reduce.cuh
  • cub/cub/device/dispatch/tuning/tuning_segmented_scan.cuh
  • cub/cub/device/dispatch/tuning/tuning_segmented_sort.cuh
  • cub/cub/device/dispatch/tuning/tuning_select_if.cuh
  • cub/cub/device/dispatch/tuning/tuning_three_way_partition.cuh
  • cub/cub/device/dispatch/tuning/tuning_topk.cuh
  • cub/cub/device/dispatch/tuning/tuning_transform.cuh
  • cub/cub/device/dispatch/tuning/tuning_unique_by_key.cuh
  • cub/cub/util_arch.cuh
  • cub/cub/util_device.cuh
  • cub/cub/warp/warp_exchange.cuh
  • cub/cub/warp/warp_load.cuh
  • cub/cub/warp/warp_merge_sort.cuh
  • cub/cub/warp/warp_store.cuh
  • cub/test/catch2_test_device_reduce_env.cu
  • cub/test/catch2_test_device_scan_by_key_env.cu
  • cub/test/catch2_test_device_scan_env.cu
  • cub/test/catch2_test_device_segmented_reduce_max_seg_size.cu
  • cub/test/catch2_test_device_select_env.cu
  • cub/test/catch2_test_device_transform.cu
  • cub/test/catch2_test_vsmem.cu
  • cudax/include/cuda/experimental/__execution/bulk.cuh
  • cudax/include/cuda/experimental/__execution/stream/adaptor.cuh
  • cudax/include/cuda/experimental/__execution/stream/scheduler.cuh
  • docs/cub/developer/device_scope.rst
  • docs/cub/policy_selectors.rst
  • docs/cub/tuning.rst
  • libcudacxx/test/libcudacxx/cuda/execution/tune.pass.cpp
  • python/cuda_cccl/tests/coop/test_warp_merge_sort.py
  • thrust/thrust/system/cuda/detail/core/agent_launcher.h
  • thrust/thrust/system/cuda/detail/core/util.h

Comment thread cub/benchmarks/bench/transform/common.h
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@davebayer davebayer enabled auto-merge (squash) May 6, 2026 10:33
@bernhardmgruber bernhardmgruber disabled auto-merge May 6, 2026 11:02
@bernhardmgruber bernhardmgruber enabled auto-merge (squash) May 6, 2026 11:02
@github-actions
Contributor

github-actions Bot commented May 6, 2026

🥳 CI Workflow Results

🟩 Finished in 2h 11m: Pass: 100%/467 | Total: 7d 03h | Max: 2h 10m | Hits: 96%/602782

@bernhardmgruber bernhardmgruber merged commit 16f96cf into NVIDIA:main May 6, 2026
491 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL May 6, 2026
@bernhardmgruber bernhardmgruber deleted the rename_block_threads2 branch May 6, 2026 11:06

Labels

None yet

Projects

Archived in project

5 participants