Use the new tuning API internally for detail::select|three_way_partition::dispatch and DevicePartition #8925

Open
bernhardmgruber wants to merge 2 commits into NVIDIA:main from bernhardmgruber:use_tuning_api_partition

@bernhardmgruber
Contributor

@bernhardmgruber bernhardmgruber commented May 12, 2026

  • No SASS changes for cub.bench.partition.three_way.base on SM75;80;86;90;100
  • Use signed offset type for DevicePartition #8971 (required to avoid SASS changes)
  • No SASS changes for cub.bench.partition.if.base on SM75;80;86;90;100
  • No SASS changes for cub.bench.partition.flagged.base on SM75;80;86;90;100

Fixes: #8879
Fixes: #8380

@bernhardmgruber bernhardmgruber requested review from a team as code owners May 12, 2026 14:43
@bernhardmgruber bernhardmgruber requested a review from shwina May 12, 2026 14:43
@bernhardmgruber bernhardmgruber requested a review from pauleonix May 12, 2026 14:43
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 12, 2026
@bernhardmgruber bernhardmgruber requested a review from elstehle May 12, 2026 14:43
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 12, 2026
Comment thread cub/test/catch2_test_device_partition_env.cu Outdated
@bernhardmgruber bernhardmgruber force-pushed the use_tuning_api_partition branch from 2202fb7 to f978ca6 Compare May 13, 2026 16:26
@coderabbitai

coderabbitai Bot commented May 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 43f16d4e-e995-4c48-98e6-6d373f470bcd

📥 Commits

Reviewing files that changed from the base of the PR and between 2e36a8b and 88442b0.

📒 Files selected for processing (1)
  • thrust/thrust/system/cuda/detail/partition.h
🚧 Files skipped from review as they are similar to previous changes (1)
  • thrust/thrust/system/cuda/detail/partition.h

📝 Walkthrough

Summary by CodeRabbit

  • Tests

    • Added tests that verify partition operations (If, Flagged, three-way If) work correctly when run with execution tuning.
  • Refactor

    • Simplified partition benchmark and runtime wiring to invoke partition operations via a tunable execution environment, removing multi-step temp-storage plumbing and stream-level dispatch. Public APIs remain unchanged.

Walkthrough

This PR rewires the DevicePartition and detail::select dispatch onto the new CUDA execution tuning API. In the benchmarks, it replaces manual temp-size queries and allocation with env-based dispatch and policy_selector functors. It also updates the Thrust partition dispatch to call a CUB dispatch helper and adds tests that validate tuning behavior.
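For readers unfamiliar with the temp-size/allocation plumbing mentioned above, the following is a minimal host-only sketch of the two-phase idiom (query the scratch size, allocate, then execute). `mock_partition` and `partition_even_first` are hypothetical names invented for illustration; they are not CUB or Thrust functions, and the code runs on the host only. The mock places selected items at the front and rejected items at the back in reverse order, which is a choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a two-phase algorithm; this is NOT a CUB function.
// Phase 1 (temp == nullptr): only report the required scratch size in temp_bytes.
// Phase 2 (temp != nullptr): partition `in` into `out` using the scratch buffer.
// Returns false if the provided scratch buffer is too small.
template <class T, class Pred>
bool mock_partition(void* temp, std::size_t& temp_bytes, const T* in, T* out,
                    std::size_t n, std::size_t& num_selected, Pred pred)
{
  const std::size_t required = n * sizeof(std::uint8_t); // one flag per item
  if (temp == nullptr)
  {
    temp_bytes = required; // query phase: no work is done
    return true;
  }
  if (temp_bytes < required)
  {
    return false;
  }

  auto* flags = static_cast<std::uint8_t*>(temp);
  num_selected = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    flags[i] = pred(in[i]) ? 1 : 0;
    num_selected += flags[i];
  }

  // Selected items go to the front in order, rejected items to the back in reverse order.
  std::size_t front = 0;
  std::size_t back  = n;
  for (std::size_t i = 0; i < n; ++i)
  {
    if (flags[i])
    {
      out[front++] = in[i];
    }
    else
    {
      out[--back] = in[i];
    }
  }
  return true;
}

// Caller side of the idiom: query, allocate, execute.
inline std::vector<int> partition_even_first(const std::vector<int>& in, std::size_t& num_selected)
{
  const auto is_even = [](int x) { return x % 2 == 0; };
  std::vector<int> out(in.size());
  num_selected = 0;

  std::size_t temp_bytes = 0;
  bool ok = mock_partition<int>(nullptr, temp_bytes, in.data(), out.data(), in.size(), num_selected, is_even);
  assert(ok);

  std::vector<std::uint8_t> temp(temp_bytes); // the caller owns the scratch allocation
  ok = mock_partition<int>(temp.data(), temp_bytes, in.data(), out.data(), in.size(), num_selected, is_even);
  assert(ok);
  (void) ok;
  return out;
}
```

An env-based entry point, as described in the walkthrough, would hide the query and allocation steps behind a single call; the sketch only shows the caller-side work that such an interface removes.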

Changes

DevicePartition + Benchmarks + Thrust + Tests (single cohort)

| Layer / File(s) | Summary |
|---|---|
| DevicePartition env-based overload refactor: cub/cub/device/device_partition.cuh | Removed partition_impl; environment overloads for Flagged, If, and three-way If compute signed offset and overflow checks inline, build a default policy_selector via policy_selector_from_types (or three-way variant), and call detail::select::dispatch / detail::three_way_partition::dispatch directly using dispatch_with_env_and_tuning. |
| Benchmark adoption: policy selector and env-based execution: cub/benchmarks/bench/partition/flagged.cu, cub/benchmarks/bench/partition/if.cu, cub/benchmarks/bench/partition/three_way.cu | Introduce policy_selector functors (used when not TUNE_BASE), simplify local type aliases, use raw device pointers for inputs/outputs/selected-count, remove two-step temp-size/alloc/stream dispatch, and call cub::DevicePartition::Flagged/If directly with a caching_allocator_t + cub_bench_env (optionally tuned via cuda::execution::tune(policy_selector<T>{})). |
| Thrust partition helper: unified dispatch helper: thrust/thrust/system/cuda/detail/partition.h | Replaced DispatchPartitionIf with dispatch_partition that runs cub::detail::select::dispatch<SelectImpl::Partition> in two phases (query then execute), unified index-type dispatch to THRUST_INDEX_TYPE_DISPATCH calling dispatch_partition for both temp-size query and execution, and read back num_selected after synchronization. |
| Tuning tests for DevicePartition: cub/test/catch2_test_device_partition_env.cu | Added headers for raw pointer tuning and cuda::execution::tune, defined capability-dependent policy-selector templates and small predicates, and added C2H_TEST cases that construct tuned envs, run DevicePartition::If / Flagged / three-way If, and assert both selected counts and that the tuned block size matches the test target. |
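The summary above mentions that the environment overloads compute a signed offset and perform overflow checks inline, but the page does not show that code. As a rough illustration only, here is a host-only sketch of what a range check before a signed-offset conversion can look like. `fits_in_signed_offset` and `checked_offset_cast` are hypothetical helpers, not CUB functions, and the offset widths used in the checks are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not part of CUB): true if a non-negative item count can be
// represented in the signed offset type OffsetT without overflow.
template <class OffsetT, class NumItemsT>
constexpr bool fits_in_signed_offset(NumItemsT num_items)
{
  static_assert(std::is_integral<NumItemsT>::value, "num_items must be an integer");
  static_assert(std::is_integral<OffsetT>::value && std::is_signed<OffsetT>::value,
                "OffsetT must be a signed integer type");
  // Negative counts are rejected first, so the unsigned comparison below only
  // ever sees non-negative values.
  return !(std::is_signed<NumItemsT>::value && num_items < NumItemsT(0))
      && static_cast<std::uintmax_t>(num_items)
           <= static_cast<std::uintmax_t>(std::numeric_limits<OffsetT>::max());
}

// Hypothetical helper: convert after checking; the assert documents the precondition.
template <class OffsetT, class NumItemsT>
OffsetT checked_offset_cast(NumItemsT num_items)
{
  assert(fits_in_signed_offset<OffsetT>(num_items) && "num_items does not fit in OffsetT");
  return static_cast<OffsetT>(num_items);
}

// Compile-time spot checks of the boundary for an assumed 32-bit signed offset.
static_assert(fits_in_signed_offset<std::int32_t>(std::uint32_t{2147483647u}), "INT32_MAX fits");
static_assert(!fits_in_signed_offset<std::int32_t>(std::uint32_t{2147483648u}), "INT32_MAX + 1 does not fit");
```

In a real dispatch path, a failed check would more likely be reported as an error status than asserted; the assert here just keeps the sketch short.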

Assessment against linked issues

| Objective | Addressed | Explanation |
|---|---|---|
| Use the new tuning API for detail::select::dispatch + DevicePartition [#8879] | | |
| Use the new tuning API for detail::three_way_partition::dispatch [#8380] | | |

Possibly related PRs

  • NVIDIA/cccl#8971: Overlaps on DevicePartition environment-overload signed-offset handling and selector construction.

Suggested reviewers

  • elstehle


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2e262785-9664-4f8f-b738-a61ffdd14e4e

📥 Commits

Reviewing files that changed from the base of the PR and between 3f440b9 and f978ca6.

📒 Files selected for processing (6)
  • cub/benchmarks/bench/partition/flagged.cu
  • cub/benchmarks/bench/partition/if.cu
  • cub/benchmarks/bench/partition/three_way.cu
  • cub/cub/device/device_partition.cuh
  • cub/test/catch2_test_device_partition_env.cu
  • thrust/thrust/system/cuda/detail/partition.h

Comment thread cub/benchmarks/bench/partition/three_way.cu Outdated
Comment thread cub/cub/device/device_partition.cuh Outdated
@bernhardmgruber bernhardmgruber force-pushed the use_tuning_api_partition branch from 5d6ec8e to 2e36a8b Compare May 14, 2026 19:30
Comment thread thrust/thrust/system/cuda/detail/partition.h Outdated
Co-authored-by: Jacob Faibussowitsch <jacob.fai@gmail.com>
@bernhardmgruber
Contributor Author

/ok to test 88442b0

@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 1h 39m: Pass: 100%/340 | Total: 2d 18h | Max: 51m 12s | Hits: 94%/339576

See results here.

Projects

Status: In Review

3 participants