
Use the new tuning API internally for detail::find::dispatch#9240

Merged
gonidelis merged 2 commits into NVIDIA:main from gonidelis:find_if_tuning_dispatch on Jun 4, 2026

Conversation

@gonidelis

@gonidelis gonidelis commented Jun 3, 2026

Member

fixes #9237

  • Check that there are no (or only minimal) SASS diffs

@gonidelis gonidelis requested review from a team as code owners June 3, 2026 17:56
@gonidelis gonidelis requested review from NaderAlAwar and shwina June 3, 2026 17:56
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 3, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 3, 2026
@gonidelis gonidelis requested a review from bernhardmgruber June 3, 2026 17:57
@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 21ddb9de-0e35-4ad0-b3aa-713bab13c80d

📥 Commits

Reviewing files that changed from the base of the PR and between ea94bad and 7e663df.

📒 Files selected for processing (1)
  • cub/test/catch2_test_device_find_env.cu
🚧 Files skipped from review as they are similar to previous changes (1)
  • cub/test/catch2_test_device_find_env.cu

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Overview

This PR migrates internal dispatch for DeviceFind::FindIf to use the new tuning API and environment-based dispatching so that tuned policies can be selected and applied from execution environments. No external/public API signatures were changed.

What I verified (files inspected)

  • cub/benchmarks/bench/find_if/base.cu
  • cub/cub/device/device_find.cuh
  • cub/test/catch2_test_device_find_env.cu

Changes

  • DeviceFind::FindIf (cub/cub/device/device_find.cuh)

    • The public FindIf entry point still routes to detail::find::dispatch, but the env-based launch path now selects a default policy selector (based on the iterator's value type) and goes through dispatch_with_env_and_tuning, so that a policy_selector is computed and forwarded into detail::find::dispatch. No public API signatures change.
  • Benchmark (cub/benchmarks/bench/find_if/base.cu)

    • Adds optional compile-time tuning ranges and a bench_policy_selector that constructs a cub::detail::find::find_policy from TUNE_* parameters (threads-per-block pow2, items-per-thread, and load modifier).
    • Reworks nvbench execution to create a cub_bench_env (with cuda::execution::tune when !TUNE_BASE) and calls DeviceFind::FindIf with that env via _CCCL_TRY_CUDA_API, consolidating global memory accounting and replacing the manual two-phase sizing/allocation flow.
  • Tests (cub/test/catch2_test_device_find_env.cu)

    • Adds a find_tuning policy selector and a block-size-extracting predicate used by a new parameterized test ("Device FindIf can be tuned") that:
      • Constructs a tuned cuda::execution::tune(...) env, launches device_find_if with the tuned env, and verifies (1) no-match behavior (d_out == num_items) and (2) that the block size recorded on-device equals the requested ThreadsPerBlock.
    • Existing environment-usage tests for FindIf/LowerBound/UpperBound are left in place and still exercise env-based launches.

Impact / Notes

  • Behavior: Enables runtime/launch-time tuning for FindIf via execution environments. This increases flexibility for policy selection without changing public APIs.
  • Tests: A new test verifies the tuning path actually selects the requested block size.
  • Benchmark: nvbench benchmark supports compile-time tuning knobs and uses the new env-based invocation.
  • Review request: bernhardmgruber asked the author to run a SASS check on the "find if" benchmark to ensure there are no unintended SASS diffs. Because this PR adds tuning paths and a benchmark variant, validating the SASS output (or confirming that only the intended kernels are emitted) is recommended before merging.


Walkthrough

Updates DeviceFind::FindIf to compute and pass a tuning policy selector at dispatch; adds a test that exercises cuda::execution::tune(...) to enforce block size and verify it was applied; updates benchmark to optionally construct a compile-time policy selector and run under the tuning environment.

Changes

Tuning API Integration for DeviceFind::FindIf

  • Policy dispatch with tuning (cub/cub/device/device_find.cuh): The DeviceFind::FindIf env path computes default_policy_selector from the input value type and routes through dispatch_with_env_and_tuning, passing the policy selector to detail::find::dispatch.
  • Tuning validation and test (cub/test/catch2_test_device_find_env.cu): Adds #include <cuda/functional>, defines find_tuning<ThreadsPerBlock> and a block-size-extracting never-matching predicate, and adds a gated test that runs device_find_if under cuda::execution::tune(...) to validate the output and the observed block size.
  • Benchmark tuning integration (cub/benchmarks/bench/find_if/base.cu): Adds bench_policy_selector<T> when !TUNE_BASE and refactors execution to use cub_bench_env, _CCCL_TRY_CUDA_API, and optional cuda::execution::tune(...), with updated nvbench tags and memory accounting.

Assessment against linked issues

  • Objective: Use new tuning API internally for detail::find::dispatch [#9237]

Suggested reviewers

  • shwina
  • pauleonix
  • davebayer
  • bernhardmgruber


Comment thread cub/test/catch2_test_device_find_env.cu Outdated
Comment thread cub/test/catch2_test_device_find_env.cu Outdated
@bernhardmgruber

Contributor

@gonidelis Please still do a SASS check of the find if benchmark!

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

🥳 CI Workflow Results

🟩 Finished in 1h 44m: Pass: 100%/284 | Total: 1d 22h | Max: 53m 27s | Hits: 98%/196763


@gonidelis

Member Author

No SASS diffs observed for sm120

@gonidelis gonidelis merged commit 2b21bec into NVIDIA:main Jun 4, 2026
305 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL Jun 4, 2026
@coderabbitai coderabbitai Bot mentioned this pull request Jun 8, 2026
1 task

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Use the new tuning API internally for detail::find::dispatch

2 participants