Skip to content

Allow public tuning of cub::DeviceAdjacentDifference#9218

Merged
bernhardmgruber merged 2 commits into
NVIDIA:mainfrom
bernhardmgruber:adj_diff_public
Jun 3, 2026
Merged

Allow public tuning of cub::DeviceAdjacentDifference#9218
bernhardmgruber merged 2 commits into
NVIDIA:mainfrom
bernhardmgruber:adj_diff_public

Conversation

@bernhardmgruber
Copy link
Copy Markdown
Contributor

@bernhardmgruber bernhardmgruber commented Jun 2, 2026

Fixes: #8567

New public entities

Entity Data members
cub::AdjacentDifferencePolicy int threads_per_block
int items_per_thread
BlockLoadAlgorithm load_algorithm
CacheLoadModifier load_modifier
BlockStoreAlgorithm store_algorithm

@bernhardmgruber bernhardmgruber requested review from a team as code owners June 2, 2026 17:28
@bernhardmgruber bernhardmgruber requested a review from shwina June 2, 2026 17:28
@bernhardmgruber bernhardmgruber requested a review from pauleonix June 2, 2026 17:28
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 2, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 2, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c7c6a18c-9cc8-4654-b832-771eb3cddffa

📥 Commits

Reviewing files that changed from the base of the PR and between 5904191 and fe593e2.

📒 Files selected for processing (2)
  • cub/cub/device/dispatch/dispatch_adjacent_difference.cuh
  • cub/test/catch2_test_device_adjacent_difference_custom_policy_hub.cu
✅ Files skipped from review due to trivial changes (1)
  • cub/test/catch2_test_device_adjacent_difference_custom_policy_hub.cu
🚧 Files skipped from review as they are similar to previous changes (1)
  • cub/cub/device/dispatch/dispatch_adjacent_difference.cuh

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Introduced a public AdjacentDifferencePolicy type for tuning adjacent-difference execution.
  • Documentation

    • Expanded Adjacent Difference API docs with environment-based tuning guidance and practical examples.
  • Tests

    • Added compile-time and runtime tests validating policy behavior, initialization forms, and environment-based tuning results.
  • Chores

    • Marked legacy DispatchAdjacentDifference as deprecated; tests suppress related warnings.

important: Walkthrough

This PR moves adjacent-difference tuning from internal detail namespace to public API by introducing cub::AdjacentDifferencePolicy, wiring it through policy selectors and dispatch, updating docs and benchmarks, and adding compile-time and runtime tests.

important: Changes

Productize AdjacentDifferencePolicy tuning API

Layer / File(s) Summary
AdjacentDifferencePolicy public type definition
cub/cub/device/dispatch/tuning/tuning_adjacent_difference.cuh
Introduces public AdjacentDifferencePolicy struct with threads_per_block, items_per_thread, load_algorithm, load_modifier, and store_algorithm fields; implements [[nodiscard]] operator==/operator!= friends and hosted operator<< formatter; updates related concepts.
Policy selector machinery
cub/cub/device/dispatch/tuning/tuning_adjacent_difference.cuh
policy_selector and policy_selector_from_types call operators now return and construct AdjacentDifferencePolicy instead of the prior internal type.
Dispatch layer integration
cub/cub/device/dispatch/dispatch_adjacent_difference.cuh
Kernel constant policy, functor policy_selector_from_hub, and dispatch local variable active_policy updated to AdjacentDifferencePolicy; DispatchAdjacentDifference marked deprecated.
Public API documentation
cub/cub/device/device_adjacent_difference.cuh
Added @par Tuning section with literal includes from tests demonstrating custom policy selector usage for SubtractLeft.
Compile-time policy type validation
cub/test/catch2_test_device_adjacent_difference_env.cu
Tuning functor return type changed to cub::AdjacentDifferencePolicy; GCC-gated test validates semiregularity, aggregate form, aggregate/designated initialization variants, and equality comparisons.
Environment-based tuning API test
cub/test/catch2_test_device_adjacent_difference_env_api.cu
Includes cuda/__execution/tune.h; adds C++20-gated test validating SubtractLeftCopy with cuda::execution::tune(...) policy selector using compute-capability-dependent parameter thresholds.
Benchmark update
cub/benchmarks/bench/adjacent_difference/subtract_left.cu
Benchmark policy_selector_t::operator() return type changed to cub::AdjacentDifferencePolicy.
Test deprecation suppression
cub/test/catch2_test_device_adjacent_difference_custom_policy_hub.cu
Adds CCCL_IGNORE_DEPRECATED_API and relocates TODO to suppress deprecation warnings for DispatchAdjacentDifference.

important: Assessment against linked issues

Objective Addressed Explanation
Productize the cub::DeviceAdjacentDifference tuning API [#8567]

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cub/cub/device/dispatch/dispatch_adjacent_difference.cuh (1)

347-360: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Validate externally supplied policy dimensions before using them.

Now that AdjacentDifferencePolicy is public, policy_selector(cc) can legally hand back arbitrary values here. threads_per_block <= 0 or items_per_thread <= 0 will flow straight into tile_size, ::cuda::ceil_div, and the triple-chevron launch config, which turns a bad public policy into divide-by-zero or invalid launch parameters. Guard these fields and fail with a clear cudaErrorInvalidValue before computing num_tiles or launching the kernel.

Also applies to: 430-444

🧹 Nitpick comments (1)
cub/test/catch2_test_device_adjacent_difference_env_api.cu (1)

10-10: ⚡ Quick win

suggestion: In cub/test/catch2_test_device_adjacent_difference_env_api.cu (line 10), prefer #include <cuda/execution.tune.h> over <cuda/__execution/tune.h> since the public header already includes/forwards the private implementation that defines cuda::execution::tune.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bdcbdb88-fbf2-4d4e-a2b1-894e542c3138

📥 Commits

Reviewing files that changed from the base of the PR and between 2a82ae1 and 5904191.

📒 Files selected for processing (6)
  • cub/benchmarks/bench/adjacent_difference/subtract_left.cu
  • cub/cub/device/device_adjacent_difference.cuh
  • cub/cub/device/dispatch/dispatch_adjacent_difference.cuh
  • cub/cub/device/dispatch/tuning/tuning_adjacent_difference.cuh
  • cub/test/catch2_test_device_adjacent_difference_env.cu
  • cub/test/catch2_test_device_adjacent_difference_env_api.cu

Comment on lines +95 to +105
//! .. literalinclude:: ../../../cub/test/catch2_test_device_adjacent_difference_env_api.cu
//! :language: c++
//! :dedent:
//! :start-after: example-begin subtract-left-copy-policy-selector
//! :end-before: example-end subtract-left-copy-policy-selector
//!
//! .. literalinclude:: ../../../cub/test/catch2_test_device_adjacent_difference_env_api.cu
//! :language: c++
//! :dedent:
//! :start-after: example-begin subtract-left-copy-tuning
//! :end-before: example-end subtract-left-copy-tuning
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does C++17 support designated initializers like T{ .field = value }, or is that a C++20-only feature?

💡 Result:

Designated initializers, using the syntax T{.field = value }, are a C++20-only feature [1][2][3][4]. They are not part of the C++17 standard [1]. While some compilers (such as GCC and Clang) provided designated initializers as a non-standard language extension long before C++20 [1][5][6], they were not officially supported in C++17 [1]. If you use this syntax in C++17 mode, you are relying on compiler-specific extensions rather than standard C++ [1]. Compilers often allow this code to compile by default, but it may trigger errors or warnings if you compile with strict standard compliance flags (such as -Wpedantic or -pedantic) [1][2]. In contrast, C++20 formally introduced designated initializers for aggregate types [2][3][4]. There are specific differences between the C++20 implementation and C99 designated initializers: for instance, C++20 requires that members be initialized in their declaration order, whereas C99 allows arbitrary order [3][4].

Citations:


important: The @par Tuning docs in cub/cub/device/device_adjacent_difference.cuh include the subtract-left-copy-policy-selector example from cub/test/catch2_test_device_adjacent_difference_env_api.cu, which uses C++20 designated initializers (return {.threads_per_block = ...}). Copy-pasting this into a C++17 project will fail under standard-conforming builds; update the snippet to use C++17-compatible aggregate initialization (e.g., positional initialization) or mirror the existing _CCCL_STD_VER >= 2020 fallback.

Comment on lines 61 to 63
template <typename T>
concept adjacent_difference_policy_selector = policy_selector<T, adjacent_difference_policy>;
concept adjacent_difference_policy_selector = policy_selector<T, AdjacentDifferencePolicy>;
#endif // _CCCL_HAS_CONCEPTS()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Constrain the public selector concept to stateless selectors.

adjacent_difference_policy_selector accepts selectors that return AdjacentDifferencePolicy, but the device path still requires ::cuda::std::is_empty_v<PolicySelector> later in DeviceAdjacentDifferenceDifferenceKernel. With the API now public, that means a stateful selector passes the front-door constraint and then fails deep in dispatch. Fold the emptiness requirement into this concept so unsupported selectors are rejected at the API boundary instead. Based on learnings, policy selectors used at the CUB device API layer are stateless.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

🥳 CI Workflow Results

🟩 Finished in 1h 16m: Pass: 100%/284 | Total: 3d 17h | Max: 1h 04m | Hits: 64%/328568

See results here.

@bernhardmgruber bernhardmgruber changed the title Allow public tuning of DeviceAdjacentDifference Allow public tuning of cub::DeviceAdjacentDifference Jun 3, 2026
@bernhardmgruber bernhardmgruber merged commit fc452a9 into NVIDIA:main Jun 3, 2026
311 checks passed
@bernhardmgruber bernhardmgruber deleted the adj_diff_public branch June 3, 2026 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Productize the cub::DeviceAdjacentDifference tuning API

3 participants