
[STF] Use out parameter for partition mappers #9117

Merged
andralex merged 11 commits into NVIDIA:main from caugonnet:stf-get-executor-out-param
May 27, 2026

Conversation

@caugonnet
Contributor

Summary

  • Change partition_fn_t / stf_get_executor_fn to write mapper results through an out parameter instead of returning pos4 by value.
  • Update built-in places partitioners, localized array mapper calls, C STF tests, and places documentation to match the new contract.
  • Keep this extraction independent of Python STF bindings and other stf_c_api branch work.
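The contract change in the first bullet can be sketched as follows. This is a minimal illustration, not the cudax code: the pos4 and dim4 structs are simplified stand-ins, and the cyclic mappers and the partition_fn_by_value_t alias are hypothetical names. Only the shape of the new partition_fn_t signature is taken from this PR.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for illustration only; the real cudax pos4/dim4 types differ.
struct pos4 { long x, y, z, t; };
struct dim4 { std::size_t x, y, z, t; };

// Old contract: the mapper returns the executor coordinates by value.
using partition_fn_by_value_t = pos4 (*)(pos4 data_coords, dim4 data_dims, dim4 grid_dims);

// New contract (signature as quoted in this PR): the mapper writes into *result.
using partition_fn_t = void (*)(pos4* result, pos4 data_coords, dim4 data_dims, dim4 grid_dims);

// Hypothetical 1D cyclic mapper in the old style.
inline pos4 cyclic_by_value(pos4 data_coords, dim4 /*data_dims*/, dim4 grid_dims)
{
  assert(grid_dims.x > 0);
  return pos4{data_coords.x % static_cast<long>(grid_dims.x), 0, 0, 0};
}

// The same mapper in the new style: every field of *result is written.
inline void cyclic_out_param(pos4* result, pos4 data_coords, dim4 /*data_dims*/, dim4 grid_dims)
{
  assert(result != nullptr && grid_dims.x > 0);
  result->x = data_coords.x % static_cast<long>(grid_dims.x);
  result->y = 0;
  result->z = 0;
  result->t = 0;
}
```

Both forms compute the same coordinates; the difference is only in how the result crosses the call boundary, which is what matters for callers that reach the mapper through a C function pointer.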

Test plan

  • pre-commit run --files c/experimental/stf/include/cccl/c/experimental/stf/stf.h c/experimental/stf/src/stf.cu c/experimental/stf/test/test_places.cpp cudax/include/cuda/experimental/__places/data_place_interface.cuh cudax/include/cuda/experimental/__places/localized_array.cuh cudax/include/cuda/experimental/__places/partitions/blocked_partition.cuh cudax/include/cuda/experimental/__places/partitions/cyclic_shape.cuh cudax/include/cuda/experimental/__places/partitions/tiled_partition.cuh docs/cudax/places.rst
  • git diff --check

@caugonnet caugonnet requested review from a team as code owners May 22, 2026 23:42
@caugonnet caugonnet requested a review from fbusato May 22, 2026 23:42
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 22, 2026
@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@caugonnet caugonnet requested a review from gevtushenko May 22, 2026 23:42
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 22, 2026
@caugonnet caugonnet changed the title from "[CUDAX] Use out parameter for partition mappers" to "[STF] Use out parameter for partition mappers" May 22, 2026
@caugonnet caugonnet self-assigned this May 22, 2026
@caugonnet caugonnet added the stf Sequential Task Flow programming model label May 22, 2026
@caugonnet caugonnet marked this pull request as draft May 22, 2026 23:44
@cccl-authenticator-app cccl-authenticator-app Bot moved this from In Review to In Progress in CCCL May 22, 2026
@caugonnet
Contributor Author

/ok to test 5f33c27

@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bfafbe56-2e29-42ea-840e-71da364119e3

📥 Commits

Reviewing files that changed from the base of the PR and between 71d8b77 and 23091ba.

📒 Files selected for processing (1)
  • c/experimental/stf/src/stf.cu
✅ Files skipped from review due to trivial changes (1)
  • c/experimental/stf/src/stf.cu

📝 Walkthrough

Summary by CodeRabbit

  • Breaking Changes

    • Mapper and partitioner callbacks now return results via output pointer parameters instead of returning structs, affecting the public API surface across experimental STF and CUDA Places.
  • Tests

    • Unit tests updated to call the new pointer-based callback signatures.
  • Documentation

    • API docs updated to describe the new output-parameter convention and adjusted parameter descriptions.

Walkthrough

important: Partitioner and executor function signatures are refactored from return-by-value to an out-parameter convention. Type definitions are updated in the cudax and C wrapper headers, partition implementations are converted, call sites are adapted, test mappers are adjusted, and the documentation is updated to match.

Changes

Partitioner ABI migration

Layer: Interface contracts
File(s): cudax/include/cuda/experimental/__places/data_place_interface.cuh, c/experimental/stf/include/cccl/c/experimental/stf/stf.h
Summary: partition_fn_t and stf_get_executor_fn are redefined from returning pos4/stf_pos4 by value to writing into pos4*/stf_pos4* output parameters, establishing the new out-parameter ABI.

Layer: Partition implementations
File(s): cudax/include/cuda/experimental/__places/partitions/blocked_partition.cuh, cudax/include/cuda/experimental/__places/partitions/cyclic_shape.cuh, cudax/include/cuda/experimental/__places/partitions/tiled_partition.cuh
Summary: get_executor methods in blocked, cyclic, and tiled partitions are updated to accept a result pointer and return void, with corresponding test calls refactored to pass temporaries by address instead of capturing return values.

Layer: Function call sites
File(s): cudax/include/cuda/experimental/__places/localized_array.cuh, c/experimental/stf/src/stf.cu
Summary: Mapper function invocations are switched to out-parameter style: mapper(&eplace_coords, ...) instead of assignment from return values, and the C wrapper's cpp_mapper cast is updated.

Layer: Test mapper
File(s): c/experimental/stf/test/test_places.cpp
Summary: blocked_mapper_1d test fixture signature changes to void with stf_pos4* result parameter, and field assignments use pointer dereference (result->x/y/z/t) instead of struct return.

Layer: User documentation
File(s): docs/cudax/places.rst
Summary: Documented get_executor method signature and description updated to reflect the out-parameter convention and clarify that coordinates are written to *result.
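The test-mapper and call-site rows can be illustrated with a short sketch. It assumes stand-in stf_pos4 and stf_dim4 structs (the x/y/z/t field names come from the summary above, the field types are a guess), a hypothetical body for blocked_mapper_1d, and a hypothetical helper named locate that plays the role of a call site.

```cpp
#include <cassert>

// Stand-ins for the C API types. Field names follow the summary above;
// the field types are an assumption made for this sketch.
struct stf_pos4 { long long x, y, z, t; };
struct stf_dim4 { unsigned long long x, y, z, t; };

using stf_get_executor_fn =
  void (*)(stf_pos4* result, stf_pos4 data_coords, stf_dim4 data_dims, stf_dim4 grid_dims);

// Hypothetical 1D blocked mapper: contiguous chunks of
// ceil(data_dims.x / grid_dims.x) elements map to consecutive grid positions.
inline void blocked_mapper_1d(stf_pos4* result, stf_pos4 data_coords, stf_dim4 data_dims, stf_dim4 grid_dims)
{
  assert(result != nullptr && data_dims.x > 0 && grid_dims.x > 0);
  const unsigned long long part_size = (data_dims.x + grid_dims.x - 1) / grid_dims.x;
  result->x = data_coords.x / static_cast<long long>(part_size);
  result->y = 0;
  result->z = 0;
  result->t = 0;
}

// Hypothetical call site: pass the address of a zero-initialized local
// instead of assigning from a return value.
inline stf_pos4 locate(stf_get_executor_fn mapper, stf_pos4 data_coords, stf_dim4 data_dims, stf_dim4 grid_dims)
{
  stf_pos4 eplace_coords{};
  mapper(&eplace_coords, data_coords, data_dims, grid_dims);
  return eplace_coords;
}
```

With 16 elements over 4 grid positions, each block holds 4 elements, so element 10 lands on position 2 under this sketch.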

suggestion: Run full compile with nvcc and host toolchain and re-run unit tests that exercise partition get_executor paths to catch any ABI-callsite mismatches.

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cudax/include/cuda/experimental/__places/partitions/blocked_partition.cuh (1)

105-123: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

critical: blocked_partition_custom::get_executor can divide by zero on Line 123 when the selected dimension extent is zero (part_size becomes 0). Add an early guard for grid_dims.x == 0 or part_size == 0 before computing c / part_size.
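One possible shape for such a guard is sketched below. It does not reproduce blocked_partition_custom::get_executor: the pos4/dim4 structs are simplified stand-ins, the function name is hypothetical, and falling back to position 0 for an empty extent is an assumption of this sketch rather than the behavior adopted in the PR.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for illustration only; the real cudax pos4/dim4 types differ.
struct pos4 { long x, y, z, t; };
struct dim4 { std::size_t x, y, z, t; };

// Hypothetical guarded blocked mapper over the x dimension.
inline void blocked_get_executor_guarded(pos4* result, pos4 data_coords, dim4 data_dims, dim4 grid_dims)
{
  assert(result != nullptr);
  *result = pos4{0, 0, 0, 0}; // assumed fallback: map everything to position 0

  if (grid_dims.x == 0)
  {
    return; // no grid positions, so there is nothing to divide by
  }

  // Ceiling division; this is 0 exactly when data_dims.x is 0.
  const std::size_t part_size = (data_dims.x + grid_dims.x - 1) / grid_dims.x;
  if (part_size == 0)
  {
    return; // empty extent: skip the division below
  }

  result->x = data_coords.x / static_cast<long>(part_size);
}
```

Whether an empty extent should map to position 0, trip an assertion, or be rejected earlier is a design choice the review comment leaves open.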

🧹 Nitpick comments (2)
cudax/include/cuda/experimental/__places/localized_array.cuh (1)

362-364: ⚡ Quick win

suggestion: Initialize eplace_coords before passing it to mapper (e.g., pos4 eplace_coords(0);) so that a mapper that writes only some fields cannot leak uninitialized coordinates into placement decisions.
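The concern can be made concrete with a mapper that writes only the dimension it partitions. The sketch below uses simplified stand-in structs and hypothetical function names; because the stand-in pos4 is a plain aggregate, it uses empty-brace value-initialization instead of the pos4 constructor call mentioned in the comment.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for illustration only; the real cudax pos4/dim4 types differ.
struct pos4 { long x, y, z, t; };
struct dim4 { std::size_t x, y, z, t; };

using partition_fn_t = void (*)(pos4* result, pos4 data_coords, dim4 data_dims, dim4 grid_dims);

// Hypothetical mapper that leaves y, z, and t untouched.
inline void writes_x_only(pos4* result, pos4 data_coords, dim4 /*data_dims*/, dim4 grid_dims)
{
  assert(result != nullptr && grid_dims.x > 0);
  result->x = data_coords.x % static_cast<long>(grid_dims.x);
}

// Hypothetical call site: zero the output first so fields the mapper
// does not write still hold a defined value.
inline pos4 map_with_zero_init(partition_fn_t mapper, pos4 data_coords, dim4 data_dims, dim4 grid_dims)
{
  pos4 eplace_coords{}; // value-initialization zeroes x, y, z, t of this aggregate
  mapper(&eplace_coords, data_coords, data_dims, grid_dims);
  return eplace_coords;
}
```

Without the initialization, reading y, z, or t after the call would be reading indeterminate values, which is the failure mode the suggestion is meant to rule out.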

cudax/include/cuda/experimental/__places/data_place_interface.cuh (1)

55-58: ⚡ Quick win

suggestion: Add explicit unstable API warning to the documentation comment. While the cuda::experimental namespace indicates experimental status, the coding guidelines require that "All CUDA Experimental APIs must be documented as unstable and subject to change without notice." Consider adding a Doxygen @warning or note tag.

 //! Function type for computing executor placement from data coordinates.
 //! Uses an out-pointer convention so the signature is trivially representable
 //! in FFI frameworks (ctypes, cffi, Rust) that cannot return C structs.
+//! `@warning` This API is experimental and subject to change without notice.
 using partition_fn_t = void (*)(pos4* result, pos4 data_coords, dim4 data_dims, dim4 grid_dims);

As per coding guidelines: "All CUDA Experimental APIs must be documented as unstable and subject to change without notice".


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 59d940a9-5232-4609-8a45-9d12e8f769af

📥 Commits

Reviewing files that changed from the base of the PR and between c47f140 and 5f33c27.

📒 Files selected for processing (9)
  • c/experimental/stf/include/cccl/c/experimental/stf/stf.h
  • c/experimental/stf/src/stf.cu
  • c/experimental/stf/test/test_places.cpp
  • cudax/include/cuda/experimental/__places/data_place_interface.cuh
  • cudax/include/cuda/experimental/__places/localized_array.cuh
  • cudax/include/cuda/experimental/__places/partitions/blocked_partition.cuh
  • cudax/include/cuda/experimental/__places/partitions/cyclic_shape.cuh
  • cudax/include/cuda/experimental/__places/partitions/tiled_partition.cuh
  • docs/cudax/places.rst

@caugonnet caugonnet marked this pull request as ready for review May 23, 2026 19:49
@cccl-authenticator-app cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL May 23, 2026
@caugonnet
Contributor Author

/ok to test 4b1424e

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
cudax/include/cuda/experimental/__places/localized_array.cuh (1)

362-362: ⚡ Quick win

suggestion: Use uniform initialization for eplace_coords.

Line 362 should use brace initialization to satisfy the header style rule: pos4 eplace_coords{0};.

As per coding guidelines: "Use uniform initialization for class constructors and compile-time conversions, e.g., constexpr auto x = int{sizeof(float)};".


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 853a96b6-2a57-4d38-bee9-bb10c465a4f4

📥 Commits

Reviewing files that changed from the base of the PR and between 5f33c27 and aa099d0.

📒 Files selected for processing (1)
  • cudax/include/cuda/experimental/__places/localized_array.cuh

Comment thread c/experimental/stf/src/stf.cu Outdated
Comment thread cudax/include/cuda/experimental/__places/partitions/cyclic_shape.cuh Outdated
//! Function type for computing executor placement from data coordinates.
//! Uses an out-pointer convention so the signature is trivially representable
//! in FFI frameworks (ctypes, cffi, Rust) that cannot return C structs.
using partition_fn_t = void (*)(pos4* result, pos4 data_coords, dim4 data_dims, dim4 grid_dims);
Contributor

Can't we use here the typedef in c/experimental/stf/include/cccl/c/experimental/stf/stf.h so we don't need to duplicate it?

Contributor

@andralex andralex left a comment

lgtm modulo a few nits

@andralex andralex enabled auto-merge (squash) May 26, 2026 18:56
@andralex
Contributor

/ok to test ba9d290

@caugonnet
Contributor Author

/ok to test 71d8b77

Comment thread docs/cudax/places.rst
 static const S_out apply(const S_in& in, pos4 position, dim4 grid_dims);

-pos4 get_executor(pos4 data_coords, dim4 data_dims, dim4 grid_dims);
+void get_executor(pos4* result, pos4 data_coords, dim4 data_dims, dim4 grid_dims);
Contributor

Suggestion: should this be static void get_executor(...)? partition_fn_t is a plain function pointer type, not a member function pointer. Also, the text below mentions a virtual method, which this is not.
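The distinction behind this suggestion can be checked at compile time. In the sketch below, the structs and both partitioner types are hypothetical stand-ins; only the partition_fn_t signature is taken from the PR. A static member function has an ordinary function pointer type, while a non-static member yields a pointer-to-member that does not convert to partition_fn_t.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Simplified stand-ins for illustration only; the real cudax pos4/dim4 types differ.
struct pos4 { long x, y, z, t; };
struct dim4 { std::size_t x, y, z, t; };

using partition_fn_t = void (*)(pos4* result, pos4 data_coords, dim4 data_dims, dim4 grid_dims);

// Hypothetical partitioner whose get_executor is static.
struct static_partition_sketch
{
  static void get_executor(pos4* result, pos4 data_coords, dim4 /*data_dims*/, dim4 grid_dims)
  {
    assert(result != nullptr && grid_dims.x > 0);
    *result = pos4{data_coords.x % static_cast<long>(grid_dims.x), 0, 0, 0};
  }
};

// Hypothetical partitioner whose get_executor is a non-static member.
struct member_partition_sketch
{
  void get_executor(pos4* result, pos4, dim4, dim4) const
  {
    *result = pos4{0, 0, 0, 0};
  }
};

// The address of a static member function is a plain function pointer.
static_assert(std::is_convertible<decltype(&static_partition_sketch::get_executor), partition_fn_t>::value,
              "static member converts to partition_fn_t");

// The address of a non-static member is a pointer-to-member and does not convert.
static_assert(!std::is_convertible<decltype(&member_partition_sketch::get_executor), partition_fn_t>::value,
              "non-static member does not convert to partition_fn_t");
```

Documenting the method as static would therefore match how a partitioner's get_executor is passed where a partition_fn_t is expected.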

@caugonnet
Contributor Author

/ok to test 23091ba

@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 36m 57s: Pass: 100%/59 | Total: 5h 12m | Max: 31m 09s | Hits: 100%/33226

See results here.

@andralex andralex merged commit ba6b4ea into NVIDIA:main May 27, 2026
78 of 79 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL May 27, 2026
trxcllnt pushed a commit to trxcllnt/cccl that referenced this pull request May 27, 2026
* [CUDAX] Use out parameter for partition mappers

* [CUDAX] Initialize mapper output coordinates

* Apply suggestion from @andralex

* Update cudax/include/cuda/experimental/__places/partitions/cyclic_shape.cuh

---------

Co-authored-by: Andrei Alexandrescu <andrei@erdani.com>

Labels

stf Sequential Task Flow programming model

Projects

Archived in project

3 participants