
[STF] Add C bindings for the places layer #9232

Merged
andralex merged 15 commits into NVIDIA:main from caugonnet:stf_c_api_places on Jun 8, 2026


@caugonnet

Contributor

Description

Extracted from #5315 (STF Python bindings) to keep that PR reviewable. This PR contains only the places-layer additions to the experimental STF C API; it is independent of the companion stackable-contexts PR and can be reviewed/merged on its own.

Adds C bindings that mirror the C++ places layer:

  • green_context_helper (create/destroy/count/device id) and green-context exec_place / data_place factories (CUDA 12.4+).
  • exec_place scope enter/exit (RAII context activation), affine data_place accessor, and grid sub-place accessor (get_place).
  • data_place stream-ordered allocate/deallocate and an allocation_is_stream_ordered query, plus machine_init.
  • task grid accessors: get_grid_dims and get_custream_at_index.
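The "scope enter/exit (RAII context activation)" item can be read as a C++ RAII guard that is heap-allocated behind an opaque C handle: enter saves the current state and activates a place, and exit deletes the guard, whose destructor restores the saved state. The sketch below is only an illustration of that pattern under stated assumptions: `scope_enter`, `scope_exit`, `place_scope`, and the global `g_active_place` are hypothetical names, and a plain integer stands in for the CUDA current context so it runs without a GPU.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for "the currently active execution context".
// The real bindings save/restore the CUDA current context instead.
static int g_active_place = -1;

namespace sketch {
// RAII guard: activates a place on construction, restores the previous one on destruction.
class place_scope {
public:
  explicit place_scope(int place) : saved_(g_active_place) { g_active_place = place; }
  ~place_scope() { g_active_place = saved_; }
  place_scope(const place_scope&) = delete;
  place_scope& operator=(const place_scope&) = delete;

private:
  int saved_; // state to restore when the scope exits
};
} // namespace sketch

extern "C" {
// Hypothetical opaque handle type exposed to C callers.
typedef struct scope_opaque_t* scope_handle;

scope_handle scope_enter(int place)
{
  if (place < 0) { return nullptr; } // reject an invalid place instead of activating it
  // nothrow new: an allocation failure yields NULL rather than an exception crossing extern "C"
  return reinterpret_cast<scope_handle>(new (std::nothrow) sketch::place_scope(place));
}

void scope_exit(scope_handle h)
{
  // delete on a null pointer is a no-op, so scope_exit(NULL) is harmless
  delete reinterpret_cast<sketch::place_scope*>(h);
}
}
```

Nested scopes must be exited in reverse (LIFO) order for the saved states to unwind correctly, exactly as with nested C++ guards.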

Checklist

  • New or existing tests cover these changes (test_places.cpp).
  • The documentation is up to date with these changes (Doxygen comments in the header).

@caugonnet caugonnet requested a review from a team as a code owner June 3, 2026 09:19
@caugonnet caugonnet requested a review from NaderAlAwar June 3, 2026 09:19
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 3, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 3, 2026

Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 3, 2026
@caugonnet caugonnet self-assigned this Jun 3, 2026
@caugonnet caugonnet added the stf Sequential Task Flow programming model label Jun 3, 2026
@caugonnet

Contributor Author

/ok to test 7109fdb

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Contributor


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.



Walkthrough

Adds STF C API declarations and implementations for green-context helper lifecycle, exec-place scope enter/exit and affine data access, green-context-backed exec/data-place factories, data-place allocate/deallocate and stream-order query, task grid-dim and per-index CUstream accessors, and corresponding tests.

Changes

STF C API Extensions

| Layer | File(s) | Summary |
|---|---|---|
| Opaque handle types and conversion infrastructure | c/experimental/stf/include/cccl/c/experimental/stf/stf.h, c/experimental/stf/src/stf.cu | New stf_green_context_helper_handle and stf_exec_place_scope_handle typedefs and corresponding to_opaque/from_opaque_const conversions. |
| Green context helper lifecycle | c/experimental/stf/include/cccl/c/experimental/stf/stf.h, c/experimental/stf/src/stf.cu | APIs to create/destroy a green-context helper and to query helper count and device id (CTK-gated). |
| Exec-place scope activation and affine data access | c/experimental/stf/include/cccl/c/experimental/stf/stf.h, c/experimental/stf/src/stf.cu | stf_exec_place_scope_enter/exit for sub-place activation with CUDA context save/restore, and stf_exec_place_get_affine_data_place. |
| Exec and data place factories (green ctx) | c/experimental/stf/include/cccl/c/experimental/stf/stf.h, c/experimental/stf/src/stf.cu | stf_exec_place_get_place, stf_exec_place_green_ctx, and stf_data_place_green_ctx to obtain places by index or from a green-context helper; return nullptr when the CTK is unavailable or the index is out of bounds. |
| Data-place memory management and utilities | c/experimental/stf/include/cccl/c/experimental/stf/stf.h, c/experimental/stf/src/stf.cu | stf_data_place_allocate / stf_data_place_deallocate with size and cudaStream parameters, stf_data_place_allocation_is_stream_ordered, and idempotent stf_machine_init(); allocation catches exceptions and logs to stderr. |
| Task grid introspection and per-grid stream access | cudax/include/cuda/experimental/__stf/internal/context.cuh, c/experimental/stf/include/cccl/c/experimental/stf/stf.h, c/experimental/stf/src/stf.cu | stf_task_get_grid_dims to read the exec-place grid shape and stf_task_get_custream_at_index to retrieve a CUstream for a grid index; internal unified_task helpers added to fetch per-place CUDA streams and grid dims. |
| Tests for grid introspection, scope, and allocations | c/experimental/stf/test/test_places.cpp | Added and refactored Catch2 tests validating grid-dimension reporting, per-index CUstream retrieval, exec-place scope behavior, exec-place accessors, idempotent machine init, device/host/managed allocation roundtrips, stream-order property checks, and invalid-allocation cases. |
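The "opaque handle types and conversion infrastructure" row describes a common C-binding pattern: each C handle is a pointer to a distinct incomplete struct, and small helpers convert between that handle and the underlying C++ object. The following is a minimal sketch under assumptions; `helper_handle`, `helper_create`, `helper_get_count`, `helper_destroy`, and the `green_context_helper` struct are hypothetical stand-ins, not the actual STF declarations, and only the `to_opaque` / `from_opaque_const` naming is borrowed from the summary above.

```cpp
#include <cassert>
#include <new>

namespace sketch {
// Hypothetical stand-in for a C++ object owned by the library.
struct green_context_helper {
  int device_id;
  int count;
};
} // namespace sketch

// Each handle points to its own incomplete struct, so C callers cannot
// silently pass one handle type where another is expected.
typedef struct helper_opaque_t* helper_handle;

// Hypothetical conversion helpers in the spirit of to_opaque / from_opaque_const.
inline helper_handle to_opaque(sketch::green_context_helper* p)
{
  return reinterpret_cast<helper_handle>(p);
}
inline const sketch::green_context_helper* from_opaque_const(helper_handle h)
{
  return reinterpret_cast<const sketch::green_context_helper*>(h);
}
inline sketch::green_context_helper* from_opaque(helper_handle h)
{
  return reinterpret_cast<sketch::green_context_helper*>(h);
}

extern "C" {
helper_handle helper_create(int device_id, int count)
{
  // nothrow new returns NULL on failure instead of throwing across the C boundary
  return to_opaque(new (std::nothrow) sketch::green_context_helper{device_id, count});
}
int helper_get_count(helper_handle h) { return from_opaque_const(h)->count; }
void helper_destroy(helper_handle h) { delete from_opaque(h); }
}
```

Getters go through the const conversion, while only the destroy path needs a mutable pointer.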

Suggested reviewers

  • NaderAlAwar
  • andralex
  • oleksandr-pavlyk

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

🧹 Nitpick comments (1)
c/experimental/stf/include/cccl/c/experimental/stf/stf.h (1)

159-170: ⚡ Quick win

suggestion: these new public Doxygen blocks stop at the brief text, so the generated C API docs never record the parameter and return contracts for the added entry points. Please add per-parameter annotations and return tags here before merge.

As per coding guidelines, "When a function is documented with Doxygen, it must include: `//! @brief`, `//! @param[in/out/in,out]` for every parameter, and `//! @return` for non-void functions."

Also applies to: 203-246, 270-273
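A doc block that satisfies the quoted guideline might look like the sketch below: a `@brief`, one `@param` with a direction for every parameter, and a `@return` describing the error contract, placed immediately above the declaration with no blank line so Doxygen attaches it to the symbol. `grid_element_count` is a hypothetical function used only to illustrate the layout; it is not part of the STF C API.

```cpp
#include <cassert>
#include <stddef.h>

//! @brief Compute the number of elements in a hypothetical 4-D grid shape.
//! @param[in] dims Array of four extents; must not be NULL.
//! @param[out] out_count Receives the product of the four extents.
//! @return 0 on success, -1 if either pointer is NULL.
extern "C" int grid_element_count(const size_t* dims, size_t* out_count)
{
  if (dims == nullptr || out_count == nullptr) { return -1; }
  *out_count = dims[0] * dims[1] * dims[2] * dims[3];
  return 0;
}
```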


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 300e4c0d-dc3f-41c3-9ea1-d62a420a36a6

📥 Commits

Reviewing files that changed from the base of the PR and between cfe7e26 and 8a89b78.

📒 Files selected for processing (3)
  • c/experimental/stf/include/cccl/c/experimental/stf/stf.h
  • c/experimental/stf/src/stf.cu
  • c/experimental/stf/test/test_places.cpp

Comment thread c/experimental/stf/src/stf.cu
Comment thread c/experimental/stf/src/stf.cu

caugonnet and others added 2 commits June 3, 2026 14:07
Address CodeRabbit review feedback:
- stf_exec_place_scope_enter now rejects out-of-range indices with NULL,
  matching the contract of the neighboring index-based accessors.
- stf_data_place_deallocate catches and maps C++ exceptions instead of
  letting them escape the extern "C" entry point.
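The first fix follows the contract of the neighboring index-based accessors: validate the index before touching the container and report an out-of-range index as NULL. A minimal sketch of that contract, assuming a hypothetical `place_grid` that stores places in a `std::vector`; `grid_get_place` and both structs are illustrative names, not STF code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {
struct place { int id; };                        // hypothetical place
struct place_grid { std::vector<place> places; }; // hypothetical grid of places
} // namespace sketch

// Hypothetical index-based accessor: an out-of-range index yields NULL
// instead of reading past the end of the vector.
const sketch::place* grid_get_place(const sketch::place_grid* g, std::size_t idx)
{
  if (g == nullptr || idx >= g->places.size()) { return nullptr; }
  return &g->places[idx];
}
```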
@caugonnet

Contributor Author

/ok to test fb32b48

Fix modernize-loop-convert clang-tidy errors by iterating the places
array with range-based for loops instead of index-based loops.
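The clang-tidy `modernize-loop-convert` check asks for the index-based loop to be rewritten as a range-based `for`, which also works directly on built-in arrays. A small illustration with a hypothetical `place_stub` type standing in for the test's place handles:

```cpp
#include <cassert>

// Hypothetical stand-in for a place handle stored in a fixed-size array.
struct place_stub { int id; };

int sum_ids()
{
  place_stub places[3] = {{1}, {2}, {3}};
  int total = 0;
  // Before (flagged by modernize-loop-convert):
  //   for (int i = 0; i < 3; ++i) { total += places[i].id; }
  // After: iterate the array elements directly.
  for (const place_stub& p : places)
  {
    total += p.id;
  }
  return total;
}
```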

@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (1)
c/experimental/stf/test/test_places.cpp (1)

234-288: ⚡ Quick win

suggestion: Test coverage gap for stf_task_get_custream_at_index error path. Add a test case verifying that calling stf_task_get_custream_at_index(t, 2, &s) returns non-zero when the grid has only 2 elements (indices 0 and 1). Context snippet from stf.cu:912-927 shows the API returns -1 for out-of-bounds. Pattern at lines 463-475 demonstrates similar bounds testing for stf_exec_place_get_place.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 241742fe-0ed2-4265-92c0-09ddd1364d8a

📥 Commits

Reviewing files that changed from the base of the PR and between 9e2a4b4 and 3d7dd90.

📒 Files selected for processing (1)
  • c/experimental/stf/test/test_places.cpp


Comment thread c/experimental/stf/src/stf.cu
Comment thread c/experimental/stf/src/stf.cu
Comment thread c/experimental/stf/test/test_places.cpp
The places C bindings (stf_task_get_grid_dims / stf_task_get_custream_at_index)
call get_grid_dims(dim4*) and get_stream(size_t) on context::unified_task<>,
but those overloads were never declared on unified_task in this branch, so
stf.cu failed to compile. Add both methods, dispatching the per-place stream
to stream_task<Deps...> and returning nullptr/false for graph tasks or
non-grid exec places.
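The dispatch described in this commit can be pictured as a task wrapper holding either a stream-backed or a graph-backed task, where only the stream-backed alternative can report a grid shape. The sketch below models this with `std::variant` and `std::get_if`; `unified_task`, `stream_task`, `graph_task`, and `dim4` here are simplified hypothetical stand-ins, not the cudax types, and "not a grid" is taken to mean one element or fewer, matching the behavior discussed later in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>

namespace sketch {
// Hypothetical 4-D extent.
struct dim4 {
  std::size_t x, y, z, t;
  std::size_t size() const { return x * y * z * t; }
};

// Hypothetical stream-backed task that knows the shape of its exec place.
struct stream_task { dim4 dims; };
// Hypothetical graph-backed task: no per-place grid information.
struct graph_task {};

struct unified_task {
  std::variant<stream_task, graph_task> payload;

  // Writes the grid shape to *out; returns false for graph tasks
  // and for exec places that are not grids (one element or fewer).
  bool get_grid_dims(dim4* out) const
  {
    const stream_task* st = std::get_if<stream_task>(&payload);
    if (st == nullptr || st->dims.size() <= 1) { return false; }
    *out = st->dims;
    return true;
  }
};
} // namespace sketch
```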
@caugonnet caugonnet requested a review from a team as a code owner June 5, 2026 05:29
@caugonnet caugonnet requested a review from andralex June 5, 2026 05:29
caugonnet and others added 2 commits June 5, 2026 09:13
stream_task::get_stream(size_t) indexes the stream grid without any bounds
check, so stf_task_get_custream_at_index could read past the grid for an
out-of-range index (UB) and returned success for non-grid exec places,
contradicting the documented contract (non-zero on "not a grid" / index out
of range). Guard the linear index in the unified_task<> wrapper: return
nullptr for graph tasks, non-grid exec places, and out-of-range indices.

Add a regression check to the grid test for the out-of-range index case.
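The guard described above amounts to checking both "is this a grid" and "is the linear index in range" before indexing, and keeping the C wrapper's output parameter untouched on failure. A minimal sketch under assumptions: `stream_grid`, `get_stream_at_index`, and the `void*` stream type are hypothetical stand-ins for the per-place stream grid and `CUstream`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {
using stream_t = void*; // hypothetical stand-in for CUstream

// Hypothetical wrapper around a linearized per-place stream grid.
struct stream_grid {
  std::vector<stream_t> streams;

  // nullptr when the place is not a grid (one element or fewer)
  // or the linear index is out of range; never indexes past the end.
  stream_t get_stream(std::size_t idx) const
  {
    if (streams.size() <= 1 || idx >= streams.size()) { return nullptr; }
    return streams[idx];
  }
};

// C-style wrapper: 0 on success, -1 on failure; *out is only written on success.
inline int get_stream_at_index(const stream_grid* g, std::size_t idx, stream_t* out)
{
  if (g == nullptr || out == nullptr) { return -1; }
  stream_t s = g->get_stream(idx);
  if (s == nullptr) { return -1; }
  *out = s;
  return 0;
}
} // namespace sketch
```

The out-of-range assertion in the check below mirrors the regression case the commit adds: index 2 on a two-element grid must fail.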
@caugonnet

Contributor Author

/ok to test 18afc44

Add direct C API coverage for green-context helper and green-context exec/data place factories so the extracted places bindings are self-contained.
@caugonnet

Contributor Author

/ok to test ca80817


@caugonnet caugonnet enabled auto-merge (squash) June 8, 2026 15:43

@andralex andralex left a comment

Contributor


Nice slice — the place layer is exactly the part the C API was missing, and the test coverage here is genuinely strong (grid dims + OOB index + non-grid error path, scope enter/exit incl. nested, get_place scalar/grid/OOB, full green-ctx lifecycle with a clean SKIP when unavailable, and allocate device/host/managed + stream-ordered query + invalid→null). A few things worth a look before merge.

Exception safety at the C boundary

  • stf_machine_init (stf.cu) is the one new entry point that does real work but isn't guarded. Everything else either goes through stf_try_allocate (creators) or is a trivial getter, and allocate/deallocate correctly grow their own try/catch. But machine::instance() on first call does P2P/mempool/topology setup; if anything in there throws (std::bad_alloc, etc.), the exception unwinds across extern "C" → std::terminate/UB. Either wrap it in the same try { ... } catch(...) pattern, or confirm machine init is abort-on-error internally and add a one-line comment saying so.
  • stf_green_context_helper_get_count / get_device_id and the two stf_task_get_* accessors call C++ methods/visitors unguarded. Low risk (they're effectively getters), and it's consistent with the existing getter convention in this file — just calling it out so the choice is deliberate.
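The "same try { ... } catch(...) pattern" suggested for stf_machine_init could look like the following sketch: the C entry point catches everything, logs to stderr, and converts the failure into a return code. The `machine` singleton, the `simulate_failure` parameter, and the int return of `machine_init_sketch` are all hypothetical; the actual return type of stf_machine_init is not shown in this thread.

```cpp
#include <cassert>
#include <cstdio>
#include <exception>
#include <stdexcept>

namespace sketch {
// Hypothetical singleton whose first construction does setup that may throw.
struct machine {
  static machine& instance(bool fail)
  {
    if (fail) { throw std::runtime_error("topology setup failed"); }
    static machine m; // constructed once, on the first successful call
    return m;
  }
};
} // namespace sketch

// Hypothetical C entry point that never lets a C++ exception escape.
// Returns 0 on success, non-zero on failure.
extern "C" int machine_init_sketch(int simulate_failure)
{
  try
  {
    (void) sketch::machine::instance(simulate_failure != 0);
    return 0;
  }
  catch (const std::exception& e)
  {
    std::fprintf(stderr, "machine_init failed: %s\n", e.what());
    return 1;
  }
  catch (...)
  {
    std::fprintf(stderr, "machine_init failed: unknown exception\n");
    return 1;
  }
}
```

Because the function-local static is initialized only once, repeated successful calls are idempotent.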

Null-handling is now inconsistent across the file

The new place/green-ctx functions use _CCCL_ASSERT(h != nullptr, ...) (a no-op in release → null deref on misuse), while the new task accessors do real runtime checks (if (t == nullptr || out == nullptr) return -1;). Both styles are defensible, but mixing them in the same C surface is surprising for non-C++ callers. Suggest standardizing — and the [out]-param accessors' hard checks are the safer default for a C API. (stf_exec_place_scope_exit(nullptr) being a no-op via delete from_opaque(nullptr) is fine and matches the doc. 👍)
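Standardizing on hard runtime checks would mean every accessor validates its handle and any `[out]` parameter in all build modes, rather than relying on an assert that compiles away in release. A minimal sketch of that style; `helper_handle`, `helper_get_device_id`, and `sketch::helper` are hypothetical and do not reproduce the real stf_green_context_helper_get_device_id signature.

```cpp
#include <cassert>

namespace sketch {
struct helper { int device_id; }; // hypothetical underlying C++ object
} // namespace sketch

typedef struct helper_opaque_t* helper_handle; // hypothetical opaque handle

// Runtime checks instead of a debug-only assert: a NULL handle or a NULL
// out-parameter is reported as -1 in both debug and release builds.
extern "C" int helper_get_device_id(helper_handle h, int* out_device)
{
  if (h == nullptr || out_device == nullptr) { return -1; }
  *out_device = reinterpret_cast<const sketch::helper*>(h)->device_id;
  return 0;
}
```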

stf_data_place_allocate / deallocate size types

allocate takes ptrdiff_t size (signed) but deallocate takes size_t size (unsigned) for the same logical quantity. It mirrors the underlying C++ signatures, but on the C surface the asymmetry is a small footgun — consider making both size_t (or documenting why allocate is signed).

Minor

  • stf_task_get_custream_at_index: a NULL out-stream is treated as the error sentinel. CUstream 0 is the legacy default stream, so a legitimate stream-0 would be misreported as failure. STF streams are non-default in practice, so this is fine — but a one-line note would help.
  • get_grid_dims / get_stream(idx) treat size() == 1 as "not a grid" (<= 1), so a deliberately 1-element grid returns false/nullptr. Looks intentional (and the two are consistent), worth a doc note.
  • Doxygen: the two new stf_task_get_* blocks open with a lone //! and have a blank line between the doc block and the declaration — that can detach the comment from the symbol in doxygen.

Questions

  • data_place_allocate_invalid_returns_null relies on affine().allocate() throwing (so the catch returns null). Is that guaranteed in release builds, i.e. it doesn't _CCCL_ASSERT/abort on an unsupported place?
  • task on exec_place_grid destroys composite_dplace right after stf_exec_place_set_affine_data_place(grid, composite_dplace) — confirming set_affine_data_place copies rather than borrows?

Overall this is close; the only thing I'd call blocking is the stf_machine_init boundary question.

@andralex andralex left a comment

Copy link
Copy Markdown
Contributor


lgtm modulo comments

@caugonnet caugonnet disabled auto-merge June 8, 2026 16:09
@caugonnet

Contributor Author

Fixed machine init to have proper exception support, and documented the difference in signedness for the alloc API.

For the questions:

  • 1: data_place::affine() is backed by data_place_affine, whose allocate is an unconditional throw, not a _CCCL_ASSERT or abort.
  • 2: Yes, it copies; destroying composite_dplace right after the call is safe. The setter takes its argument by value and moves it into the member.
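The by-value-then-move setter described in answer 2 is why the caller may destroy its argument immediately: the callee owns an independent copy. A minimal sketch with hypothetical `exec_place` and `data_place` structs (a `std::string` stands in for the place's state):

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace sketch {
// Hypothetical stand-in for data_place.
struct data_place { std::string name; };

// Hypothetical stand-in for an exec place grid with an affine data place.
struct exec_place {
  data_place affine;
  // By-value parameter: the caller's object is copied into p,
  // then p is moved into the member, so no reference is retained.
  void set_affine_data_place(data_place p) { affine = std::move(p); }
};
} // namespace sketch
```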

andralex and others added 3 commits June 8, 2026 12:18
machine::instance() does real work on first call (P2P/mempool/topology
setup) and can throw. Wrap it in try/catch so a C++ exception never
unwinds across the extern "C" boundary into a C caller (UB / terminate),
matching the error-reporting convention used by stf_try_allocate.
stf_data_place_allocate takes a signed ptrdiff_t while stf_data_place_deallocate
takes an unsigned size_t. This mirrors the C++ allocator interface, where the
requested size is passed by reference and negated to signal allocation failure;
deallocation has no such error to signal. Document the asymmetry on both entry
points so the C surface explains why the types differ.
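The convention this commit documents (size passed by reference and negated on failure) is what forces the signed type on the allocate side, while deallocate has nothing to report. A minimal sketch of that convention; `sketch::allocate`, `sketch::deallocate`, and the 1 MiB `kCapacity` limit are hypothetical and only simulate a bounded allocator with `malloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch {
constexpr std::ptrdiff_t kCapacity = 1 << 20; // hypothetical pool limit (1 MiB)

// A positive request that fails comes back negated, so the size must be signed.
void* allocate(std::ptrdiff_t& size)
{
  if (size <= 0 || size > kCapacity) { size = -size; return nullptr; }
  void* p = std::malloc(static_cast<std::size_t>(size));
  if (p == nullptr) { size = -size; }
  return p;
}

// Deallocation has no error to signal, so an unsigned size suffices.
void deallocate(void* p, std::size_t /*size*/) { std::free(p); }
} // namespace sketch
```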
@andralex andralex enabled auto-merge (squash) June 8, 2026 16:19
The stf_task_get_grid_dims / stf_task_get_custream_at_index doc blocks
opened with a lone //! line and had a blank line between the comment and
the declaration, which can detach the comment from the symbol in doxygen.
Drop the leading empty //! and the trailing blank line so each block binds
to its function.
@andralex

andralex commented Jun 8, 2026

Contributor

/ok to test 55e0e84

Note that stf_task_get_grid_dims treats a single-element exec place as
"not a grid" (returns non-zero), and that stf_task_get_custream_at_index
leaves out_stream untouched on failure and never yields the legacy
default stream (CUstream 0) on success, since STF grids use non-default
streams.
@caugonnet

Copy link
Copy Markdown
Contributor Author

/ok to test ab35eae

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

🥳 CI Workflow Results

🟩 Finished in 36m 01s: Pass: 100%/59 | Total: 4h 23m | Max: 36m 00s | Hits: 99%/34772

See results here.

@andralex andralex merged commit b29b61a into NVIDIA:main Jun 8, 2026
79 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL Jun 8, 2026

Labels

stf Sequential Task Flow programming model

Projects

Archived in project


3 participants