[STF] Add C bindings for the places layer#9232
Conversation
Extends the experimental STF C API to mirror the C++ places layer: - green_context_helper (create/destroy/count/device id) and green-context exec_place / data_place factories (CUDA 12.4+). - exec_place scope enter/exit (RAII context activation), affine data_place accessor, and grid sub-place accessor (get_place). - data_place stream-ordered allocate/deallocate and an allocation_is_stream_ordered query, plus machine_init. - task grid accessors: get_grid_dims and get_custream_at_index. Adds coverage in test_places.cpp. Extracted from the python-bindings PR to keep that change reviewable.
|
/ok to test 7109fdb |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
suggestion: WalkthroughAdds STF C API declarations and implementations for green-context helper lifecycle, exec-place scope enter/exit and affine data access, green-context-backed exec/data-place factories, data-place allocate/deallocate and stream-order query, task grid-dim and per-index CUstream accessors, and corresponding tests. ChangesSTF C API Extensions
important: Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
c/experimental/stf/include/cccl/c/experimental/stf/stf.h (1)
159-170: ⚡ Quick winsuggestion: these new public Doxygen blocks stop at the brief text, so the generated C API docs never record the parameter and return contracts for the added entry points. Please add per-parameter annotations and return tags here before merge.
As per coding guidelines, "When a function is documented with Doxygen, it must include:
//!@brief, `//! `@param`[in/out/in,out]` for every parameter, and `//! `@returnfor non-void functions."Also applies to: 203-246, 270-273
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 300e4c0d-dc3f-41c3-9ea1-d62a420a36a6
📒 Files selected for processing (3)
c/experimental/stf/include/cccl/c/experimental/stf/stf.hc/experimental/stf/src/stf.cuc/experimental/stf/test/test_places.cpp
This comment has been minimized.
This comment has been minimized.
Address CodeRabbit review feedback: - stf_exec_place_scope_enter now rejects out-of-range indices with NULL, matching the contract of the neighboring index-based accessors. - stf_data_place_deallocate catches and maps C++ exceptions instead of letting them escape the extern "C" entry point.
|
/ok to test fb32b48 |
Fix modernize-loop-convert clang-tidy errors by iterating the places array with range-based for loops instead of index-based loops.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
c/experimental/stf/test/test_places.cpp (1)
234-288: ⚡ Quick winsuggestion: Test coverage gap for
stf_task_get_custream_at_indexerror path. Add a test case verifying that callingstf_task_get_custream_at_index(t, 2, &s)returns non-zero when the grid has only 2 elements (indices 0 and 1). Context snippet from stf.cu:912-927 shows the API returns -1 for out-of-bounds. Pattern at lines 463-475 demonstrates similar bounds testing forstf_exec_place_get_place.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 241742fe-0ed2-4265-92c0-09ddd1364d8a
📒 Files selected for processing (1)
c/experimental/stf/test/test_places.cpp
This comment has been minimized.
This comment has been minimized.
The places C bindings (stf_task_get_grid_dims / stf_task_get_custream_at_index) call get_grid_dims(dim4*) and get_stream(size_t) on context::unified_task<>, but those overloads were never declared on unified_task in this branch, so stf.cu failed to compile. Add both methods, dispatching the per-place stream to stream_task<Deps...> and returning nullptr/false for graph tasks or non-grid exec places.
stream_task::get_stream(size_t) indexes the stream grid without any bounds check, so stf_task_get_custream_at_index could read past the grid for an out-of-range index (UB) and returned success for non-grid exec places, contradicting the documented contract (non-zero on "not a grid" / index out of range). Guard the linear index in the unified_task<> wrapper: return nullptr for graph tasks, non-grid exec places, and out-of-range indices. Add a regression check to the grid test for the out-of-range index case.
|
/ok to test 18afc44 |
Add direct C API coverage for green-context helper and green-context exec/data place factories so the extracted places bindings are self-contained.
|
/ok to test ca80817 |
This comment has been minimized.
This comment has been minimized.
andralex
left a comment
There was a problem hiding this comment.
Nice slice — the place layer is exactly the part the C API was missing, and the test coverage here is genuinely strong (grid dims + OOB index + non-grid error path, scope enter/exit incl. nested, get_place scalar/grid/OOB, full green-ctx lifecycle with a clean SKIP when unavailable, and allocate device/host/managed + stream-ordered query + invalid→null). A few things worth a look before merge.
Exception safety at the C boundary
stf_machine_init(stf.cu) is the one new entry point that does real work but isn't guarded. Everything else either goes throughstf_try_allocate(creators) or is a trivial getter, andallocate/deallocatecorrectly grow their owntry/catch. Butmachine::instance()on first call does P2P/mempool/topology setup; if anything in there throws (std::bad_alloc, etc.) it unwinds acrossextern \"C\"→std::terminate/UB. Either wrap it in the sametry { ... } catch(...)pattern, or confirm machine init is abort-on-error internally and add a one-line comment saying so.stf_green_context_helper_get_count/get_device_idand the twostf_task_get_*accessors call C++ methods/visitors unguarded. Low risk (they're effectively getters), and it's consistent with the existing getter convention in this file — just calling it out so the choice is deliberate.
Null-handling is now inconsistent across the file
The new place/green-ctx functions use _CCCL_ASSERT(h != nullptr, ...) (a no-op in release → null deref on misuse), while the new task accessors do real runtime checks (if (t == nullptr || out == nullptr) return -1;). Both styles are defensible, but mixing them in the same C surface is surprising for non-C++ callers. Suggest standardizing — and the [out]-param accessors' hard checks are the safer default for a C API. (stf_exec_place_scope_exit(nullptr) being a no-op via delete from_opaque(nullptr) is fine and matches the doc. 👍)
stf_data_place_allocate / deallocate size types
allocate takes ptrdiff_t size (signed) but deallocate takes size_t size (unsigned) for the same logical quantity. It mirrors the underlying C++ signatures, but on the C surface the asymmetry is a small footgun — consider making both size_t (or documenting why allocate is signed).
Minor
stf_task_get_custream_at_index: aNULLout-stream is treated as the error sentinel.CUstream0is the legacy default stream, so a legitimate stream-0 would be misreported as failure. STF streams are non-default in practice, so this is fine — but a one-line note would help.get_grid_dims/get_stream(idx)treatsize() == 1as "not a grid" (<= 1), so a deliberately 1-element grid returns false/nullptr. Looks intentional (and the two are consistent), worth a doc note.- Doxygen: the two new
stf_task_get_*blocks open with a lone//!and have a blank line between the doc block and the declaration — that can detach the comment from the symbol in doxygen.
Questions
data_place_allocate_invalid_returns_nullrelies onaffine().allocate()throwing (so thecatchreturns null). Is that guaranteed in release builds, i.e. it doesn't_CCCL_ASSERT/abort on an unsupported place?task on exec_place_griddestroyscomposite_dplaceright afterstf_exec_place_set_affine_data_place(grid, composite_dplace)— confirmingset_affine_data_placecopies rather than borrows?
Overall this is close; the only thing I'd call blocking is the stf_machine_init boundary question.
|
Fix machine init to have proper exception support, and documented the difference in signness for alloc API. For questions :
|
machine::instance() does real work on first call (P2P/mempool/topology setup) and can throw. Wrap it in try/catch so a C++ exception never unwinds across the extern "C" boundary into a C caller (UB / terminate), matching the error-reporting convention used by stf_try_allocate.
stf_data_place_allocate takes a signed ptrdiff_t while stf_data_place_deallocate takes an unsigned size_t. This mirrors the C++ allocator interface, where the requested size is passed by reference and negated to signal allocation failure; deallocation has no such error to signal. Document the asymmetry on both entry points so the C surface explains why the types differ.
The stf_task_get_grid_dims / stf_task_get_custream_at_index doc blocks opened with a lone //! line and had a blank line between the comment and the declaration, which can detach the comment from the symbol in doxygen. Drop the leading empty //! and the trailing blank line so each block binds to its function.
|
/ok to test 55e0e84 |
Note that stf_task_get_grid_dims treats a single-element exec place as "not a grid" (returns non-zero), and that stf_task_get_custream_at_index leaves out_stream untouched on failure and never yields the legacy default stream (CUstream 0) on success, since STF grids use non-default streams.
|
/ok to test ab35eae |
🥳 CI Workflow Results🟩 Finished in 36m 01s: Pass: 100%/59 | Total: 4h 23m | Max: 36m 00s | Hits: 99%/34772See results here. |
Description
Extracted from #5315 (STF Python bindings) to keep that PR reviewable. This PR contains only the places-layer additions to the experimental STF C API; it is independent of the companion stackable-contexts PR and can be reviewed/merged on its own.
Adds C bindings that mirror the C++
placeslayer:green_context_helper(create/destroy/count/device id) and green-contextexec_place/data_placefactories (CUDA 12.4+).exec_placescope enter/exit (RAII context activation), affinedata_placeaccessor, and grid sub-place accessor (get_place).data_placestream-ordered allocate/deallocate and anallocation_is_stream_orderedquery, plusmachine_init.get_grid_dimsandget_custream_at_index.Checklist
test_places.cpp).