cudax/stf: migrate stackable/ from cuda_safe_call to cuda_try#9165
cudax/stf: migrate stackable/ from cuda_safe_call to cuda_try#9165andralex wants to merge 1 commit into
Conversation
In stackable_ctx_impl.cuh, replace cuda_safe_call with cuda_try in the
graph_ctx_node constructor and finalize() so CUDA errors are reported as
exceptions rather than aborting the process.
The constructor builds a CUDA graph in stages, so add transactional
cleanup:
- In the nested non-conditional branch, the freshly created
dummy_graph is destroyed intentionally mid-block. Guard it with a
SCOPE(fail) that frees it only while dummy_graph_owned is true, and
disarm the flag right after the intentional destroy.
- The outer `graph` is owned by us only in the non-nested case (in the
nested cases it is either parent_graph or a child of parent_graph,
both owned upstream). A SCOPE(fail) destroys it on early throw and
is disarmed the instant graph_ctx adopts it via
`auto gctx = graph_ctx(sub_graph, ...);`, matching graph_ctx's
documented ownership contract ("User code is not supposed to destroy
the graph later").
- The conditional handle (cudaGraphConditionalHandleCreate) and any
nodes added to `graph` (cudaGraphAddNode, cudaGraphAddKernelNode)
are implicitly cleaned up by the outer SCOPE(fail) destroying
`graph`.
Two residual hazards are intentionally documented inline rather than
fixed in this commit:
- cudaGraphAddChildGraphNode leaves an orphaned child node inside
parent_graph on later throw; cleanly removing it would need
cudaGraphDestroyNode and dependency rewiring.
- cudaGraphConditionalHandleCreate writes a handle into a caller-owned
pointer; CUDA has no destroy API for conditional handles, so on
throw the handle is left invalid (its backing graph is destroyed).
Both are no worse than the prior behavior (which aborted).
The four cuda_safe_call sites in finalize() (cudaGraphAddDependencies
on both CTK branches, cudaGraphDebugDotPrint, cudaGraphLaunch) become
plain cuda_try; no resource rollback applies.
The two cuda_safe_call sites inside the new SCOPE(fail) bodies are
intentional: SCOPE bodies are noexcept, so cuda_safe_call is the
correct tool there.
In stackable_ctx.cuh, the two cuda_safe_call sites inside
UNITTEST host-task lambdas are kept and annotated. Those lambdas
are dispatched by the STF host-task path, whose exception-safety has
not been audited, so an abort remains safer than an unannotated throw.
📝 WalkthroughSummary by CodeRabbit
WalkthroughCUDA Graph API error handling in stackable graph context construction, finalization, and node wiring is refactored to use ChangesStackable graph context error handling
Possibly related PRs
Suggested labels
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh (1)
40-40: 💤 Low valuesuggestion: Include uses quote syntax instead of angle brackets per coding guideline ("All header inclusions must use angle bracket syntax"). However, this matches the existing pattern in lines 33-39, so fixing would be a file-wide refactor.
-#include "cuda/experimental/__stf/utility/scope_guard.cuh" +#include <cuda/experimental/__stf/utility/scope_guard.cuh>As per coding guidelines: "All header inclusions must use angle bracket syntax, e.g.,
"
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f5eedaee-c34f-4310-b104-9770f494cea2
📒 Files selected for processing (2)
cudax/include/cuda/experimental/__stf/stackable/stackable_ctx.cuhcudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh
|
/ok to test 1f791e5 |
🥳 CI Workflow Results🟩 Finished in 1h 31m: Pass: 100%/55 | Total: 1d 14h | Max: 1h 31m | Hits: 12%/296793See results here. |
Summary
Migration PR3 of the
cuda_safe_call→cuda_tryrollout forcudax/__stf/. Targetscudax/include/cuda/experimental/__stf/stackable/(15 sites). 13 sites converted, 2 KEEPs documented.Companion to PR #9146 (allocators) and PR #9150 (utility).
Changes
stackable_ctx_impl.cuh—graph_ctx_nodeconstructor +finalize()13
cuda_safe_callsites →cuda_try. The constructor builds a CUDA graph in stages, so transactional cleanup is added:Nested non-conditional branch (453, 457, 460, 465). The freshly created
dummy_graphis destroyed intentionally mid-block. ASCOPE(fail)guarded bydummy_graph_ownedfrees it on early throw; the flag is disarmed right after the intentional destroy.Outer
graph(472). Owned by us only in the non-nested case; in nested casesgraphis eitherparent_graphor a child ofparent_graph(both owned upstream). ASCOPE(fail)destroys it on early throw, gated bybool graph_owned_by_us = !nested_graph;. The flag is disarmed the instantgraph_ctxadopts it viaauto gctx = graph_ctx(sub_graph, ...);— matchinggraph_ctx's documented ownership contract:Conditional branch (483, 496, 498, 518).
cudaGraphConditionalHandleCreate, the two CTK variants ofcudaGraphAddNode, andcudaGraphAddKernelNode. The handle and any added nodes live insidegraph, so they are implicitly cleaned up by the outerSCOPE(fail).finalize()(587, 590, 606, 616).cudaGraphAddDependencies(both CTK branches),cudaGraphDebugDotPrint,cudaGraphLaunch. Straightcuda_tryconversion; no rollback applies.Two
cuda_safe_calls intentionally remain inSCOPE(fail)bodiesLines 463 and 506.
SCOPEbodies arenoexcept, socuda_safe_call(abort-on-failure) is the correct tool there.stackable_ctx.cuh— 2 KEEPs in test fixturesThe two
cuda_safe_call(cudaStreamSynchronize(stream))calls insideUNITTESTlambdas passed totask(exec_place::host(), ...)->*lambdaare kept and annotated. The host-task dispatch path's exception safety has not been audited, so an abort there remains safer than an unannotated throw escaping into the runtime.Residual hazards intentionally documented inline
Both are no worse than the prior behavior (which aborted the entire process). Both are deferred:
Orphaned child node in
parent_graphif acuda_trythrows aftercudaGraphAddChildGraphNode. Clean removal would needcudaGraphDestroyNodeplus dependency rewiring. Harmless untilparent_graphis destroyed.Stale conditional handle in
*config.conditional_handleif acuda_trythrows aftercudaGraphConditionalHandleCreate. CUDA has no destroy API for conditional handles (they are tied to their graph, which theSCOPE(fail)destroys). Caller must treat the handle as invalid in the catch-block.Test plan