[STF] Properly destroy CUDA streams and do not try to initialize CUDA while capturing#8919
Conversation
Destroy pool-owned streams with the stream pool and initialize the CUDA runtime only once so consecutive stream_ctx instances on a caller stream serialize without explicit synchronization.
Describe the runtime initialization invariant without relying on implementation history.
Skip CUDA runtime initialization when constructing a stream_ctx from an already-capturing user stream; the stream itself implies CUDA is initialized, and normal contexts still initialize before issuing work.
|
/ok to test 7059d3b |
This comment has been minimized.
This comment has been minimized.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthroughimportant: WalkthroughRuntime initialization is now conditional when a user stream is in capture; stream_ctx detects capture and forwards an initialize flag to backend impl. stream_pool records external ownership and avoids destroying user streams. A new STf test exercises back-to-back stream_ctx ordering on a caller stream. ChangesStream Capture and Ownership Management
important: Suggested labels
important: Suggested reviewers
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
cudax/test/stf/local_stf/stream_ctx_lifetime_btb.cu (1)
66-66: ⚡ Quick winsuggestion: add a compile-time guard for the partitioning assumption (
N % CHAIN_COUNT == 0, and optionallyN >= CHAIN_COUNT) so this test cannot silently skip tail elements if constants change later.cudax/include/cuda/experimental/__places/stream_pool.cuh (1)
171-186: 💤 Low valuesuggestion: The destructor silently discards
cudaStreamDestroyerrors. While destructors must not throw, failures during stream cleanup (e.g.,cudaErrorInvalidValue,cudaErrorCudartUnloading) may indicate resource leaks or runtime shutdown races. Consider logging the error or documenting why it's safe to ignore.Additionally, mark the destructor
noexceptto make the no-throw contract explicit.Proposed addition
- ~impl() + ~impl() noexcept { if (externally_owned) { return; } for (auto& ds : payload) { if (ds.stream != nullptr) { - [[maybe_unused]] cudaError_t err = cudaStreamDestroy(ds.stream); + cudaError_t err = cudaStreamDestroy(ds.stream); + // Errors during shutdown (e.g., cudaErrorCudartUnloading) are benign. + // Log or assert in debug builds if needed. + (void)err; ds.stream = nullptr; } } }cudax/include/cuda/experimental/__stf/stream/stream_ctx.cuh (1)
180-186: 💤 Low valuesuggestion:
is_capturingdoes not explicitly handleuser_stream == nullptr. WhilecudaStreamIsCapturingaccepts a NULL stream and returnscudaSuccesswithcudaStreamCaptureStatusNone, the intent should be documented. If a null stream is never expected here, consider adding an assertion; otherwise, document that null streams are treated as not-capturing.Proposed clarification
[[nodiscard]] static bool is_capturing(cudaStream_t user_stream) { + // Null stream (legacy default stream) is never capturing. cudaStreamCaptureStatus capture_status = cudaStreamCaptureStatusNone; cuda_safe_call(cudaStreamIsCapturing(user_stream, &capture_status)); return capture_status != cudaStreamCaptureStatusNone; }
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0e8be004-fe22-4a85-b2d5-61d8fc3b76fb
📒 Files selected for processing (5)
cudax/include/cuda/experimental/__places/stream_pool.cuhcudax/include/cuda/experimental/__stf/internal/backend_ctx.cuhcudax/include/cuda/experimental/__stf/stream/stream_ctx.cuhcudax/test/stf/CMakeLists.txtcudax/test/stf/local_stf/stream_ctx_lifetime_btb.cu
|
/ok to test 543742a |
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Andrei Alexandrescu <andrei@erdani.com>
Co-authored-by: Andrei Alexandrescu <andrei@erdani.com>
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 809dabdd-1172-4444-833d-ec721f461569
📒 Files selected for processing (4)
cudax/include/cuda/experimental/__stf/internal/backend_ctx.cuhcudax/include/cuda/experimental/__stf/stream/stream_ctx.cuhcudax/test/stf/CMakeLists.txtcudax/test/stf/local_stf/stream_ctx_lifetime_btb.cu
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cudax/include/cuda/experimental/__places/stream_pool.cuh (1)
167-182:⚠️ Potential issue | 🟠 Major | ⚡ Quick winimportant: Destructor must be marked
noexcept.The destructor ignores CUDA errors and doesn't throw, so per coding guidelines it must use
noexcept.Proposed fix
- ~impl() + ~impl() noexcept {As per coding guidelines: All functions that don't throw exceptions must use
noexcept.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 01f4dc4e-26ea-4e5f-903f-ace5e4f0d3fd
📒 Files selected for processing (1)
cudax/include/cuda/experimental/__places/stream_pool.cuh
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cudax/include/cuda/experimental/__places/stream_pool.cuh (1)
167-178:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winimportant: Align destructor declaration and CUDA free-function call with cudax header rules. At Line 167,
~impl() noexceptis missing a_CCCL_*API annotation, and at Line 178,cudaStreamDestroyshould be called as::cudaStreamDestroy(...)to satisfy the global qualification rule.As per coding guidelines: "All functions must be marked with
_CCCL_HOST_API,_CCCL_DEVICE_API, or_CCCL_API" and "All calls to free functions must be fully qualified from the global namespace."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8afb961b-1b67-42f1-af61-908fd897af82
📒 Files selected for processing (1)
cudax/include/cuda/experimental/__places/stream_pool.cuh
|
/ok to test ae25afe |
This comment has been minimized.
This comment has been minimized.
|
/ok to test 9de92be |
|
Need to be investigated |
🥳 CI Workflow Results🟩 Finished in 33m 32s: Pass: 100%/55 | Total: 4h 50m | Max: 33m 32s | Hits: 100%/32746See results here. |
Destroy pool-owned streams with the stream pool and initialize the CUDA runtime only once so consecutive stream_ctx instances on a caller stream serialize without explicit synchronization.
Description
closes
Checklist