Skip to content

[STF] Migrate __stf/allocators/ from cuda_safe_call to cuda_try#9146

Closed
andralex wants to merge 1 commit into
NVIDIA:mainfrom
andralex:andralex/stf-cuda-try-allocators
Closed

[STF] Migrate __stf/allocators/ from cuda_safe_call to cuda_try#9146
andralex wants to merge 1 commit into
NVIDIA:mainfrom
andralex:andralex/stf-cuda-try-allocators

Conversation

@andralex
Copy link
Copy Markdown
Contributor

Summary

First PR in a series migrating production STF headers off the abort-on-failure cuda_safe_call and onto the throw-on-failure cuda_try, so callers (including Python wrappers and any wider exception-aware control flow) have the option to recover from CUDA errors instead of having the process aborted underneath them.

Scope here is intentionally tiny -- the entire cudax/include/cuda/experimental/__stf/allocators/ subtree, two call sites -- so the migration pattern can be reviewed before it scales up.

Changes

pooled_allocator.cuh

Plain cudaGetDeviceProperties query inside the constructor. No CUDA state in flight, no rollback needed -- straight cuda_safe_call -> cuda_try substitution.

adapters.cuh

stream_adapter::clear() synchronizes the stream before any blocking deallocations. With cuda_try, that path can now throw. The existing ~stream_adapter() sanity assertion (cleared_or_moved must be true at destruction) would then fire spuriously and report "clear() was not called" -- which is misleading: clear() was called, it just failed mid-way.

Guarded the bool flip with SCOPE(exit) at the top of clear() so the destructor's contract holds whether or not the synchronization throws. Explicit scope_guard.cuh include added rather than relying on transitive inclusion.

void clear()
{
  _CCCL_ASSERT(adapter_state, "Invalid state");
  _CCCL_ASSERT(!cleared_or_moved, "clear() was already called, or the object was moved.");

  SCOPE(exit) { cleared_or_moved = true; };

  // ... cuda_try(cudaStreamSynchronize(stream)); ...
}

The SCOPE(exit) body only writes a bool, which is noexcept -- no risk of std::terminate during unwinding.

Migration pattern notes (for the follow-ups)

  • SAFE: pure queries with no in-flight CUDA state -> direct cuda_try substitution.
  • GUARDED: operations that need an undo step on failure -> cuda_try + SCOPE(fail) for the rollback. Inside the SCOPE(fail) body, use cuda_safe_call, not cuda_try -- guard destructors are noexcept, so a thrown exception during unwinding would std::terminate.
  • KEEP: destructors and CUDA host callbacks remain cuda_safe_call. Same rationale -- those contexts are noexcept.

Test plan

  • CI green (/ok to test once branch is pushed -- see comment below)
  • No behavioral change for the success path
  • Confirmed no @code doxygen blocks in either file reference cuda_safe_call, so docs do not drift

First PR in a series migrating production STF headers away from the
abort-on-failure ``cuda_safe_call`` toward the throw-on-failure
``cuda_try``, so callers (including Python wrappers and any wider
exception-aware control flow) have the option to recover from CUDA
errors instead of having the process aborted underneath them.

Two sites in ``__stf/allocators/``:

- ``pooled_allocator.cuh``: a plain ``cudaGetDeviceProperties`` query
  inside the constructor. No CUDA state in flight, no rollback needed --
  straight substitution.

- ``adapters.cuh``: ``stream_adapter::clear()`` synchronizes the stream
  before performing blocking deallocations. Substituting ``cuda_try``
  means that path can now throw; the existing ``~stream_adapter()``
  sanity assertion (``cleared_or_moved`` must be true) would then fire
  spuriously and report "clear() was not called" -- which is misleading,
  because clear() *was* called, it just failed mid-way. Guard the bool
  flip with ``SCOPE(exit)`` at the top of ``clear()`` so the destructor's
  contract holds whether or not the synchronization throws, and pull in
  ``scope_guard.cuh`` explicitly rather than relying on transitive
  include.

Pilot PR for the broader migration -- intentionally small so the
conversion pattern (including SCOPE-body usage of ``cuda_safe_call`` for
the rollback step in future PRs) can be reviewed before scaling up.
@andralex andralex requested a review from a team as a code owner May 27, 2026 22:03
@andralex andralex requested a review from caugonnet May 27, 2026 22:03
@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot Bot commented May 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 27, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 27, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Improved exception safety in memory allocation operations to prevent resource leaks during CUDA operations.
    • Enhanced error handling approach for device configuration queries to increase overall system robustness.

Walkthrough

This PR improves exception-safety and error-handling consistency in STF allocators. stream_adapter::clear() now uses scope guards to guarantee the destructor precondition is met even when CUDA operations throw, and switches stream synchronization from cuda_safe_call to cuda_try. Similarly, block_data_pool switches device-properties queries to cuda_try.

Changes

STF Allocator Robustness

Layer / File(s) Summary
stream_adapter::clear() exception-safety and error-handling
cudax/include/cuda/experimental/__stf/allocators/adapters.cuh
Includes scope_guard header and adds SCOPE(exit) guard in clear() to set cleared_or_moved = true on all exit paths; replaces cuda_safe_call with cuda_try for stream synchronization before blocking deallocations.
block_data_pool error-handling consistency
cudax/include/cuda/experimental/__stf/allocators/pooled_allocator.cuh
Switches cudaGetDeviceProperties from cuda_safe_call to cuda_try wrapper in pool constructor.

Possibly related PRs

  • NVIDIA/cccl#8891: Implements enhanced cuda_try output-parameter inference that this PR depends on for cudaGetDeviceProperties calls.

Suggested labels

stf

Suggested reviewers

  • caugonnet
  • alliepiper
  • bernhardmgruber

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 963e43f7-56ce-479c-8aca-f1922db75a36

📥 Commits

Reviewing files that changed from the base of the PR and between d2d87ed and 0f22512.

📒 Files selected for processing (2)
  • cudax/include/cuda/experimental/__stf/allocators/adapters.cuh
  • cudax/include/cuda/experimental/__stf/allocators/pooled_allocator.cuh

Comment on lines +174 to +182
// Flip the bool up-front so the destructor's "clear() was not called"
// sanity assertion still holds even if a CUDA call below throws and the
// caller catches the exception. From the user's contract perspective,
// ``clear()`` *was* called; whether every individual deallocation
// succeeded is communicated via the thrown ``cuda_exception``.
SCOPE(exit)
{
cleared_or_moved = true;
};
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

critical: Do not mark clear() as completed when it exits via exception.

If cuda_try throws on Line 195, adapter_state->to_free is still populated, but the SCOPE(exit) on Lines 179-182 sets cleared_or_moved = true anyway. That makes a retry hit the precondition on Line 172 and lets the destructor stop asserting even though buffers were never released, so this exception path becomes a silent leak. Only flip the state after all frees succeed, or split this into separate "clear attempted" and "clear completed" flags.

Also applies to: 195-195

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andralex Would not that go through SCOPE(fail) instead of SCOPE(exit) ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@andralex
Copy link
Copy Markdown
Contributor Author

/ok to test 0f22512

@andralex
Copy link
Copy Markdown
Contributor Author

Closing to let the author review the diff locally before pushing. Will reopen (or a successor PR) once approved.

@andralex andralex closed this May 27, 2026
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL May 27, 2026
@andralex andralex deleted the andralex/stf-cuda-try-allocators branch May 27, 2026 22:17
@github-actions
Copy link
Copy Markdown
Contributor

🥳 CI Workflow Results

🟩 Finished in 59m 28s: Pass: 100%/55 | Total: 23h 04m | Max: 59m 27s | Hits: 15%/150797

See results here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

2 participants