fix kernel launch failure when sender expressions can throw #6277

Merged
ericniebler merged 3 commits into NVIDIA:main from
ericniebler:polyfill-exception_ptr-for-device-code
Oct 20, 2025
@ericniebler
Contributor

Description

the following code fails at kernel launch with an "invalid device function" CUDA error:

namespace ex = cuda::experimental::execution;
auto ctx     = ex::stream_context{cuda::experimental::device_ref(0)};
auto gpu     = ctx.get_scheduler();

auto sndr    = ex::on(gpu, ex::just()) | ex::then([]{ return 42; });
auto [value] = ex::sync_wait(std::move(sndr));

however, the following nearly identical code will work:

namespace ex = cuda::experimental::execution;
auto ctx     = ex::stream_context{cuda::experimental::device_ref(0)};
auto gpu     = ctx.get_scheduler();

auto sndr    = ex::on(gpu, ex::just()) | ex::then([]() noexcept { return 42; });
auto [value] = ex::sync_wait(std::move(sndr));

the only difference is that the lambda passed to ex::then is noexcept. when the lambda is not noexcept, the sender has an ex::set_error_t(std::exception_ptr) completion signature, but only in host code.
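
the effect of noexcept on the computed completion signatures can be illustrated with a minimal sketch in plain C++17. it does not use the cudax API: `set_value_t`, `set_error_t`, `completion_signatures`, and `then_completions` below are hypothetical stand-ins in a `demo` namespace, and the real library may compute its signatures quite differently.

```cpp
#include <exception>
#include <type_traits>

namespace demo {
// Hypothetical stand-ins for the completion tags; not the cudax definitions.
struct set_value_t {};
struct set_error_t {};

// A plain type list of completion signatures.
template <class... Sigs>
struct completion_signatures {};

// Signatures a then-like adaptor might compute for a nullary callable Fn:
// a potentially throwing Fn adds an error completion carrying an exception_ptr.
template <class Fn>
using then_completions = std::conditional_t<
    std::is_nothrow_invocable_v<Fn>,
    completion_signatures<set_value_t(std::invoke_result_t<Fn>)>,
    completion_signatures<set_value_t(std::invoke_result_t<Fn>),
                          set_error_t(std::exception_ptr)>>;

// The two lambdas from the description.
inline auto may_throw = [] { return 42; };
inline auto no_throw  = []() noexcept { return 42; };

using throwing_sigs = then_completions<decltype(may_throw)>;
using nothrow_sigs  = then_completions<decltype(no_throw)>;

// The two senders would carry different types, so any template
// instantiated with them (a kernel, for instance) differs as well.
static_assert(!std::is_same_v<throwing_sigs, nothrow_sigs>);
} // namespace demo
```

any template parameterized on these signature lists is a distinct instantiation with a distinct mangled name, which is why the two compilation passes have to agree on them.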

this results in host code and device code computing different intermediate types, and instantiating kernels with those different types. since the mangled names of the kernels differ, the kernel launch fails.

Fix

the problem stems from the fact that cudax::execution uses some nothrow type traits that are unconditionally true in device code (since device code never throws). the fix is to stop defining the nothrow traits differently for device code.
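
as a rough sketch of the pattern described above, the snippet below contrasts a trait that is forced to true in the device pass with one defined identically in both passes. all names (`nothrow_callable_divergent`, `nothrow_callable_uniform`, `opstate`) are made up for illustration and are not the cudax traits; the only real identifier is the `__CUDA_ARCH__` macro, which the CUDA compiler defines while compiling device code.

```cpp
#include <type_traits>

namespace demo {
// Hypothetical trait in the problematic style: the device pass
// answers "true" unconditionally, the host pass asks the compiler.
#if defined(__CUDA_ARCH__)
template <class Fn, class... Args>
inline constexpr bool nothrow_callable_divergent = true;
#else
template <class Fn, class... Args>
inline constexpr bool nothrow_callable_divergent =
    std::is_nothrow_invocable_v<Fn, Args...>;
#endif

// Hypothetical trait in the fixed style: one definition for both passes.
template <class Fn, class... Args>
inline constexpr bool nothrow_callable_uniform =
    std::is_nothrow_invocable_v<Fn, Args...>;

// Stand-in for any type derived from the trait, such as an
// operation state that ends up as a kernel template argument.
template <bool CanThrow>
struct opstate {};

template <class Fn>
using divergent_opstate = opstate<!nothrow_callable_divergent<Fn>>;

template <class Fn>
using uniform_opstate = opstate<!nothrow_callable_uniform<Fn>>;

// A callable that is not marked noexcept.
struct may_throw_fn {
  int operator()() const { return 42; }
};
} // namespace demo
```

in a host pass `divergent_opstate<may_throw_fn>` is `opstate<true>`, while a device pass would see `opstate<false>`; `uniform_opstate<may_throw_fn>` is `opstate<true>` in both.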

that change alone is incomplete, because device code then needs to be aware of std::exception_ptr. the standard exception_ptr APIs are host-only and so cannot be called from device code.

the complete fix therefore also polyfills the exception_ptr APIs for device code, and uses the std:: APIs on host.
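
one possible shape of such a polyfill is sketched below for a single function. this is an assumption for illustration, not the code merged in this PR: the `demo` namespace, the `DEMO_HD` macro, and the choice to trap on device are all hypothetical. the sketch keeps the pointer type identical in both passes (so signatures mentioning it still agree) and only switches the function body.

```cpp
#include <exception>
#include <type_traits>

// Hypothetical annotation macro: expands to nothing outside the CUDA compiler.
#if defined(__CUDACC__)
#  define DEMO_HD __host__ __device__
#else
#  define DEMO_HD
#endif

namespace demo {
// Same type in host and device passes, so completion signatures match.
using exception_ptr = std::exception_ptr;

// Callable from both passes. Host forwards to the standard API; device
// cannot throw, so this sketch treats reaching the call as a fatal error.
[[noreturn]] DEMO_HD inline void rethrow_exception(const exception_ptr& ep) {
#if defined(__CUDA_ARCH__)
  (void) ep;  // never inspected in device code
  __trap();   // CUDA device intrinsic that aborts the kernel
#else
  std::rethrow_exception(ep);
#endif
}
} // namespace demo
```

on host, `demo::rethrow_exception(std::make_exception_ptr(err))` behaves like the standard function; the device branch is only compiled by the CUDA compiler.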

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Oct 17, 2025
@ericniebler
Contributor Author

/ok to test d8905b0

@ericniebler
Contributor Author

/ok to test b69591a

@ericniebler
Contributor Author

/ok to test 9f65d45

@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 31m 19s: Pass: 100%/42 | Total: 5h 15m | Max: 25m 48s | Hits: 95%/21329

See results here.

@ericniebler ericniebler marked this pull request as ready for review October 20, 2025 18:07
@ericniebler ericniebler requested a review from a team as a code owner October 20, 2025 18:07
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Oct 20, 2025
@ericniebler ericniebler requested a review from pciolkosz October 20, 2025 19:43
@ericniebler ericniebler merged commit 63f0638 into NVIDIA:main Oct 20, 2025
55 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Oct 20, 2025
@ericniebler ericniebler deleted the polyfill-exception_ptr-for-device-code branch October 20, 2025 21:47