[libcu++] Replace cudaStreamPerThread with cudaStream{} in PSTL #9214

Merged
davebayer merged 2 commits into NVIDIA:main from davebayer:use_null_stream_in_pstl on Jun 2, 2026
@davebayer
Contributor

Fixes #9213.

@davebayer davebayer requested a review from a team as a code owner June 2, 2026 10:32
@davebayer davebayer requested a review from griwes June 2, 2026 10:32
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 2, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 2, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jun 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 736796d3-84cf-4ae7-82c6-2333b8829c73

📥 Commits

Reviewing files that changed from the base of the PR and between 9a8f7f6 and 99ac5ff.

📒 Files selected for processing (7)
  • libcudacxx/include/cuda/std/__pstl/cuda/shift_left.h
  • libcudacxx/include/cuda/std/__pstl/cuda/shift_right.h
  • libcudacxx/include/cuda/std/__pstl/cuda/sort.h
  • libcudacxx/include/cuda/std/__pstl/cuda/stable_partition.h
  • libcudacxx/include/cuda/std/__pstl/cuda/temporary_storage.h
  • libcudacxx/test/libcudacxx/cuda/execution/execution_policy/get_memory_resource.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/execution/execution_policy/get_stream.pass.cpp

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Fixed CUDA stream handling in parallel algorithms (sort, shift, and partition operations) so that they fall back to the null stream rather than the per-thread stream (cudaStreamPerThread).
    • Updated corresponding test expectations to reflect corrected stream behavior.

Walkthrough

This PR updates the CUDA PSTL algorithm backends to use a value-initialized null stream (::cudaStream_t{}) as the fallback stream reference instead of the per-thread stream (cudaStreamPerThread). The change is applied consistently to the shift_left, shift_right, sort, and stable_partition implementations and to temporary storage, and the tests are updated to check the new behavior.

Changes

Stream fallback unification

| Layer / File(s) | Summary |
|---|---|
| Algorithm stream fallback updates: libcudacxx/include/cuda/std/__pstl/cuda/shift_left.h, shift_right.h, sort.h, stable_partition.h, temporary_storage.h | shift_left, shift_right, sort (radix and merge paths), stable_partition, and temporary_storage all replace ::cuda::stream_ref{cudaStreamPerThread} with ::cuda::stream_ref{::cudaStream_t{}} as the fallback stream reference passed to __call_or(::cuda::get_stream, ...). |
| Test updates for stream fallback: libcudacxx/test/libcudacxx/cuda/execution/execution_policy/get_stream.pass.cpp, get_memory_resource.pass.cpp | Test baseline expectations and assertions for ::cuda::get_stream are updated to use the new cuda::stream_ref{::cudaStream_t{}} reference, and get_memory_resource.pass.cpp adds a stream check after resource overwrite. |
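
The fallback pattern summarized above can be sketched in plain C++17 without any CUDA headers. This is only an illustration under stated assumptions, not the libcu++ implementation: `fake_stream_t`, `stream_ref_sketch`, `get_stream_sketch`, `call_or_sketch`, `select_stream`, and both environment types are hypothetical stand-ins for `cudaStream_t`, `::cuda::stream_ref`, `::cuda::get_stream`, `__call_or`, and an execution policy. It shows the one property the PR relies on: a value-initialized stream handle is a null pointer, and that null handle is what comes back when the environment offers no stream.

```cpp
#include <cassert>      // used by the checks described in the note below
#include <type_traits>

// Hypothetical stand-ins; NOT the CUDA runtime or libcu++ types.
struct fake_stream_impl;                  // plays the role of the opaque stream struct
using fake_stream_t = fake_stream_impl*;  // plays the role of cudaStream_t

struct stream_ref_sketch {
  fake_stream_t handle;                   // non-owning handle, like a stream reference
};

// One environment that provides a stream, and one that does not.
struct env_with_stream {
  fake_stream_t stream;
  stream_ref_sketch get_stream() const { return {stream}; }
};
struct env_without_stream {};

// Query object; only callable when env.get_stream() is well-formed.
struct get_stream_sketch_t {
  template <class Env>
  auto operator()(const Env& env) const -> decltype(env.get_stream()) {
    return env.get_stream();
  }
};
inline constexpr get_stream_sketch_t get_stream_sketch{};

// Invoke query(env) if that expression compiles; otherwise return the fallback.
template <class Query, class Fallback, class Env>
auto call_or_sketch(Query query, Fallback fallback, const Env& env) {
  if constexpr (std::is_invocable_v<Query, const Env&>) {
    return query(env);
  } else {
    return fallback;
  }
}

// Stream an algorithm would use in this sketch: the environment's stream,
// or a value-initialized (null) handle when none is provided.
template <class Env>
stream_ref_sketch select_stream(const Env& env) {
  return call_or_sketch(get_stream_sketch, stream_ref_sketch{fake_stream_t{}}, env);
}
```

In this sketch, `select_stream(env_without_stream{}).handle` compares equal to `nullptr`, while an `env_with_stream` gets its own handle back unchanged. Because every algorithm shares the same fallback expression, they cannot disagree about which stream is used when the caller supplies none.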

Assessment against linked issues

Objective Addressed Explanation
Unify PSTL algorithm stream fallback behavior [#9213]

Suggested reviewers

  • pciolkosz
  • ericniebler

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)
libcudacxx/test/libcudacxx/cuda/execution/execution_policy/get_memory_resource.pass.cpp

libcudacxx/test/libcudacxx/cuda/execution/execution_policy/get_memory_resource.pass.cpp:17:10: fatal error: 'cuda/functional' file not found
17 | #include <cuda/functional>
| ^~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/99ac5ffcbfbf1b032774007601fba90593699d39-592ecba06aa173d3/tmp/clang_command_.tmp.115228.txt
++Contents of '/tmp/coderabbit-infer/99ac5ffcbfbf1b032774007601fba90593699d39-592ecba06aa173d3/tmp/clang_command_.tmp.115228.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1" "-triple"

... [truncated 1214 characters] ...

l/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/592ecba06aa173d3/file.o" "-x" "c++"
"libcudacxx/test/libcudacxx/cuda/execution/execution_policy/get_memory_resource.pass.cpp"
"-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"

libcudacxx/test/libcudacxx/cuda/execution/execution_policy/get_stream.pass.cpp

libcudacxx/test/libcudacxx/cuda/execution/execution_policy/get_stream.pass.cpp:17:10: fatal error: 'cuda/functional' file not found
17 | #include <cuda/functional>
| ^~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/99ac5ffcbfbf1b032774007601fba90593699d39-dcd2ecffd9fe1481/tmp/clang_command_.tmp.c583a6.txt
++Contents of '/tmp/coderabbit-infer/99ac5ffcbfbf1b032774007601fba90593699d39-dcd2ecffd9fe1481/tmp/clang_command_.tmp.c583a6.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1" "-triple"
"x86_64-

... [truncated 1187 characters] ...

/usr/local/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/dcd2ecffd9fe1481/file.o" "-x" "c++"
"libcudacxx/test/libcudacxx/cuda/execution/execution_policy/get_stream.pass.cpp"
"-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"


@davebayer davebayer enabled auto-merge (squash) June 2, 2026 11:17
@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

🥳 CI Workflow Results

🟩 Finished in 2h 55m: Pass: 100%/115 | Total: 3d 02h | Max: 2h 55m | Hits: 57%/947662

See results here.

@davebayer davebayer merged commit 4f5bc7c into NVIDIA:main Jun 2, 2026
135 checks passed
@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

Successfully created backport PR for branch/3.4.x:

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[BUG]: Some PSTL algorithms fallback to cudaStreamPerThread while other to nullptr

3 participants