Skip to content

Add atomic fetch_max/fetch_min (P0493R5) for integral and pointer types#9205

Open
edenfunf wants to merge 3 commits into
NVIDIA:mainfrom
edenfunf:feat/cuda-std-atomic-min-max
Open

Add atomic fetch_max/fetch_min (P0493R5) for integral and pointer types#9205
edenfunf wants to merge 3 commits into
NVIDIA:mainfrom
edenfunf:feat/cuda-std-atomic-min-max

Conversation

@edenfunf
Copy link
Copy Markdown
Contributor

@edenfunf edenfunf commented Jun 1, 2026

Description

closes #9173

Implements P0493R5 atomic minimum/maximum for cuda::std::atomic and cuda::std::atomic_ref.

P0493R5 covers integer and pointer types only, so this PR adds:

  • Members fetch_max/fetch_min on the integral and pointer specializations of cuda::std::atomic and cuda::std::atomic_ref, via a new _LIBCUDACXX_ATOMIC_MINMAX_IMPL macro.
  • Free functions atomic_fetch_max/atomic_fetch_min and their _explicit forms (integral and pointer overloads).
  • The feature-test macro __cccl_lib_atomic_min_max (202403L).
  • Tests mirroring the existing atomic_fetch_add ones, covering integral and pointer types on host and device.

The implementation only wires up the std-conforming API surface: it reuses the existing __atomic_fetch_max_dispatch/__atomic_fetch_min_dispatch machinery (host CAS loop + device atom.max/atom.min with a CAS fallback) that the cuda::atomic extension already exposes. No code generation is touched.

Floating-point min/max (P3008R6) is intentionally out of scope and tracked separately by #9176.

A couple of notes for reviewers:

  • The new members are added to the integral (__atomic_bitwise) and pointer specializations only, not to the floating-point one. Consistent with the existing fetch_add free functions, the free functions exclude bool via SFINAE.
  • The feature-test macro is defined unconditionally (not gated on _CCCL_STD_VER), since the functionality is provided as an extension regardless of the language standard, following the __cccl_lib_saturation_arithmetic precedent. Happy to gate it behind C++26 instead if preferred.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Surface fetch_max/fetch_min on cuda::std::atomic and atomic_ref for the
integral and pointer specializations, add the atomic_fetch_max/min and
atomic_fetch_max/min_explicit free functions, and define the
__cccl_lib_atomic_min_max feature-test macro.

This reuses the existing __atomic_fetch_max_dispatch/__atomic_fetch_min_dispatch
machinery (host CAS loop + device atom.max / CAS fallback) that the cuda::atomic
extension already exposes. Floating-point min/max (P3008R6) is tracked separately
by NVIDIA#9176 and is intentionally out of scope here.

Adds std-layer tests covering integral and pointer types, host + device.

Fixes NVIDIA#9173
@edenfunf edenfunf requested a review from a team as a code owner June 1, 2026 17:36
@edenfunf edenfunf requested a review from pciolkosz June 1, 2026 17:36
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 1, 2026
@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot Bot commented Jun 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 1, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 05c18c63-03ff-4f58-b5be-84e0b207da88

📥 Commits

Reviewing files that changed from the base of the PR and between d4e2cd2 and ce16801.

📒 Files selected for processing (2)
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added atomic fetch_min/fetch_max member operations and corresponding free-function variants (including explicit memory-ordering) for integral and pointer types, plus atomic_ref support.
    • Introduced a feature-test macro indicating availability of atomic min/max.
  • Tests

    • Added comprehensive conformance tests exercising min/max behavior across memory scopes, volatile/non-volatile variants, and host/device targets.

suggestion:

Walkthrough

Adds atomic min/max member implementations via a new macro, wires them into owned/reference atomics, exposes public free-function APIs (implicit and explicit memory-order variants) for integral and pointer atomics, and adds conformance tests covering host/device and multiple memory scopes.

Changes

Atomic Min/Max Operations

Layer / File(s) Summary
Minmax member implementation macro
libcudacxx/include/cuda/std/__atomic/api/common.h
_LIBCUDACXX_ATOMIC_MINMAX_IMPL macro defines fetch_max(_Tp, memory_order) and fetch_min(_Tp, memory_order) forwarding to __atomic_fetch_max_dispatch / __atomic_fetch_min_dispatch with thread scope and default memory_order_seq_cst.
Member integration and feature test
libcudacxx/include/cuda/std/__atomic/api/owned.h, libcudacxx/include/cuda/std/__atomic/api/reference.h, libcudacxx/include/cuda/std/version
Macro is invoked in __atomic_bitwise and __atomic_pointer (owned) and __atomic_ref_bitwise and __atomic_ref_pointer (reference). Adds __cccl_lib_atomic_min_max = 202403L.
Free function API
libcudacxx/include/cuda/std/atomic
Adds atomic_fetch_max, atomic_fetch_max_explicit, atomic_fetch_min, atomic_fetch_min_explicit templates for integral (non-bool) and pointer atomic/volatile atomic types, forwarding to the corresponding member functions (explicit variants forward memory_order).
Comprehensive test coverage
libcudacxx/test/libcudacxx/std/atomics/.../atomic_fetch_max.pass.cpp, atomic_fetch_max_explicit.pass.cpp, atomic_fetch_min.pass.cpp, atomic_fetch_min_explicit.pass.cpp, atomic_ref_fetch_max.pass.cpp, atomic_ref_fetch_min.pass.cpp
Adds conformance tests validating return-old-value and storage-update behaviors for replacement and no-op cases across integral and pointer instantiations, volatile variants, and host/device with local/shared/global memory selectors.

Assessment against linked issues

Objective Addressed Explanation
Implement P0493R5 atomic minimum/maximum member functions [#9173]
Implement P0493R5 atomic minimum/maximum free functions [#9173]
Support integral and pointer atomic types [#9173]
Provide explicit memory-order variants [#9173]

Suggested reviewers

  • ericniebler
  • fbusato
  • gevtushenko

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)
libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp:27:10: fatal error: 'cuda/std/atomic' file not found
27 | #include <cuda/std/atomic>
| ^~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/ce16801c74859bb6924383a6099e16b321ba0732-32df2cb92a9be7f4/tmp/clang_command_.tmp.fe8075.txt
++Contents of '/tmp/coderabbit-infer/ce16801c74859bb6924383a6099e16b321ba0732-32df2cb92a9be7f4/tmp/clang_command_.tmp.fe8075.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRI

... [truncated 1285 characters] ...

"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/32df2cb92a9be7f4/file.o" "-x" "c++"
"libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp"
"-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp:27:10: fatal error: 'cuda/std/atomic' file not found
27 | #include <cuda/std/atomic>
| ^~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/ce16801c74859bb6924383a6099e16b321ba0732-a49ab3f64d1ac0dc/tmp/clang_command_.tmp.e14c83.txt
++Contents of '/tmp/coderabbit-infer/ce16801c74859bb6924383a6099e16b321ba0732-a49ab3f64d1ac0dc/tmp/clang_command_.tmp.e14c83.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRI

... [truncated 1285 characters] ...

"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/a49ab3f64d1ac0dc/file.o" "-x" "c++"
"libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp"
"-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"


Comment @coderabbitai help to get the list of available commands and usage tips.

@gevtushenko gevtushenko requested a review from wmaxey June 1, 2026 17:54
Copy link
Copy Markdown
Contributor

@fbusato fbusato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the contribution, @edenfunf!

The PR is really good. The only gap that I see is that atomic_ref functions are not tested directly. Outside that, there are stylistic issues, but unfortunately this not mimic the surrounding code, e.g. SFINAE->require, namespace qualification, inline with template, etc.

Another comment is about the accepted types. P0493R5 leaves open the door for supporting pointers, as the PR on the host side. On the other hand, the c++-draft only specifies integral types.
I leave the final decision to @wmaxey

Copy link
Copy Markdown
Contributor

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this. It is already looking great 🎉

I believe we only need to modify the testing of pointers, because I would prefer to have a well defined ordering we can test.

For now its only integral types, so that is fine and we can easily extend that to floating point types later

The pointer overloads of the atomic_fetch_max/min tests fabricated pointers
via integer-to-pointer casts and compared them, which is undefined behavior.
Use a real array and compare pointers into it, which have a well-defined
ordering.

Also add std-layer tests that exercise cuda::std::atomic_ref::fetch_max and
fetch_min directly for integral types.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8c6556c2-aac5-41ab-951a-5935168a3aef

📥 Commits

Reviewing files that changed from the base of the PR and between 2aec42c and d4e2cd2.

📒 Files selected for processing (6)
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_max.pass.cpp
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_max_explicit.pass.cpp
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min.pass.cpp
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min_explicit.pass.cpp
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min_explicit.pass.cpp
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min.pass.cpp

Sweep atomic_ref<int*> and atomic_ref<const int*> in addition to the
integral types, since __atomic_ref_pointer also provides fetch_max/fetch_min.
Pointers into a real array are used so the ordering is well-defined.
@wmaxey
Copy link
Copy Markdown
Member

wmaxey commented Jun 3, 2026

A final step to this PR is going to be removing the overloads for fetch_max/min on the interface of cuda::atomic[_ref]. Adding these functions to the interface of __atomic_impl will obviate the need for them being tacked onto it.

See here

_CCCL_HOST_DEVICE_API inline _Tp fetch_max(const _Tp& __op, memory_order __m = memory_order_seq_cst) noexcept
{
return ::cuda::std::__atomic_fetch_max_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{});
}
_CCCL_HOST_DEVICE_API inline _Tp fetch_max(const _Tp& __op, memory_order __m = memory_order_seq_cst) volatile noexcept
{
return ::cuda::std::__atomic_fetch_max_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{});
}
_CCCL_HOST_DEVICE_API inline _Tp fetch_min(const _Tp& __op, memory_order __m = memory_order_seq_cst) noexcept
{
return ::cuda::std::__atomic_fetch_min_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{});
}
_CCCL_HOST_DEVICE_API inline _Tp fetch_min(const _Tp& __op, memory_order __m = memory_order_seq_cst) volatile noexcept
{
return ::cuda::std::__atomic_fetch_min_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{});
}
and here
_CCCL_HOST_DEVICE_API inline _Tp fetch_max(const _Tp& __op, memory_order __m = memory_order_seq_cst) const noexcept
{
return ::cuda::std::__atomic_fetch_max_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{});
}
_CCCL_HOST_DEVICE_API inline _Tp fetch_min(const _Tp& __op, memory_order __m = memory_order_seq_cst) const noexcept
{
return ::cuda::std::__atomic_fetch_min_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{});
}

We'll also need to update testing to include both interfaces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

[FEA]: Implement P0493R5 Atomic maximum and minimum

4 participants