Add atomic `fetch_max`/`fetch_min` (P0493R5) for integral and pointer types by edenfunf · Pull Request #9205 · NVIDIA/cccl

edenfunf · 2026-06-01T17:36:01Z

Description

Implements P0493R5 atomic minimum/maximum for cuda::std::atomic and cuda::std::atomic_ref.

P0493R5 covers integer and pointer types only, so this PR adds:

Members fetch_max/fetch_min on the integral and pointer specializations of cuda::std::atomic and cuda::std::atomic_ref, via a new _LIBCUDACXX_ATOMIC_MINMAX_IMPL macro.
Free functions atomic_fetch_max/atomic_fetch_min and their _explicit forms (integral and pointer overloads).
The feature-test macro __cccl_lib_atomic_min_max (202403L).
Tests mirroring the existing atomic_fetch_add ones, covering integral and pointer types on host and device.

The implementation only wires up the std-conforming API surface: it reuses the existing __atomic_fetch_max_dispatch/__atomic_fetch_min_dispatch machinery (host CAS loop + device atom.max/atom.min with a CAS fallback) that the cuda::atomic extension already exposes. No code generation is touched.

Floating-point min/max (P3008R6) is intentionally out of scope and tracked separately by #9176.

A couple of notes for reviewers:

The new members are added to the integral (__atomic_bitwise) and pointer specializations only, not to the floating-point one. Consistent with the existing fetch_add free functions, the free functions exclude bool via SFINAE.
The feature-test macro is defined unconditionally (not gated on _CCCL_STD_VER), since the functionality is provided as an extension regardless of the language standard, following the __cccl_lib_saturation_arithmetic precedent. Happy to gate it behind C++26 instead if preferred.

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

Surface fetch_max/fetch_min on cuda::std::atomic and atomic_ref for the integral and pointer specializations, add the atomic_fetch_max/min and atomic_fetch_max/min_explicit free functions, and define the __cccl_lib_atomic_min_max feature-test macro. This reuses the existing __atomic_fetch_max_dispatch/__atomic_fetch_min_dispatch machinery (host CAS loop + device atom.max / CAS fallback) that the cuda::atomic extension already exposes. Floating-point min/max (P3008R6) is tracked separately by NVIDIA#9176 and is intentionally out of scope here. Adds std-layer tests covering integral and pointer types, host + device. Fixes NVIDIA#9173

copy-pr-bot · 2026-06-01T17:36:05Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-01T17:45:55Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 05c18c63-03ff-4f58-b5be-84e0b207da88

📥 Commits

Reviewing files that changed from the base of the PR and between d4e2cd2 and ce16801.

📒 Files selected for processing (2)

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp
libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp

🚧 Files skipped from review as they are similar to previous changes (1)

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added atomic fetch_min/fetch_max member operations and corresponding free-function variants (including explicit memory-ordering) for integral and pointer types, plus atomic_ref support.
- Introduced a feature-test macro indicating availability of atomic min/max.
Tests
- Added comprehensive conformance tests exercising min/max behavior across memory scopes, volatile/non-volatile variants, and host/device targets.

suggestion:

Walkthrough

Adds atomic min/max member implementations via a new macro, wires them into owned/reference atomics, exposes public free-function APIs (implicit and explicit memory-order variants) for integral and pointer atomics, and adds conformance tests covering host/device and multiple memory scopes.

Changes

Atomic Min/Max Operations

Layer / File(s)	Summary
Minmax member implementation macro `libcudacxx/include/cuda/std/__atomic/api/common.h`	`_LIBCUDACXX_ATOMIC_MINMAX_IMPL` macro defines `fetch_max(_Tp, memory_order)` and `fetch_min(_Tp, memory_order)` forwarding to `__atomic_fetch_max_dispatch` / `__atomic_fetch_min_dispatch` with thread scope and default `memory_order_seq_cst`.
Member integration and feature test `libcudacxx/include/cuda/std/__atomic/api/owned.h`, `libcudacxx/include/cuda/std/__atomic/api/reference.h`, `libcudacxx/include/cuda/std/version`	Macro is invoked in `__atomic_bitwise` and `__atomic_pointer` (owned) and `__atomic_ref_bitwise` and `__atomic_ref_pointer` (reference). Adds `__cccl_lib_atomic_min_max = 202403L`.
Free function API `libcudacxx/include/cuda/std/atomic`	Adds `atomic_fetch_max`, `atomic_fetch_max_explicit`, `atomic_fetch_min`, `atomic_fetch_min_explicit` templates for integral (non-bool) and pointer `atomic`/`volatile atomic` types, forwarding to the corresponding member functions (explicit variants forward `memory_order`).
Comprehensive test coverage `libcudacxx/test/libcudacxx/std/atomics/.../atomic_fetch_max.pass.cpp`, `atomic_fetch_max_explicit.pass.cpp`, `atomic_fetch_min.pass.cpp`, `atomic_fetch_min_explicit.pass.cpp`, `atomic_ref_fetch_max.pass.cpp`, `atomic_ref_fetch_min.pass.cpp`	Adds conformance tests validating return-old-value and storage-update behaviors for replacement and no-op cases across integral and pointer instantiations, volatile variants, and host/device with local/shared/global memory selectors.

Assessment against linked issues

Objective	Addressed	Explanation
Implement P0493R5 atomic minimum/maximum member functions [`#9173`]	✅
Implement P0493R5 atomic minimum/maximum free functions [`#9173`]	✅
Support integral and pointer atomic types [`#9173`]	✅
Provide explicit memory-order variants [`#9173`]	✅

Suggested reviewers

ericniebler
fbusato
gevtushenko

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp:27:10: fatal error: 'cuda/std/atomic' file not found
27 | #include <cuda/std/atomic>
| ^~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/ce16801c74859bb6924383a6099e16b321ba0732-32df2cb92a9be7f4/tmp/clang_command_.tmp.fe8075.txt
++Contents of '/tmp/coderabbit-infer/ce16801c74859bb6924383a6099e16b321ba0732-32df2cb92a9be7f4/tmp/clang_command_.tmp.fe8075.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRI

... [truncated 1285 characters] ...

"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/32df2cb92a9be7f4/file.o" "-x" "c++"
"libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp"
"-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp:27:10: fatal error: 'cuda/std/atomic' file not found
27 | #include <cuda/std/atomic>
| ^~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/ce16801c74859bb6924383a6099e16b321ba0732-a49ab3f64d1ac0dc/tmp/clang_command_.tmp.e14c83.txt
++Contents of '/tmp/coderabbit-infer/ce16801c74859bb6924383a6099e16b321ba0732-a49ab3f64d1ac0dc/tmp/clang_command_.tmp.e14c83.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRI

... [truncated 1285 characters] ...

"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/a49ab3f64d1ac0dc/file.o" "-x" "c++"
"libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp"
"-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

fbusato

thanks for the contribution, @edenfunf!

The PR is really good. The only gap that I see is that atomic_ref functions are not tested directly. Outside that, there are stylistic issues, but unfortunately this not mimic the surrounding code, e.g. SFINAE->require, namespace qualification, inline with template, etc.

Another comment is about the accepted types. P0493R5 leaves open the door for supporting pointers, as the PR on the host side. On the other hand, the c++-draft only specifies integral types.
I leave the final decision to @wmaxey

miscco

Thanks a lot for working on this. It is already looking great 🎉

I believe we only need to modify the testing of pointers, because I would prefer to have a well defined ordering we can test.

For now its only integral types, so that is fine and we can easily extend that to floating point types later

The pointer overloads of the atomic_fetch_max/min tests fabricated pointers via integer-to-pointer casts and compared them, which is undefined behavior. Use a real array and compare pointers into it, which have a well-defined ordering. Also add std-layer tests that exercise cuda::std::atomic_ref::fetch_max and fetch_min directly for integral types.

coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8c6556c2-aac5-41ab-951a-5935168a3aef

📥 Commits

Reviewing files that changed from the base of the PR and between 2aec42c and d4e2cd2.

📒 Files selected for processing (6)

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_max.pass.cpp
libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_max_explicit.pass.cpp
libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min.pass.cpp
libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min_explicit.pass.cpp
libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp
libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp

🚧 Files skipped from review as they are similar to previous changes (2)

libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min_explicit.pass.cpp
libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min.pass.cpp

Sweep atomic_ref<int*> and atomic_ref<const int*> in addition to the integral types, since __atomic_ref_pointer also provides fetch_max/fetch_min. Pointers into a real array are used so the ordering is well-defined.

wmaxey · 2026-06-03T22:52:58Z

A final step to this PR is going to be removing the overloads for fetch_max/min on the interface of cuda::atomic[_ref]. Adding these functions to the interface of __atomic_impl will obviate the need for them being tacked onto it.

See here

cccl/libcudacxx/include/cuda/__atomic/atomic.h

Lines 58 to 74 in c47f140

    
           _CCCL_HOST_DEVICE_API inline _Tp fetch_max(const _Tp& __op, memory_order __m = memory_order_seq_cst) noexcept 
        
           { 
        
             return ::cuda::std::__atomic_fetch_max_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{}); 
        
           } 
        
           _CCCL_HOST_DEVICE_API inline _Tp fetch_max(const _Tp& __op, memory_order __m = memory_order_seq_cst) volatile noexcept 
        
           { 
        
             return ::cuda::std::__atomic_fetch_max_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{}); 
        
           } 
        
           _CCCL_HOST_DEVICE_API inline _Tp fetch_min(const _Tp& __op, memory_order __m = memory_order_seq_cst) noexcept 
        
           { 
        
             return ::cuda::std::__atomic_fetch_min_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{}); 
        
           } 
        
           _CCCL_HOST_DEVICE_API inline _Tp fetch_min(const _Tp& __op, memory_order __m = memory_order_seq_cst) volatile noexcept 
        
           { 
        
             return ::cuda::std::__atomic_fetch_min_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{}); 
        
           }

and here

cccl/libcudacxx/include/cuda/__atomic/atomic.h

Lines 102 to 110 in c47f140

    
           _CCCL_HOST_DEVICE_API inline _Tp fetch_max(const _Tp& __op, memory_order __m = memory_order_seq_cst) const noexcept 
        
           { 
        
             return ::cuda::std::__atomic_fetch_max_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{}); 
        
           } 
        
           _CCCL_HOST_DEVICE_API inline _Tp fetch_min(const _Tp& __op, memory_order __m = memory_order_seq_cst) const noexcept 
        
           { 
        
             return ::cuda::std::__atomic_fetch_min_dispatch(&this->__a, __op, __m, ::cuda::std::__scope_to_tag<_Sco>{}); 
        
           }

We'll also need to update testing to include both interfaces.

edenfunf requested a review from a team as a code owner June 1, 2026 17:36

edenfunf requested a review from pciolkosz June 1, 2026 17:36

github-project-automation Bot added this to CCCL Jun 1, 2026

github-project-automation Bot moved this to Todo in CCCL Jun 1, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 1, 2026

gevtushenko requested a review from wmaxey June 1, 2026 17:54

fbusato reviewed Jun 1, 2026

View reviewed changes

miscco reviewed Jun 2, 2026

View reviewed changes

Comment thread .../std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_max.pass.cpp Outdated

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread .../atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp

Add pointer coverage to atomic_ref min/max tests

ce16801

Sweep atomic_ref<int*> and atomic_ref<const int*> in addition to the integral types, since __atomic_ref_pointer also provides fetch_max/fetch_min. Pointers into a real array are used so the ordering is well-defined.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add atomic `fetch_max`/`fetch_min` (P0493R5) for integral and pointer types#9205

Add atomic `fetch_max`/`fetch_min` (P0493R5) for integral and pointer types#9205
edenfunf wants to merge 3 commits into
NVIDIA:mainfrom
edenfunf:feat/cuda-std-atomic-min-max

edenfunf commented Jun 1, 2026

Uh oh!

copy-pr-bot Bot commented Jun 1, 2026

Uh oh!

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

Summary by CodeRabbit

Walkthrough

Changes

Assessment against linked issues

Suggested reviewers

Uh oh!

fbusato left a comment

Uh oh!

miscco left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

wmaxey commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

edenfunf commented Jun 1, 2026

Description

Checklist

Uh oh!

copy-pr-bot Bot commented Jun 1, 2026

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Assessment against linked issues

Suggested reviewers

Uh oh!

fbusato left a comment

Choose a reason for hiding this comment

Uh oh!

miscco left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wmaxey commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading