`cuda::simd` Add `abs_diff` by fbusato · Pull Request #8994 · NVIDIA/cccl

fbusato · 2026-05-14T23:39:57Z

Description

Introduce SIMD abs_diff to compute the vectorized absolute difference and use VABSDIFF.
The PR includes the implementation, unit test, documentation, and codegen checks.

Requires:

cuda::simd Add saturation_add #8991
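
For readers unfamiliar with the operation, here is a minimal scalar sketch of a per-lane absolute difference over four unsigned 8-bit lanes packed into one 32-bit word. The function name `abs_diff_u8x4_reference` is hypothetical and is not part of this PR or of CCCL. The lane layout is an assumption chosen for illustration, not a statement about what the PR's implementation emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference model (not CCCL API): treat each 32-bit word
// as four unsigned 8-bit lanes and compute |a[i] - b[i]| for every lane.
constexpr std::uint32_t abs_diff_u8x4_reference(std::uint32_t a, std::uint32_t b)
{
  std::uint32_t result = 0;
  for (int lane = 0; lane < 4; ++lane)
  {
    // Extract the lane from each operand.
    const std::uint32_t x = (a >> (8 * lane)) & 0xFFu;
    const std::uint32_t y = (b >> (8 * lane)) & 0xFFu;
    // Subtract the smaller from the larger so the result never wraps.
    const std::uint32_t d = (x > y) ? (x - y) : (y - x);
    result |= d << (8 * lane);
  }
  return result;
}

// Lanes (low to high): |0x05-0x05|, |0x00-0x01|, |0xFF-0x00|, |0x10-0x20|.
static_assert(abs_diff_u8x4_reference(0x10FF0005u, 0x20000105u) == 0x10FF0100u,
              "per-lane absolute difference");
```

A reference like this can serve as an oracle in a unit test that compares the vectorized result lane by lane.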

coderabbitai · 2026-05-14T23:49:35Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9717d3a4-f448-4ed8-90b5-efff84021257

📥 Commits

Reviewing files that changed from the base of the PR and between ed4cb30 and f690b2e.

📒 Files selected for processing (18)

docs/libcudacxx/extended_api.rst
docs/libcudacxx/extended_api/simd.rst
docs/libcudacxx/extended_api/simd/abs_diff.rst
docs/libcudacxx/extended_api/simd/saturating_add.rst
libcudacxx/include/cuda/__simd/saturating_add.h
libcudacxx/include/cuda/__simd/simd_intrinsics.h
libcudacxx/include/cuda/__simd/simd_intrinsics_array.h
libcudacxx/include/cuda/__simd/vabsdiff.h
libcudacxx/include/cuda/simd
libcudacxx/include/cuda/std/__fwd/simd.h
libcudacxx/include/cuda/std/__internal/features.h
libcudacxx/include/cuda/std/__internal/namespaces.h
libcudacxx/include/cuda/std/__simd/basic_vec.h
libcudacxx/include/cuda/std/__simd/specializations/fixed_size_integral_vec.h
libcudacxx/include/cuda/std/__simd/specializations/fixed_size_storage.h
libcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics.h
libcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics_array.h
libcudacxx/include/cuda/std/__simd/type_traits.h

📝 Walkthrough

Walkthrough

Adds cuda::simd::saturating_add and cuda::simd::abs_diff with device intrinsics, std::simd fixed-size integral specializations, feature macros and namespace helpers, storage/alignment updates, public CUDA headers, docs, CMake test infra changes, and many new codegen and unit tests.

Changes

SIMD saturating add and absolute difference

Layer / File(s)	Summary
Combined SIMD feature, intrinsics, std specializations, headers, docs, and tests `multiple files across include/, docs/, test/`	Implements feature macros and namespace helpers; adds device intrinsics and array helpers; provides `saturating_add` and `abs_diff` public headers and `<cuda/simd>` aggregator; adds std::simd fixed-size integral specialization, storage alignment and pointer-alignment adjustments; updates CMake simd_codegen logic; and adds unit/codegen tests and docs pages.
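
As a rough illustration of what a saturating add means per lane, the sketch below clamps the sum to the representable range instead of wrapping. The function names and the choice of `uint8_t` and `int16_t` lanes are assumptions made for the example. They are not taken from the PR's headers.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar reference (not CCCL API): unsigned 8-bit saturating add.
constexpr std::uint8_t saturating_add_u8_reference(std::uint8_t a, std::uint8_t b)
{
  // Widen before adding so the sum cannot wrap.
  const unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  return static_cast<std::uint8_t>(sum > 0xFFu ? 0xFFu : sum);
}

// Hypothetical scalar reference (not CCCL API): signed 16-bit saturating add.
// Assumes int is wider than 16 bits, so the widened sum cannot overflow.
constexpr std::int16_t saturating_add_i16_reference(std::int16_t a, std::int16_t b)
{
  const int sum = static_cast<int>(a) + static_cast<int>(b);
  const int lo  = std::numeric_limits<std::int16_t>::min();
  const int hi  = std::numeric_limits<std::int16_t>::max();
  // Clamp to the representable range of the lane type.
  return static_cast<std::int16_t>(sum < lo ? lo : (sum > hi ? hi : sum));
}

static_assert(saturating_add_u8_reference(200, 100) == 255, "clamps at the maximum");
static_assert(saturating_add_i16_reference(-32768, -1) == -32768, "clamps at the minimum");
```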

Suggested reviewers

alliepiper
pciolkosz
ericniebler

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

libcudacxx/include/cuda/std/__simd/basic_vec.h (1)

69-102: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Restore a private boundary before the storage internals. __s_, __storage_tag, the storage-tag constructor, __make_mask, and __set are implementation details; making them public expands the basic_vec API surface just to let the new free functions reach inside the type. Please keep these private and add a narrow hidden-friend/internal accessor for the SIMD extension helpers instead. As per coding guidelines, libcudacxx reviews should focus on ABI/API stability.
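
One possible shape for such a narrow accessor is sketched below. Everything here is hypothetical: `sketch::vec` stands in for `basic_vec`, and `storage_access` is an invented name for the single befriended helper. The point is only that the storage, the tag, and the tag constructor stay private while a free function can still build its result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch
{
struct storage_access; // hypothetical internal accessor, the only friend of vec

// Hypothetical stand-in for basic_vec; not the libcudacxx type.
template <typename T, std::size_t N>
class vec
{
public:
  explicit vec(T value) { storage_.fill(value); }
  T operator[](std::size_t i) const { return storage_[i]; }

private:
  friend struct storage_access;

  // Implementation details stay private.
  struct storage_tag {};
  vec(storage_tag, const std::array<T, N>& storage) : storage_(storage) {}

  std::array<T, N> storage_{};
};

// The accessor exposes exactly two operations to extension helpers.
struct storage_access
{
  template <typename T, std::size_t N>
  static const std::array<T, N>& get(const vec<T, N>& v)
  {
    return v.storage_;
  }

  template <typename T, std::size_t N>
  static vec<T, N> make(const std::array<T, N>& storage)
  {
    return vec<T, N>(typename vec<T, N>::storage_tag{}, storage);
  }
};

// A free function reaches the storage only through the accessor.
// This sketch is meant for unsigned lanes, where the subtraction cannot overflow.
template <typename T, std::size_t N>
vec<T, N> abs_diff(const vec<T, N>& lhs, const vec<T, N>& rhs)
{
  const auto& a = storage_access::get(lhs);
  const auto& b = storage_access::get(rhs);
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i)
  {
    out[i] = static_cast<T>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
  }
  return storage_access::make(out);
}
} // namespace sketch
```

With this layout, user code cannot name `storage_tag` or touch `storage_`, so the public surface of the vector type does not grow.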

🧹 Nitpick comments (1)

libcudacxx/include/cuda/std/__simd/specializations/fixed_size_storage.h (1)
44-45: ⚡ Quick win

suggestion: use ::cuda::std::size_t instead of unqualified size_t in __simd_storage_alignment_v so this header does not rely on namespace leakage and stays consistent with libcudacxx qualification rules.
-template <typename _Tp, __simd_size_type _Np, size_t _TotalAlignment = alignof(_Tp) * _Np>
-inline constexpr size_t __simd_storage_alignment_v = ::cuda::std::max(alignof(_Tp), size_t{8});
+template <typename _Tp, __simd_size_type _Np, ::cuda::std::size_t _TotalAlignment = alignof(_Tp) * _Np>
+inline constexpr ::cuda::std::size_t __simd_storage_alignment_v =
+  ::cuda::std::max(alignof(_Tp), ::cuda::std::size_t{8});
As per coding guidelines, standard integer type aliases must be fully qualified, e.g. ::cuda::std::size_t.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 216e654a-96f6-4f33-a243-f9f33c85cd63

📥 Commits

Reviewing files that changed from the base of the PR and between 16d667a and 554b016.

📒 Files selected for processing (46)

docs/libcudacxx/extended_api.rst
docs/libcudacxx/extended_api/simd.rst
docs/libcudacxx/extended_api/simd/abs_diff.rst
docs/libcudacxx/extended_api/simd/saturating_add.rst
libcudacxx/include/cuda/__simd/saturating_add.h
libcudacxx/include/cuda/__simd/simd_intrinsics.h
libcudacxx/include/cuda/__simd/simd_intrinsics_array.h
libcudacxx/include/cuda/__simd/vabsdiff.h
libcudacxx/include/cuda/simd
libcudacxx/include/cuda/std/__fwd/simd.h
libcudacxx/include/cuda/std/__internal/features.h
libcudacxx/include/cuda/std/__internal/namespaces.h
libcudacxx/include/cuda/std/__simd/basic_vec.h
libcudacxx/include/cuda/std/__simd/specializations/fixed_size_integral_vec.h
libcudacxx/include/cuda/std/__simd/specializations/fixed_size_storage.h
libcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics.h
libcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics_array.h
libcudacxx/include/cuda/std/__simd/type_traits.h
libcudacxx/test/libcudacxx/std/numerics/simd/simd.non_std/saturation_add.pass.cpp
libcudacxx/test/libcudacxx/std/numerics/simd/simd.non_std/vabsdiff.pass.cpp
libcudacxx/test/libcudacxx/std/numerics/simd/simd.traits/alignment.pass.cpp
libcudacxx/test/simd_codegen/CMakeLists.txt
libcudacxx/test/simd_codegen/floating_point/decrement_f32x2.cu
libcudacxx/test/simd_codegen/floating_point/fma_bf16.cu
libcudacxx/test/simd_codegen/floating_point/fma_f16.cu
libcudacxx/test/simd_codegen/floating_point/increment_f32x2.cu
libcudacxx/test/simd_codegen/floating_point/less_bf16.cu
libcudacxx/test/simd_codegen/floating_point/less_f16.cu
libcudacxx/test/simd_codegen/floating_point/minus_f32x2.cu
libcudacxx/test/simd_codegen/floating_point/multiplies_bf16.cu
libcudacxx/test/simd_codegen/floating_point/multiplies_f16.cu
libcudacxx/test/simd_codegen/floating_point/plus_bf16.cu
libcudacxx/test/simd_codegen/floating_point/plus_f16.cu
libcudacxx/test/simd_codegen/floating_point/plus_f32x2.cu
libcudacxx/test/simd_codegen/floating_point/unary_minus_f32x2.cu
libcudacxx/test/simd_codegen/fma_bf16.cu
libcudacxx/test/simd_codegen/fma_f16.cu
libcudacxx/test/simd_codegen/integer/arithmetic_u16x2.cu
libcudacxx/test/simd_codegen/integer/arithmetic_u8x4.cu
libcudacxx/test/simd_codegen/integer/bitwise_u16x2_u8x4.cu
libcudacxx/test/simd_codegen/minus_f32x2.cu
libcudacxx/test/simd_codegen/multiplies_bf16.cu
libcudacxx/test/simd_codegen/plus_bf16.cu
libcudacxx/test/simd_codegen/plus_f32x2.cu
libcudacxx/test/simd_codegen/saturation_add/saturating_add.cu
libcudacxx/test/simd_codegen/vabsdiff/vabsdiff.cu

💤 Files with no reviewable changes (6)

libcudacxx/test/simd_codegen/minus_f32x2.cu
libcudacxx/test/simd_codegen/fma_bf16.cu
libcudacxx/test/simd_codegen/plus_f32x2.cu
libcudacxx/test/simd_codegen/plus_bf16.cu
libcudacxx/test/simd_codegen/multiplies_bf16.cu
libcudacxx/test/simd_codegen/fma_f16.cu

coderabbitai · 2026-05-14T23:49:39Z

+For each element ``i`` in the input vectors, the result is equivalent to:
+
+.. code:: cuda
+
+   abs(lhs[i] - rhs[i])
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In C++, can abs(a - b) on signed integers be undefined behavior when a - b overflows (for example with INT_MIN and INT_MAX)?

💡 Result:

Yes, abs(a - b) on signed integers can invoke undefined behavior when a - b overflows. Signed integer subtraction that produces a result outside the representable range of the type (e.g., INT_MIN - INT_MAX, which mathematically is -(INT_MAX + |INT_MIN|), exceeding INT_MIN in magnitude) causes undefined behavior due to signed integer overflow [1][2][3][4]. Even if the subtraction does not overflow, passing the result to std::abs where the absolute value cannot be represented (e.g., abs(INT_MIN)) also causes undefined behavior [5]. In the example with INT_MIN and INT_MAX, a=INT_MIN, b=INT_MAX leads to overflow in a - b, making the entire expression undefined before abs is even applied [6].

Citations:

1: https://en.cppreference.com/cpp/language/operator_arithmetic

2: https://stackoverflow.com/questions/16188263/is-signed-integer-overflow-still-undefined-behavior-in-c

3: https://timsong-cpp.github.io/cppwp/std20/basic.fundamental

4: https://en.cppreference.com/w/cpp/language/operator_arithmetic.html

5: https://cppreference.dev/w/cpp/numeric/math/abs

6: https://stackoverflow.com/questions/36342403/why-int-max-minus-int-min-equals-1

important: Line 26 documents the operation as abs(lhs[i] - rhs[i]); for signed integer lanes this expression triggers undefined behavior when the subtraction overflows at extreme values (e.g., INT_MIN − INT_MAX). Reword this as a mathematical absolute difference or show the unsigned-domain formulation to avoid misleading users into UB-prone scalar code.
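
A minimal sketch of the unsigned-domain formulation mentioned above, for a single `int` lane. The name `abs_diff_reference` is hypothetical, and returning `unsigned` is a choice made for this example so that every result is representable. It says nothing about the return type the PR actually uses.

```cpp
#include <cassert>
#include <climits>

// Hypothetical scalar reference (not CCCL API): overflow-free |a - b| for int.
constexpr unsigned abs_diff_reference(int a, int b)
{
  // Converting to unsigned is well-defined (modular), and unsigned
  // subtraction wraps instead of overflowing.
  const unsigned ua = static_cast<unsigned>(a);
  const unsigned ub = static_cast<unsigned>(b);
  // Compare in the signed domain, subtract in the unsigned domain.
  return (a > b) ? (ua - ub) : (ub - ua);
}

// abs(INT_MIN - INT_MAX) would be undefined behavior. This form is not.
static_assert(abs_diff_reference(INT_MIN, INT_MAX) == UINT_MAX, "extreme operands");
static_assert(abs_diff_reference(-5, 3) == 8u, "mixed signs");
```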

miscco · 2026-05-19T06:27:35Z

Question: The title talks about abs_diff, but this PR also adds saturating_add. Why is that in this PR, and only saturating_add?

miscco · 2026-05-19T06:28:57Z

+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _CUDA___SIMD_SIMD_INTRINSICS_ARRAY_H


This file has a very confusing name

github-actions · 2026-05-20T04:01:33Z

😬 CI Workflow Results

🟥 Finished in 5h 05m: Pass: 8%/116 | Total: 17h 59m | Max: 30m 18s | Hits: 97%/3504

See results here.

fbusato self-assigned this May 14, 2026

fbusato requested review from a team as code owners May 14, 2026 23:39

fbusato added this to CCCL May 14, 2026

fbusato requested a review from a team as a code owner May 14, 2026 23:39

fbusato requested a review from alliepiper May 14, 2026 23:39

fbusato added the libcu++ For all items related to libcu++ label May 14, 2026

fbusato requested a review from pciolkosz May 14, 2026 23:39

github-project-automation Bot moved this to Todo in CCCL May 14, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 14, 2026

coderabbitai Bot reviewed May 14, 2026

Jacobfaib reviewed May 18, 2026

fbusato mentioned this pull request May 18, 2026

cuda::simd Integer dot product #9064

Open

miscco reviewed May 19, 2026

fbusato added 3 commits May 19, 2026 15:40

cuda::std::simd Optimize small integer operations

82dd526

cuda::simd Add saturation_add

cbb03d9

cuda::simd Add abs_diff

f690b2e

fbusato force-pushed the simd-cuda-vabsdiff branch from ed4cb30 to f690b2e Compare May 19, 2026 22:45

3 participants
