cuda::simd Add abs_diff #8994

Open
fbusato wants to merge 3 commits into NVIDIA:main from fbusato:simd-cuda-vabsdiff
Conversation

@fbusato
Contributor

@fbusato fbusato commented May 14, 2026

Description

Introduce SIMD abs_diff, which computes the vectorized absolute difference and uses VABSDIFF.
The PR includes the implementation, unit test, documentation, and codegen checks.
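
As a rough illustration of what a lane-wise absolute difference means, here is a minimal host-side sketch. It assumes four unsigned 8-bit lanes packed into one 32-bit word; the PR text does not state the lane layout or the exact instruction variant, and `abs_diff_u8x4_reference` is a hypothetical helper, not part of the PR or of libcu++.

```cpp
#include <cstdint>

// Hypothetical reference model (not the PR's implementation): lane-wise |a - b|
// over four unsigned 8-bit lanes packed into a single 32-bit word.
constexpr std::uint32_t abs_diff_u8x4_reference(std::uint32_t a, std::uint32_t b) noexcept
{
  std::uint32_t result = 0;
  for (int lane = 0; lane < 4; ++lane)
  {
    // Extract the current 8-bit lane from each operand.
    const std::uint32_t x = (a >> (8 * lane)) & 0xFFu;
    const std::uint32_t y = (b >> (8 * lane)) & 0xFFu;
    // Subtract the smaller from the larger so the result never wraps.
    const std::uint32_t d = x > y ? x - y : y - x;
    result |= d << (8 * lane);
  }
  return result;
}

// Lanes (low to high): |0x05-0x05|, |0x00-0x03|, |0xFF-0x01|, |0x10-0x20|.
static_assert(abs_diff_u8x4_reference(0x10FF0005u, 0x20010305u) == 0x10FE0300u, "");
```

A scalar model like this could serve as the oracle in a unit test that compares against the vectorized result.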

Requires:

@fbusato fbusato self-assigned this May 14, 2026
@fbusato fbusato requested review from a team as code owners May 14, 2026 23:39
@fbusato fbusato added this to CCCL May 14, 2026
@fbusato fbusato requested a review from a team as a code owner May 14, 2026 23:39
@fbusato fbusato requested a review from alliepiper May 14, 2026 23:39
@fbusato fbusato added the libcu++ For all items related to libcu++ label May 14, 2026
@fbusato fbusato requested a review from pciolkosz May 14, 2026 23:39
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 14, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 14, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 14, 2026

Warning

Rate limit exceeded

@fbusato has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 46 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9717d3a4-f448-4ed8-90b5-efff84021257

📥 Commits

Reviewing files that changed from the base of the PR and between ed4cb30 and f690b2e.

📒 Files selected for processing (18)
  • docs/libcudacxx/extended_api.rst
  • docs/libcudacxx/extended_api/simd.rst
  • docs/libcudacxx/extended_api/simd/abs_diff.rst
  • docs/libcudacxx/extended_api/simd/saturating_add.rst
  • libcudacxx/include/cuda/__simd/saturating_add.h
  • libcudacxx/include/cuda/__simd/simd_intrinsics.h
  • libcudacxx/include/cuda/__simd/simd_intrinsics_array.h
  • libcudacxx/include/cuda/__simd/vabsdiff.h
  • libcudacxx/include/cuda/simd
  • libcudacxx/include/cuda/std/__fwd/simd.h
  • libcudacxx/include/cuda/std/__internal/features.h
  • libcudacxx/include/cuda/std/__internal/namespaces.h
  • libcudacxx/include/cuda/std/__simd/basic_vec.h
  • libcudacxx/include/cuda/std/__simd/specializations/fixed_size_integral_vec.h
  • libcudacxx/include/cuda/std/__simd/specializations/fixed_size_storage.h
  • libcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics.h
  • libcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics_array.h
  • libcudacxx/include/cuda/std/__simd/type_traits.h
📝 Walkthrough

Walkthrough

Adds cuda::simd::saturating_add and cuda::simd::abs_diff with device intrinsics, std::simd fixed-size integral specializations, feature macros and namespace helpers, storage/alignment updates, public CUDA headers, docs, CMake test infra changes, and many new codegen and unit tests.

Changes

SIMD saturating add and absolute difference

Layer / File(s): Combined SIMD feature, intrinsics, std specializations, headers, docs, and tests (multiple files across include/, docs/, test/)
Summary: Implements feature macros and namespace helpers; adds device intrinsics and array helpers; provides saturating_add and abs_diff public headers and <cuda/simd> aggregator; adds std::simd fixed-size integral specialization, storage alignment and pointer-alignment adjustments; updates CMake simd_codegen logic; and adds unit/codegen tests and docs pages.

Suggested reviewers

  • alliepiper
  • pciolkosz
  • ericniebler

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
libcudacxx/include/cuda/std/__simd/basic_vec.h (1)

69-102: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Restore a private boundary before the storage internals. __s_, __storage_tag, the storage-tag constructor, __make_mask, and __set are implementation details; making them public expands the basic_vec API surface just to let the new free functions reach inside the type. Please keep these private and add a narrow hidden-friend/internal accessor for the SIMD extension helpers instead. As per coding guidelines, libcudacxx reviews should focus on ABI/API stability.
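
One way the suggested narrow accessor could look is sketched below. This is a simplified stand-in, not the actual `basic_vec`: the names `basic_vec_like`, `internal_storage`, and `add_one` are hypothetical, and `std::array` replaces the real storage type.

```cpp
#include <array>
#include <cstddef>

namespace sketch
{
// Simplified stand-in for a SIMD vector type; the storage stays private.
template <typename T, std::size_t N>
class basic_vec_like
{
public:
  basic_vec_like() = default;
  explicit basic_vec_like(T value)
  {
    storage_.fill(value);
  }
  T operator[](std::size_t i) const
  {
    return storage_[i];
  }

private:
  std::array<T, N> storage_{};

  // Hidden friends: declared inside the class, so they are found only through
  // argument-dependent lookup and do not become members of the public API.
  friend std::array<T, N>& internal_storage(basic_vec_like& v) noexcept
  {
    return v.storage_;
  }
  friend const std::array<T, N>& internal_storage(const basic_vec_like& v) noexcept
  {
    return v.storage_;
  }
};

// Hypothetical extension helper that reaches the storage through the accessor
// instead of relying on public data members.
template <typename T, std::size_t N>
basic_vec_like<T, N> add_one(basic_vec_like<T, N> v)
{
  for (auto& lane : internal_storage(v))
  {
    lane = static_cast<T>(lane + 1);
  }
  return v;
}
} // namespace sketch
```

The trade-off is that every internal helper goes through one reviewed entry point, while member names such as the storage field remain free to change without affecting user code.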

🧹 Nitpick comments (1)
libcudacxx/include/cuda/std/__simd/specializations/fixed_size_storage.h (1)

44-45: ⚡ Quick win

suggestion: use ::cuda::std::size_t instead of unqualified size_t in __simd_storage_alignment_v so this header does not rely on namespace leakage and stays consistent with libcudacxx qualification rules.

-template <typename _Tp, __simd_size_type _Np, size_t _TotalAlignment = alignof(_Tp) * _Np>
-inline constexpr size_t __simd_storage_alignment_v = ::cuda::std::max(alignof(_Tp), size_t{8});
+template <typename _Tp, __simd_size_type _Np, ::cuda::std::size_t _TotalAlignment = alignof(_Tp) * _Np>
+inline constexpr ::cuda::std::size_t __simd_storage_alignment_v =
+  ::cuda::std::max(alignof(_Tp), ::cuda::std::size_t{8});

As per coding guidelines, standard integer type aliases must be fully qualified, e.g. ::cuda::std::size_t.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 216e654a-96f6-4f33-a243-f9f33c85cd63

📥 Commits

Reviewing files that changed from the base of the PR and between 16d667a and 554b016.

📒 Files selected for processing (46)
  • docs/libcudacxx/extended_api.rst
  • docs/libcudacxx/extended_api/simd.rst
  • docs/libcudacxx/extended_api/simd/abs_diff.rst
  • docs/libcudacxx/extended_api/simd/saturating_add.rst
  • libcudacxx/include/cuda/__simd/saturating_add.h
  • libcudacxx/include/cuda/__simd/simd_intrinsics.h
  • libcudacxx/include/cuda/__simd/simd_intrinsics_array.h
  • libcudacxx/include/cuda/__simd/vabsdiff.h
  • libcudacxx/include/cuda/simd
  • libcudacxx/include/cuda/std/__fwd/simd.h
  • libcudacxx/include/cuda/std/__internal/features.h
  • libcudacxx/include/cuda/std/__internal/namespaces.h
  • libcudacxx/include/cuda/std/__simd/basic_vec.h
  • libcudacxx/include/cuda/std/__simd/specializations/fixed_size_integral_vec.h
  • libcudacxx/include/cuda/std/__simd/specializations/fixed_size_storage.h
  • libcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics.h
  • libcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics_array.h
  • libcudacxx/include/cuda/std/__simd/type_traits.h
  • libcudacxx/test/libcudacxx/std/numerics/simd/simd.non_std/saturation_add.pass.cpp
  • libcudacxx/test/libcudacxx/std/numerics/simd/simd.non_std/vabsdiff.pass.cpp
  • libcudacxx/test/libcudacxx/std/numerics/simd/simd.traits/alignment.pass.cpp
  • libcudacxx/test/simd_codegen/CMakeLists.txt
  • libcudacxx/test/simd_codegen/floating_point/decrement_f32x2.cu
  • libcudacxx/test/simd_codegen/floating_point/fma_bf16.cu
  • libcudacxx/test/simd_codegen/floating_point/fma_f16.cu
  • libcudacxx/test/simd_codegen/floating_point/increment_f32x2.cu
  • libcudacxx/test/simd_codegen/floating_point/less_bf16.cu
  • libcudacxx/test/simd_codegen/floating_point/less_f16.cu
  • libcudacxx/test/simd_codegen/floating_point/minus_f32x2.cu
  • libcudacxx/test/simd_codegen/floating_point/multiplies_bf16.cu
  • libcudacxx/test/simd_codegen/floating_point/multiplies_f16.cu
  • libcudacxx/test/simd_codegen/floating_point/plus_bf16.cu
  • libcudacxx/test/simd_codegen/floating_point/plus_f16.cu
  • libcudacxx/test/simd_codegen/floating_point/plus_f32x2.cu
  • libcudacxx/test/simd_codegen/floating_point/unary_minus_f32x2.cu
  • libcudacxx/test/simd_codegen/fma_bf16.cu
  • libcudacxx/test/simd_codegen/fma_f16.cu
  • libcudacxx/test/simd_codegen/integer/arithmetic_u16x2.cu
  • libcudacxx/test/simd_codegen/integer/arithmetic_u8x4.cu
  • libcudacxx/test/simd_codegen/integer/bitwise_u16x2_u8x4.cu
  • libcudacxx/test/simd_codegen/minus_f32x2.cu
  • libcudacxx/test/simd_codegen/multiplies_bf16.cu
  • libcudacxx/test/simd_codegen/plus_bf16.cu
  • libcudacxx/test/simd_codegen/plus_f32x2.cu
  • libcudacxx/test/simd_codegen/saturation_add/saturating_add.cu
  • libcudacxx/test/simd_codegen/vabsdiff/vabsdiff.cu
💤 Files with no reviewable changes (6)
  • libcudacxx/test/simd_codegen/minus_f32x2.cu
  • libcudacxx/test/simd_codegen/fma_bf16.cu
  • libcudacxx/test/simd_codegen/plus_f32x2.cu
  • libcudacxx/test/simd_codegen/plus_bf16.cu
  • libcudacxx/test/simd_codegen/multiplies_bf16.cu
  • libcudacxx/test/simd_codegen/fma_f16.cu

Comment on lines +22 to +27
For each element ``i`` in the input vectors, the result is equivalent to:

.. code:: cuda

abs(lhs[i] - rhs[i])

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In C++, can abs(a - b) on signed integers be undefined behavior when a - b overflows (for example with INT_MIN and INT_MAX)?

💡 Result:

Yes, abs(a - b) on signed integers can invoke undefined behavior when a - b overflows. Signed integer subtraction that produces a result outside the representable range of the type (e.g., INT_MIN - INT_MAX, which mathematically is -(INT_MAX + |INT_MIN|), exceeding INT_MIN in magnitude) causes undefined behavior due to signed integer overflow [1][2][3][4]. Even if the subtraction does not overflow, passing the result to std::abs where the absolute value cannot be represented (e.g., abs(INT_MIN)) also causes undefined behavior [5]. In the example with INT_MIN and INT_MAX, a=INT_MIN, b=INT_MAX leads to overflow in a - b, making the entire expression undefined before abs is even applied [6].

Citations:


important: Line 26 documents the operation as abs(lhs[i] - rhs[i]); for signed integer lanes this expression triggers undefined behavior when the subtraction overflows at extreme values (e.g., INT_MIN − INT_MAX). Reword this as a mathematical absolute difference or show the unsigned-domain formulation to avoid misleading users into UB-prone scalar code.
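
The unsigned-domain formulation mentioned above could be sketched for a single scalar lane as follows. `abs_diff_scalar` is a hypothetical name, and returning the unsigned counterpart of the input type is an assumption of this sketch; the PR may define the result type differently.

```cpp
#include <climits>
#include <type_traits>

// Hypothetical scalar reference: absolute difference without signed overflow.
// Both operands are converted to the unsigned counterpart first; unsigned
// subtraction wraps modulo 2^N, so the result is well defined for all inputs.
template <typename T>
constexpr std::make_unsigned_t<T> abs_diff_scalar(T a, T b) noexcept
{
  static_assert(std::is_integral<T>::value, "integral lanes only");
  using U = std::make_unsigned_t<T>;
  // Compare in the original (possibly signed) domain, subtract in the unsigned one.
  return a > b ? static_cast<U>(static_cast<U>(a) - static_cast<U>(b))
               : static_cast<U>(static_cast<U>(b) - static_cast<U>(a));
}

// INT_MAX - INT_MIN does not fit in int, but it does fit in unsigned int.
static_assert(abs_diff_scalar(INT_MIN, INT_MAX) == UINT_MAX, "");
static_assert(abs_diff_scalar(-3, 4) == 7u, "");
```

Documenting the operation in this form, or simply as the mathematical |lhs[i] - rhs[i]|, avoids suggesting that `abs(lhs[i] - rhs[i])` is a safe scalar equivalent.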

Comment thread libcudacxx/include/cuda/__simd/vabsdiff.h
Comment thread libcudacxx/include/cuda/std/__simd/type_traits.h Outdated
Comment thread libcudacxx/include/cuda/std/__internal/features.h Outdated
Comment thread libcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics.h Outdated
Comment thread libcudacxx/include/cuda/std/__simd/type_traits.h Outdated
Question: The title talks about abs_diff, but this also adds saturating_add. Why is that in this PR, and why only saturating_add?

//
//===----------------------------------------------------------------------===//

#ifndef _CUDA___SIMD_SIMD_INTRINSICS_ARRAY_H
This file has a very confusing name

@fbusato fbusato force-pushed the simd-cuda-vabsdiff branch from ed4cb30 to f690b2e Compare May 19, 2026 22:45
@github-actions
Contributor

😬 CI Workflow Results

🟥 Finished in 5h 05m: Pass: 8%/116 | Total: 17h 59m | Max: 30m 18s | Hits: 97%/3504

See results here.

Labels

libcu++ For all items related to libcu++

Projects

Status: In Review

3 participants