cuda::simd Add abs_diff#8994
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (18)
📝 WalkthroughWalkthroughAdds cuda::simd::saturating_add and cuda::simd::abs_diff with device intrinsics, std::simd fixed-size integral specializations, feature macros and namespace helpers, storage/alignment updates, public CUDA headers, docs, CMake test infra changes, and many new codegen and unit tests. ChangesSIMD saturating add and absolute difference
suggestion: WalkthroughAdds cuda::simd::saturating_add and cuda::simd::abs_diff with device intrinsics, std::simd fixed-size integral specializations, feature macros and namespace helpers, storage/alignment updates, public CUDA headers, docs, CMake test infra changes, and many new codegen and unit tests. ChangesSIMD saturating add and absolute difference
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
libcudacxx/include/cuda/std/__simd/basic_vec.h (1)
69-102:⚠️ Potential issue | 🟠 Major | ⚡ Quick winimportant: Restore a private boundary before the storage internals.
__s_,__storage_tag, the storage-tag constructor,__make_mask, and__setare implementation details; making them public expands thebasic_vecAPI surface just to let the new free functions reach inside the type. Please keep these private and add a narrow hidden-friend/internal accessor for the SIMD extension helpers instead. As per coding guidelines, libcudacxx reviews should focus on ABI/API stability.
🧹 Nitpick comments (1)
libcudacxx/include/cuda/std/__simd/specializations/fixed_size_storage.h (1)
44-45: ⚡ Quick winsuggestion: use
::cuda::std::size_tinstead of unqualifiedsize_tin__simd_storage_alignment_vso this header does not rely on namespace leakage and stays consistent with libcudacxx qualification rules.-template <typename _Tp, __simd_size_type _Np, size_t _TotalAlignment = alignof(_Tp) * _Np> -inline constexpr size_t __simd_storage_alignment_v = ::cuda::std::max(alignof(_Tp), size_t{8}); +template <typename _Tp, __simd_size_type _Np, ::cuda::std::size_t _TotalAlignment = alignof(_Tp) * _Np> +inline constexpr ::cuda::std::size_t __simd_storage_alignment_v = + ::cuda::std::max(alignof(_Tp), ::cuda::std::size_t{8});As per coding guidelines, standard integer type aliases must be fully qualified, e.g.
::cuda::std::size_t.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 216e654a-96f6-4f33-a243-f9f33c85cd63
📒 Files selected for processing (46)
docs/libcudacxx/extended_api.rstdocs/libcudacxx/extended_api/simd.rstdocs/libcudacxx/extended_api/simd/abs_diff.rstdocs/libcudacxx/extended_api/simd/saturating_add.rstlibcudacxx/include/cuda/__simd/saturating_add.hlibcudacxx/include/cuda/__simd/simd_intrinsics.hlibcudacxx/include/cuda/__simd/simd_intrinsics_array.hlibcudacxx/include/cuda/__simd/vabsdiff.hlibcudacxx/include/cuda/simdlibcudacxx/include/cuda/std/__fwd/simd.hlibcudacxx/include/cuda/std/__internal/features.hlibcudacxx/include/cuda/std/__internal/namespaces.hlibcudacxx/include/cuda/std/__simd/basic_vec.hlibcudacxx/include/cuda/std/__simd/specializations/fixed_size_integral_vec.hlibcudacxx/include/cuda/std/__simd/specializations/fixed_size_storage.hlibcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics.hlibcudacxx/include/cuda/std/__simd/specializations/simd_intrinsics_array.hlibcudacxx/include/cuda/std/__simd/type_traits.hlibcudacxx/test/libcudacxx/std/numerics/simd/simd.non_std/saturation_add.pass.cpplibcudacxx/test/libcudacxx/std/numerics/simd/simd.non_std/vabsdiff.pass.cpplibcudacxx/test/libcudacxx/std/numerics/simd/simd.traits/alignment.pass.cpplibcudacxx/test/simd_codegen/CMakeLists.txtlibcudacxx/test/simd_codegen/floating_point/decrement_f32x2.culibcudacxx/test/simd_codegen/floating_point/fma_bf16.culibcudacxx/test/simd_codegen/floating_point/fma_f16.culibcudacxx/test/simd_codegen/floating_point/increment_f32x2.culibcudacxx/test/simd_codegen/floating_point/less_bf16.culibcudacxx/test/simd_codegen/floating_point/less_f16.culibcudacxx/test/simd_codegen/floating_point/minus_f32x2.culibcudacxx/test/simd_codegen/floating_point/multiplies_bf16.culibcudacxx/test/simd_codegen/floating_point/multiplies_f16.culibcudacxx/test/simd_codegen/floating_point/plus_bf16.culibcudacxx/test/simd_codegen/floating_point/plus_f16.culibcudacxx/test/simd_codegen/floating_point/plus_f32x2.culibcudacxx/test/simd_codegen/floating_point/unary_minus_f32x2.culibcudacxx/test/simd_codegen/fma_bf16.culibcudacxx/test/simd_codegen/fma_f16.culibcudacxx/test/simd_codegen/integer/arithmetic_u16x2.culibcudacxx/test/simd_codegen/integer/arithmetic_u8x4.culibcudacxx/test/simd_codegen/integer/bitwise_u16x2_u8x4.culibcudacxx/test/simd_codegen/minus_f32x2.culibcudacxx/test/simd_codegen/multiplies_bf16.culibcudacxx/test/simd_codegen/plus_bf16.culibcudacxx/test/simd_codegen/plus_f32x2.culibcudacxx/test/simd_codegen/saturation_add/saturating_add.culibcudacxx/test/simd_codegen/vabsdiff/vabsdiff.cu
💤 Files with no reviewable changes (6)
- libcudacxx/test/simd_codegen/minus_f32x2.cu
- libcudacxx/test/simd_codegen/fma_bf16.cu
- libcudacxx/test/simd_codegen/plus_f32x2.cu
- libcudacxx/test/simd_codegen/plus_bf16.cu
- libcudacxx/test/simd_codegen/multiplies_bf16.cu
- libcudacxx/test/simd_codegen/fma_f16.cu
| For each element ``i`` in the input vectors, the result is equivalent to: | ||
|
|
||
| .. code:: cuda | ||
|
|
||
| abs(lhs[i] - rhs[i]) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In C++, can abs(a - b)on signed integers be undefined behavior whena - b overflows (for example with INT_MIN and INT_MAX)?
💡 Result:
Yes, abs(a - b) on signed integers can invoke undefined behavior when a - b overflows. Signed integer subtraction that produces a result outside the representable range of the type (e.g., INT_MIN - INT_MAX, which mathematically is -(INT_MAX + |INT_MIN|), exceeding INT_MIN in magnitude) causes undefined behavior due to signed integer overflow [1][2][3][4]. Even if the subtraction does not overflow, passing the result to std::abs where the absolute value cannot be represented (e.g., abs(INT_MIN)) also causes undefined behavior [5]. In the example with INT_MIN and INT_MAX, a=INT_MIN, b=INT_MAX leads to overflow in a - b, making the entire expression undefined before abs is even applied [6].
Citations:
- 1: https://en.cppreference.com/cpp/language/operator_arithmetic
- 2: https://stackoverflow.com/questions/16188263/is-signed-integer-overflow-still-undefined-behavior-in-c
- 3: https://timsong-cpp.github.io/cppwp/std20/basic.fundamental
- 4: https://en.cppreference.com/w/cpp/language/operator_arithmetic.html
- 5: https://cppreference.dev/w/cpp/numeric/math/abs
- 6: https://stackoverflow.com/questions/36342403/why-int-max-minus-int-min-equals-1
important: Line 26 documents the operation as abs(lhs[i] - rhs[i]); for signed integer lanes this expression triggers undefined behavior when the subtraction overflows at extreme values (e.g., INT_MIN − INT_MAX). Reword this as a mathematical absolute difference or show the unsigned-domain formulation to avoid misleading users into UB-prone scalar code.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Question: The title talks about abs_diff but this also adds saturating_add. Why is that in this PR and only saturating_add
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #ifndef _CUDA___SIMD_SIMD_INTRINSICS_ARRAY_H |
There was a problem hiding this comment.
This file has a very confusing name
ed4cb30 to
f690b2e
Compare
😬 CI Workflow Results🟥 Finished in 5h 05m: Pass: 8%/116 | Total: 17h 59m | Max: 30m 18s | Hits: 97%/3504See results here. |
Description
Introduce SIMD
abs_diffto compute the vectorize absolute difference and useVABSDIFF.The PR includes the implementation, unit test, documentation, and codegen checks.
Requires:
cuda::simdAddsaturation_add#8991