-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Clang] Add __builtin_selectvector and use it for AVX512 intrinsics #91306
base: main
Are you sure you want to change the base?
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
25ab7ac
to
5053a4e
Compare
5053a4e
to
de35512
Compare
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen Author: Nikolas Klauser (philnik777) ChangesThis is also very useful for generic code. For example this would allow libc++ to vectorize Patch is 962.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91306.diff 32 Files Affected:
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 96691b45d63a3..6513676438ffb 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -3019,6 +3019,26 @@ C-style cast applied to each element of the first argument.
Query for this feature with ``__has_builtin(__builtin_convertvector)``.
+``__builtin_selectvector``
+--------------------------
+
+``__builtin_selectvector`` is used to express generic vector element selection.
+
+**Signature**:
+
+.. code-block:: c++
+
+ template <class T, size_t N>
+ simd_vec<T, N> __builtin_selectvector(simd_vec<T, N> lhs, simd_vec<T, N> rhs,
+ simd_vec<bool, N> cond)
+
+**Description**:
+
+The returned vector is equivalent to
+``simd_vec<T, N>{cond[0] ? rhs[0] : lhs[0], ..., cond[N - 1] ? rhs[N - 1] : lhs[N - 1]}``.
+
+Query for this feature with ``__has_builtin(__builtin_selectvector)``.
+
``__builtin_bitreverse``
------------------------
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index d6ceb450bd106..279330d9b5251 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -1176,6 +1176,12 @@ def ConvertVector : Builtin {
let Prototype = "void(...)";
}
+def SelectVector : Builtin {
+ let Spellings = ["__builtin_selectvector"];
+ let Attributes = [NoThrow, Const, CustomTypeChecking];
+ let Prototype = "void(...)";
+}
+
def AllocaUninitialized : Builtin {
let Spellings = ["__builtin_alloca_uninitialized"];
let Attributes = [FunctionWithBuiltinPrefix, NoThrow];
diff --git a/clang/include/clang/Basic/BuiltinsX86.def b/clang/include/clang/Basic/BuiltinsX86.def
index eafcc219c1096..2e099b3ab4f05 100644
--- a/clang/include/clang/Basic/BuiltinsX86.def
+++ b/clang/include/clang/Basic/BuiltinsX86.def
@@ -1973,30 +1973,6 @@ TARGET_BUILTIN(__builtin_ia32_vfcmulcph256_mask, "V8fV8fV8fV8fUc", "ncV:256:",
TARGET_BUILTIN(__builtin_ia32_vfcmulcph512_mask, "V16fV16fV16fV16fUsIi", "ncV:512:", "avx512fp16,evex512")
// generic select intrinsics
-TARGET_BUILTIN(__builtin_ia32_selectb_128, "V16cUsV16cV16c", "ncV:128:", "avx512bw,avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectb_256, "V32cUiV32cV32c", "ncV:256:", "avx512bw,avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectb_512, "V64cUOiV64cV64c", "ncV:512:", "avx512bw,evex512")
-TARGET_BUILTIN(__builtin_ia32_selectw_128, "V8sUcV8sV8s", "ncV:128:", "avx512bw,avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectw_256, "V16sUsV16sV16s", "ncV:256:", "avx512bw,avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectw_512, "V32sUiV32sV32s", "ncV:512:", "avx512bw,evex512")
-TARGET_BUILTIN(__builtin_ia32_selectd_128, "V4iUcV4iV4i", "ncV:128:", "avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectd_256, "V8iUcV8iV8i", "ncV:256:", "avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectd_512, "V16iUsV16iV16i", "ncV:512:", "avx512f,evex512")
-TARGET_BUILTIN(__builtin_ia32_selectph_128, "V8xUcV8xV8x", "ncV:128:", "avx512fp16,avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectph_256, "V16xUsV16xV16x", "ncV:256:", "avx512fp16,avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectph_512, "V32xUiV32xV32x", "ncV:512:", "avx512fp16,evex512")
-TARGET_BUILTIN(__builtin_ia32_selectpbf_128, "V8yUcV8yV8y", "ncV:128:", "avx512bf16,avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectpbf_256, "V16yUsV16yV16y", "ncV:256:", "avx512bf16,avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectpbf_512, "V32yUiV32yV32y", "ncV:512:", "avx512bf16,evex512")
-TARGET_BUILTIN(__builtin_ia32_selectq_128, "V2OiUcV2OiV2Oi", "ncV:128:", "avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectq_256, "V4OiUcV4OiV4Oi", "ncV:256:", "avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectq_512, "V8OiUcV8OiV8Oi", "ncV:512:", "avx512f,evex512")
-TARGET_BUILTIN(__builtin_ia32_selectps_128, "V4fUcV4fV4f", "ncV:128:", "avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectps_256, "V8fUcV8fV8f", "ncV:256:", "avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectps_512, "V16fUsV16fV16f", "ncV:512:", "avx512f,evex512")
-TARGET_BUILTIN(__builtin_ia32_selectpd_128, "V2dUcV2dV2d", "ncV:128:", "avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectpd_256, "V4dUcV4dV4d", "ncV:256:", "avx512vl")
-TARGET_BUILTIN(__builtin_ia32_selectpd_512, "V8dUcV8dV8d", "ncV:512:", "avx512f,evex512")
TARGET_BUILTIN(__builtin_ia32_selectsh_128, "V8xUcV8xV8x", "ncV:128:", "avx512fp16")
TARGET_BUILTIN(__builtin_ia32_selectsbf_128, "V8yUcV8yV8y", "ncV:128:", "avx512bf16")
TARGET_BUILTIN(__builtin_ia32_selectss_128, "V4fUcV4fV4f", "ncV:128:", "avx512f")
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 774d2b53a3825..7c2222fe51203 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -12035,7 +12035,9 @@ def err_builtin_invalid_arg_type: Error <
"a floating point type|"
"a vector of integers|"
"an unsigned integer|"
- "an 'int'}1 (was %2)">;
+ "an 'int'|"
+ "a vector of bools"
+ "}1 (was %2)">;
def err_builtin_matrix_disabled: Error<
"matrix types extension is disabled. Pass -fenable-matrix to enable it">;
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index c7b219dcfcec5..487f9a2099eb9 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -3744,6 +3744,12 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
return RValue::get(Result);
}
+ case Builtin::BI__builtin_selectvector: {
+ return RValue::get(Builder.CreateSelect(EmitScalarExpr(E->getArg(2)),
+ EmitScalarExpr(E->getArg(0)),
+ EmitScalarExpr(E->getArg(1))));
+ }
+
case Builtin::BI__builtin_elementwise_abs: {
Value *Result;
QualType QT = E->getArg(0)->getType();
@@ -15513,31 +15519,6 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
case X86::BI__builtin_ia32_prorvq256:
case X86::BI__builtin_ia32_prorvq512:
return EmitX86FunnelShift(*this, Ops[0], Ops[0], Ops[1], true);
- case X86::BI__builtin_ia32_selectb_128:
- case X86::BI__builtin_ia32_selectb_256:
- case X86::BI__builtin_ia32_selectb_512:
- case X86::BI__builtin_ia32_selectw_128:
- case X86::BI__builtin_ia32_selectw_256:
- case X86::BI__builtin_ia32_selectw_512:
- case X86::BI__builtin_ia32_selectd_128:
- case X86::BI__builtin_ia32_selectd_256:
- case X86::BI__builtin_ia32_selectd_512:
- case X86::BI__builtin_ia32_selectq_128:
- case X86::BI__builtin_ia32_selectq_256:
- case X86::BI__builtin_ia32_selectq_512:
- case X86::BI__builtin_ia32_selectph_128:
- case X86::BI__builtin_ia32_selectph_256:
- case X86::BI__builtin_ia32_selectph_512:
- case X86::BI__builtin_ia32_selectpbf_128:
- case X86::BI__builtin_ia32_selectpbf_256:
- case X86::BI__builtin_ia32_selectpbf_512:
- case X86::BI__builtin_ia32_selectps_128:
- case X86::BI__builtin_ia32_selectps_256:
- case X86::BI__builtin_ia32_selectps_512:
- case X86::BI__builtin_ia32_selectpd_128:
- case X86::BI__builtin_ia32_selectpd_256:
- case X86::BI__builtin_ia32_selectpd_512:
- return EmitX86Select(*this, Ops[0], Ops[1], Ops[2]);
case X86::BI__builtin_ia32_selectsh_128:
case X86::BI__builtin_ia32_selectsbf_128:
case X86::BI__builtin_ia32_selectss_128:
diff --git a/clang/lib/Headers/avx512bf16intrin.h b/clang/lib/Headers/avx512bf16intrin.h
index b28d2e243f2cb..1c32831a8cc57 100644
--- a/clang/lib/Headers/avx512bf16intrin.h
+++ b/clang/lib/Headers/avx512bf16intrin.h
@@ -77,9 +77,9 @@ _mm512_cvtne2ps_pbh(__m512 __A, __m512 __B) {
/// conversion of __B, and higher 256 bits come from conversion of __A.
static __inline__ __m512bh __DEFAULT_FN_ATTRS512
_mm512_mask_cvtne2ps_pbh(__m512bh __W, __mmask32 __U, __m512 __A, __m512 __B) {
- return (__m512bh)__builtin_ia32_selectpbf_512((__mmask32)__U,
- (__v32bf)_mm512_cvtne2ps_pbh(__A, __B),
- (__v32bf)__W);
+ return (__m512bh)__builtin_selectvector(
+ (__v32bf)_mm512_cvtne2ps_pbh(__A, __B), (__v32bf)__W,
+ __builtin_bit_cast(__vecmask32, __U));
}
/// Convert Two Packed Single Data to One Packed BF16 Data.
@@ -99,9 +99,9 @@ _mm512_mask_cvtne2ps_pbh(__m512bh __W, __mmask32 __U, __m512 __A, __m512 __B) {
/// conversion of __B, and higher 256 bits come from conversion of __A.
static __inline__ __m512bh __DEFAULT_FN_ATTRS512
_mm512_maskz_cvtne2ps_pbh(__mmask32 __U, __m512 __A, __m512 __B) {
- return (__m512bh)__builtin_ia32_selectpbf_512((__mmask32)__U,
- (__v32bf)_mm512_cvtne2ps_pbh(__A, __B),
- (__v32bf)_mm512_setzero_si512());
+ return (__m512bh)__builtin_selectvector(
+ (__v32bf)_mm512_cvtne2ps_pbh(__A, __B), (__v32bf)_mm512_setzero_si512(),
+ __builtin_bit_cast(__vecmask32, __U));
}
/// Convert Packed Single Data to Packed BF16 Data.
@@ -200,9 +200,9 @@ _mm512_dpbf16_ps(__m512 __D, __m512bh __A, __m512bh __B) {
/// __A, __B and __D
static __inline__ __m512 __DEFAULT_FN_ATTRS512
_mm512_mask_dpbf16_ps(__m512 __D, __mmask16 __U, __m512bh __A, __m512bh __B) {
- return (__m512)__builtin_ia32_selectps_512((__mmask16)__U,
- (__v16sf)_mm512_dpbf16_ps(__D, __A, __B),
- (__v16sf)__D);
+ return (__m512)__builtin_selectvector(
+ (__v16sf)_mm512_dpbf16_ps(__D, __A, __B), (__v16sf)__D,
+ __builtin_bit_cast(__vecmask16, __U));
}
/// Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
@@ -224,9 +224,9 @@ _mm512_mask_dpbf16_ps(__m512 __D, __mmask16 __U, __m512bh __A, __m512bh __B) {
/// __A, __B and __D
static __inline__ __m512 __DEFAULT_FN_ATTRS512
_mm512_maskz_dpbf16_ps(__mmask16 __U, __m512 __D, __m512bh __A, __m512bh __B) {
- return (__m512)__builtin_ia32_selectps_512((__mmask16)__U,
- (__v16sf)_mm512_dpbf16_ps(__D, __A, __B),
- (__v16sf)_mm512_setzero_si512());
+ return (__m512)__builtin_selectvector(
+ (__v16sf)_mm512_dpbf16_ps(__D, __A, __B), (__v16sf)_mm512_setzero_si512(),
+ __builtin_bit_cast(__vecmask16, __U));
}
/// Convert Packed BF16 Data to Packed float Data.
diff --git a/clang/lib/Headers/avx512bitalgintrin.h b/clang/lib/Headers/avx512bitalgintrin.h
index bad265ceb7db2..f4e31c287af18 100644
--- a/clang/lib/Headers/avx512bitalgintrin.h
+++ b/clang/lib/Headers/avx512bitalgintrin.h
@@ -29,9 +29,9 @@ _mm512_popcnt_epi16(__m512i __A)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_popcnt_epi16(__m512i __A, __mmask32 __U, __m512i __B)
{
- return (__m512i) __builtin_ia32_selectw_512((__mmask32) __U,
- (__v32hi) _mm512_popcnt_epi16(__B),
- (__v32hi) __A);
+ return (__m512i)__builtin_selectvector((__v32hi)_mm512_popcnt_epi16(__B),
+ (__v32hi)__A,
+ __builtin_bit_cast(__vecmask32, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
@@ -51,9 +51,9 @@ _mm512_popcnt_epi8(__m512i __A)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_popcnt_epi8(__m512i __A, __mmask64 __U, __m512i __B)
{
- return (__m512i) __builtin_ia32_selectb_512((__mmask64) __U,
- (__v64qi) _mm512_popcnt_epi8(__B),
- (__v64qi) __A);
+ return (__m512i)__builtin_selectvector((__v64qi)_mm512_popcnt_epi8(__B),
+ (__v64qi)__A,
+ __builtin_bit_cast(__vecmask64, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
diff --git a/clang/lib/Headers/avx512bwintrin.h b/clang/lib/Headers/avx512bwintrin.h
index c854720de6a65..ba77f979da1f8 100644
--- a/clang/lib/Headers/avx512bwintrin.h
+++ b/clang/lib/Headers/avx512bwintrin.h
@@ -369,16 +369,16 @@ _mm512_add_epi8 (__m512i __A, __m512i __B) {
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_mask_add_epi8(__m512i __W, __mmask64 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U,
- (__v64qi)_mm512_add_epi8(__A, __B),
- (__v64qi)__W);
+ return (__m512i)__builtin_selectvector((__v64qi)_mm512_add_epi8(__A, __B),
+ (__v64qi)__W,
+ __builtin_bit_cast(__vecmask64, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_maskz_add_epi8(__mmask64 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U,
- (__v64qi)_mm512_add_epi8(__A, __B),
- (__v64qi)_mm512_setzero_si512());
+ return (__m512i)__builtin_selectvector((__v64qi)_mm512_add_epi8(__A, __B),
+ (__v64qi)_mm512_setzero_si512(),
+ __builtin_bit_cast(__vecmask64, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
@@ -388,16 +388,16 @@ _mm512_sub_epi8 (__m512i __A, __m512i __B) {
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_mask_sub_epi8(__m512i __W, __mmask64 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U,
- (__v64qi)_mm512_sub_epi8(__A, __B),
- (__v64qi)__W);
+ return (__m512i)__builtin_selectvector((__v64qi)_mm512_sub_epi8(__A, __B),
+ (__v64qi)__W,
+ __builtin_bit_cast(__vecmask64, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_maskz_sub_epi8(__mmask64 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U,
- (__v64qi)_mm512_sub_epi8(__A, __B),
- (__v64qi)_mm512_setzero_si512());
+ return (__m512i)__builtin_selectvector((__v64qi)_mm512_sub_epi8(__A, __B),
+ (__v64qi)_mm512_setzero_si512(),
+ __builtin_bit_cast(__vecmask64, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
@@ -407,16 +407,16 @@ _mm512_add_epi16 (__m512i __A, __m512i __B) {
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_mask_add_epi16(__m512i __W, __mmask32 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,
- (__v32hi)_mm512_add_epi16(__A, __B),
- (__v32hi)__W);
+ return (__m512i)__builtin_selectvector((__v32hi)_mm512_add_epi16(__A, __B),
+ (__v32hi)__W,
+ __builtin_bit_cast(__vecmask32, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_maskz_add_epi16(__mmask32 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,
- (__v32hi)_mm512_add_epi16(__A, __B),
- (__v32hi)_mm512_setzero_si512());
+ return (__m512i)__builtin_selectvector((__v32hi)_mm512_add_epi16(__A, __B),
+ (__v32hi)_mm512_setzero_si512(),
+ __builtin_bit_cast(__vecmask32, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
@@ -426,16 +426,16 @@ _mm512_sub_epi16 (__m512i __A, __m512i __B) {
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_mask_sub_epi16(__m512i __W, __mmask32 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,
- (__v32hi)_mm512_sub_epi16(__A, __B),
- (__v32hi)__W);
+ return (__m512i)__builtin_selectvector((__v32hi)_mm512_sub_epi16(__A, __B),
+ (__v32hi)__W,
+ __builtin_bit_cast(__vecmask32, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_maskz_sub_epi16(__mmask32 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,
- (__v32hi)_mm512_sub_epi16(__A, __B),
- (__v32hi)_mm512_setzero_si512());
+ return (__m512i)__builtin_selectvector((__v32hi)_mm512_sub_epi16(__A, __B),
+ (__v32hi)_mm512_setzero_si512(),
+ __builtin_bit_cast(__vecmask32, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
@@ -445,32 +445,30 @@ _mm512_mullo_epi16 (__m512i __A, __m512i __B) {
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_mask_mullo_epi16(__m512i __W, __mmask32 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,
- (__v32hi)_mm512_mullo_epi16(__A, __B),
- (__v32hi)__W);
+ return (__m512i)__builtin_selectvector((__v32hi)_mm512_mullo_epi16(__A, __B),
+ (__v32hi)__W,
+ __builtin_bit_cast(__vecmask32, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_maskz_mullo_epi16(__mmask32 __U, __m512i __A, __m512i __B) {
- return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,
- (__v32hi)_mm512_mullo_epi16(__A, __B),
- (__v32hi)_mm512_setzero_si512());
+ return (__m512i)__builtin_selectvector((__v32hi)_mm512_mullo_epi16(__A, __B),
+ (__v32hi)_mm512_setzero_si512(),
+ __builtin_bit_cast(__vecmask32, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_mask_blend_epi8 (__mmask64 __U, __m512i __A, __m512i __W)
{
- return (__m512i) __builtin_ia32_selectb_512 ((__mmask64) __U,
- (__v64qi) __W,
- (__v64qi) __A);
+ return (__m512i)__builtin_selectvector((__v64qi)__W, (__v64qi)__A,
+ __builtin_bit_cast(__vecmask64, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_mask_blend_epi16 (__mmask32 __U, __m512i __A, __m512i __W)
{
- return (__m512i) __builtin_ia32_selectw_512 ((__mmask32) __U,
- (__v32hi) __W,
- (__v32hi) __A);
+ return (__m512i)__builtin_selectvector((__v32hi)__W, (__v32hi)__A,
+ __builtin_bit_cast(__vecmask32, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
@@ -482,17 +480,17 @@ _mm512_abs_epi8 (__m512i __A)
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_mask_abs_epi8 (__m512i __W, __mmask64 __U, __m512i __A)
{
- return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U,
- (__v64qi)_mm512_abs_epi8(__A),
- (__v64qi)__W);
+ return (__m512i)__builtin_selectvector((__v64qi)_mm512_abs_epi8(__A),
+ (__v64qi)__W,
+ __builtin_bit_cast(__vecmask64, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_maskz_abs_epi8 (__mmask64 __U, __m512i __A)
{
- return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U,
- (__v64qi)_mm512_abs_epi8(__A),
- (__v64qi)_mm512_setzero_si512());
+ return (__m512i)__builtin_selectvector((__v64qi)_mm512_abs_epi8(__A),
+ (__v64qi)_mm512_setzero_si512(),
+ __builtin_bit_cast(__vecmask64, __U));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS512
@@ -504,17 +502,17 @@ _mm512_abs_epi16 (__m512i __A)
static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_mask_abs_epi16 (__m512i __W, __mmask32 __U, __m512i __A)
{
- return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,...
[truncated]
|
I don't understand why the order of emitted instructions changes based on how exactly Clang is compiled, but other than that this should be ready. Hopefully someone spots what the problem could be. |
``__builtin_selectvector`` | ||
-------------------------- | ||
|
||
``__builtin_selectvector`` is used to express generic vector element selection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extend this description to explicitly describe the input/output types and mechanism - don't just rely on the code snippet (although that's a nice accompaniment): The input must all be vectors of the same same number of elements, the 2 first operands must be the same type etc. etc. (basically everything in SemaChecking).
(__v2di)__W); | ||
return (__m128i)__builtin_selectvector((__v2di)_mm_add_epi64(__A, __B), | ||
(__v2di)__W, | ||
__builtin_bit_cast(__vecmask2, __U)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__mmask8 is an unsigned char - how do we safely bitcast to __vecmask2 which is bool ext vector type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constexpr handling?
@@ -3744,6 +3744,12 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, | |||
return RValue::get(Result); | |||
} | |||
|
|||
case Builtin::BI__builtin_selectvector: { | |||
return RValue::get(Builder.CreateSelect(EmitScalarExpr(E->getArg(2)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check all one (or all zero) like we did in EmitX86Select
?
|
||
template <class T, size_t N> | ||
simd_vec<T, N> __builtin_selectvector(simd_vec<T, N> lhs, simd_vec<T, N> rhs, | ||
simd_vec<bool, N> cond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put cond
the first operand to match with select
and the old X86 builtins?
typedef bool __vecmask2 __attribute__((__ext_vector_type__(2))); | ||
typedef bool __vecmask4 __attribute__((__ext_vector_type__(4))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used __mmask8
for v2i1
and v4i1
too. I don't we can bitcast them directly.
(__v32bf)__W); | ||
return (__m512bh)__builtin_selectvector( | ||
(__v32bf)_mm512_cvtne2ps_pbh(__A, __B), (__v32bf)__W, | ||
__builtin_bit_cast(__vecmask32, __U)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use (__vecmask32) dirctly like __v32bf
etc?
typedef bool __vecmask32 __attribute__((__ext_vector_type__(32))); | ||
typedef bool __vecmask64 __attribute__((__ext_vector_type__(64))); | ||
#else | ||
typedef _Bool __vecmask2 __attribute__((__ext_vector_type__(2))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerning about the ABI when people abuse them in function call.
clang already supports |
@@ -1176,6 +1176,12 @@ def ConvertVector : Builtin { | |||
let Prototype = "void(...)"; | |||
} | |||
|
|||
def SelectVector : Builtin { | |||
let Spellings = ["__builtin_selectvector"]; | |||
let Attributes = [NoThrow, Const, CustomTypeChecking]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be constexpr?
ExprResult RHS = TheCall->getArg(1); | ||
|
||
QualType Result = UsualArithmeticConversions( | ||
LHS, RHS, TheCall->getExprLoc(), ACK_Comparison); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is comparison correct? The documentation makes it seem like this should actually be ACK_Conditional
.
|
||
if (!LHSVecT) { | ||
Diag(LHS.get()->getBeginLoc(), diag::err_builtin_invalid_arg_type) | ||
<< 1 << 4 << LHST; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add inline comments to the magic numbers (here and below), like /*argument*/1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please be sure to add a release note about the new functionality.
This works with bool vectors. I didn't realize you could use the ternary operator, since I only tested with them. I guess we could extend the ternary operator to accept bool vectors as well. Any thoughts? |
You mean, if all three operands are boolean vectors? I'm surprised that doesn't already work. |
No, I mean I have a vector of bools and want to select a value based on that. e.g. |
The relevant bit of code is:
Maybe makes sense to relax it. |
This is also very useful for generic code. For example this would allow libc++ to vectorize
{min,max,minmax}_element
without having to use platform-specific intrinsics. I've done some testing and even at-O0
Clang compiles the code to the expected instructions for architectures where the mask vector has the same bit count as the element vector (i.e. every SIMD ISA except AVX512 that I'm aware of) as long as the comparison operation is visible.