ARROW-8166: [C++] Fix AVX512 intrinsics build failure with clang-8
__m512i_u is undeclared in clang, while _mm512_storeu_epi32 is undefined in gcc,
so use memcpy instead for the unaligned store.

With gcc, BM_PlainDecodingBoolean gives results at the same level as before.

Signed-off-by: Frank Du <frank.du@intel.com>

Closes #6673 from jianxind/avx512-build-with-clang

Authored-by: Frank Du <frank.du@intel.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
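
The memcpy-based store described in the commit message can be sketched without any intrinsics. This is a minimal illustration, not the Arrow code: `Block512` and `StoreUnaligned` are hypothetical names, and `Block512` only stands in for a 64-byte register type such as `__m512i`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 64-byte SIMD value such as __m512i.
struct alignas(64) Block512 {
  uint32_t lanes[16];
};

// Copies all 16 lanes to `out`. Unlike a cast to a 64-byte-aligned
// vector pointer, memcpy places no alignment requirement on `out`
// beyond that of uint32_t itself.
inline void StoreUnaligned(uint32_t* out, const Block512& value) {
  std::memcpy(out, &value, sizeof(value));
}
```

Calling `StoreUnaligned(buf + 1, v)` on a `uint32_t` buffer writes the 16 lanes starting at an address that is generally not 64-byte aligned. Compilers typically lower a fixed-size memcpy like this to a single unaligned vector store, which is consistent with the unchanged benchmark result reported above.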
frankdjx authored and wesm committed Mar 20, 2020
1 parent d29066c commit 4f9db53
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions cpp/src/arrow/util/bpacking.h
@@ -39,20 +39,22 @@ namespace internal {
 #if defined(__AVX512F__)
 inline const uint32_t* unpack1_32(const uint32_t* in, uint32_t* out) {
   uint32_t inl = util::SafeLoad(in);
-  __m512i shifts, inls, masks;
+  __m512i shifts, inls, masks, result;

   inls = _mm512_set1_epi32(inl);
   masks = _mm512_set1_epi32(1);

   // shift the first 16 outs
   shifts = _mm512_set_epi32(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
-  *(__m512i_u*)out = _mm512_and_epi32(_mm512_srlv_epi32(inls, shifts), masks);
+  result = _mm512_and_epi32(_mm512_srlv_epi32(inls, shifts), masks);
+  memcpy(out, &result, 16 * sizeof(*out));
   out += 16;

   // shift the last 16 outs
   shifts =
       _mm512_set_epi32(31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16);
-  *(__m512i_u*)out = _mm512_and_epi32(_mm512_srlv_epi32(inls, shifts), masks);
+  result = _mm512_and_epi32(_mm512_srlv_epi32(inls, shifts), masks);
+  memcpy(out, &result, 16 * sizeof(*out));
   out += 16;

   ++in;
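
For readers without AVX512 hardware, the effect of the hunk above can be approximated by a scalar loop. This is a hedged reference sketch, not part of the commit: `unpack1_32_scalar` is a hypothetical name, it reads `*in` directly instead of calling `util::SafeLoad`, and it assumes the original function returns the advanced `in` pointer (the diff is truncated after `++in;`).

```cpp
#include <cstdint>

// Hypothetical scalar reference for the AVX512 unpack1_32 shown above.
// Expands the 32 bits of one input word into 32 output words, each 0 or 1.
// Output i receives bit i, matching the per-lane shift amounts 0..15 and
// 16..31 passed to _mm512_set_epi32 (whose arguments run from the highest
// lane down to the lowest).
inline const uint32_t* unpack1_32_scalar(const uint32_t* in, uint32_t* out) {
  const uint32_t inl = *in;
  for (uint32_t i = 0; i < 32; ++i) {
    out[i] = (inl >> i) & 1u;
  }
  // Assumption: the original returns the input pointer advanced by one word.
  return in + 1;
}
```

A loop like this can serve as a correctness oracle when comparing the vectorized path across compilers, since it depends on neither `__m512i_u` nor `_mm512_storeu_epi32`.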