[C++][Parquet] Rewrite BYTE_STREAM_SPLIT optimizations using xsimd #38560

pitrou · 2023-11-02T14:51:05Z

Describe the enhancement requested

Currently, there are BYTE_STREAM_SPLIT optimizations using hand-written x86 intrinsics (for SSE4.2, AVX2 and AVX512), selected at compile-time.

We should rewrite those using the xsimd library so as to provide support for non-x86 ISA extensions such as Arm Neon (most importantly) and SVE.

More precisely:

- rewrite the SSE4.2 acceleration for generic 128-bit SIMD
- rewrite the AVX2 acceleration for generic 256-bit SIMD
- either rewrite the AVX512 acceleration, leave it alone, or remove it (the benefits are probably minor)

Component(s)

C++, Parquet

pitrou · 2023-11-02T14:51:14Z

cc @cyb70289 @mapleFU

mapleFU · 2023-11-02T15:34:51Z

IMO, Parquet itself already has a lot of hand-written AVX2 code (mostly in level handling, some in decoding, etc.). So, for Parquet, mixing AVX512 and AVX2 may cause a performance loss. But if the user just wants to use this encoder, AVX512 might be useful (also, AVX10 is coming now...).

pitrou · 2023-11-02T15:44:12Z

mixing AVX512 and AVX2 may causing performance loss

This sounds like an urban legend at this point.

But if user just want to use this encoder, AVX512 might be useful

The user doesn't want "an encoding", they want space savings. BYTE_STREAM_SPLIT is only useful in conjunction with a (de)compressor, so the main objective is to be fast compared to (de)compression.
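Since the encoding only pays off together with a compressor, it may help to see what the transform itself does: byte b of every value is moved into stream b, so bytes that tend to be similar across values (for example the sign and exponent bytes of floats) end up adjacent, where a general-purpose compressor can exploit them. The following is a minimal scalar sketch under our own assumptions (little-endian host, hypothetical function names `ByteStreamSplitEncode` / `ByteStreamSplitDecode`); it is not Arrow's implementation, only the reference behaviour that any SIMD variant has to reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical names: a scalar reference sketch, not Arrow's API.
// Byte b of value i is written to stream b at position i, so the output
// is sizeof(T) contiguous streams of `n` bytes each.
template <typename T>
std::vector<std::uint8_t> ByteStreamSplitEncode(const T* values, std::size_t n) {
  constexpr std::size_t kWidth = sizeof(T);
  std::vector<std::uint8_t> out(n * kWidth);
  for (std::size_t i = 0; i < n; ++i) {
    std::uint8_t bytes[kWidth];
    std::memcpy(bytes, &values[i], kWidth);  // host byte order (assumed little-endian)
    for (std::size_t b = 0; b < kWidth; ++b) {
      out[b * n + i] = bytes[b];
    }
  }
  return out;
}

// Inverse transform: gather byte b of value i back from stream b.
template <typename T>
std::vector<T> ByteStreamSplitDecode(const std::uint8_t* data, std::size_t n) {
  constexpr std::size_t kWidth = sizeof(T);
  std::vector<T> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    std::uint8_t bytes[kWidth];
    for (std::size_t b = 0; b < kWidth; ++b) {
      bytes[b] = data[b * n + i];
    }
    std::memcpy(&out[i], bytes, kWidth);
  }
  return out;
}
```

The SIMD variants discussed in this issue speed up exactly these two loops; the scalar version is useful as a correctness oracle when testing them.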

mapleFU · 2023-11-02T16:07:03Z

This sounds like an urban legend at this point.

Before the Ice Lake optimizations [1] [2], using AVX512 could cause CPU frequency throttling [3].

[1] https://www.hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_Intel_Irma_ICX-CPU-final3.pdf
[2] https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html
[3] https://lemire.me/blog/2018/08/15/the-dangers-of-avx-512-throttling-a-3-impact/

pitrou · 2023-11-02T16:09:12Z

Ok, but 1) those are relatively old CPUs, and 2) the performance loss is not caused by mixing AVX2 and AVX512, but simply by using AVX512 ;-)

pitrou · 2024-02-27T14:03:05Z

FTR, AVX512 variants were removed in #40127

pitrou · 2024-02-27T14:24:01Z

After grepping through the xsimd include files, it seems that:

- the 128-bit (currently SSE4.2) variants can probably be migrated to arch-agnostic xsimd code
- the 256-bit (currently AVX2) variants use _mm256_unpack{lo,hi}_epi8 and permutations in a non-trivial way that may be difficult to reproduce using xsimd
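Part of what makes the 256-bit variants hard to port is that AVX2 byte unpacks such as `_mm256_unpacklo_epi8` interleave independently within each 128-bit lane rather than across the whole register, which is presumably why the existing code pairs them with permutations. The scalar model below (hypothetical helper names, plain `std::array` standing in for registers) contrasts that lane-local behaviour with a full-width "interleave low halves", the behaviour one would naturally write in arch-agnostic code. It makes no claim about how xsimd implements its own zip operations on AVX2.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec256 = std::array<std::uint8_t, 32>;

// Hypothetical helper: full-width "zip low", interleaving bytes 0..15 of a and b
// across the entire 32-byte register.
Vec256 ZipLoFullWidth(const Vec256& a, const Vec256& b) {
  Vec256 r{};
  for (std::size_t i = 0; i < 16; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  return r;
}

// Hypothetical scalar model of _mm256_unpacklo_epi8: the interleave happens
// separately in each 128-bit lane (bytes 0..7 of lane 0, then bytes 16..23
// of lane 1), so bytes 8..15 of the inputs never appear in the result.
Vec256 Unpacklo256Model(const Vec256& a, const Vec256& b) {
  Vec256 r{};
  for (std::size_t lane = 0; lane < 2; ++lane) {
    const std::size_t base = lane * 16;
    for (std::size_t i = 0; i < 8; ++i) {
      r[base + 2 * i] = a[base + i];
      r[base + 2 * i + 1] = b[base + i];
    }
  }
  return r;
}
```

Both agree on the first 16 output bytes and diverge in the upper lane, so translating the AVX2 code one instruction at a time into a portable zip would silently change the byte order.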

This means we could at least migrate the 128-bit paths to xsimd, which may get us NEON acceleration.
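For the 128-bit path, the only building block needed is a byte interleave of two registers, which SSE2 (`_mm_unpacklo_epi8` / `_mm_unpackhi_epi8`) and AArch64 NEON (`vzip1q_u8` / `vzip2q_u8`) both provide, and which, as far as we know, xsimd exposes portably as `zip_lo` / `zip_hi`. Below is a scalar model of decoding sixteen 4-byte values (e.g. floats) from four 16-byte streams with two rounds of such interleaves. The helper names and the 4x16 block shape are our own illustrative assumptions, not the code that landed in #40335.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec128 = std::array<std::uint8_t, 16>;

// Hypothetical scalar model of a 128-bit "interleave low halves" primitive
// (same semantics as _mm_unpacklo_epi8 on SSE2 or vzip1q_u8 on AArch64).
Vec128 ZipLo(const Vec128& a, const Vec128& b) {
  Vec128 r{};
  for (std::size_t i = 0; i < 8; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  return r;
}

// Same for the high halves (_mm_unpackhi_epi8 / vzip2q_u8).
Vec128 ZipHi(const Vec128& a, const Vec128& b) {
  Vec128 r{};
  for (std::size_t i = 0; i < 8; ++i) {
    r[2 * i] = a[8 + i];
    r[2 * i + 1] = b[8 + i];
  }
  return r;
}

// Hypothetical decoder for one block: streams[b][i] holds byte b of value i.
// Round 1 pairs stream 0 with 2 and stream 1 with 3; round 2 then emits
// bytes 0,1,2,3 of each value in order, 4 values per 16-byte output chunk.
void DecodeBlock4x16(const Vec128 streams[4], std::uint8_t out[64]) {
  const Vec128 x_lo = ZipLo(streams[0], streams[2]);  // values 0..7, bytes 0/2
  const Vec128 x_hi = ZipHi(streams[0], streams[2]);  // values 8..15, bytes 0/2
  const Vec128 y_lo = ZipLo(streams[1], streams[3]);  // values 0..7, bytes 1/3
  const Vec128 y_hi = ZipHi(streams[1], streams[3]);  // values 8..15, bytes 1/3
  const Vec128 parts[4] = {ZipLo(x_lo, y_lo), ZipHi(x_lo, y_lo),
                           ZipLo(x_hi, y_hi), ZipHi(x_hi, y_hi)};
  for (std::size_t p = 0; p < 4; ++p) {
    std::memcpy(out + 16 * p, parts[p].data(), 16);
  }
}
```

Pairing streams (0, 2) and (1, 3) in the first round is the design choice that lets the second round produce each value's bytes in order; an 8-byte type would need a third round with a similar pairing.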

mapleFU · 2024-02-27T17:35:52Z

Nice analysis! I can have a try at migrating this, but I'm a SIMD newbie, so some help is needed.

pitrou · 2024-02-27T18:00:25Z

Or perhaps @cyb70289 wants to take it up :-)

cyb70289 · 2024-02-28T00:33:21Z

I may not have bandwidth recently. I believe @mapleFU can do it well. Ping me if you need help.

…using xsimd (#40335)

### Rationale for this change
This is part of #38560 (comment). It tries to rewrite SSE4_2 using xsimd.
### What changes are included in this PR?
Rewrite SSE4_2 using xsimd.
### Are these changes tested?
Yes
### Are there any user-facing changes?
No
* GitHub Issue: #38560

Lead-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: mwish <anmmscs_maple@qq.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>

pitrou · 2024-03-18T11:16:55Z

Issue resolved by pull request 40335
#40335

pitrou added the Type: enhancement label Nov 2, 2023

github-actions bot added Component: Parquet Component: C++ labels Nov 2, 2023

mapleFU mentioned this issue Mar 4, 2024

GH-38560: [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd #40335

Merged

github-actions bot assigned mapleFU Mar 4, 2024

pitrou added this to the 16.0.0 milestone Mar 18, 2024

pitrou closed this as completed Mar 18, 2024
