[C++][Parquet] Rewrite BYTE_STREAM_SPLIT optimizations using xsimd #38560
Comments
IMO, Parquet itself has a lot of hand-written AVX2 code (mostly in levels handling, some in decoding, etc.). So, for Parquet, mixing AVX512 and AVX2 may cause a performance loss. But if a user just wants to use this encoder, AVX512 might be useful (also, AVX10 is coming...).
This sounds like an urban legend at this point.
The user doesn't want "an encoding", they want space savings. BYTE_STREAM_SPLIT is only useful in conjunction with a (de)compressor, so the main objective is to be fast compared to (de)compression.
Before the Ice Lake optimizations [1] [2], using AVX512 could cause the CPU to lower its clock frequency [3].
[1] https://www.hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_Intel_Irma_ICX-CPU-final3.pdf
Ok, but 1) those are relatively old CPUs 2) the performance loss is not caused by mixing AVX2 and AVX512, but simply by using AVX512 ;-)
FTR, AVX512 variants were removed in #40127
After grepping through the xsimd include files, it seems that:
This means we could at least migrate the 128-bit paths to xsimd, which may get us NEON acceleration.
Nice analysis. I can try migrating this, but I'm a SIMD newbie, so some help is needed.
Or perhaps @cyb70289 wants to take it up :-)
I may not have bandwidth recently. I believe @mapleFU can do it well. Ping me if you need help.
…using xsimd (#40335)
### Rationale for this change
This is part of #38560 (comment). It rewrites the SSE4_2 path using xsimd.
### What changes are included in this PR?
Rewrite the SSE4_2 path using xsimd.
### Are these changes tested?
Yes
### Are there any user-facing changes?
No
* GitHub Issue: #38560
Lead-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: mwish <anmmscs_maple@qq.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Issue resolved by pull request 40335
Describe the enhancement requested
Currently, there are BYTE_STREAM_SPLIT optimizations using hand-written x86 intrinsics (for SSE4.2, AVX2 and AVX512), selected at compile-time.
We should rewrite those using the xsimd library so as to provide support for non-x86 ISA extensions such as Arm Neon (most importantly) and SVE.
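For readers unfamiliar with the encoding, here is a minimal scalar sketch of what BYTE_STREAM_SPLIT does: byte k of every value is gathered into stream k, and the streams are stored back to back. The function names are hypothetical and this is not Arrow's implementation; it is only the portable reference behavior that any SIMD path (hand-written intrinsics or xsimd) would have to reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper names, for illustration only (not Arrow's API).
// `width` is the byte width of one value, e.g. 4 for float, 8 for double.

// Encode: byte k of value i goes to position k * num_values + i.
std::vector<uint8_t> ByteStreamSplitEncodeScalar(const std::vector<uint8_t>& raw,
                                                 std::size_t width) {
  assert(width > 0 && raw.size() % width == 0);
  const std::size_t num_values = raw.size() / width;
  std::vector<uint8_t> out(raw.size());
  for (std::size_t i = 0; i < num_values; ++i) {
    for (std::size_t k = 0; k < width; ++k) {
      out[k * num_values + i] = raw[i * width + k];
    }
  }
  return out;
}

// Decode: the inverse permutation, reassembling each value from the streams.
std::vector<uint8_t> ByteStreamSplitDecodeScalar(const std::vector<uint8_t>& encoded,
                                                 std::size_t width) {
  assert(width > 0 && encoded.size() % width == 0);
  const std::size_t num_values = encoded.size() / width;
  std::vector<uint8_t> out(encoded.size());
  for (std::size_t i = 0; i < num_values; ++i) {
    for (std::size_t k = 0; k < width; ++k) {
      out[i * width + k] = encoded[k * num_values + i];
    }
  }
  return out;
}
```

The encoding is a pure byte permutation, so it saves no space by itself; it only regroups bytes so that a subsequent compressor has a better chance of finding redundancy.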
More precisely:
Component(s)
C++, Parquet