[C++][Parquet] Rewrite BYTE_STREAM_SPLIT optimizations using xsimd #38560

Closed
pitrou opened this issue Nov 2, 2023 · 11 comments

@pitrou
Member

pitrou commented Nov 2, 2023

Describe the enhancement requested

Currently, there are BYTE_STREAM_SPLIT optimizations using hand-written x86 intrinsics (for SSE4.2, AVX2 and AVX512), selected at compile-time.

We should rewrite those using the xsimd library so as to provide support for non-x86 ISA extensions such as Arm Neon (most importantly) and SVE.

More precisely:

  • rewrite the SSE4.2 acceleration for generic 128-bit SIMD
  • rewrite the AVX2 acceleration for generic 256-bit SIMD
  • either rewrite the AVX512 acceleration, leave it alone, or remove it (the benefits are probably minor)
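
For context, here is a minimal scalar sketch of what the SIMD paths are meant to accelerate, following the BYTE_STREAM_SPLIT layout from the Parquet format: byte k of every value goes to stream k, and the streams are stored back to back. The function names and the `std::vector` interface are hypothetical and chosen for illustration only; this is not the Arrow implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for BYTE_STREAM_SPLIT (illustrative names, not Arrow's API).
// `raw` holds num_values values of `width` bytes each (4 for float, 8 for double).
// Byte k of value i is written to stream k at position i; streams are concatenated.
std::vector<uint8_t> ByteStreamSplitEncode(const std::vector<uint8_t>& raw,
                                           std::size_t width) {
  assert(width > 0 && raw.size() % width == 0);
  const std::size_t num_values = raw.size() / width;
  std::vector<uint8_t> out(raw.size());
  for (std::size_t i = 0; i < num_values; ++i) {
    for (std::size_t k = 0; k < width; ++k) {
      out[k * num_values + i] = raw[i * width + k];
    }
  }
  return out;
}

// Inverse transform: gather byte k of value i back from stream k.
std::vector<uint8_t> ByteStreamSplitDecode(const std::vector<uint8_t>& encoded,
                                           std::size_t width) {
  assert(width > 0 && encoded.size() % width == 0);
  const std::size_t num_values = encoded.size() / width;
  std::vector<uint8_t> out(encoded.size());
  for (std::size_t i = 0; i < num_values; ++i) {
    for (std::size_t k = 0; k < width; ++k) {
      out[i * width + k] = encoded[k * num_values + i];
    }
  }
  return out;
}
```

The strided byte accesses in the inner loops are the part a SIMD version would replace with register-wide interleave or shuffle operations.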

Component(s)

C++, Parquet

@pitrou
Member Author

pitrou commented Nov 2, 2023

cc @cyb70289 @mapleFU

@mapleFU
Member

mapleFU commented Nov 2, 2023

IMO, Parquet itself already has a lot of hand-written AVX2 code (mostly in level handling, some in decoding, etc.). So, for Parquet, mixing AVX512 and AVX2 may cause a performance loss. But if a user just wants to use this encoder, AVX512 might be useful (also, AVX10 is coming...).

@pitrou
Member Author

pitrou commented Nov 2, 2023

> mixing AVX512 and AVX2 may causing performance loss

This sounds like an urban legend at this point.

> But if user just want to use this encoder, AVX512 might be useful

The user doesn't want "an encoding", they want space savings. BYTE_STREAM_SPLIT is only useful in conjunction with a (de)compressor, so the main objective is to be fast compared to (de)compression.

@mapleFU
Member

mapleFU commented Nov 2, 2023

@pitrou
Member Author

pitrou commented Nov 2, 2023

Ok, but 1) those are relatively old CPUs 2) the performance loss is not caused by mixing AVX2 and AVX512, but simply by using AVX512 ;-)

@pitrou
Member Author

pitrou commented Feb 27, 2024

FTR, AVX512 variants were removed in #40127

@pitrou
Member Author

pitrou commented Feb 27, 2024

After grepping through the xsimd include files, it seems that:

  • the 128 bit (currently SSE4.2) variants can probably be migrated to arch-agnostic xsimd code
  • the 256 bit (currently AVX2) variants use _mm256_unpack{lo,hi}_epi8 and permutations in a non-trivial way that may be difficult to reproduce using xsimd

This means we could at least migrate the 128-bit paths to xsimd, which may get us NEON acceleration.
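
To illustrate why the 128-bit path looks portable, here is a rough sketch of decoding one block of 16 four-byte values using only "interleave low half" and "interleave high half" operations. The registers are emulated with `std::array` so the snippet needs neither intrinsics nor xsimd; `Vec128`, `ZipLo`, `ZipHi` and `DecodeBlock4` are hypothetical names. The assumption is that `_mm_unpacklo_epi8` / `_mm_unpackhi_epi8` on SSE, or xsimd's `zip_lo` / `zip_hi`, would play the role of the two helpers in a real implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec128 = std::array<uint8_t, 16>;  // stand-in for one 128-bit register

// Interleave the low 8 bytes of a and b: a0 b0 a1 b1 ... a7 b7.
Vec128 ZipLo(const Vec128& a, const Vec128& b) {
  Vec128 r{};
  for (std::size_t i = 0; i < 8; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  return r;
}

// Interleave the high 8 bytes of a and b: a8 b8 a9 b9 ... a15 b15.
Vec128 ZipHi(const Vec128& a, const Vec128& b) {
  Vec128 r{};
  for (std::size_t i = 0; i < 8; ++i) {
    r[2 * i] = a[8 + i];
    r[2 * i + 1] = b[8 + i];
  }
  return r;
}

// Decode 16 four-byte values starting at value index `offset`.
// `encoded` holds 4 streams of `num_values` bytes each, back to back.
void DecodeBlock4(const uint8_t* encoded, std::size_t num_values,
                  std::size_t offset, uint8_t* out) {
  assert(offset + 16 <= num_values);
  std::array<Vec128, 4> stage{};
  for (std::size_t k = 0; k < 4; ++k) {
    const uint8_t* src = encoded + k * num_values + offset;
    std::copy(src, src + 16, stage[k].begin());
  }
  // Two interleave rounds (log2 of 4 streams): each round pairs register i
  // with register i + 2, so stream 0 meets stream 2 first, then stream 1.
  for (int step = 0; step < 2; ++step) {
    std::array<Vec128, 4> next{};
    for (std::size_t i = 0; i < 2; ++i) {
      next[2 * i] = ZipLo(stage[i], stage[2 + i]);
      next[2 * i + 1] = ZipHi(stage[i], stage[2 + i]);
    }
    stage = next;
  }
  // Register j now holds values offset + 4j .. offset + 4j + 3 in order.
  for (std::size_t j = 0; j < 4; ++j) {
    std::copy(stage[j].begin(), stage[j].end(), out + offset * 4 + j * 16);
  }
}
```

The 256-bit case is harder because AVX2 unpack instructions operate within each 128-bit lane, which is why extra permutations are needed there.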

@mapleFU
Member

mapleFU commented Feb 27, 2024

Nice analysis. I can try migrating this, but I'm a SIMD newbie, so some help is needed.

@pitrou
Member Author

pitrou commented Feb 27, 2024

Or perhaps @cyb70289 wants to take it up :-)

@cyb70289
Contributor

I may not have bandwidth recently. I believe @mapleFU can do it well. Ping me if you need help.

pitrou pushed a commit that referenced this issue Mar 18, 2024
…using xsimd (#40335)

### Rationale for this change

This is part of #38560 (comment). It rewrites the SSE4_2 code using xsimd.

### What changes are included in this PR?

Rewrite SSE4_2 using xsimd.

### Are these changes tested?

Yes

### Are there any user-facing changes?

no

* GitHub Issue: #38560

Lead-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: mwish <anmmscs_maple@qq.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 16.0.0 milestone Mar 18, 2024
@pitrou
Member Author

pitrou commented Mar 18, 2024

Issue resolved by pull request 40335
#40335

@pitrou pitrou closed this as completed Mar 18, 2024