Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disable XXH64 autovectorization #924

Merged
merged 2 commits into from Mar 8, 2024
Merged

disable XXH64 autovectorization #924

merged 2 commits into from Mar 8, 2024

Conversation

Cyan4973
Copy link
Owner

@Cyan4973 Cyan4973 commented Mar 6, 2024

using a compiler fence (like XXH32).

XXH64 has been reported to be auto-vectorized on AVX512 architecture (with recent clang compilers).
Like XXH32, it's generally a bad idea to vectorize XXH64. So disable it, using the same technique.

However, in order to test, or if the results differ for newer cpus, it's also possible to disable the fence by using XXH_ENABLE_AUTOVECTORIZE build variable.

fix #897

using a compiler fence (like XXH32).

XXH64 has been reported to be auto-vectorized on AVX512 architecture.
Like XXH32, it's generally a bad idea to vectorize XXH64.

However, in order to test, or if the result differ for newer cpus,
it's possible to disable the fence by using XXH_ENABLE_AUTOVECTORIZE build variable.
@Cyan4973
Copy link
Owner Author

Cyan4973 commented Mar 6, 2024

@TocarIP : if you have the time,
it would be nice to test if this patch works as intended,
aka :

  • before the patch, XXH64 is auto-vectorized with clang-trunk when AVX512 is enabled.
  • after the patch, it no longer is vectorized, thus restoring performance to previous level
  • when compiled with XXH_ENABLE_AUTOVECTORIZE, XXH64 is once again vectorized

It's difficult for me to access cpus with avx512 enabled, though I will try nonetheless.

@TocarIP
Copy link

TocarIP commented Mar 6, 2024

I've patched this into zstd, and can confirm that everything is WAI. I haven't run benchmarks, just looked at the asm, but that should be enough, thanks for the fix!

@Cyan4973
Copy link
Owner Author

Cyan4973 commented Mar 7, 2024

I could get access to some systems with AVX512 enabled : a Tiger Lake mini laptop, and a Zen4 NUC.
Both were running Windows 11. I employed msys2/mingw64 for compilation, and also WSL2+Ubuntu on the Zen4 NUC.

  1. Checked that AVX512 was working properly, notably thanks to xxhsum welcome diagnosis message, and by checking performance of XXH3.
  2. XXH64 could not be auto-vectorized on any configuration. gcc-13, clang-14 and clang-17 were used for these tests, and followed the recommended CFLAGS settings : -O3 -mavx512f -mavx512dq -mavx512vl. To no avail. Unless performance is strictly exactly the same across all hardwares and compiler versions, it simply never is auto-vectorized.
  3. As a counterpoint, XXH32 could be auto-vectorized with clang, as expected. So auto-vectorization itself was active.
  4. Auto-vectorization of XXH32 leads to lower performance on Tiger Lake, as expected, but improved performance on Zen4, which was more surprising. That's the first time I see vectorization bringing benefits to XXH32. It underlines that the topic is a bit more complex than "just disable it", and reinforce the need to offer a knob to let the user decide.
  5. Checked on Godbolt that the GUARD was working as intended and preventing the compiler to generate vectorized code.
  6. Checked that performance of XXH64 was unaffected by the GUARD on any configuration.

So the only disappointing part was the inability to get XXH64 auto-vectorized with AVX512, which I can't explain at this point. As a consequence, it may make this patch look kind of useless, since it prevents a scenario that apparently doesn't happen. But as @TocarIP reported that it works as intended, and since it doesn't introduce any negative performance side effect, I believe it's good to merge nonetheless.

@Cyan4973 Cyan4973 self-assigned this Mar 8, 2024
Cyan4973 added a commit to facebook/zstd that referenced this pull request Mar 8, 2024
@Cyan4973 Cyan4973 merged commit da79cc1 into dev Mar 8, 2024
126 checks passed
hswong3i pushed a commit to alvistack/facebook-zstd that referenced this pull request Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disable auto vectorization of xxhash64, when AVX512 is present.
2 participants