[C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT encoding #40095
Comments
@cyb70289 @wgtmac @mapleFU @felipecrv Opinions on this?
Isn't that due to the effects of down-clocking when the CPU is executing too many AVX-512 instructions? It might be better in the future, or for users who have a way to avoid that down-clocking.
I don't think all CPUs have down-clocking, do they? But regardless, if it doesn't provide a significant benefit, it doesn't make much sense to keep those codepaths, IMHO.
I remember that I normalized the benchmark by CPU frequency, and AVX512 was still worse than AVX2 (maybe SSE4) on Cascade Lake.
Alright, then it might be a case of memory bandwidth being the bottleneck and not CPU uops per second. Let's remove it. ✂️
I think BYTE_STREAM_SPLIT encoding/decoding might be memory-bound operations. We can remove the AVX512 implementation first. If anyone wants to improve it or requires AVX512, we can also bring it back.
This removal looks reasonable. I have consulted some people at Intel on this but didn't get any useful answer.
…encoding (#40127)
Two reasons:
* the SSE2 and AVX2 variants are already fast enough (on the order of 10 GB/s)
* the AVX512 variants do not seem faster, and can even be slower, on tested Intel machines
* Closes: #40095
Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Describe the enhancement requested
According to previous observations, the AVX512 accelerations of BYTE_STREAM_SPLIT seem to perform equal to or worse than their AVX2 counterparts.
Besides, the SSE2 and AVX2 accelerations already perform at 5-10 GB/s or more, which is fast enough.
We could therefore simply remove the AVX512 accelerations.
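For context, here is a minimal scalar sketch of what BYTE_STREAM_SPLIT does for 4-byte floats, assuming the layout described in the Parquet format specification: byte `k` of every value is written to stream `k`, and the streams are concatenated. This is not Arrow's implementation; the function names `ByteStreamSplitEncodeScalar` and `ByteStreamSplitDecodeScalar` are hypothetical and used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Arrow's API): scatter byte k of value i
// to position i of stream k. Streams are stored back to back.
std::vector<uint8_t> ByteStreamSplitEncodeScalar(const std::vector<float>& values) {
  constexpr std::size_t kWidth = sizeof(float);
  const std::size_t n = values.size();
  std::vector<uint8_t> out(n * kWidth);
  const uint8_t* raw = reinterpret_cast<const uint8_t*>(values.data());
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < kWidth; ++k) {
      out[k * n + i] = raw[i * kWidth + k];
    }
  }
  return out;
}

// Hypothetical helper (not Arrow's API): gather the bytes back,
// inverting the encoder above.
std::vector<float> ByteStreamSplitDecodeScalar(const std::vector<uint8_t>& encoded) {
  constexpr std::size_t kWidth = sizeof(float);
  const std::size_t n = encoded.size() / kWidth;
  std::vector<float> values(n);
  uint8_t* raw = reinterpret_cast<uint8_t*>(values.data());
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < kWidth; ++k) {
      raw[i * kWidth + k] = encoded[k * n + i];
    }
  }
  return values;
}
```

Each input byte is read once and written once with almost no arithmetic in between, which is consistent with the suggestion in the comments that the operation may be limited by memory bandwidth rather than by SIMD width.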
Component(s)
Benchmarking, C++, Parquet