New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix aggregation issue in mixed x86_64 and ARM clusters #59132
Conversation
This is an automated comment for commit 94a79c0 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
@rschu1ze, could you or any ClickHouse team member review this PR? thanks! |
size_t ALWAYS_INLINE operator()(StringKey8 key) const | ||
{ | ||
size_t res = -1ULL; | ||
res = __crc32cd(static_cast<UInt32>(res), key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at
- https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_crc32_u64&ig_expand=1420,1421,1420 (l. 69)
- https://docs.unity3d.com/Packages/com.unity.burst@1.6/api/Unity.Burst.Intrinsics.Arm.Neon.__crc32cd.html (l. 78)
- and https://en.wikipedia.org/wiki/Cyclic_redundancy_check
it looks like different generator polynoms are used on x86_64 and ARM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omg :-) Thanks for checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Branch master
is currently in a freeze due to test failures, I'll merge when it's open again. Thanks for the patch!
In mixed x86-64 / ARM clusters, distributed aggregation queries return incorrect results due to different hashes generated (
StringHashTableHash
) on both platforms.The issue can be reproduced by the sql statements in "tests/integration/test_backward_compatability/test_short_strings_aggregation.py".
The fix is to use aarch64's
__crc32cd()
intrinsic which returns the same hash results as the hash on x86_64.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fixed wrong aggregation results in mixed x86_64 and ARM clusters.