catalog: change hash-sharded indexes to use md5 #109374
Thank you for looking at this! Some things I'm wondering:
we do unfortunately, since
there may be - i was hoping some tests would tell me, but they have not. so i will do a bit more investigation.
@michae2 this is RFAL. i've confirmed that this is safe for restores and upgrades. the
Reviewed 16 of 16 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2)
Could you add one (or both) of the poorly-distributed examples from #91109 to pkg/sql/logictest/testdata/logic_test/hash_sharded_index?
Reviewed 16 of 16 files at r1, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @rafiss)
good idea. done! tftrs! bors r+
Build failed (retrying...)
Build failed (retrying...)
Build failed
Release note (sql change): The hash function used by hash-sharded indexes was changed to `mod(fnv32(md5(crdb_internal.datums_to_bytes(columns))), bucket_count)`. (Previously, it did not use `md5`.) This change was made to enhance the uniformity of bucket distribution in cases when the bucket count is a power of 2, and the columns being sharded have numerical properties that make the fnv32 function return values with a non-uniformly distributed modulus.
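For readers who want to see the shape of the new expression outside SQL, here is a rough Python sketch. It is not the CockroachDB implementation. It assumes that `fnv32` is the 32-bit FNV-1 hash and that `md5` yields the lowercase hex digest as a string, and it takes an opaque byte string in place of the output of `crdb_internal.datums_to_bytes`, whose encoding is not reproduced here. The name `shard_bucket` is made up for illustration.

```python
import hashlib

# Standard 32-bit FNV parameters.
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv32(data: bytes) -> int:
    """32-bit FNV-1: multiply by the prime, then XOR in each byte."""
    h = FNV32_OFFSET_BASIS
    for b in data:
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
        h ^= b
    return h


def shard_bucket(encoded_columns: bytes, bucket_count: int) -> int:
    """Hypothetical stand-in for mod(fnv32(md5(bytes)), bucket_count).

    encoded_columns is assumed to be whatever byte string
    crdb_internal.datums_to_bytes would produce for the sharded columns.
    """
    # Assumption: md5 is applied first and its hex digest is what gets
    # fed into fnv32.
    digest_hex = hashlib.md5(encoded_columns).hexdigest()
    return fnv32(digest_hex.encode("ascii")) % bucket_count


if __name__ == "__main__":
    # Print the bucket chosen for a few arbitrary byte strings.
    for raw in (b"\x00\x01", b"\x00\x02", b"\x00\x03"):
        print(raw, shard_bucket(raw, 16))
```

The point of the extra `md5` step is that its digest scrambles the input before `fnv32` sees it, so regular numeric patterns in the sharded columns are less likely to survive into the low bits that a power-of-2 `bucket_count` selects.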
bors r+
Build succeeded
fixes #91109