Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catalog: change hash-sharded indexes to use md5 #109374

Merged
merged 1 commit into from Sep 8, 2023

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Aug 23, 2023

fixes #91109

Release note (sql change): The hash function used by hash-sharded indexes was changed to
mod(fnv32(md5(crdb_internal.datums_to_bytes(columns))), bucket_count). (Previously, it did not use md5.) This change was made to enhance the uniformity of bucket distribution in cases when the bucket count is a power of 2, and the columns being sharded have numerical properties that make the fnv32 function return values with a non-uniformly distributed modulus.

@rafiss rafiss requested a review from michae2 August 23, 2023 21:41
@blathers-crl
Copy link

blathers-crl bot commented Aug 23, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@michae2
Copy link
Collaborator

michae2 commented Aug 23, 2023

Thank you for looking at this! Some things I'm wondering:

  • Do we still need fnv32 if we're also using md5? I guess Andrew was trying fnv32(sha256(...)) so maybe that's what you're thinking?
  • Is there any upgrade concern here? Or restore concern? I guess restores should already have the old formula baked into the saved CREATE TABLE?

@rafiss
Copy link
Collaborator Author

rafiss commented Aug 23, 2023

Do we still need fnv32 if we're also using md5? I guess Andrew was trying fnv32(sha256(...)) so maybe that's what you're thinking?

we do unfortunately, since md5 (and shaX) return a string.

Is there any upgrade concern here? Or restore concern? I guess restores should already have the old formula baked into the saved CREATE TABLE?

there may be - i was hoping some tests would tell me, but they have not. so i will do a bit more investigation.

@rafiss rafiss force-pushed the hash-sharded-expr-md5 branch 2 times, most recently from 7272650 to 73af3bb Compare September 6, 2023 16:42
@rafiss rafiss marked this pull request as ready for review September 6, 2023 16:42
@rafiss rafiss requested review from a team as code owners September 6, 2023 16:42
@rafiss
Copy link
Collaborator Author

rafiss commented Sep 6, 2023

@michae2 this is RFAL. i've confirmed that this is safe for restores and upgrades. the MakeHashShardComputeExpr function is only called by the gateway node at the time of executing a CREATE statement.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 16 of 16 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2)

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Could you add one (or both) of the poorly-distributed examples from #91109 to pkg/sql/logictest/testdata/logic_test/hash_sharded_index?

Reviewed 16 of 16 files at r1, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @rafiss)

@rafiss
Copy link
Collaborator Author

rafiss commented Sep 7, 2023

Could you add one (or both) of the poorly-distributed examples from #91109 to pkg/sql/logictest/testdata/logic_test/hash_sharded_index?

good idea. done!

tftrs!

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 7, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Sep 7, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Sep 7, 2023

Build failed:

Release note (sql change): The hash function used by hash-sharded
indexes was changed to
`mod(fnv32(md5(crdb_internal.datums_to_bytes(columns))), bucket_count)`.
(Previously, it did not use `md5`.) This change was made to enhance the
uniformity of bucket distribution in cases when the bucket count is a
power of 2, and the columns being sharded have numerical properties that
make the fnv32 function return values with a non-uniformly distributed
modulus.
@rafiss
Copy link
Collaborator Author

rafiss commented Sep 8, 2023

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 8, 2023

Build succeeded:

@craig craig bot merged commit 9d3115d into cockroachdb:master Sep 8, 2023
5 of 8 checks passed
@rafiss rafiss deleted the hash-sharded-expr-md5 branch September 11, 2023 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: non-uniform hash-sharded index with power-of-2 number of buckets
4 participants