Add useRawBytes config to Murmur and Murmur3 partition functions #17932
Merged
xiangfu0 merged 4 commits into apache:master on Mar 22, 2026
When partitioning on BYTES columns, the partition value is hex-encoded. With useRawBytes=true in functionConfig, the hex string is decoded back to raw bytes before hashing, ensuring partition assignment matches the original byte values rather than treating the hex string as UTF-8 text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
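To make the difference concrete, here is a minimal sketch of the two byte sequences a hash function could receive for the same hex-encoded value. The class `HexVsUtf8Demo` and its `decodeHex` helper are hypothetical stand-ins written for illustration; the PR itself uses `BytesUtils.toBytes()`, whose exact behavior is not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Pinot.
final class HexVsUtf8Demo {
  // Decodes a hex string such as "cafe" into the bytes {0xCA, 0xFE}.
  static byte[] decodeHex(String hex) {
    if (hex.length() % 2 != 0) {
      throw new IllegalArgumentException("Hex string must have even length: " + hex);
    }
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      int hi = Character.digit(hex.charAt(2 * i), 16);
      int lo = Character.digit(hex.charAt(2 * i + 1), 16);
      if (hi < 0 || lo < 0) {
        throw new IllegalArgumentException("Not a hex string: " + hex);
      }
      out[i] = (byte) ((hi << 4) | lo);
    }
    return out;
  }

  public static void main(String[] args) {
    String hex = "cafe";
    // Treating the hex string as text yields one byte per character.
    byte[] asText = hex.getBytes(StandardCharsets.UTF_8);
    // Decoding the hex string recovers the original two bytes.
    byte[] asRaw = decodeHex(hex);
    System.out.println(Arrays.toString(asText)); // prints [99, 97, 102, 101]
    System.out.println(Arrays.toString(asRaw));  // prints [-54, -2]
  }
}
```

Because the two arrays differ in both length and content, any byte-oriented hash will generally assign them to different partitions, which is the mismatch the new option is meant to avoid.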
Pull request overview
This PR adds a useRawBytes option to Murmur/Murmur3 partition functions so BYTES values that are hex-encoded strings can be partitioned by hashing the original decoded bytes (instead of hashing the UTF-8 bytes of the hex text).
Changes:
- Add useRawBytes parsing and hex-string decoding via BytesUtils.toBytes() in MurmurPartitionFunction and Murmur3PartitionFunction.
- Pass functionConfig into MurmurPartitionFunction from PartitionFunctionFactory.
- Add unit tests covering useRawBytes=true behavior and default behavior parity.
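The changes listed above can be sketched roughly as follows. This is an assumption-laden illustration, not the Pinot implementation: the class name `RawBytesPartitionSketch` is hypothetical, `Arrays.hashCode` is a plain stand-in for the Murmur hash, the local `decodeHex` replaces `BytesUtils.toBytes()`, and the masking and modulo step is an assumed way of mapping a hash to a partition id.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

// Hypothetical sketch of a partition function honoring a useRawBytes flag.
final class RawBytesPartitionSketch {
  private final int numPartitions;
  private final boolean useRawBytes;

  RawBytesPartitionSketch(int numPartitions, Map<String, String> functionConfig) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be positive");
    }
    this.numPartitions = numPartitions;
    // A missing config or missing key leaves the flag at its default, false.
    this.useRawBytes =
        functionConfig != null && Boolean.parseBoolean(functionConfig.get("useRawBytes"));
  }

  int getPartition(String value) {
    byte[] bytes = useRawBytes ? decodeHex(value) : value.getBytes(StandardCharsets.UTF_8);
    int hash = Arrays.hashCode(bytes); // stand-in hash, NOT Murmur
    // Clear the sign bit so the modulo result is never negative (assumed scheme).
    return (hash & Integer.MAX_VALUE) % numPartitions;
  }

  // Local stand-in for a hex decoder; assumes well-formed, even-length input.
  private static byte[] decodeHex(String hex) {
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    }
    return out;
  }
}
```

With a null config the sketch hashes the UTF-8 bytes of the string, matching the "default behavior parity" the tests are described as covering; with `useRawBytes` set to `"true"` it hashes the decoded bytes instead.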
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/MurmurPartitionFunction.java | Adds useRawBytes support and changes constructor to accept functionConfig. |
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java | Adds useRawBytes support for both x86_32 and x64_32 hashing paths. |
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunctionFactory.java | Wires functionConfig through for Murmur/Murmur2 factory creation. |
| pinot-segment-spi/src/test/java/org/apache/pinot/segment/spi/partition/PartitionFunctionTest.java | Adds coverage for useRawBytes behavior and updates Murmur ctor usage. |
| pinot-controller/src/test/java/org/apache/pinot/controller/utils/SegmentMetadataMockUtils.java | Updates test helper to use the new Murmur constructor signature. |
- Murmur and Murmur3 partition functions now persist and return the functionConfig via getFunctionConfig(), ensuring partition metadata written to ZK retains the config so brokers can reconstruct the function correctly.
- Re-introduce MurmurPartitionFunction(int) constructor delegating to the new (int, Map) constructor to preserve SPI backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
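The constructor delegation and config round trip described in this commit might look like the sketch below. The class name `LegacyCtorSketch` is hypothetical, and the field layout is an assumption; only the idea of a one-argument constructor forwarding to an (int, Map) constructor, plus a getter that returns the stored config, comes from the commit message.

```java
import java.util.Map;

// Hypothetical sketch; the real classes are MurmurPartitionFunction and Murmur3PartitionFunction.
final class LegacyCtorSketch {
  private final int numPartitions;
  private final Map<String, String> functionConfig; // may be null when no config was supplied

  // Older call sites that pass only the partition count keep compiling.
  LegacyCtorSketch(int numPartitions) {
    this(numPartitions, null);
  }

  LegacyCtorSketch(int numPartitions, Map<String, String> functionConfig) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be positive");
    }
    this.numPartitions = numPartitions;
    this.functionConfig = functionConfig;
  }

  int getNumPartitions() {
    return numPartitions;
  }

  // Returning the stored config lets whoever serializes the function's metadata
  // write it out, so a reader can rebuild an equivalent function later.
  Map<String, String> getFunctionConfig() {
    return functionConfig;
  }
}
```

Keeping the old signature as a thin delegate means there is a single place where the config is parsed, so the two construction paths cannot drift apart.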
- Store functionConfig as Collections.unmodifiableMap() to prevent external mutation from diverging with parsed fields.
- Annotate _functionConfig field and getFunctionConfig() with @Nullable in both MurmurPartitionFunction and Murmur3PartitionFunction.
- Annotate Murmur3PartitionFunction constructor parameter with @Nullable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
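As a rough illustration of why the stored config should not be mutable from outside, consider the sketch below. The helper `ImmutableConfigSketch.freeze` is hypothetical. Note one assumption in particular: `Collections.unmodifiableMap()` on its own returns a read-only view of the caller's map, so this sketch also copies into a new `HashMap` first; whether the PR copies as well is not stated in the commit message.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing a null-tolerant, read-only snapshot of a config map.
final class ImmutableConfigSketch {
  private ImmutableConfigSketch() {
  }

  // Returns null for a null input; otherwise a read-only snapshot of the input.
  static Map<String, String> freeze(Map<String, String> config) {
    if (config == null) {
      return null;
    }
    // The copy detaches the snapshot from the caller's map (assumed extra step);
    // the wrapper rejects writes made through the returned reference.
    return Collections.unmodifiableMap(new HashMap<>(config));
  }
}
```

With this approach, a flag parsed once at construction time (such as useRawBytes) cannot later disagree with the map returned by a getter, because neither the caller nor the consumer can change the stored entries.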
Verify that PartitionFunctionFactory.getPartitionFunction() correctly wires functionConfig through to the partition function for all three aliases, and that getFunctionConfig() roundtrips the config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Jackie-Jiang approved these changes Mar 22, 2026
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | master | #17932 | +/- |
|---|---|---|---|
| Coverage | 63.19% | 63.24% | +0.04% |
| Complexity | 1481 | 1490 | +9 |
| Files | 3191 | 3191 | |
| Lines | 192592 | 192609 | +17 |
| Branches | 29537 | 29542 | +5 |
| Hits | 121710 | 121813 | +103 |
| Misses | 61356 | 61251 | -105 |
| Partials | 9526 | 9545 | +19 |
xiangfu0 pushed a commit to pinot-contrib/pinot-docs that referenced this pull request Mar 22, 2026
xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request Mar 22, 2026
…inot#17932) (#561) Co-authored-by: Pinot Docs Bot <docs-bot@pinot.apache.org>
xiangfu0 added a commit to xiangfu0/pinot that referenced this pull request Mar 30, 2026
…che#17932)

* Add useRawBytes config to Murmur and Murmur3 partition functions

When partitioning on BYTES columns, the partition value is hex-encoded. With useRawBytes=true in functionConfig, the hex string is decoded back to raw bytes before hashing, ensuring partition assignment matches the original byte values rather than treating the hex string as UTF-8 text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Override getFunctionConfig() and restore backward-compatible constructor

- Murmur and Murmur3 partition functions now persist and return the functionConfig via getFunctionConfig(), ensuring partition metadata written to ZK retains the config so brokers can reconstruct the function correctly.
- Re-introduce MurmurPartitionFunction(int) constructor delegating to the new (int, Map) constructor to preserve SPI backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Defensive copy and @Nullable annotations for functionConfig

- Store functionConfig as Collections.unmodifiableMap() to prevent external mutation from diverging with parsed fields.
- Annotate _functionConfig field and getFunctionConfig() with @Nullable in both MurmurPartitionFunction and Murmur3PartitionFunction.
- Annotate Murmur3PartitionFunction constructor parameter with @Nullable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add factory end-to-end test for useRawBytes with Murmur/Murmur2/Murmur3

Verify that PartitionFunctionFactory.getPartitionFunction() correctly wires functionConfig through to the partition function for all three aliases, and that getFunctionConfig() roundtrips the config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Add useRawBytes option to functionConfig for MurmurPartitionFunction and Murmur3PartitionFunction (defaults to false)
- When useRawBytes=true, the hex-encoded partition value is decoded back to raw bytes via BytesUtils.toBytes() before hashing, instead of treating it as UTF-8 text via value.getBytes(UTF_8)

Test plan
- testMurmurPartitionFunctionUseRawBytes: verifies hex-decoded bytes produce expected murmur2 hash partition, and that default behavior is unchanged
- testMurmur3PartitionFunctionUseRawBytes: verifies hex-decoded bytes for x86_32, x64_32 variants, and non-zero seed, and that default behavior is unchanged
- PartitionFunctionTest tests pass

🤖 Generated with Claude Code