Skip to content

Add useRawBytes config to Murmur and Murmur3 partition functions#17932

Merged
xiangfu0 merged 4 commits intoapache:masterfrom
xiangfu0:add-use-raw-bytes-partition-function
Mar 22, 2026
Merged

Add useRawBytes config to Murmur and Murmur3 partition functions#17932
xiangfu0 merged 4 commits intoapache:masterfrom
xiangfu0:add-use-raw-bytes-partition-function

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

Summary

  • Add useRawBytes option to functionConfig for MurmurPartitionFunction and Murmur3PartitionFunction (defaults to false)
  • When useRawBytes=true, the hex-encoded partition value is decoded back to raw bytes via BytesUtils.toBytes() before hashing, instead of treating it as UTF-8 text via value.getBytes(UTF_8)
  • This enables correct partition assignment for BYTES columns where the value is hex-encoded but should be hashed on the original raw bytes

Test plan

  • Added testMurmurPartitionFunctionUseRawBytes — verifies hex-decoded bytes produce expected murmur2 hash partition, and that default behavior is unchanged
  • Added testMurmur3PartitionFunctionUseRawBytes — verifies hex-decoded bytes for x86_32, x64_32 variants, and non-zero seed, and that default behavior is unchanged
  • All existing PartitionFunctionTest tests pass

🤖 Generated with Claude Code

When partitioning on BYTES columns, the partition value is hex-encoded.
With useRawBytes=true in functionConfig, the hex string is decoded back
to raw bytes before hashing, ensuring partition assignment matches the
original byte values rather than treating the hex string as UTF-8 text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@xiangfu0 xiangfu0 added the enhancement Improvement to existing functionality label Mar 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a useRawBytes option to Murmur/Murmur3 partition functions so BYTES values that are hex-encoded strings can be partitioned by hashing the original decoded bytes (instead of hashing the UTF-8 bytes of the hex text).

Changes:

  • Add useRawBytes parsing and hex-string decoding via BytesUtils.toBytes() in MurmurPartitionFunction and Murmur3PartitionFunction.
  • Pass functionConfig into MurmurPartitionFunction from PartitionFunctionFactory.
  • Add unit tests covering useRawBytes=true behavior and default behavior parity.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/MurmurPartitionFunction.java Adds useRawBytes support and changes constructor to accept functionConfig.
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java Adds useRawBytes support for both x86_32 and x64_32 hashing paths.
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunctionFactory.java Wires functionConfig through for Murmur/Murmur2 factory creation.
pinot-segment-spi/src/test/java/org/apache/pinot/segment/spi/partition/PartitionFunctionTest.java Adds coverage for useRawBytes behavior and updates Murmur ctor usage.
pinot-controller/src/test/java/org/apache/pinot/controller/utils/SegmentMetadataMockUtils.java Updates test helper to use the new Murmur constructor signature.

@xiangfu0 xiangfu0 added the feature New functionality label Mar 22, 2026
- Murmur and Murmur3 partition functions now persist and return the
  functionConfig via getFunctionConfig(), ensuring partition metadata
  written to ZK retains the config so brokers can reconstruct the
  function correctly.
- Re-introduce MurmurPartitionFunction(int) constructor delegating to
  the new (int, Map) constructor to preserve SPI backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

- Store functionConfig as Collections.unmodifiableMap() to prevent
  external mutation from diverging with parsed fields.
- Annotate _functionConfig field and getFunctionConfig() with @nullable
  in both MurmurPartitionFunction and Murmur3PartitionFunction.
- Annotate Murmur3PartitionFunction constructor parameter with @nullable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Verify that PartitionFunctionFactory.getPartitionFunction() correctly
wires functionConfig through to the partition function for all three
aliases, and that getFunctionConfig() roundtrips the config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 63.24%. Comparing base (550afb7) to head (5f21f1c).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
...segment/spi/partition/MurmurPartitionFunction.java 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17932      +/-   ##
============================================
+ Coverage     63.19%   63.24%   +0.04%     
- Complexity     1481     1490       +9     
============================================
  Files          3191     3191              
  Lines        192592   192609      +17     
  Branches      29537    29542       +5     
============================================
+ Hits         121710   121813     +103     
+ Misses        61356    61251     -105     
- Partials       9526     9545      +19     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.21% <95.00%> (+0.05%) ⬆️
java-21 63.15% <95.00%> (-0.01%) ⬇️
temurin 63.24% <95.00%> (+0.04%) ⬆️
unittests 63.24% <95.00%> (+0.04%) ⬆️
unittests1 55.49% <95.00%> (-0.01%) ⬇️
unittests2 34.27% <0.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 merged commit 1329e1f into apache:master Mar 22, 2026
16 checks passed
@xiangfu0 xiangfu0 deleted the add-use-raw-bytes-partition-function branch March 22, 2026 05:19
xiangfu0 pushed a commit to pinot-contrib/pinot-docs that referenced this pull request Mar 22, 2026
xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request Mar 22, 2026
…inot#17932) (#561)

Co-authored-by: Pinot Docs Bot <docs-bot@pinot.apache.org>
xiangfu0 added a commit to xiangfu0/pinot that referenced this pull request Mar 30, 2026
…che#17932)

* Add useRawBytes config to Murmur and Murmur3 partition functions

When partitioning on BYTES columns, the partition value is hex-encoded.
With useRawBytes=true in functionConfig, the hex string is decoded back
to raw bytes before hashing, ensuring partition assignment matches the
original byte values rather than treating the hex string as UTF-8 text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Override getFunctionConfig() and restore backward-compatible constructor

- Murmur and Murmur3 partition functions now persist and return the
  functionConfig via getFunctionConfig(), ensuring partition metadata
  written to ZK retains the config so brokers can reconstruct the
  function correctly.
- Re-introduce MurmurPartitionFunction(int) constructor delegating to
  the new (int, Map) constructor to preserve SPI backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Defensive copy and @nullable annotations for functionConfig

- Store functionConfig as Collections.unmodifiableMap() to prevent
  external mutation from diverging with parsed fields.
- Annotate _functionConfig field and getFunctionConfig() with @nullable
  in both MurmurPartitionFunction and Murmur3PartitionFunction.
- Annotate Murmur3PartitionFunction constructor parameter with @nullable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add factory end-to-end test for useRawBytes with Murmur/Murmur2/Murmur3

Verify that PartitionFunctionFactory.getPartitionFunction() correctly
wires functionConfig through to the partition function for all three
aliases, and that getFunctionConfig() roundtrips the config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to existing functionality feature New functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants