Dynamic PartitionFunction registry; unify int-to-id mapping via PartitionIdNormalizer#18446
Conversation
Replaces the closed `PartitionFunctionFactory` enum/switch with an
annotation-based registry so plug-in partition functions no longer
require touching segment-spi code.
- Add `@PartitionFunctionType(names = {...})` annotation in pinot-spi.
- Convert `PartitionFunctionFactory` to a classpath-scanning registry
(regex `.*\.partition\.function\..*`); the factory keeps the same
static API, callers are unchanged. Add `init()` mirroring
`FunctionRegistry.init()` and wire it from broker / server /
controller starters.
- Move the seven built-in impls (`Modulo`, `Murmur` / `Murmur2`,
`Murmur3`, `Fnv`, `HashCode`, `ByteArray`, `BoundedColumnValue`)
out of `pinot-segment-spi` into
`pinot-common/.../partition/function/`, standardize their
constructor on `(int numPartitions, Map<String,String> functionConfig)`,
and annotate each.
- Add `PartitionIntNormalizer` enum (`POSITIVE_MODULO` / `ABS` / `MASK`)
in pinot-segment-spi and a default `getPartitionIdNormalizer()` on
`PartitionFunction` so impls can declare which normalizer matches
their internal modulo semantics. Used by the framework only for
identity / staleness matching between config-side and segment-side
metadata; legacy impls still compute their own modulo. Javadoc spells
out the descriptive-only nature of the value for legacy functions.
- Tests: existing `PartitionFunctionTest` (19 cases) moved to
pinot-common; new `PartitionFunctionFactoryTest` covers
registration completeness, alias resolution (`Murmur` /
`Murmur2`), case-insensitive lookup, idempotent `init()`,
unknown-name rejection, and per-impl normalizer label;
new `PartitionIntNormalizerTest` covers per-normalizer math at
edge cases (`Integer.MIN_VALUE`, `Integer.MAX_VALUE`),
range invariant across all normalizers, and `fromConfigString`
round-trip / blank / unknown.
Plug-in path going forward: drop a class on the classpath under
`*.partition.function.*`, implement `PartitionFunction` with the
standard ctor, add `@PartitionFunctionType(names = "MyFn")` - the
registry picks it up at startup.
NOTE - backward-incompat: the seven impl classes' fully-qualified
names changed (`org.apache.pinot.segment.spi.partition.*` ->
`org.apache.pinot.common.partition.function.*`) and the no-Map
constructors on `Modulo` / `HashCode` / `ByteArray` are gone in favor
of the standard `(int, Map<String,String>)` form. `getName()` strings
(`Modulo`, `Murmur`, ...) are unchanged, so segment-on-disk metadata
is unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ctly Removes the duplicated `NegativePartitionHandling` inner enum in `FnvPartitionFunction` and the `_negativePartitionHandling` field in favor of a single `_normalizer` of type `PartitionIntNormalizer`. The config key (`negativePartitionHandling`) and accepted values stay unchanged for `mask` / `abs`; `positive_modulo` is now also acceptable (strict superset). The error message for unknown values now comes from `PartitionIntNormalizer.fromConfigString`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The interface default now returns `PartitionIntNormalizer.POSITIVE_MODULO` instead of null, and `@Nullable` is dropped from the method. Plug-ins that don't map onto a standard normalizer (e.g. `BoundedColumnValue`, which already produces ids in `[0, N)`) inherit the safe default without needing an explicit override. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the interface default so every PartitionFunction must explicitly pick a normalizer. BoundedColumnValuePartitionFunction now overrides with POSITIVE_MODULO (no-op label, since its output is already a fixed mapping in [0, N)). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every built-in PartitionFunction now stores a `_normalizer` of type PartitionIntNormalizer and applies it in `getPartition(...)` rather than open-coding the modulo logic. Defaults match prior behavior: | Impl | Default normalizer | |-------------------|--------------------| | Modulo | POSITIVE_MODULO | | Murmur / Murmur2 | MASK | | Murmur3 | MASK | | Fnv | MASK | | HashCode | KAFKA_ABS (new) | | ByteArray | KAFKA_ABS (new) | | BoundedColumnValue| POSITIVE_MODULO | Adds `KAFKA_ABS` to PartitionIntNormalizer to cover the Kafka-style `abs(hash) % N` (with `Integer.MIN_VALUE -> 0`) used by HashCode and ByteArray. With KAFKA_ABS in place the normalizer is now an authoritative driver of the partition-id computation, not just a descriptive label - the interface Javadoc is updated accordingly. Unifies the per-impl override path under a single config key `partitionIdNormalizer` (case-insensitive), parsed via shared `PartitionFunctionConfigs#normalizer`. Drops the FNV-specific `negativePartitionHandling` key. Tests: - New `KAFKA_ABS` cases in PartitionIntNormalizerTest covering the MIN_VALUE corner. - New `testPartitionIdNormalizerConfigOverridesDefaultAcrossImpls` in PartitionFunctionFactoryTest verifying the config rewires the computed partition for HashCode, Modulo, and ByteArray. - FNV tests updated to use `partitionIdNormalizer` config key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR replaces the hard-coded PartitionFunctionFactory enum/switch with an annotation-driven registry discovered via reflection, moves the built-in partition functions into pinot-common, and introduces PartitionIntNormalizer plus a new PartitionFunction#getPartitionIdNormalizer() SPI to label/standardize partition-id normalization semantics.
Changes:
- Add
@PartitionFunctionTypeand updatePartitionFunctionFactoryto classpath-scan and instantiate partition functions by name/aliases. - Move built-in partition function implementations to
org.apache.pinot.common.partition.function.*and standardize constructors to(int, Map<String,String>). - Add
PartitionIntNormalizerand tests; wire eager factory init into broker/server/controller startup.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/annotations/PartitionFunctionType.java | New annotation used to mark partition function implementations for discovery. |
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunctionFactory.java | Replaced switch/enum with reflection-based registry + init() hook. |
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunction.java | Adds getPartitionIdNormalizer() contract for partition-id normalization labeling. |
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionIntNormalizer.java | New enum implementing multiple normalization strategies (incl. Kafka-style ABS). |
| pinot-segment-spi/src/test/java/org/apache/pinot/segment/spi/partition/PartitionIntNormalizerTest.java | Unit tests for PartitionIntNormalizer edge cases and parsing. |
| pinot-common/src/main/java/org/apache/pinot/common/partition/function/PartitionFunctionConfigs.java | Shared helper for parsing partitionIdNormalizer from function config. |
| pinot-common/src/main/java/org/apache/pinot/common/partition/function/ModuloPartitionFunction.java | Moved built-in impl + applies configured normalizer + annotation. |
| pinot-common/src/main/java/org/apache/pinot/common/partition/function/MurmurPartitionFunction.java | Moved built-in impl + alias registration (Murmur/Murmur2) + normalizer support. |
| pinot-common/src/main/java/org/apache/pinot/common/partition/function/Murmur3PartitionFunction.java | Moved built-in impl + normalizer support + annotation. |
| pinot-common/src/main/java/org/apache/pinot/common/partition/function/FnvPartitionFunction.java | Moved built-in impl; replaces old negative-handling config with normalizer config. |
| pinot-common/src/main/java/org/apache/pinot/common/partition/function/HashCodePartitionFunction.java | Moved built-in impl + normalizer support + annotation. |
| pinot-common/src/main/java/org/apache/pinot/common/partition/function/ByteArrayPartitionFunction.java | Moved built-in impl + normalizer support + annotation. |
| pinot-common/src/main/java/org/apache/pinot/common/partition/function/BoundedColumnValuePartitionFunction.java | Moved built-in impl + normalizer labeling support + annotation. |
| pinot-common/src/test/java/org/apache/pinot/common/partition/function/PartitionFunctionTest.java | Updates existing tests to new package/constructors and config key rename. |
| pinot-common/src/test/java/org/apache/pinot/common/partition/function/PartitionFunctionFactoryTest.java | New tests validating dynamic registry behavior (aliases, init, normalizer labels). |
| pinot-server/src/main/java/org/apache/pinot/server/starter/ServerInstance.java | Eagerly initializes PartitionFunctionFactory during server startup. |
| pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java | Eagerly initializes PartitionFunctionFactory during broker startup. |
| pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java | Eagerly initializes PartitionFunctionFactory during controller startup. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/creator/SegmentPartitionTest.java | Updates imports due to moved implementations. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/ColumnMetadataTest.java | Updates imports due to moved implementations. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImplDropRecordOnPartitionMismatchTest.java | Updates imports and constructor signature for moved ModuloPartitionFunction. |
| pinot-controller/src/test/java/org/apache/pinot/controller/utils/SegmentMetadataMockUtils.java | Updates imports and constructor signature for moved MurmurPartitionFunction. |
Switches all new Javadoc blocks added by this PR from `/** */` block syntax to `///` markdown syntax (JEP 467), matching the convention established in apache#18165 and the rest of the recently-touched pinot-segment-spi files. Replaces `<p>`, `<ul>/<li>`, `<code>`, `{@code X}`, `{@link X}` HTML/Javadoc tags with their markdown equivalents (paragraph breaks, `-` lists, backticks, `[X]` refs). License headers remain as `/** */` block comments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Rename `PartitionIntNormalizer` -> `PartitionIdNormalizer` (Jackie #3). - Rename `KAFKA_ABS` -> `PRE_MODULO_ABS` to make the pre-vs-post-modulo distinction with `ABS` explicit (Jackie #4). - Change `getPartitionIdNormalizer()` to return `PartitionIdNormalizer` enum directly instead of `String`; drop `@JsonIgnore` so the field is visible in serialized form (Jackie #2). Method stays abstract -- every PartitionFunction implementation declares its own normalizer. - Drop the redundant `_normalizer` field from `BoundedColumnValuePartitionFunction`; it now returns `PartitionIdNormalizer.POSITIVE_MODULO` directly since its output is already a fixed mapping in `[0, numPartitions)` (Jackie #6). - Make `@PartitionFunctionType` annotation optional. The factory now scans every public, concrete `PartitionFunction` subtype under the `org.apache.pinot.*` package tree. When a class lacks the annotation (or `names()` is empty), the registry probes `PartitionFunction.getName()` by instantiating with `(1, null)` and registers under the returned name (Jackie #5). Annotation is now a pure aliasing / overriding mechanism. - Fix Javadoc on `@PartitionFunctionType` and `PartitionFunctionFactory` to drop the incorrect "any plugin package" claim and the unimplemented "stripping underscores" note (Copilot #1, #2, #4). - Add `UnannotatedTestPartitionFunction` fixture + factory test that exercises the no-annotation `getName()` fallback path. - Update `PartitionFunctionTest` to expect 4 JSON fields (the new `partitionIdNormalizer` field is no longer hidden). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
pinot-common/src/main/java/org/apache/pinot/common/partition/function/BoundedColumnValuePartitionFunction.java:101
BoundedColumnValuePartitionFunctionalways reportsPartitionIdNormalizer.POSITIVE_MODULOand does not parse/apply the sharedpartitionIdNormalizerfunctionConfig key that the other built-in partition functions honor viaPartitionFunctionConfigs.normalizer(...). This means a user-providedpartitionIdNormalizervalue will be silently ignored and will not round-trip viagetPartitionIdNormalizer()/ JSON serialization, which can also interfere with metadata identity/staleness checks that rely on the reported normalizer. Consider reading the config key (even if it’s a no-op for the computation) and returning the configured normalizer label for consistency across built-ins.
…ame()
Validates each entry in `@PartitionFunctionType.names()` at registry
build time:
- Each entry is trimmed of surrounding whitespace.
- Blank entries are dropped silently.
- When ALL declared entries are blank (or the array is empty after
filtering), the registry logs a warning and falls back to probing
`PartitionFunction.getName()`, the same path used for unannotated
classes.
This prevents a misconfigured annotation (e.g. `names = {" "}`) from
silently registering a function under an empty canonical name, making
it undiscoverable at lookup time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18446 +/- ##
============================================
- Coverage 63.65% 63.64% -0.01%
+ Complexity 1735 1684 -51
============================================
Files 3254 3256 +2
Lines 199446 199519 +73
Branches 30977 30989 +12
============================================
+ Hits 126953 126992 +39
- Misses 62356 62388 +32
- Partials 10137 10139 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…) default The annotation added no value beyond what `PartitionFunction.getName()` already returned for every built-in. Remove it entirely and let the interface itself declare the registry contract: - New default `List<String> getNames()` on `PartitionFunction` returns `[getName()]`. The factory's static scan instantiates each subtype with `(1, null)` and registers under whatever `getNames()` returns. - Only `MurmurPartitionFunction` overrides `getNames()` (returns `["Murmur", "Murmur2"]` so both aliases resolve to the same impl); the other six built-ins use the default. - `getNames()` is `@JsonIgnore`'d so it doesn't pollute the serialized form (`testBasicProperties` JSON-shape assertion stays at four fields). - `BoundedColumnValuePartitionFunction`'s ctor now tolerates `null` config (probe path) and defers validation; real-config use still throws as before. `getPartition()` rejects probe-built instances. - `@PartitionFunctionType` annotation file deleted. - `UnannotatedTestPartitionFunction` fixture removed (no longer meaningful now that every class is "unannotated"). - Added two new factory tests covering the default `getNames()` behavior and Murmur's override. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`BoundedColumnValuePartitionFunction` produces a fixed mapping in `[0, numPartitions)` by construction; the previous label was `POSITIVE_MODULO` (a no-op for in-range values, but semantically overloaded). Add an explicit `NO_OP` value that is the identity on its inputs and use it for `BoundedColumnValue`. The framework does not validate that callers actually pass in-range values to NO_OP — out-of-range inputs yield out-of-range partition ids, by design. `PartitionIdNormalizerTest#testRangeFoldingNormalizersReturnInRange` (was `testAllNormalizersReturnInRange`) now skips NO_OP since the range invariant doesn't apply. New `testNoOpIsIdentity` locks the identity-with-narrowing semantics including the explicit out-of-range pass-through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Docs follow-up: pinot-contrib/pinot-docs#805 documents the updated partition function registry, built-in normalizer defaults, |
Summary
Replaces the closed
PartitionFunctionFactoryenum/switch with a dynamic registry, and unifies the int-to-partition-id mapping across all built-in functions behind a singlePartitionIdNormalizer.PartitionFunctionFactorybecomes a classpath-scanning registry that walks every public, concretePartitionFunctionsubtype under theorg.apache.pinot.*package tree. Newinit()mirrorsFunctionRegistry.init()and is wired into broker / server / controller starters.PartitionFunctioninterface gains adefault List<String> getNames()that returns[getName()]. The factory probes each subtype with(numPartitions = 1, functionConfig = null)and registers under whatevergetNames()returns. OverridegetNames()only when you want aliases — the only built-in that does isMurmurPartitionFunction, which returns["Murmur", "Murmur2"].Modulo,Murmur/Murmur2,Murmur3,Fnv,HashCode,ByteArray,BoundedColumnValue) move out ofpinot-segment-spiintoorg.apache.pinot.common.partition.function.*and standardize on(int numPartitions, Map<String, String> functionConfig)constructor.PartitionIdNormalizerenum (POSITIVE_MODULO/ABS/MASK/PRE_MODULO_ABS/NO_OP) inpinot-segment-spi, plus a requiredgetPartitionIdNormalizer()SPI method onPartitionFunctionthat returns the enum directly. The hash-based built-ins apply the configured normalizer (via apartitionIdNormalizerconfig key) directly ingetPartition(...), so the value is the authoritative driver of the int-to-id mapping, not just a descriptive label.Why
Today, registering a custom partition function requires editing
PartitionFunctionFactory.PartitionFunctionTypeand theswitchblock in the same SPI module. After this change:org.apache.pinot.*, implementPartitionFunction, and the registry picks it up at startup. No annotation, no factory edits.PartitionIdNormalizer(a slice from Add partition function expressions for chained partitioning #18165) gives operators a uniform knob —partitionIdNormalizer— that works the same way across every built-in function, and gives the framework a clean way to describe int-to-partition-id semantics for staleness / identity matching against segment metadata.PartitionIdNormalizerreferenceEvery Pinot built-in
PartitionFunctionultimately produces a signed integer (or long) and needs to fold it into[0, numPartitions).PartitionIdNormalizerenumerates the four ways the project does that fold today. The enum is selectable per-column via thepartitionIdNormalizerconfig key on any built-in function; the values are also exposed as the SPI value returned byPartitionFunction#getPartitionIdNormalizer()so brokers / servers / controllers can confirm config-side and segment-side metadata agree.POSITIVE_MODULOFormula:
int p = value % N; return p < 0 ? p + N : p;(long variant casts at the end).Behavior: Java-style remainder, then shift any negative result back into
[0, N)so every input lands in a non-negative partition. Output is in[0, N)for every signed input includingInteger.MIN_VALUEandLong.MIN_VALUE.When to use:
Moduloon a signedbigintcolumn key).[0, N)and you want a no-op label that documents that fact (e.g.BoundedColumnValue).Used as the default for:
Modulo.Example difference vs. other normalizers: for
value = -10,N = 7,POSITIVE_MODULOreturns4(because-10 % 7 = -3, then-3 + 7 = 4).ABSreturns3(abs(-3) = 3).MASKreturns(-10 & 0x7FFFFFFF) % 7 = 2147483638 % 7 = 0.ABSFormula:
int p = value % N; return p < 0 ? -p : p;(long variant casts at the end).Behavior: Computes the remainder first, then takes its absolute value. Also lands every input in
[0, N). Differs fromPOSITIVE_MODULOonly when the remainder is negative —ABSreflects across zero (-3 → 3) instead of shifting (-3 → 4). ForInteger.MIN_VALUEit is overflow-safe because the% Nstep always produces a remainder strictly larger thanInteger.MIN_VALUE(so-pis well-defined).When to use:
vand-vland in the same partition for most inputs).Used as the default for: none of the built-ins by default, but reachable via config on every function (most commonly on
FNVfor compatibility with Kafka-style consumers that wrap signed ABS over a 32-bit hash).Example: for
value = -10,N = 7,ABSreturns3. Forvalue = 10, it also returns3.MASKFormula:
(value & Integer.MAX_VALUE) % N(long variant:(value & Long.MAX_VALUE) % N, cast).Behavior: Strips the sign bit by masking, then applies modulo. Cheapest of the four (one AND + one mod, no branches). Output is in
[0, N)for every signed input. The downside is that the sign bit being stripped means the upper-half and lower-half of the signed integer space collapse onto the same partitions — for hash functions whose output is approximately uniform this is fine; for non-hash inputs (e.g. row counters) it can cause skew.When to use:
Murmur2,Murmur3,FNV1a), where collapsing the sign bit doesn't bias the distribution.Used as the default for:
Murmur/Murmur2,Murmur3,FNV.Example: for
value = -10,N = 7,MASKreturns0(since-10 & 0x7FFFFFFF = 2147483638and2147483638 % 7 = 0). Forvalue = Integer.MIN_VALUE,MASKreturns0(the sign bit is the value's only bit, so masking yields0).PRE_MODULO_ABS(new in this PR)Formula:
int abs = (value == Integer.MIN_VALUE) ? 0 : Math.abs(value); return abs % N;(long variant: same shape,Long.MIN_VALUE→0L, cast at the end).Behavior: Kafka producer's
Utils.toPositive/partitionsemantics — abs before modulo (hence "pre-modulo") and patches theMath.abs(MIN_VALUE)overflow corner (sinceMath.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE) by mappingMIN_VALUE → 0. Output is in[0, N)for every signed input. For most inputs it agrees withABS, but the two differ onInteger.MIN_VALUEand on negative inputs whose abs would otherwise overflow.When to use:
HashCode/ByteArraydistribution bit-for-bit, which Pinot has shipped historically and which Kafka producers use.Used as the default for:
HashCode,ByteArray.Example: for
value = Integer.MIN_VALUE,N = 7,PRE_MODULO_ABSreturns0(after theMIN_VALUE → 0patch).ABSwould returnabs(Integer.MIN_VALUE % 7) = abs(-2) = 2.MASKwould return0(incidentally agreeing withPRE_MODULO_ABSon this specific value, but for different reasons).NO_OP(new in this PR)Formula:
return value;(long variant:return (int) value;).Behavior: Identity. Returns the input unchanged. The framework does NOT validate that the input is in
[0, numPartitions)— passing an out-of-range value yields an out-of-range partition id.When to use:
PartitionFunction.getPartition(...)already produces a value in[0, numPartitions)by construction (e.g. lookup-style functions where partition ids come from a fixed mapping).Used as the default for:
BoundedColumnValue.Example: for
value = 3,N = 7,NO_OPreturns3. Forvalue = -5,N = 7,NO_OPreturns-5(the caller violated the contract).Choosing a normalizer
MASK(cheapest, uniform if hash is uniform)POSITIVE_MODULO[0, N)NO_OP(identity, no validation)PRE_MODULO_ABSabs(hash) % NwithoutMIN_VALUEpatchingABSNumeric example: same hash, different normalizers
For
value = -10,N = 7:POSITIVE_MODULO4ABS3MASK0PRE_MODULO_ABS3For
value = Integer.MIN_VALUE,N = 7:POSITIVE_MODULO5ABS2MASK0PRE_MODULO_ABS0User manual
Table config
Partition functions are configured per column under
tableIndexConfig.segmentPartitionConfig.columnPartitionMap:{ "tableIndexConfig": { "segmentPartitionConfig": { "columnPartitionMap": { "memberId": { "functionName": "Murmur3", "numPartitions": 32, "functionConfig": { "seed": "0" } } } } } }functionNameis matched case-insensitively against the registry.functionConfigis an optionalMap<String, String>whose accepted keys are function-specific.Built-in functions
ModuloPOSITIVE_MODULOMurmurMurmur2MASKuseRawBytesMurmur3MASKseed,variant(x86_32/x64_32),useRawBytesFNVMASKvariant(fnv1_32,fnv1a_32,fnv1_64,fnv1a_64),useRawBytesHashCodePRE_MODULO_ABSByteArrayPRE_MODULO_ABSBoundedColumnValueNO_OPcolumnValues,columnValuesDelimiter(required)The six hash-based functions (everything except
BoundedColumnValue) accept the unified keypartitionIdNormalizer(case-insensitive) that selects aPartitionIdNormalizer; see the reference section above.BoundedColumnValueproduces a fixed mapping in[0, numPartitions)and does not honor the override.Examples
Murmur with explicit POSITIVE_MODULO
{ "columnPartitionMap": { "userId": { "functionName": "Murmur", "numPartitions": 16, "functionConfig": { "partitionIdNormalizer": "POSITIVE_MODULO" } } } }Modulo on a signed bigint column
Moduloparses the input as along, so it works for both positive and negative integer keys. The defaultPOSITIVE_MODULOkeeps-7 % 4 → 1(instead of the Java-3).{ "columnPartitionMap": { "txnId": { "functionName": "Modulo", "numPartitions": 4 } } }FNV with raw bytes + ABS normalizer
{ "columnPartitionMap": { "memberId": { "functionName": "FNV", "numPartitions": 64, "functionConfig": { "variant": "fnv1a_32", "useRawBytes": "true", "partitionIdNormalizer": "ABS" } } } }Murmur3 x64_32 with non-default seed
{ "columnPartitionMap": { "deviceId": { "functionName": "Murmur3", "numPartitions": 128, "functionConfig": { "variant": "x64_32", "seed": "42" } } } }BoundedColumnValue (lookup-style partitioning)
{ "columnPartitionMap": { "subject": { "functionName": "BoundedColumnValue", "numPartitions": 4, "functionConfig": { "columnValues": "Maths|English|Chemistry", "columnValuesDelimiter": "|" } } } }Maths → 1,English → 2,Chemistry → 3, anything else →0.numPartitionsmust equal the number of listed values plus one.Plugin author manual: registering a custom partition function
Implement
PartitionFunction— the SPI lives inpinot-segment-spi.Place the class anywhere under
org.apache.pinot.*— the registry scans this package tree at startup. Standard locations:org.apache.pinot.common.partition.function.*(built-in path)org.apache.pinot.plugin.<plugin-name>.partition.function.*(Pinot-plugin path)Expose a public
(int numPartitions, Map<String, String> functionConfig)constructor — the registry instantiates impls reflectively via this signature, both for the startupgetNames()probe (called with(1, null)) and for actual use. Implementations that don't read fromfunctionConfigshould still accept it. If your ctor would reject(1, null), special-case it (seeBoundedColumnValuePartitionFunctionfor an example).Override
getNames()only if you want aliases — e.g.MurmurPartitionFunctionreturnsList.of("Murmur", "Murmur2")so both names resolve to the same impl.Full example (no aliases)
Drop the resulting jar on the broker / server / controller / minion classpaths (e.g. via
pinot-plugins/) and reference it by name in table config:{ "columnPartitionMap": { "tenant": { "functionName": "CityHash", "numPartitions": 32 } } }The starters call
PartitionFunctionFactory.init()early in startup; the registry scan then picks the class up automatically.Example with aliases
Both
"CityHash"and"city"resolve to the same impl class.Backward compatibility
This change is
backward-incompatfor any external code that referenced the moved classes:org.apache.pinot.segment.spi.partition.*→org.apache.pinot.common.partition.function.*.ModuloPartitionFunction,HashCodePartitionFunction, andByteArrayPartitionFunctionare gone; all impls now use(int numPartitions, Map<String, String> functionConfig)(config may benull).PartitionFunctionFactory.PartitionFunctionTypeenum is removed (it was an internal switch helper).negativePartitionHandlingconfig key is replaced by the unifiedpartitionIdNormalizerkey. Existing values (mask/abs) carry over verbatim.PartitionFunction#getPartitionIdNormalizer()is now required (no default); plug-in implementations must declare a value.Segment-on-disk format is unaffected.
getName()strings (Modulo,Murmur,Murmur3,FNV,HashCode,ByteArray,BoundedColumnValue) are unchanged, so existing segment metadata resolves through the new registry by name.Test plan
./mvnw spotless:apply checkstyle:check license:format license:check -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-broker,pinot-server,pinot-controller,pinot-segment-local./mvnw -pl pinot-common -am -Dtest=PartitionFunctionTest,PartitionFunctionFactoryTest,PartitionIdNormalizerTest -Dsurefire.failIfNoSpecifiedTests=false test(38 tests pass)./mvnw -pl pinot-segment-local -am -Dtest=SegmentPartitionTest,ColumnMetadataTest,MutableSegmentImplDropRecordOnPartitionMismatchTest,RealtimeSegmentConverterTest -Dsurefire.failIfNoSpecifiedTests=false test(178 tests pass)./mvnw -pl pinot-core -am -Dtest=ColumnValueSegmentPrunerTest -Dsurefire.failIfNoSpecifiedTests=false test(3 tests pass)./mvnw -pl pinot-broker,pinot-server,pinot-controller,pinot-segment-local,pinot-core,pinot-tools -am test-compile(full test-compile clean)New regression coverage:
PartitionFunctionFactoryTest— registration completeness across all 7 impls; alias resolution (Murmur/Murmur2); case-insensitive lookup; idempotentinit(); unknown-name rejection; per-implgetPartitionIdNormalizer()value;partitionIdNormalizerconfig rewires the actual computation across HashCode / Modulo / ByteArray; defaultgetNames()returns[getName()]; Murmur's override returns the two-alias list.PartitionIdNormalizerTest— per-normalizer math atInteger.MIN_VALUE/Integer.MAX_VALUE/ negative /Long.MIN_VALUE; range invariant[0, numPartitions)across all four normalizers and a sweep of partition counts;PRE_MODULO_ABScovers theMIN_VALUE → 0patch;NO_OPis identity (range invariant skipped intentionally);fromConfigStringround-trip, blank / null rejection, unknown-name rejection.🤖 Generated with Claude Code