Skip to content

Dynamic PartitionFunction registry; unify int-to-id mapping via PartitionIdNormalizer#18446

Merged
xiangfu0 merged 10 commits into
apache:masterfrom
xiangfu0:worktree-refactor-partition-functions
May 9, 2026
Merged

Dynamic PartitionFunction registry; unify int-to-id mapping via PartitionIdNormalizer#18446
xiangfu0 merged 10 commits into
apache:masterfrom
xiangfu0:worktree-refactor-partition-functions

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented May 8, 2026

Summary

Replaces the closed PartitionFunctionFactory enum/switch with a dynamic registry, and unifies the int-to-partition-id mapping across all built-in functions behind a single PartitionIdNormalizer.

  • PartitionFunctionFactory becomes a classpath-scanning registry that walks every public, concrete PartitionFunction subtype under the org.apache.pinot.* package tree. New init() mirrors FunctionRegistry.init() and is wired into broker / server / controller starters.
  • Discovery is annotation-free. The PartitionFunction interface gains a default List<String> getNames() that returns [getName()]. The factory probes each subtype with (numPartitions = 1, functionConfig = null) and registers under whatever getNames() returns. Override getNames() only when you want aliases — the only built-in that does is MurmurPartitionFunction, which returns ["Murmur", "Murmur2"].
  • The seven built-in impls (Modulo, Murmur / Murmur2, Murmur3, Fnv, HashCode, ByteArray, BoundedColumnValue) move out of pinot-segment-spi into org.apache.pinot.common.partition.function.* and standardize on (int numPartitions, Map<String, String> functionConfig) constructor.
  • New PartitionIdNormalizer enum (POSITIVE_MODULO / ABS / MASK / PRE_MODULO_ABS / NO_OP) in pinot-segment-spi, plus a required getPartitionIdNormalizer() SPI method on PartitionFunction that returns the enum directly. The hash-based built-ins apply the configured normalizer (via a partitionIdNormalizer config key) directly in getPartition(...), so the value is the authoritative driver of the int-to-id mapping, not just a descriptive label.

Why

Today, registering a custom partition function requires editing PartitionFunctionFactory.PartitionFunctionType and the switch block in the same SPI module. After this change:

  • Plug-ins drop a class on the classpath under org.apache.pinot.*, implement PartitionFunction, and the registry picks it up at startup. No annotation, no factory edits.
  • Pulling the int-normalizer out into PartitionIdNormalizer (a slice from Add partition function expressions for chained partitioning #18165) gives operators a uniform knob — partitionIdNormalizer — that works the same way across every built-in function, and gives the framework a clean way to describe int-to-partition-id semantics for staleness / identity matching against segment metadata.

PartitionIdNormalizer reference

Every Pinot built-in PartitionFunction ultimately produces a signed integer (or long) and needs to fold it into [0, numPartitions). PartitionIdNormalizer enumerates the four ways the project does that fold today. The enum is selectable per-column via the partitionIdNormalizer config key on any built-in function; the values are also exposed as the SPI value returned by PartitionFunction#getPartitionIdNormalizer() so brokers / servers / controllers can confirm config-side and segment-side metadata agree.

POSITIVE_MODULO

Formula: int p = value % N; return p < 0 ? p + N : p; (long variant casts at the end).

Behavior: Java-style remainder, then shift any negative result back into [0, N) so every input lands in a non-negative partition. Output is in [0, N) for every signed input including Integer.MIN_VALUE and Long.MIN_VALUE.

When to use:

  • The function's input is already a signed integer that you want partitioned mathematically (e.g. Modulo on a signed bigint column key).
  • The function's raw output is already in [0, N) and you want a no-op label that documents that fact (e.g. BoundedColumnValue).
  • You want the simplest, most predictable distribution for human-readable values.

Used as the default for: Modulo.

Example difference vs. other normalizers: for value = -10, N = 7, POSITIVE_MODULO returns 4 (because -10 % 7 = -3, then -3 + 7 = 4). ABS returns 3 (abs(-3) = 3). MASK returns (-10 & 0x7FFFFFFF) % 7 = 2147483638 % 7 = 0.

ABS

Formula: int p = value % N; return p < 0 ? -p : p; (long variant casts at the end).

Behavior: Computes the remainder first, then takes its absolute value. Also lands every input in [0, N). Differs from POSITIVE_MODULO only when the remainder is negative — ABS reflects across zero (-3 → 3) instead of shifting (-3 → 4). For Integer.MIN_VALUE it is overflow-safe because the % N step always produces a remainder strictly larger than Integer.MIN_VALUE (so -p is well-defined).

When to use:

  • You're matching an existing system that distributes negative-hash inputs by reflecting the remainder across zero rather than wrapping it.
  • You need a deterministic, side-symmetric mapping (a value v and -v land in the same partition for most inputs).

Used as the default for: none of the built-ins by default, but reachable via config on every function (most commonly on FNV for compatibility with Kafka-style consumers that wrap signed ABS over a 32-bit hash).

Example: for value = -10, N = 7, ABS returns 3. For value = 10, it also returns 3.

MASK

Formula: (value & Integer.MAX_VALUE) % N (long variant: (value & Long.MAX_VALUE) % N, cast).

Behavior: Strips the sign bit by masking, then applies modulo. Cheapest of the four (one AND + one mod, no branches). Output is in [0, N) for every signed input. The downside is that the sign bit being stripped means the upper-half and lower-half of the signed integer space collapse onto the same partitions — for hash functions whose output is approximately uniform this is fine; for non-hash inputs (e.g. row counters) it can cause skew.

When to use:

  • The input is the output of a uniform hash function (Murmur2, Murmur3, FNV1a), where collapsing the sign bit doesn't bias the distribution.
  • You want the lowest-overhead path on the hot per-row partitioning loop.

Used as the default for: Murmur / Murmur2, Murmur3, FNV.

Example: for value = -10, N = 7, MASK returns 0 (since -10 & 0x7FFFFFFF = 2147483638 and 2147483638 % 7 = 0). For value = Integer.MIN_VALUE, MASK returns 0 (the sign bit is the value's only bit, so masking yields 0).

PRE_MODULO_ABS (new in this PR)

Formula: int abs = (value == Integer.MIN_VALUE) ? 0 : Math.abs(value); return abs % N; (long variant: same shape, Long.MIN_VALUE0L, cast at the end).

Behavior: Kafka producer's Utils.toPositive / partition semantics — abs before modulo (hence "pre-modulo") and patches the Math.abs(MIN_VALUE) overflow corner (since Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE) by mapping MIN_VALUE → 0. Output is in [0, N) for every signed input. For most inputs it agrees with ABS, but the two differ on Integer.MIN_VALUE and on negative inputs whose abs would otherwise overflow.

When to use:

  • You're matching the legacy HashCode / ByteArray distribution bit-for-bit, which Pinot has shipped historically and which Kafka producers use.
  • You're partitioning to be co-located with a Kafka topic that uses Kafka's default partitioner.

Used as the default for: HashCode, ByteArray.

Example: for value = Integer.MIN_VALUE, N = 7, PRE_MODULO_ABS returns 0 (after the MIN_VALUE → 0 patch). ABS would return abs(Integer.MIN_VALUE % 7) = abs(-2) = 2. MASK would return 0 (incidentally agreeing with PRE_MODULO_ABS on this specific value, but for different reasons).

NO_OP (new in this PR)

Formula: return value; (long variant: return (int) value;).

Behavior: Identity. Returns the input unchanged. The framework does NOT validate that the input is in [0, numPartitions) — passing an out-of-range value yields an out-of-range partition id.

When to use:

  • Your PartitionFunction.getPartition(...) already produces a value in [0, numPartitions) by construction (e.g. lookup-style functions where partition ids come from a fixed mapping).
  • You want a label that documents "no normalization needed" without lying about the underlying math.

Used as the default for: BoundedColumnValue.

Example: for value = 3, N = 7, NO_OP returns 3. For value = -5, N = 7, NO_OP returns -5 (the caller violated the contract).

Choosing a normalizer

Goal Pick
Default for a new hash-based plug-in MASK (cheapest, uniform if hash is uniform)
Default for a non-hash signed input POSITIVE_MODULO
Output is already in [0, N) NO_OP (identity, no validation)
Match Kafka producer's partition assignment PRE_MODULO_ABS
Match a system that does abs(hash) % N without MIN_VALUE patching ABS

Numeric example: same hash, different normalizers

For value = -10, N = 7:

Normalizer Result
POSITIVE_MODULO 4
ABS 3
MASK 0
PRE_MODULO_ABS 3

For value = Integer.MIN_VALUE, N = 7:

Normalizer Result
POSITIVE_MODULO 5
ABS 2
MASK 0
PRE_MODULO_ABS 0

User manual

Table config

Partition functions are configured per column under tableIndexConfig.segmentPartitionConfig.columnPartitionMap:

{
  "tableIndexConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "memberId": {
          "functionName": "Murmur3",
          "numPartitions": 32,
          "functionConfig": {
            "seed": "0"
          }
        }
      }
    }
  }
}

functionName is matched case-insensitively against the registry. functionConfig is an optional Map<String, String> whose accepted keys are function-specific.

Built-in functions

Function name Aliases Default normalizer Function-specific config keys
Modulo POSITIVE_MODULO none
Murmur Murmur2 MASK useRawBytes
Murmur3 MASK seed, variant (x86_32 / x64_32), useRawBytes
FNV MASK variant (fnv1_32, fnv1a_32, fnv1_64, fnv1a_64), useRawBytes
HashCode PRE_MODULO_ABS none
ByteArray PRE_MODULO_ABS none
BoundedColumnValue NO_OP columnValues, columnValuesDelimiter (required)

The six hash-based functions (everything except BoundedColumnValue) accept the unified key partitionIdNormalizer (case-insensitive) that selects a PartitionIdNormalizer; see the reference section above. BoundedColumnValue produces a fixed mapping in [0, numPartitions) and does not honor the override.

Examples

Murmur with explicit POSITIVE_MODULO

{
  "columnPartitionMap": {
    "userId": {
      "functionName": "Murmur",
      "numPartitions": 16,
      "functionConfig": {
        "partitionIdNormalizer": "POSITIVE_MODULO"
      }
    }
  }
}

Modulo on a signed bigint column

Modulo parses the input as a long, so it works for both positive and negative integer keys. The default POSITIVE_MODULO keeps -7 % 4 → 1 (instead of the Java -3).

{
  "columnPartitionMap": {
    "txnId": {
      "functionName": "Modulo",
      "numPartitions": 4
    }
  }
}

FNV with raw bytes + ABS normalizer

{
  "columnPartitionMap": {
    "memberId": {
      "functionName": "FNV",
      "numPartitions": 64,
      "functionConfig": {
        "variant": "fnv1a_32",
        "useRawBytes": "true",
        "partitionIdNormalizer": "ABS"
      }
    }
  }
}

Murmur3 x64_32 with non-default seed

{
  "columnPartitionMap": {
    "deviceId": {
      "functionName": "Murmur3",
      "numPartitions": 128,
      "functionConfig": {
        "variant": "x64_32",
        "seed": "42"
      }
    }
  }
}

BoundedColumnValue (lookup-style partitioning)

{
  "columnPartitionMap": {
    "subject": {
      "functionName": "BoundedColumnValue",
      "numPartitions": 4,
      "functionConfig": {
        "columnValues": "Maths|English|Chemistry",
        "columnValuesDelimiter": "|"
      }
    }
  }
}

Maths → 1, English → 2, Chemistry → 3, anything else → 0. numPartitions must equal the number of listed values plus one.


Plugin author manual: registering a custom partition function

  1. Implement PartitionFunction — the SPI lives in pinot-segment-spi.

    public interface PartitionFunction extends Serializable {
      int getPartition(String value);
      String getName();
      default List<String> getNames() { return List.of(getName()); }  // override only for aliases
      int getNumPartitions();
      @Nullable Map<String, String> getFunctionConfig();
      PartitionIdNormalizer getPartitionIdNormalizer();  // required
    }
  2. Place the class anywhere under org.apache.pinot.* — the registry scans this package tree at startup. Standard locations:

    • org.apache.pinot.common.partition.function.* (built-in path)
    • org.apache.pinot.plugin.<plugin-name>.partition.function.* (Pinot-plugin path)
  3. Expose a public (int numPartitions, Map<String, String> functionConfig) constructor — the registry instantiates impls reflectively via this signature, both for the startup getNames() probe (called with (1, null)) and for actual use. Implementations that don't read from functionConfig should still accept it. If your ctor would reject (1, null), special-case it (see BoundedColumnValuePartitionFunction for an example).

  4. Override getNames() only if you want aliases — e.g. MurmurPartitionFunction returns List.of("Murmur", "Murmur2") so both names resolve to the same impl.

Full example (no aliases)

package org.apache.pinot.plugin.acme.partition.function;

import java.util.Map;
import javax.annotation.Nullable;
import org.apache.pinot.segment.spi.partition.PartitionFunction;
import org.apache.pinot.segment.spi.partition.PartitionIdNormalizer;

public class CityHashPartitionFunction implements PartitionFunction {
  private static final String NAME = "CityHash";
  private static final PartitionIdNormalizer DEFAULT_NORMALIZER = PartitionIdNormalizer.MASK;

  private final int _numPartitions;
  private final PartitionIdNormalizer _normalizer;

  public CityHashPartitionFunction(int numPartitions, @Nullable Map<String, String> functionConfig) {
    _numPartitions = numPartitions;
    String raw = functionConfig != null ? functionConfig.get("partitionIdNormalizer") : null;
    _normalizer = raw == null ? DEFAULT_NORMALIZER : PartitionIdNormalizer.fromConfigString(raw);
  }

  @Override
  public int getPartition(String value) {
    int hash = computeCityHash32(value); // your hash impl
    return _normalizer.getPartitionId(hash, _numPartitions);
  }

  @Override public String getName() { return NAME; }
  @Override public int getNumPartitions() { return _numPartitions; }
  @Override public Map<String, String> getFunctionConfig() { return null; }
  @Override public PartitionIdNormalizer getPartitionIdNormalizer() { return _normalizer; }

  private static int computeCityHash32(String value) { /* ... */ return 0; }
}

Drop the resulting jar on the broker / server / controller / minion classpaths (e.g. via pinot-plugins/) and reference it by name in table config:

{
  "columnPartitionMap": {
    "tenant": {
      "functionName": "CityHash",
      "numPartitions": 32
    }
  }
}

The starters call PartitionFunctionFactory.init() early in startup; the registry scan then picks the class up automatically.

Example with aliases

public class CityHashPartitionFunction implements PartitionFunction {
  private static final List<String> NAMES = List.of("CityHash", "city");
  // ...
  @Override
  public List<String> getNames() {
    return NAMES;
  }
}

Both "CityHash" and "city" resolve to the same impl class.


Backward compatibility

This change is backward-incompat for any external code that referenced the moved classes:

  • The seven impl classes' fully-qualified names changed: org.apache.pinot.segment.spi.partition.*org.apache.pinot.common.partition.function.*.
  • The legacy single-arg constructors on ModuloPartitionFunction, HashCodePartitionFunction, and ByteArrayPartitionFunction are gone; all impls now use (int numPartitions, Map<String, String> functionConfig) (config may be null).
  • PartitionFunctionFactory.PartitionFunctionType enum is removed (it was an internal switch helper).
  • The FNV-specific negativePartitionHandling config key is replaced by the unified partitionIdNormalizer key. Existing values (mask / abs) carry over verbatim.
  • PartitionFunction#getPartitionIdNormalizer() is now required (no default); plug-in implementations must declare a value.

Segment-on-disk format is unaffected. getName() strings (Modulo, Murmur, Murmur3, FNV, HashCode, ByteArray, BoundedColumnValue) are unchanged, so existing segment metadata resolves through the new registry by name.

Test plan

  • ./mvnw spotless:apply checkstyle:check license:format license:check -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-broker,pinot-server,pinot-controller,pinot-segment-local
  • ./mvnw -pl pinot-common -am -Dtest=PartitionFunctionTest,PartitionFunctionFactoryTest,PartitionIdNormalizerTest -Dsurefire.failIfNoSpecifiedTests=false test (38 tests pass)
  • ./mvnw -pl pinot-segment-local -am -Dtest=SegmentPartitionTest,ColumnMetadataTest,MutableSegmentImplDropRecordOnPartitionMismatchTest,RealtimeSegmentConverterTest -Dsurefire.failIfNoSpecifiedTests=false test (178 tests pass)
  • ./mvnw -pl pinot-core -am -Dtest=ColumnValueSegmentPrunerTest -Dsurefire.failIfNoSpecifiedTests=false test (3 tests pass)
  • ./mvnw -pl pinot-broker,pinot-server,pinot-controller,pinot-segment-local,pinot-core,pinot-tools -am test-compile (full test-compile clean)

New regression coverage:

  • PartitionFunctionFactoryTest — registration completeness across all 7 impls; alias resolution (Murmur / Murmur2); case-insensitive lookup; idempotent init(); unknown-name rejection; per-impl getPartitionIdNormalizer() value; partitionIdNormalizer config rewires the actual computation across HashCode / Modulo / ByteArray; default getNames() returns [getName()]; Murmur's override returns the two-alias list.
  • PartitionIdNormalizerTest — per-normalizer math at Integer.MIN_VALUE / Integer.MAX_VALUE / negative / Long.MIN_VALUE; range invariant [0, numPartitions) across all four normalizers and a sweep of partition counts; PRE_MODULO_ABS covers the MIN_VALUE → 0 patch; NO_OP is identity (range invariant skipped intentionally); fromConfigString round-trip, blank / null rejection, unknown-name rejection.

🤖 Generated with Claude Code

Replaces the closed `PartitionFunctionFactory` enum/switch with an
annotation-based registry so plug-in partition functions no longer
require touching segment-spi code.

- Add `@PartitionFunctionType(names = {...})` annotation in pinot-spi.
- Convert `PartitionFunctionFactory` to a classpath-scanning registry
  (regex `.*\.partition\.function\..*`); the factory keeps the same
  static API, callers are unchanged. Add `init()` mirroring
  `FunctionRegistry.init()` and wire it from broker / server /
  controller starters.
- Move the seven built-in impls (`Modulo`, `Murmur` / `Murmur2`,
  `Murmur3`, `Fnv`, `HashCode`, `ByteArray`, `BoundedColumnValue`)
  out of `pinot-segment-spi` into
  `pinot-common/.../partition/function/`, standardize their
  constructor on `(int numPartitions, Map<String,String> functionConfig)`,
  and annotate each.
- Add `PartitionIntNormalizer` enum (`POSITIVE_MODULO` / `ABS` / `MASK`)
  in pinot-segment-spi and a default `getPartitionIdNormalizer()` on
  `PartitionFunction` so impls can declare which normalizer matches
  their internal modulo semantics. Used by the framework only for
  identity / staleness matching between config-side and segment-side
  metadata; legacy impls still compute their own modulo. Javadoc spells
  out the descriptive-only nature of the value for legacy functions.
- Tests: existing `PartitionFunctionTest` (19 cases) moved to
  pinot-common; new `PartitionFunctionFactoryTest` covers
  registration completeness, alias resolution (`Murmur` /
  `Murmur2`), case-insensitive lookup, idempotent `init()`,
  unknown-name rejection, and per-impl normalizer label;
  new `PartitionIntNormalizerTest` covers per-normalizer math at
  edge cases (`Integer.MIN_VALUE`, `Integer.MAX_VALUE`),
  range invariant across all normalizers, and `fromConfigString`
  round-trip / blank / unknown.

Plug-in path going forward: drop a class on the classpath under
`*.partition.function.*`, implement `PartitionFunction` with the
standard ctor, add `@PartitionFunctionType(names = "MyFn")` - the
registry picks it up at startup.

NOTE - backward-incompat: the seven impl classes' fully-qualified
names changed (`org.apache.pinot.segment.spi.partition.*` ->
`org.apache.pinot.common.partition.function.*`) and the no-Map
constructors on `Modulo` / `HashCode` / `ByteArray` are gone in favor
of the standard `(int, Map<String,String>)` form. `getName()` strings
(`Modulo`, `Murmur`, ...) are unchanged, so segment-on-disk metadata
is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 added the backward-incompat Introduces a backward-incompatible API or behavior change label May 8, 2026
xiangfu0 and others added 4 commits May 8, 2026 14:01
…ctly

Removes the duplicated `NegativePartitionHandling` inner enum in
`FnvPartitionFunction` and the `_negativePartitionHandling` field in
favor of a single `_normalizer` of type `PartitionIntNormalizer`. The
config key (`negativePartitionHandling`) and accepted values stay
unchanged for `mask` / `abs`; `positive_modulo` is now also
acceptable (strict superset). The error message for unknown values
now comes from `PartitionIntNormalizer.fromConfigString`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The interface default now returns `PartitionIntNormalizer.POSITIVE_MODULO`
instead of null, and `@Nullable` is dropped from the method. Plug-ins
that don't map onto a standard normalizer (e.g. `BoundedColumnValue`,
which already produces ids in `[0, N)`) inherit the safe default
without needing an explicit override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the interface default so every PartitionFunction must explicitly
pick a normalizer. BoundedColumnValuePartitionFunction now overrides
with POSITIVE_MODULO (no-op label, since its output is already a fixed
mapping in [0, N)).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every built-in PartitionFunction now stores a `_normalizer` of type
PartitionIntNormalizer and applies it in `getPartition(...)` rather
than open-coding the modulo logic. Defaults match prior behavior:

| Impl              | Default normalizer |
|-------------------|--------------------|
| Modulo            | POSITIVE_MODULO    |
| Murmur / Murmur2  | MASK               |
| Murmur3           | MASK               |
| Fnv               | MASK               |
| HashCode          | KAFKA_ABS (new)    |
| ByteArray         | KAFKA_ABS (new)    |
| BoundedColumnValue| POSITIVE_MODULO    |

Adds `KAFKA_ABS` to PartitionIntNormalizer to cover the Kafka-style
`abs(hash) % N` (with `Integer.MIN_VALUE -> 0`) used by HashCode and
ByteArray. With KAFKA_ABS in place the normalizer is now an
authoritative driver of the partition-id computation, not just a
descriptive label - the interface Javadoc is updated accordingly.

Unifies the per-impl override path under a single config key
`partitionIdNormalizer` (case-insensitive), parsed via shared
`PartitionFunctionConfigs#normalizer`. Drops the FNV-specific
`negativePartitionHandling` key.

Tests:
- New `KAFKA_ABS` cases in PartitionIntNormalizerTest covering the
  MIN_VALUE corner.
- New `testPartitionIdNormalizerConfigOverridesDefaultAcrossImpls`
  in PartitionFunctionFactoryTest verifying the config rewires the
  computed partition for HashCode, Modulo, and ByteArray.
- FNV tests updated to use `partitionIdNormalizer` config key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 requested review from Jackie-Jiang and Copilot May 8, 2026 21:27
@Jackie-Jiang Jackie-Jiang added refactor Code restructuring without changing behavior plugins Related to the plugin system labels May 8, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the hard-coded PartitionFunctionFactory enum/switch with an annotation-driven registry discovered via reflection, moves the built-in partition functions into pinot-common, and introduces PartitionIntNormalizer plus a new PartitionFunction#getPartitionIdNormalizer() SPI to label/standardize partition-id normalization semantics.

Changes:

  • Add @PartitionFunctionType and update PartitionFunctionFactory to classpath-scan and instantiate partition functions by name/aliases.
  • Move built-in partition function implementations to org.apache.pinot.common.partition.function.* and standardize constructors to (int, Map<String,String>).
  • Add PartitionIntNormalizer and tests; wire eager factory init into broker/server/controller startup.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pinot-spi/src/main/java/org/apache/pinot/spi/annotations/PartitionFunctionType.java New annotation used to mark partition function implementations for discovery.
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunctionFactory.java Replaced switch/enum with reflection-based registry + init() hook.
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunction.java Adds getPartitionIdNormalizer() contract for partition-id normalization labeling.
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionIntNormalizer.java New enum implementing multiple normalization strategies (incl. Kafka-style ABS).
pinot-segment-spi/src/test/java/org/apache/pinot/segment/spi/partition/PartitionIntNormalizerTest.java Unit tests for PartitionIntNormalizer edge cases and parsing.
pinot-common/src/main/java/org/apache/pinot/common/partition/function/PartitionFunctionConfigs.java Shared helper for parsing partitionIdNormalizer from function config.
pinot-common/src/main/java/org/apache/pinot/common/partition/function/ModuloPartitionFunction.java Moved built-in impl + applies configured normalizer + annotation.
pinot-common/src/main/java/org/apache/pinot/common/partition/function/MurmurPartitionFunction.java Moved built-in impl + alias registration (Murmur/Murmur2) + normalizer support.
pinot-common/src/main/java/org/apache/pinot/common/partition/function/Murmur3PartitionFunction.java Moved built-in impl + normalizer support + annotation.
pinot-common/src/main/java/org/apache/pinot/common/partition/function/FnvPartitionFunction.java Moved built-in impl; replaces old negative-handling config with normalizer config.
pinot-common/src/main/java/org/apache/pinot/common/partition/function/HashCodePartitionFunction.java Moved built-in impl + normalizer support + annotation.
pinot-common/src/main/java/org/apache/pinot/common/partition/function/ByteArrayPartitionFunction.java Moved built-in impl + normalizer support + annotation.
pinot-common/src/main/java/org/apache/pinot/common/partition/function/BoundedColumnValuePartitionFunction.java Moved built-in impl + normalizer labeling support + annotation.
pinot-common/src/test/java/org/apache/pinot/common/partition/function/PartitionFunctionTest.java Updates existing tests to new package/constructors and config key rename.
pinot-common/src/test/java/org/apache/pinot/common/partition/function/PartitionFunctionFactoryTest.java New tests validating dynamic registry behavior (aliases, init, normalizer labels).
pinot-server/src/main/java/org/apache/pinot/server/starter/ServerInstance.java Eagerly initializes PartitionFunctionFactory during server startup.
pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java Eagerly initializes PartitionFunctionFactory during broker startup.
pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java Eagerly initializes PartitionFunctionFactory during controller startup.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/creator/SegmentPartitionTest.java Updates imports due to moved implementations.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/ColumnMetadataTest.java Updates imports due to moved implementations.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImplDropRecordOnPartitionMismatchTest.java Updates imports and constructor signature for moved ModuloPartitionFunction.
pinot-controller/src/test/java/org/apache/pinot/controller/utils/SegmentMetadataMockUtils.java Updates imports and constructor signature for moved MurmurPartitionFunction.

Switches all new Javadoc blocks added by this PR from `/** */` block
syntax to `///` markdown syntax (JEP 467), matching the convention
established in apache#18165 and the rest of the recently-touched
pinot-segment-spi files. Replaces `<p>`, `<ul>/<li>`, `<code>`,
`{@code X}`, `{@link X}` HTML/Javadoc tags with their markdown
equivalents (paragraph breaks, `-` lists, backticks, `[X]` refs).
License headers remain as `/** */` block comments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Rename `PartitionIntNormalizer` -> `PartitionIdNormalizer` (Jackie #3).
- Rename `KAFKA_ABS` -> `PRE_MODULO_ABS` to make the pre-vs-post-modulo
  distinction with `ABS` explicit (Jackie #4).
- Change `getPartitionIdNormalizer()` to return `PartitionIdNormalizer`
  enum directly instead of `String`; drop `@JsonIgnore` so the field is
  visible in serialized form (Jackie #2). Method stays abstract -- every
  PartitionFunction implementation declares its own normalizer.
- Drop the redundant `_normalizer` field from
  `BoundedColumnValuePartitionFunction`; it now returns
  `PartitionIdNormalizer.POSITIVE_MODULO` directly since its output is
  already a fixed mapping in `[0, numPartitions)` (Jackie #6).
- Make `@PartitionFunctionType` annotation optional. The factory now
  scans every public, concrete `PartitionFunction` subtype under the
  `org.apache.pinot.*` package tree. When a class lacks the annotation
  (or `names()` is empty), the registry probes
  `PartitionFunction.getName()` by instantiating with `(1, null)` and
  registers under the returned name (Jackie #5). Annotation is now a
  pure aliasing / overriding mechanism.
- Fix Javadoc on `@PartitionFunctionType` and `PartitionFunctionFactory`
  to drop the incorrect "any plugin package" claim and the unimplemented
  "stripping underscores" note (Copilot #1, #2, #4).
- Add `UnannotatedTestPartitionFunction` fixture + factory test that
  exercises the no-annotation `getName()` fallback path.
- Update `PartitionFunctionTest` to expect 4 JSON fields (the new
  `partitionIdNormalizer` field is no longer hidden).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pinot-common/src/main/java/org/apache/pinot/common/partition/function/BoundedColumnValuePartitionFunction.java:101

  • BoundedColumnValuePartitionFunction always reports PartitionIdNormalizer.POSITIVE_MODULO and does not parse/apply the shared partitionIdNormalizer functionConfig key that the other built-in partition functions honor via PartitionFunctionConfigs.normalizer(...). This means a user-provided partitionIdNormalizer value will be silently ignored and will not round-trip via getPartitionIdNormalizer() / JSON serialization, which can also interfere with metadata identity/staleness checks that rely on the reported normalizer. Consider reading the config key (even if it’s a no-op for the computation) and returning the configured normalizer label for consistency across built-ins.

…ame()

Validates each entry in `@PartitionFunctionType.names()` at registry
build time:

- Each entry is trimmed of surrounding whitespace.
- Blank entries are dropped silently.
- When ALL declared entries are blank (or the array is empty after
  filtering), the registry logs a warning and falls back to probing
  `PartitionFunction.getName()`, the same path used for unannotated
  classes.

This prevents a misconfigured annotation (e.g. `names = {"  "}`) from
silently registering a function under an empty canonical name, making
it undiscoverable at lookup time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 changed the title Make PartitionFunction registry dynamic; move impls to pinot-common Dynamic PartitionFunction registry; unify int-to-id mapping via PartitionIdNormalizer May 8, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 8, 2026

Codecov Report

❌ Patch coverage is 76.42857% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.64%. Comparing base (4863cdc) to head (1b3e70e).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
...egment/spi/partition/PartitionFunctionFactory.java 55.55% 18 Missing and 6 partials ⚠️
.../function/BoundedColumnValuePartitionFunction.java 77.77% 0 Missing and 2 partials ⚠️
...t/segment/spi/partition/PartitionIdNormalizer.java 93.33% 0 Missing and 2 partials ⚠️
...e/pinot/broker/broker/helix/BaseBrokerStarter.java 0.00% 1 Missing ⚠️
...partition/function/ByteArrayPartitionFunction.java 83.33% 0 Missing and 1 partial ⚠️
.../partition/function/HashCodePartitionFunction.java 83.33% 0 Missing and 1 partial ⚠️
...on/partition/function/ModuloPartitionFunction.java 83.33% 0 Missing and 1 partial ⚠️
...rg/apache/pinot/server/starter/ServerInstance.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18446      +/-   ##
============================================
- Coverage     63.65%   63.64%   -0.01%     
+ Complexity     1735     1684      -51     
============================================
  Files          3254     3256       +2     
  Lines        199446   199519      +73     
  Branches      30977    30989      +12     
============================================
+ Hits         126953   126992      +39     
- Misses        62356    62388      +32     
- Partials      10137    10139       +2     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 63.64% <76.42%> (-0.01%) ⬇️
temurin 63.64% <76.42%> (-0.01%) ⬇️
unittests 63.64% <76.42%> (-0.01%) ⬇️
unittests1 55.71% <77.37%> (+0.01%) ⬆️
unittests2 34.98% <23.57%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

xiangfu0 and others added 2 commits May 8, 2026 16:40
…) default

The annotation added no value beyond what `PartitionFunction.getName()`
already returned for every built-in. Remove it entirely and let the
interface itself declare the registry contract:

- New default `List<String> getNames()` on `PartitionFunction` returns
  `[getName()]`. The factory's static scan instantiates each subtype
  with `(1, null)` and registers under whatever `getNames()` returns.
- Only `MurmurPartitionFunction` overrides `getNames()` (returns
  `["Murmur", "Murmur2"]` so both aliases resolve to the same impl);
  the other six built-ins use the default.
- `getNames()` is `@JsonIgnore`'d so it doesn't pollute the
  serialized form (`testBasicProperties` JSON-shape assertion stays
  at four fields).
- `BoundedColumnValuePartitionFunction`'s ctor now tolerates `null`
  config (probe path) and defers validation; real-config use still
  throws as before. `getPartition()` rejects probe-built instances.
- `@PartitionFunctionType` annotation file deleted.
- `UnannotatedTestPartitionFunction` fixture removed (no longer
  meaningful now that every class is "unannotated").
- Added two new factory tests covering the default `getNames()`
  behavior and Murmur's override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`BoundedColumnValuePartitionFunction` produces a fixed mapping in
`[0, numPartitions)` by construction; the previous label was
`POSITIVE_MODULO` (a no-op for in-range values, but semantically
overloaded). Add an explicit `NO_OP` value that is the identity on
its inputs and use it for `BoundedColumnValue`. The framework does
not validate that callers actually pass in-range values to NO_OP —
out-of-range inputs yield out-of-range partition ids, by design.

`PartitionIdNormalizerTest#testRangeFoldingNormalizersReturnInRange`
(was `testAllNormalizersReturnInRange`) now skips NO_OP since the
range invariant doesn't apply. New `testNoOpIsIdentity` locks the
identity-with-narrowing semantics including the explicit
out-of-range pass-through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 merged commit a3d7b2a into apache:master May 9, 2026
10 of 11 checks passed
@xiangfu0 xiangfu0 deleted the worktree-refactor-partition-functions branch May 9, 2026 02:22
@xiangfu0
Copy link
Copy Markdown
Contributor Author

xiangfu0 commented May 9, 2026

Docs follow-up: pinot-contrib/pinot-docs#805 documents the updated partition function registry, built-in normalizer defaults, Murmur2 aliasing, and BoundedColumnValue guidance. The docs PR has already merged into latest: pinot-contrib/pinot-docs#805

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backward-incompat Introduces a backward-incompatible API or behavior change plugins Related to the plugin system refactor Code restructuring without changing behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants