
fix: support all packed value lengths in NumericFieldStats.decodeLong #15817

Merged
romseygeek merged 3 commits into apache:main from salvatorecampagna:fix/numeric-field-stats-short-bytes
Mar 13, 2026

@salvatorecampagna (Contributor) commented Mar 12, 2026

Problem

NumericFieldStats.decodeLong (introduced in #15760) only handled 4-byte (IntField) and 8-byte (LongField) packed point values via a switch on packed.length. Any other width threw an IllegalArgumentException, so any query that reached this code on a point field with a different byte width failed.

HalfFloatPoint produces 2-byte packed values. A range query on such a field triggered the exception. The original PR only tested IntField (4 bytes) and LongField (8 bytes), so CI did not catch the bug:

java.lang.IllegalArgumentException: Unsupported packed value length: 2 (expected 8 or 4)
    at org.apache.lucene.search.NumericFieldStats.decodeLong(NumericFieldStats.java:121)
    at org.apache.lucene.search.NumericFieldStats.getStatsFromPoints(NumericFieldStats.java:72)
    at org.apache.lucene.search.NumericFieldStats.getStats(NumericFieldStats.java:58)
    ...

Solution

Replace the switch-based decoder with a generic loop that handles any packed value length from 1 to 8 bytes. All Lucene point types use the same encoding: big-endian byte order with the sign bit flipped. The loop reads the bytes sequentially, re-flips the sign bit on the first byte, and sign-extends the result into a long. This is allocation-free, unlike NumericUtils.sortableBytesToBigInt, which copies the array and creates a BigInteger.
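The decoding loop described above can be sketched in Java roughly as follows. This is a minimal sketch, not the merged Lucene code: the class name PackedLongDecoder and the length guard are hypothetical, and it assumes the input is 1 to 8 bytes, big-endian, with the sign bit of the first byte flipped.

```java
/**
 * Hypothetical, allocation-free decoder for Lucene-style packed point values.
 * Assumes 1..8 bytes, big-endian, with the sign bit of the first byte flipped.
 */
final class PackedLongDecoder {
  private PackedLongDecoder() {}

  static long decodeLong(byte[] packed) {
    int len = packed.length;
    if (len < 1 || len > Long.BYTES) {
      // Hypothetical guard: callers are expected to filter out wider point types first.
      throw new IllegalArgumentException("packed length must be 1.." + Long.BYTES + ", got " + len);
    }
    // Flip the sign bit of the most significant byte back to two's complement.
    long value = (packed[0] ^ 0x80) & 0xFF;
    // Append the remaining bytes in big-endian order.
    for (int i = 1; i < len; i++) {
      value = (value << 8) | (packed[i] & 0xFF);
    }
    // Sign-extend from len * 8 bits to 64 bits; a no-op when len == 8.
    int shift = Long.SIZE - len * Byte.SIZE;
    return (value << shift) >> shift;
  }
}
```

For example, the 2-byte value {0x7F, 0xFF} becomes 0xFFFF after the sign bit is flipped back, and sign extension turns that into -1. Shifting left and then arithmetic-shifting right performs the sign extension without branching on the width.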

For point fields wider than 8 bytes (e.g. InetAddressPoint at 16 bytes, BigIntegerPoint at 16 bytes), getStatsFromPoints now returns null instead of throwing, allowing getStats to fall through to the DocValuesSkipper path. These wider point types are never used with SortedNumericDocValuesRangeQuery in practice, but the graceful fallback avoids unexpected failures.
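The fallback described above can be illustrated with a self-contained sketch. It does not use Lucene's real types: MinMax, PointSummary, StatsFallbackSketch, and the Supplier that stands in for the DocValuesSkipper path are all hypothetical. It only shows the control flow: a width check returns null, and the caller then tries the next source.

```java
import java.util.function.Supplier;

/** Hypothetical min/max holder standing in for the real stats type. */
record MinMax(long min, long max) {}

/** Hypothetical view of a 1-D point field: bytes per dimension plus packed min/max values. */
record PointSummary(int bytesPerDim, byte[] minPacked, byte[] maxPacked) {}

final class StatsFallbackSketch {
  private StatsFallbackSketch() {}

  /** Returns null when there are no points or they are too wide for a long (e.g. 16-byte InetAddressPoint). */
  static MinMax statsFromPoints(PointSummary points) {
    if (points == null || points.bytesPerDim() > Long.BYTES) {
      return null;
    }
    return new MinMax(decodeLong(points.minPacked()), decodeLong(points.maxPacked()));
  }

  /** Tries points first; on null, falls through to the hypothetical skipper-based supplier. */
  static MinMax getStats(PointSummary points, Supplier<MinMax> skipperFallback) {
    MinMax fromPoints = statsFromPoints(points);
    return fromPoints != null ? fromPoints : skipperFallback.get();
  }

  /** Big-endian, sign-bit-flipped decoding for 1..8 bytes. */
  static long decodeLong(byte[] packed) {
    long value = (packed[0] ^ 0x80) & 0xFF;
    for (int i = 1; i < packed.length; i++) {
      value = (value << 8) | (packed[i] & 0xFF);
    }
    int shift = Long.SIZE - packed.length * Byte.SIZE;
    return (value << shift) >> shift;
  }
}
```

Returning null instead of throwing keeps the width check local to the points path, so callers do not need to know which point types are too wide to decode.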

Tests

  • TestNumericFieldStats.testGetStatsWithAllByteWidths: exercises decodeLong with min, zero, and max values at every byte width from 1 to 8
  • TestNumericFieldStats.testGetStatsReturnsNullForWidePointValues: verifies graceful null return for InetAddressPoint (16 bytes)
  • TestHalfFloatPoint.testNumericFieldStats: integration test with real HalfFloatPoint (2 bytes)
./gradlew -p lucene/core test --tests "org.apache.lucene.search.TestNumericFieldStats"
./gradlew -p lucene/sandbox test --tests "org.apache.lucene.sandbox.document.TestHalfFloatPoint.testNumericFieldStats"
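A plain-Java approximation of what testGetStatsWithAllByteWidths checks is shown below. It is a sketch, not the actual Lucene test: PackedWidthCheck and its helpers are hypothetical, and encode assumes the big-endian, sign-bit-flipped convention described above. At every width from 1 to 8 it round-trips min, zero, and max through the loop decoder and through an allocating BigInteger reference, and it confirms that unsigned byte order matches numeric order.

```java
import java.math.BigInteger;
import java.util.Arrays;

final class PackedWidthCheck {
  private PackedWidthCheck() {}

  /** Hypothetical encoder: len big-endian bytes with the sign bit of the first byte flipped. */
  static byte[] encode(long value, int len) {
    byte[] out = new byte[len];
    for (int i = len - 1; i >= 0; i--) {
      out[i] = (byte) value;
      value >>= 8;
    }
    out[0] ^= (byte) 0x80;
    return out;
  }

  /** Allocation-free loop decoder. */
  static long decodeLong(byte[] packed) {
    long value = (packed[0] ^ 0x80) & 0xFF;
    for (int i = 1; i < packed.length; i++) {
      value = (value << 8) | (packed[i] & 0xFF);
    }
    int shift = Long.SIZE - packed.length * Byte.SIZE;
    return (value << shift) >> shift;
  }

  /** Allocating reference: flip the sign bit back on a copy and let BigInteger sign-extend. */
  static long decodeWithBigInteger(byte[] packed) {
    byte[] copy = packed.clone();
    copy[0] ^= (byte) 0x80;
    return new BigInteger(copy).longValueExact();
  }

  /** Checks min, zero, and max at every width from 1 to 8; returns how many values were checked. */
  static int checkAllWidths() {
    int checked = 0;
    for (int len = 1; len <= Long.BYTES; len++) {
      long min = Long.MIN_VALUE >> (Long.SIZE - len * Byte.SIZE);
      long max = ~min;
      byte[] previous = null;
      for (long v : new long[] {min, 0L, max}) {
        byte[] packed = encode(v, len);
        if (decodeLong(packed) != v || decodeWithBigInteger(packed) != v) {
          throw new AssertionError("round trip failed for " + v + " at width " + len);
        }
        if (previous != null && Arrays.compareUnsigned(previous, packed) >= 0) {
          throw new AssertionError("byte order does not match numeric order at width " + len);
        }
        previous = packed;
        checked++;
      }
    }
    return checked;
  }
}
```

Calling PackedWidthCheck.checkAllWidths() checks three values at each of the eight widths and returns 24 when every check passes.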

Follows up on #15760.

The previous implementation only handled Integer.BYTES (4) and Long.BYTES (8),
throwing IllegalArgumentException for other lengths. This broke fields using
2-byte point values such as HalfFloatPoint.

Replace the switch with a generic big-endian decoder that handles any length
from 1 to 8 bytes. For point fields wider than Long.BYTES (e.g. InetAddressPoint),
getStatsFromPoints returns null to fall through to the DocValuesSkipper path.
@salvatorecampagna salvatorecampagna marked this pull request as ready for review March 12, 2026 15:29
@github-actions github-actions bot added this to the 10.5.0 milestone Mar 12, 2026
@romseygeek (Contributor) left a comment


LGTM, thanks for updating @salvatore-campagna

@romseygeek romseygeek merged commit 64af7b8 into apache:main Mar 13, 2026
13 checks passed
romseygeek pushed a commit that referenced this pull request Mar 13, 2026
