Skip to content

[VL] Tighten cache-stats wire-edge bound checks#12224

Merged
yaooqinn merged 1 commit into
apache:mainfrom
yaooqinn:kentyao/fu-d5-wire-edge-bound-checks
Jun 3, 2026
Merged

[VL] Tighten cache-stats wire-edge bound checks#12224
yaooqinn merged 1 commit into
apache:mainfrom
yaooqinn:kentyao/fu-d5-wire-edge-bound-checks

Conversation

@yaooqinn
Copy link
Copy Markdown
Member

@yaooqinn yaooqinn commented Jun 2, 2026

What changes are proposed in this pull request?

Two defensive bound checks at the wire-edges of the cache-stats
serialization path:

  1. cpp JNI serializeWithStats: GLUTEN_CHECK that the framed
    payload fits in jsize (signed int32, ~2 GiB) before
    NewByteArray. A payload in the (2 GiB, 4 GiB] window (allowed by
    the inner bytesLen <= UINT32_MAX check) would currently wrap to
    a negative jsize and surface as NegativeArraySizeException
    with no actionable diagnostic. Fail fast as GlutenException with
    byte count + limit.

  2. JVM deserializeStats: when schema != null, require
    numCols <= schema.length. The > direction would IOB on
    schema(col) during type dispatch (a real corrupt-frame signal).
    The < direction is intentionally allowed -- existing fixtures
    like ColumnarCachedBatchIntFamilyMarshalSuite pass an EXPANDED
    5-field-per-source-col schema where schema.length == numCols * 5
    and only the first numCols entries drive dispatch. The new
    guard catches corrupt frames at the wire boundary instead of
    letting an undersized row propagate into a downstream
    ArrayIndexOutOfBoundsException. The pre-existing
    0 <= numCols <= Int.MaxValue/5 bound is preserved.

Both fixes are defense-in-depth -- no production caller triggers
either path today. Recovery on the JVM side is via the existing
NonFatal corrupt-frame catch in ColumnarCachedBatchSerializer.serialize
(no change), which falls back to legacy serialize() for the batch.

How was this patch tested?

  • ColumnarCachedBatchFramedBytesSuite: 3 new JVM tests covering
    numCols > schema.length rejection, numCols < schema.length
    loose-schema sentinel, and the schema=null V1-wire backcompat
    sentinel.
  • ColumnarCachedBatch*Suite family (9 suites, 69 tests): all
    pass against a fresh local libgluten.so / libvelox.so built
    from this branch. Local preflight (cpp + JVM compile + format +
    lint) clean.
  • No cpp gtest for the jsize guard: exercising the bound would
    require materializing a >2 GiB framed payload, which risks OOM on
    CI runners with 8-16 GiB total. The guard itself is a one-liner
    immediately before NewByteArray.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

Two defensive-bound checks at the cpp JNI / JVM deserializer edges of
the cache-stats wire path:

1. cpp JNI Java_..._serializeWithStats: GLUTEN_CHECK that the framed
   payload size fits in jsize (signed int32 / 2 GiB) before
   NewByteArray. Prevents NegativeArraySizeException with no
   actionable diagnostic when a >2 GiB framed payload is produced.

2. JVM CachedColumnarBatchKryoSerializer.deserializeStats: require
   numCols <= schema.length when schema is non-null. Catches the
   IOB-bound case (numCols > schema.length would index past
   schema(col) during type dispatch) with a clear cpp/JVM
   wire-mismatch IAE. The opposite direction (numCols < schema.length)
   is intentionally allowed — the public API is loose enough that
   callers (incl. existing unit-test fixtures) pass an EXPANDED
   5-field-per-source-col schema where schema.length == numCols * 5.

Test coverage:
- 1 RED test for the JVM numCols > schema.length corrupt-frame case.
- 1 sentinel for the loose-schema numCols < schema.length pattern
  used by ColumnarCachedBatchIntFamilyMarshalSuite et al.
- 1 sentinel for the schema=null V1-wire backcompat path.
- cpp side has no gtest (would require a >2 GiB allocation on CI
  runners; the bound itself is a one-liner immediately before the
  NewByteArray call).
@github-actions github-actions Bot added the VELOX label Jun 2, 2026
@yaooqinn yaooqinn merged commit 0dabb55 into apache:main Jun 3, 2026
61 checks passed
@yaooqinn yaooqinn deleted the kentyao/fu-d5-wire-edge-bound-checks branch June 3, 2026 04:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants