Api: Harden Variant Reading #16335

Draft
steveloughran wants to merge 1 commit into apache:main from steveloughran:pr/variant-hardening

@steveloughran
Contributor

Hardens variant metadata and value data parsing

  • long multiplication to avoid overflows
  • reject metadata, array data or object data which extends beyond the range of the variant data in the file.
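As a rough illustration of why the multiplication is widened to `long`, the sketch below sizes an offset table of `numElements + 1` entries in both `int` and `long` arithmetic and then applies a bounds check. The class and method names are hypothetical and are not taken from the PR; the exact conditions and messages used in `SerializedArray` may differ.

```java
public class OffsetTableOverflowSketch {
  // Hypothetical helper: bytes needed for an offset table with numElements + 1 entries.
  // The cast happens before the addition and multiplication, so nothing can wrap.
  static long offsetTableBytes(int numElements, int offsetSize) {
    return ((long) numElements + 1L) * offsetSize;
  }

  // The same computation in int arithmetic, kept only to show the wraparound.
  static int offsetTableBytesUnsafe(int numElements, int offsetSize) {
    return (numElements + 1) * offsetSize;
  }

  // Hypothetical check: reject counts that are negative or whose offset table
  // would extend past the bytes that are actually available.
  static void checkOffsetTable(int numElements, int offsetSize, int available) {
    if (numElements < 0) {
      throw new IllegalArgumentException("Invalid element count: " + numElements);
    }
    long needed = offsetTableBytes(numElements, offsetSize);
    if (needed > available) {
      throw new IllegalArgumentException(
          "Offset table extends past buffer: needed=" + needed + " available=" + available);
    }
  }

  public static void main(String[] args) {
    int numElements = Integer.MAX_VALUE; // a count read from corrupt or hostile data
    int offsetSize = 4;
    System.out.println(offsetTableBytesUnsafe(numElements, offsetSize)); // prints 0
    System.out.println(offsetTableBytes(numElements, offsetSize)); // prints 8589934592
  }
}
```

With `int` arithmetic the wrapped result of 0 would pass any "fits in the buffer" comparison, which is the kind of case the `long` computation is meant to catch.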

No checks on variant size (Spark has this) or on the depth limits of apache/parquet-java#3562. I think whether that is appropriate at all should be discussed there before copying it.

No tests yet

Fixes #16334

With help from Claude

Fixes apache#16334

Change-Id: I0fc34bfcb62ccab1d4111df5ca5db9e22ce5dfcc
@steveloughran steveloughran marked this pull request as draft May 14, 2026 14:17
@github-actions github-actions Bot added the API label May 14, 2026
@steveloughran steveloughran changed the title Harden Variant Reading Api: Harden Variant Reading May 14, 2026
@steveloughran
Contributor Author

Test failures:

  1. Regression: a different exception is now thrown (fix: change the assertion for negative values).
  2. The others all look like test suite setup issues, since you can no longer mimic a large array without actually creating a large array, etc.

TestSerializedArray > testEmptyArray() FAILED
    java.lang.IllegalArgumentException: Variant array offset table extends past buffer: numElements=0 offsetSize=1 remaining=2
        at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:484)
        at org.apache.iceberg.variants.SerializedArray.<init>(SerializedArray.java:70)
        at org.apache.iceberg.variants.SerializedArray.from(SerializedArray.java:43)
        at org.apache.iceberg.variants.SerializedArray.from(SerializedArray.java:34)
        at org.apache.iceberg.variants.TestSerializedArray.testEmptyArray(TestSerializedArray.java:58)

TestSerializedArray > testEmptyLargeArray() STARTED

TestSerializedArray > testEmptyLargeArray() FAILED
    java.lang.IllegalArgumentException: Variant array offset table extends past buffer: numElements=0 offsetSize=1 remaining=5
        at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:484)
        at org.apache.iceberg.variants.SerializedArray.<init>(SerializedArray.java:70)
        at org.apache.iceberg.variants.SerializedArray.from(SerializedArray.java:43)
        at org.apache.iceberg.variants.SerializedArray.from(SerializedArray.java:34)
        at org.apache.iceberg.variants.TestSerializedArray.testEmptyLargeArray(TestSerializedArray.java:67)

TestSerializedArray > testArrayOfMixedTypes() STARTED

TestSerializedArray > testArrayOfMixedTypes() PASSED

TestSerializedArray > testLargeArraySize() STARTED

TestSerializedArray > testLargeArraySize() FAILED
    java.lang.IllegalArgumentException: Variant array offset table extends past buffer: numElements=511 offsetSize=1 remaining=5
        at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:484)
        at org.apache.iceberg.variants.SerializedArray.<init>(SerializedArray.java:70)
        at org.apache.iceberg.variants.SerializedArray.from(SerializedArray.java:43)
        at org.apache.iceberg.variants.SerializedArray.from(SerializedArray.java:34)
        at org.apache.iceberg.variants.TestSerializedArray.testLargeArraySize(TestSerializedArray.java:183)

TestSerializedArray > testNegativeArraySize() STARTED

TestSerializedArray > testNegativeArraySize() FAILED
    java.lang.AssertionError: 
    Expecting actual throwable to be an instance of:
      java.lang.NegativeArraySizeException
    but was:
      java.lang.IllegalArgumentException: Invalid variant array element count: -1
    	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:192)
    	at org.apache.iceberg.variants.SerializedArray.<init>(SerializedArray.java:65)
    	at org.apache.iceberg.variants.SerializedArray.from(SerializedArray.java:43)
    	...(98 remaining lines not displayed - this can be changed with Assertions.setMaxStackTraceElementsDisplayed)
        at org.apache.iceberg.variants.TestSerializedArray.testNegativeArraySize(TestSerializedArray.java:197)
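One possible reading of the three "extends past buffer" failures, sketched under assumptions: if an array value is laid out as a 1-byte header, an element count of 1 byte (or 4 bytes for large arrays), and then `numElements + 1` offsets, the test buffers would be too short once the offset table is checked. That layout is my understanding of the Variant encoding, and treating `remaining` in the log as the total buffer length is an inference; the helper below is hypothetical and not the PR's code.

```java
public class ArrayHeaderSizeSketch {
  // Hypothetical helper: minimum bytes for an array value before any element data,
  // assuming header (1 byte) + element count (1 or 4 bytes) + (numElements + 1) offsets.
  static long minArrayBytes(boolean isLarge, int numElements, int offsetSize) {
    int countBytes = isLarge ? 4 : 1;
    return 1L + countBytes + ((long) numElements + 1L) * offsetSize;
  }

  public static void main(String[] args) {
    // testEmptyArray: numElements=0 offsetSize=1 remaining=2
    System.out.println(minArrayBytes(false, 0, 1)); // prints 3
    // testEmptyLargeArray: numElements=0 offsetSize=1 remaining=5
    System.out.println(minArrayBytes(true, 0, 1)); // prints 6
    // testLargeArraySize: numElements=511 offsetSize=1 remaining=5
    System.out.println(minArrayBytes(true, 511, 1)); // prints 517
  }
}
```

If that reading holds, each test buffer would need at least the printed number of bytes to pass the new check, which fits the note that a large array can no longer be mimicked without actually creating one.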
