INT96 dictionary-packed decode writes to the wrong byte offset #16459

@rdblue

This issue was reported to the private Apache Iceberg security mailing list. The submitter is being kept anonymous because the report was sent to a private list. After review, the issue is not considered a serious vulnerability that needs to be kept private, so it is being filed publicly here for tracking and resolution.

Note: this submission was generated by AI. Please review its claims and source references carefully before acting on them.

Summary

The packed INT96 dictionary decode path passes a row index to the
buffer write where a byte offset is expected. Values land at the wrong
position, which corrupts the output and, when vectors are reused, can
leave stale bytes from prior batches visible.
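
The difference between the two write positions can be sketched in isolation. This is a minimal illustration, not the Iceberg source: a plain `java.nio.ByteBuffer` stands in for the vector's data buffer, the class and method names are hypothetical, and the 8-byte value width is an assumption about how the decoded timestamps are stored.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative only: a java.nio.ByteBuffer stands in for the vector's data buffer.
final class Int96OffsetSketch {
  // Assumption: each decoded timestamp occupies 8 bytes in the output buffer.
  static final int TYPE_WIDTH = 8;

  // Faulty pattern: the row index is used directly as the write position.
  static long[] writeAtRowIndex(long[] values) {
    ByteBuffer buf = newBuffer(values.length);
    for (int row = 0; row < values.length; row++) {
      buf.putLong(row, values[row]);
    }
    return readSlots(buf, values.length);
  }

  // Intended pattern: the row index is scaled by the value width first.
  static long[] writeAtByteOffset(long[] values) {
    ByteBuffer buf = newBuffer(values.length);
    for (int row = 0; row < values.length; row++) {
      buf.putLong(row * TYPE_WIDTH, values[row]);
    }
    return readSlots(buf, values.length);
  }

  static ByteBuffer newBuffer(int rows) {
    return ByteBuffer.allocate(rows * TYPE_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
  }

  // Reads the buffer back the way a consumer would: one value per 8-byte slot.
  static long[] readSlots(ByteBuffer buf, int rows) {
    long[] out = new long[rows];
    for (int row = 0; row < rows; row++) {
      out[row] = buf.getLong(row * TYPE_WIDTH);
    }
    return out;
  }

  public static void main(String[] args) {
    long[] values = {0x1111111111111111L, 0x2222222222222222L, 0x3333333333333333L};
    System.out.println("row index:   " + Arrays.toString(writeAtRowIndex(values)));
    System.out.println("byte offset: " + Arrays.toString(writeAtByteOffset(values)));
  }
}
```

With row-index writes, consecutive values overlap each other by seven bytes, so the early slots hold a mix of several values and the later slots are never written at all.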

Affected Maven coordinates

  • primary shipped client artifact: org.apache.iceberg:iceberg-arrow

Attacker prerequisites

  • a workload that hits the affected vectorized decode path
  • vector reuse or multi-batch execution, so that the output buffer still
    holds bytes from an earlier batch

Impact

  • Returned timestamp values can be corrupted.
  • When vector reuse is enabled, rows that were not fully overwritten
    can retain bytes from a prior batch.
  • That creates a narrow stale-data exposure path in addition to the
    integrity bug.
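
The stale-data effect described above can be sketched the same way. Again this is a hedged illustration rather than the Iceberg code path: `StaleBytesSketch` and its methods are hypothetical names, a `java.nio.ByteBuffer` stands in for a reused vector buffer, and the 8-byte value width is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: models one buffer that is reused across two batches.
final class StaleBytesSketch {
  // Assumption: each decoded timestamp occupies 8 bytes in the output buffer.
  static final int TYPE_WIDTH = 8;

  // Writes a batch; scaleIndex=false reproduces the faulty row-index write.
  static void writeBatch(ByteBuffer reused, long[] batch, boolean scaleIndex) {
    for (int row = 0; row < batch.length; row++) {
      int position = scaleIndex ? row * TYPE_WIDTH : row;
      reused.putLong(position, batch[row]);
    }
  }

  // Returns what a consumer reads from the last slot after the second batch.
  // Both batches are expected to have the same length.
  static long lastSlotAfterReuse(long[] first, long[] second, boolean scaleIndex) {
    ByteBuffer reused =
        ByteBuffer.allocate(first.length * TYPE_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
    writeBatch(reused, first, true);        // earlier batch, written correctly
    writeBatch(reused, second, scaleIndex); // later batch into the same buffer
    return reused.getLong((second.length - 1) * TYPE_WIDTH);
  }

  public static void main(String[] args) {
    long[] first = {101L, 102L, 103L};
    long[] second = {201L, 202L, 203L};
    System.out.println("row index:   " + lastSlotAfterReuse(first, second, false));
    System.out.println("byte offset: " + lastSlotAfterReuse(first, second, true));
  }
}
```

In this sketch the row-index writes for a three-row batch only touch the first ten bytes of the buffer, so the last slot still returns the value left there by the earlier batch.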

Proof status

Source review only. The issue is visible directly from source.

Key source references

  • org.apache.iceberg.arrow.vectorized.parquet.VectorizedParquetDefinitionLevelReader
  • org.apache.iceberg.arrow.vectorized.parquet.VectorizedDictionaryEncodedParquetValuesReader
