This issue was reported to the private Apache Iceberg security mailing list. The submitter is being kept anonymous because the report was sent to a private list. After review, the issue is not considered a serious vulnerability that needs to be kept private, so it is being filed publicly here for tracking and resolution.
Note: this submission was generated by AI. Please review its claims and source references carefully before acting on them.
Summary
The packed INT96 dictionary decode path writes each decoded value at
its row index instead of at the corresponding byte offset, corrupting
output and potentially surfacing stale bytes from prior batches.
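The class of bug described above can be illustrated with a minimal, self-contained sketch. It does not use the Iceberg or Arrow APIs. The class `OffsetSketch`, its helper methods, and the 8-byte value width are all hypothetical and chosen only for illustration. The sketch contrasts a write that uses the row index as a byte position with one that multiplies the row index by the value width.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the Iceberg source.
final class OffsetSketch {
  // Assumed width in bytes of one decoded value in the output buffer.
  static final int TYPE_WIDTH = 8;

  static ByteBuffer newBuffer(int rows) {
    return ByteBuffer.allocate(rows * TYPE_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
  }

  // Buggy pattern: the row index is passed where a byte offset is expected,
  // so consecutive rows overlap by 7 bytes.
  static void writeAtRowIndex(ByteBuffer buf, int rowIdx, long value) {
    buf.putLong(rowIdx, value);
  }

  // Intended pattern: scale the row index by the value width.
  static void writeAtByteOffset(ByteBuffer buf, int rowIdx, long value) {
    buf.putLong(rowIdx * TYPE_WIDTH, value);
  }

  // Readers always address rows by byte offset.
  static long readRow(ByteBuffer buf, int rowIdx) {
    return buf.getLong(rowIdx * TYPE_WIDTH);
  }

  public static void main(String[] args) {
    long[] values = {100L, 200L, 300L};
    ByteBuffer buggy = newBuffer(values.length);
    ByteBuffer fixed = newBuffer(values.length);
    for (int i = 0; i < values.length; i++) {
      writeAtRowIndex(buggy, i, values[i]);
      writeAtByteOffset(fixed, i, values[i]);
    }
    for (int i = 0; i < values.length; i++) {
      System.out.println("row " + i + ": expected=" + values[i]
          + " buggy=" + readRow(buggy, i) + " fixed=" + readRow(fixed, i));
    }
  }
}
```

In this sketch, the buggy writes all land within the first few bytes of the buffer, so only the byte-offset variant returns the original values when rows are read back.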
Affected Maven coordinates
- primary shipped client artifact: org.apache.iceberg:iceberg-arrow
Attacker prerequisites
- a workload that hits the affected vectorized decode path
- vector reuse or multi-batch execution so the stale-byte behavior matters
Impact
- Returned timestamp values can be corrupted.
- When vector reuse is enabled, rows that were not fully overwritten
can retain bytes from a prior batch.
- That creates a narrow stale-data exposure path in addition to the
integrity bug.
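The stale-data effect listed above can be sketched in isolation. The example below is hypothetical and does not use the Iceberg or Arrow APIs. It assumes an 8-byte value width and assumes that a reused buffer is not cleared between batches. Under those assumptions, a second batch written at row indexes touches only the first few bytes, so later rows still read back values from the prior batch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the Iceberg source.
final class StaleReuseSketch {
  // Assumed width in bytes of one decoded value in the output buffer.
  static final int TYPE_WIDTH = 8;

  static ByteBuffer newBuffer(int rows) {
    return ByteBuffer.allocate(rows * TYPE_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
  }

  // Correct fill: each row is written at rowIdx * TYPE_WIDTH.
  static void fillAtByteOffsets(ByteBuffer buf, long[] values) {
    for (int i = 0; i < values.length; i++) {
      buf.putLong(i * TYPE_WIDTH, values[i]);
    }
  }

  // Buggy fill: each row is written at byte position rowIdx.
  static void fillAtRowIndexes(ByteBuffer buf, long[] values) {
    for (int i = 0; i < values.length; i++) {
      buf.putLong(i, values[i]);
    }
  }

  static long readRow(ByteBuffer buf, int rowIdx) {
    return buf.getLong(rowIdx * TYPE_WIDTH);
  }

  public static void main(String[] args) {
    long[] priorBatch = {1111L, 2222L, 3333L, 4444L};
    long[] nextBatch = {1L, 2L, 3L, 4L};

    // The same buffer serves both batches and is not zeroed in between (assumption).
    ByteBuffer reused = newBuffer(priorBatch.length);
    fillAtByteOffsets(reused, priorBatch);
    fillAtRowIndexes(reused, nextBatch); // only bytes 0..10 are overwritten

    for (int i = 0; i < nextBatch.length; i++) {
      System.out.println("row " + i + ": expected=" + nextBatch[i]
          + " actual=" + readRow(reused, i));
    }
  }
}
```

Here rows 2 and 3 occupy bytes 16 through 31, which the buggy fill never reaches, so they keep the prior batch's values. Whether a real reader exposes such bytes depends on how the vector is reset and how nulls are tracked, which this sketch does not model.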
Proof status
Source review only. The issue is visible directly from source.
Key source references
- org.apache.iceberg.arrow.vectorized.parquet.VectorizedParquetDefinitionLevelReader
- org.apache.iceberg.arrow.vectorized.parquet.VectorizedDictionaryEncodedParquetValuesReader