
Arrow: Fix ClassCastException in vectorized reader on int-to-long pro… #16343

Merged
CTTY merged 2 commits into apache:main from xndai:iceberg-16341 on May 20, 2026


xndai (Contributor) commented May 14, 2026

…motion with INT logical type

Fix a ClassCastException (BigIntVector cannot be cast to IntVector) when reading Parquet files that carry an INT(32, true) logical type annotation after the column has been promoted from int to long.

The vectorized reader's LogicalTypeVisitor now allocates vectors based on the Parquet physical type instead of deriving them from the (potentially promoted) Iceberg schema type.

Root Cause:
In VectorizedArrowReader.allocateFieldVector(), the Arrow field was created from the Iceberg schema type (which reflects the promoted LongType), producing a BigIntVector. The LogicalTypeVisitor then cast this vector to IntVector because the Parquet file annotates the column as INT(32), and that cast throws the ClassCastException.
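To make the failure mode concrete, here is a minimal, dependency-free Java sketch. FieldVec, IntVec, and BigIntVec are hypothetical stand-ins for Arrow's FieldVector, IntVector, and BigIntVector, and allocateFromIcebergType, allocateFromPhysicalType, and visitInt32Annotation are hypothetical helpers, not Iceberg's actual VectorizedArrowReader code. It only models the allocation decision: choosing the vector from the promoted Iceberg type and then casting based on the INT(32) annotation throws, while choosing it from the physical type does not.

```java
// Hypothetical stand-ins for Arrow vector classes (not the real Arrow API).
interface FieldVec {}

final class IntVec implements FieldVec {
  final int[] values;
  IntVec(int size) { this.values = new int[size]; }
}

final class BigIntVec implements FieldVec {
  final long[] values;
  BigIntVec(int size) { this.values = new long[size]; }
}

enum IcebergType { INT, LONG }
enum PhysicalType { INT32, INT64 }

final class AllocationDemo {
  // Buggy rule: vector chosen from the (possibly promoted) Iceberg type.
  static FieldVec allocateFromIcebergType(IcebergType type, int size) {
    return type == IcebergType.LONG ? new BigIntVec(size) : new IntVec(size);
  }

  // Fixed rule: vector chosen from the Parquet physical type in the file.
  static FieldVec allocateFromPhysicalType(PhysicalType type, int size) {
    return type == PhysicalType.INT64 ? new BigIntVec(size) : new IntVec(size);
  }

  // Models a visitor for the INT(32, true) annotation: it assumes the
  // vector holds 32-bit values and casts accordingly.
  static IntVec visitInt32Annotation(FieldVec vector) {
    return (IntVec) vector;
  }

  static boolean throwsClassCast(FieldVec vector) {
    try {
      visitInt32Annotation(vector);
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // The column was written as INT32 with an INT(32, true) annotation,
    // then promoted to long in the Iceberg schema.
    FieldVec buggy = allocateFromIcebergType(IcebergType.LONG, 4);
    FieldVec fixed = allocateFromPhysicalType(PhysicalType.INT32, 4);
    System.out.println("buggy path throws: " + throwsClassCast(buggy)); // true
    System.out.println("fixed path throws: " + throwsClassCast(fixed)); // false
  }
}
```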

The non-vectorized reader (BaseParquetReaders) already handles this correctly by checking the expected Iceberg type and using IntAsLongReader for promotion. The vectorized reader relies on the accessor layer for widening (IntAccessor.getLong() widens int to long), so the fix only needs to make the allocated vector match the physical data layout; widening to long still happens in the accessor.
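The split described above, where the vector keeps the file's 32-bit layout and the accessor widens on read, can be sketched as follows. IntColumn and IntAccessorSketch are hypothetical classes; the getLong() widening mirrors what the description says IntAccessor.getLong() does, but this is not Iceberg's implementation.

```java
// Hypothetical 32-bit column matching the Parquet INT32 physical layout.
final class IntColumn {
  private final int[] values;
  IntColumn(int... values) { this.values = values.clone(); }
  int get(int row) { return values[row]; }
  int size() { return values.length; }
}

// Hypothetical accessor: storage stays 32-bit, and reads through getLong()
// apply Java's implicit int-to-long widening, which is lossless and keeps the sign.
final class IntAccessorSketch {
  private final IntColumn column;
  IntAccessorSketch(IntColumn column) { this.column = column; }

  int getInt(int row) { return column.get(row); }

  long getLong(int row) { return column.get(row); }

  public static void main(String[] args) {
    IntColumn column = new IntColumn(Integer.MIN_VALUE, -1, 0, Integer.MAX_VALUE);
    IntAccessorSketch accessor = new IntAccessorSketch(column);
    // prints -2147483648, -1, 0, 2147483647 (one per line)
    for (int row = 0; row < column.size(); row++) {
      System.out.println(accessor.getLong(row));
    }
  }
}
```

Keeping the promotion in the accessor means the vector type never has to depend on the Iceberg schema, which is the property the fix restores.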

Tests:

  • testIntToLongPromotionWithLogicalType: verifies reading after promotion when the file has an INT(32, true) annotation (the reported crash)
  • testIntToLongPromotionWithoutLogicalType: verifies reading after promotion when the file has a bare INT32 column
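As a rough, dependency-free model of what the two tests check, the sketch below reads the same INT32 values, including the int boundary values, with and without an INT(32, true) annotation and compares them with the expected longs. PromotionScenarioSketch and readPromoted are hypothetical; the actual tests are in TestArrowReader.java and are not reproduced here.

```java
import java.util.Arrays;

// Hypothetical model of the two promotion scenarios; not the TestArrowReader code.
final class PromotionScenarioSketch {
  // Models reading INT32 data after the column's Iceberg type was promoted to long.
  static long[] readPromoted(int[] int32Data, boolean hasInt32Annotation) {
    // Allocate from the physical type: INT32 data lands in a 32-bit buffer.
    Object vector = int32Data.clone();
    if (hasInt32Annotation && !(vector instanceof int[])) {
      // Stands in for the INT(32, true) visitor's cast, which fails when the
      // buffer is not 32-bit (the reported crash).
      throw new ClassCastException("INT(32, true) column needs a 32-bit vector");
    }
    int[] ints = (int[]) vector;
    long[] widened = new long[ints.length];
    for (int i = 0; i < ints.length; i++) {
      widened[i] = ints[i]; // sign-preserving int-to-long widening on read
    }
    return widened;
  }

  public static void main(String[] args) {
    int[] written = {Integer.MIN_VALUE, -42, 0, 42, Integer.MAX_VALUE};
    long[] expected = {Integer.MIN_VALUE, -42L, 0L, 42L, Integer.MAX_VALUE};
    // Scenario 1: file column annotated INT(32, true).
    boolean withAnnotation = Arrays.equals(readPromoted(written, true), expected);
    // Scenario 2: bare INT32 column without a logical type annotation.
    boolean withoutAnnotation = Arrays.equals(readPromoted(written, false), expected);
    System.out.println(withAnnotation && withoutAnnotation); // true
  }
}
```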

Fixes #16341

github-actions (bot) added the arrow label on May 14, 2026
CTTY (Contributor) left a comment

LGTM! Just one minor comment.

Comment threads (4) on arrow/src/test/java/org/apache/iceberg/arrow/vectorized/TestArrowReader.java, all marked Outdated
testIntToLongPromotionWithLargeValuesAndReuseContainers and address some
minor comments
CTTY (Contributor) left a comment

LGTM!

CTTY merged commit e3a4c64 into apache:main on May 20, 2026
44 checks passed
xndai deleted the iceberg-16341 branch on May 20, 2026 at 23:30

Development: successfully merging this pull request may close this issue:

Vectorized reader throws ClassCastException on int-to-long promotion when Parquet file has INT(32) logical type annotation

3 participants