Conversation

@chenzl25
Contributor

@chenzl25 chenzl25 commented Oct 16, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Updated ArrowReader::get_arrow_projection_mask to skip columns that are missing from the Parquet file instead of returning an error. The RecordBatchTransformer adds the skipped columns later, filled with NULL/default values.
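
As a rough illustration of the skip-instead-of-error behavior described above, here is a minimal std-only sketch. It is not the code from this PR: `projection_indices` and the field-ID-to-column-index map are hypothetical stand-ins for the real projection logic, which works against the Parquet and Arrow schemas.

```rust
use std::collections::HashMap;

/// Hypothetical model of the projection step: translate the field IDs the
/// reader asks for into column indices of the Parquet file. IDs that the
/// file does not contain are skipped rather than reported as an error.
fn projection_indices(
    requested_field_ids: &[i32],
    file_columns_by_field_id: &HashMap<i32, usize>,
) -> Vec<usize> {
    let mut indices: Vec<usize> = requested_field_ids
        .iter()
        // `get` returns None for a field ID absent from the file;
        // `filter_map` drops those entries instead of failing.
        .filter_map(|id| file_columns_by_field_id.get(id).copied())
        .collect();
    // Keep the indices in file order for a stable projection.
    indices.sort_unstable();
    indices
}
```

In this model, a file that contains only field ID 1 and a request for IDs 1 and 2 yields a projection over the single existing column; the missing field is left for a later step to fill in.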

Are these changes tested?

Testing schema evolution:

    • Added an async test, test_schema_evolution_add_column, which verifies that reading an old Parquet file (with only column 'a') using a newer schema (with columns 'a' and 'b') succeeds. The test checks that the missing column 'b' is filled with NULLs and that the original data in 'a' is preserved.
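
The expectation that test checks can be modeled without Arrow or Parquet. The sketch below is an assumption-laden toy: `Column` and `fill_missing_columns` are hypothetical names, and plain `Option<i32>` vectors stand in for Arrow arrays. It shows only the shape of the result, not how RecordBatchTransformer is implemented.

```rust
/// Hypothetical model of a nullable column: `None` plays the role of NULL.
type Column = Vec<Option<i32>>;

/// For each column in the target schema, reuse the data read from the file
/// when present; otherwise synthesize a column of `num_rows` NULLs.
fn fill_missing_columns(
    schema: &[&str],
    file_batch: &[(&str, Column)],
    num_rows: usize,
) -> Vec<(String, Column)> {
    schema
        .iter()
        .map(|name| {
            let values = file_batch
                .iter()
                .find(|(n, _)| n == name)
                .map(|(_, v)| v.clone())
                // Column missing from the old file: fill with NULLs.
                .unwrap_or_else(|| vec![None; num_rows]);
            (name.to_string(), values)
        })
        .collect()
}
```

With an old batch holding only column 'a' and a target schema of 'a' and 'b', the result keeps the values of 'a' unchanged and adds 'b' as an all-NULL column of the same length.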

Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @chenzl25 for this fix!

@liurenjie1024 liurenjie1024 merged commit fa07ec6 into apache:main Oct 17, 2025
16 checks passed

Development

Successfully merging this pull request may close these issues.

bug: failed to read iceberg table after adding new columns

2 participants