@KevinJiao
Contributor

Closes #2672

Rationale for this change

When performing column projection on partitioned tables with schema evolution, PyIceberg incorrectly uses the projected schema (containing only the selected columns) instead of the full table schema when building partition types in _get_column_projection_values(). This raises ValueError: Could not find field with id: X when all of the following hold:

  1. Reading from partitioned Iceberg tables
  2. Using column projection (selecting specific columns, not SELECT *)
  3. Selected columns do NOT include the partition field(s)
  4. The table has undergone schema evolution (fields added/removed after initial creation)
  5. Reading files that are missing some of the selected columns (written before schema evolution)

The root cause is that partition_spec.partition_type(projected_schema) fails when the projected schema is missing fields that the partition specification references.
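To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use PyIceberg: `Field`, `ToySchema`, and `ToyPartitionSpec` are hypothetical stand-ins that only mimic the lookup-by-field-ID behavior described above, and the column names and IDs are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str


class ToySchema:
    """Hypothetical stand-in for a schema: resolves fields by ID."""

    def __init__(self, *fields: Field) -> None:
        self._by_id = {f.field_id: f for f in fields}

    def find_field(self, field_id: int) -> Field:
        try:
            return self._by_id[field_id]
        except KeyError:
            raise ValueError(f"Could not find field with id: {field_id}") from None

    def select(self, *names: str) -> "ToySchema":
        # Column projection: keep only the named columns.
        return ToySchema(*(f for f in self._by_id.values() if f.name in names))


@dataclass(frozen=True)
class ToyPartitionSpec:
    """Hypothetical stand-in for a partition spec: a list of source field IDs."""

    source_ids: Tuple[int, ...]

    def partition_type(self, schema: ToySchema) -> List[str]:
        # Every partition source field must be resolvable in the given schema.
        return [schema.find_field(i).name for i in self.source_ids]


table_schema = ToySchema(Field(1, "id"), Field(2, "region"), Field(3, "score"))
spec = ToyPartitionSpec(source_ids=(2,))          # partitioned by "region"
projected = table_schema.select("id", "score")    # projection drops "region"

full_result = spec.partition_type(table_schema)   # resolves against the full schema

try:
    spec.partition_type(projected)
    error = None
except ValueError as exc:
    error = str(exc)                              # the partition source field is gone
```

In this toy model the full schema resolves the partition field, while the projected schema raises the same kind of error message quoted in the description.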

The fix passes the full table schema from ArrowScan._table_metadata.schema() through _task_to_record_batches() to _get_column_projection_values(), ensuring all fields are available when building partition accessors.
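The shape of the fix can be sketched as follows. This is a simplified model under stated assumptions, not the PyIceberg implementation: schemas are plain dicts from field ID to name, a scan task is a plain dict, and `resolve_partition_fields`, `column_projection_values`, and this toy `task_to_record_batches` are hypothetical functions that only illustrate threading the full table schema down to the helper.

```python
from typing import Dict, List, Set, Tuple

Schema = Dict[int, str]  # toy schema: field ID -> column name


def resolve_partition_fields(source_ids: Tuple[int, ...], schema: Schema) -> List[Tuple[int, str]]:
    """Look up each partition source field; raise if the schema lacks it."""
    resolved = []
    for field_id in source_ids:
        if field_id not in schema:
            raise ValueError(f"Could not find field with id: {field_id}")
        resolved.append((field_id, schema[field_id]))
    return resolved


def column_projection_values(
    source_ids: Tuple[int, ...],
    partition_record: Tuple[object, ...],
    table_schema: Schema,
    projected_schema: Schema,
    file_field_ids: Set[int],
) -> Dict[int, object]:
    """Return partition values for projected fields that the data file lacks."""
    missing = set(projected_schema) - file_field_ids
    if not missing:
        return {}
    # The fix: resolve partition fields against the full table schema,
    # not the projected schema, which may omit the partition column.
    partition_fields = resolve_partition_fields(source_ids, table_schema)
    return {
        field_id: partition_record[pos]
        for pos, (field_id, _name) in enumerate(partition_fields)
        if field_id in missing
    }


def task_to_record_batches(task: dict, table_schema: Schema, projected_schema: Schema) -> Dict[int, object]:
    """Toy middle layer: forwards the table schema it was handed."""
    return column_projection_values(
        task["source_ids"], task["partition"], table_schema, projected_schema, task["file_field_ids"]
    )


# "score" (ID 3) was added after the old file was written; the table is partitioned by "region".
table_schema = {1: "id", 2: "region", 3: "score"}
old_file_task = {"source_ids": (2,), "partition": ("eu",), "file_field_ids": {1}}

# Projection excludes the partition column and includes a column the old file lacks.
no_partition_col = task_to_record_batches(old_file_task, table_schema, {1: "id", 3: "score"})

# Projection includes the partition column, assumed here to be absent from the data file.
with_partition_col = task_to_record_batches(old_file_task, table_schema, {1: "id", 2: "region"})
```

With the table schema passed through, the first projection no longer fails and simply yields no partition-derived values, while the second still recovers the partition value for the selected partition column.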

Are these changes tested?

Yes. Added a test test_partition_column_projection_with_schema_evolution that:

  • Creates a partitioned table with initial schema
  • Writes data with the initial schema
  • Evolves the schema by adding a new column
  • Writes data with the evolved schema
  • Performs column projection that excludes the partition field

Are there any user-facing changes?

No. Only internal helpers are changed.

@KevinJiao KevinJiao force-pushed the fix-partition-column-projection-schema-evolution branch from f0f9fa6 to 5508ed2 Compare November 3, 2025 21:21
Use table schema instead of projected schema when building partition type
to avoid 'Could not find field with id' errors during column projection
on partitioned tables with schema evolution.
@KevinJiao KevinJiao force-pushed the fix-partition-column-projection-schema-evolution branch from 5508ed2 to f658044 Compare November 3, 2025 21:31
Contributor

@kevinjqliu kevinjqliu left a comment

LGTM Thanks for the fix!

@kevinjqliu kevinjqliu merged commit 2d549a9 into apache:main Nov 3, 2025
8 checks passed
Development

Successfully merging this pull request may close these issues.

ValueError when reading partitioned tables with column projection after schema evolution