Core: Fix SerializableTable.sortOrders() throwing on historical sort orders with dropped fields (#16519) #16521
MonkeyCanCode wants to merge 2 commits into
@aihuaxu mind taking a look?
Thanks for the quick review @anuragmantri. Mind taking another look?
@RussellSpitzer mind taking a look to see if this is okay to proceed?
I'm curious what the expected behavior is when using this new feature if a field is dropped or a sort order spec is removed?
Based on my understanding, the #15150 feature uses only the current default sort order at write time. The historical orders only need to be loadable. This fix restores loadability when a field referenced by a historical sort order is dropped, without re-enabling use of that dropped field. As for a sort order being removed, I don't think there is DDL to actually remove it as setting
bryanck
left a comment
LGTM, thanks for the fix! We should apply this in 1.11.1 also.
Background
After upgrading the Apache Iceberg runtime from 1.10.1 to 1.11.0, writes from Spark in my workload started to fail when the table has a historical sort order that references a column that has since been dropped from the schema. This was introduced by #15150 and was also reported by @aihuaxu via #16519.
Change
SerializableTable.sortOrders() strictly binds every sort order against the current schema, which fails when a historical sort order references a field dropped by schema evolution. This PR changes it to strictly bind only the default sort order.
Test Plan
I added the new test case added by @aihuaxu from #16519, as well as the local reproduction that I wrote, to validate that the issue is resolved by this PR.
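For readers who want the failure mode in miniature, here is a rough, self-contained sketch of the behavior described under Change. It is not the Iceberg implementation: `SortOrderBindingSketch`, `Order`, `bindStrict`, `loadAllStrict`, and `loadDefaultStrict` are hypothetical names invented for illustration, and a schema is reduced to a set of field ids. It only shows why strictly binding every order fails once a historical order references a dropped field, and why strictly binding only the default order does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; these are NOT Iceberg classes.
final class SortOrderBindingSketch {

    // Stand-in for a sort order: an order id plus the schema field ids it sorts by.
    static final class Order {
        final int orderId;
        final List<Integer> sourceFieldIds;

        Order(int orderId, Integer... sourceFieldIds) {
            this.orderId = orderId;
            this.sourceFieldIds = Arrays.asList(sourceFieldIds);
        }
    }

    // Strict bind: reject the order if any source field is missing from the current schema.
    static Order bindStrict(Order order, Set<Integer> currentFieldIds) {
        for (Integer fieldId : order.sourceFieldIds) {
            if (!currentFieldIds.contains(fieldId)) {
                throw new IllegalArgumentException(
                    "Sort order " + order.orderId + " references dropped field id " + fieldId);
            }
        }
        return order;
    }

    // Behavior described as failing: every order is strictly bound.
    static List<Order> loadAllStrict(List<Order> orders, Set<Integer> currentFieldIds) {
        List<Order> loaded = new ArrayList<>();
        for (Order order : orders) {
            loaded.add(bindStrict(order, currentFieldIds));
        }
        return loaded;
    }

    // Behavior described by the fix: strictly bind only the default order and
    // keep historical orders loadable without validating their fields.
    static List<Order> loadDefaultStrict(
            List<Order> orders, int defaultOrderId, Set<Integer> currentFieldIds) {
        List<Order> loaded = new ArrayList<>();
        for (Order order : orders) {
            loaded.add(order.orderId == defaultOrderId ? bindStrict(order, currentFieldIds) : order);
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Current schema has field ids 1 and 2; field id 3 was dropped.
        Set<Integer> currentFieldIds = new HashSet<>(Arrays.asList(1, 2));
        // Order 1 is historical and sorts by the dropped field; order 2 is the default.
        List<Order> orders = Arrays.asList(new Order(1, 3), new Order(2, 1));

        try {
            loadAllStrict(orders, currentFieldIds);
        } catch (IllegalArgumentException e) {
            System.out.println("strict bind of all orders failed: " + e.getMessage());
        }
        System.out.println("orders loaded with default-only strict bind: "
            + loadDefaultStrict(orders, 2, currentFieldIds).size());
    }
}
```

In this sketch the default order is still validated, so a default sort order that references a dropped field would continue to fail, which matches the intent stated in the discussion that only the current default order is used at write time.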