Describe the bug
Auron already resolves Iceberg columns by field-id in the regular native Iceberg scan path, which makes top-level schema evolution such as column rename and drop-then-add safe for Parquet files.
The newer insert-only Iceberg changelog scan path reuses the native file reader, but it does not pass the same Iceberg field-id mapping into the native scan plan yet.
As a result, native Parquet schema matching falls back to column names on the changelog path.
This can return wrong data after schema evolution:
- after RENAME COLUMN, pre-rename changelog files may read as null;
- after DROP + ADD of the same name, the new column may read data from the old dropped column.
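The two failure modes above can be illustrated with a minimal Python sketch. It is a toy model of column resolution, not Auron's reader: `Field`, `resolve_by_name`, and `resolve_by_field_id` are hypothetical names, and the field IDs (1, 2, 3) are assumed for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a column carrying an Iceberg field ID.
    field_id: int
    name: str


def resolve_by_name(file_schema: List[Field], requested: Field) -> Optional[Field]:
    """Match the requested column to a file column by name only."""
    by_name = {f.name: f for f in file_schema}
    return by_name.get(requested.name)


def resolve_by_field_id(file_schema: List[Field], requested: Field) -> Optional[Field]:
    """Match the requested column to a file column by field ID only."""
    by_id = {f.field_id: f for f in file_schema}
    return by_id.get(requested.field_id)


# Case 1: RENAME COLUMN. The file was written before the rename.
pre_rename_file = [Field(1, "id"), Field(2, "old_name")]
renamed = Field(2, "new_name")  # same field ID, new name (IDs assumed)

rename_by_name = resolve_by_name(pre_rename_file, renamed)    # no match: reads as null
rename_by_id = resolve_by_field_id(pre_rename_file, renamed)  # finds the old column

# Case 2: DROP + ADD of the same name. The file was written before the drop.
pre_drop_file = [Field(1, "id"), Field(2, "value")]
readded = Field(3, "value")  # same name, fresh field ID (IDs assumed)

drop_add_by_name = resolve_by_name(pre_drop_file, readded)    # wrongly matches dropped column
drop_add_by_id = resolve_by_field_id(pre_drop_file, readded)  # no match: reads as null
```

In this model, name matching loses the renamed column and resurrects the dropped one, while field-id matching handles both cases as Iceberg intends.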
To Reproduce
create table local.db.t_changelog_rename (id int, old_name string)
using iceberg
tblproperties ('format-version' = '2');
insert into local.db.t_changelog_rename values (0, 'initial');
-- record start snapshot
insert into local.db.t_changelog_rename values (1, 'before');
alter table local.db.t_changelog_rename rename column old_name to new_name;
insert into local.db.t_changelog_rename values (2, 'after');
-- record end snapshot
CALL local.system.create_changelog_view(
  table => 'db.t_changelog_rename',
  changelog_view => 't_changelog_rename_changes',
  options => map(
    'start-snapshot-id', '<start_snapshot_id>',
    'end-snapshot-id', '<end_snapshot_id>'
  )
);
select id, new_name, _change_type, _change_ordinal, _commit_snapshot_id
from t_changelog_rename_changes
order by id;
The native changelog scan may return null for new_name in the pre-rename row (id = 1), where Spark/Iceberg returns 'before'.
A similar issue exists for drop-then-add with the same column name:
create table local.db.t_changelog_drop_add (id int, value string)
using iceberg
tblproperties ('format-version' = '2');
insert into local.db.t_changelog_drop_add values (0, 'initial');
-- record start snapshot
insert into local.db.t_changelog_drop_add values (1, 'old');
alter table local.db.t_changelog_drop_add drop column value;
alter table local.db.t_changelog_drop_add add column value string;
insert into local.db.t_changelog_drop_add values (2, 'new');
-- record end snapshot, then create and query a changelog view as in the first example
The dropped value column and the re-added value column have different Iceberg field IDs, so the 'old' data in the row with id = 1 must not be returned as the new column; that row should read as null for value.
Expected behavior
Iceberg changelog scan should resolve data columns by Iceberg field-id, matching Spark/Iceberg results.
For renamed columns, old files should map to the renamed column by field-id.
For drop-then-add of the same name, old dropped column data should not be read as the newly added column.
Screenshots
Additional context