feat(table): reject AddDataFiles on v3 when first_row_id is missing (#1000) #1101

Merged
laskoviymishka merged 1 commit into apache:main from tanmayrauth:feat/1000-reject-add-data-files-missing-first-row-id
May 21, 2026
Conversation

@tanmayrauth
Contributor

Closes #1000

When users call AddDataFiles (or ReplaceDataFilesWithDataFiles / ReplaceFiles) with externally written Parquet files on a v3 table, reject the operation if first_row_id is not set. This mirrors pyiceberg's behavior: the library cannot fabricate row IDs retroactively, so callers must supply them explicitly via DataFileBuilder.FirstRowID().

Changes

  • Added validation in validateDataFilesToAdd to require first_row_id on format version >= 3
  • Error message directs users to DataFileBuilder.FirstRowID()
  • v2 tables are unaffected

@tanmayrauth tanmayrauth requested a review from zeroshade as a code owner May 20, 2026 09:45
@tanmayrauth
Contributor Author

@laskoviymishka can you please review it?

Member

@zeroshade zeroshade left a comment

LGTM and narrowly scoped. I'll wait for @laskoviymishka to take a look before merging, as I haven't dug into the pyiceberg behavior on this to make sure we're matching it.

@tanmayrauth
Contributor Author

@laskoviymishka could you please review it?

Contributor

@laskoviymishka laskoviymishka left a comment

🛳️ 👍

@laskoviymishka laskoviymishka merged commit 1401cb6 into apache:main May 21, 2026
14 checks passed
Development

Successfully merging this pull request may close these issues.

feat(table): reject AddDataFiles on v3 when first_row_id is missing

3 participants