fix(table): validate snapshot timestamp drift on add snapshot#3062
fix(table): validate snapshot timestamp drift on add snapshot#3062mrutunjay-kinagi wants to merge 5 commits intoapache:mainfrom
Conversation
Fokko
left a comment
There was a problem hiding this comment.
@mrutunjay-kinagi Can you explain more about the use-case? I'm hesitant with these kind of features since it isn't implemented in other OSS implementations AFAIK.
| elif ( | ||
| base_metadata.snapshot_log and update.snapshot.timestamp_ms - base_metadata.snapshot_log[-1].timestamp_ms < -ONE_MINUTE_MS | ||
| ): | ||
| raise ValueError( |
There was a problem hiding this comment.
Maybe CommitFailedException?
There was a problem hiding this comment.
Good point. I kept ValueError for consistency with the existing validation branches in this same method (sequence/row-id checks all currently raise ValueError). If you prefer, I can switch this validation path to CommitFailedException in a follow-up commit.
|
@Fokko Thanks for the review. The use case here is handling malformed/clock-skewed commit metadata safely and aligning behavior with Java/Rust, which already apply a 1-minute backward-drift tolerance for snapshot timestamps.\n\nWithout this guard, a client can add a snapshot timestamp that is significantly older than current table metadata ( / latest ), which can make timeline semantics inconsistent for timestamp-based operations.\n\nThis change keeps the same 60s tolerance and only rejects larger backward drift; small skew remains allowed. |
|
Clarification on my previous comment (formatting issue): The guard is intended to prevent adding snapshots that are significantly older than the table timeline, specifically older than This follows the same tolerance approach used in Java/Rust and keeps small clock skew allowed while rejecting larger backward drift. |
could you point to the java/rust references? |
|
The existing validation branches inside are raising , so for symmetry I left this guard as as well. If you’d prefer switching to later that’s easy to follow up on but I’d keep the current behavior for consistency. |
|
Reposting cleanly: the guard maintains the same exception type as the surrounding validation branches, so I kept it as |
|
@kevinjqliu Here are the references we aligned with:
|
|
Hey @mrutunjay-kinagi Looking at the code: The first loop will ensure that the snapshot log entries are sorted when parsing the JSON. The second checks that the last snapshot entry is not ahead of the lastUpdatedMillis. If we want to add this to PyIceberg, I think a better place is to do this when constructing the Metadata: https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/metadata.py Similar to what Java/Rust are doing. Thoughts? |
|
Thanks, good suggestion. I pushed a follow-up commit that adds metadata-level timestamp validation in
I also added parser-level tests in I kept the existing |
Rationale for this change
Align snapshot timestamp validation with other Iceberg implementations by rejecting snapshots that drift backwards by more than one minute.
The new guard in
AddSnapshotUpdatechecks:snapshot_logentrylast_updated_msBoth checks allow up to 60 seconds of clock skew tolerance.
Are these changes tested?
Yes.
test_update_metadata_add_snapshot_rejects_old_timestamp_vs_snapshot_log.test_update_metadata_add_snapshot_rejects_old_timestamp_vs_last_updated.Are there any user-facing changes?
No API changes.
ValueErrorwhen timestamp drift exceeds tolerance.