-
Notifications
You must be signed in to change notification settings - Fork 310
Added ExpireSnapshots Feature #1880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…h a new Expired Snapshot class. updated tests.
ValueError: Cannot expire snapshot IDs {3051729675574597004} as they are currently referenced by table refs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ForeverAngry for raising this PR! I'll go into details tomorrow morning (UTC+2 here). Could you resolve the conflicts to get the CI running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied intial suggestions so that CI can run on the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebuilt the poetry lock file.
Moved expiration-related methods from `ExpireSnapshots` to `ManageSnapshots` for improved organization and clarity. Updated corresponding pytest tests to reflect these changes.
Re-ran the `poetry run pre-commit run --all-files` command on the project.
Re-ran the `poetry run pre-commit run --all-files` command on the project.
After looking at the way the action here was implemented, I refined the changes. Let me know if these make sense :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinjqliu thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebased from main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a merge conflict somewhere, there are a lot of code here from other PRs. Perhaps you need to update your fork's main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ForeverAngry Could you see if you can get the linters/tests passing? Thanks!
`make lint` now passes
I was able to get the linting figured out. I dont have docker on this machine, so im going to give the integration tests a try in codespaces to see if i can get away doing the checks there. |
@ForeverAngry Sorry for the late reply, it looks like that there is a test failing now 👀 |
I think this commit should fix the test error, i also added additional tests. All passed - and appear to be in-good-order. 🤞 this time is the charm. |
Updated test_expire_snapshots.py to use model_copy(update={...}) for modifying the refs attribute of the table metadata, ensuring compatibility with Pydantic's frozen models. Fixed all snapshot expiration tests to avoid direct assignment to frozen attributes. All tests now pass, confirming correct behavior for protected and unprotected snapshot expiration.
this would be great - whats the status of this? |
I see that in #1958, for orphaned file removal, we decided to have a |
Well, right now this pr doesn't do anything with the newly orphaned files. It just handles the metadata operation. |
Yes, I agree with you there. Before doing the 0.10.0 release, we need to ensure we align on this and make proper docs. I have a slight preference towards |
I'm happy to follow the '.maintenance' api design if there is a strong preference toward it. |
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Added ExpireSnapshots.expire_snapshots_older_than and expire_snapshots_by_ids methods to support expiring snapshots by timestamp and by multiple IDs. Ensured that protected snapshots (branch/tag heads) cannot be expired, both at the API and commit stages. Updated the expiration logic to always skip protected snapshots, even if they are accidentally included. Added and fixed tests to verify that protected snapshots are never expired and that expiration works as expected for unprotected snapshots. Improved test setup to accurately reflect post-commit metadata and to assert correct expiration behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ForeverAngry for working on this, and I think it is ready to go 👍
Great! I think @kevinjqliu is still listed as needing approval. @kevinjqliu can you put your stamp on this as well? |
@ForeverAngry I think we can move this one forward. Before the release, we need to follow up on two things:
|
Thanks again @ForeverAngry for working on this 🚀 |
Thank you, for being such a supportive and inspiring member to work with! |
I'll work on this, this weekend! |
@ForeverAngry Appreciate that, thanks! 🙌 |
Summary
This PR Closes issue #516 by implementing support for the
ExpireSnapshot
table metadata action.Rationale
The
ExpireSnapshot
action is a core part of Iceberg’s table maintenance APIs. Adding support for this action in PyIceberg helps ensure feature parity with other language implementations (e.g., Java) and supports users who want to programmatically manage snapshot retention using PyIceberg’s public API.Testing
User-facing changes
ExpireSnapshot
.