
Implement snapshot expiration #401

Open · wants to merge 6 commits into main

Conversation

@arnaudbriche (Contributor) commented Apr 22, 2025

Hi @zeroshade,

As discussed on a previous PR, I'm trying to implement snapshot expiration now.
This is not complete, but I wanted to discuss it with you before working on it further.

If the design looks ok to you, here are a few technical questions regarding snapshot expiration:

  • currently, the snapshots field of the produced metadata file looks ok (expired snapshots are not there), but the snapshot-log field still contains entries for every operation since table creation. Is this expected? I can't find an appropriate answer in the spec.
  • should we handle expired data and metadata file deletion here?
  • the produced metadata file does not contain any new snapshot (no snapshot is created for the expire-snapshots operation). I cannot find what the spec says about this. Most probably a new snapshot must be created.


Comment on lines +198 to +200
}

for refName, ref := range t.meta.refs {
Member

can we combine these loops and just put the continue after refsToDelete[refName] = struct{}{}?

Contributor Author

I tried to keep it as explained here: https://iceberg.apache.org/spec/#snapshot-retention-policy
I can change that if you think it's better.

@zeroshade (Member), May 19, 2025

I don't see anything in that description which prevents combining these loops into a single loop, so let's change it to simplify the logic and avoid the extra trip through all of the refs.
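
Roughly what the combined loop might look like (a sketch only; t.meta.refs, maxRefAgeMs, refsToDelete, and the update constructor come from the diff, while snapshotByID and retainedIDs are assumed helpers, not the actual API):

// Sketch: handle ref expiration and retention in a single pass over the refs.
nowMs := time.Now().UnixMilli()
for refName, ref := range t.meta.refs {
	snap := snapshotByID(ref.SnapshotID) // assumed lookup helper
	if snap != nil && nowMs-snap.TimestampMs > *maxRefAgeMs {
		updates = append(updates, NewRemoveSnapshotRefUpdate(refName))
		refsToDelete[refName] = struct{}{}
		continue // a deleted ref doesn't keep any snapshots alive
	}
	// Retained refs protect their snapshot from expiration.
	retainedIDs[ref.SnapshotID] = struct{}{}
}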

@zeroshade (Member) left a comment

Looks good! Other than a few comments, the only thing missing here is some tests. Can you add some tests, please?

@arnaudbriche (Contributor Author) commented May 2, 2025

Hi @zeroshade

Sure, I'll add tests.
Regarding these 3 points, can you provide guidance/opinions?

  • currently, the snapshots field of the produced metadata file looks ok (expired snapshots are not there), but the snapshot-log field still contains entries for every operation since table creation. Is this expected? I can't find an appropriate answer in the spec.
  • should we handle expired data and metadata file deletion here?
  • the produced metadata file does not contain any new snapshot (no snapshot is created for the expire-snapshots operation). I cannot find what the spec says about this. Most probably a new snapshot must be created.

If we decide to handle file removal as part of ExpireSnapshots, we would have to postpone the actual deletion until after the commit is successful. I don't see any easy way to do it.

@zeroshade (Member)

@kevinjqliu @Fokko, do either of you know the answers to @arnaudbriche's questions here? You both have more experience with the spec side of things than I do as far as what makes sense for those situations.

@arnaudbriche (Contributor Author) commented May 2, 2025

I just added a test and implemented snapshot-log pruning because it's growing unbounded otherwise.

@arnaudbriche (Contributor Author) commented May 2, 2025

Regarding file deletion, maybe we can add a new method to the Update interface where we can put some post-commit code.
What do you think, @zeroshade?

@zeroshade (Member)

Regarding file deletion, maybe we can add a new method to the Update interface where we can put some post-commit code.

It's a good question. If we have the table, then we have the file system interface so that we can remove the files. It might make sense for us to simply have an option to tell it whether or not to delete the files (rather than allow generic post-commit code).
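
Purely as an illustration of that shape (none of these names are the actual iceberg-go API; planExpiration, commitExpiration, and deleteAll are hypothetical), the flag could gate a post-commit cleanup step like this:

// Hypothetical flow: plan the expiration, commit the metadata change first,
// and only remove files once the commit has succeeded.
func expireSnapshots(ctx context.Context, tbl *Table, cleanupFiles bool) error {
	newMeta, orphanedFiles, err := planExpiration(tbl) // hypothetical planner
	if err != nil {
		return err
	}
	if err := commitExpiration(ctx, tbl, newMeta); err != nil { // hypothetical commit
		return err // a failed commit deletes nothing
	}
	if cleanupFiles { // the boolean option discussed above
		return deleteAll(ctx, tbl, orphanedFiles) // hypothetical post-commit cleanup
	}
	return nil
}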

@kevinjqliu (Contributor) commented May 9, 2025

currently, the snapshots field of the produced metadata file looks ok (expired snapshots are not there), but the snapshot-log field still contains entries for every operation since table creation. Is this expected? I can't find an appropriate answer in the spec.

The snapshot-log should be cleaned up as part of expiration. From https://iceberg.apache.org/spec/#table-metadata-fields, under the snapshot-log field of the table metadata:
"""
A list (optional) of timestamp and snapshot ID pairs that encodes changes to the current snapshot for the table. Each time the current-snapshot-id is changed, a new entry should be added with the last-updated-ms and the new current-snapshot-id. When snapshots are expired from the list of valid snapshots, all entries before a snapshot that has expired should be removed.
"""

should we handle expired data and metadata file deletion here?

From the ExpireSnapshots Javadoc, it looks like you can optionally remove the data and metadata files, but it's set to true by default.

the produced metadata file does not contain any new snapshot (no snapshot is created for the expire-snapshots operation). I cannot find what the spec says about this. Most probably a new snapshot must be created.

Also from the Java docs for ExpireSnapshots:
"""
This API accumulates snapshot deletions and commits the new list to the table. This API does not allow deleting the current snapshot.

When committing, these changes will be applied to the latest table metadata. Commit conflicts will be resolved by applying the changes to the new latest metadata and reattempting the commit.
"""

A new snapshot should be created reflecting the newest table update, i.e. old snapshots removed

Hope this helps! Happy to help with any other questions :)

@arnaudbriche changed the title from "Basic version of expireSnapshots operation." to "Implement snapshot expiration" on May 14, 2025
@arnaudbriche (Contributor Author) commented May 14, 2025

A new snapshot should be created reflecting the newest table update, i.e. old snapshots removed

Thanks for all the answers, @kevinjqliu.
I implemented unreachable-file deletion and snapshot-log pruning as specified in your reply.

I have a hard time figuring out how a new snapshot could be created, as we technically don't write, update, or delete any data from the table here.
The manifests would be exactly the same as the current snapshot's.

@arnaudbriche requested a review from zeroshade on May 18, 2025 18:55
Comment on lines -190 to +197
- b.schemaList = metadata.Schemas()
+ b.schemaList = slices.Clone(metadata.Schemas())
  b.currentSchemaID = metadata.CurrentSchema().ID
- b.specs = metadata.PartitionSpecs()
+ b.specs = slices.Clone(metadata.PartitionSpecs())
  b.defaultSpecID = metadata.DefaultPartitionSpec()
  b.lastPartitionID = metadata.LastPartitionSpecID()
- b.props = metadata.Properties()
- b.snapshotList = metadata.Snapshots()
- b.sortOrderList = metadata.SortOrders()
+ b.props = maps.Clone(metadata.Properties())
+ b.snapshotList = slices.Clone(metadata.Snapshots())
+ b.sortOrderList = slices.Clone(metadata.SortOrders())
Member

I wonder if there's a way we could lazily do this cloning? Avoid performing the copies until we actually need to?
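
One way the copies could be deferred (a sketch, not the existing metadata builder; it only assumes the slices package the diff already uses):

// Copy-on-write wrapper: keep the original slice and only clone it the first
// time something actually needs to mutate it.
type cowSlice[T any] struct {
	data   []T
	cloned bool
}

func (c *cowSlice[T]) get() []T { return c.data }

func (c *cowSlice[T]) mutable() []T {
	if !c.cloned {
		c.data = slices.Clone(c.data)
		c.cloned = true
	}
	return c.data
}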

Comment on lines +193 to +197
refAge := time.Now().UnixMilli() - snap.TimestampMs
if refAge > *maxRefAgeMs {
updates = append(updates, NewRemoveSnapshotRefUpdate(refName))
refsToDelete[refName] = struct{}{}
}
Member

extract time.Now().UnixMilli() from the loop instead of calling it repeatedly. Stick it in the var block at the top of this function.
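
That is, something along these lines (a fragment reusing the names from the diff; the elided loop body is whatever already resolves snap):

var (
	nowMs        = time.Now().UnixMilli() // computed once for the whole function
	refsToDelete = make(map[string]struct{})
)

for refName, ref := range t.meta.refs {
	// ...
	if nowMs-snap.TimestampMs > *maxRefAgeMs {
		updates = append(updates, NewRemoveSnapshotRefUpdate(refName))
		refsToDelete[refName] = struct{}{}
	}
}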

Comment on lines +400 to +404
for _, snap := range preTable.Metadata().Snapshots() {
if slices.Contains(u.SnapshotIDs, snap.SnapshotID) {
filesToDelete[snap.ManifestList] = struct{}{}
}
}
Member

We have our list of snapshotIDs, so why do we need this loop?
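
Presumably the loop could become a direct lookup over the IDs we already hold (this assumes a SnapshotByID accessor on the metadata; if it doesn't exist in this form, treat it as a hypothetical helper):

// Look up each snapshot to expire directly instead of scanning every
// snapshot and calling slices.Contains for each one.
for _, id := range u.SnapshotIDs {
	if snap := preTable.Metadata().SnapshotByID(id); snap != nil {
		filesToDelete[snap.ManifestList] = struct{}{}
	}
}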

Comment on lines +412 to +428
mans, err := snap.Manifests(preTable.fs)
if err != nil {
return err
}

for _, man := range mans {
filesToDelete[man.FilePath()] = struct{}{}

entries, err := man.FetchEntries(preTable.fs, false)
if err != nil {
return err
}

for _, entry := range entries {
filesToDelete[entry.DataFile().FilePath()] = struct{}{}
}
}
Member

Snapshot objects have a dataFiles method which will return an iterator over the files, maybe we can leverage this to encapsulate this logic more rather than duplicate it?
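
For reference, the duplicated walk could also be pulled into one helper built from the calls already shown in the diff (Snapshot and iceio.IO stand in for whatever concrete types the surrounding code uses); the snapshot's own dataFiles iterator mentioned above may make even this unnecessary:

// Collect the manifest-list file, each manifest file, and every data file
// reachable from a single snapshot into the shared deletion set.
func collectReachableFiles(snap Snapshot, fs iceio.IO, filesToDelete map[string]struct{}) error {
	filesToDelete[snap.ManifestList] = struct{}{}
	mans, err := snap.Manifests(fs)
	if err != nil {
		return err
	}
	for _, man := range mans {
		filesToDelete[man.FilePath()] = struct{}{}
		entries, err := man.FetchEntries(fs, false)
		if err != nil {
			return err
		}
		for _, entry := range entries {
			filesToDelete[entry.DataFile().FilePath()] = struct{}{}
		}
	}
	return nil
}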

@zeroshade (Member) left a comment

This is shaping up nicely! Just a few questions left.


package internal

func Coalesce[T any](vals ...*T) *T {
Contributor

nit: better to move this to utils.go; no need to create a new file just for this function.
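
For context, the truncated Coalesce above presumably returns the first non-nil pointer; a sketch of that behavior (not necessarily the PR's exact body):

// Coalesce returns the first non-nil pointer among its arguments,
// or nil if every argument is nil.
func Coalesce[T any](vals ...*T) *T {
	for _, v := range vals {
		if v != nil {
			return v
		}
	}
	return nil
}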
