Update metadata cleanup to delete only expired files not needed to time travel to non-expired versions #627


JassAbidi
Contributor

This PR updates the metadata cleanup behavior so that it deletes only expired files that are not needed to time travel to a non-expired version.
Current behavior:
A Delta log file is deleted when it is older than the retention period and a checkpoint file exists after it. The problem with this logic is that it can lead to situations where time travel is unavailable for non-expired versions. This issue shows an example of that.
Proposed behavior:
When an expired file is needed to time travel to a non-expired version, we keep it in the transaction log. We only clean up expired log files once they are no longer needed to time travel to any non-expired version.
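
To make the difference between the two rules concrete, here is a minimal sketch in Python. It is a simplified model and not Delta Lake's actual cleanup code: the `LogFile` type, the function names, and the integer timestamps are all hypothetical, and it assumes commit timestamps increase with version. A version is treated as reachable if the log still holds either a checkpoint at or before it plus every commit after that checkpoint, or every commit from version 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogFile:
    """Hypothetical stand-in for one file in the transaction log."""
    version: int
    timestamp: int  # arbitrary time units, assumed to grow with version
    is_checkpoint: bool = False


def current_cleanup(files, cutoff):
    """Delete every expired file that has a checkpoint at a later version."""
    last_checkpoint = max((f.version for f in files if f.is_checkpoint), default=-1)
    return [f for f in files if f.timestamp < cutoff and f.version < last_checkpoint]


def proposed_cleanup(files, cutoff):
    """Delete expired files only when no non-expired version needs them."""
    latest = max(f.version for f in files)
    # Earliest version still inside the retention window (the latest
    # version is always kept reachable, even if everything has expired).
    earliest_live = min((f.version for f in files if f.timestamp >= cutoff), default=latest)
    # Checkpoint that reconstruction of earliest_live would start from.
    base = max(
        (f.version for f in files if f.is_checkpoint and f.version <= earliest_live),
        default=None,
    )
    if base is None:
        return []  # no usable checkpoint: every commit from version 0 is needed
    return [f for f in files if f.timestamp < cutoff and f.version < base]


def remaining_after(files, deleted):
    """Files left in the log once the deleted ones are removed."""
    gone = set(deleted)
    return [f for f in files if f not in gone]


def can_time_travel(remaining, version):
    """True if the remaining files are enough to reconstruct `version`."""
    commits = {f.version for f in remaining if not f.is_checkpoint}
    checkpoints = [f.version for f in remaining if f.is_checkpoint and f.version <= version]
    start = max(checkpoints) + 1 if checkpoints else 0
    return all(v in commits for v in range(start, version + 1))


# Commits 0..25 with checkpoints at versions 10 and 20; timestamp == version.
log = [LogFile(v, v) for v in range(26)]
log += [LogFile(10, 10, True), LogFile(20, 20, True)]
cutoff = 15  # versions 15..25 are not expired

after_current = remaining_after(log, current_cleanup(log, cutoff))
after_proposed = remaining_after(log, proposed_cleanup(log, cutoff))

print("current rule, version 15 reachable:", can_time_travel(after_current, 15))
print("proposed rule, version 15 reachable:", can_time_travel(after_proposed, 15))
```

In this model the current rule removes the checkpoint at version 10 and commits 0 to 14, so versions 15 to 19 can no longer be reconstructed even though they are inside the retention window. The proposed rule keeps the checkpoint at version 10 and everything after it, and removes only commits 0 to 9.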

@scottsand-db
Collaborator

scottsand-db commented Dec 7, 2021

Hi @JassAbidi. Thanks for this PR. As of now we do not consider this issue to be a bug, though we may revisit it in the future and implement something similar to your solution here.

In the Delta Lake 1.1 docs here, we've added a note explaining that this can happen and that increasing delta.logRetentionDuration can help avoid these situations.

Going to close this PR for now, and if we revisit this issue later we can take another look.
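
As a rough illustration of the workaround mentioned above, the retention window can be raised per table through the `delta.logRetentionDuration` table property. The sketch below only builds the SQL statement; the helper name `retention_statement`, the table name `events`, and the 60-day value are assumptions chosen for the example, not recommendations from the maintainers.

```python
def retention_statement(table_name: str, days: int) -> str:
    """Build the SQL that raises delta.logRetentionDuration for one table."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval {days} days')"
    )


# Hypothetical table name and retention period.
statement = retention_statement("events", 60)
print(statement)

# With a Delta-enabled SparkSession named `spark` (assumed, not created here),
# the statement would be applied with:
# spark.sql(statement)
```

A longer retention window keeps more expired log files around, which makes it less likely that a checkpoint needed by a non-expired version is cleaned up, at the cost of a larger transaction log directory.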

Labels
acknowledged This issue has been read and acknowledged by Delta admins
2 participants