Update metadata cleanup to delete only expired files not needed to time travel to non-expired versions #627
[SC-24892] Add typesafe bintray repo for sbt-mima-plugin
update fork
catch up to master
update master
update with master
update fork branch
update fork
update fork
…versions. - updates unit tests
Hi @JassAbidi, thanks for this PR. For now we do not consider this issue to be a bug, though we may revisit it in the future and implement something similar to your solution here. In the Delta Lake 1.1 docs, we've added a note explaining that this can happen and that increasing […] Going to close this PR for now; if we revisit this issue later, we can take another look.
This PR updates the metadata cleanup behavior so that it deletes only expired files that are not needed to time travel to a non-expired version.
Current behavior:
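A Delta log file is deleted when it is older than the retention period and there is a checkpoint file after it. The problem with this logic is that it can make time travel unavailable for versions that have not expired: reconstructing a version requires the latest checkpoint at or before that version plus every delta file between that checkpoint and the version, so deleting an expired delta file can break a non-expired version that still depends on it. This issue shows an example of that.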
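To make the failure mode concrete, here is a minimal Python sketch over a simplified log model. The model is an assumption for illustration, not Delta Lake's actual cleanup code: version `v` has one delta file and a commit timestamp, some versions also have a checkpoint, and checkpoint files are never deleted here. The helpers `files_needed`, `current_rule_deletions` and `broken_versions` and the example log are hypothetical.

```python
# Simplified model (an assumption, not Delta Lake's real code): rebuilding
# version v needs the latest checkpoint c <= v plus delta files c+1..v,
# or delta files 0..v when no such checkpoint exists.

def files_needed(version, checkpoints):
    """Return (checkpoint_version_or_None, set_of_delta_versions) to rebuild `version`."""
    base = max((c for c in checkpoints if c <= version), default=None)
    start = 0 if base is None else base + 1
    return base, set(range(start, version + 1))


def current_rule_deletions(timestamps, checkpoints, cutoff):
    """Delta versions removed by the current rule: older than the retention
    cutoff AND followed by some later checkpoint. Checkpoints are kept."""
    return {
        v for v, ts in enumerate(timestamps)
        if ts < cutoff and any(c > v for c in checkpoints)
    }


def broken_versions(timestamps, checkpoints, cutoff, deleted):
    """Non-expired versions that can no longer be reconstructed."""
    broken = []
    for v, ts in enumerate(timestamps):
        if ts < cutoff:
            continue  # expired versions may legitimately become unreachable
        _, deltas = files_needed(v, checkpoints)
        if deltas & deleted:
            broken.append(v)
    return broken


# Hypothetical log: version v committed at time 10*v, checkpoints at 3 and 8,
# retention cutoff at time 55, so versions 0..5 are expired and 6..9 are not.
timestamps = [10 * v for v in range(10)]
checkpoints = {3, 8}
cutoff = 55

deleted = current_rule_deletions(timestamps, checkpoints, cutoff)
print(sorted(deleted))                                            # [0, 1, 2, 3, 4, 5]
print(broken_versions(timestamps, checkpoints, cutoff, deleted))  # [6, 7]
```

Versions 6 and 7 are still within the retention period, but in this model they depend on checkpoint 3 plus deltas 4 and 5, which the rule has already removed because checkpoint 8 comes after them.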
Proposed behavior:
When an expired file is needed to time travel to a non-expired version, we keep it in the transaction log. We only clean up expired log files once they are no longer needed to time travel to any non-expired version.
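A minimal Python sketch of the proposed rule, under a simplified log model that is an assumption and not the PR's actual implementation: rebuilding version `v` needs the latest checkpoint at or before `v` plus the delta files after it. The sketch collects every file that some non-expired version still needs, then deletes only the expired delta and checkpoint files outside that set. The function names and the example log are hypothetical.

```python
# Simplified model (an assumption): rebuilding version v needs the latest
# checkpoint c <= v plus delta files c+1..v (or 0..v with no checkpoint).

def files_needed(version, checkpoints):
    """Files needed to rebuild `version`, as ("checkpoint", c) / ("delta", d) tuples."""
    base = max((c for c in checkpoints if c <= version), default=None)
    start = 0 if base is None else base + 1
    needed = {("delta", d) for d in range(start, version + 1)}
    if base is not None:
        needed.add(("checkpoint", base))
    return needed


def proposed_deletions(timestamps, checkpoints, cutoff):
    """Expired files that no non-expired version depends on."""
    # Every file some non-expired version still needs for time travel.
    still_needed = set()
    for v, ts in enumerate(timestamps):
        if ts >= cutoff:
            still_needed |= files_needed(v, checkpoints)

    # Deletion candidates: delta and checkpoint files of expired versions.
    expired = {("delta", v) for v, ts in enumerate(timestamps) if ts < cutoff}
    expired |= {("checkpoint", c) for c in checkpoints if timestamps[c] < cutoff}
    return expired - still_needed


# Hypothetical log: version v committed at time 10*v, checkpoints at 3 and 8,
# retention cutoff at time 55 (versions 0..5 expired, 6..9 not).
timestamps = [10 * v for v in range(10)]
checkpoints = {3, 8}
cutoff = 55

deleted = proposed_deletions(timestamps, checkpoints, cutoff)
print(sorted(deleted))  # [('delta', 0), ('delta', 1), ('delta', 2), ('delta', 3)]
```

Here the expired checkpoint 3 and deltas 4 and 5 survive because versions 6 and 7 still need them. Taking the union over all non-expired versions is the most direct way to state the rule; assuming commit timestamps increase with version, it behaves roughly like a version boundary at the checkpoint used by the oldest non-expired version.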