-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[FLINK-25994] Implement FileStoreExpire #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
private final ManifestList manifestList; | ||
private final FileStoreScan scan; | ||
|
||
public FileStoreExpireImpl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking that we can provide three options:
- snapshot-expire.retention: retention time for snapshot changes
- snapshot-expire.num-retained
- snapshot-expire.check-interval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expire
is an operation called by users (for example by connector sinks). snapshot-expire.check-interval
should be checked by users, not by the operation itself.
As for retention time I'll implement this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you modify the caller in a separate PR too? StoreGlobalCommitter
?
* Deleted sst files in current snapshot. They can be safely deleted from file system if | ||
* this snapshot expires. | ||
*/ | ||
List<ManifestEntry> deletedFiles(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deletedFiles
is misunderstanding.
I prefer to add a new method planChangeFiles()
, it returns all files. The caller can compute physical deleted files.
return; | ||
} | ||
|
||
// binary search for the last snapshot to expire |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not cost effective to use dichotomous lookups (assuming there are enough surviving snapshots).
Just start traversing from the beginning.
Set<Path> sstInUse = new HashSet<>(); | ||
FileStorePathFactory.SstPathFactoryCache sstPathFactoryCache = | ||
new FileStorePathFactory.SstPathFactoryCache(pathFactory); | ||
for (ManifestEntry entry : scan.withSnapshot(nextSnapshot.id()).plan().files()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see large number of repeat reads. We can add a TODO to add manifest cache in scan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can optimize this by expiring all snapshots at once so no repeated reads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
parent 2a62022 author Alibaba-HZY <daowu.hzy@cainiao.com> 1681226598 +0800 committer Alibaba-HZY <daowu.hzy@cainiao.com> 1681226783 +0800 # 这是一个 13 个提交的组合。 parent 2a62022 author Alibaba-HZY <daowu.hzy@cainiao.com> 1681226598 +0800 committer Alibaba-HZY <daowu.hzy@cainiao.com> 1681226730 +0800 # 这是一个 8 个提交的组合。tree 46928987599714b071489a7b6d4957049e6ded7a parent 2a62022 author Alibaba-HZY <daowu.hzy@cainiao.com> 1681226598 +0800 committer Alibaba-HZY <daowu.hzy@cainiao.com> 1681226598 +0800 [core] Add streaming read from option (apache#778) # 这是提交说明 #8: [core] add test (apache#778) # 这是提交说明 apache#11: Update paimon-core/src/main/java/org/apache/paimon/CoreOptions.java Co-authored-by: Nicholas Jiang <programgeek@163.com> # 这是提交说明 apache#12: Update paimon-core/src/main/java/org/apache/paimon/CoreOptions.java Co-authored-by: Nicholas Jiang <programgeek@163.com> # 这是提交说明 apache#13: Update paimon-core/src/main/java/org/apache/paimon/CoreOptions.java Co-authored-by: Nicholas Jiang <programgeek@163.com> # 这是提交说明 apache#15: Update paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/AbstractFlinkTableFactory.java Co-authored-by: Nicholas Jiang <programgeek@163.com> # 这是提交说明 apache#16: Update paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/AbstractFlinkTableFactory.java Co-authored-by: Nicholas Jiang <programgeek@163.com> # 这是提交说明 apache#17: Update paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/AbstractFlinkTableFactory.java Co-authored-by: Nicholas Jiang <programgeek@163.com> # 这是提交说明 apache#18: [core] add test (apache#778) # 这是提交说明 apache#19: merger sdda 这是一个 3 个提交的组合。 [core] commit1 (apache#778) [core] commit2 (apache#778) [core] commit3 (apache#778) [core] do commit 1(apache#778) [core] do commit 2(apache#778) # 这是提交说明 apache#20: sdda 这是一个 3 个提交的组合。 [core] commit1 (apache#778) [core] commit2 (apache#778) [core] commit3 (apache#778)
Currently FileStoreExpire does not have an implementation. We need an implementation to clean up old snapshots and related files.