Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-25994] Implement FileStoreExpire #17

Merged
merged 2 commits into from
Feb 16, 2022
Merged

[FLINK-25994] Implement FileStoreExpire #17

merged 2 commits into from
Feb 16, 2022

Conversation

tsreaper
Copy link
Contributor

@tsreaper tsreaper commented Feb 8, 2022

Currently FileStoreExpire does not have an implementation. We need an implementation to clean up old snapshots and related files.

private final ManifestList manifestList;
private final FileStoreScan scan;

public FileStoreExpireImpl(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking that we can provide three options:

  • snapshot-expire.retention: retention time for snapshot changes
  • snapshot-expire.num-retained
  • snapshot-expire.check-interval

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expire is an operation called by users (for example by connector sinks). snapshot-expire.check-interval should be checked by users, not by the operation itself.

As for retention time I'll implement this.

Copy link
Contributor

@JingsongLi JingsongLi Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you modify the caller in a separate PR too? StoreGlobalCommitter?

* Deleted sst files in current snapshot. They can be safely deleted from file system if
* this snapshot expires.
*/
List<ManifestEntry> deletedFiles();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deletedFiles is misunderstanding.
I prefer to add a new method planChangeFiles(), it returns all files. The caller can compute physical deleted files.

return;
}

// binary search for the last snapshot to expire
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not cost effective to use dichotomous lookups (assuming there are enough surviving snapshots).
Just start traversing from the beginning.

Set<Path> sstInUse = new HashSet<>();
FileStorePathFactory.SstPathFactoryCache sstPathFactoryCache =
new FileStorePathFactory.SstPathFactoryCache(pathFactory);
for (ManifestEntry entry : scan.withSnapshot(nextSnapshot.id()).plan().files()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see large number of repeat reads. We can add a TODO to add manifest cache in scan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optimize this by expiring all snapshots at once so no repeated reads.

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit 6ec208b into apache:master Feb 16, 2022
Alibaba-HZY pushed a commit to Alibaba-HZY/incubator-paimon that referenced this pull request Apr 11, 2023
parent 2a62022
author Alibaba-HZY <daowu.hzy@cainiao.com> 1681226598 +0800
committer Alibaba-HZY <daowu.hzy@cainiao.com> 1681226783 +0800

# 这是一个 13 个提交的组合。
parent 2a62022
author Alibaba-HZY <daowu.hzy@cainiao.com> 1681226598 +0800
committer Alibaba-HZY <daowu.hzy@cainiao.com> 1681226730 +0800

# 这是一个 8 个提交的组合。tree 46928987599714b071489a7b6d4957049e6ded7a
parent 2a62022
author Alibaba-HZY <daowu.hzy@cainiao.com> 1681226598 +0800
committer Alibaba-HZY <daowu.hzy@cainiao.com> 1681226598 +0800

[core] Add streaming read from option (apache#778)

# 这是提交说明 apache#8:

[core] add test (apache#778)

# 这是提交说明 apache#11:

Update paimon-core/src/main/java/org/apache/paimon/CoreOptions.java

Co-authored-by: Nicholas Jiang <programgeek@163.com>
# 这是提交说明 apache#12:

Update paimon-core/src/main/java/org/apache/paimon/CoreOptions.java

Co-authored-by: Nicholas Jiang <programgeek@163.com>
# 这是提交说明 apache#13:

Update paimon-core/src/main/java/org/apache/paimon/CoreOptions.java

Co-authored-by: Nicholas Jiang <programgeek@163.com>
# 这是提交说明 apache#15:

Update paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/AbstractFlinkTableFactory.java

Co-authored-by: Nicholas Jiang <programgeek@163.com>
# 这是提交说明 apache#16:

Update paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/AbstractFlinkTableFactory.java

Co-authored-by: Nicholas Jiang <programgeek@163.com>
# 这是提交说明 apache#17:

Update paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/AbstractFlinkTableFactory.java

Co-authored-by: Nicholas Jiang <programgeek@163.com>
# 这是提交说明 apache#18:

[core] add test (apache#778)

# 这是提交说明 apache#19:

merger

sdda 这是一个 3 个提交的组合。

[core]  commit1
 (apache#778)

[core]  commit2
 (apache#778)

[core]  commit3
 (apache#778)

[core]  do commit 1(apache#778)

[core]  do commit 2(apache#778)

# 这是提交说明 apache#20:

sdda 这是一个 3 个提交的组合。

[core]  commit1
 (apache#778)

[core]  commit2
 (apache#778)

[core]  commit3
 (apache#778)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants