Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BP-47 (task5): Garbage collection support direct IO entrylogger #3256

Conversation

hangc0276
Copy link
Contributor

Motivation

Task 5 of BP-47, Garbage collection support direct IO entrylogger.

GC support requires that the entrylogger provides a way to retrieve all entrylogs which have been completely flushed to disk. Previously this was done by returning the least unflushed log id. However, this is problematic as it doesn't support the log ids wrapping around. It also means that GC has to start checking for log id existence from zero every time it boots.

Changes

This change replaces getLeastUnflushedLogId() with getFlushedLogIds(), to give the entrylogger full control of which logs should be considered for GC.

It also changes the CompactableLedgerStorage interface, removing getEntryLogger() and adding injection of the entrylogger to the GarbageCollectionThread. This makes testing easier.

Master Issue: #2943 , Subtask of #2932

Others

The commit is made by @ivankelly . This is the sub task of #2932 , which pushed out by @mauricebarnum. However, this PR contains too many changes and the number of code lines reaches 8K+, and it is hard to review. According to your suggestion #2932 (comment), and after communicate with @mauricebarnum, we are planing to divide the PR into 6 PRs, Please refer to #2943 (comment).

However, @mauricebarnum doesn't have enough time to deal with those issues, and we desperately need this feature. After communicated with @mauricebarnum and @merlimat , I help to work on divide the PRs and push them out. We are hoping to contain this feature in BookKeeper 4.16.0

Ivan Kelly and others added 6 commits May 3, 2022 00:26
GC support requires that the entrylogger provides a way to retrieve
all entrylogs which have been completely flushed to disk. Previously
this was done by returning the least unflushed log id. However, this
is problematic as it doesn't support the log ids wrapping around. It
also means that GC has to start checking for log id existence from
zero every time it boots.

This change replaces getLeastUnflushedLogId() with getFlushedLogIds(),
to give the entrylogger full control of which logs should be
considered for GC.

It also changes the CompactableLedgerStorage interface, removing
getEntryLogger() and adding injection of the entrylogger to the
GarbageCollectionThread. This makes testing easier.

(cherry picked from commit 853dd84)
@hangc0276
Copy link
Contributor Author

rerun failure checks

1 similar comment
@hangc0276
Copy link
Contributor Author

rerun failure checks

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eolivelli
Copy link
Contributor

eolivelli commented May 4, 2022

@dlg99 @merlimat @reddycharan @jvrao PTAL

@hangc0276
Copy link
Contributor Author

rerun failure checks

@merlimat merlimat added this to the 4.16.0 milestone May 4, 2022
@hangc0276
Copy link
Contributor Author

rerun failure checks

@eolivelli eolivelli merged commit b5a483a into apache:master May 5, 2022
@hangc0276 hangc0276 self-assigned this Jul 25, 2022
@zymap zymap mentioned this pull request Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants