Add throttle for rebuild entryMetadataMap #2963

Merged

hangc0276

Motivation

When a bookie restarts, the garbageCollectorThread rebuilds entryMetadataMap from all the entry log files in the ledger directory. In the normal case, it extracts the EntryLogMetadata from the index stored in the entry log file. If there is no index, it falls back to scanning the whole entry log file.

In one user's production environment, about 4% of the entry log files had no index: 3000 out of a total of 80000. The default entry log file size is 2GB, so the garbageCollectorThread reads 3000 * 2GB = 6TB of data with no rate limit. This keeps ledger disk I/O utilization high for dozens of minutes and hurts ledger read and write latency.

Modification

  1. Add a read rate limiter for scanning entry log files during the entryMetadataMap rebuild.
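
To illustrate the idea of limiting the scan's read rate, here is a minimal sketch in plain Java. It is not the code from this PR: `ScanThrottle`, its `reserve`/`acquire` methods, and the bytes-per-second parameter are all hypothetical names chosen for the example, and the clock is injected only so the pacing logic can be checked without sleeping.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper (an assumption for illustration, not the class added by this PR):
// paces a sequence of reads so that they average at most bytesPerSecond.
final class ScanThrottle {
    private final long bytesPerSecond;
    private final LongSupplier nanoClock; // injectable clock, e.g. System::nanoTime
    private long nextFreeNanos;           // earliest time the next chunk may start

    ScanThrottle(long bytesPerSecond, LongSupplier nanoClock) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.nanoClock = nanoClock;
        this.nextFreeNanos = nanoClock.getAsLong();
    }

    // Books `bytes` against the budget and returns how long (in nanoseconds)
    // the caller should wait before reading them.
    synchronized long reserve(long bytes) {
        long now = nanoClock.getAsLong();
        long start = Math.max(now, nextFreeNanos);
        // Time this chunk "costs" at the configured rate; double math avoids long overflow.
        long costNanos = (long) (bytes * 1_000_000_000.0 / bytesPerSecond);
        nextFreeNanos = start + costNanos;
        return start - now;
    }

    // Blocking variant: call this before reading each chunk of the entry log.
    void acquire(long bytes) throws InterruptedException {
        long waitNanos = reserve(bytes);
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

A scan loop would call `acquire(chunkSize)` before each read. The trade-off is rebuild time: with a hypothetical cap of 100 MB/s, scanning 6TB would take about 6e12 / 1e8 = 60,000 seconds, roughly 17 hours, so the limit has to be chosen against how long the rebuild may run.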

@nicoloboschi nicoloboschi left a comment

LGTM

@eolivelli eolivelli left a comment

Overall lgtm

But I left one comment

// entry log
try {
return extractEntryLogMetadataFromIndex(entryLogId);
} catch (Exception e) {

Can we catch specific exceptions here?
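
One possible shape for this, as a sketch only: catch `IOException` for the fallback, rethrow `FileNotFoundException` because scanning a missing file cannot succeed, and let unchecked exceptions propagate. `EntryLogMetadataLoader`, `Extractor`, and the `String` result are hypothetical stand-ins, and which exceptions the real index extraction throws is an assumption here.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical stand-in for the method under review; names and types are illustrative.
final class EntryLogMetadataLoader {

    // One way of producing metadata for an entry log (index lookup or full scan).
    interface Extractor {
        String extract(long entryLogId) throws IOException;
    }

    static String load(long entryLogId, Extractor fromIndex, Extractor byScanning)
            throws IOException {
        try {
            return fromIndex.extract(entryLogId);
        } catch (FileNotFoundException e) {
            // The log file itself is gone, so a scan would fail too: surface the error.
            throw e;
        } catch (IOException e) {
            // Index missing or unreadable: fall back to scanning the entry log.
            return byScanning.extract(entryLogId);
        }
        // Unchecked exceptions (programming errors) are deliberately not caught.
    }
}
```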

@hangc0276

rerun failure checks

@eolivelli eolivelli left a comment

Lgtm

// First try to extract the EntryLogMetadata from the index, if there's no index then fallback to scanning the
// entry log
try {
return extractEntryLogMetadataFromIndex(entryLogId);

I wonder if you could add a test that injects a failure here using PowerMock and verifies that the bookie still works when we fall into the catch clause.
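
As a rough illustration of such a test, the sketch below injects the failure by overriding a method in a test subclass rather than with PowerMock; that substitution is deliberate, to keep the example self-contained. `MetadataReader` and `FailingIndexReader` are hypothetical classes, not BookKeeper code.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the class under test: index first, scan as fallback.
class MetadataReader {
    String getMetadata(long entryLogId) throws IOException {
        try {
            return fromIndex(entryLogId);
        } catch (IOException e) {
            return byScanning(entryLogId);
        }
    }

    protected String fromIndex(long entryLogId) throws IOException {
        return "index:" + entryLogId;
    }

    protected String byScanning(long entryLogId) throws IOException {
        return "scan:" + entryLogId;
    }
}

// Test double: forces the index path to fail and counts how often the scan runs.
class FailingIndexReader extends MetadataReader {
    final AtomicInteger scans = new AtomicInteger();

    @Override
    protected String fromIndex(long entryLogId) throws IOException {
        throw new IOException("injected index failure");
    }

    @Override
    protected String byScanning(long entryLogId) throws IOException {
        scans.incrementAndGet();
        return super.byScanning(entryLogId);
    }
}
```

A test then asserts that `getMetadata` still returns a result and that the scan ran exactly once when the index read fails.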

@dlg99
dlg99 commented Feb 14, 2022

@hangc0276 please rebase/resolve conflicts

@hangc0276

rerun failure checks

@hangc0276 hangc0276 force-pushed the chenhang/add_throttle_for_build_entryMetadataMap branch from ecef7f9 to b8b3655 on February 21, 2022 02:07
@hangc0276 hangc0276 force-pushed the chenhang/add_throttle_for_build_entryMetadataMap branch from c695fea to 87ef9db on February 22, 2022 03:49
@dlg99 dlg99 added this to the 4.15.0 milestone Mar 10, 2022
@dlg99 dlg99 merged commit 181a6dc into apache:master Mar 10, 2022
dlg99 pushed a commit to datastax/bookkeeper that referenced this pull request Nov 19, 2022

Reviewers: Nicolò Boschi <boschi1997@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2963 from hangc0276/chenhang/add_throttle_for_build_entryMetadataMap

(cherry picked from commit 181a6dc)