Add throttle for rebuild entryMetadataMap #2963
LGTM
Overall LGTM, but I left one comment.
// entry log
try {
    return extractEntryLogMetadataFromIndex(entryLogId);
} catch (Exception e) {
Can we catch specific exceptions here?
rerun failure checks
Lgtm
// First try to extract the EntryLogMetadata from the index, if there's no index then fallback to scanning the
// entry log
try {
    return extractEntryLogMetadataFromIndex(entryLogId);
I wonder if you could add a test that injects a failure here using PowerMock and test that the bookie works even if we fall into the catch clause
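A minimal sketch of the kind of failure-injection test suggested here. It uses a plain subclass override rather than PowerMock, and every name in it (`MetadataExtractor`, `extractFromIndex`, `extractByScanning`, `FailingIndexExtractor`) is a hypothetical stand-in, not BookKeeper's actual API. It also assumes the missing-index case surfaces as an `IOException`, which may not match the real code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FallbackSketch {

    /** Hypothetical stand-in for the entry logger; only the fallback shape matters. */
    public static class MetadataExtractor {
        /** Records which paths were taken, so a test can assert on them. */
        public final List<String> calls = new ArrayList<>();

        /** Fast path: read the metadata from the index (assumed to throw IOException if absent). */
        protected String extractFromIndex(long entryLogId) throws IOException {
            calls.add("index");
            return "meta-from-index-" + entryLogId;
        }

        /** Slow path: scan the whole entry log file. */
        protected String extractByScanning(long entryLogId) throws IOException {
            calls.add("scan");
            return "meta-from-scan-" + entryLogId;
        }

        public String getMetadata(long entryLogId) throws IOException {
            try {
                return extractFromIndex(entryLogId);
            } catch (IOException e) {
                // Narrow catch: only I/O failures trigger the fallback;
                // runtime exceptions still propagate to the caller.
                return extractByScanning(entryLogId);
            }
        }
    }

    /** Test double that injects a failure on the fast path. */
    public static class FailingIndexExtractor extends MetadataExtractor {
        @Override
        protected String extractFromIndex(long entryLogId) throws IOException {
            calls.add("index");
            throw new IOException("injected: no index in entry log " + entryLogId);
        }
    }
}
```

Calling `getMetadata` on a `FailingIndexExtractor` should return the scanned metadata and leave `calls` as `["index", "scan"]`, which is the behavior a test for the catch clause would assert.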
@hangc0276 please rebase/resolve conflicts
ecef7f9 to b8b3655
c695fea to 87ef9db
When a bookie restarts, the garbageCollectorThread rebuilds entryMetadataMap from all the entry log files in the ledger directory. In the normal case, it extracts the EntryLogMetadata from the index in each entry log file. If a file has no index, it falls back to scanning the whole entry log file.

In a user's production environment, about 4% of the entry log files had no index: 3000 out of 80000. The default entry log file size is 2GB, so the garbageCollectorThread reads 3000 * 2GB = 6TB of data with no speed limit. This drives ledger disk IO utilization high for dozens of minutes and hurts ledger read and write latency.

1. Add a read rate limiter for scanning entry log files during the entryMetadataMap rebuild.

Reviewers: Nicolò Boschi <boschi1997@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2963 from hangc0276/chenhang/add_throttle_for_build_entryMetadataMap

(cherry picked from commit 181a6dc)
Motivation
When a bookie restarts, the garbageCollectorThread rebuilds entryMetadataMap from all the entry log files in the ledger directory. In the normal case, it extracts the EntryLogMetadata from the index in each entry log file. If a file has no index, it falls back to scanning the whole entry log file.
In a user's production environment, about 4% of the entry log files had no index: 3000 out of 80000. The default entry log file size is 2GB, so the garbageCollectorThread reads 3000 * 2GB = 6TB of data with no speed limit. This drives ledger disk IO utilization high for dozens of minutes and hurts ledger read and write latency.
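To make the idea of a throttled scan concrete, here is a minimal sketch of a byte-rate limiter wrapped around a sequential read loop. It is not the code from this PR: `ByteRateLimiter`, `scan`, and the chunk size are hypothetical names and values chosen for illustration, and the limiter is a simple "cumulative bytes versus elapsed time" pacing scheme rather than any particular library's implementation.

```java
import java.io.IOException;
import java.io.InputStream;

public class ThrottledScanSketch {

    /** Hypothetical blocking byte-rate limiter; not BookKeeper's class. */
    public static final class ByteRateLimiter {
        private final long bytesPerSecond;
        private final long startNanos = System.nanoTime();
        private long bytesSoFar = 0;

        public ByteRateLimiter(long bytesPerSecond) {
            if (bytesPerSecond <= 0) {
                throw new IllegalArgumentException("bytesPerSecond must be positive");
            }
            this.bytesPerSecond = bytesPerSecond;
        }

        /**
         * Blocks until the cumulative byte count fits under the configured rate.
         * Pacing is measured from construction time, so idle periods accumulate
         * credit; a production limiter would usually cap such bursts.
         */
        public synchronized void acquire(int bytes) throws InterruptedException {
            bytesSoFar += bytes;
            // Earliest elapsed time at which this many bytes are allowed.
            long earliestNanos = (long) (bytesSoFar * 1e9 / bytesPerSecond);
            long waitNanos = earliestNanos - (System.nanoTime() - startNanos);
            if (waitNanos > 0) {
                Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
            }
        }
    }

    /** Reads the stream to the end in chunks, pacing reads through the limiter. */
    public static long scan(InputStream in, int chunkSize, ByteRateLimiter limiter)
            throws IOException, InterruptedException {
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            limiter.acquire(n); // wait here if we are reading faster than allowed
            total += n;
        }
        return total;
    }
}
```

With a limiter like this, the fallback scan of index-less entry log files would proceed at a bounded rate instead of saturating the ledger disk, at the cost of a longer rebuild after restart.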
Modification