
checkAllLedgers in Auditor supports read throttle #2973

Merged

Conversation

@lordcheng10 (Contributor) commented on Jan 5, 2022

Motivation

When checkAllLedgers runs on its periodic schedule, it tries to read almost all entry data. The resulting load can cause bookie reads to time out, and the affected ledgers are then incorrectly marked via markLedgerUnderreplicatedAsync.
Every time checkAllLedgers executes, a large number of ledgers are marked as under-replicated, which is clearly a wrong judgment.
In our cluster, checkAllLedgers runs once a week, and each run marks a large number of ledgers with markLedgerUnderreplicatedAsync. Analyzing the logs shows read timeouts from the bookies:
[screenshots: bookie read timeout log excerpts]

Because of the large number of read requests, the pressure on the cluster becomes too high, and Pulsar's write latency keeps climbing until the check completes:
[screenshot: Pulsar write latency graph]

@lordcheng10 (Author) commented:
rerun failure checks

1 similar comment
@lordcheng10 (Author) commented:
rerun failure checks

@dlg99 (Contributor) left a comment:

averageEntrySize needs synchronization (or a design change to avoid it)

@@ -76,6 +84,11 @@
    @Override
    public void readEntryComplete(int rc, long ledgerId, long entryId,
                                  ByteBuf buffer, Object ctx) {
        if (readThrottle != null && buffer != null) {
            int readSize = buffer.readableBytes();
            averageEntrySize = (int) (averageEntrySize * AVERAGE_ENTRY_SIZE_RATIO
@dlg99 (Contributor) commented on this diff:

This is async code with concurrent reads, plus updates to and later reads from a non-volatile field.
You'll need to add synchronization around uses of this field.
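As a rough illustration of the kind of fix being asked for, the running average could be kept in an AtomicInteger so that concurrent readEntryComplete callbacks can update it without an explicit lock. This is a minimal sketch under the assumption of an exponentially weighted average with a weighting constant like AVERAGE_ENTRY_SIZE_RATIO from the hunk above; it is not the code that was ultimately merged, since the average-size approach was dropped later in this review.

import java.util.concurrent.atomic.AtomicInteger;

class AverageEntrySizeTracker {
    // Assumed weighting factor; the real constant's value is not shown in the hunk.
    private static final double AVERAGE_ENTRY_SIZE_RATIO = 0.9;
    private static final int INITIAL_AVERAGE_ENTRY_SIZE = 1024;

    // AtomicInteger gives safe publication and atomic read-modify-write
    // across concurrent read callbacks, avoiding the non-volatile-field problem.
    private final AtomicInteger averageEntrySize = new AtomicInteger(INITIAL_AVERAGE_ENTRY_SIZE);

    void onEntryRead(int readSize) {
        // Exponentially weighted moving average, updated atomically.
        averageEntrySize.updateAndGet(prev ->
                (int) (prev * AVERAGE_ENTRY_SIZE_RATIO + readSize * (1 - AVERAGE_ENTRY_SIZE_RATIO)));
    }

    int currentAverage() {
        return averageEntrySize.get();
    }
}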

@lordcheng10 (Author) replied:

OK I will fix

@@ -51,6 +51,14 @@
    public final BookieClient bookieClient;
    public final BookieWatcher bookieWatcher;

    private static int averageEntrySize;

    private static final int INITIAL_AVERAGE_ENTRY_SIZE = 1024;
@dlg99 (Contributor) commented on this diff:

Consider simplifying by limiting the number of entries in flight (using a Semaphore, released when an entry is processed) instead of guessing average sizes and rate limiting, or rate limit by number of entries. This also removes the need for synchronization.

One ledger may have entries of 512 KB, the next entries of 1 KB, and a third a mix of both. I don't see how tracking the average size significantly helps in this case, especially if backpressure is not enabled.
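A rough sketch of the semaphore-based throttle being suggested here: acquire a permit before issuing each entry read and release it when the entry has been processed. The names (ThrottledEntryChecker, maxInFlightEntries, asyncReadEntry) are illustrative placeholders, not the identifiers used in the final patch.

import java.util.concurrent.Semaphore;

class ThrottledEntryChecker {
    // Caps how many entry reads may be outstanding at any time.
    private final Semaphore inFlightReads;

    ThrottledEntryChecker(int maxInFlightEntries) {
        this.inFlightReads = new Semaphore(maxInFlightEntries);
    }

    void checkEntry(long ledgerId, long entryId) throws InterruptedException {
        inFlightReads.acquire();              // blocks once the limit is reached
        boolean submitted = false;
        try {
            // Issue the asynchronous read; the callback releases the permit
            // once the entry has been processed.
            asyncReadEntry(ledgerId, entryId, rc -> inFlightReads.release());
            submitted = true;
        } finally {
            if (!submitted) {
                inFlightReads.release();      // don't leak a permit if submission fails
            }
        }
    }

    interface ReadCompleteCallback {
        void readComplete(int rc);
    }

    // Placeholder for the real BookKeeper asynchronous read call.
    void asyncReadEntry(long ledgerId, long entryId, ReadCompleteCallback cb) {
        // ... issue the read and invoke cb.readComplete(rc) when done ...
    }
}

With a cap on in-flight entries there is no shared size estimate to protect, so the synchronization question above disappears as well.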

@lordcheng10 (Author) replied:

OK I will fix

@lordcheng10 (Author) replied:

I will rate limit by number of entries

@lordcheng10 (Author) replied:

I made the changes in accordance with your suggestions.
Please review again, thank you! @dlg99

lordcheng10 added 2 commits:

* using number of entries in flight (using Semaphore, releasing when processed) instead of guessing avg sizes and rate limiting.
* using number of entries in flight (using Semaphore, releasing when processed) instead of guessing avg sizes and rate limiting.
lordcheng10 requested a review from dlg99 on Jan 7, 2022
@dlg99 (Contributor) left a comment:

LGTM

@lordcheng10 (Author) commented:
@michaeljmarshall PTAL, thanks!

@lordcheng10 (Author) commented:
@eolivelli @nicoloboschi PTAL, thanks!

@pkumar-singh (Member) commented on Jan 8, 2022

But I am still not sure how it addresses the root cause. I understand that throttling with a semaphore reduces the pressure on the bookie, and there is another mechanism for that too: percentageOfLedgerFragmentToBeVerified (a slight misnomer; it actually checks a percentage of the entries in each ledger fragment). I understand throttling will reduce timeouts from the bookie, but timeouts can and will still happen. My question is: why not also address the issue so that an occasional timeout is not considered a failure, or is perhaps retried? Thoughts? Looks good otherwise.

@lordcheng10 (Author) commented on Jan 9, 2022

(Quoting @pkumar-singh's comment above.)

percentageOfLedgerFragmentToBeVerified is not very useful here. The parameter applies per fragment, but in practice most fragments have only one entry, and at least the first and last entry of a fragment are always checked anyway.
We use the default value of percentageOfLedgerFragmentToBeVerified, which is 0.
Even so, every time checkAllLedgers executes, a large number of ledgers are marked as under-replicated, which is clearly a wrong judgment. @pkumar-singh
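For illustration only, a percentage-based selection of a fragment's entries might look like the sketch below; this is a simplified illustration of the idea behind percentageOfLedgerFragmentToBeVerified, not the actual LedgerChecker code. With a one-entry fragment the first and last entry coincide and are always checked, so the configured percentage has no effect.

import java.util.ArrayList;
import java.util.List;

class FragmentCheckPlanner {
    // Picks which entry ids of a fragment to verify: the first and last entry
    // are always included; the percentage only adds entries in between.
    static List<Long> entriesToCheck(long firstEntryId, long lastEntryId, double percentage) {
        List<Long> toCheck = new ArrayList<>();
        toCheck.add(firstEntryId);
        if (lastEntryId > firstEntryId) {
            toCheck.add(lastEntryId);
        }
        long total = lastEntryId - firstEntryId + 1;
        long extra = (long) (total * percentage / 100.0);
        if (extra > 0) {
            long step = Math.max(1, total / (extra + 1));
            for (long i = 1; i <= extra; i++) {
                long candidate = firstEntryId + i * step;
                if (candidate < lastEntryId) {
                    toCheck.add(candidate);
                }
            }
        }
        return toCheck;
    }
}

When most fragments hold a single entry, the selection collapses to that one entry regardless of the configured percentage, which is why the parameter does not relieve the read load here.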

@lordcheng10 (Author) commented:
(Quoting @pkumar-singh's comment above.)

I agree that the timeout problem should be solved with retries, but at the same time I think the read rate should also be limited so that checkAllLedgers does not put too much pressure on the cluster. @pkumar-singh

@lordcheng10 (Author) commented:
ping

@pkumar-singh (Member) replied:
(Quoting the exchange above.)

Sure. It may not be sufficient, but it's accurate regardless. So OK from my end.

@lordcheng10 (Author) commented:
ping

pkumar-singh merged commit 525a4a0 into apache:master on Jan 11, 2022
hangc0276 pushed a commit to hangc0276/bookkeeper that referenced this pull request Nov 7, 2022
* support read throttle in checkAllLedgers

* using number of entries in flight (using Semaphore, releasing when processed) instead of guessing avg sizes and rate limiting.

* using number of entries in flight (using Semaphore, releasing when processed) instead of guessing avg sizes and rate limiting.

* check style

(cherry picked from commit 525a4a0)
hangc0276 pushed a commit to hangc0276/bookkeeper that referenced this pull request Nov 7, 2022
* support read throttle in checkAllLedgers

* using number of entries in flight (using Semaphore, releasing when processed) instead of guessing avg sizes and rate limiting.

* using number of entries in flight (using Semaphore, releasing when processed) instead of guessing avg sizes and rate limiting.

* check style

(cherry picked from commit 525a4a0)
nicoloboschi pushed a commit to datastax/bookkeeper that referenced this pull request Jan 11, 2023
* support read throttle in checkAllLedgers

* using number of entries in flight (using Semaphore, releasing when processed) instead of guessing avg sizes and rate limiting.

* using number of entries in flight (using Semaphore, releasing when processed) instead of guessing avg sizes and rate limiting.

* check style

(cherry picked from commit 525a4a0)
(cherry picked from commit ebd5e4c)