checkAllLedgers in Auditor supports read throttle #2973
Conversation
rerun failure checks
averageEntrySize needs synchronization (or a design change to avoid it)
@@ -76,6 +84,11 @@
@Override
public void readEntryComplete(int rc, long ledgerId, long entryId,
        ByteBuf buffer, Object ctx) {
    if (readThrottle != null && buffer != null) {
        int readSize = buffer.readableBytes();
        averageEntrySize = (int) (averageEntrySize * AVERAGE_ENTRY_SIZE_RATIO
This is async code with concurrent reads, plus updates of a non-volatile field and later reads from that non-volatile field.
You'll need to add synchronization around use of this field.
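One way to address the reviewer's concern without a `synchronized` block is to hold the moving average in an `AtomicInteger` and update it with `updateAndGet`. This is a minimal sketch only; the class name `AverageEntrySizeTracker` and the smoothing factor `0.9` are assumptions for illustration (the PR's actual `AVERAGE_ENTRY_SIZE_RATIO` value is not shown here), and the final PR dropped the average-size approach entirely.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical thread-safe version of the averageEntrySize update.
class AverageEntrySizeTracker {
    // Assumed values, mirroring INITIAL_AVERAGE_ENTRY_SIZE from the diff
    // and a guessed smoothing ratio.
    private static final int INITIAL_AVERAGE_ENTRY_SIZE = 1024;
    private static final double AVERAGE_ENTRY_SIZE_RATIO = 0.9;

    private final AtomicInteger averageEntrySize =
            new AtomicInteger(INITIAL_AVERAGE_ENTRY_SIZE);

    // Safe to call from concurrent read callbacks: the exponential
    // moving average is updated atomically via compare-and-set.
    void record(int readSize) {
        averageEntrySize.updateAndGet(avg ->
                (int) (avg * AVERAGE_ENTRY_SIZE_RATIO
                        + readSize * (1 - AVERAGE_ENTRY_SIZE_RATIO)));
    }

    int get() {
        return averageEntrySize.get();
    }
}
```

The atomic update avoids the lost-update race the reviewer points out, though as discussed below the design was ultimately simplified to avoid tracking sizes at all.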
OK I will fix
@@ -51,6 +51,14 @@
public final BookieClient bookieClient;
public final BookieWatcher bookieWatcher;

private static int averageEntrySize;

private static final int INITIAL_AVERAGE_ENTRY_SIZE = 1024;
Consider simplifying by limiting the number of entries in flight (using a Semaphore, released when each entry is processed) instead of guessing average sizes and rate limiting by bytes. Or rate limit by number of entries.
This also removes the need for synchronization.
One ledger may have 512K entries, the next 1K entries, and a third mixed sizes.
I don't see how tracking the average size significantly helps in this case, especially if backpressure is not enabled.
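The reviewer's suggestion can be sketched as follows. This is not the PR's actual implementation, just a minimal illustration of the technique: a `Semaphore` bounds the number of in-flight entry reads, a permit is acquired before each async read and released in the completion callback, so no shared mutable field needs synchronization. The class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: throttle auditor reads by number of entries
// in flight rather than by estimated bytes.
class EntryReadThrottle {
    private final Semaphore inFlight;

    EntryReadThrottle(int maxInFlightEntries) {
        this.inFlight = new Semaphore(maxInFlightEntries);
    }

    // Block until a permit is available before issuing the next
    // asynchronous entry read.
    void beforeRead() {
        inFlight.acquireUninterruptibly();
    }

    // Called from the read callback (success or failure alike),
    // so permits are never leaked.
    void afterReadComplete() {
        inFlight.release();
    }

    int availablePermits() {
        return inFlight.availablePermits();
    }
}
```

Because the semaphore itself is thread-safe, concurrent completion callbacks can release permits without any extra locking, which is the "removes the need for synchronization" point above.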
OK I will fix
I will rate limit by number of entries
I made the changes in accordance with your suggestions.
Please review again, thank you! @dlg99
LGTM
@michaeljmarshall PTAL, thanks!
@eolivelli @nicoloboschi PTAL, thanks!
But I am still not sure how it addresses the root cause. I understand that throttling with a semaphore reduces the pressure on the bookie, and there is another mechanism for that too: percentageOfLedgerFragmentToBeVerified (a slight misnomer; it actually checks a percentage of the entries in each ledger fragment). I understand throttling will reduce timeouts from the bookie, but timeouts can still happen, and will happen. My question is: why not address the issue that an occasional timeout should not be considered a failure, or maybe should be retried? Thoughts? Looks good otherwise.
percentageOfLedgerFragmentToBeVerified is not very useful. The parameter applies per fragment, but in practice most fragments have only one entry, and at least the first and last entries of a fragment are always checked.
I agree that the timeout problem should be solved through retries, but at the same time I think the rate should also be limited, to prevent checking all ledgers from putting too much pressure on the cluster! @pkumar-singh
ping
Sure. It may not be sufficient, but it's accurate regardless. So OK from my end.
ping
* support read throttle in checkAllLedgers
* using number of entries in flight (using Semaphore, releasing when processed) instead of guessing avg sizes and rate limiting
* check style

(cherry picked from commit 525a4a0)
(cherry picked from commit ebd5e4c)
Motivation
When checkAllLedgers runs on its periodic schedule, it tries to read almost all entry data. This can cause reads from the bookies to time out, and the affected ledgers are then incorrectly marked via markLedgerUnderreplicatedAsync.
Every time the check-all-ledgers run executes, a large number of ledgers are marked as under-replicated; this is clearly a wrong judgment.
In our cluster, the execution cycle of checkAllLedgers is one week. We found that a large number of ledgers were marked via markLedgerUnderreplicatedAsync each time it ran. Analyzing the logs showed some bookie read timeouts:
Due to the large number of read requests, the pressure on the cluster was too high, and Pulsar's write latency kept climbing until the check completed: