Issue #1791: Read Submission should bypass OSE Threads #1792

nicmichael · 2018-11-06T19:56:10Z

Motivation

Profiling of our BookKeeper client code for read requests shows that client threads spend half of their time dispatching requests to OrderedExecutors (just the dispatch itself, not the execution inside OSE): 54% of their CPU time is spent in OrderedExecutor.executeOrdered() (called by LedgerHandle.readEntriesInternalAsync()). The high time spent submitting requests to OSE is largely caused by Linux scheduling cost, that is, the cost of dispatching the OSE thread to a CPU: threads spend 42% of total time (about three quarters of the executeOrdered() time) in Unsafe.unpark(), which is essentially Linux scheduling/dispatching of another thread.

Changes

This change executes read submissions (PendingReadOp) on read-only ledger handles directly inside the client thread instead of submitting them to Ordered Executors.

Tests with a prototype have shown significant improvements in both overall CPU consumption and read latency. The additional work client threads have to do (dispatching the read requests to Netty) is roughly the same as the saved cost of dispatching to OSE, so the change turns out to be neutral for the CPU consumption of client threads. In some experiments, the savings even exceed the additional work, and client threads consume less CPU even though they "do more". It also frees up a lot of resources in OSE threads. Since it eliminates one context switch in read submission and also avoids serialization of reads to the same ledger (or ledgers hashing to the same OSE), it also reduces read latency. For a mixed read-write workload (14,000 reads/sec on read-only ledgers, 4,000 writes/sec on another set of ledgers), this change has reduced CPU consumption of OSE threads by 25%, kept CPU consumption of client (and Netty) threads the same, and yielded a 6% improvement in read latency (as measured by the BK client).
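As a rough illustration of the dispatch rule described above, here is a minimal sketch. It is not the code from this PR: `SketchLedgerHandle`, `submitRead`, and `ReadDispatchSketch` are hypothetical names, and a plain single-thread `ExecutorService` stands in for an ordered executor thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a ledger handle; not the real BookKeeper class. */
final class SketchLedgerHandle {
    private final boolean readOnly;
    private final ExecutorService orderedExecutor;

    SketchLedgerHandle(boolean readOnly, ExecutorService orderedExecutor) {
        this.readOnly = readOnly;
        this.orderedExecutor = orderedExecutor;
    }

    /** Runs the read op inline for read-only handles, otherwise hands it to the executor. */
    void submitRead(Runnable readOp) {
        if (readOnly) {
            readOp.run();                     // no handoff: the caller thread does the work
        } else {
            orderedExecutor.execute(readOp);  // original path: queue the op for an executor thread
        }
    }
}

final class ReadDispatchSketch {
    /** Submits one read and returns the thread that actually executed it. */
    static Thread threadThatRanRead(boolean readOnly, ExecutorService executor) {
        CompletableFuture<Thread> ranOn = new CompletableFuture<>();
        new SketchLedgerHandle(readOnly, executor)
                .submitRead(() -> ranOn.complete(Thread.currentThread()));
        return ranOn.join();
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Thread caller = Thread.currentThread();
            System.out.println("read-only ran on caller:  "
                    + (threadThatRanRead(true, executor) == caller));
            System.out.println("read-write ran on caller: "
                    + (threadThatRanRead(false, executor) == caller));
        } finally {
            executor.shutdown();
        }
    }
}
```

Running `main` should report `true` for the read-only handle and `false` for the read-write one, which is the behavioral difference the description relies on: the read-only path never wakes an executor thread.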

Master Issue: #1791: Read Submission should bypass OSE Threads


eolivelli

Can we add a test case? We already have Mockito-based tests for PendingAddOp.

I am not sure we can enable this by default. What about having a configuration flag?

sijie · 2018-11-06T21:48:37Z

@eolivelli

Can we add some test case?

I don't think we need test cases. A read op is a common enough operation that it should be covered by the existing test cases.

I am not sure we can enable this by default,

I think the change is enabled only on read-only handles. That seems to be fine.

nicmichael · 2018-11-06T23:00:02Z

@eolivelli I looked at the existing test cases. As sijie pointed out, there's quite a few existing ones that test reads on both read-only as well as read-write ledger handles. I hope that would be sufficient. Or were you thinking of any kind of special tests?

eolivelli · 2018-11-07T07:58:39Z

I was talking about adding a test for the fact that if the handle is read-only, the operation is not submitted to the OSE; otherwise it uses the OSE.

Not a blocker.
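A test along the lines suggested above could count how many tasks reach the executor. The following is only a sketch under assumptions: `CountingExecutor`, `ReadPathCheck`, and the `submitRead` rule are hypothetical, written with plain JDK types rather than Mockito or the real BookKeeper client classes.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

/** Executor that runs tasks inline and counts how many were handed to it. */
final class CountingExecutor implements Executor {
    final AtomicInteger submitted = new AtomicInteger();

    @Override
    public void execute(Runnable task) {
        submitted.incrementAndGet();
        task.run();
    }
}

final class ReadPathCheck {
    /** Hypothetical dispatch rule mirroring the PR description. */
    static void submitRead(boolean readOnlyHandle, Executor ose, Runnable readOp) {
        if (readOnlyHandle) {
            readOp.run();         // bypasses the executor entirely
        } else {
            ose.execute(readOp);  // goes through the executor
        }
    }

    /** Returns how many tasks reached the executor for one read on the given kind of handle. */
    static int executorSubmissionsForOneRead(boolean readOnlyHandle) {
        CountingExecutor ose = new CountingExecutor();
        AtomicInteger ran = new AtomicInteger();
        submitRead(readOnlyHandle, ose, ran::incrementAndGet);
        if (ran.get() != 1) {
            throw new IllegalStateException("read op must run exactly once");
        }
        return ose.submitted.get();
    }
}
```

With this rule, a read on a read-only handle yields zero executor submissions and a read on a read-write handle yields one, which is the pair of assertions such a test would make.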

eolivelli

Nice improvement.

Thank you @nicmichael

eolivelli · 2018-11-07T08:02:59Z

No need for a flag

sijie · 2018-11-07T17:35:57Z

@merlimat please review this, since you commented on #1791

Master Issue: #1791: Read Submission should bypass OSE Threads

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Andrey Yegorov <None>, Sijie Guo <sijie@apache.org>, Matteo Merli <mmerli@apache.org>

This closes #1792 from nicmichael/DirectRead, closes #1791

lhotari · 2022-03-13T19:55:33Z

In Pulsar, apache/pulsar#14436 seems to be caused by a thread safety issue in the BookKeeper client. It looks like it is caused by the changes in this PR.

The reason there's a thread safety issue is that a network request-response call doesn't establish a happens-before relationship. A happens-before visibility guarantee is needed because there is no synchronization in the PendingReadOp handling.
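For context on the guarantee mentioned above, the `java.util.concurrent` package documentation states that actions in a thread before submitting a task to an `Executor` happen-before the task's execution begins, and that the task's actions happen-before `Future.get()` returns. The sketch below only illustrates that handoff guarantee with plain fields; `HandoffVisibilitySketch` and `OpState` are hypothetical names, and it does not reproduce the Pulsar issue.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class HandoffVisibilitySketch {
    /** Plain (non-volatile, unsynchronized) state, standing in for fields of an op object. */
    static final class OpState {
        int firstEntry;
        int lastEntry;
    }

    /** Writes plain fields on the caller thread and reads them on a worker thread. */
    static int entriesSeenByWorker(int first, int last) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            OpState op = new OpState();
            op.firstEntry = first;  // plain writes on the submitting thread
            op.lastEntry = last;
            // submit() is the synchronization point: the writes above
            // happen-before the task body runs on the worker thread.
            Future<Integer> count = executor.submit(() -> op.lastEntry - op.firstEntry + 1);
            // The task's actions happen-before get() returns here.
            return count.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

When an op is run directly on the caller thread instead, this particular guarantee no longer comes from an executor handoff, so any visibility across threads has to come from some other synchronization action.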

lhotari · 2022-03-13T20:13:38Z

BookKeeper uses LinkedBlockingQueue for executors by default, which isn't very efficient.

bookkeeper/bookkeeper-common/src/main/java/org/apache/bookkeeper/common/util/OrderedExecutor.java

Line 308 in 3db4de9

queue = new LinkedBlockingQueue<>();

A better queue implementation would be something like what Jetty uses for its thread pools: https://github.com/eclipse/jetty.project/blob/jetty-10.0.x/jetty-util/src/main/java/org/eclipse/jetty/util/BlockingArrayQueue.java

By improving the queue implementation, there would be better performance without breaking thread safety.

The BookKeeper code includes a GrowableArrayBlockingQueue implementation: https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/collections/GrowableArrayBlockingQueue.java
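To show what swapping the queue looks like at the JDK level, here is a minimal sketch. It assumes the executor is a `ThreadPoolExecutor` built on a caller-supplied `BlockingQueue`; `QueueSwapSketch` and its methods are hypothetical, and `ArrayBlockingQueue` is used only as a stand-in for an array-backed queue available in the JDK, not as the queue proposed above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class QueueSwapSketch {
    /** Builds a single-thread executor on top of whatever work queue the caller supplies. */
    static ThreadPoolExecutor singleThreadExecutor(BlockingQueue<Runnable> queue) {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, queue);
    }

    /** Pushes `tasks` increments through the executor and returns the final count. */
    static long runTasks(BlockingQueue<Runnable> queue, int tasks) {
        ThreadPoolExecutor executor = singleThreadExecutor(queue);
        AtomicLong counter = new AtomicLong();
        try {
            for (int i = 0; i < tasks; i++) {
                // Note: a bounded queue that fills up makes execute() reject the task.
                executor.execute(counter::incrementAndGet);
            }
        } finally {
            executor.shutdown();
        }
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("executor did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(new LinkedBlockingQueue<>(), 1000));
        System.out.println(runTasks(new ArrayBlockingQueue<>(1024), 1000));
    }
}
```

The tasks still run on the executor thread in both cases, so ordering and the handoff semantics stay the same; only the data structure behind the handoff changes. The bounded stand-in needs enough capacity for the backlog, which a growable queue would not.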

lhotari · 2022-03-14T06:16:16Z

I created #3104 about the thread safety issue.

lhotari · 2022-03-15T15:18:58Z

#3104 turned out to be a clear state handling issue in the PendingReadOp class (including its nested classes). The fix is #3110.

eolivelli requested changes Nov 6, 2018


dlg99 approved these changes Nov 6, 2018


sijie approved these changes Nov 6, 2018


eolivelli approved these changes Nov 7, 2018


sijie requested a review from merlimat November 7, 2018 17:36

sijie assigned nicmichael Nov 7, 2018

merlimat approved these changes Nov 8, 2018


sijie added this to the 4.9.0 milestone Nov 8, 2018

sijie added area/bookie area/client release/4.9.0 type/improvement labels Nov 8, 2018

sijie merged commit 6b99ff7 into apache:master Nov 8, 2018

sijie added the release/4.8.2 label Nov 8, 2018

dlg99 mentioned this pull request Feb 14, 2022

Split read and write orderExecutor #3003

Open

This was referenced Mar 13, 2022

[Proto] java.lang.IllegalStateException: Some required fields are missing apache/pulsar#14436

Closed

Support isolate read write thread pool #3062

Open

lhotari mentioned this pull request Mar 14, 2022

Recycled LedgerEntryImpl instances are corrupted due to invalid recycling in BK client #3104

Closed

lhotari mentioned this pull request Mar 14, 2022

Improve performance of OrderedExecutor by switching to a more performant BlockingQueue implementation #3105

Closed

eolivelli mentioned this pull request Mar 14, 2022

Revert "Issue #1791: Read Submission should bypass OSE Threads" #3106

Closed
