
XADD Chunked Array Queue #230

Merged: 1 commit merged into JCTools:master from franz1981:xadd_q on Apr 6, 2019

Conversation

5 participants
franz1981 (Contributor) commented Jan 18, 2019

It is a chunked array queue designed to scale with the number of producers.
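As a rough illustration of the producer-side idea (illustrative names only, not this PR's code): each producer claims a slot with a single fetch-and-add (an XADD instruction on x86) on a shared producer index and then locates the chunk that owns that slot, so the hot offer path has no CAS retry loop.

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.atomic.AtomicReferenceArray;

    // Minimal sketch of the producer side of an XADD-based chunked queue:
    // producers never CAS-loop on the index, they claim a slot with a single
    // getAndIncrement (an XADD on x86) and then locate the chunk that owns
    // that slot. Names are illustrative, not this PR's.
    final class XaddQueueSketch<E> {
        static final int CHUNK_SIZE = 1024;

        static final class Chunk<E> {
            final long baseIndex; // element index of the first slot in this chunk
            final AtomicReferenceArray<E> slots = new AtomicReferenceArray<>(CHUNK_SIZE);
            final AtomicReference<Chunk<E>> next = new AtomicReference<>();
            Chunk(long baseIndex) { this.baseIndex = baseIndex; }
        }

        private final AtomicLong producerIndex = new AtomicLong();
        private final Chunk<E> firstChunk = new Chunk<>(0);

        public void offer(E e) {
            final long index = producerIndex.getAndIncrement(); // the XADD: one atomic op per offer
            Chunk<E> chunk = firstChunk;
            // Walk (and, if needed, append) chunks until reaching the one that owns `index`.
            // A real queue would cache the current producer chunk and recycle drained
            // chunks instead of walking from the start on every offer.
            while (index >= chunk.baseIndex + CHUNK_SIZE) {
                Chunk<E> next = chunk.next.get();
                if (next == null) {
                    next = new Chunk<>(chunk.baseIndex + CHUNK_SIZE);
                    if (!chunk.next.compareAndSet(null, next)) {
                        next = chunk.next.get(); // another producer appended it first
                    }
                }
                chunk = next;
            }
            chunk.slots.lazySet((int) (index - chunk.baseIndex), e); // publish the element
        }
    }

The single-consumer poll/peek side and the chunk recycling discussed later in this thread are omitted from the sketch.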

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch from 333bfc5 to 27a0229 Jan 18, 2019

franz1981 (Contributor, Author) commented Jan 18, 2019

@nitsanw It is just a lil' more than a POC, but I will keep improving it if the logic seems reasonable and sound to you, so please take a look at the logic 👍
I have yet to fix:

  1. tests
  2. implement missing methods
  3. use standard naming
  4. use Unsafe where necessary
  5. padding to protect from false sharing (see the sketch at the end of this comment)
  6. "maybe" implement AtomicChunk::next using a JUMP + AtomicChunk element

The first benchmarks show better scaling than the other multi-producer queues, and the code/logic/layout is not optimized at all 👍
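A minimal sketch of the padding mentioned in item 5, in the field-layering style JCTools queues generally use (runs of unused long fields isolate the hot counters on their own cache lines); class and field names here are illustrative only:

    // Illustrative only: hot, independently-updated fields (producer index vs
    // consumer index) are separated by "cold" padding fields so they do not
    // share a cache line, avoiding false sharing between producers and the consumer.
    abstract class PadProducerIndex {
        long p01, p02, p03, p04, p05, p06, p07, p08;
        long p11, p12, p13, p14, p15, p16, p17;
    }

    abstract class ProducerIndexField extends PadProducerIndex {
        protected volatile long producerIndex; // bumped by producers via XADD
    }

    abstract class PadConsumerIndex extends ProducerIndexField {
        long c01, c02, c03, c04, c05, c06, c07, c08;
        long c11, c12, c13, c14, c15, c16, c17;
    }

    abstract class ConsumerIndexField extends PadConsumerIndex {
        protected long consumerIndex; // single consumer, so plain writes plus ordered stores suffice
    }

    // The concrete queue class would then extend ConsumerIndexField and keep
    // its remaining, rarely-written fields after the padded counters.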

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch from 27a0229 to c158651 Jan 18, 2019

coveralls commented Jan 18, 2019

Pull Request Test Coverage Report for Build 511

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 47.61%

Totals Coverage Status
Change from base Build 507: 0.0%
Covered Lines: 1673
Relevant Lines: 3514

💛 - Coveralls
eolivelli left a comment

Awesome.
Just a newbie question: why is there no new test case?

franz1981 (Contributor, Author) commented Jan 18, 2019

@eolivelli I will probably reuse some of the already existing tests, but as I have written above, this is just a PR to get a review on the queue logic; I will squash a more refined version if the logic is validated 👍

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch 4 times, most recently from 59519c0 to 714a8fc Jan 18, 2019

franz1981 (Contributor, Author) commented Jan 20, 2019

I'm working on an idea to recycle the consumed buffers and make it GC-free in the same way MpscChunkedArrayQueue is 👍
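A minimal sketch of one way such recycling can work, assuming the single consumer hands a fully-drained chunk back through a small holder that producers check before allocating; this illustrates the idea only and is not the PR's freeBuffer implementation (names are hypothetical):

    import java.util.concurrent.atomic.AtomicReference;

    // Sketch of the recycling idea: the single consumer parks a drained chunk
    // in a one-slot "spare" holder instead of letting it become garbage, and a
    // producer that needs to append a new chunk reuses the spare when one is
    // available. With a single slot the hand-off is a plain getAndSet, so
    // there is no ABA hazard to reason about.
    final class ChunkRecycler<C> {
        private final AtomicReference<C> spare = new AtomicReference<>();

        // Consumer side: called once a chunk has been fully drained.
        void release(C drainedChunk) {
            spare.lazySet(drainedChunk); // overwrite any older spare; that one becomes garbage
        }

        // Producer side: returns a recycled chunk, or null if the caller must
        // allocate a fresh one (keeping the queue unbounded). A real
        // implementation must also reset the chunk's slots and identity
        // (e.g. its base index) before re-publishing it.
        C acquireOrNull() {
            return spare.getAndSet(null);
        }
    }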

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch 5 times, most recently from 41dfa8e to a27ecf0 Jan 20, 2019

franz1981 (Contributor, Author) commented Jan 20, 2019

@eolivelli @nitsanw I've played a lil' more with this over the weekend, making it possible to recycle chunks, adding tests and padding, and refining the implementation to use Unsafe.
The performance results are interesting: with 2 producers I'm getting 20 ops/us vs 13 ops/us for MpscUnboundedArrayQueue, with no garbage produced.
With 7 producers I'm getting 40 ops/us vs 8 ops/us for MpscUnboundedArrayQueue, with nearly no garbage produced (it depends on long stalls of any producer thread).
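For context, throughput numbers like these are typically taken with a JMH group benchmark that pins N producer threads on offer and one consumer thread on poll; here is a minimal sketch of such a harness (not JCTools' benchmark code), using the MpscUnboundedArrayQueue baseline mentioned above as the queue under test:

    import java.util.Queue;
    import java.util.concurrent.TimeUnit;

    import org.jctools.queues.MpscUnboundedArrayQueue;
    import org.openjdk.jmh.annotations.*;

    // Sketch of a producer/consumer throughput benchmark: 7 producer threads
    // offer while 1 consumer thread polls, reported in ops/us. Swap the queue
    // constructed in init() for the XADD queue under test to compare.
    @State(Scope.Group)
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public class QueueThroughputSketch {
        private static final Integer ONE = 1;
        Queue<Integer> queue;

        @Setup
        public void init() {
            queue = new MpscUnboundedArrayQueue<>(1024); // baseline from the comment above
        }

        @Benchmark
        @Group("tpt")
        @GroupThreads(7) // the producer count being scaled above
        public void offer() {
            queue.offer(ONE);
        }

        @Benchmark
        @Group("tpt")
        @GroupThreads(1) // single consumer
        public Integer poll() {
            return queue.poll();
        }
    }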

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch 2 times, most recently from 1ee2c31 to c72f06b Jan 20, 2019

franz1981 (Contributor, Author) commented Jan 20, 2019

I think that both poll/peek and fill could be improved a lot; the single-threaded fill benchmarks are not as good as I was expecting!

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch from c72f06b to 6c4aa8f Jan 20, 2019

consumerBuffer = next;
this.consumerBuffer = consumerBuffer;
}
E e = consumerBuffer.lvElement(consumerOffset);

nitsanw (Contributor) commented on the diff above, Jan 21, 2019

The following code can be rearranged, as is done elsewhere, for single-consumer code

franz1981 (Author) replied, Jan 21, 2019

Do you mean by grouping everything under a single method or just by grouping the "slow" path?

        assert !firstElementOfNewChunk;
        if (lvProducerIndex() == consumerIndex)
        {
            return null;
        }
        e = spinForElement(consumerBuffer, consumerOffset);
        consumerBuffer.spElement(consumerOffset, null);
        soConsumerIndex(consumerIndex + 1);
        return e;
nitsanw (Contributor) commented Jan 21, 2019

Great stuff! This makes for an interesting tradeoff, though the progress is still blocked if the new buffer allocator is interrupted (so this is still blocking producers on new buffer). The progress guarantees are the same overall, so this is not more "progressive" IMO. You can make it more progressive if the first to discover a missing chunk allocates it, rather than the first in chunk, but this is not simple...

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch from 6c4aa8f to 931fcc8 Jan 21, 2019

franz1981 (Contributor, Author) commented Jan 21, 2019

though the progress is still blocked if the new buffer allocator is interrupted (so this is still blocking producers on new buffer).

Eheh, smart observation: that's why I've added the recycling of the buffers, to speed up publishing a new chunk to be used by the other producers awaiting it.

The progress guarantees are the same overall, so this is not more "progressive" IMO

You're right, but I wasn't able to find a better name for this :P
MostlyProgressive seems even worse eheh

You can make it more progressive if the first to discover a missing chunk allocates it, rather than the first in chunk, but this is not simple...

Nope, it is not simple, but it is an interesting idea! Not sure I will be able to get it into this PR

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch from 931fcc8 to 3722d6b Jan 21, 2019

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch from 3722d6b to 4ca24d8 Jan 21, 2019

franz1981 (Contributor, Author) commented Jan 21, 2019

@nitsanw I've addressed most of the changes I've understood: let me know if it seems better and... I'll probably need some help to improve fill/drain, bud 👍

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch 5 times, most recently from 2564497 to abec981 Jan 21, 2019

franz1981 (Contributor, Author) commented Jan 21, 2019

@nitsanw I've added some significant changes to the producerBuffer publication/initialization logic due to the complication of recycling instances, but I think I have addressed the weird corner cases caused by long stalls of producer threads. Please let me know if the comments are clear enough: I wasn't expecting the recycling logic to affect the offer logic that much!

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch 2 times, most recently from e29c6e1 to f4b7931 Jan 22, 2019

franz1981 (Contributor, Author) commented Jan 24, 2019

I'm just too curious to see how it performs on a machine with many cores: @nitsanw is there any chance to test this one on such hardware?

franz1981 (Contributor, Author) commented Jan 27, 2019

@nitsanw I've addressed the challenge in a separate commit:

though the progress is still blocked if the new buffer allocator is interrupted (so this is still blocking producers on new buffer). The progress guarantees are the same overall, so this is not more "progressive" IMO. You can make it more progressive if the first to discover a missing chunk allocates it, rather than the first in chunk, but this is not simple...

Now it should be more progressive: any offer that finds a missing chunk (including previous ones) will try to fill the gap, but given that the (partial) ordering of producer buffer publication should be preserved, I've used a CAS on a separate producerChunkIndex to maintain the publication order.
Maybe there is a better way to make it totally progressive, but I will probably need a lot more time to find it :)

From a perf point of view I can't see any evident benefit in this new solution, but maybe I'm not benchmarking it properly, or the implementation is just not "progressive" enough.
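A rough sketch of what that cooperative, ordered publication can look like (hypothetical names, not this PR's code), reusing the Chunk type from the earlier sketch: producers that find the chunk counter behind the chunk they need race with a CAS to advance it one step at a time, and only each step's winner links that chunk, so publication order is preserved even when several producers discover the gap at once.

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch of cooperative, in-order chunk publication. Any producer that
    // finds producerChunkIndex behind the chunk it needs tries to advance it
    // by one with a CAS; only each step's winner links that chunk.
    final class CooperativeAppendSketch<E> {
        private static final int CHUNK_SIZE = XaddQueueSketch.CHUNK_SIZE;

        private final AtomicLong producerChunkIndex = new AtomicLong(); // number of the last claimed chunk
        private volatile XaddQueueSketch.Chunk<E> tail;                  // last fully published chunk

        CooperativeAppendSketch(XaddQueueSketch.Chunk<E> first) {
            this.tail = first;
        }

        // Called by a producer whose claimed element index falls in chunk number `requiredChunk`.
        XaddQueueSketch.Chunk<E> producerChunkFor(long requiredChunk, XaddQueueSketch.Chunk<E> cached) {
            long appended;
            while ((appended = producerChunkIndex.get()) < requiredChunk) {
                // Losers simply re-read and help with the next missing step;
                // the CAS winner appends exactly chunk `appended + 1`.
                if (producerChunkIndex.compareAndSet(appended, appended + 1)) {
                    XaddQueueSketch.Chunk<E> prev;
                    // Wait until the previous step's winner has published chunk `appended`.
                    while ((prev = tail).baseIndex != appended * (long) CHUNK_SIZE) {
                        Thread.onSpinWait();
                    }
                    XaddQueueSketch.Chunk<E> next =
                            new XaddQueueSketch.Chunk<>(prev.baseIndex + CHUNK_SIZE);
                    prev.next.lazySet(next);
                    tail = next; // publish in order: only this step's winner writes tail
                }
            }
            // All chunks up to `requiredChunk` are claimed; chase next links from the
            // caller's cached chunk, spinning where a link has not been written yet.
            XaddQueueSketch.Chunk<E> chunk = cached;
            while (chunk.baseIndex < requiredChunk * (long) CHUNK_SIZE) {
                XaddQueueSketch.Chunk<E> next;
                while ((next = chunk.next.get()) == null) {
                    Thread.onSpinWait();
                }
                chunk = next;
            }
            return chunk;
        }
    }

As noted in the review, a winner that stalls before linking still blocks later producers at the spin, so the overall progress guarantee of this sketch is unchanged; the gain is only that any producer can help advance past previously missing chunks.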

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch 3 times, most recently from fbfafb0 to c6a4da9 Jan 27, 2019

franz1981 (Contributor, Author) commented Feb 15, 2019

@eolivelli @felixbarny did you guys have a chance to try it or check the logic?

felixbarny commented Feb 15, 2019

Sorry, but I didn't try it out yet. For my use case, it's important that the size of the queue is limited. This one seems to be unlimited? Ideally, I'd also need an XADD-based MPMC queue.

<offtopic>
A quick outline of my use case. I'm writing an APM agent. It needs to collect traces from multiple application threads which are transferred to a reporter thread. After reporting, the objects are reset and put in an object pool for reuse. When a request is recorded, the agent takes a recycled object from the pool in the context of an application thread. I'm currently using a MPSC disruptor to transfer the objects from the application threads to the reporter thread. For the object pool, I'm using a MPMCArrayQueue. This design is quite simple but getting objects from the pool and putting them into the disruptor is contended. I have some ideas to reduce the contention with ThreadLocals but that would be even more off-topic here. A XADD-based MPMC queue would allow the architecture to remain simple but still not be prone to contention.
</offtopic>

franz1981 (Contributor, Author) commented Feb 15, 2019

I have already implemented a bounded MPSC XADD queue, but it is prone to being blocked forever in very special cases, i.e. it is in the experimental branch of JCTools for that reason; it is inspired by the Aeron log buffer algorithm.
ATM I can't see a way to preserve the queue semantics while using XADD on the producer side: without being too negative, I think this is difficult to achieve with a bounded queue.
That's why I have built this unbounded queue.

This queue could be "easily" turned into an MPMC one (but without using XADD on the consumer side, obviously), and I could provide a weakOffer method that would just weakly back-pressure a producer by looking at the current distance of the consumers, reducing the chance of allocating new chunks (see the sketch below).
I'm not sure it is the right tool TBH, whereas using an MPMC bounded queue with batch drain/fill should work a lot better.
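Purely as an illustration of the weakOffer idea floated above (this method is not part of the PR): the producer takes a racy snapshot of the distance between the indices and declines to offer beyond a soft capacity. lvProducerIndex()/lvConsumerIndex() stand for JCTools-style volatile loads of the two indices and are assumed here.

    // Hypothetical weakOffer, sketched only to illustrate the idea floated
    // above. Back-pressure is best-effort because the size check and the XADD
    // claim are not atomic together, so the soft capacity can be overshot
    // under contention.
    public boolean weakOffer(E e, int softCapacity) {
        final long inFlight = lvProducerIndex() - lvConsumerIndex(); // racy size snapshot
        if (inFlight >= softCapacity) {
            return false;    // caller may retry, back off, or drop
        }
        return offer(e);     // the normal unbounded XADD offer path
    }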

franz1981 (Contributor, Author) commented Mar 3, 2019

@kay I have always referred to Nitsan for the review of this PR, but please, feel free to validate the code as well, eh 👍

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch from c6a4da9 to 125917c Mar 11, 2019

franz1981 (Contributor, Author) commented Mar 11, 2019

@nitsanw I've pushed a squashed commit addressing a null check, but the logic should be the same.
The first commit, which wasn't performing cooperative/progressive allocation of missing chunks, isn't needed anymore IMHO, but I'm leaving it here while awaiting the code review: I will squash the commits into one right after that 👍

@franz1981 franz1981 force-pushed the franz1981:xadd_q branch from 125917c to b0febce Mar 18, 2019

franz1981 (Contributor, Author) commented Mar 18, 2019

I've decided to squash the commits into one, given that the "more progressive" version is preferable to the original one. @nitsanw let me know if I'm missing other parts to clean up or any improvement you see to the algorithm/implementation.
I see that the freeBuffer could be replaced by an SpscArrayQueue configurable at startup and many volatile loads could be replaced, but my only perf-wise concern is the batch operations, which don't seem as effective as I was expecting.

franz1981 (Contributor, Author) commented Apr 1, 2019

@nitsanw I've tried a scaling benchmark with increasing numbers of producers, but I'm not getting what I was expecting:
[image: scaling benchmark results]
According to http://www.perfdynamics.com/Manifesto/USLscalability.html it is quite clear that there are some other costs in place that are reducing the scalability of this queue.
The first part of the shape of the chart perfectly matches the classic USL (with some coherency and contention factors to be estimated):
[image: classic USL curve]
But the most intriguing part to me is that I've found an evident pattern in the tail of the chart, and it seems somehow related to Gene Amdahl himself:
[image]
After several hours I can show you my bet:
[image]
Can you see it? The nose of Amdahl is a perfect match with my results: and it is totally reproducible!!!

It is clear that the Universal Scalability Law is an extension of Amdahl's Law, but Amdahl himself is a personified extension of it as well; and he seems to go beyond that too (!!!).
I'm sure that Little too could be hidden somewhere in my benchmark results...
maybe next 1st of April I'll find out :)

@nitsanw nitsanw merged commit 9f745a0 into JCTools:master Apr 6, 2019

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: Coverage remained the same at 47.61%
nitsanw (Contributor) commented Apr 6, 2019

I'll review further before the next release, but generally it looks good :-)
Thanks for your efforts and patience
