Optimize unbounded channels #279
Use a different queue for unbounded channels.
This gives us:
* Performance improvements (see benchmarks below).
* Lower memory consumption (memory reclamation is not deferred, it's *eager*).
* Fewer dependencies (no more `crossbeam-epoch`).

Before:

```
unbounded_mpmc           Rust crossbeam-channel   0.392 sec
unbounded_mpsc           Rust crossbeam-channel   0.373 sec
unbounded_select_both    Rust crossbeam-channel   0.536 sec
unbounded_select_rx      Rust crossbeam-channel   0.589 sec
unbounded_seq            Rust crossbeam-channel   0.462 sec
unbounded_spsc           Rust crossbeam-channel   0.235 sec
```

After:

```
unbounded_mpmc           Rust crossbeam-channel   0.266 sec
unbounded_mpsc           Rust crossbeam-channel   0.250 sec
unbounded_select_both    Rust crossbeam-channel   0.449 sec
unbounded_select_rx      Rust crossbeam-channel   0.438 sec
unbounded_seq            Rust crossbeam-channel   0.333 sec
unbounded_spsc           Rust crossbeam-channel   0.210 sec
```

Co-authored-by: Stjepan Glavina <firstname.lastname@example.org>
This unbounded MPMC queue is very similar to Dmitry Vyukov's bounded MPMC queue: it's the same idea, except that we have a linked list of blocks rather than one big circular array.
This queue is not 100% lock-free (but neither is
In my benchmarks (those in
Another benefit of this queue is that memory reclamation is not deferred at all - it's fully eager and there is no concept of garbage! As soon as the last operation using a block is done, the block gets destroyed. I did some tests and measured lower memory overhead in high-concurrency scenarios than with epoch-based queues. Hopefully this fixes the problem where Firecracker tests were failing on Amazon's internal CI due to excessive memory use by
So how does this avoid epochs? The queue has head and tail indices of type
To send a message, load
To receive a message, similarly load
Here comes block destruction. If our receive operation got the last slot in the block, after reading the message and setting the
That's the gist of it. I'm omitting a few less important remaining details, but hopefully they make sense when reading the code.
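To make the block layout and eager destruction described above concrete, here is a much-simplified, single-threaded sketch. It omits all the concurrency machinery (no atomic indices, no per-slot state flags, and it walks the block list instead of keeping a tail pointer), so every name and detail here is illustrative, not the actual crossbeam-channel code:

```rust
// Much-simplified, single-threaded sketch of a block-based queue with
// eager block destruction. The real queue uses atomic head/tail indices
// and per-slot state flags; none of that is shown here.

const BLOCK_CAP: usize = 4; // small for illustration

struct Block<T> {
    slots: Vec<Option<T>>,       // fixed number of message slots
    next: Option<Box<Block<T>>>, // link to the next block in the list
}

impl<T> Block<T> {
    fn new() -> Box<Self> {
        Box::new(Block {
            slots: (0..BLOCK_CAP).map(|_| None).collect(),
            next: None,
        })
    }
}

struct Queue<T> {
    head: Option<Box<Block<T>>>, // front block; `None` when the queue is empty
    head_off: usize,             // next slot to read in the front block
    tail_off: usize,             // next slot to write in the back block
}

impl<T> Queue<T> {
    fn new() -> Self {
        Queue { head: None, head_off: 0, tail_off: 0 }
    }

    fn push(&mut self, value: T) {
        if self.head.is_none() {
            self.head = Some(Block::new());
            self.tail_off = 0;
        }
        // Walk to the back block (the real queue keeps a tail pointer instead).
        let mut block = self.head.as_mut().unwrap();
        while block.next.is_some() {
            block = block.next.as_mut().unwrap();
        }
        if self.tail_off == BLOCK_CAP {
            // The back block is full: link in a fresh block and write there.
            block.next = Some(Block::new());
            block = block.next.as_mut().unwrap();
            self.tail_off = 0;
        }
        block.slots[self.tail_off] = Some(value);
        self.tail_off += 1;
    }

    fn pop(&mut self) -> Option<T> {
        let block = self.head.as_mut()?;
        if block.next.is_none() && self.head_off == self.tail_off {
            return None; // head caught up with tail: queue is empty
        }
        let value = block.slots[self.head_off].take();
        self.head_off += 1;
        if self.head_off == BLOCK_CAP {
            // We consumed the last slot of the front block, so destroy it
            // *eagerly*: unlinking the `Box` frees it right here, with no
            // deferred garbage.
            let next = self.head.as_mut().unwrap().next.take();
            self.head = next;
            self.head_off = 0;
            if self.head.is_none() {
                self.tail_off = 0;
            }
        }
        value
    }
}

fn main() {
    let mut q = Queue::new();
    for i in 0..10 {
        q.push(i);
    }
    for i in 0..10 {
        assert_eq!(q.pop(), Some(i));
    }
    assert_eq!(q.pop(), None);
    println!("ok");
}
```

With `BLOCK_CAP = 4`, pushing ten messages allocates three blocks; each of the first two blocks is freed the moment its last message is received, which is the eager-reclamation property discussed above.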
I'm very excited about this queue. Although it doesn't seem super interesting from the CS perspective, in practice it seems to be outperforming pretty much everything else as a general-purpose unbounded MPMC queue.
I haven't tried with that many cores yet; let's do this!
So I'm thinking about writing a comprehensive benchmark suite for queues similar to
There are scenarios with non-blocking operations:
There are scenarios with blocking operations (we spin+yield when the queue is full or empty):
We do this for a bunch of different values of T (the number of threads), ranging from 1 to 100 or so.
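For reference, the "spin+yield" retry loop used in the blocking scenarios could look roughly like this. This is only a sketch: the `try_op` closure, the spin limit of 100, and the function shape are all assumptions for illustration, not the actual benchmark code:

```rust
use std::thread;

// Sketch of a spin-then-yield retry loop for a full/empty queue.
// `try_op` stands in for a non-blocking try_send/try_recv; the spin
// limit of 100 is an arbitrary illustrative choice.
fn retry_blocking<T>(mut try_op: impl FnMut() -> Option<T>) -> T {
    let mut spins = 0u32;
    loop {
        if let Some(v) = try_op() {
            return v;
        }
        if spins < 100 {
            spins += 1;
            std::hint::spin_loop(); // busy-wait briefly first
        } else {
            thread::yield_now(); // then give up the timeslice
        }
    }
}

fn main() {
    // Simulate an operation that fails a few times before succeeding.
    let mut attempts = 0;
    let v = retry_blocking(|| {
        attempts += 1;
        if attempts < 5 { None } else { Some(42) }
    });
    assert_eq!(v, 42);
    println!("ok");
}
```

Spinning first keeps latency low when the queue drains or fills quickly; yielding afterwards avoids burning a core when the wait turns out to be long.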
A few notes about bounded queues:
How does this plan sound? What do benchmarks in published papers on concurrent queues do? Should we do something differently?
Also, what should the message type be? I'm thinking maybe it should be of type
I just played with benchmarks on a 24-core machine for a few hours. Here are some conclusions:
Rewrite SegQueue for better performance #291

The implementation of `SegQueue<T>` is completely rewritten and is based on https://github.com/stjepang/queue, which provides notably better performance. This one doesn't use `crossbeam-epoch` for memory reclamation, which means we don't have to pin and execute a full fence on every operation.

For more information on how this queue works, see:

* stjepang/queue#1
* #279 (comment)

One new addition in this PR is the `SegQueue::len()` method.

Benchmarks before:

```
unbounded_mpmc           Rust segqueue            0.336 sec
unbounded_mpsc           Rust segqueue            0.261 sec
unbounded_seq            Rust segqueue            0.306 sec
unbounded_spsc           Rust segqueue            0.201 sec
```

Benchmarks after:

```
unbounded_mpmc           Rust segqueue            0.186 sec
unbounded_mpsc           Rust segqueue            0.206 sec
unbounded_seq            Rust segqueue            0.241 sec
unbounded_spsc           Rust segqueue            0.115 sec
```

Co-authored-by: Stjepan Glavina <email@example.com>