Add a bounded SPSC queue #338

Open · wants to merge 10 commits into master

Conversation

@stjepang (Member) commented Feb 28, 2019

Inspired by the Rust 2019: Rust Audio thread, I decided to add a high-performance wait-free bounded SPSC queue. This is the most common queue type in audio programming, and it's very important that it's wait-free and as fast as possible.

cc @Engid @raphlinus Pinging you if interested.

Simple benchmarks (ArrayQueue vs SegQueue vs spsc, smaller is better):

bounded_seq               Rust ArrayQueue   0.176 sec
bounded_spsc              Rust ArrayQueue   0.104 sec

unbounded_seq             Rust SegQueue     0.236 sec
unbounded_spsc            Rust SegQueue     0.141 sec

bounded_seq               Rust spsc         0.031 sec
bounded_spsc              Rust spsc         0.029 sec
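To make the discussion concrete, here is a minimal sketch of how such a wait-free bounded SPSC queue can be built. This is an illustration only, not the PR's implementation (the real queue caches indices, supports non-`Copy` types, and returns `Result` from `pop`); the `Copy + Default` bounds here are simplifications:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// A drastically simplified SPSC ring buffer for Copy types.
// One extra slot distinguishes "empty" from "full".
struct Inner<T> {
    buf: Vec<UnsafeCell<T>>,
    head: AtomicUsize, // next slot to pop; written only by the consumer
    tail: AtomicUsize, // next slot to push; written only by the producer
}

unsafe impl<T: Send> Send for Inner<T> {}
unsafe impl<T: Send> Sync for Inner<T> {}

pub struct Producer<T>(Arc<Inner<T>>);
pub struct Consumer<T>(Arc<Inner<T>>);

pub fn spsc<T: Copy + Default>(cap: usize) -> (Producer<T>, Consumer<T>) {
    assert!(cap > 0, "capacity must be non-zero");
    let inner = Arc::new(Inner {
        buf: (0..cap + 1).map(|_| UnsafeCell::new(T::default())).collect(),
        head: AtomicUsize::new(0),
        tail: AtomicUsize::new(0),
    });
    (Producer(inner.clone()), Consumer(inner))
}

impl<T: Copy> Producer<T> {
    /// Attempts to push, returning the value back if the queue is full.
    pub fn push(&self, value: T) -> Result<(), T> {
        let tail = self.0.tail.load(Ordering::Relaxed);
        let next = (tail + 1) % self.0.buf.len();
        if next == self.0.head.load(Ordering::Acquire) {
            return Err(value); // full
        }
        unsafe { *self.0.buf[tail].get() = value };
        self.0.tail.store(next, Ordering::Release); // publish the slot
        Ok(())
    }
}

impl<T: Copy> Consumer<T> {
    /// Attempts to pop, returning None if the queue is empty.
    pub fn pop(&self) -> Option<T> {
        let head = self.0.head.load(Ordering::Relaxed);
        if head == self.0.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let value = unsafe { *self.0.buf[head].get() };
        self.0.head.store((head + 1) % self.0.buf.len(), Ordering::Release);
        Some(value)
    }
}
```

Every operation is a bounded number of loads and stores with no retry loop, which is what makes the queue wait-free.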

stjepang added 2 commits Feb 28, 2019
@Engid commented Feb 28, 2019

I cannot say thank you enough, but I'll try: Thank you!!!

stjepang added 2 commits Feb 28, 2019
@cianoc commented Mar 1, 2019

This is fantastic. Thanks!

stjepang added 2 commits Mar 1, 2019
@vertexclique commented Mar 1, 2019

Thanks, that would be a way to go now 🎉

///
/// let (p, c) = spsc::<i32>(100);
/// ```
pub fn spsc<T>(cap: usize) -> (Producer<T>, Consumer<T>) {

@lucab Mar 2, 2019

It looks like the input argument really is a NonZeroUsize.

@stjepang (Author, Member) Mar 2, 2019

True, but NonZeroUsize would be more appropriate for memory layout optimizations rather than ad-hoc uses like this one. There's little advantage of spsc(NonZeroUsize::new(n).unwrap()) over spsc(n) :)

@lucab Mar 12, 2019

For completeness, the rationale for my comment above was to replace the runtime-panic behavior with something that can be encoded at the type level. I'm fine with your call if you think it is not worth it for ergonomics.
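The trade-off in this sub-thread can be sketched with two hypothetical helpers (not the PR's API): a `NonZeroUsize` parameter moves the "capacity must be non-zero" failure to the caller's construction site, while a plain `usize` keeps the ergonomic call but panics at runtime:

```rust
use std::num::NonZeroUsize;

// Type-level encoding: zero is unrepresentable, so no runtime check
// is needed inside the function.
fn cap_checked(cap: NonZeroUsize) -> usize {
    cap.get()
}

// Ergonomic signature: the invariant is enforced with a runtime panic.
fn cap_plain(cap: usize) -> usize {
    assert!(cap > 0, "capacity must be non-zero");
    cap
}
```

The caller-side cost of the first version is exactly the `NonZeroUsize::new(n).unwrap()` dance mentioned above.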

/// assert_eq!(c.pop(), Ok(10));
/// assert_eq!(c.pop(), Err(PopError));
/// ```
pub fn pop(&self) -> Result<T, PopError> {

@lucab Mar 2, 2019

From a consumer point of view, why not an Option<T>?

@stjepang (Author, Member) Mar 2, 2019

Because push returns a Result<(), PushError<T>> so I'm returning a Result here for consistency. But otherwise I agree, Option<T> would be fine as well.

impl<T> Producer<T> {
/// Attempts to push an element into the queue.
///
/// If the queue is full, the element is returned back as an error.

@lucab Mar 2, 2019

This may sound silly, but I'd be happy if this behavior-when-full would be stated a bit more prominently in the module docs.

@stjepang (Author, Member) Mar 2, 2019

Sorry, not sure what you mean exactly. Would you be happy if the first sentence said the following?

/// Attempts to push an element into the queue, returning it back on failure.

@lucab Mar 12, 2019

Sorry if I was unclear, I meant for the whole spsc module (rustdoc and/or README). Something like "a bounded SPSC queue that allocates a fixed-capacity buffer on construction, preventing insertions when full".

tail: Cell<usize>,
}

unsafe impl<T: Send> Send for Producer<T> {}

@matklad (Contributor) Mar 10, 2019

Would making just the Inner itself Send + Sync work? That way, you'll need only one impl.

@stjepang (Author, Member) Mar 10, 2019

I think that would work too, yeah. Although, I personally prefer specifying these kinds of invariants on the API boundaries rather than inside the implementation and hoping auto traits work as intended... :)

@matklad (Contributor) Mar 10, 2019

Leveraging auto-traits is in theory a good thing because it helps to catch bugs. Like, if you add an Rc<T> to Producer in the future, the auto-trait approach will catch this (as in, Producer wouldn't be Send), while a manual unsafe impl will lead to unsound code. For simple cases it doesn't matter though.

@stjepang (Author, Member) Mar 10, 2019

That's a good point, but it's a double-edged sword. Relying on auto-traits can be risky when all of the following are true:

  • Auto-traits are manually implemented for private types.
  • Auto-traits are automatically inferred for public types.
  • There's a layer of unsafe code between private and public types.

The problem is that the layer of unsafe code can break invariants inferred from auto-traits implemented on private types.

Here's an example. If we relied on Inner: Send + Sync and replaced all Cell<usize>s with AtomicUsizes, then Sync would be inferred for Producer and Consumer, which would be incorrect and could cause data races. That's because the unsafe code in Producer and Consumer already assumes that they don't implement Sync.

So maybe auto-traits are good only when manually implemented on the innermost layer of safe code? But in any case, I think this queue is simple enough that we'd be fine either way.

/// assert_eq!(p.push(10), Ok(()));
/// assert_eq!(p.push(20), Err(PushError(20)));
/// ```
pub fn push(&self, value: T) -> Result<(), PushError<T>> {

@matklad (Contributor) Mar 10, 2019

Given the single-producer nature of this thing, I'd expect &mut self here, but I guess it doesn't really matter and makes the API more flexible?

@stjepang (Author, Member) Mar 10, 2019

Exactly - I just made an effort to avoid requiring &mut self since mutability can be annoying to deal with at times.

@alexbool (Contributor) commented Apr 13, 2019

Great work! What's the status on this?

@mgeier commented Apr 13, 2019

The two SPSC ringbuffer implementations I know have a way to read/write multiple "items" at a time (and a way of checking how much space there is to read/write). Would it be feasible to add such a feature to this implementation?

The main use case would be if you have a ringbuffer of audio samples (e.g. T = f32) you can read/write blocks of different lengths from/to the ringbuffer (and you don't need a separate pop()/push() operation for each single audio sample).

Here are two examples:

The JACK ringbuffer (docs, code) has jack_ringbuffer_read_space() and jack_ringbuffer_write_space() to check the available size and jack_ringbuffer_get_read_vector() and jack_ringbuffer_get_write_vector() to get pointers to memory where data can be read from or written into.

The PortAudio ringbuffer (docs, code) has PaUtil_GetRingBufferReadAvailable(), PaUtil_GetRingBufferWriteAvailable(), PaUtil_GetRingBufferReadRegions() and PaUtil_GetRingBufferWriteRegions() with pretty much the same functionality.

I guess instead of (unsafe) raw pointers it would make more sense to provide (safe) &[T] slices for reading and &mut [T] for writing.

@stjepang (Member, Author) commented Apr 20, 2019

@alexbool The PR is blocked mostly by the fact that I'm not sure what changes to anticipate in the future and how to structure the modules.

Right now I'm thinking we should perhaps move the SPSC queue (types Producer and Consumer) into a submodule array::spsc. Then, later in the future, we might add SPMC and MPSC versions of array-based and segment-based queues too.

So the idea is to structure the crate like this:

mod array {
    mod spsc {
        struct Producer<T>;
        struct Consumer<T>;
        fn new(cap: usize) -> (Producer<T>, Consumer<T>);
    }

    mod mpsc {
        // ...
    }

    mod spmc {
        // ...
    }

    struct ArrayQueue<T>; // mpmc version
}

mod seg {
    mod spsc {
        struct Producer<T>;
        struct Consumer<T>;
        fn new() -> (Producer<T>, Consumer<T>);
    }

    mod mpsc {
        // ...
    }

    mod spmc {
        // ...
    }

    struct SegQueue<T>; // mpmc version
}

use array::ArrayQueue;
use seg::SegQueue;

How does that look?

@stjepang (Member, Author) commented Apr 20, 2019

@mgeier Interesting!

I wonder how we should expose a safe interface for that in Rust. Note that we can't just return &[T] and &mut [T] for reading and writing - there are multiple problems with that:

  • Reading and writing from/into uninitialized memory is possible.
  • There is a risk of data races because we could read from &[T] while the producer thread is writing into the slice.

It's a bit tricky, but we can probably figure out a reasonable interface...

Another option is to expose methods for batched reads and writes, e.g.:

impl<T> Consumer<T> {
    fn read_batch(&self, dest: &mut Vec<T>);
}

impl<T> Producer<T> {
    fn write_batch(&self, src: &mut Vec<T>);
}
@mgeier commented Apr 22, 2019

@stjepang I'm a total Rust beginner and I'm not at all an expert on lock-free data structures. Please bear that in mind!

I didn't think about uninitialized memory. The implementations I mentioned just operate on plain bytes of memory; I guess that's not a serious approach for Rust, right?

Would it be possible to somehow default-initialize the whole underlying Vec<T>?

I guess it would be fine if this feature were only available for a somewhat "trivial" subset of types T. Mostly types like f32 would be interesting.

There is a risk of data races because we could read from &[T] while the producer thread is writing into the slice.

If a user requests a "read slice", this memory area must of course be "blocked" for writing (and for other readers as well, but it is a "single reader" queue anyway).

In the JACK example, the function jack_ringbuffer_get_read_vector() provides access to all "items" that can currently be read from the ringbuffer. But until jack_ringbuffer_read_advance() is called, no writes are possible to this area of memory (but there may still be some other memory left to write to).

In the PortAudio example, callers of the function PaUtil_GetRingBufferReadRegions() can specify how many "items" they want to be reading (but they may actually get less). Those "items" are not available for writing until PaUtil_AdvanceRingBufferReadIndex() is called.

I would hope that with the lifetimes of Rust slices, it would not be necessary to call a separate function when finished accessing the data. But of course I have no idea if this would actually work.

Another option is to expose methods for batched reads and writes

Do read_batch() and write_batch() really need Vec in their API?
Wouldn't that also work with slices?

Anyway, this may force an unnecessary copy of data in some cases, because you need the data to live in some contiguous memory before/after the write/read operation.

If you have e.g. strided data (e.g. interleaved audio channels), you'll have to make an additional copy.
OTOH, if you get a &mut [T], you could directly write into it from your strided values.

UPDATE:

The Rust bindings for JACK provide access to its ring buffer. They use this API for writing multiple "items" at once:

fn get_vector<'a>(&'a mut self) -> (&'a mut [u8], &'a mut [u8])

See https://docs.rs/jack/0.6.0/jack/struct.RingBufferWriter.html#method.get_vector

@stjepang (Member, Author) commented Apr 22, 2019

@mgeier Here's another try - how about the following interface?

impl<T> Consumer<T> {
    fn pop_batch(&mut self) -> PopBatch<'_, T>;
}

impl<T> Producer<T> {
    fn push_batch(&mut self) -> PushBatch<'_, T>;
}

struct PopBatch<'a, T>;

impl<T> PopBatch<'_, T> {
    fn len(&self) -> usize;
    fn pop(&self) -> Result<T, PopError>;
}

struct PushBatch<'a, T>;

impl<T> PushBatch<'_, T> {
    fn len(&self) -> usize;
    fn push(&self) -> Result<(), PushError<T>>;
}

The idea is that we start a batch by calling pop_batch() or push_batch(), then do a number of pop/push operations, and finally drop the PopBatch/PushBatch type to update the atomic indices inside the queue.

This way each individual pop/push operation in a batch will only do a single memcpy to transfer the value, whereas each regular non-batched pop/push operation updates the atomic indices every single time.
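The amortization idea can be sketched standalone (this is an illustration of the mechanism, not the proposed PushBatch API; the struct and method names are made up):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stage several plain writes into the buffer, then publish the whole
// batch with a single Release store of the tail index.
struct Ring {
    buf: Vec<i32>,
    tail: AtomicUsize,
}

impl Ring {
    fn push_batch(&mut self, items: &[i32]) {
        let len = self.buf.len();
        let tail = self.tail.load(Ordering::Relaxed);
        for (i, &x) in items.iter().enumerate() {
            self.buf[(tail + i) % len] = x; // plain writes, no atomics
        }
        // One atomic store makes the entire batch visible to the
        // consumer, instead of one store per pushed item.
        self.tail.store((tail + items.len()) % len, Ordering::Release);
    }
}
```

In the proposed design, the batch type's Drop impl plays the role of this final store.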

Would this work for you? Or is it really important that you're able to directly access contiguous slices of memory inside the queue?

@stjepang (Member, Author) commented Apr 22, 2019

Actually, we could then also add the ability for the batches to access direct slices of memory when T is a trivial type (i.e. it implements Copy) and zero-initializable:

impl<T: Copy + Zeroable> PopBatch<'_, T> {
    fn as_slice(&self) -> &[T];
    fn advance(&mut self, n: usize);
}

impl<T: Copy + Zeroable> PushBatch<'_, T> {
    fn as_slice(&mut self) -> &mut [T];
    fn advance(&mut self, n: usize);
}

For example, in order to push a bunch of elements, we would call push_batch(), grab a slice into the queue with as_slice(), then write data into the slice, and call advance(n) to move the write index by n positions forward.

stjepang added 2 commits Apr 22, 2019
@mgeier commented Apr 23, 2019

Thanks @stjepang, this looks very promising!

But we would still need two slices, right?

I guess something like (probably using the plural as_slices()?):

fn as_slice(&mut self) -> (&mut [T], &mut [T]);

I don't think that the additional push() and pop() methods of PushBatch and PopBatch objects are really needed. At least I cannot see a real use case. I think such a thing should only be provided if there is need for it in the community.

If anything, I think some kind of a "peek" functionality would be more interesting. But for the cases where the "slices" functionality is available, a separate "peek" method is not necessary, because we can simply "peek" into the slices.

and finally drop the PopBatch/PushBatch type to update the atomic indices inside the queue.

Assuming that len(), push() and pop() are not really needed, do we actually need those separate types?

What if advance() directly updates the atomic indices?

I guess it wouldn't make too much sense to call advance() multiple times (without updating the indices) and then update the indices when dropping PopBatch/PushBatch. Also, calling advance() (without updating the indices) and afterwards as_slice() again doesn't make much sense, either, right?

When I hear advance(), I expect some atomic operations to happen ...

If this works:

{
    let mut mybatch = myqueue.push_batch();
    let (slice1, slice2) = mybatch.as_slice();
    // write to slices
    mybatch.advance(some_amount);
}

... shouldn't something like this also work?

let (slice1, slice2) = myqueue.write_slices();
// write to slices
myqueue.advance_write_index(some_amount);

(I probably missed a mut somewhere)

Or is it really important that you're able to directly access contiguous slices of memory inside the queue?

Well I don't know if it would be that important for me personally. But if we want to convince low-level audio programmers to switch to Rust, we should try to provide library constructs with as little overhead as possible. Having to make unnecessary copies of audio data is not a good selling point!

@stjepang (Member, Author) commented Apr 23, 2019

@mgeier Thanks for your patience, I really appreciate it!

After thinking a bit more about it, I believe we could implement the following API:

impl<T: Copy + Default> Producer<T> {
    fn write_slices(&mut self) -> (&mut [T], &mut [T]);
    fn advance(&self, n: usize);
}

impl<T: Copy> Consumer<T> {
    fn read_slices(&self) -> (&[T], &[T]);
    fn advance(&self, n: usize);
}

Two tricks that are necessary for ensuring safety:

  1. The two advance() functions will check whether n is valid, i.e. they will make sure the index doesn't go too far (or else panic occurs).

  2. Function write_slices() will initialize the slices with T::default() values if they would otherwise hold uninitialized data. Note that if they hold some data that has been written and subsequently read, we don't have to do anything.

Neither of these two should really be a performance hit since they can be optimized away very well.

Would you be happy with this API?


Alternative 1: write_slices() could return slices of type &mut [MaybeUninit<T>], which would totally sidestep the problem of uninitialized memory. But then users of the API would have to deal with MaybeUninit, which could be a little annoying.

Alternative 2: It would also be possible for write_slices() to return ((*mut T, usize), (*mut T, usize)), but I suppose you'd rather not have the users of the API be forced to write unsafe code.

@mgeier commented Apr 23, 2019

Would you be happy with this API?

Yes, thanks, that looks very nice!

Please excuse my ignorance, but I have several more questions:

One open question for me is whether there should be additional convenience functions. Getting those pairs of slices is the most important functionality because it is the lowest-level interface. But probably there should also be convenience functions like the read_batch() and write_batch() functions you suggested above?
I'm definitely biased, because the JACK and PortAudio ringbuffers have such functions as well.
Probably it's not necessary to add more functions until somebody requests them?

As I said, I'm a Rust beginner, so I cannot really tell which interface makes sense and which doesn't.
Would it be feasible to use iterators on top of the suggested API?
Something like creating a mutable iterator for each of the returned slices, chaining the two and then assigning values via iteration?
Or should there be a separate API for iterators?

I'm also not sure if the functions read_available() and write_available() are missing or not.
Of course, one could call read_slices() and check their combined length, but probably a separate function for that would simplify things?
I guess it is a common use case to regularly check if the ringbuffer has enough data available and only get the slices when there's enough data.

Another open question is the T::default() thing you described above.
It seems strange to me that write_slices() would sometimes call T::default() and sometimes it wouldn't.
I'd like to believe you when you say that this can be optimized away, but it makes me feel a bit uneasy.
Wouldn't it be possible to initialize the whole memory in the constructor?
I guess if only push() is used, the memory doesn't have to be initialized and default-initialization would cause unwarranted overhead in that case, right?

TBH, I didn't know about MaybeUninit and still don't really understand it. But I don't think it would simplify the usage, right? And it must have a performance penalty, right?

Finally, I'm wondering whether the push()/pop() and the write_slices()/read_slices() APIs will ever be used on the same object. It seems to me that I would want to use either one or the other API, depending on whether I want to use chunks of (audio) data or some higher-level user-defined structs.
Would it make sense to have two different spsc types (which would share a large part of the implementation)?

@stjepang (Member, Author) commented Apr 23, 2019

Would it be feasible to use iterators on top of the suggested API?
Something like creating a mutable iterator for each of the returned slices, chaining the two and then assigning values via iteration?

You could do that with the standard iterator API:

let (s1, s2) = p.write_slices();
let iter = s1.iter_mut().chain(s2.iter_mut());

for slot in iter {
    // do something
}

p.advance(/* some number */);

Or with a slightly different API, similar to one of those I proposed before:

let mut batch = p.batch();

for v in my_values_to_push {
    batch.push(v);
}

drop(batch); // advance the write_index by the number of pushed items

I'm also not sure if the functions read_available() and write_available() are missing or not.
Of course, one could call read_slices() and check their combined length, but probably a separate function for that would simplify things?

I just assumed those functions are not needed because you can compute them from .len() and .capacity(). But then again, it doesn't hurt to also have .read_available() and .write_available(). So maybe we should add them, too. Good idea!

It seems strange to me that write_slices() would sometimes call T::default() and sometimes it wouldn't.
I'd like to believe you when you say that this can be optimized away, but it makes me feel a bit uneasy.
Wouldn't it be possible to initialize the whole memory in the constructor?

We would zero-initialize the whole buffer inside the constructor. Note that we can't initialize with T::default() because the constructor doesn't require T: Default. In order to be able to initialize with the default value, we'd need a special constructor that does require T: Default.

Fortunately, almost every piece of data that implements Copy is zero-initializable. We can check whether T::default() really does return a zeroed value of type T, and if so, write_slices() doesn't have to do anything because the whole buffer was zeroed in the constructor. This check will happen during compile time because the compiler is sufficiently smart.

If T::default() doesn't return a zeroed value (and we don't have the special constructor that requires T: Default), then it gets tricky. We only need to set the slots in slices (returned by write_slices()) to T::default() if they've never been written to before. You can think of this as lazy initialization of the buffer - instead of doing it during construction, we do it lazily on write_slices(). Also note that if you call write_slices() twice in succession, only the first call initializes memory.

So yeah, it's really just a bunch of boring and pedantic safety stuff. But the overall effect on performance is zero in 99% cases and minimal in the remaining 1% cases. So no big deal.
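The distinction between the easy and tricky cases can be checked directly. The `Gain` type below is a made-up example of a Copy type whose default is not all-zero bits:

```rust
// f32::default() is 0.0, whose representation is all-zero bits, so a
// zeroed buffer already holds valid default values. A user-defined
// Copy type's default need not be zeroed:
#[derive(Clone, Copy)]
struct Gain {
    factor: f32,
}

impl Default for Gain {
    fn default() -> Self {
        Gain { factor: 1.0 } // NOT a zeroed representation
    }
}

fn is_zero_bits(x: f32) -> bool {
    x.to_bits() == 0
}
```

For types like `Gain`, the lazy initialization in `write_slices()` described above would actually have to run.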

TBH, I didn't know MaybeUninit and still don't really understand it. But I don't think it would simplify the usage, right? And it must have a performance penalty, right?

MaybeUninit<T> has literally zero cost and is just a simple wrapper type indicating that the piece of T may be uninitialized. So if you want to read from it, you must open an unsafe {} block and call .read() on it, promising you're 100% sure the data inside is initialized.

Fortunately, writing into a MaybeUninit<T> is perfectly safe. And that's great because we don't want to read anything from the [MaybeUninit<T>] slices returned by write_slices() anyway. We only want to write into them!

Again, more boring pedantic stuff. We're just making sure that if the user really attempts to do something silly like read from slices returned by write_slices(), nothing explodes. :)
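The write-then-read contract described above looks like this in practice (a standalone illustration using std, not the queue's code):

```rust
use std::mem::MaybeUninit;

// Writing into a MaybeUninit slot is safe; reading requires an unsafe
// promise that the slot was initialized first.
fn fill_and_sum(n: usize) -> u32 {
    let mut slots: Vec<MaybeUninit<u32>> = Vec::with_capacity(n);
    slots.resize_with(n, MaybeUninit::uninit);

    // Safe: we only write, never observe the uninitialized contents.
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write(i as u32 + 1);
    }

    // Unsafe: we promise every slot was written in the loop above.
    slots
        .iter()
        .map(|slot| unsafe { slot.assume_init_read() })
        .sum()
}
```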

Finally, I'm wondering whether the push()/pop() and the write_slices()/read_slices() APIs will ever be used on the same object. It seems to me that I would want to use either one or the other API, depending on whether I want to use chunks of (audio) data or some higher-level user-defined structs.
Would it make sense to have two different spsc types (which would share a large part of the implementation)?

It's probably true that one would either use single-value or batched operations. But does it hurt if we support both? So if we support both, push()/pop() will always be available, while write_slices()/read_slices() will be available only when T: Copy.

My feeling is that it makes sense to have different types only if supporting both types of operations simultaneously somehow hurts performance or increases the risk of bugs. But that doesn't seem true to me. Is my thinking correct here?

@mgeier commented Apr 24, 2019

Thanks for the answers, I think I'm starting to understand!
And thanks for the iterator examples! Those look very nice, so I think we don't need any additional API for that.

So writing to MaybeUninit<T> is safe and without overhead, but users still have to use the .write() method all the time, right?

But I think the bigger problem is that we cannot force users to .write() to all the items before they advance(), so we would still have to check for "unwritten" items and call T::default() before being able to read from them, right?
If I understand correctly, we therefore don't really gain anything from MaybeUninit<T> (except a worse user experience).

It is good to know that types that have a default of all-zero-bits are recognized at compile time and don't need a T::default() call at runtime (this should probably be mentioned in the docs!).
I guess that's the case for the floating-point types f32 and f64, so I see no problem in requiring Default and I think we don't need a special constructor with T: Default.

I still have yet another question about the Consumer assuming some non-trivial user-defined type:
Why exactly does the Consumer need Copy?
AFAIK, Copy implies the absence of Drop, therefore Copy would ensure that no destructors have to be called in advance(), right?
Are there other reasons for Copy in advance()?
And, more importantly, does read_slices() even need Copy?
If not, read_slices() could even be used for non-trivial types to get a "peek" functionality (which may be helpful even if advance() is not available).

I just assumed those functions are not needed because you can compute them from .len() and .capacity(). But then again, it doesn't hurt to also have .read_available() and .write_available().

Oh, I didn't even see the .len() method in there.

I think .len() and .capacity() are completely irrelevant from a user's perspective and should be removed from the public API. Users should only care how much space there is for reading and writing.

The methods .is_empty() and .is_full() are not strictly necessary if we get .read_available() and .write_available(), but they may still be nice for convenience (and they might even be implemented slightly more efficiently?).

But Producer::is_empty() and Consumer::is_full() are superfluous, right?

But does it hurt if we support both?

It might confuse newcomers.

I was previously thinking that we would need different constructors to avoid the Default requirement. But you've cleared that up and it looks like Default is not as bad as I thought and we don't need different constructors. So there's also no real need to separate the two APIs.

My feeling is that it makes sense to have different types only if supporting both types of operations simultaneously somehow hurts performance or increases the risk of bugs. But that doesn't seem true to me. Is my thinking correct here?

AFAICT, there's no problem here.
I think it would be perfectly safe to mix the two APIs, I just think there would be no real use case for that.
And other than an increased API surface and probably the need for slightly more documentation, I don't see any downsides.

And if it is possible to have read_slices() even for non-Copy types it would actually make no sense to separate the two APIs.

@stjepang (Member, Author) commented Apr 24, 2019

So writing to MaybeUninit<T> is safe and without overhead, but users still have to use the .write() method all the time, right?

Correct.

But I think the bigger problem is that we cannot force users to .write() to all the items before they advance(), so we would still have to check for "unwritten" items and call T::default() before being able to read from them, right?
If I understand correctly, we therefore don't really gain anything from MaybeUninit<T> (except a worse user experience).

Oh, I forgot about that! You're totally right.

Why exactly does the Consumer need Copy?

Because otherwise we can't "take" items from the slices returned by read_slices(). Suppose you have a &[T] and T is not Copy. You can't do let x = slice[0] to take a value from the slot; that won't compile. You might instead clone the value from the slot with let x = slice[0].clone() if T is Clone. But sometimes cloning is expensive.

An alternative to that would be to put every T inside an Option so that read_slices() returns slices of type &mut [Option<T>]. Then you can do let x = slice[0].take().unwrap() to take a value from a slot. But using options like that incurs time and space overhead, which you probably don't want.
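The three options just discussed (copy out, clone out, Option::take) can be compared in a small standalone sketch (hypothetical data, not the queue's code):

```rust
// Taking values out of a borrowed slice, for different kinds of T.
fn take_out() -> (i32, String, Option<String>) {
    // Copy type: moving out of the slice is a bitwise copy.
    let nums = [10, 20, 30];
    let a = nums[0];

    // Non-Copy, Clone type: `let s = strs[0];` would not compile,
    // but cloning works, at a potential cost.
    let strs = [String::from("hello")];
    let b = strs[0].clone();

    // Option wrapper: take() moves the value out, leaving None behind,
    // at the cost of the extra Option in every slot.
    let mut slots = [Some(String::from("world"))];
    let c = slots[0].take();

    (a, b, c)
}
```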

But Producer::is_empty() and Consumer::is_full() are superfluous, right?

They exist in case someone needs them. shrug Lots of collection types in Rust have methods .len(), .capacity(), .is_empty(), and .is_full() so I included them here, too. It's just a common idiom.

I think it would be perfectly safe to mix the two APIs, I just think there would be no real use case for that.

I'm thinking maybe it's good to have push() and pop() because they're a bit easier to use. Like, one might want to use them to quickly code up a prototype and then later on switch to batched reads/writes for that extra performance.

@mgeier commented Apr 25, 2019

You can't do let x = slice[0] to take a value from the slot, that won't compile.

Sure, but isn't that the user's problem for trying to move a non-Copy thing out of a (borrowed) slice?

You might instead clone the value from the slot with let x = slice[0].clone() if T is Clone.

Sure, but what I would be more interested in (given some non-trivial user-defined type) is taking a reference let x = &slice[0], which should work for unconstrained T, right?

This way I could do something like:

let x = &slice[0];
if x.some_attribute > 42 {
    let x = consumer.pop();
    // do something with x
} else {
    // don't pop()
}
// don't call consumer.advance()!

AFAICT, getting a reference to a non-Copy element in the slice should work. Is there another reason for requiring Copy?

BTW, this would be an example where the two APIs would be mixed. And I think this would be a somewhat reasonable scenario.

But using options like that incurs time and space overhead, which you probably don't want.

No, I would definitely not want to be forced to use Option<T> in this situation.

Lots of collection types in Rust have methods .len(), .capacity(), .is_empty(), and .is_full() so I included them here, too.

As I said, I can see how .is_empty(), and .is_full() might be useful even if they are not strictly necessary.

It's just a common idiom.

Yeah, but the spsc queue isn't really a "typical" collection type.
It's more akin to a std::sync::mpsc::channel (which has none of those methods), isn't it?

I would go so far as to say that providing .capacity() might be reasonable (because we might have forgotten what we specified in the constructor), but .len() really doesn't make any sense (given that we have "read available" and "write available")!

BTW, calling the constructor argument cap is inconsistent with Vec::with_capacity(capacity: usize)!
But it looks like this mistake has been made before in crossbeam::queue::ArrayQueue and crossbeam::channel::bounded. So I guess it has to be cap to at least have internal consistency.

Like, one might want to use them to quickly code up a prototype and then later on switch to batched reads/writes for that extra performance.

You are right, this could happen.

mgeier commented Apr 25, 2019

I forgot something: I still think Producer::is_empty() and Consumer::is_full() are superfluous. At best they are useless, but they may be actually confusing and are definitely a waste of space and time for everyone.

I've seen that crossbeam::channel::Sender::is_empty() and crossbeam::channel::Receiver::is_full() also exist. I also think it was a mistake to create those functions, but I understand if you want to have those for spsc for internal consistency, however useless they might be.

stjepang commented Apr 25, 2019

Sure, but isn't that the user's problem for trying to move a non-Copy thing out of a (borrowed) slice?

Yeah, just wanted to point out that we'd either have to support taking out only Copy types or always wrap T inside Option. :)

AFAICT, getting a reference to a non-Copy element in the slice should work. Is there another reason for requiring Copy?

Oh, I see now. So you just want to read values by reference (i.e. borrow them) and not really take them out (i.e. "consume" them by value). I just kinda assumed you'd always want to consume the read values.

It's more akin to a std::sync::mpsc::channel (which has none of those methods), isn't it?

Heh, it does actually :) See Sender and Receiver.

But it looks like this mistake has been made before in crossbeam::queue::ArrayQueue and crossbeam::channel::bounded. So I guess it has to be cap to at least have internal consistency.

We should just rename those arguments to capacity then. The names only appear in the generated docs and are not really a part of the API anyway.

I forgot something: I still think Producer::is_empty() and Consumer::is_full() are superfluous. At best they are useless, but they may be actually confusing and are definitely a waste of space and time for everyone.

That's fair enough! I guess we could remove them then as they might be a net negative overall.

mgeier commented Apr 25, 2019

Oh, I see now. So you just want to read values by reference and not really take them out (i.e. "consume" them by value). I just kinda assumed you'd always want to consume the read values.

Well, I want to consume them eventually.
But I might want to "peek" at some of the items beforehand to check if I want to consume them already or probably wait for more data.

I haven't actually used such a functionality, but I might at some point. The use case would be a message queue where I have special messages "start bundle" and "stop bundle". All other messages between those should happen at the same time (= in the same audio block). Whenever I "peek" a "start bundle", I check if I also find a "stop bundle" in the queue before I handle all messages. If there is no "stop bundle", I just wait until the next audio block. I'm not sure if such a thing would work reliably, for now it's just an idea ...

Having said all this, isn't it generally a good idea to keep the trait bounds to a minimum?

Is there a reason to add T: Copy if it's not actually necessary?

Heh, it does actually :)

No it doesn't!

You are talking about crossbeam_channel:

https://docs.rs/crossbeam-channel/0.3.8/crossbeam_channel/struct.Sender.html
https://docs.rs/crossbeam-channel/0.3.8/crossbeam_channel/struct.Receiver.html

I'm talking about std::sync::mpsc::channel:

https://doc.rust-lang.org/std/sync/mpsc/struct.Sender.html
https://doc.rust-lang.org/std/sync/mpsc/struct.Receiver.html

We should just rename those arguments then.

OK, that sounds good.

Interestingly, the names aren't even consistent in Rust itself: rust-lang/rust#60271

I guess we could remove them then as they might be a net negative overall.

If at a later point somebody actually needs them (which I doubt will ever happen), it should be easy to add them. Removing them later is probably not that easy.

stjepang commented Apr 25, 2019

I haven't actually used such a functionality, but I might at some point. The use case would be a message queue where I have special messages "start bundle" and "stop bundle".

Very interesting! I'll think about this use case some more - but pretty sure we can devise something to support it well.

Here's an idea. We could have two methods:

  • read_slices() that returns (&[T], &[T]) as usual.
  • pop_iter() that returns PopIter<T>, which implements Iterator<Item = T>. You can use it to take elements out one by one. The read index is finally updated once you stop iterating (i.e. drop PopIter<T>) so that only one atomic operation actually occurs under the hood.
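To make the "commit on drop" behavior concrete, here is a toy, single-threaded sketch of the `pop_iter()` idea (names and struct layout are hypothetical, not the PR's actual implementation; the real queue would use an atomic store in `Drop` instead of a plain field write):

```rust
// Toy ring buffer demonstrating pop_iter(): items are yielded one by one,
// but the read index is written back only once, when the iterator is dropped.
struct Ring<T> {
    buf: Vec<Option<T>>,
    head: usize, // next slot to read
    tail: usize, // next slot to write
}

struct PopIter<'a, T> {
    ring: &'a mut Ring<T>,
    pos: usize, // local cursor; committed to `head` on drop
}

impl<T> Ring<T> {
    fn pop_iter(&mut self) -> PopIter<'_, T> {
        let pos = self.head;
        PopIter { ring: self, pos }
    }
}

impl<T> Iterator for PopIter<'_, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        if self.pos == self.ring.tail {
            return None; // nothing left to read
        }
        let val = self.ring.buf[self.pos].take();
        self.pos = (self.pos + 1) % self.ring.buf.len();
        val
    }
}

impl<T> Drop for PopIter<'_, T> {
    fn drop(&mut self) {
        // In the real queue this would be a single atomic Release store.
        self.ring.head = self.pos;
    }
}

fn main() {
    let mut ring = Ring { buf: vec![Some(10), Some(20), Some(30), None], head: 0, tail: 3 };
    {
        let mut it = ring.pop_iter();
        assert_eq!(it.next(), Some(10));
        assert_eq!(it.next(), Some(20));
    } // dropping the iterator commits the read index once
    assert_eq!(ring.head, 2);
}
```

Because `PopIter` mutably borrows the ring, no other reads can race with the deferred index update.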

No it doesn't!

You are talking about crossbeam_channel:

Sorry, my bad! FWIW, .len() and .capacity() are useful in crossbeam-channel because it allows Servo to estimate how much memory the channel is currently using.

mgeier commented Apr 26, 2019

Your PopIter idea sounds really interesting! It looks like a more ergonomic version of your PopBatch::pop() idea above (#338 (comment)).

It may seem a bit asymmetric to have only Consumer::pop_iter() and nothing on the Producer side.
OTOH, iterating over the Producer can easily be implemented on top of the write_slices() function, as you have shown in #338 (comment). It would also be easy to create a "peek" iterator (with Item = &T) in user code. So I don't think it's really worth adding more iterators to the API.

FWIW, .len() and .capacity() are useful in crossbeam-channel because it allows Servo to estimate how much memory the channel is currently using.

As I said above, I think .capacity() might be OK (but I guess it will not be heavily used). But Servo doesn't need .len(), right?

AFAICT, .len() is utterly useless.

stjepang commented Apr 26, 2019

Okay, to summarize, the Consumer side has (we can discuss final naming choices later):

impl<T> Consumer<T> {
    fn read_available(&self) -> usize;
    fn read_slices(&self) -> (&[T], &[T]);

    fn pop(&mut self) -> Result<T, PopError>;
    fn pop_batch(&mut self) -> PopBatch<'_, T>;
}

struct PopBatch<'a, T>;

impl<T> PopBatch<'_, T> {
    fn read_available(&self) -> usize;
    fn pop(&mut self) -> Result<T, PopError>;
}

impl<T> Iterator for PopBatch<'_, T> {
    type Output = T;
}

The producer side is analogous, except we don't implement Iterator for PushBatch and don't have write_slices() due to problems with uninitialized memory:

impl<T> Producer<T> {
    fn write_available(&self) -> usize;

    fn push(&mut self, val: T) -> Result<(), PushError<T>>;
    fn push_batch(&mut self) -> PushBatch<'_, T>;
}

struct PushBatch<'a, T>;

impl<T> PushBatch<'_, T> {
    fn write_available(&self) -> usize;
    fn push(&mut self, val: T) -> Result<(), PushError<T>>;
}
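For intuition, the minimal push/pop surface above can be sketched as a heavily simplified SPSC ring buffer. This is only an illustration under simplifying assumptions (it wraps slots in `Option` and wastes one slot to distinguish full from empty), not the actual implementation in this PR:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

struct Inner<T> {
    head: AtomicUsize, // next slot to read (owned by the consumer)
    tail: AtomicUsize, // next slot to write (owned by the producer)
    buf: Vec<UnsafeCell<Option<T>>>,
}

unsafe impl<T: Send> Send for Inner<T> {}
unsafe impl<T: Send> Sync for Inner<T> {}

pub struct Producer<T>(Arc<Inner<T>>);
pub struct Consumer<T>(Arc<Inner<T>>);

pub fn spsc<T>(capacity: usize) -> (Producer<T>, Consumer<T>) {
    let inner = Arc::new(Inner {
        head: AtomicUsize::new(0),
        tail: AtomicUsize::new(0),
        // One extra slot so that head == tail unambiguously means "empty".
        buf: (0..capacity + 1).map(|_| UnsafeCell::new(None)).collect(),
    });
    (Producer(inner.clone()), Consumer(inner))
}

impl<T> Producer<T> {
    pub fn push(&mut self, val: T) -> Result<(), T> {
        let inner = &self.0;
        let tail = inner.tail.load(Ordering::Relaxed);
        let next = (tail + 1) % inner.buf.len();
        if next == inner.head.load(Ordering::Acquire) {
            return Err(val); // full
        }
        unsafe { *inner.buf[tail].get() = Some(val) };
        inner.tail.store(next, Ordering::Release);
        Ok(())
    }
}

impl<T> Consumer<T> {
    pub fn pop(&mut self) -> Option<T> {
        let inner = &self.0;
        let head = inner.head.load(Ordering::Relaxed);
        if head == inner.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let val = unsafe { (*inner.buf[head].get()).take() };
        inner.head.store((head + 1) % inner.buf.len(), Ordering::Release);
        val
    }
}

fn main() {
    let (mut p, mut c) = spsc::<i32>(2);
    assert_eq!(p.push(1), Ok(()));
    assert_eq!(p.push(2), Ok(()));
    assert_eq!(p.push(3), Err(3)); // capacity 2, so the queue is full
    assert_eq!(c.pop(), Some(1));
    assert_eq!(c.pop(), Some(2));
    assert_eq!(c.pop(), None);
}
```

Each index is written by exactly one side, which is what makes the Acquire/Release pairing sufficient without compare-and-swap loops.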
mgeier commented Apr 26, 2019

we [...] don't have write_slices() due to problems with uninitialized memory

Based on your comments above, I thought those problems could be solved!

I was expecting something like this (which you suggested in #338 (comment)):

impl<T: Copy + Default> Producer<T> {
    fn write_slices(&mut self) -> (&mut [T], &mut [T]);
    fn advance(&self, n: usize);
}

And I'm missing advance() in the Consumer:

impl<T: Copy> Consumer<T> {
    fn advance(&self, n: usize);
}

The big (and I guess only) difference to your original suggestion is that Consumer::read_slices() doesn't need T: Copy.

I have no opinion on that, but your push()/pop() signatures seem to have changed from &self to &mut self, see also https://github.com/crossbeam-rs/crossbeam/pull/338/files#r264038883.

I was a bit confused by the seemingly unnecessary duplication of Consumer::read_available() and PopBatch::read_available(), but I think I understand it now.

Just to make sure my understanding is correct:

  • Between calling Consumer::pop_batch() and dropping the returned PopBatch object, it is not possible to call Consumer::read_available(). Therefore, we need an additional PopBatch::read_available().
  • For each successful call to PopBatch::pop(), the value returned from PopBatch::read_available() decreases, even though the atomic read index hasn't been changed yet.
  • When Consumer::pop_batch() is called, the atomic write index is cached and not checked anymore for subsequent PopBatch::pop() calls.
  • Having a single PopBatch object, I can switch between calling .pop() and iterating over it and calling .pop() again. The value returned from PopBatch::read_available() decreases each time.
  • In summary, a sequence of statements like this would work (and the commented ones wouldn't compile), assuming there are enough items available for reading:
    let n1 = consumer.read_available();
    let (slice1, slice2) = consumer.read_slices();
    // read some values from the slices (or not)
    // don't call consumer.advance()!
    let batch = consumer.pop_batch();
    let n2 = batch.read_available();  // n1 == n2, unless someone has written to the Producer
    // wait a bit
    let n3 = batch.read_available();  // n2 == n3, even if someone has written to the Producer
    let x = batch.pop();
    let n4 = batch.read_available();  // n4 == (n3 - 1), even if someone has written to the Producer
    let x = batch.next();
    let n5 = batch.read_available();  // n5 == (n4 - 1), even if someone has written to the Producer
    
    //let n = consumer.read_available();  // Compiler error
    //let (slice1, slice2) = consumer.read_slices();  // Compiler error
    //let x = consumer.pop();  // Compiler error

Finally, I think your Iterator implementation above is a bit off. If I'm not mistaken, it should be something like this:

impl<T> Iterator for PopBatch<'_, T> {
    type Item = T;
    fn next(&mut self) -> Option<Self::Item>;
}
stjepang commented Apr 26, 2019

Based on your comments above, I thought those problems could be solved!

Ugh, sorry! Yeah, with Copy and Default we can have write_slices().

I've just created a new document where we can write the code together and add comments. Might be easier than going back and forth on Github.

In this PR, I'll just add the minimal possible interface (push/pop only) and then we can follow up with a new PR that adds more features to it.

Finally, I think your Iterator implementation above is a bit off.

I'm just omitting fn next for brevity because it doesn't add any information ;)

stjepang commented Apr 26, 2019

I have no opinion on that, but your push()/pop() signatures seem to have changed from &self to &mut self, see also https://github.com/crossbeam-rs/crossbeam/pull/338/files#r264038883.

If we don't make pop() take &mut self, then one could do the following:

  1. Grab slices using read_slices().
  2. Call pop() a couple times.
  3. Read invalid data from the slices retrieved in the first step.

But if we make pop() take a &mut self, the borrow checker will complain.

If method read_slices() didn't exist, then it'd be totally okay to take a &self. Alternatively, we can make read_slices() take a &mut self so that pop() doesn't have to. But I find the inverse a cleaner solution.

Between calling Consumer::pop_batch() and dropping the returned PopBatch object, it is not possible to call Consumer::read_available(). Therefore, we need an additional PopBatch::read_available().

I agree.

For each successful call to PopBatch::pop(), the value returned from PopBatch::read_available() decreases, even though the atomic read index hasn't been changed yet.

Correct.

When Consumer::pop_batch() is called, the atomic write index is cached and not checked anymore for subsequent PopBatch::pop() calls.

Yes.

Having a single PopBatch object, I can switch between calling .pop() and iterating over it and calling .pop() again. The value returned from PopBatch::read_available() decreases each time.

Yes. By the way, it's worth mentioning that PopBatch mutably borrows Consumer so you can't call pop() while there's an active PopBatch.

mgeier commented Apr 26, 2019

I've just created a new document

Thanks, it looks great!

I can't wait to try out this API!

In this PR, I'll just add the minimal possible interface (push/pop only) and then we can follow up with a new PR that adds more features to it.

OK, cool.

And thanks for the explanation about &mut self, that makes sense!

stjepang commented Apr 27, 2019

All right, I have slimmed down the API (removed is_empty(), is_full(), len()) and will leave the PR open so that everyone can take one final look.

After that I'll follow up with several PRs introducing the new methods we have discussed.

@mgeier By the way, in case you live in Berlin, I'll be tomorrow in co.up coworking space in Adalbertstraße. :)

jeehoonkang left a comment

@stjepang It seems awesome. Thanks.

/// The queue capacity.
cap: usize,

/// Indicates that dropping a `Buffer<T>` may drop elements of type `T`.

mgeier Jun 7, 2019

Buffer -> Inner

mgeier commented Jul 31, 2019

Sorry if that's a naive question: Is it possible to make the ring buffer UnwindSafe (or RefUnwindSafe?)?

I would like to use it in a FFI function within catch_unwind() and I get this error:

error[E0277]: the type `std::cell::UnsafeCell<usize>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
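One possible workaround on the caller's side (not a change to the queue itself) is `std::panic::AssertUnwindSafe`, which opts a closure's captures out of the unwind-safety check. A minimal sketch with a hypothetical `Consumer` type containing an `UnsafeCell`, analogous to the error above:

```rust
use std::cell::UnsafeCell;
use std::panic::{catch_unwind, AssertUnwindSafe};

// Interior mutability makes this type neither UnwindSafe nor RefUnwindSafe,
// so a plain `catch_unwind(|| ...)` capturing it fails with E0277.
struct Consumer {
    head: UnsafeCell<usize>,
}

fn main() {
    let consumer = Consumer { head: UnsafeCell::new(0) };

    // Wrapping the closure in AssertUnwindSafe asserts (on our responsibility)
    // that observing `consumer` after a panic would be fine.
    let result = catch_unwind(AssertUnwindSafe(|| {
        unsafe { *consumer.head.get() += 1 };
        unsafe { *consumer.head.get() }
    }));
    assert_eq!(result.ok(), Some(1));
}
```

Whether that assertion is actually justified depends on the queue's invariants after a panic, which is exactly the question for the maintainers here.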
mgeier commented Sep 12, 2019

In case somebody is interested in a use case for the SPSC ring buffer, I'm using it in https://github.com/AudioSceneDescriptionFormat/asdf-rust/blob/master/src/streamer.rs.

In this project I need ring buffers for both use cases:

  • plain old data (with Copy), in my case f32 audio samples
  • non-trivial objects (without Copy), in my case the consumer-end of another ring buffer

Since the "slices" API is not yet available, I couldn't directly implement the first case and had to come up with a somewhat complicated work-around.

What about merging this PR and moving forward with the "slices" API?

What is the timeline of the previously discussed re-structuring of the modules/structs/functions in crossbeam::queue?

If the re-structuring takes some more time, what about merging this PR now and doing the re-structuring later together with everything else?

AFAICT we've reached consensus regarding the names of most functions and structs in this PR, the "only" open questions are with regards to the module hierarchy and the "constructor" functions, right?

uklotzde commented Oct 13, 2019

An extension of this work enhanced with batch operations on slices can be found here:

https://github.com/uklotzde/crossbeam/tree/spsc2

Some background: I'm a developer of the Mixxx team. Recently I came up with a new SPSC queue implementation to replace our naive C++ implementation and the error-prone PortAudio C code. The enthusiasm for my initiative was limited, so I decided to abandon the C++ work. Since I didn't find a versatile Rust implementation I ported and extended my code.

Publishing another implementation in a separate crate would further fragment the ecosystem and crossbeam seems to be the right place where those concurrency tools should reside.

I've tried to closely follow the existing design in this PR with some notable exceptions:

  • The names of operations where an error result is not considered exceptional behavior start with try_, i.e. try_push() and try_pop(). This is consistent with the mpsc facilities in std.
  • The error results should explicitly express their intention by using an enum with corresponding (tuple) variants. Even if only a single variant exists.
  • Completely hiding mutation behind interior mutability is disputable. Modifying the contents of the queue should require exclusive access, i.e. &mut self.
  • The idea to add cache pads to the buffer upon allocation is borrowed from here: https://github.com/rigtorp/SPSCQueue. Possible extension: Instead of only requiring a single empty slot between writer and reader one could reserve multiple slots occupying a whole cache line. This would prevent false sharing in situations when the queue is almost full. But it cannot prevent false sharing in the opposite situation when the queue is almost empty. Might not be worth the effort.
  • Both an allocating constructor with_capacity() (renamed) and a generic constructor new() are provided. The latter works for no_std and allows providing a static or externally allocated buffer that is managed by the owner of the queue.
  • The raw batch operations in the generic producer/consumer are declared as unsafe. Imposing additional trait bounds on T as suggested in the preceding comments removes this burden.
  • I'm a strong advocate of extensively using debug assertions in code that could not be tested exhaustively. For detecting violations of invariants or for catching wrong implicit assumptions early and confidently during development.
  • This more versatile implementation is slightly slower when running the benchmarks: expect roughly +10% total time. The benefits of the batch operations on slices are not reflected in those numbers yet.

I didn't want to open a competing PR and instead am asking for your feedback here. It is still work in progress.
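The cache-padding point above can be illustrated with a tiny sketch: keep the producer's and consumer's indices on separate cache lines so that updating one does not invalidate the other core's cached copy (false sharing). The 64-byte cache line is an assumption (typical for x86-64), and `crossbeam-utils` already ships a `CachePadded` type along these lines:

```rust
use std::sync::atomic::AtomicUsize;

// Force each wrapped value onto its own (assumed 64-byte) cache line.
#[repr(align(64))]
struct CachePadded<T>(T);

struct Indices {
    head: CachePadded<AtomicUsize>, // written only by the consumer
    tail: CachePadded<AtomicUsize>, // written only by the producer
}

fn main() {
    // Each field starts on its own 64-byte boundary, so the two indices
    // can never share a cache line.
    assert_eq!(std::mem::align_of::<CachePadded<AtomicUsize>>(), 64);
    assert_eq!(std::mem::size_of::<Indices>(), 128);
    let _ = Indices {
        head: CachePadded(AtomicUsize::new(0)),
        tail: CachePadded(AtomicUsize::new(0)),
    };
}
```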

@@ -142,7 +142,7 @@ unsafe impl<T: Send> Send for SegQueue<T> {}
unsafe impl<T: Send> Sync for SegQueue<T> {}

impl<T> SegQueue<T> {
/// Creates a new unbounded queue.
/// Creates a unbounded queue.

jonas-schievink Nov 9, 2019

Suggested change
/// Creates a unbounded queue.
/// Creates an unbounded queue.
GeorgeScrivener commented Jan 25, 2020

Any update on this PR?
Looks like it'll be really useful for audio work. I'm happy to help if there's anything still required to get it in

xacrimon commented Feb 8, 2020

Yeah I really want this in. Updates would be nice.
