Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce crossbeam-channel #22

Merged
3 commits merged into from Nov 26, 2017
Merged

Introduce crossbeam-channel #22

3 commits merged into from Nov 26, 2017

Conversation

ghost
Copy link

@ghost ghost commented Nov 9, 2017

Introduce crate crossbeam-channel, which aims to be an upgrade over std::sync::mpsc from the standard library in pretty much all aspects: features, convenience, and performance.

Rendered RFC
Prototype implementation
Selection macro
Benchmarks

Graphs

cc @jdm @SimonSapin @asajeffrey - we've chatted about this in #servo.
cc @alexcrichton @aturon @mgattozzi - talked about this at RustConf.
cc @BurntSushi - this might be a replacement for chan.
cc @insanitybit - finally! :)

```rust
pub mod select {
pub fn send<T>(tx: &Sender<T>, value: T) -> Result<(), T>;
pub fn recv<T>(rx: &Receiver<T>) -> Result<T, ()>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor nit, but I'd use a ZST error type like struct NotReady; or something here, rather than ().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the symmetry between Result<(), T> and Result<T, ()>.

If recv returns a Result<T, SelectRecvError>, then send should probably similarly return a Result<(), SelectSendError<T>>. That means we'd have to regain ownership from a failed send operation like this:

if let Err(err) = sel.send(&tx, name) {
    // Sending failed. Regain ownership of the message.
    name = err.into_inner();
} else {
    // Success! The message was sent.
    break;
}

Calling .into_inner() is slightly more unergonomic, but it's not too bad. Possible alternatives could be .into() and .take(). Do you agree with this reasoning? :)

@ghost
Copy link
Author

ghost commented Nov 9, 2017

In the Reddit comment thread, /u/lurebat argues that we should have explicit Select structs instead of hidden thread-local magic.

I just don't like state I can't see. Usually when I call a free function with an argument the only thing I expect to modify is that argument. But here the function doesn't even take a mutable reference and it modified a hidden state that I can't even access. I feel that it's more rustic to go with the explicit option by default, and I don't see why not have it here.

I'm sympathetic to the argument.

Here's an example of selection using hidden thread-local state:

loop {
    if let Ok(msg) = select::recv(&rx1) {
        println!("{}", msg);
        break;
    }
    if let Ok(msg) = select::recv(&rx2) {
        println!("{}", msg);
        break;
    }
    if select::timeout(Duration::from_millis(100)) {
        break;
    }
}

And this is what it would like with an explicit Select struct:

let mut sel = Select::with_timeout(Duration::from_millis(100));
loop {
    if let Ok(msg) = sel.recv(&rx1) {
        println!("{}", msg);
        break;
    }
    if let Ok(msg) = sel.recv(&rx2) {
        println!("{}", msg);
        break;
    }
    if sel.timed_out() {
        break;
    }
}

We could require the sel variable to be mut or not - it doesn't really matter because we can use Cells anyway. But perhaps requiring mut makes it clearer that there is some kind of mutable state?

Note that this way we can also make timeouts much less magical.

Any thoughts? Please upvote if you think we should go with explicit Select struct, and downvote if you prefer hidden thread-local state.

@nikomatsakis
Copy link

nikomatsakis commented Nov 9, 2017

We could require the sel variable to be mut or not - it doesn't really matter because we can use Cells anyway. But perhaps requiring mut makes it clearer that there is some kind of mutable state?

One advantage to not requiring mutable is that people may want to put the Select struct into some sort of context that gets copied around. If you require mut, then they would need to use a RefCell or some such thing somewhat needlessly.

@aturon
Copy link

aturon commented Nov 9, 2017

I wish I had time to give this a proper review, but meantime, just wanted to say: thanks so much, @stjepang (and team), for what you've done for Crossbeam! I couldn't be happier.

@bstrie
Copy link

bstrie commented Nov 9, 2017

@stjepang I'm in favor of the explicit struct approach, though it seems as though we could also offer the implicit approach as well if people are worried about verbosity, yes?

As for the need to explicitly break, is there some possible simplification to be had in association with the catch RFC? Keep in mind that I am completely unclear on the current state of that RFC (both in design and implementation), but would you find something like the following a bit nicer?

catch {
    loop {
        sel.recv(&rx1, |msg|
            println!("{}", msg);
        )?;
        sel.timed_out()?;
    }
}

If nothing else, this would seem to have the benefit of warning the user if a ? is forgotten, via must_use.

@TimNN
Copy link

TimNN commented Nov 9, 2017

I'm personally in favor of the explicit struct approach as well, however I believe that a macro, which also enforces the loop structure, could work quite well and wouldn't be too complicated, for example:

https://gist.github.com/TimNN/669d030cc99e4dcefc2b5bc6203a7019

@droundy
Copy link

droundy commented Nov 9, 2017

I notice that the select algorithm as described (in the "Rendered" link) is not actually fair. Is there a strong performance reason not to go with a fair implementation? I would expect a fair implementation to either check just one option each time through the loop (instead of about half) or to check all the options in a random order at the beginning of the loop.

The unfairness would come around if you have a select loop with many options, and a pair of them are always triggered simultaneously. If the two that are triggered simultaneously are next to each other in the loop, then the first one will be chosen far more often than the second one. Maybe this is unimportant, and maybe the performance cost of making it fair is not worthwhile. But it seems worth investigating.

@ebarnard
Copy link

ebarnard commented Nov 9, 2017

To make the explicit struct approach a bit neater and remove the need for explicit breaks something like this might work.

If you have a mutable reference or are constructing on the spot:

for sel in Select::with_timeout(Duration::from_millis(100)) {
    if let Ok(msg) = sel.recv(&rx1) {
        println!("{}", msg);
    }
    if let Ok(msg) = sel.recv(&rx2) {
        println!("{}", msg);
    }
}

or if you needed to store a Select in a struct behind an & reference:

let select = Select::with_timeout(Duration::from_millis(100));

for sel in select.iter() {
    if let Ok(msg) = sel.recv(&rx1) {
        println!("{}", msg);
    }
    if let Ok(msg) = sel.recv(&rx2) {
        println!("{}", msg);
    }
}

@droundy
Copy link

droundy commented Nov 9, 2017

@ebarnard I like that idea. The fewer invariants the API forces the user to manage (in this case, break on success of any rule), the better!

@TimNN
Copy link

TimNN commented Nov 9, 2017

The for API looks nice, however has the disadvantage that you cannot return a value using break $something (unless Rust has changed recently and I didn‘t notice).

@gmorenz
Copy link

gmorenz commented Nov 9, 2017

Could we allow for both the for api, and the loop/break with struct API?

Alternatively you should still be able to wrap it in a function and use return to emulate break $something.

@ebarnard
Copy link

ebarnard commented Nov 9, 2017

@TimNN @gmorenz

You can return a value from a loop using a similar construction to before.

let mut select = Select::new()

let msg = 'select: loop {
    // Panics if you forget to break
    let sel = select.next().unwrap()

    for rx in &receivers {
        if let Ok(msg) = sel.recv(rx) {
            break 'select msg;
        }
    }
};

@ghost
Copy link
Author

ghost commented Nov 9, 2017

@bstrie

I'm in favor of the explicit struct approach, though it seems as though we could also offer the implicit approach as well if people are worried about verbosity, yes?

Well, I'd rather choose one or the other. Having two separate, but marginally different interfaces for doing the same thing is kinda silly, isn't it? :) Let's just go with the explicit struct approach.

As for the need to explicitly break, is there some possible simplification to be had in association with the catch RFC? Keep in mind that I am completely unclear on the current state of that RFC (both in design and implementation), but would you find something like the following a bit nicer?

Interesting idea! While it does look neat, I'm worried that it just sort of pushes the problem from one place to another - now we don't have explicit breaks, but still have to write catch and ? instead. Also, the control-flow graph now loses important information. See below.

@ebarnard

To make the explicit struct approach a bit neater and remove the need for explicit breaks something like this might work.

Very nice, but this way the control-flow graph loses information - it doesn't know that the body of a successful case is executed just once anymore.

Consider the following:

let output;
let mut sel = Select::new();
loop {
    if let Ok(msg) = sel.recv(&rx1) {
        output = msg.to_string();
        break;
    }
    if let Ok(msg) = sel.recv(&rx2) {
        output = msg.to_string();
        break;
    }
}
println!("{}", output);

This code compiles because Rust knows that output will be initialized exactly once. From the control-flow-graph, it can deduce that the body of each of the ifs will be executed at most once, and that exactly one if will succeed exactly once.

If you rewrite the same example using for sel in select.iter(), it will not compile.

@ebarnard
Copy link

ebarnard commented Nov 9, 2017

@stjepang
That's a shame, I hadn't thought about that. It's still possible to use the for based api with a loop if you want to return a value (see above).

However, in the case you only want to perform an operation on each received value, or there's a timeout, or some other fallible assignment the for loop approach is much nicer to use.

I suppose there's a cost to explaining both ways of using the select api though.

@ghost
Copy link
Author

ghost commented Nov 9, 2017

@nikomatsakis

One advantage to not requiring mutable is that people may want to put the Select struct into some sort of context that gets copied around. If you require mut, then they would need to use a RefCell or some such thing somewhat needlessly.

Good point! Although, I wonder why would someone attempt to store a Select anywhere else other than one line above the loop { .. }. Seems like an unusual and potentially dangerous thing to do.

Also, I'm feeling a bit uneasy about allowing to have multiple references to the same Select in two completely different places and then using them.

@TimNN

I believe that a macro, which also enforces the loop structure, could work quite well and wouldn't be too complicated, for example:

Looks pretty good! A question: would it be possible to allow optional commas between select cases? Putting an unexpected comma (or forgetting one) in macros can blow up with poor compilation error messages.

Other than that one, I don't see any quirks in the macro. Are there any I'm missing? :)

Also, we should probably also support the following syntax: send(tx, mut s) so that the returned value in Err(v) is reacquired (s = v).

@rpjohnst
Copy link

rpjohnst commented Nov 10, 2017

Instead of just one universal Sender/Receiver pair, there could be more types for different, better-optimized kinds of channels. For example, we could have unbounded::Sender/unbounded::Receiver, or spsc::Sender/spsc::Receiver, or maybe even oneshot::Sender/oneshot::Receiver.

I like this alternative quite a bit. It eliminates the redundant monomorphizations, it eliminates the runtime dispatch on flavor, and it allows for non-Sync variants that not only achieve the same performance as std::sync::mpsc, but enforce it at compile time.

It also doesn't preclude a dynamic wrapper that behaves the way the current prototype does, though that would potentially make things more confusing as it would provide multiple types for the same purpose.

@droundy
Copy link

droundy commented Nov 10, 2017

@rpjohnst The problem I see with having many different channel versions is that it vastly impedes the use of channels in library interfaces. A library author either needs to create N different versions of the API, or needs to decide which variant is required by their users. I expect the cost of the dynamic dispatch is negligible (testing would be wise), and you get a simple API that allows the users of libraries to patch them together however they like.

@rpjohnst
Copy link

rpjohnst commented Nov 10, 2017

It would of course also be possible to solve that problem by making Sender and Receiver traits, and letting libraries work with those.

@johncf
Copy link

johncf commented Nov 10, 2017

I don't understand why select has to have such complicated API. I have a much simpler proposal (which might be infeasible since I have no knowledge of the internals):

let select = rx1.join(&rx2).join(&rx3).with_timeout(8);
select.wait();
if let Some(msg) = rx1.recv_nb() { ... }
else if let Some(msg) = rx2.recv_nb() { ... }
...
else {
    // timed out
}

(Note: _nb stands for non-blocking)

Edit: clarify

@jdm
Copy link

jdm commented Nov 10, 2017

@johncf That interface does not provide any fairness guarantees, so you could end up starving the second receiver if there's a steady stream on the first one and you're using the select in a loop.

@johncf
Copy link

johncf commented Nov 10, 2017

@jdm Ah, then why not use select.recv(&rx1) as proposed before instead of rx1.recv_nb() ?

@bstrie
Copy link

bstrie commented Nov 10, 2017

@TimNN's macro looks lovely, and seems far less fraught than the analogous macro from chan. Does anyone see anything immediately wrong with it? Perhaps it will end up less nice if we have to account for unforeseen edge cases. :P

@stjepang , note that when I say that I prefer the explicit Select struct, what I really mean is that I just don't like global state/TLS. :) I, and I assume many others, would likely be totally fine with the select machinery being "behind-the-scenes" via a macro, especially if you're concerned about the risk of further misuse of the API. (After all, all the machinery used by a macro would be public, and so people could still choose to use the explicit Select struct if they so desired, they just wouldn't have to.)

@TimNN
Copy link

TimNN commented Nov 10, 2017

@stjepang: I don't think there is an easy way for optional commas, but allowing an arbitrary number of commas ($(,)*) is easily possible (I've updated the gist).

@TimNN
Copy link

TimNN commented Nov 10, 2017

@stjepang

Also, we should probably also support the following syntax: send(tx, mut s) so that the returned value in Err(v) is reacquired (s = v).

The send(tx, $ident) form of the current macro already recovers the value of $ident in case of an error (although that currently requires $ident to be mut, but that should be easy to fix: The value of $ident moves into the loop anyway and is unusable afterwards, so adding an let mut $ident = $ident before the loop should be easy enough)

@johncf
Copy link

johncf commented Nov 10, 2017

Here is a refinement of the idea I proposed before, now having APIs for sending and possibly recovering failed-to-send messages.

https://gist.github.com/johncf/81778f23da91ae928e0ee7e92fda1b48

@KasMA1990
Copy link

Reading through this RFC, I was quite confused by what exactly select is supposed to do. Granted, using channels is new to me, but it wasn't until I was pointed in the direction of some Unix history that I learned what select is supposed to do.

I would like to propose that select is slightly renamed to make it more clear what it is actually doing, to e.g. select_any or select_any_ready maybe (or any other name that is deemed better). I know this breaks somewhat with the precedent for this functionality (since it is named as such in Unix and Go for example), but I think there are many Rust users who are not familiar with these precedents and will hence be confused by the name (myself included). At the same time, by keeping the word "select" in the name, it is still easy to search for, for those that do know the precedent (and if "select" is kept as the first part of the name, anyone just looking to type select in their IDE will also be helped by autocomplete).

I feel this would be a very rustic improvement to make; it should still be easy for veterans to use, but welcomes newcomers at the same time, by optimizing for readability over familiarity.

@ghost
Copy link
Author

ghost commented Nov 10, 2017

@johncf

Here is a refinement of the idea I proposed before, now having APIs for sending and possibly recovering failed-to-send messages.

A few issues I can think of:

  1. It's unfortunate that we have to mention each case twice - e.g. first as .join(&rx2) and then as if let Some(msg) = select.recv(&rx2).
  2. We have to write else { unreachable!() } at the end to make sure we didn't forget a case, or to make code typecheck in case we're using the final value of the whole if-elseif expression.

@johncf
Copy link

johncf commented Nov 10, 2017

@stjepang I have a few more concerns regarding a select! macro.

  1. You can't break or continue a loop wrapping the select! without using a label. (Also, loop_select! sounds like a helper macro for this use-case.)

  2. To guarantee fairness, wouldn't it be necessary to use thread-local variables to store past state? Unless select! takes a local variable given by the user to store state, as follows:

    fn _foo(rx1: &Receiver<String>, rx2: &Receiver<String>) {
        let mut state = SelectState::new();
        loop {
            select! {state,
                disconnected() => {
                    break; // oops! ignore for now
                }
                recv(rx1, val) => {
                    if val == "exit" {
                        break;
                    } else {
                        System.out.println("Got {}", val);
                    }
                }
                recv(rx2, val) => {
                    System.out.println("Got {} from rx2", val);
                }
            }
        }
    }
  3. Is it possible to easily recover a variable passed into send? One possibility is:

    let (s_opt, recv_opt) = select! {
        disconnected() => {
            (Some(s), None)
        }
        send(tx, s) => {
            (None, None)
        }
        recv(rx, val) => {
            (Some(s), Some(val))
        }
    };
    System.out.println("{:?}", s_opt, recv_opt);

Edit: Possible recovery of a sent variable on failure.
Edit2: Strikeout 2.

@TimNN
Copy link

TimNN commented Nov 10, 2017

@johncf:

  1. I think this is an acceptable downside. (You can statically prevent unlabeled breaks / continues inside the select macro).
  2. The SelectState::new() stuff would be generated by the macro.
  3. The recovery as written in your updated code works already with the macro I proposed (if s is mut, although that restriction can be lifted). [You original code example would produce a compiler error].

@stjepang:

Can we make it an error if you omit both braces and commas?

Probably, but IMO that would complicate the macro a lot for very little gain.

@johncf
Copy link

johncf commented Nov 10, 2017

@TimNN

  1. Yes, I think I remember reading that. But will the compiler be able to provide a helpful message on how to fix that?
  2. Whatever generated by the macro would be inside the outer loop (see my code), but the state needs to be outside to outer loop for persistence.
  3. Yes, and imagine having one more send. I just think that it is a lot of inconvenience, for an "elegant" solution (which it is only for the very basic use-case).

@TimNN
Copy link

TimNN commented Nov 10, 2017

@stjepang: I personally like select_loop!.

Regarding the send() variants, maybe three version would work:

  1. send($tx, $ident): The IMO most common case: $ident is made mutable and automatically recovered
  2. send($tx, mut $($ident).*): The (nested) struct field case (works for a single variable as well): $ident is assumed to be mutable and automatically recovered
  3. send($tx, eval $expr): An arbitrary expression, which is evaluated every time. (The eval key tries to highlight that $expr is evaluated multiple times)

@TimNN
Copy link

TimNN commented Nov 10, 2017

@johncf

  1. Oh, I think I see what you mean. If I understand the RFC correctly you wouldn't need past state because the "start" case of the select loop is randomly determined each time the loop is initialized.

@TimNN
Copy link

TimNN commented Nov 10, 2017

@johncf:

  1. The error message would look something like this (which I believe is helpful enough):
error[E0384]: re-assignment of immutable variable `_dont_use_an_unlabeled_continue_or_break_in_select_loop`
  --> src/main.rs:44:13
   |
44 |               _dont_use_an_unlabeled_continue_or_break_in_select_loop = ();
   |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ re-assignment of immutable variable```
  1. Oh I think I misunderstood your point, did you mean if there was an easy way to recover the value to a send after the select loop? If that was the question, then I don't believe there is an easy way to do that without a significantly more complex / specialized / restricted macro.

@johncf
Copy link

johncf commented Nov 10, 2017

@TimNN

  1. 😕 hmm... Incorrect line numbers... I dunno YMMV.
  2. Ah sorry, I think you're right. I only skimmed through that part earlier.
  3. Yes, after the select-loop was what I meant. I'm not sure how common using send in select would be to comment about it though.

@bstrie
Copy link

bstrie commented Nov 11, 2017

Reminder that potential edge-case shortcomings of the macro form aren't the end of the world, as long as it's documented how to use this library without the macro, via manually looping and breaking. If the macro will keep people from shooting themselves in the foot for 95% of use cases, then it's still a worthwhile addition.

@ghost
Copy link
Author

ghost commented Nov 17, 2017

What if we simply add a new method Select::is_done that returns true when selection is done?
Then you don't have to manually break the loop in every successful case:

let mut sel = Select::new();
while !sel.is_done() {
    if let Ok(peer) = sel.recv(rx) {
        println!("{} received a message from {}.", name, peer);
    }
    if let Ok(()) = sel.send(tx, name) {
        // Wait for someone to receive my message.
    }
}

This way you can still put a break if you want, but it's not required.

@TimNN
Copy link

TimNN commented Nov 17, 2017

@stjepang: I believe that would have the same downside as the proposed for-loop version: the compiler could no longer guarantee that each "branch" was only taken once. (And in that case I believe the for-syntax is a bit nicer).

- use explicit `Select` struct
- introduce two new error types (used in selection)
- clarify the text on wait lists
- mention unfair selection as a drawback
@ghost
Copy link
Author

ghost commented Nov 21, 2017

Thank you, everyone, for the feedback! I gave all this some thought over the last week.

Overall, it seems that the RFC is generally accepted quite well and there aren't any major blockers. I'll address some comments:

  1. @droundy says selection is not exactly fair, which is true. I've added an item on this to the Drawbacks section. It's probably possible to make selection fairer, although it's a bit tricky to do correctly and efficiently. I'll open a separate issue on this.

  2. @cramertj had a comment on the error type used in the recv selection case. I've introduced two new types to streamline the interface: SelectSendError<T> and SelectRecvError.

  3. @matthieu-m had trouble understanding the selection mechanism in full detail. I've put some effort to clarify the text a bit (e.g. by explaining what wait lists are for). I know the text is not perfectly clear now either, but explaining all the subtle intricacies would require a lot more work. I'll probably put this effort into documenting the actual code instead. :)

  4. The idea of switching from hidden thread-local state machine to explicit Select struct was universally accepted. I've addressed this in the text and in the implementation.

  5. Some people wanted finer-grained control over Sender and Receiver by making them traits, or custom types for each channel variant (flavor). This is a reasonable idea, but I'd prefer to go with a simple interface that is basically a drop-in replacement for std::sync::mpsc, at least for now.

  6. There were proposals for alternative selection interfaces. Unfortunately, I don't think any of those were tangibly better than the current one. All the proposals would fix some issues in the current mechanism, but then introduce new problems along the way.

  7. A lot of people think selection has brittle interface (it's easy to accidentally use it incorrectly). I can agree with that. The best solution would be to introduce a macro for more convenient and foolproof selection. Thanks to @TimNN for a very promising prototype! I'm definitely onboard for a selection macro and will open an issue so we can discuss this further. A new RFC just for the macro may follow up later.

  8. It's unfortunate that incorrect use of selection can lead to deadlocks and similar weirdness. We should definitely add some runtime checks to make sure that the same set of selection cases is visited in the same order in each iteration of the loop. Will open an issue for that.

Some bikeshedding... Channels are constructed using functions named bounded and unbounded, which should be pretty self-explanatory. I intentionally chose those names instead of channel and sync_channel (as in the standard library) to avoid confusion, especially because Go calls zero-sized channels synchronous and bounded channels asynchronous (while Rust calls those synchronous :D). How does everyone feel about that? A possible alternative would be to have just a single function new(cap: Option<usize>).

Finally, I'd like to push forward and publish version 0.1.0 now so that everyone can start playing with the channel. The interface as presented in this RFC will not be set in stone, so if we figure out how to improve it down the line, we'll definitely do it! But let's get some mileage first. How does that sound, @jeehoonkang?

@jeehoonkang
Copy link
Contributor

  • in order to serve as many people as possible, how about shipping @TimNN's macro in 0.1.0? From the discussions so far, I couldn't see any drawback in doing so.

  • I feel "bounded/unbounded" are better names than "sync/async"; they explain exactly what they do. Also, I don't see much benefit of new() at this moment. Let's hear experience from users after releasing it.

  • Even though channel's code is not reviewed at all, I'm quite sure that it's well-written 😃, and at the very least it's tested and its interface is reviewed. Also, it seems users are waiting for a new Crossbeam release a lot! Let's review and enhance it after releasing it.

Thanks for this great job, let's move forward!

@TimNN
Copy link

TimNN commented Nov 21, 2017

  • in order to serve as many people as possible, how about shipping @TimNN's macro in 0.1.0 From the discussions so far, I couldn't see any drawback in doing so.

I can send a PR with the initial macro to the prototype repo, which should also allow us to better discuss the implementation details there.

@TimNN
Copy link

TimNN commented Nov 21, 2017

Regarding the fairness, I just had the following idea (I haven't properly thought it through yet, but wanted to write it down):

This approach is probably a bit more expensive but should still be manageable: The core idea is to change the "two half" iteration to something different, so that neighboring operations are not necessarily directly next to each other.

This can be achieved by, instead of choosing a split point, choosing a skip count: Chose a skip count X, then skip X operations, perform one operation, skip X operations, perform one operation and repeat until every operation has been performed once (but see caveats / details below).

Example:

Consider six operations A - F. With a split point the "check" order would be something like C.D.E.F.A.B for a split point of 2.

For skip counts between 0 - 5 the following orders are possible: A.B.C.D.E.F, B.D.F.A.C.E, C.F.B.E.A.D, D.A.E.B.F.C, E.C.A.F.D.B and F.E.D.C.B.A.

Caveats / Details:
Assume N operations in the loop. To guarantee that we visit all operations, we need to find a prime P > N and assume there are actually P "operations" in the loop. We also choose a random skip count X (0 <= X < N). To visit all elements in the loop we then do the following:

  1. Start at position i = 0
  2. Skip X operations (i = (i + X) % P)
  3. If the current operation does not actually exist (i >= N) goto 1.
  4. Visit the operation at position i
  5. If we have not yet visited N operations, goto 1.

Since P is prime and X < P, we are guaranteed to visit every position once before repeating.

Fairness: If I remember my probability theory lessons correctly this system should guarantee that the probability that a specific operation B is visited directly after an operation A should be approximately equally likely for all possible operations B. This should provide some fairly strong fairness guarantees.

Disadvantages:
This method is less performant than the "split point" algorithm:

  1. It requires finding a suitable prime P, although that should be trivial for any reasonable value of N.
  2. For large values of X the loop may need to be executed almost N times instead of only twice. This may be especially problematic in the face of expensive send value computations / moves due to send value recovery.

And now I've spent way too long writing this down and it is way too late, and I just realized that this turns the whole operation essentially from O(n) to O(n^2), which is probably too slow for a default but maybe works as an "extra-fair" select. Anyway, I'll think about this some more tomorrow. Comments / Feedback are very welcome.

@bstrie
Copy link

bstrie commented Nov 21, 2017

I'm in favor of bounded/unbounded.

@droundy
Copy link

droundy commented Nov 22, 2017

@stjepang Thanks for your thoughtful response and thorough summary. I think your response to the unfairness issue was particularly fair. (Apologies for the pun.)

On the bikeshedding, I agree that bounded and unbounded are way better than sync/async, but wonder whether even better names are possible. In particular, I wonder whether 3 names could be more instructive, even if one name is a special case for bounded(0). To my mind a zero-length bounded channel is a distinctively different thing (in use) than a channel with a finite-length buffer, and I wonder whether there is a more descriptive name for it. The zero-length buffer really means that the receiver and sender are synchronized. In terms of reasoning (and deadlocks) this is very different behavior from a channel with either finite (but non-zero) or infinite buffer...

What about blocking (zero-length buffer), buffered, and nonblocking? That avoids the "sync" syllable which is backwards between go and standard rust, and focuses instead on whether the channel will block. Of the three names, buffered doesn't quite match the others, which I don't like. But a finite buffer will sometimes wait for input (or a reader) and will sometimes not block.

@droundy
Copy link

droundy commented Nov 22, 2017

@TimNN I think a simpler version of your algorithm would simply be to randomly choose one operation to attempt each time through the loop. It is still O(N^2), but is easier to reason about than the skipping version. The prefactor on the O(N^2) shouldn't be too bad if no extra work is done in the loop, an you really want no extra work to be done in the loop, or the implementation leaks.

Also O(N^2) isn't too bad if N is small, and when N is large is precisely when the non-fairness gets large with the current implementation, so users might be likely to value it more when it is more expensive. That said, I think the current plan to document the unfairness and postpone any idea of fixing it is good.

@TimNN
Copy link

TimNN commented Nov 22, 2017

That said, I think the current plan to document the unfairness and postpone any idea of fixing it is good.

I agree!

I think a simpler version of your algorithm would simply be to randomly choose one operation to attempt each time through the loop.

Ah, now I understand your earlier comment. I don't think that could easily work, consider this because you aren't guaranteed to visit all operations once:

  • No operation is ready, the thread parks itself.
  • The thread is unparked
  • The thread randomly visits N operations
  • The operation that unparked the thread is not visited
  • <I can imagine multiple things that happen now, but overall I would consider this situation bad>

@TimNN
Copy link

TimNN commented Nov 22, 2017

I think adding the following API to Select would make using it (with the macro) even safer for negligible costs:

  • .enable_extra_checks(): Called before the loop and enables extra checks (= the one below)
  • .register_loop_end(): Called at the very end of the loop

This will allow checking if the same end of a specific channel was used more than once, which means that the macro can enforce / catch violations of all three "important rules one must follow in order to use selection correctly" listed in the RFC either at run- or compile-time.

Edit: Well, mostly, even with the macro one can do something like select_loop! { send(pick_random_tx(), value) => ... }, but that basically means someone is actively trying to break things.

@TimNN
Copy link

TimNN commented Nov 22, 2017

Another thought: If this can easily be supported by the implementation it might be nice to have a Select::send version which takes a closure instead of a value: This version would attempt to reserve a spot in the channel and only if that is successful would it evaluate the closure.

@droundy
Copy link

droundy commented Nov 22, 2017

@TimNN You are right about my idea, it would only work if the thread never parked, which would be terribly inefficient. Perhaps just creating an array with length of the number of options, and then shuffling it to determine the order of attempts? The shuffling is an O(N) operation, and then we'd have to run through the loop O(N) times, so it's still an O(N^2) approach, I think.

Regarding your closure suggestion, that would seem to make the code far easier to use without side effects or repeated computation.

@ghost
Copy link
Author

ghost commented Nov 23, 2017

PSA: To everyone interested in the select_loop! macro: follow the discussion in this pull request.

@ghost
Copy link
Author

ghost commented Nov 23, 2017

@droundy @TimNN Regarding fairness, we could do the following.

In the first iteration of the loop, while counting cases we also check which cases are ready by calling internal methods .can_send() and .can_recv(). These methods check whether there's a waiting receiver/sender on the other side of the channel, respectively.

Typically we select over no more than a dozen or so cases, so we can remember which cases are ready by marking a bit for each ready case in a u32 bitmask. Then we choose a random number between 0 and bitmask.count_ones() (let's call it R) and set start to the index of the Rth bit in the bitmask.

There is some performance cost associated with checking which cases are ready (I believe the cost is around 10%), but it's probably not too bad.

@droundy

What about blocking (zero-length buffer), buffered, and nonblocking?

For me, names blocking/nonblocking are not much more understandable than synchronous/asynchronous. :) I prefer to differentiate channels by how many messages they can hold: are they infinite, do they have a bounded buffer, or are they zero-sized. The last kind is very special, that is true. It is sometimes also called a rendezvous channel.

It might be nice to have a dedicated constructor for zero-sized channels, but I'm afraid at the same time it might also make constructors more complicated than is necessary.

Note that there is already some precedent for constructors similar to bounded(cap). For example, in Go you can construct bounded channels with make(chan int, 0) and make(chan int, 50), while in Rust you can do sync_channel(0) and sync_channel(50).

@ghost
Copy link
Author

ghost commented Nov 26, 2017

I'm merging this RFC and publishing the first version.
New repository: https://github.com/crossbeam-rs/crossbeam-channel
Documentation: https://docs.rs/crossbeam-channel

You can add crossbeam-channel = "0.1.0" to your Cargo.toml and try it out!

extern crate crossbeam_channel;

use crossbeam_channel::unbounded;

fn main() {
    let (tx, rx) = unbounded();
    tx.send("Hello").unwrap();
    tx.send("World").unwrap();
    drop(tx);

    for msg in rx {
        print!("{} ", msg);
    }
    println!();
}

I decided to include select_loop! macro for easier selection. Thanks again to @TimNN for implementing it! As is often the case with macros, there are still a few warts in it and you can run into weird compilation errors, but I believe it's much less likely than with select!.

Let me know if you have any feedback or comments. I'm especially curious:

  • How difficult is it to switch from std::sync::mpsc to crossbeam-channel?
  • Did you notice any difference in performance?
  • How does the select_loop! macro feel?
  • Did you notice weird compilation errors produced by select_loop!? Any examples?
  • If you notice any bugs, please open an issue in the repo. Thanks!

Finally, the remaining problems can be tracked in dedicated issues:

@ghost ghost merged commit 2e22d0e into crossbeam-rs:master Nov 26, 2017
@ghost
Copy link
Author

ghost commented Nov 27, 2017

Just one more bikesheddy question. :)

I'd like to split the disconnected() case into two: all_disconnected() (fired when all selected channels are disconnected) and any_disconnected() (fired when any selected channel is disconnected). Crate ipc-channel (used in Servo) needs this feature.

Do you think names all_disconnected and any_disconnected are adequate? Any better suggestions?

So the full list of possible selection cases would be: send, recv, would_block, timed_out, any_disconnected, all_disconnected.

(also, if you think would_block and timed_out should be renamed, please say so)

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.