I think there's an error in `impl<T> Stream for RingReceiver<T>` #27
Generally speaking, your assessment is correct. It was a conscious design decision to keep the polling thread awake, both to avoid storing an unbounded number of wakers (remember it's an MPMC channel) and to avoid taxing producers with the unconditional overhead of checking whether there are wakers to wake. Considering ring-channel is throughput oriented, chances are there will always be work to do, so there's little use in letting the polling thread sleep anyway. You can run the benchmarks to see that ring-channel achieves very high throughput compared with regular channels. Does that sound reasonable?
Hmm, this is async code, so there aren't necessarily any threads involved. But basically what you are doing here is running code in an endless loop, consuming CPU when no data is present. The whole point of how async in Rust is designed is to avoid having to loop. Think about it: what's the advantage of this over using the sync API and just doing something like:

```rust
loop {
    match rx.recv() {
        Ok(data) => { /* do something useful */ }
        Err(RecvError::Empty) => {}
    }
}
```

Now if this is single threaded, you can't actually do anything else here, because this loop will not exit; but in async code the thread is shared with other async tasks, so if you just run the CPU in a loop it's really a problem.

Sure, it requires storing an unknown number of wakers, but it's up to the user not to go insane and have a trillion tasks polling this channel, unless of course they have the memory to back it up. In any case that's consuming some memory to do something useful, whereas this impl is just endlessly burning CPU cycles doing nothing.

I think it's worth considering that the library could be useful for much more than just high throughput. I just advised someone on reddit to use it, where they need to communicate between tasks and only want to consider the last item sent. That could be done with a ring_channel of size 1. I personally am interested in this for use cases where you don't want back pressure, but not unbounded channels either. That's an interesting property on its own, regardless of throughput characteristics. All it takes to do this correctly is keep a list of wakers.

I wouldn't use the implementation for async as it's written right now, which is a pity.
It is true that the polling thread will never sleep and our future will spin the CPU even if there are no messages to process, but in general the executor has a backlog of futures to process, and it is not true that calling …

The problem is not at all related to memory consumption; the real issue is the concurrent reads and writes to the shared …

The call to …

Don't get me wrong, I'm absolutely interested in making this work. I'll revive an old unpublished branch where I explored alternative implementations of this shared state.
Ok, I think I see what you mean. Do we agree this only concerns the async API? The sync API could just ignore the vector. So yes, every read/write would have to check the vector even if there were no wakers in it. It could surely be improved with a better implementation, like setting an AtomicBool when a waker gets added and unsetting it on write; that would mean you only need to check the bool when no wakers are present. I haven't actually done any benchmarks yet, I just looked at the code and found it suspicious to wake in a loop. I will try to play around with it soon and see what the effect on CPU is for infrequent events.
Actually no, there could be a receiver waiting on a …

It would rather have to be an …

That would be awesome!
Actually, the ring_channel async receiver always consumes 100% CPU, no matter whether messages are frequent or infrequent. Example code (change FREQ_MS to try different event frequencies):

```rust
use futures::stream::StreamExt;
use tokio::runtime::Runtime;
use tokio::timer::Interval;
use tokio::future::FutureExt;
use futures::future::ready;
use ring_channel::*;
use std::num::NonZeroUsize;
use std::time::{Duration, Instant};
use std::sync::atomic::{AtomicU64, Ordering};

static ELAPSED_EMA: AtomicU64 = AtomicU64::new(0);
static JITTER_EMA: AtomicU64 = AtomicU64::new(0);
const FREQ_MS: u64 = 1_000;

fn main() {
    let rt = match Runtime::new() {
        Ok(x) => x,
        Err(e) => {
            eprintln!("{}: cannot create tokio runtime, error: {}", line!(), e);
            return;
        }
    };
    let (tx, rx) = ring_channel(NonZeroUsize::new(1).unwrap());
    // Producer: send a timestamp every FREQ_MS milliseconds.
    let producer = Interval::new(Instant::now(), Duration::from_millis(FREQ_MS))
        .for_each(move |_| {
            tx.send(Instant::now()).unwrap();
            ready(())
        });
    // Consumer: track exponential moving averages of latency and jitter.
    let consumer = rx.for_each(move |instant| {
        let elapsed = instant.elapsed().as_nanos() as u64;
        let mut elapsed_ema = ELAPSED_EMA.load(Ordering::Relaxed);
        if elapsed_ema != 0 {
            elapsed_ema = (9_999 * elapsed_ema + elapsed) / 10_000;
        } else {
            elapsed_ema = elapsed;
        }
        ELAPSED_EMA.store(elapsed_ema, Ordering::Relaxed);
        let jitter = if elapsed > elapsed_ema {
            elapsed - elapsed_ema
        } else {
            elapsed_ema - elapsed
        };
        let mut jitter_ema = JITTER_EMA.load(Ordering::Relaxed);
        if jitter_ema != 0 {
            jitter_ema = (9_999 * jitter_ema + jitter) / 10_000;
        } else {
            jitter_ema = jitter;
        }
        JITTER_EMA.store(jitter_ema, Ordering::Relaxed);
        println!("Elapsed {}ns, avg {}ns, jitter {}ns", elapsed, elapsed_ema, jitter_ema);
        ready(())
    });
    rt.spawn(consumer);
    rt.block_on(producer);
}
```
@mcseemk thanks for chipping in. I'm very busy right now, so haven't found time to run tests. If you put your code in a block which specifies the language...
...it will have syntax highlighting and be more readable.
Great, thanks for the hint!
Yeah, the question was more whether you want to support mixing the APIs. I can imagine it can be useful. However, adding overhead for someone who doesn't use async (in either reader or writer) is probably not justified, especially if you care about maximizing throughput. On the other hand, if the current impl really just goes to 100% CPU, I doubt anyone will want to use it. It seems to confirm there really is a problem.
Yeah, I would love to use ring_channel in my async programs, but 100% CPU is a show-stopper at the moment.
I think a reasonable way out would be to provide both a blocking and an async flavor of the receiver.

@najamelan, @mcseemk how does that sound for a design?
It sounds like a nifty design! Except: I suppose you accidentally reversed the names? It just leaves the performance issue for async to be solved.
Sounds good to me. I wouldn't mind having an mpsc blocking/async ring receiver as well.
@najamelan actually not, my reasoning is that from the point of view … I can see now this naming convention may be confusing; what would you call them instead?

@mcseemk sounds like a great idea, especially because it can be incrementally built later with no impact on existing code.
I'm a bit confused now. In principle if the receiver implements …
That's correct, through the … Does that clarify it?

EDIT: In fact:

```rust
impl<T> BlockingRingReceiver<T> {
    pub fn recv(&mut self) -> Result<T, RecvError> {
        block_on(self.next()).ok_or(RecvError::Disconnected)
    }
}
```
@najamelan @mcseemk I got a working implementation of the core logic that keeps track of the registered wakers.
@najamelan @brunocodutra I've just done some tests; all looks good to me. The new version of ring-channel looks virtually indistinguishable from `futures::channel::mpsc` in terms of async performance / CPU consumption. The latency per message is just under 0.1ms, which I presume is as good as it gets with async anyway.
Fantastic, thanks for checking @mcseemk!
I ran the benchmarks. I haven't had time to really understand the code, but at least the results are these:

Benchmark results

Running target/release/deps/concurrency-9f82666c638de450
concurrency/10000 time: [320.70 us 326.18 us 331.33 us]
thrpt: [30.181 Melem/s 30.657 Melem/s 31.181 Melem/s]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) low mild
1 (1.00%) high mild
Running target/release/deps/futures-2e521ff894685f8b
futures/mpmc/4x4x1000/1 time: [390.92 us 412.37 us 436.43 us]
thrpt: [2.2913 Melem/s 2.4250 Melem/s 2.5580 Melem/s]
Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) low mild
1 (1.00%) high mild
7 (7.00%) high severe
futures/mpmc/4x4x1000/8 time: [197.32 us 202.58 us 208.95 us]
thrpt: [4.7857 Melem/s 4.9363 Melem/s 5.0678 Melem/s]
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
futures/mpmc/4x4x1000/1000
time: [182.16 us 190.89 us 201.06 us]
thrpt: [4.9736 Melem/s 5.2385 Melem/s 5.4897 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
futures/mpsc/7x1x1000/1 time: [61.782 us 63.231 us 64.660 us]
thrpt: [15.465 Melem/s 15.815 Melem/s 16.186 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
futures/mpsc/7x1x1000/8 time: [243.27 us 252.28 us 260.98 us]
thrpt: [3.8317 Melem/s 3.9639 Melem/s 4.1107 Melem/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
futures/mpsc/7x1x1000/1000
time: [249.98 us 254.53 us 258.78 us]
thrpt: [3.8643 Melem/s 3.9288 Melem/s 4.0003 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
Benchmarking futures/spmc/1x7x1000/1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.5s or reduce sample count to 40
futures/spmc/1x7x1000/1 time: [3.1005 ms 3.3259 ms 3.5215 ms]
thrpt: [283.97 Kelem/s 300.67 Kelem/s 322.53 Kelem/s]
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking futures/spmc/1x7x1000/8: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.5s or reduce sample count to 50
futures/spmc/1x7x1000/8 time: [2.3968 ms 2.4472 ms 2.4980 ms]
thrpt: [400.31 Kelem/s 408.63 Kelem/s 417.22 Kelem/s]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking futures/spmc/1x7x1000/1000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s or reduce sample count to 60
futures/spmc/1x7x1000/1000
time: [962.04 us 1.0386 ms 1.1237 ms]
thrpt: [889.89 Kelem/s 962.83 Kelem/s 1.0395 Melem/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
futures/spsc/1x1x1000/1 time: [95.765 us 97.763 us 100.03 us]
thrpt: [9.9973 Melem/s 10.229 Melem/s 10.442 Melem/s]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
futures/spsc/1x1x1000/2 time: [171.25 us 173.43 us 175.69 us]
thrpt: [5.6918 Melem/s 5.7661 Melem/s 5.8393 Melem/s]
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
futures/spsc/1x1x1000/1000
time: [228.00 us 231.37 us 235.06 us]
thrpt: [4.2542 Melem/s 4.3221 Melem/s 4.3859 Melem/s]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
Running target/release/deps/throughput-2a1729882c69d15b
mpmc/4x4x1000/1 time: [103.44 us 105.95 us 108.55 us]
thrpt: [9.2120 Melem/s 9.4383 Melem/s 9.6670 Melem/s]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) low mild
2 (2.00%) high mild
mpmc/4x4x1000/8 time: [147.77 us 151.37 us 154.95 us]
thrpt: [6.4535 Melem/s 6.6063 Melem/s 6.7672 Melem/s]
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
3 (3.00%) high mild
mpmc/4x4x1000/1000 time: [125.40 us 130.34 us 135.26 us]
thrpt: [7.3932 Melem/s 7.6724 Melem/s 7.9744 Melem/s]
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
mpsc/7x1x1000/1 time: [74.924 us 76.430 us 77.822 us]
thrpt: [12.850 Melem/s 13.084 Melem/s 13.347 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
mpsc/7x1x1000/8 time: [149.11 us 151.93 us 154.89 us]
thrpt: [6.4563 Melem/s 6.5819 Melem/s 6.7065 Melem/s]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
mpsc/7x1x1000/1000 time: [128.19 us 130.20 us 132.30 us]
thrpt: [7.5588 Melem/s 7.6807 Melem/s 7.8011 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
spmc/1x7x1000/1 time: [119.72 us 123.17 us 126.56 us]
thrpt: [7.9011 Melem/s 8.1186 Melem/s 8.3530 Melem/s]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
spmc/1x7x1000/8 time: [127.61 us 130.02 us 132.32 us]
thrpt: [7.5572 Melem/s 7.6912 Melem/s 7.8362 Melem/s]
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
spmc/1x7x1000/1000 time: [106.28 us 108.36 us 110.72 us]
thrpt: [9.0315 Melem/s 9.2287 Melem/s 9.4095 Melem/s]
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
4 (4.00%) high mild
spsc/1x1x1000/1 time: [76.011 us 77.322 us 78.670 us]
thrpt: [12.711 Melem/s 12.933 Melem/s 13.156 Melem/s]
spsc/1x1x1000/2 time: [133.79 us 136.34 us 138.79 us]
thrpt: [7.2053 Melem/s 7.3346 Melem/s 7.4742 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
spsc/1x1x1000/1000 time: [93.918 us 95.640 us 97.428 us]
thrpt: [10.264 Melem/s 10.456 Melem/s 10.648 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Yes, slightly modified code above. I'm mostly interested in latency/jitter rather than throughput, so I just ran 100k messages 1ms apart and measured average latency and its deviation. On my PC I couldn't find any statistically meaningful difference between futures mpsc and ring-channel.
@ALL, thanks for looking into it! I look forward to playing with ring-channel, and when I get round to it, I will review the code and try some benchmarks. 90us seems a lot, but then it depends on the hardware and the exact bench.
AFAICT you should just return Pending here and store the waker. When new data comes in, you then call wake (I suppose that means your ring buffer needs to wake the waker).