Shared channel benchmark fails/hangs with probability #261
Thanks for the reproducer! I will take a look soon.
Is the code above the version with higher or lower probability of reproducing?
I ran this with the LLVM thread sanitizer and the output is not very useful. It claims there are data races inside the buffer accesses, but the places it complains about are properly wrapped in atomics. @zserik - you touched the shared channels recently so tagging you just in case you have any idea.
While searching for another issue (DataDog#261) I realized that we never unregister channels on drop. That has, unfortunately, nothing to do with the issue but that's a leak so let's fix it.
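A minimal sketch of the idea in that commit, with a toy registration table standing in for glommio's reactor-side state (the names `REGISTRATIONS` and `ChannelEndpoint` are hypothetical, not glommio's actual API):

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for the reactor's shared-channel registration table.
static REGISTRATIONS: OnceLock<Mutex<HashMap<u64, String>>> = OnceLock::new();

fn registrations() -> &'static Mutex<HashMap<u64, String>> {
    REGISTRATIONS.get_or_init(|| Mutex::new(HashMap::new()))
}

struct ChannelEndpoint {
    id: u64,
}

impl Drop for ChannelEndpoint {
    fn drop(&mut self) {
        // The point of the fix: unregister on drop, so the reactor-side
        // entry does not leak when an endpoint goes away.
        registrations().lock().unwrap().remove(&self.id);
    }
}

fn main() {
    registrations().lock().unwrap().insert(1, "spsc buffer".to_string());
    let endpoint = ChannelEndpoint { id: 1 };
    drop(endpoint);
    assert!(registrations().lock().unwrap().is_empty());
    println!("registration removed on drop");
}
```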
Hi - this is to let you know that I don't yet have a solution for this but I am working on it.
If I remove this code and just return… I'll keep you posted.
It seems to be the exact cause of the memory issues. I'm now working around it by commenting out the eventfd clearing statement in the disconnect implementations:

```rust
impl<T: Copy> Producer<T> {
    ...
    pub(crate) fn disconnect(&self) -> bool {
        // (*self.buffer).producer_eventfd.set(None);
        (*self.buffer).disconnect_producer()
    }
}

impl<T: Copy> Consumer<T> {
    ...
    pub(crate) fn disconnect(&self) -> bool {
        // (*self.buffer).consumer_eventfd.set(None);
        (*self.buffer).disconnect_consumer()
    }
}
```

However, the hanging problem is still there, with a very low probability of occurring; I have to reproduce it with a loop. It also seems to be reproducible only when the receiving executor was built without `spin_before_park`:

```rust
use glommio::channels::shared_channel;
use glommio::prelude::*;
use std::sync::mpsc::sync_channel;
use std::time::{Duration, Instant};

fn test_spsc(capacity: usize) {
    let runs: u32 = 10_000_000;
    let (sender, receiver) = shared_channel::new_bounded(capacity);
    let sender = LocalExecutorBuilder::new()
        .pin_to_cpu(0)
        //.spin_before_park(Duration::from_millis(10))
        .spawn(move || async move {
            let sender = sender.connect();
            // let t = Instant::now();
            for _ in 0..runs {
                sender.send(1).await.unwrap();
            }
            // println!(
            //     "cost of sending shared channel {:#?}, capacity: {}",
            //     t.elapsed() / runs,
            //     capacity
            // );
            drop(sender);
        })
        .unwrap();
    let receiver = LocalExecutorBuilder::new()
        //.spin_before_park(Duration::from_millis(10))
        .pin_to_cpu(1)
        .spawn(move || async move {
            let receiver = receiver.connect();
            // let t = Instant::now();
            for _ in 0..runs {
                receiver.recv().await.unwrap();
            }
            // println!(
            //     "cost of receiving shared channel: {:#?}, capacity {}",
            //     t.elapsed() / runs,
            //     capacity
            // );
        })
        .unwrap();
    sender.join().unwrap();
    receiver.join().unwrap();
}

fn main() {
    // test_rust_std(1024);
    // test_spsc(1024);
    for i in 0..10000 {
        println!("==========");
        println!("Round {}", i);
        //test_spsc(10);
        test_spsc(100);
        test_spsc(1000);
        test_spsc(10000);
    }
}
```
I added some logging in ConnectedSender and ConnectedReceiver:

```rust
pub fn try_send(&self, item: T) -> Result<(), GlommioError<T>> {
    ...
    if self.state.buffer.consumer_disconnected() {
        return Err(GlommioError::Closed(ResourceType::Channel(item)));
    }
    match self.state.buffer.try_push(item) {
        None => {
            if let Some(fd) = self.state.buffer.must_notify() {
                self.reactor.upgrade().unwrap().notify(fd);
                println!("SEND: pushed and notified");
            } else {
                println!("SEND: pushed without notifying");
            }
            Ok(())
        }
```

```rust
fn wait_for_room(&self, cx: &mut Context<'_>) -> Poll<()> {
    match self.state.buffer.free_space() > 0 {
        true => Poll::Ready(()),
        false => {
            self.reactor
                .upgrade()
                .unwrap()
                .add_shared_channel_waker(self.id, cx.waker().clone());
            println!(
                "SEND: wait for room! free space: {}",
                self.state.buffer.free_space()
            );
            Poll::Pending
        }
    }
}
```

```rust
fn recv_one(&self, cx: &mut Context<'_>) -> Poll<Option<T>> {
    match self.state.buffer.try_pop() {
        None if !self.state.buffer.producer_disconnected() => {
            self.reactor
                .upgrade()
                .unwrap()
                .add_shared_channel_waker(self.id, cx.waker().clone());
            println!("RECV: wait for message! size: {}", self.state.buffer.size());
            Poll::Pending
        }
        res => {
            if let Some(fd) = self.state.buffer.must_notify() {
                self.reactor.upgrade().unwrap().notify(fd);
                println!("RECV: notify room");
            }
            Poll::Ready(res)
        }
    }
}
```

Here are the last lines printed before the benchmark hangs:
From the log, I found:

```rust
fn must_notify(&self) -> Option<RawFd> {
    let eventfd = self.opposite_eventfd();
    let mem = eventfd.take();
    let ret = mem.as_ref().map(|x| x.load(Ordering::Acquire) as _); // <== this value changes from time to time
    eventfd.set(mem);
    println!("executor {}: {:?}", Local::id(), ret);
    match ret {
        None | Some(0) => None,
        Some(x) => Some(x),
    }
}
```
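For readers unfamiliar with the pattern above: the eventfd slot is a Cell holding an optional Arc'd atomic, so reading it means taking the value out, inspecting it, and putting it back. A toy version of that pattern, with hypothetical types standing in for glommio's:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

fn read_fd(slot: &Cell<Option<Arc<AtomicUsize>>>) -> Option<usize> {
    // Cell<T> only moves values in and out, so we take the contents,
    // read through the reference, and restore it afterwards.
    let mem = slot.take();
    let ret = mem.as_ref().map(|x| x.load(Ordering::Acquire));
    slot.set(mem);
    // 0 means "no eventfd armed"; anything else is the raw fd value.
    match ret {
        None | Some(0) => None,
        Some(x) => Some(x),
    }
}

fn main() {
    let slot = Cell::new(Some(Arc::new(AtomicUsize::new(5))));
    assert_eq!(read_fd(&slot), Some(5));
}
```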
Right, so if you don't use `spin_before_park`… I have a sketch of a solution for this problem. Unfortunately it will require an API change for the `connect` call. Stay tuned, and thanks for verifying!
It will unfortunately not be possible to fix DataDog#261 without an API change. This is because the channel needs to keep the remote end - or at least part of its state - alive to avoid a data race. We don't know when - or if - the remote end will connect, so this needs to be made into a future that can resolve when the connection happens (or when the endpoint disconnects). This will be a bit complex, so it pays to land the API change separately to keep each change small. Note that some of the examples won't work as-is anymore: they were connecting both ends in the same thread. One can make that work by spawning tasks, but the whole point of shared channels is to connect different executors, so the examples are changed to reflect that.
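To make the commit message concrete, here is a sketch of what a future-based `connect` implies for callers. This follows the direction described above - `connect` resolves once the peer connects or disconnects - but the exact signatures in the released crate may differ:

```rust
use glommio::channels::shared_channel;
use glommio::prelude::*;

fn main() {
    let (sender, receiver) = shared_channel::new_bounded(8);
    let a = LocalExecutorBuilder::new()
        .spawn(move || async move {
            // `connect` is now awaited: it resolves when the other side
            // connects (or when the peer endpoint is dropped).
            let sender = sender.connect().await;
            sender.send(42).await.unwrap();
        })
        .unwrap();
    let b = LocalExecutorBuilder::new()
        .spawn(move || async move {
            let receiver = receiver.connect().await;
            assert_eq!(receiver.recv().await, Some(42));
        })
        .unwrap();
    a.join().unwrap();
    b.join().unwrap();
}
```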
Issue DataDog#261 exposed a serious weakness in how we're handling remote notifications: the lifetimes of the eventfd and the memory area used for notifications are disconnected from each other, and the remote end of the channel has no way of keeping them alive. To pave the way for solving this problem, this patch introduces the SleepNotifier. Wrapped in an Arc, it eases the task of allowing the peer to keep the channel alive, and the lifetimes of all entities involved in the notification are kept together.
When an executor goes to sleep, it writes into a memory area stored in the shared buffer to signal to the other side of the shared channel that it needs to be notified to wake up. This works well with long-lived executors whose shutdown process is well behaved, so that the channels are always empty by then. While that is still good practice (we are adding it to the docs!), we don't want to force it or depend on it for correctness. In this patch we create a global table that maps executor IDs to their SleepNotifier. Because the SleepNotifier is wrapped in an Arc, a connected channel can keep its peer alive by holding onto that Arc. Even if the reactor dies, the memory will still be valid, and so will the eventfd. Fixes DataDog#261
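A minimal sketch of the mechanism that commit describes, with hypothetical types standing in for glommio's: an executor registers an Arc'd notifier under its ID, and a connecting peer clones the Arc so the notifier outlives the executor that created it.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical SleepNotifier: the wakeup fd and the "am I asleep?" word
// live in one allocation, so their lifetimes cannot diverge.
struct SleepNotifier {
    eventfd: i32,      // placeholder for the real eventfd
    asleep: AtomicU64, // nonzero while the executor is parked
}

static TABLE: OnceLock<Mutex<HashMap<usize, Arc<SleepNotifier>>>> = OnceLock::new();

fn table() -> &'static Mutex<HashMap<usize, Arc<SleepNotifier>>> {
    TABLE.get_or_init(|| Mutex::new(HashMap::new()))
}

fn register(executor_id: usize, notifier: Arc<SleepNotifier>) {
    table().lock().unwrap().insert(executor_id, notifier);
}

// A connecting channel clones the Arc and holds it: even if the peer's
// reactor dies, the memory (and the fd it owns) stays valid.
fn peer_notifier(executor_id: usize) -> Option<Arc<SleepNotifier>> {
    table().lock().unwrap().get(&executor_id).cloned()
}

fn main() {
    let n = Arc::new(SleepNotifier { eventfd: -1, asleep: AtomicU64::new(0) });
    register(7, n);
    let held = peer_notifier(7).unwrap();
    held.asleep.store(1, Ordering::Release);
    println!("peer fd: {}, asleep: {}", held.eventfd, held.asleep.load(Ordering::Acquire));
}
```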
I just sent a PR for this. Because I haven't released the current version on crates.io yet (waiting for a liburing release, which is a semi-blocker), I haven't bumped the semver for this. It should be merged soon; it passes my local tests now.
One comment about your benchmark, btw: creating and destroying executors is very expensive! You are measuring that as well in the cost of your shared channel.
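To illustrate the point with plain std threads (no glommio), start the clock inside the worker so the setup cost stays out of the per-operation number; the loop body here is just a stand-in workload:

```rust
use std::thread;
use std::time::Instant;

fn main() {
    let runs: u32 = 1_000_000;
    let handle = thread::spawn(move || {
        // Start timing after the expensive setup (thread/executor creation)
        // has already happened, so only the loop itself is measured.
        let t = Instant::now();
        let mut acc = 0u64;
        for i in 0..runs {
            acc = acc.wrapping_add(u64::from(i));
        }
        (acc, t.elapsed() / runs)
    });
    let (acc, per_op) = handle.join().unwrap();
    println!("acc = {}, per-op cost: {:?}", acc, per_op);
}
```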
Is it still necessary to use `spin_before_park`?
No.
There are some problems in our current spsc implementation that are related to memory ordering. Those problems are fixed (in particular a Relaxed load of the real tail), but none of them is really the root cause of what we are seeing. The real root cause is CPU-level reordering: we mark ourselves as going to sleep and then check whether new work has arrived, but our peer may do those things in the opposite order. A barrier, which we were missing, is needed. Barriers are needed on both the producer and the consumer side, which can be quite expensive. To avoid that, we use a technique much like the one described at https://www.scylladb.com/2018/02/15/memory-barriers-seastar-linux/ Fully fixes DataDog#261
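A self-contained sketch of the race and the fix, using std atomics in place of glommio's internals: without the SeqCst fences, the consumer's "mark asleep, then re-check queue" and the producer's "push, then check asleep" can each be reordered so that both sides miss each other and the consumer sleeps forever. (The Seastar trick linked above replaces one side's fence with a cheaper asymmetric barrier; plain fences are shown here.)

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let queued = Arc::new(AtomicUsize::new(0)); // stand-in for the ring buffer size
    let asleep = Arc::new(AtomicBool::new(false)); // the "needs notification" flag

    let (q, a) = (queued.clone(), asleep.clone());
    let producer = thread::spawn(move || {
        q.fetch_add(1, Ordering::Relaxed); // publish an item
        fence(Ordering::SeqCst); // order the push before reading `asleep`
        if a.load(Ordering::Relaxed) {
            println!("producer: peer is parked, write the eventfd to wake it");
        }
    });

    let consumer = thread::spawn(move || {
        asleep.store(true, Ordering::Relaxed); // announce we are about to park
        fence(Ordering::SeqCst); // order the announcement before the re-check
        if queued.load(Ordering::Relaxed) == 0 {
            println!("consumer: queue empty, really going to sleep");
        } else {
            asleep.store(false, Ordering::Relaxed);
            println!("consumer: found work during the re-check, not sleeping");
        }
    });

    producer.join().unwrap();
    consumer.join().unwrap();
}
```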
I wrote a benchmark for shared channels, which sometimes fails with:

```
free(): invalid pointer
```

It also sometimes hangs (with lower probability than the invalid free):

Code for the benchmark: