fix(bindgen): avoid async future and stream host injection deadlocks#1437
Conversation
8353412 to
2f3384d
Compare
|
This requires some more thinking - I will keep it as a draft until I figure out what caused a regression. |
The spec is subtle here but IIUC we should start host side future and
stream injection without awaiting it before the canonical read operation
can see BLOCKED.
So if the copy does not immediately produce a pending event, the
implementation must set ASYNC_COPYING and return BLOCKED. Our old code
awaited the host promise during host injection before allowing this path
to return BLOCKED, which deadlocks the guest scheduler.
This is consistent with python code from canon ABI spec:
```python
if not e.has_pending_event():
if not opts.async_:
e.state = CopyState.SYNC_COPYING
thread.wait_until(e.has_pending_event)
else:
e.state = CopyState.ASYNC_COPYING
return [BLOCKED]
code,index,payload = e.get_pending_event()
assert(code == event_code and index == i)
return [payload]
```
This is the same for futures and streams. An example of deadlock on the
guest side with an old implementation looks like this:
```rust
let (mut tx, rx) = wit_stream::new();
futures::join!(
async {
wasi::cli::stdout::write_via_stream(rx).await.unwrap();
},
async {
tx.write(b"hello, world\n".to_vec()).await;
drop(tx);
},
);
```
I've added concurrency tests for both streams and futures that this
patch fixes.
2f3384d to
36fdf90
Compare
|
I think the problem is in this code: I'm not sure why we call e.copy(thread.task.inst, buffer, on_copy, on_copy_done)
if not e.has_pending_event():
if not opts.async_:
e.state = CopyState.SYNC_COPYING
thread.wait_until(e.has_pending_event)
else:
e.state = CopyState.ASYNC_COPYING
return [BLOCKED]
code,index,payload = e.get_pending_event() # <= here, do not copy again but just get result from event
assert(code == event_code and index == i and payload != BLOCKED)
return [payload]The way I understand this is that after a blocked copy is awaken, the pending event contains the completed copy and calling I can try to fix that, but I won't be confident to land it without thorough review 😄 |
|
Last commit seems to fix the regression and with previous changes the deadlock is also fixed. |
vados-cosmonic
left a comment
There was a problem hiding this comment.
LGTM 🚀
I left some nits, but in general this looks good to me -- thanks for adding the new tests and fixing so many impl bugs!
The spec is subtle here but IIUC we should start host side future and stream injection without awaiting it before the canonical read operation can see
BLOCKED.So if the copy does not immediately produce a pending event, the implementation must set
ASYNC_COPYINGand returnBLOCKED. Our old code awaited the host promise during host injection before allowing this path to returnBLOCKED, which deadlocks the guest scheduler.This is consistent with python code from canon ABI spec:
This is the same for futures and streams. An example of deadlock on the guest side with an old implementation looks like this:
I've added concurrency tests for both streams and futures that this patch fixes.