Skip to content

fix(bindgen): avoid async future and stream host injection deadlocks#1437

Merged
andreiltd merged 4 commits into
bytecodealliance:mainfrom
andreiltd:fix/future-deadlock
May 4, 2026
Merged

fix(bindgen): avoid async future and stream host injection deadlocks#1437
andreiltd merged 4 commits into
bytecodealliance:mainfrom
andreiltd:fix/future-deadlock

Conversation

@andreiltd
Copy link
Copy Markdown
Member

@andreiltd andreiltd commented Apr 29, 2026

The spec is subtle here but IIUC we should start host side future and stream injection without awaiting it before the canonical read operation can see BLOCKED.

So if the copy does not immediately produce a pending event, the implementation must set ASYNC_COPYING and return BLOCKED. Our old code awaited the host promise during host injection before allowing this path to return BLOCKED, which deadlocks the guest scheduler.

This is consistent with python code from canon ABI spec:

  if not e.has_pending_event():
    if not opts.async_:
      e.state = CopyState.SYNC_COPYING
      thread.wait_until(e.has_pending_event)
    else:
      e.state = CopyState.ASYNC_COPYING
      return [BLOCKED]
  code,index,payload = e.get_pending_event()
  assert(code == event_code and index == i)
  return [payload]

This is the same for futures and streams. An example of deadlock on the guest side with an old implementation looks like this:

let (mut tx, rx) = wit_stream::new();
futures::join!(
    async {
        wasi::cli::stdout::write_via_stream(rx).await.unwrap();
    },
    async {
        tx.write(b"hello, world\n".to_vec()).await;
        drop(tx);
    },
);

I've added concurrency tests for both streams and futures that this patch fixes.

@andreiltd andreiltd force-pushed the fix/future-deadlock branch from 8353412 to 2f3384d Compare April 29, 2026 17:28
@andreiltd andreiltd marked this pull request as draft April 29, 2026 18:08
@andreiltd
Copy link
Copy Markdown
Member Author

This requires some more thinking - I will keep it as a draft until I figure out what caused a regression.

The spec is subtle here but IIUC we should start host side future and
stream injection without awaiting it before the canonical read operation
can see BLOCKED.

So if the copy does not immediately produce a pending event, the
implementation must set ASYNC_COPYING and return BLOCKED. Our old code
awaited the host promise during host injection before allowing this path
to return BLOCKED, which deadlocks the guest scheduler.

This is consistent with python code from canon ABI spec:

```python
  if not e.has_pending_event():
    if not opts.async_:
      e.state = CopyState.SYNC_COPYING
      thread.wait_until(e.has_pending_event)
    else:
      e.state = CopyState.ASYNC_COPYING
      return [BLOCKED]
  code,index,payload = e.get_pending_event()
  assert(code == event_code and index == i)
  return [payload]
```

This is the same for futures and streams. An example of deadlock on the
guest side with an old implementation looks like this:

```rust
let (mut tx, rx) = wit_stream::new();
futures::join!(
    async {
        wasi::cli::stdout::write_via_stream(rx).await.unwrap();
    },
    async {
        tx.write(b"hello, world\n".to_vec()).await;
        drop(tx);
    },
);
```

I've added concurrency tests for both streams and futures that this
patch fixes.
@andreiltd andreiltd force-pushed the fix/future-deadlock branch from 2f3384d to 36fdf90 Compare April 30, 2026 07:24
@andreiltd
Copy link
Copy Markdown
Member Author

I think the problem is in this code:

https://github.com/bytecodealliance/jco/blob/main/crates/js-component-bindgen/src/intrinsics/p3/async_stream.rs#L912

I'm not sure why we call copy() again here instead of getting result from pending event? I believe the python code counterpart does that:

  e.copy(thread.task.inst, buffer, on_copy, on_copy_done)

  if not e.has_pending_event():
    if not opts.async_:
      e.state = CopyState.SYNC_COPYING
      thread.wait_until(e.has_pending_event)
    else:
      e.state = CopyState.ASYNC_COPYING
      return [BLOCKED]
  code,index,payload = e.get_pending_event()  # <= here, do not copy again but just get result from event
  assert(code == event_code and index == i and payload != BLOCKED)
  return [payload]

The way I understand this is that after a blocked copy is awaken, the pending event contains the completed copy and calling copy() again will re-enter the stream state machine? The problem is symmetric so it's the same for read and write.

I can try to fix that, but I won't be confident to land it without thorough review 😄

@andreiltd andreiltd marked this pull request as ready for review April 30, 2026 10:01
@andreiltd
Copy link
Copy Markdown
Member Author

Last commit seems to fix the regression and with previous changes the deadlock is also fixed.

Copy link
Copy Markdown
Collaborator

@vados-cosmonic vados-cosmonic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

I left some nits, but in general this looks good to me -- thanks for adding the new tests and fixing so many impl bugs!

Comment thread crates/js-component-bindgen/src/intrinsics/p3/async_future.rs Outdated
Comment thread crates/js-component-bindgen/src/intrinsics/p3/async_stream.rs Outdated
Comment thread crates/js-component-bindgen/src/intrinsics/p3/async_stream.rs
Comment thread packages/jco/test/p3/deadlock-regressions.js Outdated
Comment thread packages/jco/test/p3/deadlock-regressions.js Outdated
@andreiltd andreiltd added this pull request to the merge queue May 4, 2026
Merged via the queue into bytecodealliance:main with commit 4fe08c8 May 4, 2026
37 checks passed
@andreiltd andreiltd deleted the fix/future-deadlock branch May 4, 2026 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants