fix(node): fix worker_threads issues blocking Angular support #26024

Merged: 4 commits into main on Oct 4, 2024

Conversation

nathanwhit (Member)

Fixes #22995. Fixes #23000.

There were a handful of bugs here causing the hang (each with a corresponding minimized test):

  • We were canceling recv futures when receiveMessageOnPort was called, but this caused the "receive loop" in the message port to exit. This was because CancelHandles are never reset (i.e., once you cancel a CancelHandle, it remains canceled). That meant that after receiveMessageOnPort was called, subsequent calls to op_message_port_recv_message would throw Interrupted exceptions, and we would exit the loop.

    The cancellation, however, isn't actually necessary. op_message_port_recv_message only borrows the underlying port for long enough to poll the receiver, so the borrow there could never overlap with op_message_port_recv_message_sync.
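
    As a rough illustration (a hypothetical minimized example, not the actual regression test from this PR): draining a port with receiveMessageOnPort should not break later asynchronous delivery on the same port.

    ```ts
    import { MessageChannel, receiveMessageOnPort } from "node:worker_threads";

    const { port1, port2 } = new MessageChannel();

    // Queue a message and drain it synchronously.
    port1.postMessage("first");
    console.log(receiveMessageOnPort(port2)); // { message: "first" }

    // With the first bug, the synchronous receive canceled the port's
    // CancelHandle for good, so the async "receive loop" bailed out and the
    // listener below would never see "second".
    port2.on("message", (msg: unknown) => {
      console.log("still delivered:", msg); // "second"
      port1.close();
    });
    port1.postMessage("second");
    ```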

  • Calling MessagePort.unref() caused the "receive loop" in the message port to exit. This was because we were setting messageEventListenerCount to 0 on unref. Not only did that break the counter when multiple MessagePorts were present in the same thread, but we also exited the "receive loop" whenever the listener count was 0. I assume this was to prevent the recv promise from keeping the event loop open.

    Instead of this, I chose to just unref the recv promise as needed to control the event loop.
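
    In user-facing terms (a hypothetical sketch, not a test from this PR), unref() should only affect whether the port keeps the event loop alive; messages must still be delivered while something else keeps the process running.

    ```ts
    import { MessageChannel } from "node:worker_threads";

    const { port1, port2 } = new MessageChannel();

    // Something other than the port keeps the event loop alive.
    const keepAlive = setTimeout(() => {}, 5_000);

    port2.on("message", (msg: unknown) => {
      console.log("got:", msg); // still fires even though the port is unref'd
      clearTimeout(keepAlive);
      port1.close();
    });

    // Opt the port out of keeping the event loop alive. With the bug above,
    // this zeroed the listener count and shut down the receive loop, so the
    // listener would never fire.
    port2.unref();

    port1.postMessage("hello");
    ```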

  • The last bug causing the hang (which was a doozy to debug) ended up being an unfortunate interaction between how we implement our MessagePort "receive loop" and a pattern found in npm:piscina (which Angular uses). The gist of it is that piscina uses an atomic wait loop along with receiveMessageOnPort in its worker threads, and as the worker is getting started, the following incredibly convoluted series of events occurs:

    1. Parent sends a MessagePort p to worker
    2. Parent sends a message m to the port p
    3. Parent notifies the worker with Atomics.notify that a new message is available
    4. Worker receives message, adds "message" listener to port p
    5. Adding the listener triggers MessagePort.start() on p
    6. Receive loop in MessagePort.start receives the message m, but then hits an await point and yields (before dispatching the "message" event)
    7. Worker continues execution, starts the atomic wait loop, and immediately receives the existing notification from the parent that a message is available
    8. Worker attempts to receive the new message m with receiveMessageOnPort, but this returns undefined because the receive loop already took the message in step 6
    9. The atomic wait loop continues to the next iteration, waiting for the next message with Atomics.wait
    10. Atomics.wait blocks the worker thread, which prevents the receive loop from continuing and dispatching the "message" event for the message it received in step 6
    11. The parent waits for the worker to respond to the first message, but the response never comes
    12. The worker thread can't make any more progress, and the whole process hangs

The fix I've chosen here (which I don't particularly love, but it works) is to just delay the MessagePort.start call until the end of the event loop turn, so that the atomic wait loop receives the message first. This prevents the hang.
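
To make the sequence above concrete, here is a rough, runnable sketch of the piscina-style handshake (simplified, with hypothetical names; it is not piscina's actual code). The worker combines a "message" listener with an Atomics.wait + receiveMessageOnPort loop, which is the combination that used to hang; with the deferred MessagePort.start, the synchronous receive should win the race and get the task.

```ts
import { MessageChannel, Worker } from "node:worker_threads";

// Worker body as an eval'd CommonJS string so the sketch fits in one file.
const workerSource = `
  const { receiveMessageOnPort, parentPort } = require("node:worker_threads");

  parentPort.once("message", ({ port, sharedBuffer }) => {
    const signal = new Int32Array(sharedBuffer);

    // Steps 4-5: adding a listener implicitly calls port.start().
    port.on("message", (msg) => console.log("async listener saw:", msg));

    // Steps 7-8: atomic wait loop. The parent has already stored and
    // notified, so the wait returns immediately and we try a synchronous
    // receive. With the deferred start() from this PR this should be the
    // task; previously the async receive loop could already have consumed
    // it, leaving undefined here and (in piscina) an Atomics.wait that
    // never wakes up again.
    Atomics.wait(signal, 0, 0);
    console.log("receiveMessageOnPort returned:", receiveMessageOnPort(port));
    port.close();
  });
`;

const worker = new Worker(workerSource, { eval: true });
const { port1, port2 } = new MessageChannel();
const sharedBuffer = new SharedArrayBuffer(4);
const signal = new Int32Array(sharedBuffer);

worker.postMessage({ port: port2, sharedBuffer }, [port2]); // step 1
port1.postMessage("task");                                  // step 2
Atomics.store(signal, 0, 1);
Atomics.notify(signal, 0);                                  // step 3

worker.on("exit", () => port1.close());
```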


Those were the main issues causing the hang. There ended up being a few other small bugs as well, namely the exit event being emitted multiple times, and MessagePorts not being patched up properly when received via receiveMessageOnPort.
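
For the last of those, the expectation (again a hypothetical sketch) is that a port pulled out synchronously via receiveMessageOnPort is a fully working MessagePort rather than inert data:

```ts
import { MessageChannel, receiveMessageOnPort } from "node:worker_threads";

const outer = new MessageChannel();
const inner = new MessageChannel();

// Transfer one end of the inner channel through the outer one.
outer.port1.postMessage({ port: inner.port2 }, [inner.port2]);

// Pull it back out synchronously. The fix ensures this deserialized port is
// patched up into a usable MessagePort.
const received = receiveMessageOnPort(outer.port2)!.message.port;

received.on("message", (msg: unknown) => {
  console.log("nested port works:", msg); // "hi"
  inner.port1.close();
  outer.port1.close();
});
inner.port1.postMessage("hi");
```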

@marvinhagemeister (Contributor) left a comment:

This is fantastic!! It also fixes an issue we had with vitest@1.x where the vitest run command would never finish (I don't think we have an issue for that in our tracker, but I ran into this with the test suite from urql).

nathanwhit merged commit dd8cbf5 into main on Oct 4, 2024 (17 checks passed).
nathanwhit deleted the angular-worker-threads branch on Oct 4, 2024 at 16:26.
@littledivy (Member) commented:

Wow #22995 was really annoying me. Thanks for fixing!

Successfully merging this pull request may close these issues: "Angular compat tracking issue" and "Angular ng serve stuck at Building step".