Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolving an infinite loop distributing events to a websocket subscriber #13877

Merged
merged 2 commits into from
Jun 7, 2024

Conversation

chrisguidry
Copy link
Collaborator

The outward signs of this manifest as the Prefect server not stopping in
response to an interrupt after a websocket subscriber has connected once. The
specific scenario that caused that was:

  • Subscriber connects and start listening for events from the server
  • The server opens a queue with the distributor and starts the loop to check for
    new events, periodically yielding back control
  • If no events ever arrive for that subscriber (due to a restrictive filter or
    just an absence of events), the loop will just continue indefinitely even if
    the subscriber disconnects
  • This leads to there being an unfinished task on the event loop, preventing the
    server from shutting itself down cleanly

The ideal solution here would be to send a ping message down the wire to the
client, but we don't have access to those lower-level protocol messages from the
Starlette layer (they are handled by the webserver, like uvicorn in our case).
The solution here is twofold:

  1. Actually yield control all the way back to the caller of the stream.events()
    by yielding a None when there's no new event for a while
  2. When getting that control passed back to the socket endpoint, it can perform
    a (mildly hacky) test by attempting to receive from the socket (knowing
    that the client isn't expected to send anything); this works as a
    functional "are you still there?" test that we can perform without needing to
    introduce a client-side acknowledgement to the streaming protocol

Fixes #13868

The outward signs of this manifest as the Prefect server not stopping in
response to an interrupt after a websocket subscriber has connected once.  The
specific scenario that caused that was:

* Subscriber connects and start listening for events from the server
* The server opens a queue with the distributor and starts the loop to check for
  new events, periodically yielding back control
* If no events ever arrive for that subscriber (due to a restrictive filter or
  just an absence of events), the loop will just continue indefinitely _even if
  the subscriber disconnects_
* This leads to there being an unfinished task on the event loop, preventing the
  server from shutting itself down cleanly

The ideal solution here would be to send a `ping` message down the wire to the
client, but we don't have access to those lower-level protocol messages from the
Starlette layer (they are handled by the webserver, like uvicorn in our case).
The solution here is twofold:

1. Actually yield control all the way back to the caller of the `stream.events()`
   by `yield`ing a `None` when there's no new event for a while
2. When getting that control passed back to the socket endpoint, it can perform
   a (mildly hacky) test by attempting to `receive` from the socket (knowing
   that the client isn't expected to `send` anything); this works as a
   functional "are you still there?" test that we can perform without needing to
   introduce a client-side acknowledgement to the streaming protocol

Fixes #13868
@chrisguidry chrisguidry requested review from zangell44 and a team as code owners June 7, 2024 18:24
Comment on lines +128 to +131
try:
await _distributor_task
except asyncio.CancelledError:
pass
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated, but I noticed that it looks like the distributor stops immediately after starting. It's just because our start functions aren't supposed to return early

@chrisguidry chrisguidry merged commit db14095 into main Jun 7, 2024
26 checks passed
@chrisguidry chrisguidry deleted the socket-hangs branch June 7, 2024 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prefect server won't stop with Ctrl+C interrupt if a websocket has previously connected
2 participants