🐛 Bug Report - Runtime APIs: Hibernating WebSockets remain open when leaving webpage #1187

Closed
jevakallio opened this issue Sep 15, 2023 · 6 comments

jevakallio commented Sep 15, 2023

Problem

We started observing the following strange behaviour some time yesterday.

  • when using Durable Objects Hibernating WebSockets (calling state.acceptWebSocket(socket))
  • when connecting via web browser WebSocket API
  • when closing the browser tab or reloading the page
  • the connection appears to remain open, as in
    • webSocketClose callback is not executed
    • state.getWebSockets() continues to return the connection
    • connection.readyState continues to report 1 (OPEN)
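
For context, a minimal sketch of the server side being described, assuming a Durable Object class that accepts sockets through the hibernation API and reports the connection count to the remaining clients. The class name `Room`, the `broadcast` helper, and the JSON message shape are hypothetical and are not taken from the linked reproduction.

```javascript
// Sketch of a Durable Object using hibernatable WebSockets.
// "Room" and the JSON message shape are hypothetical; in a real Worker the
// class would be exported and bound as a Durable Object namespace.
class Room {
  constructor(state) {
    this.state = state;
  }

  async fetch(request) {
    // WebSocketPair and the `webSocket` response option are provided by the
    // Workers runtime, so this method only runs there.
    const pair = new WebSocketPair();
    const client = pair[0];
    const server = pair[1];
    // Hand the server end to the runtime so the object can hibernate.
    this.state.acceptWebSocket(server);
    this.broadcast("connect", null);
    return new Response(null, { status: 101, webSocket: client });
  }

  // Invoked by the runtime when a hibernatable socket closes.
  webSocketClose(ws, code, reason, wasClean) {
    this.broadcast("webSocketClose", ws);
  }

  // Send the current connection count to every socket except `exclude`.
  broadcast(event, exclude) {
    const open = this.state.getWebSockets().filter((s) => s !== exclude);
    const message = JSON.stringify({ event, connections: open.length });
    for (const socket of open) {
      try {
        socket.send(message);
      } catch (err) {
        // A socket may already be unusable; skip it.
      }
    }
  }
}
```

With a handler like this, each closed tab should produce one `webSocketClose` message and lower the reported count, which is the behaviour that was missing.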

Minimal reproduction

Code: https://github.com/jevakallio/wrangler-raw/tree/hibernating-sockets-repro/src
Demo: https://jevakallio-wrangler-raw.partykit.dev/?room=objectname

Steps

The issue is triggered inconsistently. Observed frequency by browser:

  • Chrome 116.0.5845.187 on macOS: consistently (most times)
  • Firefox 117.0.1 (64-bit) on macOS: occasionally (less than half the time)
  • Safari 16.5 (18615.2.9.11.4): does not occur

To reproduce:

  1. Open https://jevakallio-wrangler-raw.partykit.dev/?room=repro-123 in two browser windows side by side (Chrome reproduces the issue most often)
  2. Refresh one of the browser tabs repeatedly

[Image: open-connections]

You can change the room search parameter to get a different durable object instance for testing, e.g. https://jevakallio-wrangler-raw.partykit.dev/?room=repro-anything-else-here

Expected

  • The connection counter remains at 1 or 2.
  • Each reload displays a webSocketClose message in the other tab.

Actual

  • The connection counter increases
  • The webSocketClose message is sent only sporadically

Workaround

The client can work around this issue by calling websocket.close() manually when the page is unloaded:

window.addEventListener("beforeunload", () => {
  socket.close();
});

You can see the workaround in action by appending a close search parameter to the URL:
https://jevakallio-wrangler-raw.partykit.dev/?room=objectname&close
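
A slightly more defensive variant of the workaround above, as a sketch. The helper name `closeOnUnload` is hypothetical. Listening for `pagehide` in addition to `beforeunload` is an assumption on my part (browsers, mobile ones in particular, do not always fire `beforeunload`), and it is not something verified against this bug.

```javascript
// Hypothetical helper: close `socket` with a proper close frame when the
// page goes away. `target` is normally `window`.
function closeOnUnload(socket, target) {
  const CONNECTING = 0;
  const OPEN = 1;
  const handler = () => {
    // Only close sockets that are still connecting or open, so the second
    // event to fire does not call close() again.
    if (socket.readyState === CONNECTING || socket.readyState === OPEN) {
      socket.close(1000, "page unload");
    }
  };
  target.addEventListener("beforeunload", handler);
  target.addEventListener("pagehide", handler);
  // Return a function that removes both listeners.
  return () => {
    target.removeEventListener("beforeunload", handler);
    target.removeEventListener("pagehide", handler);
  };
}
```

In a page this would be called once after connecting, e.g. `closeOnUnload(socket, window)`.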

Other information

This issue started occurring in a live application some time yesterday afternoon/evening UTC. We did not see this issue before.

Initially, we noticed this on one of our sample apps that's supposed to show live connections only:
https://multicursor-sketch.vercel.app/?partyhost=voronoi-party.genmon.partykit.dev

We can see from this demo that:

  • this issue has occurred to users in different regions (flags)
  • on both desktop and mobile browsers (pointer vs touch cursors)
  • the connections seem "permanently" stuck (at least for 16 hours)

I deployed the same code without hibernation, and that works as expected:
https://multicursor-sketch.vercel.app/?partyhost=voronoi-party.jevakallio.partykit.dev

kentonv (Member) commented Sep 18, 2023

We're investigating this.

Are you able to verify that the browser is actually closing the connections?

jevakallio (Author) commented Sep 18, 2023

> Are you able to verify that the browser is actually closing the connections?

@kentonv I suspected that too, so I did the following:

  1. Close the tab
  2. Force close the browser process (in case Chrome kept the socket open)
  3. Restart the computer (in case the OS kept the socket open)

The socket remains open on the Cloudflare side.

I'm not aware of any socket that can survive the machine shutting down, so I'm guessing it's closing them :)

jasnell (Member) commented Sep 18, 2023

Thank you for the details! We've identified the issue. Really appreciate the detailed bug report!

kentonv (Member) commented Sep 19, 2023

The fix has begun rolling out, but probably won't reach most places until tomorrow.

The problem occurs when a client disconnects a WebSocket without sending a close message first, and the WebSocket was hibernated.

For posterity, the actual bugfix: b529704

We've also added test coverage (not visible publicly, unfortunately), and are also making this class of bug into a compile-time error: capnproto/capnproto#1810
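
To illustrate the distinction in the diagnosis above (a disconnect with no close message versus a clean close), here is a small sketch of how a `webSocketClose(ws, code, reason, wasClean)` handler might tell the two apart. The helper name is hypothetical. Code 1006 is the value RFC 6455 reserves for a connection that ended without a close frame, and whether the runtime reports exactly that code in every such case is an assumption here.

```javascript
// RFC 6455 reserves 1006 for "abnormal closure": the connection ended
// without a close frame being received.
const ABNORMAL_CLOSURE = 1006;

// Hypothetical helper: true when the peer went away without a clean
// closing handshake (for example a killed tab or a dropped network).
function isAbruptDisconnect(code, wasClean) {
  return !wasClean || code === ABNORMAL_CLOSURE;
}
```

A handler could log or count abrupt disconnects separately, which makes it easier to see whether they are being delivered at all.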

threepointone commented

Thanks for the turnaround on this, deeply appreciate it!

kentonv (Member) commented Sep 19, 2023

The fix is fully rolled out (as of ~8:00 UTC, several hours ago).

Thanks @jevakallio for the amazing bug report, and @MellowYarker and @jasnell for debugging and fixing.

kentonv closed this as completed Sep 19, 2023