Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing heartbeat handling? #12

Closed
fuchsi-fuchs opened this issue May 3, 2021 · 7 comments
Closed

missing heartbeat handling? #12

fuchsi-fuchs opened this issue May 3, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@fuchsi-fuchs
Copy link

fuchsi-fuchs commented May 3, 2021

Describe the bug
Instead of sending a heartbeat to the discord servers every ~40 seconds the connection is shutdown with code 1000 every minute (and immediately reconnected). After the connection is closed unexpectedly (eg. TCP error) the disconnect seems to be unhandled.

To Reproduce
I noticed the disconnect issue after I suspended my computer. After wakeup the bot was naturally disconnected from discord servers but the library never noticed or fixed this. All logs (even trace) stay empty.

Expected behavior

  • According to Discord documentation: first heartbeat after a random duration <= heartbeat_interval, subsequently one haertbeat every heartbeat_interval milliseconds.
  • Automatic resume, on failure reconnect if the connection is closed without a terminating message by discord servers.

System Details:

@fuchsi-fuchs fuchsi-fuchs added the bug Something isn't working label May 3, 2021
@braindigitalis
Copy link
Contributor

braindigitalis commented May 3, 2021

the error 1000 is an error generated by cloudflare.

the bot is still sending keepalives, but if the server side (discord) sends no traffic for 2 intervals cloudflare disconnects the socket with an error 1000.

the SSL error afterwards is just D++ forced disconnect and resume by invalidating the file descriptor.

in real production use, where the guilds have real traffic, this doesn't happen ( add your invite to a busier server and you'll see).

I don't know how to prevent this as, as ive said heartbeat is being sent (stick some debug in there to find out) and it doesn't disrupt production use.

I'll have a think about what I can do to force the connection to stay open, perhaps by low level websocket pings.

@braindigitalis
Copy link
Contributor

It seems sending a low level websocket ping will prevent this (note this is not a discord API keepalive and exists in the level below):

websockets/ws#1436 (comment)

I'll test this later and also look into the hibernation issue, however I don't run my code on anything that hibernates (only on real servers) this can mess with timers and cause issues. I'll let you know what I find out.

@fuchsi-fuchs
Copy link
Author

Well the trace level logging seems to include the heartbeat ACKs send by the server R: {"t":null,"s":null,"op":11,"d":null} - and I do see those coming up, but only if there is activity on the server. Are heartbeats only send after a received package was processed instead of after a timer? That would also explain the issue after hibernation as any heartbeat should make it immediately obvious that the connection died.

Thanks for checking it out in any case :-)

@fuchsi-fuchs
Copy link
Author

Oh and it is not the hibernation that worries me. Disconnecting my ethernet for a minute has the same effect - and network outages do in fact happen so it would be nice for the bot to handle them correctly ^^

@braindigitalis
Copy link
Contributor

no they're sent on a timer, actually 75% of the recommended frequency so if discord says 40 seconds D++ will send them every 30 seconds.

I need to add functionality to ping the underlying websocket, websocket opcode 0x09 to expect a response of type 0x0A. Simply having activity at the application layer above (discord JSON) is not enough to satisfy cloudflare.

this is why you don't see it on other libraries as I've implemented websockets from scratch.

braindigitalis added a commit that referenced this issue May 3, 2021
…9) every 30 seconds and expect a PONG (0xa). If we don't, on an inactive shard we will get disconnected with opcode 8 close code 1000. Also adjust SSL code to ensure that the PING is immediately sent and doesnt wait till we have discord application level stuff to send
@braindigitalis
Copy link
Contributor

This is now fixed in 1.0.1 dev branch, commit 6cf1d27.
We needed to send a low level websocket opcode 0x9 every 30 seconds to stop cloudflare disconnecting the websocket with close code 1000. For each 0x9 we send, we receive a low level 0xa.
The attempt to send the 0x9 will also trigger the socket error routine if the connection has died.
Please let me know if there continue to be issues.

@fuchsi-fuchs
Copy link
Author

fuchsi-fuchs commented May 3, 2021

The commit seems to fix the issue for me. Heartbeats are send out (EDIT: acks are received) every ~30sec. After network disconnect it reconnects quickly and creates a new session as resuming it wasn't possible.

Thanks :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants