-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Handle websocket closure while performing initial handshake #70
Comments
Great idea, I appreciate the prior-art references too. Always detailed! Also I like that you're using this library to run a bot. I started this library in 2018 trying to write a discord chat exporter, but discord.hs library was defunct. Reading through that code, it's filled with monad instances and complicated type stuff. Very good proposal, having a busy day so I'll work on this tomorrow I think. Maybe extract a
Nice narrow issue space. Retrying indefinably with a delay is a natural idea. I think the library shouldn't give up. |
Error codes like 4000, 1000, etc are specifically sent by the server. If the wifi cuts then the eventloop notices there's no heartbeat and tries to reconnect. Reading the discord socket error codes, most of these handle a library error, or tell to reconnect. Seems safe to reconnect on any gateway close event code. I pushed a commit that adds reconnecting. I'm going to test it out as well before I push a new hackage version. extra-deps:
- github: aquarial/discord-haskell
commit: 83cbc89e006ad91648a5f6f6d04c73ea758d028b
- emoji-0.1.0.2 |
When the Discord docs say "Reconnect and start a new session.", I think they mean "close the websocket and start anew", which is different from "try resuming again". So, when a CloseRequest is given within ConnReconnect (L106), my theory is that you have to do a ConnStart. |
Hm. Looks like Eventloop.hs needs a remold. Handle all the events in one large event catcher instead of separating out the eventloop. Handle the stored status edge case as well. |
I was afk in a way for a while, wanted to clean this up.
Oh right. Depends on the error code though. I've fixed up EventLoop.hs so all the
The reconnect must have failed in a way that the library decieded to do a clean loop restart, which then failed. I've updated the EventLoop so the library will fully close the connection before restarting. Perhaps that would help?
pushed with |
Closing this as it seems to have been fixed, both experimentally and in code. Thank you very much! |
Heya Karl, thanks as always for the simple yet amazing library. This proposal is to implement a design decision to handle websocket closures during the gateway handshake stage.
Proposal
While waiting for Opcode Hello and Opcode Dispatch "Ready",
discord-haskell
should handle websocket close codes properly, similar to how it is done ineventStream
:https://github.com/aquarial/discord-haskell/blob/7a33ecf756c6b404f7201e229a71bfdbd408e873/src/Discord/Internal/Gateway/EventLoop.hs#L175-L183
Existing Code
Currently, the library closes with
ConnClosed
on lines 74-76 (while waiting for Hello):https://github.com/aquarial/discord-haskell/blob/7a33ecf756c6b404f7201e229a71bfdbd408e873/src/Discord/Internal/Gateway/EventLoop.hs#L74-L76
and on lines 67-69 (while waiting for Ready):
https://github.com/aquarial/discord-haskell/blob/7a33ecf756c6b404f7201e229a71bfdbd408e873/src/Discord/Internal/Gateway/EventLoop.hs#L67-L69
The library only properly filters and handle the close codes in
eventStream
, which is only called after Hello and Ready was received as expected.Impact
Our bot experiences periodic disconnection from Discord (which itself is fine, since it reconnects), but once in a while, when a disconnection happens during the handshake phase, the bot goes down for good and never retries. Specifically, we get a
4000 Close Request "Unknown Error"
as the Response to Identify. This is not a fatal error, and should reconnect (docs say 4000 is worth reconnecting). This proposal would fix it.A problem I might see is that the bot would keep rapidly retrying indefinitely if there is no internet connection (i think), resulting in perhaps high computation power, so a delay may be necessary.
Other libraries
poll_event
function everywhere to get everything from Hello, Heartbeat, Dispatch, etc. This function contains a try/catch, in which it handles close codes even during the Hello/Ready exchange, and reconnects as necessary: https://github.com/Rapptz/discord.py/blob/3b555737778f718a4106fdc87a47e0a097bab37c/discord/gateway.py#L571-L576.on("CLOSE")
callback handler, that does the checking against a list ofUNRESUMABLE_CLOSE_CODES
, and resumes as necessary (which means even during Hello/Ready).handle_event
function everywhere to get everything from Hello, Heartbeat, Dispatch, etc. If the websocket has closed, it callshandle_close_event
which logs and checks as necessary, then reconnects (L474). This means it also properly reconnects upon an error during handshake.Overall, all major libraries reconnect as appropriate. The golang library was interesting, but either they may not have had reports on this (since they are not as major as py/js), or they made a conscious design decision not to reconnect.
The text was updated successfully, but these errors were encountered: