fix(websocketshard): deal with zombie connection caused by 4009 #7581
To avoid spamming the pull request with individual comments: please use the enum instead of the magic number in `this.connection?.readyState ?? 3`
(there are at least 10 occurrences of this in this pull request)
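A minimal sketch of what the suggestion could look like. The numeric values are the standard WebSocket `readyState` values; the `ReadyState` object and the `readyStateOf` helper are hypothetical names used only for illustration, not the identifiers used in the library or in this PR.

```javascript
// Standard WebSocket readyState values, given names instead of bare numbers.
// `ReadyState` is a hypothetical name for this sketch.
const ReadyState = Object.freeze({
  CONNECTING: 0,
  OPEN: 1,
  CLOSING: 2,
  CLOSED: 3,
});

// Hypothetical helper: a missing connection is treated as CLOSED.
// `??` only falls back on null/undefined, so CONNECTING (0) is preserved.
function readyStateOf(connection) {
  return connection?.readyState ?? ReadyState.CLOSED;
}
```

With a helper like this, each `this.connection?.readyState ?? 3` would read as `readyStateOf(this.connection)`, and comparisons would name the state they check.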
Is this issue not present on the main branch?
It is. But we were testing on the v13 branch, so we created the PR for v13 first. He'll create another one for main.
Co-authored-by: Almeida <almeidx@pm.me>
Co-authored-by: Vitor <milagre.vitor@gmail.com>
Before, it used to just go offline, i.e. the child process was still alive, but the WebSocket was stuck in the CLOSING state and unresponsive (a zombie connection). You can refer to the issue mentioned in this PR's description for more details. Now, after pushing my fixes to production on my bot, the reconnects are smooth for all codes, including after 4009. It's weird that only the 5th shard out of 12 experiences this 4009 session timeout issue; the rest of the shards are fine with weeks of uptime, and now my fixes work for that shard as well.
🙏 debug strings
Just these small things:
Co-authored-by: SpaceEEC <spaceeec@yahoo.com>
Co-authored-by: Vlad Frangu <kingdgrizzle@gmail.com>
All the `[WebSocket]` log message prefixes are not really useful (this.debug already appends a prefix), but this looks fine
Invoke destroy when connection object exists in onClose method for a cleanup and correct reconnect.
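As a rough illustration of the idea in this commit message, the sketch below cleans up a lingering connection object inside the close handler before counting a reconnect. It is not the code from this PR: `ShardSketch`, its fields, and the reconnect counter are hypothetical, and `terminate()` is assumed to be available on the socket (as it is on sockets from the `ws` package).

```javascript
// Hypothetical shard-like wrapper; names are illustrative, not library internals.
class ShardSketch {
  constructor(connection) {
    this.connection = connection;
    this.reconnects = 0;
  }

  // Detach handlers and drop the socket so a stuck socket cannot linger.
  destroy() {
    if (!this.connection) return;
    this.connection.onclose = null;
    this.connection.onmessage = null;
    try {
      // Assumption: the socket exposes terminate(); skipped if it does not.
      this.connection.terminate?.();
    } catch {
      // The socket may already be gone; nothing more to clean up.
    }
    this.connection = null;
  }

  // Close handler: clean up first if a connection object still exists,
  // then record that a reconnect should happen.
  onClose(code) {
    if (this.connection) this.destroy();
    this.reconnects += 1;
    return code;
  }
}
```

The point of the ordering is that the old socket reference is gone before any reconnect logic runs, so a socket stuck in CLOSING cannot be mistaken for a live connection.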
After a few days of testing this code, I think I have covered and fixed most of the edge cases where the WS failed to reconnect or used to go into a reconnect loop.
I added #7964 to the list of issues this PR fixes.
I also don't know if we'll see another v13 version but... I'll approve just in case.
Please describe the changes this PR makes and why it should be merged:
Fixes: #7450
This PR deals with the zombie connection and reconnections of the WebSocket. The issue was readyState being stuck at `CLOSING` indefinitely when there was a session timeout [4009].
Status and versioning classification: