Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(websocketshard): deal with zombie connection caused by 4009 #7581

Merged
merged 51 commits into from Jun 5, 2022

Conversation

legendhimself
Copy link
Contributor

Please describe the changes this PR makes and why it should be merged:
Fixes: #7450
This pr deals with the zombie connection and reconnections of the WebSocket, The issue was readyState being stuck at CLOSING indefinitely when there was a session timeout [4009].

Status and versioning classification:

  • Code changes have been tested against the Discord API, or there are no code changes
  • I know how to update typings and have done so, or typings don't need updating
  • This PR changes the library's interface (methods or parameters added)

Copy link
Member

@almeidx almeidx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to not spam the pull request, please use the enum instead of a magic number in this.connection?.readyState ?? 3 (there's at least 10 references of this in this pull request)

src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
@almeidx
Copy link
Member

almeidx commented Mar 1, 2022

Is this issue not present on the main branch?

@imranbarbhuiya
Copy link
Contributor

Is this issue not present on the main branch?

it is. But we were testing in v13 branch so created pr for v13 first. He'll create another one for main.

legendhimself and others added 2 commits March 1, 2022 17:13
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
Co-authored-by: Vitor <milagre.vitor@gmail.com>
@vvito7
Copy link
Contributor

vvito7 commented Mar 1, 2022

image
Is it necessary to set the status twice?

@legendhimself
Copy link
Contributor Author

image Is it necessary to set the status twice?

its unnecessary.

@legendhimself
Copy link
Contributor Author

legendhimself commented Mar 1, 2022

Before It used to just go offline ie the child process was still alive, but the WebSocket was at the closing state and unresponsive ie zombie connection. You can refer to the issue mentioned in this prs description for more details

Now after pushing my fixes in production on my bot the reconnects are smooth for all codes, after 4009 it reconnects smoothly.

It's weird that only the 5th shard out of 12 experiences this 4009 session timeout issue, the rest of the shards are fine with weeks of uptime and now my fixes work for that shard as well

Uptime:
image

Logs:
image

Copy link
Member

@SpaceEEC SpaceEEC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏 debug strings

Just these small things:

src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
Co-authored-by: SpaceEEC <spaceeec@yahoo.com>
src/client/websocket/WebSocketManager.js Outdated Show resolved Hide resolved
src/client/websocket/WebSocketShard.js Outdated Show resolved Hide resolved
src/util/Options.js Show resolved Hide resolved
src/util/Options.js Outdated Show resolved Hide resolved
Copy link
Member

@vladfrangu vladfrangu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the [WebSocket] log message prefixes are not really useful (this.debug already appends a prefix), but this looks fine

@legendhimself legendhimself marked this pull request as draft May 26, 2022 01:36
Invoke destroy when connection object exists in onClose method for a cleanup and correct reconnect.
@legendhimself legendhimself marked this pull request as ready for review May 29, 2022 07:07
@legendhimself
Copy link
Contributor Author

legendhimself commented May 29, 2022

After a few days of testing this code. I think I covered/fixed most of the edge cases where WS fails to reconnect or used to go into reconnecting loop.
@kyranet Please review

Copy link
Member

@kyranet kyranet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added #7964 to the list of issues this PR fixes.

I also don't know if we'll see another v13 version but... I'll approve just in case.

@BenPatersonxx

This comment was marked as spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project