Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Internal Sharding, this time fixed™ #3140

Open
wants to merge 31 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@vladfrangu
Copy link
Contributor

vladfrangu commented Mar 10, 2019

Please describe the changes this PR makes and why it should be merged:

This PR fixes all the internal sharding issues (fingers crossed). To put it simply, this closes #3053, closes #3085, closes #3021, closes #3083 (or should, this one I'm not 100% on) and closes #2601 (should, but we can't just cause an outage to test out)

But Vlad.. Didn't you say the same in #2976?

I did! You are correct. Buuut, development testing is nothing compared to real life testing, and as shown, people have experienced issues with IS, even after the mentioned PR got merged (although far less often!).

So, how do we know this PR solves this?

Short answer: we don't.

Long answer, I want your help. If you had issues with IS, or are just willing to give a helping hand, please install this PR (npm i vladfrangu/discOwOd.js#internal-sharding-2.0 or yarn add vladfrangu/discOwOd.js#internal-sharding-2.0) and run it. Logging the debug event on the client will be required, how you do it is up to you, but those logs can help in aid fixing issues you might encounter.

I'll also be working on the Discord.JS sharding test suite (at the time of writing, it's midnight and I can't work on this right now, but soon).
The plan is to implement as many different sharding ways people use and run this PR (or any for that matter) to make sure the end results are as expected. This is still, a dream, but I will work on this and let you know as soon as possible 🤞

What can I do if this PR solved my issues?

Approve it! That's the best way to show it worked perfectly for you (but do it after a more long period of testing)

And for the technical peeps out there, what changed?

  • Client.js
    • Made the login function handle just basic checks and presence parsing, leaving the main work on the WebSocketManager
      • Keeps everything in one place, easier to maintain
    • Made options.shards ALWAYS end up as an array to keep code simpler (was Gus's request)
    • Add 4 new client events
      • shardDisconnected (replaces disconnect), emitted when a shard will never reconnect again
      • shardError emitted when a shard encounters any error
        • To discuss if this should remain or if the ERROR event should be used instead
      • shardReconnecting emitted when a shard is getting queued up for a reconnect
      • shardResumed emitted when a shard successfully resumes (not ready's)
  • Constants.js
    • Renamed READY -> CLIENT_READY (since it's the only place it's used)
    • Add the 4 new client events
    • Added 3 new ShardEvents
      • Used internally by the WebSocketShard to emit consistent events that the Manager will process (yes this could've been hardcoded, but I like it cleaner)
  • WebSocketManager.js
    • Added a totalShards property which is used in the checkReady function
    • Made shardQueue and packetQueue hidden by default (if you log the class)
    • expectingClose -> destroyed
    • isReconnectingShards -> reconnecting
    • Made the debug function take shard as an optional parameter, to be used in the debug messages
    • Made the connect function handle everything, from fetching gateway information to checking if the token is valid, creating the shard instances and initiating the connect process
    • Renamed create to createShards and implemented more logic to it
      • Attaches event handles on the shard if they weren't already
      • awaits! the connection of a shard and handles the error appropriately
      • Handles the session limits accordingly
    • Made the reconnect function re-use the createShards one, while still handling session limits AND token invalidation checks
  • WebSocketShard.js
    • Made the ratelimit, connection, inflate, helloTimeout (more on this later) and eventsAttached not visible by default
    • Made the connect function return a Promise that resolves when the shard gets ready / resumes and rejects if it closes
    • onOpen sets the shard's status to NEARLY, to indicate we're almost READY
    • Simplified onClose by just emitting the close event and letting the manager handle the rest
    • Implement a HELLO payload timeout
      • This should prevent any dead or hanging shards that never connect, as well as ensure we're.. actually connected
    • Add an important parameter to send
      • Used only by important functions which need to send the payload NOW, not wait for the entire queue to drain (say, identify before presence updates)
    • Minor touch ups to the destroy function
  • index.d.ts
    • I really can't be bothered to explain everything I did, so I'll just say I added everything about the classes and events. Happy?

Have fun reviewing and testing! 🎉 🍰

Status

  • Code changes have been tested against the Discord API, or there are no code changes
  • I know how to update typings and have done so, or typings don't need updating

Semantic versioning classification:

  • This PR changes the library's interface (methods or parameters added)
    • This PR includes breaking changes (methods removed or renamed, parameters moved or removed)
  • This PR only includes non-code changes, like changes to documentation, README, etc.

vladfrangu added some commits Mar 10, 2019

src: Refactor unavailable guild check
Co-Authored-By: kyranet <kyradiscord@gmail.com>
src: More WIP Code
F in the chat to the old manager
src: It should work but Discord says no.
Seriously why is this not working!
fix: Again... edge cases
I love you guys .w.

@vladfrangu vladfrangu referenced this pull request Mar 11, 2019

Open

Master Version Causing WebSocket Timeout #3053

0 of 1 task complete

vladfrangu added some commits Mar 11, 2019

@pizzafox
Copy link
Contributor

pizzafox left a comment

This fixed my issue of shard 0 timing out completely. I'll mark #3021 as being solved by this.

@pizzafox pizzafox referenced this pull request Mar 11, 2019

Open

Shard 0 gateway timing out after a few seconds #3021

1 of 1 task complete
@SpaceEEC
Copy link
Member

SpaceEEC left a comment

  • error -> shardError 👍 from me
    Considering a noop was pretty much what you needed to add, except you want to log them.
  • ShardEvents 👍 organization is good

  • private properties and methods go above public ones
    There is an empty line between those as well:
    private rest: object;
    public options: ClientOptions;
Show resolved Hide resolved src/client/websocket/WebSocketManager.js
url: gatewayURL,
shards: recommendedShards,
session_start_limit: sessionStartLimit,
} = await this.client.api.gateway.bot.get().catch(() => { throw invalidToken; });

This comment has been minimized.

@SpaceEEC

SpaceEEC Mar 12, 2019

Member

This should probably check for error.code === 401 or something similar.
For example an internal server error (5xx status code) should not throw invalidToken.

This comment has been minimized.

@vladfrangu

vladfrangu Mar 12, 2019

Author Contributor

While I agree 100% with you, in my opinion the REST side of the library should handle 5xx gracefully, and shouldn't throw... There's a retries property again too.. Discussion later

Show resolved Hide resolved src/client/websocket/WebSocketShard.js Outdated
Show resolved Hide resolved src/client/websocket/WebSocketShard.js Outdated
Show resolved Hide resolved src/client/websocket/WebSocketShard.js Outdated
Show resolved Hide resolved typings/index.d.ts Outdated
Show resolved Hide resolved typings/index.d.ts
Show resolved Hide resolved typings/index.d.ts
Show resolved Hide resolved typings/index.d.ts
Show resolved Hide resolved typings/index.d.ts

vladfrangu added some commits Mar 12, 2019

@SpaceEEC SpaceEEC self-requested a review Mar 13, 2019

@vadremix

This comment has been minimized.

Copy link

vadremix commented Mar 15, 2019

Is it reasonable to use internal sharding in conjunction with the shard manager? E.g. 3 shards to a process

@SpaceEEC
Copy link
Member

SpaceEEC left a comment

Plus the leftover comment from my earlier review.

Otherwise LGTM.

Show resolved Hide resolved src/errors/Messages.js Outdated
@rumblefrog

This comment has been minimized.

Copy link
Contributor

rumblefrog commented Mar 16, 2019

Can confirm I no longer experience the same as #2976

Thanks

@CyberiumShadow
Copy link
Contributor

CyberiumShadow left a comment

Most of my main issues with IS are now fixed with this PR.
Looks good to go so far

vladfrangu added some commits Mar 16, 2019

src: Don't block reconnect while creating shards
Might fix silent disconnects *again*
src: Prevent reconnect from running if the Manager isn't READY
That way, if a shard dies while we're still spawning, it won't cause issues
fix: Retry to reconnect if there's a network error going on.
The manager *should* keep reconnecting unless the token is invalid
src: Enhance onClose handler for shards in the manager
- If the close code is between 1000 and 2000 (inclusive), you cannot resume
I tested this locally
- If there's a session ID still present, immediately try to resume
Faster resumes :papaBless:
Otherwise, the invalid session event will trigger and it'll handle accordingly

I swear if I see a SINGULAR Silent DC I'm yeeting
@xonrsoftware

This comment has been minimized.

Copy link

xonrsoftware commented Mar 21, 2019

Still appears silent disconnect. Just now two bot's went silently offline.
discord.js@12.0.0-dev (github:vladfrangu/discOwOd.js#cd8747c06899ce938addac13f893921c3fb7f902)
Will update to last commit and check again.

@vladfrangu

This comment has been minimized.

Copy link
Contributor Author

vladfrangu commented Mar 21, 2019

Hey @xonrsoftware, thank you for reporting that you've encountered an issue with this PR!

If, per chance, the latest commits do not solve the silent disconnects, please provide a log file of what the debug event emitted. (client.on('debug', message => {}))

That will aid me into figuring out where and what happened before they silently disconnected, in order to fix it ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.