Add support for session checkpoints #1266

Open
3 tasks done
MinnDevelopment opened this issue Apr 18, 2020 · 6 comments
Labels
level: veteran requires deep understanding of java and jda priority: low type: feature

Comments

@MinnDevelopment
Member

General Troubleshooting

  • I have checked for similar issues.
  • I have updated to the latest JDA version.
  • I have checked the branches or the maintainers' PRs for upcoming features.

Feature Request

The cache is currently thrown away the moment the program restarts. We could provide a way to dump a checkpoint file of the current cache to allow resuming the session after restarting. This could be extremely useful to bigger bots that would otherwise exhaust their session start rate limit too quickly.

Exactly how this would be implemented is up for debate and might have to wait for the new auto-sharding to be implemented first.

Example Use-Case

  1. Disconnect
     File checkpoint = new File(shardId + "-checkpoint.jda");
     jda.detach(checkpoint); // detach session and disconnect
     System.exit(0);
  2. Resume
     JDA shard = JDA.resume(checkpoint);
@MinnDevelopment MinnDevelopment added type: feature priority: low level: veteran requires deep understanding of java and jda labels Apr 18, 2020
@MinnDevelopment MinnDevelopment added this to the JDA Longtime Goals milestone Apr 18, 2020
@Andre601
Contributor

I'm a bit confused...
If I understand this correctly, using this would create a (temporary) file with information that is later used to connect directly, without first logging in to the session? Or how should I understand the process?

@MrPowerGamerBR
Contributor

Very late response, but this is how I understood the idea @Andre601:

If you have a big bot, reconnecting sessions takes a looong time (even if you have x16 login it still takes a while) and uses precious logins (you have limited logins, and if you use them all up... well, your token is reset, and that's bad).

Here's an example when you need to update your bot:

Without session checkpoints:

  • Shut down your bot
  • Update
  • Start the bot
  • Wait until your bot finishes login... (which can take a while)

With session checkpoints:

  • Shut down your bot while detaching the session to a file
  • Update
  • Start the bot and resume the detached session file
  • No need for relogins!

This is a great idea for big bots since you could resume sessions without logging in all the shards again.

However I may be completely wrong in my interpretation of the feature, so sorry if I made a mistake!

@Andre601
Contributor

I don't think that is the case, as there seems to be only one login at the start, and the shards then just connect to the WebSocket.
In addition, when a shard is disconnected, the bot does not attempt a complete re-login, just a reconnect. Otherwise this could easily hit the rate limit very quickly.

@MrPowerGamerBR
Contributor

MrPowerGamerBR commented Oct 13, 2020

@Andre601 every shard has its own WebSocket connection. If you have 512 shards, you need to connect to the WebSocket 512 times and send an IDENTIFY each time (yes, each of them needs to IDENTIFY before it can receive events; I may be wrong, but I'm 99% sure that's how it works). This uses valuable logins.
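To make the difference concrete, here is a minimal sketch of the two gateway payloads involved, built as raw JSON strings. The opcodes (2 for IDENTIFY, 6 for RESUME) and field names follow Discord's gateway documentation; the `GatewayPayloads` class and its method names are hypothetical, not JDA API, and the strings are not JSON-escaped, so this is for illustration only.

```java
import java.util.Locale;

// Hypothetical helper, not part of JDA: builds the raw gateway payloads by hand.
final class GatewayPayloads {
    // IDENTIFY (op 2) starts a brand-new session for one shard.
    // Every call like this counts against the session start limit.
    static String identify(String token, int intents, int shardId, int shardTotal) {
        return String.format(Locale.ROOT,
            "{\"op\":2,\"d\":{\"token\":\"%s\",\"intents\":%d,\"shard\":[%d,%d],"
            + "\"properties\":{\"os\":\"linux\",\"browser\":\"sketch\",\"device\":\"sketch\"}}}",
            token, intents, shardId, shardTotal);
    }

    // RESUME (op 6) re-attaches to an existing session and asks the gateway
    // to replay the events missed since sequence number `seq`.
    static String resume(String token, String sessionId, long seq) {
        return String.format(Locale.ROOT,
            "{\"op\":6,\"d\":{\"token\":\"%s\",\"session_id\":\"%s\",\"seq\":%d}}",
            token, sessionId, seq);
    }

    public static void main(String[] args) {
        System.out.println(identify("TOKEN", 513, 0, 512));
        System.out.println(resume("TOKEN", "abc123", 42L));
    }
}
```

Only the first payload consumes a login; the second needs nothing beyond the token, the session ID, and the last sequence number, which is why a checkpoint can avoid re-identifying.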

According to the example in the original issue, this would be useful when the bot needs some downtime (updates and stuff like that): you save the bot state to a file and then, on reboot, load the checkpoint file, which lets you resume the session without re-identifying. This is very useful if you have a bot that uses a lot of shards but doesn't have x16 login support yet (which causes shards to take up to 30 minutes just to log back in!)
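As a rough back-of-the-envelope check, the startup cost can be estimated from Discord's documented identify limit of one IDENTIFY per 5 seconds for each `max_concurrency` bucket. The sketch below assumes that limit is the only bottleneck and ignores guild loading time; the `IdentifyEstimate` class is hypothetical.

```java
// Hypothetical estimate, assuming the 5-second identify limit is the only bottleneck.
final class IdentifyEstimate {
    static final int SECONDS_PER_IDENTIFY_ROUND = 5;

    // Shards are identified in rounds of `maxConcurrency` shards each.
    static long estimateSeconds(int shardCount, int maxConcurrency) {
        int rounds = (shardCount + maxConcurrency - 1) / maxConcurrency; // ceiling division
        return (long) rounds * SECONDS_PER_IDENTIFY_ROUND;
    }

    public static void main(String[] args) {
        System.out.println("512 shards, x1 login:  " + estimateSeconds(512, 1) + " s");
        System.out.println("512 shards, x16 login: " + estimateSeconds(512, 16) + " s");
    }
}
```

Under these assumptions 512 shards need about 2560 seconds with x1 login and about 160 seconds with x16 login, so the real numbers depend heavily on the shard count and the concurrency the bot has been granted.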

In theory you can resume the session without any issues by storing the session ID, the sequence ID, and all loaded guilds in a file and then reloading them when the bot starts again. (Of course, you can resume a session with only the session ID and sequence ID, and that's very easy to do with a bit of reflection magic, but JDA will then not trigger any events because the guilds are missing.)
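A minimal sketch of such a checkpoint file, assuming the guilds are kept as raw JSON strings. The `SessionCheckpoint` class and its binary layout are made up for illustration; nothing here is JDA API, and a real implementation would also have to get this data out of (and back into) JDA's internals.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical checkpoint format: session ID, sequence number, then one JSON blob per guild.
final class SessionCheckpoint {
    final String sessionId;
    final long sequence;
    final List<String> guildPayloads;

    SessionCheckpoint(String sessionId, long sequence, List<String> guildPayloads) {
        this.sessionId = sessionId;
        this.sequence = sequence;
        this.guildPayloads = guildPayloads;
    }

    void writeTo(Path file) {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeUTF(sessionId);
            out.writeLong(sequence);
            out.writeInt(guildPayloads.size());
            for (String guild : guildPayloads) {
                // Length-prefixed bytes, because writeUTF is capped at 64 KiB per string.
                byte[] bytes = guild.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static SessionCheckpoint readFrom(Path file) {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            String sessionId = in.readUTF();
            long sequence = in.readLong();
            int count = in.readInt();
            List<String> guilds = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                guilds.add(new String(bytes, StandardCharsets.UTF_8));
            }
            return new SessionCheckpoint(sessionId, sequence, guilds);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("0-checkpoint", ".jda");
        new SessionCheckpoint("abc123", 42L, Arrays.asList("{\"id\":\"1\"}", "{\"id\":\"2\"}")).writeTo(file);
        SessionCheckpoint loaded = readFrom(file);
        System.out.println(loaded.sessionId + " @ seq " + loaded.sequence
            + " with " + loaded.guildPayloads.size() + " guilds");
        Files.delete(file);
    }
}
```

The session ID and sequence number alone are enough for the gateway to accept a resume; the stored guilds are what would let the library rebuild its cache so that later events have something to refer to.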

@Andre601
Contributor

From my experience with my own bot and from checking the logs, JDA sends messages like these:

[ 06.10.2020 17:31:39 INFO  ] [main] [ShardManager] - Login Successful!
[ 06.10.2020 17:31:39 INFO  ] [JDA [0 / 27] MainWS-ReadThread] [WebSocketClient] - Connected to WebSocket
[ 06.10.2020 17:31:40 INFO  ] [JDA [0 / 27] MainWS-ReadThread] [JDA] - Finished Loading!
[ 06.10.2020 17:31:40 INFO  ] [JDA [0 / 27] MainWS-ReadThread] [ReadyListener] - Shard 0 ready! # This is my own log message
[ 06.10.2020 17:31:39 INFO  ] [JDA [1 / 27] MainWS-ReadThread] [WebSocketClient] - Connected to WebSocket
[ 06.10.2020 17:31:40 INFO  ] [JDA [1 / 27] MainWS-ReadThread] [JDA] - Finished Loading!
[ 06.10.2020 17:31:40 INFO  ] [JDA [1 / 27] MainWS-ReadThread] [ReadyListener] - Shard 1 ready! # This is my own log message

As you can see, "Login Successful!" is only sent once and not for each shard separately, so we can safely assume that an actual login only happens once.

I think we should differentiate between "resuming" and "reconnecting" a session/shard. To my knowledge, a resume does not take another login, as the connection was only lost temporarily (possibly intentionally), while a reconnect means the connection was essentially closed and a new one needs to be established.
This is, of course, just my understanding, and if there is an actual definition of the two (in terms of how Discord distinguishes them), I would like to see it.

My point was mostly about resuming connections, which don't really take any additional logins, while the topic (now that I've looked closer at the PR itself) seems to be more about a complete bot restart/shutdown, which would cause a complete reconnect.

But my tl;dr is that, from what I gathered and saw in the logs, the bot only logs in once using the identify payload and the number of shards it should have, and afterwards just starts the shards one by one.

To close this off, I believe this should be moved to the Discord server, as I don't want to keep flooding this PR with (possibly) unrelated stuff.

@MrPowerGamerBR
Contributor

MrPowerGamerBR commented Nov 17, 2022

I made a super stupid, bad, and hacky implementation of this, implementing my own idea that I had two years ago.

It works by persisting all guilds to a file when JDA shuts down; the stored format is the same one used by the GUILD_CREATE event. When booting up, the gateway session is resumed and all stored events are dispatched to JDA.

Sadly it requires a JDA fork since I needed to make some internal changes to support it, but it does work, and maybe in the future I will clean it up and submit a PR. :3

Anyhow, here's my implementation of it! https://github.com/LorittaBot/DeviousJDA/blob/master/src/examples/java/SessionCheckpointAndGatewayResumeExample.kt#L32

If it was properly implemented, ofc I would not rely on that super crazy hacky hack.

I think the best way of handling it would be a DefaultShardManagerBuilder#setCheckpointProvider or something like that, where you would provide the checkpointed data for a shard ID as a CompletableFuture (maybe? in Kotlin it would be a () -> CheckpointData). When the shard is resumed, JDA would invoke the CompletableFuture to load the data. Yeah, that blocks the gateway read thread, but in my experience it takes around ~2s to load and fully dispatch all Guild Create events, so it is fast enough to not cause any issues, and it keeps the code simple. Besides, you don't want to spend time loading the checkpoint data just to end up receiving an invalid session when trying to resume lol.
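A rough sketch of what that could look like in Java. None of these types exist in JDA: `CheckpointProvider`, `CheckpointData`, and `replayAfterResume` are hypothetical names, and the dispatcher stands in for whatever would feed the stored GUILD_CREATE payloads back into JDA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical API sketch; nothing in this class is part of JDA.
final class CheckpointProviderSketch {
    // Stored guilds, in the same raw JSON format as GUILD_CREATE.
    static final class CheckpointData {
        final List<String> guildPayloads;

        CheckpointData(List<String> guildPayloads) {
            this.guildPayloads = guildPayloads;
        }
    }

    // Called lazily per shard, so nothing is loaded if the resume is rejected.
    interface CheckpointProvider {
        CompletableFuture<CheckpointData> load(int shardId);
    }

    // Would run only after the gateway has accepted the resume.
    // join() blocks the calling thread, mirroring the trade-off described above.
    static int replayAfterResume(int shardId, CheckpointProvider provider, Consumer<String> dispatcher) {
        CheckpointData data = provider.load(shardId).join();
        data.guildPayloads.forEach(dispatcher);
        return data.guildPayloads.size();
    }

    public static void main(String[] args) {
        CheckpointProvider provider = shardId -> CompletableFuture.supplyAsync(
            () -> new CheckpointData(Arrays.asList("{\"id\":\"1\"}", "{\"id\":\"2\"}")));
        List<String> dispatched = new ArrayList<>();
        int count = replayAfterResume(0, provider, dispatched::add);
        System.out.println("Replayed " + count + " guild payloads");
    }
}
```

Deferring the load until after the resume succeeds is the point of handing over a future instead of the data itself: an invalid session then costs nothing beyond the failed resume.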
