Client gives up on reconnect #606

Closed
joshforbes opened this issue Jun 28, 2023 · 16 comments
@joshforbes

Hello!

We're in the long overdue process of getting ably-go updated to the most recent version. As we're testing the new implementation we noticed a device that has become disconnected from Ably. Here is the last message:

image
ably connection state changed: current: DISCONNECTED; previous: CONNECTED; event: DISCONNECTED; retry in: 0s; error reason: [ErrorInfo :Post "https://rest.ably.io:443/keys/MpxHNQ.2JQwTQ/requestToken": context deadline exceeded (Client.Timeout exceeded while awaiting headers) code=80003 disconnected statusCode=0] See https://help.ably.io/error/80003

Per the docs, the client should have recovered on its own, but it did not:

image

Is there anything special we should be doing to make sure this recovers?

@sacOO7
Collaborator

sacOO7 commented Jun 28, 2023

Hi @joshforbes, thanks for raising the issue!
I have a few questions before we can resolve this issue:

  1. ably-go version
  2. go version
  3. Which type of auth mechanism are you using?
  4. Is this only happening with the latest version of ably-go or do previous versions also have the same issue?
  5. If you have a small working snippet that would be able to reproduce this issue, it would be super useful for us to resolve it asap!

@joshforbes
Author

  1. ably-go 1.2.12
  2. Go 1.20
  3. Token
  4. We upgraded from ably-go 1.1.3 🫣. Things were a lot different and we had years of workarounds to problems.
  5. I'm sorry but I don't. I don't think we're doing anything special with our implementation. Also, I should say that this is a rather rare problem. We're testing on 83 different devices with a token TTL set to 1 hour. After two days, one device is stuck in this state. Though once we put this "in the wild" we can't afford to lose contact with any devices.

@sacOO7
Collaborator

sacOO7 commented Jun 28, 2023

So, currently we are seeing this issue on only one device out of 83, and it remains in the same DISCONNECTED state?
Do we have more verbose logs with timestamps from this device?

@joshforbes
Author

That is correct. The other 82 devices (well 79 - see other HTTP/2 issue) are reconnecting as expected. This device was also reconnecting as expected until this context timeout pushed it into a bad state.

Here is the exact timestamp (EDT):
Jun 27 20:04:53

I don't know if the "keys" in the URL help pinpoint anything:

https://rest.ably.io/keys/MpxHNQ.2JQwTQ/requestToken

Unfortunately, I don't have any more information than this. We're going to continue monitoring the staging environment for an extended period before going to production, so I will bring in any other data that I find. Is there anything else that I should be doing to make sure the client reconnects after a timeout?

@sacOO7
Collaborator

sacOO7 commented Jun 28, 2023

Even if it goes into the disconnected state, ideally it should recover and make a reconnect attempt within 15 seconds. What is its current behavior? Does it remain in the disconnected/suspended state, or does it go into the failed state?
For the disconnected and suspended states, a retry will happen after 15s and 30s respectively.
For the failed state, no retry will happen.
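The retry rules described above can be written down as a small lookup. This is only a sketch of the intervals quoted in this comment, not code from ably-go: the `ConnState` type, its constants, and `retryDelay` are hypothetical names, and the real SDK may use configurable timeouts.

```go
package main

import (
	"fmt"
	"time"
)

// ConnState is a hypothetical stand-in for the SDK's connection state type.
type ConnState string

const (
	Disconnected ConnState = "DISCONNECTED"
	Suspended    ConnState = "SUSPENDED"
	Failed       ConnState = "FAILED"
)

// retryDelay returns how long to wait before the next reconnect attempt and
// whether a retry is expected at all, using the intervals quoted in the thread.
func retryDelay(s ConnState) (time.Duration, bool) {
	switch s {
	case Disconnected:
		return 15 * time.Second, true
	case Suspended:
		return 30 * time.Second, true
	default:
		// FAILED (and any state not listed above) gets no automatic retry.
		return 0, false
	}
}

func main() {
	for _, s := range []ConnState{Disconnected, Suspended, Failed} {
		d, ok := retryDelay(s)
		fmt.Printf("%s: retry=%v delay=%s\n", s, ok, d)
	}
}
```

Under these rules, a client whose last reported state is DISCONNECTED should attempt to reconnect again shortly afterwards, which is why a device that stays silent after that transition is unexpected.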

@joshforbes
Author

We added debugging to spit out a log every time the Ably state changes. The last message we get is when it goes from connected -> disconnected. We have code in place to restart our agent if it ever moves to failed so I'm fairly certain that it isn't doing so.
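One way to extend the "restart on failed" safeguard so that it also catches a client that sits in DISCONNECTED indefinitely is a small watchdog fed by the same state-change callback. The sketch below uses only the standard library; `Watchdog`, `Observe`, `Stuck`, and the two-minute threshold are hypothetical and not part of ably-go. In a real agent, `Observe` would presumably be called from whatever handler already logs the state changes, and `Stuck` polled on a timer.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Watchdog remembers the last observed connection state and when it was seen.
// All names here are hypothetical; this is not part of ably-go.
type Watchdog struct {
	mu       sync.Mutex
	state    string
	since    time.Time
	maxStuck time.Duration
}

// NewWatchdog starts in a non-connected state as of now.
func NewWatchdog(maxStuck time.Duration, now time.Time) *Watchdog {
	return &Watchdog{state: "INITIALIZED", since: now, maxStuck: maxStuck}
}

// Observe records a state change; call it from the state-change callback.
func (w *Watchdog) Observe(state string, now time.Time) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.state = state
	w.since = now
}

// Stuck reports whether the client has stayed in a non-CONNECTED state
// for longer than maxStuck without any further state change.
func (w *Watchdog) Stuck(now time.Time) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.state != "CONNECTED" && now.Sub(w.since) > w.maxStuck
}

// demo replays a CONNECTED -> DISCONNECTED sequence with a 2-minute threshold
// and checks the watchdog 1 minute and 5 minutes after the disconnect.
func demo() (early, late bool) {
	start := time.Date(2023, time.June, 28, 0, 0, 0, 0, time.UTC)
	w := NewWatchdog(2*time.Minute, start)
	w.Observe("CONNECTED", start)
	lost := start.Add(4*time.Minute + 53*time.Second)
	w.Observe("DISCONNECTED", lost)
	return w.Stuck(lost.Add(time.Minute)), w.Stuck(lost.Add(5 * time.Minute))
}

func main() {
	early, late := demo()
	fmt.Println(early, late) // false true
}
```

The threshold should be comfortably longer than the expected retry interval, so that ordinary reconnect cycles do not trigger a restart.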

@sacOO7
Collaborator

sacOO7 commented Jun 28, 2023

Then maybe somehow it's going into a locking state ...
This information is useful. We will try to reproduce this from our side.
It would be great if you can send us logs till this particular state is reached.

@sacOO7
Collaborator

sacOO7 commented Jun 28, 2023

@joshforbes Currently we have two open issues, so the other issue is the same as this one right? If it is, you can close that issue and we can continue our conversation here.

@joshforbes
Author

> Then maybe somehow it's going into a locking state ...
> This information is useful. We will try to reproduce this from our side.
> It would be great if you can send us logs till this particular state is reached.

Does Ably have a "verbose mode" that I should turn on while we try to get this error to happen again?

@joshforbes
Author

> @joshforbes Currently we have two open issues, so the other issue is the same as this one right? If it is, you can close that issue and we can continue our conversation here.

Sorry for the confusion. The other open issue is about a different error (HTTP/2).

@sacOO7
Collaborator

sacOO7 commented Jun 28, 2023

> > Then maybe somehow it's going into a locking state ...
> > This information is useful. We will try to reproduce this from our side.
> > It would be great if you can send us logs till this particular state is reached.
>
> Does Ably have a "verbose mode" that I should turn on while we try to get this error to happen again?

You can pass a LogHandler in the client options: https://github.com/ably/ably-go/blob/cbb0f591f117534172e936dfaa2cd1d9e6fb6b8c/ably/options.go#L404C2-L404C12
There is also a LogLevel option, declared as:

LogLevel LogLevel
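Since timestamped logs were requested earlier in the thread, a custom log handler could prefix each line with the time. The following is a standalone sketch using only the standard library. It assumes the handler needs a `Printf(level, format, args...)` method; the local `LogLevel` type is a stand-in for the SDK's own type, and `TimestampLogger` and `render` are hypothetical names. Check the linked options.go for the exact interface before wiring anything like this into the client options.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"time"
)

// LogLevel is a local stand-in for the SDK's log level type (assumption).
type LogLevel int

// TimestampLogger writes each log line prefixed with a UTC timestamp.
type TimestampLogger struct {
	Out io.Writer
	Now func() time.Time // injectable clock, so output can be checked
}

// Printf mirrors the method shape a log handler is assumed to need.
func (l TimestampLogger) Printf(level LogLevel, format string, v ...interface{}) {
	ts := l.Now().UTC().Format(time.RFC3339)
	fmt.Fprintf(l.Out, "%s [level=%d] %s\n", ts, level, fmt.Sprintf(format, v...))
}

// render formats one message at a fixed instant to show the line layout.
func render(level LogLevel, msg string) string {
	var buf bytes.Buffer
	fixed := time.Date(2023, time.June, 28, 0, 4, 53, 0, time.UTC)
	TimestampLogger{Out: &buf, Now: func() time.Time { return fixed }}.Printf(level, "%s", msg)
	return buf.String()
}

func main() {
	logger := TimestampLogger{Out: os.Stderr, Now: time.Now}
	logger.Printf(4, "connection state changed: %s -> %s", "CONNECTED", "DISCONNECTED")
}
```

Logging in UTC with RFC 3339 timestamps makes it easier to line up device logs with server-side logs, regardless of the device's local time zone.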

sacOO7 self-assigned this Jul 3, 2023
@sync-by-unito

sync-by-unito bot commented Jul 3, 2023

➤ Automation for Jira commented:

The link to the corresponding Jira issue is https://ably.atlassian.net/browse/SDK-3714

@sacOO7
Collaborator

sacOO7 commented Aug 1, 2023

@joshforbes I was curious whether this issue is related to #610?

@sacOO7
Collaborator

sacOO7 commented Aug 31, 2023

@joshforbes can you let us know if this issue still persists? Also, try updating to the latest version.

@sacOO7
Collaborator

sacOO7 commented Nov 8, 2023

@joshforbes let us know if the issue still persists. If not, we can close the issue :)

@sacOO7
Collaborator

sacOO7 commented Jan 16, 2024

Closing the issue since no related bugs have been reported!

sacOO7 closed this as completed Jan 16, 2024