Skip to content
This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

Connection timeout strategy causing disconnections #295

Closed
dotansimha opened this issue Oct 24, 2017 · 14 comments · Fixed by #675
Closed

Connection timeout strategy causing disconnections #295

dotansimha opened this issue Oct 24, 2017 · 14 comments · Fixed by #675

Comments

@dotansimha
Copy link
Contributor

dotansimha commented Oct 24, 2017

@mistic

Since the PR that adds the following:
https://github.com/apollographql/subscriptions-transport-ws/blob/v0.8.3/src/client.ts#L407-L416

The default connection timeout starts with 1000ms which isn't enough for low performance servers (for example: free plan in heroku or free plan in now).
This causing the client to disconnect if it took more than 1000ms to perform the connection.

For example, with my remote server, I expect the client to connect in about 1500ms or more - in this case I'm seeing one or more disconnections every time I recreate my socket.

The 1000ms should be configurable, and I think it should be more than 1000ms by default.

apollo-client: 1.9.3
transport: 0.8.3

@pandemosth
Copy link

pandemosth commented Oct 25, 2017

I'm seeing this at the moment too (also on heroku). To add to the issue, appears that the reconnect logic is broken if the max connect timeout fires.

In my scenario the lazy flag is set, so first subscription is triggering the websocket connection attempt. If this times-out and is reattempted I'm seeing two different behaviours after the connection is opened:

  1. No subscription has been established (on Android real device)
  2. Two instances of the subscription that both then fire on new data from the server (on iOS Simulator and real device)

Haven't got any further on why this is happening but will continue to investigate.

And this is react-native with following versions:

apollo-client: 1.9.3
react-apollo: 1.4.16

@pandemosth
Copy link

OK, so I've managed to fix both issues above, changes here for comment and advice on whether this should be a PR.

@dotansimha I can create a separate issue if I'm hijacking this one..?

master...pandemosth:issue295

Explanation follows:-

Issue 1 - No subscription is made on reconnect, after initial connect times out (only seen on Android).
This is a strange one, the WebSocket onOpen callback is being called but the readyState is still 0 (connecting). This causes the following (apologies for verbose explanation):

  • sendMessage is called from onOpen, then in sendMessageRaw status is CONNECTING so message is pushed onto unsent queue
  • then next line in onOpen calls flushUnsentMessagesQueue. The subscription message and the init message in the unsent queue are attempted again, but since status is CONNECTING they are not sent and are added back to the unsent queue. But, in flushUnsentMessagesQueue the queue is then set to an empty array, so the unsent messages are discarded.
  • onOpen is called again, this time with status 1, the websocket is connected but no graphql subscription is established since the unsent queue is empty.

This is probably an issue with React Native / WebSocket on Android. Seems odd that onOpen would be called with reedyState still 0. But, easy enough to guard against that in code which is what I've done.

Issue 2 - Duplicate subscription made on reconnect, after initial connect timeout (both platforms).
After fixing issue 1, this one was observed consistently on both platforms. The max connect timeout calls close which calls tryReconnect. In tryReconnect all current operations are added onto the unsent queue, which is causing the duplication. The simple fix here is to set this.reconnecting = true in the max connect timeout callback in order to bypass this step.

I couldn't see any tests in the project related to timeouts so haven't updated tests.

Btw, Charles proxy with throttling (+1000ms latency) was used to reproduce these issues and verify.

@perrosnk
Copy link

Any updates on this? I am having the same issues

@ilijaNL
Copy link

ilijaNL commented Jan 12, 2018

@pandemosth can confirm this on android, sometimes a dubbel subscription is made, or none at all

@ash0080
Copy link

ash0080 commented Mar 5, 2018

Same issues, no subscription or duplicated subscriptions, any news here?

@Amareis
Copy link

Amareis commented May 16, 2018

#377 solve this

@nwronski
Copy link

nwronski commented May 24, 2018

I was also experiencing Issue 2 described by @pandemosth and can confirm his changes fixed the issue. I observed that multiple copies of the initial operation(s) were sent to the server whenever a websocket connection took longer than 1000ms to connect.

@NeoPhi
Copy link
Contributor

NeoPhi commented Jun 11, 2018

@pandemosth or @nwronski Please open a PR with the changes. The existing logic definitely sounds buggy.

jpgarcia added a commit to underscopeio/subscriptions-transport-ws that referenced this issue Jun 18, 2018
@jpgarcia
Copy link

@NeoPhi If you want I can create a PR with the changes @pandemosth suggested against master.

I've just forked the repo and pushed underscopeio@15c0d70 which is pretty much the same as @pandemosth except the src/server.ts which AFAIK is already updated on master.

I've tested the fix in a react-native project and it seems to be working fine now! (thank you @pandemosth !)

Once published the new version we should also create an issue in apollo-link-ws to upgrade the subscriptions-transport-ws dependency

@mxstbr
Copy link
Contributor

mxstbr commented Jun 18, 2018

@jpgarcia PR would be much appreciated!

jpgarcia added a commit to underscopeio/subscriptions-transport-ws that referenced this issue Jun 18, 2018
@jpgarcia
Copy link

@mxstbr done!

@NeoPhi
Copy link
Contributor

NeoPhi commented Jun 29, 2018

Included in: https://github.com/apollographql/subscriptions-transport-ws/releases/tag/v0.9.12

@NeoPhi NeoPhi closed this as completed Jun 29, 2018
@jedwards1211
Copy link
Contributor

@NeoPhi not everything in OP is fixed, it still appears we don't have a good way to raise the initial connect timeout do we?

I bet even just an expensive initial page load can overwhelm the 1 second initial connect timeout, even if the server is performing just fine.

@jedwards1211
Copy link
Contributor

Also, is applying the backoff to the connection timeout really the best way to go about it? Seems to me the backoff should apply to the time waited before attempting a reconnect, instead of the time waited for the connection to succeed.

jedwards1211 added a commit to jedwards1211/subscriptions-transport-ws that referenced this issue Oct 17, 2019
hwillson pushed a commit to jedwards1211/subscriptions-transport-ws that referenced this issue Aug 10, 2020
hwillson added a commit that referenced this issue Aug 10, 2020
* feat(client): add minTimeout option

actually fix #295 completely

* chore(CHANGELOG.md): document minTimeout change

* Changelog updates

Co-authored-by: hwillson <hugh@octonary.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.