Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detection of connection state change is too slow #99

Closed
mattheworiordan opened this issue Aug 22, 2015 · 10 comments
Closed

Detection of connection state change is too slow #99

mattheworiordan opened this issue Aug 22, 2015 · 10 comments
Assignees
Labels
bug Something isn't working. It's clear that this does need to be fixed. failing-test Where an automated test is failing either locally or in CI. Perhaps flakey, wrong or bug.

Comments

@mattheworiordan
Copy link
Member

Whilst testing the mobile Cordova demo in my browser, I see that the connection state change detection is far too slow to be useful.

I have compared this to the web versions of WhatsApp and Slack, and the results vary quite a lot. WhatsApp literally detects the Wifi connection being turned off immediately, Slack takes a good 10s or so, and it seems like our Cordova demo almost never detects that the connection is gone. See https://dl.dropboxusercontent.com/u/1575409/Screenshots/video-capture-connection-state.mov.

The connection state does eventually drop, but it takes far too long. I have not tested this in Cordova yet, but I expect we will need to add code specific to Cordova to detect network state, fortunately Cordova provides this already, see http://cordova.apache.org/docs/en/2.4.0/cordova_events_events.md.html#offline.

I expect we need to approach this in a number of ways:

@paddybyers thoughts?

@mattheworiordan mattheworiordan added the bug Something isn't working. It's clear that this does need to be fixed. label Aug 22, 2015
@paddybyers
Copy link
Member

I agree, but lets also look at what pusher and pubnub do.

With a websocket connection we're generally waiting for the tcp connection to drop, and the idea of the heartbeat is that it should prompt the connection itself to notice that it's broken - it shouldn't need to be the job of the client to have timers running for the heartbeat. However, this strategy is flawed, now I think about it, because only the server (which is attempting to send the heartbeat) then knows there's a problem. So really we should have the client reply to the heartbeat, or initiate sending a heartbeat itself, to trigger a disconnection event from the tcp layer.

I agree that where a platform provides it we should register for network status events, although then using them in place of the internet-up check is perhaps putting a bit too much trust in them.

@mattheworiordan
Copy link
Member Author

@paddybyers I think for the heartbeat we should not change it now, that would require a protocol change, and it's lower priority. If we can tap into the network status of the device / platform (for all client libraries, perhaps add to the spec) and supplement that with an internet up check that would be sensible. My only concern with the internet up check is that if for example the connection is reported as down, and the internet up check just hangs, then we won't respond quickly. Why do you not trust the internet up check available from the device?

@paddybyers
Copy link
Member

Why do you not trust the internet up check available from the device?

I'm not sure how reliable they are so it would take a bit of reaearch and/or testing to know how much we can rely on getting an event when the network is up or down. For example a check that is based on whether or not the network interface is up will succeed if you're connected to a wifi router but the WAN connection is down; or you're connected to a wifi with a paywall so ti seems to respond ok to things like DNS but actually there's no usable connection there. In the end, the internet-up check is the kind of check you will need to make to know for sure.

@mattheworiordan
Copy link
Member Author

I would agree that internet up according to the device's status does not necessarily mean internet up, however when it says it is down, I expect it really is down. So I am not suggesting we use that to tell us when the network is up, but do trust it when it says its down. WDYT?

@paddybyers
Copy link
Member

but do trust it when it says its down. WDYT?

I think you're still going to try periodic reconnection attempts even if the platform says the network is down, because you've got nothing to lose and you'd look a bit stupid insisting there was no network when in fact it was ok.

But if you get an indication that the network is down, I think it's best to disconnect the transport immediately, because you'll then retry connection anyway without a delay, and if the network happened not to be down, and you succeed in connecting, you'll recover state straight away.

@mattheworiordan
Copy link
Member Author

I think you're still going to try periodic reconnection attempts even if the platform says the network is down

Sure, so long as the attempts are not too frequent.

I assume wrapping the platform's internet up method and providing it in our API would not make sense. For example, if a developer decides to reattempt a connection every 5 seconds, it is really up to them to stop doing this if they truly think there is no internet connection.

But if you get an indication that the network is down, I think it's best to disconnect the transport immediately, because you'll then retry connection anyway without a delay, and if the network happened not to be down, and you succeed in connecting, you'll recover state straight away.

Should this be part of the spec for all libraries? I think so

mattheworiordan added a commit that referenced this issue Aug 22, 2015
See #99

This does not address the heartbeat discussion in any way, this only tests for subscription to online / offline events from the browser or Cordova device
@paddybyers
Copy link
Member

Sure, so long as the attempts are not too frequent.

I think they'd be at the usual frequency for the disconnected and suspended states, until we come up with a more complex strategy.

Should this be part of the spec for all libraries? I think so

I think it could be recommended for those platforms that provide usable and reliable network status events. Ultimately I could imagine that we have more advanced approaches that get tailored for each platform, eg interacting with power management for mobile devices. So I don't think we have to be too prescriptive.

@mattheworiordan
Copy link
Member Author

Agreed. I have done a PR and provided some WIP tests, I was not entirely sure where to catch the device events and trigger transport / connection state changes, so handing over to you now.

@mattheworiordan mattheworiordan added the failing-test Where an automated test is failing either locally or in CI. Perhaps flakey, wrong or bug. label Aug 22, 2015
@SimonWoolf SimonWoolf assigned SimonWoolf and unassigned paddybyers Sep 4, 2015
mattheworiordan added a commit that referenced this issue Sep 10, 2015
See #99

This does not address the heartbeat discussion in any way, this only tests for subscription to online / offline events from the browser or Cordova device
@SimonWoolf
Copy link
Member

Addressed (except for the heartbeat thing) by #126.

As well as the online events changes, you can now set Ably.Realtime.Defaults.disconnectTimeout and suspendedTimeout before you initialise the client, and it will use those values (but won't pick up any changes you make after you initialise the client)

mattheworiordan added a commit that referenced this issue Sep 15, 2015
See #99

This does not address the heartbeat discussion in any way, this only tests for subscription to online / offline events from the browser or Cordova device
mattheworiordan added a commit that referenced this issue Sep 15, 2015
See #99

This does not address the heartbeat discussion in any way, this only tests for subscription to online / offline events from the browser or Cordova device
@SimonWoolf
Copy link
Member

With the new heartbeats and maxIdleTimeout implementation in ably-js 0.9 (1f3f5e8), I think we can close this as fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working. It's clear that this does need to be fixed. failing-test Where an automated test is failing either locally or in CI. Perhaps flakey, wrong or bug.
Development

Successfully merging a pull request may close this issue.

3 participants