Reconnects hang when WebSocket timed out and fallback was used #2096

Closed
brutaldev opened this Issue May 28, 2013 · 4 comments

Comments

Projects
None yet
5 participants
@brutaldev

The code added for issue #1653 prevents the connection from performing a reconnect because the socket (that timed out) is not cleared correctly. The fix is in place for browsers that claim to support WebSocket but cannot actually connect so a fallback to another transport occurs. When trying to reconnect, the WebSocket object is still available preventing the rest of the connection code from being executed and doing the fallback again.

I will try explain by highlighting areas in the code that are involved in creating the bug.

Inside the signalR.transports.webSockets class in the start method:

// 1. ***** When this is set then it will not reconnect although we are not actually using a WebSocket (it fails and falls back).
if (!connection.socket) {
  ...
  // 2. ***** New socket is created here and it tries to connect, if it fails it is not cleared preventing us from getting back in here. This is normally not a problem but it is if a reconnect is attempted over WS again.
  connection.socket = new window.WebSocket(url);
  ....
  // Issue #1653: Galaxy S3 Android Stock Browser fails silently to establish websocket connections. 
  if (onFailed) {
    initialSocket = connection.socket;
    timeOutHandle = window.setTimeout(function () {
      if (initialSocket === connection.socket) {
        connection.log("WebSocket timed out trying to connect");
        // 3. ***** Should probably close and null connection.socket object here since we can't use it. Because it does not get cleared, it never makes it into the method at point number 1.
        onFailed();
      }
    }, that.timeOut);
  }

The solution we use is to set a flag to determine if WebSockets are actually supported and set that in the timeout failure. We then check that and immediately call onFailed instead of waiting for the timeout again. I would submit a pull request for this but there is too much red-tape involved in contributing so I'll leave it up to you to implement a fix.

@ghost ghost assigned NTaylorMullen May 28, 2013

@DamianEdwards

This comment has been minimized.

Show comment Hide comment
@DamianEdwards

DamianEdwards May 28, 2013

Owner

@NTaylorMullen can you take a look at this please?

Owner

DamianEdwards commented May 28, 2013

@NTaylorMullen can you take a look at this please?

@brutaldev

This comment has been minimized.

Show comment Hide comment
@brutaldev

brutaldev May 29, 2013

To add more information, the issue is compounded if the WebSocket actually connects after the given timeout, you end up with multiple connections to the server and even more fallback connections that disconnect each other.

As an example, the WebSocket connects after 5 seconds but because of the new code added in #1653, the onFailed() would have fired already and an alternative connection would have already been established. Give or take a minute and the WebSocket will close and fire the onclose event which will pass all the checks and call onFailed() because it was not opened correctly. This initialises a new alternative connection which causes the first one to reconnect which causes the second one to reconnect etc etc.

To correct this I have implemented the abort method in the WebSocket transport and call it from the timeout method. Because reconnects will wait 5 seconds each time I also just short-circuit the check for WebSockets to avoid the wait next time.

unreliableWebSocket: false,

...

if (!window.WebSocket || that.unreliableWebSocket === true) {
  onFailed();
  return;
}

...

// Issue #1653: Galaxy S3 Android Stock Browser fails silently to establish websocket connections. 
if (onFailed) {
  initialSocket = connection.socket;
  timeOutHandle = window.setTimeout(function () {
    if (initialSocket === connection.socket) {
      connection.log("WebSocket timed out trying to connect");
      // ***** Kill the connection completely and avoid trying again with this unreliable transport.
      that.abort(connection);
      that.unreliableWebSocket = true;
      onFailed();
    }
  }, that.timeOut);
}

...

// ***** Implement the abort that also disconnects the events so that the close is not fired if it managed to connect.
abort: function (connection) {
  // Don't trigger a reconnect after stopping
  transportLogic.clearReconnectTimeout(connection);

  if (connection.socket !== null) {
    connection.log("Aborting the Websocket");
    // Cancel out events.
    connection.socket.onopen = function () {};
    connection.socket.onclose = function () {};
    connection.socket.onmessage = function () {};
    connection.socket.close();
    connection.socket = null;
  }
}

This fix works for us nicely now and also gets around firewall and proxy issues that could also trigger a WebSocket connection timeout.

To add more information, the issue is compounded if the WebSocket actually connects after the given timeout, you end up with multiple connections to the server and even more fallback connections that disconnect each other.

As an example, the WebSocket connects after 5 seconds but because of the new code added in #1653, the onFailed() would have fired already and an alternative connection would have already been established. Give or take a minute and the WebSocket will close and fire the onclose event which will pass all the checks and call onFailed() because it was not opened correctly. This initialises a new alternative connection which causes the first one to reconnect which causes the second one to reconnect etc etc.

To correct this I have implemented the abort method in the WebSocket transport and call it from the timeout method. Because reconnects will wait 5 seconds each time I also just short-circuit the check for WebSockets to avoid the wait next time.

unreliableWebSocket: false,

...

if (!window.WebSocket || that.unreliableWebSocket === true) {
  onFailed();
  return;
}

...

// Issue #1653: Galaxy S3 Android Stock Browser fails silently to establish websocket connections. 
if (onFailed) {
  initialSocket = connection.socket;
  timeOutHandle = window.setTimeout(function () {
    if (initialSocket === connection.socket) {
      connection.log("WebSocket timed out trying to connect");
      // ***** Kill the connection completely and avoid trying again with this unreliable transport.
      that.abort(connection);
      that.unreliableWebSocket = true;
      onFailed();
    }
  }, that.timeOut);
}

...

// ***** Implement the abort that also disconnects the events so that the close is not fired if it managed to connect.
abort: function (connection) {
  // Don't trigger a reconnect after stopping
  transportLogic.clearReconnectTimeout(connection);

  if (connection.socket !== null) {
    connection.log("Aborting the Websocket");
    // Cancel out events.
    connection.socket.onopen = function () {};
    connection.socket.onclose = function () {};
    connection.socket.onmessage = function () {};
    connection.socket.close();
    connection.socket = null;
  }
}

This fix works for us nicely now and also gets around firewall and proxy issues that could also trigger a WebSocket connection timeout.

@NTaylorMullen

This comment has been minimized.

Show comment Hide comment
@NTaylorMullen

NTaylorMullen Jun 5, 2013

Contributor

This was fixed via #2078.

Contributor

NTaylorMullen commented Jun 5, 2013

This was fixed via #2078.

@ghost ghost assigned muratg Jun 5, 2013

@ghost ghost assigned gustavo-armenta Jun 17, 2013

@gustavo-armenta

This comment has been minimized.

Show comment Hide comment
@gustavo-armenta

gustavo-armenta Jun 20, 2013

Contributor

verified on Galaxy S3 and simulating a slow WebSocket channel creation on server side

Contributor

gustavo-armenta commented Jun 20, 2013

verified on Galaxy S3 and simulating a slow WebSocket channel creation on server side

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment