
Speed up the logout notification when client power cut off #1091

Closed
msmarks opened this issue Nov 23, 2020 · 30 comments

@msmarks commented Nov 23, 2020

In our project, the Presence API is used to detect client connectivity. We set the heartbeat interval to 60000 ms in the JS client library. If the client exits normally, the Presence API immediately delivers a logout notification. However, if the client is powered off abruptly, we have to wait about 20 minutes before the Presence API reports the logout. We are looking for configuration advice: when a client loses power abruptly, how can we get the logout notification as quickly as possible?
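
For reference, a minimal sketch of the setup described above with the deepstream JS client; the package name, heartbeatInterval option, and presence callback signature reflect my understanding of the v5 client and may need adjusting for your version:

    // Sketch only: verify option names against your @deepstream/client version.
    import { DeepstreamClient } from '@deepstream/client'

    async function main () {
      const client = new DeepstreamClient('ws://localhost:6020', {
        heartbeatInterval: 60000 // ms, matching the value mentioned above
      })
      await client.login({ username: 'monitor' })

      // Notified whenever any other client logs in or out
      client.presence.subscribe((username: string, isLoggedIn: boolean) => {
        console.log(`${username} is now ${isLoggedIn ? 'online' : 'offline'}`)
      })
    }

    main()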

@yasserf (Contributor) commented Nov 24, 2020

What version of the server and client are you using?

@yasserf (Contributor) commented Nov 24, 2020

Also, what type of http service are you using, uWS or ws? uWS deals with heartbeats (heartbeatInterval * 2). With the text protocol we use individual timers on each websocket, but going through the code it appears the ws binary implementation is missing that. Can you confirm if that's what you're using?

@msmarks (Author) commented Nov 24, 2020

Server Version: 5.1.3
Client Version: 5.1.10

httpServer:
  type: default
  options:
    # url path for http health-checks, GET requests to this path will return 200 if deepstream is alive
    healthCheckPath: /health-check
    # -- CORS --
    # if disabled, only requests with an 'Origin' header matching one specified under 'origins'
    # below will be permitted and the 'Access-Control-Allow-Credentials' response header will be
    # enabled
    allowAllOrigins: true
    # a list of allowed origins
    origins:
      - 'https://example.com'
    # Options required to create an ssl app
    # ssl:
    #   key: fileLoad(ssl/key.pem)
    #   cert: fileLoad(ssl/cert.pem)
    #   ca: ...

@yasserf (Contributor) commented Nov 24, 2020

Yeah, that would probably be the issue then. I'll try to get the fix in shortly.

@yasserf (Contributor) commented Nov 24, 2020

This should now be fixed in 5.1.6, at the cost of a timeout being set up for each socket. If performance is your ultimate goal I would recommend trying uws, although it has its own minor quirks (with our implementation at least).

TravisCI is super slow nowadays, so let me know if the release doesn't work for you as I need to switch context. Thanks!

@yasserf (Contributor) commented Nov 24, 2020

Ended up publishing the macOS and Windows builds myself.

@msmarks (Author) commented Nov 25, 2020

The heartbeat mechanism works. When I disconnect the network cable, I quickly see the client-disconnection message in the log. However, it just keeps printing disconnection messages, and other clients that subscribe to presence never receive the logout notification.

INCOMING_CONNECTION | from undefined (192.168.11.210)
AUTH_SUCCESSFUL | C-hl-ecd68a5c5603

!!!!! disconnect the network cable !!!!


heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
....... ("heartbeat issues" / CLIENT_DISCONNECTED keeps repeating forever)

@yasserf (Contributor) commented Nov 25, 2020

Yeah, so it looks like I was calling close instead of destroy. close runs the side effects of a socket being closed (cleaning up subscriptions and whatnot), while destroy just forces the actual socket closed (which should then trigger those side effects).

Anyway, I tagged 5.1.7 now.

Also, for anyone reading this: the server side doesn't actually use a continuous PING/PONG heartbeat mechanism. It just makes sure a packet was received at least once within the heartbeat period, which simplifies the code greatly. The server expects a message every heartbeatInterval * 2; if it doesn't receive one, it terminates the connection. If it does receive a PING it responds with a PONG (but that response doesn't go anywhere near the heartbeat verification code).

      if (message.topic === TOPIC.CONNECTION && message.action === CONNECTION_ACTION.PING) {
        // respond to PING message
        socketWrapper.sendMessage({ topic: TOPIC.CONNECTION, action: CONNECTION_ACTION.PONG })
        continue
      }
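
For illustration only, the shape of that liveness check on a plain ws connection might look something like this (not deepstream's actual code; function and variable names are made up):

    // Illustrative sketch, not deepstream's implementation: terminate the socket
    // if nothing at all is received within heartbeatInterval * 2.
    import WebSocket from 'ws'

    function watchLiveness (socket: WebSocket, heartbeatInterval: number): void {
      let lastReceived = Date.now()
      socket.on('message', () => { lastReceived = Date.now() })

      const timer = setInterval(() => {
        if (Date.now() - lastReceived > heartbeatInterval * 2) {
          socket.terminate() // force-close so subscription/presence cleanup runs
        }
      }, heartbeatInterval)
      socket.on('close', () => clearInterval(timer))
    }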

On the client side, it checks whether a message was received within the last heartbeatInterval; if not, it sends a PING. If it hasn't received any message within heartbeatInterval * 2, it assumes it has lost the connection to the server.

So you need to make sure the heartbeat intervals are the same. If development picks up again it would probably be good for the server to send the heartbeat config to the client in an INITIALISE action on the connection, which could update the client with the correct values (or at least verify they are correct). Since it's a stateless action (it doesn't have to happen) it's not a breaking change.

This is mostly useful for production deployments, where you don't want to roll out a new deployment just to tweak heartbeats or maintain a separate config location.
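
As a rough sketch of what that could look like (the INITIALISE action and payload below are hypothetical; they do not exist in the current protocol):

    // Hypothetical sketch of the suggested heartbeat-sync handshake.
    interface HeartbeatConfigMessage {
      topic: 'CONNECTION'
      action: 'INITIALISE'      // proposed action, not part of the protocol today
      heartbeatInterval: number // the server's configured value in ms
    }

    // Client side: adopt the server's value, or at least warn on a mismatch.
    function applyHeartbeatConfig (msg: HeartbeatConfigMessage, clientInterval: number): number {
      if (msg.heartbeatInterval !== clientInterval) {
        console.warn(`heartbeat mismatch: client ${clientInterval}ms, server ${msg.heartbeatInterval}ms`)
      }
      return msg.heartbeatInterval
    }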

Anyways, hope the fix works this time. Apologies, I don't have a running environment so I'm relying on tests and typescript (e2e and UT), which work really well at catching all sorts of edge cases but not so much at simulating connection loss.

@yasserf reopened this Nov 25, 2020
@msmarks (Author) commented Nov 25, 2020

Thank you for your quick response. Could you help me compile a Windows build? It seems the automatic build process failed.

@yasserf (Contributor) commented Nov 25, 2020

It's available in the AppVeyor artefacts:

https://ci.appveyor.com/project/yasserf/deepstream-io/builds/36504444/artifacts

@msmarks (Author) commented Nov 26, 2020

On the client side, it checks whether a message was received within the last heartbeatInterval; if not, it sends a PING. If it hasn't received any message within heartbeatInterval * 2, it assumes it has lost the connection to the server.

If the client only subscribes to server data, and the interval between server data changes is shorter than heartBeatInterval, the client never sends a PING. The server then receives no data at all from the client and considers it disconnected.

In client side socket-factory.ts, version 5.1.10

    socket.onopen = () => {
        pingInterval = setInterval(() => {
            if (Date.now() - lastRecievedMessageTimestamp > heartBeatInterval) {
                try {
                    socket.send(pingMessage)
                } catch (e) {
                    clearTimeout(pingInterval!)
                }
            }
        }, heartBeatInterval) as never as number
        socket.onopened()
    }

Maybe it should compare the current time with the timestamp of the last sent message.

Please help me sync 5.1.7 to Docker Hub.

@yasserf (Contributor) commented Nov 27, 2020

I'm afraid CI is down, and I haven't used docker in a long time so I don't have the environment running.

You can create the docker image via sh scripts/docker.sh, but it may need a tiny bit of tweaking to push it to your own repo for the time being.

@yasserf (Contributor) commented Nov 27, 2020

Also good point about the client, feel free to raise a PR if you have a moment and I'll publish that!

@msmarks (Author) commented Nov 27, 2020

Also good point about the client, feel free to raise a PR if you have a moment and I'll publish that!

I'm not familiar with TypeScript, but I'll try it.

@msmarks (Author) commented Nov 28, 2020

PR raised, please check if there are any problems.
deepstreamIO/deepstream.io-client-js#549

@yasserf (Contributor) commented Nov 29, 2020

I think I've fixed the failing CI. It still takes forever, but hopefully v5.1.8 will have all the artefacts.

Once you add a changelog entry to your fix I'm happy to publish that as well.

@msmarks (Author) commented Dec 1, 2020

That was great! The change log has been added.

@yasserf (Contributor) commented Dec 1, 2020

Can you confirm it works as expected so I can close this issue?

Thanks!

@msmarks (Author) commented Dec 3, 2020

There may still be a problem. I set the heartbeatInterval to 10 s, but there was still a disconnection. The log is below; I will continue to investigate.

Log from the client:

2020-12-03T23:26:30.629Z send heatbeat
2020-12-03T23:26:40.640Z send heatbeat
2020-12-03T23:26:50.648Z send heatbeat
2020-12-03T23:27:00.655Z send heatbeat
2020-12-03T23:27:10.662Z send heatbeat
2020-12-03T23:27:20.677Z send heatbeat
2020-12-03T23:27:30.684Z send heatbeat
2020-12-03T23:27:40.688Z send heatbeat
2020-12-03T23:28:00.702Z send heatbeat

The last two entries show that the heartbeat was sent 20 s apart, not 10 s.

Then the server log:

INFO | Heartbeat missing on websocket, terminating connection

@msmarks (Author) commented Dec 4, 2020

I found the problem, in the client-side socket-factory.ts:

socket.onopen = function () {
    pingInterval = setInterval(function () {
        if (Date.now() - lastSentMessageTimestamp  > heartBeatInterval) {
            try {
                socket.send(pingMessage);
                lastSentMessageTimestamp = Date.now();
            }
            catch (e) {
                clearTimeout(pingInterval);
            }
        }
    }, heartBeatInterval);
    socket.onopened();
};

There is a timing issue between lastSentMessageTimestamp and heartBeatInterval.

Specifically, if heartBeatInterval is set to 1000 ms, the setInterval callback should fire at 1000 ms, 2000 ms, 3000 ms... However, JavaScript timers are not that precise; it may fire at 1200 ms, 2100 ms, 3040 ms... Suppose socket.send(pingMessage) takes 10 ms. The situation becomes:

At the first tick (1200 ms) a heartbeat is sent, so lastSentMessageTimestamp = 1210 ms.
At the second tick (2100 ms), Date.now() - lastSentMessageTimestamp = 2100 ms - 1210 ms = 890 ms. This is less than heartBeatInterval, so the if condition is false and a heartbeat is skipped.

I will raise a PR to fix this problem.

@msmarks (Author) commented Dec 4, 2020

PR raised: deepstreamIO/deepstream.io-client-js#550

@msmarks (Author) commented Dec 4, 2020

I found that heartbeat detection is bidirectional: the server checks whether the time since the last received message exceeds twice the heartbeatInterval, and the client does the same check on its side. Therefore, if a client only sends messages and does not subscribe to anything, it never receives messages and will conclude that the server has disconnected. So my previous change was wrong.

I think the client must send heartbeat packets strictly at the configured interval, and should not skip a heartbeat just because it has recently sent other messages.
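
A sketch of what that stricter client behaviour could look like, based on the socket-factory snippet quoted earlier (illustrative only, not the actual PR):

    // Illustrative sketch: send a PING every heartBeatInterval unconditionally,
    // rather than skipping it when other traffic happened recently.
    socket.onopen = () => {
      pingInterval = setInterval(() => {
        try {
          socket.send(pingMessage)
        } catch (e) {
          clearInterval(pingInterval!)
        }
      }, heartBeatInterval) as never as number
      socket.onopened()
    }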

@yasserf (Contributor) commented Dec 8, 2020

Makes sense, thanks!

@msmarks closed this as completed Dec 9, 2020
@jaime-ez (Collaborator)

Hi everyone! Sorry for being late to the discussion, but since the 5.1.6 release solves the issue with node ws, there is no need to change the socket-factory in the client from the previous 5.1.10 version. That logic is correct.

I found that heartbeat detection is bidirectional: the server checks whether the time since the last received message exceeds twice the heartbeatInterval, and the client does the same check on its side. Therefore, if a client only sends messages and does not subscribe to anything, it never receives messages and will conclude that the server has disconnected.

This is not correct. For every PING sent there is a PONG response, therefore the client does update its lastRecievedMessageTimestamp value every time it sends a PING. As @yasserf stated here.

I think the client must send heartbeat packets strictly at the configured interval, and should not skip a heartbeat just because it has recently sent other messages.

This is unnecessary, as stated before; in fact, in high-payload situations it would add extra PING messages that serve no purpose.

This works only in scenarios where the client heartbeatInterval <= 2 x server heartbeatInterval + PingPongTime, as opposed to the existing logic in version 5.1.10, which works when the client heartbeatInterval <= server heartbeatInterval. Both versions fail when the client heartbeatInterval > 2 x server heartbeatInterval + PingPongTime.

In conclusion, I propose:

  • Revert socket-factory.ts to the previous 5.1.10 version, since it minimizes the number of messages sent in high-payload scenarios.
  • Document this situation.
  • Create an issue to guide the discussion towards a definitive solution, as @yasserf says here: send an initialise message from server to client to communicate the heartbeat intervals. Otherwise we are just changing the time intervals at which it breaks.

Comments?

@msmarks (Author) commented Dec 27, 2020

This is not correct. For every PING sent there is a PONG response, therefore the client does update its lastRecievedMessageTimestamp value every time it sends a PING. As @yasserf stated here.

The PING is sent by the client, the server returns a PONG, and the client updates lastRecievedMessageTimestamp. In version 5.1.10, if the client only subscribes to messages and never publishes anything, lastRecievedMessageTimestamp keeps being refreshed by incoming data, so the client may never send a PING. The server then receives no messages from the client and thinks the client has disconnected.

Revert socket-factory.ts to the previous 5.1.10 version, since it minimizes the number of messages sent in high-payload scenarios.

So I don't think it's a good idea to go back to 5.1.10, but we could address the problem of redundant PING messages on top of the current version.

@jaime-ez (Collaborator)

Ok, I finally understood your problem (being subscribed only AND receiving messages at an interval shorter than the heartbeat), although this is an issue only with the node http server. uws has an idleTimeout option, set to the heartBeatInterval value, that prevents this from happening, since it checks for both incoming and outgoing messages.

However, as I mentioned previously, the fix in client version 5.1.12 also breaks when the client heartbeatInterval > 2 x server heartbeatInterval + PingPongTime (try it with 3x).

An option that should solve the problem is to keep track of exchanged messages:

  • rename lastMessageRecievedAt to lastMessageExchangedAt
  • set the value also here

This way we can emulate the idleTimeout functionality of uws on the node server and avoid sending unnecessary PING messages from the client. The tradeoff is that when using uws we store an unnecessary value, but that shouldn't matter much.
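
A rough sketch of that idea on the node server side (the wiring here is illustrative, not deepstream's actual socket wrapper):

    // Illustrative sketch of the proposal: treat traffic in either direction as a
    // liveness signal, emulating uws's idleTimeout on the node http server.
    import WebSocket from 'ws'

    function trackExchangedMessages (socket: WebSocket, heartbeatInterval: number) {
      let lastMessageExchangedAt = Date.now()
      const touch = () => { lastMessageExchangedAt = Date.now() }

      socket.on('message', touch) // incoming messages refresh the timestamp

      const timer = setInterval(() => {
        if (Date.now() - lastMessageExchangedAt > heartbeatInterval * 2) {
          socket.terminate() // no traffic either way: assume the client is gone
        }
      }, heartbeatInterval)
      socket.on('close', () => clearInterval(timer))

      // the send path would call this as well, so outgoing messages also count
      return { noteOutgoingMessage: touch }
    }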

What do you think?

@yasserf (Contributor) commented Dec 28, 2020

We use different socket wrappers, so the code can theoretically stay separate.

TBH, given we are sending pings every 10 or 20 seconds and they should be aggregated within the socket wrapper bulk (so if there are other messages they get bundled into one websocket frame), I'm wondering if there's any harm in keeping this in until someone asks to make deepstream faster, which hasn't really happened yet 🙈. It's definitely something we can improve on, but I'm not sure it's a higher priority than getting the CI systems working and possibly putting out some React hooks integrations to see if they tease out any adoption 😁

@jaime-ez (Collaborator) commented Dec 28, 2020

Hi, I agree the CI is the priority, but this is easier for me right now :)

Ok, maybe the ping is not relevant, but the connection still gets closed when the client heartbeat interval is bigger than 2x the server one when using the node http server, and I think that's an issue. What I propose in the last comment should fix it regardless of the client code.

That way we also avoid syncing heartbeats as proposed previously.

@yasserf (Contributor) commented Dec 30, 2020

I'm afraid I'm not fully following yet. lastMessageExchangedAt means either sent or received, but you can send a message to a client that is disconnected and it just disappears into the void (which is one of the reasons heartbeats exist).

The point of syncing heartbeats is that the server and client need a single source of truth. If I set a heartbeat of a thousand milliseconds on the server and a million on the client, then assuming a connection loss that doesn't include the proper close handshake, the server will close the connection but the client will wait for at least a million milliseconds. How does any scenario fix that without heartbeat synchronization?

@jaime-ez (Collaborator) commented Feb 7, 2021

Ok, I just had time to come back to this. I was wrong all along; unplugging devices made me realize it :)

uws's idleTimeout also keeps the connection from being closed at the server when the client is unplugged. Therefore the complete solution is to keep the client as is, implement the heartbeat synchronization between server and client, implement the heartbeat verification in the uws service the way it was done in the node service, and disable the idleTimeout. I'll break it into separate issues.

One more thing, @yasserf, your comments please:

TBH, given we are sending pings every 10 or 20 seconds and they should be aggregated within the socket wrapper bulk (so if there are other messages they get bundled into one websocket frame)

Where is the socket wrapper bulk? I couldn't find it in the client code... I guess it should be in src/connection/connection.ts, but I don't see anything there that could implement that. All I've found is a bulk subscription utility.
