Skip to content
This repository has been archived by the owner on Feb 7, 2024. It is now read-only.

Inconsistent behavior when using replication #778

Closed
wants to merge 33 commits into from

Conversation

mdprotacio
Copy link
Contributor

@mdprotacio mdprotacio commented Jun 3, 2021

  • channel is removed from set when any of the user unsubscribed without considering that other users may still be connected to the channel
  • channel may be non existent on server A when client is served through server B, but message is received through pusher trigger on server A
  • redundant replicate on call to broadcast on channel, when prior calling broadcast, there is already a call to broadcastAcrossServers
  • if continuously we receive events/message for 2 minutes, ping is never called from pusherjs which renders the connections stale as the lastPongedAt never gets updated. This renders the connection being marked as obsoleted and removed.

Copy link

@myckhel myckhel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test and ensures that when you disconnect your web socket connections, connections are actually removed.

and also ensure that connections are not being piled up on every time you disconnect connection and resubscribe to presence channel.

Tested the branch and some sockets connections just piling up.

Maybe its already an existing issue,
im not sure but it will be better you try to reproduce it.

Screenshot 2021-06-07 at 16 31 46
Screenshot 2021-06-07 at 16 33 51

@mdprotacio
Copy link
Contributor Author

It does seem that the users key has multiple same user values as the key used is the socketId. It could be possible that you have multiple tabs. Say for user_id 4004, you have initially 3 tabs opened and on the next screen you have 4 tabs opened?

The problem that I've been having trouble with when doing and testing the code changes is on the issue of triggered pusher event from backend when I had to mark the connections as ponged when they receive messages, otherwise the pusher javascript will not issue a ping when there is a continuous stream of events getting in. Initially, the removeObsoleteConnections will clean the connections if not ponged after 120 seconds and when continuous stream of events are being received, the client will not issue a ping and will get disconnected.

Since I've marked them now as ponged when I received a message on a connection to keep the connection, when that happens, how can I know if the client is actually still connected and may eventually issue a ping? This is where we may actually be having stale connections that are already disconnected and is not being removed.

I'd rather have that for the time being instead of having to aggressively remove connections that are not being ponged due to continuous event stream which results to clients being not notified of the events when they are actually still connected.

@myckhel
Copy link

myckhel commented Jun 8, 2021

Nice!

the removeObsoleteConnections will clean the connections if not ponged after 120 seconds and when continuous stream of events are being received, the client will not issue a ping and will get disconnected.

As you said connections will be removed after 120 secs.
so the connections were actually there for more than 120 secs and thats why i considered it as an issue.

Since I've marked them now as ponged when I received a message on a connection to keep the connection, when that happens, how can I know if the client is actually still connected and may eventually issue a ping? This is where we may actually be having stale connections that are already disconnected and is not being removed.

I'd rather have that for the time being instead of having to aggressively remove connections that are not being ponged due to continuous event stream which results to clients being not notified of the events when they are actually still connected.

If i may understand ur quotes. you meant you solved a lastPonged issue and there is still likely to be another issue of stale connections.
I initially taught this pull request was to solve stale issue generally.
because i guess ping/pong should fall under stale connection.

Or maybe i didn't get you correctly.

@mdprotacio
Copy link
Contributor Author

mdprotacio commented Jun 8, 2021

Yes. There appears to be still some issues on stale connections specifically on presence channels. We have deployed these changes on production and there seems to be some inconsistencies on unsubscribe. Sometimes, we have say 50 connected users shown but when we refresh the page it was actually just around 30. It appears that there are times that member_removed is not being broadcasted across all instances. I’ll look onto it further and hopefully can get it fixed. Perhaps on a different PR.

Copy link

@myckhel myckhel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this branch for some minutes or more, so far so good.

All seems to be working fine.

  • pusher client ping/pong working fine thou i have just two sockets connected
  • noticed global stale connections are being removed after 120 secs.

Thou i couldn't try to reproduce the original issue this request was created to solve but i can confirm no issues produced by this pull request.

There are still problems existing with presence channels which its stale connections doesn't get removed but this pull request wasn't created to solve it i believe.

as @mdprotacio mentioned he would try to fix on a new push request.

all good at testing with redis connection from my end.

@IlmLV
Copy link

IlmLV commented Aug 27, 2021

@mdprotacio dude, you are legend, last few days for many hours I was struggling to debug lost WS message issues in my project
this PR seems to fix all issues I was facing, thank you!

// if there are no connections.
// If the total connections count gets to 0 after unsubscribe,
// try again to check & unsubscribe from the PubSub topic if needed.
if ($count < 1) {
$this->unsubscribeFromTopic($connection->app->id, $channelName);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdprotacio hey! I am still fighting with one issue, this is not exactly related with your PR, as problem also persists on latest 2.0.0-beta.36 but I am hoping you can help me
what currently I am able to debug, is that after execution unsubscribeFromTopic() on at least one of ws server nodes first message/-s on client side is not received, commenting out this line - problem solved, kinda, but what is consequences? and maybe you will better understand why is that happening

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I perfectly understand your issue. But it appears that maybe when you have unsubscribed from one node, the same is not propagated across other nodes. This may happen if the client is subscribed or connected using the other node but the unsubscribe happened on the other node. This, from what I can recall should have been cleaned up by the removeConnectionFromSet that is the next then of this block. It may be possible that unsubscribeFromTopic may have produced a failure and will stop processing the promise chain and may not remove the connection from the set.

@markeilander
Copy link

markeilander commented Oct 11, 2021

Hi There,

Great work have been debugging to find an answer for

if continuously we receive events/message for 2 minutes, ping is never called from pusherjs which renders the connections stale as the lastPongedAt never gets updated. This renders the connection being marked as obsoleted and removed.

all morning...

Editing Channelmanagers/LocalChannelManager and Channels/Channel with your connectionPonged($connection) fixed that issue.

Any ETA on when this will be merged with master?

@mdprotacio
Copy link
Contributor Author

@myckhel @IlmLV @markeilander guys, would you be able to check #881? I've created another branch and will likely close this PR instead. Hopefully that PR addresses some more other issues. Thanks.

@mdprotacio
Copy link
Contributor Author

Closed in favor of #881

@mdprotacio mdprotacio closed this Nov 12, 2021
mpociot pushed a commit that referenced this pull request Jan 5, 2022
* fixes on replication

* trying to fix #778 #issuecomment-907319726

* code spacing fixes

* codestyle fixes

* trigger workflow locally
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants