Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dissonance issues with players disconnecting and reconnecting #55

Closed
jackoshea-dev opened this issue Sep 25, 2017 · 47 comments
Closed

Dissonance issues with players disconnecting and reconnecting #55

jackoshea-dev opened this issue Sep 25, 2017 · 47 comments

Comments

@jackoshea-dev
Copy link

Context

Hi, I have recently implemented Dissonance into my Unity project using UNet High Level API. My project currently supports players being able to join and disconnect from a host server as they please and this has started causing issues with Dissonance. Upon first connection, Dissonance works perfectly and all players have no issues hearing and communicating with each other. It's when a player leaves and reconnects that I'm starting to see some issues.

Expected Behavior

When a player leaves and reconnects I would like for dissonance to resume working as it did when they first connected. This also begins to cause issues with other players ability to hear each other so I feel like there is something going wrong on the server.

Actual Behavior

The reconnecting player can no longer broadcast to other players or receive audio even though the 'Voice Broadcast Trigger' and 'Voice Receipt Trigger' are enabled. I have noticed that upon connecting to the server again that the client still see's their previous player ID as a peer under the Dissonance Comms Component on the DissonanceSetup GameObject. I feel this is contributing to the issue as these peer entries continue to stack as the client continues to disconnect and reconnect. Another note is that as this only happens on the client, it is not on the server, making me think that this issue may be the result of de-synchronisation between the server and client ?

Steps to Reproduce

This problem can be reproduced within the Dissonance demo scene

  1. Build the demo scene and run multiple instances on the same computer or separate computers.
  2. Connect all players through localhost or over a local IP.
  3. Ensure all players can correctly communicate with each other, this step may be difficult off one PC as multiple instances all working off the same microphone will cause intense feedback.
  4. Disconnect one player and reconnect them, this should display the behaviour outlined in the 'Actual Behaviour' section of this post. The problem has definitely been replicated if there is an extra peer entry on all clients, to check this one client should be ran through Unity.

Your Environment

  • Dissonance version used: v3.0.2

  • Unity version: 5.6.2f1 (Personal and Pro)

  • Editor Operating System and version: Microsoft Windows 7 Professional N 6.1.7601

  • Build Settings: This problem occurs on both Windows and Android

  • Link to your project: Cannot provide source code or project at this time

##Final Notes
This is quite a pressing issue as this project is on a deadline, any help would be much appreciated.

Thanks for your time,
Jack

@martindevans
Copy link
Member

martindevans commented Sep 25, 2017

Hi Jack,

I'll get right on trying to reproduce this in the demo scene. Since it's urgent you won't have time to wait for the next release of Dissonance on the asset store, could you email me your order number and then I can send you a new build once the problem is fixed.

@martindevans
Copy link
Member

martindevans commented Sep 25, 2017

Unfortunately I can't reproduce this using the Dissonance 3.0.2 HLAPI demo scene. My procedure was:

  1. Create 3 peers, the editor is the host
  2. Talk from all three peers to check that voice is working
  3. Click disconnect on a peer
  4. Connect with that instance again, confirm that the editor only shows 3 clients
  5. Speak from all three clients to confirm speech is working

Is this sequence something you would have expected to reproduce the problem?

Here's a short video of a test (watch for the pulsing blue orbs indicating speech is working):

2017-09-25_14-35-08

@josemayi
Copy link

I had this same exact issue. When the server was hosted on the same pc as the clients it worked fine. However, if I put the server instance on some other computer, my client instances are able to connect once. If I stop and replay the clients it results in an error trying to connect to the server

@josemayi
Copy link

To add to my previous message, the server seems to not be able to remove clients correctly on my end upon stoping them (if hosting the server instance elsewhere)...It keeps throwing an error... saying something along the lines that a client with that name had already been added to the table with id 0

@martindevans
Copy link
Member

martindevans commented Sep 26, 2017

I'm setting up a test between my laptop and my PC now to see if I can reproduce this. My theory at the moment is some events are happening out of the expected order - exacerbated in the networked situation by natural networking latency.

Are you getting this error by disconnecting and then immediately reconnecting? i.e. reconnecting and triggering the connection logic before the HLAPI has sent the disconnect event to the server.

@josemayi
Copy link

I disconnect then reconnect after maybe two minutes. Also, to increase latency make sure you have your test running on separate Networks... maybe one in your Wi-Fi and another one in a hotspot using your cellular network for example...accessing a public IP instead of a private

@josemayi
Copy link

But unfortunately I don't think this is a latency issue. Especially since the way my product is set up it takes about two minutes before the user has to reconnect upon starting my game.

@josemayi
Copy link

Forgot to mention my issue is using low level networking API

@martindevans
Copy link
Member

martindevans commented Sep 26, 2017

Ah we might be in luck. Just a few days ago I fixed a bug to do with players leaving behind a bit of state on the server when they quit which was never cleaned up - my guess is that this is related to that.

When the player rejoins will they have the same ID as last time? Either because you're using the same Dissonance components as last time, or because you explicitly set the Dissonance player name to something which doesn't change. If you are then when they reconnect they'll trigger that bug (the player runs the connection logic but encounters some of the state which wasn't torn down from last time, gets confused and crashes).

A fix for this has been merged into the dev branch, so if you email me your order number I can give you a build with that fix included.

Also, to increase latency make sure you have your test running on separate Networks... maybe one in your Wi-Fi and another one in a hotspot using your cellular network for example

Check out clumsy it makes these kind of networking tests so much easier to set up :D

@josemayi
Copy link

Will check it out thanks :). The issue though is a bit weird...my understanding is if I dont set a player ID it will use a new guid. Which should be the case right now as all I do is drop the dissonance prefab and let it do its thing. Ok, I'm on the road right now, but sometime tonight I'll send email you the order number

@martindevans
Copy link
Member

if I dont set a player ID it will use a new guid

That's correct. When the DissonanceComms component is first started it sets a GUID name for itself. It then re-uses this ID forever (until it is destroyed).

So it all comes down to if you destroy and recreate the components between sessions. If you do then you unintentionally sidestep the bug because Dissonance assigns a new ID to the client each time (so the server never sees it as a duplicate).

@josemayi
Copy link

Yep, I definitely agree which is why I thought it was weird that I had the issue at all. By the way, that "email me" link does not appear to work...takes me to a "Not Found" page.

@martindevans
Copy link
Member

Oops, here's the mail me link again, this time with the correct formatting.

@martindevans
Copy link
Member

Can I check if either of you guys have had this issue since you've been using the test version I sent you? I'll be putting together a new release next week and I'll mark this as closed if it's all working for you :)

@josemayi
Copy link

josemayi commented Oct 8, 2017 via email

@LordSkittles
Copy link

Hey Martin. I have taken over for Jack handling this issue, and I am currently in the process of getting a log file for you so you can actually see the editor log that is being produced on the built application.

The strange thing with this issue is almost impossible to track down. Everything I do changes the way it reacts, I'm hoping the log I am about to produce will shed more light on the issue.

@LordSkittles
Copy link

Here is the log. (Output from adb) so it is a little weird, but readable)
log.txt

What concerns me is how many lines are like this:

[Dissonance:Recording] CapturePipelineManager: Detected a frame skip of 200.5512ms, forcing reset of Microphone capture pipeline

This leads me to believe that the issue may be that dissonance is struggling to send the packets from the android device?

@martindevans
Copy link
Member

Hi LordSkittles,

If this is being caused by the frame skip detection then I think we've already got a fix for this in the next version (which should be available within a week). The frame skip detector exponentially backs off so that it will never detect many skips in a row. In the meantime I suggest you simply comment out the skip detection (it's not critical). Find this code in Assets/Plugins/Dissonance/Core/Audio/Capture/CapturePipelineManager.cs:

private bool DetectFrameSkip(float deltaTime)
{
   // If frame rate is less than 8 frames per second reset the transmission pipeline. If FPS is actually 8 then this will break
   // voice transmission (due to reset every frame) but that's fine because we can't really run good voice at 8fps anyway. In the
   // more likely case that this was just a stutter in frame rate it will reset the pipeline and fix the mic desync.
   var skip = deltaTime > 0.125f;
   if (skip)
      Log.Warn("Detected a frame skip of {0}ms, forcing reset of Microphone capture pipeline", deltaTime * 1000f);
   return skip;
}

And simply make it return false.

@jackoshea-dev
Copy link
Author

Hi Martin, thanks for the response. After altering that code it does seem that microphone reliability is much better. However now it seems packets are being sent with the incorrect session id which isn't allowing clients to hear each other once they disconnect and reconnect.

The error message that's happening repeatedly on the server after a client reconnects now is '[Dissonance:Network] HlapiServer: Received a packet with incorrect session ID. Expected 951585607, got 2027750816. Resetting client.'

I've seen another issue thread here, I feel this is the exact same issue I'm now experiencing so I will subscribe to that issue thread and keep an eye out for a fix. If I could also receive the new Dissonance build when it's submitted to the store that would be extremely appreciated.

Thanks for the help,
Jack

@martindevans
Copy link
Member

If you want to email me your order number I can send you the latest build right away. Otherwise it should be on the store in about a week :)

@martindevans
Copy link
Member

Dissonance 3.0.3 just went live on the asset store with the fix for the players disconnecting and the exponentially backed off frame detector. I'll close this issue now, feel free to re-open this if either continues to be a problem.

@martindevans
Copy link
Member

martindevans commented Oct 17, 2017

Hi James,

Tom has merged some fixes of mine today. There were two things which may impact you:

  • As mentioned in my last post disconnecting and then rapidly reconnecting can cause the server to end up in an inconsistent state. I've fixed this.
  • HLAPI integration had some overeager error handling - whenever the server failed to send a reliable packet (which can happen if it sends a reliable packet when the player has disconnected, but the server doesn't know yet) it would immediately kill the server without any kind of graceful shutdown. I've fixed this (server is allowed to drop reliable messages to dead connections) and also neatened up how the server handles errors (it should at least try to tear down the server gracefully on error).

Since I can't reproduce your issue I can't be sure if these will help but I'm hoping they will at least narrow down the issue a little. If the server is handling errors more gracefully it'll create a little less noise in the logs.

I don't think I have your email so I've sent the build to Jack :)

@jackoshea-dev
Copy link
Author

jackoshea-dev commented Oct 18, 2017

Hey Martin,

James isn't in the office today so I'm going to be importing your new packages and doing some testing. Sounds like the new changes should definitely help us out, in previous tests we have definitely rapidly disconnected and reconnected so this was probably an issue all along.

I'll get back to you with feedback as soon as possible and let you know how I go.

Thanks again,
Jack

@martindevans
Copy link
Member

Hi Jack, I've just noticed an issue in one of my bugfixes from yesterday. Don't bother testing with that build I sent you, I'll send you another build with that fixed first :)

@LordSkittles
Copy link

Hey Martin,

We tested your build and it didn't work for us. However, we managed to track down the issue ourselves and make a work around. Jack will email you a document detailing the issue, the fix and we also will be supplying a project with the working fix implemented.

Cheers,
James

@martindevans
Copy link
Member

I'm sorry that didn't work for you :(

Good work tracking down the issue though!

@martindevans
Copy link
Member

Hi Guys, I was wondering if you had the details of the problem you located? I know you're on a deadline so no pressure if you've got more important things to be doing :)

@LordSkittles
Copy link

Hey Martin, sorry we attempted to send an email last Friday but it failed to deliver for some odd reason. We resent the email detailing the fix and the issue.

Cheers,
James

@martindevans
Copy link
Member

I've received it this time, thanks.

@martindevans
Copy link
Member

I've been looking at your pdf but I don't quite understand the root cause of the problem. You say:

When a client disconnects, occasionally the client will disconnect before the server has sent any packets, preventing it from removing it from the connection list.

When you refer to 'connection list' I assume you mean the _addedConnections list in HlapiServer?

The intent is that we keep that in sync with the NetworkServer.connections list (as you've no doubt worked out) by removing items from _addedConnections which no longer appear in NetworkServer.connections. This is a proxy for disconnection events which we can't receive without modifying the NetworkManager class (which we don't want to do unless absolutely necessary, it would make setting up the Hlapi integration significantly more complex).

So is the fundamental problem that items aren't removed from the NetworkServer.connections quickly enough? Or even worse sometimes aren't removed from NetworkServer.connections at all?

@morderkaine1
Copy link

Hi,
I have been having this issue (or a similar one) as well.
For me it happens when someone disconnects from the server, and from tests it might be only if talking is happening during the disconnect. This warning message gets spammed repeatedly when it happens

[Dissonance:Network] HlapiServer: Received a packet with incorrect session ID. Expected 1146703746, got 263488912. Resetting client.

And also this is in the log as well.

no free events for message in the queue
no free events for message in the queue
Failed to send internal buffer channel:1 bytesToSend:236
Send Error: NoResources channel:1 bytesToSend:236

It seems to happen when there are 3 or more people playing and one leaves - they don't need to reconnect for it to happen.

@martindevans
Copy link
Member

Hi morderkaine,

I've just tried to reproduce this in 3.0.3 with no luck. With the HLAPI demo scene I connected 4 clients and then while holding down Push-To-Talk I disconnected one of the clients - nothing untoward happened on the server. Do you get the error in the HLAPI demo scene, or only in your game?

[Dissonance:Network] HlapiServer: Received a packet with incorrect session ID. Expected 1146703746, got 263488912. Resetting client.

This happens when a client sends a message to the server but the client uses the wrong session ID. All Dissonance packets (except the very first client->server handshake request) are prepended with the session ID which the server tells the client in the initial handshake reply. What's particularly odd is that you get this message spammed - the server sends back a message to the client telling it that the session ID is wrong and this should cause the client to disconnect itself and attempt to reconnect - so you should only get 2 or 3 of this message before the client catches up and disconnects itself.

Could you turn on Trace for Networking (Window > Dissonance > Diagnostic Settings) and capture a log where this happens? Make sure to turn it back down to Info afterwards - such a high logging level can badly impact performance.

no free events for message in the queue
no free events for message in the queue
Failed to send internal buffer channel:1 bytesToSend:236
Send Error: NoResources channel:1 bytesToSend:236

All of these messages seem to be saying basically the same thing - The HLAPI is running out of internal resources to send/receive messages on channel 1. If you send me that log I mentioned above hopefully there may be some useful stacktraces attached to these which will help me track it down.

@morderkaine1
Copy link

morderkaine1 commented Nov 7, 2017 via email

@morderkaine1
Copy link

morderkaine1 commented Nov 7, 2017 via email

@morderkaine1
Copy link

morderkaine1 commented Nov 8, 2017 via email

@martindevans
Copy link
Member

Thanks for all those logs, I think I've finally managed to reproduce this issue! Even better, I can only reproduce this in Dissonance 3.0.3 and cannot reproduce it no matter how many times I try with the dev branch - hopefully that means the mitigations I added for LordSkittles/jackoshea-dev are working.

The problem seems to be around handling crashes in the server. The server should recover from crashes by restarting itself, kicking all clients (because they're now using the wrong session ID) and then all clients should recover from the kick by reconnecting - thus fully resetting the server and all clients when a server side error happens. However this doesn't always seem to happen correctly - some/all of the clients end up in a limbo state (I still need to investigate why) where they think they're connected, but aren't really, so they never send/receive voice successfully.

One of the changes I made previously was making the server less strict about failing to send messages. If a reliable send fails due to a timeout then the server considers this ok and continues working. It seems that the HLAPI generates a timeout for most disconnects (as you'd expect when slamming a connection closed by e.g. killing the client process), so we definitely don't want to cause a server restart whenever that happens! This probably doesn't fix the underlying issue with the clients in a limbo state, but does vastly reduce the number of server side errors that ever occur.

@morderkaine1
Copy link

morderkaine1 commented Nov 9, 2017 via email

@martindevans
Copy link
Member

martindevans commented Nov 9, 2017

That sounds promising! Is a new release going to be going up soon? Or if not can I upgrade to the dev branch if its currently stable?

I think we're currently planning to do it next week (assuming Tom merges all my PRs without requesting any changes). It usually takes ~3 days to go up on the store after we submit. If you want a copy of it sooner than that email (martin@placeholder-software.co.uk) me your invoice number and I'll be happy to send that to you :)

[is there a] 'hang up' function

Usually that's tied in to the network integration layer (i.e. HlapiServer, HlapiClient and HlapiCommsNetwork). The integration informs the networking system that a client has left the session and then Dissonance removes that client from it's internal data structures. You could try calling ClientDisconnected on the server (forcing the server to think that client is gone) before they actually disconnect. However, this shouldn't be necessary if the integration layer is doing it's job properly!

@morderkaine1
Copy link

morderkaine1 commented Nov 10, 2017 via email

@martindevans
Copy link
Member

martindevans commented Nov 10, 2017

Woah! Never give out your invoice number publicly! Could you email me so I know which address to send back to.

@martindevans
Copy link
Member

Ah never mind, one of your earlier comments has your address in the footer.

@morderkaine1
Copy link

morderkaine1 commented Nov 10, 2017 via email

@martindevans
Copy link
Member

Dissonance 4.0.0 just went live with the changes for this issue included. I'll close this issue, please feel free to comment here if the issue persists :)

The changes I made in the end to resolve this were based on changes that Jack/LordSkittles recommended. I know you guys made some internal changes to call ClientDisconnected when the NetworkManager event fires. I didn't implement this because I don't want to require Dissonance users to implement a custom NetworkManager. However I did write the code necessary to do it and then comment it out - in HlapiServer.cs find the OnServerDisconnect methods and call them in your custom network manager.

I also made some changes which seem to resolve the issue without needing to modify the NetworkManager. Previously timeouts were considered an error, however it turns out that's a terrible idea because most HLAPI connections close with a timeout! Packet sending is now much more tolerant of timed out connections (it considers a timeout to simply be indicative of a disconnect and nothing more).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants