Desync for Australian/New Zealand users on US headless sessions with high player count #3039
Comments
As far as I know, Australia has a very low-bandwidth uplink to the rest of the world, so even if your local connection has high bandwidth, connection to any host outside of Australia would be rate-limited. Desync happens because that rate-limit is saturated with streams from all users coming through the host user. Datamodel changes have lower priority and thus changes get delayed. I'm not sure what the solution to this would be apart from adding more uplink cables from Australia to the rest of the world to add bandwidth. Maybe mesh networking would help a bit as you would have multiple connections to different hosts instead of one. |
That does make sense, although the only thing that I would believe has a higher priority is the voice mode of other users but this doesn't seem to be the case. |
Sadly, I'm unsure how much can be done here (besides perhaps the mesh networking linked above). The only suggestion I can offer is asking users to change their settings to Steam Networking Sockets in Neos. Mind you, this isn't a direct fix. I just remember helping some users before who were desyncing badly due to latency, and Steam Networking Sockets helped them somewhat, or at least allowed them to play in a semi-usable state. |
This is a limitation of the transport protocol we currently use: it degrades with certain connections (typically high-latency ones, but it can also just be a result of quirks of a given connection). You can try switching to "Prefer Steam Networking Sockets" in the settings, which can behave a lot better in these scenarios, but it currently doesn't have bandwidth estimation, so it can end up dropping the connection as well. We are waiting on Valve to implement this feature so we can switch to it as the main protocol (or specifically the open version, Game Networking Sockets). |
Would this also be why people on very low speed connections (e.g. ~3Mb/s down, 0.5Mb up ) are unable to ever reach sync, even in a world with one other person? I have a friend in France and regardless of whether he is host or client, even with a single other person present with almost nothing else happening, he will not be able to maintain sync. |
Speaking of which, if bandwidth estimation were implemented (by SNS or GNS), could Neos use it to lower the audio stream bitrate, and maybe the pose stream frequency, to allow clients with very limited bandwidth to eventually sync? |
GreaseMonkey here. The server we were getting desynced on is the main Creator Jam hub server which is hosted in Ukraine. I get a ~305 msec ping to it. For reference, about 8 years ago I was getting pings of about 200 msec to US West, 260 msec to US East, and 350 msec to Europe. Nowadays I get about 180 msec to US West and I haven't remeasured the rest. I think I'm hitting the LNL Relay connection in several cases, even though I know that my NAT handles UDP holepunching just fine. Connection speed is... pretty decent? I suspect it does boil down to latency. |
Whenever I had major desync in a session and switched to SNS, I sometimes got a strange side effect where everything seemed to be synced but, in reality, all of my actions, my voice, and my perspective of the world were 30+ seconds behind. To everyone else, I was lagging behind their conversations/actions by 30+ seconds. It looked very different from the regular desync issue, which doesn't usually affect voices. |
@Enverex Don't know, we'd need to gather data on that. Does Steam Networking Sockets make a difference? Could be lots of things with the connection.
@shadowpanther That's unlikely, typically that detail is hidden in the protocol itself and not accessible. Generally those don't pose much of an issue anyways, since those are using streams and can be lost. It's the reliable changes that start queuing up. Usually it's not even the bandwidth itself, but rather packet rate.
@iamgreaser Sometimes it's the combination of connections that just don't work with UDP holepunching. We've had cases where you could have connection A, B and C. A and B would work fine, B and C would work fine with each other, but A and C will always go through the relay.
@Lexevolution Steam Networking Socket handles things differently, which is why you get that effect. Essentially you hit the fixed bandwidth limit, so everything starts trickling through and delaying like that. It's why we need the bandwidth estimation to be implemented in the protocol before we can switch to it as primary one. |
Is there any roadmap or really rough date for when we could see that happening? Unfortunately, the desync issue for Australians, which I only experience on Neos, pretty much locks us out of things like the MetaMovie (which was a ton of fun when it worked) and other big events. |
@sveken We asked on the bandwidth estimation here: ValveSoftware/GameNetworkingSockets#108 But currently there's no ETA on their end so we just have to wait and see before the switch. I saw some movement for bandwidth estimation a few months ago. There are a few things that I'm looking into on our end though that might help improve the network performance, mainly with combining smaller messages into a bigger one to reduce the overall packet-rate. |
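The idea of combining smaller messages into a bigger one to reduce the packet rate can be illustrated with a minimal sketch. This is not Neos or LiteNetLib code: the function names, the 2-byte length prefix, and the 1200-byte payload budget are all assumptions chosen for illustration.

```python
from typing import Iterable, List


def batch_messages(messages: Iterable[bytes], max_payload: int = 1200) -> List[bytes]:
    """Greedily pack length-prefixed messages into as few packets as possible.

    Hypothetical helper: each message is framed with a 2-byte big-endian
    length so the receiver can split the packet again.
    """
    packets: List[bytes] = []
    current = bytearray()
    for msg in messages:
        if len(msg) + 2 > max_payload:
            raise ValueError("message too large for a single packet")
        framed = len(msg).to_bytes(2, "big") + msg
        # Start a new packet when this message would overflow the current one.
        if len(current) + len(framed) > max_payload:
            packets.append(bytes(current))
            current = bytearray()
        current += framed
    if current:
        packets.append(bytes(current))
    return packets


def unbatch(packet: bytes) -> List[bytes]:
    """Split a batched packet back into the original messages."""
    out: List[bytes] = []
    i = 0
    while i < len(packet):
        n = int.from_bytes(packet[i:i + 2], "big")
        out.append(packet[i + 2:i + 2 + n])
        i += 2 + n
    return out


if __name__ == "__main__":
    small = [bytes([k % 256]) * 20 for k in range(100)]
    packets = batch_messages(small)
    print(f"{len(small)} messages sent as {len(packets)} packets")
```

With many small updates, the number of packets on the wire drops sharply while the payload bytes stay almost the same, which matters if the bottleneck is packet rate rather than raw bandwidth.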
I pushed an upgraded LNL library in 2021.10.25.1351, which should have a number of improvements that should help with this. Can you give that a go and see if it's any better, please? |
Will give it a little test tonight, and I have booked another MetaMovie ticket for the 30th to test there, as that is where I ran into the most desync issues with all the cool things that go on. Will report back afterwards. Just to confirm, am I best to disable "Prefer Steam Sockets" now with the new update? |
I'll do the same this evening, and I'll see how it goes with the Creator Jam this weekend, as desync is really consistent there. @sveken Yes. Disable preferring Steam sockets, as the updated libraries only affect LNL networks (from what I understand). |
To continue/answer the questions asked in the other issue,
|
@sikirebirth If you can get screenshots of the user list in the essentials, that can help! Did you check the queued packets yourself on your end, or did the host check? Ideally, if you can get the host to check what it says for you, that can help, because the value you're seeing won't be quite up to date, due to the data model being delayed. |
The queued packets were always checked by the hosts of the worlds I was in, and by other people in the world. |
Sweet thanks for the info! We'll see if we get some more data from others in the meanwhile too. |
This weekend I'd like to see how the changes to networking affect both the DeSync and ReSync sessions. Another temporary solution to try to limit the issue, which I'll eventually get around to, is splitting both sessions into multiple smaller sessions with tighter controls on the number of people in each one (around 5, maybe more). Many more people could then be connected to the headless, spread over multiple worlds, with items that may be present in more than one individual session kept synchronised, similar to VBLFC using nested sessions. Currently this is much preferable to organising a private intercontinental network bridge between the two headless sessions currently running, since such an arrangement would be very expensive. |
I have only done limited testing so far (still waiting for the weekend). As soon as the user count dropped to 17 or lower, the queued packets rapidly started to drop and went back to 0. |
@sveken Thanks for the info!
It almost sounds like it hit some bandwidth limit on the host. @BigRedWolfy Limiting the session can definitely help as it lowers the overall amount of bandwidth, but we'd like to make sure there can be as many people as possible. There's still an aspect that the updated LiteNetLib doesn't help much with - it uses the sliding window algorithm, which doesn't scale super well with large latency. The Game Networking Sockets should work much better in this regard, but we're still waiting on bandwidth estimation. |
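The remark that a sliding window doesn't scale well with large latency can be made concrete with a back-of-the-envelope model: with a fixed window, at most one window of unacknowledged reliable packets can be in flight per round trip, so the ceiling on the reliable packet rate is roughly window / RTT. The sketch below assumes an illustrative window of 64 packets; that number and the function names are hypothetical and are not taken from LiteNetLib's actual configuration.

```python
def max_reliable_packet_rate(window_packets: int, rtt_seconds: float) -> float:
    """Rough upper bound on reliable packets per second for a fixed window."""
    if window_packets <= 0 or rtt_seconds <= 0:
        raise ValueError("window_packets and rtt_seconds must be positive")
    return window_packets / rtt_seconds


def backlog_growth_rate(offered_rate: float, window_packets: int, rtt_seconds: float) -> float:
    """Packets per second by which the send queue grows once the ceiling is exceeded."""
    ceiling = max_reliable_packet_rate(window_packets, rtt_seconds)
    return max(0.0, offered_rate - ceiling)


if __name__ == "__main__":
    # Same hypothetical window, increasing round-trip times.
    for rtt_ms in (50, 140, 300):
        rate = max_reliable_packet_rate(64, rtt_ms / 1000)
        print(f"RTT {rtt_ms} ms: at most {rate:.0f} reliable packets/s")
```

Under this simplified model, the same session traffic that fits comfortably at a short round trip can exceed the ceiling at an intercontinental one, at which point queued packets keep growing until the offered rate drops, for example when users leave the session.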
I can't remember, unfortunately. Is there a previously visited section I can check for you? |
Thanks for the info! Do you at least know where they were located and what the ping was? |
After some testing, desync is still occurring, but only at around 16 to 18+ users in the session. Once the session reaches a certain amount of network traffic, the queued packets increase by around 100 packets every 2 to 3 seconds without stopping, unless the player count decreases. The updated libraries have certainly helped, though: I only start getting desync once the player count is around 16 to 18 users, which is nearly 10 more users than before the update, and the queued packets catch up relatively quickly, taking only around 30 to 40 seconds until I'm fully synced once the player count is below the threshold. The session had ~140 ping when testing. Here are the logs from the session. Most of the desync occurred at 6:28:39 PM.075 |
Just did the MetaMovie again. Definitely an improvement, however. |
I don't know if my issue is related to this, but I have recently been experiencing out-of-sync issues in certain sessions. This has been happening more frequently since the recent update regarding the network came in. Specifically, item locations, Logix processing, etc. are out of sync. Strangely enough, the voice and user locations seem to be perfectly in sync, and when I check the QueuedPackets in Neos, it shows 0. Sometimes I can't see the user even though they are supposed to be there. When I go to the dash menu and look at the session details, it looks like the user is not there. This problem did not seem to be related to whether or not I was using SteamNetworkingSockets. My internet speed is 368.0Mbps↓/29.5Mbps↑. The logs are attached. The problem occurs when I'm in a session called "SLOT開発室", around 3:00. |
Hello, I've also been experiencing this issue for the past two weeks and just wanted to add another data point. I've been trying to play on a headless server located in the US (Washington, to be specific), and I am located in Germany with a 500 MBit/s down, 50 MBit/s up DOCSIS 3.1 based connection. My latency to the server according to the user list is around 60-80ms (though I have been told that number is just a single direction, so double that for RTT, I guess?). I can play fine with around 6-8 people on the server, but with more than that I quickly start getting a massive packet queue in the hundreds of thousands and rising, along with extreme desync. The connection is an LNL connection, and I've tried both connecting directly via IP and joining the session through my contacts list, which seems to use NAT punchthrough. As an experiment, I've also tried connecting through German and US-based servers of a high-speed VPN service, and that made no difference. Another user from Germany has experienced the same issue on that server, and I know that we both use the same ISP-provided router (Arris TG3442DE). Since I wanted to get a better router anyway, I'm going to buy a new one soon and check whether that has any influence on this issue. I hope this can be resolved soon, as this issue completely prevents me from taking part in the weekend events my community is hosting. Edit: An update to this: we figured out that the world the headless server was using had a clock in it with extremely unoptimized Logix that was spamming network packets every tick. Removing that clock seems to have fixed the issue for me entirely, though we were "only" able to test it with around 16 people. |
Describe the bug?
Desync begins to occur primarily for Australian and New Zealand users who are on a US headless server with a high player count (around 8 to 10 or more).
Note:
is playable at around 4 to 5 or lower).
Desync begins to affect primarily:
Inspector panels taking 100 to 200 seconds to show any slots. (This is because inspectors are generated by the host, so desynced users need to wait for the host to generate the UI.)
Note: With the addition of the queued packets node, this time is directly proportional to the number of queued packets a given user is experiencing. (Roughly every 1000 QP is equivalent to 10 seconds of delay. However, this is just an estimate, as the ratio varies with ping and other factors like bandwidth usage.)
Objects other users have grabbed and moved only catching up after around 100 seconds (the same time it takes for inspectors to catch up).
Other users' voice modes. For example, if someone other than the desynced user changes their voice mode to mute, they will not be muted for the desynced user until the desync has caught up, at which point they will be muted mid-sentence.
(Note that there is no desync in player-to-player communication: talking to another user is not affected by desync, nor are their avatar's movements.)
Objects other users have spawned only appearing for the desynced user once the desync catches up.
Any changes other users have made, like changing a material, mesh, etc.
When a user changes their avatar, they appear frozen until the desync catches up. This also stops all player-to-player interaction that isn't normally affected by desync, like talking and avatar movement/player position, but only for the user that has changed their avatar.
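The rough ratio given in the note above (about 1000 queued packets per 10 seconds of delay) can be turned into a small estimator. This is only a sketch of the reporter's rule of thumb: the function name and the default ratio parameter are hypothetical, and the real ratio varies with ping and bandwidth usage.

```python
def estimate_delay_seconds(queued_packets: int, seconds_per_1000_packets: float = 10.0) -> float:
    """Estimate desync delay from a queued packet count.

    Uses the reporter's rough ratio (1000 QP is about 10 s) as the default;
    treat the result as an order-of-magnitude guide, not a measurement.
    """
    if queued_packets < 0:
        raise ValueError("queued_packets must be non-negative")
    return queued_packets / 1000 * seconds_per_1000_packets


if __name__ == "__main__":
    for qp in (0, 1000, 10000, 20000):
        print(f"{qp} queued packets: roughly {estimate_delay_seconds(qp):.0f} s of delay")
```

By this estimate, the 100 to 200 second inspector delays described above would correspond to roughly 10,000 to 20,000 queued packets.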
Relevant issues
No apparent relevant issues that haven't already been fixed.
To Reproduce
This issue may be hard to reproduce for users outside Australia and New Zealand, but it occurs consistently in Creator Jams where the player count is around 10 or more and the headless is in the US.
The main things needed to reproduce the bug are to be connected to a headless that is very far from where you are playing (a VPN may help with this) and for the session to have 10 or more users.
Expected behavior
No desync on headless sessions hosted in a different region from the users who join, and no desync in sessions with a high player count.
Log Files
DESKTOP-F65B6FQ - 2021.9.16.105 - 2021-09-19 10_57_51.log
The issue occurs at 1:28:57 PM.179
Screenshots
No response
How often does it happen?
Always
Does the bug persist after restarting Neos?
Yes
Neos Version Number
2021.9.16.105
What Platforms does this occur on?
Windows, Linux
Link to Reproduction Item/World
No response
Did this work before?
I Don't Know
If it worked before, on which build?
No response
Additional context
No response
Reporters
Neos: Crusher
Discord: Crusher#6146