Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: play easier together with friends from behind home routers #9017

Closed
wants to merge 17 commits into from

Conversation

@TrueBrain
Copy link
Member

@TrueBrain TrueBrain commented Apr 11, 2021

Integrated #9234. Closes #9234.

Test binaries can be found here: https://www.openttd.org/downloads/openttd-branches/pr9017/latest.html
See below how to help out testing.

If you build yourself, make sure to name your branch "pr9017". Otherwise your revision will not match anyone using the binaries above.

!!NOTE!! Currently the Game Coordinator is not working with this PR, while I rebuild this PR from the ground up. See https://gist.github.com/TrueBrain/75c016cdb90efa18cb79f3f6bae9af25 for details.

Motivation / Problem

Currently, to host a network server you really need to know a few things about routers and networking. This is way to advanced for the average Joe, and this becomes very clear when looking at Steam Discussions.
For Rich Presence (Discord, Steam, ..) we need "invite codes" so people can join each others game.

So, can we do this? I wrote a gist to explain the process of getting to this PR: https://gist.github.com/TrueBrain/f5685dc0fdbd012aac48ea2a36158198

But in case you are not into reading, let me summarize:

  • Every invite-only / public server creates a persistent TCP connection to a Game Coordinator (replacing the master-server)
  • Every server gets assigned an unique "join key" (called an "invite code" in the UI)
  • Server listing returns only the "join key" of a server
  • A client can join any server based on this "join key" (without even knowing the IP of the server), either via listing or directly inputting it
  • The Game Coordinator will try to connect the server and client. It tries 3 methods: direct connect, STUN, and if all fails, TURN (see gists for details)

There are now three kind of servers:

  • Local servers: do not connect to the Game Coordinator. "Search LAN" does work for these servers, as do people with the public IP and if port-forwarding is setup correctly. Just the old "non-advertised" servers.
  • Invite-only servers: the owner of the server needs to give the invite-code to someone before he can join.
  • Public servers are the old advertised servers.

Description

See the individual commits and the gist above. Not much more to add.

Limitations

  • Server info is delayed by 30s (by design). You can refresh individual servers if you want up-to-date information.

Testing

If you want to help out testing, great!

Notice how you didn't have to setup firewalls, port-forwarding, etc?

Let me know if you find any problems, etc.

Checklist for review

Some things are not automated, and forgotten often. This list is a reminder for the reviewers.

  • The bug fix is important enough to be backported? (label: 'backport requested')
  • This PR affects the save game format? (label 'savegame upgrade')
  • This PR affects the GS/AI API? (label 'needs review: Script API')
    • ai_changelog.hpp, gs_changelog.hpp need updating.
    • The compatibility wrappers (compat_*.nut) need updating.
  • This PR affects the NewGRF API? (label 'needs review: NewGRF')
@TrueBrain TrueBrain marked this pull request as draft Apr 11, 2021
Copy link
Contributor

@nielsmh nielsmh left a comment

I didn't read everything in detail, but this doesn't remove local network UDP broadcast query, right?

Loading

src/network/network_gui.cpp Outdated Show resolved Hide resolved
Loading
@TrueBrain
Copy link
Member Author

@TrueBrain TrueBrain commented Apr 11, 2021

I didn't read everything in detail, but this doesn't remove local network UDP broadcast query, right?

Local broadcast will remain, yes.

Loading

@James103
Copy link
Contributor

@James103 James103 commented Apr 25, 2021

  • Load test Game Controller if it can handle 2k servers and 5k clients

How exactly did you arrive at the load test goals (2,000 servers and 5,000 clients)?

Loading

@TrueBrain
Copy link
Member Author

@TrueBrain TrueBrain commented Apr 25, 2021

  • Load test Game Controller if it can handle 2k servers and 5k clients

How exactly did you arrive at the load test goals (2,000 servers and 5,000 clients)?

I was sure I wrote that down somewhere .. guess I did not :D

It is a worst-case estimate: on peak on Steam we had 3.2k people online. Say, with a good multiplayer, they all want to play with their friends, round it up, gives an upper value of ~5k.
Assuming most people will just play with their friends, means ~2.5 people per server, so 2k servers.

Infrastructure wise, being able to handle that, means I can sleep peaceful at night.

You think I am understating? Let me know :)

Loading

@TrueBrain TrueBrain changed the title WIP: allow multiplayer without port forwarding (using STUN) WIP: allow multiplayer without port forwarding May 5, 2021
@TrueBrain TrueBrain force-pushed the stun-me branch 2 times, most recently from 63b5564 to 628266f May 5, 2021
src/network/core/tcp_connect.cpp Outdated Show resolved Hide resolved
Loading
src/network/core/tcp_connect.cpp Outdated Show resolved Hide resolved
Loading
src/network/core/address.h Outdated Show resolved Hide resolved
Loading
src/network/network_coordinator.cpp Outdated Show resolved Hide resolved
Loading
src/network/network_gamelist.cpp Outdated Show resolved Hide resolved
Loading
src/network/network_gui.cpp Outdated Show resolved Hide resolved
Loading
src/network/core/tcp_coordinator.cpp Outdated Show resolved Hide resolved
Loading
src/network/network_coordinator.h Outdated Show resolved Hide resolved
Loading
@TrueBrain TrueBrain force-pushed the stun-me branch 12 times, most recently from 7fed847 to 09f08bc May 9, 2021
@TrueBrain TrueBrain marked this pull request as ready for review May 9, 2021
@TrueBrain
Copy link
Member Author

@TrueBrain TrueBrain commented May 9, 2021

This is pretty much done and ready for review. I am 100% sure there are bugs, so lets find them and fix them :)

The Game Coordinator is not yet running on AWS, and is not scalable yet. Otherwise it should do its job. So any client-facing bug you can find, please report that.

My suggestion is to get this PR code-reviewed and merged. After that I will work on the Game Coordinator, to make it production-ready. But that also gives time to test the client itself.

There is plenty of future-work to be done, but this can be found back here:
https://gist.github.com/TrueBrain/d430a5f0e7e4312b895278f092067495

Loading

@TrueBrain TrueBrain changed the title WIP: allow multiplayer without port forwarding Feature: multiplayer without port forwarding May 9, 2021
src/network/core/address.cpp Outdated Show resolved Hide resolved
Loading
src/network/network_gamelist.cpp Outdated Show resolved Hide resolved
Loading
src/network/network_gamelist.cpp Outdated Show resolved Hide resolved
Loading
src/network/network.cpp Outdated Show resolved Hide resolved
Loading
src/network/network_coordinator.cpp Outdated Show resolved Hide resolved
Loading
src/network/network_coordinator.cpp Outdated Show resolved Hide resolved
Loading
src/network/core/address.cpp Outdated Show resolved Hide resolved
Loading
src/network/core/tcp_stun.h Outdated Show resolved Hide resolved
Loading
src/network/network_coordinator.cpp Show resolved Hide resolved
Loading
GRFConfig *c = new GRFConfig();
DeserializeGRFIdentifier(p, &c->ident);
std::string name = {};
if (newgrf_mode == GAME_INFO_NEWGRF_MODE_FULL) {
Copy link
Contributor

@rubidium42 rubidium42 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know the average number of times the same NewGRF is used by different servers? Like NewGRF A is used by 10 servers, NewGRF B by 5, NewGRF C by 1, so the average is about 16/5 ~= 5. That could be used to influence choosing an even more efficient way.
You would then be creating a list of used NewGRFs at the top of the packet with the whole listing, and here you pass indices into that list. Those indices can probably be uint16 but stick with uint32 for now for an added safety buffer. At an average of 2 you would be saving about 16 + NewGRF name length bytes per NewGRF, with an average at 3 you would save about 32 + 2 * NewGRF name length, and so forth.

Loading

Copy link
Member Author

@TrueBrain TrueBrain May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tons of more efficient ways honestly. I liked the suggestion to apply compression on such packets .. that also sound really useful, and maybe means we don't even need a trick like yours. But that needs more work and benchmarking.

For now, I went for a solution that was simple and elegant, rather than complicated and touching a lot of stuff. Future work could optimize this by a lot. But I rather not do that in this PR :) If you agree, ofc.

Btw, this last commit is purely here as otherwise after this PR we would have lost functionality we had before the PR.

Loading

@glx22
Copy link
Contributor

@glx22 glx22 commented May 9, 2021

Did some testing with a dedicated server and a client.

  • connection type is "Behind NAT", while a build from master is correctly advertised on master server, so direct connection should work.
  • clicked "search internet", server is listed, its address is the invite code
  • clicked "search LAN", server is added with one of the local ipv4
  • clicked "search LAN" again, server is added with the other local ipv4
  • after a short time both local ipv4 are removed from the list, and I can see some redrawing issues for the server list too
  • joined the only left server as spectator, works as expected

I had an issue with a ":3979" server, but I can't reproduce it :(

Edit: I had the IPv6 firewall activated on the router, and it has only 2 modes on or off, no fine tuning. Disabled it to test and connection type became "Public". Build from master was advertising only the IPv4, as IPv6 was blocked by this "better than nothing" firewall.
So to resume:

  • if server is IPv6 only, connection type is "Behind NAT" if IPv6 firewall is on, "Public" otherwise
  • if server is IPv4 only, connection type is "Behind NAT" but should be "Public" because direct connection works (can advertise)
  • if server is IPv4/IPv6, connection type is "Behind NAT" if IPv6 firewall is on, "Public" otherwise

Loading

@James103
Copy link
Contributor

@James103 James103 commented Jul 1, 2021

@TrueBrain You mentioned that you load tested the Game Controller and showed that it can handle at least 10k clients. Do you also plan on load testing the TURN server as well, in order to get a good feel for how many active clients and servers it can support so that the network can scale accordingly?

Loading

@TrueBrain
Copy link
Member Author

@TrueBrain TrueBrain commented Jul 1, 2021

@TrueBrain You mentioned that you load tested the Game Controller and showed that it can handle at least 10k clients. Do you also plan on load testing the TURN server as well, in order to get a good feel for how many active clients and servers it can support so that the network can scale accordingly?

Yeah, tested that in one go too! Also the STUN server btw. Seems to all work and seems rather stable. Python and asyncio have a very small overhead per connected client (few KB of RAM, that is it). And CPU is the one thing we aren't short in, so not a real concern. So that all seems to work fine during testing. Of course .. real world can be a different story (stalling connections, very slow ones, etc) .. but I think we fixed most of those issues already with the current master-server (the framework used is identical between the two).

The reason for the 10k btw is that my computer couldn't spawn more clients in a useful way :P

Loading

src/network/core/tcp_connect.cpp Outdated Show resolved Hide resolved
Loading
src/network/network_coordinator.cpp Outdated Show resolved Hide resolved
Loading
src/lang/english.txt Outdated Show resolved Hide resolved
Loading
src/network/core/address.cpp Outdated Show resolved Hide resolved
Loading
src/network/core/address.cpp Outdated Show resolved Hide resolved
Loading
src/network/network_gui.cpp Show resolved Hide resolved
Loading
TrueBrain and others added 17 commits Jul 2, 2021
…dress

This allows future extensions to have different ways of referencing
a server, instead of forcing to use IP:port.
The Master Server doesn't understand NetworkGameInfo v5, so it is
of no use anyway.
This removes all UDP from the game except for a local broadcast
to find LAN games.
In this mode you do register to the Game Coordinator, but your
server will not show up in the public server listing. You can give
your friends the invite code of the server with which they can
join.
NewGRFs cannot be changed in multiplayer games, and consumes a lot
of bytes. So only send it once to the Game Coordiantor on registration,
and skip the field after that for any update.
A new game will restart the registration, so any NewGRF change
between games is picked up via the new registration.

Clients that look at the NewGRFs a server has, want to know the names
of the NewGRFs more than the IDs. So in addition to the ID and MD5,
send the full name to the clients too.

This solution is picked so the Game Coordinator isn't sending the
names of all the NewGRFs to the clients constantly (which would
make the server-listing very large), but still allow clinets to
see the name of a specific server if he clicks the button.
@TrueBrain
Copy link
Member Author

@TrueBrain TrueBrain commented Jul 19, 2021

Slowly we have been merging parts of this PR into master. #9447 is the last piece. Once that has been merged, everything in this PR (and yes, I checked) has been merged into master too.

By splitting it up, there was a huge jump in quality of what has been delivered. It really was the right call to make everything smaller, although it took significantly more work to do so. Either way, the result is that the backend is running on AWS like the rest of our infra, and that the client/server are doing what they should be .. at least for as far as I could reasonably test.

After #9447 is merged, the testing is up to the public .. and I am sure they will find plenty of edge-cases for us to fix :D But we will tackle them if/when they come in.

For now, I am going to close this PR. Tnx all who contributed, and feel free to test the latest nightly for STUN .. and hopefully soon TURN will hit the nightly too :D

Loading

@TrueBrain TrueBrain closed this Jul 19, 2021
@TrueBrain TrueBrain deleted the stun-me branch Jul 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

9 participants