New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and fix master server ping logic #15498

Merged
merged 2 commits into from Aug 18, 2018

Conversation

Projects
None yet
2 participants
@pchote
Member

pchote commented Aug 11, 2018

Fixes #15477 and removes a very old wart from the server code.

There are two key points to understanding this PR:

  • MasterServerPinger historically tried to update the master server ASAP when something changes. This is paired with a rate limit to stop the master server from being hammered: if the server data changes again while the first ping is in process, then the subsequent pings are ignored. This is obviously a very bad idea, and historically only worked out of luck. A recent change (didn't follow up to find which one) changed the order that the MasterServerPinger is notified, and it now tries to send a ping immediately before the server state is updated. The rate limiter then, more often than not, discards the ping containing the server state update (lobby → started or started → complete). The stale state lingers for three minutes until the next heartbeat ping finally syncs the correct state, causing #15477.
  • The server's run loop is driven by the Socket.Select call, which blocks until data is received from at least one connection or a given timeout (defined in microseconds) expires. This means that there is no guarantee at all for when a ServerTrait's ITick implementation is going to be called. TickTimeout was added so that the MasterServerPinger could force the loop to run at least once every 3 minutes. The authentication PR added a hack to reduce this to 100ms if it is waiting on a response to process. On closer inspection, MasterServerPinger's TickTimeout doesn't actually work, and the tick call runs every ~500ms anyway.

The first commit removes the unnecessary complexity of asking the ServerTraits for a timeout, and instead hardcodes a value of 1s normally or 0.1s when waiting for an auth callback.

The second commit moves the ping logic into ITick.Tick and change all the update triggers to set a timestamp. We then use this timestamp to decide when to update the master server (and LAN clients), with duplicate pings in a single tick now sending the final state instead of something inconsistent in the middle.

@pchote pchote added this to the Next release milestone Aug 11, 2018

pchote added some commits Aug 11, 2018

Rework master server ping rate-limit logic.
Pings are now delayed instead of dropped to avoid data loss.
@chrisforbes

This comment has been minimized.

Show comment
Hide comment
@chrisforbes

chrisforbes Aug 16, 2018

Member

Gross. OK.

Member

chrisforbes commented Aug 16, 2018

Gross. OK.

@pchote pchote added the PR: Needs +2 label Aug 16, 2018

@pchote

This comment has been minimized.

Show comment
Hide comment
@pchote

pchote Aug 18, 2018

Member

It seems unlikely we'll get a second review on this in time, so taking as-is and I'll fix any potential regressions if they occur.

Member

pchote commented Aug 18, 2018

It seems unlikely we'll get a second review on this in time, so taking as-is and I'll fix any potential regressions if they occur.

@pchote pchote merged commit 03adceb into OpenRA:bleed Aug 18, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@pchote

This comment has been minimized.

Show comment
Hide comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment