Asynchronous send: stop game thread blocking on slow clients#10535
Merged
tool4ever merged 1 commit intoCard-Forge:masterfrom Apr 27, 2026
Merged
Asynchronous send: stop game thread blocking on slow clients#10535tool4ever merged 1 commit intoCard-Forge:masterfrom
tool4ever merged 1 commit intoCard-Forge:masterfrom
Conversation
…mers Replaces channel.writeAndFlush(event).sync() in send() and sendAndWait() with addListener on the returned ChannelFuture: the game thread fires and continues, errors are reported via listener (cause inlined since tinylog config doesn't render throwables), and sendAndWait completes its reply with null on write failure so it doesn't park for the 45 s heartbeat timeout. FServerManager configures WriteBufferWaterMark(64 KB low / 1 MB high) on each client channel and adds a SaturationLoggingHandler. When a channel's outbound buffer crosses the high mark and isWritable() flips false, the handler logs an entry warning with pending bytes; on recovery it logs a single line with the elapsed duration and the count of server sends attempted during the window (tracked via an AtomicInteger on RemoteClient, incremented inside send/write/ sendAndWait whenever isWritable() is false). Network log analyzer replaces the now-dead "send() blocked X ms" parsing with parsing for the new saturation recovery lines, exposing episode count, total / average / max duration, and total queued sends in the network performance section of analysis reports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
seems good (as long as we don't switch to UDP :D) |
tool4ever
approved these changes
Apr 27, 2026
RafaelHGOliveira
added a commit
to RafaelHGOliveira/forge
that referenced
this pull request
Apr 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What's the issue
In network multiplayer server writes move at the speed of the slowest client. The cost of one bad connection (high latency, low effective throughput, slow client-side processing, anything that delays TCP acknowledgments) is paid by every player in the game, because the game thread is shared.
This is because the server's game thread cannot fire a new event to remote clients until every connected client has received the previous one. Each event goes through
channel.writeAndFlush(event).sync()per client, in sequence — and.sync()blocks the calling thread until the write has been accepted by the OS socket buffer, which can only happen after the receiver's TCP stack has acknowledged enough prior bytes. Slow clients hold up the chain.Bursts of game activity that produce many events in a short time multiply the effect — each event in the burst waits for the full slowest-client round-trip before the next can fire.
What's the history
The synchronous behaviour does not appear to be an intentional design choice. Git history shows
.sync()was added in February 2018 (802036f693e, "print a message when unable to send event") so write exceptions would flow into a surroundingtry/catchfor logging. Without.sync(),writeAndFlushreturns aChannelFuturewhose failure is reported asynchronously via listener — which the 2018 patch did not use. The original Forge networking before Feb 2018 was async. The proper fix in 2018 would have been to attach an error listener.What's the fix
Replace the synchronous
channel.writeAndFlush(event).sync()calls inRemoteClientwith asynchronouschannel.writeAndFlush(event)plus aChannelFuturelistener for error reporting. The game thread fires the write and continues; success/failure is observed off-thread.User-facing effect
.sync(). With the change, the host's UI updates in real time regardless of any remote client's network. This is often the most visible benefit, since the host is usually actively playing while their game freezes whenever a remote opponent's link stalls.sendAndWait): the game still waits for their reply, by design. The engine cannot progress without the answer, before or after the change. If the player on the bad connection happens to be the one being prompted, they're the bottleneck — but that's current behaviour, not regression.How it works
Three changes in
RemoteClient:send()writes async; errors logged via listener. The oldsend() blocked X mstelemetry is removed — once the call is async, the calling thread doesn't block, so blocked-ms is meaningless. The equivalent signal (a client falling behind) now comes from saturation logging — see below.sendAndWait()no longer delegates tosend(); it does its own asyncwriteAndFlush()with a listener that completes the reply future withnullon write failure, so the game thread doesn't park for up to 45 s waiting for heartbeat to detect a dead connection.write()is unchanged in send semantics — it gains a saturation-counter increment, nothing more.Client synchronisation is preserved:
The
sendAndWait()path is the mechanism for "server fires a request and waits for the client's answer" — the engine genuinely cannot progress past a prompt (target selection, mulligan, ordering triggers, paying X) without the response. The reply-future wait insidesendAndWait()is what enforces that synchronisation, and it is unchanged. The async refactor only removes the additional block on the write itself flushing, which today happens on top of the reply wait, contributing nothing to correctness.Per-request wait goes from
(write_flush + client_processing + reply)to(client_processing + reply). The "waiting for the client's answer" part is preserved exactly. If a write actually fails (channel closing, encoder error), the listener now completes the reply future withnullimmediately, and callers already handlenullas "client did not answer" — strictly faster than today, where they parked until the 45s heartbeat noticed.Saturation error logging (replaces
send() blocked X ms)With async writes, per-call blocking time is no longer a meaningful signal — but we still want to know when a client can't keep up. The replacement observes Netty's outbound buffer directly:
WriteBufferWaterMark(64 KB low, 1 MB high)configured on each client channel. Hard-coded constants — same standard everywhere.New
SaturationLoggingHandlerlistens forchannelWritabilityChangedevents and logs entry/recovery as a paired set of lines:The "47 server sends" counter is incremented inside
send()/write()/sendAndWait()wheneverisWritable()is false, so the recovery line shows how many writes piled up during the saturation window.NetworkLogAnalyzerswaps its oldsend() blocked X msparser for one that reads the new recovery lines, reporting episode count and total / average / max duration in the network performance section.🤖 Generated with Claude Code