Fix encoder/game thread race; add ID fallback in event refs#10561
Merged
tool4ever merged 4 commits intoCard-Forge:masterfrom Apr 30, 2026
Merged
Fix encoder/game thread race; add ID fallback in event refs#10561tool4ever merged 4 commits intoCard-Forge:masterfrom
tool4ever merged 4 commits intoCard-Forge:masterfrom
Conversation
The encoder previously ran on the Netty I/O thread, reading TrackableObject state concurrently with the game thread's set() calls. EnumMap.writeObject would intermittently fail with AIOOBE, and because RemoteClient.write had no failure listener, the dropped setGameView left the client tracker in an inconsistent state. Move encoding onto the calling thread (which is also the only mutator) and hand the resulting ByteBuf to the channel for async write, preserving non-blocking sends. Inside wrapped events, replace IdRef/StaleCardRef with a unified EventCardRef/ EventPlayerRef carrying name and imageKey. resolve() prefers the tracker but falls back to a detached view when the id isn't there — handles CardViews referenced by events that aren't reachable from GameView, without relying on cross-message ordering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tool4ever
reviewed
Apr 29, 2026
tool4ever
reviewed
Apr 29, 2026
tool4ever
reviewed
Apr 29, 2026
Address review on PR <!-- -->Card-Forge#10561: - Drop EventPlayerRef and route PlayerView through IdRef in event mode. PlayerView is stable in the tracker for the lifetime of a game (Player.getView() returns a stable instance, no zone-change copies), so the snapshot/fallback path was guarding a case that doesn't fire. Top-level protocol args have always trusted the tracker for PlayerView; events now do the same. Saves the embedded name on the wire. - Restore the JavaDoc on shouldReplaceTrackables that was over-trimmed, including the rationale for why applyDelta is not in the exclusion list. - Convert IdRef and EventCardRef to records. Same semantics, less boilerplate, free toString() for net log output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review on PR <!-- -->Card-Forge#10561 (the JavaDoc-removal thread). Restores the seven JavaDoc blocks that were dropped in the original fix commit: class header, replace(), resolve(), measureSize(), WrappedEvent, wrapEvents(), unwrapEvents(). Four come back verbatim; three pick up StaleCardRef -> EventCardRef renames, and replace() adds a paragraph for the new eventMode/preserveSnapshot semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tool4ever
approved these changes
Apr 30, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background
Investigating user report of mid-game network freezes/disconnects on Android multiplayer.
User logs from master showed
EncoderException: ArrayIndexOutOfBoundsException: length=247; index=247onsetGameViewmessages, surfaced by PR #10535'sNetwork send errorlistener. These errors are caught in the encoder pipeline and the game continues, so they don't directly explain the game freeze. But they pointed at a real concurrency bug worth investigating.To dig further, PR #10558 added supplementary instrumentation. Running manual and batch tests with that instrumentation in place revealed a second pattern that wasn't visible in either master or the user logs:
Could not resolve IdRefwarnings followed by NPEs inEventVisualizer/GameLogFormatter. The NPE is caught byGameProtocolHandlerand the game continues — the user-visible cost is a transient UI issue such as a missing sound or game-log entry, not a crash.Root cause of the user-reported freeze is still unknown, and these bugs aren't proven to cause them — but ruling them out either way is a correctness fix.
What was wrong
1. Encoder/game-thread race on the TrackableObject EnumMap
PR #10535 (
Async writes in RemoteClient: stop game thread blocking on slow consumers) replaced synchronous.sync()writes with async listener-based ones.The change was correct but had a side effect: the Netty
MessageToByteEncoderruns on the I/O event-loop thread, and pre-PR-10535 the.sync()implicitly guaranteed the game thread was parked while it ran. Post-PR-10535 the game thread continues mutatingTrackableObjectstate viaset()while the encoder serializes the sameEnumMap.EnumMap.writeObjectuses a size-then-iterate sequence; a concurrentput/removemakes iteration walk past the array bounds → AIOOBE.Compounding:
RemoteClient.write(used by the event-batching code path) had no failure listener — onlysenddid. When the encoder threw on asetGameViewqueued viawrite, the failure was silent; the message was dropped on the wire, leaving the client's tracker missing the just-mutated state.2. Event refs assume the receiver's tracker has every id
The wrap-events mechanism replaces nested
CardView/PlayerViewrefs inside eachGameEventwith compact wire markers (IdRef/StaleCardRef). This design predates the current code: theIdRefmarker was introduced by PR #10018 (inGameEventProxy),StaleCardRefwas added later by PR #9642 (delta sync), and PR #10343 consolidated both intoTrackableSerializer.wrapEventswhile removingGameEventProxy. Resolution requires the receiver's tracker to have the referenced id.That assumption is solid for top-level protocol method args (
selectCard(cv)— the server only invokes the method after registering the object) but not for refs inside events: events can carry CardViews that aren't reachable fromGameView(transient zone-change copies, off-graph refs). On a tracker miss, resolution returns null and downstream visitors throw NPE oncard.getCurrentState(). The NPE is caught byGameProtocolHandlerand the game continues — the user-visible cost is a missing sound or game-log entry, not a crash.StaleCardRefalready carriedname + imageKeyfor the specific case where the server's tracker held a different incarnation of the same id, but the general "tracker doesn't have this id at all" path was uncovered. PR #10343 didn't introduce this gap — it inherited it from the olderGameEventProxydesign.The fix
1. Encode on the calling thread
CompatibleObjectEncoderexposesencodeToBuf(msg, alloc)— runs the encode logic synchronously on the calling thread and returns a freshByteBuf.RemoteClient.send/write/sendAndWaitandFGameClient.sendnow call this beforechannel.writeAndFlush(byteBuf). The network I/O remains async — only the encoding step moves.The thread race becomes structurally impossible: the calling thread is the only mutator of the data being serialized, so reads are sequential with writes by definition.
RemoteClient.writealso gains a failure listener so any future encode failures surface instead of silently dropping the message.Alternative option considered but rejected was to override
TrackableObject.writeObjectto take a brief synchronized snapshot of thepropsEnumMapand serialize the snapshot. That removes the EnumMap-level race without changing where encoding runs. However on testing the snapshot was shallow, andEnumMapvalues that are themselves mutable lists aren't covered. Snapshotting "deep enough" reduces to re-implementing serialization, with a fresh defect at every nested data structure. Solving the root cause of the race is the simpler architectural solution.2. Event refs carry display fallback data
Replaced
IdRef/StaleCardReffor event-internal refs withEventCardRef(id, name, imageKey, preserveSnapshot)andEventPlayerRef(id, name, preserveSnapshot).resolve()tries the tracker first; on a miss, constructs a detached view from the embedded fields. ThepreserveSnapshot=trueflag handles the previousStaleCardRefcase (server's tracker holds a different incarnation than the event captured).IdRefis retained for top-level protocol method args, where tracker presence is guaranteed by construction.🤖 Generated with Claude Code