Skip to content

Don't block IceAdapter.start() on telemetry websocket connect#84

Open
juerghegglin wants to merge 1 commit intoFAForever:masterfrom
juerghegglin:fix/issue-42-async-telemetry-startup
Open

Don't block IceAdapter.start() on telemetry websocket connect#84
juerghegglin wants to merge 1 commit intoFAForever:masterfrom
juerghegglin:fix/issue-42-async-telemetry-startup

Conversation

@juerghegglin
Copy link
Copy Markdown

@juerghegglin juerghegglin commented May 6, 2026

Problem

TelemetryDebugger.startupComplete() calls websocketClient.connectBlocking() synchronously on the IceAdapter startup thread, with no timeout. Since this is the last thing IceAdapter.start() does, the entire startup is held until that call returns.

When the telemetry server is silently unreachable — TCP layer up but application layer hung, a common failure mode where a load balancer fronts a dead backend — the WebSocket Upgrade handshake never receives an HTTP 101 response. connectBlocking() then waits until TCP keepalive eventually kills the socket, around two hours later. The user-visible symptom matches what's reported in #42: the ice adapter "doesn't connect to other players" because the FAF client's adapter-ready orchestration trips its own timeouts long before startup returns.

Why this is safe

Telemetry is a debug-only observability channel:

  • All messages in ice-adapter/src/main/java/com/faforever/iceadapter/telemetry/ implement OutgoingMessageV1 — strictly one-way, adapter → server.
  • TelemetryDebugger.onMessage() only logs incoming messages; no code path consumes inbound telemetry data.
  • On connect failure, Debug.remove(this) runs; subsequent debug().peerStateChanged(...) etc. iterate an empty debugger list and are no-ops.
  • No peer-connect path in GameSession or PeerIceModule waits on or queries telemetry state.

Fix

Run the connect on a virtual thread so startupComplete() returns immediately. The pre-existing reconnect loop in sendingLoop() already handles transient drops; queued telemetry messages catch up once the socket opens because messageQueue is unbounded.

Two small additional cleanups in the same diff:

  • The catch (InterruptedException e) block was missing a return, so an interrupted startup fell through to sendMessage(new RegisterAsPeer(...)), queueing telemetry on a never-opening socket.
  • Re-assert the interrupt flag with Thread.currentThread().interrupt() per Java best practice.

Verification

Verified locally on a JDK 21 / Gradle 8.8 build:

  1. Reproduced the bug with a JUnit 5 test against an in-process "silent" TCP server that accepts the handshake but never replies. Without the fix, the test hangs until JUnit's @Timeout(10s) kills it.
  2. With the fix, the same test passes in well under 100ms.
  3. :ice-adapter:check (compile + spotlessCheck) is green.

The JUnit test isn't part of this PR — ice-adapter doesn't have a JUnit setup yet, and adding one alongside a one-file fix felt like scope creep. Happy to send a follow-up PR introducing JUnit + the regression test if maintainers want it.

Fixes #42

Co-authored with Claude (Anthropic); reviewed and locally verified by me.

Summary by CodeRabbit

  • Refactor
    • Improved telemetry debugger startup performance by handling connections asynchronously, preventing blocking operations during initialization.
    • Enhanced error handling and logging for telemetry connection failures.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

Warning

Rate limit exceeded

@juerghegglin has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 56 minutes and 13 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 14ed670c-f5eb-47ff-a3f3-e9ed3059dd86

📥 Commits

Reviewing files that changed from the base of the PR and between d49f9fe and ca3282c.

📒 Files selected for processing (1)
  • ice-adapter/src/main/java/com/faforever/iceadapter/debug/TelemetryDebugger.java
📝 Walkthrough

Walkthrough

The TelemetryDebugger.startupComplete() method is refactored to connect asynchronously to the telemetry websocket via a virtual thread, decoupling telemetry availability from IceAdapter startup. Connection failures or interruptions remove the debugger cleanly.

Changes

Asynchronous Telemetry Connection

Layer / File(s) Summary
Control Flow Refactor
ice-adapter/src/main/java/com/faforever/iceadapter/debug/TelemetryDebugger.java
startupComplete() now spawns a virtual thread ("telemetry-connect") to perform the blocking websocket connection asynchronously instead of blocking the caller. Thread completion triggers RegisterAsPeer message on success, or removes the debugger on interruption with appropriate logging.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Poem

🐰 A thread once blocked the startup dance,
But virtual threads gave it a chance!
Now telemetry waits in async bliss,
While adapters connect—no service they'll miss! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main change: preventing synchronous blocking of IceAdapter.start() during telemetry websocket connection.
Linked Issues check ✅ Passed The PR directly addresses issue #42 by decoupling telemetry from IceAdapter startup, running websocket connect asynchronously on a virtual thread to prevent blocking.
Out of Scope Changes check ✅ Passed All changes are narrowly scoped to TelemetryDebugger.startupComplete() and directly address the blocking issue; no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…AForever#42)

TelemetryDebugger.startupComplete() called websocketClient.connectBlocking()
synchronously on the IceAdapter startup thread, with no timeout. Telemetry
is a debug-only observability channel - its data flow is strictly outgoing
(see ice-adapter/src/main/java/com/faforever/iceadapter/telemetry/, all
messages implement OutgoingMessageV1) and no game logic depends on it being
connected - so it should never gate peer connectivity.

When the telemetry server is silently unreachable (TCP layer up but app
layer hung - for example an alive load balancer fronting a dead backend),
the WebSocket Upgrade handshake never receives an HTTP 101 response.
connectBlocking() then waits until TCP keepalive eventually kills the
socket, about two hours later. During that whole window IceAdapter.start()
is blocked, the FAF client's adapter-ready orchestration trips its own
timeouts, and the user-visible symptom is "the ice adapter doesn't connect
to other players".

Fix: run the connect on a virtual thread so startup returns immediately.
The pre-existing reconnect loop in sendingLoop() already handles transient
drops, and queued messages catch up once the socket opens because the
messageQueue is unbounded.

Also adds the missing return after the InterruptedException catch
(previously fell through to sendMessage on an interrupted thread) and
re-asserts the interrupt flag, per Java best practice.

Fixes FAForever#42
@juerghegglin
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ice adapter doesn't connect if telemetry server is down

1 participant