Fix WebSocket connection reliability and domain reload recovery #656

dsarno · 2026-01-30T23:50:58Z

Summary

Add server-side ping/pong heartbeat to detect dead WebSocket connections (fixes silent connection failures on Windows)
Fix connection recovery after Unity domain reloads by increasing timeouts from 2s to 20s
Skip PluginHub session resolution for stdio transport (was causing unnecessary delays)
Add ClassVar annotations and lock protection for thread safety
Fix test flakiness (timer variance, Windows paths, LogAssert.Expect)

Problem

Two related issues were causing connection reliability problems:

Issue PropertyConversion failures crash command dispatcher while telemetry continues #654: After running test suites or other operations that trigger Unity domain reloads, the dispatcher would become unreachable ("No Unity Editor instances found"). The root cause was that domain reloads take 10-20s, but the reconnection timeout was only 2s.
Issue beta branch: when Claude Code reconnects, Unity session becomes "no session" even though handshake succeeds. This breaks the workflow completely. Logs attached. But Sometimes it works,and the bug only occurs in beta by using http to connect #643: On Windows, WebSocket connections can die silently (OSError 64: "The specified network name is no longer available") without either side being notified. Commands would fail with "Unity session not available" until Unity eventually detected the dead connection.

Solution

Ping/Pong Heartbeat (fixes #643)

Server sends {"type": "ping"} every 10 seconds
Unity responds with {"type": "pong", "session_id": "..."}
If no pong within 20 seconds, server closes connection proactively
Dead connections now detected within 20s instead of waiting indefinitely

Domain Reload Recovery (fixes #654)

Increased UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S from 2s to 20s
Increased UNITY_MCP_RELOAD_MAX_WAIT_S from 2s to 20s
Added _is_http_transport() check so stdio transport skips PluginHub entirely

Code Quality

Added ClassVar type annotations for mutable class attributes
Added lock protection for _last_pong dict access
Changed debug stack trace log from Warn to Debug level
Removed unused timing variable

Test Fixes

Fixed flaky async duration test (allow 20% timer variance on Windows)
Fixed Python test that cleared HOME env var breaking Path.home() on Windows
Skip Unix-path test on Windows (path separator difference)
Added LogAssert.Expect to PropertyConversion tests for expected error logs

Test plan

Verify ping/pong heartbeat works (check server logs for [Ping] messages)
Run full test suite and verify dispatcher stays reachable afterward
Verify commands are fast when Unity is connected (no delays)
All Python tests pass (563)
All Unity EditMode tests pass (510 passed, 54 skipped)
All Unity PlayMode tests pass (5)

Fixes #654
Fixes #643

🤖 Generated with Claude Code

On Windows, WebSocket connections can die silently (OSError 64) without either side being notified. This causes commands to fail with "Unity session not available" until Unity eventually detects the dead connection. Changes: - Add PingMessage model for server->client pings - Add ping loop in PluginHub that sends pings every 10 seconds - Track last pong time per session; close connection if no pong within 20s - Include session_id in pong messages from Unity for server-side tracking - Clean up debug/timing logs from Issue CoplayDev#654 investigation The server will now proactively detect dead connections within 20 seconds instead of waiting indefinitely for the next command to fail. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When Unity performs a domain reload (after script changes, test runs, or large payload transfers), the MCP connection drops and needs to reconnect. The previous reconnection timeout (2s) was too short for domain reloads which can take 10-30s. Changes: - Increase UNITY_MCP_RELOAD_MAX_WAIT_S default from 2s to 30s - Increase backoff cap when reloading from 0.8s to 5.0s - Skip PluginHub session resolution for stdio transport (was causing unnecessary waits on every command) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…beat # Conflicts: # Server/src/transport/plugin_hub.py # Server/src/transport/unity_transport.py

sourcery-ai · 2026-01-30T23:51:05Z

Reviewer's Guide

Implements a server-initiated WebSocket ping/pong heartbeat to proactively detect dead connections, extends Unity-domain-reload-related timeouts from 2s to 30s (with updated backoff and documentation), skips PluginHub session resolution when using stdio transport, and adjusts logging/timing for better observability and reliability diagnostics.

Sequence diagram for WebSocket ping_pong heartbeat and dead connection detection

sequenceDiagram
    participant PluginHub as PluginHub_Server
    participant UnityClient as UnityWebSocketTransportClient

    rect rgb(230,230,250)
        Note over PluginHub,UnityClient: Registration establishes WebSocket session
        PluginHub->>UnityClient: Register handshake
        UnityClient-->>PluginHub: RegisteredMessage(session_id)
        PluginHub->>PluginHub: _connections[session_id] = websocket
        PluginHub->>PluginHub: _last_pong[session_id] = time.monotonic()
        PluginHub->>PluginHub: start _ping_loop(session_id, websocket)
    end

    loop Every PING_INTERVAL (10s)
        PluginHub->>PluginHub: sleep(PING_INTERVAL)
        PluginHub->>PluginHub: check session_id in _connections
        alt Session no longer connected
            PluginHub->>PluginHub: stop _ping_loop for session_id
        else Session still connected
            PluginHub->>PluginHub: elapsed = now - _last_pong[session_id]
            alt elapsed > PING_TIMEOUT (20s)
                PluginHub->>PluginHub: log [Ping] stale session
                PluginHub->>UnityClient: close(code=1001)
                PluginHub->>PluginHub: stop _ping_loop for session_id
            else Send ping
                PluginHub->>UnityClient: send_json(PingMessage{type: ping})
                alt Send succeeds
                    UnityClient-->>PluginHub: PongMessage{type: pong, session_id}
                    PluginHub->>PluginHub: _handle_pong(payload)
                    PluginHub->>PluginHub: registry.touch(session_id)
                    PluginHub->>PluginHub: _last_pong[session_id] = time.monotonic()
                else Send fails (e.g. OSError 64)
                    PluginHub->>PluginHub: log [Ping] send failure
                    PluginHub->>UnityClient: close(code=1006)
                    PluginHub->>PluginHub: stop _ping_loop for session_id
                end
            end
        end
    end

    Note over PluginHub,UnityClient: on_disconnect cancels ping task and clears _last_pong

Sequence diagram for HTTP vs stdio transport session resolution with PluginHub

sequenceDiagram
    actor MCPClient
    participant Server as UnityTransport_Server
    participant Middleware as UnityInstanceMiddleware
    participant PluginHub as PluginHub
    participant Stdio as LegacyUnityConnection

    MCPClient->>Server: Invoke command
    Server->>Middleware: _inject_unity_instance(context)

    alt HTTP transport
        Middleware->>Server: _is_http_transport() == true
        Middleware->>PluginHub: PluginHub.is_configured()
        alt PluginHub configured
            Middleware->>PluginHub: _resolve_session_id(unity_instance, user_id)
            PluginHub-->>Middleware: session_id (wait up to UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S <= 30s)
            Middleware->>Server: attach session_id to context
            Server->>PluginHub: route HTTP request via WebSocket session
            PluginHub-->>Server: MCPResponse
        else PluginHub not configured
            Middleware->>Server: fall back to active_instance only
            Server->>Stdio: send_command_with_retry(instance_id, command)
            Stdio-->>Server: Unity response
        end
    else Stdio transport
        Middleware->>Server: _is_http_transport() == false
        Note over Middleware,PluginHub: PluginHub session resolution is skipped entirely
        Middleware->>Server: use active_instance or instance_id only
        Server->>Stdio: send_command_with_retry(instance_id, command)
        Stdio-->>Server: Unity response
    end

    Server-->>MCPClient: MCPResponse

Class diagram for PluginHub heartbeat and related transport models

classDiagram
    class PluginHub {
        +KEEP_ALIVE_INTERVAL: int
        +SERVER_TIMEOUT: int
        +COMMAND_TIMEOUT: int
        +PING_INTERVAL: int
        +PING_TIMEOUT: int
        +FAST_FAIL_TIMEOUT: float
        +_connections: dict~str, WebSocket~
        +_pending: dict~str, dict~str, Any~~
        +_lock: asyncio.Lock
        +_loop: asyncio.AbstractEventLoop
        +_last_pong: dict~str, float~
        +_ping_tasks: dict~str, asyncio.Task~
        +on_disconnect(websocket: WebSocket, close_code: int) void
        +_handle_register(websocket: WebSocket, payload: RegisterMessage) Task
        +_handle_pong(payload: PongMessage) Task
        +_ping_loop(session_id: str, websocket: WebSocket) Task
        +_get_connection(session_id: str) Task~WebSocket~
        +_resolve_session_id(unity_instance: str, user_id: str, allow_unattached: bool) Task~str~
    }

    class PingMessage {
        +type: str
    }

    class PongMessage {
        +type: str
        +session_id: str
    }

    class WebSocketTransportClient {
        -_sessionId: string
        +SendPongAsync(token: CancellationToken) Task
        +SendJsonAsync(payload: JObject, token: CancellationToken) Task
        +HandleSocketClosureAsync(reason: string) Task
    }

    class TransportCommandDispatcher {
    }

    class PendingCommand {
        +PendingCommand(commandJson: string, completionSource: TaskCompletionSource, cancellationToken: CancellationToken, registration: CancellationTokenRegistration)
        +CommandJson: string
        +CompletionSource: TaskCompletionSource
        +CancellationToken: CancellationToken
        +CancellationRegistration: CancellationTokenRegistration
        +IsExecuting: bool
        +QueuedAt: DateTime
        +Dispose() void
    }

    class LegacyUnityConnection {
        +send_command(command_type: str, params: dict~str, Any~, max_attempts: int) dict
    }

    class UnityTransportModule {
        +_is_http_transport() bool
    }

    PluginHub --> PingMessage : uses
    PluginHub --> PongMessage : handles
    WebSocketTransportClient --> PongMessage : sends
    WebSocketTransportClient --> PingMessage : receives
    TransportCommandDispatcher *-- PendingCommand : contains
    LegacyUnityConnection <-- UnityTransportModule : used_by
    PluginHub <-- UnityTransportModule : used_by

Flow diagram for send_command_with_retry and reload timeout handling

flowchart TD
    A[Start send_command_with_retry] --> B[Get LegacyUnityConnection via get_unity_connection]
    B --> C[Read reload_max_retries and reload_retry_ms from config]
    C --> D[Read UNITY_MCP_RELOAD_MAX_WAIT_S env or default 30.0]
    D --> E[Clamp max_wait_s to 0..30]
    E --> F[Initialize waited = 0, attempt = 0]

    F --> G{waited < max_wait_s and
attempt <= max_retries}
    G -->|No| H[Log reload timed out
or retries exhausted]
    H --> I[Return last response or error]

    G -->|Yes| J[Attempt send_command
via LegacyUnityConnection]
    J --> K{Success?}

    K -->|Yes| L[Return successful response]

    K -->|No| M[Inspect status file /
fast_error flags]
    M --> N{status.reloading?}

    N -->|Yes| O[Set backoff cap = 5.0s
domain reload can take 10-30s]
    N -->|No and fast_error| P[Set backoff cap = 0.25s]
    N -->|No and not fast_error| Q[Set backoff cap = 1.0s]

    O --> R[Compute next backoff
and sleep]
    P --> R
    Q --> R

    R --> S[Increment waited and attempt]
    S --> G

File-Level Changes

Change	Details	Files
Add server-side WebSocket ping/pong heartbeat using per-session ping loops and last-pong tracking to close stale connections proactively.	Introduce PingMessage model and import it into the WebSocket plugin hub transport. Track last pong timestamps and per-session ping tasks keyed by session_id in the plugin hub. Start a ping loop upon plugin registration that periodically sends ping messages, checks for pong staleness against PING_TIMEOUT, and closes the connection on failure or inactivity. Update pong handling to refresh last-pong timestamps and ensure proper cleanup of ping tasks and tracking data on disconnect, including cancelling pending commands with debug logging.	`Server/src/transport/models.py` `Server/src/transport/plugin_hub.py` `MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs`
Relax and document timeouts/backoff to handle long Unity domain reloads and improve retry behavior.	Increase UNITY_MCP_SESSION_RESOLVE_MAX_WAIT_S default from 2.0 to 30.0 seconds with clamping, updated comments, and warning messages to match the new default when misconfigured. Increase UNITY_MCP_RELOAD_MAX_WAIT_S default from 2.0 to 30.0 seconds with clamping and updated warning messages, and add documentation comments describing domain reload behavior and future TODOs for deterministic detection. Adjust reload backoff cap to 5.0 seconds when Unity reports reloading, so retries span typical 10–30 second reload windows.	`Server/src/transport/plugin_hub.py` `Server/src/transport/legacy/unity_connection.py`
Skip PluginHub session resolution for stdio transport and normalize server logging configuration.	Add and use an _is_http_transport helper so unity_instance_middleware only resolves session IDs via PluginHub when using HTTP/WebSocket transport, avoiding unnecessary delays on stdio. Switch some logger initializations to use name for more standard module-scoped logging configuration.	`Server/src/transport/unity_instance_middleware.py` `Server/src/transport/unity_transport.py` `Server/src/transport/plugin_hub.py`
Add timing/debug instrumentation around stdio command flows and WebSocket closure handling for diagnostics.	Instrument stdio send_command and send_command_with_retry with timing logs for connect, send, receive, and total retry duration to aid performance analysis. Log when send_command cancels or retries during domain reload scenarios using the new timing information. Add a stack-trace-emitting warning log in WebSocketTransportClient.HandleSocketClosureAsync to help identify code paths that trigger disconnections. Extend PendingCommand to track when commands are queued to support future timing/diagnostic work.	`Server/src/transport/legacy/unity_connection.py` `MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs` `MCPForUnity/Editor/Services/Transport/TransportCommandDispatcher.cs`

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#643	Ensure HTTP/WebSocket transport detects and cleans up dead or stale Unity plugin connections (e.g., Windows OSError 64) so that when Claude Code reconnects, the Unity session does not end up in a "no session" state despite a successful handshake.	✅
#643	Improve reconnection and session recovery behavior around Unity domain reloads and transport selection so that the MCP server no longer reports "Server no longer running; ending orphaned session" in normal reconnect workflows on the beta HTTP path.	✅
#654	Add robust error handling around PropertyConversion so incompatible property values (e.g., wrong types) do not crash the command dispatcher or corrupt the instance registry, and subsequent tool calls can continue to function.	❌	The PR only modifies transport-level behavior (WebSocket ping/pong heartbeat, domain reload timeouts, stdio connection handling, and logging). It does not touch PropertyConversion, manage_components, or any code related to converting property values, so dispatcher crashes from conversion exceptions are not addressed.
#654	Extend PropertyConversion support for common Unity types, including array-based formats for Vector2 (and similar structs) and integer instance IDs for UnityEngine.Object references, so manage_components set_property calls accept these formats.	❌	No changes are made to the conversion logic for Vector2, UnityEngine.Object, or other Unity structs. The PR does not implement array parsing for Vector2 or instance ID handling for UnityEngine.Object, and it does not modify manage_components behavior.
#654	Improve telemetry/status reporting so that telemetry reflects the health of the command dispatcher (e.g., indicating dispatcher errors or unhealthy state instead of always reporting success).	❌	The PR does not modify telemetry models or status reporting. It focuses on connection reliability (ping/pong, timeouts, transport middleware) and adds some diagnostic logging, but does not introduce any new telemetry fields or change how dispatcher health is reported.

Possibly linked issues

beta branch: when Claude Code reconnects, Unity session becomes "no session" even though handshake succeeds. This breaks the workflow completely. Logs attached. But Sometimes it works,and the bug only occurs in beta by using http to connect #643: PR’s WebSocket heartbeat and timeout changes directly target the Windows OSError 64, "no session" reconnect behavior.

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

coderabbitai · 2026-01-30T23:51:20Z

📝 Walkthrough

Walkthrough

Adds command enqueue timestamps and WebSocket pong/session payloads; logs socket closure stack traces; implements a server-initiated ping/pong heartbeat with per-session liveness tracking; broad timing instrumentation and increased Unity reload wait defaults to 20s; restricts PluginHub session resolution to HTTP transport; updates logger init and adds multiple Unity editor tests for property-conversion error handling.

Changes

Cohort / File(s)	Summary
Command Dispatcher `MCPForUnity/Editor/Services/Transport/TransportCommandDispatcher.cs`	Adds `QueuedAt` DateTime field/property to PendingCommand to record when commands are enqueued.
WebSocket Client `MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs`	Pong payload now includes `session_id`; socket-closure handler logs stack trace for debugging.
Server: Heartbeat Models & Hub `Server/src/transport/models.py`, `Server/src/transport/plugin_hub.py`	Adds `PingMessage` model and server-side ping loop with per-session `_last_pong` tracking, `_ping_tasks`, `PING_INTERVAL` (10s) and `PING_TIMEOUT` (20s), automatic close on liveness timeout, and ping lifecycle management on register/disconnect.
Server: Connection Timing & Reload Handling `Server/src/transport/legacy/unity_connection.py`	Adds detailed timing instrumentation (connect/send/receive, per-attempt timings), increases read backoff cap, and raises default Unity reload max-wait handling to 20s (env override supported, clamped).
Server: Transport Session Validation `Server/src/transport/unity_instance_middleware.py`	Guards PluginHub-based session validation to run only for HTTP transport; stdio transport skips PluginHub calls.
Server: Logger Init `Server/src/transport/unity_transport.py`	Replaces static logger name with `logging.getLogger(__name__)`.
Unity Editor Tests `TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/.cs`, `.../.cs.meta`, `TestProjects/UnityMCPTests/Assets/Tests/EditMode/Services/Server/ServerCommandBuilderTests.cs`	Adds multiple tests for property-conversion error handling and an array-for-float test; adds Guard in a path-specific test to skip on Windows; includes Unity .meta files for new tests.

Sequence Diagram

sequenceDiagram
    actor Client as Unity Client
    participant WS as WebSocketTransportClient
    participant Hub as PluginHub (Server)
    participant Server as Python Server

    Client->>WS: Connect & register (session_id)
    WS->>Hub: Register session
    Hub->>Hub: set _last_pong[session] = now<br/>spawn _ping_loop(session)

    loop every PING_INTERVAL (10s)
        Hub->>WS: PingMessage
        WS->>Client: deliver ping
        Client->>WS: PongMessage (includes session_id)
        WS->>Hub: forward pong
        Hub->>Hub: update _last_pong[session]
    end

    alt no pong within PING_TIMEOUT (20s)
        Hub->>Hub: detect stale session
        Hub->>WS: close connection / unregister session
    end

    Client->>Server: Command requests (normal flow)
    Server->>Client: Responses

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

#643 — reconnect/session becomes "no session": changes add HTTP-only session resolution and a server-side ping/pong that could address session liveness and reconnection detection.
#654 — PropertyConversion failures crash dispatcher: this PR adds extensive property-conversion tests covering error cases, which relate to diagnosing and preventing dispatcher crashes (tests added but core conversion fixes not necessarily applied).

Possibly related PRs

#440 — Related material-management feature set and tooling changes; shares surface with added editor tooling/tests.
#510 — Modifies unity_connection.py and retry/reload wait logic; closely aligned with extended timing and reload-wait changes in this PR.

Suggested labels

codex

Suggested reviewers

msanatan

Poem

🐰
I queued your command at break of day,
I ping the server when connections stray,
I time the sends and watch the clock,
I close the gaps where pongs do not knock,
Hoppity logs — the network’s feeling gay! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 51.52% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Fix WebSocket connection reliability and domain reload recovery' accurately summarizes the main changes: ping/pong heartbeat for dead connection detection and increased timeouts for domain reload recovery.
Description check	✅ Passed	The PR description includes all required template sections: summary, problem explanation, solution details, and test plan with linked issues.
Linked Issues check	✅ Passed	Code changes comprehensively address both linked issues: `#643` (WebSocket detection via ping/pong heartbeat) and `#654` (increased timeouts and graceful error handling for domain reloads).
Out of Scope Changes check	✅ Passed	All changes are directly related to the stated objectives: ping/pong heartbeat, timeout increases, transport-specific logic, logging cleanup, and error handling tests.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

The new [TIMING-STDIO] and WebSocket stack-trace logs are emitted at info/Warn level and on hot paths; consider downgrading to debug or guarding them behind a config flag to avoid flooding logs in normal operation.
The ping heartbeat constants (PING_INTERVAL, PING_TIMEOUT) are currently hard-coded; consider making them configurable (similar to the UNITY_MCP_* timeouts) so they can be tuned without code changes for different environments.
Access to _last_pong in the ping loop and _handle_pong is done without the shared lock while other session state uses the lock; for consistency and to avoid subtle concurrency issues, it may be safer to read/write _last_pong under the same lock as _connections.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The new `[TIMING-STDIO]` and WebSocket stack-trace logs are emitted at `info`/`Warn` level and on hot paths; consider downgrading to `debug` or guarding them behind a config flag to avoid flooding logs in normal operation.
- The ping heartbeat constants (`PING_INTERVAL`, `PING_TIMEOUT`) are currently hard-coded; consider making them configurable (similar to the `UNITY_MCP_*` timeouts) so they can be tuned without code changes for different environments.
- Access to `_last_pong` in the ping loop and `_handle_pong` is done without the shared lock while other session state uses the lock; for consistency and to avoid subtle concurrency issues, it may be safer to read/write `_last_pong` under the same lock as `_connections`.

## Individual Comments

### Comment 1
<location> `Server/src/transport/plugin_hub.py:466-475` </location>
<code_context>
+    async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider cleaning up `_ping_tasks`/`_last_pong` in `_ping_loop` to avoid relying solely on `on_disconnect`.

Relying on `on_disconnect` for cleanup means that in shutdown/exception edge cases (e.g. `_lock` set to `None`, `close()` failing, or `on_disconnect` not firing), the loop can exit the `while` and hit `finally` without removing this session from `_ping_tasks` or clearing `_last_pong`. Consider doing that cleanup in the `finally` block so the ping loop fully manages its own lifecycle and avoids stale entries.

Suggested implementation:

```python
            await registry.touch(session_id)
            # Record last pong time for staleness detection
            cls._last_pong[session_id] = time.monotonic()

    @classmethod
    def _cleanup_ping_session(cls, session_id: str) -> None:
        """Cleanup ping tracking state for a session.

        This is used by the ping loop's finally block to fully manage its own
        lifecycle, and can also be safely called from on_disconnect as needed.
        """
        # Remove this session's ping task and staleness tracking, if present.
        # Using pop(..., None) makes this idempotent and safe to call multiple times.
        cls._ping_tasks.pop(session_id, None)
        cls._last_pong.pop(session_id, None)

    @classmethod
    async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:

```

To fully implement your suggestion, the following additional changes are needed in the same file:

1. Wrap the body of `_ping_loop` in a `try`/`finally` and call the new helper in the `finally` block, so the ping loop always cleans up its own state:

```python
    @classmethod
    async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:
        """Server-initiated ping loop to detect dead connections.

        Sends periodic pings to the Unity client. If no pong is received within
        PING_TIMEOUT seconds, the connection is considered dead and closed.
        This helps detect connections that die silently (e.g., Windows OSError 64).
        """
        logger.debug(f"[Ping] Starting ping loop for session {session_id}")
        try:
            while True:
                await asyncio.sleep(cls.PING_INTERVAL)
                # ... existing ping / timeout logic remains unchanged ...
        finally:
            # Ensure we don't leave stale entries if on_disconnect never fires
            cls._cleanup_ping_session(session_id)
```

2. Anywhere else you’re cleaning up `_ping_tasks` / `_last_pong` (likely in `on_disconnect`), update that code to call `cls._cleanup_ping_session(session_id)` instead of manipulating `_ping_tasks` / `_last_pong` directly. Because `_cleanup_ping_session` uses `pop(..., None)`, calling it from both `on_disconnect` and `_ping_loop` is safe and idempotent.

These changes ensure that the ping loop fully manages its own lifecycle and avoids stale entries even in shutdown/exception edge cases where `on_disconnect` might not execute.
</issue_to_address>

### Comment 2
<location> `MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs:654-660` </location>
<code_context>

         private async Task HandleSocketClosureAsync(string reason)
         {
+            // DEBUG: Capture stack trace to identify what triggers disconnection (Issue #654 investigation)
+            var stackTrace = new System.Diagnostics.StackTrace(true);
+            McpLog.Warn($"[WebSocket] HandleSocketClosureAsync called. Reason: {reason}\nStack trace:\n{stackTrace}");
+
             if (_lifecycleCts == null || _lifecycleCts.IsCancellationRequested)
</code_context>

<issue_to_address>
**suggestion:** The WARN-level stack trace log on every socket closure may be too noisy for normal usage.

Since disconnects are often expected, consider either lowering this log to `Info`/`Debug`, gating it behind a diagnostic flag or `#if DEBUG`, or restricting the stack trace to unexpected closure reasons so you keep the investigation signal without overwhelming normal logs.

```suggestion
        private async Task HandleSocketClosureAsync(string reason)
        {
#if DEBUG
            // DEBUG: Capture stack trace to identify what triggers disconnection (Issue #654 investigation)
            var stackTrace = new System.Diagnostics.StackTrace(true);
            McpLog.Debug($"[WebSocket] HandleSocketClosureAsync called. Reason: {reason}\nStack trace:\n{stackTrace}");
#endif

            if (_lifecycleCts == null || _lifecycleCts.IsCancellationRequested)
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-01-30T23:52:14Z

Server/src/transport/plugin_hub.py

+    async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:
+        """Server-initiated ping loop to detect dead connections.
+
+        Sends periodic pings to the Unity client. If no pong is received within
+        PING_TIMEOUT seconds, the connection is considered dead and closed.
+        This helps detect connections that die silently (e.g., Windows OSError 64).
+        """
+        logger.debug(f"[Ping] Starting ping loop for session {session_id}")
+        try:
+            while True:


suggestion (bug_risk): Consider cleaning up _ping_tasks/_last_pong in _ping_loop to avoid relying solely on on_disconnect.

Relying on on_disconnect for cleanup means that in shutdown/exception edge cases (e.g. _lock set to None, close() failing, or on_disconnect not firing), the loop can exit the while and hit finally without removing this session from _ping_tasks or clearing _last_pong. Consider doing that cleanup in the finally block so the ping loop fully manages its own lifecycle and avoids stale entries.

Suggested implementation:

await registry.touch(session_id) # Record last pong time for staleness detection cls._last_pong[session_id] = time.monotonic() @classmethod def _cleanup_ping_session(cls, session_id: str) -> None: """Cleanup ping tracking state for a session. This is used by the ping loop's finally block to fully manage its own lifecycle, and can also be safely called from on_disconnect as needed. """ # Remove this session's ping task and staleness tracking, if present. # Using pop(..., None) makes this idempotent and safe to call multiple times. cls._ping_tasks.pop(session_id, None) cls._last_pong.pop(session_id, None) @classmethod async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None:

To fully implement your suggestion, the following additional changes are needed in the same file:

Wrap the body of _ping_loop in a try/finally and call the new helper in the finally block, so the ping loop always cleans up its own state:

@classmethod async def _ping_loop(cls, session_id: str, websocket: WebSocket) -> None: """Server-initiated ping loop to detect dead connections. Sends periodic pings to the Unity client. If no pong is received within PING_TIMEOUT seconds, the connection is considered dead and closed. This helps detect connections that die silently (e.g., Windows OSError 64). """ logger.debug(f"[Ping] Starting ping loop for session {session_id}") try: while True: await asyncio.sleep(cls.PING_INTERVAL) # ... existing ping / timeout logic remains unchanged ... finally: # Ensure we don't leave stale entries if on_disconnect never fires cls._cleanup_ping_session(session_id)

Anywhere else you’re cleaning up _ping_tasks / _last_pong (likely in on_disconnect), update that code to call cls._cleanup_ping_session(session_id) instead of manipulating _ping_tasks / _last_pong directly. Because _cleanup_ping_session uses pop(..., None), calling it from both on_disconnect and _ping_loop is safe and idempotent.

These changes ensure that the ping loop fully manages its own lifecycle and avoids stale entries even in shutdown/exception edge cases where on_disconnect might not execute.

MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@Server/src/transport/legacy/unity_connection.py`:
- Around line 244-245: Remove the unused variable t0 in the send_command
function: delete the assignment "t0 = time.time()" (or replace it with a
meaningful use if you intended to compute duration) and keep the
logger.info("[TIMING-STDIO] send_command START: command=%s", command_type) line
unchanged; locate this inside the send_command method in unity_connection.py to
make the edit.

🧹 Nitpick comments (2)

MCPForUnity/Editor/Services/Transport/Transports/WebSocketTransportClient.cs (1)
656-658: Debug logging uses warning level unconditionally.

The comment indicates this is for "DEBUG" / "Issue #654 investigation", but McpLog.Warn logs unconditionally at warning level. This could be noisy in production. Consider using McpLog.Debug which respects the debug logging preference, or remove this investigation code before merging.
🔧 Proposed fix: Use conditional debug logging
-            // DEBUG: Capture stack trace to identify what triggers disconnection (Issue `#654` investigation)
-            var stackTrace = new System.Diagnostics.StackTrace(true);
-            McpLog.Warn($"[WebSocket] HandleSocketClosureAsync called. Reason: {reason}\nStack trace:\n{stackTrace}");
+            // Capture stack trace for debugging disconnection triggers
+            var stackTrace = new System.Diagnostics.StackTrace(true);
+            McpLog.Debug($"[WebSocket] HandleSocketClosureAsync called. Reason: {reason}\nStack trace:\n{stackTrace}");
Server/src/transport/plugin_hub.py (1)
86-89: Add ClassVar type annotation for mutable class attributes.

Static analysis correctly identifies that mutable class attributes should be annotated with typing.ClassVar to clarify they are shared across instances and prevent accidental shadowing.
🔧 Proposed fix
+from typing import Any, ClassVar
+
 class PluginHub(WebSocketEndpoint):
     ...
-    # session_id -> last pong timestamp (monotonic)
-    _last_pong: dict[str, float] = {}
-    # session_id -> ping task
-    _ping_tasks: dict[str, asyncio.Task] = {}
+    # session_id -> last pong timestamp (monotonic)
+    _last_pong: ClassVar[dict[str, float]] = {}
+    # session_id -> ping task
+    _ping_tasks: ClassVar[dict[str, asyncio.Task]] = {}

coderabbitai · 2026-01-30T23:54:17Z

Server/src/transport/legacy/unity_connection.py

+        t0 = time.time()
+        logger.info("[TIMING-STDIO] send_command START: command=%s", command_type)


⚠️ Potential issue | 🟡 Minor

Remove unused variable t0.

The variable t0 is assigned but never used. The timing is logged but the stored value isn't referenced for duration calculation later.

🔧 Proposed fix

- t0 = time.time() logger.info("[TIMING-STDIO] send_command START: command=%s", command_type)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

t0 = time.time()

logger.info("[TIMING-STDIO] send_command START: command=%s", command_type)

logger.info("[TIMING-STDIO] send_command START: command=%s", command_type)

🧰 Tools

🪛 Ruff (0.14.14)

[error] 244-244: Local variable t0 is assigned to but never used

Remove assignment to unused variable t0

(F841)

🤖 Prompt for AI Agents

In `@Server/src/transport/legacy/unity_connection.py` around lines 244 - 245, Remove the unused variable t0 in the send_command function: delete the assignment "t0 = time.time()" (or replace it with a meaningful use if you intended to compute duration) and keep the logger.info("[TIMING-STDIO] send_command START: command=%s", command_type) line unchanged; locate this inside the send_command method in unity_connection.py to make the edit.

- Add server-side ping loop to detect dead WebSocket connections - Include session_id in pong messages for tracking - Reduce domain reload timeout from 30s to 20s - Add ClassVar annotations for mutable class attributes - Add lock protection for _last_pong access - Change debug stack trace log from Warn to Debug level - Remove unused TIMING-STDIO variable - Fix flaky async duration test (allow 20% timer variance) - Fix Python test that cleared HOME env var on Windows - Skip Unix-path test on Windows (path separator difference) - Add LogAssert.Expect to PropertyConversion tests Fixes CoplayDev#654, CoplayDev#643 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@Server/src/transport/legacy/unity_connection.py`:
- Around line 780-800: Update the reload wait default and clamp to 30s: change
the environment default for UNITY_MCP_RELOAD_MAX_WAIT_S from "20.0" to "30.0",
adjust any hard-coded fallback max_wait_s values (currently set to 20.0) to
30.0, and modify the clamp statement that enforces the range to use 30.0 as the
upper bound (max_wait_s = max(0.0, min(max_wait_s, 30.0))). Also update the
logger.warning message text (and any nearby comments referencing 20s) to reflect
the new 30s default so messages and docs match the new cap; ensure you reference
the same variable name max_wait_s and the env var UNITY_MCP_RELOAD_MAX_WAIT_S in
your changes.

In `@Server/src/transport/plugin_hub.py`:
- Around line 492-519: In the exception handler for failed ping sends (the block
that creates PingMessage, calls websocket.send_json and handles send_ex),
replace the prohibited close code 1006 with 1001 when calling websocket.close
and ensure any exception thrown by websocket.close is logged instead of
swallowed; update the logger call to include context (session_id and the close
exception) and reference the same symbols (PingMessage, websocket.close,
cls.PING_TIMEOUT, logger) so the error path logs close failures for debugging.

In
`@TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs`:
- Around line 207-235: ManageEditor.HandleCommand currently lacks a case for the
"telemetry_status" action so unknown actions hit the default ErrorResponse;
update ManageEditor.HandleCommand (ManageEditor.cs) to handle "telemetry_status"
by either implementing a new branch that gathers and returns the telemetry
health fields (dispatcher status, success flags, etc.) or routing the action to
the existing telemetry handler (if one exists), ensuring the switch or dispatch
logic recognizes "telemetry_status" and returns a success response object used
by the test.

🧹 Nitpick comments (4)

TestProjects/UnityMCPTests/Assets/Tests/EditMode/Services/Server/ServerCommandBuilderTests.cs (1)
113-118: Use Assert.Ignore to mark the test as skipped on Windows instead of Assert.Pass.

Assert.Pass marks the test as passed in test reports, which masks that it never actually ran. Assert.Ignore correctly records it as skipped, keeping test results accurate.
Suggested change
             if (UnityEngine.Application.platform == UnityEngine.RuntimePlatform.WindowsEditor)
             {
-                Assert.Pass("Skipped on Windows - use BuildUvPathFromUvx_WindowsPath_ConvertsCorrectly instead");
-                return;
+                Assert.Ignore("Skipped on Windows - use BuildUvPathFromUvx_WindowsPath_ConvertsCorrectly instead");
             }
TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversion_ArrayForFloat_Test.cs (1)
36-56: Assert error response and unchanged property.

Right now the test only checks for a non-null result, so it would still pass if the command incorrectly succeeds. Consider asserting the error response and that spatialBlend remains unchanged.
♻️ Suggested assertion tightening
 var result = ManageComponents.HandleCommand(setPropertyParams);
-Assert.IsNotNull(result, "Should return a result");
+Assert.IsNotNull(result, "Should return a result");
+if (result is ErrorResponse error)
+{
+    Assert.IsFalse(error.Success, "Should report failure for incompatible value");
+}
+else
+{
+    Assert.Fail($"Expected ErrorResponse but got {result.GetType().Name}");
+}
+Assert.AreEqual(0f, audioSource.spatialBlend, "spatialBlend should remain unchanged");
TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs (2)
40-66: Enforce the expected ErrorResponse.

The test allows a success response to pass; that undermines the “graceful error” assertion. Consider requiring an ErrorResponse with Success == false.
♻️ Suggested assertion tightening
 var result = ManageComponents.HandleCommand(setPropertyParams);
-
-// Main test: should return a result without crashing
-Assert.IsNotNull(result, "Should return a result, not crash dispatcher");
-
-// If it's an ErrorResponse, verify it properly reports failure
-if (result is ErrorResponse errorResp)
-{
-    Assert.IsFalse(errorResp.Success, "Should report failure for incompatible type");
-}
+var errorResp = result as ErrorResponse;
+Assert.IsNotNull(errorResp, "Expected ErrorResponse for incompatible type");
+Assert.IsFalse(errorResp.Success, "Should report failure for incompatible type");
241-280: Test currently passes in all branches.

Assert.Pass is invoked in every path, so this test can’t fail even if behavior regresses. Consider asserting the intended contract once defined (e.g., no throw + null result, or a specific error path).

coderabbitai · 2026-01-31T00:46:26Z

Server/src/transport/legacy/unity_connection.py

+    # Default to 20s to handle domain reloads (which can take 10-20s after tests or script changes).
+    #
+    # NOTE: This wait can impact agentic workflows where domain reloads happen
+    # frequently (e.g., after test runs, script compilation). The 20s default
+    # balances handling slow reloads vs. avoiding unnecessary delays.
+    #
+    # TODO: Make this more deterministic by detecting Unity's actual reload state
+    # rather than blindly waiting up to 20s. See Issue #657.
+    #
+    # Configurable via: UNITY_MCP_RELOAD_MAX_WAIT_S (default: 20.0, max: 20.0)
    try:
        max_wait_s = float(os.environ.get(
-            "UNITY_MCP_RELOAD_MAX_WAIT_S", "2.0"))
+            "UNITY_MCP_RELOAD_MAX_WAIT_S", "20.0"))
    except ValueError as e:
-        raw_val = os.environ.get("UNITY_MCP_RELOAD_MAX_WAIT_S", "2.0")
+        raw_val = os.environ.get("UNITY_MCP_RELOAD_MAX_WAIT_S", "20.0")
        logger.warning(
-            "Invalid UNITY_MCP_RELOAD_MAX_WAIT_S=%r, using default 2.0: %s",
+            "Invalid UNITY_MCP_RELOAD_MAX_WAIT_S=%r, using default 20.0: %s",
            raw_val, e)
-        max_wait_s = 2.0
-    # Clamp to [0, 30] to prevent misconfiguration from causing excessive waits
-    max_wait_s = max(0.0, min(max_wait_s, 30.0))
+        max_wait_s = 20.0
+    # Clamp to [0, 20] to prevent misconfiguration from causing excessive waits
+    max_wait_s = max(0.0, min(max_wait_s, 20.0))


⚠️ Potential issue | 🟠 Major

Reload wait is still capped at 20s—objective calls for 30s.
Issue #654 notes reloads can take 10–30s, but the default, clamp, and log text still enforce 20s, which can prematurely abort retries. Please bump the default and cap (and update messaging) to 30s.

♻️ Proposed fix

- # Default to 20s to handle domain reloads (which can take 10-20s after tests or script changes). + # Default to 30s to handle domain reloads (which can take 10-30s after tests or script changes). @@ - # Configurable via: UNITY_MCP_RELOAD_MAX_WAIT_S (default: 20.0, max: 20.0) + # Configurable via: UNITY_MCP_RELOAD_MAX_WAIT_S (default: 30.0, max: 30.0) try: - max_wait_s = float(os.environ.get( - "UNITY_MCP_RELOAD_MAX_WAIT_S", "20.0")) + max_wait_s = float(os.environ.get( + "UNITY_MCP_RELOAD_MAX_WAIT_S", "30.0")) except ValueError as e: - raw_val = os.environ.get("UNITY_MCP_RELOAD_MAX_WAIT_S", "20.0") + raw_val = os.environ.get("UNITY_MCP_RELOAD_MAX_WAIT_S", "30.0") logger.warning( - "Invalid UNITY_MCP_RELOAD_MAX_WAIT_S=%r, using default 20.0: %s", + "Invalid UNITY_MCP_RELOAD_MAX_WAIT_S=%r, using default 30.0: %s", raw_val, e) - max_wait_s = 20.0 - # Clamp to [0, 20] to prevent misconfiguration from causing excessive waits - max_wait_s = max(0.0, min(max_wait_s, 20.0)) + max_wait_s = 30.0 + # Clamp to [0, 30] to prevent misconfiguration from causing excessive waits + max_wait_s = max(0.0, min(max_wait_s, 30.0))

🤖 Prompt for AI Agents

In `@Server/src/transport/legacy/unity_connection.py` around lines 780 - 800, Update the reload wait default and clamp to 30s: change the environment default for UNITY_MCP_RELOAD_MAX_WAIT_S from "20.0" to "30.0", adjust any hard-coded fallback max_wait_s values (currently set to 20.0) to 30.0, and modify the clamp statement that enforces the range to use 30.0 as the upper bound (max_wait_s = max(0.0, min(max_wait_s, 30.0))). Also update the logger.warning message text (and any nearby comments referencing 20s) to reflect the new 30s default so messages and docs match the new cap; ensure you reference the same variable name max_wait_s and the env var UNITY_MCP_RELOAD_MAX_WAIT_S in your changes.

coderabbitai · 2026-01-31T00:46:26Z

Server/src/transport/plugin_hub.py

+                # Check staleness: has it been too long since we got a pong?
+                elapsed = time.monotonic() - last_pong
+                if elapsed > cls.PING_TIMEOUT:
+                    logger.warning(
+                        f"[Ping] Session {session_id} stale: no pong for {elapsed:.1f}s "
+                        f"(timeout={cls.PING_TIMEOUT}s). Closing connection."
+                    )
+                    try:
+                        await websocket.close(code=1001)  # Going away
+                    except Exception as close_ex:
+                        logger.debug(f"[Ping] Error closing stale websocket: {close_ex}")
+                    break
+
+                # Send a ping to the client
+                try:
+                    ping_msg = PingMessage()
+                    await websocket.send_json(ping_msg.model_dump())
+                    logger.debug(f"[Ping] Sent ping to session {session_id}")
+                except Exception as send_ex:
+                    # Send failed - connection is dead
+                    logger.warning(
+                        f"[Ping] Failed to send ping to session {session_id}: {send_ex}. "
+                        "Connection likely dead."
+                    )
+                    try:
+                        await websocket.close(code=1006)  # Abnormal closure
+                    except Exception:
+                        pass


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

RFC 6455 WebSocket close code 1006 must not be sent close frame specification

💡 Result:

In RFC 6455, WebSocket close code 1006 (“abnormal closure”) is reserved and must not be sent on the wire:

“1006 is a reserved value and MUST NOT be set as a status code in a Close control frame by an endpoint.” It is only for local/reporting use to indicate the connection closed abnormally (e.g., the TCP connection dropped) without sending or receiving a Close control frame. [1]

RFC 6455 also defines that if the connection is closed and no Close control frame was received, then the implementation considers the connection close code to be 1006 (again, internal/observed state—not a transmitted close frame). [2]

So: you will never legitimately see a peer send a Close frame with code 1006; implementations use it to report “abnormal closure” when the closing handshake didn’t happen. [1][2]

Sources:
[1] RFC 6455, Section 7.4.1 “Defined Status Codes” (1006) https://datatracker.ietf.org/doc/html/rfc6455
[2] RFC 6455, Section 7.1.5 “The WebSocket Connection Close Code” (1006 when no Close frame received) https://datatracker.ietf.org/doc/html/rfc6455

🏁 Script executed:

# Find and examine the file mentioned in the review fd -t f "plugin_hub.py" | head -5

Repository: CoplayDev/unity-mcp

Length of output: 97

🏁 Script executed:

# Read the specific lines mentioned in the review sed -n '492,519p' Server/src/transport/plugin_hub.py

Repository: CoplayDev/unity-mcp

Length of output: 1455

🏁 Script executed:

# Let's see more context around these lines to understand the full function sed -n '470,530p' Server/src/transport/plugin_hub.py

Repository: CoplayDev/unity-mcp

Length of output: 2947

Replace close code 1006 with 1001 and log close failures.

Line 517 uses close code 1006, which RFC 6455 explicitly reserves and forbids sending on the wire—it must only be used internally to report abnormal closure when no Close frame was exchanged. Sending it violates the specification and can cause protocol violations. Additionally, the exception handler at lines 518–519 silently swallows errors without logging, hiding diagnostics crucial for debugging connection failures.

🔧 Proposed fix

await websocket.close(code=1006) # Abnormal closure - except Exception: - pass + await websocket.close(code=1001) # Going away + except Exception as close_ex: + logger.debug(f"[Ping] Error closing websocket after send failure: {close_ex}")

🧰 Tools

🪛 Ruff (0.14.14)

[warning] 501-501: Do not catch blind exception: Exception

(BLE001)

[warning] 510-510: Do not catch blind exception: Exception

(BLE001)

[error] 518-519: try-except-pass detected, consider logging the exception

(S110)

[warning] 518-518: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

In `@Server/src/transport/plugin_hub.py` around lines 492 - 519, In the exception handler for failed ping sends (the block that creates PingMessage, calls websocket.send_json and handles send_ex), replace the prohibited close code 1006 with 1001 when calling websocket.close and ensure any exception thrown by websocket.close is logged instead of swallowed; update the logger call to include context (session_id and the close exception) and reference the same symbols (PingMessage, websocket.close, cls.PING_TIMEOUT, logger) so the error path logs close failures for debugging.

coderabbitai · 2026-01-31T00:46:26Z

TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs

+        [Test]
+        public void ManageEditor_TelemetryStatus_ReportsAccurateHealth()
+        {
+            // Trigger multiple conversion failures first
+            var audioSource = testGameObject.AddComponent<AudioSource>();
+
+            for (int i = 0; i < 3; i++)
+            {
+                var badParam = new JObject
+                {
+                    ["action"] = "set_property",
+                    ["target"] = testGameObject.name,
+                    ["componentType"] = "AudioSource",
+                    ["property"] = "clip",
+                    ["value"] = i * 1000  // bad
+                };
+                ManageComponents.HandleCommand(badParam);
+            }
+
+            // Now check telemetry
+            var telemetryParams = new JObject { ["action"] = "telemetry_status" };
+            var telemetryResult = ManageEditor.HandleCommand(telemetryParams);
+
+            Assert.IsNotNull(telemetryResult, "Telemetry should return result");
+
+            // NOTE: Issue #654 noted that telemetry returns success even when dispatcher is dead.
+            // If telemetry returns success, that's the actual current behavior (which may be a problem).
+            // This test just documents what happens.
+        }


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n --type=cs -C3 'telemetry_status'

Repository: CoplayDev/unity-mcp

Length of output: 8475

🏁 Script executed:

rg -n --type=cs 'telemetry_status' -B5 -A10 | grep -E '(case|HandleCommand|dispatcher|health|success)' | head -50

Repository: CoplayDev/unity-mcp

Length of output: 4682

🏁 Script executed:

# Find the ManageEditor implementation fd -e cs -type f | xargs rg -l 'ManageEditor' | head -10

Repository: CoplayDev/unity-mcp

Length of output: 490

🏁 Script executed:

cat -n MCPForUnity/Editor/Tools/ManageEditor.cs | head -100

Repository: CoplayDev/unity-mcp

Length of output: 4734

🏁 Script executed:

# Search for telemetry_status handler specifically rg -n 'telemetry_status|case "telemetry' MCPForUnity/Editor/Tools/ManageEditor.cs -A 20

Repository: CoplayDev/unity-mcp

Length of output: 45

🏁 Script executed:

# Search for telemetry in the entire codebase rg -n 'case.*telemetry' --type=cs -A 15

Repository: CoplayDev/unity-mcp

Length of output: 45

🏁 Script executed:

# Get line count to see full file wc -l MCPForUnity/Editor/Tools/ManageEditor.cs

Repository: CoplayDev/unity-mcp

Length of output: 107

🏁 Script executed:

cat -n MCPForUnity/Editor/Tools/ManageEditor.cs | tail -300

Repository: CoplayDev/unity-mcp

Length of output: 15288

🏁 Script executed:

# Search for where telemetry_status is handled - might be in a different file rg -rn 'telemetry_status' --type=cs -B 5 -A 15 | grep -E '(case|return|new|telemetry_status|dispatcher|health)' | head -60

Repository: CoplayDev/unity-mcp

Length of output: 4442

🏁 Script executed:

# Search for dispatcher or health related code rg -rn 'dispatcher' --type=cs | head -20

Repository: CoplayDev/unity-mcp

Length of output: 2117

🏁 Script executed:

cat -n TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs | sed -n '207,235p'

Repository: CoplayDev/unity-mcp

Length of output: 1502

🏁 Script executed:

# Search for where telemetry might be handled as a resource fd -type f -name '*.cs' | xargs grep -l 'telemetry' | head -20

Repository: CoplayDev/unity-mcp

Length of output: 233

🏁 Script executed:

# Search for telemetry in all files find . -name '*.cs' -type f | xargs grep -l telemetry

Repository: CoplayDev/unity-mcp

Length of output: 9188

🏁 Script executed:

# Look for resource implementations find . -name '*Resource*.cs' -type f | head -10

Repository: CoplayDev/unity-mcp

Length of output: 184

🏁 Script executed:

cat -n MCPForUnity/Editor/Helpers/TelemetryHelper.cs

Repository: CoplayDev/unity-mcp

Length of output: 9323

🏁 Script executed:

# Look for how telemetry_status action is actually handled find . -name '*.cs' -type f -exec grep -l 'telemetry_status.*action\|HandleCommand.*telemetry' {} \; | head -20

Repository: CoplayDev/unity-mcp

Length of output: 265

🏁 Script executed:

# Search for response classes that might include dispatcher health find . -name '*Response*.cs' -type f | head -20

Repository: CoplayDev/unity-mcp

Length of output: 1298

🏁 Script executed:

# Look for a McpServer or handler that might route telemetry_status find . -name '*Server*.cs' -o -name '*Handler*.cs' -o -name '*Dispatcher*.cs' | head -20

Repository: CoplayDev/unity-mcp

Length of output: 922

🏁 Script executed:

cat -n MCPForUnity/Editor/Services/Transport/TransportCommandDispatcher.cs

Repository: CoplayDev/unity-mcp

Length of output: 17315

🏁 Script executed:

# Find CommandRegistry find . -name 'CommandRegistry.cs' -type f

Repository: CoplayDev/unity-mcp

Length of output: 108

🏁 Script executed:

# Search for telemetry_status registration grep -rn 'telemetry_status' --include='*.cs' | grep -v Test | grep -v '.Trim()'

Repository: CoplayDev/unity-mcp

Length of output: 45

🏁 Script executed:

cat -n MCPForUnity/Editor/Tools/CommandRegistry.cs

Repository: CoplayDev/unity-mcp

Length of output: 18443

🏁 Script executed:

# Find telemetry_status as a resource or tool find . -name '*.cs' -type f -exec grep -l 'telemetry_status\|TelemetryStatus' {} \; | grep -v Test | grep -v '.Trim()'

Repository: CoplayDev/unity-mcp

Length of output: 45

🏁 Script executed:

# Search for McpForUnityResource or McpForUnityTool with telemetry context grep -rn 'McpForUnityResource\|McpForUnityTool' --include='*.cs' | grep -v Test | head -20

Repository: CoplayDev/unity-mcp

Length of output: 2277

🏁 Script executed:

# Search for all HandleCommand methods across the codebase grep -rn 'public static.*HandleCommand' --include='*.cs' | grep -v Test

Repository: CoplayDev/unity-mcp

Length of output: 3557

🏁 Script executed:

# Look for files that might contain telemetry action find . -path ./TestProjects -prune -o -name '*.cs' -type f -print | xargs grep -l 'action.*telemetry\|telemetry.*action' | head -20

Repository: CoplayDev/unity-mcp

Length of output: 45

🏁 Script executed:

# Read the full test to see what it actually checks cat -n TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs | sed -n '200,250p'

Repository: CoplayDev/unity-mcp

Length of output: 2599

🏁 Script executed:

# Let's also check if ManageEditor has a default case that might handle telemetry grep -n 'default:' MCPForUnity/Editor/Tools/ManageEditor.cs -A 10

Repository: CoplayDev/unity-mcp

Length of output: 616

Test will fail with unknown action error before reaching health assertions.

The test calls ManageEditor.HandleCommand with action telemetry_status, but this action is not implemented in ManageEditor.cs (lines 50-141). The switch statement only handles: play, pause, stop, set_active_tool, add_tag, remove_tag, add_layer, remove_layer. Any unknown action hits the default case and returns an ErrorResponse, so the test will fail immediately rather than proceed to check telemetry response content.

Either:

Implement telemetry_status handling in ManageEditor, or

Route telemetry_status to the correct handler if it exists elsewhere.

Once the action is properly implemented, the suggestion to assert on health fields (dispatcher status, success flags) aligns with issue #654's intent and should be added.

🤖 Prompt for AI Agents

In `@TestProjects/UnityMCPTests/Assets/Tests/EditMode/Tools/PropertyConversionErrorHandlingTests.cs` around lines 207 - 235, ManageEditor.HandleCommand currently lacks a case for the "telemetry_status" action so unknown actions hit the default ErrorResponse; update ManageEditor.HandleCommand (ManageEditor.cs) to handle "telemetry_status" by either implementing a new branch that gathers and returns the telemetry health fields (dispatcher status, success flags, etc.) or routing the action to the existing telemetry handler (if one exists), ensuring the switch or dispatch logic recognizes "telemetry_status" and returns a success response object used by the test.

dsarno and others added 3 commits January 30, 2026 15:28

Merge remote-tracking branch 'upstream/beta' into fix/ping-pong-heart…

4a58fad

…beat # Conflicts: # Server/src/transport/plugin_hub.py # Server/src/transport/unity_transport.py

sourcery-ai bot reviewed Jan 30, 2026

View reviewed changes

coderabbitai bot reviewed Jan 30, 2026

View reviewed changes

dsarno mentioned this pull request Jan 31, 2026

Make domain reload detection more deterministic (reduce 20s timeout) #657

Open

dsarno force-pushed the fix/ping-pong-heartbeat branch from 2bfffe7 to 0c4c5cc Compare January 31, 2026 00:38

dsarno merged commit 4d96941 into CoplayDev:beta Jan 31, 2026
1 of 2 checks passed

coderabbitai bot reviewed Jan 31, 2026

View reviewed changes

		t0 = time.time()
		logger.info("[TIMING-STDIO] send_command START: command=%s", command_type)

Fix WebSocket connection reliability and domain reload recovery #656

Fix WebSocket connection reliability and domain reload recovery #656

Conversation

dsarno commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Ping/Pong Heartbeat (fixes #643)

Domain Reload Recovery (fixes #654)

Code Quality

Test Fixes

Test plan

Uh oh!

sourcery-ai bot commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for WebSocket ping_pong heartbeat and dead connection detection

Sequence diagram for HTTP vs stdio transport session resolution with PluginHub

Class diagram for PluginHub heartbeat and related transport models

Flow diagram for send_command_with_retry and reload timeout handling

File-Level Changes

Assessment against linked issues

Possibly linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

coderabbitai bot commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dsarno commented Jan 30, 2026 •

edited

Loading

sourcery-ai bot commented Jan 30, 2026 •

edited

Loading

coderabbitai bot commented Jan 30, 2026 •

edited

Loading