Fix flaky e2e test: use PID-based unique socket paths to eliminate bridge races#156
Merged
Fix flaky e2e test: use PID-based unique socket paths to eliminate bridge races#156
Conversation
The `restartSession` function calls `showServerDetails` after successfully starting a new bridge. Under CI load, this health check can fail transiently (bridge socket not yet ready), causing the entire restart command to fail even though the restart itself succeeded. Make `showServerDetails` non-fatal in the restart path: catch errors, reset session status to 'active' (since the bridge was started), and show a warning instead of failing. This fixes the flaky sessions/mcp-session e2e test. https://claude.ai/code/session_0138WV4ffyNZzBFuzoTDzuBn
When tests run in parallel with a shared home directory, a background bridge reconnect (triggered by consolidateSessions in a parallel CLI process) can race with an explicit restart. The exiting bridge's cleanup() would delete the socket file at this.socketPath — but a NEW bridge for the same session may have already created its socket at that same path. This causes the new bridge's socket to vanish, producing the ENOENT error seen in the flaky sessions/mcp-session test. Fix: stop deleting the socket file in the bridge's cleanup(). Stale sockets are already cleaned up in three other places: - startBridge() removes old sockets before spawning a new bridge - createSocketServer() removes old sockets before listening - consolidateSessions() removes sockets for expired/unauthorized sessions https://claude.ai/code/session_0138WV4ffyNZzBFuzoTDzuBn
When tests run in parallel with a shared home directory, a background reconnect (triggered by consolidateSessions in a parallel CLI process) can race with an explicit restart. Both start bridges for the same session, and the exiting bridge's cleanup deletes the new bridge's socket at the shared path, causing ENOENT. Fix by giving each bridge instance a unique socket path based on its PID: `@session-name.<pid>.sock`. Since each bridge owns its own path, cleanup never conflicts with other bridges. Changes: - getSocketPath() accepts optional `pid` parameter for unique paths - Bridge uses process.pid to compute its socket path - startBridge computes socket path after spawn (when PID is known) - ensureBridgeReady reads session PID to find the right socket - ensureBridgeReady sets lastConnectionAttemptAt before restarting, preventing parallel consolidateSessions from triggering concurrent background restarts for the same session - consolidateSessions cleans up PID-based sockets when clearing dead bridge PIDs and removing expired sessions - SessionClient reconnect logic re-reads session after restartBridge to get the new PID-based socket path https://claude.ai/code/session_0138WV4ffyNZzBFuzoTDzuBn
- Use the PID returned by restartBridge() directly instead of re-reading sessions.json via getSession() - Add cleanupOrphanedSockets() to remove stale PID-based socket files from the bridges directory during `mcpc clean`. Only deletes sockets older than 5 minutes (configurable) to avoid racing with a bridge that was just spawned. - Wire orphaned socket cleanup into cleanStale/cleanAll and display https://claude.ai/code/session_0138WV4ffyNZzBFuzoTDzuBn
Every call site always passes a PID. Making it optional would silently produce a path without the PID suffix that no bridge ever listens on. https://claude.ai/code/session_0138WV4ffyNZzBFuzoTDzuBn
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
@session.1234.sock), eliminating races where an exiting bridge deletes a new bridge's socketensureBridgeReadysetslastConnectionAttemptAtbefore restarting, preventing parallel CLI processes from triggering concurrent background restarts viaconsolidateSessionsmcpc cleannow removes orphaned PID-based socket files (older than 5 min) from the bridges directoryRoot cause
When e2e tests run with
--parallel 6and a shared home directory, a parallel CLI process runningmcpc --jsontriggersconsolidateSessions()→reconnectCrashedSessions(). This starts a background bridge restart for a session whose bridge was just killed by another test. Both the background restart and the test's own restart spawn bridges that share the same socket path (@session-name.sock). When the losing bridge exits, itscleanup()deletes the socket file — which now belongs to the winning bridge — causingENOENT.Changes
getSocketPath(name, pid?)— accepts optional PID to produce unique pathsprocess.pidfor its socket path; cleanup safely deletes only its own unique socketstartBridge— computes socket path after spawn (when PID is known); no longer needs pre-spawn socket cleanupensureBridgeReady— reads session PID to find the right socket; setslastConnectionAttemptAtbefore restarting to block parallel restarts; uses PID fromrestartBridgereturn value directlySessionClientreconnect — uses PID fromrestartBridgereturn value directly (no extragetSessioncall)consolidateSessions— cleans up PID-based sockets when clearing dead PIDs and removing expired sessionscleanupOrphanedSockets()— new function that globs the bridges directory and removes stale socket files not matching any active session+PID, using mtime to avoid racing with just-spawned bridgesmcpc clean— wired up orphaned socket cleanup into both default and--allmodesTest plan
npm run lintpassesnpm run buildpassesnpm run test:unit— 500/500 passsessions/mcp-sessione2e test passes (previously flaky)sessions/close,sessions/restart,sessions/lifecycleall pass--parallel 6) — no regressions from this changehttps://claude.ai/code/session_0138WV4ffyNZzBFuzoTDzuBn