fix: reduce WS spawn pre-registration timeout from 15s to 3s #597

Merged
khaliqgant merged 5 commits into main from fix/codex-spawn-regression on Mar 20, 2026
@khaliqgant khaliqgant commented Mar 20, 2026

Summary

  • Reduces REG_TIMEOUT from 15s to 3s in both WS spawn handlers in src/main.rs, fixing a regression introduced in PR #591 ("fix: MCP tools unavailable for agents spawned via agent_add"), where agent spawns (especially Codex) were delayed by a blocking HTTP call
  • Adds ~/.local/bin, ~/.opencode/bin, ~/.claude/local to fallback PATH in src/pty.rs for CLI binary resolution

Root cause

PR #591 added a synchronous register_agent_token() HTTP call with a 15s timeout in the WS event loop before spawning agents. Combined with the existing 25s boot marker timeout, Codex agents could take ~40s to become ready — exceeding caller timeouts and causing apparent spawn failures. Claude agents were unaffected because they bake the token into their MCP JSON config.

Test plan

  • Spawn a Codex agent via agent_add and verify it comes online within ~5s (not 25-40s)
  • Verify Claude agent spawns still work normally
  • Verify agent self-registration works when pre-registration times out (disconnect network briefly during spawn)
  • Verify CLI binaries in ~/.local/bin are found by PTY spawner

🤖 Generated with Claude Code


PR #591 added a synchronous register_agent_token() HTTP call with a 15s
timeout in the WS event loop before spawning agents. This blocked the
event loop and delayed Codex agent spawns by up to 15s (on top of the
existing 25s boot marker timeout), causing apparent spawn failures.

Reduce the timeout to 3s so the spawn proceeds quickly. On timeout or
failure, the agent self-registers via its MCP server (pre-#591 behavior).
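
A minimal sketch of the bounded pre-registration described above, not the code from src/main.rs. It assumes a hypothetical `register_agent_token` stand-in that simulates latency, and it uses a std worker thread and channel to stay dependency-free; the actual handler runs inside the WS event loop and may use a different timeout mechanism.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Pre-registration budget after this PR (previously 15s).
const REG_TIMEOUT: Duration = Duration::from_secs(3);

/// Hypothetical stand-in for the real registration call: sleeps for
/// `latency` to simulate the HTTP round trip, then yields a token.
fn register_agent_token(latency: Duration) -> Result<String, String> {
    thread::sleep(latency);
    Ok("token-abc".to_string())
}

/// Runs the registration on a worker thread and waits at most `timeout`.
/// Returns `None` on timeout or failure; the caller then spawns the agent
/// without a token and lets it self-register.
fn pre_register(latency: Duration, timeout: Duration) -> Option<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails harmlessly if the receiver has already given up.
        let _ = tx.send(register_agent_token(latency));
    });
    match rx.recv_timeout(timeout) {
        Ok(Ok(token)) => Some(token),
        Ok(Err(_)) | Err(_) => None,
    }
}
```

Calling `pre_register(latency, REG_TIMEOUT)` caps the wait at 3s regardless of how slow the registration endpoint is, so the worst-case delay added before the boot marker wait drops from 15s to 3s.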

Also adds ~/.local/bin, ~/.opencode/bin, ~/.claude/local to the fallback
PATH in pty.rs so CLIs installed in user-local directories are found.
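
The PATH change can be illustrated roughly as follows. This is a sketch under assumptions: the function names and the `base` directory list are hypothetical (the existing fallback entries in src/pty.rs are not shown in this PR page); only the three user-local directories come from the description.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Builds the list of fallback search directories: the caller-supplied
/// base entries followed by the user-local bin directories named in this PR.
fn fallback_path_dirs(home: &Path, base: &[&str]) -> Vec<PathBuf> {
    let mut dirs: Vec<PathBuf> = base.iter().map(|d| PathBuf::from(*d)).collect();
    for rel in [".local/bin", ".opencode/bin", ".claude/local"] {
        dirs.push(home.join(rel));
    }
    dirs
}

/// Joins the directories into a single PATH-style value using the
/// platform separator. Returns `None` if any entry contains the separator.
fn fallback_path(home: &Path, base: &[&str]) -> Option<OsString> {
    std::env::join_paths(fallback_path_dirs(home, base)).ok()
}
```

Appending the user-local directories after the base entries keeps system binaries first in resolution order; whether the real code appends or prepends is not stated in the description.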

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

devin-ai-integration[bot] commented. This comment was marked as resolved.

Claude bakes the API key into --mcp-config JSON and self-registers
reliably, so the blocking HTTP registration call is unnecessary.
Non-Claude CLIs still get a 3s registration attempt since they need
the token injected into their CLI args at spawn time.
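
One way to picture the Claude check is the sketch below. Both helpers are hypothetical and are not the `parse_cli_command` or `normalize_cli_name` functions from the codebase; the sketch only shows the idea of extracting the binary name from the full command before comparing it.

```rust
/// Hypothetical helper: extracts the binary name from a CLI command string
/// such as "/usr/local/bin/claude --mcp-config cfg.json".
fn cli_binary_name(command: &str) -> Option<&str> {
    // First whitespace-separated word is the program; drop any directory prefix.
    let program = command.split_whitespace().next()?;
    program.rsplit('/').next()
}

/// Returns true when the spawn should attempt the 3s pre-registration.
/// Claude is skipped because its token is carried in the --mcp-config JSON.
/// An empty command returns false since there is nothing to spawn.
fn needs_pre_registration(command: &str) -> bool {
    matches!(
        cli_binary_name(command),
        Some(name) if !name.eq_ignore_ascii_case("claude")
    )
}
```

Parsing the command first matters because a check against the raw string would miss cases where the CLI is given with a full path or trailing arguments.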

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

devin-ai-integration[bot] commented. This comment was marked as resolved.

khaliqgant and others added 2 commits March 20, 2026 12:48
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Issue 1: Keep dedup seeding before spawn (so WS echoes during spawn are
deduplicated) but remove the dedup entry if spawn fails, preventing
failed spawns from blocking retries for the 5-minute dedup window.
Adds DedupCache::remove() and remove_local_spawn_control_dedup().
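
The seed-then-roll-back pattern from Issue 1 might look like the sketch below. The `DedupCache` here is a simplified stand-in written for illustration, and `insert_if_new` and `spawn_with_dedup` are hypothetical names; only `remove()` and the 5-minute window are taken from the commit message.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Simplified time-windowed dedup cache (illustrative only).
struct DedupCache {
    ttl: Duration,
    seen: HashMap<String, Instant>,
}

impl DedupCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, seen: HashMap::new() }
    }

    /// Records `key` and returns true if it is new or expired;
    /// returns false for a duplicate inside the TTL window.
    fn insert_if_new(&mut self, key: &str, now: Instant) -> bool {
        match self.seen.get(key) {
            Some(&at) if now.duration_since(at) < self.ttl => false,
            _ => {
                self.seen.insert(key.to_string(), now);
                true
            }
        }
    }

    /// Removes `key`; returns true if an entry was present.
    fn remove(&mut self, key: &str) -> bool {
        self.seen.remove(key).is_some()
    }
}

/// Seeds the dedup entry before spawning so echoes that arrive during the
/// spawn are dropped, then rolls the entry back if the spawn fails so a
/// retry is not blocked for the rest of the window.
fn spawn_with_dedup(
    cache: &mut DedupCache,
    key: &str,
    now: Instant,
    spawn: impl FnOnce() -> Result<(), String>,
) -> Result<(), String> {
    if !cache.insert_if_new(key, now) {
        return Err("duplicate spawn request".to_string());
    }
    let result = spawn();
    if result.is_err() {
        cache.remove(key);
    }
    result
}
```

With a 300s TTL, a failed spawn followed immediately by a retry succeeds, while a repeat of a successful spawn within the window is rejected as a duplicate.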

Issue 2: Already fixed in prior commit (parse_cli_command before
normalize_cli_name for is_claude check).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

devin-ai-integration[bot] commented. This comment was marked as resolved.

When a second spawn request for an already-running agent fails with
"already exists", we must not remove the dedup entry from the first
successful spawn. Doing so would allow WebSocket echoes through.
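
The distinction can be sketched as a small decision function. The `SpawnError` enum and `should_clear_dedup` are hypothetical; the real code may detect the "already exists" case differently, for example by inspecting the error message.

```rust
/// Hypothetical error type separating the two failure modes discussed here.
#[derive(Debug, PartialEq)]
enum SpawnError {
    /// The agent is already running from an earlier successful spawn.
    AlreadyExists,
    /// Any other spawn failure.
    Failed(String),
}

/// Returns true when the dedup entry should be removed after a spawn attempt.
/// Only genuine failures clear the entry; "already exists" keeps it, because
/// the entry belongs to the earlier successful spawn and still has to
/// suppress WebSocket echoes.
fn should_clear_dedup(result: &Result<(), SpawnError>) -> bool {
    matches!(result, Err(SpawnError::Failed(_)))
}
```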

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@khaliqgant khaliqgant merged commit 09e38c4 into main Mar 20, 2026
39 of 40 checks passed
@khaliqgant khaliqgant deleted the fix/codex-spawn-regression branch March 20, 2026 12:21