Skip to content

fix(agents): L1 — crew/agent runtime reliability (parallel gather, per-tool timeout, dead code)#171

Merged
AVADSA25 merged 1 commit into
mainfrom
agents-reliability-fixes
May 31, 2026
Merged

fix(agents): L1 — crew/agent runtime reliability (parallel gather, per-tool timeout, dead code)#171
AVADSA25 merged 1 commit into
mainfrom
agents-reliability-fixes

Conversation

@AVADSA25
Copy link
Copy Markdown
Owner

Summary

Four findings from a focused review of codec_agents.py (the self-contained multi-agent runtime). All reliability/correctness — no behavior change for the happy path.

# Severity Line Fix
1 High :872 Parallel crews no longer sink on one agent's failuregather now return_exceptions=True
2 High _execute_tool_with_hooks Per-tool 120s wall-clock budget — a hanging skill can't hang the agent forever
3 Medium :838 Dead tautology removed (agent.tools != agent.tools)
4 Low :72 Dead eager SERPER_API_KEY global removed (Keychain shellout at import, never read)

Detail

1. Parallel crew isolation. asyncio.gather(*coros) had no return_exceptions=True. If one agent raised (e.g. a stuck-detection abort), every other agent's result was discarded and the whole crew failed — while sequential mode isolates each agent with try/except. Parallel now degrades gracefully: a failed agent contributes [Name failed: Type: msg], the rest still return and join, and each failure gets its own agent_finish (outcome=error) audit.

2. Per-tool timeout. Built-in tools have internal timeouts (web_fetch 30s, shell 30s) but arbitrary skill tools invoked via _make_lazy_fn had none — input(), a no-timeout network call, or a deadlock would hang the agent and its default-thread-pool worker indefinitely. The executor call is now asyncio.wait_for(..., 120s); on timeout the agent gets a recoverable error string + a tool_result/timeout audit and its ReAct loop continues. Documented residual: the abandoned worker thread can't be killed (no thread.kill in Python) and stays parked until the blocking call returns — the generous 120s budget keeps that rare and tools doing real work finish well inside it.

3 & 4. Pure cleanup — the tautology collapsed to the existing if stripped: (allowlist scoping unchanged, pinned by test); the SERPER_API_KEY global was never read (web_search fetches its own key via codec_search.search). The _serper_api_key() getter stays (live reads + satisfies the test_remaining_secrets source scan).

Test plan

  • tests/test_agents_reliability_l1.py — 7 tests: parallel degrade vs all-succeed, hanging-tool-bounded (measures the coroutine's return at the budget, not the orphaned thread's 2s finish), fast-tool-still-works, allowlist-strips + no-allowlist-keeps-all, dead-global-gone
  • Existing 236 agent/crew tests still pass (no regression)
  • python3.13 -m pytest --ignore=tests/test_skills.py -q2,062 passed, 77 skipped
  • ruff check: 0 issues

Branches off main; touches only codec_agents.py — independent of the other queued PRs.

🤖 Generated with Claude Code

Four findings from the codec_agents.py review:

1. Parallel crews no longer sink on one agent's failure (codec_agents.py:872)
   asyncio.gather(*coros) had no return_exceptions=True, so if ONE agent
   raised (e.g. a stuck-detection abort), every other agent's result was
   discarded and the whole crew failed. Sequential mode isolates per-agent;
   parallel now does too — a failed agent contributes an error marker, the
   rest still return + are joined. Each failure gets its own agent_finish
   audit (outcome=error), matching sequential.

2. Per-tool wall-clock budget (codec_agents.py:_execute_tool_with_hooks)
   A skill tool with no internal timeout (input(), a no-timeout network call,
   a deadlock) would hang the agent — and its default-thread-pool worker —
   forever. The executor call is now wrapped in asyncio.wait_for(120s); on
   timeout the agent gets a recoverable error string (+ a tool_result/timeout
   audit) and its ReAct loop continues. Documented residual: the abandoned
   worker thread can't be killed and stays parked until the blocking call
   returns — the generous budget keeps that rare.

3. Dead tautology removed (codec_agents.py:838)
   `if agent.tools != agent.tools or before != len(agent.tools)` — the first
   operand compares a list to itself (always False); the condition collapsed
   to the inner `if stripped:`. Simplified; allowlist scoping behavior is
   unchanged (pinned by test).

4. Dead eager module global removed (codec_agents.py:72)
   `SERPER_API_KEY = _serper_api_key()` was never read (web_search routes
   through codec_search.search, which fetches the key itself) yet did a
   Keychain shellout at every import. Dropped; the _serper_api_key() getter
   stays for live reads (+ satisfies the test_remaining_secrets source scan).

Test surface: tests/test_agents_reliability_l1.py (7 tests) — parallel
degrade-vs-all-succeed, hanging-tool-bounded (coroutine returns at the budget,
not the orphaned thread's finish), fast-tool-still-works, allowlist-strips +
no-allowlist-keeps-all, dead-global-gone.

Existing 236 agent/crew tests still pass. Full suite: 2,062 passed / 77 skipped.
ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@AVADSA25 AVADSA25 merged commit 5e94289 into main May 31, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants