
fix(session): retry network errors, cap at 3, add retry_exhausted status#28792

Open
OrShmuel22 wants to merge 13 commits into anomalyco:dev from OrShmuel22:dev

Conversation


@OrShmuel22 OrShmuel22 commented May 22, 2026

Issue for this PR

Closes #20822, #21716, #21893, #23287
Related #19394, #20466, #22448, #26369

Type of change

  • Bug fix

What does this PR do?

Network errors (ECONNRESET, ECONNREFUSED, ETIMEDOUT, fetch failed, socket hang up) were never retried because retryable() only matched rate limits and 5xx responses. Sessions halted with no recovery path: I had to press ESC and type "continue" every time the laptop slept or the WiFi blinked.

Changes:

  1. Network error patterns in retryable() — Added ECONNRESET, ECONNREFUSED, ETIMEDOUT, ECONNABORTED, fetch failed, Failed to fetch, socket hang up, network error, connection reset/refused/timeout. Also handles nested error envelopes (server_error, upstream_error, stream_read_error, service_unavailable_error) and fixes the OpenRouter numeric code bug.

  2. Caps retries at 3 (RETRY_MAX_ATTEMPTS = 3, with 2s → 4s → 8s backoff, about 14s total). Prevents infinite-retry loops.

  3. retry_exhausted status — When retries run out on a retryable error, status is set to retry_exhausted instead of idle. TUI shows error message with enter retry · esc dismiss. Enter re-sends last user message. Escape dismisses. Subagents skip this status and fall through to idle+error so parent handles it.
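For reviewers, changes 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `NETWORK_ERROR_PATTERNS` and `RETRY_MAX_ATTEMPTS` are named in the commit message, but the regex forms, the `isNetworkError` and `retryDelayMs` helpers, and the 2000 ms base delay are assumptions chosen to reproduce the 2s, 4s, 8s schedule.

```typescript
// Sketch only: pattern contents mirror the list in the description,
// but the regex shapes and helper names are hypothetical.
const NETWORK_ERROR_PATTERNS: RegExp[] = [
  /ECONNRESET/i,
  /ECONNREFUSED/i,
  /ETIMEDOUT/i,
  /ECONNABORTED/i,
  /fetch failed/i,
  /failed to fetch/i,
  /socket hang up/i,
  /network error/i,
  /connection (reset|refused|timeout)/i,
]

const RETRY_MAX_ATTEMPTS = 3
const RETRY_BASE_DELAY_MS = 2000 // assumed base so that delays are 2s, 4s, 8s

// True when the error message looks like a transient network failure.
function isNetworkError(message: string): boolean {
  return NETWORK_ERROR_PATTERNS.some((pattern) => pattern.test(message))
}

// attempt is 1-based. Returns undefined once the retry budget is spent.
function retryDelayMs(attempt: number): number | undefined {
  if (attempt < 1 || attempt > RETRY_MAX_ATTEMPTS) return undefined
  return RETRY_BASE_DELAY_MS * Math.pow(2, attempt - 1)
}
```

Under these assumptions the three delays sum to 14000 ms, which matches the budget quoted above, and a fourth attempt gets no delay at all, so the caller stops retrying.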

Why this approach:

  • Targets the root cause (retryable() whitelist too narrow) without over-engineering
  • Reuses existing retry infrastructure — no new UI states, no new components
  • Subagent recovery works through parent: error → tool failure → parent retry_exhausted → user retries → subagent re-invoked
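The subagent rule can be illustrated with a small decision function. This is a hedged sketch: the `HaltInput` shape, the `statusAfterHalt` name, and the reduced two-member status union are hypothetical, and only the `parentID` check and the `retry_exhausted` / `idle` outcomes come from the description.

```typescript
// Hypothetical types; the real SessionStatus schema has more members.
type SessionStatus =
  | { type: "idle" }
  | { type: "retry_exhausted"; message: string }

interface HaltInput {
  parentID?: string // assumed to be set only on subagent sessions
  retryable: boolean // whether the final error was classified as retryable
  attempts: number // retries already performed
  message: string
}

const RETRY_MAX_ATTEMPTS = 3

// Only top-level sessions surface retry_exhausted. Subagents fall through
// to idle so the failure propagates to the parent as a tool error.
function statusAfterHalt(input: HaltInput): SessionStatus {
  const exhausted = input.retryable && input.attempts >= RETRY_MAX_ATTEMPTS
  if (exhausted && input.parentID === undefined) {
    return { type: "retry_exhausted", message: input.message }
  }
  return { type: "idle" }
}
```

Keeping the check in one place means the parent session is the only one that ever prompts the user, which is what lets a single retry re-invoke the subagent.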

How did you verify your code works?

  • bun test test/session/ — 364 pass, 0 fail
  • bun test test/session/retry.test.ts — 49 pass, 0 fail
  • bun run typecheck — clean, no errors
  • No existing tests broken
  • Double-submit guarded (optimistic busy flip + revert on failure)
  • Escape dismisses via server-side abort (not local-only)
  • Subagents never show retry_exhausted (parentID check before status set)

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

OrShmuel22 and others added 13 commits May 15, 2026 05:43
When a parent agent (e.g. an orchestrator) has edit:deny and spawns a
subagent (e.g. an editor) that has edit:allow, the parent's deny was
unconditionally inherited into the subagent's session permission. Because
permission evaluation is last-match-wins, the inherited deny overrode the
subagent's own allow — removing the edit tool from the subagent's palette.

Fix: only inherit parent edit:deny rules when the subagent does NOT
explicitly declare edit:allow. If a subagent says it can edit, the parent's
self-restriction should not override that declared capability.

This preserves Plan Mode security: subagents without explicit edit
declarations (like general, explore) still inherit the parent's edit:deny
as before.

Relates to anomalyco#26700 anomalyco#26747 anomalyco#26758 anomalyco#27123
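To make the last-match-wins interaction concrete, here is a minimal sketch of the inheritance rule described in this commit. The `Rule` shape and the `evaluate` and `inheritRules` helpers are hypothetical and much simpler than the project's permission system; inherited rules are assumed to be appended after the subagent's own rules, which is why they used to win.

```typescript
// Hypothetical, simplified rule model for illustration only.
type Action = "allow" | "deny"

interface Rule {
  permission: string
  action: Action
}

// Last-match-wins: the final rule that names the permission decides.
function evaluate(rules: Rule[], permission: string): Action | undefined {
  let result: Action | undefined
  for (const rule of rules) {
    if (rule.permission === permission) result = rule.action
  }
  return result
}

// Inherit the parent's edit:deny only when the subagent does not
// explicitly declare edit:allow.
function inheritRules(parent: Rule[], subagent: Rule[]): Rule[] {
  const subagentAllowsEdit = subagent.some(
    (rule) => rule.permission === "edit" && rule.action === "allow",
  )
  const inherited = subagentAllowsEdit
    ? []
    : parent.filter((rule) => rule.permission === "edit" && rule.action === "deny")
  return [...subagent, ...inherited]
}
```

With a parent that denies edit, a subagent declaring edit:allow keeps its edit tool, while a subagent with no edit declaration still ends up denied.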
… max retries at 3

- Add RETRY_MAX_ATTEMPTS = 3 to prevent infinite retry loops
- Add NETWORK_ERROR_PATTERNS for ECONNRESET, ECONNREFUSED, ETIMEDOUT,
  fetch failed, socket hang up, network error, connection reset/refused/timeout
- Add nested error envelope inspection (server_error, upstream_error,
  stream_read_error, service_unavailable_error)
- Fix OpenRouter numeric code bug (typeof json.code === 'number')
- Add comprehensive test coverage for all new retry patterns

Closes anomalyco#20822, anomalyco#21716, anomalyco#21893, anomalyco#23287
Related anomalyco#19394, anomalyco#20466, anomalyco#22448, anomalyco#26369
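The envelope inspection and the numeric-code handling in this commit could look roughly like the sketch below. The response shapes are assumptions (a `type` or `code` field either at the top level or under `error`), `isRetryableBody` is a hypothetical helper, and treating 429 and 5xx codes as retryable simply follows the "rate limits and 5xx" rule from the description.

```typescript
// Envelope types named in the commit message.
const RETRYABLE_ENVELOPE_TYPES = new Set<string>([
  "server_error",
  "upstream_error",
  "stream_read_error",
  "service_unavailable_error",
])

// Accepts a code that arrives as a number or as a numeric string.
function toNumericCode(code: unknown): number {
  if (typeof code === "number") return code
  if (typeof code === "string") return Number(code)
  return NaN
}

// Hypothetical helper: inspects the top-level object and a nested `error` object.
function isRetryableBody(body: string): boolean {
  let json: unknown
  try {
    json = JSON.parse(body)
  } catch {
    return false
  }
  if (typeof json !== "object" || json === null) return false
  const top = json as Record<string, unknown>
  const nested =
    typeof top.error === "object" && top.error !== null
      ? (top.error as Record<string, unknown>)
      : undefined
  for (const candidate of [top, nested]) {
    if (candidate === undefined) continue
    if (typeof candidate.type === "string" && RETRYABLE_ENVELOPE_TYPES.has(candidate.type)) {
      return true
    }
    const code = toNumericCode(candidate.code)
    if (code === 429 || (code >= 500 && code < 600)) return true
  }
  return false
}
```

The point of `toNumericCode` is that a provider returning `code: 502` as a number is classified the same way as one returning `"502"` as a string.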
- Add retry_exhausted to SessionStatus.Info schema union
- Add retriesExhausted tracking to ProcessorContext
- Detect exhausted retries in halt() and set retry_exhausted status
- Return retry_exhausted from process() when retries are exhausted
- Preserve retry_exhausted status in run-state onIdle (don't reset to idle)
- Handle retry_exhausted in compaction.ts (treat as stop)
- Add tests for retry_exhausted status lifecycle
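The run-state change in this commit amounts to not overwriting `retry_exhausted` when the run goes idle. A minimal sketch, assuming a discriminated union for the status; the `busy` member and the exact field names are assumptions, and `onIdle` here is a simplified stand-in for the real handler.

```typescript
// Assumed, reduced version of the SessionStatus.Info union.
type SessionStatusInfo =
  | { type: "idle" }
  | { type: "busy" }
  | { type: "retry_exhausted"; message: string }

// Called when the run finishes: keep retry_exhausted so the TUI can still
// offer "enter retry · esc dismiss"; everything else resets to idle.
function onIdle(current: SessionStatusInfo): SessionStatusInfo {
  if (current.type === "retry_exhausted") return current
  return { type: "idle" }
}
```

Without this guard the status would flip back to idle as soon as the processor returned, and the retry prompt would never be visible.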
@github-actions github-actions bot added the needs:issue and needs:compliance labels May 22, 2026 (needs:compliance: this means the issue will auto-close after 2 hours)
@github-actions
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions github-actions bot removed the needs:compliance and needs:issue labels May 22, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍



Development

Successfully merging this pull request may close these issues.

fix(session): UnknownError should be retried by default

1 participant