feat(mcp): reconnect_with_backoff — exponential retry for transient failures #235
Merged
emal-avala merged 1 commit into main on Apr 23, 2026
Adds `McpClient::reconnect_with_backoff(max_attempts)` which drops the
stale transport and retries `connect()` with exponential backoff:
1s → 2s → 4s → 8s → 16s → 30s (cap) → 30s …
If every attempt fails, the client status transitions to
`McpConnectionStatus::Error(last)` and the call returns the
accumulated error, so callers can surface the failure cleanly.
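
For reviewers who want the control flow in one place, here is a minimal sketch of the retry loop described above. It is not the merged implementation: `Status`, the injected `connect`, `delay_ms`, and `sleep` closures, and the error strings are all hypothetical stand-ins so the loop can run without a transport or a timer. It also assumes the wait happens after a failed attempt and is skipped after the last one; the PR does not spell that out.

```rust
use std::time::Duration;

/// Hypothetical stand-in for `McpConnectionStatus`; only the two states
/// named in this PR are modelled.
#[derive(Debug, PartialEq)]
enum Status {
    Connected,
    Error(String),
}

/// Sketch of the retry loop. `connect` stands in for "drop the stale
/// transport, then call connect()"; `delay_ms` supplies the backoff curve;
/// `sleep` stands in for the async sleep used by the real client.
fn reconnect_with_backoff(
    status: &mut Status,
    max_attempts: u32,
    mut connect: impl FnMut() -> Result<(), String>,
    delay_ms: impl Fn(u32) -> u64,
    mut sleep: impl FnMut(Duration),
) -> Result<u32, String> {
    // Zero attempts is a caller mistake, so reject it instead of "succeeding".
    if max_attempts == 0 {
        return Err("max_attempts must be at least 1".to_string());
    }
    let mut last = String::new();
    for attempt in 1..=max_attempts {
        match connect() {
            Ok(()) => {
                *status = Status::Connected;
                return Ok(attempt); // number of attempts it took
            }
            Err(e) => last = e,
        }
        // Assumption: wait only between attempts, not after the final one.
        if attempt < max_attempts {
            sleep(Duration::from_millis(delay_ms(attempt - 1)));
        }
    }
    // Every attempt failed: record the last error and report the count.
    *status = Status::Error(last.clone());
    Err(format!(
        "reconnect failed after {} attempts: {}",
        max_attempts, last
    ))
}
```

Injecting the sleep and the delay curve keeps the loop testable in isolation, which mirrors the reason the PR gives for factoring out the schedule.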
The backoff schedule is factored into a pure `backoff_delay_ms`
function so the curve can be unit-tested without a real transport.
Huge attempt counts are clamped to avoid shift overflow.
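
One way such a pure schedule function could look is sketched below. Only the name `backoff_delay_ms` and the 1s-to-30s curve come from this PR; the constants, the 0-indexed `attempt` parameter, and the clamp-before-shift approach are assumptions made for illustration.

```rust
/// Base delay and cap in milliseconds, matching the 1s -> 30s schedule.
const BASE_MS: u64 = 1_000;
const CAP_MS: u64 = 30_000;

/// Delay for a 0-indexed attempt number (assumed indexing).
fn backoff_delay_ms(attempt: u32) -> u64 {
    // 1000 << 5 = 32_000 already exceeds the cap, so every exponent >= 5
    // maps to the cap. Clamping first keeps the shift far below 64 bits,
    // which avoids the shift-overflow panic for huge attempt counts.
    let exp = attempt.min(5);
    (BASE_MS << exp).min(CAP_MS)
}
```

Clamping the exponent rather than using a checked shift keeps the function branch-free and makes the monotonic property easy to see: the shift grows until the cap takes over, then the result stays flat.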
Use case: an MCP subprocess that died mid-session can now be brought
back without tearing down the agent loop.
Tests:
- schedule doubles up to the 30s cap, stable at u32::MAX attempts
- schedule is monotonic non-decreasing
- `max_attempts = 0` is rejected with a clear error
Full MCP suite: 3 pass. Clippy clean.
Summary

Adds `McpClient::reconnect_with_backoff(max_attempts)` so a transient MCP failure (subprocess died, network blip) can be recovered without tearing down the agent loop.

Backoff schedule

The delay curve is a pure helper (`backoff_delay_ms`), so it is unit-tested without an actual transport. Huge attempt counts are clamped to avoid `1 << u32::MAX` panics.

Behavior

- Drops the stale `McpTransportConnection` first (so `connect()` installs a fresh subprocess or SSE stream).
- Attempts are separated by a `tokio::time::sleep` of `backoff_delay_ms(n - 1)`, where `n` is the attempt number.
- On success, the status is `Connected` and tools/resources are re-discovered.
- On exhaustion, the status is `Error(last)` and the accumulated error is returned with the attempt count so callers can surface a helpful message.
- `max_attempts = 0` is rejected with a clear error rather than silently succeeding.

Tests

- `backoff_schedule_doubles_until_cap`: 1/2/4/8/16 s ramp, then flat 30 s; stable at `u32::MAX`.
- `backoff_is_monotonic_non_decreasing`: no accidental regressions in the curve.
- `reconnect_zero_attempts_is_rejected`: guard against caller mistakes.

Test plan

- `cargo test -p agent-code-lib --lib services::mcp` (3/3 pass)
- `cargo clippy --workspace --tests --no-deps -- -D warnings`
- `cargo fmt --all --check`