Skip to content

Conversation

@ammar-agent
Copy link
Collaborator

Reduces max SSH backoff from 60s to 10s for faster recovery after transient failures.

Changes:

  • Backoff schedule: [1, 2, 4, 7, 10] seconds (was [1, 5, 10, 20, 40, 60])
  • Add ±20% jitter to prevent thundering herd when different hosts recover simultaneously
  • Same-host serialization via singleflighting remains unchanged (herd only released on success)

Tests added:

  • Backoff caps at ~10s with jitter
  • Callers waking from backoff share single probe

Generated with mux • Model: anthropic:claude-opus-4-5 • Thinking: high

- Backoff schedule: [1, 2, 4, 7, 10] seconds (was [1, 5, 10, 20, 40, 60])
- Add ±20% jitter to prevent thundering herd across different hosts
- Same-host serialization via singleflighting remains unchanged
- Added tests for backoff cap and herd serialization
@ammario ammario merged commit ffe9c34 into main Dec 20, 2025
20 checks passed
@ammario ammario deleted the ssh-backoff-p1se branch December 20, 2025 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants