Skip to content

feat: Wire rate limiting & circuit breaker into watch command (#515)#522

Closed
tamirdresher wants to merge 6 commits intobradygaster:devfrom
tamirdresher:tamirdresher/515-rate-limit-watch-integration
Closed

feat: Wire rate limiting & circuit breaker into watch command (#515)#522
tamirdresher wants to merge 6 commits intobradygaster:devfrom
tamirdresher:tamirdresher/515-rate-limit-watch-integration

Conversation

@tamirdresher
Copy link
Copy Markdown
Collaborator

@tamirdresher tamirdresher commented Mar 22, 2026

Summary

Wires the rate limiting SDK (PR #518) into Ralph's watch command polling loop.

Changes

  • \�xecuteRound()\ wraps
    unCheck()\ with circuit breaker state machine (closed/open/half-open)
  • Predictive circuit breaker opens BEFORE hitting 429 based on quota trend
  • Exponential cooldown: 10min → 20min → 40min (capped at 60min)
  • \ghRateLimitCheck()\ + \isRateLimitError()\ utilities in \gh-cli.ts\
  • 7 integration tests

Fixes (Q Review)

  • Race condition guard: Added
    oundInProgress\ flag to prevent overlapping \setInterval\ rounds when a round takes longer than the interval
  • Dead code removed: Removed unused \getRetryDelay\ call and import
  • Unused import removed: \shouldProceed\ was imported but never used

Architecture Note: Model Fallback

This circuit breaker handles GitHub API rate limits (429 from \gh\ CLI). It is intentionally separate from the SDK's \ModelFallbackExecutor\ which handles LLM model switching (trying cheaper models when premium ones fail). The watch command makes no LLM calls — it uses deterministic triage logic — so model fallback does not apply here. If LLM-based triage is added in the future, \ModelFallbackExecutor\ should be integrated at that point.

Testing

  • 29/29 rate limiting tests pass (22 SDK + 7 CLI)
  • Full suite: 4818 pass, 0 new failures
  • 2 pre-existing failures (same as \dev\ branch)

Closes #515
Depends on #518

…ygaster#515)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot and others added 3 commits March 23, 2026 00:21
- Add non-null assertions for array indexing in PredictiveCircuitBreaker
- Add ./ralph/rate-limiting export entry to SDK package.json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aster#515)

Integrates PredictiveCircuitBreaker from squad-sdk into Ralph's watch loop:
- Pre-round rate limit check via gh api rate_limit
- Traffic light gating (GREEN/AMBER/RED)
- Predictive circuit opening before 429
- Exponential backoff on rate limit errors
- Half-open recovery with 2-success threshold
- State persistence to .squad/ralph-circuit-breaker.json
- Adds ghRateLimitCheck() and isRateLimitError() to gh-cli.ts

Depends on bradygaster#518 for the SDK module.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds ./core/gh-cli subpath export to CLI package.json so the
rate limit utilities (isRateLimitError, ghRateLimitCheck) are
accessible from tests and external consumers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tamirdresher tamirdresher force-pushed the tamirdresher/515-rate-limit-watch-integration branch from 9187404 to d97cd7c Compare March 22, 2026 22:24
Copilot and others added 2 commits March 23, 2026 05:30
- Add roundInProgress flag to prevent overlapping setInterval rounds
- Remove unused getRetryDelay import and dead variable (line 391)
- Remove unused shouldProceed import

Fixes Q review findings: race condition when executeRound() takes
longer than interval causes double-triaging. Now skips round if
previous is still running.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- roundInProgress prevents overlapping concurrent rounds
- roundInProgress resets to false after a round throws (finally block)
- Rate limit amber traffic light blocks P1/P2 agents
- Rate limit red traffic light blocks all agents
- PredictiveCircuitBreaker stays closed with no samples

Fixes Q review finding: only 7 tests for 534 lines of new code
Copy link
Copy Markdown
Owner

@bradygaster bradygaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The circuit breaker state machine and exponential backoff implementation are solid — this is clearly production-informed code. However, the delivery approach needs adjustment:\n\nMain issue: Full rewrite of watch.ts. This PR deletes all 355 lines of the existing watch.ts and replaces them with 534 new lines. The actual new functionality (executeRound wrapper, CircuitBreakerState, circuit breaker lifecycle) is modest — it should be addable as a surgical patch on top of the existing file rather than a full rewrite.\n\nWhy this matters:\n- Creates guaranteed merge conflicts with #520 (which also modifies watch.ts)\n- Removes the existing try/catch error handling from
unCheck, letting all errors from \ghIssueList\ crash the round (behavior change)\n- Makes future watch.ts changes harder to review\n\nRequest: Please rework this as an additive patch — keep the existing watch.ts structure and wrap the circuit breaker around the existing flow. The algorithms are great; just the delivery needs to change.\n\nA few smaller notes:\n- \saveCBState\ is called with \�wait\ but uses \writeFileSync\ — harmless but inconsistent\n- ./core/gh-cli\ export added to CLI package.json exposes an internal module as public API — is that intentional?\n\nHappy to discuss the approach if you'd like to chat about it! 💬

@tamirdresher
Copy link
Copy Markdown
Collaborator Author

Code Review — Race Condition Analysis

Hi @bradygaster — reviewing PR #522 as a downstream consumer of the Squad framework.

Race Condition Fix ✅

The roundInProgress boolean guard in the setInterval callback is the correct approach for single-threaded Node.js. The check-and-set is synchronous before any await, so there's no atomicity concern. The try/finally guarantees the lock releases even when executeRound throws. Two tests were added that verify both the happy path and the throw-and-release case. This fix is solid.

One Small Issue: await on a sync function

saveCBState is declared as void/synchronous (uses writeFileSync internally), but is called with await in several places:

await saveCBState(squadDir, cbState); // await on void — no-op

This is harmless but misleading. Suggest either:

Option A — make it truly async (preferred if disk I/O matters under load):

async function saveCBState(squadDir: string, cbState: CircuitBreakerState): Promise<void> {
  await fs.promises.writeFile(
    path.join(squadDir, 'ralph-circuit-breaker.json'),
    JSON.stringify(cbState, null, 2)
  );
}

Option B — drop the await at call sites (simpler):

saveCBState(squadDir, cbState); // synchronous, fine for this use case

Test Gap: executeRound State Transitions

The circuit breaker state machine (open → half-open → closed) has no direct tests. The roundInProgress guard tests are great, but the core logic (e.g., "cooldown expires → half-open", "2 consecutive successes → closed") is not covered. Worth adding before merge:

it('transitions to half-open after cooldown expires', async () => { ... });
it('closes circuit after 2 consecutive successes in half-open', async () => { ... });
it('re-opens circuit on 429 in half-open state', async () => { ... });

The executeRound function signature has all the deps injected already — just needs mocks for ghRateLimitCheck and runCheck.


The core algorithms here are well-designed. The await writeFileSync is a minor cleanup and the state machine tests are a medium-priority gap. Otherwise this is production-quality rate limiting code.

@tamirdresher
Copy link
Copy Markdown
Collaborator Author

Closing this in favor of a reworked PR that addresses all feedback:

  • Additive patch — existing
    unCheck, \checkPRs,
    eportBoard\ are completely untouched
  • \saveCBState\ is synchronous — uses \writeFileSync, no misleading \�wait\
  • State machine tests — 16 new tests covering all transitions

  • oundInProgress\ guard
    — prevents overlapping setInterval rounds
  • ./core/gh-cli\ export unchanged — rate limit helpers stay internal

New PR incoming from \ amirdresher/515-rate-limit-watch-v2\ branch.

chrislomonico pushed a commit to clomonico/squad that referenced this pull request Mar 26, 2026
…r#546)

Consolidated 15 remaining issues into a single PR. Redesigns help UX with grouped categories, surfaces all /help commands, aligns docs with actual CLI behavior, adds first-run welcome, standardizes naming.

Closes bradygaster#510, Closes bradygaster#513, Closes bradygaster#514, Closes bradygaster#515, Closes bradygaster#516, Closes bradygaster#517, Closes bradygaster#518, Closes bradygaster#521, Closes bradygaster#522, Closes bradygaster#523, Closes bradygaster#524, Closes bradygaster#525, Closes bradygaster#527, Closes bradygaster#528, Closes bradygaster#529

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Cooperative Rate Limiting & Predictive Circuit Breaker for Multi-Agent Deployments

2 participants