Conversation
- Added BACKOFF_BASE_DELAY, BACKOFF_MAX_DELAY, BACKOFF_MULTIPLIER constants - Implemented calculate_backoff_delay() method with exponential growth formula - Added wait_seconds field to RecoveryAction dataclass - Integrated backoff delays into determine_recovery_action() for retry actions - Prevents API rate limiting and thundering herd problems - Follows pattern from apps/backend/runners/github/rate_limiter.py
…covery_action - Add use_model_fallback field to RecoveryAction dataclass - Enable model fallback after first retry attempt for VERIFICATION_FAILED failures - Enable model fallback on retry for UNKNOWN errors - Update docstring to document model fallback strategy (opus -> sonnet -> haiku) - Prevents model-specific failures from causing unnecessary escalations
- Import DeadLetterQueue class - Initialize DLQ in RecoveryManager.__init__ - Add add_failure_to_dlq() method with full context capture - Add convenience methods: get_dlq_pending_failures(), get_dlq_statistics(), export_dlq_report() - Integrate DLQ calls into determine_recovery_action() on escalation - Automatically add failures to DLQ when escalation action is returned - Captures attempt history, error messages, and recovery actions for manual review
Added strategy tracking statistics to RecoveryMetrics: - get_strategy_statistics(): Returns usage count and success rate per strategy - get_most_successful_strategy(): Identifies best-performing strategy - Updated get_summary() to include strategy stats - Updated format_summary() to display strategy performance Tracks success rates for retry strategies (direct_retry, model_fallback, alternative_approach) to help identify which recovery strategies are most effective.
- Added imports for FailureType, RecoveryAction, MODEL_FALLBACK_CHAIN, resolve_model_id - Added tracking variables for pending recovery actions and model fallback - Implemented pre-session recovery action handling (exponential backoff, rollback) - Replaced simple stuck-subtask logic with enhanced recovery system - Integrated smart failure classification and recovery action determination - Added support for model fallback when recommended by recovery strategy - Added recovery strategy guidance injection into prompts - Integrated notification manager for threshold-based user notifications - Handles retry, skip, escalate, rollback, and continue recovery actions - Records recovery notifications vs silent failures for metrics Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Integrated enhanced recovery system into QA fixer agent: - Added support for exponential backoff with configurable wait times - Integrated model fallback strategy (opus -> sonnet -> haiku) - Added retry strategy guidance for alternative approaches - Implemented notification thresholds to reduce noise - Added dead-letter queue integration for unrecoverable failures - Smart recovery actions: retry/skip/escalate/rollback/continue - Backward compatible with default model parameter Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created comprehensive end-to-end verification test suite for the auto-recovery loop system. Test Results (9/9 tests passed): 1. ✅ Failure Classification - Error pattern database correctly classifies 5 failure types 2. ✅ Exponential Backoff - Verified exponential progression (0s→2s→4s→8s→16s→32s→60s) 3. ✅ Retry Strategy Selection - Verified progressive strategies (direct→fallback→alternative) 4. ✅ Recovery Action Determination - Verified correct actions, wait times, fallback flags 5. ✅ Dead-Letter Queue - Verified unrecoverable failures captured for manual review 6. ✅ Notification Thresholds - Verified silent retries below threshold, notify at threshold 7. ✅ Full Recovery Loop - Verified complete loop with progressive strategies 8. ✅ UNKNOWN Failure Escalation - Verified UNKNOWN errors escalate to DLQ 9. ✅ is_recoverable() Method - Verified correct recoverability distinction Recovery Flows Verified: - VERIFICATION_FAILED: direct_retry → model_fallback → alternative → skip (3 attempts) - UNKNOWN: direct_retry → model_fallback_alternative → escalate to DLQ (2 attempts) - BROKEN_BUILD: rollback to last good commit → escalate if no commit - CIRCULAR_FIX: skip immediately (no retry) - CONTEXT_EXHAUSTED: continue in next session All acceptance criteria met: ✅ Smart failure detection distinguishes recoverable vs unrecoverable errors ✅ Automated rollback to last working state on critical failures ✅ Retry with alternative strategies (different model, different approach) ✅ Exponential backoff prevents API rate limiting ✅ Dead-letter queue captures unrecoverable failures for manual review ✅ User notification thresholds (notify after N retries, not every failure) ✅ Recovery statistics and success rate tracking ✅ Manual intervention option with preserved state Phase 6 (Integration) now complete (3/3 subtasks). Feature 149-auto-recovery-loop-enhancement COMPLETE (13/13 subtasks). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added comprehensive summary document for the completed Auto-Recovery Loop Enhancement. Includes: - All 13 completed subtasks across 6 phases - Verification results (9/9 tests passed, 100% pass rate) - Recovery loop behavior for each failure type - Files modified/created - All acceptance criteria met - Usage instructions for the enhanced recovery system Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
PR Summary: Auto-Recovery Loop Enhancement: adds smart failure classification, exponential backoff, progressive retry strategies (including model fallback), automated rollback, dead-letter queue, notification thresholds, and recovery strategy metrics — integrated into coder and QA fixer with a full end-to-end test suite. Key changes:
Files added/modified (high-level):
Acceptance: feature verified by end-to-end tests and marked ready for production in verification summary. |
|
Auto review disabled due to large PR. If you still want me to review this PR? Please comment |
…to-recovery-loop-enhancement # Conflicts: # apps/backend/agents/coder.py
…ments - Resolve merge conflicts in coder.py (integrate notify_stuck_subtask from develop) - Remove unused imports: FailureType in coder.py and fixer.py - Remove unused variable: attempt_count in fixer.py line 552 - Remove unused imports in test_recovery_loop.py: asyncio, json, RecoveryAction, RetryStrategy, DeadLetterQueue, NotificationManager - Fix ruff I001 (import sorting) in dead_letter_queue.py, notification_manager.py, test_recovery_e2e.py - Fix ruff F541 (f-string without placeholders) in fixer.py, test_recovery_loop.py - Apply ruff format fixes (line length, trailing commas) across 6 files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR implements a six-phase auto-recovery loop enhancement across the backend system. It introduces new components (DeadLetterQueue, NotificationManager), extends the RecoveryManager with exponential backoff and multi-strategy retry logic, integrates recovery workflows into coder and QA fixer agents, and includes comprehensive end-to-end tests verifying failure classification, retry strategies, notifications, and DLQ integration. Changes
Sequence DiagramsequenceDiagram
participant Agent as Coder/Fixer Agent
participant RecoveryMgr as RecoveryManager
participant Classifier as Failure Classifier
participant RetrySelector as Retry Strategy Selector
participant Backoff as Backoff Calculator
participant DLQ as DeadLetterQueue
participant NotifyMgr as NotificationManager
Agent->>RecoveryMgr: check_and_recover(subtask_id, error)
RecoveryMgr->>Classifier: classify_failure(error_message)
Classifier-->>RecoveryMgr: FailureType
alt Recoverable Failure
RecoveryMgr->>RetrySelector: select_retry_strategy(failure_type, attempt_count)
RetrySelector-->>RecoveryMgr: RetryStrategy or None
alt Strategy Available
RecoveryMgr->>Backoff: calculate_backoff_delay(attempt_count)
Backoff-->>RecoveryMgr: wait_seconds (exponential)
RecoveryMgr->>NotifyMgr: should_notify(subtask_id, attempt_count)
NotifyMgr-->>RecoveryMgr: bool
alt Should Notify
RecoveryMgr->>NotifyMgr: record_notification(details)
NotifyMgr-->>RecoveryMgr: recorded
end
RecoveryMgr-->>Agent: RecoveryAction(retry, wait_seconds, fallback_model)
else No Strategy
RecoveryMgr->>DLQ: add_failure_to_dlq(context)
DLQ-->>RecoveryMgr: persisted
RecoveryMgr->>NotifyMgr: record_notification(escalation)
NotifyMgr-->>RecoveryMgr: recorded
RecoveryMgr-->>Agent: RecoveryAction(escalate)
end
else Unrecoverable Failure
RecoveryMgr->>DLQ: add_failure_to_dlq(unrecoverable_context)
DLQ-->>RecoveryMgr: persisted
RecoveryMgr-->>Agent: RecoveryAction(escalate)
end
Agent->>Agent: Apply recovery action (retry, skip, escalate, rollback, continue)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
| retry strategies, dead-letter queue, and notification thresholds. | ||
| """ | ||
|
|
||
| import json |
Check notice
Code scanning / CodeQL
Unused import Note test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
In general, unused imports should be removed to reduce clutter and avoid unnecessary dependencies. The best fix here is to delete the unused import json line and leave all other imports and code intact, preserving existing functionality.
Concretely, in apps/backend/test_recovery_e2e.py, remove line 11 (import json) from the imports section at the top of the file. No additional methods, imports, or definitions are required, since the module is not used anywhere in the shown code.
| @@ -8,7 +8,6 @@ | ||
| retry strategies, dead-letter queue, and notification thresholds. | ||
| """ | ||
|
|
||
| import json | ||
| import shutil | ||
| import time | ||
| from pathlib import Path |
|
|
||
| import json | ||
| import shutil | ||
| import time |
Check notice
Code scanning / CodeQL
Unused import Note test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
In general, to fix an unused import warning, remove the import statement for any module that is not referenced anywhere in the file. This reduces unnecessary dependencies and makes the code clearer.
For this specific file, the best fix that does not change functionality is to delete the import time line at the top of apps/backend/test_recovery_e2e.py. All other imports appear potentially relevant to the shown code. No additional methods, definitions, or imports are required because time is not used in the provided sections.
Concretely:
- Edit
apps/backend/test_recovery_e2e.py. - Locate line 13:
import time. - Remove that line, leaving the rest of the imports unchanged.
| @@ -10,7 +10,6 @@ | ||
|
|
||
| import json | ||
| import shutil | ||
| import time | ||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
| import shutil | ||
| import time | ||
| from pathlib import Path | ||
| from typing import Any |
Check notice
Code scanning / CodeQL
Unused import Note test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
To fix an unused import, the general approach is to remove the import statement (or the specific unused name from a multi-name import) so that the dependency and associated clutter are eliminated.
In this specific file, apps/backend/test_recovery_e2e.py, the best fix is to delete the from typing import Any line at line 15. No other changes are required, since there are no references to Any visible in the provided code, and removing the import does not alter runtime behavior—typing is only for static typing. Concretely, remove the entire line 15 and leave the rest of the imports (json, shutil, time, Path, and the services.* imports) unchanged.
No additional methods, imports, or definitions are needed to implement this change.
| @@ -12,7 +12,6 @@ | ||
| import shutil | ||
| import time | ||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
| from services.dead_letter_queue import DeadLetterQueue | ||
| from services.notification_manager import NotificationManager |
| from services.recovery import ( | ||
| FailureType, | ||
| RecoveryAction, | ||
| RecoveryManager, | ||
| RetryStrategy, | ||
| ) |
Check notice
Code scanning / CodeQL
Unused import Note test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
To fix the problem, remove the unused names from the import statement so that only the symbols actually used in this file are imported. This preserves all existing behavior while cleaning up unnecessary dependencies.
Concretely, in apps/backend/test_recovery_e2e.py, locate the from services.recovery import (...) block around lines 19–24. Currently it imports FailureType, RecoveryAction, RecoveryManager, and RetryStrategy. Since only FailureType and RecoveryManager are used, edit this block to import just those two names. No additional methods, imports, or definitions are required; we only need to adjust the existing import line(s).
| @@ -18,9 +18,7 @@ | ||
| from services.notification_manager import NotificationManager | ||
| from services.recovery import ( | ||
| FailureType, | ||
| RecoveryAction, | ||
| RecoveryManager, | ||
| RetryStrategy, | ||
| ) | ||
|
|
||
| # ============================================================================= |
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/backend/services/recovery.py (1)
836-858: 🧹 Nitpick | 🔵 TrivialConsider using logging instead of
print()for error output.Using
print()for error messages (line 857) is not ideal for production code. Structured logging would provide better observability, allow log level filtering, and integrate with monitoring systems.♻️ Suggested improvement
+import logging + +logger = logging.getLogger(__name__) + ... def rollback_to_commit(self, commit_hash: str) -> bool: ... except subprocess.CalledProcessError as e: - print(f"Error rolling back to {commit_hash}: {e.stderr}") + logger.error("Error rolling back to %s: %s", commit_hash, e.stderr) return False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/backend/services/recovery.py` around lines 836 - 858, Replace the print-based error handling in rollback_to_commit with structured logging: in the except block for subprocess.CalledProcessError, call an appropriate logger (e.g., self.logger.error or a module-level logger) and include both a descriptive message and the error details (e.stderr and/or e) so the failure to reset to commit_hash is recorded at error level; ensure the method still returns False after logging. Use the existing logger on the class if available (self.logger) or create/get a module logger if not.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/backend/agents/coder.py`:
- Around line 326-332: The override_model variable is never reset after being
used as a fallback, so subsequent subtasks can accidentally keep using it; after
you resolve and assign phase_model via resolve_model_id(override_model) and
print the status, clear the override by setting override_model to None (or the
empty value your codebase expects) so the override is single-use; update the
same block where override_model is checked and resolved (the code that calls
resolve_model_id and assigns phase_model) to clear override_model immediately
after selecting the model.
In `@apps/backend/qa/fixer.py`:
- Around line 92-97: The fallback selection currently derives
current_model_shorthand from the original model argument, causing repeated
retries to pick the same fallback; change the logic to track and use an active
model variable (e.g., add/maintain current_model or current_model_shorthand
alongside current_client and override_model) and update it whenever you replace
the client or set override_model so that get_next_fallback (or whatever fallback
lookup uses current_model_shorthand) computes the next fallback from the active
model, not the base model; apply the same change where similar fallback
progression occurs (the other block referenced around the 581-599 area) so each
retry advances through the fallback chain (opus → sonnet → haiku) rather than
restarting from the original model.
- Around line 236-258: The fallback client created when override_model is set
(using resolve_model_id() then create_client(...) and awaited with
current_client.__aenter__()) is not guaranteed to be closed; wrap the iteration
loop that creates/uses current_client in a try/finally so that in the finally
block you check if current_client is not the original client and, if so, await
current_client.__aexit__(None, None, None) to always close the fallback
ClaudeSDKClient; ensure this cleanup logic references the same identifiers
(current_client, client, override_model, create_client, __aenter__, __aexit__)
and does not suppress exceptions other than safe cleanup errors.
In `@apps/backend/qa/recovery_metrics.py`:
- Around line 307-328: In get_most_successful_strategy, best_strategy stays None
when all qualifying strategies have 0.0% because best_rate is initialized to 0.0
and the comparison uses '>'; change the selection logic so a 0% success strategy
can be chosen (e.g., initialize best_rate to -1.0 or use '>=' with a guard for
first assignment) while still enforcing the "min 2 uses" rule (check
stats["total_uses"] >= 2) and comparing stats["success_rate_percent"] to
best_rate before assigning best_strategy and best_rate.
In `@apps/backend/services/dead_letter_queue.py`:
- Around line 306-337: The export_pending_failures function currently only
writes the report when output_path is provided, contradicting the docstring
default of spec_dir/dlq_report.txt; update export_pending_failures to use the
documented default by setting path = Path(output_path) if output_path is given
else Path(self.spec_dir) / "dlq_report.txt" (or similar) and write the report to
that path, ensuring report is still returned; reference the
export_pending_failures method and get_pending_failures for locating the logic
and ensure the docstring matches the implemented behavior.
In `@apps/backend/services/notification_manager.py`:
- Around line 350-371: Validate thresholds in update_thresholds and when loading
persisted settings: in update_thresholds (method name) check that provided
retry_threshold and escalation_threshold are integers >= 1 before assigning
(reject invalid inputs by returning False or raising ValueError instead of
assigning and calling _save_data); also add validation in _load_data to verify
loaded retry_threshold and escalation_threshold are ints >= 1 and if invalid
replace them with safe defaults (e.g., current class defaults or constants) so
corrupted/manual edits to the persisted notifications.json don't propagate and
cause the ZeroDivisionError in should_notify (which uses retry_threshold).
In `@apps/backend/services/recovery.py`:
- Around line 93-123: ERROR_PATTERNS contains overly generic tokens that cause
false positives; update the entries for FailureType.VERIFICATION_FAILED and
FailureType.CONTEXT_EXHAUSTED in ERROR_PATTERNS to use more specific phrases
(e.g., replace "expected" with "expected but got" or "expected value", replace
"timeout" with "timed out" or "timeout exceeded", and replace "context" with
"context limit exceeded" or "context window exhausted"), and remove or tighten
any other ambiguous patterns so matching is done on clear error phrases only;
modify the list literals under ERROR_PATTERNS for FailureType.BROKEN_BUILD,
FailureType.VERIFICATION_FAILED, and FailureType.CONTEXT_EXHAUSTED accordingly.
- Around line 209-240: The docstring for calculate_backoff_delay is inconsistent
with the implementation: it claims "Attempt 0: 1.0s" but the function returns
0.0 for attempt_count <= 0; update the docstring (not the logic) to state that
attempt_count is 0-indexed and that attempt_count == 0 returns 0.0 (no delay),
then show progression starting at attempt_count == 1: BACKOFF_BASE_DELAY (e.g.,
Attempt 1: BACKOFF_BASE_DELAY, Attempt 2: BACKOFF_BASE_DELAY *
BACKOFF_MULTIPLIER, ... capped at BACKOFF_MAX_DELAY) and mention the constants
BACKOFF_BASE_DELAY, BACKOFF_MULTIPLIER, and BACKOFF_MAX_DELAY so callers
understand the behavior.
In `@apps/backend/test_recovery_e2e.py`:
- Around line 139-167: Update the expected_delays in test_exponential_backoff to
match the implementation of calculate_backoff_delay (delay = min(1.0 *
2**attempt, 60.0)); replace the current list with [(0, 0.0), (1, 2.0), (2, 4.0),
(3, 8.0), (4, 16.0), (5, 32.0), (6, 60.0), (10, 60.0)] so the test expectations
align with the exponential backoff calculation used by
RecoveryManager.calculate_backoff_delay.
In `@apps/backend/test_recovery_loop.py`:
- Around line 566-616: run_all_tests currently calls self.print_summary() but
discards its boolean result, causing main() to receive None; update
run_all_tests (the method that iterates tests and calls self.print_summary()) to
return the boolean returned by print_summary() (i.e., capture result =
self.print_summary() and return result) so the CLI's exit code logic receives
the correct pass/fail value; ensure no other early returns break this flow and
keep using self.test_results in print_summary() as-is.
In `@RECOVERY_LOOP_VERIFICATION_SUMMARY.md`:
- Around line 13-37: The markdown headings (e.g., "Phase 1: Smart Failure
Detection ✅", "Phase 2: Alternative Retry Strategies ✅", etc.) lack surrounding
blank lines which triggers MD022; fix by adding a single blank line both above
and below each top-level heading and subheading in
RECOVERY_LOOP_VERIFICATION_SUMMARY.md so every heading is separated from
adjacent paragraphs/lists, ensuring consistent spacing throughout the Phase 1–6
headings and their subtasks.
- Around line 120-125: The lines mentioning agents/coder.py and qa/fixer.py
repeat the adverb "automatically"; edit the two bullet points so each reads
without the duplicated word (e.g., "Coder agents use enhanced recovery via
`agents/coder.py`" and "QA fixer uses enhanced recovery via `qa/fixer.py`"),
leaving the other bullets about
`.auto-claude/specs/XXX/memory/recovery_metrics.json`, `dead_letter_queue.json`,
and `notifications.json` unchanged.
---
Outside diff comments:
In `@apps/backend/services/recovery.py`:
- Around line 836-858: Replace the print-based error handling in
rollback_to_commit with structured logging: in the except block for
subprocess.CalledProcessError, call an appropriate logger (e.g.,
self.logger.error or a module-level logger) and include both a descriptive
message and the error details (e.stderr and/or e) so the failure to reset to
commit_hash is recorded at error level; ensure the method still returns False
after logging. Use the existing logger on the class if available (self.logger)
or create/get a module logger if not.
| # Use override model if set (for model fallback), otherwise use phase model | ||
| if override_model: | ||
| phase_model = resolve_model_id(override_model) | ||
| print_status(f"Using fallback model: {override_model}", "progress") | ||
| else: | ||
| phase_model = get_phase_model(spec_dir, current_phase, model) | ||
|
|
There was a problem hiding this comment.
Clear override_model after applying the fallback.
Once set, override_model is never reset, so a successful retry can leave subsequent unrelated subtasks running on the fallback model. If the override is meant to be single-use, clear it after selecting the phase model.
🛠️ Suggested fix
- if override_model:
- phase_model = resolve_model_id(override_model)
- print_status(f"Using fallback model: {override_model}", "progress")
+ if override_model:
+ phase_model = resolve_model_id(override_model)
+ print_status(f"Using fallback model: {override_model}", "progress")
+ override_model = None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/backend/agents/coder.py` around lines 326 - 332, The override_model
variable is never reset after being used as a fallback, so subsequent subtasks
can accidentally keep using it; after you resolve and assign phase_model via
resolve_model_id(override_model) and print the status, clear the override by
setting override_model to None (or the empty value your codebase expects) so the
override is single-use; update the same block where override_model is checked
and resolved (the code that calls resolve_model_id and assigns phase_model) to
clear override_model immediately after selecting the model.
| # === ENHANCED RECOVERY: Track recovery state === | ||
| pending_recovery_action: RecoveryAction | None = None | ||
| override_model: str | None = None # For model fallback | ||
| recovery_guidance: str | None = None # Strategy guidance for next attempt | ||
| current_client = client # Track current client (may be replaced for fallback) | ||
|
|
There was a problem hiding this comment.
Fallback chain should progress from the current model, not the base model.
current_model_shorthand is derived from the original model argument, so repeated fallback retries can keep selecting the same fallback instead of progressing (e.g., opus → sonnet → haiku). Track the active model and derive the next fallback from that value.
🛠️ Suggested fix
- current_client = client # Track current client (may be replaced for fallback)
+ current_client = client # Track current client (may be replaced for fallback)
+ active_model = model # Track active model for fallback chaining
...
- current_model_shorthand = model
+ current_model_shorthand = active_model
...
- override_model = fallback_chain[0] # Use first fallback
+ override_model = fallback_chain[0] # Use first fallback
+ active_model = override_modelAlso applies to: 581-599
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/backend/qa/fixer.py` around lines 92 - 97, The fallback selection
currently derives current_model_shorthand from the original model argument,
causing repeated retries to pick the same fallback; change the logic to track
and use an active model variable (e.g., add/maintain current_model or
current_model_shorthand alongside current_client and override_model) and update
it whenever you replace the client or set override_model so that
get_next_fallback (or whatever fallback lookup uses current_model_shorthand)
computes the next fallback from the active model, not the base model; apply the
same change where similar fallback progression occurs (the other block
referenced around the 581-599 area) so each retry advances through the fallback
chain (opus → sonnet → haiku) rather than restarting from the original model.
| # Recreate client if model fallback is needed | ||
| if override_model: | ||
| print(f"🔄 Using fallback model: {override_model}") | ||
| fallback_model = resolve_model_id(override_model) | ||
| # Close old client if it's not the original one | ||
| if current_client != client: | ||
| try: | ||
| await current_client.__aexit__(None, None, None) | ||
| except Exception: | ||
| pass # Ignore cleanup errors | ||
| # Create new client with fallback model | ||
| current_client = create_client( | ||
| project_dir, | ||
| spec_dir, | ||
| fallback_model, | ||
| agent_type="qa_fixer", | ||
| max_thinking_tokens=None, | ||
| ) | ||
| # Enter async context for new client | ||
| await current_client.__aenter__() | ||
| # Reset override after creating new client | ||
| override_model = None | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's examine the full context of the fixer.py file
wc -l apps/backend/qa/fixer.pyRepository: OBenner/Auto-Coding
Length of output: 91
🏁 Script executed:
#!/bin/bash
# Get the function containing lines 236-258
sed -n '1,300p' apps/backend/qa/fixer.py | head -300Repository: OBenner/Auto-Coding
Length of output: 11821
🏁 Script executed:
#!/bin/bash
# Check for run_qa_fixer_session callers and create_client usage
rg -n "run_qa_fixer_session" -g '*.py' apps/backendRepository: OBenner/Auto-Coding
Length of output: 721
🏁 Script executed:
#!/bin/bash
# Look for the complete function signature and structure
rg -n "async def run_qa_fixer_session\|def run_qa_fixer_session" apps/backend/qa/fixer.py -A 200 | head -250Repository: OBenner/Auto-Coding
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Get the rest of the function after line 300 to see all return paths
sed -n '200,400p' apps/backend/qa/fixer.pyRepository: OBenner/Auto-Coding
Length of output: 9354
🏁 Script executed:
#!/bin/bash
# Get the final part of the function to see how it ends
tail -300 apps/backend/qa/fixer.py | head -350Repository: OBenner/Auto-Coding
Length of output: 12700
🏁 Script executed:
#!/bin/bash
# Search for all return statements in the function
rg -n "return " apps/backend/qa/fixer.py | grep -E "(line [0-9]+|[0-9]+:)"Repository: OBenner/Auto-Coding
Length of output: 527
Close fallback ClaudeSDKClient instances to avoid leaking connections.
When a fallback model is used (lines 236-258), a new client is created and __aenter__() is called. However, this client is never properly closed if the function returns early—either via successful completion (lines 487, 522) or early exit on error (lines 622, 640). The client only gets cleaned up if another fallback is created on a subsequent iteration. For long-running sessions or multiple fallback attempts, this can leak connections and resources. Add a try/finally block wrapping the iteration loop to ensure any non-original client is always closed on exit.
Suggested fix (structure)
- for fixer_iteration in range(1, MAX_FIXER_ITERATIONS + 1):
+ try:
+ for fixer_iteration in range(1, MAX_FIXER_ITERATIONS + 1):
...
- return "fixed", response_text
+ return "fixed", response_text
+ finally:
+ if current_client is not client:
+ try:
+ await current_client.__aexit__(None, None, None)
+ except Exception:
+ pass🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/backend/qa/fixer.py` around lines 236 - 258, The fallback client created
when override_model is set (using resolve_model_id() then create_client(...) and
awaited with current_client.__aenter__()) is not guaranteed to be closed; wrap
the iteration loop that creates/uses current_client in a try/finally so that in
the finally block you check if current_client is not the original client and, if
so, await current_client.__aexit__(None, None, None) to always close the
fallback ClaudeSDKClient; ensure this cleanup logic references the same
identifiers (current_client, client, override_model, create_client, __aenter__,
__aexit__) and does not suppress exceptions other than safe cleanup errors.
| def get_most_successful_strategy(self) -> tuple[str, float] | None: | ||
| """ | ||
| Get the strategy with the highest success rate. | ||
|
|
||
| Returns: | ||
| Tuple of (strategy_name, success_rate_percent) or None if no strategies used | ||
| """ | ||
| strategy_stats = self.get_strategy_statistics() | ||
|
|
||
| if not strategy_stats: | ||
| return None | ||
|
|
||
| # Find strategy with highest success rate (min 2 uses to avoid statistical noise) | ||
| best_strategy = None | ||
| best_rate = 0.0 | ||
|
|
||
| for strategy, stats in strategy_stats.items(): | ||
| if stats["total_uses"] >= 2 and stats["success_rate_percent"] > best_rate: | ||
| best_strategy = strategy | ||
| best_rate = stats["success_rate_percent"] | ||
|
|
||
| return (best_strategy, best_rate) if best_strategy else None |
There was a problem hiding this comment.
Handle 0% success strategies to avoid returning None.
If all strategies have ≥2 uses but 0% success, best_strategy never gets set because best_rate starts at 0.0 and the comparison uses >. That hides available data in the summary.
🛠️ Suggested fix
- best_rate = 0.0
+ best_rate = -1.0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def get_most_successful_strategy(self) -> tuple[str, float] | None: | |
| """ | |
| Get the strategy with the highest success rate. | |
| Returns: | |
| Tuple of (strategy_name, success_rate_percent) or None if no strategies used | |
| """ | |
| strategy_stats = self.get_strategy_statistics() | |
| if not strategy_stats: | |
| return None | |
| # Find strategy with highest success rate (min 2 uses to avoid statistical noise) | |
| best_strategy = None | |
| best_rate = 0.0 | |
| for strategy, stats in strategy_stats.items(): | |
| if stats["total_uses"] >= 2 and stats["success_rate_percent"] > best_rate: | |
| best_strategy = strategy | |
| best_rate = stats["success_rate_percent"] | |
| return (best_strategy, best_rate) if best_strategy else None | |
| def get_most_successful_strategy(self) -> tuple[str, float] | None: | |
| """ | |
| Get the strategy with the highest success rate. | |
| Returns: | |
| Tuple of (strategy_name, success_rate_percent) or None if no strategies used | |
| """ | |
| strategy_stats = self.get_strategy_statistics() | |
| if not strategy_stats: | |
| return None | |
| # Find strategy with highest success rate (min 2 uses to avoid statistical noise) | |
| best_strategy = None | |
| best_rate = -1.0 | |
| for strategy, stats in strategy_stats.items(): | |
| if stats["total_uses"] >= 2 and stats["success_rate_percent"] > best_rate: | |
| best_strategy = strategy | |
| best_rate = stats["success_rate_percent"] | |
| return (best_strategy, best_rate) if best_strategy else None |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/backend/qa/recovery_metrics.py` around lines 307 - 328, In
get_most_successful_strategy, best_strategy stays None when all qualifying
strategies have 0.0% because best_rate is initialized to 0.0 and the comparison
uses '>'; change the selection logic so a 0% success strategy can be chosen
(e.g., initialize best_rate to -1.0 or use '>=' with a guard for first
assignment) while still enforcing the "min 2 uses" rule (check
stats["total_uses"] >= 2) and comparing stats["success_rate_percent"] to
best_rate before assigning best_strategy and best_rate.
| def export_pending_failures(self, output_path: Path | str | None = None) -> str: | ||
| """ | ||
| Export pending failures to a human-readable format. | ||
|
|
||
| Args: | ||
| output_path: Optional path to save the export (default: spec_dir/dlq_report.txt) | ||
|
|
||
| Returns: | ||
| Formatted report as a string | ||
| """ | ||
| pending = self.get_pending_failures() | ||
| if not pending: | ||
| report = "No pending failures in the dead-letter queue.\n" | ||
| else: | ||
| lines = ["DEAD-LETTER QUEUE - PENDING FAILURES", "=" * 50, ""] | ||
| for failure in pending: | ||
| lines.append(f"ID: {failure['id']}") | ||
| lines.append(f"Subtask: {failure['subtask_id']}") | ||
| lines.append(f"Type: {failure['failure_type']}") | ||
| lines.append(f"Attempts: {failure['attempt_count']}") | ||
| lines.append(f"Timestamp: {failure['timestamp']}") | ||
| lines.append(f"Error: {failure['error_message'][:200]}") | ||
| if failure.get("recovery_action"): | ||
| lines.append(f"Last Action: {failure['recovery_action']}") | ||
| lines.append("-" * 50) | ||
| report = "\n".join(lines) | ||
|
|
||
| # Save to file if path provided | ||
| if output_path: | ||
| path = Path(output_path) | ||
| path.write_text(report, encoding="utf-8") | ||
|
|
There was a problem hiding this comment.
Honor the documented default export path.
The docstring says the default is spec_dir/dlq_report.txt, but when output_path is None nothing is written. Either implement the default or update the docs to avoid surprises.
🛠️ Suggested fix
- # Save to file if path provided
- if output_path:
- path = Path(output_path)
- path.write_text(report, encoding="utf-8")
+ # Save to file if path provided (or default)
+ if output_path is None:
+ output_path = self.spec_dir / "dlq_report.txt"
+ if output_path:
+ path = Path(output_path)
+ path.write_text(report, encoding="utf-8")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def export_pending_failures(self, output_path: Path | str | None = None) -> str: | |
| """ | |
| Export pending failures to a human-readable format. | |
| Args: | |
| output_path: Optional path to save the export (default: spec_dir/dlq_report.txt) | |
| Returns: | |
| Formatted report as a string | |
| """ | |
| pending = self.get_pending_failures() | |
| if not pending: | |
| report = "No pending failures in the dead-letter queue.\n" | |
| else: | |
| lines = ["DEAD-LETTER QUEUE - PENDING FAILURES", "=" * 50, ""] | |
| for failure in pending: | |
| lines.append(f"ID: {failure['id']}") | |
| lines.append(f"Subtask: {failure['subtask_id']}") | |
| lines.append(f"Type: {failure['failure_type']}") | |
| lines.append(f"Attempts: {failure['attempt_count']}") | |
| lines.append(f"Timestamp: {failure['timestamp']}") | |
| lines.append(f"Error: {failure['error_message'][:200]}") | |
| if failure.get("recovery_action"): | |
| lines.append(f"Last Action: {failure['recovery_action']}") | |
| lines.append("-" * 50) | |
| report = "\n".join(lines) | |
| # Save to file if path provided | |
| if output_path: | |
| path = Path(output_path) | |
| path.write_text(report, encoding="utf-8") | |
| def export_pending_failures(self, output_path: Path | str | None = None) -> str: | |
| """ | |
| Export pending failures to a human-readable format. | |
| Args: | |
| output_path: Optional path to save the export (default: spec_dir/dlq_report.txt) | |
| Returns: | |
| Formatted report as a string | |
| """ | |
| pending = self.get_pending_failures() | |
| if not pending: | |
| report = "No pending failures in the dead-letter queue.\n" | |
| else: | |
| lines = ["DEAD-LETTER QUEUE - PENDING FAILURES", "=" * 50, ""] | |
| for failure in pending: | |
| lines.append(f"ID: {failure['id']}") | |
| lines.append(f"Subtask: {failure['subtask_id']}") | |
| lines.append(f"Type: {failure['failure_type']}") | |
| lines.append(f"Attempts: {failure['attempt_count']}") | |
| lines.append(f"Timestamp: {failure['timestamp']}") | |
| lines.append(f"Error: {failure['error_message'][:200]}") | |
| if failure.get("recovery_action"): | |
| lines.append(f"Last Action: {failure['recovery_action']}") | |
| lines.append("-" * 50) | |
| report = "\n".join(lines) | |
| # Save to file if path provided (or default) | |
| if output_path is None: | |
| output_path = self.spec_dir / "dlq_report.txt" | |
| if output_path: | |
| path = Path(output_path) | |
| path.write_text(report, encoding="utf-8") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/backend/services/dead_letter_queue.py` around lines 306 - 337, The
export_pending_failures function currently only writes the report when
output_path is provided, contradicting the docstring default of
spec_dir/dlq_report.txt; update export_pending_failures to use the documented
default by setting path = Path(output_path) if output_path is given else
Path(self.spec_dir) / "dlq_report.txt" (or similar) and write the report to that
path, ensuring report is still returned; reference the export_pending_failures
method and get_pending_failures for locating the logic and ensure the docstring
matches the implemented behavior.
| def calculate_backoff_delay(self, attempt_count: int) -> float: | ||
| """ | ||
| Calculate exponential backoff delay for retry attempts. | ||
|
|
||
| Implements exponential backoff to prevent API rate limiting and | ||
| thundering herd problems. Delay doubles with each attempt, capped | ||
| at BACKOFF_MAX_DELAY. | ||
|
|
||
| Formula: delay = min(base * multiplier^attempt, max_delay) | ||
| Example progression (base=1.0, multiplier=2.0): | ||
| - Attempt 0: 1.0s | ||
| - Attempt 1: 2.0s | ||
| - Attempt 2: 4.0s | ||
| - Attempt 3: 8.0s | ||
| - Attempt 4: 16.0s | ||
| - Attempt 5: 32.0s | ||
| - Attempt 6+: 60.0s (capped) | ||
|
|
||
| Args: | ||
| attempt_count: Number of previous attempts (0-indexed) | ||
|
|
||
| Returns: | ||
| Delay in seconds before next retry | ||
| """ | ||
| if attempt_count <= 0: | ||
| return 0.0 | ||
|
|
||
| # Calculate exponential delay: base * multiplier^attempt | ||
| delay = BACKOFF_BASE_DELAY * (BACKOFF_MULTIPLIER**attempt_count) | ||
|
|
||
| # Cap at maximum delay to prevent excessively long waits | ||
| return min(delay, BACKOFF_MAX_DELAY) |
There was a problem hiding this comment.
Docstring example contradicts the implementation.
The docstring states "Attempt 0: 1.0s" but the code returns 0.0 for attempt_count <= 0. This is confusing for callers.
Either update the docstring to reflect actual behavior, or adjust the implementation if the first attempt should indeed have a 1.0s delay.
📝 Option 1: Fix docstring to match implementation (recommended)
Example progression (base=1.0, multiplier=2.0):
- - Attempt 0: 1.0s
- - Attempt 1: 2.0s
- - Attempt 2: 4.0s
- - Attempt 3: 8.0s
- - Attempt 4: 16.0s
- - Attempt 5: 32.0s
- - Attempt 6+: 60.0s (capped)
+ - Attempt 0: 0.0s (no delay for first attempt)
+ - Attempt 1: 2.0s
+ - Attempt 2: 4.0s
+ - Attempt 3: 8.0s
+ - Attempt 4: 16.0s
+ - Attempt 5: 32.0s
+ - Attempt 6+: 60.0s (capped)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def calculate_backoff_delay(self, attempt_count: int) -> float: | |
| """ | |
| Calculate exponential backoff delay for retry attempts. | |
| Implements exponential backoff to prevent API rate limiting and | |
| thundering herd problems. Delay doubles with each attempt, capped | |
| at BACKOFF_MAX_DELAY. | |
| Formula: delay = min(base * multiplier^attempt, max_delay) | |
| Example progression (base=1.0, multiplier=2.0): | |
| - Attempt 0: 1.0s | |
| - Attempt 1: 2.0s | |
| - Attempt 2: 4.0s | |
| - Attempt 3: 8.0s | |
| - Attempt 4: 16.0s | |
| - Attempt 5: 32.0s | |
| - Attempt 6+: 60.0s (capped) | |
| Args: | |
| attempt_count: Number of previous attempts (0-indexed) | |
| Returns: | |
| Delay in seconds before next retry | |
| """ | |
| if attempt_count <= 0: | |
| return 0.0 | |
| # Calculate exponential delay: base * multiplier^attempt | |
| delay = BACKOFF_BASE_DELAY * (BACKOFF_MULTIPLIER**attempt_count) | |
| # Cap at maximum delay to prevent excessively long waits | |
| return min(delay, BACKOFF_MAX_DELAY) | |
| def calculate_backoff_delay(self, attempt_count: int) -> float: | |
| """ | |
| Calculate exponential backoff delay for retry attempts. | |
| Implements exponential backoff to prevent API rate limiting and | |
| thundering herd problems. Delay doubles with each attempt, capped | |
| at BACKOFF_MAX_DELAY. | |
| Formula: delay = min(base * multiplier^attempt, max_delay) | |
| Example progression (base=1.0, multiplier=2.0): | |
| - Attempt 0: 0.0s (no delay for first attempt) | |
| - Attempt 1: 2.0s | |
| - Attempt 2: 4.0s | |
| - Attempt 3: 8.0s | |
| - Attempt 4: 16.0s | |
| - Attempt 5: 32.0s | |
| - Attempt 6+: 60.0s (capped) | |
| Args: | |
| attempt_count: Number of previous attempts (0-indexed) | |
| Returns: | |
| Delay in seconds before next retry | |
| """ | |
| if attempt_count <= 0: | |
| return 0.0 | |
| # Calculate exponential delay: base * multiplier^attempt | |
| delay = BACKOFF_BASE_DELAY * (BACKOFF_MULTIPLIER**attempt_count) | |
| # Cap at maximum delay to prevent excessively long waits | |
| return min(delay, BACKOFF_MAX_DELAY) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/backend/services/recovery.py` around lines 209 - 240, The docstring for
calculate_backoff_delay is inconsistent with the implementation: it claims
"Attempt 0: 1.0s" but the function returns 0.0 for attempt_count <= 0; update
the docstring (not the logic) to state that attempt_count is 0-indexed and that
attempt_count == 0 returns 0.0 (no delay), then show progression starting at
attempt_count == 1: BACKOFF_BASE_DELAY (e.g., Attempt 1: BACKOFF_BASE_DELAY,
Attempt 2: BACKOFF_BASE_DELAY * BACKOFF_MULTIPLIER, ... capped at
BACKOFF_MAX_DELAY) and mention the constants BACKOFF_BASE_DELAY,
BACKOFF_MULTIPLIER, and BACKOFF_MAX_DELAY so callers understand the behavior.
| def test_exponential_backoff(manager: RecoveryManager, results: TestResults): | ||
| """Test 2: Verify exponential backoff calculations.""" | ||
| print("\n--- Test 2: Exponential Backoff ---") | ||
|
|
||
| expected_delays = [ | ||
| (0, 0.0), # First attempt - no delay | ||
| (1, 1.0), # 1.0 * 2^1 = 2.0, but wait starts after first failure | ||
| (2, 2.0), # 1.0 * 2^2 = 4.0 | ||
| (3, 4.0), # 1.0 * 2^3 = 8.0 | ||
| (4, 8.0), # 1.0 * 2^4 = 16.0 | ||
| (5, 16.0), # 1.0 * 2^5 = 32.0 | ||
| (6, 32.0), # 1.0 * 2^6 = 64.0, capped at 60.0 | ||
| (10, 60.0), # Capped at max | ||
| ] | ||
|
|
||
| for attempt, expected_delay in expected_delays: | ||
| actual_delay = manager.calculate_backoff_delay(attempt) | ||
| # Allow for small floating point differences | ||
| if abs(actual_delay - expected_delay) < 0.01: | ||
| results.add_pass( | ||
| f"Backoff delay for attempt {attempt}", | ||
| f"{actual_delay:.1f}s (expected {expected_delay:.1f}s)", | ||
| ) | ||
| else: | ||
| results.add_fail( | ||
| f"Backoff delay for attempt {attempt}", | ||
| f"Expected {expected_delay:.1f}s, got {actual_delay:.1f}s", | ||
| ) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find apps/backend -name "*.py" -type f | grep -E "(recovery|backoff)" | head -20Repository: OBenner/Auto-Coding
Length of output: 369
🏁 Script executed:
cat -n apps/backend/recovery.py | head -100Repository: OBenner/Auto-Coding
Length of output: 530
🏁 Script executed:
cat -n apps/backend/services/recovery.py | head -150Repository: OBenner/Auto-Coding
Length of output: 6068
🏁 Script executed:
rg -n "def calculate_backoff_delay" apps/backend/services/recovery.py -A 10Repository: OBenner/Auto-Coding
Length of output: 581
🏁 Script executed:
rg -n "def calculate_backoff_delay" apps/backend/services/recovery.py -A 20Repository: OBenner/Auto-Coding
Length of output: 897
🏁 Script executed:
rg -n "def calculate_backoff_delay" apps/backend/services/recovery.py -A 25Repository: OBenner/Auto-Coding
Length of output: 1047
🏁 Script executed:
rg -n "def calculate_backoff_delay" apps/backend/services/recovery.py -A 30Repository: OBenner/Auto-Coding
Length of output: 1272
🏁 Script executed:
rg -n "def calculate_backoff_delay" apps/backend/services/recovery.py -A 35Repository: OBenner/Auto-Coding
Length of output: 1477
🏁 Script executed:
cat -n apps/backend/test_recovery_loop.py | grep -A 30 "backoff\|BACKOFF" | head -60Repository: OBenner/Auto-Coding
Length of output: 2264
🏁 Script executed:
rg -n "expected_delays = " apps/backend/test_recovery_loop.py -A 10Repository: OBenner/Auto-Coding
Length of output: 483
Fix expected backoff delay values to match the exponential formula.
The expected_delays values are incorrect due to an off-by-one pattern. The implementation calculates delay = min(1.0 * 2^attempt, 60.0), which produces: attempt 0→0.0s, 1→2.0s, 2→4.0s, 3→8.0s, 4→16.0s, 5→32.0s, 6→60.0s. However, your test expects: 1→1.0s, 2→2.0s, 3→4.0s, etc. This test will fail when executed. Update expected_delays to [(0, 0.0), (1, 2.0), (2, 4.0), (3, 8.0), (4, 16.0), (5, 32.0), (6, 60.0), (10, 60.0)] to match test_recovery_loop.py and the actual calculate_backoff_delay() implementation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/backend/test_recovery_e2e.py` around lines 139 - 167, Update the
expected_delays in test_exponential_backoff to match the implementation of
calculate_backoff_delay (delay = min(1.0 * 2**attempt, 60.0)); replace the
current list with [(0, 0.0), (1, 2.0), (2, 4.0), (3, 8.0), (4, 16.0), (5, 32.0),
(6, 60.0), (10, 60.0)] so the test expectations align with the exponential
backoff calculation used by RecoveryManager.calculate_backoff_delay.
| def run_all_tests(self): | ||
| """Run all recovery loop tests.""" | ||
| print("\n" + "=" * 70) | ||
| print("RUNNING RECOVERY LOOP TESTS") | ||
| print("=" * 70) | ||
|
|
||
| tests = [ | ||
| self.test_1_failure_classification, | ||
| self.test_2_exponential_backoff, | ||
| self.test_3_retry_strategy_selection, | ||
| self.test_4_recovery_action_determination, | ||
| self.test_5_dead_letter_queue, | ||
| self.test_6_notification_thresholds, | ||
| self.test_7_full_recovery_loop_simulation, | ||
| self.test_8_unknown_failure_escalation, | ||
| self.test_9_is_recoverable_method, | ||
| ] | ||
|
|
||
| for test in tests: | ||
| try: | ||
| test() | ||
| except Exception as e: | ||
| print(f"\n✗ Test failed with exception: {e}") | ||
| import traceback | ||
|
|
||
| traceback.print_exc() | ||
|
|
||
| self.print_summary() | ||
|
|
||
| def print_summary(self): | ||
| """Print test summary.""" | ||
| print("\n" + "=" * 70) | ||
| print("TEST SUMMARY") | ||
| print("=" * 70) | ||
|
|
||
| passed = sum(1 for r in self.test_results if r["passed"]) | ||
| total = len(self.test_results) | ||
| pass_rate = (passed / total * 100) if total > 0 else 0 | ||
|
|
||
| print(f"\nTotal Tests: {total}") | ||
| print(f"Passed: {passed}") | ||
| print(f"Failed: {total - passed}") | ||
| print(f"Pass Rate: {pass_rate:.1f}%") | ||
|
|
||
| if passed == total: | ||
| print("\n✓ ALL TESTS PASSED - Recovery loop is working correctly!") | ||
| return True | ||
| else: | ||
| print("\n✗ SOME TESTS FAILED - Review failures above") | ||
| return False | ||
|
|
There was a problem hiding this comment.
run_all_tests never returns a boolean, so the CLI always exits 1.
main() uses the return value to determine the exit code, but run_all_tests() returns None unconditionally, making the script fail even when tests pass.
🛠️ Suggested fix
- self.print_summary()
+ return self.print_summary()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/backend/test_recovery_loop.py` around lines 566 - 616, run_all_tests
currently calls self.print_summary() but discards its boolean result, causing
main() to receive None; update run_all_tests (the method that iterates tests and
calls self.print_summary()) to return the boolean returned by print_summary()
(i.e., capture result = self.print_summary() and return result) so the CLI's
exit code logic receives the correct pass/fail value; ensure no other early
returns break this flow and keep using self.test_results in print_summary()
as-is.
| ### Phase 1: Smart Failure Detection ✅ | ||
| - **subtask-1-1:** Added error pattern database to classify_failure (5 failure types) | ||
| - **subtask-1-2:** Added is_recoverable() method to FailureType enum | ||
|
|
||
| ### Phase 2: Alternative Retry Strategies ✅ | ||
| - **subtask-2-1:** Added exponential backoff to RecoveryManager (1s→60s, 2x multiplier) | ||
| - **subtask-2-2:** Integrated model_fallback into determine_recovery_action | ||
| - **subtask-2-3:** Added alternative strategy selection logic (direct → fallback → alternative) | ||
|
|
||
| ### Phase 3: Dead-Letter Queue ✅ | ||
| - **subtask-3-1:** Created DeadLetterQueue class (339 lines) | ||
| - **subtask-3-2:** Integrated DLQ with RecoveryManager | ||
|
|
||
| ### Phase 4: Notification Thresholds ✅ | ||
| - **subtask-4-1:** Created NotificationManager class | ||
| - **subtask-4-2:** Integrated notifications with RecoveryManager | ||
|
|
||
| ### Phase 5: Recovery Statistics ✅ | ||
| - **subtask-5-1:** Added strategy tracking to RecoveryMetrics | ||
|
|
||
| ### Phase 6: Integration ✅ | ||
| - **subtask-6-1:** Update coder agent to use enhanced recovery | ||
| - **subtask-6-2:** Update qa_fixer to use enhanced recovery | ||
| - **subtask-6-3:** End-to-end verification of recovery loop ⭐ **THIS SUBTASK** | ||
|
|
There was a problem hiding this comment.
Add blank lines around headings to satisfy MD022.
Markdownlint flags several headings missing surrounding blank lines; fixing these improves consistency and lint cleanliness.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 13-13: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 17-17: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 26-26: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 30-30: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 33-33: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@RECOVERY_LOOP_VERIFICATION_SUMMARY.md` around lines 13 - 37, The markdown
headings (e.g., "Phase 1: Smart Failure Detection ✅", "Phase 2: Alternative
Retry Strategies ✅", etc.) lack surrounding blank lines which triggers MD022;
fix by adding a single blank line both above and below each top-level heading
and subheading in RECOVERY_LOOP_VERIFICATION_SUMMARY.md so every heading is
separated from adjacent paragraphs/lists, ensuring consistent spacing throughout
the Phase 1–6 headings and their subtasks.
| To use the enhanced recovery: | ||
| 1. Coder agents automatically use enhanced recovery via `agents/coder.py` | ||
| 2. QA fixer automatically uses enhanced recovery via `qa/fixer.py` | ||
| 3. Recovery statistics tracked in `.auto-claude/specs/XXX/memory/recovery_metrics.json` | ||
| 4. Dead-letter queue available in `.auto-claude/specs/XXX/memory/dead_letter_queue.json` | ||
| 5. Notification history in `.auto-claude/specs/XXX/memory/notifications.json` |
There was a problem hiding this comment.
Remove the duplicated “automatically”.
LanguageTool flagged the repeated adverb; simplifying the phrasing reads better.
🛠️ Suggested edit
-1. Coder agents automatically use enhanced recovery via `agents/coder.py`
-2. QA fixer automatically uses enhanced recovery via `qa/fixer.py`
+1. Coder agents use enhanced recovery via `agents/coder.py`
+2. QA fixer uses enhanced recovery via `qa/fixer.py`🧰 Tools
🪛 LanguageTool
[style] ~122-~122: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...overy via agents/coder.py 2. QA fixer automatically uses enhanced recovery via `qa/fixer.py...
(ADVERB_REPETITION_PREMIUM)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@RECOVERY_LOOP_VERIFICATION_SUMMARY.md` around lines 120 - 125, The lines
mentioning agents/coder.py and qa/fixer.py repeat the adverb "automatically";
edit the two bullet points so each reads without the duplicated word (e.g.,
"Coder agents use enhanced recovery via `agents/coder.py`" and "QA fixer uses
enhanced recovery via `qa/fixer.py`"), leaving the other bullets about
`.auto-claude/specs/XXX/memory/recovery_metrics.json`, `dead_letter_queue.json`,
and `notifications.json` unchanged.
The enhanced recovery system now returns reason strings with the format "failure_type: Strategy description (attempt N/M)" instead of the old "retry with different approach" format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|



Enhance the existing auto-recovery mechanism with smarter failure detection, automated rollback, retry with alternative strategies, and learning from failures. Includes exponential backoff, dead-letter queue for unrecoverable failures, and user notification thresholds.
Summary by CodeRabbit
Release Notes
New Features
Tests