fix: recover agent state on timeout instead of losing progress #162
Merged
LeeCampbell merged 1 commit into HdrHistogram:main on Mar 23, 2026
When Claude timed out (exit 124), `set -euo pipefail` in agent-loop.sh prevented sync_state from running, so all work from that iteration was lost. Additionally, entrypoint.sh treated timeout as fatal and broke the loop, preventing any retry.

Now run_claude captures the timeout exit code so sync_state always runs to preserve progress, and entrypoint.sh continues the loop on timeout so the next iteration can pick up where the previous one left off.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
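The capture pattern described in the commit message can be sketched roughly as follows. This is not the actual diff: `fake_claude`, the `SYNCED` flag, and the bodies of `run_claude` and `sync_state` are hypothetical stand-ins, assuming only that a timed-out command exits with 124 and that the script runs under `set -euo pipefail`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the real Claude invocation.
# It exits 124, the status timeout(1) reports when the time limit is hit.
fake_claude() { return 124; }

# Placeholder for the real sync_state, which commits and pushes progress.
SYNCED=0
sync_state() { SYNCED=1; }

run_claude() {
  local rc=0
  # A command on the left of `||` does not trigger errexit,
  # so the failing status is recorded instead of killing the script.
  fake_claude || rc=$?
  return "$rc"
}

CLAUDE_RC=0
run_claude || CLAUDE_RC=$?   # guard the call site as well
sync_state                   # reached even though run_claude "timed out"
echo "claude exit=${CLAUDE_RC} synced=${SYNCED}"
```

Without the `|| rc=$?` and `|| CLAUDE_RC=$?` guards, errexit would stop the script at the first non-zero status and `sync_state` would never be called, which is the failure mode this PR addresses.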
Summary
- `run_claude()` now captures timeout exit codes instead of letting `set -euo pipefail` kill the script, ensuring `sync_state` always runs to commit and push progress
Context
When running `./scripts/run 141`, the agent completed all implementation work for issue #141 but timed out right as it was about to mark tasks complete and move plan files to `done/`. Two compounding bugs prevented recovery:
- `set -euo pipefail` caused `agent-loop.sh` to exit before `sync_state` could commit the work
- `entrypoint.sh` treated exit code 124 (timeout) as fatal and broke the loop

The agent's work was lost and no PR was created despite all code changes being complete.
Test plan
- Run `./scripts/run <issue>` and verify that if Claude times out mid-iteration, the work is committed and pushed before the next iteration starts

🤖 Generated with Claude Code