Skip to content

Preserve in-progress message history when agent run errors#517

Merged
jahooma merged 2 commits intomainfrom
jahooma/preserve-error-history
Apr 19, 2026
Merged

Preserve in-progress message history when agent run errors#517
jahooma merged 2 commits intomainfrom
jahooma/preserve-error-history

Conversation

@jahooma
Copy link
Copy Markdown
Contributor

@jahooma jahooma commented Apr 19, 2026

Summary

  • Fixes bug where assistant messages, tool calls, and tool results produced during a run were lost when the server returned an error (e.g. 503 timeout).
  • loopAgentSteps now mutates the caller's shared AgentState instead of a shallow copy, so in-progress work propagates back to the SDK's sessionState.mainAgentState even when an error is thrown up (e.g. 402 or pre-loop setup errors).
  • SDK's error-path getCancelledSessionState now detects runtime progress and avoids duplicating the user prompt when the runtime already added it.
  • Adds reproducing tests in sdk/src/__tests__/run-error-preserves-history.test.ts.

Test plan

  • bun test sdk/src/__tests__/ — 354 pass, 0 fail (incl. new tests)
  • bun test packages/agent-runtime/src/__tests__/main-prompt.test.ts — 6 pass (fixed latent test bug exposed by new mutation semantics)

🤖 Generated with Claude Code

Previously, when the server returned an error (e.g. 503 timeout) mid-run,
any assistant messages, tool calls, and tool results produced since the
user's last prompt were lost. loopAgentSteps created a shallow copy of the
initial agent state, so mutations didn't propagate back to the SDK's
shared sessionState reference if an error threw up.

Now loopAgentSteps mutates the caller's state directly, and the SDK's
cancellation path detects runtime progress so the user prompt isn't
duplicated.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 19, 2026

Greptile Summary

This PR fixes a data-loss bug where assistant messages, tool calls, and tool results produced during an agent run were silently discarded whenever the server returned an error (e.g., a 503 timeout or a 402 payment-required). Two coordinated changes achieve the fix:

  • loopAgentSteps now mutates the caller's shared AgentState in-place (via direct property assignment and Object.assign) instead of working on a shallow copy. This ensures that any LLM responses or tool-call records written during the run are visible to the SDK the moment they are produced — even if an exception propagates out before the function returns normally.
  • getCancelledSessionState in the SDK's run.ts gains a runtimeMadeProgress guard that compares the current history length against a pre-call baseline. When the runtime added messages (i.e., progress was made), the user's original prompt is already in the history and must not be re-appended; when no progress was made, the old behaviour of adding the prompt is preserved.

Key observations:

  • The mutation semantics change is an intentional, clearly-commented API contract shift for loopAgentSteps. A latent test in main-prompt.test.ts that compared against sessionState.mainAgentState.stepsRemaining after the call (which is now mutated in-place) was correctly updated to capture the baseline before the call.
  • The runtimeMadeProgress check relies on history length strictly growing during a run. In the unlikely event that context pruning shrinks the history below its baseline while the run is still active, the check could return a false negative and duplicate the user prompt.
  • All currentAgentState.messageHistory = [...] assignments within the loop now propagate immediately to the shared caller state due to currentAgentState === initialAgentState. This is consistent with the PR's intent but is an implicit contract that future contributors should be aware of.

Confidence Score: 4/5

Safe to merge; both bugs and their root causes are addressed, tests pass, and issues found are non-blocking P2 style/edge-case concerns.

The fix is well-targeted and the logic holds up under scrutiny. The mutation semantics change is intentional and documented. The two P2 comments (shared-reference aliasing awareness and the history-length heuristic edge case) are informational notes rather than blocking bugs. Test coverage for the new scenario is solid. Score is 4 rather than 5 because the history-length heuristic is a subtle implicit contract that could silently fail under aggressive context-pruning combined with an error mid-run.

packages/agent-runtime/src/run-agent-step.ts — the new mutation semantics affect all callers of loopAgentSteps (main agent and subagents); worth confirming no other test suites rely on the old immutable-copy behaviour.

Important Files Changed

Filename Overview
packages/agent-runtime/src/run-agent-step.ts Changed loopAgentSteps to mutate initialAgentState in-place (via direct assignment and Object.assign) instead of a shallow copy, so in-progress work is visible to callers even when an error is thrown mid-run.
sdk/src/run.ts Added initialHistoryLength baseline capture and updated getCancelledSessionState to detect runtime progress via history-length comparison, avoiding duplicate user-prompt insertion when the runtime already added it before throwing.
sdk/src/tests/run-error-preserves-history.test.ts New 313-line test file reproducing the bug: simulates a 503-mid-run error via mock mutations of shared state, then verifies the preserved tool-call/result history is passed to a follow-up run.
packages/agent-runtime/src/tests/main-prompt.test.ts Fixed a latent test assertion that compared against sessionState.mainAgentState.stepsRemaining after the call; now captures initialStepsRemaining before the call to avoid the off-by-one caused by the new mutation semantics.

Sequence Diagram

sequenceDiagram
    participant SDK as sdk/run.ts (runOnce)
    participant RT as agent-runtime (loopAgentSteps)
    participant LLM as LLM / Tool Calls

    SDK->>SDK: capture initialHistoryLength
    SDK->>RT: callMainPrompt(sessionState)
    Note over RT: initialAgentState.messageHistory = initialMessages<br/>(user msg now in shared state)
    RT->>LLM: runAgentStep (step 1)
    LLM-->>RT: agentState + messages
    Note over RT: Object.assign(initialAgentState, newAgentState)<br/>(tool calls/results now in shared state)
    RT->>LLM: runAgentStep (step N...)
    LLM--xRT: 503 / timeout error thrown
    Note over RT: Error propagates – shared state already mutated
    RT--xSDK: throws error
    SDK->>SDK: getCancelledSessionState()
    SDK->>SDK: runtimeMadeProgress? history.length > initialHistoryLength → true
    Note over SDK: Skip re-adding user prompt<br/>Preserve in-progress tool calls/results
    SDK-->>Caller: RunResult { output: error, sessionState: preserved }
Loading

Comments Outside Diff (1)

  1. packages/agent-runtime/src/run-agent-step.ts, line 941-947 (link)

    P2 currentAgentState.messageHistory mutation escapes via shared reference

    After the mutation semantics change, currentAgentState === initialAgentState for the entire loop lifetime. The line:

    currentAgentState.messageHistory = [
      ...currentAgentState.messageHistory,
      userMessage({ ... }),
    ]

    now also mutates initialAgentState.messageHistory, which is the caller's shared sessionState.mainAgentState.messageHistory. This is consistent with the PR's intent, so it won't cause a bug here. However, this is a subtle invariant: every intermediate array assignment to currentAgentState.messageHistory (including the expireMessages call after the loop) now propagates directly to the caller's shared state object rather than remaining local until the function returns.

    This is fine as long as all callers expect the passed-in agentState to be mutated throughout the run, but it's worth documenting more explicitly in the function signature/JSDoc to prevent future confusion when currentAgentState.messageHistory = ... is used inside the loop.

Reviews (1): Last reviewed commit: "Preserve in-progress message history on ..." | Re-trigger Greptile

Comment thread sdk/src/run.ts Outdated
Comment on lines +298 to +299
const runtimeMadeProgress =
state.mainAgentState.messageHistory.length > initialHistoryLength
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 History-length heuristic can yield false negatives with context pruning

runtimeMadeProgress is determined solely by comparing the current history length against initialHistoryLength. If a run starts, the runtime mutates the history to add messages, but then a context-pruning pass removes enough messages to bring the length back at or below initialHistoryLength before the error is thrown, runtimeMadeProgress will be false. The user's prompt would then be inserted again by getCancelledSessionState, potentially duplicating it.

This is an unlikely edge case (pruning typically keeps recent messages including the user prompt), and the alternative of tracking whether state mutation occurred would add complexity. It's worth noting as a known limitation of the length-based check, perhaps with an inline comment:

Suggested change
const runtimeMadeProgress =
state.mainAgentState.messageHistory.length > initialHistoryLength
const runtimeMadeProgress =
state.mainAgentState.messageHistory.length > initialHistoryLength
// Note: relies on history only growing during a run. If context pruning
// shortens history below the baseline length, this may be a false negative.

…tract

Addresses review feedback:
- Switch SDK's runtimeMadeProgress check from history-length comparison to
  array-reference inequality. This is robust to context pruning shrinking
  history below its starting length mid-run, which would cause a false
  negative and duplicate the user prompt.
- Document loopAgentSteps' mutation contract: it mutates params.agentState
  in place throughout the run so callers see in-progress work even when
  an error throws before a normal return.
@jahooma jahooma merged commit 5c8a0a3 into main Apr 19, 2026
29 of 34 checks passed
@jahooma jahooma deleted the jahooma/preserve-error-history branch April 19, 2026 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant