Preserve in-progress message history when agent run errors by jahooma · Pull Request #517 · CodebuffAI/codebuff

jahooma · 2026-04-19T22:58:28Z

Summary

Fixes bug where assistant messages, tool calls, and tool results produced during a run were lost when the server returned an error (e.g. 503 timeout).
loopAgentSteps now mutates the caller's shared AgentState instead of a shallow copy, so in-progress work propagates back to the SDK's sessionState.mainAgentState even when an error is thrown up (e.g. 402 or pre-loop setup errors).
SDK's error-path getCancelledSessionState now detects runtime progress and avoids duplicating the user prompt when the runtime already added it.
Adds reproducing tests in sdk/src/__tests__/run-error-preserves-history.test.ts.

Test plan

bun test sdk/src/__tests__/ — 354 pass, 0 fail (incl. new tests)
bun test packages/agent-runtime/src/__tests__/main-prompt.test.ts — 6 pass (fixed latent test bug exposed by new mutation semantics)

🤖 Generated with Claude Code

Previously, when the server returned an error (e.g. 503 timeout) mid-run, any assistant messages, tool calls, and tool results produced since the user's last prompt were lost. loopAgentSteps created a shallow copy of the initial agent state, so mutations didn't propagate back to the SDK's shared sessionState reference if an error threw up. Now loopAgentSteps mutates the caller's state directly, and the SDK's cancellation path detects runtime progress so the user prompt isn't duplicated.

greptile-apps · 2026-04-19T23:03:05Z

Greptile Summary

This PR fixes a data-loss bug where assistant messages, tool calls, and tool results produced during an agent run were silently discarded whenever the server returned an error (e.g., a 503 timeout or a 402 payment-required). Two coordinated changes achieve the fix:

loopAgentSteps now mutates the caller's shared AgentState in-place (via direct property assignment and Object.assign) instead of working on a shallow copy. This ensures that any LLM responses or tool-call records written during the run are visible to the SDK the moment they are produced — even if an exception propagates out before the function returns normally.
getCancelledSessionState in the SDK's run.ts gains a runtimeMadeProgress guard that compares the current history length against a pre-call baseline. When the runtime added messages (i.e., progress was made), the user's original prompt is already in the history and must not be re-appended; when no progress was made, the old behaviour of adding the prompt is preserved.

Key observations:

The mutation semantics change is an intentional, clearly-commented API contract shift for loopAgentSteps. A latent test in main-prompt.test.ts that compared against sessionState.mainAgentState.stepsRemaining after the call (which is now mutated in-place) was correctly updated to capture the baseline before the call.
The runtimeMadeProgress check relies on history length strictly growing during a run. In the unlikely event that context pruning shrinks the history below its baseline while the run is still active, the check could return a false negative and duplicate the user prompt.
All currentAgentState.messageHistory = [...] assignments within the loop now propagate immediately to the shared caller state due to currentAgentState === initialAgentState. This is consistent with the PR's intent but is an implicit contract that future contributors should be aware of.

Confidence Score: 4/5

Safe to merge; both bugs and their root causes are addressed, tests pass, and issues found are non-blocking P2 style/edge-case concerns.

The fix is well-targeted and the logic holds up under scrutiny. The mutation semantics change is intentional and documented. The two P2 comments (shared-reference aliasing awareness and the history-length heuristic edge case) are informational notes rather than blocking bugs. Test coverage for the new scenario is solid. Score is 4 rather than 5 because the history-length heuristic is a subtle implicit contract that could silently fail under aggressive context-pruning combined with an error mid-run.

packages/agent-runtime/src/run-agent-step.ts — the new mutation semantics affect all callers of loopAgentSteps (main agent and subagents); worth confirming no other test suites rely on the old immutable-copy behaviour.

Important Files Changed

Filename	Overview
packages/agent-runtime/src/run-agent-step.ts	Changed `loopAgentSteps` to mutate `initialAgentState` in-place (via direct assignment and `Object.assign`) instead of a shallow copy, so in-progress work is visible to callers even when an error is thrown mid-run.
sdk/src/run.ts	Added `initialHistoryLength` baseline capture and updated `getCancelledSessionState` to detect runtime progress via history-length comparison, avoiding duplicate user-prompt insertion when the runtime already added it before throwing.
sdk/src/tests/run-error-preserves-history.test.ts	New 313-line test file reproducing the bug: simulates a 503-mid-run error via mock mutations of shared state, then verifies the preserved tool-call/result history is passed to a follow-up run.
packages/agent-runtime/src/tests/main-prompt.test.ts	Fixed a latent test assertion that compared against `sessionState.mainAgentState.stepsRemaining` after the call; now captures `initialStepsRemaining` before the call to avoid the off-by-one caused by the new mutation semantics.

Sequence Diagram

sequenceDiagram
    participant SDK as sdk/run.ts (runOnce)
    participant RT as agent-runtime (loopAgentSteps)
    participant LLM as LLM / Tool Calls

    SDK->>SDK: capture initialHistoryLength
    SDK->>RT: callMainPrompt(sessionState)
    Note over RT: initialAgentState.messageHistory = initialMessages<br/>(user msg now in shared state)
    RT->>LLM: runAgentStep (step 1)
    LLM-->>RT: agentState + messages
    Note over RT: Object.assign(initialAgentState, newAgentState)<br/>(tool calls/results now in shared state)
    RT->>LLM: runAgentStep (step N...)
    LLM--xRT: 503 / timeout error thrown
    Note over RT: Error propagates – shared state already mutated
    RT--xSDK: throws error
    SDK->>SDK: getCancelledSessionState()
    SDK->>SDK: runtimeMadeProgress? history.length > initialHistoryLength → true
    Note over SDK: Skip re-adding user prompt<br/>Preserve in-progress tool calls/results
    SDK-->>Caller: RunResult { output: error, sessionState: preserved }

Comments Outside Diff (1)

packages/agent-runtime/src/run-agent-step.ts, line 941-947 (link)

currentAgentState.messageHistory mutation escapes via shared reference

After the mutation semantics change, currentAgentState === initialAgentState for the entire loop lifetime. The line:
```
currentAgentState.messageHistory = [
  ...currentAgentState.messageHistory,
  userMessage({ ... }),
]
```
now also mutates initialAgentState.messageHistory, which is the caller's shared sessionState.mainAgentState.messageHistory. This is consistent with the PR's intent, so it won't cause a bug here. However, this is a subtle invariant: every intermediate array assignment to currentAgentState.messageHistory (including the expireMessages call after the loop) now propagates directly to the caller's shared state object rather than remaining local until the function returns.

This is fine as long as all callers expect the passed-in agentState to be mutated throughout the run, but it's worth documenting more explicitly in the function signature/JSDoc to prevent future confusion when currentAgentState.messageHistory = ... is used inside the loop.

_{Reviews (1): Last reviewed commit: "Preserve in-progress message history on ..." | Re-trigger Greptile}

greptile-apps · 2026-04-19T23:03:12Z

+    const runtimeMadeProgress =
+      state.mainAgentState.messageHistory.length > initialHistoryLength


History-length heuristic can yield false negatives with context pruning

runtimeMadeProgress is determined solely by comparing the current history length against initialHistoryLength. If a run starts, the runtime mutates the history to add messages, but then a context-pruning pass removes enough messages to bring the length back at or below initialHistoryLength before the error is thrown, runtimeMadeProgress will be false. The user's prompt would then be inserted again by getCancelledSessionState, potentially duplicating it.

This is an unlikely edge case (pruning typically keeps recent messages including the user prompt), and the alternative of tracking whether state mutation occurred would add complexity. It's worth noting as a known limitation of the length-based check, perhaps with an inline comment:

Suggested change

const runtimeMadeProgress =

state.mainAgentState.messageHistory.length > initialHistoryLength

const runtimeMadeProgress =

state.mainAgentState.messageHistory.length > initialHistoryLength

// Note: relies on history only growing during a run. If context pruning

// shortens history below the baseline length, this may be a false negative.

…tract Addresses review feedback: - Switch SDK's runtimeMadeProgress check from history-length comparison to array-reference inequality. This is robust to context pruning shrinking history below its starting length mid-run, which would cause a false negative and duplicate the user prompt. - Document loopAgentSteps' mutation contract: it mutates params.agentState in place throughout the run so callers see in-progress work even when an error throws before a normal return.

jahooma requested review from brandonkachen and charleslien as code owners April 19, 2026 22:58

greptile-apps bot reviewed Apr 19, 2026

View reviewed changes

jahooma merged commit 5c8a0a3 into main Apr 19, 2026
29 of 34 checks passed

jahooma deleted the jahooma/preserve-error-history branch April 19, 2026 23:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preserve in-progress message history when agent run errors#517

Preserve in-progress message history when agent run errors#517
jahooma merged 2 commits intomainfrom
jahooma/preserve-error-history

jahooma commented Apr 19, 2026

Uh oh!

greptile-apps bot commented Apr 19, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps bot Apr 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		const runtimeMadeProgress =
		state.mainAgentState.messageHistory.length > initialHistoryLength

Conversation

jahooma commented Apr 19, 2026

Summary

Test plan

Uh oh!

greptile-apps bot commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (1)

Uh oh!

greptile-apps bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps bot commented Apr 19, 2026 •

edited

Loading