fix(think,ai-chat): preserve settled work when a recovery turn is given up (#1631)#1634
Merged
Merged
Conversation
🦋 Changeset detectedLatest commit: f4bfb14 The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
This was referenced May 31, 2026
Merged
agents
@cloudflare/ai-chat
@cloudflare/codemode
hono-agents
@cloudflare/shell
@cloudflare/think
@cloudflare/voice
@cloudflare/worker-bundler
commit: |
1893c2a to
a4f8698
Compare
…en up (#1631) Two paths could throw away a partial assistant message holding completed, often non-idempotent tool results — the most valuable, least-reproducible state in a turn: 1. Framework budget exhaustion sealed the turn (terminal status + banner) BEFORE the orphaned stream was persisted, so settled tool results were discarded and re-run on the next message. Exhaustion now persists the settled partial first, reusing the normal path's gating so it can't duplicate an already-saved partial. (This now also covers N1/#1638's wall-clock and no-progress exhaustion paths, not just the attempt cap.) 2. A subclass onChatRecovery returning { persist: false } silently dropped the settled partial. Settled work is now NEVER dropped: { persist: false } only suppresses persistence of a partial that has nothing settled to lose; a partial carrying settled tool results is persisted regardless. An app can no longer accidentally discard completed work, and never needs { persist: true } just to stay safe. A safe default beats a warning about an unsafe one (R1). Adds _shouldPersistOrphanedPartial / _partialHasSettledToolResults helpers. Applied identically to @cloudflare/think and @cloudflare/ai-chat. Tests: - Unit (think + ai-chat): exhaustion preserves a settled TOOL RESULT (not just text); { persist: false } never drops settled tool results, and is honored for a text-only partial. (Exhaustion test aged past N1/#1638's alarm-debounce so the wake counts as a genuine attempt and actually exhausts.) - E2E (think, real SIGKILL): a recordStep loop whose onChatRecovery returns { persist: false, continue: false } is killed mid-turn; after recovery the settled tool results produced before the kill are still in the durable transcript (R1) and the turn does not continue. (ThinkPersistFalseE2EAgent.) NOTE on coverage: the EXHAUSTION-with-prior-settled-work path stays unit-tested (not e2e) on purpose — under N1's budget, settling a tool result IS forward progress that resets the budget and prevents exhaustion, so that scenario can't be forced deterministically under real churn. Rebased onto main (dropping the already-merged #1633 commit). Co-authored-by: Cursor <cursoragent@cursor.com>
a4f8698 to
f4bfb14
Compare
Contributor
Author
Review-driven coverage hardening (pushed)Following a confidence review of the test coverage, added:
Two findings from the review (both resolved/understood, no code change needed):
Suites green: think 463, ai-chat 485; new e2e green (2 deterministic runs, ~20s); |
threepointone
added a commit
that referenced
this pull request
Jun 1, 2026
…sted payload (#1631) Lets products build a terminal-state policy without re-deriving anything: - ChatRecoveryContext (onChatRecovery) gains recoveryRootRequestId — the stable request id for the whole continuation chain, the right key for per-incident budget tracking / fresh-incident detection (no re-deriving from message IDs). - ChatRecoveryExhaustedContext (onExhausted) gains recoveryRootRequestId, terminalMessage (exact user-facing text), partialText/partialParts (what the turn produced before it was given up on), and streamId/createdAt — enough to render/persist a terminal banner AND emit correlated terminal telemetry (msSinceTurnStart, stream correlation) directly. streamId + createdAt were added after verifying the payload against the actual consumer (g3's _emitExhaustedRecovery): it reads both from the recovery context for telemetry, and they already exist on ChatRecoveryContext (the Pick source), so adding them to the exhausted context is additive and unblocks re-homing the exhaustion handler onto onExhausted with zero re-derivation (D4). Shared types in `agents`; wired through think + ai-chat (_exhaustChatRecovery now receives streamId + createdAt). Test agents capture the exhausted context; tests assert both contexts (incl. streamId + createdAt) in both packages. Rebased onto main (dropping the merged #1633/#1634/#1635 commits); adapted the exhausted-ctx test to N1/#1638's alarm-debounce and gave the think harness an explicit return shape (the context's MessagePart[] over-instantiates the RPC stub type). Co-authored-by: Cursor <cursoragent@cursor.com>
threepointone
added a commit
that referenced
this pull request
Jun 1, 2026
…sted payload (#1631) (#1636) Lets products build a terminal-state policy without re-deriving anything: - ChatRecoveryContext (onChatRecovery) gains recoveryRootRequestId — the stable request id for the whole continuation chain, the right key for per-incident budget tracking / fresh-incident detection (no re-deriving from message IDs). - ChatRecoveryExhaustedContext (onExhausted) gains recoveryRootRequestId, terminalMessage (exact user-facing text), partialText/partialParts (what the turn produced before it was given up on), and streamId/createdAt — enough to render/persist a terminal banner AND emit correlated terminal telemetry (msSinceTurnStart, stream correlation) directly. streamId + createdAt were added after verifying the payload against the actual consumer (g3's _emitExhaustedRecovery): it reads both from the recovery context for telemetry, and they already exist on ChatRecoveryContext (the Pick source), so adding them to the exhausted context is additive and unblocks re-homing the exhaustion handler onto onExhausted with zero re-derivation (D4). Shared types in `agents`; wired through think + ai-chat (_exhaustChatRecovery now receives streamId + createdAt). Test agents capture the exhausted context; tests assert both contexts (incl. streamId + createdAt) in both packages. Rebased onto main (dropping the merged #1633/#1634/#1635 commits); adapted the exhausted-ctx test to N1/#1638's alarm-debounce and gave the think harness an explicit return shape (the context's MessagePart[] over-instantiates the RPC stub type). Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When a chat-recovery turn is given up, the framework could throw away a partial assistant message holding completed, often non-idempotent tool results — the most valuable, least-reproducible state in a turn. This PR makes sure that never happens.
Two paths fixed
1. Framework budget exhaustion dropped the settled partial
The budget check returns before
onChatRecoveryis consulted, and_exhaustChatRecoverysealed the turn (terminal status + banner) without persisting the orphaned stream. So when the framework's own budget tripped, every settled tool result was discarded and the model re-ran them on the next message.Fix: the exhaustion branch now persists the settled partial first, reusing the normal path's gating (
_shouldPersistOrphanedPartial) so it can never duplicate a partial an earlier attempt already saved. Because this sits in the sameif (exhausted)branch that N1/#1638 rewrote, it now covers every exhaustion path (no-progress window + 15-min ceiling + attempt cap), not just the raw attempt cap.2.
onChatRecoveryreturning{ persist: false }dropped settled work (R1 — stronger default){ persist: false }reads like "don't bother continuing," but it actually deleted the settled partial — losing completed tool calls with no signal.Fix (R1): settled work is now NEVER dropped.
{ persist: false }only suppresses persistence of a partial that has nothing settled to lose; a partial carrying settled tool results is persisted regardless. An app can no longer accidentally discard completed work — and never needs{ persist: true }just to stay safe. (A safe default beats a warning about an unsafe one — chosen over the earlier "warn once" approach so there is no footgun and no app decision.)New helpers:
_shouldPersistOrphanedPartial,_partialHasSettledToolResults(recognizesoutput-available/output-error/output-deniedandoutput/resultshapes). Applied identically to@cloudflare/thinkand@cloudflare/ai-chat.g3 impact
Lets g3 delete its
{ persist: true }recovery override (the default already persists by default, and now never loses settled work even on an explicitpersist: false).Tests
exhausted. (Adapted to N1/fix(think,ai-chat): wall-clock-keyed-to-progress recovery budget + alarm debounce (#1637) #1638's alarm-debounce: the seeded incident'slastAttemptAtis aged past the debounce window so the wake counts as a genuine attempt and actually exhausts.){ persist: false }never drops settled tool results — settled partial IS persisted, with no warning.{ persist: false }honored for a text-only partial — nothing settled to lose → nothing persisted, no warning.npm run checkclean (91 projects).Notes for reviewers
main(the original branch carried the now-merged fix(think,ai-chat): make chat-recovery progress signal compaction-immune (#1628) #1633 commit; dropped it). The diff is just this PR's change.console.warnonpersist: falsedata loss; this revision replaces it with the stronger no-loss default (R1) per the defaults-over-APIs review.Test plan
npm run checkclean