
fix(think,ai-chat): preserve settled work when a recovery turn is given up (#1631)#1634

Merged
threepointone merged 1 commit into
main from
fix/chat-recovery-preserve-settled-work
Jun 1, 2026

Conversation

@threepointone
Contributor

@threepointone threepointone commented May 31, 2026

Summary

When a chat-recovery turn is given up, the framework could throw away a partial assistant message holding completed, often non-idempotent tool results — the most valuable, least-reproducible state in a turn. This PR makes sure that never happens.

Updated: rebased onto main (it now includes #1633, #1638/N1, #1640, and #1641/N9) and the persist: false footgun is now fixed with the stronger default (R1) instead of a warning — see below.

Two paths fixed

1. Framework budget exhaustion dropped the settled partial

The budget check returns before onChatRecovery is consulted, and _exhaustChatRecovery sealed the turn (terminal status + banner) without persisting the orphaned stream. So when the framework's own budget tripped, every settled tool result was discarded and the model re-ran them on the next message.

Fix: the exhaustion branch now persists the settled partial first, reusing the normal path's gating (_shouldPersistOrphanedPartial) so it can never duplicate a partial an earlier attempt already saved. Because this sits in the same if (exhausted) branch that N1/#1638 rewrote, it now covers every exhaustion path (no-progress window + 15-min ceiling + attempt cap), not just the raw attempt cap.
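
The ordering change can be sketched roughly as follows. This is a simplified, hypothetical model and not the framework's code: `OrphanedStream`, `Turn`, `exhaustRecovery`, and the body of `shouldPersistOrphanedPartial` are illustrative assumptions standing in for `_exhaustChatRecovery` and `_shouldPersistOrphanedPartial`.

```typescript
// Hypothetical, simplified model of the exhaustion branch (not the real implementation).
interface OrphanedStream {
  parts: string[];              // what the interrupted turn produced
  partialAlreadySaved: boolean; // true once any attempt has persisted the partial
}

interface Turn {
  transcript: string[][]; // persisted assistant partials
  sealed: boolean;        // terminal status has been set
  banner?: string;        // user-facing "gave up" text
}

// Assumed stand-in for the shared gating: persist only when there is something
// to save and no earlier attempt has already saved it.
function shouldPersistOrphanedPartial(stream: OrphanedStream): boolean {
  return !stream.partialAlreadySaved && stream.parts.length > 0;
}

function exhaustRecovery(turn: Turn, stream: OrphanedStream): void {
  // 1. Persist the settled partial first, behind the same gate as the normal path.
  if (shouldPersistOrphanedPartial(stream)) {
    turn.transcript.push([...stream.parts]);
    stream.partialAlreadySaved = true;
  }
  // 2. Only then seal the turn (terminal status + banner).
  turn.sealed = true;
  turn.banner = "Recovery was given up.";
}
```

In this model, calling `exhaustRecovery` twice for the same stream still leaves exactly one persisted partial, which is the "can never duplicate" property the gating is meant to provide.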

2. onChatRecovery returning { persist: false } dropped settled work (R1 — stronger default)

{ persist: false } reads like "don't bother continuing," but it actually deleted the settled partial — losing completed tool calls with no signal.

Fix (R1): settled work is now NEVER dropped. { persist: false } only suppresses persistence of a partial that has nothing settled to lose; a partial carrying settled tool results is persisted regardless. An app can no longer accidentally discard completed work — and never needs { persist: true } just to stay safe. (A safe default beats a warning about an unsafe one — chosen over the earlier "warn once" approach so there is no footgun and no app decision.)

New helpers: _shouldPersistOrphanedPartial, _partialHasSettledToolResults (recognizes output-available / output-error / output-denied and output/result shapes). Applied identically to @cloudflare/think and @cloudflare/ai-chat.
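
For readers who want the R1 rule in one place, here is a minimal sketch of the decision. The part shape (`MessagePartLike`) and both function bodies are assumptions made for illustration; only the three state names and the `output`/`result` shapes come from the description above.

```typescript
// Hypothetical part shape; the real message-part types are richer.
interface MessagePartLike {
  type: string;
  state?: string;   // e.g. "output-available" on a settled tool part
  output?: unknown; // settled result shape
  result?: unknown; // alternative settled result shape
}

const SETTLED_STATES = new Set(["output-available", "output-error", "output-denied"]);

// Sketch of _partialHasSettledToolResults: true if any part looks settled.
function partialHasSettledToolResults(parts: readonly MessagePartLike[]): boolean {
  return parts.some(
    (p) =>
      (p.state !== undefined && SETTLED_STATES.has(p.state)) ||
      p.output !== undefined ||
      p.result !== undefined
  );
}

// Sketch of the R1 decision: settled work always wins over { persist: false }.
function shouldPersistPartial(
  parts: readonly MessagePartLike[],
  persist: boolean | undefined
): boolean {
  if (parts.length === 0) return false;                 // nothing to save
  if (partialHasSettledToolResults(parts)) return true; // never drop settled work
  return persist !== false;                             // otherwise honor the app's choice
}
```

Under these assumptions, a text-only partial with `{ persist: false }` is skipped, while the same flag on a partial carrying an `output-available` tool part still persists.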

g3 impact

Lets g3 delete its { persist: true } recovery override: the default already persists, and settled work is now never lost even with an explicit persist: false.

Tests

  • Exhaustion preserves the settled partial: seed an incident at the cap plus a terminal stream with a partial, trigger recovery, then assert the partial is persisted and the incident is exhausted. (Adapted to N1/#1638's alarm debounce: the seeded incident's lastAttemptAt is aged past the debounce window so the wake counts as a genuine attempt and actually exhausts.)
  • { persist: false } never drops settled tool results — settled partial IS persisted, with no warning.
  • { persist: false } honored for a text-only partial — nothing settled to lose → nothing persisted, no warning.
  • Full suites green: think 463, ai-chat 485; npm run check clean (91 projects).

Notes for reviewers

Test plan

  • CI green
  • think + ai-chat suites green (463 / 485)
  • npm run check clean

@changeset-bot

changeset-bot Bot commented May 31, 2026

🦋 Changeset detected

Latest commit: f4bfb14

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
@cloudflare/think Patch
@cloudflare/ai-chat Patch


@pkg-pr-new

pkg-pr-new Bot commented May 31, 2026


agents

npm i https://pkg.pr.new/agents@1634

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1634

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1634

hono-agents

npm i https://pkg.pr.new/hono-agents@1634

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1634

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1634

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1634

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1634

commit: f4bfb14

Base automatically changed from fix/chat-recovery-progress-monotonic to main June 1, 2026 01:54
@threepointone threepointone force-pushed the fix/chat-recovery-preserve-settled-work branch from 1893c2a to a4f8698 Compare June 1, 2026 13:53
…en up (#1631)

Two paths could throw away a partial assistant message holding completed,
often non-idempotent tool results — the most valuable, least-reproducible
state in a turn:

1. Framework budget exhaustion sealed the turn (terminal status + banner)
   BEFORE the orphaned stream was persisted, so settled tool results were
   discarded and re-run on the next message. Exhaustion now persists the
   settled partial first, reusing the normal path's gating so it can't
   duplicate an already-saved partial. (This now also covers N1/#1638's
   wall-clock and no-progress exhaustion paths, not just the attempt cap.)

2. A subclass onChatRecovery returning { persist: false } silently dropped
   the settled partial. Settled work is now NEVER dropped: { persist: false }
   only suppresses persistence of a partial that has nothing settled to lose;
   a partial carrying settled tool results is persisted regardless. An app can
   no longer accidentally discard completed work, and never needs
   { persist: true } just to stay safe. A safe default beats a warning about
   an unsafe one (R1).

Adds _shouldPersistOrphanedPartial / _partialHasSettledToolResults helpers.
Applied identically to @cloudflare/think and @cloudflare/ai-chat.

Tests:
- Unit (think + ai-chat): exhaustion preserves a settled TOOL RESULT (not just
  text); { persist: false } never drops settled tool results, and is honored
  for a text-only partial. (Exhaustion test aged past N1/#1638's alarm-debounce
  so the wake counts as a genuine attempt and actually exhausts.)
- E2E (think, real SIGKILL): a recordStep loop whose onChatRecovery returns
  { persist: false, continue: false } is killed mid-turn; after recovery the
  settled tool results produced before the kill are still in the durable
  transcript (R1) and the turn does not continue. (ThinkPersistFalseE2EAgent.)

NOTE on coverage: the EXHAUSTION-with-prior-settled-work path stays unit-tested
(not e2e) on purpose — under N1's budget, settling a tool result IS forward
progress that resets the budget and prevents exhaustion, so that scenario can't
be forced deterministically under real churn.

Rebased onto main (dropping the already-merged #1633 commit).

Co-authored-by: Cursor <cursoragent@cursor.com>
@threepointone threepointone force-pushed the fix/chat-recovery-preserve-settled-work branch from a4f8698 to f4bfb14 Compare June 1, 2026 14:14
@threepointone
Contributor Author

Review-driven coverage hardening (pushed)

Following a confidence review of the test coverage, added:

  • Strengthened the exhaustion unit tests (think + ai-chat): they now seed a settled tool result (not just text) and assert it survives budget exhaustion — directly proving the headline claim ("completed, non-idempotent tool results aren't dropped on exhaustion").
  • New real-SIGKILL e2e (ThinkPersistFalseE2EAgent + persist-false-preserves.test.ts): a recordStep loop whose onChatRecovery returns { persist: false, continue: false } is killed mid-turn; after recovery the settled tool results produced before the kill are still in the durable transcript (R1) and the turn does not continue. This validates the R1 no-loss default under a real process kill (it would fail on the pre-R1 "persist:false drops the partial" behavior).

Two findings from the review (both resolved/understood, no code change needed):

  1. ai-chat's streamStillActive-only persist gate vs think's _shouldPersistOrphanedPartial is an intentional asymmetry, not a data-loss gap: ai-chat restores interrupted streams as active (insertInterruptedStream → status: 'streaming' + _resumableStream.restore()), so recovery always sees an active orphan that the gate covers; terminal orphans are persisted by the client-reconnect ACK path. (think tracks an explicit terminal stream status and needs the extra branch.)
  2. The exhaustion-with-prior-settled-work path stays unit-tested, not e2e, on purpose. Under N1/#1638's budget, settling a tool result is forward progress that resets the recovery budget, which (correctly) prevents exhaustion. So that exact scenario can't be forced deterministically under real churn; the unit test (which seeds the at-cap incident directly) is the right level. The real-kill e2e instead covers the deterministic R1 persist:false path.
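
To illustrate finding 2, here is a toy model of a progress-keyed budget. It is a sketch under stated assumptions, not #1638's implementation: `Incident`, the constants, and all three functions are hypothetical, and the real budget also has a wall-clock ceiling and alarm debounce that are omitted here.

```typescript
// Hypothetical incident record; field names and limits are illustrative only.
interface Incident {
  attempts: number;       // recovery attempts since the last forward progress
  lastProgressAt: number; // epoch ms of the last forward progress
}

const MAX_ATTEMPTS = 3;               // assumed attempt cap
const NO_PROGRESS_WINDOW_MS = 60_000; // assumed no-progress window

function recordAttempt(incident: Incident): Incident {
  return { ...incident, attempts: incident.attempts + 1 };
}

// Settling a tool result counts as forward progress and resets the budget.
function recordProgress(now: number): Incident {
  return { attempts: 0, lastProgressAt: now };
}

function isExhausted(incident: Incident, now: number): boolean {
  return (
    incident.attempts >= MAX_ATTEMPTS ||
    now - incident.lastProgressAt > NO_PROGRESS_WINDOW_MS
  );
}
```

In this model a run that keeps settling tool results keeps calling `recordProgress`, so `isExhausted` never trips; the only way to reach "exhausted with settled work present" is to seed the at-cap incident directly, which is what the unit test does.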

Suites green: think 463, ai-chat 485; new e2e green (2 deterministic runs, ~20s); npm run check clean.

@threepointone threepointone merged commit a4225fd into main Jun 1, 2026
4 of 5 checks passed
@threepointone threepointone deleted the fix/chat-recovery-preserve-settled-work branch June 1, 2026 14:28
threepointone added a commit that referenced this pull request Jun 1, 2026
…sted payload (#1631)

Lets products build a terminal-state policy without re-deriving anything:

- ChatRecoveryContext (onChatRecovery) gains recoveryRootRequestId — the stable
  request id for the whole continuation chain, the right key for per-incident
  budget tracking / fresh-incident detection (no re-deriving from message IDs).
- ChatRecoveryExhaustedContext (onExhausted) gains recoveryRootRequestId,
  terminalMessage (exact user-facing text), partialText/partialParts (what the
  turn produced before it was given up on), and streamId/createdAt — enough to
  render/persist a terminal banner AND emit correlated terminal telemetry
  (msSinceTurnStart, stream correlation) directly.

streamId + createdAt were added after verifying the payload against the actual
consumer (g3's _emitExhaustedRecovery): it reads both from the recovery context
for telemetry, and they already exist on ChatRecoveryContext (the Pick source),
so adding them to the exhausted context is additive and unblocks re-homing the
exhaustion handler onto onExhausted with zero re-derivation (D4).

Shared types in `agents`; wired through think + ai-chat (_exhaustChatRecovery
now receives streamId + createdAt). Test agents capture the exhausted context;
tests assert both contexts (incl. streamId + createdAt) in both packages.

Rebased onto main (dropping the merged #1633/#1634/#1635 commits); adapted the
exhausted-ctx test to N1/#1638's alarm-debounce and gave the think harness an
explicit return shape (the context's MessagePart[] over-instantiates the RPC
stub type).

Co-authored-by: Cursor <cursoragent@cursor.com>
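
A rough sketch of how the enriched payload might be consumed. The field names come from the commit message above; the field types (for example `createdAt` as epoch milliseconds), the `MessagePart` shape, and the `buildExhaustedTelemetry` helper are assumptions for illustration and are not the shared types exported by `agents`.

```typescript
// Assumed shapes; the real shared types live in `agents` and may differ.
interface MessagePart {
  type: string;
  text?: string;
}

interface ChatRecoveryContext {
  streamId: string;
  createdAt: number;             // assumed: epoch ms when the turn started
  recoveryRootRequestId: string; // stable id for the whole continuation chain
}

// The exhausted context picks the correlation fields from the recovery context
// and adds the terminal payload.
type ChatRecoveryExhaustedContext = Pick<
  ChatRecoveryContext,
  "streamId" | "createdAt" | "recoveryRootRequestId"
> & {
  terminalMessage: string; // exact user-facing text
  partialText: string;
  partialParts: MessagePart[];
};

// Hypothetical consumer: derive terminal telemetry with no re-derivation.
function buildExhaustedTelemetry(ctx: ChatRecoveryExhaustedContext, now: number) {
  return {
    incidentKey: ctx.recoveryRootRequestId,
    streamId: ctx.streamId,
    msSinceTurnStart: now - ctx.createdAt,
    banner: ctx.terminalMessage,
    producedParts: ctx.partialParts.length,
  };
}
```

An onExhausted handler could pass its context straight to a helper like this, keying per-incident state on `recoveryRootRequestId` rather than on message IDs.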
threepointone added a commit that referenced this pull request Jun 1, 2026
…sted payload (#1631) (#1636)
