fix(think,ai-chat): make chat-recovery progress signal compaction-immune (#1628)#1633
Merged
Merged
Conversation
…une (#1628) The deploy-churn forward-progress signal that resets the recovery budget was recomputed from the live transcript by counting assistant messages. Compaction collapses older assistant messages into a summary, lowering that count, so a turn that had genuinely advanced could read as "no progress" between recovery attempts and exhaust at maxAttempts — sealing a healthy turn. Track progress with a durable, monotonic counter (cf:chat-recovery:progress) incremented when _persistOrphanedStream materializes a non-empty partial — the exact event the message count was proxying for — so compaction can never lower it. A turn that genuinely fails to advance still exhausts at the cap, and the 15-minute wall-clock ceiling is unchanged. Applied identically to think and ai-chat (the recovery code is duplicated verbatim). Co-authored-by: Cursor <cursoragent@cursor.com>
🦋 Changeset detectedLatest commit: 5cde588 The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
This was referenced May 31, 2026
Merged
Contributor
Author
|
/bonk do a deep review on this pr, look for edge cases, how's the ux on this |
agents
@cloudflare/ai-chat
@cloudflare/codemode
hono-agents
@cloudflare/shell
@cloudflare/think
@cloudflare/voice
@cloudflare/worker-bundler
commit: |
Contributor
|
UnknownError: ProviderInitError |
Contributor
|
@threepointone Bonk workflow failed. Check the logs for details. View workflow run · To retry, trigger Bonk again. |
The forward-progress bump in _persistOrphanedStream awaited _bumpChatRecoveryProgress but fired persistMessages without awaiting it. persistMessages is async; today its body is synchronous so ordering holds, but the bump should only advance after persistence settles. This aligns ai-chat with think's _persistOrphanedStream, which awaits its persist before bumping (#1628). Co-authored-by: Cursor <cursoragent@cursor.com>
Merged
threepointone
added a commit
that referenced
this pull request
Jun 1, 2026
…en up (#1631) Two paths could throw away a partial assistant message holding completed, often non-idempotent tool results — the most valuable, least-reproducible state in a turn: 1. Framework budget exhaustion sealed the turn (terminal status + banner) BEFORE the orphaned stream was persisted, so settled tool results were discarded and re-run on the next message. Exhaustion now persists the settled partial first, reusing the normal path's gating so it can't duplicate an already-saved partial. (This now also covers N1/#1638's wall-clock and no-progress exhaustion paths, not just the attempt cap.) 2. A subclass onChatRecovery returning { persist: false } silently dropped the settled partial. Settled work is now NEVER dropped: { persist: false } only suppresses persistence of a partial that has nothing settled to lose; a partial carrying settled tool results is persisted regardless. An app can no longer accidentally discard completed work, and never needs { persist: true } just to stay safe. A safe default beats a warning about an unsafe one (R1). Adds _shouldPersistOrphanedPartial / _partialHasSettledToolResults helpers. Applied identically to @cloudflare/think and @cloudflare/ai-chat. Rebased onto main (dropping the already-merged #1633 commit) and adapted the exhaustion test to N1/#1638's alarm-debounce (age the seeded incident past the debounce window so the wake counts as a genuine attempt and exhausts). Co-authored-by: Cursor <cursoragent@cursor.com>
threepointone
added a commit
that referenced
this pull request
Jun 1, 2026
…en up (#1631) Two paths could throw away a partial assistant message holding completed, often non-idempotent tool results — the most valuable, least-reproducible state in a turn: 1. Framework budget exhaustion sealed the turn (terminal status + banner) BEFORE the orphaned stream was persisted, so settled tool results were discarded and re-run on the next message. Exhaustion now persists the settled partial first, reusing the normal path's gating so it can't duplicate an already-saved partial. (This now also covers N1/#1638's wall-clock and no-progress exhaustion paths, not just the attempt cap.) 2. A subclass onChatRecovery returning { persist: false } silently dropped the settled partial. Settled work is now NEVER dropped: { persist: false } only suppresses persistence of a partial that has nothing settled to lose; a partial carrying settled tool results is persisted regardless. An app can no longer accidentally discard completed work, and never needs { persist: true } just to stay safe. A safe default beats a warning about an unsafe one (R1). Adds _shouldPersistOrphanedPartial / _partialHasSettledToolResults helpers. Applied identically to @cloudflare/think and @cloudflare/ai-chat. Tests: - Unit (think + ai-chat): exhaustion preserves a settled TOOL RESULT (not just text); { persist: false } never drops settled tool results, and is honored for a text-only partial. (Exhaustion test aged past N1/#1638's alarm-debounce so the wake counts as a genuine attempt and actually exhausts.) - E2E (think, real SIGKILL): a recordStep loop whose onChatRecovery returns { persist: false, continue: false } is killed mid-turn; after recovery the settled tool results produced before the kill are still in the durable transcript (R1) and the turn does not continue. (ThinkPersistFalseE2EAgent.) NOTE on coverage: the EXHAUSTION-with-prior-settled-work path stays unit-tested (not e2e) on purpose — under N1's budget, settling a tool result IS forward progress that resets the budget and prevents exhaustion, so that scenario can't be forced deterministically under real churn. Rebased onto main (dropping the already-merged #1633 commit). Co-authored-by: Cursor <cursoragent@cursor.com>
threepointone
added a commit
that referenced
this pull request
Jun 1, 2026
…en up (#1631) (#1634) Two paths could throw away a partial assistant message holding completed, often non-idempotent tool results — the most valuable, least-reproducible state in a turn: 1. Framework budget exhaustion sealed the turn (terminal status + banner) BEFORE the orphaned stream was persisted, so settled tool results were discarded and re-run on the next message. Exhaustion now persists the settled partial first, reusing the normal path's gating so it can't duplicate an already-saved partial. (This now also covers N1/#1638's wall-clock and no-progress exhaustion paths, not just the attempt cap.) 2. A subclass onChatRecovery returning { persist: false } silently dropped the settled partial. Settled work is now NEVER dropped: { persist: false } only suppresses persistence of a partial that has nothing settled to lose; a partial carrying settled tool results is persisted regardless. An app can no longer accidentally discard completed work, and never needs { persist: true } just to stay safe. A safe default beats a warning about an unsafe one (R1). Adds _shouldPersistOrphanedPartial / _partialHasSettledToolResults helpers. Applied identically to @cloudflare/think and @cloudflare/ai-chat. Tests: - Unit (think + ai-chat): exhaustion preserves a settled TOOL RESULT (not just text); { persist: false } never drops settled tool results, and is honored for a text-only partial. (Exhaustion test aged past N1/#1638's alarm-debounce so the wake counts as a genuine attempt and actually exhausts.) - E2E (think, real SIGKILL): a recordStep loop whose onChatRecovery returns { persist: false, continue: false } is killed mid-turn; after recovery the settled tool results produced before the kill are still in the durable transcript (R1) and the turn does not continue. (ThinkPersistFalseE2EAgent.) NOTE on coverage: the EXHAUSTION-with-prior-settled-work path stays unit-tested (not e2e) on purpose — under N1's budget, settling a tool result IS forward progress that resets the budget and prevents exhaustion, so that scenario can't be forced deterministically under real churn. Rebased onto main (dropping the already-merged #1633 commit). Co-authored-by: Cursor <cursoragent@cursor.com>
threepointone
added a commit
that referenced
this pull request
Jun 1, 2026
…sted payload (#1631) Lets products build a terminal-state policy without re-deriving anything: - ChatRecoveryContext (onChatRecovery) gains recoveryRootRequestId — the stable request id for the whole continuation chain, the right key for per-incident budget tracking / fresh-incident detection (no re-deriving from message IDs). - ChatRecoveryExhaustedContext (onExhausted) gains recoveryRootRequestId, terminalMessage (exact user-facing text), partialText/partialParts (what the turn produced before it was given up on), and streamId/createdAt — enough to render/persist a terminal banner AND emit correlated terminal telemetry (msSinceTurnStart, stream correlation) directly. streamId + createdAt were added after verifying the payload against the actual consumer (g3's _emitExhaustedRecovery): it reads both from the recovery context for telemetry, and they already exist on ChatRecoveryContext (the Pick source), so adding them to the exhausted context is additive and unblocks re-homing the exhaustion handler onto onExhausted with zero re-derivation (D4). Shared types in `agents`; wired through think + ai-chat (_exhaustChatRecovery now receives streamId + createdAt). Test agents capture the exhausted context; tests assert both contexts (incl. streamId + createdAt) in both packages. Rebased onto main (dropping the merged #1633/#1634/#1635 commits); adapted the exhausted-ctx test to N1/#1638's alarm-debounce and gave the think harness an explicit return shape (the context's MessagePart[] over-instantiates the RPC stub type). Co-authored-by: Cursor <cursoragent@cursor.com>
threepointone
added a commit
that referenced
this pull request
Jun 1, 2026
…sted payload (#1631) (#1636) Lets products build a terminal-state policy without re-deriving anything: - ChatRecoveryContext (onChatRecovery) gains recoveryRootRequestId — the stable request id for the whole continuation chain, the right key for per-incident budget tracking / fresh-incident detection (no re-deriving from message IDs). - ChatRecoveryExhaustedContext (onExhausted) gains recoveryRootRequestId, terminalMessage (exact user-facing text), partialText/partialParts (what the turn produced before it was given up on), and streamId/createdAt — enough to render/persist a terminal banner AND emit correlated terminal telemetry (msSinceTurnStart, stream correlation) directly. streamId + createdAt were added after verifying the payload against the actual consumer (g3's _emitExhaustedRecovery): it reads both from the recovery context for telemetry, and they already exist on ChatRecoveryContext (the Pick source), so adding them to the exhausted context is additive and unblocks re-homing the exhaustion handler onto onExhausted with zero re-derivation (D4). Shared types in `agents`; wired through think + ai-chat (_exhaustChatRecovery now receives streamId + createdAt). Test agents capture the exhausted context; tests assert both contexts (incl. streamId + createdAt) in both packages. Rebased onto main (dropping the merged #1633/#1634/#1635 commits); adapted the exhausted-ctx test to N1/#1638's alarm-debounce and gave the think harness an explicit return shape (the context's MessagePart[] over-instantiates the RPC stub type). Co-authored-by: Cursor <cursoragent@cursor.com>
This was referenced Jun 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stack
This is PR 1 of 4 addressing #1631 (recovery/repair ergonomics) and #1628. The series is stacked; review/merge bottom-up:
repairInterruptedToolParthook · minoronExhaustedpayload · minorEach PR is independently reviewable; this one has no dependency on the others (it's the base of the stack). Targets
main.Problem (#1628)
Bounded chat recovery (#1623) resets the per-incident retry budget when an interrupted turn is making forward progress, so that a turn repeatedly interrupted by deploy churn isn't falsely treated as a poison turn and sealed at
maxAttempts. That progress signal was recomputed from the live transcript by counting assistant messages:this.messagesis the live, mutable transcript. Compaction collapses older assistant messages into a summary, lowering the count. So a turn that genuinely advanced between two recovery attempts can read ascurrentProgress < prevProgress→madeProgress = false→ the budget is not reset → the turn exhausts atmaxAttemptsand is sealed, even though it was healthy. The storedprogress: Math.max(prevProgress, currentProgress)only protects the persisted high-water; the comparison still uses the live (compacted) recompute, so it doesn't help.This is the data-loss path a tool-heavy agent under continuous deploys actually hits.
Fix
Track progress with a durable, monotonic counter that compaction can never lower, instead of recomputing from the transcript:
cf:chat-recovery:progress._persistOrphanedStreammaterializes a non-empty partial assistant message — which is precisely the "forward progress" event the old assistant-message count was a proxy for._chatRecoveryProgressMarker()now reads that counter;_beginChatRecoveryIncidentawaits it.Because it only ever increments, compaction is irrelevant. Semantics are otherwise preserved:
madeProgresstrue → budget resets. ✅_persistOrphanedStreamearly-returns on empty chunks → counter flat → budget counts toward the cap → exhausts. ✅CHAT_RECOVERY_MAX_WINDOW_MSwall-clock ceiling.Timing (why the counter is read at incident-begin and bumped at persist)
Per interruption:
_beginChatRecoveryIncidentreads the counter and compares against the stored high-water → then (if not exhausted)_persistOrphanedStreambumps it. So attempt N's bump is observed at attempt N+1's comparison, exactly mirroring how the old assistant-message count grew after each persist.Back-compat / migration
A session interrupted mid-incident before this counter existed has no
cf:chat-recovery:progresskey → it reads0. For a new incident that's fine (no comparison on first detection). For an in-flight incident at upgrade time the comparison is conservative (treats it as no-progress for that one alarm), which is bounded and harmless. No persisted-data migration required.Bonus
@cloudflare/ai-chat's_persistOrphanedStreammerges a continuation partial into the existing assistant message in place, so its old marker barely grew across continuations (only the first persist added a message) — it was already under-counting progress. The monotonic counter fixes that too and brings ai-chat in line with think.Files
packages/think/src/think.ts— counter constant,_chatRecoveryProgressMarker(now async, reads storage), new_bumpChatRecoveryProgress, bump in_persistOrphanedStream,awaitin_beginChatRecoveryIncident, updatedChatRecoveryIncident.progressdoc.packages/ai-chat/src/index.ts— same changes;_persistOrphanedStreammadeasync(callers updated toawait).bumpRecoveryProgressForTest,dropAssistantMessagesForTest)..changeset/chat-recovery-progress-monotonic.md(patch · think + ai-chat).Tests
New regression test in both packages — "detects forward progress even after compaction collapses the transcript (#1628)": open an incident, make progress (bump the counter), then drop every assistant message from the live cache (simulating compaction), and assert the next alarm still resets the budget. The old message-count marker fails this; the durable counter passes. Existing progress-reset and wall-clock-ceiling tests were re-pointed at the counter and still pass.
Verification
tsc --noEmitclean for@cloudflare/thinkand@cloudflare/ai-chat.oxlintclean;oxfmtapplied.Test plan
npm run build && npm run check && nx affected -t test)mainand passes hereMade with Cursor