Problem
The chat-recovery + transcript-repair machinery is duplicated, nearly verbatim, between packages/think/src/think.ts and packages/ai-chat/src/index.ts. Every recovery fix has to be written twice, the two copies drift, and the drift is a real source of bugs and confusion.
This was painful and concrete across the recent recovery stack (#1633, #1634, #1635, #1636, #1638, #1640, #1641): each touched think and ai-chat in lockstep, and the divergence caused issues. For example, ai-chat's persist gate is streamStillActive-only while think uses _shouldPersistOrphanedPartial (an intentional asymmetry, but one that has to be re-reasoned every time), and ai-chat has no transcript-repair pass at all while think does.
Goal
Hoist the shared recovery/repair machinery into packages/agents/src/chat/ so it is written once and both Think and AIChatAgent consume it. Pure refactor — no behavior change. Design home: design/chat-shared-layer.md.
Duplicated surface to unify (audit, not exhaustive)
Budget / incident: _beginChatRecoveryIncident, _handleInternalFiberRecovery, _exhaustChatRecovery, _sweepStaleChatRecoveryIncidents, the ChatRecoveryIncident shape, and the constants (CHAT_RECOVERY_NO_PROGRESS_WINDOW_MS, CHAT_RECOVERY_ALARM_DEBOUNCE_MS, CHAT_RECOVERY_MAX_WINDOW_MS, DEFAULT_CHAT_RECOVERY_MAX_ATTEMPTS, DEFAULT_CHAT_RECOVERY_STABLE_TIMEOUT_MS, CHAT_RECOVERY_PROGRESS_KEY, AGENT_TOOL_STREAM_PROGRESS_BUMP_THROTTLE_MS).
Progress signal: _chatRecoveryProgressMarker, _bumpChatRecoveryProgress, and the production-time bump sites (_storeChunkDurably in think / _storeStreamChunk in ai-chat), plus the N9 _onAgentToolStreamProgress override.
Orphaned-stream persistence: _persistOrphanedStream, _shouldPersistOrphanedPartial, _partialHasSettledToolResults, _hasPersistedRecoveredAssistant.
Known intentional asymmetries to reconcile (decide: unify or document at the seam):
ai-chat persist gate is streamStillActive-only vs think's streamStillActive || (streamIsTerminal && !alreadyPersisted).
Transcript repair is think-only (_repairTranscriptForProvider, _repairToolTranscriptParts, repairInterruptedToolPart, _toolPartHasSettledResult, _normalizeToolInput); ai-chat has none. Decide whether the shared layer offers it to both or stays think-only by design.
Constraints / sequencing
[...] main), to avoid massive rebase conflicts. Do it BEFORE piling more recovery work (#1626 watchdog→recovery: route a stream-stall watchdog abort into bounded recovery instead of a terminal error; #1620 recovering-status: surface a live "recovering…" status to chat clients during durable recovery; durable telemetry) on top, so those land write-once instead of twice. (Note: the team has chosen to land #1626/#1620/telemetry first, then this refactor before the formal release, so this issue is the "after those three" checkpoint.)
packages/agents/src/chat/* requires nx run agents:build before think/ai-chat typecheck (they compile against built dist).
Acceptance
[...] packages/agents/src/chat/; Think and AIChatAgent delegate to it.
npm run check green; full suites green with no behavior change; no new public API.
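For the persist-gate asymmetry listed under the known asymmetries, one possible way to "document at the seam" without changing behavior is a single shared predicate that each consumer calls with an explicit policy. The sketch below is only an illustration under assumptions: the three flag names are taken from the expressions quoted in this issue, while OrphanedStreamState, PersistGatePolicy, the policy strings, and the standalone shouldPersistOrphanedPartial function are hypothetical and do not reflect the actual signatures in think.ts or ai-chat.

```typescript
// Hypothetical shape; only the three flag names come from the issue text.
interface OrphanedStreamState {
  streamStillActive: boolean;
  streamIsTerminal: boolean;
  alreadyPersisted: boolean;
}

// Hypothetical policy switch naming the two gates described in the issue.
type PersistGatePolicy = "active-only" | "active-or-unpersisted-terminal";

// One shared predicate; the asymmetry lives in the policy argument
// instead of in two near-identical copies.
function shouldPersistOrphanedPartial(
  state: OrphanedStreamState,
  policy: PersistGatePolicy,
): boolean {
  // Both gates persist while the stream is still active.
  if (state.streamStillActive) return true;
  // ai-chat's gate as described in the issue: streamStillActive-only.
  if (policy === "active-only") return false;
  // think's gate as described in the issue:
  // streamStillActive || (streamIsTerminal && !alreadyPersisted)
  return state.streamIsTerminal && !state.alreadyPersisted;
}

// Each consumer states its policy once, at the seam (assumed constants).
const AI_CHAT_POLICY: PersistGatePolicy = "active-only";
const THINK_POLICY: PersistGatePolicy = "active-or-unpersisted-terminal";

// The case where the two gates disagree: a terminal stream whose
// partial has not been persisted yet.
const terminalUnpersisted: OrphanedStreamState = {
  streamStillActive: false,
  streamIsTerminal: true,
  alreadyPersisted: false,
};

const aiChatDecision = shouldPersistOrphanedPartial(terminalUnpersisted, AI_CHAT_POLICY);
const thinkDecision = shouldPersistOrphanedPartial(terminalUnpersisted, THINK_POLICY);
```

With this shape the two consumers disagree only on the terminal-but-unpersisted case, so the asymmetry is visible in one place and each existing behavior is preserved by the choice of policy. Whether the real shared layer should take a policy argument, or unify the two gates outright, is exactly the open decision noted above.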