refactor: unify duplicated chat-recovery/repair machinery into the shared agents/chat layer (N3) #1642

@threepointone

Problem

The chat-recovery + transcript-repair machinery is duplicated, nearly verbatim, between packages/think/src/think.ts and packages/ai-chat/src/index.ts. Every recovery fix has to be written twice, the two copies drift, and the drift is a real source of bugs and confusion.

This was painful and concrete across the recent recovery stack (#1633, #1634, #1635, #1636, #1638, #1640, #1641): each change touched think and ai-chat in lockstep, and the divergence caused issues. For example, ai-chat's persist gate is streamStillActive-only while think uses _shouldPersistOrphanedPartial (an intentional asymmetry, but one that has to be re-reasoned every time), and ai-chat has no transcript-repair pass at all while think does.

Goal

Hoist the shared recovery/repair machinery into packages/agents/src/chat/ so it is written once and both Think and AIChatAgent consume it. This is a pure refactor with no behavior change. Design home: design/chat-shared-layer.md.

Duplicated surface to unify (audit, not exhaustive)

  • Budget / incident: _beginChatRecoveryIncident, _handleInternalFiberRecovery, _exhaustChatRecovery, _sweepStaleChatRecoveryIncidents, the ChatRecoveryIncident shape, and the constants (CHAT_RECOVERY_NO_PROGRESS_WINDOW_MS, CHAT_RECOVERY_ALARM_DEBOUNCE_MS, CHAT_RECOVERY_MAX_WINDOW_MS, DEFAULT_CHAT_RECOVERY_MAX_ATTEMPTS, DEFAULT_CHAT_RECOVERY_STABLE_TIMEOUT_MS, CHAT_RECOVERY_PROGRESS_KEY, AGENT_TOOL_STREAM_PROGRESS_BUMP_THROTTLE_MS).
  • Progress signal: _chatRecoveryProgressMarker, _bumpChatRecoveryProgress, and the production-time bump sites (_storeChunkDurably in think / _storeStreamChunk in ai-chat) + the N9 _onAgentToolStreamProgress override.
  • Orphan persistence: _persistOrphanedStream, _shouldPersistOrphanedPartial, _partialHasSettledToolResults, _hasPersistedRecoveredAssistant.
  • Known intentional asymmetries to reconcile (decide: unify or document at the seam):
    • ai-chat persist gate is streamStillActive-only vs think's streamStillActive || (streamIsTerminal && !alreadyPersisted).
    • Transcript repair is think-only (_repairTranscriptForProvider, _repairToolTranscriptParts, repairInterruptedToolPart, _toolPartHasSettledResult, _normalizeToolInput) — ai-chat has none. Decide whether the shared layer offers it to both or stays think-only by design.
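
One possible way to keep the persist-gate asymmetry explicit at the seam, sketched under assumptions: the two boolean expressions come from the bullet above, but the `OrphanStreamState` shape, the `PersistGatePolicy` type, and the free function `shouldPersistOrphanedPartial` are hypothetical names for illustration only. They are not the existing `_shouldPersistOrphanedPartial` implementation, which may consider more than these three flags.

```typescript
// Hypothetical sketch: only the two gate expressions are taken from the issue text.
// Every type and function name here is illustrative, not existing API.

interface OrphanStreamState {
  streamStillActive: boolean;
  streamIsTerminal: boolean;
  alreadyPersisted: boolean;
}

// "active-only": the ai-chat gate (streamStillActive).
// "active-or-unpersisted-terminal": the think gate
//   (streamStillActive || (streamIsTerminal && !alreadyPersisted)).
type PersistGatePolicy = "active-only" | "active-or-unpersisted-terminal";

function shouldPersistOrphanedPartial(
  state: OrphanStreamState,
  policy: PersistGatePolicy
): boolean {
  // Both policies persist while the stream is still active.
  if (state.streamStillActive) return true;
  // The ai-chat style gate stops here.
  if (policy === "active-only") return false;
  // The think style gate also persists a terminal stream that was never persisted.
  return state.streamIsTerminal && !state.alreadyPersisted;
}

// Usage: the same state gives different answers under the two policies.
const terminalUnpersisted: OrphanStreamState = {
  streamStillActive: false,
  streamIsTerminal: true,
  alreadyPersisted: false
};

console.log(shouldPersistOrphanedPartial(terminalUnpersisted, "active-only")); // false
console.log(
  shouldPersistOrphanedPartial(terminalUnpersisted, "active-or-unpersisted-terminal")
); // true
```

With a shape like this, each consumer would name its policy once at the call site, so the asymmetry is documented in code rather than re-derived per fix. Whether to keep two policies or collapse to one is still the open decision listed above.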

Constraints / sequencing

Acceptance

  • Recovery/repair logic lives once in packages/agents/src/chat/; Think and AIChatAgent delegate to it.
  • npm run check green; full suites green with no behavior change; no new public API.
