Chat System

refact-planner edited this page Jun 7, 2026 · 1 revision

Stateful chat orchestration, SSE delivery, tool execution, compression, and trajectory persistence in the engine.

Overview

The chat subsystem is centered on ChatSession, which owns the thread transcript, runtime state, command queue, SSE event stream, compression flags, and trajectory persistence metadata. It coordinates the full path from a user command to LLM streaming, tool execution, and eventual persistence of the chat trajectory.

Authoritative behavior comes from AGENTS.md and the src/chat/ modules, especially session.rs, queue.rs, stream_core.rs, linearize.rs, history_limit.rs, and trajectories.rs.

Core session model

ChatSession is the mutable in-memory representation of a chat thread. Relevant state includes:

  • chat_id
  • messages
  • runtime: RuntimeState
  • command_queue: VecDeque<CommandRequest>
  • event_tx for SSE delivery
  • event_seq monotonic counter
  • compression fields: is_compressing, compression_phase, compression_reason
  • draft fields: draft_message, draft_usage
  • trajectory fields: trajectory_dirty, trajectory_version, trajectory_save_in_flight, trajectory_save_queued
  • abort / wakeup coordination: abort_flag, abort_notify, user_interrupt_flag, queue_notify

SessionState enum values are:

  • Idle
  • Generating
  • ExecutingTools
  • Paused
  • WaitingIde
  • WaitingUserInput
  • Completed
  • Error

SessionState transitions

stateDiagram-v2
    [*] --> Idle
    Idle --> Generating: UserMessage / RetryFromIndex / Regenerate
    Generating --> ExecutingTools: model emits tool calls
    Generating --> Idle: stream finishes without tool calls
    Generating --> Completed: assistant/tool flow ends with completion
    Generating --> WaitingUserInput: tool asks questions / wait_agents / ask_questions
    Generating --> WaitingIde: IDE-dependent tool result required
    Generating --> Paused: pause-required / user decision boundary
    ExecutingTools --> Generating: tool results processed, continue loop
    ExecutingTools --> Completed: task_done / agent_finish
    ExecutingTools --> WaitingUserInput: tool decision path needs user input
    ExecutingTools --> Idle: abort
    Paused --> Generating: ApproveTools / RejectTools outcome resumes
    WaitingIde --> Generating: IdeToolResult
    WaitingIde --> Idle: Abort
    WaitingUserInput --> Generating: queued resume command
    WaitingUserInput --> Idle: Abort
    Completed --> Idle: new command / regenerate path
    Error --> Idle: recovery / next queued command

The terminal/active runtime logic lives in session.rs (is_terminal_runtime_state) and in queue.rs, where the queue processor gates on Generating, ExecutingTools, Paused, and WaitingIde.
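
As a minimal sketch, a subset of the edges in the diagram above can be written as a pure transition function. The Trigger enum and next_state function are hypothetical names invented for this illustration; they are not taken from session.rs or queue.rs, and only some of the diagram's edges are covered.

```rust
/// Runtime states as listed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Idle,
    Generating,
    ExecutingTools,
    Paused,
    WaitingIde,
    WaitingUserInput,
    Completed,
    Error,
}

/// Hypothetical trigger names, invented for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    GenerationRequested,   // UserMessage / RetryFromIndex / Regenerate
    ToolCallsEmitted,      // model emits tool calls
    StreamFinishedNoTools, // stream finishes without tool calls
    ToolResultsProcessed,  // tool results processed, continue loop
    ToolDecisionResolved,  // ApproveTools / RejectTools outcome
    IdeToolResult,
    Abort,
}

/// Returns the next state for a subset of the diagram's edges,
/// or `None` when this sketch does not model such an edge.
pub fn next_state(state: SessionState, trigger: Trigger) -> Option<SessionState> {
    use SessionState as S;
    use Trigger as T;
    match (state, trigger) {
        (S::Idle, T::GenerationRequested) => Some(S::Generating),
        (S::Generating, T::ToolCallsEmitted) => Some(S::ExecutingTools),
        (S::Generating, T::StreamFinishedNoTools) => Some(S::Idle),
        (S::ExecutingTools, T::ToolResultsProcessed) => Some(S::Generating),
        (S::Paused, T::ToolDecisionResolved) => Some(S::Generating),
        (S::WaitingIde, T::IdeToolResult) => Some(S::Generating),
        (S::ExecutingTools | S::WaitingIde | S::WaitingUserInput, T::Abort) => Some(S::Idle),
        _ => None,
    }
}
```

Returning Option keeps unmodelled edges explicit instead of silently mapping them to a default state.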

Message flow

Canonical flow from user input to model output is:

flowchart LR
    U[UserMessage] --> Q[command queue]
    Q --> P[prepare<br/>system prompt + knowledge RAG + history limit]
    P --> L[linearize]
    L --> S[LLM stream]
    S --> C[StreamCollector]
    C --> T[tool calls]
    T -->|continue| P
    T -->|finish| R[save trajectory / update runtime]

What each stage does

  1. UserMessage enters the queue

    • Commands are enqueued and processed by queue.rs.
    • Priority user messages may be injected first.
  2. Prepare phase

    • prepare_session_preamble_and_knowledge() builds the prompt.
    • According to AGENTS.md, preparation includes:
      • system prompt
      • knowledge RAG
      • history limit
  3. Linearization

    • linearize.rs merges consecutive user messages.
    • It strips linearization-only messages such as summarization artifacts and compression reports.
    • It also strips thinking blocks for LLM cache compatibility.
  4. LLM streaming

    • stream_core.rs drives the HTTP/SSE or websocket stream.
    • Stream deltas are accumulated via a StreamCollector implementation.
  5. Tool calls

    • Tool call deltas are collected, finalized, and executed.
    • The loop returns to prepare/stream when tool output requires further model turns.
  6. Persistence and loop termination

    • Trajectories are saved at each major boundary in the loop.
    • Final runtime state becomes Idle, Completed, WaitingUserInput, or Error depending on outcome.
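
The prepare / stream / tool-call loop described in the stages above can be sketched with a scripted stand-in for the model. ModelTurn, LoopResult, and run_generation_loop are hypothetical names for this illustration, and the max_turns cap is an assumption, not something the source describes; the real loop in queue.rs and stream_core.rs is asynchronous and considerably richer.

```rust
/// One scripted model turn; a stand-in for a real LLM stream.
#[derive(Debug, Clone, PartialEq)]
pub enum ModelTurn {
    /// The model asked for tools to run (names only, for brevity).
    ToolCalls(Vec<String>),
    /// The model produced a final answer with no tool calls.
    FinalAnswer(String),
}

#[derive(Debug, PartialEq)]
pub struct LoopResult {
    pub model_turns: usize,
    pub tools_executed: Vec<String>,
    pub answer: Option<String>,
}

/// Consumes scripted turns until a final answer arrives or `max_turns`
/// (an assumed safety cap) is reached.
pub fn run_generation_loop(script: Vec<ModelTurn>, max_turns: usize) -> LoopResult {
    let mut result = LoopResult {
        model_turns: 0,
        tools_executed: Vec::new(),
        answer: None,
    };
    for turn in script.into_iter().take(max_turns) {
        // A real session would prepare, linearize, and stream here.
        result.model_turns += 1;
        match turn {
            ModelTurn::ToolCalls(calls) => {
                // "Execute" the tools, then loop back for another model turn.
                result.tools_executed.extend(calls);
            }
            ModelTurn::FinalAnswer(text) => {
                // No tool calls: the loop terminates.
                result.answer = Some(text);
                break;
            }
        }
    }
    result
}
```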

SSE event model

Chat SSE is served from the subscription endpoint documented in AGENTS.md:

  • GET /v1/chats/subscribe?chat_id={id}

Events carry a monotonic seq: u64. Clients must reconnect if they detect a gap in sequence numbers.

Event types

Authoritative event types listed in AGENTS.md:

  • Snapshot
  • StreamStarted
  • StreamDelta
  • StreamFinished
  • MessageAdded
  • MessageUpdated
  • MessageRemoved
  • MessagesTruncated
  • ThreadUpdated
  • QueueUpdated
  • RuntimeUpdated
  • PauseRequired

Reconnect-on-gap behavior

The sequence number is incremented on each emitted chat event. Because subscribers receive an ordered stream, any missing sequence number indicates the client missed one or more events and should resubscribe and request a fresh snapshot.
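
On the client side, gap detection can be sketched as a small tracker over the seq values. SeqTracker and SeqCheck are hypothetical names for this illustration; treating the first observed event as the baseline and reporting repeated or older values as duplicates are assumptions, not documented behavior.

```rust
/// Outcome of checking one incoming sequence number.
#[derive(Debug, PartialEq, Eq)]
pub enum SeqCheck {
    InOrder,
    /// Same or older seq than the last one seen (assumed to be ignorable).
    Duplicate,
    /// One or more events were missed; the client should resubscribe.
    Gap { expected: u64, got: u64 },
}

#[derive(Debug, Default)]
pub struct SeqTracker {
    last_seen: Option<u64>,
}

impl SeqTracker {
    /// Checks `seq` against the last accepted value.
    /// The first event after (re)subscribing sets the baseline.
    pub fn observe(&mut self, seq: u64) -> SeqCheck {
        match self.last_seen {
            None => {
                self.last_seen = Some(seq);
                SeqCheck::InOrder
            }
            Some(last) if seq <= last => SeqCheck::Duplicate,
            Some(last) if seq == last + 1 => {
                self.last_seen = Some(seq);
                SeqCheck::InOrder
            }
            Some(last) => SeqCheck::Gap { expected: last + 1, got: seq },
        }
    }

    /// Call after resubscribing so the next event becomes the new baseline.
    pub fn reset(&mut self) {
        self.last_seen = None;
    }
}
```

A client would call reset() when it resubscribes, so that the next event it receives defines the new starting point.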

Practical note

Background process completion is not a dedicated SSE envelope in the authoritative contract; it is represented as a hidden event(process_completed) message delivered through MessageAdded.

Commands

POST /v1/chats/{chat_id}/commands accepts queued chat commands. The authoritative command set is:

  • UserMessage
  • SetParams
  • UpdateMessage
  • RemoveMessage
  • TruncateMessages
  • RetryFromIndex
  • Abort
  • ApproveTools
  • RejectTools
  • BranchFromChat
  • RestoreFromTrajectory
  • ClearDraft
  • SetDraft
  • Regenerate

Command notes

  • UserMessage, RetryFromIndex, and Regenerate can trigger generation.
  • ApproveTools / RejectTools are used when the session is paused for user decision.
  • BranchFromChat and RestoreFromTrajectory are trajectory-oriented commands that interact with persisted chat history.
  • ClearDraft and SetDraft manipulate the transient draft state.
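
The groupings in the notes above can be sketched as predicates over a payload-free enum. CommandKind and its methods are hypothetical names for this illustration; the real commands carry payloads whose shapes are not described on this page.

```rust
/// Command names as listed on this page, without payloads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommandKind {
    UserMessage,
    SetParams,
    UpdateMessage,
    RemoveMessage,
    TruncateMessages,
    RetryFromIndex,
    Abort,
    ApproveTools,
    RejectTools,
    BranchFromChat,
    RestoreFromTrajectory,
    ClearDraft,
    SetDraft,
    Regenerate,
}

impl CommandKind {
    /// Commands that can start a generation.
    pub fn can_trigger_generation(self) -> bool {
        matches!(self, Self::UserMessage | Self::RetryFromIndex | Self::Regenerate)
    }

    /// Commands used when the session is paused for a user decision.
    pub fn is_tool_decision(self) -> bool {
        matches!(self, Self::ApproveTools | Self::RejectTools)
    }

    /// Commands that interact with persisted chat history.
    pub fn is_trajectory_oriented(self) -> bool {
        matches!(self, Self::BranchFromChat | Self::RestoreFromTrajectory)
    }

    /// Commands that manipulate the transient draft state.
    pub fn touches_draft(self) -> bool {
        matches!(self, Self::ClearDraft | Self::SetDraft)
    }
}
```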

Delta operations

The delta op set documented in AGENTS.md is:

  • AppendContent
  • AppendReasoning
  • SetToolCalls
  • SetThinkingBlocks
  • AddCitation
  • AddServerContentBlock
  • SetUsage
  • MergeExtra

These operations represent the incremental assembly of a streamed assistant message.
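
A minimal sketch of that incremental assembly, covering four of the eight ops. The payload types, the DraftMessage struct, and the apply method are assumptions made for illustration; the source lists only the op names.

```rust
/// A subset of the delta ops; payload shapes are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum DeltaOp {
    AppendContent(String),
    AppendReasoning(String),
    SetToolCalls(Vec<String>),
    SetUsage { prompt_tokens: u64, completion_tokens: u64 },
}

/// Hypothetical in-progress assistant message.
#[derive(Debug, Default, PartialEq)]
pub struct DraftMessage {
    pub content: String,
    pub reasoning: String,
    pub tool_calls: Vec<String>,
    pub usage: Option<(u64, u64)>,
}

impl DraftMessage {
    /// Folds one delta into the draft: Append* ops accumulate,
    /// Set* ops replace the previous value.
    pub fn apply(&mut self, op: DeltaOp) {
        match op {
            DeltaOp::AppendContent(text) => self.content.push_str(&text),
            DeltaOp::AppendReasoning(text) => self.reasoning.push_str(&text),
            DeltaOp::SetToolCalls(calls) => self.tool_calls = calls,
            DeltaOp::SetUsage { prompt_tokens, completion_tokens } => {
                self.usage = Some((prompt_tokens, completion_tokens));
            }
        }
    }
}
```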

Streaming details

stream_core.rs contains the low-level LLM transport and delta processing.

merge_thinking_blocks

SetThinkingBlocks deltas are merged via merge_thinking_blocks() rather than replaced naively. The merge logic is designed to preserve Anthropic-style thinking content and signatures across streaming updates.

The authoritative merge order for matching an incoming block against existing ones is:

  1. match by (type, index)
  2. then (type, id)
  3. then (type, signature)

Signatures are treated as opaque, and latest-wins replacement is used when needed.

This preserves the stability of thinking/signature blocks while still allowing incremental streaming updates.
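
The matching order can be illustrated with a simplified merge. This is a sketch under stated assumptions, not the actual merge_thinking_blocks() implementation: the ThinkingBlock fields, the rule that non-empty incoming text replaces existing text, and the function names are all invented for this example.

```rust
/// Hypothetical block shape; `kind` stands in for the block "type".
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ThinkingBlock {
    pub kind: String,
    pub index: Option<usize>,
    pub id: Option<String>,
    pub signature: Option<String>,
    pub text: String,
}

/// Finds an existing block by (type, index), then (type, id),
/// then (type, signature). Absent keys never match.
fn find_match(existing: &[ThinkingBlock], incoming: &ThinkingBlock) -> Option<usize> {
    if incoming.index.is_some() {
        let hit = existing
            .iter()
            .position(|b| b.kind == incoming.kind && b.index == incoming.index);
        if hit.is_some() {
            return hit;
        }
    }
    if incoming.id.is_some() {
        let hit = existing
            .iter()
            .position(|b| b.kind == incoming.kind && b.id == incoming.id);
        if hit.is_some() {
            return hit;
        }
    }
    if incoming.signature.is_some() {
        return existing
            .iter()
            .position(|b| b.kind == incoming.kind && b.signature == incoming.signature);
    }
    None
}

/// Merges incoming blocks into `existing`; unmatched blocks are appended.
pub fn merge_thinking_blocks_sketch(existing: &mut Vec<ThinkingBlock>, incoming: Vec<ThinkingBlock>) {
    for block in incoming {
        match find_match(existing.as_slice(), &block) {
            Some(pos) => {
                let target = &mut existing[pos];
                // Assumed policy: non-empty incoming text replaces the old text.
                if !block.text.is_empty() {
                    target.text = block.text;
                }
                // Signatures are opaque: the latest one wins, and an update
                // without a signature keeps the existing one.
                if block.signature.is_some() {
                    target.signature = block.signature;
                }
                if target.id.is_none() {
                    target.id = block.id;
                }
            }
            None => existing.push(block),
        }
    }
}
```

Keeping the existing signature when an update carries none is what prevents a later partial delta from erasing it.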

Anthropic thinking/signature preservation

The stream core explicitly handles Anthropic reasoning/thinking blocks so that signatures are not lost during merge and finalization. This is important for provider compatibility and for replaying assistant state accurately.

Linearization and history limiting

linearize.rs

linearize.rs is responsible for converting the stored conversation into a model-friendly sequence.

Key behaviors:

  • merges consecutive user messages
  • strips thinking blocks for LLM cache compatibility
  • suppresses linearization-only messages such as summarization artifacts
  • preserves required anchor messages according to compression exemptions
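
Two of the behaviors above, merging consecutive user messages and stripping thinking blocks, can be sketched as follows. The Msg struct, the blank-line separator used when merging, and the function name are assumptions for illustration; suppression of linearization-only messages and anchor preservation are omitted.

```rust
/// Hypothetical, simplified message shape.
#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub role: String,
    pub content: String,
    pub thinking_blocks: Vec<String>,
}

/// Returns a model-facing copy of `messages` with thinking blocks removed
/// and runs of consecutive user messages merged into one.
pub fn linearize_sketch(messages: &[Msg]) -> Vec<Msg> {
    let mut out: Vec<Msg> = Vec::new();
    for original in messages {
        let mut msg = original.clone();
        // Strip thinking blocks from the outgoing copy only.
        msg.thinking_blocks.clear();

        let merge = msg.role == "user" && out.last().map_or(false, |prev| prev.role == "user");
        if merge {
            if let Some(prev) = out.last_mut() {
                // Assumed separator between merged user messages.
                prev.content.push_str("\n\n");
                prev.content.push_str(&msg.content);
            }
        } else {
            out.push(msg);
        }
    }
    out
}
```

The stored messages are left untouched; only the copy sent to the model is rewritten.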

history_limit.rs

history_limit.rs re-exports the shared history limiting logic from refact_chat_history::history_limit.

The authoritative AGENTS.md summary describes a 4-stage compression pipeline:

  1. deduplicate context files
  2. compress tool results
  3. fix tool calls
  4. limit history
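
The stage ordering can be sketched as four functions applied in sequence. Everything inside each stage is an assumption made for illustration (keep the latest copy of a context file, truncate long tool output, drop tool results with no matching call, keep the last N items); the real logic lives in refact_chat_history::history_limit and is not reproduced here.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified history item.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    ContextFile { path: String, body: String },
    ToolCall { id: String },
    ToolResult { call_id: String, output: String },
    Text(String),
}

/// Stage 1: keep only the latest copy of each context file path.
fn dedup_context_files(items: Vec<Item>) -> Vec<Item> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for item in items.into_iter().rev() {
        if let Item::ContextFile { path, .. } = &item {
            if !seen.insert(path.clone()) {
                continue; // an newer copy of this path was already kept
            }
        }
        out.push(item);
    }
    out.reverse();
    out
}

/// Stage 2: truncate tool output longer than `max_chars`.
fn compress_tool_results(items: Vec<Item>, max_chars: usize) -> Vec<Item> {
    items
        .into_iter()
        .map(|item| match item {
            Item::ToolResult { call_id, output } if output.chars().count() > max_chars => {
                Item::ToolResult { call_id, output: output.chars().take(max_chars).collect() }
            }
            other => other,
        })
        .collect()
}

/// Stage 3: drop tool results that have no matching tool call.
fn fix_tool_calls(items: Vec<Item>) -> Vec<Item> {
    let ids: HashSet<String> = items
        .iter()
        .filter_map(|i| match i {
            Item::ToolCall { id } => Some(id.clone()),
            _ => None,
        })
        .collect();
    items
        .into_iter()
        .filter(|i| match i {
            Item::ToolResult { call_id, .. } => ids.contains(call_id),
            _ => true,
        })
        .collect()
}

/// Stage 4: keep the last `max_items` items. A real limiter would also
/// need to keep tool call/result pairs together; this one does not.
fn limit_history(items: Vec<Item>, max_items: usize) -> Vec<Item> {
    let skip = items.len().saturating_sub(max_items);
    items.into_iter().skip(skip).collect()
}

/// Runs the four stages in the order listed above.
pub fn compress_sketch(items: Vec<Item>, max_chars: usize, max_items: usize) -> Vec<Item> {
    let items = dedup_context_files(items);
    let items = compress_tool_results(items, max_chars);
    let items = fix_tool_calls(items);
    limit_history(items, max_items)
}
```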

CompressionStrength values are:

  • Absent
  • Low
  • Medium
  • High

Trajectory storage

Trajectories are persisted under:

  • .refact/trajectories/{chat_id}.json
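
Building that path from a workspace root and a chat id might look like the sketch below. The trajectory_path function, the workspace_root parameter, and the character whitelist applied to chat_id are assumptions for illustration; the source does not say how trajectories.rs validates ids or resolves the root directory.

```rust
use std::path::{Path, PathBuf};

/// Returns `<workspace_root>/.refact/trajectories/<chat_id>.json`,
/// or `None` if `chat_id` contains characters outside an assumed
/// whitelist (ASCII letters, digits, '-' and '_').
pub fn trajectory_path(workspace_root: &Path, chat_id: &str) -> Option<PathBuf> {
    let safe = !chat_id.is_empty()
        && chat_id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if !safe {
        // Rejects ids such as "../x" that could escape the directory.
        return None;
    }
    Some(
        workspace_root
            .join(".refact")
            .join("trajectories")
            .join(format!("{}.json", chat_id)),
    )
}
```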

trajectories.rs handles:

  • saving and loading trajectory snapshots
  • restoring sessions from trajectory data
  • listing and subscribing to trajectory events
  • repairing and validating trajectory identity

Trajectory relevance to chat

ChatSession::new_with_trajectory() rebuilds an in-memory session from persisted data. This is the mechanism behind restore/reload flows and chat continuity across restarts.

Queue processor behavior

The queue processor in queue.rs is the execution engine for commands. It:

  • drains priority user messages
  • handles allowed commands while paused or waiting on IDE input
  • prepares preamble and knowledge before generation
  • invokes start_generation()
  • saves trajectories at important boundaries

Observed runtime gating includes:

  • WaitingIde only accepts IDE result commands and Abort
  • Paused only accepts tool-decision commands and Abort
  • busy states are Generating and ExecutingTools
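
The gating rules above can be sketched as two predicates. The function names and the trimmed CommandKind enum are hypothetical; IdeToolResult is taken from the state diagram rather than from the command list, and the sketch returns None for states whose gating this page does not describe.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Idle,
    Generating,
    ExecutingTools,
    Paused,
    WaitingIde,
    WaitingUserInput,
    Completed,
    Error,
}

/// A trimmed-down set of command names, enough to show the gating.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommandKind {
    UserMessage,
    SetParams,
    Abort,
    ApproveTools,
    RejectTools,
    IdeToolResult,
}

/// Busy states as listed on this page.
pub fn is_busy(state: SessionState) -> bool {
    matches!(state, SessionState::Generating | SessionState::ExecutingTools)
}

/// For the two gated states, reports whether `cmd` is accepted.
/// Returns `None` for states whose gating is not described here.
pub fn accepted_while_gated(state: SessionState, cmd: CommandKind) -> Option<bool> {
    match state {
        SessionState::Paused => Some(matches!(
            cmd,
            CommandKind::ApproveTools | CommandKind::RejectTools | CommandKind::Abort
        )),
        SessionState::WaitingIde => Some(matches!(
            cmd,
            CommandKind::IdeToolResult | CommandKind::Abort
        )),
        _ => None,
    }
}
```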
