Chat System

Stateful chat orchestration, SSE delivery, tool execution, compression, and trajectory persistence in the engine.

Overview

The chat subsystem is centered on ChatSession, which owns the thread transcript, runtime state, command queue, SSE event stream, compression flags, and trajectory persistence metadata. It coordinates the full path from a user command to LLM streaming, tool execution, and eventual persistence of the chat trajectory.

Authoritative behavior comes from AGENTS.md and the src/chat/ modules, especially session.rs, queue.rs, stream_core.rs, linearize.rs, history_limit.rs, and trajectories.rs.

Core session model

ChatSession is the mutable in-memory representation of a chat thread. Relevant state includes:

chat_id
messages
runtime: RuntimeState
command_queue: VecDeque<CommandRequest>
event_tx for SSE delivery
event_seq monotonic counter
compression fields: is_compressing, compression_phase, compression_reason
draft fields: draft_message, draft_usage
trajectory fields: trajectory_dirty, trajectory_version, trajectory_save_in_flight, trajectory_save_queued
abort / wakeup coordination: abort_flag, abort_notify, user_interrupt_flag, queue_notify

SessionState enum values are:

Idle
Generating
ExecutingTools
Paused
WaitingIde
WaitingUserInput
Completed
Error

SessionState transitions

stateDiagram-v2
    [*] --> Idle
    Idle --> Generating: UserMessage / RetryFromIndex / Regenerate
    Generating --> ExecutingTools: model emits tool calls
    Generating --> Idle: stream finishes without tool calls
    Generating --> Completed: assistant/tool flow ends with completion
    Generating --> WaitingUserInput: tool asks questions / wait_agents / ask_questions
    Generating --> WaitingIde: IDE-dependent tool result required
    Generating --> Paused: pause-required / user decision boundary
    ExecutingTools --> Generating: tool results processed, continue loop
    ExecutingTools --> Completed: task_done / agent_finish
    ExecutingTools --> WaitingUserInput: tool decision path needs user input
    ExecutingTools --> Idle: Abort
    Paused --> Generating: ApproveTools / RejectTools outcome resumes
    WaitingIde --> Generating: IdeToolResult
    WaitingIde --> Idle: Abort
    WaitingUserInput --> Generating: queued resume command
    WaitingUserInput --> Idle: Abort
    Completed --> Idle: new command / regenerate path
    Error --> Idle: recovery / next queued command

The terminal/active distinction is implemented in session.rs (is_terminal_runtime_state) and in queue.rs, where the queue processor gates on Generating, ExecutingTools, Paused, and WaitingIde.
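As a rough illustration of the state list and the queue gating described above, the following sketch models the eight states and the four that the queue processor is said to gate on. The method name gates_queue is hypothetical; the real predicate in session.rs (is_terminal_runtime_state) may be defined differently.

```rust
/// Runtime states as listed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Idle,
    Generating,
    ExecutingTools,
    Paused,
    WaitingIde,
    WaitingUserInput,
    Completed,
    Error,
}

impl SessionState {
    /// Hypothetical helper (not taken from the source): true for the four
    /// states the queue processor is described as gating on.
    pub fn gates_queue(self) -> bool {
        matches!(
            self,
            SessionState::Generating
                | SessionState::ExecutingTools
                | SessionState::Paused
                | SessionState::WaitingIde
        )
    }
}
```

With this shape, a queue loop could check gates_queue() before deciding whether a newly dequeued command may start a fresh generation.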

Message flow

Canonical flow from user input to model output is:

flowchart LR
    U[UserMessage] --> Q[command queue]
    Q --> P[prepare: system prompt + knowledge RAG + history limit]
    P --> L[linearize]
    L --> S[LLM stream]
    S --> C[StreamCollector]
    C --> T[tool calls]
    T -->|continue| P
    T -->|finish| R[save trajectory / update runtime]

What each stage does

UserMessage enters the queue
- Commands are enqueued and processed by queue.rs.
- Priority user messages may be injected first.
Prepare phase
- prepare_session_preamble_and_knowledge() builds the prompt.
- AGENTS.md states that preparation includes:
  - system prompt
  - knowledge RAG
  - history limit
Linearization
- linearize.rs merges consecutive user messages.
- It strips linearization-only messages such as summarization artifacts and compression reports.
- It also strips thinking blocks for LLM cache compatibility.
LLM streaming
- stream_core.rs drives the HTTP/SSE or websocket stream.
- Stream deltas are accumulated via a StreamCollector implementation.
Tool calls
- Tool call deltas are collected, finalized, and executed.
- The loop returns to prepare/stream when tool output requires further model turns.
Persistence and loop termination
- Trajectories are saved during or after each major boundary.
- Final runtime state becomes Idle, Completed, WaitingUserInput, or Error depending on outcome.
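The loop shape described in the stages above (stream, collect tool calls, execute, return to the model) can be sketched as follows. This is a simplified model, not the engine's code: ModelTurn, run_turns, the string-based history, and the max_turns guard are all assumptions made for illustration, and the prepare and linearize steps are left out.

```rust
/// Hypothetical result of one model turn: either tool calls or a final answer.
pub enum ModelTurn {
    ToolCalls(Vec<String>),
    Final(String),
}

/// Runs model turns until the model stops requesting tools or the turn
/// budget is exhausted. Returns the final text, or None if the budget ran out.
pub fn run_turns<M, T>(
    history: &mut Vec<String>,
    mut model: M,
    mut run_tool: T,
    max_turns: usize,
) -> Option<String>
where
    M: FnMut(&[String]) -> ModelTurn,
    T: FnMut(&str) -> String,
{
    for _ in 0..max_turns {
        match model(history.as_slice()) {
            // No tool calls: the stream finished, so the loop ends.
            ModelTurn::Final(text) => {
                history.push(format!("assistant: {}", text));
                return Some(text);
            }
            // Tool calls: execute each one, record its output, and loop
            // so the model sees the results on the next turn.
            ModelTurn::ToolCalls(calls) => {
                for call in &calls {
                    let result = run_tool(call);
                    history.push(format!("tool[{}]: {}", call, result));
                }
            }
        }
    }
    None
}
```

In the real engine the equivalent of the None case would presumably surface as one of the non-generating runtime states rather than a return value.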

SSE event model

Chat SSE is served from the subscription endpoint documented in AGENTS.md:

GET /v1/chats/subscribe?chat_id={id}

Events carry a monotonic seq: u64. Clients must reconnect if they detect a gap in sequence numbers.

Event types

Authoritative event types listed in AGENTS.md:

Snapshot
StreamStarted
StreamDelta
StreamFinished
MessageAdded
MessageUpdated
MessageRemoved
MessagesTruncated
ThreadUpdated
QueueUpdated
RuntimeUpdated
PauseRequired

Reconnect-on-gap behavior

The sequence number is incremented on each emitted chat event. Because subscribers receive an ordered stream, any missing sequence number indicates the client missed one or more events and should resubscribe and request a fresh snapshot.
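A client-side gap check might look like the sketch below. SeqTracker and SeqCheck are hypothetical names, and the sketch assumes that seq increases by exactly one per event and that the first event seen after subscribing (typically a Snapshot) establishes the baseline.

```rust
/// Outcome of checking one incoming sequence number.
#[derive(Debug, PartialEq, Eq)]
pub enum SeqCheck {
    InOrder,
    /// The client should resubscribe and request a fresh snapshot.
    Gap { expected: u64, got: u64 },
}

/// Hypothetical client-side tracker for the `seq` field of chat events.
#[derive(Debug, Default)]
pub struct SeqTracker {
    last_seq: Option<u64>,
}

impl SeqTracker {
    /// Records `seq` if it directly follows the previous event; otherwise
    /// reports a gap and leaves the tracker unchanged.
    pub fn observe(&mut self, seq: u64) -> SeqCheck {
        if let Some(last) = self.last_seq {
            let expected = last.wrapping_add(1);
            if seq != expected {
                return SeqCheck::Gap { expected, got: seq };
            }
        }
        self.last_seq = Some(seq);
        SeqCheck::InOrder
    }

    /// Forgets the baseline, e.g. right before resubscribing.
    pub fn reset(&mut self) {
        self.last_seq = None;
    }
}
```

Note that this sketch also treats a repeated or older seq as a gap; a real client may prefer to ignore duplicates instead.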

Practical note

Background process completion is not a dedicated SSE envelope in the authoritative contract; it is represented as a hidden event(process_completed) message delivered through MessageAdded.

Commands

POST /v1/chats/{chat_id}/commands accepts queued chat commands. The authoritative command set is:

UserMessage
SetParams
UpdateMessage
RemoveMessage
TruncateMessages
RetryFromIndex
Abort
ApproveTools
RejectTools
BranchFromChat
RestoreFromTrajectory
ClearDraft
SetDraft
Regenerate

Command notes

UserMessage, RetryFromIndex, and Regenerate can trigger generation.
ApproveTools / RejectTools are used when the session is paused for user decision.
BranchFromChat and RestoreFromTrajectory are trajectory-oriented commands that interact with persisted chat history.
ClearDraft and SetDraft manipulate the transient draft state.
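The groupings in the notes above can be expressed as predicates over the command names. The sketch below uses payload-free variants because this page does not document the command payloads; CommandKind and its methods are hypothetical and only mirror the notes.

```rust
/// Command names from the list above, without payloads (assumption: the
/// real commands carry data that this page does not describe).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommandKind {
    UserMessage,
    SetParams,
    UpdateMessage,
    RemoveMessage,
    TruncateMessages,
    RetryFromIndex,
    Abort,
    ApproveTools,
    RejectTools,
    BranchFromChat,
    RestoreFromTrajectory,
    ClearDraft,
    SetDraft,
    Regenerate,
}

impl CommandKind {
    /// Commands that can start a generation.
    pub fn can_trigger_generation(self) -> bool {
        matches!(self, Self::UserMessage | Self::RetryFromIndex | Self::Regenerate)
    }

    /// Commands that resolve a pause for user decision.
    pub fn is_pause_decision(self) -> bool {
        matches!(self, Self::ApproveTools | Self::RejectTools)
    }

    /// Commands that interact with persisted chat history.
    pub fn is_trajectory_oriented(self) -> bool {
        matches!(self, Self::BranchFromChat | Self::RestoreFromTrajectory)
    }

    /// Commands that only touch the transient draft.
    pub fn touches_draft(self) -> bool {
        matches!(self, Self::ClearDraft | Self::SetDraft)
    }
}
```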

Delta operations

The delta op set documented in AGENTS.md is:

AppendContent
AppendReasoning
SetToolCalls
SetThinkingBlocks
AddCitation
AddServerContentBlock
SetUsage
MergeExtra

These operations represent the incremental assembly of a streamed assistant message.
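To make the incremental assembly concrete, here is a minimal sketch that applies three of the ops to a draft message. The payload shapes, the DraftMessage struct, and the Usage fields are assumptions; the remaining ops (SetToolCalls, SetThinkingBlocks, AddCitation, AddServerContentBlock, MergeExtra) are omitted because their payloads are not described on this page.

```rust
/// Hypothetical token usage record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

/// Subset of the delta ops, with assumed payloads.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DeltaOp {
    AppendContent(String),
    AppendReasoning(String),
    SetUsage(Usage),
}

/// Hypothetical in-progress assistant message.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct DraftMessage {
    pub content: String,
    pub reasoning: String,
    pub usage: Option<Usage>,
}

impl DraftMessage {
    /// Append ops accumulate text; Set ops replace the previous value.
    pub fn apply(&mut self, op: DeltaOp) {
        match op {
            DeltaOp::AppendContent(chunk) => self.content.push_str(&chunk),
            DeltaOp::AppendReasoning(chunk) => self.reasoning.push_str(&chunk),
            DeltaOp::SetUsage(usage) => self.usage = Some(usage),
        }
    }
}
```

The Append/Set naming suggests this split between accumulating and replacing semantics, but the source does not spell it out.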

Streaming details

stream_core.rs contains the low-level LLM transport and delta processing.

`merge_thinking_blocks`

SetThinkingBlocks deltas are merged via merge_thinking_blocks() rather than replaced naively. The merge logic is designed to preserve Anthropic-style thinking content and signatures across streaming updates.

The authoritative merge order is:

match by (type, index)
then (type, id)
then (type, signature)

Signatures are treated as opaque, and latest-wins replacement is used when needed.

This preserves the stability of thinking/signature blocks while still allowing incremental streaming updates.
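The match order can be sketched as below. This is not the real merge_thinking_blocks(): the ThinkingBlock fields, the function names, and in particular the rule that an existing signature is kept when the incoming block has none are assumptions chosen to illustrate the (type, index), (type, id), (type, signature) lookup order with latest-wins replacement.

```rust
/// Hypothetical thinking block; `kind` stands in for the block's type.
#[derive(Debug, Clone, PartialEq)]
pub struct ThinkingBlock {
    pub kind: String,
    pub index: Option<usize>,
    pub id: Option<String>,
    pub signature: Option<String>,
    pub text: String,
}

/// Two optional keys match only when both are present and equal.
fn same_key<T: PartialEq>(a: &Option<T>, b: &Option<T>) -> bool {
    matches!((a, b), (Some(x), Some(y)) if x == y)
}

/// Looks up a match by (kind, index), then (kind, id), then (kind, signature).
fn find_match(existing: &[ThinkingBlock], incoming: &ThinkingBlock) -> Option<usize> {
    let by = |pred: &dyn Fn(&ThinkingBlock) -> bool| {
        existing.iter().position(|b| b.kind == incoming.kind && pred(b))
    };
    by(&|b| same_key(&b.index, &incoming.index))
        .or_else(|| by(&|b| same_key(&b.id, &incoming.id)))
        .or_else(|| by(&|b| same_key(&b.signature, &incoming.signature)))
}

/// Merges `incoming` into `existing`: matched blocks are replaced
/// (latest wins), unmatched blocks are appended.
pub fn merge_blocks_sketch(existing: &mut Vec<ThinkingBlock>, incoming: Vec<ThinkingBlock>) {
    for mut block in incoming {
        match find_match(existing.as_slice(), &block) {
            Some(pos) => {
                // Assumption: keep the earlier signature if the update lacks one.
                if block.signature.is_none() {
                    block.signature = existing[pos].signature.take();
                }
                existing[pos] = block;
            }
            None => existing.push(block),
        }
    }
}
```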

Anthropic thinking/signature preservation

The stream core explicitly handles Anthropic reasoning/thinking blocks so that signatures are not lost during merge and finalization. This is important for provider compatibility and for replaying assistant state accurately.

Linearization and history limiting

`linearize.rs`

linearize.rs is responsible for converting the stored conversation into a model-friendly sequence.

Key behaviors:

merges consecutive user messages
strips thinking blocks for LLM cache compatibility
suppresses linearization-only messages such as summarization artifacts
preserves required anchor messages according to compression exemptions
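The first three behaviors can be illustrated with a small sketch. The Msg struct, the linearization_only flag, and the blank-line separator used when merging are assumptions, and the anchor-message exemption is not modeled; the real linearize.rs operates on the engine's own message types.

```rust
/// Hypothetical stored message.
#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub role: String,
    pub content: String,
    pub thinking_blocks: Vec<String>,
    /// Assumed marker for summarization artifacts and similar messages
    /// that should not reach the model.
    pub linearization_only: bool,
}

/// Builds the model-facing sequence from the stored conversation.
pub fn linearize_sketch(messages: &[Msg]) -> Vec<Msg> {
    let mut out: Vec<Msg> = Vec::new();
    for original in messages {
        // Drop messages that exist only for bookkeeping.
        if original.linearization_only {
            continue;
        }
        let mut msg = original.clone();
        // Strip thinking blocks before sending to the model.
        msg.thinking_blocks.clear();

        let merges = msg.role == "user"
            && matches!(out.last(), Some(prev) if prev.role == "user");
        if merges {
            // Merge consecutive user messages into one.
            if let Some(prev) = out.last_mut() {
                prev.content.push_str("\n\n");
                prev.content.push_str(&msg.content);
            }
        } else {
            out.push(msg);
        }
    }
    out
}
```

Because suppressed messages are skipped before the merge check, two user messages separated only by a suppressed artifact are merged in this sketch.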

`history_limit.rs`

history_limit.rs re-exports the shared history limiting logic from refact_chat_history::history_limit.

The authoritative AGENTS.md summary describes a 4-stage compression pipeline:

deduplicate context files
compress tool results
fix tool calls
limit history

CompressionStrength values are:

Absent
Low
Medium
High

Trajectory storage

Trajectories are persisted under:

.refact/trajectories/{chat_id}.json
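A path built from that pattern might look like the following sketch. The helper name and the workspace_root parameter are assumptions: this page does not say which directory the .refact folder is resolved against, nor how chat_id is validated.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: joins the documented relative pattern onto a
/// caller-supplied root directory.
pub fn trajectory_path(workspace_root: &Path, chat_id: &str) -> PathBuf {
    workspace_root
        .join(".refact")
        .join("trajectories")
        .join(format!("{}.json", chat_id))
}
```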

trajectories.rs handles:

saving and loading trajectory snapshots
restoring sessions from trajectory data
listing and subscribing to trajectory events
repairing and validating trajectory identity

Trajectory relevance to chat

ChatSession::new_with_trajectory() rebuilds an in-memory session from persisted data. This is the mechanism behind restore/reload flows and chat continuity across restarts.

Queue processor behavior

The queue processor in queue.rs is the execution engine for commands. It:

drains priority user messages
handles allowed commands while paused or waiting on IDE input
prepares preamble and knowledge before generation
invokes start_generation()
saves trajectories at important boundaries
