fix(harness+desktop): coalesce streaming text deltas, add detached turns, cross-surface invariant #6
OmGuptaIND merged 9 commits into main on Apr 22, 2026
…rns, cross-surface render invariant
Three related bodies of work — the streaming-deltas fix was the user-
visible bug that triggered the rest, and the detached-turns feature
made us audit the full disconnect flow.
Streaming text delta coalesce (mirror.ts)
- CodexHarnessSession emits one `{type:'text'}` SessionEvent per
`item/agentMessage/delta` notification (~200 per reply). The mirror
synthesizer pushed each as a separate TextBlock, then
readHarnessHistory expanded every block into its own
SessionHistoryEntry. Users saw 200 single-word message bubbles
stacked in the transcript.
- synthesizeHarnessTurn now merges consecutive text/thinking events
into one block on the write side, and readHarnessHistory coalesces
adjacent same-type blocks on the read side as a safety net that
heals legacy messages.jsonl files already on disk. Tool boundaries
still split the run correctly; `!last.isThinking && !last.toolName`
gates the coalesce on read.
- Adds 3 mirror checks (streamed text/thinking deltas coalesce, tool-
boundary does not coalesce across) + 1 legacy-mirror read check.
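The read-side safety net can be sketched as a single pass over history entries. This is a minimal illustration, not the code in mirror.ts: the `HistoryEntry` shape and `coalesceAdjacent` name are hypothetical. It generalizes the `!last.isThinking && !last.toolName` gate so that text merges only into text, thinking merges only into thinking, and tool entries always break the run.

```typescript
// Hypothetical, simplified history entry; the real SessionHistoryEntry has more fields.
interface HistoryEntry {
  content: string;
  isThinking?: boolean;
  toolName?: string;
}

// Tool entries never merge; text and thinking merge only with their own kind.
function kindOf(entry: HistoryEntry): "tool" | "thinking" | "text" {
  if (entry.toolName) return "tool";
  return entry.isThinking ? "thinking" : "text";
}

function coalesceAdjacent(entries: readonly HistoryEntry[]): HistoryEntry[] {
  const out: HistoryEntry[] = [];
  for (const entry of entries) {
    const last = out[out.length - 1];
    const kind = kindOf(entry);
    if (last !== undefined && kind !== "tool" && kindOf(last) === kind) {
      // Same-type run: extend the previous bubble instead of starting a new one.
      // A fresh object is stored so the caller's input is never mutated.
      out[out.length - 1] = { ...last, content: last.content + entry.content };
    } else {
      out.push({ ...entry });
    }
  }
  return out;
}

// A legacy file with per-delta entries and a tool call in the middle:
const legacy: HistoryEntry[] = [
  { content: "Hel" },
  { content: "lo" },
  { content: "", toolName: "search" },
  { content: " again" },
];
// coalesceAdjacent(legacy) → 3 entries: "Hello", the tool entry, " again"
console.log(coalesceAdjacent(legacy).length);
```

Running the merge on read as well as on write means old messages.jsonl files heal without a migration step.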
Detached/attached disconnect mode (spec + config + server + UI)
- New `sessions.disconnectMode: 'attached' | 'detached'` config field
with `detachedTurnMaxMs` (default 10 min) budget. Default is
attached — current behavior, no surprise cost. Detached lets the
turn finish in the background when the tab closes; a wall-clock
timer on the server cancels runaway turns if the client never comes
back. Reconnect clears all pending detached budgets.
- `'sessions'` added to ConfigQueryMessage/ConfigUpdateMessage key
union so the desktop can query + toggle the mode via the existing
config protocol.
- server.ts ws.on('close') branches on mode: detached skips cancel,
leaves activeTurns populated, schedules a per-session timer via
scheduleDetachedTurnBudget. clearDetachedTurnBudget fires in the
processMessage finally block so natural turn completion cleans up.
Timer is .unref()'d so shutdown isn't blocked.
- Desktop Settings → Behavior → Autonomy gains "Keep running when I
close the tab" toggle. UIStore hydrates the value from server on
every auth_ok; setter echoes to server unless the update came from
server hydration (fromServer flag prevents ping-pong).
- Spec: specs/features/DETACHED_TURNS.md documents the mode contract,
safeguards still TODO (tool-call budget, destructive-tool ask_user
gate, per-session override), and the deferred structural event-type
split.
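A rough sketch of the per-session budget bookkeeping, assuming a Node server. Only `scheduleDetachedTurnBudget` and `clearDetachedTurnBudget` are named in the PR; the `DetachedTurnBudgets` class, its `onClose`/`clear`/`clearAll` methods, and the injected `Scheduler` are hypothetical stand-ins used here so the logic can be exercised without real timers.

```typescript
type DisconnectMode = "attached" | "detached";

// Shape of the `sessions` config block described above.
interface SessionsConfig {
  disconnectMode: DisconnectMode;
  detachedTurnMaxMs: number;
}

// Defaults from the PR text: attached mode, 10-minute detached budget.
const DEFAULT_SESSIONS_CONFIG: SessionsConfig = {
  disconnectMode: "attached",
  detachedTurnMaxMs: 10 * 60 * 1000,
};

// A scheduler returns a function that cancels the scheduled callback.
type Scheduler = (fn: () => void, ms: number) => () => void;

// Node-only default: unref() so a pending budget never blocks shutdown.
const nodeScheduler: Scheduler = (fn, ms) => {
  const timer = setTimeout(fn, ms);
  timer.unref();
  return () => clearTimeout(timer);
};

class DetachedTurnBudgets {
  private readonly pending = new Map<string, () => void>();
  private readonly config: SessionsConfig;
  private readonly cancelTurn: (sessionId: string) => void;
  private readonly schedule: Scheduler;

  constructor(
    config: SessionsConfig,
    cancelTurn: (sessionId: string) => void,
    schedule: Scheduler = nodeScheduler,
  ) {
    this.config = config;
    this.cancelTurn = cancelTurn;
    this.schedule = schedule;
  }

  // Socket close: attached cancels now, detached starts the wall-clock budget.
  onClose(sessionId: string): void {
    if (this.config.disconnectMode === "attached") {
      this.cancelTurn(sessionId);
      return;
    }
    this.clear(sessionId); // never stack two timers for one session
    const cancelTimer = this.schedule(() => {
      this.pending.delete(sessionId);
      this.cancelTurn(sessionId); // client did not come back within budget
    }, this.config.detachedTurnMaxMs);
    this.pending.set(sessionId, cancelTimer);
  }

  // Natural completion (the processMessage `finally` path) clears one budget.
  clear(sessionId: string): void {
    this.pending.get(sessionId)?.();
    this.pending.delete(sessionId);
  }

  // Reconnect clears every pending detached budget.
  clearAll(): void {
    for (const cancelTimer of this.pending.values()) cancelTimer();
    this.pending.clear();
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

Injecting the scheduler keeps the close handler's branching testable; the production path would pass nothing and get the unref'd Node timer.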
Cross-surface render invariant (check.ts)
- Desktop chatHandler.appendText, webhook agent-runner chunks.join,
and mirror readHarnessHistory each implement their own "event
stream → assistant text" logic. They agreed today only by
coincidence. The 200-bubble bug was a case where mirror diverged
from the other two; we fixed mirror but nothing stopped the next
adapter change from causing a fresh divergence.
- New test block simulates all three surfaces against shared fixtures
(single text, 200-delta stress, 1000-delta stress, unicode
surrogate-pair split across deltas, empty-deltas interleaved,
text-after-tool_call) and asserts: every surface's final assistant
text run matches `expectedFinalText`; for no-tool fixtures, all
three surfaces produce identical full bubble arrays. 23 checks.
- Pointer comment at each of the three surface sites so the next dev
touching them sees the invariant.
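The invariant check can be pictured as three independent renderers run against shared fixtures. This is a simplified model, not the code in check.ts: `renderDesktop`, `renderWebhook`, `renderMirror`, and `checkFixture` are hypothetical stand-ins for appendText, chunks.join, and readHarnessHistory. The assumptions are that each surface drops empty deltas and that a tool call closes the current bubble; only `expectedFinalText` comes from the PR text.

```typescript
type SessionEvent = { type: "text"; text: string } | { type: "tool_call"; name: string };

interface Fixture {
  name: string;
  events: SessionEvent[];
  expectedFinalText: string;
}

// Desktop-style: append each delta to the open bubble; a tool call closes it.
function renderDesktop(events: readonly SessionEvent[]): string[] {
  const bubbles: string[] = [];
  let open = false;
  for (const e of events) {
    if (e.type === "tool_call") {
      open = false;
    } else if (e.text !== "") {
      if (open) bubbles[bubbles.length - 1] += e.text;
      else bubbles.push(e.text);
      open = true;
    }
  }
  return bubbles;
}

// Webhook-style: buffer chunks, join them at each tool boundary and at the end.
function renderWebhook(events: readonly SessionEvent[]): string[] {
  const bubbles: string[] = [];
  let chunks: string[] = [];
  const flush = () => {
    const joined = chunks.join("");
    if (joined !== "") bubbles.push(joined);
    chunks = [];
  };
  for (const e of events) {
    if (e.type === "tool_call") flush();
    else chunks.push(e.text);
  }
  flush();
  return bubbles;
}

// Mirror-style: one entry per event (null = tool call), then a coalescing read.
function renderMirror(events: readonly SessionEvent[]): string[] {
  const runs: string[][] = [[]];
  for (const e of events) {
    if (e.type === "tool_call") runs.push([]);
    else runs[runs.length - 1].push(e.text);
  }
  return runs.map((run) => run.join("")).filter((text) => text !== "");
}

function checkFixture(f: Fixture): string[] {
  const failures: string[] = [];
  const surfaces: Record<string, string[]> = {
    desktop: renderDesktop(f.events),
    webhook: renderWebhook(f.events),
    mirror: renderMirror(f.events),
  };
  for (const [surface, bubbles] of Object.entries(surfaces)) {
    const finalText = bubbles.length > 0 ? bubbles[bubbles.length - 1] : "";
    if (finalText !== f.expectedFinalText) failures.push(`${f.name}/${surface}: final text`);
  }
  // Without tool calls, every surface must produce the identical bubble array.
  if (!f.events.some((e) => e.type === "tool_call")) {
    const ref = JSON.stringify(surfaces.desktop);
    if (JSON.stringify(surfaces.webhook) !== ref || JSON.stringify(surfaces.mirror) !== ref) {
      failures.push(`${f.name}: bubble arrays differ`);
    }
  }
  return failures;
}

const deltas = (parts: string[]): SessionEvent[] =>
  parts.map((text): SessionEvent => ({ type: "text", text }));
const stress = (n: number): string[] => Array.from({ length: n }, (_, i) => `w${i} `);

const fixtures: Fixture[] = [
  { name: "single text", events: deltas(["Hello"]), expectedFinalText: "Hello" },
  { name: "200-delta", events: deltas(stress(200)), expectedFinalText: stress(200).join("") },
  { name: "1000-delta", events: deltas(stress(1000)), expectedFinalText: stress(1000).join("") },
  // U+1F600 split into its two UTF-16 halves across deltas.
  { name: "surrogate split", events: deltas(["a", "\uD83D", "\uDE00", "b"]), expectedFinalText: "a\u{1F600}b" },
  { name: "empty deltas", events: deltas(["", "Hi", "", " there", ""]), expectedFinalText: "Hi there" },
  {
    name: "text after tool_call",
    events: [...deltas(["Looking"]), { type: "tool_call", name: "search" }, ...deltas(["Found ", "it"])],
    expectedFinalText: "Found it",
  },
];

const failures = fixtures.flatMap(checkFixture); // expected: []
console.log(failures);
```

Keeping the fixtures shared is the point of the design: a future adapter change that makes one surface diverge fails the check even if that surface's own tests still pass.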
Also included on the branch
- set-session-title MCP tool + server wiring for harness titling
- codex-harness-session + harness-session hardening (pre-existing)
- tool-registry + prompt-layers + factories tweaks (pre-existing)
- deploy ansible + huddle cloud-init updates (pre-existing)
Tests: 77 green (7 fixture + 5 prompt-layer + 4 registry + 5 snapshot
+ 15 identity + 5 memory-guidelines + 9 mirror + 3 round-trip + 1
legacy-mirror + 1 replay-seed + 23 cross-surface). Typechecks clean
across protocol/agent-config/agent-core/agent-server/desktop/cli;
mobile has a pre-existing RoutineStatusEvent cast issue that
reproduces on branch HEAD, not caused by this diff.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every pending ask_user swapped the composer out for the AskUserInline card. That meant the user couldn't type anything else: the composer UI was just gone, which is disorienting when the questions are optional nudges rather than hard blockers.
- ChatInput.tsx now only takes over the composer for specialized cards (routine_create / routine_delete), where the explicit Confirm/Cancel buttons are the right UX. Generic ask_user leaves the composer visible.
- handleSend routes the user's typed text as the answer to every pending question when a generic ask_user is outstanding, so the server-side handler still unblocks cleanly. Single free-form string answer per question; this matches what the AskUserInline "Or write your own answer" path already does.
- Added pendingAskUser + onAskUserSubmit to handleSend's useCallback deps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AskUserInline rendered every unanswered question at once: a 3- or 4-question ask_user occupied the whole viewport with options and free-text inputs stacked vertically. Now only the first unanswered question is visible; answering it advances to the next, and the progress dots in the header already showed the right state.
- Replaced `questions.map(... if (isAnswered) return null)` with a single `findIndex`-based render. Same submit/autosubmit flow; no state changes needed.
- Nothing else touched: the progress counter, done-state, and specialized routine cards above are unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
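The `findIndex`-based selection amounts to "render the first question without an answer". A minimal sketch, assuming answers are keyed by question text (as in the `{ "Q1": "foo" }` payload described later in this PR); `AskUserQuestion` and `visibleQuestionIndex` are hypothetical names.

```typescript
interface AskUserQuestion {
  question: string;
  options?: string[];
}

type Answers = Readonly<Record<string, string>>;

// Own-property check, so a question literally named "constructor" isn't
// treated as answered via Object.prototype.
const isAnswered = (answers: Answers, q: AskUserQuestion): boolean =>
  Object.prototype.hasOwnProperty.call(answers, q.question);

// Index of the single question to render, or -1 once all are answered
// (the caller shows the done state in that case).
function visibleQuestionIndex(questions: readonly AskUserQuestion[], answers: Answers): number {
  return questions.findIndex((q) => !isAnswered(answers, q));
}

const questions: AskUserQuestion[] = [{ question: "Which repo?" }, { question: "Which branch?" }];
console.log(visibleQuestionIndex(questions, {})); // 0
console.log(visibleQuestionIndex(questions, { "Which repo?": "anton" })); // 1
```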
ConfirmDialog already uses ix--accent (the accent-card pattern from the shared .ix-* interaction shell). AskUserInline was on ix--bordered, which reads as a plain card rather than a prompt that needs attention. This matches the pattern the design system already establishes for agent interaction prompts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ports the SessionFilesBar component from the design handoff (anton-computer/project/session-files.jsx, topbar variant) into the real app. Gives the task view a compact "Files" pill in the topbar, next to Usage / More options, that opens a popover listing every artifact produced in the current conversation.
- New SessionFilesBar.tsx under components/chat/. Subscribes to artifactStore and filters by the active conversation's sessionId so artifacts from other sessions don't leak in. Clicking a row calls setActiveArtifact + setArtifactPanelOpen, the same path the existing ActionsGroup uses.
- The component renders nothing when there are no artifacts, so the pill only shows up once there's something to list.
- Thumbnails port the design's SessionFileThumb: SVG via innerHTML, HTML via sandboxed iframe scaled to 0.25, code via first-6-lines <pre>, docs via first heading + placeholder lines.
- Extension labels map renderType + language to TS/TSX/MD/SVG/HTML etc. fmtAgoShort + artifactTitle helpers keep the meta row tidy.
- CSS appended to index.css. Tokens used (--bg-elev-1, --border, --border-strong, --text, --text-2/3/4, --accent, --accent-dim) already exist in Anton's theme, so no new variables are needed.
- Injected into App.tsx's workspace-topbar__actions between the existing chat-only gate (activeView === 'chat' && hasMessages) and the Usage/MoreHorizontal buttons. Placement matches the screenshot in the handoff.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
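The extension-label mapping can be sketched as a small pure function. The `renderType` values (`"code" | "markdown" | "svg" | "html"`), the language table, and the `TXT` fallback are assumptions for illustration; the actual artifact types in artifactStore may differ.

```typescript
// Hypothetical artifact fields; the real renderType values may differ.
type RenderType = "code" | "markdown" | "svg" | "html";

interface ArtifactLike {
  renderType: RenderType;
  language?: string;
}

// A Map (not an object literal) so keys like "constructor" can't hit prototype members.
const CODE_LABELS = new Map<string, string>([
  ["typescript", "TS"],
  ["tsx", "TSX"],
  ["javascript", "JS"],
  ["python", "PY"],
]);

function extensionLabel(artifact: ArtifactLike): string {
  switch (artifact.renderType) {
    case "markdown":
      return "MD";
    case "svg":
      return "SVG";
    case "html":
      return "HTML";
    default: {
      // Code artifacts: known languages get a short label, others an uppercased prefix.
      const lang = (artifact.language ?? "").toLowerCase();
      if (lang === "") return "TXT";
      return CODE_LABELS.get(lang) ?? lang.slice(0, 4).toUpperCase();
    }
  }
}

console.log(extensionLabel({ renderType: "code", language: "TypeScript" })); // "TS"
```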
Anton's model can already call the `publish` tool to push content to a
public URL. Previously there was no user-in-the-loop — the model's call
published immediately, which is the wrong trust posture for a tool
that produces a public artifact URL. This wires the tool through a
new specialized PublishConfirmCard so the user always confirms and
picks the final slug before anything goes live.
Server / tool
- buildPublishTool(deps) now takes { domain, askUser } instead of a
bare domain string. When askUser is wired (all desktop contexts),
execute() first fires an ask_user prompt with metadata.type =
'publish_confirm' carrying { title, contentType, language,
suggestedSlug, domain }. The card's answer is the final slug
string; empty string means cancel.
- Description updated: model should call publish with a suggested
slug; do NOT pre-ask via ask_user — this tool has its own gate.
- Fallback path preserved for non-desktop callers (evals/runner.ts
has no human, no ask_user handler): publishes directly, same as
before. Keeps eval scripts working.
- Slug validation: VALID_SLUG.test on the user's answer, fall back
to the suggestedSlug if they type garbage. executePublish itself
also throws on bad slugs, so double-protected.
- factories.ts passes ctx.onAskUser through — same pattern as the
existing routine-factory approval gate.
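A sketch of the gate's decision logic, assuming `askUser` resolves with the card's answer string. `PublishConfirmMeta` fields and the empty-string-means-cancel rule come from the description above; `chooseSlug`, `publishWithConfirm`, the injected `executePublish` callback, and the exact `VALID_SLUG` pattern (taken from the `[a-zA-Z0-9_-]+` rule in the Desktop section) are hypothetical.

```typescript
const VALID_SLUG = /^[a-zA-Z0-9_-]+$/;

// Metadata carried by the ask_user prompt (fields from the PR text).
interface PublishConfirmMeta {
  type: "publish_confirm";
  title: string;
  contentType: string;
  language?: string;
  suggestedSlug: string;
  domain: string;
}

// Hypothetical signature: resolves with the card's answer string.
type AskUser = (meta: PublishConfirmMeta) => Promise<string>;

interface PublishDeps {
  domain: string;
  askUser?: AskUser; // absent for non-desktop callers such as evals
}

interface PublishRequest {
  title: string;
  contentType: string;
  language?: string;
  suggestedSlug: string;
}

type PublishOutcome = { status: "published"; slug: string; url: string } | { status: "cancelled" };

// null = user cancelled; otherwise the slug to publish under.
function chooseSlug(answer: string, suggestedSlug: string): string | null {
  if (answer === "") return null;
  return VALID_SLUG.test(answer) ? answer : suggestedSlug; // garbage falls back
}

async function publishWithConfirm(
  deps: PublishDeps,
  req: PublishRequest,
  executePublish: (slug: string) => Promise<string>, // resolves to the public URL
): Promise<PublishOutcome> {
  let slug: string | null = req.suggestedSlug; // no human: publish directly
  if (deps.askUser) {
    const answer = await deps.askUser({
      type: "publish_confirm",
      title: req.title,
      contentType: req.contentType,
      language: req.language,
      suggestedSlug: req.suggestedSlug,
      domain: deps.domain,
    });
    slug = chooseSlug(answer, req.suggestedSlug);
  }
  if (slug === null) return { status: "cancelled" };
  // Second line of defence, mirroring executePublish's own slug check.
  if (!VALID_SLUG.test(slug)) throw new Error(`invalid slug: ${slug}`);
  return { status: "published", slug, url: await executePublish(slug) };
}
```

Keeping `chooseSlug` pure makes the cancel/fallback rules testable without a UI or a network call.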
Desktop
- AskUserInline.tsx: new PublishConfirmMeta + isPublishConfirm
detector + PublishConfirmCard. Reuses .routine-confirm styles so
no new CSS needed; shows Title / Type / Domain / editable Slug /
live public URL preview. Confirm button is disabled when the slug
isn't a valid [a-zA-Z0-9_-]+ string.
- Card submits the slug on confirm, empty string on cancel, matching
what the publish tool's execute() expects.
- ChatInput.tsx: publish_confirm added to the specialized-card
allowlist so the composer yields to it (same behavior as
routine_create / routine_delete). Generic ask_user still leaves
the composer visible.
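On the desktop side, the detector and the Confirm-button gate reduce to a type guard plus a slug check. A minimal sketch: `isPublishConfirm` is named in the PR, but this body is an assumption, as are the `publicUrlPreview` and `cardAnswer` helpers and the `https://<domain>/<slug>` URL shape.

```typescript
const VALID_SLUG = /^[a-zA-Z0-9_-]+$/;

interface PublishConfirmMeta {
  type: "publish_confirm";
  title: string;
  contentType: string;
  language?: string;
  suggestedSlug: string;
  domain: string;
}

// Narrow untyped ask_user metadata to the publish card's shape.
function isPublishConfirm(metadata: unknown): metadata is PublishConfirmMeta {
  if (typeof metadata !== "object" || metadata === null) return false;
  const m = metadata as Record<string, unknown>;
  return (
    m.type === "publish_confirm" &&
    typeof m.title === "string" &&
    typeof m.contentType === "string" &&
    typeof m.suggestedSlug === "string" &&
    typeof m.domain === "string"
  );
}

// Live preview; null means the slug is invalid and Confirm stays disabled.
function publicUrlPreview(domain: string, slug: string): string | null {
  return VALID_SLUG.test(slug) ? `https://${domain}/${slug}` : null;
}

// What the card submits back to the tool: the slug on confirm, "" on cancel.
function cardAnswer(action: "confirm" | "cancel", slug: string): string {
  return action === "confirm" ? slug : "";
}
```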
Existing "Publish" button in the artifact panel is untouched — it
goes through `publish_artifact` on a different server channel and
was already user-initiated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SessionFilesBar: filter fallback `return true` leaked orphan artifacts
(anything with no conversationId) into every session's Files popover,
not just the active one. Tightened to strict match: artifacts with no
conversationId, or with a conversationId that doesn't match the
active session/conversation, are now excluded.
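The tightened filter can be sketched as a strict predicate with no permissive fallback. The `ArtifactRef` shape and `artifactsForActiveSession` name are hypothetical; the rule itself (exclude artifacts with no `conversationId`, or with one matching neither the active session nor the active conversation) is the one described above.

```typescript
// Hypothetical minimal artifact shape for the filter.
interface ArtifactRef {
  id: string;
  conversationId?: string;
}

// Strict match: orphans (no conversationId) are excluded rather than
// leaking into every session's Files popover.
function artifactsForActiveSession(
  artifacts: readonly ArtifactRef[],
  activeSessionId: string,
  activeConversationId?: string,
): ArtifactRef[] {
  return artifacts.filter(
    (a) =>
      !!a.conversationId &&
      (a.conversationId === activeSessionId || a.conversationId === activeConversationId),
  );
}
```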
ChatInput auto-answer: when a multi-question generic ask_user was
pending, typing into the composer and sending would populate every
question's answer with the same string — the model would see
`{ "Q1": "foo", "Q2": "foo", "Q3": "foo" }`. Restricted the auto-
answer shortcut to single-question ask_user only. Multi-question
prompts fall through to the normal send path so the AskUserInline
card (rendered elsewhere in the transcript) can collect each
answer separately.
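The single-question restriction can be expressed as a guard that returns either an answers payload or `null` to fall through to a normal send. The `PendingAskUser` shape and `composerAutoAnswer` name are hypothetical. Note that the revert commit further down this PR later removes this auto-answer branch entirely.

```typescript
interface PendingAskUser {
  questions: { question: string }[];
}

// Answers payload for a single-question prompt; null means "not handled here",
// so multi-question prompts go through the normal send path instead.
function composerAutoAnswer(
  pending: PendingAskUser | null,
  text: string,
): Record<string, string> | null {
  if (pending === null || pending.questions.length !== 1) return null;
  return { [pending.questions[0].question]: text };
}
```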
Both caught on review pass over PR #6; no runtime reports. Typecheck
+ harness tests still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
My earlier "stop ask_user from hiding the composer" commit removed the composer replacement for generic ask_user but never added an inline render anywhere else. As a result, a generic ask_user would set pendingAskUser on the client, show the "Ask User" tool label in the transcript, and then render nothing at all; the user couldn't see or answer the question.
RoutineChat now renders AskUserInline in its own .chat-shell__ask-user block between MessageList and the composer, gated on !isSpecializedCard (routine_create / routine_delete / publish_confirm still take over the composer via ChatInput, so rendering there too would show the card twice). The inline block reuses the transcript's 760px max-width so the card aligns with the message column. No changes to the specialized-card flow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverts the "keep composer visible while ask_user is pending" attempt from earlier in this PR. The new behavior users asked for is the original behavior: while an ask_user is pending, the card replaces the composer entirely, so the user can't type until they answer. This matches Claude Code's interactive prompts.
Undoes:
- RoutineChat inline chat-shell__ask-user render + import
- ChatInput's isSpecializedCard gate on the composer replacement; every ask_user now takes over the composer again
- ChatInput.handleSend's auto-answer-on-send branch, which is unreachable now that the composer is always replaced while pending
- .chat-shell__ask-user CSS rule
Kept from the earlier passes:
- AskUserInline one-question-at-a-time rendering (still the right UX improvement)
- ix--accent variant (style upgrade)
- Specialized routine/publish detection inside AskUserInline (those still show their dedicated cards; no change)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OmGuptaIND added a commit that referenced this pull request on Apr 22, 2026
### Other
- fix(harness+desktop): coalesce streaming text deltas, add detached turns, cross-surface invariant (#6)
- feat(settings): mark Claude CLI as coming soon, simplify provider form (#5)
- fix(harness+server): MCP shim hardening + SessionRegistry (#4)
- fix(caddy): preserve /health and /status paths upstream to sidecar (#3)
Summary
- CodexHarnessSession emits ~200 `text` events per reply; mirror was expanding each into its own `SessionHistoryEntry`, so users saw one word per message bubble. Coalesces on write (`synthesizeHarnessTurn`) and on read (`readHarnessHistory`) to heal legacy `messages.jsonl` files already on disk.
- `sessions.disconnectMode` config (default `attached`) + Settings toggle lets a turn keep running in the background when you close the tab. A wall-clock timer (`detachedTurnMaxMs`, default 10m) prevents runaway turns; reconnect clears all pending timers. Spec: `specs/features/DETACHED_TURNS.md`.
Test plan
- `pnpm --filter @anton/agent-core check:harness`: all 77 checks green (9 mirror + 3 round-trip + 1 legacy-mirror + 23 cross-surface + pre-existing suites)
- Mobile typecheck: pre-existing `RoutineStatusEvent` cast issue (reproduces on branch HEAD with my changes reverted; not caused by this diff)
Notes
- `'sessions'` added to the existing `ConfigQueryMessage`/`ConfigUpdateMessage` key union.
- `setDisconnectMode(mode, {fromServer: true})` flag prevents echo loops.
- … `ask_user` gate when detached, hard Stop button on reconnect, per-session mode override.
🤖 Generated with Claude Code