fix(harness+desktop): coalesce streaming text deltas, add detached turns, cross-surface invariant by OmGuptaIND · Pull Request #6 · billionzeros/computer

OmGuptaIND · 2026-04-21T20:40:16Z

Summary

- Fixes 200-bubble rendering bug: codex streams ~200 per-token text events per reply; the mirror was expanding each into its own SessionHistoryEntry, so users saw one word per message bubble. Coalesces on write (synthesizeHarnessTurn) and on read (readHarnessHistory) to heal legacy messages.jsonl files already on disk.
- Adds detached/attached disconnect mode: a new sessions.disconnectMode config (default attached) plus a Settings toggle lets a turn keep running in the background when you close the tab. A wall-clock timer (detachedTurnMaxMs, default 10m) prevents runaway turns; reconnect clears all pending timers. Spec: specs/features/DETACHED_TURNS.md.
- Cross-surface render invariant test: desktop live stream, webhook runner (Telegram/Slack), and mirror/history each have their own event-stream → text logic. They agreed only by coincidence. A new simulator-based test pins that all three surfaces produce identical text for the same event sequence (200-delta stress, 1000-delta stress, unicode surrogate-split, empty-deltas, text-after-tool_call). 23 checks.
- Pre-existing work on the branch (title MCP tool, harness/codex-harness-session hardening, prompt-layer tweaks, deploy configs) rides along.
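
The coalescing rule from the first summary item can be sketched as follows. This is a minimal illustration, not the code in mirror.ts: the `SessionEvent` and `Block` shapes and the `coalesceEvents` name are assumptions made for the example, and only the rule itself (merge consecutive text or thinking events, split at tool boundaries) comes from the description above.

```typescript
// Hypothetical, simplified shapes; the real SessionEvent and block types may differ.
type SessionEvent =
  | { type: 'text'; content: string }
  | { type: 'thinking'; content: string }
  | { type: 'tool_call'; name: string }

type Block =
  | { kind: 'text' | 'thinking'; text: string }
  | { kind: 'tool'; name: string }

// Merge each run of consecutive text (or thinking) events into one block.
// A tool call, or a switch between text and thinking, starts a new block.
function coalesceEvents(events: SessionEvent[]): Block[] {
  const blocks: Block[] = []
  for (const ev of events) {
    if (ev.type === 'tool_call') {
      blocks.push({ kind: 'tool', name: ev.name })
      continue
    }
    const last = blocks[blocks.length - 1]
    if (last !== undefined && last.kind !== 'tool' && last.kind === ev.type) {
      last.text += ev.content // same kind as the previous block: extend it
    } else {
      blocks.push({ kind: ev.type, text: ev.content })
    }
  }
  return blocks
}
```

Applying the same merge on read as well as on write is what lets files written before the fix render as one bubble without a migration step.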

Test plan

- `pnpm --filter @anton/agent-core check:harness`: all 77 checks green (9 mirror + 3 round-trip + 1 legacy-mirror + 23 cross-surface + pre-existing suites)
- Typechecks clean across protocol, agent-config, agent-core, agent-server, desktop, cli
- Known: mobile has a pre-existing RoutineStatusEvent cast issue (reproduces on branch HEAD with my changes reverted; not caused by this diff)
- Manual verification pending: open the existing 200-bubble session in desktop and confirm the history renders as one bubble (read-side coalesce heals the legacy mirror)
- Manual verification pending: toggle Settings → Behavior → "Keep running when I close the tab" on, start a turn, close the tab, reopen before 10 min; the turn should still be running. Leave closed past 10 min; the turn should auto-cancel.
- Manual verification pending: confirm Telegram/Slack replies still work (webhook runner untouched; covered by cross-surface test)
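
The cross-surface check referenced in the test plan can be pictured roughly like this. It is a sketch under stated assumptions: `renderLive`, `renderWebhook`, `renderMirror`, `repeatDelta`, and `surfacesAgree` are hypothetical stand-ins written for this example, not the three real code paths or the real test harness.

```typescript
// Hypothetical stand-ins for the three surfaces. The real code paths
// (desktop live stream, webhook runner, mirror/history) are not reproduced here.
function renderLive(deltas: string[]): string {
  let text = ''
  for (const d of deltas) text += d // append each delta as it arrives
  return text
}

function renderWebhook(deltas: string[]): string {
  return deltas.join('') // collect chunks, join once at the end
}

function renderMirror(deltas: string[]): string {
  // Coalesce adjacent text deltas into a single merged block.
  let merged: string | null = null
  for (const d of deltas) merged = merged === null ? d : merged + d
  return merged ?? ''
}

// Fixture helper: n copies of the same delta.
function repeatDelta(delta: string, n: number): string[] {
  const out: string[] = []
  for (let i = 0; i < n; i++) out.push(delta)
  return out
}

// The invariant: every surface yields exactly the expected text.
function surfacesAgree(deltas: string[], expected: string): boolean {
  const outputs = [renderLive(deltas), renderWebhook(deltas), renderMirror(deltas)]
  return outputs.every((o) => o === expected)
}
```

A fixture such as `['\uD83D', '\uDE00']` exercises the surrogate-split case: each delta is half of one emoji, and a surface that renders deltas individually would show two broken characters instead of one.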

Notes

- No new protocol types; 'sessions' added to existing ConfigQueryMessage / ConfigUpdateMessage key union.
- Desktop is optimistic-local + server-authoritative for the mode: the setDisconnectMode(mode, {fromServer:true}) flag prevents echo loops.
- Safeguards deferred to follow-up PRs: tool-call budget, destructive-tool ask_user gate when detached, hard Stop button on reconnect, per-session mode override.
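
To make the echo-loop note concrete, here is a minimal sketch of a setter that only forwards user-initiated changes. The `createModeStore` factory and its `send` callback are hypothetical; only the `setDisconnectMode(mode, {fromServer:true})` call shape is taken from the note above.

```typescript
type DisconnectMode = 'attached' | 'detached'

// Hypothetical store; `send` stands in for whatever pushes a config update to the server.
function createModeStore(send: (mode: DisconnectMode) => void) {
  let mode: DisconnectMode = 'attached'
  return {
    get(): DisconnectMode {
      return mode
    },
    setDisconnectMode(next: DisconnectMode, opts: { fromServer?: boolean } = {}): void {
      mode = next // optimistic local update
      // Only user-initiated changes are sent; values hydrated from the server are not echoed back.
      if (!opts.fromServer) send(next)
    },
  }
}
```

Without the flag, hydrating from the server would trigger a write back to the server, which would in turn trigger another hydration.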

🤖 Generated with Claude Code

…rns, cross-surface render invariant

Three related bodies of work: the streaming-deltas fix was the user-visible bug that triggered the rest, and the detached-turns feature made us audit the full disconnect flow.

Streaming text delta coalesce (mirror.ts)

- CodexHarnessSession emits one `{type:'text'}` SessionEvent per `item/agentMessage/delta` notification (~200 per reply). The mirror synthesizer pushed each as a separate TextBlock, then readHarnessHistory expanded every block into its own SessionHistoryEntry. Users saw 200 single-word message bubbles stacked in the transcript.
- synthesizeHarnessTurn now merges consecutive text/thinking events into one block on the write side, and readHarnessHistory coalesces adjacent same-type blocks on the read side as a safety net that heals legacy messages.jsonl files already on disk. Tool boundaries still split the run correctly; `!last.isThinking && !last.toolName` gates the coalesce on read.
- Adds 3 mirror checks (streamed text/thinking deltas coalesce, tool-boundary does not coalesce across) + 1 legacy-mirror read check.

Detached/attached disconnect mode (spec + config + server + UI)

- New `sessions.disconnectMode: 'attached' | 'detached'` config field with `detachedTurnMaxMs` (default 10 min) budget. Default is attached: current behavior, no surprise cost. Detached lets the turn finish in the background when the tab closes; a wall-clock timer on the server cancels runaway turns if the client never comes back. Reconnect clears all pending detached budgets.
- `'sessions'` added to ConfigQueryMessage/ConfigUpdateMessage key union so the desktop can query + toggle the mode via the existing config protocol.
- server.ts ws.on('close') branches on mode: detached skips cancel, leaves activeTurns populated, schedules a per-session timer via scheduleDetachedTurnBudget. clearDetachedTurnBudget fires in the processMessage finally block so natural turn completion cleans up. Timer is .unref()'d so shutdown isn't blocked.
- Desktop Settings → Behavior → Autonomy gains "Keep running when I close the tab" toggle. UIStore hydrates the value from server on every auth_ok; setter echoes to server unless the update came from server hydration (fromServer flag prevents ping-pong).
- Spec: specs/features/DETACHED_TURNS.md documents the mode contract, safeguards still TODO (tool-call budget, destructive-tool ask_user gate, per-session override), and the deferred structural event-type split.

Cross-surface render invariant (check.ts)

- Desktop chatHandler.appendText, webhook agent-runner chunks.join, and mirror readHarnessHistory each implement their own "event stream → assistant text" logic. They agreed today only by coincidence. The 200-bubble bug was a case where mirror diverged from the other two; we fixed mirror but nothing stopped the next adapter change from causing a fresh divergence.
- New test block simulates all three surfaces against shared fixtures (single text, 200-delta stress, 1000-delta stress, unicode surrogate-pair split across deltas, empty-deltas interleaved, text-after-tool_call) and asserts: every surface's final assistant text run matches `expectedFinalText`; for no-tool fixtures, all three surfaces produce identical full bubble arrays. 23 checks.
- Pointer comment at each of the three surface sites so the next dev touching them sees the invariant.

Also included on the branch

- set-session-title MCP tool + server wiring for harness titling
- codex-harness-session + harness-session hardening (pre-existing)
- tool-registry + prompt-layers + factories tweaks (pre-existing)
- deploy ansible + huddle cloud-init updates (pre-existing)

Tests: 77 green (7 fixture + 5 prompt-layer + 4 registry + 5 snapshot + 15 identity + 5 memory-guidelines + 9 mirror + 3 round-trip + 1 legacy-mirror + 1 replay-seed + 23 cross-surface).

Typechecks clean across protocol/agent-config/agent-core/agent-server/desktop/cli; mobile has a pre-existing RoutineStatusEvent cast issue that reproduces on branch HEAD, not caused by this diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
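
The detached budget described in this commit could look roughly like the sketch below. The function names `scheduleDetachedTurnBudget` and `clearDetachedTurnBudget` appear in the commit message, but their signatures, the `detachedBudgets` map, and the `cancelTurn` callback are assumptions made for illustration; the actual server.ts bookkeeping is not shown on this page.

```typescript
// Hypothetical bookkeeping: one pending wall-clock timer per detached session.
const detachedBudgets = new Map<string, ReturnType<typeof setTimeout>>()

function clearDetachedTurnBudget(sessionId: string): void {
  const timer = detachedBudgets.get(sessionId)
  if (timer !== undefined) {
    clearTimeout(timer)
    detachedBudgets.delete(sessionId)
  }
}

function scheduleDetachedTurnBudget(
  sessionId: string,
  maxMs: number,
  cancelTurn: (sessionId: string) => void, // assumed callback that aborts the running turn
): void {
  clearDetachedTurnBudget(sessionId) // keep at most one timer per session
  const timer = setTimeout(() => {
    detachedBudgets.delete(sessionId)
    cancelTurn(sessionId) // budget exhausted and the client never came back
  }, maxMs)
  // In Node.js, unref() stops a pending timer from keeping the process alive.
  const maybeUnref = timer as unknown as { unref?: () => void }
  maybeUnref.unref?.()
  detachedBudgets.set(sessionId, timer)
}
```

Calling the clear function both on reconnect and when a turn finishes naturally means the timer can only fire for a turn that is still running with no client attached.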

Every pending ask_user swapped the composer out for the AskUserInline card. That meant the user couldn't type anything else: the composer UI was just gone, which is disorienting when the questions are optional nudges rather than hard blockers.

- ChatInput.tsx now only takes over the composer for specialized cards (routine_create / routine_delete), where the explicit Confirm/Cancel buttons are the right UX. Generic ask_user leaves the composer visible.
- handleSend routes the user's typed text as the answer to every pending question when a generic ask_user is outstanding, so the server-side handler still unblocks cleanly. Single free-form string answer per question, which matches what the AskUserInline "Or write your own answer" path already does.
- Added pendingAskUser + onAskUserSubmit to handleSend's useCallback deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

AskUserInline rendered every unanswered question at once: a 3- or 4-question ask_user occupied the whole viewport with options and free-text inputs stacked vertically. Now only the first unanswered question is visible; answering it advances to the next, and the progress dots in the header already showed the right state.

- Replaced `questions.map(... if (isAnswered) return null)` with a single `findIndex`-based render. Same submit/autosubmit flow; no state changes needed.
- Nothing else touched: the progress counter, done-state, and specialized routine cards above are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
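
The `findIndex`-based selection mentioned in this commit can be sketched as a small pure function. The `Question` shape, the `answers` record keyed by question id, and the `currentQuestionIndex` name are assumptions for the example; the component's real props and state are not shown here.

```typescript
// Hypothetical shapes; the real AskUserInline props may differ.
interface Question {
  id: string
  prompt: string
}

// Index of the first question that has no recorded answer, or -1 when all are answered.
function currentQuestionIndex(questions: Question[], answers: Record<string, string>): number {
  return questions.findIndex((q) => !(q.id in answers))
}
```

Rendering only `questions[currentQuestionIndex(...)]` means that recording an answer is enough to advance to the next question; no separate "current step" state is needed.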

ConfirmDialog already uses ix--accent (the accent-card pattern from the shared .ix-* interaction shell). AskUserInline was on ix--bordered, which reads as a plain card rather than a prompt that needs attention. Matches the pattern the design system already establishes for agent interaction prompts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Ports the SessionFilesBar component from the design handoff (anton-computer/project/session-files.jsx, topbar variant) into the real app. Gives the task view a compact "Files" pill in the topbar next to Usage / More options that opens a popover listing every artifact produced in the current conversation.

- New SessionFilesBar.tsx under components/chat/. Subscribes to artifactStore, filters by the active conversation's sessionId so artifacts from other sessions don't leak in. Clicking a row calls setActiveArtifact + setArtifactPanelOpen, the same path the existing ActionsGroup uses.
- Component renders nothing when there are no artifacts, so the pill only shows up once there's something to list.
- Thumbnails port the design's SessionFileThumb: SVG via innerHTML, HTML via sandboxed iframe scaled to 0.25, code via first-6-lines <pre>, docs via first heading + placeholder lines.
- Extension labels map renderType + language to TS/TSX/MD/SVG/HTML etc. fmtAgoShort + artifactTitle helpers keep the meta row tidy.
- CSS appended to index.css. Tokens used (--bg-elev-1, --border, --border-strong, --text, --text-2/3/4, --accent, --accent-dim) already exist in Anton's theme, so no new variables needed.
- Injected into App.tsx's workspace-topbar__actions between the existing chat-only gate (activeView === 'chat' && hasMessages) and Usage/MoreHorizontal buttons. Placement matches the screenshot in the handoff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Anton's model can already call the `publish` tool to push content to a public URL. Previously there was no user-in-the-loop: the model's call published immediately, which is the wrong trust posture for a tool that produces a public artifact URL. This wires the tool through a new specialized PublishConfirmCard so the user always confirms and picks the final slug before anything goes live.

Server / tool

- buildPublishTool(deps) now takes { domain, askUser } instead of a bare domain string. When askUser is wired (all desktop contexts), execute() first fires an ask_user prompt with metadata.type = 'publish_confirm' carrying { title, contentType, language, suggestedSlug, domain }. The card's answer is the final slug string; empty string means cancel.
- Description updated: model should call publish with a suggested slug; do NOT pre-ask via ask_user, because this tool has its own gate.
- Fallback path preserved for non-desktop callers (evals/runner.ts has no human, no ask_user handler): publishes directly, same as before. Keeps eval scripts working.
- Slug validation: VALID_SLUG.test on the user's answer, fall back to the suggestedSlug if they type garbage. executePublish itself also throws on bad slugs, so double-protected.
- factories.ts passes ctx.onAskUser through, the same pattern as the existing routine-factory approval gate.

Desktop

- AskUserInline.tsx: new PublishConfirmMeta + isPublishConfirm detector + PublishConfirmCard. Reuses .routine-confirm styles so no new CSS needed; shows Title / Type / Domain / editable Slug / live public URL preview. Confirm button is disabled when the slug isn't a valid [a-zA-Z0-9_-]+ string.
- Card submits the slug on confirm, empty string on cancel, matching what the publish tool's execute() expects.
- ChatInput.tsx: publish_confirm added to the specialized-card allowlist so the composer yields to it (same behavior as routine_create / routine_delete). Generic ask_user still leaves the composer visible.

Existing "Publish" button in the artifact panel is untouched: it goes through `publish_artifact` on a different server channel and was already user-initiated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
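
The answer handling described for the publish gate (empty string cancels, an invalid slug falls back to the suggestion) can be sketched as below. `resolveSlug` and `SlugDecision` are hypothetical names, and anchoring the pattern with `^…$` is an assumption; only the `[a-zA-Z0-9_-]+` character rule and the cancel/fallback behavior come from the commit message.

```typescript
// Character rule taken from the commit message; the ^...$ anchoring is an assumption.
const VALID_SLUG = /^[a-zA-Z0-9_-]+$/

type SlugDecision = { action: 'cancel' } | { action: 'publish'; slug: string }

// Turn the confirm card's answer into a decision for the publish tool.
function resolveSlug(answer: string, suggestedSlug: string): SlugDecision {
  if (answer === '') return { action: 'cancel' } // empty string means the user cancelled
  // Fall back to the suggested slug when the typed answer is not a valid slug.
  const slug = VALID_SLUG.test(answer) ? answer : suggestedSlug
  return { action: 'publish', slug }
}
```

Since the card already disables Confirm for invalid slugs, the server-side fallback is a second line of defense rather than the main validation path.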

SessionFilesBar: filter fallback `return true` leaked orphan artifacts (anything with no conversationId) into every session's Files popover, not just the active one. Tightened to strict match: artifacts with no conversationId, or with a conversationId that doesn't match the active session/conversation, are now excluded.

ChatInput auto-answer: when a multi-question generic ask_user was pending, typing into the composer and sending would populate every question's answer with the same string: the model would see `{ "Q1": "foo", "Q2": "foo", "Q3": "foo" }`. Restricted the auto-answer shortcut to single-question ask_user only. Multi-question prompts fall through to the normal send path so the AskUserInline card (rendered elsewhere in the transcript) can collect each answer separately.

Both caught on review pass over PR #6; no runtime reports. Typecheck + harness tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
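
The strict-match filter from the SessionFilesBar fix can be illustrated with a short sketch. The `Artifact` shape, the single `activeId` parameter, and the `artifactsForSession` name are assumptions; the real component may compare against both a session id and a conversation id.

```typescript
// Hypothetical artifact shape; only conversationId matters for this example.
interface Artifact {
  id: string
  conversationId?: string
}

// Strict match: orphans (no conversationId) and artifacts from other sessions are excluded.
function artifactsForSession(artifacts: Artifact[], activeId: string): Artifact[] {
  return artifacts.filter((a) => a.conversationId !== undefined && a.conversationId === activeId)
}
```

The earlier behavior corresponds to returning `true` for artifacts without a `conversationId`, which is what let orphans appear in every session's popover.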

My earlier "stop ask_user from hiding the composer" commit removed the composer-replacement for generic ask_user but never added an inline render anywhere else, so a generic ask_user would set pendingAskUser on the client, fire "Ask User" tool-label in the transcript, and then render absolutely nothing. User couldn't see or answer the question.

RoutineChat now renders AskUserInline in its own .chat-shell__ask-user block between MessageList and the composer, gated on !isSpecializedCard (routine_create / routine_delete / publish_confirm still take over the composer via ChatInput; rendering there would double-show the card). The inline block reuses the transcript's 760px max-width so the card aligns with the message column.

No changes to the specialized-card flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reverts the "keep composer visible while ask_user is pending" attempt from earlier in this PR. The new behavior users asked for is the original behavior: while an ask_user is pending, the card replaces the composer entirely so the user can't type until they answer. Matches Claude Code's interactive prompts.

Undoes:

- RoutineChat inline chat-shell__ask-user render + import
- ChatInput's isSpecializedCard gate on the composer replacement: every ask_user now takes over the composer again
- ChatInput.handleSend's auto-answer-on-send branch, unreachable now that the composer is always replaced while pending
- .chat-shell__ask-user CSS rule

Kept from the earlier passes:

- AskUserInline one-question-at-a-time rendering (still the right UX improvement)
- ix--accent variant (style upgrade)
- Specialized routine/publish detection inside AskUserInline (those still show their dedicated cards, no change)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

### Other

- fix(harness+desktop): coalesce streaming text deltas, add detached turns, cross-surface invariant (#6)
- feat(settings): mark Claude CLI as coming soon, simplify provider form (#5)
- fix(harness+server): MCP shim hardening + SessionRegistry (#4)
- fix(caddy): preserve /health and /status paths upstream to sidecar (#3)

OmGuptaIND and others added 9 commits April 22, 2026 02:09

OmGuptaIND merged commit 33634c0 into main Apr 22, 2026

OmGuptaIND merged 9 commits into main from OmGuptaIND/harness-tool-debug
