fix(diagnose): realpath bin path + filter Next.js SSE noise by chorus-codes · Pull Request #4 · chorus-codes/chorus

chorus-codes · 2026-05-08T06:21:43Z

Summary

Two cosmetic improvements to chorus diagnose output, surfaced from running it against a real v0.8.27 install:

install mode: unknown for global npm installs — process.argv[1] returns the /usr/bin/chorus symlink, not the resolved node_modules/.../chorus.mjs. Wrap with realpathSync before classifying.
web.log was leaking Next.js 16 SSE pipe-close traces — looked scary in bug reports but is benign client-disconnect behaviour. filterBenignNoise drops the trace block (full and orphan-tail variants); footer counts what was filtered so nothing is silently dropped.

Test plan

pnpm typecheck clean
pnpm test 654/654 (5 new in tests/diagnose.test.ts)
Live chorus diagnose smoke — web.log now shows real Next.js startup messages without the trace blocks; "(N benign SSE-disconnect traces filtered)" footer added

Holding for the next batched npm publish.

Two cosmetic improvements to chorus diagnose output that emerged from running it against a real install (v0.8.27 on /usr/bin/chorus). (1) install mode showed "unknown" for global npm installs because process.argv[1] returns the symlink path (/usr/bin/chorus), not the resolved target inside node_modules. Wrap with realpathSync before classification — falls back to raw path on broken symlink or "(unknown)" sentinel. (2) web.log tail was leaking Next.js 16's SSE pipe-close trace into the bug report — `Error: failed to pipe response` followed by a UND_ERR_SOCKET cause block. Looks like a real error but it's expected behaviour when a browser tab closes mid-stream. filterBenignNoise() drops the multi-line trace block (matches full traces and orphan tails when the window starts mid-trace). Footer reports the filter count so nothing is silently dropped. Raw read window bumped 60 → 300 lines so most traces are fully captured before trimming back to 20 surfaced lines. Both helpers are pure and exported via _testing for direct assertions. 654/654 tests green; pnpm typecheck clean. Holding for publish — accumulating with other fixes for the next chorus-codes npm release.

* feat(cli): add diagnose command + crash-hook Bundles two upstream changes that ship a self-service triage path for chorus users hitting opaque failures: - `chorus diagnose` walks the install, daemon, recent failed chats, voice health, and produces a sharable bug report. - Crash hook captures uncaught exceptions in the CLI and writes them to a crash log alongside instructions to attach during a bug report. Folded back from upstream chorus-codes/chorus: 7ea712b feat: chorus diagnose command + crash hook for bug reports (#1) 4a5ea20 fix(diagnose): realpath bin path + filter Next.js SSE noise (#4) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(cli): add quickstart self-test command `chorus quickstart` runs a 30-second activation flow that verifies the daemon comes up, the SQLite DB initializes, and a minimal chat round-trips end-to-end. Aimed at first-run users who want to know "is this thing actually working" before authoring a template. Folded back from upstream chorus-codes/chorus: 56610cf feat(cli): chorus quickstart — 30-second activation self-test (chorus-codes#30) Co-Authored-By: chorus-codes <info@chorus.codes> * fix(cli): use dynamic import for open package (Node 22 ERR_REQUIRE_ESM) The `open` package and `chokidar` are both ESM-only as of recent versions. On Node 22 (the daily-driver target) static `require()` calls into them throw ERR_REQUIRE_ESM and crash the CLI at boot. Switch to dynamic import in: - src/cli/commands/start.ts (open browser after boot) - src/cli/open-browser.ts (new helper) - src/cli/index.ts (route open import) - src/daemon/output-watcher.ts (chokidar file watch) Includes upstream's post-merge hardening: the setTimeout that triggers the browser-open no longer wraps an async callback bare, so a missing default browser doesn't surface as an unhandled rejection. Folded back from upstream chorus-codes/chorus: e8ca2ee fix(cli): dynamic import for open package (chorus-codes#14) dcd1837 fix: post-merge hardening for chorus-codes#14 (start.ts portion only; cli-precheck.test.ts portion ships with the Keychain fix) Co-Authored-By: Julien Deudon <deudon.j@gmail.com> Co-Authored-By: chorus-codes <info@chorus.codes> * feat(cockpit): seed empty round-1 so QUEUED renders from t=0 Before: when a chat starts but no reviewer has produced an event yet, enrichRounds returned an empty rounds array and the live-run page showed nothing for several seconds — the user couldn't tell whether their chat had launched. After: seed a synthetic round-1 with QUEUED placeholders for every expected participant so the page renders the per-reviewer cards immediately. Real events overwrite placeholders as they arrive. Folded back from upstream chorus-codes/chorus: 53e8fb6 feat(cockpit): seed empty round-1 so QUEUED placeholders render from t=0 (#2) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(daemon): runtime fallback-collision dedup across reviewer slots When two reviewer slots both fall through their per-slot chains to the same template-level fallback target (common case: every slot ends in anthropic/claude-sonnet-4-6), both used to dispatch the same (lineage, model) in parallel — wasted cost and the lineage diversity that's the point of multi-LLM peer review collapsed. Build-time dedup (template-fallback.ts) couldn't catch it because each slot only knows about other slots' PRIMARIES, not their fallback chains. Fix: new per-chat/per-round (lineage, model) registry. reviewer-driver tryClaim's before each chain attempt and releases in a finally. On collision, return null + emit cli_warning(reason='fallback_collision') so runWithChainFallback advances to the next entry and the cockpit can show why the slot skipped. Ported into fork's reviewer-driver.ts surgically so the verdict-isolation refactor (2a2cde2) and per-slot repoPath threading stay intact. Folded back from upstream chorus-codes/chorus: c4751fe feat(daemon): runtime fallback-collision dedup (#3) Co-Authored-By: chorus-codes <info@chorus.codes> * fix(daemon): write REVIEWER FAILED summary on pre-spawn failure Before: when a reviewer's precheck fails (e.g. underlying CLI not installed) or the chat is cancelled while the slot is queued for a CLI semaphore slot, runReviewer used to return null silently — leaving NO on-disk participant directory. The cockpit's enrich-rounds loop then couldn't reconcile the synthesised template slot against any real participant, so the card sat at "Queued — waiting for an open slot." forever and the actual error was invisible. Reproduction: install chorus on a host with only one CLI on PATH (e.g. just claude-code), open a template that includes lineages requiring codex/gemini/kimi, fire it. Every reviewer card stayed "Queued" — chat never visibly progressed even though it was already done failing. Fix: - Create the reviewer dir BEFORE the precheck runs. - Add a writePreSpawnFailure helper that writes a `## REVIEWER FAILED` summary in the canonical format (Kind / Lineage / Model / message) that the cockpit's `parseFailureSummary` already understands. - Wire it into the precheck-failed and cancelled-while-queued paths. Card now transitions out of pending and shows the actual error (cli_missing, cancelled, ...). Folded back from upstream chorus-codes/chorus: afc59cc fix(daemon): REVIEWER FAILED summary on pre-spawn failure (chorus-codes#26) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(voices): auto-disable on persistent quota_exhausted + lsof timeout Real pain (upstream chorus-codes#11): a Pro Gemini model on a Flash-only account fails every chorus run with "exhausted your capacity on this model" — but Gemini doesn't return a resetAt because the model isn't going to become available for that account. Without auto-disable, the runner keeps picking the dead voice on every chat and the user keeps seeing the same opaque error. Voice auto-disable: - New src/lib/voice-failure-tracker.ts records per-voice consecutive quota_exhausted strikes in a settings counter. - Trigger: 2 consecutive strikes WITH no resetAt → set voices.enabled=false + disabled_reason='auto_quota'. - Counter resets on participant_done success; rate-limit strikes (hasResetAt=true) bypass the counter entirely so a transient 429 + a later permanent failure can't trip the threshold on the first permanent strike. - Wired into reviewer-driver alongside recordHealth; emits a cli_warning(reason='voice_auto_disabled') so the cockpit can show a one-line explanation. - VoiceDisabledReason union gains 'auto_quota' (schema column was already TEXT — no migration). Lsof timeout (upstream chorus-codes#12): - findPidsOnPort and findPidsOnPortWithSudo now bound execSync / execFileSync to 3s, so a slow-but-functional lsof on a loaded macOS box doesn't hang chorus boot. 3s leaves headroom while still bounding the hang case. Ported into fork's reviewer-driver.ts tmux pollHandle + success path. voices.ts disabled_reason union extended alongside fork's voice-tier column. Folded back from upstream chorus-codes/chorus: 4f6becc v0.8.30 — voice auto-disable (chorus-codes#11) + lsof timeout (chorus-codes#12) (chorus-codes#17) Co-Authored-By: chorus-codes <info@chorus.codes> Co-Authored-By: Lumina Mao <luminamao@mac.lan> * fix(daemon, schema): codex isolation + template-schema validation Two issues caused chats to fail opaquely at run-start: CODEX ISOLATION (chorus-codes#10, chorus-codes#16) The user's ~/.codex/config.toml may declare MCP servers, plugins, or notification hooks. In headless `codex exec` those integrations have caused codex to hang or cancel mid-call — two independent reproductions: codex as our reviewer (chorus-codes#10) and codex as MCP client of chorus (chorus-codes#16). Add --ignore-user-config to every headless codex argv. Extracted to a pure `buildHeadlessArgs(opts)` so the argv shape is unit-testable. TEMPLATE VALIDATION (chorus-codes#15) `reviewer.require > candidates.length` used to surface as "Job moves immediately to failure upon Start press" — the runner queued, failed to grant enough slots, and emitted an opaque chat-failure. Same for `require > distinct lineages` when crossLineage:true. Both now caught at TemplateSchema.parse() time with a clear error message the user can fix before the run starts. ReviewerSchema.superRefine() additions slot in cleanly alongside the fork's audit/orchestrate phase schema work — both are additive constraints on the same ReviewerSchema object. Folded back from upstream chorus-codes/chorus: 8ed970b fix(daemon, schema): codex isolation + template validation Co-Authored-By: chorus-codes <info@chorus.codes> * fix(runner): honour iterate.onDisagreement accept-doer/escalate The template schema, cockpit dialog, and SPEC-D-templates have always exposed three values for iterate.onDisagreement — 'continue', 'escalate', 'accept-doer' — but the runner only honoured 'continue'. Picking the other two from the cockpit form was a silent no-op: chats fell through to phase_failed with 'doer_failed_all_rounds' regardless. This wires both new branches into the round loop and the terminal chat_done emission: - 'accept-doer': after maxRounds without consensus, mark doerSucceeded and continue. The chat carries on (subsequent phases, ship, approval) as if reviewers had agreed on the doer's last answer. - 'escalate': halt with status='failed' but verdict='request_changes' and error='escalated_on_disagreement', so cockpits can render "reviewers disagreed, needs human" distinctly from "doer broke." Policy table extracted into a pure decidePhaseOutcome() helper so the 3 × 2 input matrix (policy × disagreement-in-last-round) is unit-tested without standing up the full runChat scaffold. Gated on disagreementInLastRound (reset at top of every round + on doer-crash path) so a partial / empty doer answer can never be silently "accept-doer"'d as final. Preserves the fork's existing standardPhaseRoundsExhausted #7 surfacing for the 'continue' path; the 'escalate' path takes precedence with its own distinct chat_done. Upstream PRs chorus-codes#49, chorus-codes#50 (commit 67572e9). Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(cli-precheck): cover macOS Keychain fallback for Claude Code v2 The fork already implements the Keychain fallback in cli-precheck (hasDarwinKeychainEntry). This adds the missing test coverage: - passes when no cred file but keychain entry exists - blocks when no cred file and no keychain entry - skips keychain check when cred file exists (fast-path preserved) - does not consult keychain for non-anthropic lineages vi.mock('node:child_process') uses the importOriginal spread pattern so spawn / exec / etc. keep their real implementations — a bare module replacement would silently break any sibling test that imports from child_process. Upstream PRs #7, #8, plus the dcd1837 test-mock hardening. Co-Authored-By: Yura <yurahalych@gmail.com> Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cockpit): derive candidatesWithModels from snapshot's candidates field Daemon-side TemplateSchema only carries `candidates` on each ReviewerRule. The cockpit Template type expects `candidatesWithModels` populated — enrich-rounds iterates that field to build slot→model mappings for run-page cards. When fromRow parsed template_snapshot and cast it to Template, the cast was a TypeScript lie: at runtime the parsed object lacked candidatesWithModels, enrichRounds iterated zero reviewer slots, and no model name reached the cards (badge appeared empty). Derive candidatesWithModels at the parse seam (chats.fromRow) so the cockpit's Template contract is honoured regardless of which path produced the data. Idempotent — if a future daemon ever serialises the field directly, that wins. Persona forwarded if present. Audit- phase single-voice reviewers (no candidates array) are skipped via a runtime narrow. Upstream PR #6 (chorus-codes/chorus@ac0c7fd). Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(diagnose): capture failure context — CLI smoke, voice health, recent failed chats Extends `chorus diagnose` with three signals that triage the most common breakage modes: - **CLI smoke**: spawn `<bin> --version` per detected CLI with a hard 2s SIGKILL timeout (wrapper scripts may trap SIGTERM). Distinguishes `timedOut` from non-zero exit so the report can tell hangs apart from crashes. - **Voice health**: counts `enabled=0` voices grouped by `disabled_reason` ('user' vs 'auto_missing' vs 'quota_exhausted'). Added `idx_voices_enabled` so the `WHERE enabled = 0` scan stays cheap as the table grows. - **Recent failed chats**: last 5 chats with `status='blocked'` plus the errored participants pulled from `~/.chorus/chats/<id>/round-*/<part>/_attempts.jsonl`. Only `errorMessageBytes` is exposed — raw error text never leaves the user's machine. `$HOME` is redacted from any embedded path strings via `redactHomePaths`. Adapted from upstream chorus-codes#19 (0666dca). Preserves the fork's existing diagnose shape and adds tests for smokeOneCli / readLatestAttempt / formatReport rendering of the three new sections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(diagnose): include no_review in recent failed chats query The recent-failed-chats section was meant to surface per-participant failure context from `_attempts.jsonl`, but the WHERE clause only covered 'failed', 'blocked', 'cancelled'. The most common failure shape — every reviewer down for missing CLI / auth / quota — ends the chat in 'no_review', which was being silently filtered out. So the exact case the section exists to diagnose returned an empty list, forcing users back into manual log collection. Adds 'no_review' to the IN-list and a regression test that asserts both the status and a quota_exhausted errorKind render in the report. Addresses chatgpt-codex review P2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: chorus-codes <info@chorus.codes> Co-authored-by: Julien Deudon <deudon.j@gmail.com> Co-authored-by: Lumina Mao <luminamao@mac.lan> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Yura <yurahalych@gmail.com>

All six reviewers requested changes. The two HIGH findings were real correctness bugs that would have allowed the exact failure mode the gate was built to prevent. Addressed: HIGH (4-6/6 convergent): - swap=0 sentinel ambiguity. Prior revision used 0 as "platform reports no swap data". On Linux with SwapFree genuinely 0 kB (incident-2026-05-20 case!) the gate would admit chats at the moment of maximum vulnerability. Fixed: readSwapFreeMb returns -1 for "no data"; evaluateAdmission treats >=0 as a real number, including 0 as "swap exhausted, block". - tryGrantHead error path. If getChatConcurrency / readResourceStats throws, prior catch just logged. Queue would hang until an external event (release / abort / new admit) re-poked the gate. Fixed: schedule a recheck timer from the catch so the gate retries in 30s. - pokeGate export + PUT-route hook. Settings change (e.g. raise maxConcurrentChats 3 → 5) had no path to re-admit queued waiters until an active chat ended. Fixed: PUT /settings/chat-concurrency calls pokeGate() after persisting, immediately admitting queued chats under the new (loosened) cap. MEDIUM: - Double-settle protection on Waiter. settled flag prevents abort signal firing AFTER tryGrantHead resolved from rejecting an already- settled promise. Belt-and-suspenders against future Promise impl changes. - AdmitDenyReason 'unknown' fallback eliminated. decision.reason is always populated when admit=false; the prior `?? 'unknown'` would emit a string not in the type union. Now falls back to 'chats_at_cap' (the only invariant-protected default). - Lucide Minus/Plus icons replace Unicode -/+ per project rule #25. - PUT response merge preserves defaults block — UI hint lines no longer vanish after first save (until next 5s poll re-loads). New tests (+3): - evaluateAdmission with cpuCount=0 — divide-by-zero guard works - evaluateAdmission with swap=-1 (sentinel) → admit - evaluateAdmission with swap=0 (exhausted) → BLOCK (the incident case) - pokeGate-after-setChatConcurrency admits queued chats immediately Reviewer attribution (6 reviewers, all request_changes): - codex-cli-0: tryGrantHead error path, UI defaults overwrite, test gaps - gemini-cli-1: swap=0 sentinel (high), getChatConcurrency throw stall (high), settings-change doesn't wake queue (medium) - opencode-cli-2: TOCTOU race in admitChat signal-aborted check (analyzed — actually safe because addEventListener runs same tick), swap=0 sentinel, missing resource-block integration tests - opencode-cli-4: cpuCount=0 untested, recheck timer untested, double-reject risk (settled flag), Unicode icons vs rule #25 - opencode-cli-5: orphaned waiter on tryGrantHead exception, reason 'unknown' not in union, missing tests for reentrant scenarios - opencode-cli-6: tryGrantHead silent error strand (HIGH — same as codex/gemini), settings change doesn't poke queue (medium) Not changed (deliberate): - Snapshot stats once per loop iteration (cli-6 #4): sub-ms cost, current loop invariant (no awaits between reads/decisions) keeps the values consistent. Worth noting but not a real bug. - Circuit breaker for consecutive tryGrantHead failures (cli-2 #5): acceptable retry loop for now; will add if real-world disk blips cause notable log spam. Tests: 910/911 green (was 907; +3 new). Typecheck + build clean.

… aware (#64) * feat(daemon): chat admission gate — caps concurrent chats + swap/load aware Caused by the 2026-05-20 incident: three concurrent chats × 8 reviewers each (24 LLM subprocesses) overwhelmed the 31 GB dev box. Load avg hit 320 (15-min), swap exhausted (24 MB free of 8.5 GB), system became unresponsive. Per-CLI semaphore (cli-semaphore.ts) caps subprocesses within each binary family but doesn't prevent N chats × M reviewers from oversubscribing the host. New higher-level gate at runner-entry — every `runChat` call awaits `admitChat(...)` before fanning out. Three constraints, all daemon- wide, all configurable in /settings: - maxConcurrentChats (1..20, default 3): hard cap on chats actively fanning out reviewers. Whichever chat hits the gate fourth queues. - swapMinFreeMb (0..16384, default 1024): refuse new admissions when free swap drops below threshold. 0 disables (CI / macOS without /proc/meminfo / containers). - loadAvgMaxPerCore (0..10, default 4.0): refuse when 1-min load avg / CPU count exceeds. Per-core multiplier so a 4-core at 16 looks identical to a 16-core at 64. 0 disables. Gated chats stay in the DB as status='drafting'. The runner emits a new SSE event `chat_queued` with the binding reason + 1-indexed queue position so the cockpit can render "Waiting for slot — N chats ahead" without polling. Architecture mirrors cli-semaphore.ts: FIFO queue, mutex+dirty pattern to prevent reentrant double-grant under release races, settings re- read per attempt so changes apply on the next admission (no daemon restart). Resource recheck timer (30s) fires while the head is blocked on swap/load — chat_end events drain chats_at_cap; periodic recheck drains the resource-pressure cases. Files: src/daemon/chat-gate.ts — gate + evaluateAdmission (pure) + admitChat (async, FIFO, signal) + snapshot for diagnostics src/daemon/resource-stats.ts — readResourceStats: swap from /proc/meminfo, loadavg from os src/lib/settings/chat-concurrency.ts — Zod schema + get/setChatConcurrency src/daemon/runner.ts — await admitChat at top, release in finally; cancellation handled src/daemon/runner/types.ts — new 'chat_queued' RunnerEvent type src/daemon/routes/settings.ts — GET/PUT /settings/chat-concurrency response includes live snapshot (active/queued/swap/load) so the UI can show "currently 2/3 chats…" src/lib/api/settings.ts — ChatConcurrencySettings + client src/app/settings/chat-concurrency-section.tsx — UI section with three stepper inputs + live snapshot block (5s auto-refresh) src/app/settings/form-view.tsx — wire in below the existing ConcurrencySection tests/chat-gate.test.ts — 15 cases: pure evaluateAdmission matrix (constraint priority, per-core scaling, disable knobs) + FIFO ordering + cancellation + release idempotence Tests: 907/908 green (was 892 baseline; +15 new). Typecheck + build clean. The 2026-05-20 incident pattern (3 chats × 8 reviewers) will now queue the third chat at the chats_at_cap gate (default 3). Users on tight boxes can tighten to 1-2; users on big boxes can loosen to 10+ and rely on the swap+load guardrails instead. * fix(chat-gate): close 6/6 convergent self-review findings on PR #64 All six reviewers requested changes. The two HIGH findings were real correctness bugs that would have allowed the exact failure mode the gate was built to prevent. Addressed: HIGH (4-6/6 convergent): - swap=0 sentinel ambiguity. Prior revision used 0 as "platform reports no swap data". On Linux with SwapFree genuinely 0 kB (incident-2026-05-20 case!) the gate would admit chats at the moment of maximum vulnerability. Fixed: readSwapFreeMb returns -1 for "no data"; evaluateAdmission treats >=0 as a real number, including 0 as "swap exhausted, block". - tryGrantHead error path. If getChatConcurrency / readResourceStats throws, prior catch just logged. Queue would hang until an external event (release / abort / new admit) re-poked the gate. Fixed: schedule a recheck timer from the catch so the gate retries in 30s. - pokeGate export + PUT-route hook. Settings change (e.g. raise maxConcurrentChats 3 → 5) had no path to re-admit queued waiters until an active chat ended. Fixed: PUT /settings/chat-concurrency calls pokeGate() after persisting, immediately admitting queued chats under the new (loosened) cap. MEDIUM: - Double-settle protection on Waiter. settled flag prevents abort signal firing AFTER tryGrantHead resolved from rejecting an already- settled promise. Belt-and-suspenders against future Promise impl changes. - AdmitDenyReason 'unknown' fallback eliminated. decision.reason is always populated when admit=false; the prior `?? 'unknown'` would emit a string not in the type union. Now falls back to 'chats_at_cap' (the only invariant-protected default). - Lucide Minus/Plus icons replace Unicode -/+ per project rule #25. - PUT response merge preserves defaults block — UI hint lines no longer vanish after first save (until next 5s poll re-loads). New tests (+3): - evaluateAdmission with cpuCount=0 — divide-by-zero guard works - evaluateAdmission with swap=-1 (sentinel) → admit - evaluateAdmission with swap=0 (exhausted) → BLOCK (the incident case) - pokeGate-after-setChatConcurrency admits queued chats immediately Reviewer attribution (6 reviewers, all request_changes): - codex-cli-0: tryGrantHead error path, UI defaults overwrite, test gaps - gemini-cli-1: swap=0 sentinel (high), getChatConcurrency throw stall (high), settings-change doesn't wake queue (medium) - opencode-cli-2: TOCTOU race in admitChat signal-aborted check (analyzed — actually safe because addEventListener runs same tick), swap=0 sentinel, missing resource-block integration tests - opencode-cli-4: cpuCount=0 untested, recheck timer untested, double-reject risk (settled flag), Unicode icons vs rule #25 - opencode-cli-5: orphaned waiter on tryGrantHead exception, reason 'unknown' not in union, missing tests for reentrant scenarios - opencode-cli-6: tryGrantHead silent error strand (HIGH — same as codex/gemini), settings change doesn't poke queue (medium) Not changed (deliberate): - Snapshot stats once per loop iteration (cli-6 #4): sub-ms cost, current loop invariant (no awaits between reads/decisions) keeps the values consistent. Worth noting but not a real bug. - Circuit breaker for consecutive tryGrantHead failures (cli-2 #5): acceptable retry loop for now; will add if real-world disk blips cause notable log spam. Tests: 910/911 green (was 907; +3 new). Typecheck + build clean. --------- Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

PR #75 audit findings: - HIGH (opencode-cli-2 #1, introduced by this PR): `usePathname()` can return `null` during the brief client-render window before the App Router has hydrated. The `pathname.startsWith(...)` in AppShellRouter would then crash the entire shell with a TypeError on first paint. Coalesce to "" — falls through to the wrapped (sidebar-present) branch by default. - MEDIUM drive-by (opencode-cli-2 #4): the `err instanceof DaemonError ? "error" : "error"` ternary in app-sidebar.tsx had both branches equal — dead code that looked unfinished. Collapse to a single set, drop the now-unused DaemonError import. Other PR #75 audit findings deferred to followup: - Slot matching defaults non-conforming names to index 0 (pre-existing) - Concurrent fetchChats race (pre-existing) - Redundant initial fetch / poll race (pre-existing perf) Tests: 942 pass.

…75) * fix(ux): persist sidebar across navigation + suppress doer-artifact phantom card Two Victor reports from the v0.8.50 cockpit: 1. Sidebar shows "Loading…" for several seconds after clicking any past run. Root cause: every page wraps itself in `<AppShell>` — so each route navigation unmounts the entire shell (sidebar included) and mounts a fresh one. `useState("loading")` re-initialises, the SSE subscription tears down, and the chat list refetches from scratch. Fix: move AppShell into the root layout via a tiny client-side router (`AppShellRouter`) that opts out for /onboarding (the only standalone flow). Strip the per-page `<AppShell>` wrappers. Sidebar is now a single long-lived instance — `chatsState` survives, chats stay visible, SSE stays connected across nav. Also dropped the unnecessary `[pathname]` dep on the sidebar's chats-fetch effect. Active-route highlighting reads `pathname` during render, not from the effect. The dep was forcing a full SSE teardown + rebuild on every nav even after the AppShell fix above kept the component instance alive. 2. Transient "extra card" — a DONE card showing the audit prompt body appears alongside the reviewer grid, then disappears a few seconds later. Root cause: review-only chats create a synthetic `doer-artifact` dir whose answer.md holds the input artifact (the audit prompt) with a `## DONE` sentinel. The artifacts route surfaces it as a participant; enrich-rounds' leftover loop appends it as a card because no doer slot exists in review-only templates. Fix: in review-only chats, suppress role='doer' participants from the cards grid. The artifact IS the prompt being reviewed — not a participant. Reviewer leftovers (a dir whose lineage isn't in the template's candidate list) still get a defensive card so users see "something ran here". Tests: +2 enrich-rounds cases (filter doer-artifact, preserve reviewer leftover). 938 pass. * fix(cockpit): guard against usePathname() null + drop dead error ternary PR #75 audit findings: - HIGH (opencode-cli-2 #1, introduced by this PR): `usePathname()` can return `null` during the brief client-render window before the App Router has hydrated. The `pathname.startsWith(...)` in AppShellRouter would then crash the entire shell with a TypeError on first paint. Coalesce to "" — falls through to the wrapped (sidebar-present) branch by default. - MEDIUM drive-by (opencode-cli-2 #4): the `err instanceof DaemonError ? "error" : "error"` ternary in app-sidebar.tsx had both branches equal — dead code that looked unfinished. Collapse to a single set, drop the now-unused DaemonError import. Other PR #75 audit findings deferred to followup: - Slot matching defaults non-conforming names to index 0 (pre-existing) - Concurrent fetchChats race (pre-existing) - Redundant initial fetch / poll race (pre-existing perf) Tests: 942 pass. --------- Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

Pickup from chorus audit of PR #77 (antigravity-cli-8 finding #4): when pickShimForVoice for a cross-lineage fallback target throws AFTER tryClaimFallbackTarget succeeds but BEFORE the try block, `threw` never flips and releaseFallbackClaim never fires. The target stays claimed for the rest of the round, blocking every other slot that might have reached it via a fresh shim. Move the entryShim resolution inside the try block so any resolution error releases the claim along the existing throw path.

) * fix(runner): hold fallback claim regardless of attempt outcome Three reviewer cards on chat=019E45413E126AFCD83146524A22BFC4 all swapped to anthropic/claude-sonnet-4-6 in sequence — codex (auth race with VSCode), gemini (quota), opencode/qwen3.6-plus (variant of fast- fail). Diversity collapse, the whole point of multi-LLM peer review defeated. Cause: when slot A's claude attempt returned null, the finally block released the claim. Slot B arrived 36s later, claimed claude itself, also failed, released. Slot C 43s later, same story. The runtime registry was working but the policy was wrong — releasing on null meant a persistently-broken fallback got tried by every slot in sequence, three reviewers all running the same model. New policy: claim is sticky for the round on ANY result that isn't an exception. First slot to reach a shared fallback target wins it. Subsequent slots see false from tryClaim and advance to the NEXT chain entry (fallback_collision warning on those cards, no duplicate run). If the first slot's attempt fails, the other slots simply have no backup — that's the honest signal that the template's one shared fallback didn't cover all primary failures. Fix at the template layer by configuring multiple fallbacks (one per likely-failing lineage). Throw still releases — an exception means something went wrong outside the model call (shim resolution, fs write, etc.) and other slots should get a chance to try with a fresh shim. Tests: +3 cases documenting the policy in fallback-registry tests. 945 pass. * fix(runner): release claim if shim resolution throws after tryClaim Pickup from chorus audit of PR #77 (antigravity-cli-8 finding #4): when pickShimForVoice for a cross-lineage fallback target throws AFTER tryClaimFallbackTarget succeeds but BEFORE the try block, `threw` never flips and releaseFallbackClaim never fires. The target stays claimed for the rest of the round, blocking every other slot that might have reached it via a fresh shim. Move the entryShim resolution inside the try block so any resolution error releases the claim along the existing throw path. --------- Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

chorus-codes merged commit 4a5ea20 into main May 8, 2026
2 checks passed

chorus-codes deleted the fix/diagnose-polish branch May 8, 2026 06:21

ElRaxy mentioned this pull request May 9, 2026

fix(daemon): persist reviewer phase_events for successful reviewers #24

Closed

chorus-codes mentioned this pull request May 20, 2026

feat(daemon): chat admission gate — caps concurrent chats + swap/load aware #64

Merged

4 tasks

chorus-codes mentioned this pull request May 21, 2026

fix: audit followup bundle — runOne crash, doer claim, tmux abort, proc-group kill, slot-idx, fetchChats race #83

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(diagnose): realpath bin path + filter Next.js SSE noise#4

fix(diagnose): realpath bin path + filter Next.js SSE noise#4
chorus-codes merged 1 commit into
mainfrom
fix/diagnose-polish

chorus-codes commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

chorus-codes commented May 8, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant