
feat: fold upstream T1+T2 fixes back into fork (12 commits) #2

Merged
crypticpy merged 13 commits into feat/pr-review-and-hardening from feat/upstream-fold-back
May 17, 2026

@crypticpy (Owner) commented May 17, 2026

Summary

PR 1 of 3 in the upstream fold-back series. Picks up all Tier 1 (5 free-pickup commits) and Tier 2 (7 high-value bug fixes) commits from upstream chorus-codes/chorus, adapted to fork conventions and preserving fork-specific behavior (audit-phase reviewer shape, the fork's standardPhaseRoundsExhausted promotion, the fork's per-vendor agent shims).

All commits are authored as fork-native edits, with Co-Authored-By trailers crediting the upstream authors (and Claude Opus 4.7 on the later commits). Windows-specific hunks are skipped (out of scope for the fork), and the contributor stack is deferred to PR 3.

Commits

Tier 1 — free pickup (5)

  • 9fd258a feat(cli): add chorus diagnose command + crash-hook
  • 33e7d57 feat(cli): add chorus quickstart self-test command
  • d8a4ef4 fix(cli): dynamic import for open package (Node 22 ERR_REQUIRE_ESM)
  • 0d68fe3 feat(cockpit): seed empty round-1 so QUEUED renders from t=0
  • f595102 feat(daemon): runtime fallback-collision dedup across reviewer slots

Tier 2 — bug fixes (7)

  • f96a4f9 fix(daemon): write REVIEWER FAILED summary on pre-spawn failure
  • 357dc0d feat(voices): auto-disable on persistent quota_exhausted + lsof timeout
  • 865de94 fix(daemon, schema): codex isolation + template-schema validation
  • 15013f0 fix(runner): honour iterate.onDisagreement accept-doer/escalate
  • 4dd03ee test(cli-precheck): cover macOS Keychain fallback for Claude Code v2
  • e403e39 fix(cockpit): derive candidatesWithModels from snapshot's candidates field
  • 5f770d8 feat(diagnose): capture failure context — CLI smoke, voice health, recent failed chats

Fork-specific adaptations

  • Runner: decidePhaseOutcome pure helper (6-case matrix: 3 policies × 2 disagreement-gate states). Fork's standardPhaseRoundsExhausted promotion preserved alongside new escalated_on_disagreement branch.
  • Cockpit candidatesWithModels: runtime narrowing skips audit-phase single-voice reviewers (fork's audit phase has a different reviewer shape than upstream).
  • Diagnose: privacy-preserving — only errorMessageBytes exposed for recent failed chats; $HOME redacted via redactHomePaths; hard 2s SIGKILL timeout for CLI smoke (wrappers may trap SIGTERM); timedOut distinct from non-zero exit.
  • Schema: new idx_voices_enabled index for WHERE enabled = 0 voice-health scan.
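A minimal sketch of the `$HOME` redaction step, assuming a `redactHomePaths(text, home)` signature and `~` as the replacement token (neither is confirmed by this PR). The lookahead only redacts the home path when it is followed by a separator or the end of the string, so a sibling directory such as `/home/alice2` is left alone.

```typescript
import * as os from "node:os";

// Escape regex metacharacters so the home path is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch: replace every occurrence of the home directory with "~".
function redactHomePaths(text: string, home: string = os.homedir()): string {
  // A degenerate home such as "" or "/" would rewrite almost every path.
  if (home.length <= 1) return text;
  // Only match when the home path is not followed by another name character.
  const re = new RegExp(escapeRegExp(home) + "(?=$|[^\\w.-])", "g");
  return text.replace(re, "~");
}

console.log(redactHomePaths("db: /home/alice/.chorus/chorus.db", "/home/alice"));
// → db: ~/.chorus/chorus.db
```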

Test plan

  • pnpm typecheck clean
  • pnpm test — 821 tests passing (62 files)
  • Each commit cherry-picked and built incrementally; no commit left the tree in a broken state
  • Smoke chorus diagnose on a real ~/.chorus after merge
  • Smoke chorus quickstart on a clean machine

🤖 Generated with Claude Code

Summary by Sourcery

Integrate upstream CLI, daemon, cockpit, and schema improvements into the fork, adding diagnostic and quickstart workflows while tightening reviewer orchestration, voice health handling, and template/CLI robustness.

New Features:

  • Add a chorus diagnose command that prints a redacted diagnostic bundle for bug reports, including daemon state, DB health, logs, crash previews, CLI detection, voice health, and recent failed chats.
  • Add a chorus quickstart command that fires a short self-test review against the first detected CLI and streams the result, seeding a private quickstart template on demand.
  • Introduce a crash hook for the CLI entrypoint to capture uncaught errors into ~/.chorus/crashes and guide users to file issues or run diagnostics.
  • Add per-voice failure tracking that can auto-disable voices on repeated quota_exhausted failures without a reset window, and surface auto-disabled voices in diagnostics.
  • Add a runtime fallback-collision registry so reviewer slots avoid running the same fallback (lineage, model) in parallel, preserving lineage diversity and reducing wasted cost.

Bug Fixes:

  • Ensure reviewer precheck failures write a REVIEWER FAILED summary on disk so cockpit cards exit the queued state with a visible error.
  • Correct iterate.onDisagreement handling so the accept-doer and escalate policies are honoured distinctly from the legacy continue path, including terminal chat verdicts.
  • Prevent false auth_missing failures for Claude Code v2+ on macOS by falling back to a Keychain probe when no credential file is present.
  • Avoid ESM runtime errors for the open package by using a dynamic import when launching the browser from the CLI.
  • Harden port and process utilities with bounded-time ss/lsof invocations and more robust PID/cmdline inspection to avoid hangs and misclassification.

Enhancements:

  • Enrich template snapshots loaded from the DB by deriving candidatesWithModels from reviewer candidates so cockpit run cards always know which model to display, while preserving existing data when already present.
  • Tighten template validation by rejecting reviewer configurations where require exceeds the number of candidates or distinct lineages when crossLineage=true, turning opaque run-time failures into clear schema errors.
  • Refine cockpit round enrichment to seed an initial empty round-1 and synthesise queued reviewer placeholders from t=0 so cards render deterministically even before any reviewer directories exist.
  • Factor codex headless invocation into a pure argv builder that always skips user config and git-repo checks, ensuring stable, sandboxed codex exec behaviour in reviewer/doer runs.
  • Extend voice schema and indexing (including an enabled index) to support new auto-disable reasons and faster health scans used by diagnostics.
  • Improve CLI browser-opening paths to use a shared helper with proper error handling and timeouts, and adjust Windows ESM imports in the launcher to be URL-safe.
  • Load the file watcher library lazily in the daemon output watcher to reduce upfront dependencies and keep timeouts explicit.

Documentation:

  • Document the new chorus diagnose command and crash log location in the README, including guidance for filing bug reports with diagnostic output.

Tests:

  • Add extensive unit and integration-style tests for diagnostics, quickstart template generation, voice failure tracking, reviewer pre-spawn failures, fallback collision handling, codex headless args, port utilities, template parsing/validation, and cockpit round enrichment to cover the new behaviours and regressions.

crypticpy and others added 12 commits May 17, 2026 11:28
Bundles two upstream changes that ship a self-service triage path for
chorus users hitting opaque failures:

- `chorus diagnose` walks the install, daemon, recent failed chats, and
  voice health, and produces a shareable bug report.
- Crash hook captures uncaught exceptions in the CLI and writes them
  to a crash log alongside instructions to attach during a bug report.
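A rough sketch of such a hook, assuming one JSON record per crash under `~/.chorus/crashes` (the directory named elsewhere in this PR; the file name and field set below are illustrative guesses, not the fork's format). `installCrashHook` would be called once at the top of the CLI entrypoint; its optional `dir` parameter is hypothetical and exists only to make the writer testable.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type CrashKind = "uncaughtException" | "unhandledRejection";

// Write one JSON crash record and return its path.
function writeCrashLog(
  err: unknown,
  kind: CrashKind,
  dir: string = path.join(os.homedir(), ".chorus", "crashes"),
): string {
  fs.mkdirSync(dir, { recursive: true });
  const e = err instanceof Error ? err : new Error(String(err));
  const time = new Date().toISOString();
  const record = {
    kind,
    time,
    node: process.version,
    platform: process.platform,
    message: e.message,
    stack: e.stack ?? null,
  };
  const file = path.join(dir, `crash-${time.replace(/[:.]/g, "-")}.json`);
  fs.writeFileSync(file, JSON.stringify(record, null, 2));
  return file;
}

// Register both process-level hooks. The handler must never throw itself,
// otherwise the original error would be masked.
function installCrashHook(dir?: string): void {
  const handler = (kind: CrashKind) => (err: unknown) => {
    let where = "(crash log could not be written)";
    try {
      where = writeCrashLog(err, kind, dir);
    } catch {
      // Reporting the crash matters more than persisting it.
    }
    console.error(`chorus crashed: ${String(err)}`);
    console.error(`Crash log: ${where}`);
    console.error("Run `chorus diagnose` and attach its output to a GitHub issue.");
    process.exit(1);
  };
  process.on("uncaughtException", handler("uncaughtException"));
  process.on("unhandledRejection", handler("unhandledRejection"));
}
```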

Folded back from upstream chorus-codes/chorus:
  7ea712b feat: chorus diagnose command + crash hook for bug reports (#1)
  4a5ea20 fix(diagnose): realpath bin path + filter Next.js SSE noise (#4)

Co-Authored-By: chorus-codes <info@chorus.codes>

`chorus quickstart` runs a 30-second activation flow that verifies
the daemon comes up, the SQLite DB initializes, and a minimal chat
round-trips end-to-end. Aimed at first-run users who want to know
"is this thing actually working" before authoring a template.

Folded back from upstream chorus-codes/chorus:
  56610cf feat(cli): chorus quickstart — 30-second activation self-test (chorus-codes#30)

Co-Authored-By: chorus-codes <info@chorus.codes>

The `open` package and `chokidar` are both ESM-only as of recent
versions. On Node 22 (the daily-driver target) static `require()`
calls into them throw ERR_REQUIRE_ESM and crash the CLI at boot.

Switch to dynamic import in:
- src/cli/commands/start.ts (open browser after boot)
- src/cli/open-browser.ts (new helper)
- src/cli/index.ts (route open import)
- src/daemon/output-watcher.ts (chokidar file watch)

Includes upstream's post-merge hardening: the setTimeout that triggers
the browser-open no longer wraps a bare async callback, so a missing
default browser no longer surfaces as an unhandled rejection.
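A minimal sketch of the pattern. The PR confirms the helper lives in src/cli/open-browser.ts, but not its signature: the injectable `load` parameter (added here only so the failure path can be exercised without the real package), the boolean return value, and `scheduleOpen` are hypothetical.

```typescript
type OpenFn = (target: string) => Promise<unknown>;
type Loader = () => Promise<{ default: OpenFn }>;

// Dynamic import keeps the ESM-only `open` package out of the require graph.
async function openBrowser(
  url: string,
  load: Loader = () => import("open"),
): Promise<boolean> {
  try {
    const { default: open } = await load();
    await open(url);
    return true;
  } catch (err) {
    // A failed import or a missing default browser is not fatal.
    const reason = err instanceof Error ? err.message : String(err);
    console.warn(`Could not open a browser (${reason}). Open ${url} manually.`);
    return false;
  }
}

// The timer callback is synchronous and discards the promise explicitly;
// openBrowser never rejects, so nothing escapes as an unhandled rejection.
function scheduleOpen(url: string, delayMs = 500): void {
  setTimeout(() => {
    void openBrowser(url);
  }, delayMs);
}
```

One caveat worth checking in the build config: with `"module": "commonjs"`, TypeScript rewrites `import()` into `require()`, which would bring ERR_REQUIRE_ESM back; the `node16`/`nodenext` module settings preserve the dynamic import.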

Folded back from upstream chorus-codes/chorus:
  e8ca2ee fix(cli): dynamic import for open package (chorus-codes#14)
  dcd1837 fix: post-merge hardening for chorus-codes#14 (start.ts portion only;
          cli-precheck.test.ts portion ships with the Keychain fix)

Co-Authored-By: Julien Deudon <deudon.j@gmail.com>
Co-Authored-By: chorus-codes <info@chorus.codes>

Before: when a chat starts but no reviewer has produced an event yet,
enrichRounds returned an empty rounds array and the live-run page
showed nothing for several seconds — the user couldn't tell whether
their chat had launched.

After: seed a synthetic round-1 with QUEUED placeholders for every
expected participant so the page renders the per-reviewer cards
immediately. Real events overwrite placeholders as they arrive.
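The seeding step could look roughly like this. The `Slot`/`Participant`/`Round` shapes, the status strings, and applying placeholders to every round (not only round-1) are assumptions for illustration; the real enrichRounds also reconciles on-disk participant directories.

```typescript
type Status = "QUEUED" | "RUNNING" | "DONE" | "FAILED";

interface Slot {
  slotId: string;
  lineage: string;
  model: string;
}

interface Participant extends Slot {
  status: Status;
}

interface Round {
  round: number;
  participants: Participant[];
}

// Guarantee a round-1 exists, then give every expected slot that has no
// real participant yet a QUEUED placeholder. Real participants are kept as-is.
function seedRounds(rounds: Round[], expected: Slot[]): Round[] {
  const base: Round[] = rounds.length > 0 ? rounds : [{ round: 1, participants: [] }];
  return base.map((r) => {
    const seen = new Set(r.participants.map((p) => p.slotId));
    const placeholders = expected
      .filter((s) => !seen.has(s.slotId))
      .map((s): Participant => ({ ...s, status: "QUEUED" }));
    return { ...r, participants: [...r.participants, ...placeholders] };
  });
}
```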

Folded back from upstream chorus-codes/chorus:
  53e8fb6 feat(cockpit): seed empty round-1 so QUEUED placeholders
          render from t=0 (#2)

Co-Authored-By: chorus-codes <info@chorus.codes>

When two reviewer slots both fall through their per-slot chains to the
same template-level fallback target (common case: every slot ends in
anthropic/claude-sonnet-4-6), both used to dispatch the same (lineage,
model) in parallel, wasting cost and collapsing the lineage diversity
that is the point of multi-LLM peer review.

Build-time dedup (template-fallback.ts) couldn't catch it because each
slot only knows about other slots' PRIMARIES, not their fallback chains.

Fix: new per-chat/per-round (lineage, model) registry. reviewer-driver
calls tryClaim before each chain attempt and releases the claim in a
finally block. On collision, it returns null and emits
cli_warning(reason='fallback_collision') so runWithChainFallback advances
to the next entry and the cockpit can show why the slot skipped.
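A sketch of the registry and the claim/release discipline. The method names `tryClaim`, `release`, and `resetRound` follow this PR and its review feedback; the string key format and the simplified `runChain` (standing in for runWithChainFallback) are hypothetical.

```typescript
interface Claim {
  lineage: string;
  model: string;
}

// Per-chat, per-round set of (lineage, model) targets currently in flight.
class FallbackRegistry {
  private readonly inFlight = new Map<string, Set<string>>();

  private static roundKey(chatId: string, round: number): string {
    return `${chatId}#${round}`;
  }

  private static claimKey(c: Claim): string {
    return `${c.lineage}\u0000${c.model}`; // assumes NUL never appears in names
  }

  tryClaim(chatId: string, round: number, c: Claim): boolean {
    const rk = FallbackRegistry.roundKey(chatId, round);
    let claims = this.inFlight.get(rk);
    if (!claims) {
      claims = new Set<string>();
      this.inFlight.set(rk, claims);
    }
    const ck = FallbackRegistry.claimKey(c);
    if (claims.has(ck)) return false; // another slot is already running it
    claims.add(ck);
    return true;
  }

  release(chatId: string, round: number, c: Claim): void {
    const rk = FallbackRegistry.roundKey(chatId, round);
    const claims = this.inFlight.get(rk);
    if (!claims) return;
    claims.delete(FallbackRegistry.claimKey(c));
    if (claims.size === 0) this.inFlight.delete(rk);
  }

  resetRound(chatId: string, round: number): void {
    this.inFlight.delete(FallbackRegistry.roundKey(chatId, round));
  }
}

// Skip collided entries with a warning; hold each claim only while its attempt runs.
async function runChain<T>(
  registry: FallbackRegistry,
  chatId: string,
  round: number,
  chain: Claim[],
  attempt: (c: Claim) => Promise<T | null>,
  warn: (reason: "fallback_collision", c: Claim) => void,
): Promise<T | null> {
  for (const c of chain) {
    if (!registry.tryClaim(chatId, round, c)) {
      warn("fallback_collision", c);
      continue;
    }
    try {
      const result = await attempt(c);
      if (result !== null) return result;
    } finally {
      registry.release(chatId, round, c);
    }
  }
  return null;
}
```

Releasing in `finally` means a claim is held only while its attempt is in flight, so a later slot can reuse the same target once the earlier one has finished.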

Ported into fork's reviewer-driver.ts surgically so the verdict-isolation
refactor (2a2cde2) and per-slot repoPath threading stay intact.

Folded back from upstream chorus-codes/chorus:
  c4751fe feat(daemon): runtime fallback-collision dedup (#3)

Co-Authored-By: chorus-codes <info@chorus.codes>

Before: when a reviewer's precheck fails (e.g. underlying CLI not
installed) or the chat is cancelled while the slot is queued for a
CLI semaphore slot, runReviewer used to return null silently —
leaving NO on-disk participant directory. The cockpit's enrich-rounds
loop then couldn't reconcile the synthesised template slot against
any real participant, so the card sat at "Queued — waiting for an
open slot." forever and the actual error was invisible.

Reproduction: install chorus on a host with only one CLI on PATH
(e.g. just claude-code), open a template that includes lineages
requiring codex/gemini/kimi, fire it. Every reviewer card stayed
"Queued" and the chat never visibly progressed, even though it had
already failed.

Fix:
- Create the reviewer dir BEFORE the precheck runs.
- Add a writePreSpawnFailure helper that writes a `## REVIEWER FAILED`
  summary in the canonical format (Kind / Lineage / Model / message)
  that the cockpit's `parseFailureSummary` already understands.
- Wire it into the precheck-failed and cancelled-while-queued paths.

Card now transitions out of pending and shows the actual error
(cli_missing, cancelled, ...).
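A sketch of the ordering and the summary writer. The `## REVIEWER FAILED` heading and the Kind / Lineage / Model / message fields come from this commit; the `summary.md` file name, the exact `Key: value` layout, and the `reviewerDir` helper (modelled on the `~/.chorus/chats/<id>/round-*/<part>/` layout mentioned in the diagnose commit) are assumptions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface PreSpawnFailure {
  kind: string; // e.g. "cli_missing" or "cancelled"
  lineage: string;
  model: string;
  message: string;
}

type PrecheckResult = { ok: true } | { ok: false; kind: string; message: string };

const SUMMARY_FILE = "summary.md"; // hypothetical name

function reviewerDir(
  chatId: string,
  round: number,
  part: string,
  root: string = path.join(os.homedir(), ".chorus", "chats"),
): string {
  return path.join(root, chatId, `round-${round}`, part);
}

function formatFailureSummary(f: PreSpawnFailure): string {
  return ["## REVIEWER FAILED", "", `Kind: ${f.kind}`, `Lineage: ${f.lineage}`, `Model: ${f.model}`, "", f.message, ""].join("\n");
}

function writePreSpawnFailure(dir: string, f: PreSpawnFailure): void {
  fs.writeFileSync(path.join(dir, SUMMARY_FILE), formatFailureSummary(f));
}

// The directory exists BEFORE precheck runs, so every early exit leaves
// something on disk for the cockpit to reconcile against the template slot.
function runReviewerSketch(
  dir: string,
  slot: { lineage: string; model: string },
  precheck: () => PrecheckResult,
): string | null {
  fs.mkdirSync(dir, { recursive: true });
  const pre = precheck();
  if (!pre.ok) {
    writePreSpawnFailure(dir, { kind: pre.kind, message: pre.message, ...slot });
    return null;
  }
  return "spawned"; // stand-in for the real CLI spawn
}
```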

Folded back from upstream chorus-codes/chorus:
  afc59cc fix(daemon): REVIEWER FAILED summary on pre-spawn failure (chorus-codes#26)

Co-Authored-By: chorus-codes <info@chorus.codes>

Real pain (upstream chorus-codes#11): a Pro Gemini model on a Flash-only account
fails every chorus run with "exhausted your capacity on this model"
— but Gemini doesn't return a resetAt because the model isn't going
to become available for that account. Without auto-disable, the
runner keeps picking the dead voice on every chat and the user keeps
seeing the same opaque error.

Voice auto-disable:
- New src/lib/voice-failure-tracker.ts records per-voice consecutive
  quota_exhausted strikes in a settings counter.
- Trigger: 2 consecutive strikes WITH no resetAt → set
  voices.enabled=false + disabled_reason='auto_quota'.
- Counter resets on participant_done success; rate-limit strikes
  (hasResetAt=true) bypass the counter entirely so a transient
  429 + a later permanent failure can't trip the threshold on the
  first permanent strike.
- Wired into reviewer-driver alongside recordHealth; emits a
  cli_warning(reason='voice_auto_disabled') so the cockpit can show
  a one-line explanation.
- VoiceDisabledReason union gains 'auto_quota' (schema column was
  already TEXT — no migration).
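The counter logic above, sketched with in-memory stand-ins for the voices table and the settings store (an assumption; the fork persists both in SQLite). The threshold of 2, the `auto_quota` reason, the `voice_failures.{voiceId}` key, and the reset-on-success rule come from this commit and the sequence diagram further down; matching voices on (lineage, model) is a guess.

```typescript
const AUTO_DISABLE_THRESHOLD = 2;

type DisabledReason = "user" | "auto_missing" | "auto_quota";

interface Voice {
  id: string;
  lineage: string;
  model: string;
  enabled: boolean;
  disabled_reason: DisabledReason | null;
}

interface Stores {
  voices: Voice[];
  settings: Map<string, number>;
}

const counterKey = (voiceId: string): string => `voice_failures.${voiceId}`;

function findVoice(s: Stores, lineage: string, model: string): Voice | undefined {
  return s.voices.find((v) => v.lineage === lineage && v.model === model);
}

function recordVoiceFailure(
  s: Stores,
  lineage: string,
  model: string,
  hasResetAt: boolean,
): { disabled: boolean; voiceId?: string } {
  const voice = findVoice(s, lineage, model);
  if (!voice) return { disabled: false };
  // A resetAt means a transient rate limit: it never counts as a strike.
  if (hasResetAt) return { disabled: false, voiceId: voice.id };
  const failures = (s.settings.get(counterKey(voice.id)) ?? 0) + 1;
  if (failures >= AUTO_DISABLE_THRESHOLD) {
    voice.enabled = false;
    voice.disabled_reason = "auto_quota";
    s.settings.set(counterKey(voice.id), 0);
    return { disabled: true, voiceId: voice.id };
  }
  s.settings.set(counterKey(voice.id), failures);
  return { disabled: false, voiceId: voice.id };
}

// Called on participant_done success: clears any accumulated strikes.
function recordVoiceSuccess(s: Stores, lineage: string, model: string): void {
  const voice = findVoice(s, lineage, model);
  if (voice && (s.settings.get(counterKey(voice.id)) ?? 0) > 0) {
    s.settings.set(counterKey(voice.id), 0);
  }
}
```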

Lsof timeout (upstream chorus-codes#12):
- findPidsOnPort and findPidsOnPortWithSudo now bound execSync /
  execFileSync to 3s, so a slow-but-functional lsof on a loaded
  macOS box doesn't hang chorus boot. 3s leaves headroom while
  still bounding the hang case.
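A sketch of the bounded invocation for the non-sudo path, assuming `lsof -t -i tcp:<port>` as the query (the fork's exact flags are not shown). `execFileSync`'s `timeout` option kills the child once the deadline passes and throws, which this sketch handles the same way as lsof's exit code 1 for "no matches".

```typescript
import { execFileSync } from "node:child_process";

const LSOF_TIMEOUT_MS = 3000;

// Parse `lsof -t` output: one PID per line.
function parsePids(stdout: string): number[] {
  return stdout
    .split("\n")
    .map((line) => Number.parseInt(line.trim(), 10))
    .filter((pid) => Number.isInteger(pid) && pid > 0);
}

function findPidsOnPort(port: number): number[] {
  try {
    const out = execFileSync("lsof", ["-t", "-i", `tcp:${port}`], {
      encoding: "utf8",
      timeout: LSOF_TIMEOUT_MS, // a slow lsof is killed instead of hanging boot
      stdio: ["ignore", "pipe", "ignore"],
    });
    return parsePids(out);
  } catch {
    // lsof exits 1 when nothing matches; a missing binary or a timeout lands here too.
    return [];
  }
}
```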

Ported into fork's reviewer-driver.ts tmux pollHandle + success
path. voices.ts disabled_reason union extended alongside fork's
voice-tier column.

Folded back from upstream chorus-codes/chorus:
  4f6becc v0.8.30 — voice auto-disable (chorus-codes#11) + lsof timeout (chorus-codes#12) (chorus-codes#17)

Co-Authored-By: chorus-codes <info@chorus.codes>
Co-Authored-By: Lumina Mao <luminamao@mac.lan>

Two issues caused chats to fail opaquely at run-start:

CODEX ISOLATION (chorus-codes#10, chorus-codes#16)
The user's ~/.codex/config.toml may declare MCP servers, plugins, or
notification hooks. In headless `codex exec` those integrations have
caused codex to hang or cancel mid-call — two independent
reproductions: codex as our reviewer (chorus-codes#10) and codex as MCP client of
chorus (chorus-codes#16). Add --ignore-user-config to every headless codex argv.
Extracted to a pure `buildHeadlessArgs(opts)` so the argv shape is
unit-testable.
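A sketch of what such a pure builder might look like. Only `exec`, `--skip-git-repo-check`, and `--ignore-user-config` are named in this PR; the `--sandbox` and `--model` spellings and `-` for "read the prompt from stdin" are assumptions to check against `codex exec --help`, and the options type is hypothetical.

```typescript
interface HeadlessOpts {
  model?: string;
  sandbox?: "read-only" | "workspace-write";
}

// Pure: the same options always produce the same argv, so the shape can be
// locked down in a unit test without spawning codex.
function buildHeadlessArgs(opts: HeadlessOpts): string[] {
  const args = ["exec", "--skip-git-repo-check", "--ignore-user-config"];
  if (opts.sandbox) args.push("--sandbox", opts.sandbox);
  if (opts.model) args.push("--model", opts.model);
  args.push("-"); // prompt arrives on stdin
  return args;
}

console.log(buildHeadlessArgs({ model: "some-model", sandbox: "read-only" }).join(" "));
// → exec --skip-git-repo-check --ignore-user-config --sandbox read-only --model some-model -
```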

TEMPLATE VALIDATION (chorus-codes#15)
`reviewer.require > candidates.length` used to surface as "Job moves
immediately to failure upon Start press" — the runner queued, failed
to grant enough slots, and emitted an opaque chat-failure. Same for
`require > distinct lineages` when crossLineage:true. Both now
caught at TemplateSchema.parse() time with a clear error message
the user can fix before the run starts.

ReviewerSchema.superRefine() additions slot in cleanly alongside the
fork's audit/orchestrate phase schema work — both are additive
constraints on the same ReviewerSchema object.
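The two new constraints, expressed as a plain function so the sketch runs without zod. In the fork they live in `ReviewerSchema.superRefine()`, where each string below would instead be reported through `ctx.addIssue`; the `ReviewerRule` shape here is simplified and partly assumed.

```typescript
interface Candidate {
  lineage: string;
  model: string;
}

interface ReviewerRule {
  require: number;
  crossLineage: boolean;
  candidates: Candidate[];
}

// Returns human-readable issues; an empty array means the rule is satisfiable.
function reviewerRuleIssues(rule: ReviewerRule): string[] {
  const issues: string[] = [];
  if (rule.require > rule.candidates.length) {
    issues.push(`require (${rule.require}) exceeds the number of candidates (${rule.candidates.length})`);
  }
  if (rule.crossLineage) {
    const lineages = new Set(rule.candidates.map((c) => c.lineage)).size;
    if (rule.require > lineages) {
      issues.push(`require (${rule.require}) exceeds distinct lineages (${lineages}) with crossLineage: true`);
    }
  }
  return issues;
}
```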

Folded back from upstream chorus-codes/chorus:
  8ed970b fix(daemon, schema): codex isolation + template validation

Co-Authored-By: chorus-codes <info@chorus.codes>

The template schema, cockpit dialog, and SPEC-D-templates have always
exposed three values for iterate.onDisagreement — 'continue', 'escalate',
'accept-doer' — but the runner only honoured 'continue'. Picking the
other two from the cockpit form was a silent no-op: chats fell through
to phase_failed with 'doer_failed_all_rounds' regardless.

This wires both new branches into the round loop and the terminal
chat_done emission:

- 'accept-doer': after maxRounds without consensus, mark doerSucceeded
  and continue. The chat carries on (subsequent phases, ship, approval)
  as if reviewers had agreed on the doer's last answer.
- 'escalate': halt with status='failed' but verdict='request_changes'
  and error='escalated_on_disagreement', so cockpits can render
  "reviewers disagreed, needs human" distinctly from "doer broke."

Policy table extracted into a pure decidePhaseOutcome() helper so the
3 × 2 input matrix (policy × disagreement-in-last-round) is unit-tested
without standing up the full runChat scaffold.

Gated on disagreementInLastRound (reset at the top of every round and on
the doer-crash path) so a partial or empty doer answer can never be
silently accepted as final under 'accept-doer'. Preserves the fork's
existing standardPhaseRoundsExhausted #7 surfacing for the 'continue'
path; the 'escalate' path takes precedence with its own distinct chat_done.
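The 3 × 2 table can be sketched as a pure function. The outcome shape is hypothetical, and gating 'escalate' on disagreementInLastRound as well as 'accept-doer' is an assumption (a crashed doer should read as "doer broke", not "reviewers disagreed"); the fork's standardPhaseRoundsExhausted promotion on the 'continue' path is not modelled.

```typescript
type OnDisagreement = "continue" | "accept-doer" | "escalate";

type PhaseOutcome =
  | { kind: "phase_failed"; error: "doer_failed_all_rounds" }
  | { kind: "accept_doer" }
  | {
      kind: "escalate";
      status: "failed";
      verdict: "request_changes";
      error: "escalated_on_disagreement";
    };

function decidePhaseOutcome(policy: OnDisagreement, disagreementInLastRound: boolean): PhaseOutcome {
  // Without a genuine disagreement in the final round (e.g. the doer crashed
  // or returned an empty answer), no policy may promote the doer's output.
  if (disagreementInLastRound && policy === "accept-doer") {
    return { kind: "accept_doer" };
  }
  if (disagreementInLastRound && policy === "escalate") {
    return {
      kind: "escalate",
      status: "failed",
      verdict: "request_changes",
      error: "escalated_on_disagreement",
    };
  }
  // 'continue', or any policy without a real disagreement: legacy failure.
  return { kind: "phase_failed", error: "doer_failed_all_rounds" };
}
```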

Upstream PRs chorus-codes#49, chorus-codes#50 (commit 67572e9).

Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The fork already implements the Keychain fallback in cli-precheck
(hasDarwinKeychainEntry). This adds the missing test coverage:

- passes when no cred file but keychain entry exists
- blocks when no cred file and no keychain entry
- skips keychain check when cred file exists (fast-path preserved)
- does not consult keychain for non-anthropic lineages
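The decision those four tests pin down can be sketched with injected probes, which is also what makes it testable without touching the real Keychain. `authAvailable` and the `AuthProbe` shape are hypothetical; the fork's actual helper is `hasDarwinKeychainEntry`, whose call site is not shown here.

```typescript
interface AuthProbe {
  platform: string; // e.g. process.platform
  lineage: string;
  credFileExists: () => boolean;
  keychainHasEntry: () => boolean;
}

function authAvailable(p: AuthProbe): boolean {
  // Fast path: a credential file settles it and the Keychain is never consulted.
  if (p.credFileExists()) return true;
  // Only anthropic on macOS gets the fallback.
  if (p.platform !== "darwin" || p.lineage !== "anthropic") return false;
  // Claude Code v2+ may keep credentials in the macOS Keychain instead of a file.
  return p.keychainHasEntry();
}
```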

vi.mock('node:child_process') uses the importOriginal spread pattern so
spawn / exec / etc. keep their real implementations — a bare module
replacement would silently break any sibling test that imports from
child_process.

Upstream PRs #7, #8, plus the dcd1837 test-mock hardening.

Co-Authored-By: Yura <yurahalych@gmail.com>
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…field

Daemon-side TemplateSchema only carries `candidates` on each ReviewerRule.
The cockpit Template type expects `candidatesWithModels` populated —
enrich-rounds iterates that field to build slot→model mappings for
run-page cards. When fromRow parsed template_snapshot and cast it to
Template, the cast was a TypeScript lie: at runtime the parsed object
lacked candidatesWithModels, enrichRounds iterated zero reviewer slots,
and no model name reached the cards (badge appeared empty).

Derive candidatesWithModels at the parse seam (chats.fromRow) so the
cockpit's Template contract is honoured regardless of which path
produced the data. Idempotent — if a future daemon ever serialises
the field directly, that wins. Persona forwarded if present. Audit-
phase single-voice reviewers (no candidates array) are skipped via a
runtime narrow.
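A sketch of the derivation at the parse seam. The per-entry shape (`lineage`, `model`, optional `persona`) is an assumption about the cockpit's Template type; the two guards mirror the commit: skip reviewers without a `candidates` array, and leave an existing `candidatesWithModels` untouched.

```typescript
interface Candidate {
  lineage: string;
  model: string;
  persona?: string;
}

// Loose view of a parsed reviewer rule; audit-phase single-voice reviewers
// carry no `candidates` array at all.
interface ReviewerLike {
  candidates?: unknown;
  candidatesWithModels?: unknown;
  [key: string]: unknown;
}

function deriveCandidatesWithModels(reviewers: ReviewerLike[]): ReviewerLike[] {
  return reviewers.map((r) => {
    // Runtime narrow: single-voice reviewers pass through unchanged.
    if (!Array.isArray(r.candidates)) return r;
    // Idempotent: a snapshot that already carries the field wins.
    if (Array.isArray(r.candidatesWithModels)) return r;
    const derived = (r.candidates as Candidate[]).map((c) =>
      c.persona === undefined
        ? { lineage: c.lineage, model: c.model }
        : { lineage: c.lineage, model: c.model, persona: c.persona },
    );
    return { ...r, candidatesWithModels: derived };
  });
}
```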

Upstream PR #6 (chorus-codes/chorus@ac0c7fd).

Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cent failed chats

Extends `chorus diagnose` with three signals that triage the most common
breakage modes:

- **CLI smoke**: spawn `<bin> --version` per detected CLI with a hard 2s
  SIGKILL timeout (wrapper scripts may trap SIGTERM). Distinguishes
  `timedOut` from non-zero exit so the report can tell hangs apart from
  crashes.
- **Voice health**: counts `enabled=0` voices grouped by `disabled_reason`
  ('user' vs 'auto_missing' vs 'auto_quota'). Added
  `idx_voices_enabled` so the `WHERE enabled = 0` scan stays cheap as
  the table grows.
- **Recent failed chats**: last 5 chats with `status='blocked'` plus the
  errored participants pulled from `~/.chorus/chats/<id>/round-*/<part>/_attempts.jsonl`.
  Only `errorMessageBytes` is exposed — raw error text never leaves the
  user's machine. `$HOME` is redacted from any embedded path strings via
  `redactHomePaths`.
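A sketch of the CLI smoke check under the rules stated above (2s deadline, SIGKILL, `timedOut` kept separate from a non-zero exit). The `SmokeResult` shape is hypothetical. Spawning `bin` directly with an argument array and no shell also keeps shell metacharacters in the path from being interpreted, which is the main injection vector behind the review comment below.

```typescript
import { spawn } from "node:child_process";

interface SmokeResult {
  bin: string;
  ok: boolean;
  exitCode: number | null;
  timedOut: boolean;
  stdout: string;
}

function smokeOneCli(bin: string, timeoutMs = 2000): Promise<SmokeResult> {
  return new Promise((resolve) => {
    let stdout = "";
    let timedOut = false;
    let settled = false;
    // No shell: `bin` is executed directly, never parsed as shell syntax.
    const child = spawn(bin, ["--version"], { stdio: ["ignore", "pipe", "ignore"] });
    // Hard deadline with SIGKILL, since wrapper scripts may trap SIGTERM.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);
    child.stdout?.on("data", (chunk: Buffer) => {
      stdout += chunk.toString("utf8");
    });
    // 'error' and 'close' can both fire; only the first one resolves.
    const finish = (exitCode: number | null): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve({ bin, ok: !timedOut && exitCode === 0, exitCode, timedOut, stdout: stdout.trim() });
    };
    child.on("error", () => finish(null)); // e.g. ENOENT: not installed
    child.on("close", (code) => finish(code));
  });
}
```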

Adapted from upstream chorus-codes#19 (0666dca). Preserves the
fork's existing diagnose shape and adds tests for smokeOneCli /
readLatestAttempt / formatReport rendering of the three new sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@greptile-apps greptile-apps Bot left a comment


crypticpy has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@sourcery-ai

sourcery-ai Bot commented May 17, 2026

Reviewer's Guide

Backports multiple upstream Tier 1/2 CLI, daemon, cockpit, and schema fixes into the fork, adding a new chorus diagnose command and crash hook, chorus quickstart self-test, improved reviewer/disagreement semantics, voice auto-disable on persistent quota_exhausted, codex headless hardening, macOS keychain auth fallback, richer template/cockpit schema handling, port/CLI robustness, and extensive regression tests — all while preserving fork-specific behaviors.

Sequence diagram for voice auto-disable on persistent quota_exhausted

sequenceDiagram
  participant Runner as runReviewer
  participant Tracker as recordVoiceFailure
  participant Voices as voices
  participant Settings as settings
  participant TrackerOK as recordVoiceSuccess

  Runner->>Runner: reviewer throws err.kind="quota_exhausted"
  Runner->>Tracker: recordVoiceFailure(lineage, model, hasResetAt)
  Tracker->>Voices: list({ lineage })
  Voices-->>Tracker: [voiceRow]
  alt hasResetAt is true
    Tracker-->>Runner: { disabled:false, voiceId }
  else hasResetAt is false
    Tracker->>Settings: get("voice_failures.{voiceId}")
    Settings-->>Tracker: previousCount
    Tracker->>Settings: set("voice_failures.{voiceId}", previousCount+1)
    alt failures >= AUTO_DISABLE_THRESHOLD
      Tracker->>Voices: update(voiceId,{ enabled:false, disabled_reason: auto_quota })
      Tracker->>Settings: set("voice_failures.{voiceId}", 0)
      Tracker-->>Runner: { disabled:true, voiceId }
      Runner->>Runner: onEvent(cli_warning, reason=voice_auto_disabled)
    else
      Tracker-->>Runner: { disabled:false, voiceId }
    end
  end

  %% On successful run
  Runner->>TrackerOK: recordVoiceSuccess(lineage, model)
  TrackerOK->>Voices: list({ lineage })
  Voices-->>TrackerOK: [voiceRow]
  TrackerOK->>Settings: get("voice_failures.{voiceId}")
  Settings-->>TrackerOK: count
  alt count > 0
    TrackerOK->>Settings: set("voice_failures.{voiceId}", 0)
  end

File-Level Changes

Change Details Files
Improve reviewer lifecycle, fallback behavior, and disagreement handling while preserving fork-specific runner semantics.
  • Create reviewer directories before precheck and write ## REVIEWER FAILED summaries for all pre-spawn failure paths so cockpit can surface errors instead of leaving cards queued forever.
  • Introduce a per-chat/round fallback registry and use it in reviewer fallback chains to prevent multiple slots from concurrently running the same (lineage, model) fallback target, emitting fallback_collision warnings when collisions are avoided.
  • Add per-voice quota tracking so repeated quota_exhausted errors without reset windows increment counters, auto-disable affected voices with auto_quota reason, and clear counters on successful runs.
  • Track per-round disagreement state and add a pure decidePhaseOutcome helper to correctly honor iterate.onDisagreement policies (continue, accept-doer, escalate) while preserving existing standardPhaseRoundsExhausted handling and fork-specific chat_done behavior.
Files:
src/daemon/runner/reviewer-driver.ts
src/daemon/runner.ts
src/daemon/runner/template-fallback.ts
src/daemon/runner/fallback-registry.ts
src/lib/voice-failure-tracker.ts
tests/voice-failure-tracker.test.ts
tests/reviewer-driver-pre-spawn-failure.test.ts
tests/iterate-on-disagreement.test.ts
Add chorus diagnose diagnostic command and global crash-hook for better bug reports and crash visibility.
  • Implement chorus diagnose command that gathers a diagnostic snapshot with $HOME redacted (versions, install mode, daemon state, DB counts, voice health, recent failed chats, CLI detection + --version smokes, crash previews, log tails) and prints it as a fenced markdown bundle.
  • Add helpers to safely resolve bin paths through symlinks, detect install mode, redact $HOME from free-form strings, filter benign Next.js SSE disconnect noise from web logs, and summarise recent failed chats using on-disk _attempts.jsonl without leaking raw error messages.
  • Introduce a minimal, dependency-free crash hook (and a matching inline twin in bin/chorus.mjs) that writes structured crash logs to ~/.chorus/crashes on uncaught exceptions/unhandled rejections and nudges users toward GitHub issues or chorus diagnose.
  • Wire the diagnose command into the CLI entrypoint and README, and add unit tests for all helper functions and formatting paths.
Files:
src/cli/commands/diagnose.ts
src/cli/crash-hook.ts
bin/chorus.mjs
src/cli/index.ts
src/lib/db/connection.ts
src/lib/db/schema.sql
README.md
tests/diagnose.test.ts
tests/crash-hook.test.ts
Introduce chorus quickstart self-test command to fire a minimal review-only chat against the first detected CLI and surface its result inline.
  • Implement chorus quickstart command that detects available CLIs, maps the first one to a template reviewer lineage, upserts a private quickstart-self-test review-only template, posts a chat with a hardcoded off-by-one sample artifact, polls its status with SIGINT cancellation, and displays reviewer output or failure summaries inline.
  • Add a small YAML builder that generates a schema-valid review-only template matching the live TemplateSchema, with crossLineage=false and require=1 so it works for single-CLI users, and ship disabled.
  • Resolve cockpit URLs robustly via daemon.json instead of string substitution, and add tests covering the YAML builder, sample artifact, and mapping to the active schema.
Files:
src/cli/commands/quickstart.ts
src/cli/index.ts
tests/quickstart.test.ts
Harden CLI browser-opening and TCP port-inspection behavior for reliability across Node/OS combinations.
  • Replace direct open usage in CLI with a new openBrowser helper that dynamically imports the ESM-only open package to avoid ERR_REQUIRE_ESM under CJS builds, and await/catch failures when opening the cockpit URL (including from chorus start and status/auto-open paths).
  • Enhance port-utils to set timeouts on ss/lsof subprocesses (with and without sudo), standardize on double-quoted strings, and ensure process-kill helpers use explicit signals and safer process lookup, adding tests to assert the timeout behavior.
Files:
src/cli/open-browser.ts
src/cli/commands/start.ts
src/cli/index.ts
src/cli/port-utils.ts
tests/port-utils.test.ts
Improve cockpit template/candidate handling and round enrichment so reviewer cards and models render correctly from t=0.
  • Extend fromRow to validate template_snapshot with TemplateSchema.safeParse and, on success, derive or preserve candidatesWithModels from each reviewer’s candidates while leaving single-voice audit-phase reviewers untouched, so cockpit code can iterate reviewer slots and show model names even when snapshots only carry daemon-side shapes.
  • Add tests ensuring malformed/structurally-invalid/non-object snapshots fall back gracefully, and that both derived and pre-populated candidatesWithModels behave idempotently.
  • Update enrichRounds to seed an empty round-1 when no rounds exist yet so the run page immediately renders QUEUED placeholder cards for all expected reviewer slots instead of cards appearing only as dirs are created, and add tests for the placeholder behavior and model propagation.
Files:
src/lib/api/chats.ts
tests/api-chats-from-row.test.ts
src/components/live-run-real/enrich-rounds.ts
tests/enrich-rounds.test.ts
Tighten template schema validation and review configuration to catch misconfigured reviewer pools early.
  • Enhance ReviewerSchema with superRefine rules that reject require values exceeding candidates.length and, when crossLineage=true, exceeding the number of distinct lineages, surfacing clear schema errors at template-save time instead of opaque run-time failures.
  • Add tests covering invalid require/candidates combinations and valid edge cases (e.g. require=N with N distinct lineages and crossLineage=true, and the non-cross-lineage cases).
Files:
src/lib/template-schema.ts
tests/template-schema.test.ts
Harden codex integration for headless runs by centralizing argv construction and ignoring user config that can hang review jobs.
  • Refactor codex headless execution to a pure buildHeadlessArgs helper that always includes --skip-git-repo-check and --ignore-user-config, encodes sandbox/network/model flags, and tells codex exec to read the prompt from stdin.
  • Update codexShim.runHeadless to use the new helper while preserving accountId/model validation, workspace pre-trust, and spawn options; add tests to lock the expected argv shape and the presence of --ignore-user-config.
  • Normalize string quoting in codex.ts to consistent double quotes.
Files:
src/daemon/agents/codex.ts
tests/codex-headless-args.test.ts
Extend CLI precheck behavior with macOS Keychain support for Claude Code v2 credentials and improve the associated tests.
  • Mock node:child_process.execFileSync in cli-precheck tests to simulate macOS Keychain behavior, and expand test coverage to ensure quota gating, cred file detection, per-lineage CTAs, and the new anthropic-on-darwin keychain fallback behave as intended.
  • Ensure that when cred files exist, keychain is skipped; when running on non-anthropic lineages, keychain is not consulted; and that tests correctly reset HOME, DB, and mocks between runs.
Files:
tests/cli-precheck.test.ts
Add and wire voice-health metadata to support diagnose and auto-disable surfaces.
  • Extend VoiceRowSchema.disabled_reason to accept a new auto_quota value and document semantics for user, auto_missing, and auto_quota in code comments.
  • Add indexes on voices.enabled (both in SQL and initDb) to speed up disabled-voice scans used by chorus diagnose, and expose voice-health summary (total voices, auto-disabled-by-quota/missing, user-disabled count) via the diagnose snapshot and report formatting.
Files:
src/lib/db/voices.ts
src/lib/db/connection.ts
src/lib/db/schema.sql
src/cli/commands/diagnose.ts
tests/diagnose.test.ts
Misc daemon and CLI robustness improvements and test coverage expansions.
  • Convert the output watcher’s waitForAnswer to dynamically import chokidar to avoid bundling issues and better align with async usage.
  • Normalize string quoting/style across several modules for consistency, and add small correctness tweaks such as using pathToFileURL when dynamically importing dist/src entrypoints from the bin script to avoid Windows ESM URL scheme issues.
  • Add or expand tests around template snapshot parsing, daemon chat-from-row behavior, and other touched areas to ensure no regressions from the upstream fold-in.
Files:
src/daemon/output-watcher.ts
bin/chorus.mjs
tests/api-chats-from-row.test.ts
tests/enrich-rounds.test.ts
tests/iterate-on-disagreement.test.ts


@coderabbitai

coderabbitai Bot commented May 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8ba5d7a2-c07c-408c-8821-70c31c170afa

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 security issue, 3 other issues, and left some high level feedback:

Security issues:

  • Detected calls to child_process from a function argument bin. This could lead to a command injection if the input is user controllable. Try to avoid calls to child_process, and if it is needed ensure user input is correctly sanitized or sandboxed.

General comments:

  • The new fallback registry (fallback-registry.ts) is only used for per-attempt claims; I don’t see any call to resetRound() in the runner, so it’d be good to either wire resetRound(chatId, round) into the end-of-round/phase path or document why cross-round stickiness is intentional to avoid subtle over‑deduplication or state leaks across rounds.
  • Crash handling logic now exists both in bin/chorus.mjs and src/cli/crash-hook.ts with slightly different responsibilities; consider centralizing shared pieces (e.g. log format, field set) or adding a small shared helper to keep them from drifting over time.

## Individual Comments

### Comment 1
<location path="src/daemon/runner/reviewer-driver.ts" line_range="434-438" />
<code_context>
+            // return null so runWithChainFallback advances to the next chain
+            // entry; emit a cli_warning tagged `fallback_collision` so the
+            // cockpit can show why the slot skipped.
+            const claimed = tryClaimFallbackTarget(
               chatId,
-              phase,
               round,
-              reviewerIdx,
-              candidateLineage: entry.lineage,
-              candidateModel: entry.model,
-              agentName,
-              askContent: ask,
-              answerFile,
-              reviewerDir,
-              repoPath,
-              abortSignal: handle.signal,
-              onEvent,
-            });
+              entry.lineage,
+              entry.model,
+            );
+            if (!claimed) {
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid releasing a fallback claim that wasn’t acquired.

Because `releaseFallbackClaim(chatId, round, entry.lineage, entry.model)` is always called in `finally`, it runs even when `tryClaimFallbackTarget` returns `false`. That allows a slot that never held the claim to release it, potentially clearing another slot’s valid claim and causing two reviewers to collide on the same fallback. Please call `releaseFallbackClaim` only when `claimed` is true (e.g., `if (claimed) releaseFallbackClaim(...)`).
</issue_to_address>
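A minimal sketch of the suggested guard. `tryClaimFallbackTarget` and `releaseFallbackClaim` are the names used in the snippet under review. The in-memory `Set` registry and the `runWithFallbackClaim` wrapper are hypothetical stand-ins and are not the fork's `fallback-registry.ts`. The fix hinges on one detail: the early `return null` happens before the `try`, so a slot that lost the race never reaches the `finally` that releases the claim.

```typescript
// Hypothetical in-memory registry standing in for fallback-registry.ts.
const claims = new Set<string>();

function claimKey(chatId: string, round: number, lineage: string, model: string): string {
  return JSON.stringify([chatId, round, lineage, model]);
}

function tryClaimFallbackTarget(chatId: string, round: number, lineage: string, model: string): boolean {
  const key = claimKey(chatId, round, lineage, model);
  if (claims.has(key)) return false; // another slot already holds this target
  claims.add(key);
  return true;
}

function releaseFallbackClaim(chatId: string, round: number, lineage: string, model: string): void {
  claims.delete(claimKey(chatId, round, lineage, model));
}

// Runs `attempt` only when this slot wins the claim, and releases only a claim it holds.
async function runWithFallbackClaim<T>(
  chatId: string,
  round: number,
  entry: { lineage: string; model: string },
  attempt: () => Promise<T>,
): Promise<T | null> {
  const claimed = tryClaimFallbackTarget(chatId, round, entry.lineage, entry.model);
  if (!claimed) return null; // caller advances to the next chain entry
  try {
    return await attempt();
  } finally {
    releaseFallbackClaim(chatId, round, entry.lineage, entry.model);
  }
}
```

In this sketch `round` is part of the key. As a result, a claim held in one round never blocks the same lineage/model in a later round, and `resetRound()` would only be needed to reclaim memory.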

### Comment 2
<location path="src/cli/commands/diagnose.ts" line_range="163" />
<code_context>
+ * earn their entry by being explicitly added — we don't want to hide
+ * an actual bug because its message vaguely matches a regex.
+ */
+function filterBenignNoise(text: string): {
+  kept: string;
+  filteredCount: number;
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting shared helpers (log-noise filtering, CLI smoking, path utilities, and DB/daemon sections) into dedicated functions/modules so the diagnose command stays focused and easier to follow.

The new command is functionally rich but quite dense; a few small extractions would reduce complexity without changing behavior.

### 1. Simplify `filterBenignNoise` to a line/block‑level filter

The current brace‑depth + orphan‑tail logic is quite intricate for a very specific pattern. You can treat the Next.js SSE trace as a block starting from a header and ending at the first blank line (or a fixed number of lines), which is easier to reason about and extend.

For example:

```ts
// lib/diagnostics/log-noise.ts
const NEXT_PIPE_HEADER = "Error: failed to pipe response";
const NEXT_BLOCK_MAX_LINES = 20;

export function filterBenignNoise(text: string): { kept: string; filteredCount: number } {
  if (!text || text.startsWith("(")) return { kept: text, filteredCount: 0 };

  const lines = text.split("\n");
  const kept: string[] = [];
  let filteredCount = 0;

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.includes(NEXT_PIPE_HEADER)) {
      filteredCount++;
      let skipped = 0;
      // Drop the header + following lines until blank or cap
      while (i + 1 < lines.length && skipped < NEXT_BLOCK_MAX_LINES) {
        const next = lines[i + 1];
        if (!next.trim()) {
          i++; // consume blank line and stop
          break;
        }
        i++;
        skipped++;
      }
      continue;
    }
    kept.push(line);
  }

  return { kept: kept.join("\n"), filteredCount };
}
```

Then the command module only wires it:

```ts
import { filterBenignNoise } from "../../lib/diagnostics/log-noise.js";

// ...
webTail: (() => {
  const raw = tailFile(path.join(chorusDir, "logs", "web.log"), 300);
  const { kept, filteredCount } = filterBenignNoise(raw);
  const trimmed = kept.split("\n").slice(-20).join("\n").trim();
  return filteredCount > 0
    ? `${trimmed}\n  (${filteredCount} benign SSE-disconnect trace${filteredCount === 1 ? "" : "s"} filtered)`
    : trimmed;
})(),
```

This keeps the “hide SSE noise” behavior while making the implementation much simpler.

### 2. Extract CLI smoking into a reusable helper

`smokeOneCli` is fairly sophisticated (timeouts, stdout/stderr capture, redaction, signals). Pulling it into a small reusable helper keeps this command thin and makes the behavior shareable across diagnostics.

```ts
// lib/cli/smoke.ts
import { spawn } from "child_process";
import { redactHomePaths } from "../path-utils.js";

export interface SmokeResult {
  ok: boolean;
  exitCode?: number;
  version?: string;
  stderrFirstLine?: string;
  timedOut?: boolean;
}

export function smokeOneCli(bin: string): Promise<SmokeResult> {
  // (move the existing implementation here unchanged)
}
```

In `diagnose.ts`:

```ts
import { smokeOneCli, type SmokeResult } from "../../lib/cli/smoke.js";

// ...
const smokes: Array<SmokeResult | undefined> = await Promise.all(
  found.map((d) => (d.found && d.path ? smokeOneCli(d.path) : Promise.resolve(undefined))),
);
```

This immediately shortens the command file and isolates process‑spawning concerns.

### 3. Centralize path helpers

`abbreviateHome`, `redactHomePaths`, and `resolveBinPath` are generic utilities and can live in a small shared module:

```ts
// lib/path-utils.ts
import os from "os";
import fs from "fs";

export function abbreviateHome(p: string): string {
  const home = os.homedir();
  return p.startsWith(home) ? "~" + p.slice(home.length) : p;
}

export function redactHomePaths(s: string): string {
  const home = os.homedir();
  if (!home) return s;
  const escaped = home.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return s.replace(new RegExp(escaped, "g"), "~");
}

export function resolveBinPath(rawBinPath: string): string {
  try {
    return fs.realpathSync(rawBinPath);
  } catch {
    return rawBinPath;
  }
}
```

Then in `diagnose.ts`:

```ts
import { abbreviateHome, redactHomePaths, resolveBinPath } from "../../lib/path-utils.js";
```

This reduces the “utility noise” in the command and makes these helpers available to other commands.

### 4. Optionally split `gather()` by concern

Even without creating separate files immediately, you can reduce `gather()`’s cognitive load by factoring the major sections into small helpers (which can later be moved to `lib/diagnostics/*`):

```ts
async function getDbCounts(): Promise<{ chats: number | string; voices: number | string }> {
  try {
    const { getDb } = await import("../../lib/db/connection.js");
    const db = await getDb();
    const cr = await db.execute("SELECT COUNT(*) AS n FROM chats");
    const vr = await db.execute("SELECT COUNT(*) AS n FROM voices");
    return {
      chats: Number((cr.rows[0] as any).n),
      voices: Number((vr.rows[0] as any).n),
    };
  } catch (err) {
    return {
      chats: `(error: ${err instanceof Error ? err.message.slice(0, 80) : "unknown"})`,
      voices: "(unavailable)",
    };
  }
}

// inside gather():
const { chats, voices } = await getDbCounts();
// ...
db: { chats, voices },
```

Similar extractions for “daemon state”, “voice health”, and “recent failed chats” would turn `gather()` into a high‑level orchestrator rather than a long, mixed‑concern function.
</issue_to_address>

### Comment 3
<location path="bin/chorus.mjs" line_range="21" />
<code_context>
+import { fileURLToPath, pathToFileURL } from "node:url";
+import { dirname, join, resolve } from "node:path";
+
+// Crash hook — installed BEFORE any other import so it captures early
+// startup failures. The src/cli/crash-hook.ts version is the testable
+// canonical source; this is its zero-dependency twin, kept inline so it
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the shared crash-log formatting into a small reusable module so both the TS crash hook and the bin script can use it instead of duplicating logic.

The duplicated crash‑handling logic in `bin/chorus.mjs` does increase complexity and maintenance risk. You can keep the “early install, zero deps” behavior without hand‑maintaining a second implementation by extracting a tiny shared formatter module and reusing it from both places.

One concrete way to do this:

1. **Extract a pure formatter from the TS side**

Create a small, dependency‑free module that only knows how to turn an error + context into `{ body, headline }`. It should not do any IO, so it’s safe to call from both the bin stub and the existing TS crash hook.

```ts
// src/cli/crash-log-core.ts
export interface CrashContext {
  err: unknown;
  source: string;
  version: string;
  node: string;
  platform: string;
  argv: string;
  cwd: string;
  uptimeMs: number;
}

export function buildCrashLog(ctx: CrashContext): {
  body: string;
  headline: string;
} {
  const stack =
    ctx.err instanceof Error
      ? `${ctx.err.name}: ${ctx.err.message}\n${ctx.err.stack ?? "(no stack)"}`
      : String(ctx.err);

  const body = [
    "# Chorus crash report",
    "",
    `timestamp:    ${new Date().toISOString()}`,
    `source:       ${ctx.source}`,
    `chorus:       ${ctx.version}`,
    `node:         ${ctx.node}`,
    `platform:     ${ctx.platform}`,
    `argv:         ${ctx.argv}`,
    `cwd:          ${ctx.cwd}`,
    `uptime_ms:    ${ctx.uptimeMs}`,
    "",
    "## Error",
    "",
    stack,
    "",
  ].join("\n");

  const headline =
    ctx.err instanceof Error
      ? `${ctx.err.name}: ${ctx.err.message}`
      : String(ctx.err);

  return { body, headline };
}
```

Ensure this is compiled as a tiny JS helper (e.g. `dist/cli/crash-log-core.js`) as part of your normal build.

2. **Use the shared formatter in the existing TS crash hook**

Refactor `src/cli/crash-hook.ts` to call the formatter and only own the wiring + IO:

```ts
// src/cli/crash-hook.ts
import { homedir } from "node:os";
import { join } from "node:path";
import { mkdirSync, writeFileSync } from "node:fs";
import { buildCrashLog } from "./crash-log-core.js";

const crashDir = join(homedir(), ".chorus", "crashes");

function writeCrash(err: unknown, source: string, version: string) {
  const { body, headline } = buildCrashLog({
    err,
    source,
    version,
    node: process.versions.node,
    platform: `${process.platform} ${process.arch}`,
    argv: process.argv.slice(1).join(" "),
    cwd: process.cwd(),
    uptimeMs: Math.round(process.uptime() * 1000),
  });

  // (existing mkdirSync/writeFileSync + stderr messaging here)
}

// existing process.on(...) hooks call writeCrash()
```

3. **Thin the bin crash hook down to “wire + IO” and reuse the same formatter**

In `bin/chorus.mjs`, import the compiled helper before anything else, but keep the rest minimal. You still read the version locally to avoid depending on `src`, but you no longer duplicate the log format / fields.

```js
// bin/chorus.mjs
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { fileURLToPath, pathToFileURL } from "node:url";
import { dirname, join, resolve } from "node:path";
import { buildCrashLog } from "../dist/cli/crash-log-core.js"; // tiny, prebuilt helper

const ISSUE_URL = "https://github.com/chorus-codes/chorus/issues/new";

function readChorusVersion() {
  try {
    const __dn = dirname(fileURLToPath(import.meta.url));
    const raw = readFileSync(resolve(__dn, "..", "package.json"), "utf-8");
    const parsed = JSON.parse(raw);
    return typeof parsed.version === "string" ? parsed.version : "(unknown)";
  } catch {
    return "(unknown)";
  }
}

function installCrashHook() {
  const crashDir = join(homedir(), ".chorus", "crashes");
  const version = readChorusVersion();

  const handle = (err, source) => {
    const ctx = {
      err,
      source,
      version,
      node: process.versions.node,
      platform: `${process.platform} ${process.arch}`,
      argv: process.argv.slice(1).join(" "),
      cwd: process.cwd(),
      uptimeMs: Math.round(process.uptime() * 1000),
    };

    const { body, headline } = buildCrashLog(ctx);
    // minimal duplication: mkdirSync/writeFileSync + stderr messaging
  };

  process.on("uncaughtException", (err) => handle(err, "uncaughtException"));
  process.on("unhandledRejection", (err) => handle(err, "unhandledRejection"));
}

installCrashHook();
```

This preserves:

- Early installation in the bin file before `await import(distEntry)`.
- Zero reliance on importing TS or the main CLI entrypoint.
- Identical crash log structure and messaging in both code paths, with a single source of truth for semantics.

The remaining duplication is now limited to small, obvious IO glue (where the paths differ anyway), and all formatting / fields live in one place.
</issue_to_address>

### Comment 4
<location path="src/cli/commands/diagnose.ts" line_range="302" />
<code_context>
      child = spawn(bin, ["--version"], { windowsHide: true });
</code_context>
<issue_to_address>
**security (javascript.lang.security.detect-child-process):** Detected calls to child_process from a function argument `bin`. This could lead to a command injection if the input is user controllable. Try to avoid calls to child_process, and if it is needed ensure user input is correctly sanitized or sandboxed.

*Source: opengrep*
</issue_to_address>
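One way to narrow this finding, sketched under stated assumptions. The code assumes `bin` is an absolute path produced by the command's own PATH lookup. Because `spawn` receives an argv array and runs without a shell, the path is executed directly and shell metacharacters in it are never interpreted. The remaining risk is running an unexpected file, so the hypothetical `validateBinPath` guard accepts only an absolute path to an existing regular file. The `rejected` field is also hypothetical, and home-path redaction is omitted for brevity.

```typescript
import { spawn } from "node:child_process";
import { statSync } from "node:fs";
import { isAbsolute } from "node:path";

interface SmokeResult {
  ok: boolean;
  exitCode?: number;
  version?: string;
  stderrFirstLine?: string;
  timedOut?: boolean;
  rejected?: string; // hypothetical: why the path was refused before spawning
}

// Hypothetical guard: only run an absolute path that points at an existing regular file.
function validateBinPath(bin: string): string | null {
  if (!isAbsolute(bin)) return "not an absolute path";
  if (/[\0\r\n]/.test(bin)) return "contains control characters";
  try {
    if (!statSync(bin).isFile()) return "not a regular file";
  } catch {
    return "does not exist";
  }
  return null;
}

function smokeOneCli(bin: string, timeoutMs = 5_000): Promise<SmokeResult> {
  const problem = validateBinPath(bin);
  if (problem) return Promise.resolve({ ok: false, rejected: problem });

  return new Promise((resolve) => {
    // argv array with shell: false (the default): `bin` is executed directly,
    // never parsed by a shell, so metacharacters in it are not interpreted.
    const child = spawn(bin, ["--version"], {
      shell: false,
      windowsHide: true,
      stdio: ["ignore", "pipe", "pipe"],
    });
    let stdout = "";
    let stderr = "";
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);

    child.stdout.on("data", (chunk) => (stdout += chunk));
    child.stderr.on("data", (chunk) => (stderr += chunk));
    child.on("error", (err) => {
      clearTimeout(timer);
      resolve({ ok: false, stderrFirstLine: err.message });
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      resolve({
        ok: code === 0 && !timedOut,
        exitCode: code ?? undefined,
        version: stdout.trim().split("\n")[0] || undefined,
        stderrFirstLine: stderr.trim().split("\n")[0] || undefined,
        timedOut,
      });
    });
  });
}
```

This reduces the attack surface, but a static analyzer may still flag the `spawn` call. Whether to annotate or suppress that finding is a separate decision.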


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f770d88e3


Comment on lines +654 to +656
`SELECT id, status, created_at FROM chats
WHERE status IN ('failed', 'blocked', 'cancelled')
ORDER BY created_at DESC LIMIT 5`,

P2: Include no_review chats in diagnostics

When every reviewer fails (for example missing CLI/auth/quota exhaustion), runChat ends the chat with status = 'no_review', and those runs are exactly where _attempts.jsonl contains the failure context this new diagnose section is meant to surface. Because this query only includes failed, blocked, and cancelled, chorus diagnose reports no recent failed chats for the common all-reviewers-failed case, forcing users back into manual log collection.


Owner Author


Fixed in f404044 — added no_review to the IN-list (and a regression test that asserts both the status and quota_exhausted errorKind render). Good catch: no_review is exactly the all-reviewers-failed terminal state that this section was meant to surface, so the original query was reporting empty for the most useful case.

The recent-failed-chats section was meant to surface per-participant
failure context from `_attempts.jsonl`, but the WHERE clause only
covered 'failed', 'blocked', 'cancelled'. The most common failure
shape — every reviewer down for missing CLI / auth / quota — ends the
chat in 'no_review', which was being silently filtered out. So the
exact case the section exists to diagnose returned an empty list,
forcing users back into manual log collection.

Adds 'no_review' to the IN-list and a regression test that asserts
both the status and a quota_exhausted errorKind render in the report.

Addresses chatgpt-codex review P2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
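A sketch of the widened filter. It keeps the status list in one constant so the SQL and an in-memory check cannot drift apart. `RECENT_FAILURE_STATUSES`, `recentFailedChatsSql`, and `pickRecentFailed` are hypothetical names. The column names come from the query quoted in the review, and treating `created_at` as an ISO-8601 string is an assumption.

```typescript
// Terminal statuses surfaced by `chorus diagnose`; no_review is the
// all-reviewers-failed outcome this section most needs to show.
const RECENT_FAILURE_STATUSES = ["failed", "blocked", "cancelled", "no_review"] as const;

interface ChatRow {
  id: string;
  status: string;
  created_at: string; // assumed ISO-8601, so string order matches time order
}

// The IN-list is built from the fixed literals above, never from user input.
function recentFailedChatsSql(limit = 5): string {
  const inList = RECENT_FAILURE_STATUSES.map((s) => `'${s}'`).join(", ");
  const n = Math.max(1, Math.floor(limit));
  return [
    "SELECT id, status, created_at FROM chats",
    `WHERE status IN (${inList})`,
    `ORDER BY created_at DESC LIMIT ${n}`,
  ].join("\n");
}

// Same selection in memory, convenient for a regression test without a database.
function pickRecentFailed(rows: ChatRow[], limit = 5): ChatRow[] {
  const wanted = new Set<string>(RECENT_FAILURE_STATUSES);
  return rows
    .filter((r) => wanted.has(r.status))
    .sort((a, b) => (a.created_at < b.created_at ? 1 : a.created_at > b.created_at ? -1 : 0))
    .slice(0, limit);
}
```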

@greptile-apps greptile-apps Bot left a comment


crypticpy has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@crypticpy crypticpy merged commit 535a960 into feat/pr-review-and-hardening May 17, 2026
1 check passed
@crypticpy crypticpy deleted the feat/upstream-fold-back branch May 17, 2026 18:17
crypticpy added a commit that referenced this pull request May 18, 2026
…ut adapter (#1)

* fix: cred detection + Claude MCP user-scope registration

Three fixes from chorus-issues.md that prevent a freshly-installed chorus
from finding the user's existing CLI credentials, so the daemon starts up
cleanly on machines that already have Claude / Kimi / moonshot configured.

#1: register Claude MCP at user scope. The chorus MCP entry now writes to
the top-level `mcpServers` block in `~/.claude.json` (idempotent), and any
stale chorus entry under the project-scoped `projects[homedir].mcpServers`
is cleaned up. Previously the project-scoped registration was invisible to
Claude Code launched outside that exact cwd.
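The registration step can be sketched as a pure transform over the parsed `~/.claude.json`. The function returns an updated copy plus a `changed` flag, so reruns do not rewrite the file. `registerUserScopeMcp` is a hypothetical name. The shape of the server entry is whatever the installer writes; the values in the test are placeholders.

```typescript
type Json = Record<string, unknown>;

function isObject(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Takes the parsed ~/.claude.json and returns an updated copy; the input is not mutated.
// `changed` tells the caller whether a write is needed, which keeps reruns idempotent.
function registerUserScopeMcp(
  config: Json,
  name: string,
  entry: Json,
  homeDir: string,
): { config: Json; changed: boolean } {
  const next = structuredClone(config) as Json;
  let changed = false;

  // 1. User scope: top-level mcpServers[name].
  const servers: Json = isObject(next.mcpServers) ? next.mcpServers : {};
  // Key-order-sensitive comparison; fine when the installer always builds the entry the same way.
  if (JSON.stringify(servers[name]) !== JSON.stringify(entry)) {
    servers[name] = entry;
    changed = true;
  }
  next.mcpServers = servers;

  // 2. Drop a stale project-scoped copy at projects[homeDir].mcpServers[name].
  const project = isObject(next.projects) ? next.projects[homeDir] : undefined;
  const scoped = isObject(project) ? project.mcpServers : undefined;
  if (isObject(scoped) && name in scoped) {
    delete scoped[name];
    changed = true;
  }

  return { config: next, changed };
}
```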

#2: cred-path fallbacks. When the anthropic file check misses (e.g. user
authed via Claude Desktop, no `~/.claude/...` JSON), fall back to the macOS
Keychain via `security find-generic-password -s "Claude Code-credentials"`.
Added `~/.kimi/credentials/kimi-code.json` to the moonshot CRED_PATHS so
users who authed through `kimi-code` aren't told to log in again.

#3: kimi config-missing precheck. New layer-3 check parses
`~/.kimi/config.toml` and surfaces a `config_missing` reason when there's
no top-level `default_model` set — the CLI will silently pick whatever
backend it likes, which is rarely what the user wants.
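A minimal sketch of the precheck. It deliberately uses a tiny line scanner rather than a TOML parser. In TOML, top-level keys are the ones that appear before the first `[table]` header, so the scan stops there. Only quoted `key = "value"` forms are handled. `topLevelDefaultModel` and `kimiConfigPrecheck` are hypothetical names, and treating a missing file as `config_missing` is an assumption.

```typescript
type PrecheckResult = { ok: true } | { ok: false; reason: "config_missing"; detail: string };

// Small scanner, not a TOML parser: only key/value lines before the first
// [table] header are top-level keys, so the scan stops at that header.
function topLevelDefaultModel(toml: string): string | null {
  for (const raw of toml.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank or comment
    if (line.startsWith("[")) break; // first table header ends the top-level section
    const m = /^default_model\s*=\s*(?:"([^"]*)"|'([^']*)')/.exec(line);
    if (m) {
      const value = (m[1] ?? m[2] ?? "").trim();
      return value === "" ? null : value;
    }
  }
  return null;
}

// `toml` is the file contents, or null when ~/.kimi/config.toml does not exist.
function kimiConfigPrecheck(toml: string | null): PrecheckResult {
  if (toml === null) {
    return { ok: false, reason: "config_missing", detail: "~/.kimi/config.toml not found" };
  }
  return topLevelDefaultModel(toml)
    ? { ok: true }
    : { ok: false, reason: "config_missing", detail: "no top-level default_model" };
}
```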

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: reviewer fidelity, verdict surfacing, event/prompt isolation

Seven fixes from chorus-issues.md covering the rest of the runner +
MCP-surface issues found while reviewing PR #26 of foresight-app.

#4: thread `repoPath` through reviewer subprocesses. `runReviewers` →
`runReviewer` → `runReviewerHeadless` now accept the chat's repoPath and
the reviewer's cwd switches to it when set, so `gh`, file reads, and
sandboxed CLIs (Gemini) see the actual code instead of running in an
empty per-reviewer scratch dir.

#5: surface reviewer answer.md in MCP responses. New `readReviewerArtifacts`
helper walks `~/.chorus/chats/<id>/round-N/reviewer-*/answer.md`, caps each
at 16 KiB, sorts by (round desc, agent asc), and merges the result into
`wait_for_chat` and `get_chat_status` payloads under `reviews`. Both the
doer and reviewer `participant_done` events now carry `outputPath` so MCP
clients can read the on-disk source of truth when they need more than the
streamed tail.
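The cap-and-sort part of that helper can be sketched without the directory walk. Truncation happens at a UTF-8 byte limit and never splits a multi-byte character, and ordering is round descending, then agent in code-point order. The `ReviewerArtifact` shape, its `truncated` flag, and `orderArtifacts` are hypothetical.

```typescript
import { Buffer } from "node:buffer";

interface ReviewerArtifact {
  round: number;
  agent: string;
  answer: string;
  truncated: boolean; // hypothetical flag so clients know to read outputPath
}

const MAX_ANSWER_BYTES = 16 * 1024;

// Truncates to at most maxBytes of UTF-8 without splitting a multi-byte character.
function capUtf8(text: string, maxBytes: number): { text: string; truncated: boolean } {
  const buf = Buffer.from(text, "utf8");
  if (buf.length <= maxBytes) return { text, truncated: false };
  let end = maxBytes;
  // buf[end] is the first byte dropped; if it is a continuation byte (10xxxxxx),
  // back up so the whole character is dropped instead of half of it.
  while (end > 0 && (buf[end] & 0xc0) === 0x80) end--;
  return { text: buf.subarray(0, end).toString("utf8"), truncated: true };
}

function orderArtifacts(raw: Array<{ round: number; agent: string; answer: string }>): ReviewerArtifact[] {
  return raw
    .map((r) => {
      const { text, truncated } = capUtf8(r.answer, MAX_ANSWER_BYTES);
      return { round: r.round, agent: r.agent, answer: text, truncated };
    })
    // newest round first, then agent name in code-point order
    .sort((a, b) => b.round - a.round || (a.agent < b.agent ? -1 : a.agent > b.agent ? 1 : 0));
}
```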

#6: bump phase_progress output tail from 500 B to 8 KiB. The 500-byte
slice clipped reviewer summaries mid-word; full text remains on disk and
is pointed to by `outputPath`. Affects both reviewer.ts and doer.ts.

#7: tri-review verdict on `max_rounds_exhausted`. When the doer succeeded
every round but reviewers kept saying request_changes through the round
cap, chat_done now emits `status: completed, verdict: request_changes,
reason: max_rounds_exhausted` with the last round's reviewer summary —
previously misclassified as a generic doer failure.

#8: refactor `CreateChatSchema` and `InvokePersonaSchema` to plain
`z.object()` with per-field `.describe()`. The prior `.transform()` wrapped
them in `ZodEffects` which strips the `properties` map from MCP
introspection — clients saw an empty schema. Legacy `template` alias and
the `code-review` default moved into a new `resolveTemplateId()` helper.

#9: dedup `participant_done` at the multiplex layer. Same-slot fallbacks
or parsers that emit `message_done` twice (the opencode parser
historically does this) used to fan duplicate terminal events out to
every subscriber; now keyed by `(phaseIdx, round, role, agent)` and
later duplicates drop silently.
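A sketch of the dedup step as a per-chat predicate that the multiplexer could consult before fanning an event out. The event shape is an assumption built from the four key fields named above, and `createParticipantDoneDeduper` is a hypothetical name.

```typescript
interface ParticipantDoneEvent {
  type: "participant_done";
  phaseIdx: number;
  round: number;
  role: string;
  agent: string;
}

type MultiplexEvent = ParticipantDoneEvent | { type: string };

// One deduper per chat; returns false for a participant_done already fanned out.
function createParticipantDoneDeduper(): (ev: MultiplexEvent) => boolean {
  const seen = new Set<string>();
  return (ev) => {
    if (ev.type !== "participant_done") return true; // non-terminal events pass through
    const done = ev as ParticipantDoneEvent;
    const key = JSON.stringify([done.phaseIdx, done.round, done.role, done.agent]);
    if (seen.has(key)) return false; // later duplicate: drop silently
    seen.add(key);
    return true;
  };
}
```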

#10: per-instance reviewer prompt isolation. Same-lineage instances
(claude-code-2/4/5, etc.) share the chat dir tree at
`~/.chorus/chats/<id>/round-N/reviewer-*/`; tool-using CLIs were
wandering into a sibling's answer.md mid-flight and short-circuiting
("the review is complete" referring to a different agent's work).
`buildReviewerAsk` now stamps an Independence directive when more than
one reviewer slot exists, naming the slot tag and forbidding cross-slot
reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: replay chat_done from persisted verdict, not status

The synthetic chat_done emitted when a terminal chat is re-attached
derived `verdict` from `chat.status`, ignoring the `chat.verdict` column.
Since the previous commit shipped the `max_rounds_exhausted` branch
(chorus-issues.md #7), a chat can finish with `status='approved'
verdict='request_changes'` — replay was clobbering that to `approved`
on every page reload, hiding reviewer disagreement from the user.

Use the persisted column when set; fall back to the old
status-derived value only for pre-v0.8.27 rows where verdict is null.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: surface dropped attached_files + SSE backpressure; harden ship.ts

Three audit follow-ups on the daemon side, all surfacing previously
silent failures.

attached_files: parseAttachedFiles in runner-multiplex.ts used to
swallow JSON parse errors and run the chat with no attachments. Refactor
to a tagged result (`empty` / `ok` / `invalid`); on `invalid` the runner
logs and emits a `cli_warning` SSE so the cockpit + MCP clients see
which chat lost its file list.
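A sketch of the tagged result. It assumes the column holds a JSON array of path strings, and the exact element shape is a guess. With a tagged result, the caller must handle `invalid` explicitly instead of silently running with no attachments.

```typescript
type AttachedFilesResult =
  | { kind: "empty" }
  | { kind: "ok"; files: string[] }
  | { kind: "invalid"; error: string };

// `raw` is the attached_files column value (assumed: JSON array of strings, or null).
function parseAttachedFiles(raw: string | null | undefined): AttachedFilesResult {
  if (raw == null || raw.trim() === "") return { kind: "empty" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    return { kind: "invalid", error: err instanceof Error ? err.message : String(err) };
  }
  if (!Array.isArray(parsed) || !parsed.every((f): f is string => typeof f === "string")) {
    return { kind: "invalid", error: "expected a JSON array of strings" };
  }
  return parsed.length === 0 ? { kind: "empty" } : { kind: "ok", files: parsed };
}
```

On `invalid`, the runner would log the error and emit the `cli_warning`. On `empty`, it proceeds normally.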

SSE backpressure: when a subscriber's queue exceeds the 1000-line cap
the multiplex used to silently drop the connection. Now writes one
`error` frame with code `sse_backpressure` before close, and logs the
queue length to daemon.log so an operator tailing logs can see when
clients fall behind.
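A sketch of the overflow path only. `SseSubscriber` and `SseSink` are hypothetical. Encoding the error as an SSE `event: error` frame with a JSON `data` payload is also an assumption about the wire format. On overflow, the subscriber logs once, writes exactly one frame carrying `sse_backpressure`, and then closes.

```typescript
interface SseSink {
  write(chunk: string): void;
  close(): void;
}

const MAX_QUEUED_LINES = 1000;

// Hypothetical per-subscriber queue; only the overflow behaviour mirrors the description.
class SseSubscriber {
  private queue: string[] = [];
  private closed = false;
  private readonly sink: SseSink;
  private readonly log: (msg: string) => void;

  constructor(sink: SseSink, log: (msg: string) => void) {
    this.sink = sink;
    this.log = log;
  }

  enqueue(line: string): void {
    if (this.closed) return;
    if (this.queue.length >= MAX_QUEUED_LINES) {
      this.log(`sse_backpressure: closing subscriber with ${this.queue.length} queued lines`);
      // One explicit error frame so the client can tell overflow from a normal close.
      this.sink.write(`event: error\ndata: ${JSON.stringify({ code: "sse_backpressure" })}\n\n`);
      this.closed = true;
      this.queue = [];
      this.sink.close();
      return;
    }
    this.queue.push(line);
  }

  // Called when the underlying socket can accept more data.
  flush(): void {
    if (this.closed) return;
    for (const line of this.queue) this.sink.write(line);
    this.queue = [];
  }
}
```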

gh pr create URL validation: ship.ts captured stdout's last line as the
PR URL with no shape check; an empty/malformed stdout produced
`{ok: true, prUrl: ''}` and the chat row recorded "shipped" with an
unclickable link. Now matches against
`^https://github.com/<owner>/<repo>/pull/<n>` before declaring success.
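A sketch of that check. It takes the last non-empty stdout line and reports success only when the line matches the PR URL shape. The owner/repo character classes are a conservative assumption about GitHub names. Anchoring the end with `$` goes slightly beyond the pattern quoted above. `prUrlFromGhOutput` is a hypothetical name.

```typescript
// Owner/repo character classes are a conservative assumption about GitHub names.
const PR_URL_RE = /^https:\/\/github\.com\/[A-Za-z0-9-]+\/[A-Za-z0-9._-]+\/pull\/\d+$/;

type ShipResult = { ok: true; prUrl: string } | { ok: false; error: string };

// Uses the last non-empty line of `gh pr create` stdout and only reports
// success when it looks like a PR URL.
function prUrlFromGhOutput(stdout: string): ShipResult {
  const lines = stdout.split(/\r?\n/).map((l) => l.trim()).filter((l) => l !== "");
  const last = lines[lines.length - 1] ?? "";
  return PR_URL_RE.test(last)
    ? { ok: true, prUrl: last }
    : { ok: false, error: `gh pr create did not print a PR URL (got ${JSON.stringify(last.slice(0, 200))})` };
}
```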

detectGitContext parallelization: the five spawnSync probes (is-repo,
remote, gh --version, gh auth, HEAD) ran sequentially at 60s each —
worst case 360s before runner saw a result. Converted to async with a
new `runAsync` helper, batched via Promise.all with a 15s per-probe
cap; detectDefaultBranch's symref + three branch-existence checks
likewise parallelized. detectGitContext is now async; the lone caller
in runner.ts awaits it.
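A sketch of the parallel probe pattern. Here `runAsync` is a hypothetical stand-in built on `execFile`'s `timeout` option. It resolves instead of rejecting, so one failed probe cannot reject the whole batch. The specific git/gh commands used for each probe are assumptions.

```typescript
import { execFile } from "node:child_process";

interface ProbeResult {
  code: number | null; // exit code, or null when killed or when the binary could not start
  stdout: string;
  timedOut: boolean;
}

// Hypothetical stand-in for runAsync: never rejects, so one failed probe
// cannot reject the whole Promise.all batch.
function runAsync(cmd: string, args: string[], cwd: string, timeoutMs = 15_000): Promise<ProbeResult> {
  return new Promise((resolve) => {
    execFile(cmd, args, { cwd, timeout: timeoutMs, encoding: "utf8" }, (err, stdout) => {
      if (!err) {
        resolve({ code: 0, stdout: stdout.trim(), timedOut: false });
        return;
      }
      resolve({
        code: typeof err.code === "number" ? err.code : null,
        stdout: String(stdout ?? "").trim(),
        timedOut: err.killed === true, // execFile sets killed when it terminates the child (e.g. on timeout)
      });
    });
  });
}

// Probe commands are assumptions about what each check runs.
async function detectGitContextSketch(repo: string) {
  const [isRepo, remote, ghVersion, ghAuth, head] = await Promise.all([
    runAsync("git", ["rev-parse", "--is-inside-work-tree"], repo),
    runAsync("git", ["remote", "get-url", "origin"], repo),
    runAsync("gh", ["--version"], repo),
    runAsync("gh", ["auth", "status"], repo),
    runAsync("git", ["rev-parse", "HEAD"], repo),
  ]);
  // Wall time is bounded by the slowest probe (at most one timeout), not the sum.
  return {
    isRepo: isRepo.code === 0 && isRepo.stdout === "true",
    remoteUrl: remote.code === 0 ? remote.stdout : null,
    ghInstalled: ghVersion.code === 0,
    ghAuthed: ghAuth.code === 0,
    headSha: head.code === 0 ? head.stdout : null,
  };
}
```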

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: bound failure-summary regex; log malformed SSE frames

participant-card.tsx: parseFailureSummary ran the multi-step regex
chain over the full participant.answer string. Reviewer answers can be
up to 256 KB; on every render that's a UI-thread block. Slice to the
first 16 KiB before scanning — the failure-header block is always
written at the top of answer.md by reviewer.ts/doer.ts, so the cap
never loses signal.

live-run-real/index.tsx: the SSE onmessage handler already had a
try/catch around JSON.parse, but the catch was silent — a wire-format
mismatch dropped events with no trace. Add a console.warn with a
preview so devs notice schema drift in DevTools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: github PR ingestion via gh CLI

Adds src/daemon/github-pr.ts: parsePrUrl + fetchPrArtifact run
gh pr view/diff plus review and issue comments in parallel,
synthesize a Markdown artifact (description, comments capped at
50 newest each, diff capped at 200 KB UTF-8 safe), and classify
gh failures into typed reasons.
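A sketch of what `parsePrUrl` might accept. This is an assumption, not the module's actual rules. It takes `https://github.com/<owner>/<repo>/pull/<n>`, tolerates trailing segments such as `/files`, a query string, or a fragment, and returns `null` for anything else. A `null` result would map to the `invalid_url` reason.

```typescript
interface PrRef {
  owner: string;
  repo: string;
  number: number;
}

// Accepts github.com PR URLs with optional trailing path segments, query, or fragment.
function parsePrUrl(input: string): PrRef | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not a URL at all
  }
  if (url.protocol !== "https:" || url.hostname !== "github.com") return null;
  const m = /^\/([A-Za-z0-9-]+)\/([A-Za-z0-9._-]+)\/pull\/(\d+)(?:\/.*)?$/.exec(url.pathname);
  if (!m) return null;
  const number = Number(m[3]);
  if (!Number.isSafeInteger(number) || number <= 0) return null;
  return { owner: m[1], repo: m[2], number };
}
```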

Exports runAsync from ship.ts so the new module can reuse the
existing spawn+timeout helper instead of duplicating it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: extract createChatFromValidatedInputs helper

Pulls the template lookup, artifact validation, chat row + opening
phase event creation, and runner kickoff out of the POST /chats
handler into a reusable helper. POST /chats now only handles its
route-specific concerns (body shape, repoPath canonicalization,
error response shaping).

Sets up reuse from the upcoming POST /chats/from-pr endpoint
without duplicating ~150 lines of validation logic.

No behavior change — same template checks, same artifact rules,
same kickoff path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: POST /chats/from-pr — start a chat from a GitHub PR URL

Accepts { url, templateId, repoPath?, yolo? }, parses the PR URL,
fetches PR meta + diff + existing comments via gh CLI, synthesizes
a Markdown artifact, and creates the chat through the shared
createChatFromValidatedInputs helper.

gh failures map to typed reasons (invalid_url, gh_not_installed,
gh_not_authed, pr_not_found, network_failure, unknown) so the
cockpit can render actionable errors instead of generic 500s.

Adds tests/github-pr.test.ts covering parsePrUrl edge cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: cockpit "GitHub PR" tab on /new

Adds a Free-form / GitHub PR mode toggle on the new-chat page. PR
mode swaps the prompt textarea for a URL input and routes through
the new POST /chats/from-pr endpoint. Validates client-side that
the chosen template is review-only before letting the user submit.

createChatFromPr API client surfaces the daemon's typed PR meta
(owner/repo/number/title/branches) on the response so callers can
display PR context after the chat is created.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: review_pr MCP tool

Exposes POST /chats/from-pr through MCP. Orchestrators (Claude Code,
Codex, Cursor) can now hand chorus a PR URL and get reviewers running
against it without going through the cockpit. Defaults templateId to
review-only so a caller can pass just a URL.

ReviewPrSchema is a plain z.object (not ZodEffects) so MCP clients
can introspect required fields — same hazard documented on
CreateChatSchema and InvokePersonaSchema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: capture multi-identity CLI follow-up idea

Idea note for running chorus against multiple paid accounts on the same
CLI binary (work + personal Claude Code Max, etc.). Filed as follow-up
after audit-presets + quota tiers ship — captures the env-override
mechanism, proposed Identity primitive, and open questions on keychain
CLIs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: schema for audit + orchestrate phases, voice tier, bypass_quota

Adds the foundation for repo-pointed audit-and-orchestrate runs and
the orchestrator's task↔voice tier matching.

Template schema:
- AuditPhase (kind: 'audit') — single reviewer voice + one of five
  preset lenses (de-slopify, monolith-breakdown, code-review,
  engineering-review, architecture-review). Output schema
  (AuditItemSchema, AuditOutputSchema) lives next to the phase shape
  so the structured-output adapter, scheduler, and cockpit checklist
  agree on the contract.
- OrchestratePhase (kind: 'orchestrate') — array of worker voices,
  default branchPrefix `chorus/{chatId}/worker-{idx}` so each worker
  gets isolated git state.
- templateRequiresRepo() helper for the cockpit's repo-picker gate.

Voices:
- Adds tier ('high' | 'medium' | 'low', default 'medium') and
  monthly_budget_usd (nullable) to the row schema, upsert input, and
  update input. Idempotent migrations on existing DBs.

Chats:
- bypass_quota INTEGER NOT NULL DEFAULT 0 — set on PR-review chats so
  the orchestrate scheduler runs every enabled voice at full capacity
  instead of tier-gating.

Runner is stubbed for the new kinds: phase_done emit + continue, so
templates that declare an audit/orchestrate phase before the runner
logic lands don't crash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: structured-output adapter for CLI voices

Wraps an AgentShim's runHeadless with JSON-formatting prompt scaffold
and a one-shot repair loop, returning typed data validated against a
caller-supplied zod schema.

Used by the upcoming audit phase (which needs typed AuditItem[]
instead of free-form prose) and the orchestrate phase (worker
results). Keeps each CLI lineage's existing headless transport — the
adapter just owns the prompt-shape + parse-and-validate dance.

Extraction strategy: prefer direct JSON.parse of finalText; fall back
through fenced-block regex variants to a brace-to-brace slice. On
parse or schema-violation, retry once with a repair prompt that quotes
the validation error. Spawn errors short-circuit (the model never saw
the prompt — repair would just retry the same failure).
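A minimal sketch of that extraction ladder, for readers who want the order spelled out. `extractJson` is the name used later in this log; the fence regex and the exact fallback order below are assumptions, not the shipped adapter. The brace-slice step follows the later Path 4 fix: it tries the `{...}` and `[...]` slices independently and keeps the longer successful parse.

```typescript
// Sketch of the parse ladder described above (hypothetical, not the shipped code).
function tryParse(text: string): { ok: true; value: unknown } | { ok: false } {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

function extractJson(finalText: string): unknown {
  const trimmed = finalText.trim();

  // 1. Direct parse: the model emitted bare JSON as instructed.
  const direct = tryParse(trimmed);
  if (direct.ok) return direct.value;

  // 2. Fenced block: a json-tagged or bare triple-backtick fence (assumed variants).
  const fence = /`{3}(?:json)?[^\S\n]*\n([\s\S]*?)\n?`{3}/i.exec(trimmed);
  if (fence) {
    const fenced = tryParse(fence[1].trim());
    if (fenced.ok) return fenced.value;
  }

  // 3. Brace-to-brace slices: try object and array slices independently and
  //    prefer whichever successful parse spans more text.
  let best: { value: unknown; length: number } | undefined;
  for (const [open, close] of [['{', '}'], ['[', ']']] as const) {
    const start = trimmed.indexOf(open);
    const end = trimmed.lastIndexOf(close);
    if (start === -1 || end <= start) continue;
    const slice = trimmed.slice(start, end + 1);
    const parsed = tryParse(slice);
    if (parsed.ok && (!best || slice.length > best.length)) {
      best = { value: parsed.value, length: slice.length };
    }
  }
  // undefined tells the caller to fire the one-shot repair prompt.
  return best?.value;
}
```

Returning `undefined` rather than throwing keeps the repair decision in the caller, which is where the validation error gets quoted back to the model.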

Tests cover happy path, fenced-block extraction, repair-loop success,
repair-loop exhaustion, schema violation, and spawn error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cockpit): audit-a-repo tab + checklist approval component

/new gets a third tab beside Free-form and GitHub PR. In audit mode
the user picks one of five preset lenses (de-slopify,
monolith-breakdown, code-review, engineering-review,
architecture-review) and supplies an absolute repo path. Submit fires
createChat with templateId=`audit-<preset>` — those built-in
templates land with the audit-phase implementation.

RunChecklist component lives at src/components/run-checklist/. It
takes the AuditItem[] surfaced by the audit phase's blocking event
and renders one row per item with a checkbox, complexity badge,
rationale, and file list. Default state has every item selected; the
user trims, then submits via the parent's onSubmit which JSON-encodes
the selected ids into the existing /chats/:id/resume `answer` field.
Wiring into the live-run UI lands with the audit phase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: PR-review chats bypass quota + tier surface on /voices

PR-review chats automatically set bypass_quota=true so the orchestrate
scheduler ignores voice.tier and runs the full fleet at maximum
capacity — reviews are short, parallel, and the user wants the
strongest opinion possible regardless of model tier.

PUT /voices/:id now accepts tier ('high' | 'medium' | 'low') and
monthly_budget_usd (non-negative or null), so the cockpit fleet page
can label voices by capability for the orchestrate scheduler to route
work against. Tests cover both new fields plus a chat round-trip
asserting bypass_quota defaults to false and persists when set.

* feat: audit phase + 5 presets + audit-* templates

Wires the audit phase end-to-end:
- src/daemon/phases/audit.ts runs the structured-output adapter against
  the chosen preset, persists the parsed AuditItem[] to
  <chatDir>/audit-output.json plus raw model output to
  round-1/audit/output.md, and emits phase_progress with the items.
- src/daemon/runner.ts replaces the audit/orchestrate stub: audit
  invokes runAuditPhase, flips chat status to blocked so the cockpit
  renders the checklist UI, and exits cleanly. Orchestrate keeps the
  no-op stub until step 5 lands.
- 5 preset prompts (de-slopify, monolith-breakdown, code-review,
  engineering-review, architecture-review) frame what each lens looks
  for. The structured-output adapter handles JSON formatting; presets
  describe the audit lens only.
- 5 audit-* templates (one per preset), each a 2-phase audit -> orchestrate
  shape with three default workers. Auto-loaded by seedBuiltinTemplates.
- tests/audit-phase.test.ts covers preset-file presence and the
  audit-* template parse + shape contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: orchestrate phase + audit-resume wiring + tier-aware scheduler

Wires the audit→orchestrate handoff: the cockpit POSTs the user's
trimmed audit checklist to /chats/:id/resume, the resume handler
cross-checks ids against audit-output.json, persists the selection,
flips chat to drafting on the orchestrate phase, and re-fires the
runner. The runner now starts at chat.current_phase_idx so a resumed
chat lands directly on orchestrate.
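A sketch of the id cross-check the resume handler performs, assuming the `answer` field carries a JSON array of AuditItem ids (per the RunChecklist commit above). `validateResumeSelection` and its result shape are hypothetical, and rejecting an empty selection is an assumption.

```typescript
// Hypothetical validation of the resume body's `answer` against audit-output.json ids.
type SelectionResult = { ok: true; ids: string[] } | { ok: false; error: string };

function validateResumeSelection(answer: string, auditItemIds: readonly string[]): SelectionResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(answer);
  } catch {
    return { ok: false, error: 'answer is not valid JSON' };
  }
  if (!Array.isArray(parsed) || !parsed.every((id) => typeof id === 'string')) {
    return { ok: false, error: 'answer must be a JSON array of item ids' };
  }
  // Assumption: an empty selection is rejected rather than treated as "skip orchestrate".
  if (parsed.length === 0) return { ok: false, error: 'select at least one item' };

  const known = new Set(auditItemIds);
  const unknown = (parsed as string[]).filter((id) => !known.has(id));
  if (unknown.length > 0) {
    return { ok: false, error: `unknown item ids: ${unknown.join(', ')}` };
  }
  // De-duplicate while preserving the user's order.
  return { ok: true, ids: [...new Set(parsed as string[])] };
}
```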

The new orchestrate phase walks the approved AuditItem[] sequentially
(parallelism is an explicit non-goal for v1), picks a worker per item
via the pure tier-aware scheduler, cuts a per-item branch, dispatches
the worker via shim.runHeadless, captures git diff --stat, and
persists orchestrate-manifest.json for the diff-apply UI to consume.

The scheduler is a pure function with 9 unit tests covering tier
matching, bypass override, disabled-voice skipping, empty pool, and
unknown voice ids. Resume route has 10 tests exercising body
validation, id cross-check, status gating, and the happy path.
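One way such a pure scheduler could look, covering the cases the tests name (tier match, bypass override, disabled voices, empty pool, unknown ids). The `Voice` shape and `pickWorker` are hypothetical, and falling back to the first eligible worker when no tier matches is an assumption.

```typescript
type Tier = 'high' | 'medium' | 'low';
// Hypothetical subset of a voice row.
interface Voice { id: string; tier: Tier; enabled: boolean }

function pickWorker(
  itemTier: Tier,
  workerIds: readonly string[],
  voices: ReadonlyMap<string, Voice>,
  bypassQuota: boolean,
): Voice | null {
  // Drop unknown ids and disabled voices; the pool keeps template order.
  const pool = workerIds
    .map((id) => voices.get(id))
    .filter((v): v is Voice => v !== undefined && v.enabled);
  if (pool.length === 0) return null;

  // bypass_quota chats ignore tier entirely.
  if (bypassQuota) return pool[0];

  // Prefer an exact tier match; otherwise fall back to the first eligible worker.
  return pool.find((v) => v.tier === itemTier) ?? pool[0];
}
```

Keeping the function free of DB and clock access is what makes the nine-case test table cheap to write.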

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: orchestrate manifest UI + checkout/open-pr daemon routes

- Run page reads audit-output.json + orchestrate-manifest.json on render
- LiveRunReal renders RunChecklist while blocked w/ audit items, then
  swaps to OrchestrateManifest panel once orchestrate completes
- New OrchestrateManifest component shows one row per worker w/
  Checkout / Open PR buttons (per-row inline feedback, no global toast)
- Daemon: GET /chats/:id/audit-items, GET /chats/:id/orchestrate-manifest,
  POST /chats/:id/workers/:idx/checkout (refuses on dirty tree),
  POST /chats/:id/workers/:idx/open-pr (gh pr create, bucketed failures)
- OrchestrateManifestSchema added to template-schema.ts; route + UI
  parse via the same shape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: harden resume race + branch validation + symlink TOCTOU + extractJson

Address /freview findings on the audit + orchestrate flow:

- Resume race (BLOCKER): two concurrent POSTs to /chats/:id/resume could
  both pass the `status=='blocked'` check and double-fire the runner.
  Guard with `getActiveRun` (catches the audit-finishing window before
  `.finally` clears the registry) and replace the status flip with an
  atomic `tryResumeFromBlocked` CAS conditional on `WHERE status =
  'blocked'`.
- Branch-name argument injection (BLOCKER): tighten zod regexes on
  `OrchestratePhase.branchPrefix` and `OrchestrateManifestEntry.branch`
  so values starting with `-` (or containing shell metachars) cannot
  flow into `git checkout` / `gh pr create` as flags.
- Symlink TOCTOU on checkout + open-pr (NON-BLOCKER): re-realpath
  `existing.repo_path` before passing to execFile cwd, mirroring the
  rerun-path pattern. Returns a structured validation error if the
  path no longer resolves.
- extractJson Path 4 (NON-BLOCKER): try `{...}` and `[...]` slices
  independently and prefer the longer parse, so prose like
  "mentions [stuff] before {object}" extracts the object instead of
  the bracket.
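The branch-name guard can be pictured as an allow-list regex. The pattern below is a hypothetical stand-in for the zod regex (which this log does not quote): it must start with an alphanumeric or a `{` placeholder, so a leading `-` can never be read as a flag, and it admits no spaces or shell metacharacters.

```typescript
// Hypothetical allow-list; `{` and `}` permit placeholders like {chatId} and {idx}.
const BRANCH_RE = /^[A-Za-z0-9{][A-Za-z0-9._/{}-]*$/;

function isSafeBranchName(value: string): boolean {
  // git also rejects `..` inside ref names, so refuse it up front.
  return BRANCH_RE.test(value) && !value.includes('..');
}
```

An allow-list is the safer direction here: enumerating forbidden metacharacters invites misses, while this pattern simply has no room for `;`, `$`, backticks, or whitespace.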

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: prod CJS build — drop import.meta + copy presets to dist

Two issues blocked `pnpm build:server`:

- `audit.ts` used `import.meta.url` for module-relative path resolution,
  but the server tsconfig compiles to CJS where `import.meta` is a
  syntax error. Replaced with `__dirname`, which works in both the
  compiled dist (native CJS) and tsx-driven dev (tsx ≥4 shims it in
  ESM mode).
- The `build:server` script copied `schema.sql` to dist/ but missed the
  preset markdown files in `src/daemon/presets/`. The audit phase's
  `loadPresetPrompt` resolves relative to `__dirname`, so a published
  install was hitting ENOENT on every audit run. Extended the copy
  step to mirror the preset directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: fold upstream T1+T2 fixes back into fork (12 commits) (#2)

* feat(cli): add diagnose command + crash-hook

Bundles two upstream changes that ship a self-service triage path for
chorus users hitting opaque failures:

- `chorus diagnose` walks the install, daemon, recent failed chats,
  voice health, and produces a sharable bug report.
- Crash hook captures uncaught exceptions in the CLI and writes them
  to a crash log alongside instructions to attach during a bug report.

Folded back from upstream chorus-codes/chorus:
  7ea712b feat: chorus diagnose command + crash hook for bug reports (#1)
  4a5ea20 fix(diagnose): realpath bin path + filter Next.js SSE noise (#4)

Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(cli): add quickstart self-test command

`chorus quickstart` runs a 30-second activation flow that verifies
the daemon comes up, the SQLite DB initializes, and a minimal chat
round-trips end-to-end. Aimed at first-run users who want to know
"is this thing actually working" before authoring a template.

Folded back from upstream chorus-codes/chorus:
  56610cf feat(cli): chorus quickstart — 30-second activation self-test (#30)

Co-Authored-By: chorus-codes <info@chorus.codes>

* fix(cli): use dynamic import for open package (Node 22 ERR_REQUIRE_ESM)

The `open` package and `chokidar` are both ESM-only as of recent
versions. On Node 22 (the daily-driver target) static `require()`
calls into them throw ERR_REQUIRE_ESM and crash the CLI at boot.

Switch to dynamic import in:
- src/cli/commands/start.ts (open browser after boot)
- src/cli/open-browser.ts (new helper)
- src/cli/index.ts (route open import)
- src/daemon/output-watcher.ts (chokidar file watch)

Includes upstream's post-merge hardening: the setTimeout that triggers
the browser-open no longer wraps an async callback bare, so a missing
default browser doesn't surface as an unhandled rejection.
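The hardening pattern in miniature: the timer callback stays synchronous and attaches its own rejection handler. `scheduleBrowserOpen` is hypothetical, and `opener` stands in for a function that does `await import('open')` and calls it; the sketch injects it so it runs without the package.

```typescript
type Opener = (url: string) => Promise<unknown>;

function scheduleBrowserOpen(
  url: string,
  delayMs: number,
  opener: Opener,
  warn: (msg: string) => void,
): void {
  setTimeout(() => {
    // Not an async callback: the promise's rejection is handled right here, so a
    // missing default browser becomes a warning instead of an unhandledRejection.
    opener(url).catch((err: unknown) => {
      warn(`could not open browser: ${err instanceof Error ? err.message : String(err)}`);
    });
  }, delayMs);
}
```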

Folded back from upstream chorus-codes/chorus:
  e8ca2ee fix(cli): dynamic import for open package (#14)
  dcd1837 fix: post-merge hardening for #14 (start.ts portion only;
          cli-precheck.test.ts portion ships with the Keychain fix)

Co-Authored-By: Julien Deudon <deudon.j@gmail.com>
Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(cockpit): seed empty round-1 so QUEUED renders from t=0

Before: when a chat starts but no reviewer has produced an event yet,
enrichRounds returned an empty rounds array and the live-run page
showed nothing for several seconds — the user couldn't tell whether
their chat had launched.

After: seed a synthetic round-1 with QUEUED placeholders for every
expected participant so the page renders the per-reviewer cards
immediately. Real events overwrite placeholders as they arrive.
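A sketch of the seeding step with hypothetical card and round shapes: every expected participant gets a QUEUED card, any real event for that participant wins, and unexpected real participants are kept rather than dropped (that last rule is an assumption).

```typescript
type CardStatus = 'QUEUED' | 'RUNNING' | 'DONE' | 'FAILED';
interface ParticipantCard { participantId: string; status: CardStatus }
interface Round { index: number; participants: ParticipantCard[] }

function seedRoundOne(rounds: readonly Round[], expected: readonly string[]): Round[] {
  const existing = rounds.find((r) => r.index === 1)?.participants ?? [];
  const byId = new Map(existing.map((p) => [p.participantId, p] as const));

  // Real events overwrite placeholders; missing participants render as QUEUED.
  const seeded = expected.map(
    (id): ParticipantCard => byId.get(id) ?? { participantId: id, status: 'QUEUED' },
  );
  const extras = existing.filter((p) => !expected.includes(p.participantId));
  const laterRounds = rounds.filter((r) => r.index !== 1);
  return [{ index: 1, participants: [...seeded, ...extras] }, ...laterRounds];
}
```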

Folded back from upstream chorus-codes/chorus:
  53e8fb6 feat(cockpit): seed empty round-1 so QUEUED placeholders
          render from t=0 (#2)

Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(daemon): runtime fallback-collision dedup across reviewer slots

When two reviewer slots both fall through their per-slot chains to the
same template-level fallback target (common case: every slot ends in
anthropic/claude-sonnet-4-6), both used to dispatch the same (lineage,
model) in parallel — wasted cost and the lineage diversity that's the
point of multi-LLM peer review collapsed.

Build-time dedup (template-fallback.ts) couldn't catch it because each
slot only knows about other slots' PRIMARIES, not their fallback chains.

Fix: new per-chat/per-round (lineage, model) registry. reviewer-driver
calls tryClaim before each chain attempt and releases the claim in a
finally block. On collision, it returns null and emits
cli_warning(reason='fallback_collision')
so runWithChainFallback advances to the next entry and the cockpit can
show why the slot skipped.
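A minimal in-memory version of that claim/release protocol. `ModelClaimRegistry` and `runChainEntry` are hypothetical names; only `tryClaim`, the `finally` release, and the `fallback_collision` reason come from the text above.

```typescript
interface ClaimKey { chatId: string; round: number; lineage: string; model: string }

class ModelClaimRegistry {
  private readonly claims = new Set<string>();

  private static key(k: ClaimKey): string {
    return JSON.stringify([k.chatId, k.round, k.lineage, k.model]);
  }

  /** False when another slot in the same chat/round already holds (lineage, model). */
  tryClaim(k: ClaimKey): boolean {
    const key = ModelClaimRegistry.key(k);
    if (this.claims.has(key)) return false;
    this.claims.add(key);
    return true;
  }

  release(k: ClaimKey): void {
    this.claims.delete(ModelClaimRegistry.key(k));
  }
}

async function runChainEntry<T>(
  registry: ModelClaimRegistry,
  k: ClaimKey,
  dispatch: () => Promise<T>,
  warn: (reason: 'fallback_collision') => void,
): Promise<T | null> {
  if (!registry.tryClaim(k)) {
    warn('fallback_collision');
    return null; // null makes the chain walker advance to the next entry
  }
  try {
    return await dispatch();
  } finally {
    registry.release(k); // released even when dispatch throws
  }
}
```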

Ported into fork's reviewer-driver.ts surgically so the verdict-isolation
refactor (2a2cde2) and per-slot repoPath threading stay intact.

Folded back from upstream chorus-codes/chorus:
  c4751fe feat(daemon): runtime fallback-collision dedup (#3)

Co-Authored-By: chorus-codes <info@chorus.codes>

* fix(daemon): write REVIEWER FAILED summary on pre-spawn failure

Before: when a reviewer's precheck failed (e.g. the underlying CLI was
not installed) or the chat was cancelled while the reviewer was queued
for a CLI semaphore slot, runReviewer returned null silently,
leaving NO on-disk participant directory. The cockpit's enrich-rounds
loop then couldn't reconcile the synthesised template slot against
any real participant, so the card sat at "Queued — waiting for an
open slot." forever and the actual error was invisible.

Reproduction: install chorus on a host with only one CLI on PATH
(e.g. just claude-code), open a template that includes lineages
requiring codex/gemini/kimi, fire it. Every reviewer card stayed
"Queued"; the chat never visibly progressed even though it had already
failed.

Fix:
- Create the reviewer dir BEFORE the precheck runs.
- Add a writePreSpawnFailure helper that writes a `## REVIEWER FAILED`
  summary in the canonical format (Kind / Lineage / Model / message)
  that the cockpit's `parseFailureSummary` already understands.
- Wire it into the precheck-failed and cancelled-while-queued paths.

Card now transitions out of pending and shows the actual error
(cli_missing, cancelled, ...).
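A sketch of the pre-spawn failure writer. The heading and the Kind / Lineage / Model / message fields come from this commit; the exact `Key: value` line layout and the `output.md` file name are assumptions, since the format `parseFailureSummary` expects is not shown here.

```typescript
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

interface PreSpawnFailure { kind: string; lineage: string; model: string; message: string }

function renderPreSpawnFailure(f: PreSpawnFailure): string {
  // Assumed layout; must match whatever parseFailureSummary actually parses.
  return ['## REVIEWER FAILED', '', `Kind: ${f.kind}`, `Lineage: ${f.lineage}`, `Model: ${f.model}`, '', f.message, ''].join('\n');
}

function writePreSpawnFailure(reviewerDir: string, f: PreSpawnFailure, fileName = 'output.md'): string {
  // The directory may not exist yet: no CLI ever spawned for this slot.
  mkdirSync(reviewerDir, { recursive: true });
  const path = join(reviewerDir, fileName);
  writeFileSync(path, renderPreSpawnFailure(f), 'utf8');
  return path;
}
```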

Folded back from upstream chorus-codes/chorus:
  afc59cc fix(daemon): REVIEWER FAILED summary on pre-spawn failure (#26)

Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(voices): auto-disable on persistent quota_exhausted + lsof timeout

Real pain (upstream #11): a Pro Gemini model on a Flash-only account
fails every chorus run with "exhausted your capacity on this model"
— but Gemini doesn't return a resetAt because the model isn't going
to become available for that account. Without auto-disable, the
runner keeps picking the dead voice on every chat and the user keeps
seeing the same opaque error.

Voice auto-disable:
- New src/lib/voice-failure-tracker.ts records per-voice consecutive
  quota_exhausted strikes in a settings counter.
- Trigger: 2 consecutive strikes WITH no resetAt → set
  voices.enabled=false + disabled_reason='auto_quota'.
- Counter resets on participant_done success; rate-limit strikes
  (hasResetAt=true) bypass the counter entirely so a transient
  429 + a later permanent failure can't trip the threshold on the
  first permanent strike.
- Wired into reviewer-driver alongside recordHealth; emits a
  cli_warning(reason='voice_auto_disabled') so the cockpit can show
  a one-line explanation.
- VoiceDisabledReason union gains 'auto_quota' (schema column was
  already TEXT — no migration).
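The strike rules above, reduced to an in-memory sketch. `VoiceFailureTracker` here is hypothetical (the real counter lives in settings); on `'disable'` the caller would set `enabled=false` and `disabled_reason='auto_quota'`.

```typescript
const AUTO_DISABLE_THRESHOLD = 2; // consecutive strikes without a resetAt

type StrikeDecision = 'ignore' | 'count' | 'disable';

class VoiceFailureTracker {
  private readonly strikes = new Map<string, number>();

  recordQuotaExhausted(voiceId: string, opts: { hasResetAt: boolean }): StrikeDecision {
    // Rate limits that advertise a resetAt are transient: never touch the counter.
    if (opts.hasResetAt) return 'ignore';
    const n = (this.strikes.get(voiceId) ?? 0) + 1;
    this.strikes.set(voiceId, n);
    return n >= AUTO_DISABLE_THRESHOLD ? 'disable' : 'count';
  }

  /** participant_done success resets the streak. */
  recordSuccess(voiceId: string): void {
    this.strikes.delete(voiceId);
  }
}
```

Because the 429 path returns before incrementing, "transient 429, then one permanent failure" lands at one strike, not two.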

Lsof timeout (upstream #12):
- findPidsOnPort and findPidsOnPortWithSudo now bound execSync /
  execFileSync to 3s, so a slow-but-functional lsof on a loaded
  macOS box doesn't hang chorus boot. 3s leaves headroom while
  still bounding the hang case.
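A bounded lsof probe might look like this. `findPidsOnPort` is named above but its body is not shown; this sketch uses `execFileSync` with Node's `timeout` option, and treating "no match", "lsof missing", and "timed out" alike as no PIDs is an assumption about what the caller wants.

```typescript
import { execFileSync } from 'node:child_process';

const LSOF_TIMEOUT_MS = 3_000;

function findPidsOnPort(port: number): number[] {
  try {
    // -t prints PIDs only; -i:PORT selects sockets on that port.
    const out = execFileSync('lsof', ['-t', `-i:${port}`], {
      encoding: 'utf8',
      timeout: LSOF_TIMEOUT_MS, // lsof is killed after 3s and execFileSync throws
      stdio: ['ignore', 'pipe', 'ignore'],
    });
    return out
      .split('\n')
      .map((line) => Number.parseInt(line.trim(), 10))
      .filter((pid) => Number.isInteger(pid) && pid > 0);
  } catch {
    // lsof exits 1 when nothing matches; also covers ENOENT and the timeout.
    return [];
  }
}
```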

Ported into fork's reviewer-driver.ts tmux pollHandle + success
path. voices.ts disabled_reason union extended alongside fork's
voice-tier column.

Folded back from upstream chorus-codes/chorus:
  4f6becc v0.8.30 — voice auto-disable (#11) + lsof timeout (#12) (#17)

Co-Authored-By: chorus-codes <info@chorus.codes>
Co-Authored-By: Lumina Mao <luminamao@mac.lan>

* fix(daemon, schema): codex isolation + template-schema validation

Two issues caused chats to fail opaquely at run-start:

CODEX ISOLATION (#10, #16)
The user's ~/.codex/config.toml may declare MCP servers, plugins, or
notification hooks. In headless `codex exec` those integrations have
caused codex to hang or cancel mid-call — two independent
reproductions: codex as our reviewer (#10) and codex as MCP client of
chorus (#16). Add --ignore-user-config to every headless codex argv.
Extracted to a pure `buildHeadlessArgs(opts)` so the argv shape is
unit-testable.
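What a pure argv builder buys: the isolation flag becomes a one-line assertion instead of a spawn test. The `exec` subcommand and `--ignore-user-config` come from this commit; the `HeadlessOpts` shape and the `--model` flag are assumptions for illustration.

```typescript
// Hypothetical options; only `exec` and `--ignore-user-config` are from the text.
interface HeadlessOpts { prompt: string; model?: string }

function buildHeadlessArgs(opts: HeadlessOpts): string[] {
  // Always skip ~/.codex/config.toml so user MCP servers, plugins, and notify
  // hooks cannot hang a headless run.
  const args = ['exec', '--ignore-user-config'];
  if (opts.model) args.push('--model', opts.model); // assumed flag name
  args.push(opts.prompt);
  return args;
}
```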

TEMPLATE VALIDATION (#15)
`reviewer.require > candidates.length` used to surface as "Job moves
immediately to failure upon Start press" — the runner queued, failed
to grant enough slots, and emitted an opaque chat-failure. Same for
`require > distinct lineages` when crossLineage:true. Both now
caught at TemplateSchema.parse() time with a clear error message
the user can fix before the run starts.
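The two checks, written as a plain function over a hypothetical reviewer-rule shape. In the schema they would sit inside `ReviewerSchema.superRefine()` as `ctx.addIssue` calls; this sketch just returns messages so it runs without zod.

```typescript
// Hypothetical shape; the real ReviewerSchema has more fields.
interface ReviewerRule {
  require: number;
  crossLineage?: boolean;
  candidates: { lineage: string; model: string }[];
}

function reviewerRuleIssues(rule: ReviewerRule): string[] {
  const issues: string[] = [];
  if (rule.require > rule.candidates.length) {
    issues.push(`require (${rule.require}) exceeds candidates (${rule.candidates.length})`);
  }
  if (rule.crossLineage) {
    const lineages = new Set(rule.candidates.map((c) => c.lineage));
    if (rule.require > lineages.size) {
      issues.push(`require (${rule.require}) exceeds distinct lineages (${lineages.size}) with crossLineage: true`);
    }
  }
  return issues;
}
```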

ReviewerSchema.superRefine() additions slot in cleanly alongside the
fork's audit/orchestrate phase schema work — both are additive
constraints on the same ReviewerSchema object.

Folded back from upstream chorus-codes/chorus:
  8ed970b fix(daemon, schema): codex isolation + template validation

Co-Authored-By: chorus-codes <info@chorus.codes>

* fix(runner): honour iterate.onDisagreement accept-doer/escalate

The template schema, cockpit dialog, and SPEC-D-templates have always
exposed three values for iterate.onDisagreement — 'continue', 'escalate',
'accept-doer' — but the runner only honoured 'continue'. Picking the
other two from the cockpit form was a silent no-op: chats fell through
to phase_failed with 'doer_failed_all_rounds' regardless.

This wires both new branches into the round loop and the terminal
chat_done emission:

- 'accept-doer': after maxRounds without consensus, mark doerSucceeded
  and continue. The chat carries on (subsequent phases, ship, approval)
  as if reviewers had agreed on the doer's last answer.
- 'escalate': halt with status='failed' but verdict='request_changes'
  and error='escalated_on_disagreement', so cockpits can render
  "reviewers disagreed, needs human" distinctly from "doer broke."

Policy table extracted into a pure decidePhaseOutcome() helper so the
3 × 2 input matrix (policy × disagreement-in-last-round) is unit-tested
without standing up the full runChat scaffold.

Gated on disagreementInLastRound (reset at top of every round + on
doer-crash path) so a partial / empty doer answer can never be silently
"accept-doer"'d as final. Preserves the fork's existing
standardPhaseRoundsExhausted #7 surfacing for the 'continue' path; the
'escalate' path takes precedence with its own distinct chat_done.
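A sketch of the 3 × 2 table as we read it from the two paragraphs above. The outcome shape is hypothetical; treating every policy as a plain failure when there was no genuine last-round disagreement is our reading of the gating rule, not confirmed code.

```typescript
type OnDisagreement = 'continue' | 'escalate' | 'accept-doer';

type PhaseOutcome =
  | { kind: 'accept' } // carry on as if reviewers agreed on the doer's last answer
  | { kind: 'escalate'; status: 'failed'; verdict: 'request_changes'; error: 'escalated_on_disagreement' }
  | { kind: 'fail'; error: 'doer_failed_all_rounds' };

// Called only after maxRounds ended without consensus.
function decidePhaseOutcome(policy: OnDisagreement, disagreementInLastRound: boolean): PhaseOutcome {
  // No genuine disagreement on a real doer answer (e.g. the doer crashed):
  // never accept or escalate a partial or empty answer.
  if (!disagreementInLastRound) return { kind: 'fail', error: 'doer_failed_all_rounds' };
  switch (policy) {
    case 'accept-doer':
      return { kind: 'accept' };
    case 'escalate':
      return { kind: 'escalate', status: 'failed', verdict: 'request_changes', error: 'escalated_on_disagreement' };
    case 'continue':
      return { kind: 'fail', error: 'doer_failed_all_rounds' };
  }
}
```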

Upstream PRs #49, #50 (commit 67572e9).

Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(cli-precheck): cover macOS Keychain fallback for Claude Code v2

The fork already implements the Keychain fallback in cli-precheck
(hasDarwinKeychainEntry). This adds the missing test coverage:

- passes when no cred file but keychain entry exists
- blocks when no cred file and no keychain entry
- skips keychain check when cred file exists (fast-path preserved)
- does not consult keychain for non-anthropic lineages

vi.mock('node:child_process') uses the importOriginal spread pattern so
spawn / exec / etc. keep their real implementations — a bare module
replacement would silently break any sibling test that imports from
child_process.

Upstream PRs #7, #8, plus the dcd1837 test-mock hardening.

Co-Authored-By: Yura <yurahalych@gmail.com>
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cockpit): derive candidatesWithModels from snapshot's candidates field

Daemon-side TemplateSchema only carries `candidates` on each ReviewerRule.
The cockpit Template type expects `candidatesWithModels` populated —
enrich-rounds iterates that field to build slot→model mappings for
run-page cards. When fromRow parsed template_snapshot and cast it to
Template, the cast was a TypeScript lie: at runtime the parsed object
lacked candidatesWithModels, enrichRounds iterated zero reviewer slots,
and no model name reached the cards (badge appeared empty).

Derive candidatesWithModels at the parse seam (chats.fromRow) so the
cockpit's Template contract is honoured regardless of which path
produced the data. Idempotent — if a future daemon ever serialises
the field directly, that wins. Persona forwarded if present. Audit-
phase single-voice reviewers (no candidates array) are skipped via a
runtime narrow.
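A sketch of the parse-seam derivation with hypothetical snapshot shapes: reviewers without a `candidates` array (audit-phase single-voice) pass through, and an already-serialised `candidatesWithModels` wins, which keeps the step idempotent.

```typescript
// Hypothetical shapes; the real template snapshot carries more fields.
interface Candidate { lineage: string; model: string; persona?: string }
interface SnapshotReviewer { candidates?: Candidate[]; candidatesWithModels?: Candidate[]; voice?: string }
interface SnapshotPhase { reviewer?: SnapshotReviewer }

function deriveCandidatesWithModels(phases: SnapshotPhase[]): SnapshotPhase[] {
  return phases.map((phase) => {
    const r = phase.reviewer;
    // Runtime narrow: audit-phase reviewers have no candidates array.
    if (!r || !Array.isArray(r.candidates)) return phase;
    // Idempotent: a serialised field always wins.
    if (r.candidatesWithModels) return phase;
    const candidatesWithModels = r.candidates.map((c) => ({
      lineage: c.lineage,
      model: c.model,
      ...(c.persona !== undefined ? { persona: c.persona } : {}), // forward persona if present
    }));
    return { ...phase, reviewer: { ...r, candidatesWithModels } };
  });
}
```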

Upstream PR #6 (chorus-codes/chorus@ac0c7fd).

Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(diagnose): capture failure context — CLI smoke, voice health, recent failed chats

Extends `chorus diagnose` with three signals that triage the most common
breakage modes:

- **CLI smoke**: spawn `<bin> --version` per detected CLI with a hard 2s
  SIGKILL timeout (wrapper scripts may trap SIGTERM). Distinguishes
  `timedOut` from non-zero exit so the report can tell hangs apart from
  crashes.
- **Voice health**: counts `enabled=0` voices grouped by `disabled_reason`
  ('user' vs 'auto_missing' vs 'auto_quota'). Added
  `idx_voices_enabled` so the `WHERE enabled = 0` scan stays cheap as
  the table grows.
- **Recent failed chats**: last 5 chats in a failure status ('failed',
  'blocked', or 'cancelled') plus the errored participants pulled from
  `~/.chorus/chats/<id>/round-*/<part>/_attempts.jsonl`.
  Only `errorMessageBytes` is exposed — raw error text never leaves the
  user's machine. `$HOME` is redacted from any embedded path strings via
  `redactHomePaths`.
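A sketch of the CLI smoke probe and the home-path redaction. `smokeOneCli` and `redactHomePaths` are named above but their bodies are not shown; the `spawnSync` options are standard Node, while the result shape and the literal `$HOME` replacement token are assumptions.

```typescript
import { spawnSync } from 'node:child_process';
import { homedir } from 'node:os';

interface SmokeResult { ok: boolean; timedOut: boolean; exitCode: number | null }

function smokeOneCli(bin: string, timeoutMs = 2_000): SmokeResult {
  const r = spawnSync(bin, ['--version'], {
    timeout: timeoutMs,
    killSignal: 'SIGKILL', // wrapper scripts may trap SIGTERM
    stdio: 'ignore',
  });
  // On timeout, spawnSync reports an error whose code is ETIMEDOUT.
  const timedOut = (r.error as { code?: string } | undefined)?.code === 'ETIMEDOUT';
  return { ok: !r.error && r.status === 0, timedOut, exitCode: r.status };
}

function redactHomePaths(text: string, home: string = homedir()): string {
  if (!home) return text;
  // split/join avoids building a RegExp from a path that may contain metacharacters.
  return text.split(home).join('$HOME');
}
```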

Adapted from upstream chorus-codes/chorus#19 (0666dca). Preserves the
fork's existing diagnose shape and adds tests for smokeOneCli /
readLatestAttempt / formatReport rendering of the three new sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(diagnose): include no_review in recent failed chats query

The recent-failed-chats section was meant to surface per-participant
failure context from `_attempts.jsonl`, but the WHERE clause only
covered 'failed', 'blocked', 'cancelled'. The most common failure
shape — every reviewer down for missing CLI / auth / quota — ends the
chat in 'no_review', which was being silently filtered out. So the
exact case the section exists to diagnose returned an empty list,
forcing users back into manual log collection.

Adds 'no_review' to the IN-list and a regression test that asserts
both the status and a quota_exhausted errorKind render in the report.

Addresses chatgpt-codex review P2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: chorus-codes <info@chorus.codes>
Co-authored-by: Julien Deudon <deudon.j@gmail.com>
Co-authored-by: Lumina Mao <luminamao@mac.lan>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Yura <yurahalych@gmail.com>

* feat: fold upstream Grok + Local LLM + Keychain dual-probe (4 commits) (#3)

* feat(grok): detect Grok Build (xAI) + Level 1 orchestrator

Adds Grok Build CLI to detection, onboarding picker, /connect card,
diagnose smoke, init listing, and doctor labels. Grok auto-picks
chorus MCP from ~/.claude.json (verified empirically via `grok
inspect`) — no separate MCP wire needed.

The grok orchestrator reports connected=true when both the binary is
detected AND chorus is wired in ~/.claude.json (either top-level
mcpServers or any project-scoped mcpServers entry). connect() is a
no-op that points users at `chorus connect claude` if claude hasn't
been wired yet.
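The connected check, sketched as a pure function over parsed `~/.claude.json`. The `mcpServers` / `projects[path].mcpServers` layout is our understanding of Claude Code's config file, and the server name `chorus` is an assumption.

```typescript
// Assumed subset of ~/.claude.json.
interface ClaudeJson {
  mcpServers?: Record<string, unknown>;
  projects?: Record<string, { mcpServers?: Record<string, unknown> }>;
}

function chorusWiredInClaudeJson(config: ClaudeJson, serverName = 'chorus'): boolean {
  // Top-level (user-scoped) registration.
  if (config.mcpServers && serverName in config.mcpServers) return true;
  // Any project-scoped registration also counts.
  return Object.values(config.projects ?? {}).some(
    (p) => p.mcpServers !== undefined && serverName in p.mcpServers,
  );
}

function grokConnected(binaryDetected: boolean, config: ClaudeJson): boolean {
  return binaryDetected && chorusWiredInClaudeJson(config);
}
```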

Quickstart filters CLIs to those with shims, so grok-cli being
detected first no longer breaks the doer-pick flow. The cliToLineage
map remains the source of truth for reviewer-capable CLIs.

`docs/integrating-a-new-cli.md` captures the full Level 1/2/3
integration playbook for future CLIs — written while doing this so
the steps are tested.

Adapted from upstream chorus-codes/chorus#44 (6a00b00). No conflicts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(local): add Local LLM HTTP shim for OpenAI-compatible endpoints

Adds a `local` lineage that dispatches chat completions to any
OpenAI-compatible HTTP endpoint (Ollama, llama-swap, LM Studio, vLLM,
or anything that speaks `/v1/chat/completions`). No external
subscription or CLI binary required — only a running local inference
server.

Configuration: save a JSON secret under key `local` via Settings →
Local LLM:
  {"base_url": "http://127.0.0.1:11434/v1", "api_key": ""}

Model ids may use a `local:` prefix (e.g. `local:llama3`) which the
shim strips before dispatch, or bare model names directly. When no
secret is saved, falls back to Ollama's default port.
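A sketch of the secret handling and model-id normalisation. Function names are hypothetical; `DEFAULT_BASE` mirrors the example secret above. The empty/whitespace and trailing-slash handling follows the CodeRabbit fix later in this log.

```typescript
const DEFAULT_BASE = 'http://127.0.0.1:11434/v1'; // Ollama's default port

interface LocalConfig { baseUrl: string; apiKey: string }
type ConfigResult =
  | { ok: true; config: LocalConfig }
  | { ok: false; error: 'config_parse'; message: string };

function resolveLocalConfig(secret: string | null): ConfigResult {
  if (secret === null) return { ok: true, config: { baseUrl: DEFAULT_BASE, apiKey: '' } };
  let raw: unknown;
  try {
    raw = JSON.parse(secret);
  } catch (err) {
    // Typed error event instead of a throw inside the generator.
    return { ok: false, error: 'config_parse', message: (err as Error).message };
  }
  const obj = (raw ?? {}) as { base_url?: unknown; api_key?: unknown };
  const base = typeof obj.base_url === 'string' ? obj.base_url.trim() : '';
  return {
    ok: true,
    config: {
      // Empty or whitespace-only counts as unset; strip trailing slashes.
      baseUrl: (base || DEFAULT_BASE).replace(/\/+$/, ''),
      apiKey: typeof obj.api_key === 'string' ? obj.api_key : '',
    },
  };
}

function stripLocalPrefix(modelId: string): string {
  return modelId.startsWith('local:') ? modelId.slice('local:'.length) : modelId;
}
```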

Wiring sweep (extends every exhaustive enum / Record so templates
can declare local voices without Zod errors):
- src/daemon/agents/local.ts — new HTTP shim with JSON.parse guard
  on the secret (yields a typed `config_parse` error event for
  malformed secrets instead of throwing inside the generator)
- src/daemon/agents/index.ts — register localShim, `local:` prefix
  routing in pickShimForVoice, add to isHttpDispatchedShim
- src/daemon/agents/types.ts — `local` in Lineage
- src/lib/template-schema.ts — `local` in both lineageEnum +
  reviewerLineageEnum
- src/lib/cli-health.ts — `local` in CliLineage + ALL_LINEAGES
- src/lib/cli-precheck.ts — empty CRED_PATHS, LOGIN_HINT, skip the
  file probe (same pattern as openrouter — auth lives in secrets table)
- src/lib/cockpit-types.ts — `local` in ReviewerLineage
- src/lib/lineage-maps.ts — `local` in DaemonLineage, UILineage,
  every label/dot/brand map; UI_LINEAGE_DEFAULT_MODEL[local] = ""
  (model IDs are endpoint-specific). Teal dot distinguishes local
  from openrouter's cyan
- src/components/phase-editor/constants.ts — LINEAGES list,
  DAEMON_TO_COCKPIT_LINEAGE
- src/components/template-dialog/constants.ts — COCKPIT_TO_DAEMON,
  DAEMON_TO_COCKPIT, DAEMON_DEFAULT_MODEL, FALLBACK_LINEAGES

Adapted from upstream chorus-codes/chorus#41 (716fa3a). The bundled
upstream commit also included Keychain dual-probe (#38) and
fallback-registry hold-on-success (#42) — those land in follow-up
commits in this PR so each concern is reviewable independently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg <7xshadowx7@gmail.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* feat(grok): Level 3 shim — full reviewer dispatch (happy-path unverified)

Promotes Grok Build from Level 2 (consumer-only) to Level 3 (full
reviewer shim). Chorus can now dispatch to grok-build as a doer or
reviewer in any template.

What's verified (empirically):
- Detection, headless-mode invocation pattern (`grok -p ...
  --output-format streaming-json --yolo --max-turns 1`), error event
  shape, exit-code semantics
- Failure path: free-tier auth produces clean quota_exhausted
  (SuperGrok Heavy subscription required) → voice auto-disables after
  N strikes
- All UI surfaces (model boxes, template-editor lineage picker,
  run-page participant card, cli-status-panel, onboarding picker,
  connect orchestrator)

What's specced but not run live (needs SuperGrok Heavy):
- Happy-path streaming-json text/end event parsing (followed
  `~/.grok/docs/user-guide/13-headless-mode.md` spec)
- Token/cost accounting — Grok doesn't surface usage in end event;
  estimateCostUsd returns 0

New files:
- src/daemon/agents/grok.ts — shim with `--max-turns 1` headless args
- src/daemon/agents/parsers/grok.ts — streaming-json + stderr parser
- tests/grok-parser.test.ts — 18 cases covering happy / error /
  robustness

Lineage sweep (xai daemon lineage was already a legacy alias to
opencode — uses fresh `grok` daemon lineage to avoid colliding with
that mapping; old YAML with `lineage:xai` still routes to opencode):
- Lineage / CliLineage / ReviewerLineage / DaemonLineage / UILineage
- LINEAGE_LABEL / LINEAGE_DOT / UI_LINEAGE_* / UI_LINEAGE_BRAND
- UI_LINEAGE_AVAILABLE_MODELS.grok = ['grok-build']
- UI_LINEAGE_DEFAULT_MODEL.grok = 'grok-build'
- template-schema lineageEnum + reviewerLineageEnum
- DB voices row schema (additive — old rows still validate)
- phase-editor LINEAGES + DAEMON_TO_COCKPIT_LINEAGE
- template-dialog COCKPIT_TO_DAEMON + DAEMON_TO_COCKPIT +
  DAEMON_DEFAULT_MODEL + FALLBACK_LINEAGES
- cli-status-panel + live-run-real helpers
- error-detector auth-prompt regex (SuperGrok signature on its own
  branch ABOVE the generic auth regex — classifies to
  quota_exhausted, not auth_invalid)

Voice seeding: grok-cli registered in SINGLE_MODEL_CLIS — auto-
creates the grok-cli voice (id=grok-cli, lineage=grok,
model_id=grok-build) on first daemon boot when the binary is
detected.

Auth flow: ~/.grok/auth.json file probe OR GROK_CODE_XAI_API_KEY env
short-circuit. Both verified in tests/cli-precheck.test.ts. Daemon
won't spawn grok without one or the other present — prevents the
browser-OAuth flow from hanging headless dispatch.

Total tests: 821 → 842 (+21).

Adapted from upstream chorus-codes/chorus#46 (f9dfba5). Conflicts
resolved by taking the union of fork's `local`-extended enums and
upstream's `grok`-extended enums (every Record / z.enum had to be
extended in both dimensions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* fix(cli-precheck): macOS Keychain dual-probe — also check "Claude Code" service

Claude Code v2.x stores OAuth credentials under two service names depending
on the auth flow:
  - `Claude Code-credentials` — Pro/Max OAuth via `claude login`
  - `Claude Code` (no suffix) — API-key auth + some Console-account flows

The previous single-service probe regressed to auth_missing for API-key
users on darwin. Refactor hasDarwinKeychainEntry to accept string | string[],
iterate candidates, short-circuit on first match. Each probe stays bounded
to 1.5s so a misconfigured keychain can't stall every spawn.
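A sketch of the dual-probe using the macOS `security` tool, which exits 0 when a matching generic-password item exists. The signature (`string | string[]`, short-circuit, 1.5s bound) follows the text; the non-darwin early return is an assumption.

```typescript
import { spawnSync } from 'node:child_process';

const KEYCHAIN_PROBE_TIMEOUT_MS = 1_500;

function hasDarwinKeychainEntry(service: string | string[]): boolean {
  if (process.platform !== 'darwin') return false;
  const services = Array.isArray(service) ? service : [service];
  for (const s of services) {
    // Each probe is bounded so a misconfigured keychain cannot stall a spawn.
    const r = spawnSync('security', ['find-generic-password', '-s', s], {
      stdio: 'ignore',
      timeout: KEYCHAIN_PROBE_TIMEOUT_MS,
    });
    if (!r.error && r.status === 0) return true; // short-circuit on first match
  }
  return false;
}

// Usage: check the OAuth service name first, then the bare one.
// hasDarwinKeychainEntry(['Claude Code-credentials', 'Claude Code']);
```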

Refs upstream issue #38 / commit 716fa3a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: PR review — local in voices enum, AGENT_TO_LINEAGE for grok/local, separate cred-precheck vs semaphore bypass

Addresses bot review on PR #3:

- Sourcery P2 (src/lib/db/voices.ts): VoiceRowSchema and VoiceUpsertInput
  only allowed `grok` in the new-lineage slot; `local` voices upserted
  via the (future) Local LLM connect flow would have failed Zod
  validation at runtime. Add `local` to both the enum and the union.

- Codex P2 (src/app/api/run-artifacts/[chatId]/route.ts +
  src/app/runs/[runId]/page.tsx): AGENT_TO_LINEAGE did not map
  `grok-cli` → `grok` nor `local` → `local`, so a real Grok or Local
  participant directory (`reviewer-grok-cli-N`, `reviewer-local-N`)
  resolved to a bogus lineage and rendered as an unbranded extra card
  while the placeholder slot stayed pending.

- Codex P2 (src/daemon/agents/index.ts +
  src/daemon/runner/{doer,reviewer}-driver.ts +
  src/lib/settings/concurrency.ts): the daemon used a single predicate
  `isHttpDispatchedShim` for two unrelated decisions — bypassing the
  CLI-credential precheck AND bypassing the local-CLI semaphore. That
  was safe for OpenRouter (truly remote) but wrong for the Local LLM
  shim, whose default endpoint is Ollama on 127.0.0.1: N concurrent
  reviewers + a doer can thrash VRAM/RAM on consumer hardware. Split
  into `isHttpDispatchedShim` (kept for cred-precheck bypass) and
  `bypassesLocalCliSemaphore` (only openrouter). Add `grok-cli` and
  `local` to CLI_LINEAGES with conservative per-CLI defaults (grok-cli
  matches gemini at 2; local defaults to 1, bump in /settings if your
  endpoint multiplexes).
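The split predicates in miniature, over hypothetical shim ids. The point is that the two decisions now disagree on `local`: it skips the credential-file probe but still queues on the CLI semaphore.

```typescript
// Hypothetical shim ids mirroring the lineages named above.
type ShimId = 'claude-code' | 'codex' | 'gemini' | 'grok-cli' | 'openrouter' | 'local';

// Credential precheck: HTTP shims keep auth in the secrets table, not in cred files.
function isHttpDispatchedShim(id: ShimId): boolean {
  return id === 'openrouter' || id === 'local';
}

// Semaphore: only truly remote endpoints skip it; `local` usually means Ollama
// on 127.0.0.1 competing for the same VRAM/RAM.
function bypassesLocalCliSemaphore(id: ShimId): boolean {
  return id === 'openrouter';
}

// Defaults from the text; other CLIs keep their existing values.
const DEFAULT_CLI_CONCURRENCY: Partial<Record<ShimId, number>> = { gemini: 2, 'grok-cli': 2, local: 1 };
```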

Tests: 845 pass (unchanged), typecheck clean.

* fix: PR review — CodeRabbit pass (docs/Grok level, init+quickstart+local edges, regex, tests)

Addresses CodeRabbit's first batch of review comments on PR #3:

- docs/integrating-a-new-cli.md: contradictory level for Grok — line 3
  said "detection-only", line 15 said level 2, line 302 said level 3.
  Normalize to level-3 (the shim ships in this PR) and note that the
  level-2 orchestrator coexists for the consumer-side wiring.

- src/cli/commands/init.ts: `--connect grok` was rejected because the
  local Name union, ALL_NAMES list, and the `--connect` option help
  text omitted 'grok' even though detection labels and
  OrchestratorName already accepted it. Add 'grok' to all three.

- src/cli/commands/quickstart.ts: the "install one of …" guidance
  printed when no CLIs are detected still listed only five CLIs; extend
  it to include Grok CLI so it matches the dispatchable set.

- src/daemon/agents/local.ts:
  * Empty `base_url` (e.g. user saved settings with an empty box)
    was passed through `??` as the URL and surfaced as an opaque fetch
    error; treat empty / whitespace-only as unset and fall back to
    DEFAULT_BASE. Strip trailing slashes while at it.
  * Trailing SSE payload was dropped when the server closed without
    a final blank-line delimiter (older Ollama, some vLLM configs) —
    the last text_delta could silently disappear, truncating answers.
    Extract event-dispatch + payload-extract into local helpers and
    flush the residual buffer after the read loop exits.

- src/lib/cli-detect.ts: grok regex documented "name OR bare-version"
  but only matched the name. Add the bare-version alternative; the
  basename guard already prevents cross-vendor matches.

- tests/grok-parser.test.ts: 4 cases narrowed events[0] under
  `if (events[0].type === 'error')` without a prior `expect(...).toBe`
  on type — a non-error event silently skipped the inner assertions.
  Add explicit type expectations before the narrowing.
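The two local.ts fixes can be sketched as follows. This version assumes the stream has already been decoded into strings with LF line endings. `resolveBaseUrl`, `extractPayload`, and `parseSseChunks` are hypothetical names. The `DEFAULT_BASE` value is also an assumption: Ollama's default port with an OpenAI-compatible `/v1` path. Flushing the residue is deliberately more lenient than browsers' `EventSource`, which per the HTML spec discards an event that is not followed by a blank line.

```typescript
// Assumption: Ollama's default port with an OpenAI-compatible /v1 path.
const DEFAULT_BASE = "http://127.0.0.1:11434/v1";

// Empty or whitespace-only base_url counts as unset; trailing slashes are stripped.
function resolveBaseUrl(baseUrl: string | null | undefined): string {
  const trimmed = (baseUrl ?? "").trim();
  return (trimmed.length > 0 ? trimmed : DEFAULT_BASE).replace(/\/+$/, "");
}

// Join the `data:` lines of one SSE event; null when the event carries no data.
function extractPayload(eventBlock: string): string | null {
  const data = eventBlock
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).replace(/^ /, ""));
  return data.length > 0 ? data.join("\n") : null;
}

// Assumes LF line endings. Dispatches each complete event, then flushes the residue.
function parseSseChunks(chunks: readonly string[], onPayload: (payload: string) => void): void {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    let sep = buffer.indexOf("\n\n");
    while (sep !== -1) {
      const payload = extractPayload(buffer.slice(0, sep));
      if (payload !== null) onPayload(payload);
      buffer = buffer.slice(sep + 2);
      sep = buffer.indexOf("\n\n");
    }
  }
  // The server may close without a final blank line; don't drop the last event.
  if (buffer.trim().length > 0) {
    const payload = extractPayload(buffer);
    if (payload !== null) onPayload(payload);
  }
}
```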

Tests: 845 pass (unchanged), typecheck clean.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg <7xshadowx7@gmail.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* feat: fold upstream contributor stack — repoPath default + CRLF persona parser (#4)

Folds the cross-platform pieces of upstream commit 781bc42 ("Contributor
stack: claude orchestrator + repoPath + Windows spawn (#39)") into the
fork, intentionally omitting Windows-specific hunks.

Included:
  - src/mcp/tools.ts: add safeCwd() helper + default `repoPath` on
    create_chat to safeCwd() when caller omits it. Previously the daemon
    fell back to its own cwd (packageRoot), which caused relative file
    paths in `files: [...]` to silently resolve to the chorus install
    dir and miss. MCP servers spawned by Claude Code / Codex / Gemini
    inherit the host's cwd (= the user's project), so safeCwd() lands
    at the right path automatically. safeCwd() also catches ENOENT from
    process.cwd() and falls back to homedir.
  - src/lib/personas.ts: normalize CRLF → LF in the frontmatter parser
    so persona .md files checked out with Windows line endings don't
    fail `missing YAML frontmatter`. Cross-platform safe.
  - src/daemon/orchestrators/index.ts: drop stale comment block about
    Claude having a project-config side-effect (the fork's orchestrator
    long since moved to user-scope).
  - tests/mcp-create-chat-repo-path.test.ts (+4 tests): cover explicit
    repoPath, cwd default, full-body forwarding, and ENOENT fallback
    to homedir.
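The personas.ts change can be illustrated as a single normalization step before the delimiter match. This is a hedged sketch: `parsePersonaFile` and its return shape are hypothetical. The fork's parser may also decode the YAML, which is omitted here.

```typescript
interface ParsedPersona {
  frontmatter: string; // raw YAML between the --- delimiters
  body: string;
}

function parsePersonaFile(raw: string): ParsedPersona {
  // Normalize CRLF first so Windows checkouts hit the same LF-only delimiter regex.
  const text = raw.replace(/\r\n/g, "\n");
  const match = /^---\n([\s\S]*?)\n---(?:\n|$)([\s\S]*)$/.exec(text);
  if (!match) throw new Error("missing YAML frontmatter");
  return { frontmatter: match[1], body: match[2] };
}
```

Without the `replace`, a CRLF file starts with `---\r\n`, so `^---\n` never matches and the file is reported as having no frontmatter.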

Omitted (Windows-only hunks):
  - src/cli/commands/update.ts (shell: win32 for npm self-update)
  - src/daemon/routes/system.ts (shell: win32 for opencode probe)
  - src/daemon/orchestrators/{codex,gemini,kimi}.ts (shell: win32 tweaks)
  - src/lib/cli-detect.ts (SAFE_WIN_PATH regex + buildVersionSpawn)
  - src/lib/voices.ts (discoverNpmPrefixes Windows shell)
  - tests/cli-detect.test.ts (Windows-specific cmd.exe escape tests)

Also omitted:
  - src/daemon/orchestrators/claude.ts: upstream shells out to
    `claude mcp add --scope user`. Fork already implements user-scope
    registration via direct ~/.claude.json patch (more robust — no
    dependency on `claude` binary in PATH at registration time, plus
    sweeps stale project-scoped entries). Keeping fork's version.
  - tests/claude-orchestrator.test.ts: tests the upstream shell-out
    approach the fork doesn't use.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address PR #1 bot review batch — sendError migration, abort+cancel, schema CHECKs, audit + orchestrate hardening

Sweep of fixes for CodeRabbit + ChatGPT Codex review on PR #1. Grouped
into one commit because the surface is broad but every change is small,
review-driven, and verified together (typecheck clean, vitest 849/849,
lint 0 errors).

Routes — sendError vs errorResponse (CodeRabbit Critical/Major):
  - chats-from-pr.ts catch → sendError(reply, ...) so 5xx errors carry
    the right HTTP status instead of bare 200 + ok:false body.
  - voices.ts GET list / GET :id / POST / PUT / DELETE all migrated;
    DELETE handler gains the missing reply param.
  - Drop now-unused errorResponse imports in both files.

Quickstart abort propagation (CodeRabbit Major):
  - pollChat fetch passes the signal so a SIGINT or timeout interrupts
    the in-flight request instead of waiting for the daemon's response.
  - 1500 ms inter-poll sleep wakes on abort instead of always blocking
    its full duration after the signal fires.
  - Timeout path now also POSTs /chats/:id/cancel (extracted shared
    `cancelRemote` helper), matching the SIGINT handler so timed-out
    runs don't leave the daemon reviewing in the background.
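The abort-propagation pattern can be sketched with a standard `AbortController`. This assumes a hypothetical `getStatus(signal)` fetch wrapper and a `cancelRemote()` callback, and that the daemon reports `"running"` while work is in progress. The status strings and signatures are illustrative, not the fork's quickstart API.

```typescript
// Sleep that wakes early (rejecting) when the signal aborts.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(signal.reason ?? new Error("aborted"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(signal?.reason ?? new Error("aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

async function pollChat(
  getStatus: (signal: AbortSignal) => Promise<string>,
  cancelRemote: () => Promise<void>,
  timeoutMs: number,
  intervalMs = 1500,
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(new Error("timeout")), timeoutMs);
  try {
    for (;;) {
      // The signal reaches the in-flight request, not just the sleep.
      const status = await getStatus(controller.signal);
      if (status !== "running") return status;
      await sleep(intervalMs, controller.signal);
    }
  } catch (err) {
    // Timed out: cancel server-side so the daemon stops reviewing.
    if (controller.signal.aborted) await cancelRemote();
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```

A SIGINT handler could share the same controller by calling `controller.abort()`, so both exit paths run through the same cancel logic.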

start.ts best-effort openBrowser (CodeRabbit Major):
  - Both `chorus start` paths catch openBrowser rejection so a failing
    `open` doesn't fail the whole command when the daemon is already
    healthy. Matches scheduleAutoOpenBrowser's existing behaviour.

Codex headless GitHub transport (CodeRabbit Major):
  - HeadlessSpawnOptions gains optional `transport` mirroring
    AgentSpawnOptions.
  - codex.buildHeadlessArgs flips network_access on for transport
    === "github", matching buildLaunchCommand. Previously headless
    GitHub runs couldn't reach github.com or call out via gh.

CLI health auth-kind mapping (CodeRabbit Minor):
  - kindToStatus now maps auth_invalid and auth_missing to
    "auth_invalid" so Grok auth failures render the right cockpit CTA
    instead of "unknown".

Voice-failure-tracker hasResetAt streak reset (CodeRabbit Major):
  - When the upstream promises recovery, also clear any prior strike
    counter. Before this fix, the sequence permanent-fail → resetAt-fail →
    permanent-fail tripped the threshold on the first permanent strike
    after the reset instead of the second.
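A minimal tracker with the corrected reset behaviour might look like the sketch below. The `QuotaFailure` shape is hypothetical. The threshold of two consecutive permanent strikes is inferred from the description, not confirmed by it.

```typescript
// Hypothetical failure shape: resetAt is set when the upstream promises recovery.
interface QuotaFailure {
  resetAt?: string;
}

class VoiceFailureTracker {
  private readonly strikes = new Map<string, number>();
  // Assumed threshold of 2 consecutive permanent failures.
  constructor(private readonly threshold = 2) {}

  /** Records a quota failure; returns true when the voice should be auto-disabled. */
  record(voiceId: string, failure: QuotaFailure): boolean {
    if (failure.resetAt !== undefined) {
      // Recovery promised: reset the streak rather than leaving old strikes in place.
      this.strikes.delete(voiceId);
      return false;
    }
    const count = (this.strikes.get(voiceId) ?? 0) + 1;
    this.strikes.set(voiceId, count);
    return count >= this.threshold;
  }
}
```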

Schema CHECK constraints (CodeRabbit Major):
  - schema.sql + connection.ts migrations add CHECKs on bypass_quota
    (0/1), tier ('low'/'medium'/'high'), and monthly_budget_usd
    (NULL or >= 0). Guards scheduler inputs at the DB layer for both
    fresh installs and migrated DBs.

MCP createChat dead conditional spread (CodeRabbit Minor):
  - safeCwd() is the deliberate fallback per the upstream contributor PR.
    Drop the dead `...(parsed.repoPath !== undefined …)` spread that
    just re-set the same value the unconditional `repoPath` field
    already sent.

github-pr.ts ENOENT classifier (ChatGPT Codex Major):
  - classifyGhFailure now recognises Node's `spawn gh ENOENT` shape so
    the documented first-run path (paste PR URL before installing gh)
    returns the actionable gh_not_installed code instead of db_error.
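A classifier that recognizes the ENOENT shape first might look like this sketch. When Node cannot find the binary, it rejects a spawn with `code: "ENOENT"` and a message like `spawn gh ENOENT`. The reason codes follow the typed reasons listed for POST /chats/from-pr. The `GhError` shape and every stderr pattern after the ENOENT check are illustrative assumptions, not gh's documented output.

```typescript
type GhFailure = "gh_not_installed" | "gh_not_authed" | "pr_not_found" | "network_failure" | "unknown";

interface GhError {
  code?: string; // Node sets "ENOENT" when the binary is missing
  message?: string;
  stderr?: string;
}

function classifyGhFailure(err: GhError): GhFailure {
  const text = `${err.message ?? ""}\n${err.stderr ?? ""}`;
  // Node's spawn failure for a missing binary: code ENOENT, message "spawn gh ENOENT".
  if (err.code === "ENOENT" || /spawn gh ENOENT/.test(text)) return "gh_not_installed";
  // The stderr patterns below are illustrative assumptions, not gh's documented contract.
  if (/gh auth login|not logged in/i.test(text)) return "gh_not_authed";
  if (/Could not resolve to a PullRequest|HTTP 404/i.test(text)) return "pr_not_found";
  if (/ENOTFOUND|ECONNRESET|ETIMEDOUT|EAI_AGAIN|error connecting to/i.test(text)) return "network_failure";
  return "unknown";
}
```

Checking ENOENT first matters: a missing binary produces no stderr at all, so the text-based patterns would otherwise fall through to a generic bucket.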

Claude orchestrator trailing newline (CodeRabbit Trivial):
  - registerClaudeMcpServer JSON write gains the trailing "\n" used by
    connectClaude, keeping ~/.claude.json byte-for-byte stable.

runner-multiplex chat-scoped warning persistence (CodeRabbit Major):
  - cli_warning / cli_error events that arrive without a valid
    phaseKind (e.g. attached_files_invalid emitted before any phase
    starts) now skip phaseEvents.create instead of being coerced into
    a synthetic 'review'/'reviewer' row. The chatLogger path already
    captured the warning; live subscribers got it from the original
    onEvent.

doer.ts answerFile init guard (CodeRabbit Major):
  - Wrap the initial fs.writeFileSync(answerFile, "") in try/catch so
    EACCES/ENOSPC at startup emits a cli_error (kind:
    answer_init_failed) with a usable CTA instead of bypassing the
    failure path and leaving the chat dir empty.

cli-precheck kimi default_model gate (CodeRabbit Major):
  - Only enforce ~/.kimi/config.toml default_model when an actual
    kimi-cli credential file is present. moonshot voices routed via
    opencode are authed entirely by opencode and never touch ~/.kimi/
    — hard-failing them here rejected healthy setups.

audit.ts preset-load + id uniqueness (CodeRabbit Major):
  - loadPresetPrompt now wraps in try/catch and emits phase_failed
    (reason: preset_load_failed) instead of letting the promise reject
    after phase_start fires.
  - AuditItem.id uniqueness is enforced before persisting
    audit-output.json; duplicates emit phase_failed (reason:
    invalid_output) since orchestrate selection is id-keyed.

orchestrate.ts checkout failure path (CodeRabbit Major):
  - Capture `git checkout <startingBranch>` result. On failure, push a
    failed manifest entry and emit phase_failed (reason:
    checkout_failed) instead of silently letting the next worker stack
    on top of the prior worker's branch and polluting diff stats.

new/page.tsx PR flow stale repoPath (CodeRabbit Major):
  - handleStartFromPr no longer forwards the shared repoPath state.
    The input is cleared + disabled in reviewOnly mode but the state
    can still hold a stale value from a mode switch — never send it.

Lint: react/no-unescaped-entities (CodeRabbit Minor):
  - Three apostrophes in JSX text escaped to &apos; (page.tsx ×2,
    run-checklist/index.tsx ×1). 0 errors remaining.

orchestrate-manifest URL validation (CodeRabbit Nitpick):
  - Validate the `PR opened: <url>` href via new URL() and require
    http/https before rendering as an anchor; fall back to plain text
    on parse failure or weird scheme.
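The href check can be sketched with the standard WHATWG `URL` parser. `safePrHref` is a hypothetical helper name. The component would render a plain-text line whenever it returns `null`.

```typescript
// Returns a safe href for a "PR opened: <url>" line, or null to render plain text.
function safePrHref(line: string): string | null {
  const m = /^PR opened: (\S+)$/.exec(line.trim());
  if (!m) return null;
  try {
    const url = new URL(m[1]);
    return url.protocol === "http:" || url.protocol === "https:" ? url.href : null;
  } catch {
    return null; // unparseable URL
  }
}
```

Allow-listing `http:`/`https:` rather than deny-listing `javascript:` also rejects `data:`, `file:`, and any other scheme that a browser might act on.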

Preset markdown H1 (CodeRabbit Minor, MD041):
  - architecture-review.md, de-slopify.md, engineering-review.md gain
    a top-level H1 to satisfy markdown lint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: PR #1 round-2 — Lineage parity, strict body validation, fence escape, Windows paths

Addresses 4 MAJOR + 1 acknowledged issue from CodeRabbit's 2026-05-18 review batch:

- voices.ts + db/voices.ts: extend Lineage enum with `openrouter`, `local`,
  `grok` so the route validators stop rejecting legitimate rows already
  supported by cli-precheck and the shim registry. Mirror in the DB schema
  (z.enum) and the VoiceUpsertInput union. These are three independent
  declarations of the same set, and all must track Lineage in agents/types.ts.

- chats-from-pr.ts: tighten request-body validation. Truthiness checks let
  non-string truthy `url`/`templateId` (e.g. `{}` or `42`) slip through and
  fail deep inside parsePrUrl as opaque server errors instead of clean 400s.
  Added strict `typeof === "string" && trim().length > 0` plus optional
  yolo type check.

- github-pr.ts: dynamic backtick fence around the diff body. Markdown/docs
  PRs frequently contain literal ``` fences; a fixed-width fence would close
  early and let the rest of the diff escape into the artifact prose,
  corrupting the prompt boundary for review-only chats. Now picks a fence
  one backtick longer than the longest run in the diff (min 3).

- new/page.tsx: accept Windows absolute paths (`C:\repo`, `\\server\share`)
  alongside POSIX. The audit-a-repo tab was unusable on Windows because the
  UI hard-coded `startsWith("/")`, even though cli-detect / runtime-path /
  settings-transport already handle win32 server-side.
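The dynamic-fence rule from the github-pr.ts change (one backtick longer than the longest run, minimum 3) fits in a few lines. `fenceFor` and `wrapDiff` are hypothetical names for this sketch.

```typescript
// One backtick longer than the longest backtick run in the content, minimum 3.
function fenceFor(content: string): string {
  const runs: string[] = content.match(/`+/g) ?? [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  return "`".repeat(Math.max(3, longest + 1));
}

// Wrap the diff so literal fences inside it cannot close the block early.
function wrapDiff(diff: string): string {
  const fence = fenceFor(diff);
  return `${fence}\n${diff}\n${fence}`;
}
```

This relies on the CommonMark rule that a closing fence must be at least as long as the opening one. No backtick run inside the diff can therefore terminate the block.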

Declined: CodeRabbit nitpick on formatBranchName (orchestrate.ts:124-133).
chatId is a server-issued ULID (generateUlid in lib/db/chats.ts) — all-
alphanumeric by construction — and branchPrefix already has a zod regex
guard from commit e93ce00. No real injection vector.

- pnpm exec tsc --noEmit — clean
- pnpm exec vitest run tests/voices.test.ts tests/voices-route-validation.test.ts tests/github-pr.test.ts tests/db.test.ts — 99/99 passing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: chorus-codes <info@chorus.codes>
Co-authored-by: Julien Deudon <deudon.j@gmail.com>
Co-authored-by: Lumina Mao <luminamao@mac.lan>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-authored-by: Yura <yurahalych@gmail.com>
Co-authored-by: Greg <7xshadowx7@gmail.com>
crypticpy added a commit that referenced this pull request May 18, 2026
…iet (#6)

* fix: cred detection + Claude MCP user-scope registration

Three fixes for chorus-issues.md items that prevented a freshly installed
chorus from finding the user's existing CLI credentials, so the daemon now
starts cleanly on machines that already have Claude / Kimi / moonshot configured.

#1: register Claude MCP at user scope. The chorus MCP entry now writes to
the top-level `mcpServers` block in `~/.claude.json` (idempotent), and any
stale chorus entry under the project-scoped `projects[homedir].mcpServers`
is cleaned up. Previously the project-scoped registration was invisible to
Claude Code launched outside that exact cwd.
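The user-scope registration can be sketched as a pure transform over the parsed `~/.claude.json` object, followed by a serializer. This is an illustration under assumptions. The entry name `chorus`, the `McpServerEntry` shape, and the function names are hypothetical. Reading and writing the file is left out.

```typescript
interface McpServerEntry {
  command: string;
  args?: string[];
}

interface ClaudeConfig {
  mcpServers?: Record<string, McpServerEntry>;
  projects?: Record<string, { mcpServers?: Record<string, McpServerEntry> }>;
  [key: string]: unknown;
}

// Upsert the entry at user scope and sweep a stale copy from projects[home].
function registerUserScope(
  config: ClaudeConfig,
  home: string,
  entry: McpServerEntry,
  name = "chorus",
): ClaudeConfig {
  const next: ClaudeConfig = { ...config, mcpServers: { ...config.mcpServers, [name]: entry } };
  const project = config.projects?.[home];
  if (project && project.mcpServers && name in project.mcpServers) {
    const remaining = { ...project.mcpServers };
    delete remaining[name];
    next.projects = { ...config.projects, [home]: { ...project, mcpServers: remaining } };
  }
  return next;
}

// Trailing newline keeps the file byte-for-byte stable across rewrites.
function serializeConfig(config: ClaudeConfig): string {
  return JSON.stringify(config, null, 2) + "\n";
}
```

Running the transform twice yields the same output, which is what makes the registration idempotent. Other keys in the file are preserved by the spreads.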

#2: cred-path fallbacks. When the anthropic file check misses (e.g. user
authed via Claude Desktop, no `~/.claude/...` JSON), fall back to the macOS
Keychain via `security find-generic-password -s "Claude Code-credentials"`.
Added `~/.kimi/credentials/kimi-code.json` to the moonshot CRED_PATHS so
users who authed through `kimi-code` aren't told to log in again.

#3: kimi config-missing precheck. New layer-3 check parses
`~/.kimi/config.toml` and surfaces a `config_missing` reason when there's
no top-level `default_model` set — the CLI will silently pick whatever
backend it likes, which is rarely what the user wants.
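The layer-3 check can be approximated without a TOML dependency by scanning only the top-level table. This is a deliberately narrow sketch, not a TOML parser. It recognizes only a double-quoted `default_model`. The function name and result shape are hypothetical, apart from the `config_missing` reason quoted in the commit.

```typescript
type KimiConfigCheck = { ok: true; model: string } | { ok: false; reason: "config_missing" };

// Not a full TOML parser: scans only the top-level table for a quoted default_model.
function checkKimiConfig(toml: string): KimiConfigCheck {
  for (const rawLine of toml.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("[")) break; // first [table] header ends the top-level keys
    const m = /^default_model\s*=\s*"([^"]*)"/.exec(line);
    if (m && m[1].trim().length > 0) return { ok: true, model: m[1] };
  }
  return { ok: false, reason: "config_missing" };
}
```

A key that appears only under a `[table]` header does not count, matching the requirement for a top-level `default_model`.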

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: reviewer fidelity, verdict surfacing, event/prompt isolation

Seven fixes from chorus-issues.md covering the rest of the runner +
MCP-surface issues found while reviewing PR #26 of foresight-app.

#4: thread `repoPath` through reviewer subprocesses. `runReviewers` →
`runReviewer` → `runReviewerHeadless` now accept the chat's repoPath and
the reviewer's cwd switches to it when set, so `gh`, file reads, and
sandboxed CLIs (Gemini) see the actual code instead of running in an
empty per-reviewer scratch dir.

#5: surface reviewer answer.md in MCP responses. New `readReviewerArtifacts`
helper walks `~/.chorus/chats/<id>/round-N/reviewer-*/answer.md`, caps each
at 16 KiB, sorts by (round desc, agent asc), and merges the result into
`wait_for_chat` and `get_chat_status` payloads under `reviews`. Both the
doer and reviewer `participant_done` events now carry `outputPath` so MCP
clients can read the on-disk source of truth when they need more than the
streamed tail.
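The cap-and-sort step of `readReviewerArtifacts` can be sketched independently of the filesystem walk. The 16 KiB cap and the (round desc, agent asc) ordering come from the commit. The `ReviewerArtifact` shape, the helper names, and the choice to cut on a UTF-8 character boundary are assumptions.

```typescript
interface ReviewerArtifact {
  round: number;
  agent: string;
  answer: string;
  truncated: boolean;
}

const ANSWER_CAP_BYTES = 16 * 1024;

// Truncate to at most maxBytes of UTF-8 without splitting a multi-byte character.
function capUtf8(text: string, maxBytes: number): { text: string; truncated: boolean } {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return { text, truncated: false };
  let end = maxBytes;
  // 0b10xxxxxx bytes are continuations; back up to the start of that character.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return { text: new TextDecoder().decode(bytes.subarray(0, end)), truncated: true };
}

// Newest round first, then agent name ascending (plain code-unit order).
function orderArtifacts(raw: Array<{ round: number; agent: string; answer: string }>): ReviewerArtifact[] {
  return raw
    .map((a) => {
      const capped = capUtf8(a.answer, ANSWER_CAP_BYTES);
      return { ...a, answer: capped.text, truncated: capped.truncated };
    })
    .sort((a, b) => b.round - a.round || (a.agent < b.agent ? -1 : a.agent > b.agent ? 1 : 0));
}
```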

#6: bump phase_progress output tail from 500 B to 8 KiB. The 500-byte
slice clipped reviewer summaries mid-word; full text remains on disk and
is pointed to by `outputPath`. Affects both reviewer.ts and doer.ts.

#7: tri-review verdict on `max_rounds_exhausted`. When the doer succeeded
every round but reviewers kept saying request_changes through the round
cap, chat_done now emits `status: completed, verdict: request_changes,
reason: max_rounds_exhausted` with the last round's reviewer summary —
previously misclassified as a generic doer failure.

#8: refactor `CreateChatSchema` and `InvokePersonaSchema` to plain
`z.object()` with per-field `.describe()`. The prior `.transform()` wrapped
them in `ZodEffects` which strips the `properties` map from MCP
introspection — clients saw an empty schema. Legacy `template` alias and
the `code-review` default moved into a new `resolveTemplateId()` helper.

#9: dedup `participant_done` at the multiplex layer. Same-slot fallbacks
or parsers that emit `message_done` twice (the opencode parser
historically does this) used to fan duplicate terminal events out to
every subscriber; now keyed by `(phaseIdx, round, role, agent)` and
later duplicates drop silently.
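The slot-keyed dedup might look like this sketch. The event shape is a hypothetical subset of the fork's `participant_done` payload. The key tuple `(phaseIdx, round, role, agent)` is the one named in the commit.

```typescript
interface ParticipantDone {
  type: "participant_done";
  phaseIdx: number;
  round: number;
  role: "doer" | "reviewer";
  agent: string;
}

class TerminalEventDedup {
  private readonly seen = new Set<string>();

  /** True for the first participant_done per slot; later duplicates are dropped. */
  shouldForward(ev: ParticipantDone): boolean {
    // JSON-encode the tuple so values containing a delimiter can't collide.
    const key = JSON.stringify([ev.phaseIdx, ev.round, ev.role, ev.agent]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```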

#10: per-instance reviewer prompt isolation. Same-lineage instances
(claude-code-2/4/5, etc.) share the chat dir tree at
`~/.chorus/chats/<id>/round-N/reviewer-*/`; tool-using CLIs were
wandering into a sibling's answer.md mid-flight and short-circuiting
("the review is complete" referring to a different agent's work).
`buildReviewerAsk` now stamps an Independence directive when more than
one reviewer slot exists, naming the slot tag and forbidding cross-slot
reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: replay chat_done from persisted verdict, not status

The synthetic chat_done emitted when a terminal chat is re-attached
derived `verdict` from `chat.status`, ignoring the `chat.verdict` column.
Since the previous commit shipped the `max_rounds_exhausted` branch
(chorus-issues.md #7), a chat can finish with `status='approved'
verdict='request_changes'` — replay was clobbering that to `approved`
on every page reload, hiding reviewer disagreement from the user.

Use the persisted column when set; fall back to the old
status-derived value only for pre-v0.8.27 rows where verdict is null.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: surface dropped attached_files + SSE backpressure; harden ship.ts

Three audit follow-ups on the daemon side, all surfacing previously
silent failures.

attached_files: parseAttachedFiles in runner-multiplex.ts used to
swallow JSON parse errors and run the chat with no attachments. Refactor
to a tagged result (`empty` / `ok` / `invalid`); on `invalid` the runner
logs and emits a `cli_warning` SSE so the cockpit + MCP clients see
which chat lost its file list.
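The tagged result can be sketched as a discriminated union. This assumes the column stores a JSON array of path strings, which the commit does not state. Callers can then branch on `kind` and emit the `cli_warning` only for `invalid`.

```typescript
type AttachedFilesResult =
  | { kind: "empty" }
  | { kind: "ok"; files: string[] }
  | { kind: "invalid"; error: string };

// Assumption: the column stores a JSON array of path strings.
function parseAttachedFiles(raw: string | null | undefined): AttachedFilesResult {
  if (raw == null || raw.trim() === "") return { kind: "empty" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    return { kind: "invalid", error: err instanceof Error ? err.message : String(err) };
  }
  if (!Array.isArray(parsed) || !parsed.every((f) => typeof f === "string")) {
    return { kind: "invalid", error: "expected a JSON array of file paths" };
  }
  return parsed.length === 0 ? { kind: "empty" } : { kind: "ok", files: parsed as string[] };
}
```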

SSE backpressure: when a subscriber's queue exceeds the 1000-line cap
the multiplex used to silently drop the connection. Now writes one
`error` frame with code `sse_backpressure` before close, and logs the
queue length to daemon.log so an operator tailing logs can see when
clients fall behind.

gh pr create URL validation: ship.ts captured stdout's last line as the
PR URL with no shape check; an empty/malformed stdout produced
`{ok: true, prUrl: ''}` and the chat row recorded "shipped" with an
unclickable link. Now matches against
`^https://github.com/<owner>/<repo>/pull/<n>` before declaring success.
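The shape check might look like the sketch below. `extractPrUrl` and `shipResult` are hypothetical names. Anchoring the regex with `$` is slightly stricter than the prefix match the commit describes.

```typescript
// Stricter than a prefix match: the whole last line must be a PR URL.
const PR_URL = /^https:\/\/github\.com\/[^/\s]+\/[^/\s]+\/pull\/\d+$/;

function extractPrUrl(stdout: string): string | null {
  const last = stdout.trim().split(/\r?\n/).pop()?.trim() ?? "";
  return PR_URL.test(last) ? last : null;
}

function shipResult(stdout: string): { ok: true; prUrl: string } | { ok: false; error: string } {
  const prUrl = extractPrUrl(stdout);
  return prUrl ? { ok: true, prUrl } : { ok: false, error: "gh pr create printed no PR URL" };
}
```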

detectGitContext parallelization: the five spawnSync probes (is-repo,
remote, gh --version, gh auth, HEAD) ran sequentially at 60s each —
worst case 360s before runner saw a result. Converted to async with a
new `runAsync` helper, batched via Promise.all with a 15s per-probe
cap; detectDefaultBranch's symref + three branch-existence checks
likewise parallelized. detectGitContext is now async; the lone caller
in runner.ts awaits it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: bound failure-summary regex; log malformed SSE frames

participant-card.tsx: parseFailureSummary ran the multi-step regex
chain over the full participant.answer string. Reviewer answers can be
up to 256 KB, so scanning them on every render blocks the UI thread. Slice to the
first 16 KiB before scanning — the failure-header block is always
written at the top of answer.md by reviewer.ts/doer.ts, so the cap
never loses signal.

live-run-real/index.tsx: the SSE onmessage handler already had a
try/catch around JSON.parse, but the catch was silent — a wire-format
mismatch dropped events with no trace. Add a console.warn with a
preview so devs notice schema drift in DevTools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: github PR ingestion via gh CLI

Adds src/daemon/github-pr.ts: parsePrUrl + fetchPrArtifact run
gh pr view/diff plus review and issue comments in parallel,
synthesize a Markdown artifact (description, the 50 newest comments
of each kind, and the diff capped at 200 KB without splitting a UTF-8
character), and classify gh failures into typed reasons.

Exports runAsync from ship.ts so the new module can reuse the
existing spawn+timeout helper instead of duplicating it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: extract createChatFromValidatedInputs helper

Pulls the template lookup, artifact validation, chat row + opening
phase event creation, and runner kickoff out of the POST /chats
handler into a reusable helper. POST /chats now only handles its
route-specific concerns (body shape, repoPath canonicalization,
error response shaping).

Sets up reuse from the upcoming POST /chats/from-pr endpoint
without duplicating ~150 lines of validation logic.

No behavior change — same template checks, same artifact rules,
same kickoff path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: POST /chats/from-pr — start a chat from a GitHub PR URL

Accepts { url, templateId, repoPath?, yolo? }, parses the PR URL,
fetches PR meta + diff + existing comments via gh CLI, synthesizes
a Markdown artifact, and creates the chat through the shared
createChatFromValidatedInputs helper.

gh failures map to typed reasons (invalid_url, gh_not_installed,
gh_not_authed, pr_not_found, network_failure, unknown) so the
cockpit can render actionable errors instead of generic 500s.

Adds tests/github-pr.test.ts covering parsePrUrl edge cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: cockpit "GitHub PR" tab on /new

Adds a Free-form / GitHub PR mode toggle on the new-chat page. PR
mode swaps the prompt textarea for a URL input and routes through
the new POST /chats/from-pr endpoint. Validates client-side that
the chosen template is review-only before letting the user submit.

createChatFromPr API client surfaces the daemon's typed PR meta
(owner/repo/number/title/branches) on the response so callers can
display PR context after the chat is created.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: review_pr MCP tool

Exposes POST /chats/from-pr through MCP. Orchestrators (Claude Code,
Codex, Cursor) can now hand chorus a PR URL and get reviewers running
against it without going through the cockpit. Defaults templateId to
review-only so a caller can pass just a URL.

ReviewPrSchema is a plain z.object (not ZodEffects) so MCP clients
can introspect required fields — same hazard documented on
CreateChatSchema and InvokePersonaSchema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: capture multi-identity CLI follow-up idea

Idea note for running chorus against multiple paid accounts on the same
CLI binary (work + personal Claude Code Max, etc.). Filed as follow-up
after audit-presets + quota tiers ship — captures the env-override
mechanism, proposed Identity primitive, and open questions on keychain
CLIs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: schema for audit + orchestrate phases, voice tier, bypass_quota

Adds the foundation for repo-pointed audit-and-orchestrate runs and
the orchestrator's task↔voice tier matching.

Template schema:
- AuditPhase (kind: 'audit') — single reviewer voice + one of five
  preset lenses (de-slopify, monolith-breakdown, code-review,
  engineering-review, architecture-review). Output schema
  (AuditItemSchema, AuditOutputSchema) lives next to the phase shape
  so the structured-output adapter, scheduler, and cockpit checklist
  agree on the contract.
- OrchestratePhase (kind: 'orchestrate') — array of worker voices,
  default branchPrefix `chorus/{chatId}/worker-{idx}` so each worker
  gets isolated git state.
- templateRequiresRepo() helper for the cockpit's repo-picker gate.

Voices:
- Adds tier ('high' | 'medium' | 'low', default 'medium') and
  monthly_budget_usd (nullable) to the row schema, upsert input, and
  update input. Idempotent migrations on existing DBs.

Chats:
- bypass_quota INTEGER NOT NULL DEFAULT 0 — set on PR-review chats so
  the orchestrate scheduler runs every enabled voice at full capacity
  instead of tier-gating.

Runner is stubbed for the new kinds: phase_done emit + continue, so
templates that declare an audit/orchestrate phase before the runner
logic lands don't crash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: structured-output adapter for CLI voices

Wraps an AgentShim's runHeadless with JSON-formatting prompt scaffold
and a one-shot repair loop, returning typed data validated against a
caller-supplied zod schema.

Used by the upcoming audit phase (which needs typed AuditItem[]
instead of free-form prose) and the orchestrate phase (worker
results). Keeps each CLI lineage's existing headless transport — the
adapter just owns the prompt-shape + parse-and-validate dance.

Extraction strategy: prefer direct JSON.parse of finalText; fall back
through fenced-block regex variants to a brace-to-brace slice. On
parse or schema-violation, retry once with a repair prompt that quotes
the validation error. Spawn errors short-circuit (the model never saw
the prompt — repair would just retry the same failure).
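The extraction ladder and one-shot repair loop can be sketched without zod. This is a hedged illustration, not the adapter itself. A caller-supplied `validate` function stands in for the zod schema, and `schema.safeParse` could be adapted to it. `run` stands in for the shim's `runHeadless`. Path 4 already prefers the longer slice, as in the later extractJson fix.

```typescript
type Parsed = { ok: true; value: unknown } | { ok: false };
type Checked<T> = { ok: true; data: T } | { ok: false; error: string };

function tryParse(text: string): Parsed {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

function extractJson(finalText: string): Parsed {
  const text = finalText.trim();
  // Path 1: the whole reply is JSON.
  const direct = tryParse(text);
  if (direct.ok) return direct;
  // Paths 2-3: a fenced block, tagged json or untagged.
  for (const fence of [/`{3}json\s*([\s\S]*?)`{3}/i, /`{3}\s*([\s\S]*?)`{3}/]) {
    const m = fence.exec(text);
    if (m) {
      const fenced = tryParse(m[1].trim());
      if (fenced.ok) return fenced;
    }
  }
  // Path 4: brace-to-brace and bracket-to-bracket slices; try the longer one first.
  const slices: string[] = [];
  for (const [open, close] of [["{", "}"], ["[", "]"]]) {
    const start = text.indexOf(open);
    const end = text.lastIndexOf(close);
    if (start !== -1 && end > start) slices.push(text.slice(start, end + 1));
  }
  slices.sort((a, b) => b.length - a.length);
  for (const slice of slices) {
    const sliced = tryParse(slice);
    if (sliced.ok) return sliced;
  }
  return { ok: false };
}

// Spawn errors reject from `run` and propagate untouched: the model never saw the prompt.
async function runStructured<T>(
  run: (prompt: string) => Promise<string>,
  prompt: string,
  validate: (value: unknown) => Checked<T>,
): Promise<T> {
  let reply = await run(prompt);
  for (let attempt = 0; ; attempt++) {
    const parsed = extractJson(reply);
    const checked: Checked<T> = parsed.ok ? validate(parsed.value) : { ok: false, error: "no JSON found in reply" };
    if (checked.ok) return checked.data;
    if (attempt >= 1) throw new Error(`invalid structured output after repair: ${checked.error}`);
    // One repair attempt that quotes the validation error back to the model.
    reply = await run(`${prompt}\n\nYour previous reply was invalid: ${checked.error}\nReply with corrected JSON only.`);
  }
}
```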

Tests cover happy path, fenced-block extraction, repair-loop success,
repair-loop exhaustion, schema violation, and spawn error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cockpit): audit-a-repo tab + checklist approval component

/new gets a third tab beside Free-form and GitHub PR. In audit mode
the user picks one of five preset lenses (de-slopify,
monolith-breakdown, code-review, engineering-review,
architecture-review) and supplies an absolute repo path. Submit fires
createChat with templateId=`audit-<preset>` — those built-in
templates land with the audit-phase implementation.

RunChecklist component lives at src/components/run-checklist/. It
takes the AuditItem[] surfaced by the audit phase's blocking event
and renders one row per item with a checkbox, complexity badge,
rationale, and file list. Default state has every item selected; the
user trims, then submits via the parent's onSubmit which JSON-encodes
the selected ids into the existing /chats/:id/resume `answer` field.
Wiring into the live-run UI lands with the audit phase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: PR-review chats bypass quota + tier surface on /voices

PR-review chats automatically set bypass_quota=true so the orchestrate
scheduler ignores voice.tier and runs the full fleet at maximum
capacity — reviews are short, parallel, and the user wants the
strongest opinion possible regardless of model tier.

PUT /voices/:id now accepts tier ('high' | 'medium' | 'low') and
monthly_budget_usd (non-negative or null), so the cockpit fleet page
can label voices by capability for the orchestrate scheduler to route
work against. Tests cover both new fields plus a chat round-trip
asserting bypass_quota defaults false and persists when set.

* feat: audit phase + 5 presets + audit-* templates

Wires the audit phase end-to-end:
- src/daemon/phases/audit.ts runs the structured-output adapter against
  the chosen preset, persists the parsed AuditItem[] to
  <chatDir>/audit-output.json plus raw model output to
  round-1/audit/output.md, and emits phase_progress with the items.
- src/daemon/runner.ts replaces the audit/orchestrate stub: audit
  invokes runAuditPhase, flips chat status to blocked so the cockpit
  renders the checklist UI, and exits cleanly. Orchestrate keeps the
  no-op stub until step 5 lands.
- 5 preset prompts (de-slopify, monolith-breakdown, code-review,
  engineering-review, architecture-review) frame what each lens looks
  for. The structured-output adapter handles JSON formatting; presets
  describe the audit lens only.
- 5 audit-* templates (one per preset), each a 2-phase audit -> orchestrate
  shape with three default workers. Auto-loaded by seedBuiltinTemplates.
- tests/audit-phase.test.ts covers preset-file presence and the
  audit-* template parse + shape contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: orchestrate phase + audit-resume wiring + tier-aware scheduler

Wires the audit→orchestrate handoff: the cockpit POSTs the user's
trimmed audit checklist to /chats/:id/resume, the resume handler
cross-checks ids against audit-output.json, persists the selection,
flips chat to drafting on the orchestrate phase, and re-fires the
runner. The runner now starts at chat.current_phase_idx so a resumed
chat lands directly on orchestrate.

The new orchestrate phase walks the approved AuditItem[] sequentially
(parallelism is an explicit non-goal for v1), picks a worker per item
via the pure tier-aware scheduler, cuts a per-item branch, dispatches
the worker via shim.runHeadless, captures git diff --stat, and
persists orchestrate-manifest.json for the diff-apply UI to consume.

The scheduler is a pure function with 9 unit tests covering tier
matching, bypass override, disabled-voice skipping, empty pool, and
unknown voice ids. Resume route has 10 tests exercising body
validation, id cross-check, status gating, and the happy path.
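A pure scheduler with the properties the tests cover might look like the sketch below. The matching policy here is hypothetical: pick the closest tier, break ties by worker order, and round-robin under bypass. The commit names the behaviours but not the exact rule.

```typescript
type Tier = "low" | "medium" | "high";

interface Voice {
  id: string;
  tier: Tier;
  enabled: boolean;
}

const TIER_RANK: Record<Tier, number> = { low: 0, medium: 1, high: 2 };

// Pure: same inputs always give the same pick. Returns null for an empty usable pool.
function pickWorker(
  pool: Voice[],
  workerIds: string[],
  wanted: Tier,
  bypassQuota: boolean,
  itemIdx = 0,
): Voice | null {
  const byId = new Map(pool.map((v): [string, Voice] => [v.id, v]));
  // Unknown ids and disabled voices are skipped rather than treated as errors.
  const usable: Voice[] = [];
  for (const id of workerIds) {
    const v = byId.get(id);
    if (v && v.enabled) usable.push(v);
  }
  if (usable.length === 0) return null;
  // bypass_quota: ignore tier and rotate through the whole fleet.
  if (bypassQuota) return usable[itemIdx % usable.length];
  // Otherwise prefer the closest tier; ties keep workerIds order.
  const distance = (v: Voice) => Math.abs(TIER_RANK[v.tier] - TIER_RANK[wanted]);
  return usable.reduce((best, v) => (distance(v) < distance(best) ? v : best));
}
```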

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: orchestrate manifest UI + checkout/open-pr daemon routes

- Run page reads audit-output.json + orchestrate-manifest.json on render
- LiveRunReal renders RunChecklist while blocked w/ audit items, then
  swaps to OrchestrateManifest panel once orchestrate completes
- New OrchestrateManifest component shows one row per worker w/
  Checkout / Open PR buttons (per-row inline feedback, no global toast)
- Daemon: GET /chats/:id/audit-items, GET /chats/:id/orchestrate-manifest,
  POST /chats/:id/workers/:idx/checkout (refuses on dirty tree),
  POST /chats/:id/workers/:idx/open-pr (gh pr create, bucketed failures)
- OrchestrateManifestSchema added to template-schema.ts; route + UI
  parse via the same shape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: harden resume race + branch validation + symlink TOCTOU + extractJson

Address /freview findings on the audit + orchestrate flow:

- Resume race (BLOCKER): two concurrent POSTs to /chats/:id/resume could
  both pass the `status=='blocked'` check and double-fire the runner.
  Guard with `getActiveRun` (catches the audit-finishing window before
  `.finally` clears the registry) and replace the status flip with an
  atomic `tryResumeFromBlocked` CAS conditional on `WHERE status =
  'blocked'`.
- Branch-name argument injection (BLOCKER): tighten zod regexes on
  `OrchestratePhase.branchPrefix` and `OrchestrateManifestEntry.branch`
  so values starting with `-` (or containing shell metachars) cannot
  flow into `git checkout` / `gh pr create` as flags.
- Symlink TOCTOU on checkout + open-pr (NON-BLOCKER): re-realpath
  `existing.repo_path` before passing to execFile cwd, mirroring the
  rerun-path pattern. Returns a structured validation error if the
  path no longer resolves.
- extractJson Path 4 (NON-BLOCKER): try `{...}` and `[...]` slices
  independently and prefer the longer parse, so prose like
  "mentions [stuff] before {object}" extracts the object instead of
  the bracket.
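One way to harden branch names is to validate the expanded name with an allow-list before it reaches `git checkout` or `gh pr create`, as in the sketch below. This is not the fork's actual zod regex. The extra checks mirror a subset of git's ref-name rules (no `..`, no component starting with `.`, no trailing `/`, `.`, or `.lock`). The `formatBranchName` signature here is hypothetical, and `{chatId}`/`{idx}` are the placeholders from the default prefix.

```typescript
// Whitelist: first character alphanumeric (never "-"), then a conservative safe set.
const SAFE_BRANCH = /^[A-Za-z0-9][A-Za-z0-9._\/-]*$/;

function isSafeBranchName(name: string): boolean {
  return (
    SAFE_BRANCH.test(name) &&
    !name.includes("..") && // git rejects ".." anywhere in a ref name
    !name.includes("//") &&
    !/(^|\/)\./.test(name) && // no path component may start with "."
    !name.endsWith("/") &&
    !name.endsWith(".") &&
    !name.endsWith(".lock")
  );
}

// Expands the {chatId}/{idx} placeholders, then validates the result before any git call.
function formatBranchName(prefix: string, chatId: string, idx: number): string {
  const name = prefix.replace("{chatId}", chatId).replace("{idx}", String(idx));
  if (!isSafeBranchName(name)) throw new Error(`unsafe branch name: ${JSON.stringify(name)}`);
  return name;
}
```

Because execFile passes arguments without a shell, metacharacters are mostly a defence-in-depth concern. The leading-`-` rule is what actually blocks flag injection.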

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: prod CJS build — drop import.meta + copy presets to dist

Two issues blocked `pnpm build:server`:

- `audit.ts` used `import.meta.url` for module-relative path resolution,
  but the server tsconfig compiles to CJS where `import.meta` is a
  syntax error. Replaced with `__dirname`, which works in both the
  compiled dist (native CJS) and tsx-driven dev (tsx ≥4 shims it in
  ESM mode).
- The `build:server` script copied `schema.sql` to dist/ but missed the
  preset markdown files in `src/daemon/presets/`. The audit phase's
  `loadPresetPrompt` resolves relative to `__dirname`, so a published
  install was hitting ENOENT on every audit run. Extended the copy
  step to mirror the preset directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: fold upstream T1+T2 fixes back into fork (12 commits) (#2)

* feat(cli): add diagnose command + crash-hook

Bundles two upstream changes that ship a self-service triage path for
chorus users hitting opaque failures:

- `chorus diagnose` walks the install, daemon, recent failed chats,
  voice health, and produces a sharable bug report.
- Crash hook captures uncaught exceptions in the CLI and writes them
  to a crash log alongside instructions to attach during a bug report.

Folded back from upstream chorus-codes/chorus:
  7ea712b feat: chorus diagnose command + crash hook for bug reports (#1)
  4a5ea20 fix(diagnose): realpath bin path + filter Next.js SSE noise (#4)

Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(cli): add quickstart self-test command

`chorus quickstart` runs a 30-second activation flow that verifies
the daemon comes up, the SQLite DB initializes, and a minimal chat
round-trips end-to-end. Aimed at first-run users who want to know
"is this thing actually working" before authoring a template.

Folded back from upstream chorus-codes/chorus:
  56610cf feat(cli): chorus quickstart — 30-second activation self-test (#30)

Co-Authored-By: chorus-codes <info@chorus.codes>

* fix(cli): use dynamic import for open package (Node 22 ERR_REQUIRE_ESM)

The `open` package and `chokidar` are both ESM-only as of recent
versions. On Node 22 (the daily-driver target) static `require()`
calls into them throw ERR_REQUIRE_ESM and crash the CLI at boot.

Switch to dynamic import in:
- src/cli/commands/start.ts (open browser after boot)
- src/cli/open-browser.ts (new helper)
- src/cli/index.ts (route open import)
- src/daemon/output-watcher.ts (chokidar file watch)

Includes upstream's post-merge hardening: the setTimeout that triggers
the browser open no longer takes a bare async callback, so a missing
default browser doesn't surface as an unhandled rejection.

Folded back from upstream chorus-codes/chorus:
  e8ca2ee fix(cli): dynamic import for open package (#14)
  dcd1837 fix: post-merge hardening for #14 (start.ts portion only;
          cli-precheck.test.ts portion ships with the Keychain fix)

Co-Authored-By: Julien Deudon <deudon.j@gmail.com>
Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(cockpit): seed empty round-1 so QUEUED renders from t=0

Before: when a chat starts but no reviewer has produced an event yet,
enrichRounds returned an empty rounds array and the live-run page
showed nothing for several seconds — the user couldn't tell whether
their chat had launched.

After: seed a synthetic round-1 with QUEUED placeholders for every
expected participant so the page renders the per-reviewer cards
immediately. Real events overwrite placeholders as they arrive.
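Here is a sketch of the seeding rule under simplified card shapes. `Card`, `Round` and `seedRounds` are hypothetical; the real enrichRounds carries much more state.

```typescript
// Hypothetical shapes; the cockpit's real round/participant types carry more fields.
type CardStatus = "QUEUED" | "RUNNING" | "DONE" | "FAILED";
interface Card { id: string; status: CardStatus; }
interface Round { round: number; participants: Card[]; }

// Every expected participant gets a card; real cards overwrite placeholders by id.
function withPlaceholders(expectedIds: string[], real: Card[]): Card[] {
  const seeded = expectedIds.map(
    (id): Card => real.find((c) => c.id === id) ?? { id, status: "QUEUED" },
  );
  const extras = real.filter((c) => !expectedIds.includes(c.id));
  return [...seeded, ...extras];
}

function seedRounds(rounds: Round[], expectedIds: string[]): Round[] {
  if (rounds.length === 0) {
    // No events yet: synthesise round 1 so QUEUED cards render from t=0.
    return [{ round: 1, participants: withPlaceholders(expectedIds, []) }];
  }
  return rounds.map((r) =>
    r.round === 1 ? { ...r, participants: withPlaceholders(expectedIds, r.participants) } : r,
  );
}
```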

Folded back from upstream chorus-codes/chorus:
  53e8fb6 feat(cockpit): seed empty round-1 so QUEUED placeholders
          render from t=0 (#2)

Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(daemon): runtime fallback-collision dedup across reviewer slots

When two reviewer slots both fall through their per-slot chains to the
same template-level fallback target (common case: every slot ends in
anthropic/claude-sonnet-4-6), both used to dispatch the same (lineage,
model) pair in parallel. That wasted cost and collapsed the lineage
diversity that is the whole point of multi-LLM peer review.

Build-time dedup (template-fallback.ts) couldn't catch it because each
slot only knows about other slots' PRIMARIES, not their fallback chains.

Fix: new per-chat/per-round (lineage, model) registry. reviewer-driver
calls tryClaim before each chain attempt and releases the claim in a
finally block. On collision, it returns null and emits
cli_warning(reason='fallback_collision'), so runWithChainFallback
advances to the next entry and the cockpit can show why the slot skipped.
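A synchronous sketch of the claim/release protocol follows. `ClaimRegistry` and `runChain` are hypothetical stand-ins for the registry and runWithChainFallback. The real daemon holds the claim across an async CLI dispatch, which is what makes parallel slots collide.

```typescript
// Hypothetical registry keyed by (chat, round, lineage, model).
class ClaimRegistry {
  private claims = new Set<string>();
  private key(chatId: string, round: number, lineage: string, model: string): string {
    return JSON.stringify([chatId, round, lineage, model]);
  }
  tryClaim(chatId: string, round: number, lineage: string, model: string): boolean {
    const k = this.key(chatId, round, lineage, model);
    if (this.claims.has(k)) return false; // another slot already holds this pair
    this.claims.add(k);
    return true;
  }
  release(chatId: string, round: number, lineage: string, model: string): void {
    this.claims.delete(this.key(chatId, round, lineage, model));
  }
}

interface ChainEntry { lineage: string; model: string; }

// Walk one slot's fallback chain, skipping entries another slot has claimed.
function runChain(
  registry: ClaimRegistry,
  chatId: string,
  round: number,
  chain: ChainEntry[],
  attempt: (entry: ChainEntry) => string | null,
  warn: (reason: string, entry: ChainEntry) => void,
): string | null {
  for (const entry of chain) {
    if (!registry.tryClaim(chatId, round, entry.lineage, entry.model)) {
      warn("fallback_collision", entry);
      continue; // advance to the next chain entry
    }
    try {
      const result = attempt(entry);
      if (result !== null) return result;
    } finally {
      registry.release(chatId, round, entry.lineage, entry.model);
    }
  }
  return null;
}
```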

Ported into fork's reviewer-driver.ts surgically so the verdict-isolation
refactor (2a2cde2) and per-slot repoPath threading stay intact.

Folded back from upstream chorus-codes/chorus:
  c4751fe feat(daemon): runtime fallback-collision dedup (#3)

Co-Authored-By: chorus-codes <info@chorus.codes>

* fix(daemon): write REVIEWER FAILED summary on pre-spawn failure

Before: when a reviewer's precheck fails (e.g. underlying CLI not
installed) or the chat is cancelled while the slot is queued for a
CLI semaphore slot, runReviewer used to return null silently —
leaving NO on-disk participant directory. The cockpit's enrich-rounds
loop then couldn't reconcile the synthesised template slot against
any real participant, so the card sat at "Queued — waiting for an
open slot." forever and the actual error was invisible.

Reproduction: install chorus on a host with only one CLI on PATH
(e.g. just claude-code), open a template that includes lineages
requiring codex/gemini/kimi, fire it. Every reviewer card stayed
"Queued", and the chat never visibly progressed even though it had
already failed.

Fix:
- Create the reviewer dir BEFORE the precheck runs.
- Add a writePreSpawnFailure helper that writes a `## REVIEWER FAILED`
  summary in the canonical format (Kind / Lineage / Model / message)
  that the cockpit's `parseFailureSummary` already understands.
- Wire it into the precheck-failed and cancelled-while-queued paths.

Card now transitions out of pending and shows the actual error
(cli_missing, cancelled, ...).

Folded back from upstream chorus-codes/chorus:
  afc59cc fix(daemon): REVIEWER FAILED summary on pre-spawn failure (#26)

Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(voices): auto-disable on persistent quota_exhausted + lsof timeout

Real pain (upstream #11): a Pro Gemini model on a Flash-only account
fails every chorus run with "exhausted your capacity on this model"
— but Gemini doesn't return a resetAt because the model isn't going
to become available for that account. Without auto-disable, the
runner keeps picking the dead voice on every chat and the user keeps
seeing the same opaque error.

Voice auto-disable:
- New src/lib/voice-failure-tracker.ts records per-voice consecutive
  quota_exhausted strikes in a settings counter.
- Trigger: 2 consecutive strikes WITH no resetAt → set
  voices.enabled=false + disabled_reason='auto_quota'.
- Counter resets on participant_done success; rate-limit strikes
  (hasResetAt=true) bypass the counter entirely so a transient
  429 + a later permanent failure can't trip the threshold on the
  first permanent strike.
- Wired into reviewer-driver alongside recordHealth; emits a
  cli_warning(reason='voice_auto_disabled') so the cockpit can show
  a one-line explanation.
- VoiceDisabledReason union gains 'auto_quota' (schema column was
  already TEXT — no migration).
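The strike rule reduces to a small pure transition. This sketch uses an in-memory `VoiceState` in place of the settings counter. It also assumes non-quota errors leave the counter untouched, which the text above doesn't specify.

```typescript
// Hypothetical in-memory stand-in for the settings-backed strike counter.
const STRIKE_THRESHOLD = 2;

interface VoiceState { strikes: number; enabled: boolean; disabledReason: string | null; }

type Outcome =
  | { kind: "success" }
  | { kind: "quota_exhausted"; hasResetAt: boolean }
  | { kind: "other_error" };

function recordOutcome(state: VoiceState, outcome: Outcome): VoiceState {
  switch (outcome.kind) {
    case "success":
      return { ...state, strikes: 0 }; // participant_done success resets the counter
    case "quota_exhausted": {
      if (outcome.hasResetAt) return state; // transient rate limit: bypass the counter
      const strikes = state.strikes + 1;
      return strikes >= STRIKE_THRESHOLD
        ? { strikes, enabled: false, disabledReason: "auto_quota" }
        : { ...state, strikes };
    }
    default:
      return state; // assumption: other errors neither add nor clear strikes
  }
}
```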

Lsof timeout (upstream #12):
- findPidsOnPort and findPidsOnPortWithSudo now bound execSync /
  execFileSync to 3s, so a hung lsof can no longer stall chorus boot.
  3s leaves headroom for a slow-but-functional lsof on a loaded macOS
  box while still bounding the hang case.

Ported into fork's reviewer-driver.ts tmux pollHandle + success
path. voices.ts disabled_reason union extended alongside fork's
voice-tier column.

Folded back from upstream chorus-codes/chorus:
  4f6becc v0.8.30 — voice auto-disable (#11) + lsof timeout (#12) (#17)

Co-Authored-By: chorus-codes <info@chorus.codes>
Co-Authored-By: Lumina Mao <luminamao@mac.lan>

* fix(daemon, schema): codex isolation + template-schema validation

Two issues caused chats to fail opaquely at run-start:

CODEX ISOLATION (#10, #16)
The user's ~/.codex/config.toml may declare MCP servers, plugins, or
notification hooks. In headless `codex exec` those integrations have
caused codex to hang or cancel mid-call — two independent
reproductions: codex as our reviewer (#10) and codex as MCP client of
chorus (#16). Add --ignore-user-config to every headless codex argv.
Extracted to a pure `buildHeadlessArgs(opts)` so the argv shape is
unit-testable.
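This sketch shows why a pure builder is easy to pin down in tests. `HeadlessOpts` and the position of pass-through args are assumptions. Only `exec` and `--ignore-user-config` come from the text above.

```typescript
// Hypothetical options; the fork's buildHeadlessArgs(opts) may accept more fields.
interface HeadlessOpts { prompt: string; extraArgs?: string[]; }

function buildHeadlessArgs(opts: HeadlessOpts): string[] {
  return [
    "exec",
    "--ignore-user-config", // skip ~/.codex/config.toml MCP servers, plugins, hooks
    ...(opts.extraArgs ?? []),
    opts.prompt,
  ];
}
```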

TEMPLATE VALIDATION (#15)
`reviewer.require > candidates.length` used to surface as "Job moves
immediately to failure upon Start press" — the runner queued, failed
to grant enough slots, and emitted an opaque chat-failure. Same for
`require > distinct lineages` when crossLineage:true. Both now
caught at TemplateSchema.parse() time with a clear error message
the user can fix before the run starts.
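Here are the two constraints restated as a plain function, rather than through the fork's ReviewerSchema.superRefine(). The field names on `ReviewerRule` are assumptions.

```typescript
// Field names are assumptions; the fork expresses these via ReviewerSchema.superRefine().
interface Candidate { lineage: string; model: string; }
interface ReviewerRule { require: number; crossLineage?: boolean; candidates: Candidate[]; }

function reviewerRuleIssues(rule: ReviewerRule): string[] {
  const issues: string[] = [];
  if (rule.require > rule.candidates.length) {
    issues.push(`require (${rule.require}) exceeds candidates (${rule.candidates.length})`);
  }
  if (rule.crossLineage) {
    const distinct = new Set(rule.candidates.map((c) => c.lineage)).size;
    if (rule.require > distinct) {
      issues.push(`require (${rule.require}) exceeds distinct lineages (${distinct}) with crossLineage`);
    }
  }
  return issues;
}
```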

ReviewerSchema.superRefine() additions slot in cleanly alongside the
fork's audit/orchestrate phase schema work — both are additive
constraints on the same ReviewerSchema object.

Folded back from upstream chorus-codes/chorus:
  8ed970b fix(daemon, schema): codex isolation + template validation

Co-Authored-By: chorus-codes <info@chorus.codes>

* fix(runner): honour iterate.onDisagreement accept-doer/escalate

The template schema, cockpit dialog, and SPEC-D-templates have always
exposed three values for iterate.onDisagreement — 'continue', 'escalate',
'accept-doer' — but the runner only honoured 'continue'. Picking the
other two from the cockpit form was a silent no-op: chats fell through
to phase_failed with 'doer_failed_all_rounds' regardless.

This wires both new branches into the round loop and the terminal
chat_done emission:

- 'accept-doer': after maxRounds without consensus, mark doerSucceeded
  and continue. The chat carries on (subsequent phases, ship, approval)
  as if reviewers had agreed on the doer's last answer.
- 'escalate': halt with status='failed' but verdict='request_changes'
  and error='escalated_on_disagreement', so cockpits can render
  "reviewers disagreed, needs human" distinctly from "doer broke."

Policy table extracted into a pure decidePhaseOutcome() helper so the
3 × 2 input matrix (policy × disagreement-in-last-round) is unit-tested
without standing up the full runChat scaffold.

Gated on disagreementInLastRound (reset at top of every round + on
doer-crash path) so a partial / empty doer answer can never be silently
"accept-doer"'d as final. Preserves the fork's existing
standardPhaseRoundsExhausted #7 surfacing for the 'continue' path; the
'escalate' path takes precedence with its own distinct chat_done.
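A sketch of the 3 × 2 table follows. The policy values and the escalate verdict/error strings come from the text above. The outcome `kind` names are hypothetical.

```typescript
// Hypothetical outcome names; policy values and escalate strings are from the commit text.
type OnDisagreement = "continue" | "escalate" | "accept-doer";

type PhaseOutcome =
  | { kind: "accept_doer" } // mark doerSucceeded and carry on
  | { kind: "escalate"; status: "failed"; verdict: "request_changes"; error: "escalated_on_disagreement" }
  | { kind: "rounds_exhausted" } // fork's existing 'continue' surfacing
  | { kind: "doer_failed"; error: "doer_failed_all_rounds" };

function decidePhaseOutcome(policy: OnDisagreement, disagreementInLastRound: boolean): PhaseOutcome {
  // A crashed or empty doer answer never sets the flag, so it can't be accepted.
  if (!disagreementInLastRound) return { kind: "doer_failed", error: "doer_failed_all_rounds" };
  switch (policy) {
    case "accept-doer":
      return { kind: "accept_doer" };
    case "escalate":
      return { kind: "escalate", status: "failed", verdict: "request_changes", error: "escalated_on_disagreement" };
    default: // "continue"
      return { kind: "rounds_exhausted" };
  }
}
```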

Upstream PRs #49, #50 (commit 67572e9).

Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(cli-precheck): cover macOS Keychain fallback for Claude Code v2

The fork already implements the Keychain fallback in cli-precheck
(hasDarwinKeychainEntry). This adds the missing test coverage:

- passes when no cred file but keychain entry exists
- blocks when no cred file and no keychain entry
- skips keychain check when cred file exists (fast-path preserved)
- does not consult keychain for non-anthropic lineages

vi.mock('node:child_process') uses the importOriginal spread pattern so
spawn / exec / etc. keep their real implementations — a bare module
replacement would silently break any sibling test that imports from
child_process.

Upstream PRs #7, #8, plus the dcd1837 test-mock hardening.

Co-Authored-By: Yura <yurahalych@gmail.com>
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cockpit): derive candidatesWithModels from snapshot's candidates field

Daemon-side TemplateSchema only carries `candidates` on each ReviewerRule.
The cockpit Template type expects `candidatesWithModels` populated —
enrich-rounds iterates that field to build slot→model mappings for
run-page cards. When fromRow parsed template_snapshot and cast it to
Template, the cast was a TypeScript lie: at runtime the parsed object
lacked candidatesWithModels, enrichRounds iterated zero reviewer slots,
and no model name reached the cards (badge appeared empty).

Derive candidatesWithModels at the parse seam (chats.fromRow) so the
cockpit's Template contract is honoured regardless of which path
produced the data. Idempotent — if a future daemon ever serialises
the field directly, that wins. Persona forwarded if present. Audit-
phase single-voice reviewers (no candidates array) are skipped via a
runtime narrow.

Upstream PR #6 (chorus-codes/chorus@ac0c7fd).

Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(diagnose): capture failure context — CLI smoke, voice health, recent failed chats

Extends `chorus diagnose` with three signals that triage the most common
breakage modes:

- **CLI smoke**: spawn `<bin> --version` per detected CLI with a hard 2s
  SIGKILL timeout (wrapper scripts may trap SIGTERM). Distinguishes
  `timedOut` from non-zero exit so the report can tell hangs apart from
  crashes.
- **Voice health**: counts `enabled=0` voices grouped by `disabled_reason`
  ('user' vs 'auto_missing' vs 'quota_exhausted'). Added
  `idx_voices_enabled` so the `WHERE enabled = 0` scan stays cheap as
  the table grows.
- **Recent failed chats**: last 5 chats whose status is 'failed',
  'blocked', or 'cancelled', plus the
  errored participants pulled from `~/.chorus/chats/<id>/round-*/<part>/_attempts.jsonl`.
  Only `errorMessageBytes` is exposed — raw error text never leaves the
  user's machine. `$HOME` is redacted from any embedded path strings via
  `redactHomePaths`.
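Below is a sketch of the smoke probe and the home-path redaction using Node's `spawnSync`. `smokeCli` and `SmokeResult` are hypothetical names; the fork's smokeOneCli may be async and shaped differently. On timeout, `spawnSync` reports `error.code === 'ETIMEDOUT'`, and that is what separates hangs from non-zero exits.

```typescript
import { spawnSync } from "node:child_process";
import { homedir } from "node:os";

// Hypothetical result shape for one CLI probe.
interface SmokeResult { ok: boolean; exitCode: number | null; timedOut: boolean; }

function smokeCli(bin: string, timeoutMs = 2000): SmokeResult {
  const r = spawnSync(bin, ["--version"], {
    timeout: timeoutMs,
    killSignal: "SIGKILL", // wrapper scripts may trap SIGTERM
    stdio: "ignore",
  });
  // spawnSync reports a timeout as an error with code ETIMEDOUT.
  const timedOut = (r.error as { code?: string } | undefined)?.code === "ETIMEDOUT";
  return { ok: !timedOut && r.status === 0, exitCode: r.status, timedOut };
}

// Replace every occurrence of $HOME so shared reports don't leak the username.
function redactHomePaths(text: string, home: string = homedir()): string {
  return home ? text.split(home).join("~") : text;
}
```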

Adapted from upstream chorus-codes/chorus#19 (0666dca). Preserves the
fork's existing diagnose shape and adds tests for smokeOneCli /
readLatestAttempt / formatReport rendering of the three new sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(diagnose): include no_review in recent failed chats query

The recent-failed-chats section was meant to surface per-participant
failure context from `_attempts.jsonl`, but the WHERE clause only
covered 'failed', 'blocked', 'cancelled'. The most common failure
shape — every reviewer down for missing CLI / auth / quota — ends the
chat in 'no_review', which was being silently filtered out. So the
exact case the section exists to diagnose returned an empty list,
forcing users back into manual log collection.

Adds 'no_review' to the IN-list and a regression test that asserts
both the status and a quota_exhausted errorKind render in the report.

Addresses chatgpt-codex review P2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: chorus-codes <info@chorus.codes>
Co-authored-by: Julien Deudon <deudon.j@gmail.com>
Co-authored-by: Lumina Mao <luminamao@mac.lan>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Yura <yurahalych@gmail.com>

* feat: fold upstream Grok + Local LLM + Keychain dual-probe (4 commits) (#3)

* feat(grok): detect Grok Build (xAI) + Level 1 orchestrator

Adds Grok Build CLI to detection, onboarding picker, /connect card,
diagnose smoke, init listing, and doctor labels. Grok auto-picks
chorus MCP from ~/.claude.json (verified empirically via `grok
inspect`) — no separate MCP wire needed.

The grok orchestrator reports connected=true when both the binary is
detected AND chorus is wired in ~/.claude.json (either top-level
mcpServers or any project-scoped mcpServers entry). connect() is a
no-op that points users at `chorus connect claude` if claude hasn't
been wired yet.
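Here is a sketch of the connected check. It assumes the usual ~/.claude.json layout: top-level `mcpServers`, plus `projects[<path>].mcpServers`. It also assumes an MCP server key of `chorus`. Both the layout and the function names are assumptions.

```typescript
// Assumed subset of ~/.claude.json.
interface ClaudeJson {
  mcpServers?: Record<string, unknown>;
  projects?: Record<string, { mcpServers?: Record<string, unknown> }>;
}

function hasChorusMcp(cfg: ClaudeJson, key = "chorus"): boolean {
  if (cfg.mcpServers && key in cfg.mcpServers) return true;
  // Any project-scoped registration also counts.
  return Object.values(cfg.projects ?? {}).some(
    (project) => project.mcpServers !== undefined && key in project.mcpServers,
  );
}

function grokConnected(binaryDetected: boolean, cfg: ClaudeJson): boolean {
  return binaryDetected && hasChorusMcp(cfg);
}
```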

Quickstart filters CLIs to those with shims, so grok-cli being
detected first no longer breaks the doer-pick flow. The cliToLineage
map remains the source of truth for reviewer-capable CLIs.

`docs/integrating-a-new-cli.md` captures the full Level 1/2/3
integration playbook for future CLIs — written while doing this so
the steps are tested.

Adapted from upstream chorus-codes/chorus#44 (6a00b00). No conflicts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(local): add Local LLM HTTP shim for OpenAI-compatible endpoints

Adds a `local` lineage that dispatches chat completions to any
OpenAI-compatible HTTP endpoint (Ollama, llama-swap, LM Studio, vLLM,
or anything that speaks `/v1/chat/completions`). No external
subscription or CLI binary required — only a running local inference
server.

Configuration: save a JSON secret under key `local` via Settings →
Local LLM:
  {"base_url": "http://127.0.0.1:11434/v1", "api_key": ""}

Model ids may use a `local:` prefix (e.g. `local:llama3`) which the
shim strips before dispatch, or bare model names directly. When no
secret is saved, falls back to Ollama's default port.
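A sketch of the config fallback and prefix handling follows. `resolveLocalConfig` and `stripLocalPrefix` are hypothetical. The parse guard mirrors the typed `config_parse` error described in the wiring sweep.

```typescript
// Hypothetical resolver; DEFAULT_BASE matches the example secret above (Ollama's port).
const DEFAULT_BASE = "http://127.0.0.1:11434/v1";

type LocalConfig =
  | { ok: true; baseUrl: string; apiKey: string }
  | { ok: false; error: "config_parse" };

function resolveLocalConfig(secret: string | null): LocalConfig {
  if (secret === null) return { ok: true, baseUrl: DEFAULT_BASE, apiKey: "" };
  let raw: unknown;
  try {
    raw = JSON.parse(secret);
  } catch {
    return { ok: false, error: "config_parse" }; // surface a typed error, don't throw
  }
  if (typeof raw !== "object" || raw === null) return { ok: false, error: "config_parse" };
  const { base_url, api_key } = raw as { base_url?: unknown; api_key?: unknown };
  return {
    ok: true,
    baseUrl: typeof base_url === "string" ? base_url : DEFAULT_BASE,
    apiKey: typeof api_key === "string" ? api_key : "",
  };
}

// `local:llama3` and `llama3` both dispatch as `llama3`.
function stripLocalPrefix(modelId: string): string {
  return modelId.startsWith("local:") ? modelId.slice("local:".length) : modelId;
}
```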

Wiring sweep (extends every exhaustive enum / Record so templates
can declare local voices without Zod errors):
- src/daemon/agents/local.ts — new HTTP shim with JSON.parse guard
  on the secret (yields a typed `config_parse` error event for
  malformed secrets instead of throwing inside the generator)
- src/daemon/agents/index.ts — register localShim, `local:` prefix
  routing in pickShimForVoice, add to isHttpDispatchedShim
- src/daemon/agents/types.ts — `local` in Lineage
- src/lib/template-schema.ts — `local` in both lineageEnum +
  reviewerLineageEnum
- src/lib/cli-health.ts — `local` in CliLineage + ALL_LINEAGES
- src/lib/cli-precheck.ts — empty CRED_PATHS, LOGIN_HINT, skip the
  file probe (same pattern as openrouter — auth lives in secrets table)
- src/lib/cockpit-types.ts — `local` in ReviewerLineage
- src/lib/lineage-maps.ts — `local` in DaemonLineage, UILineage,
  every label/dot/brand map; UI_LINEAGE_DEFAULT_MODEL[local] = ""
  (model IDs are endpoint-specific). Teal dot distinguishes local
  from openrouter's cyan
- src/components/phase-editor/constants.ts — LINEAGES list,
  DAEMON_TO_COCKPIT_LINEAGE
- src/components/template-dialog/constants.ts — COCKPIT_TO_DAEMON,
  DAEMON_TO_COCKPIT, DAEMON_DEFAULT_MODEL, FALLBACK_LINEAGES

Adapted from upstream chorus-codes/chorus#41 (716fa3a). The bundled
upstream commit also included Keychain dual-probe (#38) and
fallback-registry hold-on-success (#42) — those land in follow-up
commits in this PR so each concern is reviewable independently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg <7xshadowx7@gmail.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* feat(grok): Level 3 shim — full reviewer dispatch (happy-path unverified)

Promotes Grok Build from Level 2 (consumer-only) to Level 3 (full
reviewer shim). Chorus can now dispatch to grok-build as a doer or
reviewer in any template.

What's verified (empirically):
- Detection, headless-mode invocation pattern (`grok -p ...
  --output-format streaming-json --yolo --max-turns 1`), error event
  shape, exit-code semantics
- Failure path: free-tier auth produces clean quota_exhausted
  (SuperGrok Heavy subscription required) → voice auto-disables after
  N strikes
- All UI surfaces (model boxes, template-editor lineage picker,
  run-page participant card, cli-status-panel, onboarding picker,
  connect orchestrator)

What's specced but not run live (needs SuperGrok Heavy):
- Happy-path streaming-json text/end event parsing (followed
  `~/.grok/docs/user-guide/13-headless-mode.md` spec)
- Token/cost accounting — Grok doesn't surface usage in end event;
  estimateCostUsd returns 0

New files:
- src/daemon/agents/grok.ts — shim with `--max-turns 1` headless args
- src/daemon/agents/parsers/grok.ts — streaming-json + stderr parser
- tests/grok-parser.test.ts — 18 cases covering happy / error /
  robustness

Lineage sweep (the `xai` daemon lineage was already a legacy alias for
opencode, so this uses a fresh `grok` daemon lineage to avoid colliding
with that mapping; old YAML with `lineage:xai` still routes to opencode):
- Lineage / CliLineage / ReviewerLineage / DaemonLineage / UILineage
- LINEAGE_LABEL / LINEAGE_DOT / UI_LINEAGE_* / UI_LINEAGE_BRAND
- UI_LINEAGE_AVAILABLE_MODELS.grok = ['grok-build']
- UI_LINEAGE_DEFAULT_MODEL.grok = 'grok-build'
- template-schema lineageEnum + reviewerLineageEnum
- DB voices row schema (additive — old rows still validate)
- phase-editor LINEAGES + DAEMON_TO_COCKPIT_LINEAGE
- template-dialog COCKPIT_TO_DAEMON + DAEMON_TO_COCKPIT +
  DAEMON_DEFAULT_MODEL + FALLBACK_LINEAGES
- cli-status-panel + live-run-real helpers
- error-detector auth-prompt regex (SuperGrok signature on its own
  branch ABOVE the generic auth regex — classifies to
  quota_exhausted, not auth_invalid)

Voice seeding: grok-cli registered in SINGLE_MODEL_CLIS — auto-
creates the grok-cli voice (id=grok-cli, lineage=grok,
model_id=grok-build) on first daemon boot when the binary is
detected.

Auth flow: ~/.grok/auth.json file probe OR GROK_CODE_XAI_API_KEY env
short-circuit. Both verified in tests/cli-precheck.test.ts. Daemon
won't spawn grok without one or the other present — prevents the
browser-OAuth flow from hanging headless dispatch.

Total tests: 821 → 842 (+21).

Adapted from upstream chorus-codes/chorus#46 (f9dfba5). Conflicts
resolved by taking the union of fork's `local`-extended enums and
upstream's `grok`-extended enums (every Record / z.enum had to be
extended in both dimensions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* fix(cli-precheck): macOS Keychain dual-probe — also check "Claude Code" service

Claude Code v2.x stores OAuth credentials under two service names depending
on the auth flow:
  - `Claude Code-credentials` — Pro/Max OAuth via `claude login`
  - `Claude Code` (no suffix) — API-key auth + some Console-account flows

The previous single-service probe regressed to auth_missing for API-key
users on darwin. Refactor hasDarwinKeychainEntry to accept string | string[],
iterate candidates, short-circuit on first match. Each probe stays bounded
to 1.5s so a misconfigured keychain can't stall every spawn.
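Here is a sketch of the dual probe. The probe is injected so the candidate loop can be tested off macOS. The default shells out to `security find-generic-password -s <service>`. The injectable `probe` parameter is an addition for illustration and is not part of the fork's signature.

```typescript
import { execFileSync } from "node:child_process";

type Probe = (service: string) => boolean;

// `security find-generic-password` exits non-zero when no entry matches.
const securityProbe: Probe = (service) => {
  try {
    execFileSync("security", ["find-generic-password", "-s", service], {
      timeout: 1500, // a misconfigured keychain can't stall every spawn
      stdio: "ignore",
    });
    return true;
  } catch {
    return false; // missing entry, missing binary, or timeout
  }
};

function hasDarwinKeychainEntry(service: string | string[], probe: Probe = securityProbe): boolean {
  const candidates = Array.isArray(service) ? service : [service];
  return candidates.some((s) => probe(s)); // short-circuits on the first match
}
```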

Refs upstream issue #38 / commit 716fa3a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: PR review — local in voices enum, AGENT_TO_LINEAGE for grok/local, separate cred-precheck vs semaphore bypass

Addresses bot review on PR #3:

- Sourcery P2 (src/lib/db/voices.ts): VoiceRowSchema and VoiceUpsertInput
  only allowed `grok` in the new-lineage slot; `local` voices upserted
  via the (future) Local LLM connect flow would have failed Zod
  validation at runtime. Add `local` to both the enum and the union.

- Codex P2 (src/app/api/run-artifacts/[chatId]/route.ts +
  src/app/runs/[runId]/page.tsx): AGENT_TO_LINEAGE did not map
  `grok-cli` → `grok` nor `local` → `local`, so a real Grok or Local
  participant directory (`reviewer-grok-cli-N`, `reviewer-local-N`)
  resolved to a bogus lineage and rendered as an unbranded extra card
  while the placeholder slot stayed pending.

- Codex P2 (src/daemon/agents/index.ts +
  src/daemon/runner/{doer,reviewer}-driver.ts +
  src/lib/settings/concurrency.ts): the daemon used a single predicate
  `isHttpDispatchedShim` for two unrelated decisions — bypassing the
  CLI-credential precheck AND bypassing the local-CLI semaphore. That
  was safe for OpenRouter (truly remote) but wrong for the Local LLM
  shim, whose default endpoint is Ollama on 127.0.0.1: N concurrent
  reviewers + a doer can thrash VRAM/RAM on consumer hardware. Split
  into `isHttpDispatchedShim` (kept for cred-precheck bypass) and
  `bypassesLocalCliSemaphore` (only openrouter). Add `grok-cli` and
  `local` to CLI_LINEAGES with conservative per-CLI defaults (grok-cli
  matches gemini at 2; local defaults to 1, bump in /settings if your
  endpoint multiplexes).

Tests: 845 pass (unchanged), typecheck clean.

* fix: PR review — CodeRabbit pass (docs/Grok level, init+quickstart+local edges, regex, tests)

Addresses CodeRabbit's first batch of review comments on PR #3:

- docs/integrating-a-new-cli.md: contradictory level for Grok — line 3
  said "detection-only", line 15 said level 2, line 302 said level 3.
  Normalize to level-3 (the shim ships in this PR) and note that the
  level-2 orchestrator coexists for the consumer-side wiring.

- src/cli/commands/init.ts: `--connect grok` was rejected because the
  local Name union, ALL_NAMES list, and the `--connect` option help
  text omitted 'grok' even though detection labels and
  OrchestratorName already accepted it. Adding 'grok' to all three.

- src/cli/commands/quickstart.ts: the "install one of …" guidance
  printed when no CLIs are detected still listed only five CLIs; extend
  it to include Grok CLI so it matches the dispatchable set.

- src/daemon/agents/local.ts:
  * Empty `base_url` (e.g. user saved settings with an empty box)
    was passed through `??` as the URL and surfaced as an opaque fetch
    error; treat empty / whitespace-only as unset and fall back to
    DEFAULT_BASE. Strip trailing slashes while at it.
  * Trailing SSE payload was dropped when the server closed without
    a final blank-line delimiter (older Ollama, some vLLM configs) —
    the last text_delta could silently disappear, truncating answers.
    Extract event-dispatch + payload-extract into local helpers and
    flush the residual buffer after the read loop exits.

- src/lib/cli-detect.ts: grok regex documented "name OR bare-version"
  but only matched the name. Add the bare-version alternative; the
  basename guard already prevents cross-vendor matches.

- tests/grok-parser.test.ts: 4 cases narrowed event[0] under
  `if (events[0].type === 'error')` without a prior `expect(...).toBe`
  on type — a non-error event silently skipped the inner assertions.
  Add explicit type expectations before the narrowing.
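This sketch covers the two local.ts fixes. A blank `base_url` falls back to the default, and a residual SSE event is flushed when the stream ends without a trailing blank line. The helper names and the OpenAI-style `choices[0].delta.content` extraction are assumptions.

```typescript
// Hypothetical helpers; not the shim's exact code.
const DEFAULT_BASE = "http://127.0.0.1:11434/v1";

// Blank or whitespace-only base_url counts as unset; trailing slashes are stripped.
function normalizeBaseUrl(raw: string | undefined): string {
  const trimmed = (raw ?? "").trim();
  return trimmed === "" ? DEFAULT_BASE : trimmed.replace(/\/+$/, "");
}

// Pull text deltas out of one SSE event block (its "data:" lines).
function deltasFromEvent(block: string): string[] {
  const out: string[] = [];
  for (const line of block.split("\n")) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    try {
      const json = JSON.parse(payload) as { choices?: { delta?: { content?: string } }[] };
      const text = json.choices?.[0]?.delta?.content;
      if (text) out.push(text);
    } catch {
      // Malformed payloads are skipped in this sketch.
    }
  }
  return out;
}

function collectText(chunks: string[]): string {
  let buffer = "";
  const parts: string[] = [];
  for (const chunk of chunks) {
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    let sep = buffer.indexOf("\n\n");
    while (sep !== -1) {
      parts.push(...deltasFromEvent(buffer.slice(0, sep)));
      buffer = buffer.slice(sep + 2);
      sep = buffer.indexOf("\n\n");
    }
  }
  // Flush: the server may close without a final blank-line delimiter.
  if (buffer.trim() !== "") parts.push(...deltasFromEvent(buffer));
  return parts.join("");
}
```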

Tests: 845 pass (unchanged), typecheck clean.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg <7xshadowx7@gmail.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* feat: fold upstream contributor stack — repoPath default + CRLF persona parser (#4)

Folds the cross-platform pieces of upstream commit 781bc42 ("Contributor
stack: claude orchestrator + repoPath + Windows spawn (#39)") into the
fork, intentionally omitting Windows-specific hunks.

Included:
  - src/mcp/tools.ts: add safeCwd() helper + default `repoPath` on
    create_chat to safeCwd() when caller omits it. Previously the daemon
    fell back to its own cwd (packageRoot), which caused relative file
    paths in `files: [...]` to silently resolve to the chorus install
    dir and miss. MCP servers spawned by Claude Code / Codex / Gemini
    inherit the host's cwd (= the user's project), so safeCwd() lands
    at the right path automatically. safeCwd() also catches ENOENT from
    process.cwd() and falls back to homedir.
  - src/lib/personas.ts: normalize CRLF → LF in the frontmatter parser
    so persona .md files checked out with Windows line endings don't
    fail `missing YAML frontmatter`. Cross-platform safe.
  - src/daemon/orchestrators/index.ts: drop stale comment block about
    Claude having a project-config side-effect (the fork's orchestrator
    long since moved to user-scope).
  - tests/mcp-create-chat-repo-path.test.ts (+4 tests): cover explicit
    repoPath, cwd default, full-body forwarding, and ENOENT fallback
    to homedir.
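Below is a sketch of both cross-platform pieces. `safeCwd` follows the description above, though rethrowing non-ENOENT errors is an assumption. `splitFrontmatter` is a hypothetical minimal parser that shows where the CRLF normalization sits.

```typescript
import { homedir } from "node:os";

// process.cwd() throws ENOENT when the working directory has been deleted.
function safeCwd(): string {
  try {
    return process.cwd();
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") return homedir();
    throw err; // assumption: other errors still propagate
  }
}

// Normalize CRLF before matching, so Windows checkouts parse like LF files.
function splitFrontmatter(source: string): { frontmatter: string; body: string } {
  const text = source.replace(/\r\n/g, "\n");
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(text);
  if (!match) throw new Error("missing YAML frontmatter");
  return { frontmatter: match[1] ?? "", body: match[2] ?? "" };
}
```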

Omitted (Windows-only hunks):
  - src/cli/commands/update.ts (shell: win32 for npm self-update)
  - src/daemon/routes/system.ts (shell: win32 for opencode probe)
  - src/daemon/orchestrators/{codex,gemini,kimi}.ts (shell: win32 tweaks)
  - src/lib/cli-detect.ts (SAFE_WIN_PATH regex + buildVersionSpawn)
  - src/lib/voices.ts (discoverNpmPrefixes Windows shell)
  - tests/cli-detect.test.ts (Windows-specific cmd.exe escape tests)

Also omitted:
  - src/daemon/orchestrators/claude.ts: upstream shells out to
    `claude mcp add --scope user`. Fork already implements user-scope
    registration via direct ~/.claude.json patch (more robust — no
    dependency on `claude` binary in PATH at registration time, plus
    sweeps stale project-scoped entries). Keeping fork's version.
  - tests/claude-orchestrator.test.ts: tests the upstream shell-out
    approach the fork doesn't use.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: pr-babysit design sketch (judge workflow + state machine)

Three-phase delivery plan for moving the PR babysitter loop out of
Claude Code and into the chorus daemon. Covers GH App + webhook
architecture, the judge phase (validity/category/confidence + shadow
judge pattern), fix routing rules (trivial/targeted/architectural →
Kimi/Sonnet/Opus), circuit breakers, merge gate, multi-PR
coordination, and proposed DB schema.

Design only — no code in this commit. Five open questions left for
team decisions in §"Open questions for the team".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: prime doer/reviewer prompts with AGENTS.md + CLAUDE.md

When a chat carries a repoPath, read AGENTS.md / CLAUDE.md from the
repo and prepend them inside a <project_guidelines> fence (between the
persona block and the phase header). Same TOCTOU + fence-breakout
defences as the persona/attached-file readers: lstat-rejects symlinks,
strips </project_guidelines> from contents, truncates each file at
16 KB with a visible marker.
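Here is a sketch of the fence construction. `fenceGuidelines` and the truncation marker text are hypothetical, and the lstat symlink rejection is left out. The close-tag strip runs in a loop, because removing one occurrence can splice a new one together.

```typescript
// Hypothetical helper; the 16 KB cap and the tag name come from the commit text.
const MAX_GUIDELINE_BYTES = 16 * 1024;
const CLOSE_TAG = "</project_guidelines>";

function fenceGuidelines(files: { name: string; content: string }[]): string {
  if (files.length === 0) return "";
  const sections = files.map(({ name, content }) => {
    let safe = content;
    // Loop: stripping one occurrence can splice a new close tag together.
    while (safe.includes(CLOSE_TAG)) safe = safe.split(CLOSE_TAG).join("");
    const bytes = Buffer.from(safe, "utf8");
    if (bytes.length > MAX_GUIDELINE_BYTES) {
      // A cut multibyte character decodes as U+FFFD; acceptable for a prompt.
      safe = bytes.subarray(0, MAX_GUIDELINE_BYTES).toString("utf8") + "\n[... truncated at 16 KB ...]";
    }
    return `## ${name}\n${safe}`;
  });
  return `<project_guidelines>\n${sections.join("\n\n")}\n${CLOSE_TAG}`;
}
```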

Lets users carry project conventions into every doer + reviewer turn
by editing a file the rest of their AI tooling already reads, without
adding a new chorus-specific storage layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: verify phase — exec package.json chorus.verify, judge with reviewer

Splits verify out of the StandardPhase shape into its own VerifyPhase
(no doer, reviewer required). Reads `chorus.verify` from package.json,
runs it via execFile in repoPath with a configurable command timeout
(default 5 min, max 30 min), captures stdout/stderr/exit, and feeds
the fenced artifact through the existing runReviewers flow.

Env is scrubbed to PATH/HOME/LANG/LC_ALL/NODE_ENV so a `chorus.verify`
script can't leak inherited credentials into the artifact. Output
streams cap at 64 KB each with a visible truncation marker. Timeout
detection matches both ETIMEDOUT and (killed && SIGTERM) shapes — node
sometimes only sets the signal.
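A sketch of the three verify hardening helpers follows. The names and the truncation marker are hypothetical. The allow-list, the 64 KB cap and the two timeout shapes come from the text above.

```typescript
// Hypothetical helpers illustrating the verify runner's hardening.
const ALLOWED_ENV = ["PATH", "HOME", "LANG", "LC_ALL", "NODE_ENV"] as const;
const MAX_STREAM_BYTES = 64 * 1024;

// Only allow-listed variables reach the chorus.verify process.
function scrubEnv(env: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of ALLOWED_ENV) {
    const value = env[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

function capStream(text: string): string {
  const bytes = Buffer.from(text, "utf8");
  if (bytes.length <= MAX_STREAM_BYTES) return text;
  return bytes.subarray(0, MAX_STREAM_BYTES).toString("utf8") + "\n[... output truncated at 64 KB ...]";
}

// Accept either timeout shape: code ETIMEDOUT, or killed=true with SIGTERM.
function isTimeout(err: { code?: unknown; killed?: boolean; signal?: string | null }): boolean {
  return err.code === "ETIMEDOUT" || (err.killed === true && err.signal === "SIGTERM");
}
```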

The artifact lands at round-1/doer-verify-runner/answer.md so the
cockpit renders it identically to a doer answer. A phase_progress event
with kind="verify_command" surfaces the command-level outcome
(exitCode, timedOut, duration) without needing a brand-new event type
through the SSE multiplex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: TDD loop — verify failure re-prompts named feedback phase doer

Verify phase gains optional `feedbackPhase` + `maxIterations` (default
5, max 20). On verify failure, the runner re-fires the named phase's
doer through `runDoer` with the verify output threaded in via
`priorRoundFeedback` — same hook a normal disagree-iterate loop uses,
so the doer sees the failure in the slot it already knows how to act
on. Loops until verify passes or the cap is hit.

Reviewers only run on the FINAL iteration (success or final failure);
intermediate iterations skip the reviewer pass because exit code is
the loop signal and asking the reviewer N times to judge the same
class of failure would just burn tokens.

Iterations write to round-1001, round-1002, … (TDD_ROUND_OFFSET=1000)
so the synthetic TDD-loop round dirs can't collide with the original
feedback phase's rounds in the same chat dir. Misconfigured templates
(feedbackPhase points at a non-existent or non-standard phase) fail
loudly at the top of the verify phase, before the first command run.
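Here is a synchronous sketch of the loop, with injected stand-ins for runVerify, runDoer and runReviewers (all hypothetical). Whether the initial verify run counts toward `maxIterations` is an assumption in this sketch.

```typescript
// Hypothetical driver; the injected functions stand in for the real runners.
const TDD_ROUND_OFFSET = 1000;

interface VerifyResult { passed: boolean; output: string; }

interface TddDeps {
  runVerify: () => VerifyResult;
  runDoer: (round: number, priorRoundFeedback: string) => void;
  runReviewers: (final: VerifyResult) => void;
}

function runTddLoop(deps: TddDeps, maxIterations = 5): { passed: boolean; iterations: number } {
  let result = deps.runVerify();
  let iterations = 0;
  while (!result.passed && iterations < maxIterations) {
    iterations++;
    // Re-prompt the feedback phase's doer in round 1001, 1002, ... with the failing output.
    deps.runDoer(TDD_ROUND_OFFSET + iterations, result.output);
    result = deps.runVerify();
  }
  deps.runReviewers(result); // reviewers judge only the final iteration
  return { passed: result.passed, iterations };
}
```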

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit DB — jobs + decisions tables, query helpers, 25 tests

Foundation for the PR-babysit autonomous review loop (Phase A of
docs/pr-babysit-design.md). Two tables:

- babysit_jobs: one row per (repo, pr_number) under review, state-machine
  tracked (idle → judging → fixing → verifying → pushing → quiet_check →
  merged | escalated). UNIQUE (repo, pr_number) prevents double-registration.
  ended_at auto-stamps on first terminal transition and is sticky.

- babysit_decisions: append-only audit trail of every judge call. Two-stage
  insert — judge writes validity/category/confidence/outcome=NULL, the fix
  runner stamps outcome (+ commit) when it resolves. getAttemptCount drives
  the per-comment circuit breaker (same comment hash flagged N+ times → stop
  trying, escalate).
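Below is the sticky `ended_at` rule expressed as a pure transition over an in-memory row. The fork does this in SQL; `transition` is hypothetical.

```typescript
// Hypothetical in-memory mirror of a babysit_jobs row.
type JobState =
  | "idle" | "judging" | "fixing" | "verifying" | "pushing" | "quiet_check"
  | "merged" | "escalated";

const TERMINAL: ReadonlySet<JobState> = new Set<JobState>(["merged", "escalated"]);

interface Job { state: JobState; endedAt: string | null; }

function transition(job: Job, next: JobState, now: string): Job {
  // ended_at is stamped on the first terminal transition and never overwritten.
  const endedAt = job.endedAt ?? (TERMINAL.has(next) ? now : null);
  return { state: next, endedAt };
}
```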

Schema lives in schema.sql for fresh-DB init AND as idempotent CREATE TABLE
IF NOT EXISTS in connection.ts so DBs that pre-date this version pick the
tables up on next boot (matches the personas/voices migration pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit comment fetcher — gh CLI pull + author classify + sha256 hash

Pulls PR review (line-anchored) + issue (conversation) comments via `gh api`,
normalizes them into the shape the babysit judge consumes:

- author classification: recognises CodeRabbit / Sourcery / Greptile /
  Codex by login regex; falls back to GitHub user.type=Bot + [bot] suffix
  for unknown bots. Humans always come through as isBot=false / bot=null.
- sha256(body) keyed so the per-comment circuit breaker can recognise
  "the same bot re-flagged the same exact text" across polling ticks.
- partial-data tolerance: if one of the review/issue endpoints fails, we
  still return what the other one returned (a 500 on one shouldn't blank
  the whole tick). Only when BOTH fail do we surface a typed reason.
- `since=` parameter so the polling loop doesn't re-hash every comment
  on every tick.
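The classification and hashing steps might look roughly like this. The login patterns are placeholders (the commit does not list the real regexes), and the fallback treats either `user.type === "Bot"` or a `[bot]` suffix as a bot, which is an assumption about how the two signals combine.

```typescript
import { createHash } from "node:crypto";

// Placeholder patterns for the known review bots; the fetcher's real
// regexes are not shown in this commit message.
const KNOWN_BOTS: ReadonlyArray<[string, RegExp]> = [
  ["coderabbit", /^coderabbit[\w-]*\[bot\]$/i],
  ["sourcery", /^sourcery[\w-]*\[bot\]$/i],
  ["greptile", /^greptile[\w-]*\[bot\]$/i],
  ["codex", /codex[\w-]*\[bot\]$/i],
];

interface Author {
  isBot: boolean;
  bot: string | null; // known-bot label, "unknown" for other bots, null for humans
}

function classifyAuthor(login: string, userType: string): Author {
  for (const [name, re] of KNOWN_BOTS) {
    if (re.test(login)) return { isBot: true, bot: name };
  }
  // Fallback for bots we don't recognise by name.
  if (userType === "Bot" || login.endsWith("[bot]")) {
    return { isBot: true, bot: "unknown" };
  }
  return { isBot: false, bot: null };
}

// Stable key for the per-comment circuit breaker: identical text, identical hash.
function bodyHash(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}
```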

16 tests covering author classify, sha256 stability, gh shellout via a
fake `gh` on PATH, partial-failure, auth/404 classification, since arg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit judge — classify PR-bot comments + pure action router

Reads one PR comment + diff context, asks the model to classify it as
valid/invalid/partially_valid/unsure with a category from a fixed menu
(apply-trivial | apply-targeted | apply-architectural | reply-disagree |
reply-ack | defer-to-human) and a confidence score in [0,1].

Three pieces:
- buildJudgePrompt(comment, ctx): pure prompt construction. Includes PR
  metadata, comment body, anchored code snippet, and (crucially) prior
  decisions on the same comment hash — so re-judgements after a failed
  fix tilt toward reply-disagree rather than re-trying the same fix.
- judgeComment(opts): drives requestStructured against the JudgeOutputSchema,
  flags judgements below the 0.7 confidence threshold as belowThreshold.
- decideAction(judgement, args): PURE routing function. Maps (judgement,
  attemptCount, belowThreshold) → fix/reply/escalate/skip. State machine
  in babysit/runner.ts (next session) stays a thin dispatcher.

Routing rules in priority order: per-comment cap → confidence threshold
→ defer-to-human → reply-* → apply-* (with invalid/unsure self-correction
to escalate, since acting on a comment we judged invalid is incoherent).

20 tests: prompt composition (bot vs human, snippet, prior decisions,
multi-line bodies, threshold mention, full category menu) + routing
table (every category × every priority rule).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit MCP tool + daemon registrar + pr-babysit preset

Phase A MCP entry point for the PR-babysit loop.

- `mcp__chorus__babysit_pr`: registers a PR for autonomous bot-comment
  judging. Idempotent — re-calling with the same URL returns the existing
  job without resetting state mid-flight.
- Daemon routes:
    POST /babysit/jobs       — upsert idle job
    GET  /babysit/jobs       — list (filters: ?active=true, ?state=…)
    GET  /babysit/jobs/:id   — single job + recent decisions
- `templates/pr-babysit.yaml`: declares the judge roster (Haiku primary,
  Sonnet fallback). Validates against TemplateSchema as a `review_only`
  phase so seedBuiltinTemplates loads it cleanly; the babysit runner
  (next release) reads `phase.reviewer.candidates` for model selection
  but doesn't drive this phase through runner.ts.

13 route tests covering happy path, idempotent re-register, missing/
malformed URL, state filter, job-with-decisions detail view. MCP wrapper
schema added to tools.ts.

Note: src/daemon/index.ts diff is mostly Prettier rewriting single→double
quotes after my import addition; the real semantic change is the two
lines wiring registerBabysitRoutes into registerAll().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit GH App auth — RS256 JWT + installation token cache

GitHub App auth bundle for the PR-babysit loop. Two-tier model:
mint a 9-min RS256 JWT from the App private key (Node built-in
crypto, no jsonwebtoken dep), then exchange it for a 1-hour
installation token cached in-memory with a 5-min refresh buffer
so we never present a token about to expire.
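A sketch of both tiers using only `node:crypto`, under these assumptions: `mintAppJwt`, `InstallationTokenCache` and the injected `exchange` function are hypothetical names, the exchange (in practice a call to GitHub's installation access-token endpoint) is stubbed out, and `iat` is backdated 60 s for clock skew, as GitHub's App docs suggest.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// base64url without padding, as JWS requires.
const b64url = (buf: Buffer): string => buf.toString("base64url");

// Mint an RS256 App JWT that expires 9 minutes from now.
function mintAppJwt(appId: string, privateKeyPem: string, nowSec: number): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "RS256", typ: "JWT" })));
  const payload = b64url(
    Buffer.from(JSON.stringify({ iat: nowSec - 60, exp: nowSec + 9 * 60, iss: appId })),
  );
  const signingInput = `${header}.${payload}`;
  // crypto.sign with "sha256" and an RSA key yields RSASSA-PKCS1-v1_5, i.e. RS256.
  const signature = sign("sha256", Buffer.from(signingInput), privateKeyPem);
  return `${signingInput}.${b64url(signature)}`;
}

// Handy in tests: check the signature against the matching public key.
function verifyAppJwtSignature(jwt: string, publicKeyPem: string): boolean {
  const [header, payload, sig] = jwt.split(".");
  if (!header || !payload || !sig) return false;
  return verify("sha256", Buffer.from(`${header}.${payload}`), publicKeyPem, Buffer.from(sig, "base64url"));
}

interface CachedToken {
  token: string;
  expiresAtMs: number;
}

const REFRESH_BUFFER_MS = 5 * 60 * 1000;

class InstallationTokenCache {
  private readonly cache = new Map<number, CachedToken>();
  private readonly exchange: (installationId: number) => Promise<CachedToken>;
  private readonly now: () => number;

  constructor(exchange: (installationId: number) => Promise<CachedToken>, now: () => number = Date.now) {
    this.exchange = exchange;
    this.now = now;
  }

  async get(installationId: number): Promise<string> {
    const hit = this.cache.get(installationId);
    // Reuse only while more than the refresh buffer remains.
    if (hit && hit.expiresAtMs - this.now() > REFRESH_BUFFER_MS) return hit.token;
    const fresh = await this.exchange(installationId);
    this.cache.set(installationId, fresh);
    return fresh.token;
  }
}

// Usage with a throwaway key pair (a real App key comes from the secrets row).
const { publicKey, privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});
const jwt = mintAppJwt("12345", privateKey, Math.floor(Date.now() / 1000));
```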

Config persisted as a single global row in secrets (provider=
github_app, kind=gh_app, value=JSON of appId/privateKey/
webhookSecret) — chorus is single-tenant, the App is owned by
the daemon operator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit webhook HMAC verify helper

Pure-crypto helper for verifying GitHub's X-Hub-Signature-256
against the raw request body. Constant-time comparison via
crypto.timingSafeEqual + a typed discriminated-union failure
mode (missing/malformed/mismatch/secret_not_configured) so a
caller can log the precise reason without leaking it back to
the sender.
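A sketch of the verifier, assuming the raw body arrives as a `Buffer` and the header carries GitHub's `sha256=<hex>` form; the function name and result shape are illustrative, not the fork's exported API.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type VerifyResult =
  | { ok: true }
  | { ok: false; reason: "missing" | "malformed" | "mismatch" | "secret_not_configured" };

function verifyWebhookSignature(
  rawBody: Buffer,
  header: string | undefined,
  secret: string | undefined,
): VerifyResult {
  if (!secret) return { ok: false, reason: "secret_not_configured" };
  if (!header) return { ok: false, reason: "missing" };
  // Expect "sha256=" followed by exactly 64 lowercase hex characters.
  const hex = /^sha256=([0-9a-f]{64})$/.exec(header)?.[1];
  if (!hex) return { ok: false, reason: "malformed" };
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(hex, "hex");
  // Both buffers are 32 bytes here, so timingSafeEqual cannot throw on length.
  return timingSafeEqual(expected, received) ? { ok: true } : { ok: false, reason: "mismatch" };
}
```

The regex doubles as the length check: `timingSafeEqual` throws on buffers of different lengths, so the helper only reaches it with two 32-byte digests.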

Not wired into a route this session — the daemon only polls —
but the verifier ships with full coverage now since shipping
the route later without it is a sharp footgun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit GH client — App-auth + CLI-fallback request shim

Unified GitHub-API surface for the babysit loop with two routes:

  - App auth when installationId is set AND App config persisted:
    mint/reuse a cached installation token, retry once on 401 (key
    rotation), retry once on 5xx with backoff.
  - gh CLI fallback otherwise. Inherits the developer's local gh
    auth. Requests that carry a body (writes) return a typed error on
    this path, pointing the operator at the App-auth on-ramp; the
    stdin plumbing is postponed until the runner actually needs to
    write through the CLI.

Routing is transparent to the caller; they always get back a
normalized {status, body|errorText, authMode} response.
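The App-auth retry policy can be sketched with the transport injected, which also keeps it testable without the network. `requestWithAppAuth`, `send` and `getToken` are hypothetical names, and the fixed delay is an assumption; the real client's backoff schedule isn't specified.

```typescript
interface GhResponse {
  status: number;
  body?: unknown;
  errorText?: string;
  authMode: "app" | "cli";
}

// `send` performs one HTTP call with the given token; `getToken(true)`
// discards the cached installation token and mints a fresh one.
async function requestWithAppAuth(
  send: (token: string) => Promise<GhResponse>,
  getToken: (forceRefresh: boolean) => Promise<string>,
  backoffMs = 500,
): Promise<GhResponse> {
  let res = await send(await getToken(false));
  if (res.status === 401) {
    // Likely key rotation or a revoked token: re-mint and retry once.
    res = await send(await getToken(true));
  }
  if (res.status >= 500) {
    // Transient server error: back off, then retry once.
    await new Promise<void>((resolve) => setTimeout(resolve, backoffMs));
    res = await send(await getToken(false));
  }
  return res;
}
```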

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit per-PR worktree manager

Idempotent worktree lifecycle for the fix loop:

  - ensureWorktree() — create or reuse ~/.chorus/worktrees/
    <owner>__<name>/pr-<n>/, fetching + checking out the PR head
    branch. Wipes a stale directory if one exists from a half-
    failed previous run.
  - pullLatest() — fetch + reset --hard origin/<branch>. Hard
    reset is safe only because the runner pushes every commit it
    makes; documented inline so it doesn't get cargo-culted.
  - removeWorktree() — git worktree remove --force + rm -rf as
    belt-and-suspenders for older git versions.

Branch names from webhook payloads are validated against the
same shell/path-traversal rules used elsewhere in the daemon
before being passed to git.
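A hypothetical sketch of the worktree path layout and a conservative branch-name allowlist. The daemon's actual validation rules aren't shown in this commit, so the checks below are assumptions chosen to reject shell metacharacters, option-like names and `..` traversal.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Assumed rules, not the daemon's real ones.
function isSafeBranchName(branch: string): boolean {
  return (
    /^[A-Za-z0-9._\/-]+$/.test(branch) && // no spaces, quotes, $, `, ; etc.
    !branch.startsWith("-") && // would be parsed as a git option
    !branch.includes("..") && // path traversal and git revision ranges
    !branch.startsWith("/") &&
    !branch.endsWith("/") &&
    !branch.endsWith(".lock")
  );
}

// ~/.chorus/worktrees/<owner>__<name>/pr-<n>/
function worktreePath(owner: string, name: string, prNumber: number, home: string = homedir()): string {
  return join(home, ".chorus", "worktrees", `${owner}__${name}`, `pr-${prNumber}`);
}
```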

Tests use real git against a bare-remote fixture per case;
mocking runAsync would leave 90% of the surface untested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit scheduler — bounded concurrency + per-job mutex

Tick driver for the babysit loop with three invariants the
production daemon needs:

  1. Per-job serialization. A Set keyed by job id, checked-and-set
     atomically inside dispatch(), prevents two ticks on the same
     PR from racing over the worktree, decisions table, or reply
     comment.
  2. Bounded global concurrency. maxConcurrent (default 3) caps
     simultaneous jobs so judge-model quotas + gh-API pressure
     stay predictable as the backlog grows.
  3. Clean drain on SIGTERM. stop() clears the interval AND awaits
     in-flight jobs so we never leave a worktree mid-commit.

Errors thrown from runJob are caught + logged so a single broken
PR can't poison the whole loop. The mutex is always released in
finally so the next tick can re-dispatch.
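The three invariants fit in a small class. This is a sketch, not the fork's `BabysitScheduler`: `BoundedScheduler`, `tick` and the job-id list are assumptions, and the check-and-set is atomic only because it runs within one synchronous turn of the event loop.

```typescript
class BoundedScheduler {
  private readonly inFlightIds = new Set<string>(); // per-job mutex
  private readonly inFlight = new Set<Promise<void>>(); // for draining on stop()
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly runJob: (id: string) => Promise<void>;
  private readonly maxConcurrent: number;

  constructor(runJob: (id: string) => Promise<void>, maxConcurrent = 3) {
    this.runJob = runJob;
    this.maxConcurrent = maxConcurrent;
  }

  start(listIds: () => string[], intervalMs: number): void {
    this.timer = setInterval(() => this.tick(listIds()), intervalMs);
  }

  // Dispatches what fits this tick and returns the ids it started.
  tick(ids: string[]): string[] {
    const dispatched: string[] = [];
    for (const id of ids) {
      if (this.inFlightIds.size >= this.maxConcurrent) break; // global cap
      if (this.inFlightIds.has(id)) continue; // same PR already running
      this.inFlightIds.add(id);
      const p: Promise<void> = Promise.resolve()
        .then(() => this.runJob(id))
        .catch((err: unknown) => {
          console.error(`babysit job ${id} failed:`, err); // never poison the loop
        })
        .finally(() => {
          this.inFlightIds.delete(id); // always release the mutex
          this.inFlight.delete(p);
        });
      this.inFlight.add(p);
      dispatched.push(id);
    }
    return dispatched;
  }

  // Clear the interval AND wait for in-flight jobs.
  async stop(): Promise<void> {
    if (this.timer !== undefined) clearInterval(this.timer);
    await Promise.all([...this.inFlight]);
  }
}
```

Wrapping `runJob` in `Promise.resolve().then(...)` also catches synchronous throws, so the mutex is released even if a handler fails before its first `await`.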

Not yet wired into daemon startup — the state-machine runner that
becomes runJob ships in the next commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit state machine — full judge→fix→verify→push→quiet loop

End-to-end driver tying the existing pieces together. One entry
point (runJob) the scheduler calls per tick; per-state handlers
dispatch the work and return a transition descriptor; the driver
owns all babysit_jobs writes so handlers stay pure-ish.

State transitions:

  idle         -> judging   (provisions worktree)
  judging      -> fixing    (any apply-* decision)
  judging      -> quiet_check (replies only, or empty)
  judging      -> escalated (defer-to-human, low-confidence, cap-hit,
                             judge spawn/parse failure)
  fixing       -> verifying (doer produced file edits)
  fixing       -> escalated (doer failure; mark decision escalated)
  verifying    -> pushing   (verify passed)
  verifying    -> escalated (verify failed; no auto-retry — the
                             per-comment cap path catches genuine stuck)
  pushing      -> quiet_check (pushed; record commit sha + fix_commits++)
  pushing      -> escalated (git failure)
  quiet_check  -> merged    (PR merged on GitHub)
  quiet_check  -> judging   (new bot comments arrived)
  quiet_check  -> quiet_check (no change)
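The transition table can be encoded as an adjacency map that the driver checks before writing a new state. The names are hypothetical, and the map mirrors only the edges listed above (the `paused` state arrives in a later commit).

```typescript
type State =
  | "idle"
  | "judging"
  | "fixing"
  | "verifying"
  | "pushing"
  | "quiet_check"
  | "merged"
  | "escalated";

// Edges transcribed from the commit's state table.
const TRANSITIONS: Record<State, readonly State[]> = {
  idle: ["judging"],
  judging: ["fixing", "quiet_check", "escalated"],
  fixing: ["verifying", "escalated"],
  verifying: ["pushing", "escalated"],
  pushing: ["quiet_check", "escalated"],
  quiet_check: ["merged", "judging", "quiet_check"],
  merged: [],
  escalated: [],
};

// What a handler returns; the driver alone persists the new state.
interface TransitionDescriptor {
  next: State;
  reason?: string;
}

function canTransition(from: State, to: State): boolean {
  return TRANSITIONS[from].includes(to);
}

function applyTransition(current: State, t: TransitionDescriptor): State {
  if (!canTransition(current, t.next)) {
    throw new Error(`illegal babysit transition ${current} -> ${t.next}`);
  }
  return t.next;
}
```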

Supporting modules added in the same commit since they only exist
to serve this state machine:

  - pr-metadata.ts: tiny shim over gh client for title/head/base/
    default branch + PR state projection. Uses CLI fallback when
    no App config.
  - verifier.ts: resolves npm-test → npm-typecheck → tsc --noEmit
    from package.json/tsconfig; truncates output at 16 KiB for
    DB-safe escalation reasons.
  - fix-executor.ts: doer invocation via structured-output adapter
    returning {path, new_contents}[]. Full-file rewrites — LLMs
    are unreliable at diff coordinates and babysit fixes are small.
    Symlink-aware path safety refuses worktree escape.
  - git-push.ts: stage → diff-check → commit → push helper. No
    --force. Default chorus-babysit identity, overridable.
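The verifier's command resolution and truncation could be sketched as follows. The `typecheck` script name behind "npm-typecheck", the `npx` invocation and the truncation marker are assumptions; the cut backs off to a UTF-8 character boundary so the stored text stays valid and within 16 KiB.

```typescript
interface PackageJson {
  scripts?: Record<string, string>;
}

const MAX_OUTPUT_BYTES = 16 * 1024;
const TRUNCATION_MARKER = "\n[output truncated]";

// Resolution order: npm test, then the typecheck script, then a bare
// tsc --noEmit when only a tsconfig is present.
function resolveVerifyCommand(pkg: PackageJson | null, hasTsconfig: boolean): string[] | null {
  if (pkg?.scripts?.test) return ["npm", "test"];
  if (pkg?.scripts?.typecheck) return ["npm", "run", "typecheck"];
  if (hasTsconfig) return ["npx", "tsc", "--noEmit"];
  return null; // nothing to verify with
}

// Byte-bounded truncation that never splits a multi-byte UTF-8 sequence.
function truncateOutput(text: string, maxBytes: number = MAX_OUTPUT_BYTES): string {
  const buf = Buffer.from(text, "utf8");
  if (buf.length <= maxBytes) return text;
  let cut = maxBytes - Buffer.byteLength(TRUNCATION_MARKER);
  // 0b10xxxxxx is a continuation byte: step back to the start of that character.
  while (cut > 0 && (buf[cut]! & 0xc0) === 0x80) cut--;
  return buf.subarray(0, cut).toString("utf8") + TRUNCATION_MARKER;
}
```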

Tests: 45 new tests across 5 files cover each handler's happy
path + every failure-mode transition. State-machine tests use
real DB + mocked external IO; helpers use real shellouts against
fixture repos where the value is in the actual git/fs behaviour.

Not yet wired: scheduler.start() at daemon boot — that's the
next commit, separate from this so the integration is reviewable
on its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: wire babysit scheduler into daemon lifecycle

Start a BabysitScheduler post-listen with the state-machine runner
as its job handler. Tick interval defaults to 60s; sourceRepoPath
defaults to the daemon's CWD (per-repo overrides will land when the
registrar gains a sourceRepoPath field on the babysit job row).

CHORUS_DISABLE_BABYSIT_SCHEDULER=1 skips the start for integration
tests that drive ticks manually.

SIGTERM / SIGINT trigger scheduler.stop(), which clears the
interval AND awaits in-flight jobs so we never leave a worktree
mid-commit on shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit pause/resume route — PATCH /babysit/jobs/:id

Adds operator-driven pause/resume so a registered PR can be taken
off the scheduler's tick without losing its decision history.

  PATCH /babysit/jobs/:id { action: 'pause' | 'resume' }

Pause refuses terminal states (merged, escalated) with 409 — there
is nothing for the scheduler to skip once a job has ended. Resume
refuses non-paused jobs with 409 to make the intent explicit; both
verbs are idempotent within their valid state. Resume re-opens
ended_at so the job reappears in listActive() / cockpit lists.

The scheduler already treats 'paused' as non-dispatchable
(NON_DISPATCHABLE includes paused alongside merged + escalated),
so this commit is just the controller — no scheduler change needed.
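The controller's state rules reduce to a pure function. This sketch assumes pause stamps `ended_at` (implied by resume re-opening it) and that resume re-enters at `idle`; the commit doesn't say which state a resumed job returns to. The 400 and 404 paths are left to the route layer.

```typescript
type JobState =
  | "idle"
  | "judging"
  | "fixing"
  | "verifying"
  | "pushing"
  | "quiet_check"
  | "paused"
  | "merged"
  | "escalated";

interface Job {
  state: JobState;
  endedAt: number | null;
}

type PatchResult = { status: 200; job: Job } | { status: 409; error: string };

function applyPauseResume(job: Job, action: "pause" | "resume", now: number): PatchResult {
  if (action === "pause") {
    if (job.state === "merged" || job.state === "escalated") {
      return { status: 409, error: `cannot pause a ${job.state} job` };
    }
    if (job.state === "paused") return { status: 200, job }; // idempotent
    return { status: 200, job: { state: "paused", endedAt: now } };
  }
  if (job.state !== "paused") {
    return { status: 409, error: `job is ${job.state}, not paused` };
  }
  // Hypothetical re-entry state; clearing endedAt puts it back in listActive().
  return { status: 200, job: { state: "idle", endedAt: null } };
}
```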

8 new tests on top of the existing 13 cover: pause happy path,
pause idempotency, resume happy path + ended_at clear, conflict on
pause-merged + pause-escalated, conflict on resume-when-not-paused,
validation on unknown action, 404 on missing job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: chorus babysit CLI — list/show/register/pause/resume

User-facing subcommand group that fronts the existing daemon
routes so operators can drive the babysit scheduler without
hitting the API directly.

  chorus babysit register <pr-url> [--installation-id <n>]
  chorus babysit list [--active] [--state <s>]
  chorus babysit show <id>
  chorus babysit pause <id>
  chorus babysit resume <id>

All commands talk to the local daemon over /api/v1; a
connection-failed envelope surfaces the standard "start with
`chorus start`" hint so the failure mode is consistent with the
rest of the CLI. Job ids are "<owner>/<repo>#<number>"; show,
pause and resume URL-encode the id so the '/' and '#' stay part of
the request path instead of being read as a path separator and a
URL fragment.
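A sketch of the id and URL handling. It assumes the daemon base URL is passed in (the host and port aren't given here) and that the routes sit under `/api/v1/babysit/jobs`; `jobIdFromPrUrl` and `jobUrl` are hypothetical helpers.

```typescript
// "https://github.com/<owner>/<repo>/pull/<n>" -> "<owner>/<repo>#<n>"
function jobIdFromPrUrl(prUrl: string): string | null {
  const m = /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/pull\/(\d+)\/?$/.exec(prUrl);
  if (!m) return null;
  return `${m[1]}/${m[2]}#${m[3]}`;
}

// encodeURIComponent turns "/" into %2F and "#" into %23, so the whole id
// travels as a single path segment.
function jobUrl(baseUrl: string, jobId: string): string {
  return `${baseUrl}/api/v1/babysit/jobs/${encodeURIComponent(jobId)}`;
}
```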

show prints the job header + decision log (comment id, author,
validity, category, outcome) so 'why did this PR get escalated'
is one command away. State labels are color-coded (terminal-red
escalated, green merged, yellow paused).

src/cli/index.ts also picks up unrelated single→double-quote
normalization from the project prettier hook — the only logical
change there is the new registerBabysitCommand wire-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.co…