feat: fold upstream T1+T2 fixes back into fork (12 commits) by crypticpy · Pull Request #2 · crypticpy/chorus

crypticpy · 2026-05-17T17:06:47Z

Summary

PR 1 of 3 in the upstream fold-back series. Picks up all Tier 1 (5 free pickup commits) and Tier 2 (high-value bug fixes) from chorus-codes/chorus upstream, adapted to fork conventions and preserving fork-specific behavior (audit-phase reviewer shape, fork's standardPhaseRoundsExhausted promotion, fork's per-vendor agent shims).

All commits authored as fork-native edits with Co-Authored-By: Claude Opus 4.7 attribution. Skips Windows-specific hunks (out of scope for fork) and contributor stack (deferred to PR 3).

Commits

Tier 1 — free pickup (5)

9fd258a feat(cli): add chorus diagnose command + crash-hook
33e7d57 feat(cli): add chorus quickstart self-test command
d8a4ef4 fix(cli): dynamic import for open package (Node 22 ERR_REQUIRE_ESM)
0d68fe3 feat(cockpit): seed empty round-1 so QUEUED renders from t=0
f595102 feat(daemon): runtime fallback-collision dedup across reviewer slots

Tier 2 — bug fixes (7)

f96a4f9 fix(daemon): write REVIEWER FAILED summary on pre-spawn failure
357dc0d feat(voices): auto-disable on persistent quota_exhausted + lsof timeout
865de94 fix(daemon, schema): codex isolation + template-schema validation
15013f0 fix(runner): honour iterate.onDisagreement accept-doer/escalate
4dd03ee test(cli-precheck): cover macOS Keychain fallback for Claude Code v2
e403e39 fix(cockpit): derive candidatesWithModels from snapshot's candidates field
5f770d8 feat(diagnose): capture failure context — CLI smoke, voice health, recent failed chats

Fork-specific adaptations

Runner: decidePhaseOutcome pure helper (6-case matrix: 3 policies × 2 disagreement-gate states). Fork's standardPhaseRoundsExhausted promotion preserved alongside new escalated_on_disagreement branch.
Cockpit candidatesWithModels: runtime narrowing skips audit-phase single-voice reviewers (fork's audit phase has a different reviewer shape than upstream).
Diagnose: privacy-preserving — only errorMessageBytes exposed for recent failed chats; $HOME redacted via redactHomePaths; hard 2s SIGKILL timeout for CLI smoke (wrappers may trap SIGTERM); timedOut distinct from non-zero exit.
Schema: new idx_voices_enabled index for WHERE enabled = 0 voice-health scan.

Test plan

pnpm typecheck clean
pnpm test — 821 tests passing (62 files)
Each commit cherry-picked and built incrementally; no commit left the tree in a broken state
Smoke chorus diagnose on a real ~/.chorus after merge
Smoke chorus quickstart on a clean machine

Summary by Sourcery

Integrate upstream CLI, daemon, cockpit, and schema improvements into the fork, adding diagnostic and quickstart workflows while tightening reviewer orchestration, voice health handling, and template/CLI robustness.

New Features:

Add a chorus diagnose command that prints a redacted diagnostic bundle for bug reports, including daemon state, DB health, logs, crash previews, CLI detection, voice health, and recent failed chats.
Add a chorus quickstart command that fires a short self-test review against the first detected CLI and streams the result, seeding a private quickstart template on demand.
Introduce a crash hook for the CLI entrypoint to capture uncaught errors into ~/.chorus/crashes and guide users to file issues or run diagnostics.
Add per-voice failure tracking that can auto-disable voices on repeated quota_exhausted failures without a reset window, and surface auto-disabled voices in diagnostics.
Add a runtime fallback-collision registry so reviewer slots avoid running the same fallback (lineage, model) in parallel, preserving lineage diversity and reducing wasted cost.

Bug Fixes:

Ensure reviewer precheck failures write a REVIEWER FAILED summary on disk so cockpit cards exit the queued state with a visible error.
Correct iterate onDisagreement handling so accept-doer and escalate policies are honoured distinctly from the legacy continue path, including terminal chat verdicts.
Prevent false auth_missing failures for Claude Code v2+ on macOS by falling back to a Keychain probe when no credential file is present.
Avoid ESM runtime errors for the open package by using a dynamic import when launching the browser from the CLI.
Harden port and process utilities with bounded-time ss/lsof invocations and more robust PID/cmdline inspection to avoid hangs and misclassification.

Enhancements:

Enrich template snapshots loaded from the DB by deriving candidatesWithModels from reviewer candidates so cockpit run cards always know which model to display, while preserving existing data when already present.
Tighten template validation by rejecting reviewer configurations where require exceeds the number of candidates or distinct lineages when crossLineage=true, turning opaque run-time failures into clear schema errors.
Refine cockpit round enrichment to seed an initial empty round-1 and synthesise queued reviewer placeholders from t=0 so cards render deterministically even before any reviewer directories exist.
Factor codex headless invocation into a pure argv builder that always skips user config and git-repo checks, ensuring stable, sandboxed codex exec behaviour in reviewer/doer runs.
Extend voice schema and indexing (including an enabled index) to support new auto-disable reasons and faster health scans used by diagnostics.
Improve CLI browser-opening paths to use a shared helper with proper error handling and timeouts, and adjust Windows ESM imports in the launcher to be URL-safe.
Load the file watcher library lazily in the daemon output watcher to reduce upfront dependencies and keep timeouts explicit.

Documentation:

Document the new chorus diagnose command and crash log location in the README, including guidance for filing bug reports with diagnostic output.

Tests:

Add extensive unit and integration-style tests for diagnostics, quickstart template generation, voice failure tracking, reviewer pre-spawn failures, fallback collision handling, codex headless args, port utilities, template parsing/validation, and cockpit round enrichment to cover the new behaviours and regressions.

Bundles two upstream changes that ship a self-service triage path for chorus users hitting opaque failures:

- `chorus diagnose` walks the install, daemon, recent failed chats, and voice health, then produces a shareable bug report.
- Crash hook captures uncaught exceptions in the CLI and writes them to a crash log, alongside instructions to attach the log to a bug report.

Folded back from upstream chorus-codes/chorus:

7ea712b feat: chorus diagnose command + crash hook for bug reports (#1)
4a5ea20 fix(diagnose): realpath bin path + filter Next.js SSE noise (#4)

Co-Authored-By: chorus-codes <info@chorus.codes>

`chorus quickstart` runs a 30-second activation flow that verifies the daemon comes up, the SQLite DB initializes, and a minimal chat round-trips end-to-end. Aimed at first-run users who want to know "is this thing actually working" before authoring a template.

Folded back from upstream chorus-codes/chorus:

56610cf feat(cli): chorus quickstart — 30-second activation self-test (chorus-codes#30)

Co-Authored-By: chorus-codes <info@chorus.codes>

The `open` package and `chokidar` are both ESM-only as of recent versions. On Node 22 (the daily-driver target), static `require()` calls into them throw ERR_REQUIRE_ESM and crash the CLI at boot.

Switch to dynamic import in:

- src/cli/commands/start.ts (open browser after boot)
- src/cli/open-browser.ts (new helper)
- src/cli/index.ts (route open import)
- src/daemon/output-watcher.ts (chokidar file watch)

Includes upstream's post-merge hardening: the setTimeout that triggers the browser-open no longer wraps an async callback bare, so a missing default browser doesn't surface as an unhandled rejection.

Folded back from upstream chorus-codes/chorus:

e8ca2ee fix(cli): dynamic import for open package (chorus-codes#14)
dcd1837 fix: post-merge hardening for chorus-codes#14 (start.ts portion only; cli-precheck.test.ts portion ships with the Keychain fix)

Co-Authored-By: Julien Deudon <deudon.j@gmail.com>
Co-Authored-By: chorus-codes <info@chorus.codes>

Before: when a chat starts but no reviewer has produced an event yet, enrichRounds returned an empty rounds array and the live-run page showed nothing for several seconds, so the user couldn't tell whether their chat had launched.

After: seed a synthetic round-1 with QUEUED placeholders for every expected participant so the page renders the per-reviewer cards immediately. Real events overwrite placeholders as they arrive.

Folded back from upstream chorus-codes/chorus:

53e8fb6 feat(cockpit): seed empty round-1 so QUEUED placeholders render from t=0 (#2)

Co-Authored-By: chorus-codes <info@chorus.codes>
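A minimal sketch of the seeding idea described in this commit, for reviewers who want to see the shape of it. The `ReviewerCard` and `Round` types and the `seedRounds` name are hypothetical; the fork's real `enrichRounds` types are not shown in this PR text.

```typescript
// Hypothetical shapes, not the fork's actual enrich-rounds types.
type SlotStatus = "QUEUED" | "RUNNING" | "DONE" | "FAILED";

interface ReviewerCard {
  slot: string;
  status: SlotStatus;
}

interface Round {
  index: number;
  cards: ReviewerCard[];
}

// If no round exists yet, start from an empty round 1. In every round,
// keep real cards and add a QUEUED placeholder for each expected slot
// that has no real card yet.
function seedRounds(expectedSlots: string[], rounds: Round[]): Round[] {
  const source: Round[] = rounds.length > 0 ? rounds : [{ index: 1, cards: [] }];
  return source.map((round) => {
    const seen = new Set(round.cards.map((c) => c.slot));
    const placeholders = expectedSlots
      .filter((slot) => !seen.has(slot))
      .map((slot): ReviewerCard => ({ slot, status: "QUEUED" }));
    return { index: round.index, cards: [...round.cards, ...placeholders] };
  });
}
```

Because placeholders are only added for slots without a real card, a real participant replaces its placeholder as soon as it shows up in the input.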

When two reviewer slots both fall through their per-slot chains to the same template-level fallback target (common case: every slot ends in anthropic/claude-sonnet-4-6), both used to dispatch the same (lineage, model) in parallel. That wasted cost and collapsed the lineage diversity that's the point of multi-LLM peer review. Build-time dedup (template-fallback.ts) couldn't catch it because each slot only knows about other slots' PRIMARIES, not their fallback chains.

Fix: new per-chat/per-round (lineage, model) registry. reviewer-driver calls tryClaim before each chain attempt and releases in a finally. On collision, return null + emit cli_warning(reason='fallback_collision') so runWithChainFallback advances to the next entry and the cockpit can show why the slot skipped.

Ported into fork's reviewer-driver.ts surgically so the verdict-isolation refactor (2a2cde2) and per-slot repoPath threading stay intact.

Folded back from upstream chorus-codes/chorus:

c4751fe feat(daemon): runtime fallback-collision dedup (#3)

Co-Authored-By: chorus-codes <info@chorus.codes>
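The claim/release pattern described above can be sketched roughly as follows. This is an in-memory illustration only: the method names echo the PR text (`tryClaim`, `release`, `resetRound`), but the signatures, the `Target` type, and the `runChain` helper are assumptions, not the contents of fallback-registry.ts.

```typescript
// Hypothetical sketch; signatures are assumptions, not the fork's API.
class FallbackRegistry {
  // round key -> set of claimed (lineage, model) keys
  private readonly claims = new Map<string, Set<string>>();

  private roundKey(chatId: string, round: number): string {
    return JSON.stringify([chatId, round]);
  }

  private targetKey(lineage: string, model: string): string {
    return JSON.stringify([lineage, model]);
  }

  // True if the caller now owns (lineage, model) for this round,
  // false if another slot already holds it (a fallback collision).
  tryClaim(chatId: string, round: number, lineage: string, model: string): boolean {
    const rk = this.roundKey(chatId, round);
    const held = this.claims.get(rk) ?? new Set<string>();
    const tk = this.targetKey(lineage, model);
    if (held.has(tk)) return false;
    held.add(tk);
    this.claims.set(rk, held);
    return true;
  }

  release(chatId: string, round: number, lineage: string, model: string): void {
    const rk = this.roundKey(chatId, round);
    const held = this.claims.get(rk);
    if (!held) return;
    held.delete(this.targetKey(lineage, model));
    if (held.size === 0) this.claims.delete(rk);
  }

  resetRound(chatId: string, round: number): void {
    this.claims.delete(this.roundKey(chatId, round));
  }
}

interface Target {
  lineage: string;
  model: string;
}

// Walk one slot's chain, skipping entries another slot is already running.
async function runChain<T>(
  registry: FallbackRegistry,
  chatId: string,
  round: number,
  chain: Target[],
  attempt: (t: Target) => Promise<T | null>,
  warn: (reason: string, t: Target) => void,
): Promise<T | null> {
  for (const t of chain) {
    if (!registry.tryClaim(chatId, round, t.lineage, t.model)) {
      warn("fallback_collision", t);
      continue;
    }
    try {
      const result = await attempt(t);
      if (result !== null) return result;
    } finally {
      // Always release, even if the attempt throws.
      registry.release(chatId, round, t.lineage, t.model);
    }
  }
  return null;
}
```

Releasing in `finally` means a claim only blocks attempts that overlap in time; in this sketch an explicit `resetRound` is only needed if a claim could outlive its attempt.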

Before: when a reviewer's precheck fails (e.g. underlying CLI not installed) or the chat is cancelled while the slot is queued for a CLI semaphore slot, runReviewer used to return null silently, leaving NO on-disk participant directory. The cockpit's enrich-rounds loop then couldn't reconcile the synthesised template slot against any real participant, so the card sat at "Queued — waiting for an open slot." forever and the actual error was invisible.

Reproduction: install chorus on a host with only one CLI on PATH (e.g. just claude-code), open a template that includes lineages requiring codex/gemini/kimi, fire it. Every reviewer card stayed "Queued" and the chat never visibly progressed even though it was already done failing.

Fix:

- Create the reviewer dir BEFORE the precheck runs.
- Add a writePreSpawnFailure helper that writes a `## REVIEWER FAILED` summary in the canonical format (Kind / Lineage / Model / message) that the cockpit's `parseFailureSummary` already understands.
- Wire it into the precheck-failed and cancelled-while-queued paths.

Card now transitions out of pending and shows the actual error (cli_missing, cancelled, ...).

Folded back from upstream chorus-codes/chorus:

afc59cc fix(daemon): REVIEWER FAILED summary on pre-spawn failure (chorus-codes#26)

Co-Authored-By: chorus-codes <info@chorus.codes>

Real pain (upstream chorus-codes#11): a Pro Gemini model on a Flash-only account fails every chorus run with "exhausted your capacity on this model", but Gemini doesn't return a resetAt because the model isn't going to become available for that account. Without auto-disable, the runner keeps picking the dead voice on every chat and the user keeps seeing the same opaque error.

Voice auto-disable:

- New src/lib/voice-failure-tracker.ts records per-voice consecutive quota_exhausted strikes in a settings counter.
- Trigger: 2 consecutive strikes WITH no resetAt → set voices.enabled=false + disabled_reason='auto_quota'.
- Counter resets on participant_done success; rate-limit strikes (hasResetAt=true) bypass the counter entirely so a transient 429 + a later permanent failure can't trip the threshold on the first permanent strike.
- Wired into reviewer-driver alongside recordHealth; emits a cli_warning(reason='voice_auto_disabled') so the cockpit can show a one-line explanation.
- VoiceDisabledReason union gains 'auto_quota' (schema column was already TEXT, so no migration).

Lsof timeout (upstream chorus-codes#12):

- findPidsOnPort and findPidsOnPortWithSudo now bound execSync / execFileSync to 3s, so a slow-but-functional lsof on a loaded macOS box doesn't hang chorus boot. 3s leaves headroom while still bounding the hang case.

Ported into fork's reviewer-driver.ts tmux pollHandle + success path. voices.ts disabled_reason union extended alongside fork's voice-tier column.

Folded back from upstream chorus-codes/chorus:

4f6becc v0.8.30 — voice auto-disable (chorus-codes#11) + lsof timeout (chorus-codes#12) (chorus-codes#17)

Co-Authored-By: chorus-codes <info@chorus.codes>
Co-Authored-By: Lumina Mao <luminamao@mac.lan>
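The strike-counting rule above is easy to get subtly wrong, so here is a small in-memory sketch of it. The threshold of 2 and the `auto_quota` reason come from the commit text; the `Voice` shape and the use of a `Map` in place of the settings table are assumptions for illustration only.

```typescript
// Threshold from the commit text: 2 consecutive strikes with no resetAt.
const AUTO_DISABLE_THRESHOLD = 2;

// Hypothetical voice row; the real schema has more columns.
interface Voice {
  id: string;
  enabled: boolean;
  disabledReason: "user" | "auto_missing" | "auto_quota" | null;
}

// `strikes` stands in for the per-voice settings counter.
function recordVoiceFailure(
  voice: Voice,
  strikes: Map<string, number>,
  hasResetAt: boolean,
): { disabled: boolean } {
  // A rate limit with a reset window is transient: bypass the counter.
  if (hasResetAt) return { disabled: false };
  const next = (strikes.get(voice.id) ?? 0) + 1;
  if (next >= AUTO_DISABLE_THRESHOLD) {
    voice.enabled = false;
    voice.disabledReason = "auto_quota";
    strikes.set(voice.id, 0);
    return { disabled: true };
  }
  strikes.set(voice.id, next);
  return { disabled: false };
}

// A successful run clears any accumulated strikes.
function recordVoiceSuccess(voice: Voice, strikes: Map<string, number>): void {
  if ((strikes.get(voice.id) ?? 0) > 0) strikes.set(voice.id, 0);
}
```

Note that a strike with `hasResetAt=true` neither increments nor resets the counter in this sketch, which matches the "bypass the counter entirely" wording.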

Two issues caused chats to fail opaquely at run-start:

CODEX ISOLATION (chorus-codes#10, chorus-codes#16)

The user's ~/.codex/config.toml may declare MCP servers, plugins, or notification hooks. In headless `codex exec` those integrations have caused codex to hang or cancel mid-call. There are two independent reproductions: codex as our reviewer (chorus-codes#10) and codex as MCP client of chorus (chorus-codes#16). Add --ignore-user-config to every headless codex argv. Extracted to a pure `buildHeadlessArgs(opts)` so the argv shape is unit-testable.

TEMPLATE VALIDATION (chorus-codes#15)

`reviewer.require > candidates.length` used to surface as "Job moves immediately to failure upon Start press": the runner queued, failed to grant enough slots, and emitted an opaque chat-failure. Same for `require > distinct lineages` when crossLineage:true. Both are now caught at TemplateSchema.parse() time with a clear error message the user can fix before the run starts.

ReviewerSchema.superRefine() additions slot in cleanly alongside the fork's audit/orchestrate phase schema work; both are additive constraints on the same ReviewerSchema object.

Folded back from upstream chorus-codes/chorus:

8ed970b fix(daemon, schema): codex isolation + template validation

Co-Authored-By: chorus-codes <info@chorus.codes>
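For the template-validation half, the two constraints can be written as a plain function. This is a dependency-free sketch rather than the actual `ReviewerSchema.superRefine()` code; the `Candidate` and `ReviewerRule` shapes, the `validateReviewer` name, and the message wording are all assumptions.

```typescript
// Hypothetical shapes; the real ReviewerSchema has more fields.
interface Candidate {
  lineage: string;
  model: string;
}

interface ReviewerRule {
  require: number;
  crossLineage: boolean;
  candidates: Candidate[];
}

// Returns human-readable problems; an empty array means the rule is valid.
function validateReviewer(rule: ReviewerRule): string[] {
  const errors: string[] = [];
  if (rule.require > rule.candidates.length) {
    errors.push(
      `require (${rule.require}) exceeds number of candidates (${rule.candidates.length})`,
    );
  }
  if (rule.crossLineage) {
    const lineages = new Set(rule.candidates.map((c) => c.lineage));
    if (rule.require > lineages.size) {
      errors.push(
        `require (${rule.require}) exceeds distinct lineages (${lineages.size}) with crossLineage=true`,
      );
    }
  }
  return errors;
}
```

For example, two candidates that share a lineage pass with `require=2` when `crossLineage` is false, but fail once `crossLineage` is true because only one distinct lineage is available.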

The template schema, cockpit dialog, and SPEC-D-templates have always exposed three values for iterate.onDisagreement ('continue', 'escalate', 'accept-doer') but the runner only honoured 'continue'. Picking the other two from the cockpit form was a silent no-op: chats fell through to phase_failed with 'doer_failed_all_rounds' regardless.

This wires both new branches into the round loop and the terminal chat_done emission:

- 'accept-doer': after maxRounds without consensus, mark doerSucceeded and continue. The chat carries on (subsequent phases, ship, approval) as if reviewers had agreed on the doer's last answer.
- 'escalate': halt with status='failed' but verdict='request_changes' and error='escalated_on_disagreement', so cockpits can render "reviewers disagreed, needs human" distinctly from "doer broke."

Policy table extracted into a pure decidePhaseOutcome() helper so the 3 × 2 input matrix (policy × disagreement-in-last-round) is unit-tested without standing up the full runChat scaffold.

Gated on disagreementInLastRound (reset at top of every round + on doer-crash path) so a partial / empty doer answer can never be silently "accept-doer"'d as final.

Preserves the fork's existing standardPhaseRoundsExhausted #7 surfacing for the 'continue' path; the 'escalate' path takes precedence with its own distinct chat_done.

Upstream PRs chorus-codes#49, chorus-codes#50 (commit 67572e9).

Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
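A rough sketch of the 3 × 2 policy table follows, assuming the helper is only consulted once maxRounds is exhausted without consensus. The `PhaseOutcome` shape is hypothetical; only the policy names and the `request_changes`, `escalated_on_disagreement`, and `doer_failed_all_rounds` strings come from the commit text, and the fork's standardPhaseRoundsExhausted promotion is deliberately not modelled here.

```typescript
type OnDisagreement = "continue" | "escalate" | "accept-doer";

// Hypothetical outcome shape, for illustration only.
type PhaseOutcome =
  | { kind: "accepted" }
  | {
      kind: "escalated";
      status: "failed";
      verdict: "request_changes";
      error: "escalated_on_disagreement";
    }
  | { kind: "failed"; status: "failed"; error: "doer_failed_all_rounds" };

// Assumes maxRounds has already been exhausted without consensus.
function decidePhaseOutcome(
  policy: OnDisagreement,
  disagreementInLastRound: boolean,
): PhaseOutcome {
  // The gate: without a real disagreement in the last round (e.g. the doer
  // crashed), no policy may promote the phase; every case is a failure.
  if (disagreementInLastRound && policy === "accept-doer") {
    return { kind: "accepted" };
  }
  if (disagreementInLastRound && policy === "escalate") {
    return {
      kind: "escalated",
      status: "failed",
      verdict: "request_changes",
      error: "escalated_on_disagreement",
    };
  }
  return { kind: "failed", status: "failed", error: "doer_failed_all_rounds" };
}
```

Keeping this as a pure function of two inputs is what makes all six cases cheap to cover in a unit test.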

The fork already implements the Keychain fallback in cli-precheck (hasDarwinKeychainEntry). This adds the missing test coverage:

- passes when no cred file but keychain entry exists
- blocks when no cred file and no keychain entry
- skips keychain check when cred file exists (fast-path preserved)
- does not consult keychain for non-anthropic lineages

vi.mock('node:child_process') uses the importOriginal spread pattern so spawn / exec / etc. keep their real implementations; a bare module replacement would silently break any sibling test that imports from child_process.

Upstream PRs #7, #8, plus the dcd1837 test-mock hardening.

Co-Authored-By: Yura <yurahalych@gmail.com>
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…field

Daemon-side TemplateSchema only carries `candidates` on each ReviewerRule. The cockpit Template type expects `candidatesWithModels` populated; enrich-rounds iterates that field to build slot→model mappings for run-page cards. When fromRow parsed template_snapshot and cast it to Template, the cast was a TypeScript lie: at runtime the parsed object lacked candidatesWithModels, enrichRounds iterated zero reviewer slots, and no model name reached the cards (badge appeared empty).

Derive candidatesWithModels at the parse seam (chats.fromRow) so the cockpit's Template contract is honoured regardless of which path produced the data. Idempotent: if a future daemon ever serialises the field directly, that wins. Persona forwarded if present. Audit-phase single-voice reviewers (no candidates array) are skipped via a runtime narrow.

Upstream PR #6 (chorus-codes/chorus@ac0c7fd).

Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
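The derivation at the parse seam might look roughly like the sketch below. The element shape of `candidates` is not shown in this PR text, so `Candidate` (with `lineage`, `model`, optional `persona`) is an assumption, as are the `ReviewerRule` type and the `withCandidateModels` name.

```typescript
// Hypothetical shapes; the real candidate element type is not shown here.
interface Candidate {
  lineage: string;
  model: string;
  persona?: string;
}

interface ReviewerRule {
  candidates?: Candidate[];
  candidatesWithModels?: Candidate[];
}

function withCandidateModels(reviewer: ReviewerRule): ReviewerRule {
  // Runtime narrow: audit-phase single-voice reviewers carry no candidates array.
  if (!Array.isArray(reviewer.candidates)) return reviewer;
  // Idempotent: a field that is already serialised wins.
  if (Array.isArray(reviewer.candidatesWithModels)) return reviewer;
  return {
    ...reviewer,
    candidatesWithModels: reviewer.candidates.map((c) => ({
      lineage: c.lineage,
      model: c.model,
      // Forward persona only when present.
      ...(c.persona !== undefined ? { persona: c.persona } : {}),
    })),
  };
}
```

Returning the input untouched in the two early-exit cases keeps the function safe to call on any snapshot, whichever path produced it.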

…cent failed chats

Extends `chorus diagnose` with three signals that triage the most common breakage modes:

- **CLI smoke**: spawn `<bin> --version` per detected CLI with a hard 2s SIGKILL timeout (wrapper scripts may trap SIGTERM). Distinguishes `timedOut` from non-zero exit so the report can tell hangs apart from crashes.
- **Voice health**: counts `enabled=0` voices grouped by `disabled_reason` ('user' vs 'auto_missing' vs 'auto_quota'). Added `idx_voices_enabled` so the `WHERE enabled = 0` scan stays cheap as the table grows.
- **Recent failed chats**: last 5 chats with `status='blocked'` plus the errored participants pulled from `~/.chorus/chats/<id>/round-*/<part>/_attempts.jsonl`. Only `errorMessageBytes` is exposed; raw error text never leaves the user's machine. `$HOME` is redacted from any embedded path strings via `redactHomePaths`.

Adapted from upstream chorus-codes#19 (0666dca). Preserves the fork's existing diagnose shape and adds tests for smokeOneCli / readLatestAttempt / formatReport rendering of the three new sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
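The two privacy measures in the last bullet can be illustrated with a small sketch. `redactHomePaths` is named in the commit, but its real signature is not shown; taking `home` as an explicit parameter, replacing it with `~`, and the `utf8ByteLength` and `summariseFailure` helpers are all assumptions made for this example.

```typescript
// Replace every occurrence of the home directory with "~".
// Assumption: the real helper may use a different placeholder.
function redactHomePaths(text: string, home: string): string {
  // Guard: an empty or root home would redact far too much.
  if (home.length <= 1) return text;
  return text.split(home).join("~");
}

// UTF-8 byte length, computed without Buffer or TextEncoder.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Expose only the size of the error message, never its text.
function summariseFailure(
  attemptPath: string,
  errorMessage: string,
  home: string,
): { path: string; errorMessageBytes: number } {
  return {
    path: redactHomePaths(attemptPath, home),
    errorMessageBytes: utf8ByteLength(errorMessage),
  };
}
```

Reporting a byte count still tells a maintainer whether an error was empty, short, or a large dump, without the message itself leaving the machine.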

sourcery-ai · 2026-05-17T17:06:54Z

Reviewer's Guide

Backports multiple upstream Tier 1/2 CLI, daemon, cockpit, and schema fixes into the fork, adding a new chorus diagnose command and crash hook, chorus quickstart self-test, improved reviewer/disagreement semantics, voice auto-disable on persistent quota_exhausted, codex headless hardening, macOS keychain auth fallback, richer template/cockpit schema handling, port/CLI robustness, and extensive regression tests — all while preserving fork-specific behaviors.

Sequence diagram for voice auto-disable on persistent quota_exhausted

sequenceDiagram
  participant Runner as runReviewer
  participant Tracker as recordVoiceFailure
  participant Voices as voices
  participant Settings as settings
  participant TrackerOK as recordVoiceSuccess

  Runner->>Runner: reviewer throws err.kind="quota_exhausted"
  Runner->>Tracker: recordVoiceFailure(lineage, model, hasResetAt)
  Tracker->>Voices: list({ lineage })
  Voices-->>Tracker: [voiceRow]
  alt hasResetAt is true
    Tracker-->>Runner: { disabled:false, voiceId }
  else hasResetAt is false
    Tracker->>Settings: get("voice_failures.{voiceId}")
    Settings-->>Tracker: previousCount
    Tracker->>Settings: set("voice_failures.{voiceId}", previousCount+1)
    alt failures >= AUTO_DISABLE_THRESHOLD
      Tracker->>Voices: update(voiceId,{ enabled:false, disabled_reason: auto_quota })
      Tracker->>Settings: set("voice_failures.{voiceId}", 0)
      Tracker-->>Runner: { disabled:true, voiceId }
      Runner->>Runner: onEvent(cli_warning, reason=voice_auto_disabled)
    else
      Tracker-->>Runner: { disabled:false, voiceId }
    end
  end

  %% On successful run
  Runner->>TrackerOK: recordVoiceSuccess(lineage, model)
  TrackerOK->>Voices: list({ lineage })
  Voices-->>TrackerOK: [voiceRow]
  TrackerOK->>Settings: get("voice_failures.{voiceId}")
  Settings-->>TrackerOK: count
  alt count > 0
    TrackerOK->>Settings: set("voice_failures.{voiceId}", 0)
  end

File-Level Changes

Change	Details	Files
Improve reviewer lifecycle, fallback behavior, and disagreement handling while preserving fork-specific runner semantics.	Create reviewer directories before precheck and write `## REVIEWER FAILED` summaries for all pre-spawn failure paths so cockpit can surface errors instead of leaving cards queued forever. Introduce a per-chat/round fallback registry and use it in reviewer fallback chains to prevent multiple slots from concurrently running the same (lineage, model) fallback target, emitting `fallback_collision` warnings when collisions are avoided. Add per-voice quota tracking so repeated `quota_exhausted` errors without reset windows increment counters, auto-disable affected voices with `auto_quota` reason, and clear counters on successful runs. Track per-round disagreement state and add a pure `decidePhaseOutcome` helper to correctly honor `iterate.onDisagreement` policies (`continue`, `accept-doer`, `escalate`) while preserving existing `standardPhaseRoundsExhausted` handling and fork-specific chat_done behavior.	`src/daemon/runner/reviewer-driver.ts` `src/daemon/runner.ts` `src/daemon/runner/template-fallback.ts` `src/daemon/runner/fallback-registry.ts` `src/lib/voice-failure-tracker.ts` `tests/voice-failure-tracker.test.ts` `tests/reviewer-driver-pre-spawn-failure.test.ts` `tests/iterate-on-disagreement.test.ts`
Add `chorus diagnose` diagnostic command and global crash-hook for better bug reports and crash visibility.	Implement `chorus diagnose` command that gathers a redact-home diagnostic snapshot (versions, install mode, daemon state, DB counts, voice health, recent failed chats, CLI detection + `--version` smokes, crash previews, log tails) and prints it as a fenced markdown bundle. Add helpers to safely resolve bin paths through symlinks, detect install mode, redact `$HOME` from free-form strings, filter benign Next.js SSE disconnect noise from web logs, and summarise recent failed chats using on-disk `_attempts.jsonl` without leaking raw error messages. Introduce a minimal, dependency-free crash hook (and a matching inline twin in `bin/chorus.mjs`) that writes structured crash logs to `~/.chorus/crashes` on uncaught exceptions/unhandled rejections and nudges users toward GitHub issues or `chorus diagnose`. Wire the diagnose command into the CLI entrypoint and README, and add unit tests for all helper functions and formatting paths.	`src/cli/commands/diagnose.ts` `src/cli/crash-hook.ts` `bin/chorus.mjs` `src/cli/index.ts` `src/lib/db/connection.ts` `src/lib/db/schema.sql` `README.md` `tests/diagnose.test.ts` `tests/crash-hook.test.ts`
Introduce `chorus quickstart` self-test command to fire a minimal review-only chat against the first detected CLI and surface its result inline.	Implement `chorus quickstart` command that detects available CLIs, maps the first one to a template reviewer lineage, upserts a private `quickstart-self-test` review-only template, posts a chat with a hardcoded off-by-one sample artifact, polls its status with SIGINT cancellation, and displays reviewer output or failure summaries inline. Add a small YAML builder that generates a schema-valid review-only template matching the live `TemplateSchema`, with `crossLineage=false` and `require=1` so it works for single-CLI users, and ship disabled. Resolve cockpit URLs robustly via `daemon.json` instead of string substitution, and add tests covering the YAML builder, sample artifact, and mapping to the active schema.	`src/cli/commands/quickstart.ts` `src/cli/index.ts` `tests/quickstart.test.ts`
Harden CLI browser-opening and TCP port-inspection behavior for reliability across Node/OS combinations.	Replace direct `open` usage in CLI with a new `openBrowser` helper that dynamically imports the ESM-only `open` package to avoid `ERR_REQUIRE_ESM` under CJS builds, and await/catch failures when opening the cockpit URL (including from `chorus start` and status/auto-open paths). Enhance `port-utils` to set timeouts on `ss`/`lsof` subprocesses (with and without sudo), standardize on double-quoted strings, and ensure process-kill helpers use explicit signals and safer process lookup, adding tests to assert the timeout behavior.	`src/cli/open-browser.ts` `src/cli/commands/start.ts` `src/cli/index.ts` `src/cli/port-utils.ts` `tests/port-utils.test.ts`
Improve cockpit template/candidate handling and round enrichment so reviewer cards and models render correctly from t=0.	Extend `fromRow` to validate `template_snapshot` with `TemplateSchema.safeParse` and, on success, derive or preserve `candidatesWithModels` from each reviewer’s `candidates` while leaving single-voice audit-phase reviewers untouched, so cockpit code can iterate reviewer slots and show model names even when snapshots only carry daemon-side shapes. Add tests ensuring malformed/structurally-invalid/non-object snapshots fall back gracefully, and that both derived and pre-populated `candidatesWithModels` behave idempotently. Update `enrichRounds` to seed an empty round-1 when no rounds exist yet so the run page immediately renders QUEUED placeholder cards for all expected reviewer slots instead of cards appearing only as dirs are created, and add tests for the placeholder behavior and model propagation.	`src/lib/api/chats.ts` `tests/api-chats-from-row.test.ts` `src/components/live-run-real/enrich-rounds.ts` `tests/enrich-rounds.test.ts`
Tighten template schema validation and review configuration to catch misconfigured reviewer pools early.	Enhance `ReviewerSchema` with `superRefine` rules that reject `require` values exceeding `candidates.length` and, when `crossLineage=true`, exceeding the number of distinct lineages, surfacing clear schema errors at template-save time instead of opaque run-time failures. Add tests covering invalid `require`/candidates combinations and valid edge cases (e.g. `require=N` with N distinct lineages and `crossLineage=true`, and the non-cross-lineage cases).	`src/lib/template-schema.ts` `tests/template-schema.test.ts`
Harden codex integration for headless runs by centralizing argv construction and ignoring user config that can hang review jobs.	Refactor codex headless execution to a pure `buildHeadlessArgs` helper that always includes `--skip-git-repo-check` and `--ignore-user-config`, encodes sandbox/network/model flags, and tells `codex exec` to read the prompt from stdin. Update `codexShim.runHeadless` to use the new helper while preserving accountId/model validation, workspace pre-trust, and spawn options; add tests to lock the expected argv shape and the presence of `--ignore-user-config`. Normalize string quoting in `codex.ts` to consistent double quotes.	`src/daemon/agents/codex.ts` `tests/codex-headless-args.test.ts`
Extend CLI precheck behavior with macOS Keychain support for Claude Code v2 credentials and improve the associated tests.	Mock `node:child_process.execFileSync` in `cli-precheck` tests to simulate macOS Keychain behavior, and expand test coverage to ensure quota gating, cred file detection, per-lineage CTAs, and the new anthropic-on-darwin keychain fallback behave as intended. Ensure that when cred files exist, keychain is skipped; when running on non-anthropic lineages, keychain is not consulted; and that tests correctly reset HOME, DB, and mocks between runs.	`tests/cli-precheck.test.ts`
Add and wire voice-health metadata to support diagnose and auto-disable surfaces.	Extend `VoiceRowSchema.disabled_reason` to accept a new `auto_quota` value and document semantics for `user`, `auto_missing`, and `auto_quota` in code comments. Add indexes on `voices.enabled` (both in SQL and initDb) to speed up disabled-voice scans used by `chorus diagnose`, and expose voice-health summary (total voices, auto-disabled-by-quota/missing, user-disabled count) via the diagnose snapshot and report formatting.	`src/lib/db/voices.ts` `src/lib/db/connection.ts` `src/lib/db/schema.sql` `src/cli/commands/diagnose.ts` `tests/diagnose.test.ts`
Misc daemon and CLI robustness improvements and test coverage expansions.	Convert the output watcher’s `waitForAnswer` to dynamically import `chokidar` to avoid bundling issues and better align with async usage. Normalize string quoting/style across several modules for consistency, and add small correctness tweaks such as using `pathToFileURL` when dynamically importing dist/src entrypoints from the bin script to avoid Windows ESM URL scheme issues. Add or expand tests around template snapshot parsing, daemon chat-from-row behavior, and other touched areas to ensure no regressions from the upstream fold-in.	`src/daemon/output-watcher.ts` `bin/chorus.mjs` `tests/api-chats-from-row.test.ts` `tests/enrich-rounds.test.ts` `tests/iterate-on-disagreement.test.ts`

coderabbitai · 2026-05-17T17:06:54Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8ba5d7a2-c07c-408c-8821-70c31c170afa

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


sourcery-ai

Hey - I've found 1 security issue, 3 other issues, and left some high level feedback:

Security issues:

Detected calls to child_process from a function argument bin. This could lead to a command injection if the input is user controllable. Try to avoid calls to child_process, and if it is needed ensure user input is correctly sanitized or sandboxed. (link)

General comments:

The new fallback registry (fallback-registry.ts) is only used for per-attempt claims; I don’t see any call to resetRound() in the runner, so it’d be good to either wire resetRound(chatId, round) into the end-of-round/phase path or document why cross-round stickiness is intentional to avoid subtle over‑deduplication or state leaks across rounds.
Crash handling logic now exists both in bin/chorus.mjs and src/cli/crash-hook.ts with slightly different responsibilities; consider centralizing shared pieces (e.g. log format, field set) or adding a small shared helper to keep them from drifting over time.
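The first general comment can be illustrated with a minimal sketch of a round-scoped claim registry that is cleared at the end of each round. `FallbackRegistry`, `runRound`, and the `chatId#round` key format are illustrative assumptions, not the fork's actual `fallback-registry.ts` API; only the `resetRound(chatId, round)` shape comes from the review.

```ts
// Hypothetical sketch: FallbackRegistry and runRound are illustrative names,
// not the fork's actual fallback-registry.ts exports.
class FallbackRegistry {
  // One set of claimed "lineage/model" targets per (chatId, round).
  private readonly claims = new Map<string, Set<string>>();

  private static roundKey(chatId: string, round: number): string {
    return `${chatId}#${round}`;
  }

  // Returns false when another slot already holds this target in the round.
  tryClaim(chatId: string, round: number, lineage: string, model: string): boolean {
    const key = FallbackRegistry.roundKey(chatId, round);
    const target = `${lineage}/${model}`;
    const held = this.claims.get(key) ?? new Set<string>();
    if (held.has(target)) return false;
    held.add(target);
    this.claims.set(key, held);
    return true;
  }

  // Drops every claim recorded for the round so nothing leaks into the next one.
  resetRound(chatId: string, round: number): void {
    this.claims.delete(FallbackRegistry.roundKey(chatId, round));
  }

  // Number of rounds that still hold claims (useful for leak checks in tests).
  activeRounds(): number {
    return this.claims.size;
  }
}

// End-of-round wiring: the reset runs whether the round succeeds or throws.
async function runRound(
  registry: FallbackRegistry,
  chatId: string,
  round: number,
  work: () => Promise<void>,
): Promise<void> {
  try {
    await work();
  } finally {
    registry.resetRound(chatId, round);
  }
}
```

Keying claims by round means a target claimed in round 1 never blocks a slot in round 2, and the `finally` in `runRound` keeps the map from growing across a long chat.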

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The new fallback registry (`fallback-registry.ts`) is only used for per-attempt claims; I don’t see any call to `resetRound()` in the runner, so it’d be good to either wire `resetRound(chatId, round)` into the end-of-round/phase path or document why cross-round stickiness is intentional to avoid subtle over‑deduplication or state leaks across rounds.
- Crash handling logic now exists both in `bin/chorus.mjs` and `src/cli/crash-hook.ts` with slightly different responsibilities; consider centralizing shared pieces (e.g. log format, field set) or adding a small shared helper to keep them from drifting over time.

## Individual Comments

### Comment 1
<location path="src/daemon/runner/reviewer-driver.ts" line_range="434-438" />
<code_context>
+            // return null so runWithChainFallback advances to the next chain
+            // entry; emit a cli_warning tagged `fallback_collision` so the
+            // cockpit can show why the slot skipped.
+            const claimed = tryClaimFallbackTarget(
               chatId,
-              phase,
               round,
-              reviewerIdx,
-              candidateLineage: entry.lineage,
-              candidateModel: entry.model,
-              agentName,
-              askContent: ask,
-              answerFile,
-              reviewerDir,
-              repoPath,
-              abortSignal: handle.signal,
-              onEvent,
-            });
+              entry.lineage,
+              entry.model,
+            );
+            if (!claimed) {
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid releasing a fallback claim that wasn’t acquired.

Because `releaseFallbackClaim(chatId, round, entry.lineage, entry.model)` is always called in `finally`, it runs even when `tryClaimFallbackTarget` returns `false`. That allows a slot that never held the claim to release it, potentially clearing another slot’s valid claim and causing two reviewers to collide on the same fallback. Please call `releaseFallbackClaim` only when `claimed` is true (e.g., `if (claimed) releaseFallbackClaim(...)`).
</issue_to_address>
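A minimal sketch of the guarded release this comment asks for. The `tryClaim` / `releaseClaim` helpers and the single string key are stand-ins (assumptions) for the fork's `tryClaimFallbackTarget` / `releaseFallbackClaim`, which take `(chatId, round, lineage, model)`.

```ts
// Stand-in registry: the real tryClaimFallbackTarget / releaseFallbackClaim take
// (chatId, round, lineage, model); a single string key keeps the sketch short.
const heldClaims = new Set<string>();

function tryClaim(key: string): boolean {
  if (heldClaims.has(key)) return false;
  heldClaims.add(key);
  return true;
}

function releaseClaim(key: string): void {
  heldClaims.delete(key);
}

// Runs one fallback attempt. Returns null when the target is already claimed so
// the caller can advance to the next chain entry.
async function attemptWithClaim<T>(key: string, attempt: () => Promise<T>): Promise<T | null> {
  let claimed = false;
  try {
    claimed = tryClaim(key);
    if (!claimed) return null;
    return await attempt();
  } finally {
    // Release only what this slot acquired; an unguarded release here would
    // clear the claim held by the slot that won the race.
    if (claimed) releaseClaim(key);
  }
}
```

With the guard in place, a slot that loses the race leaves the winner's claim untouched, so a third slot still sees the target as taken.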

### Comment 2
<location path="src/cli/commands/diagnose.ts" line_range="163" />
<code_context>
+ * earn their entry by being explicitly added — we don't want to hide
+ * an actual bug because its message vaguely matches a regex.
+ */
+function filterBenignNoise(text: string): {
+  kept: string;
+  filteredCount: number;
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting shared helpers (log-noise filtering, CLI smoke-testing, path utilities, and DB/daemon sections) into dedicated functions/modules so the diagnose command stays focused and easier to follow.

The new command is functionally rich but quite dense; a few small extractions would reduce complexity without changing behavior.

### 1. Simplify `filterBenignNoise` to a line/block‑level filter

The current brace‑depth + orphan‑tail logic is quite intricate for a very specific pattern. You can treat the Next.js SSE trace as a block starting from a header and ending at the first blank line (or a fixed number of lines), which is easier to reason about and extend.

For example:

```ts
// lib/diagnostics/log-noise.ts
const NEXT_PIPE_HEADER = "Error: failed to pipe response";
const NEXT_BLOCK_MAX_LINES = 20;

export function filterBenignNoise(text: string): { kept: string; filteredCount: number } {
  if (!text || text.startsWith("(")) return { kept: text, filteredCount: 0 };

  const lines = text.split("\n");
  const kept: string[] = [];
  let filteredCount = 0;

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.includes(NEXT_PIPE_HEADER)) {
      filteredCount++;
      let skipped = 0;
      // Drop the header + following lines until blank or cap
      while (i + 1 < lines.length && skipped < NEXT_BLOCK_MAX_LINES) {
        const next = lines[i + 1];
        if (!next.trim()) {
          i++; // consume blank line and stop
          break;
        }
        i++;
        skipped++;
      }
      continue;
    }
    kept.push(line);
  }

  return { kept: kept.join("\n"), filteredCount };
}
```

Then the command module only wires it:

```ts
import { filterBenignNoise } from "../../lib/diagnostics/log-noise.js";

// ...
webTail: (() => {
  const raw = tailFile(path.join(chorusDir, "logs", "web.log"), 300);
  const { kept, filteredCount } = filterBenignNoise(raw);
  const trimmed = kept.split("\n").slice(-20).join("\n").trim();
  return filteredCount > 0
    ? `${trimmed}\n  (${filteredCount} benign SSE-disconnect trace${filteredCount === 1 ? "" : "s"} filtered)`
    : trimmed;
})(),
```

This keeps the “hide SSE noise” behavior while making the implementation much simpler.

### 2. Extract CLI smoke-testing into a reusable helper

`smokeOneCli` is fairly sophisticated (timeouts, stdout/stderr capture, redaction, signals). Pulling it into a small reusable helper keeps this command thin and makes the behavior shareable across diagnostics.

```ts
// lib/cli/smoke.ts
import { spawn } from "child_process";
import { redactHomePaths } from "../path-utils.js";

export interface SmokeResult {
  ok: boolean;
  exitCode?: number;
  version?: string;
  stderrFirstLine?: string;
  timedOut?: boolean;
}

export function smokeOneCli(bin: string): Promise<SmokeResult> {
  // (move the existing implementation here unchanged)
}
```

In `diagnose.ts`:

```ts
import { smokeOneCli, type SmokeResult } from "../../lib/cli/smoke.js";

// ...
const smokes: Array<SmokeResult | undefined> = await Promise.all(
  found.map((d) => (d.found && d.path ? smokeOneCli(d.path) : Promise.resolve(undefined))),
);
```

This immediately shortens the command file and isolates process‑spawning concerns.

### 3. Centralize path helpers

`abbreviateHome`, `redactHomePaths`, and `resolveBinPath` are generic utilities and can live in a small shared module:

```ts
// lib/path-utils.ts
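import os from "os";
import fs from "fs";
import path from "path";

export function abbreviateHome(p: string): string {
  const home = os.homedir();
  if (!home) return p;
  if (p === home) return "~";
  // Match on a path boundary so "/home/al" does not abbreviate "/home/alice".
  return p.startsWith(home + path.sep) ? "~" + p.slice(home.length) : p;
}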

export function redactHomePaths(s: string): string {
  const home = os.homedir();
  if (!home) return s;
  const escaped = home.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return s.replace(new RegExp(escaped, "g"), "~");
}

export function resolveBinPath(rawBinPath: string): string {
  try {
    return fs.realpathSync(rawBinPath);
  } catch {
    return rawBinPath;
  }
}
```

Then in `diagnose.ts`:

```ts
import { abbreviateHome, redactHomePaths, resolveBinPath } from "../../lib/path-utils.js";
```

This reduces the “utility noise” in the command and makes these helpers available to other commands.

### 4. Optionally split `gather()` by concern

Even without creating separate files immediately, you can reduce `gather()`’s cognitive load by factoring the major sections into small helpers (which can later be moved to `lib/diagnostics/*`):

```ts
async function getDbCounts(): Promise<{ chats: number | string; voices: number | string }> {
  try {
    const { getDb } = await import("../../lib/db/connection.js");
    const db = await getDb();
    const cr = await db.execute("SELECT COUNT(*) AS n FROM chats");
    const vr = await db.execute("SELECT COUNT(*) AS n FROM voices");
    return {
      chats: Number((cr.rows[0] as any).n),
      voices: Number((vr.rows[0] as any).n),
    };
  } catch (err) {
    return {
      chats: `(error: ${err instanceof Error ? err.message.slice(0, 80) : "unknown"})`,
      voices: "(unavailable)",
    };
  }
}

// inside gather():
const { chats, voices } = await getDbCounts();
// ...
db: { chats, voices },
```

Similar extractions for “daemon state”, “voice health”, and “recent failed chats” would turn `gather()` into a high‑level orchestrator rather than a long, mixed‑concern function.
</issue_to_address>

### Comment 3
<location path="bin/chorus.mjs" line_range="21" />
<code_context>
+import { fileURLToPath, pathToFileURL } from "node:url";
+import { dirname, join, resolve } from "node:path";
+
+// Crash hook — installed BEFORE any other import so it captures early
+// startup failures. The src/cli/crash-hook.ts version is the testable
+// canonical source; this is its zero-dependency twin, kept inline so it
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the shared crash-log formatting into a small reusable module so both the TS crash hook and the bin script can use it instead of duplicating logic.

The duplicated crash‑handling logic in `bin/chorus.mjs` does increase complexity and maintenance risk. You can keep the “early install, zero deps” behavior without hand‑maintaining a second implementation by extracting a tiny shared formatter module and reusing it from both places.

One concrete way to do this:

1. **Extract a pure formatter from the TS side**

Create a small, dependency‑free module that only knows how to turn an error + context into `{ body, headline }`. It should not do any IO, so it’s safe to call from both the bin stub and the existing TS crash hook.

```ts
// src/cli/crash-log-core.ts
export interface CrashContext {
  err: unknown;
  source: string;
  version: string;
  node: string;
  platform: string;
  argv: string;
  cwd: string;
  uptimeMs: number;
}

export function buildCrashLog(ctx: CrashContext): {
  body: string;
  headline: string;
} {
  const stack =
    ctx.err instanceof Error
      ? `${ctx.err.name}: ${ctx.err.message}\n${ctx.err.stack ?? "(no stack)"}`
      : String(ctx.err);

  const body = [
    "# Chorus crash report",
    "",
    `timestamp:    ${new Date().toISOString()}`,
    `source:       ${ctx.source}`,
    `chorus:       ${ctx.version}`,
    `node:         ${ctx.node}`,
    `platform:     ${ctx.platform}`,
    `argv:         ${ctx.argv}`,
    `cwd:          ${ctx.cwd}`,
    `uptime_ms:    ${ctx.uptimeMs}`,
    "",
    "## Error",
    "",
    stack,
    "",
  ].join("\n");

  const headline =
    ctx.err instanceof Error
      ? `${ctx.err.name}: ${ctx.err.message}`
      : String(ctx.err);

  return { body, headline };
}
```

Ensure this is compiled as a tiny JS helper (e.g. `dist/cli/crash-log-core.js`) as part of your normal build.

2. **Use the shared formatter in the existing TS crash hook**

Refactor `src/cli/crash-hook.ts` to call the formatter and only own the wiring + IO:

```ts
// src/cli/crash-hook.ts
import { homedir } from "node:os";
import { join } from "node:path";
import { mkdirSync, writeFileSync } from "node:fs";
import { buildCrashLog } from "./crash-log-core.js";

const crashDir = join(homedir(), ".chorus", "crashes");

function writeCrash(err: unknown, source: string, version: string) {
  const { body, headline } = buildCrashLog({
    err,
    source,
    version,
    node: process.versions.node,
    platform: `${process.platform} ${process.arch}`,
    argv: process.argv.slice(1).join(" "),
    cwd: process.cwd(),
    uptimeMs: Math.round(process.uptime() * 1000),
  });

  // (existing mkdirSync/writeFileSync + stderr messaging here)
}

// existing process.on(...) hooks call writeCrash()
```

3. **Thin the bin crash hook down to “wire + IO” and reuse the same formatter**

In `bin/chorus.mjs`, import the compiled helper before anything else, but keep the rest minimal. You still read the version locally to avoid depending on `src`, but you no longer duplicate the log format / fields.

```js
// bin/chorus.mjs
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { fileURLToPath, pathToFileURL } from "node:url";
import { dirname, join, resolve } from "node:path";
import { buildCrashLog } from "../dist/cli/crash-log-core.js"; // tiny, prebuilt helper

const ISSUE_URL = "https://github.com/chorus-codes/chorus/issues/new";

function readChorusVersion() {
  try {
    const __dn = dirname(fileURLToPath(import.meta.url));
    const raw = readFileSync(resolve(__dn, "..", "package.json"), "utf-8");
    const parsed = JSON.parse(raw);
    return typeof parsed.version === "string" ? parsed.version : "(unknown)";
  } catch {
    return "(unknown)";
  }
}

function installCrashHook() {
  const crashDir = join(homedir(), ".chorus", "crashes");
  const version = readChorusVersion();

  const handle = (err, source) => {
    const ctx = {
      err,
      source,
      version,
      node: process.versions.node,
      platform: `${process.platform} ${process.arch}`,
      argv: process.argv.slice(1).join(" "),
      cwd: process.cwd(),
      uptimeMs: Math.round(process.uptime() * 1000),
    };

    const { body, headline } = buildCrashLog(ctx);
    // minimal duplication: mkdirSync/writeFileSync + stderr messaging
  };

  process.on("uncaughtException", (err) => handle(err, "uncaughtException"));
  process.on("unhandledRejection", (err) => handle(err, "unhandledRejection"));
}

installCrashHook();
```

This preserves:

- Early installation in the bin file before `await import(distEntry)`.
- Zero reliance on importing TS or the main CLI entrypoint.
- Identical crash log structure and messaging in both code paths, with a single source of truth for semantics.

The remaining duplication is now limited to small, obvious IO glue (where the paths differ anyway), and all formatting / fields live in one place.
</issue_to_address>

### Comment 4
<location path="src/cli/commands/diagnose.ts" line_range="302" />
<code_context>
      child = spawn(bin, ["--version"], { windowsHide: true });
</code_context>
<issue_to_address>
**security (javascript.lang.security.detect-child-process):** Detected calls to child_process from a function argument `bin`. This could lead to a command injection if the input is user controllable. Try to avoid calls to child_process, and if it is needed ensure user input is correctly sanitized or sandboxed.

*Source: opengrep*
</issue_to_address>
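One way to address this finding is to constrain which binaries may be spawned before the call is made. The sketch below is an assumption-laden illustration: `isSafeBinPath` and `ALLOWED_CLI_NAMES` are hypothetical names, the allowlist entries are guesses, and the check assumes POSIX paths. `spawn` called with an argument array and without `shell: true` does not pass the command through a shell, so the residual risk is mainly which executable gets run.

```ts
// Hypothetical allowlist: the thread does not list the CLI names that diagnose probes.
const ALLOWED_CLI_NAMES: ReadonlySet<string> = new Set(["claude", "codex", "gemini", "kimi"]);

// Accepts only absolute POSIX paths whose final segment is an allowlisted name.
function isSafeBinPath(bin: string, allowed: ReadonlySet<string> = ALLOWED_CLI_NAMES): boolean {
  if (!bin.startsWith("/")) return false; // reject bare names and relative paths
  if (bin.includes("\0")) return false; // reject embedded NUL bytes
  const segments = bin.split("/");
  if (segments.includes("..")) return false; // reject traversal segments
  const base = segments[segments.length - 1] ?? "";
  return allowed.has(base);
}
```

A caller would run `isSafeBinPath(bin)` first and skip the smoke test for that CLI when it returns `false`, rather than spawning an unexpected path.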


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f770d88e3

chatgpt-codex-connector · 2026-05-17T17:10:03Z

+      `SELECT id, status, created_at FROM chats
+       WHERE status IN ('failed', 'blocked', 'cancelled')
+       ORDER BY created_at DESC LIMIT 5`,


Include no_review chats in diagnostics

When every reviewer fails (for example missing CLI/auth/quota exhaustion), runChat ends the chat with status = 'no_review', and those runs are exactly where _attempts.jsonl contains the failure context this new diagnose section is meant to surface. Because this query only includes failed, blocked, and cancelled, chorus diagnose reports no recent failed chats for the common all-reviewers-failed case, forcing users back into manual log collection.


Fixed in f404044 — added no_review to the IN-list (and a regression test that asserts both the status and quota_exhausted errorKind render). Good catch: no_review is exactly the all-reviewers-failed terminal state that this section was meant to surface, so the original query was reporting empty for the most useful case.
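The status fix can be sketched as a single shared list of terminal failure statuses that drives the query. This is an illustration under assumptions: `FAILURE_STATUSES` and `buildRecentFailedChatsQuery` are hypothetical names, and the parameterized form is a choice made for the sketch (the query quoted in the review inlines the literals).

```ts
// Terminal statuses the diagnose report treats as failures; "no_review" is the
// all-reviewers-failed state added by the fix.
const FAILURE_STATUSES = ["failed", "blocked", "cancelled", "no_review"] as const;

// Hypothetical helper: builds the recent-failed-chats query with one placeholder
// per status, so adding a status cannot drift from the SQL text.
function buildRecentFailedChatsQuery(limit: number): { sql: string; args: (string | number)[] } {
  const placeholders = FAILURE_STATUSES.map(() => "?").join(", ");
  return {
    sql: `SELECT id, status, created_at FROM chats
       WHERE status IN (${placeholders})
       ORDER BY created_at DESC LIMIT ?`,
    args: [...FAILURE_STATUSES, limit],
  };
}
```

Keeping the statuses in one constant also gives a regression test a single value to assert against.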

The recent-failed-chats section was meant to surface per-participant failure context from `_attempts.jsonl`, but the WHERE clause only covered 'failed', 'blocked', 'cancelled'. The most common failure shape (every reviewer down for missing CLI / auth / quota) ends the chat in 'no_review', which was being silently filtered out. So the exact case the section exists to diagnose returned an empty list, forcing users back into manual log collection.

Adds 'no_review' to the IN-list and a regression test that asserts both the status and a quota_exhausted errorKind render in the report.

Addresses chatgpt-codex review P2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps

crypticpy has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

…ut adapter (#1) * fix: cred detection + Claude MCP user-scope registration Three fixes from chorus-issues.md that prevent a freshly-installed chorus from finding the user's existing CLI credentials, so the daemon starts up cleanly on machines that already have Claude / Kimi / moonshot configured. #1: register Claude MCP at user scope. The chorus MCP entry now writes to the top-level `mcpServers` block in `~/.claude.json` (idempotent), and any stale chorus entry under the project-scoped `projects[homedir].mcpServers` is cleaned up. Previously the project-scoped registration was invisible to Claude Code launched outside that exact cwd. #2: cred-path fallbacks. When the anthropic file check misses (e.g. user authed via Claude Desktop, no `~/.claude/...` JSON), fall back to the macOS Keychain via `security find-generic-password -s "Claude Code-credentials"`. Added `~/.kimi/credentials/kimi-code.json` to the moonshot CRED_PATHS so users who authed through `kimi-code` aren't told to log in again. #3: kimi config-missing precheck. New layer-3 check parses `~/.kimi/config.toml` and surfaces a `config_missing` reason when there's no top-level `default_model` set — the CLI will silently pick whatever backend it likes, which is rarely what the user wants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: reviewer fidelity, verdict surfacing, event/prompt isolation Seven fixes from chorus-issues.md covering the rest of the runner + MCP-surface issues found while reviewing PR #26 of foresight-app. #4: thread `repoPath` through reviewer subprocesses. `runReviewers` → `runReviewer` → `runReviewerHeadless` now accept the chat's repoPath and the reviewer's cwd switches to it when set, so `gh`, file reads, and sandboxed CLIs (Gemini) see the actual code instead of running in an empty per-reviewer scratch dir. #5: surface reviewer answer.md in MCP responses. 
New `readReviewerArtifacts` helper walks `~/.chorus/chats/<id>/round-N/reviewer-*/answer.md`, caps each at 16 KiB, sorts by (round desc, agent asc), and merges the result into `wait_for_chat` and `get_chat_status` payloads under `reviews`. Both the doer and reviewer `participant_done` events now carry `outputPath` so MCP clients can read the on-disk source of truth when they need more than the streamed tail. #6: bump phase_progress output tail from 500 B to 8 KiB. The 500-byte slice clipped reviewer summaries mid-word; full text remains on disk and is pointed to by `outputPath`. Affects both reviewer.ts and doer.ts. #7: tri-review verdict on `max_rounds_exhausted`. When the doer succeeded every round but reviewers kept saying request_changes through the round cap, chat_done now emits `status: completed, verdict: request_changes, reason: max_rounds_exhausted` with the last round's reviewer summary — previously misclassified as a generic doer failure. #8: refactor `CreateChatSchema` and `InvokePersonaSchema` to plain `z.object()` with per-field `.describe()`. The prior `.transform()` wrapped them in `ZodEffects` which strips the `properties` map from MCP introspection — clients saw an empty schema. Legacy `template` alias and the `code-review` default moved into a new `resolveTemplateId()` helper. #9: dedup `participant_done` at the multiplex layer. Same-slot fallbacks or parsers that emit `message_done` twice (the opencode parser historically does this) used to fan duplicate terminal events out to every subscriber; now keyed by `(phaseIdx, round, role, agent)` and later duplicates drop silently. #10: per-instance reviewer prompt isolation. Same-lineage instances (claude-code-2/4/5, etc.) share the chat dir tree at `~/.chorus/chats/<id>/round-N/reviewer-*/`; tool-using CLIs were wandering into a sibling's answer.md mid-flight and short-circuiting ("the review is complete" referring to a different agent's work). 
`buildReviewerAsk` now stamps an Independence directive when more than one reviewer slot exists, naming the slot tag and forbidding cross-slot reads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: replay chat_done from persisted verdict, not status The synthetic chat_done emitted when a terminal chat is re-attached derived `verdict` from `chat.status`, ignoring the `chat.verdict` column. Since the previous commit shipped the `max_rounds_exhausted` branch (chorus-issues.md #7), a chat can finish with `status='approved' verdict='request_changes'` — replay was clobbering that to `approved` on every page reload, hiding reviewer disagreement from the user. Use the persisted column when set; fall back to the old status-derived value only for pre-v0.8.27 rows where verdict is null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: surface dropped attached_files + SSE backpressure; harden ship.ts Three audit follow-ups on the daemon side, all surfacing previously silent failures. attached_files: parseAttachedFiles in runner-multiplex.ts used to swallow JSON parse errors and run the chat with no attachments. Refactor to a tagged result (`empty` / `ok` / `invalid`); on `invalid` the runner logs and emits a `cli_warning` SSE so the cockpit + MCP clients see which chat lost its file list. SSE backpressure: when a subscriber's queue exceeds the 1000-line cap the multiplex used to silently drop the connection. Now writes one `error` frame with code `sse_backpressure` before close, and logs the queue length to daemon.log so an operator tailing logs can see when clients fall behind. gh pr create URL validation: ship.ts captured stdout's last line as the PR URL with no shape check; an empty/malformed stdout produced `{ok: true, prUrl: ''}` and the chat row recorded "shipped" with an unclickable link. Now matches against `^https://github.com/<owner>/<repo>/pull/<n>` before declaring success. 
detectGitContext parallelization: the five spawnSync probes (is-repo, remote, gh --version, gh auth, HEAD) ran sequentially at 60s each — worst case 360s before runner saw a result. Converted to async with a new `runAsync` helper, batched via Promise.all with a 15s per-probe cap; detectDefaultBranch's symref + three branch-existence checks likewise parallelized. detectGitContext is now async; the lone caller in runner.ts awaits it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: bound failure-summary regex; log malformed SSE frames participant-card.tsx: parseFailureSummary ran the multi-step regex chain over the full participant.answer string. Reviewer answers can be up to 256 KB; on every render that's a UI-thread block. Slice to the first 16 KiB before scanning — the failure-header block is always written at the top of answer.md by reviewer.ts/doer.ts, so the cap never loses signal. live-run-real/index.tsx: the SSE onmessage handler already had a try/catch around JSON.parse, but the catch was silent — a wire-format mismatch dropped events with no trace. Add a console.warn with a preview so devs notice schema drift in DevTools. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: github PR ingestion via gh CLI Adds src/daemon/github-pr.ts: parsePrUrl + fetchPrArtifact run gh pr view/diff plus review and issue comments in parallel, synthesize a Markdown artifact (description, comments capped at 50 newest each, diff capped at 200 KB UTF-8 safe), and classify gh failures into typed reasons. Exports runAsync from ship.ts so the new module can reuse the existing spawn+timeout helper instead of duplicating it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: extract createChatFromValidatedInputs helper Pulls the template lookup, artifact validation, chat row + opening phase event creation, and runner kickoff out of the POST /chats handler into a reusable helper. 
POST /chats now only handles its route-specific concerns (body shape, repoPath canonicalization, error response shaping). Sets up reuse from the upcoming POST /chats/from-pr endpoint without duplicating ~150 lines of validation logic. No behavior change — same template checks, same artifact rules, same kickoff path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: POST /chats/from-pr — start a chat from a GitHub PR URL Accepts { url, templateId, repoPath?, yolo? }, parses the PR URL, fetches PR meta + diff + existing comments via gh CLI, synthesizes a Markdown artifact, and creates the chat through the shared createChatFromValidatedInputs helper. gh failures map to typed reasons (invalid_url, gh_not_installed, gh_not_authed, pr_not_found, network_failure, unknown) so the cockpit can render actionable errors instead of generic 500s. Adds tests/github-pr.test.ts covering parsePrUrl edge cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: cockpit "GitHub PR" tab on /new Adds a Free-form / GitHub PR mode toggle on the new-chat page. PR mode swaps the prompt textarea for a URL input and routes through the new POST /chats/from-pr endpoint. Validates client-side that the chosen template is review-only before letting the user submit. createChatFromPr API client surfaces the daemon's typed PR meta (owner/repo/number/title/branches) on the response so callers can display PR context after the chat is created. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: review_pr MCP tool Exposes POST /chats/from-pr through MCP. Orchestrators (Claude Code, Codex, Cursor) can now hand chorus a PR URL and get reviewers running against it without going through the cockpit. Defaults templateId to review-only so a caller can pass just a URL. ReviewPrSchema is a plain z.object (not ZodEffects) so MCP clients can introspect required fields — same hazard documented on CreateChatSchema and InvokePersonaSchema. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: capture multi-identity CLI follow-up idea Idea note for running chorus against multiple paid accounts on the same CLI binary (work + personal Claude Code Max, etc.). Filed as follow-up after audit-presets + quota tiers ship — captures the env-override mechanism, proposed Identity primitive, and open questions on keychain CLIs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: schema for audit + orchestrate phases, voice tier, bypass_quota Adds the foundation for repo-pointed audit-and-orchestrate runs and the orchestrator's task↔voice tier matching. Template schema: - AuditPhase (kind: 'audit') — single reviewer voice + one of five preset lenses (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review). Output schema (AuditItemSchema, AuditOutputSchema) lives next to the phase shape so the structured-output adapter, scheduler, and cockpit checklist agree on the contract. - OrchestratePhase (kind: 'orchestrate') — array of worker voices, default branchPrefix `chorus/{chatId}/worker-{idx}` so each worker gets isolated git state. - templateRequiresRepo() helper for the cockpit's repo-picker gate. Voices: - Adds tier ('high' | 'medium' | 'low', default 'medium') and monthly_budget_usd (nullable) to the row schema, upsert input, and update input. Idempotent migrations on existing DBs. Chats: - bypass_quota INTEGER NOT NULL DEFAULT 0 — set on PR-review chats so the orchestrate scheduler runs every enabled voice at full capacity instead of tier-gating. Runner is stubbed for the new kinds: phase_done emit + continue, so templates that declare an audit/orchestrate phase before the runner logic lands don't crash. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: structured-output adapter for CLI voices Wraps an AgentShim's runHeadless with JSON-formatting prompt scaffold and a one-shot repair loop, returning typed data validated against a caller-supplied zod schema. Used by the upcoming audit phase (which needs typed AuditItem[] instead of free-form prose) and the orchestrate phase (worker results). Keeps each CLI lineage's existing headless transport — the adapter just owns the prompt-shape + parse-and-validate dance. Extraction strategy: prefer direct JSON.parse of finalText; fall back through fenced-block regex variants to a brace-to-brace slice. On parse or schema-violation, retry once with a repair prompt that quotes the validation error. Spawn errors short-circuit (the model never saw the prompt — repair would just retry the same failure). Tests cover happy path, fenced-block extraction, repair-loop success, repair-loop exhaustion, schema violation, and spawn error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cockpit): audit-a-repo tab + checklist approval component /new gets a third tab beside Free-form and GitHub PR. In audit mode the user picks one of five preset lenses (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review) and supplies an absolute repo path. Submit fires createChat with templateId=`audit-<preset>` — those built-in templates land with the audit-phase implementation. RunChecklist component lives at src/components/run-checklist/. It takes the AuditItem[] surfaced by the audit phase's blocking event and renders one row per item with a checkbox, complexity badge, rationale, and file list. Default state has every item selected; the user trims, then submits via the parent's onSubmit which JSON-encodes the selected ids into the existing /chats/:id/resume `answer` field. Wiring into the live-run UI lands with the audit phase. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: PR-review chats bypass quota + tier surface on /voices PR-review chats automatically set bypass_quota=true so the orchestrate scheduler ignores voice.tier and runs the full fleet at maximum capacity — reviews are short, parallel, and the user wants the strongest opinion possible regardless of model tier. PUT /voices/:id now accepts tier ('high' | 'medium' | 'low') and monthly_budget_usd (non-negative or null), so the cockpit fleet page can label voices by capability for the orchestrate scheduler to route work against. Tests cover both new fields plus a chat round-trip asserting bypass_quota defaults false and persists when set. * feat: audit phase + 5 presets + audit-* templates Wires the audit phase end-to-end: - src/daemon/phases/audit.ts runs the structured-output adapter against the chosen preset, persists the parsed AuditItem[] to <chatDir>/audit-output.json plus raw model output to round-1/audit/output.md, and emits phase_progress with the items. - src/daemon/runner.ts replaces the audit/orchestrate stub: audit invokes runAuditPhase, flips chat status to blocked so the cockpit renders the checklist UI, and exits cleanly. Orchestrate keeps the no-op stub until step 5 lands. - 5 preset prompts (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review) frame what each lens looks for. The structured-output adapter handles JSON formatting; presets describe the audit lens only. - 5 audit-* templates (one per preset), each a 2-phase audit -> orchestrate shape with three default workers. Auto-loaded by seedBuiltinTemplates. - tests/audit-phase.test.ts covers preset-file presence and the audit-* template parse + shape contract. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: orchestrate phase + audit-resume wiring + tier-aware scheduler

Wires the audit→orchestrate handoff: the cockpit POSTs the user's trimmed audit checklist to /chats/:id/resume, the resume handler cross-checks ids against audit-output.json, persists the selection, flips chat to drafting on the orchestrate phase, and re-fires the runner. The runner now starts at chat.current_phase_idx so a resumed chat lands directly on orchestrate.

The new orchestrate phase walks the approved AuditItem[] sequentially (parallelism is an explicit non-goal for v1), picks a worker per item via the pure tier-aware scheduler, cuts a per-item branch, dispatches the worker via shim.runHeadless, captures git diff --stat, and persists orchestrate-manifest.json for the diff-apply UI to consume.

The scheduler is a pure function with 9 unit tests covering tier matching, bypass override, disabled-voice skipping, empty pool, and unknown voice ids. Resume route has 10 tests exercising body validation, id cross-check, status gating, and the happy path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: orchestrate manifest UI + checkout/open-pr daemon routes

- Run page reads audit-output.json + orchestrate-manifest.json on render
- LiveRunReal renders RunChecklist while blocked w/ audit items, then swaps to OrchestrateManifest panel once orchestrate completes
- New OrchestrateManifest component shows one row per worker w/ Checkout / Open PR buttons (per-row inline feedback, no global toast)
- Daemon: GET /chats/:id/audit-items, GET /chats/:id/orchestrate-manifest, POST /chats/:id/workers/:idx/checkout (refuses on dirty tree), POST /chats/:id/workers/:idx/open-pr (gh pr create, bucketed failures)
- OrchestrateManifestSchema added to template-schema.ts; route + UI parse via the same shape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: harden resume race + branch validation + symlink TOCTOU + extractJson

Address /freview findings on the audit + orchestrate flow:

- Resume race (BLOCKER): two concurrent POSTs to /chats/:id/resume could both pass the `status=='blocked'` check and double-fire the runner. Guard with `getActiveRun` (catches the audit-finishing window before `.finally` clears the registry) and replace the status flip with an atomic `tryResumeFromBlocked` CAS conditional on `WHERE status = 'blocked'`.
- Branch-name argument injection (BLOCKER): tighten zod regexes on `OrchestratePhase.branchPrefix` and `OrchestrateManifestEntry.branch` so values starting with `-` (or containing shell metachars) cannot flow into `git checkout` / `gh pr create` as flags.
- Symlink TOCTOU on checkout + open-pr (NON-BLOCKER): re-realpath `existing.repo_path` before passing to execFile cwd, mirroring the rerun-path pattern. Returns a structured validation error if the path no longer resolves.
- extractJson Path 4 (NON-BLOCKER): try `{...}` and `[...]` slices independently and prefer the longer parse, so prose like "mentions [stuff] before {object}" extracts the object instead of the bracket.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: prod CJS build - drop import.meta + copy presets to dist

Two issues blocked `pnpm build:server`:

- `audit.ts` used `import.meta.url` for module-relative path resolution, but the server tsconfig compiles to CJS where `import.meta` is a syntax error. Replaced with `__dirname`, which works in both the compiled dist (native CJS) and tsx-driven dev (tsx ≥4 shims it in ESM mode).
- The `build:server` script copied `schema.sql` to dist/ but missed the preset markdown files in `src/daemon/presets/`. The audit phase's `loadPresetPrompt` resolves relative to `__dirname`, so a published install was hitting ENOENT on every audit run. Extended the copy step to mirror the preset directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: fold upstream T1+T2 fixes back into fork (12 commits) (#2)

* feat(cli): add diagnose command + crash-hook

Bundles two upstream changes that ship a self-service triage path for chorus users hitting opaque failures:

- `chorus diagnose` walks the install, daemon, recent failed chats, voice health, and produces a sharable bug report.
- Crash hook captures uncaught exceptions in the CLI and writes them to a crash log alongside instructions to attach during a bug report.

Folded back from upstream chorus-codes/chorus:
  7ea712b feat: chorus diagnose command + crash hook for bug reports (#1)
  4a5ea20 fix(diagnose): realpath bin path + filter Next.js SSE noise (#4)

Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(cli): add quickstart self-test command

`chorus quickstart` runs a 30-second activation flow that verifies the daemon comes up, the SQLite DB initializes, and a minimal chat round-trips end-to-end.
Aimed at first-run users who want to know "is this thing actually working" before authoring a template.

Folded back from upstream chorus-codes/chorus:
  56610cf feat(cli): chorus quickstart - 30-second activation self-test (#30)

Co-Authored-By: chorus-codes <info@chorus.codes>

* fix(cli): use dynamic import for open package (Node 22 ERR_REQUIRE_ESM)

The `open` package and `chokidar` are both ESM-only as of recent versions. On Node 22 (the daily-driver target) static `require()` calls into them throw ERR_REQUIRE_ESM and crash the CLI at boot.

Switch to dynamic import in:

- src/cli/commands/start.ts (open browser after boot)
- src/cli/open-browser.ts (new helper)
- src/cli/index.ts (route open import)
- src/daemon/output-watcher.ts (chokidar file watch)

Includes upstream's post-merge hardening: the setTimeout that triggers the browser-open no longer wraps an async callback bare, so a missing default browser doesn't surface as an unhandled rejection.

Folded back from upstream chorus-codes/chorus:
  e8ca2ee fix(cli): dynamic import for open package (#14)
  dcd1837 fix: post-merge hardening for #14 (start.ts portion only; cli-precheck.test.ts portion ships with the Keychain fix)

Co-Authored-By: Julien Deudon <deudon.j@gmail.com>
Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(cockpit): seed empty round-1 so QUEUED renders from t=0

Before: when a chat starts but no reviewer has produced an event yet, enrichRounds returned an empty rounds array and the live-run page showed nothing for several seconds, so the user couldn't tell whether their chat had launched.

After: seed a synthetic round-1 with QUEUED placeholders for every expected participant so the page renders the per-reviewer cards immediately. Real events overwrite placeholders as they arrive.
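The round-1 seeding described above might look something like the sketch below. The `Round` and `ParticipantCard` shapes and the `seedRounds` name are assumptions made for illustration; the cockpit's real enrichRounds types are not shown in this text.

```typescript
// Hypothetical shapes; the cockpit's real round/participant types may differ.
type ParticipantState = "QUEUED" | "RUNNING" | "DONE" | "FAILED";

interface ParticipantCard {
  slot: string;
  state: ParticipantState;
}

interface Round {
  round: number;
  participants: ParticipantCard[];
}

function seedRounds(expectedSlots: string[], rounds: Round[]): Round[] {
  // Nothing reported yet: synthesise round 1 so every expected slot gets a card.
  if (rounds.length === 0) {
    const participants = expectedSlots.map(
      (slot): ParticipantCard => ({ slot, state: "QUEUED" }),
    );
    return [{ round: 1, participants }];
  }
  // Otherwise keep real participants and add placeholders only for silent slots.
  return rounds.map((r): Round => {
    if (r.round !== 1) return r;
    const seen = new Set(r.participants.map((p) => p.slot));
    const placeholders = expectedSlots
      .filter((slot) => !seen.has(slot))
      .map((slot): ParticipantCard => ({ slot, state: "QUEUED" }));
    return { ...r, participants: [...r.participants, ...placeholders] };
  });
}
```

Because real participants are kept and placeholders are only added for slots that have not reported, a real event for a slot effectively replaces its placeholder on the next render.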
Folded back from upstream chorus-codes/chorus:
  53e8fb6 feat(cockpit): seed empty round-1 so QUEUED placeholders render from t=0 (#2)

Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(daemon): runtime fallback-collision dedup across reviewer slots

When two reviewer slots both fall through their per-slot chains to the same template-level fallback target (common case: every slot ends in anthropic/claude-sonnet-4-6), both used to dispatch the same (lineage, model) in parallel, wasting cost and collapsing the lineage diversity that is the point of multi-LLM peer review. Build-time dedup (template-fallback.ts) couldn't catch it because each slot only knows about other slots' PRIMARIES, not their fallback chains.

Fix: new per-chat/per-round (lineage, model) registry. reviewer-driver calls tryClaim before each chain attempt and releases in a finally. On collision, return null + emit cli_warning(reason='fallback_collision') so runWithChainFallback advances to the next entry and the cockpit can show why the slot skipped.

Ported into fork's reviewer-driver.ts surgically so the verdict-isolation refactor (2a2cde2) and per-slot repoPath threading stay intact.

Folded back from upstream chorus-codes/chorus:
  c4751fe feat(daemon): runtime fallback-collision dedup (#3)

Co-Authored-By: chorus-codes <info@chorus.codes>

* fix(daemon): write REVIEWER FAILED summary on pre-spawn failure

Before: when a reviewer's precheck fails (e.g. underlying CLI not installed) or the chat is cancelled while the slot is queued for a CLI semaphore slot, runReviewer used to return null silently, leaving NO on-disk participant directory. The cockpit's enrich-rounds loop then couldn't reconcile the synthesised template slot against any real participant, so the card sat at "Queued — waiting for an open slot." forever and the actual error was invisible.

Reproduction: install chorus on a host with only one CLI on PATH (e.g.
just claude-code), open a template that includes lineages requiring codex/gemini/kimi, fire it. Every reviewer card stayed "Queued"; the chat never visibly progressed even though it was already done failing.

Fix:
- Create the reviewer dir BEFORE the precheck runs.
- Add a writePreSpawnFailure helper that writes a `## REVIEWER FAILED` summary in the canonical format (Kind / Lineage / Model / message) that the cockpit's `parseFailureSummary` already understands.
- Wire it into the precheck-failed and cancelled-while-queued paths.

Card now transitions out of pending and shows the actual error (cli_missing, cancelled, ...).

Folded back from upstream chorus-codes/chorus:
  afc59cc fix(daemon): REVIEWER FAILED summary on pre-spawn failure (#26)

Co-Authored-By: chorus-codes <info@chorus.codes>

* feat(voices): auto-disable on persistent quota_exhausted + lsof timeout

Real pain (upstream #11): a Pro Gemini model on a Flash-only account fails every chorus run with "exhausted your capacity on this model", but Gemini doesn't return a resetAt because the model isn't going to become available for that account. Without auto-disable, the runner keeps picking the dead voice on every chat and the user keeps seeing the same opaque error.

Voice auto-disable:
- New src/lib/voice-failure-tracker.ts records per-voice consecutive quota_exhausted strikes in a settings counter.
- Trigger: 2 consecutive strikes WITH no resetAt → set voices.enabled=false + disabled_reason='auto_quota'.
- Counter resets on participant_done success; rate-limit strikes (hasResetAt=true) bypass the counter entirely so a transient 429 + a later permanent failure can't trip the threshold on the first permanent strike.
- Wired into reviewer-driver alongside recordHealth; emits a cli_warning(reason='voice_auto_disabled') so the cockpit can show a one-line explanation.
- VoiceDisabledReason union gains 'auto_quota' (schema column was already TEXT, so no migration).
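The strike-counting rules listed above can be illustrated with a small in-memory sketch. The class name, method names, and `StrikeResult` values are hypothetical, and the real tracker is described as persisting its counter in settings rather than in a Map.

```typescript
// Hypothetical in-memory stand-in for the settings-backed strike counter.
const STRIKE_THRESHOLD = 2; // "2 consecutive strikes" per the commit message

type StrikeResult = "counted" | "ignored_rate_limit" | "auto_disabled";

class VoiceFailureTrackerSketch {
  private strikes = new Map<string, number>();

  // Record a quota_exhausted failure. hasResetAt=true means the vendor promised
  // recovery, so the failure is treated as a rate limit and bypasses the counter.
  recordQuotaExhausted(voiceId: string, hasResetAt: boolean): StrikeResult {
    if (hasResetAt) return "ignored_rate_limit";
    const next = (this.strikes.get(voiceId) ?? 0) + 1;
    this.strikes.set(voiceId, next);
    // At the threshold the caller would set enabled=false, disabled_reason='auto_quota'.
    return next >= STRIKE_THRESHOLD ? "auto_disabled" : "counted";
  }

  // A successful participant_done clears the streak.
  recordSuccess(voiceId: string): void {
    this.strikes.delete(voiceId);
  }
}
```

A caller such as the reviewer driver would react to `"auto_disabled"` by disabling the voice and emitting the `voice_auto_disabled` warning.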
Lsof timeout (upstream #12):
- findPidsOnPort and findPidsOnPortWithSudo now bound execSync / execFileSync to 3s, so a slow-but-functional lsof on a loaded macOS box doesn't hang chorus boot. 3s leaves headroom while still bounding the hang case.

Ported into fork's reviewer-driver.ts tmux pollHandle + success path. voices.ts disabled_reason union extended alongside fork's voice-tier column.

Folded back from upstream chorus-codes/chorus:
  4f6becc v0.8.30 - voice auto-disable (#11) + lsof timeout (#12) (#17)

Co-Authored-By: chorus-codes <info@chorus.codes>
Co-Authored-By: Lumina Mao <luminamao@mac.lan>

* fix(daemon, schema): codex isolation + template-schema validation

Two issues caused chats to fail opaquely at run-start:

CODEX ISOLATION (#10, #16)
The user's ~/.codex/config.toml may declare MCP servers, plugins, or notification hooks. In headless `codex exec` those integrations have caused codex to hang or cancel mid-call, with two independent reproductions: codex as our reviewer (#10) and codex as MCP client of chorus (#16). Add --ignore-user-config to every headless codex argv. Extracted to a pure `buildHeadlessArgs(opts)` so the argv shape is unit-testable.

TEMPLATE VALIDATION (#15)
`reviewer.require > candidates.length` used to surface as "Job moves immediately to failure upon Start press": the runner queued, failed to grant enough slots, and emitted an opaque chat-failure. Same for `require > distinct lineages` when crossLineage:true. Both now caught at TemplateSchema.parse() time with a clear error message the user can fix before the run starts.

ReviewerSchema.superRefine() additions slot in cleanly alongside the fork's audit/orchestrate phase schema work; both are additive constraints on the same ReviewerSchema object.
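The two template checks described above reduce to a pair of comparisons. The sketch below expresses them as a plain function so it runs without zod. `ReviewerRuleSketch`, the `{ lineage }` candidate shape, and the error wording are assumptions; the commit itself wires the checks through `ReviewerSchema.superRefine()`.

```typescript
// Hypothetical reviewer-rule shape, reduced to the fields the two checks need.
interface ReviewerRuleSketch {
  require: number;
  crossLineage: boolean;
  candidates: { lineage: string }[];
}

function validateReviewerRule(rule: ReviewerRuleSketch): string[] {
  const errors: string[] = [];

  // Check 1: cannot require more reviewers than there are candidates.
  if (rule.require > rule.candidates.length) {
    errors.push(
      `reviewer.require (${rule.require}) exceeds candidates.length (${rule.candidates.length})`,
    );
  }

  // Check 2: with crossLineage, each required reviewer needs its own lineage.
  if (rule.crossLineage) {
    const distinct = new Set(rule.candidates.map((c) => c.lineage)).size;
    if (rule.require > distinct) {
      errors.push(
        `reviewer.require (${rule.require}) exceeds distinct lineages (${distinct}) with crossLineage: true`,
      );
    }
  }
  return errors;
}
```

Inside a zod `superRefine` callback, each returned message would presumably become one `ctx.addIssue` call, so the user sees the problem at parse time rather than as a failed run.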
Folded back from upstream chorus-codes/chorus:
  8ed970b fix(daemon, schema): codex isolation + template validation

Co-Authored-By: chorus-codes <info@chorus.codes>

* fix(runner): honour iterate.onDisagreement accept-doer/escalate

The template schema, cockpit dialog, and SPEC-D-templates have always exposed three values for iterate.onDisagreement ('continue', 'escalate', 'accept-doer') but the runner only honoured 'continue'. Picking the other two from the cockpit form was a silent no-op: chats fell through to phase_failed with 'doer_failed_all_rounds' regardless.

This wires both new branches into the round loop and the terminal chat_done emission:

- 'accept-doer': after maxRounds without consensus, mark doerSucceeded and continue. The chat carries on (subsequent phases, ship, approval) as if reviewers had agreed on the doer's last answer.
- 'escalate': halt with status='failed' but verdict='request_changes' and error='escalated_on_disagreement', so cockpits can render "reviewers disagreed, needs human" distinctly from "doer broke."

Policy table extracted into a pure decidePhaseOutcome() helper so the 3 × 2 input matrix (policy × disagreement-in-last-round) is unit-tested without standing up the full runChat scaffold. Gated on disagreementInLastRound (reset at top of every round + on doer-crash path) so a partial / empty doer answer can never be silently "accept-doer"'d as final.

Preserves the fork's existing standardPhaseRoundsExhausted #7 surfacing for the 'continue' path; the 'escalate' path takes precedence with its own distinct chat_done.

Upstream PRs #49, #50 (commit 67572e9).

Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(cli-precheck): cover macOS Keychain fallback for Claude Code v2

The fork already implements the Keychain fallback in cli-precheck (hasDarwinKeychainEntry).
This adds the missing test coverage:

- passes when no cred file but keychain entry exists
- blocks when no cred file and no keychain entry
- skips keychain check when cred file exists (fast-path preserved)
- does not consult keychain for non-anthropic lineages

vi.mock('node:child_process') uses the importOriginal spread pattern so spawn / exec / etc. keep their real implementations; a bare module replacement would silently break any sibling test that imports from child_process.

Upstream PRs #7, #8, plus the dcd1837 test-mock hardening.

Co-Authored-By: Yura <yurahalych@gmail.com>
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cockpit): derive candidatesWithModels from snapshot's candidates field

Daemon-side TemplateSchema only carries `candidates` on each ReviewerRule. The cockpit Template type expects `candidatesWithModels` populated; enrich-rounds iterates that field to build slot→model mappings for run-page cards. When fromRow parsed template_snapshot and cast it to Template, the cast was a TypeScript lie: at runtime the parsed object lacked candidatesWithModels, enrichRounds iterated zero reviewer slots, and no model name reached the cards (badge appeared empty).

Derive candidatesWithModels at the parse seam (chats.fromRow) so the cockpit's Template contract is honoured regardless of which path produced the data. Idempotent: if a future daemon ever serialises the field directly, that wins. Persona forwarded if present. Audit-phase single-voice reviewers (no candidates array) are skipped via a runtime narrow.

Upstream PR #6 (chorus-codes/chorus@ac0c7fd).
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(diagnose): capture failure context - CLI smoke, voice health, recent failed chats

Extends `chorus diagnose` with three signals that triage the most common breakage modes:

- **CLI smoke**: spawn `<bin> --version` per detected CLI with a hard 2s SIGKILL timeout (wrapper scripts may trap SIGTERM). Distinguishes `timedOut` from non-zero exit so the report can tell hangs apart from crashes.
- **Voice health**: counts `enabled=0` voices grouped by `disabled_reason` ('user' vs 'auto_missing' vs 'auto_quota'). Added `idx_voices_enabled` so the `WHERE enabled = 0` scan stays cheap as the table grows.
- **Recent failed chats**: last 5 chats with `status IN ('failed', 'blocked', 'cancelled')` plus the errored participants pulled from `~/.chorus/chats/<id>/round-*/<part>/_attempts.jsonl`. Only `errorMessageBytes` is exposed; raw error text never leaves the user's machine. `$HOME` is redacted from any embedded path strings via `redactHomePaths`.

Adapted from upstream chorus-codes/chorus#19 (0666dca). Preserves the fork's existing diagnose shape and adds tests for smokeOneCli / readLatestAttempt / formatReport rendering of the three new sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(diagnose): include no_review in recent failed chats query

The recent-failed-chats section was meant to surface per-participant failure context from `_attempts.jsonl`, but the WHERE clause only covered 'failed', 'blocked', 'cancelled'. The most common failure shape (every reviewer down for missing CLI / auth / quota) ends the chat in 'no_review', which was being silently filtered out. So the exact case the section exists to diagnose returned an empty list, forcing users back into manual log collection.
Adds 'no_review' to the IN-list and a regression test that asserts both the status and a quota_exhausted errorKind render in the report.

Addresses chatgpt-codex review P2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: chorus-codes <info@chorus.codes>
Co-authored-by: Julien Deudon <deudon.j@gmail.com>
Co-authored-by: Lumina Mao <luminamao@mac.lan>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Yura <yurahalych@gmail.com>

* feat: fold upstream Grok + Local LLM + Keychain dual-probe (4 commits) (#3)

* feat(grok): detect Grok Build (xAI) + Level 1 orchestrator

Adds Grok Build CLI to detection, onboarding picker, /connect card, diagnose smoke, init listing, and doctor labels. Grok auto-picks chorus MCP from ~/.claude.json (verified empirically via `grok inspect`), so no separate MCP wire is needed.

The grok orchestrator reports connected=true when both the binary is detected AND chorus is wired in ~/.claude.json (either top-level mcpServers or any project-scoped mcpServers entry). connect() is a no-op that points users at `chorus connect claude` if claude hasn't been wired yet.

Quickstart filters CLIs to those with shims, so grok-cli being detected first no longer breaks the doer-pick flow. The cliToLineage map remains the source of truth for reviewer-capable CLIs.

`docs/integrating-a-new-cli.md` captures the full Level 1/2/3 integration playbook for future CLIs; it was written while doing this so the steps are tested.

Adapted from upstream chorus-codes/chorus#44 (6a00b00). No conflicts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(local): add Local LLM HTTP shim for OpenAI-compatible endpoints

Adds a `local` lineage that dispatches chat completions to any OpenAI-compatible HTTP endpoint (Ollama, llama-swap, LM Studio, vLLM, or anything that speaks `/v1/chat/completions`).
No external subscription or CLI binary required, only a running local inference server.

Configuration: save a JSON secret under key `local` via Settings → Local LLM:

  {"base_url": "http://127.0.0.1:11434/v1", "api_key": ""}

Model ids may use a `local:` prefix (e.g. `local:llama3`) which the shim strips before dispatch, or bare model names directly. When no secret is saved, falls back to Ollama's default port.

Wiring sweep (extends every exhaustive enum / Record so templates can declare local voices without Zod errors):

- src/daemon/agents/local.ts: new HTTP shim with JSON.parse guard on the secret (yields a typed `config_parse` error event for malformed secrets instead of throwing inside the generator)
- src/daemon/agents/index.ts: register localShim, `local:` prefix routing in pickShimForVoice, add to isHttpDispatchedShim
- src/daemon/agents/types.ts: `local` in Lineage
- src/lib/template-schema.ts: `local` in both lineageEnum + reviewerLineageEnum
- src/lib/cli-health.ts: `local` in CliLineage + ALL_LINEAGES
- src/lib/cli-precheck.ts: empty CRED_PATHS, LOGIN_HINT, skip the file probe (same pattern as openrouter, where auth lives in the secrets table)
- src/lib/cockpit-types.ts: `local` in ReviewerLineage
- src/lib/lineage-maps.ts: `local` in DaemonLineage, UILineage, every label/dot/brand map; UI_LINEAGE_DEFAULT_MODEL[local] = "" (model IDs are endpoint-specific). Teal dot distinguishes local from openrouter's cyan
- src/components/phase-editor/constants.ts: LINEAGES list, DAEMON_TO_COCKPIT_LINEAGE
- src/components/template-dialog/constants.ts: COCKPIT_TO_DAEMON, DAEMON_TO_COCKPIT, DAEMON_DEFAULT_MODEL, FALLBACK_LINEAGES

Adapted from upstream chorus-codes/chorus#41 (716fa3a). The bundled upstream commit also included Keychain dual-probe (#38) and fallback-registry hold-on-success (#42); those land in follow-up commits in this PR so each concern is reviewable independently.
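The secret handling described for the `local` shim (a JSON.parse guard that yields a typed `config_parse` error, a default endpoint when nothing is saved, and `local:` prefix stripping) could be sketched as below. `resolveLocalConfig`, `stripLocalPrefix`, and the result shape are hypothetical, and the default URL is assumed to equal the example secret's `base_url`.

```typescript
// Assumed default: the example secret's base_url (Ollama's default port).
const DEFAULT_BASE = "http://127.0.0.1:11434/v1";

interface LocalConfig {
  baseUrl: string;
  apiKey: string;
}

type LocalConfigResult =
  | { ok: true; config: LocalConfig }
  | { ok: false; errorKind: "config_parse"; message: string };

function resolveLocalConfig(secretJson: string | null): LocalConfigResult {
  // No saved secret: fall back to the default endpoint with no API key.
  if (secretJson === null) {
    return { ok: true, config: { baseUrl: DEFAULT_BASE, apiKey: "" } };
  }
  let raw: unknown;
  try {
    raw = JSON.parse(secretJson);
  } catch (err) {
    // Report a typed error instead of throwing from inside the caller.
    return { ok: false, errorKind: "config_parse", message: String(err) };
  }
  const obj = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const baseUrlField = obj["base_url"];
  const apiKeyField = obj["api_key"];
  return {
    ok: true,
    config: {
      baseUrl: typeof baseUrlField === "string" ? baseUrlField : DEFAULT_BASE,
      apiKey: typeof apiKeyField === "string" ? apiKeyField : "",
    },
  };
}

// `local:llama3` and `llama3` both dispatch as `llama3`.
function stripLocalPrefix(modelId: string): string {
  return modelId.startsWith("local:") ? modelId.slice("local:".length) : modelId;
}
```

Returning a discriminated result rather than throwing keeps a malformed secret on the normal error-event path, which is the behaviour the commit message describes for the generator.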
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg <7xshadowx7@gmail.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* feat(grok): Level 3 shim - full reviewer dispatch (happy-path unverified)

Promotes Grok Build from Level 2 (consumer-only) to Level 3 (full reviewer shim). Chorus can now dispatch to grok-build as a doer or reviewer in any template.

What's verified (empirically):
- Detection, headless-mode invocation pattern (`grok -p ... --output-format streaming-json --yolo --max-turns 1`), error event shape, exit-code semantics
- Failure path: free-tier auth produces clean quota_exhausted (SuperGrok Heavy subscription required) → voice auto-disables after N strikes
- All UI surfaces (model boxes, template-editor lineage picker, run-page participant card, cli-status-panel, onboarding picker, connect orchestrator)

What's specced but not run live (needs SuperGrok Heavy):
- Happy-path streaming-json text/end event parsing (followed `~/.grok/docs/user-guide/13-headless-mode.md` spec)
- Token/cost accounting: Grok doesn't surface usage in end event; estimateCostUsd returns 0

New files:
- src/daemon/agents/grok.ts: shim with `--max-turns 1` headless args
- src/daemon/agents/parsers/grok.ts: streaming-json + stderr parser
- tests/grok-parser.test.ts: 18 cases covering happy / error / robustness

Lineage sweep (xai daemon lineage was already a legacy alias to opencode, so this uses a fresh `grok` daemon lineage to avoid colliding with that mapping; old YAML with `lineage:xai` still routes to opencode):
- Lineage / CliLineage / ReviewerLineage / DaemonLineage / UILineage
- LINEAGE_LABEL / LINEAGE_DOT / UI_LINEAGE_* / UI_LINEAGE_BRAND
- UI_LINEAGE_AVAILABLE_MODELS.grok = ['grok-build']
- UI_LINEAGE_DEFAULT_MODEL.grok = 'grok-build'
- template-schema lineageEnum + reviewerLineageEnum
- DB voices row schema (additive, so old rows still validate)
- phase-editor LINEAGES + DAEMON_TO_COCKPIT_LINEAGE
- template-dialog COCKPIT_TO_DAEMON + DAEMON_TO_COCKPIT + DAEMON_DEFAULT_MODEL + FALLBACK_LINEAGES
- cli-status-panel + live-run-real helpers
- error-detector auth-prompt regex (SuperGrok signature on its own branch ABOVE the generic auth regex; classifies to quota_exhausted, not auth_invalid)

Voice seeding: grok-cli registered in SINGLE_MODEL_CLIS, which auto-creates the grok-cli voice (id=grok-cli, lineage=grok, model_id=grok-build) on first daemon boot when the binary is detected.

Auth flow: ~/.grok/auth.json file probe OR GROK_CODE_XAI_API_KEY env short-circuit. Both verified in tests/cli-precheck.test.ts. Daemon won't spawn grok without one or the other present, which prevents the browser-OAuth flow from hanging headless dispatch.

Total tests: 821 → 842 (+21).

Adapted from upstream chorus-codes/chorus#46 (f9dfba5). Conflicts resolved by taking the union of fork's `local`-extended enums and upstream's `grok`-extended enums (every Record / z.enum had to be extended in both dimensions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* fix(cli-precheck): macOS Keychain dual-probe - also check "Claude Code" service

Claude Code v2.x stores OAuth credentials under two service names depending on the auth flow:
- `Claude Code-credentials`: Pro/Max OAuth via `claude login`
- `Claude Code` (no suffix): API-key auth + some Console-account flows

The previous single-service probe regressed to auth_missing for API-key users on darwin. Refactor hasDarwinKeychainEntry to accept string | string[], iterate candidates, short-circuit on first match. Each probe stays bounded to 1.5s so a misconfigured keychain can't stall every spawn.

Refs upstream issue #38 / commit 716fa3a.
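The dual-probe refactor described above (accept `string | string[]`, iterate candidates, stop at the first hit) can be sketched with the keychain lookup injected as a function. `hasKeychainEntrySketch` and `KeychainProbe` are hypothetical names; a real probe would presumably shell out to macOS's `security find-generic-password -s <service>` with the 1.5s bound mentioned in the commit.

```typescript
// The probe is injected so the sketch stays testable off macOS.
type KeychainProbe = (service: string) => boolean;

function hasKeychainEntrySketch(
  services: string | string[],
  probe: KeychainProbe,
): boolean {
  const candidates = Array.isArray(services) ? services : [services];
  // Array.prototype.some stops at the first truthy result, so later
  // services are never probed once one matches.
  return candidates.some((service) => probe(service));
}
```

With the two service names from the commit message, a Pro/Max OAuth user matches on the first probe and an API-key user on the second, which is the regression the fix addresses.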
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: PR review - local in voices enum, AGENT_TO_LINEAGE for grok/local, separate cred-precheck vs semaphore bypass

Addresses bot review on PR #3:

- Sourcery P2 (src/lib/db/voices.ts): VoiceRowSchema and VoiceUpsertInput only allowed `grok` in the new-lineage slot; `local` voices upserted via the (future) Local LLM connect flow would have failed Zod validation at runtime. Add `local` to both the enum and the union.
- Codex P2 (src/app/api/run-artifacts/[chatId]/route.ts + src/app/runs/[runId]/page.tsx): AGENT_TO_LINEAGE did not map `grok-cli` → `grok` nor `local` → `local`, so a real Grok or Local participant directory (`reviewer-grok-cli-N`, `reviewer-local-N`) resolved to a bogus lineage and rendered as an unbranded extra card while the placeholder slot stayed pending.
- Codex P2 (src/daemon/agents/index.ts + src/daemon/runner/{doer,reviewer}-driver.ts + src/lib/settings/concurrency.ts): the daemon used a single predicate `isHttpDispatchedShim` for two unrelated decisions: bypassing the CLI-credential precheck AND bypassing the local-CLI semaphore. That was safe for OpenRouter (truly remote) but wrong for the Local LLM shim, whose default endpoint is Ollama on 127.0.0.1: N concurrent reviewers + a doer can thrash VRAM/RAM on consumer hardware. Split into `isHttpDispatchedShim` (kept for cred-precheck bypass) and `bypassesLocalCliSemaphore` (only openrouter). Add `grok-cli` and `local` to CLI_LINEAGES with conservative per-CLI defaults (grok-cli matches gemini at 2; local defaults to 1, bump in /settings if your endpoint multiplexes).

Tests: 845 pass (unchanged), typecheck clean.

* fix: PR review - CodeRabbit pass (docs/Grok level, init+quickstart+local edges, regex, tests)

Addresses CodeRabbit's first batch of review comments on PR #3:

- docs/integrating-a-new-cli.md: contradictory level for Grok: line 3 said "detection-only", line 15 said level 2, line 302 said level 3.
  Normalize to level-3 (the shim ships in this PR) and note that the level-2 orchestrator coexists for the consumer-side wiring.
- src/cli/commands/init.ts: `--connect grok` was rejected because the local Name union, ALL_NAMES list, and the `--connect` option help text omitted 'grok' even though detection labels and OrchestratorName already accepted it. Add 'grok' to all three.
- src/cli/commands/quickstart.ts: the "install one of …" guidance printed when no CLIs are detected still listed only 5; extend to Grok CLI to match the dispatchable set.
- src/daemon/agents/local.ts:
  * Empty `base_url` (e.g. user saved settings with an empty box) was passed through `??` as the URL and surfaced as an opaque fetch error; treat empty / whitespace-only as unset and fall back to DEFAULT_BASE. Strip trailing slashes while at it.
  * Trailing SSE payload was dropped when the server closed without a final blank-line delimiter (older Ollama, some vLLM configs), so the last text_delta could silently disappear, truncating answers. Extract event-dispatch + payload-extract into local helpers and flush the residual buffer after the read loop exits.
- src/lib/cli-detect.ts: grok regex documented "name OR bare-version" but only matched the name. Add the bare-version alternative; the basename guard already prevents cross-vendor matches.
- tests/grok-parser.test.ts: 4 cases narrowed events[0] under `if (events[0].type === 'error')` without a prior `expect(...).toBe` on type, so a non-error event silently skipped the inner assertions. Add explicit type expectations before the narrowing.

Tests: 845 pass (unchanged), typecheck clean.
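The trailing-payload fix described for src/daemon/agents/local.ts comes down to flushing whatever is left in the buffer once the stream ends. A minimal sketch over pre-collected text chunks follows. `collectSsePayloads` is a hypothetical helper, the real shim reads from a fetch body stream, and skipping the `[DONE]` sentinel is an assumption based on the OpenAI streaming format.

```typescript
// Collect "data:" payloads from SSE text chunks; events are separated by a blank line.
function collectSsePayloads(chunks: string[]): string[] {
  const payloads: string[] = [];
  let buffer = "";

  // Turn one raw event into its joined data payload, if it has one.
  const dispatch = (rawEvent: string): void => {
    const data = rawEvent
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice("data:".length).trimStart())
      .join("\n");
    if (data !== "" && data !== "[DONE]") payloads.push(data);
  };

  for (const chunk of chunks) {
    buffer += chunk;
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      dispatch(buffer.slice(0, boundary));
      buffer = buffer.slice(boundary + 2);
      boundary = buffer.indexOf("\n\n");
    }
  }

  // A server that closes without a final blank line leaves the last event
  // in the buffer, so flush it instead of dropping it.
  if (buffer.trim() !== "") dispatch(buffer);
  return payloads;
}
```

Without the final flush, the second payload in a stream like `data: a\n\ndata: b` would be lost, which matches the truncated-answer symptom described above.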
---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg <7xshadowx7@gmail.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* feat: fold upstream contributor stack - repoPath default + CRLF persona parser (#4)

Folds the cross-platform pieces of upstream commit 781bc42 ("Contributor stack: claude orchestrator + repoPath + Windows spawn (#39)") into the fork, intentionally omitting Windows-specific hunks.

Included:
- src/mcp/tools.ts: add safeCwd() helper + default `repoPath` on create_chat to safeCwd() when caller omits it. Previously the daemon fell back to its own cwd (packageRoot), which caused relative file paths in `files: [...]` to silently resolve to the chorus install dir and miss. MCP servers spawned by Claude Code / Codex / Gemini inherit the host's cwd (= the user's project), so safeCwd() lands at the right path automatically. safeCwd() also catches ENOENT from process.cwd() and falls back to homedir.
- src/lib/personas.ts: normalize CRLF → LF in the frontmatter parser so persona .md files checked out with Windows line endings don't fail `missing YAML frontmatter`. Cross-platform safe.
- src/daemon/orchestrators/index.ts: drop stale comment block about Claude having a project-config side-effect (the fork's orchestrator long since moved to user-scope).
- tests/mcp-create-chat-repo-path.test.ts (+4 tests): cover explicit repoPath, cwd default, full-body forwarding, and ENOENT fallback to homedir.
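The CRLF normalisation listed for src/lib/personas.ts can be illustrated with a small frontmatter splitter. `splitFrontmatter`, `ParsedPersona`, and the delimiter regex are assumptions made for this sketch; only the CRLF → LF step and the `missing YAML frontmatter` error text come from the commit message.

```typescript
interface ParsedPersona {
  frontmatter: string; // raw YAML between the --- delimiters
  body: string; // everything after the closing delimiter
}

function splitFrontmatter(source: string): ParsedPersona {
  // Normalise Windows line endings first so the delimiter match only handles LF.
  const text = source.replace(/\r\n/g, "\n");
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(text);
  if (!match) throw new Error("missing YAML frontmatter");
  return { frontmatter: match[1] ?? "", body: match[2] ?? "" };
}
```

Without the `replace` call, a file saved with `\r\n` endings would start with `---\r\n`, fail the `^---\n` match, and be rejected even though its frontmatter is well-formed.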
Omitted (Windows-only hunks):
- src/cli/commands/update.ts (shell: win32 for npm self-update)
- src/daemon/routes/system.ts (shell: win32 for opencode probe)
- src/daemon/orchestrators/{codex,gemini,kimi}.ts (shell: win32 tweaks)
- src/lib/cli-detect.ts (SAFE_WIN_PATH regex + buildVersionSpawn)
- src/lib/voices.ts (discoverNpmPrefixes Windows shell)
- tests/cli-detect.test.ts (Windows-specific cmd.exe escape tests)

Also omitted:
- src/daemon/orchestrators/claude.ts: upstream shells out to `claude mcp add --scope user`. Fork already implements user-scope registration via direct ~/.claude.json patch (more robust: no dependency on `claude` binary in PATH at registration time, plus sweeps stale project-scoped entries). Keeping fork's version.
- tests/claude-orchestrator.test.ts: tests the upstream shell-out approach the fork doesn't use.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address PR #1 bot review batch - sendError migration, abort+cancel, schema CHECKs, audit + orchestrate hardening

Sweep of fixes for CodeRabbit + ChatGPT Codex review on PR #1. Grouped into one commit because the surface is broad but every change is small, review-driven, and verified together (typecheck clean, vitest 849/849, lint 0 errors).

Routes, sendError vs errorResponse (CodeRabbit Critical/Major):
- chats-from-pr.ts catch → sendError(reply, ...) so 5xx errors carry the right HTTP status instead of bare 200 + ok:false body.
- voices.ts GET list / GET :id / POST / PUT / DELETE all migrated; DELETE handler gains the missing reply param.
- Drop now-unused errorResponse imports in both files.

Quickstart abort propagation (CodeRabbit Major):
- pollChat fetch passes the signal so a SIGINT or timeout interrupts the in-flight request instead of waiting for the daemon's response.
- 1500 ms inter-poll sleep wakes on abort instead of always blocking its full duration after the signal fires.
- Timeout path now also POSTs /chats/:id/cancel (extracted shared `cancelRemote` helper), matching the SIGINT handler so timed-out runs don't leave the daemon reviewing in the background. start.ts best-effort openBrowser (CodeRabbit Major): - Both `chorus start` paths catch openBrowser rejection so a failing `open` doesn't fail the whole command when the daemon is already healthy. Matches scheduleAutoOpenBrowser's existing behaviour. Codex headless GitHub transport (CodeRabbit Major): - HeadlessSpawnOptions gains optional `transport` mirroring AgentSpawnOptions. - codex.buildHeadlessArgs flips network_access on for transport === "github", matching buildLaunchCommand. Previously headless GitHub runs couldn't reach github.com or call out via gh. CLI health auth-kind mapping (CodeRabbit Minor): - kindToStatus now maps auth_invalid and auth_missing to "auth_invalid" so Grok auth failures render the right cockpit CTA instead of "unknown". Voice-failure-tracker hasResetAt streak reset (CodeRabbit Major): - When the upstream promises recovery, also clear any prior strike counter. Pre-fix, permanent-fail → resetAt-fail → permanent-fail tripped the threshold on the first permanent strike instead of the second. Schema CHECK constraints (CodeRabbit Major): - schema.sql + connection.ts migrations add CHECKs on bypass_quota (0/1), tier ('low'/'medium'/'high'), and monthly_budget_usd (NULL or >= 0). Guards scheduler inputs at the DB layer for both fresh installs and migrated DBs. MCP createChat dead conditional spread (CodeRabbit Minor): - safeCwd() is the deliberate fallback per upstream contributor PR. Drop the dead `...(parsed.repoPath !== undefined …)` spread that just re-set the same value the unconditional `repoPath` field already sent. 
github-pr.ts ENOENT classifier (ChatGPT Codex Major): - classifyGhFailure now recognises Node's `spawn gh ENOENT` shape so the documented first-run path (paste PR URL before installing gh) returns the actionable gh_not_installed code instead of db_error. Claude orchestrator trailing newline (CodeRabbit Trivial): - registerClaudeMcpServer JSON write gains the trailing "\n" used by connectClaude, keeping ~/.claude.json byte-for-byte stable. runner-multiplex chat-scoped warning persistence (CodeRabbit Major): - cli_warning / cli_error events that arrive without a valid phaseKind (e.g. attached_files_invalid emitted before any phase starts) now skip phaseEvents.create instead of being coerced into a synthetic 'review'/'reviewer' row. The chatLogger path already captured the warning; live subscribers got it from the original onEvent. doer.ts answerFile init guard (CodeRabbit Major): - Wrap the initial fs.writeFileSync(answerFile, "") in try/catch so EACCES/ENOSPC at startup emits a cli_error (kind: answer_init_failed) with a usable CTA instead of bypassing the failure path and leaving the chat dir empty. cli-precheck kimi default_model gate (CodeRabbit Major): - Only enforce ~/.kimi/config.toml default_model when an actual kimi-cli credential file is present. moonshot voices routed via opencode are authed entirely by opencode and never touch ~/.kimi/ — hard-failing them here rejected healthy setups. audit.ts preset-load + id uniqueness (CodeRabbit Major): - loadPresetPrompt now wraps in try/catch and emits phase_failed (reason: preset_load_failed) instead of letting the promise reject after phase_start fires. - AuditItem.id uniqueness is enforced before persisting audit-output.json; duplicates emit phase_failed (reason: invalid_output) since orchestrate selection is id-keyed. orchestrate.ts checkout failure path (CodeRabbit Major): - Capture `git checkout <startingBranch>` result. 
On failure, push a failed manifest entry and emit phase_failed (reason: checkout_failed) instead of silently letting the next worker stack on top of the prior worker's branch and polluting diff stats. new/page.tsx PR flow stale repoPath (CodeRabbit Major): - handleStartFromPr no longer forwards the shared repoPath state. The input is cleared + disabled in reviewOnly mode, but the state can still hold a stale value from a mode switch, so it is never sent. Lint: react/no-unescaped-entities (CodeRabbit Minor): - Three apostrophes in JSX text escaped as HTML entities (page.tsx ×2, run-checklist/index.tsx ×1). 0 errors remaining. orchestrate-manifest URL validation (CodeRabbit Nitpick): - Validate the `PR opened: <url>` href via new URL() and require http/https before rendering as an anchor; fall back to plain text on parse failure or an unexpected scheme. Preset markdown H1 (CodeRabbit Minor, MD041): - architecture-review.md, de-slopify.md, engineering-review.md gain a top-level H1 to satisfy markdown lint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: PR #1 round-2 — Lineage parity, strict body validation, fence escape, Windows paths Addresses 4 MAJOR + 1 acknowledged issue from CodeRabbit's 2026-05-18 review batch: - voices.ts + db/voices.ts: extend Lineage enum with `openrouter`, `local`, `grok` so the route validators stop rejecting legitimate rows already supported by cli-precheck and the shim registry. Mirror in the DB schema (z.enum) and the VoiceUpsertInput union: these are three independent declarations of the same set, and all of them need to track Lineage in agents/types.ts. - chats-from-pr.ts: tighten request-body validation. Truthiness checks let non-string truthy `url`/`templateId` (e.g. `{}` or `42`) slip through and fail deep inside parsePrUrl as opaque server errors instead of clean 400s. Added strict `typeof === "string" && trim().length > 0` plus optional yolo type check. - github-pr.ts: dynamic backtick fence around the diff body. 
Markdown/docs PRs frequently contain literal ``` fences; a fixed-width fence would close early and let the rest of the diff escape into the artifact prose, corrupting the prompt boundary for review-only chats. Now picks a fence one backtick longer than the longest run in the diff (min 3). - new/page.tsx: accept Windows absolute paths (`C:\repo`, `\\server\share`) alongside POSIX. The audit-a-repo tab was unusable on Windows because the UI hard-coded `startsWith("/")`, even though cli-detect / runtime-path / settings-transport already handle win32 server-side. Declined: CodeRabbit nitpick on formatBranchName (orchestrate.ts:124-133). chatId is a server-issued ULID (generateUlid in lib/db/chats.ts), all-alphanumeric by construction, and branchPrefix already has a zod regex guard from commit e93ce00. No real injection vector. Verification: - pnpm exec tsc --noEmit: clean - pnpm exec vitest run tests/voices.test.ts tests/voices-route-validation.test.ts tests/github-pr.test.ts tests/db.test.ts: 99/99 passing Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: chorus-codes <info@chorus.codes> Co-authored-by: Julien Deudon <deudon.j@gmail.com> Co-authored-by: Lumina Mao <luminamao@mac.lan> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-authored-by: Yura <yurahalych@gmail.com> Co-authored-by: Greg <7xshadowx7@gmail.com>
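The dynamic backtick fence from the github-pr.ts item above could look something like this. It is a minimal sketch assuming the rule exactly as stated (one backtick longer than the longest run in the diff, minimum 3); the names `pickFence` and `fenceDiff` and the `diff` info string are hypothetical, not taken from the fork.

```typescript
// Hypothetical helper: choose a fence that cannot be closed by any backtick
// run inside the body. One longer than the longest run, never shorter than 3.
function pickFence(body: string): string {
  const runs: string[] = body.match(/`+/g) ?? [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  return "`".repeat(Math.max(3, longest + 1));
}

// Wrap a diff so embedded Markdown fences stay inside the artifact block.
function fenceDiff(diff: string): string {
  const fence = pickFence(diff);
  return fence + "diff\n" + diff + "\n" + fence;
}
```

A diff that itself contains a three-backtick fence is wrapped in four backticks, so the inner fence can no longer terminate the outer block.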

…iet (#6) * fix: cred detection + Claude MCP user-scope registration Three fixes from chorus-issues.md that prevent a freshly-installed chorus from finding the user's existing CLI credentials, so the daemon starts up cleanly on machines that already have Claude / Kimi / moonshot configured. #1: register Claude MCP at user scope. The chorus MCP entry now writes to the top-level `mcpServers` block in `~/.claude.json` (idempotent), and any stale chorus entry under the project-scoped `projects[homedir].mcpServers` is cleaned up. Previously the project-scoped registration was invisible to Claude Code launched outside that exact cwd. #2: cred-path fallbacks. When the anthropic file check misses (e.g. user authed via Claude Desktop, no `~/.claude/...` JSON), fall back to the macOS Keychain via `security find-generic-password -s "Claude Code-credentials"`. Added `~/.kimi/credentials/kimi-code.json` to the moonshot CRED_PATHS so users who authed through `kimi-code` aren't told to log in again. #3: kimi config-missing precheck. New layer-3 check parses `~/.kimi/config.toml` and surfaces a `config_missing` reason when there's no top-level `default_model` set — the CLI will silently pick whatever backend it likes, which is rarely what the user wants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: reviewer fidelity, verdict surfacing, event/prompt isolation Seven fixes from chorus-issues.md covering the rest of the runner + MCP-surface issues found while reviewing PR #26 of foresight-app. #4: thread `repoPath` through reviewer subprocesses. `runReviewers` → `runReviewer` → `runReviewerHeadless` now accept the chat's repoPath and the reviewer's cwd switches to it when set, so `gh`, file reads, and sandboxed CLIs (Gemini) see the actual code instead of running in an empty per-reviewer scratch dir. #5: surface reviewer answer.md in MCP responses. 
New `readReviewerArtifacts` helper walks `~/.chorus/chats/<id>/round-N/reviewer-*/answer.md`, caps each at 16 KiB, sorts by (round desc, agent asc), and merges the result into `wait_for_chat` and `get_chat_status` payloads under `reviews`. Both the doer and reviewer `participant_done` events now carry `outputPath` so MCP clients can read the on-disk source of truth when they need more than the streamed tail. #6: bump phase_progress output tail from 500 B to 8 KiB. The 500-byte slice clipped reviewer summaries mid-word; full text remains on disk and is pointed to by `outputPath`. Affects both reviewer.ts and doer.ts. #7: tri-review verdict on `max_rounds_exhausted`. When the doer succeeded every round but reviewers kept saying request_changes through the round cap, chat_done now emits `status: completed, verdict: request_changes, reason: max_rounds_exhausted` with the last round's reviewer summary — previously misclassified as a generic doer failure. #8: refactor `CreateChatSchema` and `InvokePersonaSchema` to plain `z.object()` with per-field `.describe()`. The prior `.transform()` wrapped them in `ZodEffects` which strips the `properties` map from MCP introspection — clients saw an empty schema. Legacy `template` alias and the `code-review` default moved into a new `resolveTemplateId()` helper. #9: dedup `participant_done` at the multiplex layer. Same-slot fallbacks or parsers that emit `message_done` twice (the opencode parser historically does this) used to fan duplicate terminal events out to every subscriber; now keyed by `(phaseIdx, round, role, agent)` and later duplicates drop silently. #10: per-instance reviewer prompt isolation. Same-lineage instances (claude-code-2/4/5, etc.) share the chat dir tree at `~/.chorus/chats/<id>/round-N/reviewer-*/`; tool-using CLIs were wandering into a sibling's answer.md mid-flight and short-circuiting ("the review is complete" referring to a different agent's work). 
`buildReviewerAsk` now stamps an Independence directive when more than one reviewer slot exists, naming the slot tag and forbidding cross-slot reads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: replay chat_done from persisted verdict, not status The synthetic chat_done emitted when a terminal chat is re-attached derived `verdict` from `chat.status`, ignoring the `chat.verdict` column. Since the previous commit shipped the `max_rounds_exhausted` branch (chorus-issues.md #7), a chat can finish with `status='approved' verdict='request_changes'` — replay was clobbering that to `approved` on every page reload, hiding reviewer disagreement from the user. Use the persisted column when set; fall back to the old status-derived value only for pre-v0.8.27 rows where verdict is null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: surface dropped attached_files + SSE backpressure; harden ship.ts Three audit follow-ups on the daemon side, all surfacing previously silent failures. attached_files: parseAttachedFiles in runner-multiplex.ts used to swallow JSON parse errors and run the chat with no attachments. Refactor to a tagged result (`empty` / `ok` / `invalid`); on `invalid` the runner logs and emits a `cli_warning` SSE so the cockpit + MCP clients see which chat lost its file list. SSE backpressure: when a subscriber's queue exceeds the 1000-line cap the multiplex used to silently drop the connection. Now writes one `error` frame with code `sse_backpressure` before close, and logs the queue length to daemon.log so an operator tailing logs can see when clients fall behind. gh pr create URL validation: ship.ts captured stdout's last line as the PR URL with no shape check; an empty/malformed stdout produced `{ok: true, prUrl: ''}` and the chat row recorded "shipped" with an unclickable link. Now matches against `^https://github.com/<owner>/<repo>/pull/<n>` before declaring success. 
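The `gh pr create` URL check described above might be sketched as below. Everything named here (`parsePrCreateStdout`, `ShipResult`, the `pr_url_missing` reason) is hypothetical, and taking the last non-empty line is an assumption; the commit only states that the last stdout line is matched against `^https://github.com/<owner>/<repo>/pull/<n>`.

```typescript
// Hypothetical result shape; the real ship.ts types are not shown in the source.
type ShipResult = { ok: true; prUrl: string } | { ok: false; reason: string };

// Anchored at the start, mirroring ^https://github.com/<owner>/<repo>/pull/<n>.
const PR_URL = /^https:\/\/github\.com\/[^/\s]+\/[^/\s]+\/pull\/\d+/;

function parsePrCreateStdout(stdout: string): ShipResult {
  // Assumption: ignore blank trailing lines and inspect the last non-empty one.
  const lines = stdout
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const last = lines[lines.length - 1] ?? "";
  if (!PR_URL.test(last)) {
    // Empty or malformed output must not be recorded as a successful ship.
    return { ok: false, reason: "pr_url_missing" };
  }
  return { ok: true, prUrl: last };
}
```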
detectGitContext parallelization: the five spawnSync probes (is-repo, remote, gh --version, gh auth, HEAD) ran sequentially at 60s each — worst case 360s before runner saw a result. Converted to async with a new `runAsync` helper, batched via Promise.all with a 15s per-probe cap; detectDefaultBranch's symref + three branch-existence checks likewise parallelized. detectGitContext is now async; the lone caller in runner.ts awaits it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: bound failure-summary regex; log malformed SSE frames participant-card.tsx: parseFailureSummary ran the multi-step regex chain over the full participant.answer string. Reviewer answers can be up to 256 KB; on every render that's a UI-thread block. Slice to the first 16 KiB before scanning — the failure-header block is always written at the top of answer.md by reviewer.ts/doer.ts, so the cap never loses signal. live-run-real/index.tsx: the SSE onmessage handler already had a try/catch around JSON.parse, but the catch was silent — a wire-format mismatch dropped events with no trace. Add a console.warn with a preview so devs notice schema drift in DevTools. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: github PR ingestion via gh CLI Adds src/daemon/github-pr.ts: parsePrUrl + fetchPrArtifact run gh pr view/diff plus review and issue comments in parallel, synthesize a Markdown artifact (description, comments capped at 50 newest each, diff capped at 200 KB UTF-8 safe), and classify gh failures into typed reasons. Exports runAsync from ship.ts so the new module can reuse the existing spawn+timeout helper instead of duplicating it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: extract createChatFromValidatedInputs helper Pulls the template lookup, artifact validation, chat row + opening phase event creation, and runner kickoff out of the POST /chats handler into a reusable helper. 
POST /chats now only handles its route-specific concerns (body shape, repoPath canonicalization, error response shaping). Sets up reuse from the upcoming POST /chats/from-pr endpoint without duplicating ~150 lines of validation logic. No behavior change — same template checks, same artifact rules, same kickoff path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: POST /chats/from-pr — start a chat from a GitHub PR URL Accepts { url, templateId, repoPath?, yolo? }, parses the PR URL, fetches PR meta + diff + existing comments via gh CLI, synthesizes a Markdown artifact, and creates the chat through the shared createChatFromValidatedInputs helper. gh failures map to typed reasons (invalid_url, gh_not_installed, gh_not_authed, pr_not_found, network_failure, unknown) so the cockpit can render actionable errors instead of generic 500s. Adds tests/github-pr.test.ts covering parsePrUrl edge cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: cockpit "GitHub PR" tab on /new Adds a Free-form / GitHub PR mode toggle on the new-chat page. PR mode swaps the prompt textarea for a URL input and routes through the new POST /chats/from-pr endpoint. Validates client-side that the chosen template is review-only before letting the user submit. createChatFromPr API client surfaces the daemon's typed PR meta (owner/repo/number/title/branches) on the response so callers can display PR context after the chat is created. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: review_pr MCP tool Exposes POST /chats/from-pr through MCP. Orchestrators (Claude Code, Codex, Cursor) can now hand chorus a PR URL and get reviewers running against it without going through the cockpit. Defaults templateId to review-only so a caller can pass just a URL. ReviewPrSchema is a plain z.object (not ZodEffects) so MCP clients can introspect required fields — same hazard documented on CreateChatSchema and InvokePersonaSchema. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: capture multi-identity CLI follow-up idea Idea note for running chorus against multiple paid accounts on the same CLI binary (work + personal Claude Code Max, etc.). Filed as follow-up after audit-presets + quota tiers ship — captures the env-override mechanism, proposed Identity primitive, and open questions on keychain CLIs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: schema for audit + orchestrate phases, voice tier, bypass_quota Adds the foundation for repo-pointed audit-and-orchestrate runs and the orchestrator's task↔voice tier matching. Template schema: - AuditPhase (kind: 'audit') — single reviewer voice + one of five preset lenses (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review). Output schema (AuditItemSchema, AuditOutputSchema) lives next to the phase shape so the structured-output adapter, scheduler, and cockpit checklist agree on the contract. - OrchestratePhase (kind: 'orchestrate') — array of worker voices, default branchPrefix `chorus/{chatId}/worker-{idx}` so each worker gets isolated git state. - templateRequiresRepo() helper for the cockpit's repo-picker gate. Voices: - Adds tier ('high' | 'medium' | 'low', default 'medium') and monthly_budget_usd (nullable) to the row schema, upsert input, and update input. Idempotent migrations on existing DBs. Chats: - bypass_quota INTEGER NOT NULL DEFAULT 0 — set on PR-review chats so the orchestrate scheduler runs every enabled voice at full capacity instead of tier-gating. Runner is stubbed for the new kinds: phase_done emit + continue, so templates that declare an audit/orchestrate phase before the runner logic lands don't crash. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: structured-output adapter for CLI voices Wraps an AgentShim's runHeadless with JSON-formatting prompt scaffold and a one-shot repair loop, returning typed data validated against a caller-supplied zod schema. Used by the upcoming audit phase (which needs typed AuditItem[] instead of free-form prose) and the orchestrate phase (worker results). Keeps each CLI lineage's existing headless transport — the adapter just owns the prompt-shape + parse-and-validate dance. Extraction strategy: prefer direct JSON.parse of finalText; fall back through fenced-block regex variants to a brace-to-brace slice. On parse or schema-violation, retry once with a repair prompt that quotes the validation error. Spawn errors short-circuit (the model never saw the prompt — repair would just retry the same failure). Tests cover happy path, fenced-block extraction, repair-loop success, repair-loop exhaustion, schema violation, and spawn error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cockpit): audit-a-repo tab + checklist approval component /new gets a third tab beside Free-form and GitHub PR. In audit mode the user picks one of five preset lenses (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review) and supplies an absolute repo path. Submit fires createChat with templateId=`audit-<preset>` — those built-in templates land with the audit-phase implementation. RunChecklist component lives at src/components/run-checklist/. It takes the AuditItem[] surfaced by the audit phase's blocking event and renders one row per item with a checkbox, complexity badge, rationale, and file list. Default state has every item selected; the user trims, then submits via the parent's onSubmit which JSON-encodes the selected ids into the existing /chats/:id/resume `answer` field. Wiring into the live-run UI lands with the audit phase. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: PR-review chats bypass quota + tier surface on /voices PR-review chats automatically set bypass_quota=true so the orchestrate scheduler ignores voice.tier and runs the full fleet at maximum capacity — reviews are short, parallel, and the user wants the strongest opinion possible regardless of model tier. PUT /voices/:id now accepts tier ('high' | 'medium' | 'low') and monthly_budget_usd (non-negative or null), so the cockpit fleet page can label voices by capability for the orchestrate scheduler to route work against. Tests cover both new fields plus a chat round-trip asserting bypass_quota defaults false and persists when set. * feat: audit phase + 5 presets + audit-* templates Wires the audit phase end-to-end: - src/daemon/phases/audit.ts runs the structured-output adapter against the chosen preset, persists the parsed AuditItem[] to <chatDir>/audit-output.json plus raw model output to round-1/audit/output.md, and emits phase_progress with the items. - src/daemon/runner.ts replaces the audit/orchestrate stub: audit invokes runAuditPhase, flips chat status to blocked so the cockpit renders the checklist UI, and exits cleanly. Orchestrate keeps the no-op stub until step 5 lands. - 5 preset prompts (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review) frame what each lens looks for. The structured-output adapter handles JSON formatting; presets describe the audit lens only. - 5 audit-* templates (one per preset), each a 2-phase audit -> orchestrate shape with three default workers. Auto-loaded by seedBuiltinTemplates. - tests/audit-phase.test.ts covers preset-file presence and the audit-* template parse + shape contract. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: orchestrate phase + audit-resume wiring + tier-aware scheduler Wires the audit→orchestrate handoff: the cockpit POSTs the user's trimmed audit checklist to /chats/:id/resume, the resume handler cross-checks ids against audit-output.json, persists the selection, flips chat to drafting on the orchestrate phase, and re-fires the runner. The runner now starts at chat.current_phase_idx so a resumed chat lands directly on orchestrate. The new orchestrate phase walks the approved AuditItem[] sequentially (parallelism is an explicit non-goal for v1), picks a worker per item via the pure tier-aware scheduler, cuts a per-item branch, dispatches the worker via shim.runHeadless, captures git diff --stat, and persists orchestrate-manifest.json for the diff-apply UI to consume. The scheduler is a pure function with 9 unit tests covering tier matching, bypass override, disabled-voice skipping, empty pool, and unknown voice ids. Resume route has 10 tests exercising body validation, id cross-check, status gating, and the happy path. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: orchestrate manifest UI + checkout/open-pr daemon routes - Run page reads audit-output.json + orchestrate-manifest.json on render - LiveRunReal renders RunChecklist while blocked w/ audit items, then swaps to OrchestrateManifest panel once orchestrate completes - New OrchestrateManifest component shows one row per worker w/ Checkout / Open PR buttons (per-row inline feedback, no global toast) - Daemon: GET /chats/:id/audit-items, GET /chats/:id/orchestrate-manifest, POST /chats/:id/workers/:idx/checkout (refuses on dirty tree), POST /chats/:id/workers/:idx/open-pr (gh pr create, bucketed failures) - OrchestrateManifestSchema added to template-schema.ts; route + UI parse via the same shape Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: harden resume race + branch validation + symlink TOCTOU + extractJson Address /freview findings on the audit + orchestrate flow: - Resume race (BLOCKER): two concurrent POSTs to /chats/:id/resume could both pass the `status=='blocked'` check and double-fire the runner. Guard with `getActiveRun` (catches the audit-finishing window before `.finally` clears the registry) and replace the status flip with an atomic `tryResumeFromBlocked` CAS conditional on `WHERE status = 'blocked'`. - Branch-name argument injection (BLOCKER): tighten zod regexes on `OrchestratePhase.branchPrefix` and `OrchestrateManifestEntry.branch` so values starting with `-` (or containing shell metachars) cannot flow into `git checkout` / `gh pr create` as flags. - Symlink TOCTOU on checkout + open-pr (NON-BLOCKER): re-realpath `existing.repo_path` before passing to execFile cwd, mirroring the rerun-path pattern. Returns a structured validation error if the path no longer resolves. 
- extractJson Path 4 (NON-BLOCKER): try `{...}` and `[...]` slices independently and prefer the longer parse, so prose like "mentions [stuff] before {object}" extracts the object instead of the bracket. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: prod CJS build — drop import.meta + copy presets to dist Two issues blocked `pnpm build:server`: - `audit.ts` used `import.meta.url` for module-relative path resolution, but the server tsconfig compiles to CJS where `import.meta` is a syntax error. Replaced with `__dirname`, which works in both the compiled dist (native CJS) and tsx-driven dev (tsx ≥4 shims it in ESM mode). - The `build:server` script copied `schema.sql` to dist/ but missed the preset markdown files in `src/daemon/presets/`. The audit phase's `loadPresetPrompt` resolves relative to `__dirname`, so a published install was hitting ENOENT on every audit run. Extended the copy step to mirror the preset directory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: fold upstream T1+T2 fixes back into fork (12 commits) (#2) * feat(cli): add diagnose command + crash-hook Bundles two upstream changes that ship a self-service triage path for chorus users hitting opaque failures: - `chorus diagnose` walks the install, daemon, recent failed chats, voice health, and produces a sharable bug report. - Crash hook captures uncaught exceptions in the CLI and writes them to a crash log alongside instructions to attach during a bug report. Folded back from upstream chorus-codes/chorus: 7ea712b feat: chorus diagnose command + crash hook for bug reports (#1) 4a5ea20 fix(diagnose): realpath bin path + filter Next.js SSE noise (#4) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(cli): add quickstart self-test command `chorus quickstart` runs a 30-second activation flow that verifies the daemon comes up, the SQLite DB initializes, and a minimal chat round-trips end-to-end. 
Aimed at first-run users who want to know "is this thing actually working" before authoring a template. Folded back from upstream chorus-codes/chorus: 56610cf feat(cli): chorus quickstart — 30-second activation self-test (#30) Co-Authored-By: chorus-codes <info@chorus.codes> * fix(cli): use dynamic import for open package (Node 22 ERR_REQUIRE_ESM) The `open` package and `chokidar` are both ESM-only as of recent versions. On Node 22 (the daily-driver target) static `require()` calls into them throw ERR_REQUIRE_ESM and crash the CLI at boot. Switch to dynamic import in: - src/cli/commands/start.ts (open browser after boot) - src/cli/open-browser.ts (new helper) - src/cli/index.ts (route open import) - src/daemon/output-watcher.ts (chokidar file watch) Includes upstream's post-merge hardening: the setTimeout that triggers the browser-open no longer wraps an async callback bare, so a missing default browser doesn't surface as an unhandled rejection. Folded back from upstream chorus-codes/chorus: e8ca2ee fix(cli): dynamic import for open package (#14) dcd1837 fix: post-merge hardening for #14 (start.ts portion only; cli-precheck.test.ts portion ships with the Keychain fix) Co-Authored-By: Julien Deudon <deudon.j@gmail.com> Co-Authored-By: chorus-codes <info@chorus.codes> * feat(cockpit): seed empty round-1 so QUEUED renders from t=0 Before: when a chat starts but no reviewer has produced an event yet, enrichRounds returned an empty rounds array and the live-run page showed nothing for several seconds — the user couldn't tell whether their chat had launched. After: seed a synthetic round-1 with QUEUED placeholders for every expected participant so the page renders the per-reviewer cards immediately. Real events overwrite placeholders as they arrive. 
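A rough sketch of the round-1 seeding idea, assuming much simpler shapes than the cockpit actually uses. The `Participant` and `Round` types, the state names other than QUEUED, and the `seedRounds` function are all hypothetical stand-ins for whatever enrichRounds really works with.

```typescript
// Hypothetical, simplified shapes for illustration only.
type ParticipantState = "QUEUED" | "RUNNING" | "DONE" | "FAILED";

interface Participant {
  slot: string;
  state: ParticipantState;
}

interface Round {
  round: number;
  participants: Participant[];
}

// When no events have arrived yet, return a synthetic round 1 with one QUEUED
// placeholder per expected participant. Otherwise leave real rounds untouched.
function seedRounds(rounds: Round[], expectedSlots: string[]): Round[] {
  if (rounds.length > 0) return rounds;
  return [
    {
      round: 1,
      participants: expectedSlots.map(
        (slot): Participant => ({ slot, state: "QUEUED" }),
      ),
    },
  ];
}
```

Since the placeholder round is only produced while the real list is empty, the first real event replaces it rather than competing with it.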
Folded back from upstream chorus-codes/chorus: 53e8fb6 feat(cockpit): seed empty round-1 so QUEUED placeholders render from t=0 (#2) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(daemon): runtime fallback-collision dedup across reviewer slots When two reviewer slots both fall through their per-slot chains to the same template-level fallback target (common case: every slot ends in anthropic/claude-sonnet-4-6), both used to dispatch the same (lineage, model) in parallel. That wasted cost and collapsed the lineage diversity that is the point of multi-LLM peer review. Build-time dedup (template-fallback.ts) couldn't catch it because each slot only knows about other slots' PRIMARIES, not their fallback chains. Fix: new per-chat/per-round (lineage, model) registry. reviewer-driver calls tryClaim before each chain attempt and releases the claim in a finally block. On collision, return null + emit cli_warning(reason='fallback_collision') so runWithChainFallback advances to the next entry and the cockpit can show why the slot skipped. Ported into fork's reviewer-driver.ts surgically so the verdict-isolation refactor (2a2cde2) and per-slot repoPath threading stay intact. Folded back from upstream chorus-codes/chorus: c4751fe feat(daemon): runtime fallback-collision dedup (#3) Co-Authored-By: chorus-codes <info@chorus.codes> * fix(daemon): write REVIEWER FAILED summary on pre-spawn failure Before: when a reviewer's precheck failed (e.g. underlying CLI not installed) or the chat was cancelled while the slot was queued for a CLI semaphore slot, runReviewer returned null silently, leaving NO on-disk participant directory. The cockpit's enrich-rounds loop then couldn't reconcile the synthesised template slot against any real participant, so the card sat at "Queued — waiting for an open slot." forever and the actual error was invisible. Reproduction: install chorus on a host with only one CLI on PATH (e.g. 
just claude-code), open a template that includes lineages requiring codex/gemini/kimi, fire it. Every reviewer card stayed "Queued" — chat never visibly progressed even though it was already done failing. Fix: - Create the reviewer dir BEFORE the precheck runs. - Add a writePreSpawnFailure helper that writes a `## REVIEWER FAILED` summary in the canonical format (Kind / Lineage / Model / message) that the cockpit's `parseFailureSummary` already understands. - Wire it into the precheck-failed and cancelled-while-queued paths. Card now transitions out of pending and shows the actual error (cli_missing, cancelled, ...). Folded back from upstream chorus-codes/chorus: afc59cc fix(daemon): REVIEWER FAILED summary on pre-spawn failure (#26) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(voices): auto-disable on persistent quota_exhausted + lsof timeout Real pain (upstream #11): a Pro Gemini model on a Flash-only account fails every chorus run with "exhausted your capacity on this model" — but Gemini doesn't return a resetAt because the model isn't going to become available for that account. Without auto-disable, the runner keeps picking the dead voice on every chat and the user keeps seeing the same opaque error. Voice auto-disable: - New src/lib/voice-failure-tracker.ts records per-voice consecutive quota_exhausted strikes in a settings counter. - Trigger: 2 consecutive strikes WITH no resetAt → set voices.enabled=false + disabled_reason='auto_quota'. - Counter resets on participant_done success; rate-limit strikes (hasResetAt=true) bypass the counter entirely so a transient 429 + a later permanent failure can't trip the threshold on the first permanent strike. - Wired into reviewer-driver alongside recordHealth; emits a cli_warning(reason='voice_auto_disabled') so the cockpit can show a one-line explanation. - VoiceDisabledReason union gains 'auto_quota' (schema column was already TEXT — no migration). 
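The strike-counting rule in the voice auto-disable list above can be sketched as follows. This is a simplified model under stated assumptions: the class name and method names are hypothetical, and the counter lives in an in-memory Map here, whereas the commit describes it as a persisted settings counter.

```typescript
// Threshold taken from the commit text: 2 consecutive strikes with no resetAt.
const STRIKE_THRESHOLD = 2;

// Hypothetical in-memory model of the per-voice strike counter.
class VoiceFailureTracker {
  private strikes = new Map<string, number>();

  // Records one quota_exhausted failure. Returns true when the voice has hit
  // the threshold and should be auto-disabled (disabled_reason='auto_quota').
  recordQuotaExhausted(voiceId: string, hasResetAt: boolean): boolean {
    // A failure that carries a resetAt is a transient rate limit; as described
    // in this commit, it bypasses the counter entirely.
    if (hasResetAt) return false;
    const next = (this.strikes.get(voiceId) ?? 0) + 1;
    this.strikes.set(voiceId, next);
    return next >= STRIKE_THRESHOLD;
  }

  // A successful participant_done clears the streak.
  recordSuccess(voiceId: string): void {
    this.strikes.delete(voiceId);
  }
}
```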
Lsof timeout (upstream #12): - findPidsOnPort and findPidsOnPortWithSudo now bound execSync / execFileSync to 3s, so a slow-but-functional lsof on a loaded macOS box doesn't hang chorus boot. 3s leaves headroom while still bounding the hang case. Ported into fork's reviewer-driver.ts tmux pollHandle + success path. voices.ts disabled_reason union extended alongside fork's voice-tier column. Folded back from upstream chorus-codes/chorus: 4f6becc v0.8.30 — voice auto-disable (#11) + lsof timeout (#12) (#17) Co-Authored-By: chorus-codes <info@chorus.codes> Co-Authored-By: Lumina Mao <luminamao@mac.lan> * fix(daemon, schema): codex isolation + template-schema validation Two issues caused chats to fail opaquely at run-start: CODEX ISOLATION (#10, #16) The user's ~/.codex/config.toml may declare MCP servers, plugins, or notification hooks. In headless `codex exec` those integrations have caused codex to hang or cancel mid-call — two independent reproductions: codex as our reviewer (#10) and codex as MCP client of chorus (#16). Add --ignore-user-config to every headless codex argv. Extracted to a pure `buildHeadlessArgs(opts)` so the argv shape is unit-testable. TEMPLATE VALIDATION (#15) `reviewer.require > candidates.length` used to surface as "Job moves immediately to failure upon Start press" — the runner queued, failed to grant enough slots, and emitted an opaque chat-failure. Same for `require > distinct lineages` when crossLineage:true. Both now caught at TemplateSchema.parse() time with a clear error message the user can fix before the run starts. ReviewerSchema.superRefine() additions slot in cleanly alongside the fork's audit/orchestrate phase schema work — both are additive constraints on the same ReviewerSchema object. 
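The two template-validation checks can be illustrated with a plain function. This is a sketch under assumptions: the commit implements them inside ReviewerSchema.superRefine(), whereas this version avoids the Zod dependency; the `{ lineage, model }` candidate shape, the function name, and the message wording are hypothetical, while `require`, `candidates`, and `crossLineage` are the field names from the commit message.

```typescript
// Assumed candidate shape; only the field names on ReviewerRule come from the commit.
interface Candidate {
  lineage: string;
  model: string;
}

interface ReviewerRule {
  require: number;
  crossLineage?: boolean;
  candidates: Candidate[];
}

// Returns a list of human-readable problems; an empty list means the rule is satisfiable.
function validateReviewerRule(rule: ReviewerRule): string[] {
  const issues: string[] = [];
  if (rule.require > rule.candidates.length) {
    issues.push(
      `reviewer.require (${rule.require}) exceeds candidates.length (${rule.candidates.length})`,
    );
  }
  if (rule.crossLineage === true) {
    const distinct = new Set(rule.candidates.map((c) => c.lineage)).size;
    if (rule.require > distinct) {
      issues.push(
        `reviewer.require (${rule.require}) exceeds distinct lineages (${distinct}) with crossLineage: true`,
      );
    }
  }
  return issues;
}
```

Running the check at parse time means the user sees a fixable message before the run starts, rather than an opaque chat failure after the runner has already queued.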
Folded back from upstream chorus-codes/chorus: 8ed970b fix(daemon, schema): codex isolation + template validation Co-Authored-By: chorus-codes <info@chorus.codes> * fix(runner): honour iterate.onDisagreement accept-doer/escalate The template schema, cockpit dialog, and SPEC-D-templates have always exposed three values for iterate.onDisagreement — 'continue', 'escalate', 'accept-doer' — but the runner only honoured 'continue'. Picking the other two from the cockpit form was a silent no-op: chats fell through to phase_failed with 'doer_failed_all_rounds' regardless. This wires both new branches into the round loop and the terminal chat_done emission: - 'accept-doer': after maxRounds without consensus, mark doerSucceeded and continue. The chat carries on (subsequent phases, ship, approval) as if reviewers had agreed on the doer's last answer. - 'escalate': halt with status='failed' but verdict='request_changes' and error='escalated_on_disagreement', so cockpits can render "reviewers disagreed, needs human" distinctly from "doer broke." Policy table extracted into a pure decidePhaseOutcome() helper so the 3 × 2 input matrix (policy × disagreement-in-last-round) is unit-tested without standing up the full runChat scaffold. Gated on disagreementInLastRound (reset at top of every round + on doer-crash path) so a partial / empty doer answer can never be silently "accept-doer"'d as final. Preserves the fork's existing standardPhaseRoundsExhausted #7 surfacing for the 'continue' path; the 'escalate' path takes precedence with its own distinct chat_done. Upstream PRs #49, #50 (commit 67572e9). Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(cli-precheck): cover macOS Keychain fallback for Claude Code v2 The fork already implements the Keychain fallback in cli-precheck (hasDarwinKeychainEntry). 
This adds the missing test coverage:
- passes when no cred file but keychain entry exists
- blocks when no cred file and no keychain entry
- skips keychain check when cred file exists (fast-path preserved)
- does not consult keychain for non-anthropic lineages

vi.mock('node:child_process') uses the importOriginal spread pattern so spawn / exec / etc. keep their real implementations; a bare module replacement would silently break any sibling test that imports from child_process.

Upstream PRs #7, #8, plus the dcd1837 test-mock hardening.

Co-Authored-By: Yura <yurahalych@gmail.com>
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cockpit): derive candidatesWithModels from snapshot's candidates field

Daemon-side TemplateSchema only carries `candidates` on each ReviewerRule. The cockpit Template type expects `candidatesWithModels` populated: enrich-rounds iterates that field to build slot→model mappings for run-page cards. When fromRow parsed template_snapshot and cast it to Template, the cast was a TypeScript lie: at runtime the parsed object lacked candidatesWithModels, enrichRounds iterated zero reviewer slots, and no model name reached the cards (badge appeared empty).

Derive candidatesWithModels at the parse seam (chats.fromRow) so the cockpit's Template contract is honoured regardless of which path produced the data. Idempotent: if a future daemon ever serialises the field directly, that wins. Persona forwarded if present. Audit-phase single-voice reviewers (no candidates array) are skipped via a runtime narrow.

Upstream PR #6 (chorus-codes/chorus@ac0c7fd).
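A rough sketch of the derivation at the parse seam, under stated assumptions: the `{ lineage, model, persona? }` candidate shape, the `ReviewerLike` type, and the function names are hypothetical, since the commit message does not spell out the exact types. What it does illustrate is the three behaviours described above: an already-serialised field wins, reviewers without a candidates array are skipped, and persona is forwarded when present.

```typescript
// Assumed shapes; the real cockpit Template types may differ.
interface CandidateWithModel {
  lineage: string;
  model: string;
  persona?: string;
}

interface ReviewerLike {
  candidates?: unknown;
  candidatesWithModels?: CandidateWithModel[];
}

// Runtime narrow: accept only objects carrying string lineage + model.
function isCandidate(value: unknown): value is CandidateWithModel {
  if (typeof value !== "object" || value === null) return false;
  const rec = value as Record<string, unknown>;
  return typeof rec["lineage"] === "string" && typeof rec["model"] === "string";
}

function deriveCandidatesWithModels(reviewer: ReviewerLike): ReviewerLike {
  // Idempotent: a field the daemon already serialised takes precedence.
  if (Array.isArray(reviewer.candidatesWithModels)) return reviewer;
  // Audit-phase single-voice reviewers have no candidates array: leave untouched.
  if (!Array.isArray(reviewer.candidates)) return reviewer;
  const derived = reviewer.candidates.filter(isCandidate).map((c) => {
    const entry: CandidateWithModel = { lineage: c.lineage, model: c.model };
    if (typeof c.persona === "string") entry.persona = c.persona; // forward persona if present
    return entry;
  });
  return { ...reviewer, candidatesWithModels: derived };
}
```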
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(diagnose): capture failure context — CLI smoke, voice health, recent failed chats Extends `chorus diagnose` with three signals that triage the most common breakage modes: - **CLI smoke**: spawn `<bin> --version` per detected CLI with a hard 2s SIGKILL timeout (wrapper scripts may trap SIGTERM). Distinguishes `timedOut` from non-zero exit so the report can tell hangs apart from crashes. - **Voice health**: counts `enabled=0` voices grouped by `disabled_reason` ('user' vs 'auto_missing' vs 'quota_exhausted'). Added `idx_voices_enabled` so the `WHERE enabled = 0` scan stays cheap as the table grows. - **Recent failed chats**: last 5 chats with `status='blocked'` plus the errored participants pulled from `~/.chorus/chats/<id>/round-*/<part>/_attempts.jsonl`. Only `errorMessageBytes` is exposed — raw error text never leaves the user's machine. `$HOME` is redacted from any embedded path strings via `redactHomePaths`. Adapted from upstream chorus-codes/chorus#19 (0666dca). Preserves the fork's existing diagnose shape and adds tests for smokeOneCli / readLatestAttempt / formatReport rendering of the three new sections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(diagnose): include no_review in recent failed chats query The recent-failed-chats section was meant to surface per-participant failure context from `_attempts.jsonl`, but the WHERE clause only covered 'failed', 'blocked', 'cancelled'. The most common failure shape — every reviewer down for missing CLI / auth / quota — ends the chat in 'no_review', which was being silently filtered out. So the exact case the section exists to diagnose returned an empty list, forcing users back into manual log collection. 
Adds 'no_review' to the IN-list and a regression test that asserts both the status and a quota_exhausted errorKind render in the report. Addresses chatgpt-codex review P2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: chorus-codes <info@chorus.codes> Co-authored-by: Julien Deudon <deudon.j@gmail.com> Co-authored-by: Lumina Mao <luminamao@mac.lan> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Yura <yurahalych@gmail.com> * feat: fold upstream Grok + Local LLM + Keychain dual-probe (4 commits) (#3) * feat(grok): detect Grok Build (xAI) + Level 1 orchestrator Adds Grok Build CLI to detection, onboarding picker, /connect card, diagnose smoke, init listing, and doctor labels. Grok auto-picks chorus MCP from ~/.claude.json (verified empirically via `grok inspect`) — no separate MCP wire needed. The grok orchestrator reports connected=true when both the binary is detected AND chorus is wired in ~/.claude.json (either top-level mcpServers or any project-scoped mcpServers entry). connect() is a no-op that points users at `chorus connect claude` if claude hasn't been wired yet. Quickstart filters CLIs to those with shims, so grok-cli being detected first no longer breaks the doer-pick flow. The cliToLineage map remains the source of truth for reviewer-capable CLIs. `docs/integrating-a-new-cli.md` captures the full Level 1/2/3 integration playbook for future CLIs — written while doing this so the steps are tested. Adapted from upstream chorus-codes/chorus#44 (6a00b00). No conflicts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(local): add Local LLM HTTP shim for OpenAI-compatible endpoints Adds a `local` lineage that dispatches chat completions to any OpenAI-compatible HTTP endpoint (Ollama, llama-swap, LM Studio, vLLM, or anything that speaks `/v1/chat/completions`). 
No external subscription or CLI binary required — only a running local inference server. Configuration: save a JSON secret under key `local` via Settings → Local LLM: {"base_url": "http://127.0.0.1:11434/v1", "api_key": ""} Model ids may use a `local:` prefix (e.g. `local:llama3`) which the shim strips before dispatch, or bare model names directly. When no secret is saved, falls back to Ollama's default port. Wiring sweep (extends every exhaustive enum / Record so templates can declare local voices without Zod errors): - src/daemon/agents/local.ts — new HTTP shim with JSON.parse guard on the secret (yields a typed `config_parse` error event for malformed secrets instead of throwing inside the generator) - src/daemon/agents/index.ts — register localShim, `local:` prefix routing in pickShimForVoice, add to isHttpDispatchedShim - src/daemon/agents/types.ts — `local` in Lineage - src/lib/template-schema.ts — `local` in both lineageEnum + reviewerLineageEnum - src/lib/cli-health.ts — `local` in CliLineage + ALL_LINEAGES - src/lib/cli-precheck.ts — empty CRED_PATHS, LOGIN_HINT, skip the file probe (same pattern as openrouter — auth lives in secrets table) - src/lib/cockpit-types.ts — `local` in ReviewerLineage - src/lib/lineage-maps.ts — `local` in DaemonLineage, UILineage, every label/dot/brand map; UI_LINEAGE_DEFAULT_MODEL[local] = "" (model IDs are endpoint-specific). Teal dot distinguishes local from openrouter's cyan - src/components/phase-editor/constants.ts — LINEAGES list, DAEMON_TO_COCKPIT_LINEAGE - src/components/template-dialog/constants.ts — COCKPIT_TO_DAEMON, DAEMON_TO_COCKPIT, DAEMON_DEFAULT_MODEL, FALLBACK_LINEAGES Adapted from upstream chorus-codes/chorus#41 (716fa3a). The bundled upstream commit also included Keychain dual-probe (#38) and fallback-registry hold-on-success (#42) — those land in follow-up commits in this PR so each concern is reviewable independently. 
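The secret handling and `local:` prefix routing described for the Local LLM shim could look roughly like this. It is a sketch, not the contents of src/daemon/agents/local.ts: the function names and result type are hypothetical, and DEFAULT_BASE is assumed to be the Ollama address shown in the example secret.

```typescript
// Assumed default: the Ollama address from the example secret in the commit message.
const DEFAULT_BASE = "http://127.0.0.1:11434/v1";

type LocalConfigResult =
  | { ok: true; baseUrl: string; apiKey: string }
  | { ok: false; error: "config_parse" };

// Parse the saved `local` secret; malformed JSON becomes a typed error instead of a throw.
function resolveLocalConfig(secret: string | null): LocalConfigResult {
  if (secret === null) return { ok: true, baseUrl: DEFAULT_BASE, apiKey: "" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(secret);
  } catch {
    return { ok: false, error: "config_parse" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "config_parse" };
  }
  const rec = parsed as Record<string, unknown>;
  const rawBase = rec["base_url"];
  const rawKey = rec["api_key"];
  return {
    ok: true,
    baseUrl: typeof rawBase === "string" ? rawBase : DEFAULT_BASE,
    apiKey: typeof rawKey === "string" ? rawKey : "",
  };
}

// `local:llama3` and bare `llama3` both dispatch as `llama3`.
function stripLocalPrefix(modelId: string): string {
  const prefix = "local:";
  return modelId.startsWith(prefix) ? modelId.slice(prefix.length) : modelId;
}
```

Returning a tagged result rather than throwing mirrors the stated goal of yielding a typed `config_parse` error event from inside the generator.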
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Greg <7xshadowx7@gmail.com> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> * feat(grok): Level 3 shim — full reviewer dispatch (happy-path unverified) Promotes Grok Build from Level 2 (consumer-only) to Level 3 (full reviewer shim). Chorus can now dispatch to grok-build as a doer or reviewer in any template. What's verified (empirically): - Detection, headless-mode invocation pattern (`grok -p ... --output-format streaming-json --yolo --max-turns 1`), error event shape, exit-code semantics - Failure path: free-tier auth produces clean quota_exhausted (SuperGrok Heavy subscription required) → voice auto-disables after N strikes - All UI surfaces (model boxes, template-editor lineage picker, run-page participant card, cli-status-panel, onboarding picker, connect orchestrator) What's specced but not run live (needs SuperGrok Heavy): - Happy-path streaming-json text/end event parsing (followed `~/.grok/docs/user-guide/13-headless-mode.md` spec) - Token/cost accounting — Grok doesn't surface usage in end event; estimateCostUsd returns 0 New files: - src/daemon/agents/grok.ts — shim with `--max-turns 1` headless args - src/daemon/agents/parsers/grok.ts — streaming-json + stderr parser - tests/grok-parser.test.ts — 18 cases covering happy / error / robustness Lineage sweep (xai daemon lineage was already a legacy alias to opencode — uses fresh `grok` daemon lineage to avoid colliding with that mapping; old YAML with `lineage:xai` still routes to opencode): - Lineage / CliLineage / ReviewerLineage / DaemonLineage / UILineage - LINEAGE_LABEL / LINEAGE_DOT / UI_LINEAGE_* / UI_LINEAGE_BRAND - UI_LINEAGE_AVAILABLE_MODELS.grok = ['grok-build'] - UI_LINEAGE_DEFAULT_MODEL.grok = 'grok-build' - template-schema lineageEnum + reviewerLineageEnum - DB voices row schema (additive — old rows still validate) - phase-editor LINEAGES + DAEMON_TO_COCKPIT_LINEAGE - 
template-dialog COCKPIT_TO_DAEMON + DAEMON_TO_COCKPIT + DAEMON_DEFAULT_MODEL + FALLBACK_LINEAGES
- cli-status-panel + live-run-real helpers
- error-detector auth-prompt regex (SuperGrok signature on its own branch ABOVE the generic auth regex; classifies to quota_exhausted, not auth_invalid)

Voice seeding: grok-cli registered in SINGLE_MODEL_CLIS, which auto-creates the grok-cli voice (id=grok-cli, lineage=grok, model_id=grok-build) on first daemon boot when the binary is detected.

Auth flow: ~/.grok/auth.json file probe OR GROK_CODE_XAI_API_KEY env short-circuit. Both verified in tests/cli-precheck.test.ts. Daemon won't spawn grok without one or the other present, which prevents the browser-OAuth flow from hanging headless dispatch.

Total tests: 821 → 842 (+21).

Adapted from upstream chorus-codes/chorus#46 (f9dfba5). Conflicts resolved by taking the union of fork's `local`-extended enums and upstream's `grok`-extended enums (every Record / z.enum had to be extended in both dimensions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* fix(cli-precheck): macOS Keychain dual-probe - also check "Claude Code" service

Claude Code v2.x stores OAuth credentials under two service names depending on the auth flow:
- `Claude Code-credentials`: Pro/Max OAuth via `claude login`
- `Claude Code` (no suffix): API-key auth + some Console-account flows

The previous single-service probe regressed to auth_missing for API-key users on darwin. Refactor hasDarwinKeychainEntry to accept string | string[], iterate candidates, short-circuit on first match. Each probe stays bounded to 1.5s so a misconfigured keychain can't stall every spawn.

Refs upstream issue #38 / commit 716fa3a.
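The `string | string[]` refactor can be illustrated with a probe that is injected rather than shelled out. This is a sketch only: the real hasDarwinKeychainEntry runs a bounded (1.5s) keychain lookup per service, which is replaced here by a hypothetical `probe` callback so the iteration and short-circuit logic can be shown in isolation.

```typescript
// Hypothetical probe signature: returns true when the keychain has an entry for `service`.
type ServiceProbe = (service: string) => boolean;

// Accepts one service name or several; stops at the first match.
function hasKeychainEntrySketch(
  services: string | string[],
  probe: ServiceProbe,
): boolean {
  const candidates = Array.isArray(services) ? services : [services];
  // Array.prototype.some short-circuits, so later services are not probed after a hit.
  return candidates.some((service) => probe(service));
}
```

With the two service names from the commit message, a caller would pass `["Claude Code-credentials", "Claude Code"]`, so OAuth users pay for one probe and API-key users for at most two.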
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: PR review — local in voices enum, AGENT_TO_LINEAGE for grok/local, separate cred-precheck vs semaphore bypass Addresses bot review on PR #3: - Sourcery P2 (src/lib/db/voices.ts): VoiceRowSchema and VoiceUpsertInput only allowed `grok` in the new-lineage slot; `local` voices upserted via the (future) Local LLM connect flow would have failed Zod validation at runtime. Add `local` to both the enum and the union. - Codex P2 (src/app/api/run-artifacts/[chatId]/route.ts + src/app/runs/[runId]/page.tsx): AGENT_TO_LINEAGE did not map `grok-cli` → `grok` nor `local` → `local`, so a real Grok or Local participant directory (`reviewer-grok-cli-N`, `reviewer-local-N`) resolved to a bogus lineage and rendered as an unbranded extra card while the placeholder slot stayed pending. - Codex P2 (src/daemon/agents/index.ts + src/daemon/runner/{doer,reviewer}-driver.ts + src/lib/settings/concurrency.ts): the daemon used a single predicate `isHttpDispatchedShim` for two unrelated decisions — bypassing the CLI-credential precheck AND bypassing the local-CLI semaphore. That was safe for OpenRouter (truly remote) but wrong for the Local LLM shim, whose default endpoint is Ollama on 127.0.0.1: N concurrent reviewers + a doer can thrash VRAM/RAM on consumer hardware. Split into `isHttpDispatchedShim` (kept for cred-precheck bypass) and `bypassesLocalCliSemaphore` (only openrouter). Add `grok-cli` and `local` to CLI_LINEAGES with conservative per-CLI defaults (grok-cli matches gemini at 2; local defaults to 1, bump in /settings if your endpoint multiplexes). Tests: 845 pass (unchanged), typecheck clean. * fix: PR review — CodeRabbit pass (docs/Grok level, init+quickstart+local edges, regex, tests) Addresses CodeRabbit's first batch of review comments on PR #3: - docs/integrating-a-new-cli.md: contradictory level for Grok — line 3 said "detection-only", line 15 said level 2, line 302 said level 3. 
Normalize to level-3 (the shim ships in this PR) and note that the level-2 orchestrator coexists for the consumer-side wiring.
- src/cli/commands/init.ts: `--connect grok` was rejected because the local Name union, ALL_NAMES list, and the `--connect` option help text omitted 'grok' even though detection labels and OrchestratorName already accepted it. Add 'grok' to all three.
- src/cli/commands/quickstart.ts: the "install one of …" guidance printed when no CLIs are detected still listed only 5; extend to Grok CLI to match the dispatchable set.
- src/daemon/agents/local.ts:
  * Empty `base_url` (e.g. user saved settings with an empty box) was passed through `??` as the URL and surfaced as an opaque fetch error; treat empty / whitespace-only as unset and fall back to DEFAULT_BASE. Strip trailing slashes while at it.
  * Trailing SSE payload was dropped when the server closed without a final blank-line delimiter (older Ollama, some vLLM configs), so the last text_delta could silently disappear, truncating answers. Extract event-dispatch + payload-extract into local helpers and flush the residual buffer after the read loop exits.
- src/lib/cli-detect.ts: grok regex documented "name OR bare-version" but only matched the name. Add the bare-version alternative; the basename guard already prevents cross-vendor matches.
- tests/grok-parser.test.ts: 4 cases narrowed events[0] under `if (events[0].type === 'error')` without a prior `expect(...).toBe` on type, so a non-error event silently skipped the inner assertions. Add explicit type expectations before the narrowing.

Tests: 845 pass (unchanged), typecheck clean.
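The residual-buffer flush for the Local LLM shim can be sketched as below. This is an illustration under assumptions, not the code in src/daemon/agents/local.ts: it works on an array of already-decoded string chunks instead of a fetch stream, assumes LF line endings, and the helper names are hypothetical. Skipping the `[DONE]` sentinel reflects the usual OpenAI-style streaming convention.

```typescript
// Pull the `data:` payload out of one raw SSE event, or null if it has none.
function extractPayload(rawEvent: string): string | null {
  const dataLines = rawEvent
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).trimStart());
  return dataLines.length > 0 ? dataLines.join("\n") : null;
}

// Collect payloads from chunked input, including a trailing event with no blank-line delimiter.
function collectSsePayloads(chunks: string[]): string[] {
  const payloads: string[] = [];
  let buffer = "";

  const dispatch = (rawEvent: string): void => {
    const payload = extractPayload(rawEvent);
    if (payload !== null && payload !== "[DONE]") payloads.push(payload);
  };

  for (const chunk of chunks) {
    buffer += chunk;
    let idx = buffer.indexOf("\n\n");
    while (idx !== -1) {
      dispatch(buffer.slice(0, idx));
      buffer = buffer.slice(idx + 2);
      idx = buffer.indexOf("\n\n");
    }
  }

  // Flush: the server may close without a final blank line, leaving one event buffered.
  if (buffer.trim() !== "") dispatch(buffer);
  return payloads;
}
```

Without the final flush, the third event in a stream such as `["data: a\n\nda", "ta: b\n\ndata: c"]` would be lost, which is the truncation the fix addresses.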
--------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Greg <7xshadowx7@gmail.com> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> * feat: fold upstream contributor stack — repoPath default + CRLF persona parser (#4) Folds the cross-platform pieces of upstream commit 781bc42 ("Contributor stack: claude orchestrator + repoPath + Windows spawn (#39)") into the fork, intentionally omitting Windows-specific hunks. Included: - src/mcp/tools.ts: add safeCwd() helper + default `repoPath` on create_chat to safeCwd() when caller omits it. Previously the daemon fell back to its own cwd (packageRoot), which caused relative file paths in `files: [...]` to silently resolve to the chorus install dir and miss. MCP servers spawned by Claude Code / Codex / Gemini inherit the host's cwd (= the user's project), so safeCwd() lands at the right path automatically. safeCwd() also catches ENOENT from process.cwd() and falls back to homedir. - src/lib/personas.ts: normalize CRLF → LF in the frontmatter parser so persona .md files checked out with Windows line endings don't fail `missing YAML frontmatter`. Cross-platform safe. - src/daemon/orchestrators/index.ts: drop stale comment block about Claude having a project-config side-effect (the fork's orchestrator long since moved to user-scope). - tests/mcp-create-chat-repo-path.test.ts (+4 tests): cover explicit repoPath, cwd default, full-body forwarding, and ENOENT fallback to homedir. 
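The repoPath default can be sketched as follows. The real safeCwd() in src/mcp/tools.ts calls process.cwd() and falls back to the home directory on ENOENT; to keep this example free of Node-specific imports, both lookups are passed in as hypothetical callbacks, and the sketch falls back on any thrown error rather than ENOENT specifically.

```typescript
// Hypothetical injected lookups; in the daemon these would be process.cwd and os.homedir.
type PathLookup = () => string;

// Return the current working directory, or the home directory if the cwd lookup throws
// (for example because the directory was deleted underneath the process).
function safeCwdSketch(getCwd: PathLookup, getHome: PathLookup): string {
  try {
    return getCwd();
  } catch {
    return getHome();
  }
}

// An explicit repoPath always wins; the default only applies when the caller omits it.
function resolveRepoPath(
  explicit: string | undefined,
  getCwd: PathLookup,
  getHome: PathLookup,
): string {
  return explicit ?? safeCwdSketch(getCwd, getHome);
}
```

Because an MCP server inherits the host's cwd, the default resolves relative `files: [...]` entries against the user's project instead of the chorus install directory.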
Omitted (Windows-only hunks): - src/cli/commands/update.ts (shell: win32 for npm self-update) - src/daemon/routes/system.ts (shell: win32 for opencode probe) - src/daemon/orchestrators/{codex,gemini,kimi}.ts (shell: win32 tweaks) - src/lib/cli-detect.ts (SAFE_WIN_PATH regex + buildVersionSpawn) - src/lib/voices.ts (discoverNpmPrefixes Windows shell) - tests/cli-detect.test.ts (Windows-specific cmd.exe escape tests) Also omitted: - src/daemon/orchestrators/claude.ts: upstream shells out to `claude mcp add --scope user`. Fork already implements user-scope registration via direct ~/.claude.json patch (more robust — no dependency on `claude` binary in PATH at registration time, plus sweeps stale project-scoped entries). Keeping fork's version. - tests/claude-orchestrator.test.ts: tests the upstream shell-out approach the fork doesn't use. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: pr-babysit design sketch (judge workflow + state machine) Three-phase delivery plan for moving the PR babysitter loop out of Claude Code and into the chorus daemon. Covers GH App + webhook architecture, the judge phase (validity/category/confidence + shadow judge pattern), fix routing rules (trivial/targeted/architectural → Kimi/Sonnet/Opus), circuit breakers, merge gate, multi-PR coordination, and proposed DB schema. Design only — no code in this commit. Five open questions left for team decisions in §"Open questions for the team". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: prime doer/reviewer prompts with AGENTS.md + CLAUDE.md When a chat carries a repoPath, read AGENTS.md / CLAUDE.md from the repo and prepend them inside a <project_guidelines> fence (between the persona block and the phase header). Same TOCTOU + fence-breakout defences as the persona/attached-file readers: lstat-rejects symlinks, strips </project_guidelines> from contents, truncates each file at 16 KB with a visible marker. 
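The fencing step can be sketched like this, leaving the file reads and the lstat symlink check aside. Several details are assumptions: the limit is applied to string length rather than bytes, the truncation marker text and the per-file heading are hypothetical, and the function name is invented for illustration.

```typescript
const CLOSE_TAG = "</project_guidelines>";
const MAX_CHARS = 16 * 1024; // assumption: counted in UTF-16 code units, not bytes
const TRUNCATION_MARKER = "\n[truncated]"; // hypothetical marker text

interface GuidelineFile {
  name: string;
  contents: string;
}

// Strip fence-breakout attempts, truncate, and wrap the files in one <project_guidelines> fence.
function fenceGuidelines(files: GuidelineFile[]): string {
  if (files.length === 0) return "";
  const parts = files.map((file) => {
    let body = file.contents;
    // Repeat until stable: removing one tag can splice a new one together.
    while (body.includes(CLOSE_TAG)) {
      body = body.split(CLOSE_TAG).join("");
    }
    if (body.length > MAX_CHARS) {
      body = body.slice(0, MAX_CHARS) + TRUNCATION_MARKER;
    }
    return `# ${file.name}\n${body}`;
  });
  return `<project_guidelines>\n${parts.join("\n\n")}\n${CLOSE_TAG}`;
}
```

The loop matters because a single pass over `</project_</project_guidelines>guidelines>` would leave a complete closing tag behind.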
Lets users carry project conventions into every doer + reviewer turn by editing a file the rest of their AI tooling already reads, without adding a new chorus-specific storage layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: verify phase — exec package.json chorus.verify, judge with reviewer Splits verify out of the StandardPhase shape into its own VerifyPhase (no doer, reviewer required). Reads `chorus.verify` from package.json, runs it via execFile in repoPath with a configurable command timeout (default 5 min, max 30 min), captures stdout/stderr/exit, and feeds the fenced artifact through the existing runReviewers flow. Env is scrubbed to PATH/HOME/LANG/LC_ALL/NODE_ENV so a `chorus.verify` script can't leak inherited credentials into the artifact. Output streams cap at 64 KB each with a visible truncation marker. Timeout detection matches both ETIMEDOUT and (killed && SIGTERM) shapes — node sometimes only sets the signal. The artifact lands at round-1/doer-verify-runner/answer.md so the cockpit renders it identically to a doer answer. A phase_progress event with kind="verify_command" surfaces the command-level outcome (exitCode, timedOut, duration) without needing a brand-new event type through the SSE multiplex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: TDD loop — verify failure re-prompts named feedback phase doer Verify phase gains optional `feedbackPhase` + `maxIterations` (default 5, max 20). On verify failure, the runner re-fires the named phase's doer through `runDoer` with the verify output threaded in via `priorRoundFeedback` — same hook a normal disagree-iterate loop uses, so the doer sees the failure in the slot it already knows how to act on. Loops until verify passes or the cap is hit. 
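The loop shape can be sketched as below. This is a synchronous simplification with hypothetical names: the real runner is asynchronous and calls `runDoer` with `priorRoundFeedback`. It also assumes that `maxIterations` counts verify runs, which the commit message does not state explicitly; only the default of 5 and the maximum of 20 come from the description.

```typescript
interface VerifyResult {
  exitCode: number;
  output: string;
}

interface TddLoopOutcome {
  passed: boolean;
  iterations: number;
}

// Re-run verify until it passes or the iteration cap is reached, feeding each
// failure's output back to the feedback phase's doer.
function runTddLoop(
  runVerify: () => VerifyResult,
  rerunDoer: (priorRoundFeedback: string) => void,
  maxIterations = 5,
): TddLoopOutcome {
  // Clamp to the documented bounds (default 5, max 20).
  const cap = Math.min(Math.max(1, Math.floor(maxIterations)), 20);
  for (let iteration = 1; iteration <= cap; iteration++) {
    const result = runVerify();
    if (result.exitCode === 0) return { passed: true, iterations: iteration };
    // Assumption: no doer re-run after the final failed verify, since nothing would check it.
    if (iteration < cap) rerunDoer(result.output);
  }
  return { passed: false, iterations: cap };
}
```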
Reviewers only run on the FINAL iteration (success or final failure); intermediate iterations skip the reviewer pass because exit code is the loop signal and asking the reviewer N times to judge the same class of failure would just burn tokens. Iterations write to round-1001, round-1002, … (TDD_ROUND_OFFSET=1000) so the synthetic TDD-loop round dirs can't collide with the original feedback phase's rounds in the same chat dir. Misconfigured templates (feedbackPhase points at a non-existent or non-standard phase) fail loudly at the top of the verify phase, before the first command run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: babysit DB — jobs + decisions tables, query helpers, 25 tests Foundation for the PR-babysit autonomous review loop (Phase A of docs/pr-babysit-design.md). Two tables: - babysit_jobs: one row per (repo, pr_number) under review, state-machine tracked (idle → judging → fixing → verifying → pushing → quiet_check → merged | escalated). UNIQUE (repo, pr_number) prevents double-registration. ended_at auto-stamps on first terminal transition and is sticky. - babysit_decisions: append-only audit trail of every judge call. Two-stage insert — judge writes validity/category/confidence/outcome=NULL, the fix runner stamps outcome (+ commit) when it resolves. getAttemptCount drives the per-comment circuit breaker (same comment hash flagged N+ times → stop trying, escalate). Schema lives in schema.sql for fresh-DB init AND as idempotent CREATE TABLE IF NOT EXISTS in connection.ts so DBs that pre-date this version pick the tables up on next boot (matches the personas/voices migration pattern). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: babysit comment fetcher — gh CLI pull + author classify + sha256 hash Pulls PR review (line-anchored) + issue (conversation) comments via `gh api`, normalizes them into the shape the babysit judge consumes: - author classification: recognises CodeRabbit / Sourcery / Greptile / Codex by login regex; falls back to GitHub user.type=Bot + [bot] suffix for unknown bots. Humans always come through as isBot=false / bot=null. - sha256(body) keyed so the per-comment circuit breaker can recognise "the same bot re-flagged the same exact text" across polling ticks. - partial-data tolerance: if one of review/issue endpoints fails we still return what we got from the other (a 500 on one shouldn't blank the whole tick). Only when BOTH fail do we surface a typed reason. - `since=` parameter so the polling loop doesn't re-hash every comment on every tick. 16 tests covering author classify, sha256 stability, gh shellout via a fake `gh` on PATH, partial-failure, auth/404 classification, since arg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: babysit judge — classify PR-bot comments + pure action router Reads one PR comment + diff context, asks the model to classify it as valid/invalid/partially_valid/unsure with a category from a fixed menu (apply-trivial | apply-targeted | apply-architectural | reply-disagree | reply-ack | defer-to-human) and a confidence score in [0,1]. Three pieces: - buildJudgePrompt(comment, ctx): pure prompt construction. Includes PR metadata, comment body, anchored code snippet, and (crucially) prior decisions on the same comment hash — so re-judgements after a failed fix tilt toward reply-disagree rather than re-trying the same fix. - judgeComment(opts): drives requestStructured against the JudgeOutputSchema, flags judgements below the 0.7 confidence threshold as belowThreshold. - decideAction(judgement, args): PURE routing function. 
Maps (judgement, attemptCount, belowThreshold) → fix/reply/escalate/skip. State machine in babysit/runner.ts (next session) stays a thin dispatcher.

Routing rules in priority order: per-comment cap → confidence threshold → defer-to-human → reply-* → apply-* (with invalid/unsure self-correction to escalate, since acting on a comment we judged invalid is incoherent).

20 tests: prompt composition (bot vs human, snippet, prior decisions, multi-line bodies, threshold mention, full category menu) + routing table (every category × every priority rule).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit MCP tool + daemon registrar + pr-babysit preset

Phase A MCP entry point for the PR-babysit loop.
- `mcp__chorus__babysit_pr`: registers a PR for autonomous bot-comment judging. Idempotent: re-calling with the same URL returns the existing job without resetting state mid-flight.
- Daemon routes:
  POST /babysit/jobs: upsert idle job
  GET /babysit/jobs: list (filters: ?active=true, ?state=…)
  GET /babysit/jobs/:id: single job + recent decisions
- `templates/pr-babysit.yaml`: declares the judge roster (Haiku primary, Sonnet fallback). Validates against TemplateSchema as a `review_only` phase so seedBuiltinTemplates loads it cleanly; the babysit runner (next release) reads `phase.reviewer.candidates` for model selection but doesn't drive this phase through runner.ts.

13 route tests covering happy path, idempotent re-register, missing/malformed URL, state filter, job-with-decisions detail view. MCP wrapper schema added to tools.ts.

Note: src/daemon/index.ts diff is mostly Prettier rewriting single→double quotes after my import addition; the real semantic change is the two lines wiring registerBabysitRoutes into registerAll().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit GH App auth - RS256 JWT + installation token cache

GitHub App auth bundle for the PR-babysit loop.
Two-tier model: mint a 9-min RS256 JWT from the App private key (Node built-in crypto, no jsonwebtoken dep), then exchange it for a 1-hour installation token cached in-memory with a 5-min refresh buffer so we never present a token about to expire.

Config persisted as a single global row in secrets (provider=github_app, kind=gh_app, value=JSON of appId/privateKey/webhookSecret). chorus is single-tenant; the App is owned by the daemon operator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit webhook HMAC verify helper

Pure-crypto helper for verifying GitHub's X-Hub-Signature-256 against the raw request body. Constant-time comparison via crypto.timingSafeEqual + a typed discriminated-union failure mode (missing/malformed/mismatch/secret_not_configured) so a caller can log the precise reason without leaking it back to the sender.

Not wired into a route this session (the daemon only polls), but the verifier ships with full coverage now since shipping the route later without it is a sharp footgun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit GH client - App-auth + CLI-fallback request shim

Unified GitHub-API surface for the babysit loop with two routes:
- App auth when installationId is set AND App config persisted: mint/reuse a cached installation token, retry once on 401 (key rotation), retry once on 5xx with backoff.
- gh CLI fallback otherwise. Inherits the developer's local gh auth. Requests with a body on this path return a typed error pointing the operator at the App-auth on-ramp, postponing the stdin plumbing until the runner actually needs to write through the CLI.

Routing is transparent to the caller; they always get back a normalized {status, body|errorText, authMode} response.
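The installation-token cache from the GH App auth commit above can be sketched as follows. The class and field names are hypothetical, the mint step is an injected synchronous callback (the real one exchanges a JWT over HTTP), and the clock is injected so the 5-minute refresh buffer can be exercised without waiting.

```typescript
interface CachedToken {
  token: string;
  expiresAtMs: number;
}

// From the commit message: refresh when fewer than 5 minutes of validity remain.
const REFRESH_BUFFER_MS = 5 * 60 * 1000;

class InstallationTokenCacheSketch {
  private readonly cache = new Map<number, CachedToken>();
  private readonly mint: (installationId: number) => CachedToken;
  private readonly now: () => number;

  constructor(mint: (installationId: number) => CachedToken, now: () => number) {
    this.mint = mint;
    this.now = now;
  }

  // Reuse the cached token unless it is inside the refresh buffer (or missing).
  get(installationId: number): string {
    const hit = this.cache.get(installationId);
    if (hit !== undefined && hit.expiresAtMs - this.now() > REFRESH_BUFFER_MS) {
      return hit.token;
    }
    const fresh = this.mint(installationId);
    this.cache.set(installationId, fresh);
    return fresh.token;
  }
}
```

Refreshing early trades a few extra mint calls for never sending a token that could expire mid-request.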
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit per-PR worktree manager

Idempotent worktree lifecycle for the fix loop:
- ensureWorktree(): create or reuse ~/.chorus/worktrees/<owner>__<name>/pr-<n>/, fetching + checking out the PR head branch. Wipes a stale directory if one exists from a half-failed previous run.
- pullLatest(): fetch + reset --hard origin/<branch>. Hard reset is safe only because the runner pushes every commit it makes; documented inline so it doesn't get cargo-culted.
- removeWorktree(): git worktree remove --force + rm -rf as belt-and-suspenders for older git versions.

Branch names from webhook payloads are validated against the same shell/path-traversal rules used elsewhere in the daemon before being passed to git.

Tests use real git against a bare-remote fixture per case; mocking runAsync would leave 90% of the surface untested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit scheduler - bounded concurrency + per-job mutex

Tick driver for the babysit loop with three invariants the production daemon needs:
1. Per-job serialization. A Set keyed by job id, checked-and-set atomically inside dispatch(), prevents two ticks on the same PR from racing over the worktree, decisions table, or reply comment.
2. Bounded global concurrency. maxConcurrent (default 3) caps simultaneous jobs so judge-model quotas + gh-API pressure stay predictable as the backlog grows.
3. Clean drain on SIGTERM. stop() clears the interval AND awaits in-flight jobs so we never leave a worktree mid-commit.

Errors thrown from runJob are caught + logged so a single broken PR can't poison the whole loop. The mutex is always released in finally so the next tick can re-dispatch.

Not yet wired into daemon startup; the state-machine runner that becomes runJob ships in the next commits.
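The scheduler's per-job mutex and concurrency cap can be sketched as below. This is an illustration with hypothetical names, not the BabysitScheduler source: the interval timer is omitted, `drain()` stands in for the awaiting half of stop(), and errors are collected in an array where the commit logs them.

```typescript
type RunJob = (jobId: string) => Promise<void>;

class SchedulerSketch {
  private readonly running = new Set<string>();
  private readonly inFlight = new Set<Promise<void>>();
  private readonly runJob: RunJob;
  private readonly maxConcurrent: number;
  readonly errors: unknown[] = [];

  constructor(runJob: RunJob, maxConcurrent = 3) {
    this.runJob = runJob;
    this.maxConcurrent = maxConcurrent;
  }

  // Check-and-set happens synchronously, so two ticks cannot both claim the same job.
  dispatch(jobId: string): boolean {
    if (this.running.has(jobId)) return false; // per-job serialization
    if (this.running.size >= this.maxConcurrent) return false; // global cap
    this.running.add(jobId);
    const task = this.runGuarded(jobId);
    this.inFlight.add(task);
    void task.then(() => this.inFlight.delete(task));
    return true;
  }

  private async runGuarded(jobId: string): Promise<void> {
    try {
      await this.runJob(jobId);
    } catch (err) {
      this.errors.push(err); // one broken job must not poison the loop
    } finally {
      this.running.delete(jobId); // always release so the next tick can re-dispatch
    }
  }

  // Await every in-flight job, as stop() would on SIGTERM.
  async drain(): Promise<void> {
    await Promise.all(Array.from(this.inFlight));
  }
}
```

Because JavaScript runs dispatch() to completion before any other tick can execute, a plain Set is enough for the check-and-set; no lock primitive is needed.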
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit state machine - full judge→fix→verify→push→quiet loop

End-to-end driver tying the existing pieces together. One entry point (runJob) the scheduler calls per tick; per-state handlers dispatch the work and return a transition descriptor; the driver owns all babysit_jobs writes so handlers stay pure-ish.

State transitions:

  idle -> judging (provisions worktree)
  judging -> fixing (any apply-* decision)
  judging -> quiet_check (replies only, or empty)
  judging -> escalated (defer-to-human, low-confidence, cap-hit, judge spawn/parse failure)
  fixing -> verifying (doer produced file edits)
  fixing -> escalated (doer failure; mark decision escalated)
  verifying -> pushing (verify passed)
  verifying -> escalated (verify failed; no auto-retry, the per-comment cap path catches genuine stuck)
  pushing -> quiet_check (pushed; record commit sha + fix_commits++)
  pushing -> escalated (git failure)
  quiet_check -> merged (PR merged on GitHub)
  quiet_check -> judging (new bot comments arrived)
  quiet_check -> quiet_check (no change)

Supporting modules added in the same commit since they only exist to serve this state machine:

- pr-metadata.ts: tiny shim over gh client for title/head/base/default branch + PR state projection. Uses CLI fallback when no App config.
- verifier.ts: resolves npm-test → npm-typecheck → tsc --noEmit from package.json/tsconfig; truncates output at 16 KiB for DB-safe escalation reasons.
- fix-executor.ts: doer invocation via structured-output adapter returning {path, new_contents}[]. Full-file rewrites, since LLMs are unreliable at diff coordinates and babysit fixes are small. Symlink-aware path safety refuses worktree escape.
- git-push.ts: stage → diff-check → commit → push helper. No --force. Default chorus-babysit identity, overridable.

Tests: 45 new tests across 5 files cover each handler's happy path + every failure-mode transition.
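The transition list from the state-machine commit can be encoded as a lookup table, which makes illegal transitions easy to reject in one place. The sketch below is illustrative only: `TRANSITIONS` and `canTransition` are hypothetical names, and the table covers just the states named in that list.

```typescript
type State =
  | "idle" | "judging" | "fixing" | "verifying"
  | "pushing" | "quiet_check" | "merged" | "escalated";

// Allowed next states, copied from the transition list in the commit message.
// merged and escalated are terminal, so they have no outgoing edges.
const TRANSITIONS: Record<State, readonly State[]> = {
  idle: ["judging"],
  judging: ["fixing", "quiet_check", "escalated"],
  fixing: ["verifying", "escalated"],
  verifying: ["pushing", "escalated"],
  pushing: ["quiet_check", "escalated"],
  quiet_check: ["merged", "judging", "quiet_check"],
  merged: [],
  escalated: [],
};

function canTransition(from: State, to: State): boolean {
  return TRANSITIONS[from].includes(to);
}
```

A driver that owns all job-row writes could call `canTransition` before persisting a handler's transition descriptor, so a buggy handler cannot move a job along an edge the list does not allow.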
State-machine tests use real DB + mocked external IO; helpers use real shellouts against fixture repos where the value is in the actual git/fs behaviour.

Not yet wired: scheduler.start() at daemon boot. That's the next commit, separate from this so the integration is reviewable on its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: wire babysit scheduler into daemon lifecycle

Start a BabysitScheduler post-listen with the state-machine runner as its job handler. Tick interval defaults to 60s; sourceRepoPath defaults to the daemon's CWD (per-repo overrides will land when the registrar gains a sourceRepoPath field on the babysit job row). CHORUS_DISABLE_BABYSIT_SCHEDULER=1 skips the start for integration tests that drive ticks manually.

SIGTERM / SIGINT trigger scheduler.stop(), which clears the interval AND awaits in-flight jobs so we never leave a worktree mid-commit on shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit pause/resume route - PATCH /babysit/jobs/:id

Adds operator-driven pause/resume so a registered PR can be taken off the scheduler's tick without losing its decision history.

  PATCH /babysit/jobs/:id { action: 'pause' | 'resume' }

Pause refuses terminal states (merged, escalated) with 409, since there is nothing for the scheduler to skip once a job has ended. Resume refuses non-paused jobs with 409 to make the intent explicit; both verbs are idempotent within their valid state. Resume re-opens ended_at so the job reappears in listActive() / cockpit lists.

The scheduler already treats 'paused' as non-dispatchable (NON_DISPATCHABLE includes paused alongside merged + escalated), so this commit is just the controller; no scheduler change needed.
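The pause/resume rules above reduce to a small pure decision function. The following is a sketch under assumptions, not the route handler from the commit: `decidePauseResume` and `Decision` are hypothetical names, the 400 status for an unknown action is assumed (the commit only says it is validated), and the 404 for a missing job is left to the caller.

```typescript
type Decision =
  | { status: 200; effect: "pause" | "resume" | "none" }
  | { status: 400 | 409; error: string };

const TERMINAL_STATES = new Set(["merged", "escalated"]);

function decidePauseResume(currentState: string, action: unknown): Decision {
  if (action !== "pause" && action !== "resume") {
    // Assumed status code; the commit only states that unknown actions are rejected.
    return { status: 400, error: "action must be 'pause' or 'resume'" };
  }
  if (action === "pause") {
    if (TERMINAL_STATES.has(currentState)) {
      return { status: 409, error: `cannot pause a job in state ${currentState}` };
    }
    // Pausing an already paused job succeeds without changing anything.
    return { status: 200, effect: currentState === "paused" ? "none" : "pause" };
  }
  if (currentState !== "paused") {
    return { status: 409, error: "job is not paused" };
  }
  // The caller is expected to clear ended_at when applying a resume.
  return { status: 200, effect: "resume" };
}
```

Keeping the decision separate from the database write makes the conflict cases testable without a running daemon.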
8 new tests on top of the existing 13 cover: pause happy path, pause idempotency, resume happy path + ended_at clear, conflict on pause-merged + pause-escalated, conflict on resume-when-not-paused, validation on unknown action, 404 on missing job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: chorus babysit CLI - list/show/register/pause/resume

User-facing subcommand group that fronts the existing daemon routes so operators can drive the babysit scheduler without hitting the API directly.

  chorus babysit register <pr-url> [--installation-id <n>]
  chorus babysit list [--active] [--state <s>]
  chorus babysit show <id>
  chorus babysit pause <id>
  chorus babysit resume <id>

All commands talk to the local daemon over /api/v1; a connection-failed envelope surfaces the standard "start with `chorus start`" hint so the failure mode is consistent with the rest of the CLI. Job ids are "<owner>/<repo>#<number>"; show/pause/resume URL-encode the segment so shells that treat # as a comment don't strip it.

show prints the job header + decision log (comment id, author, validity, category, outcome) so 'why did this PR get escalated' is one command away. State labels are color-coded (terminal-red escalated, green merged, yellow paused).

src/cli/index.ts also picks up unrelated single→double-quote normalization from the project prettier hook; the only logical change there is the new registerBabysitCommand wire-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.co…
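The job-id encoding mentioned in the CLI commit is easy to get wrong, so a short sketch may help. `jobPath` is a hypothetical helper, and the combined `/api/v1/babysit/jobs/` prefix is an assumption pieced together from the route and base path named in the commit messages.

```typescript
// Hypothetical helper: builds the request path for a job id such as "octo/repo#42".
function jobPath(jobId: string): string {
  // encodeURIComponent escapes both "/" and "#", so the id stays a single
  // path segment and "#" is not parsed as the start of a URL fragment.
  return `/api/v1/babysit/jobs/${encodeURIComponent(jobId)}`;
}

console.log(jobPath("octo/repo#42"));
// prints "/api/v1/babysit/jobs/octo%2Frepo%2342"
```

On the server side, `decodeURIComponent` on the `:id` segment recovers the original "<owner>/<repo>#<number>" form.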

crypticpy and others added 12 commits May 17, 2026 11:28

greptile-apps Bot reviewed May 17, 2026

sourcery-ai Bot reviewed May 17, 2026

Comment thread src/daemon/runner/reviewer-driver.ts

Comment thread src/cli/commands/diagnose.ts

Comment thread bin/chorus.mjs

Comment thread src/cli/commands/diagnose.ts

chatgpt-codex-connector Bot reviewed May 17, 2026

greptile-apps Bot reviewed May 17, 2026

crypticpy merged commit 535a960 into feat/pr-review-and-hardening May 17, 2026
1 check passed

crypticpy deleted the feat/upstream-fold-back branch May 17, 2026 18:17

crypticpy merged 13 commits into feat/pr-review-and-hardening from feat/upstream-fold-back

Conversation

crypticpy commented May 17, 2026 • edited by sourcery-ai Bot

greptile-apps Bot left a comment

sourcery-ai Bot commented May 17, 2026 • edited

Reviewer's Guide

Sequence diagram for voice auto-disable on persistent quota_exhausted

File-Level Changes

coderabbitai Bot commented May 17, 2026 • edited

Review skipped

sourcery-ai Bot left a comment

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 17, 2026

crypticpy May 17, 2026

greptile-apps Bot left a comment

1 participant
