Contributor stack: claude orchestrator + repoPath + Windows spawn (supersedes #35, #36, #37) #39
Merged
`chorus init` was writing the Chorus MCP entry to `projects.<homedir>.mcpServers.chorus` in `~/.claude.json`, so it only surfaced when Claude Code was launched from the user's home directory, not from the project they actually ran `chorus init` in.

Shell out to `claude mcp add ... --scope user` instead, matching the codex/gemini orchestrators. The entry lands at top-level `mcpServers` and is available from every project. Removes any stale entry first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`create_chat` drops `repoPath` on the floor: the schema doesn't accept it and the daemonFetch body doesn't forward it. The daemon's `/chats` route already supports `repoPath`, but with nothing forwarded `prompt-builder.ts:45` falls back to the daemon's own `process.cwd()`, which is locked to packageRoot by `start.ts:324`.

Two concrete effects:

1. Relative paths in `files: [...]` resolve against the chorus npm install dir, miss, and get silently skipped, so the inline-packed file contents the reviewers + doer see in their prompt are wrong (usually empty).
2. The DOER (only) gets cwd = scratch dir instead of the user's repo (`doer-driver.ts:170`: `repoPath ?? doerDir`), so it can't read project files via its own tools and can't make the real edits the ship phase would commit.

Reviewers are NOT affected by this fix: they intentionally run in a per-round scratch dir regardless of repoPath (`reviewer.ts:84`, spelled out in `doer-driver.ts:165-168`).

Fix: add `repoPath: z.string().optional()` to `CreateChatSchema`, default it to `process.cwd()` in `createChat`, forward in the POST body. MCP servers spawned by Claude Code / Codex / Gemini inherit the host's cwd (= the project), so the default lands at the right path automatically. Explicit callers can still override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add shell: win32 to daemon/routes/system.ts (opencode models probe)
- Add shell: win32 to cli/commands/update.ts (npm self-update spawn)
- Relax SAFE_WIN_PATH regex: whitelist -> blacklist for Unicode paths
- Add shell: isWindows to discoverNpmPrefixes (npm.cmd on Windows)
- Remove shell: win32 from ship.ts (git/gh are native .exe, no EINVAL risk)

Findings from multi-LLM code review (Gemini, DeepSeek, OpenCode).
Persona .md files checked out on Windows have CRLF line endings, but the frontmatter parser checks for '---\n' which fails with '---\r\n', throwing 'missing YAML frontmatter'. Adding .replace() normalizes to LF before parsing. Also adds pr-description.md documenting the full spawn EINVAL fix with tri-review V3 results.
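The CRLF failure described above can be reproduced and fixed in a few lines. This is a minimal sketch, not the project's parser: `normalizeNewlines` and `splitFrontmatter` are hypothetical names, and only the CRLF-to-LF `.replace()` step and the `missing YAML frontmatter` error text come from the commit message.

```typescript
// Hypothetical helpers for illustration; the real parser lives in the
// project's persona loader and may be structured differently.
const FRONTMATTER_OPEN = "---\n";

/** Convert Windows (CRLF) line endings to LF. */
function normalizeNewlines(raw: string): string {
  return raw.replace(/\r\n/g, "\n");
}

/** Split a persona file into its YAML frontmatter text and its body. */
function splitFrontmatter(raw: string): { frontmatter: string; body: string } {
  // Normalizing first means the '---\n' check below also accepts '---\r\n'.
  const text = normalizeNewlines(raw);
  if (!text.startsWith(FRONTMATTER_OPEN)) {
    throw new Error("missing YAML frontmatter");
  }
  // Search for the closing fence, starting at the newline that ends the opener
  // so that an empty frontmatter block is also found.
  const close = text.indexOf("\n---", FRONTMATTER_OPEN.length - 1);
  if (close === -1) {
    throw new Error("unterminated YAML frontmatter");
  }
  const frontmatter = text.slice(FRONTMATTER_OPEN.length, close);
  // Skip the closing "\n---" and one optional newline after it.
  const body = text.slice(close + 4).replace(/^\n/, "");
  return { frontmatter, body };
}
```

Without the `normalizeNewlines` call, a file beginning with `---\r\n` fails the `startsWith` check even though its frontmatter is valid.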
… blacklist) Adds tests for cmd.exe escape-char rejection and @-scoped npm package path acceptance. Without ^ in the blacklist, a path containing `^"& cmd` could break out of the shell-quoted wrap in buildVersionSpawn.
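A blacklist check of the kind these tests exercise might look like the sketch below. The character set is an assumption chosen for illustration (it includes `^` and the later-added `!`); it is not the project's actual `SAFE_WIN_PATH` pattern and is not a complete list of cmd.exe metacharacters. `isSafeWinPath` is a hypothetical name.

```typescript
// ASSUMPTION: an illustrative set of characters that cmd.exe treats specially.
// The real SAFE_WIN_PATH blacklist in the project may contain a different set.
const UNSAFE_WIN_PATH_CHARS = /["^&|<>%!\r\n]/;

/**
 * Blacklist check: accept any non-empty path that contains none of the
 * listed characters. Unicode letters and '@' pass because they are not listed,
 * which is the point of switching from a whitelist to a blacklist.
 */
function isSafeWinPath(candidate: string): boolean {
  return candidate.length > 0 && !UNSAFE_WIN_PATH_CHARS.test(candidate);
}
```

Under these assumptions an @-scoped npm path with non-ASCII characters is accepted, while a path containing `^"& cmd` is rejected before it can reach a shell-quoted command line.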
Three convergent blocking findings from 8-reviewer panel:

1. claude.ts execFileAsync missing shell:win32 (caught by gemini-cli; the irony is that this is the same EINVAL bug #37 fixes everywhere else).
2. SAFE_WIN_PATH missing ! (delayed expansion under setlocal enabledelayedexpansion). 6/8 reviewers flagged.
3. process.cwd() throwing ENOENT crashes createChat. 5/8 reviewers flagged; wrap in safeCwd() with homedir fallback.

Plus: friendlier error message in claude.ts pointing at minimum Claude Code version when 'mcp add --scope user' is unsupported.
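The `safeCwd()` fallback from finding 3 can be sketched as follows. The cwd lookup and the fallback directory are injected as parameters here so the sketch runs anywhere; that injection and the `resolveRepoPath` helper are assumptions for illustration, not the project's signatures.

```typescript
/**
 * Return the current working directory, or `fallback` if the lookup throws
 * (for example ENOENT when the directory was deleted underneath the process).
 * ASSUMPTION: in Node this would be called as
 *   safeCwd(() => process.cwd(), os.homedir())
 */
function safeCwd(getCwd: () => string, fallback: string): string {
  try {
    return getCwd();
  } catch {
    return fallback;
  }
}

/**
 * Hypothetical helper: an explicit repoPath from the caller wins,
 * otherwise default to the (safe) current directory.
 */
function resolveRepoPath(
  explicit: string | undefined,
  getCwd: () => string,
  fallback: string,
): string {
  return explicit ?? safeCwd(getCwd, fallback);
}
```

Keeping the fallback inside one small function means `createChat` never sees the exception, so a deleted working directory degrades to the home directory instead of crashing the call.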
This was referenced May 14, 2026
Merged
chorus-codes added a commit that referenced this pull request on May 14, 2026
Bundles #39 (supersedes #35 chrisayl, #36 chrisayl, #37 magalz) — claude orchestrator user-scope MCP registration, repoPath plumbing through MCP create_chat, and Windows spawn EINVAL fix across 8 call sites. Plus chorus-self-review fixups: ! and ^ added to SAFE_WIN_PATH blacklist, shell:win32 on claude.ts execFileAsync, safeCwd() ENOENT fallback. Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>
crypticpy added a commit to crypticpy/chorus that referenced this pull request on May 17, 2026
…na parser (#4)

Folds the cross-platform pieces of upstream commit 781bc42 ("Contributor stack: claude orchestrator + repoPath + Windows spawn (chorus-codes#39)") into the fork, intentionally omitting Windows-specific hunks.

Included:
- src/mcp/tools.ts: add safeCwd() helper + default `repoPath` on create_chat to safeCwd() when caller omits it. Previously the daemon fell back to its own cwd (packageRoot), which caused relative file paths in `files: [...]` to silently resolve to the chorus install dir and miss. MCP servers spawned by Claude Code / Codex / Gemini inherit the host's cwd (= the user's project), so safeCwd() lands at the right path automatically. safeCwd() also catches ENOENT from process.cwd() and falls back to homedir.
- src/lib/personas.ts: normalize CRLF → LF in the frontmatter parser so persona .md files checked out with Windows line endings don't fail `missing YAML frontmatter`. Cross-platform safe.
- src/daemon/orchestrators/index.ts: drop stale comment block about Claude having a project-config side-effect (the fork's orchestrator long since moved to user-scope).
- tests/mcp-create-chat-repo-path.test.ts (+4 tests): cover explicit repoPath, cwd default, full-body forwarding, and ENOENT fallback to homedir.

Omitted (Windows-only hunks):
- src/cli/commands/update.ts (shell: win32 for npm self-update)
- src/daemon/routes/system.ts (shell: win32 for opencode probe)
- src/daemon/orchestrators/{codex,gemini,kimi}.ts (shell: win32 tweaks)
- src/lib/cli-detect.ts (SAFE_WIN_PATH regex + buildVersionSpawn)
- src/lib/voices.ts (discoverNpmPrefixes Windows shell)
- tests/cli-detect.test.ts (Windows-specific cmd.exe escape tests)

Also omitted:
- src/daemon/orchestrators/claude.ts: upstream shells out to `claude mcp add --scope user`. Fork already implements user-scope registration via direct ~/.claude.json patch (more robust: no dependency on `claude` binary in PATH at registration time, plus sweeps stale project-scoped entries). Keeping fork's version.
- tests/claude-orchestrator.test.ts: tests the upstream shell-out approach the fork doesn't use.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
crypticpy added a commit to crypticpy/chorus that referenced this pull request on May 18, 2026
…ut adapter (#1) * fix: cred detection + Claude MCP user-scope registration Three fixes from chorus-issues.md that prevent a freshly-installed chorus from finding the user's existing CLI credentials, so the daemon starts up cleanly on machines that already have Claude / Kimi / moonshot configured. #1: register Claude MCP at user scope. The chorus MCP entry now writes to the top-level `mcpServers` block in `~/.claude.json` (idempotent), and any stale chorus entry under the project-scoped `projects[homedir].mcpServers` is cleaned up. Previously the project-scoped registration was invisible to Claude Code launched outside that exact cwd. #2: cred-path fallbacks. When the anthropic file check misses (e.g. user authed via Claude Desktop, no `~/.claude/...` JSON), fall back to the macOS Keychain via `security find-generic-password -s "Claude Code-credentials"`. Added `~/.kimi/credentials/kimi-code.json` to the moonshot CRED_PATHS so users who authed through `kimi-code` aren't told to log in again. #3: kimi config-missing precheck. New layer-3 check parses `~/.kimi/config.toml` and surfaces a `config_missing` reason when there's no top-level `default_model` set — the CLI will silently pick whatever backend it likes, which is rarely what the user wants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: reviewer fidelity, verdict surfacing, event/prompt isolation Seven fixes from chorus-issues.md covering the rest of the runner + MCP-surface issues found while reviewing PR #26 of foresight-app. #4: thread `repoPath` through reviewer subprocesses. `runReviewers` → `runReviewer` → `runReviewerHeadless` now accept the chat's repoPath and the reviewer's cwd switches to it when set, so `gh`, file reads, and sandboxed CLIs (Gemini) see the actual code instead of running in an empty per-reviewer scratch dir. #5: surface reviewer answer.md in MCP responses. 
New `readReviewerArtifacts` helper walks `~/.chorus/chats/<id>/round-N/reviewer-*/answer.md`, caps each at 16 KiB, sorts by (round desc, agent asc), and merges the result into `wait_for_chat` and `get_chat_status` payloads under `reviews`. Both the doer and reviewer `participant_done` events now carry `outputPath` so MCP clients can read the on-disk source of truth when they need more than the streamed tail. #6: bump phase_progress output tail from 500 B to 8 KiB. The 500-byte slice clipped reviewer summaries mid-word; full text remains on disk and is pointed to by `outputPath`. Affects both reviewer.ts and doer.ts. #7: tri-review verdict on `max_rounds_exhausted`. When the doer succeeded every round but reviewers kept saying request_changes through the round cap, chat_done now emits `status: completed, verdict: request_changes, reason: max_rounds_exhausted` with the last round's reviewer summary — previously misclassified as a generic doer failure. #8: refactor `CreateChatSchema` and `InvokePersonaSchema` to plain `z.object()` with per-field `.describe()`. The prior `.transform()` wrapped them in `ZodEffects` which strips the `properties` map from MCP introspection — clients saw an empty schema. Legacy `template` alias and the `code-review` default moved into a new `resolveTemplateId()` helper. #9: dedup `participant_done` at the multiplex layer. Same-slot fallbacks or parsers that emit `message_done` twice (the opencode parser historically does this) used to fan duplicate terminal events out to every subscriber; now keyed by `(phaseIdx, round, role, agent)` and later duplicates drop silently. #10: per-instance reviewer prompt isolation. Same-lineage instances (claude-code-2/4/5, etc.) share the chat dir tree at `~/.chorus/chats/<id>/round-N/reviewer-*/`; tool-using CLIs were wandering into a sibling's answer.md mid-flight and short-circuiting ("the review is complete" referring to a different agent's work). 
`buildReviewerAsk` now stamps an Independence directive when more than one reviewer slot exists, naming the slot tag and forbidding cross-slot reads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: replay chat_done from persisted verdict, not status The synthetic chat_done emitted when a terminal chat is re-attached derived `verdict` from `chat.status`, ignoring the `chat.verdict` column. Since the previous commit shipped the `max_rounds_exhausted` branch (chorus-issues.md #7), a chat can finish with `status='approved' verdict='request_changes'` — replay was clobbering that to `approved` on every page reload, hiding reviewer disagreement from the user. Use the persisted column when set; fall back to the old status-derived value only for pre-v0.8.27 rows where verdict is null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: surface dropped attached_files + SSE backpressure; harden ship.ts Three audit follow-ups on the daemon side, all surfacing previously silent failures. attached_files: parseAttachedFiles in runner-multiplex.ts used to swallow JSON parse errors and run the chat with no attachments. Refactor to a tagged result (`empty` / `ok` / `invalid`); on `invalid` the runner logs and emits a `cli_warning` SSE so the cockpit + MCP clients see which chat lost its file list. SSE backpressure: when a subscriber's queue exceeds the 1000-line cap the multiplex used to silently drop the connection. Now writes one `error` frame with code `sse_backpressure` before close, and logs the queue length to daemon.log so an operator tailing logs can see when clients fall behind. gh pr create URL validation: ship.ts captured stdout's last line as the PR URL with no shape check; an empty/malformed stdout produced `{ok: true, prUrl: ''}` and the chat row recorded "shipped" with an unclickable link. Now matches against `^https://github.com/<owner>/<repo>/pull/<n>` before declaring success. 
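The `gh pr create` URL shape check described above could look roughly like this. It is a sketch under assumptions: `parsePrCreateStdout` and the `ShipResult` type are hypothetical names, the regex is one plausible reading of `^https://github.com/<owner>/<repo>/pull/<n>`, and taking the last non-empty line is an illustrative choice rather than confirmed behavior of ship.ts.

```typescript
// ASSUMPTION: owner and repo are any run of characters without '/' or whitespace.
const PR_URL_RE = /^https:\/\/github\.com\/[^\/\s]+\/[^\/\s]+\/pull\/\d+/;

// Hypothetical result type; the real ship.ts return shape may differ.
type ShipResult =
  | { ok: true; prUrl: string }
  | { ok: false; reason: string };

/** Take the last non-empty stdout line and accept it only if it looks like a PR URL. */
function parsePrCreateStdout(stdout: string): ShipResult {
  const lines = stdout
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const last = lines[lines.length - 1] ?? "";
  if (!PR_URL_RE.test(last)) {
    // Empty or malformed output is a failure, never { ok: true, prUrl: '' }.
    return { ok: false, reason: "gh pr create did not print a pull request URL" };
  }
  return { ok: true, prUrl: last };
}
```

Returning a tagged result keeps the "shipped" state tied to a URL that actually matched, which is the failure mode the commit message describes.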
detectGitContext parallelization: the five spawnSync probes (is-repo, remote, gh --version, gh auth, HEAD) ran sequentially at 60s each — worst case 360s before runner saw a result. Converted to async with a new `runAsync` helper, batched via Promise.all with a 15s per-probe cap; detectDefaultBranch's symref + three branch-existence checks likewise parallelized. detectGitContext is now async; the lone caller in runner.ts awaits it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: bound failure-summary regex; log malformed SSE frames participant-card.tsx: parseFailureSummary ran the multi-step regex chain over the full participant.answer string. Reviewer answers can be up to 256 KB; on every render that's a UI-thread block. Slice to the first 16 KiB before scanning — the failure-header block is always written at the top of answer.md by reviewer.ts/doer.ts, so the cap never loses signal. live-run-real/index.tsx: the SSE onmessage handler already had a try/catch around JSON.parse, but the catch was silent — a wire-format mismatch dropped events with no trace. Add a console.warn with a preview so devs notice schema drift in DevTools. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: github PR ingestion via gh CLI Adds src/daemon/github-pr.ts: parsePrUrl + fetchPrArtifact run gh pr view/diff plus review and issue comments in parallel, synthesize a Markdown artifact (description, comments capped at 50 newest each, diff capped at 200 KB UTF-8 safe), and classify gh failures into typed reasons. Exports runAsync from ship.ts so the new module can reuse the existing spawn+timeout helper instead of duplicating it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: extract createChatFromValidatedInputs helper Pulls the template lookup, artifact validation, chat row + opening phase event creation, and runner kickoff out of the POST /chats handler into a reusable helper. 
POST /chats now only handles its route-specific concerns (body shape, repoPath canonicalization, error response shaping). Sets up reuse from the upcoming POST /chats/from-pr endpoint without duplicating ~150 lines of validation logic. No behavior change — same template checks, same artifact rules, same kickoff path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: POST /chats/from-pr — start a chat from a GitHub PR URL Accepts { url, templateId, repoPath?, yolo? }, parses the PR URL, fetches PR meta + diff + existing comments via gh CLI, synthesizes a Markdown artifact, and creates the chat through the shared createChatFromValidatedInputs helper. gh failures map to typed reasons (invalid_url, gh_not_installed, gh_not_authed, pr_not_found, network_failure, unknown) so the cockpit can render actionable errors instead of generic 500s. Adds tests/github-pr.test.ts covering parsePrUrl edge cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: cockpit "GitHub PR" tab on /new Adds a Free-form / GitHub PR mode toggle on the new-chat page. PR mode swaps the prompt textarea for a URL input and routes through the new POST /chats/from-pr endpoint. Validates client-side that the chosen template is review-only before letting the user submit. createChatFromPr API client surfaces the daemon's typed PR meta (owner/repo/number/title/branches) on the response so callers can display PR context after the chat is created. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: review_pr MCP tool Exposes POST /chats/from-pr through MCP. Orchestrators (Claude Code, Codex, Cursor) can now hand chorus a PR URL and get reviewers running against it without going through the cockpit. Defaults templateId to review-only so a caller can pass just a URL. ReviewPrSchema is a plain z.object (not ZodEffects) so MCP clients can introspect required fields — same hazard documented on CreateChatSchema and InvokePersonaSchema. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: capture multi-identity CLI follow-up idea Idea note for running chorus against multiple paid accounts on the same CLI binary (work + personal Claude Code Max, etc.). Filed as follow-up after audit-presets + quota tiers ship — captures the env-override mechanism, proposed Identity primitive, and open questions on keychain CLIs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: schema for audit + orchestrate phases, voice tier, bypass_quota Adds the foundation for repo-pointed audit-and-orchestrate runs and the orchestrator's task↔voice tier matching. Template schema: - AuditPhase (kind: 'audit') — single reviewer voice + one of five preset lenses (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review). Output schema (AuditItemSchema, AuditOutputSchema) lives next to the phase shape so the structured-output adapter, scheduler, and cockpit checklist agree on the contract. - OrchestratePhase (kind: 'orchestrate') — array of worker voices, default branchPrefix `chorus/{chatId}/worker-{idx}` so each worker gets isolated git state. - templateRequiresRepo() helper for the cockpit's repo-picker gate. Voices: - Adds tier ('high' | 'medium' | 'low', default 'medium') and monthly_budget_usd (nullable) to the row schema, upsert input, and update input. Idempotent migrations on existing DBs. Chats: - bypass_quota INTEGER NOT NULL DEFAULT 0 — set on PR-review chats so the orchestrate scheduler runs every enabled voice at full capacity instead of tier-gating. Runner is stubbed for the new kinds: phase_done emit + continue, so templates that declare an audit/orchestrate phase before the runner logic lands don't crash. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: structured-output adapter for CLI voices Wraps an AgentShim's runHeadless with JSON-formatting prompt scaffold and a one-shot repair loop, returning typed data validated against a caller-supplied zod schema. Used by the upcoming audit phase (which needs typed AuditItem[] instead of free-form prose) and the orchestrate phase (worker results). Keeps each CLI lineage's existing headless transport — the adapter just owns the prompt-shape + parse-and-validate dance. Extraction strategy: prefer direct JSON.parse of finalText; fall back through fenced-block regex variants to a brace-to-brace slice. On parse or schema-violation, retry once with a repair prompt that quotes the validation error. Spawn errors short-circuit (the model never saw the prompt — repair would just retry the same failure). Tests cover happy path, fenced-block extraction, repair-loop success, repair-loop exhaustion, schema violation, and spawn error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cockpit): audit-a-repo tab + checklist approval component /new gets a third tab beside Free-form and GitHub PR. In audit mode the user picks one of five preset lenses (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review) and supplies an absolute repo path. Submit fires createChat with templateId=`audit-<preset>` — those built-in templates land with the audit-phase implementation. RunChecklist component lives at src/components/run-checklist/. It takes the AuditItem[] surfaced by the audit phase's blocking event and renders one row per item with a checkbox, complexity badge, rationale, and file list. Default state has every item selected; the user trims, then submits via the parent's onSubmit which JSON-encodes the selected ids into the existing /chats/:id/resume `answer` field. Wiring into the live-run UI lands with the audit phase. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: PR-review chats bypass quota + tier surface on /voices PR-review chats automatically set bypass_quota=true so the orchestrate scheduler ignores voice.tier and runs the full fleet at maximum capacity — reviews are short, parallel, and the user wants the strongest opinion possible regardless of model tier. PUT /voices/:id now accepts tier ('high' | 'medium' | 'low') and monthly_budget_usd (non-negative or null), so the cockpit fleet page can label voices by capability for the orchestrate scheduler to route work against. Tests cover both new fields plus a chat round-trip asserting bypass_quota defaults false and persists when set. * feat: audit phase + 5 presets + audit-* templates Wires the audit phase end-to-end: - src/daemon/phases/audit.ts runs the structured-output adapter against the chosen preset, persists the parsed AuditItem[] to <chatDir>/audit-output.json plus raw model output to round-1/audit/output.md, and emits phase_progress with the items. - src/daemon/runner.ts replaces the audit/orchestrate stub: audit invokes runAuditPhase, flips chat status to blocked so the cockpit renders the checklist UI, and exits cleanly. Orchestrate keeps the no-op stub until step 5 lands. - 5 preset prompts (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review) frame what each lens looks for. The structured-output adapter handles JSON formatting; presets describe the audit lens only. - 5 audit-* templates (one per preset), each a 2-phase audit -> orchestrate shape with three default workers. Auto-loaded by seedBuiltinTemplates. - tests/audit-phase.test.ts covers preset-file presence and the audit-* template parse + shape contract. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: orchestrate phase + audit-resume wiring + tier-aware scheduler Wires the audit→orchestrate handoff: the cockpit POSTs the user's trimmed audit checklist to /chats/:id/resume, the resume handler cross-checks ids against audit-output.json, persists the selection, flips chat to drafting on the orchestrate phase, and re-fires the runner. The runner now starts at chat.current_phase_idx so a resumed chat lands directly on orchestrate. The new orchestrate phase walks the approved AuditItem[] sequentially (parallelism is an explicit non-goal for v1), picks a worker per item via the pure tier-aware scheduler, cuts a per-item branch, dispatches the worker via shim.runHeadless, captures git diff --stat, and persists orchestrate-manifest.json for the diff-apply UI to consume. The scheduler is a pure function with 9 unit tests covering tier matching, bypass override, disabled-voice skipping, empty pool, and unknown voice ids. Resume route has 10 tests exercising body validation, id cross-check, status gating, and the happy path. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: orchestrate manifest UI + checkout/open-pr daemon routes - Run page reads audit-output.json + orchestrate-manifest.json on render - LiveRunReal renders RunChecklist while blocked w/ audit items, then swaps to OrchestrateManifest panel once orchestrate completes - New OrchestrateManifest component shows one row per worker w/ Checkout / Open PR buttons (per-row inline feedback, no global toast) - Daemon: GET /chats/:id/audit-items, GET /chats/:id/orchestrate-manifest, POST /chats/:id/workers/:idx/checkout (refuses on dirty tree), POST /chats/:id/workers/:idx/open-pr (gh pr create, bucketed failures) - OrchestrateManifestSchema added to template-schema.ts; route + UI parse via the same shape Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: harden resume race + branch validation + symlink TOCTOU + extractJson Address /freview findings on the audit + orchestrate flow: - Resume race (BLOCKER): two concurrent POSTs to /chats/:id/resume could both pass the `status=='blocked'` check and double-fire the runner. Guard with `getActiveRun` (catches the audit-finishing window before `.finally` clears the registry) and replace the status flip with an atomic `tryResumeFromBlocked` CAS conditional on `WHERE status = 'blocked'`. - Branch-name argument injection (BLOCKER): tighten zod regexes on `OrchestratePhase.branchPrefix` and `OrchestrateManifestEntry.branch` so values starting with `-` (or containing shell metachars) cannot flow into `git checkout` / `gh pr create` as flags. - Symlink TOCTOU on checkout + open-pr (NON-BLOCKER): re-realpath `existing.repo_path` before passing to execFile cwd, mirroring the rerun-path pattern. Returns a structured validation error if the path no longer resolves. 
- extractJson Path 4 (NON-BLOCKER): try `{...}` and `[...]` slices independently and prefer the longer parse, so prose like "mentions [stuff] before {object}" extracts the object instead of the bracket. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: prod CJS build — drop import.meta + copy presets to dist Two issues blocked `pnpm build:server`: - `audit.ts` used `import.meta.url` for module-relative path resolution, but the server tsconfig compiles to CJS where `import.meta` is a syntax error. Replaced with `__dirname`, which works in both the compiled dist (native CJS) and tsx-driven dev (tsx ≥4 shims it in ESM mode). - The `build:server` script copied `schema.sql` to dist/ but missed the preset markdown files in `src/daemon/presets/`. The audit phase's `loadPresetPrompt` resolves relative to `__dirname`, so a published install was hitting ENOENT on every audit run. Extended the copy step to mirror the preset directory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: fold upstream T1+T2 fixes back into fork (12 commits) (#2) * feat(cli): add diagnose command + crash-hook Bundles two upstream changes that ship a self-service triage path for chorus users hitting opaque failures: - `chorus diagnose` walks the install, daemon, recent failed chats, voice health, and produces a sharable bug report. - Crash hook captures uncaught exceptions in the CLI and writes them to a crash log alongside instructions to attach during a bug report. Folded back from upstream chorus-codes/chorus: 7ea712b feat: chorus diagnose command + crash hook for bug reports (#1) 4a5ea20 fix(diagnose): realpath bin path + filter Next.js SSE noise (#4) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(cli): add quickstart self-test command `chorus quickstart` runs a 30-second activation flow that verifies the daemon comes up, the SQLite DB initializes, and a minimal chat round-trips end-to-end. 
Aimed at first-run users who want to know "is this thing actually working" before authoring a template. Folded back from upstream chorus-codes/chorus: 56610cf feat(cli): chorus quickstart — 30-second activation self-test (#30) Co-Authored-By: chorus-codes <info@chorus.codes> * fix(cli): use dynamic import for open package (Node 22 ERR_REQUIRE_ESM) The `open` package and `chokidar` are both ESM-only as of recent versions. On Node 22 (the daily-driver target) static `require()` calls into them throw ERR_REQUIRE_ESM and crash the CLI at boot. Switch to dynamic import in: - src/cli/commands/start.ts (open browser after boot) - src/cli/open-browser.ts (new helper) - src/cli/index.ts (route open import) - src/daemon/output-watcher.ts (chokidar file watch) Includes upstream's post-merge hardening: the setTimeout that triggers the browser-open no longer wraps an async callback bare, so a missing default browser doesn't surface as an unhandled rejection. Folded back from upstream chorus-codes/chorus: e8ca2ee fix(cli): dynamic import for open package (#14) dcd1837 fix: post-merge hardening for #14 (start.ts portion only; cli-precheck.test.ts portion ships with the Keychain fix) Co-Authored-By: Julien Deudon <deudon.j@gmail.com> Co-Authored-By: chorus-codes <info@chorus.codes> * feat(cockpit): seed empty round-1 so QUEUED renders from t=0 Before: when a chat starts but no reviewer has produced an event yet, enrichRounds returned an empty rounds array and the live-run page showed nothing for several seconds — the user couldn't tell whether their chat had launched. After: seed a synthetic round-1 with QUEUED placeholders for every expected participant so the page renders the per-reviewer cards immediately. Real events overwrite placeholders as they arrive. 
Folded back from upstream chorus-codes/chorus: 53e8fb6 feat(cockpit): seed empty round-1 so QUEUED placeholders render from t=0 (#2) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(daemon): runtime fallback-collision dedup across reviewer slots When two reviewer slots both fall through their per-slot chains to the same template-level fallback target (common case: every slot ends in anthropic/claude-sonnet-4-6), both used to dispatch the same (lineage, model) in parallel — wasted cost and the lineage diversity that's the point of multi-LLM peer review collapsed. Build-time dedup (template-fallback.ts) couldn't catch it because each slot only knows about other slots' PRIMARIES, not their fallback chains. Fix: new per-chat/per-round (lineage, model) registry. reviewer-driver tryClaim's before each chain attempt and releases in a finally. On collision, return null + emit cli_warning(reason='fallback_collision') so runWithChainFallback advances to the next entry and the cockpit can show why the slot skipped. Ported into fork's reviewer-driver.ts surgically so the verdict-isolation refactor (2a2cde2) and per-slot repoPath threading stay intact. Folded back from upstream chorus-codes/chorus: c4751fe feat(daemon): runtime fallback-collision dedup (#3) Co-Authored-By: chorus-codes <info@chorus.codes> * fix(daemon): write REVIEWER FAILED summary on pre-spawn failure Before: when a reviewer's precheck fails (e.g. underlying CLI not installed) or the chat is cancelled while the slot is queued for a CLI semaphore slot, runReviewer used to return null silently — leaving NO on-disk participant directory. The cockpit's enrich-rounds loop then couldn't reconcile the synthesised template slot against any real participant, so the card sat at "Queued — waiting for an open slot." forever and the actual error was invisible. Reproduction: install chorus on a host with only one CLI on PATH (e.g. 
just claude-code), open a template that includes lineages requiring codex/gemini/kimi, fire it. Every reviewer card stayed "Queued" — chat never visibly progressed even though it was already done failing. Fix: - Create the reviewer dir BEFORE the precheck runs. - Add a writePreSpawnFailure helper that writes a `## REVIEWER FAILED` summary in the canonical format (Kind / Lineage / Model / message) that the cockpit's `parseFailureSummary` already understands. - Wire it into the precheck-failed and cancelled-while-queued paths. Card now transitions out of pending and shows the actual error (cli_missing, cancelled, ...). Folded back from upstream chorus-codes/chorus: afc59cc fix(daemon): REVIEWER FAILED summary on pre-spawn failure (#26) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(voices): auto-disable on persistent quota_exhausted + lsof timeout Real pain (upstream #11): a Pro Gemini model on a Flash-only account fails every chorus run with "exhausted your capacity on this model" — but Gemini doesn't return a resetAt because the model isn't going to become available for that account. Without auto-disable, the runner keeps picking the dead voice on every chat and the user keeps seeing the same opaque error. Voice auto-disable: - New src/lib/voice-failure-tracker.ts records per-voice consecutive quota_exhausted strikes in a settings counter. - Trigger: 2 consecutive strikes WITH no resetAt → set voices.enabled=false + disabled_reason='auto_quota'. - Counter resets on participant_done success; rate-limit strikes (hasResetAt=true) bypass the counter entirely so a transient 429 + a later permanent failure can't trip the threshold on the first permanent strike. - Wired into reviewer-driver alongside recordHealth; emits a cli_warning(reason='voice_auto_disabled') so the cockpit can show a one-line explanation. - VoiceDisabledReason union gains 'auto_quota' (schema column was already TEXT — no migration). 
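The strike-counting rule above can be sketched as a small state machine. This is an illustration only: the commit describes a settings-backed counter in `voice-failure-tracker.ts`, while this sketch keeps counts in an in-memory `Map`, and the `Outcome` type and class name are hypothetical.

```typescript
// Threshold taken from the description: 2 consecutive strikes with no resetAt.
const STRIKE_THRESHOLD = 2;

// Hypothetical outcome shape for illustration.
type Outcome =
  | { kind: "success" }
  | { kind: "quota_exhausted"; hasResetAt: boolean };

class VoiceStrikeCounter {
  // ASSUMPTION: in-memory storage; the real tracker persists its counter in settings.
  private strikes = new Map<string, number>();

  /** Record one run outcome. Returns true when the voice should be auto-disabled. */
  record(voiceId: string, outcome: Outcome): boolean {
    if (outcome.kind === "success") {
      // A successful run resets the consecutive-strike counter.
      this.strikes.delete(voiceId);
      return false;
    }
    if (outcome.hasResetAt) {
      // Transient rate limit: bypasses the counter entirely.
      return false;
    }
    const next = (this.strikes.get(voiceId) ?? 0) + 1;
    this.strikes.set(voiceId, next);
    return next >= STRIKE_THRESHOLD;
  }
}
```

Because rate-limited failures never touch the counter, a transient 429 followed by one permanent failure still counts as the first strike, matching the behavior the commit calls out.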
Lsof timeout (upstream #12): - findPidsOnPort and findPidsOnPortWithSudo now bound execSync / execFileSync to 3s, so a slow-but-functional lsof on a loaded macOS box doesn't hang chorus boot. 3s leaves headroom while still bounding the hang case. Ported into fork's reviewer-driver.ts tmux pollHandle + success path. voices.ts disabled_reason union extended alongside fork's voice-tier column. Folded back from upstream chorus-codes/chorus: 4f6becc v0.8.30 — voice auto-disable (#11) + lsof timeout (#12) (#17) Co-Authored-By: chorus-codes <info@chorus.codes> Co-Authored-By: Lumina Mao <luminamao@mac.lan> * fix(daemon, schema): codex isolation + template-schema validation Two issues caused chats to fail opaquely at run-start: CODEX ISOLATION (#10, #16) The user's ~/.codex/config.toml may declare MCP servers, plugins, or notification hooks. In headless `codex exec` those integrations have caused codex to hang or cancel mid-call — two independent reproductions: codex as our reviewer (#10) and codex as MCP client of chorus (#16). Add --ignore-user-config to every headless codex argv. Extracted to a pure `buildHeadlessArgs(opts)` so the argv shape is unit-testable. TEMPLATE VALIDATION (#15) `reviewer.require > candidates.length` used to surface as "Job moves immediately to failure upon Start press" — the runner queued, failed to grant enough slots, and emitted an opaque chat-failure. Same for `require > distinct lineages` when crossLineage:true. Both now caught at TemplateSchema.parse() time with a clear error message the user can fix before the run starts. ReviewerSchema.superRefine() additions slot in cleanly alongside the fork's audit/orchestrate phase schema work — both are additive constraints on the same ReviewerSchema object. 
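
The two template checks above can be sketched as a plain function. This is an illustration only: the actual change lives in ReviewerSchema.superRefine(), while the sketch below avoids a Zod dependency and uses a hypothetical `validateReviewerRule` helper and a hypothetical `Candidate` shape with a `lineage` field.

```typescript
// Hypothetical shapes; only `require`, `candidates`, and `crossLineage`
// are named in the commit message.
interface Candidate {
  lineage: string;
}

interface ReviewerRule {
  require: number;
  candidates: Candidate[];
  crossLineage?: boolean;
}

// Returns a list of human-readable problems; an empty list means valid.
function validateReviewerRule(rule: ReviewerRule): string[] {
  const errors: string[] = [];

  // Check 1: cannot require more reviewers than there are candidates.
  if (rule.require > rule.candidates.length) {
    errors.push(
      `require (${rule.require}) exceeds candidates (${rule.candidates.length})`,
    );
  }

  // Check 2: with crossLineage, each required reviewer needs its own lineage.
  if (rule.crossLineage) {
    const distinct = new Set(rule.candidates.map((c) => c.lineage)).size;
    if (rule.require > distinct) {
      errors.push(
        `require (${rule.require}) exceeds distinct lineages (${distinct})`,
      );
    }
  }

  return errors;
}
```

Running both checks at parse time is what lets the error reach the user before the run is queued, rather than as a chat failure after Start.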
Folded back from upstream chorus-codes/chorus: 8ed970b fix(daemon, schema): codex isolation + template validation Co-Authored-By: chorus-codes <info@chorus.codes> * fix(runner): honour iterate.onDisagreement accept-doer/escalate The template schema, cockpit dialog, and SPEC-D-templates have always exposed three values for iterate.onDisagreement — 'continue', 'escalate', 'accept-doer' — but the runner only honoured 'continue'. Picking the other two from the cockpit form was a silent no-op: chats fell through to phase_failed with 'doer_failed_all_rounds' regardless. This wires both new branches into the round loop and the terminal chat_done emission: - 'accept-doer': after maxRounds without consensus, mark doerSucceeded and continue. The chat carries on (subsequent phases, ship, approval) as if reviewers had agreed on the doer's last answer. - 'escalate': halt with status='failed' but verdict='request_changes' and error='escalated_on_disagreement', so cockpits can render "reviewers disagreed, needs human" distinctly from "doer broke." Policy table extracted into a pure decidePhaseOutcome() helper so the 3 × 2 input matrix (policy × disagreement-in-last-round) is unit-tested without standing up the full runChat scaffold. Gated on disagreementInLastRound (reset at top of every round + on doer-crash path) so a partial / empty doer answer can never be silently "accept-doer"'d as final. Preserves the fork's existing standardPhaseRoundsExhausted #7 surfacing for the 'continue' path; the 'escalate' path takes precedence with its own distinct chat_done. Upstream PRs #49, #50 (commit 67572e9). Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(cli-precheck): cover macOS Keychain fallback for Claude Code v2 The fork already implements the Keychain fallback in cli-precheck (hasDarwinKeychainEntry). 
This adds the missing test coverage: - passes when no cred file but keychain entry exists - blocks when no cred file and no keychain entry - skips keychain check when cred file exists (fast-path preserved) - does not consult keychain for non-anthropic lineages vi.mock('node:child_process') uses the importOriginal spread pattern so spawn / exec / etc. keep their real implementations — a bare module replacement would silently break any sibling test that imports from child_process. Upstream PRs #7, #8, plus the dcd1837 test-mock hardening. Co-Authored-By: Yura <yurahalych@gmail.com> Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cockpit): derive candidatesWithModels from snapshot's candidates field Daemon-side TemplateSchema only carries `candidates` on each ReviewerRule. The cockpit Template type expects `candidatesWithModels` populated — enrich-rounds iterates that field to build slot→model mappings for run-page cards. When fromRow parsed template_snapshot and cast it to Template, the cast was a TypeScript lie: at runtime the parsed object lacked candidatesWithModels, enrichRounds iterated zero reviewer slots, and no model name reached the cards (badge appeared empty). Derive candidatesWithModels at the parse seam (chats.fromRow) so the cockpit's Template contract is honoured regardless of which path produced the data. Idempotent — if a future daemon ever serialises the field directly, that wins. Persona forwarded if present. Audit- phase single-voice reviewers (no candidates array) are skipped via a runtime narrow. Upstream PR #6 (chorus-codes/chorus@ac0c7fd). 
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(diagnose): capture failure context — CLI smoke, voice health, recent failed chats Extends `chorus diagnose` with three signals that triage the most common breakage modes: - **CLI smoke**: spawn `<bin> --version` per detected CLI with a hard 2s SIGKILL timeout (wrapper scripts may trap SIGTERM). Distinguishes `timedOut` from non-zero exit so the report can tell hangs apart from crashes. - **Voice health**: counts `enabled=0` voices grouped by `disabled_reason` ('user' vs 'auto_missing' vs 'quota_exhausted'). Added `idx_voices_enabled` so the `WHERE enabled = 0` scan stays cheap as the table grows. - **Recent failed chats**: last 5 chats with `status='blocked'` plus the errored participants pulled from `~/.chorus/chats/<id>/round-*/<part>/_attempts.jsonl`. Only `errorMessageBytes` is exposed — raw error text never leaves the user's machine. `$HOME` is redacted from any embedded path strings via `redactHomePaths`. Adapted from upstream chorus-codes/chorus#19 (0666dca). Preserves the fork's existing diagnose shape and adds tests for smokeOneCli / readLatestAttempt / formatReport rendering of the three new sections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(diagnose): include no_review in recent failed chats query The recent-failed-chats section was meant to surface per-participant failure context from `_attempts.jsonl`, but the WHERE clause only covered 'failed', 'blocked', 'cancelled'. The most common failure shape — every reviewer down for missing CLI / auth / quota — ends the chat in 'no_review', which was being silently filtered out. So the exact case the section exists to diagnose returned an empty list, forcing users back into manual log collection. 
Adds 'no_review' to the IN-list and a regression test that asserts both the status and a quota_exhausted errorKind render in the report. Addresses chatgpt-codex review P2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: chorus-codes <info@chorus.codes> Co-authored-by: Julien Deudon <deudon.j@gmail.com> Co-authored-by: Lumina Mao <luminamao@mac.lan> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Yura <yurahalych@gmail.com> * feat: fold upstream Grok + Local LLM + Keychain dual-probe (4 commits) (#3) * feat(grok): detect Grok Build (xAI) + Level 1 orchestrator Adds Grok Build CLI to detection, onboarding picker, /connect card, diagnose smoke, init listing, and doctor labels. Grok auto-picks chorus MCP from ~/.claude.json (verified empirically via `grok inspect`) — no separate MCP wire needed. The grok orchestrator reports connected=true when both the binary is detected AND chorus is wired in ~/.claude.json (either top-level mcpServers or any project-scoped mcpServers entry). connect() is a no-op that points users at `chorus connect claude` if claude hasn't been wired yet. Quickstart filters CLIs to those with shims, so grok-cli being detected first no longer breaks the doer-pick flow. The cliToLineage map remains the source of truth for reviewer-capable CLIs. `docs/integrating-a-new-cli.md` captures the full Level 1/2/3 integration playbook for future CLIs — written while doing this so the steps are tested. Adapted from upstream chorus-codes/chorus#44 (6a00b00). No conflicts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(local): add Local LLM HTTP shim for OpenAI-compatible endpoints Adds a `local` lineage that dispatches chat completions to any OpenAI-compatible HTTP endpoint (Ollama, llama-swap, LM Studio, vLLM, or anything that speaks `/v1/chat/completions`). 
No external subscription or CLI binary required — only a running local inference server. Configuration: save a JSON secret under key `local` via Settings → Local LLM: {"base_url": "http://127.0.0.1:11434/v1", "api_key": ""} Model ids may use a `local:` prefix (e.g. `local:llama3`) which the shim strips before dispatch, or bare model names directly. When no secret is saved, falls back to Ollama's default port. Wiring sweep (extends every exhaustive enum / Record so templates can declare local voices without Zod errors): - src/daemon/agents/local.ts — new HTTP shim with JSON.parse guard on the secret (yields a typed `config_parse` error event for malformed secrets instead of throwing inside the generator) - src/daemon/agents/index.ts — register localShim, `local:` prefix routing in pickShimForVoice, add to isHttpDispatchedShim - src/daemon/agents/types.ts — `local` in Lineage - src/lib/template-schema.ts — `local` in both lineageEnum + reviewerLineageEnum - src/lib/cli-health.ts — `local` in CliLineage + ALL_LINEAGES - src/lib/cli-precheck.ts — empty CRED_PATHS, LOGIN_HINT, skip the file probe (same pattern as openrouter — auth lives in secrets table) - src/lib/cockpit-types.ts — `local` in ReviewerLineage - src/lib/lineage-maps.ts — `local` in DaemonLineage, UILineage, every label/dot/brand map; UI_LINEAGE_DEFAULT_MODEL[local] = "" (model IDs are endpoint-specific). Teal dot distinguishes local from openrouter's cyan - src/components/phase-editor/constants.ts — LINEAGES list, DAEMON_TO_COCKPIT_LINEAGE - src/components/template-dialog/constants.ts — COCKPIT_TO_DAEMON, DAEMON_TO_COCKPIT, DAEMON_DEFAULT_MODEL, FALLBACK_LINEAGES Adapted from upstream chorus-codes/chorus#41 (716fa3a). The bundled upstream commit also included Keychain dual-probe (#38) and fallback-registry hold-on-success (#42) — those land in follow-up commits in this PR so each concern is reviewable independently. 
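
A rough sketch of the secret parsing and `local:` prefix handling described for the Local LLM shim. The function names (`parseLocalSecret`, `stripLocalPrefix`) and the result type are assumptions for illustration; only the `base_url` / `api_key` keys, the `local:` prefix, the `config_parse` error kind, and the Ollama default come from the commit message.

```typescript
// Ollama's default endpoint, used when no secret is saved.
const DEFAULT_BASE = "http://127.0.0.1:11434/v1";

type LocalConfig = { base_url: string; api_key: string };

type ParseResult =
  | { ok: true; config: LocalConfig }
  | { ok: false; error: { kind: "config_parse"; message: string } };

// Parse the JSON secret saved under key `local`. Malformed input yields a
// typed error value instead of throwing.
function parseLocalSecret(raw: string | null): ParseResult {
  if (raw === null) {
    return { ok: true, config: { base_url: DEFAULT_BASE, api_key: "" } };
  }
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return {
        ok: false,
        error: { kind: "config_parse", message: "secret must be a JSON object" },
      };
    }
    const rec = parsed as Record<string, unknown>;
    return {
      ok: true,
      config: {
        base_url: typeof rec.base_url === "string" ? rec.base_url : DEFAULT_BASE,
        api_key: typeof rec.api_key === "string" ? rec.api_key : "",
      },
    };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: { kind: "config_parse", message } };
  }
}

// `local:llama3` and `llama3` should both dispatch as `llama3`.
function stripLocalPrefix(modelId: string): string {
  const prefix = "local:";
  return modelId.startsWith(prefix) ? modelId.slice(prefix.length) : modelId;
}
```

Returning an error value keeps the failure inside the shim's normal event stream, which matches the stated goal of not throwing inside the generator.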
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Greg <7xshadowx7@gmail.com> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> * feat(grok): Level 3 shim — full reviewer dispatch (happy-path unverified) Promotes Grok Build from Level 2 (consumer-only) to Level 3 (full reviewer shim). Chorus can now dispatch to grok-build as a doer or reviewer in any template. What's verified (empirically): - Detection, headless-mode invocation pattern (`grok -p ... --output-format streaming-json --yolo --max-turns 1`), error event shape, exit-code semantics - Failure path: free-tier auth produces clean quota_exhausted (SuperGrok Heavy subscription required) → voice auto-disables after N strikes - All UI surfaces (model boxes, template-editor lineage picker, run-page participant card, cli-status-panel, onboarding picker, connect orchestrator) What's specced but not run live (needs SuperGrok Heavy): - Happy-path streaming-json text/end event parsing (followed `~/.grok/docs/user-guide/13-headless-mode.md` spec) - Token/cost accounting — Grok doesn't surface usage in end event; estimateCostUsd returns 0 New files: - src/daemon/agents/grok.ts — shim with `--max-turns 1` headless args - src/daemon/agents/parsers/grok.ts — streaming-json + stderr parser - tests/grok-parser.test.ts — 18 cases covering happy / error / robustness Lineage sweep (xai daemon lineage was already a legacy alias to opencode — uses fresh `grok` daemon lineage to avoid colliding with that mapping; old YAML with `lineage:xai` still routes to opencode): - Lineage / CliLineage / ReviewerLineage / DaemonLineage / UILineage - LINEAGE_LABEL / LINEAGE_DOT / UI_LINEAGE_* / UI_LINEAGE_BRAND - UI_LINEAGE_AVAILABLE_MODELS.grok = ['grok-build'] - UI_LINEAGE_DEFAULT_MODEL.grok = 'grok-build' - template-schema lineageEnum + reviewerLineageEnum - DB voices row schema (additive — old rows still validate) - phase-editor LINEAGES + DAEMON_TO_COCKPIT_LINEAGE - 
template-dialog COCKPIT_TO_DAEMON + DAEMON_TO_COCKPIT + DAEMON_DEFAULT_MODEL + FALLBACK_LINEAGES - cli-status-panel + live-run-real helpers - error-detector auth-prompt regex (SuperGrok signature on its own branch ABOVE the generic auth regex — classifies to quota_exhausted, not auth_invalid) Voice seeding: grok-cli registered in SINGLE_MODEL_CLIS — auto- creates the grok-cli voice (id=grok-cli, lineage=grok, model_id=grok-build) on first daemon boot when the binary is detected. Auth flow: ~/.grok/auth.json file probe OR GROK_CODE_XAI_API_KEY env short-circuit. Both verified in tests/cli-precheck.test.ts. Daemon won't spawn grok without one or the other present — prevents the browser-OAuth flow from hanging headless dispatch. Total tests: 821 → 842 (+21). Adapted from upstream chorus-codes/chorus#46 (f9dfba5). Conflicts resolved by taking the union of fork's `local`-extended enums and upstream's `grok`-extended enums (every Record / z.enum had to be extended in both dimensions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> * fix(cli-precheck): macOS Keychain dual-probe — also check "Claude Code" service Claude Code v2.x stores OAuth credentials under two service names depending on the auth flow: - `Claude Code-credentials` — Pro/Max OAuth via `claude login` - `Claude Code` (no suffix) — API-key auth + some Console-account flows The previous single-service probe regressed to auth_missing for API-key users on darwin. Refactor hasDarwinKeychainEntry to accept string | string[], iterate candidates, short-circuit on first match. Each probe stays bounded to 1.5s so a misconfigured keychain can't stall every spawn. Refs upstream issue #38 / commit 716fa3a. 
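
A minimal sketch of the dual-probe shape described above: accept one service name or several, try each in order, and stop at the first hit. The `KeychainProbe` callback is an assumption made so the sketch runs on any platform; the real hasDarwinKeychainEntry presumably queries the macOS Keychain itself, which is not reproduced here.

```typescript
// Hypothetical injected probe: returns true when the Keychain has an entry
// for `service`, giving up after `timeoutMs`.
type KeychainProbe = (service: string, timeoutMs: number) => boolean;

// Each probe stays bounded to 1.5s, per the commit message.
const PROBE_TIMEOUT_MS = 1500;

function hasKeychainEntry(
  services: string | string[],
  probe: KeychainProbe,
): boolean {
  const candidates = Array.isArray(services) ? services : [services];
  // Array.prototype.some stops at the first candidate that matches.
  return candidates.some((service) => probe(service, PROBE_TIMEOUT_MS));
}
```

A caller would pass both service names, for example `hasKeychainEntry(["Claude Code-credentials", "Claude Code"], probe)`, so API-key users are found by the second probe.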
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: PR review — local in voices enum, AGENT_TO_LINEAGE for grok/local, separate cred-precheck vs semaphore bypass Addresses bot review on PR #3: - Sourcery P2 (src/lib/db/voices.ts): VoiceRowSchema and VoiceUpsertInput only allowed `grok` in the new-lineage slot; `local` voices upserted via the (future) Local LLM connect flow would have failed Zod validation at runtime. Add `local` to both the enum and the union. - Codex P2 (src/app/api/run-artifacts/[chatId]/route.ts + src/app/runs/[runId]/page.tsx): AGENT_TO_LINEAGE did not map `grok-cli` → `grok` nor `local` → `local`, so a real Grok or Local participant directory (`reviewer-grok-cli-N`, `reviewer-local-N`) resolved to a bogus lineage and rendered as an unbranded extra card while the placeholder slot stayed pending. - Codex P2 (src/daemon/agents/index.ts + src/daemon/runner/{doer,reviewer}-driver.ts + src/lib/settings/concurrency.ts): the daemon used a single predicate `isHttpDispatchedShim` for two unrelated decisions — bypassing the CLI-credential precheck AND bypassing the local-CLI semaphore. That was safe for OpenRouter (truly remote) but wrong for the Local LLM shim, whose default endpoint is Ollama on 127.0.0.1: N concurrent reviewers + a doer can thrash VRAM/RAM on consumer hardware. Split into `isHttpDispatchedShim` (kept for cred-precheck bypass) and `bypassesLocalCliSemaphore` (only openrouter). Add `grok-cli` and `local` to CLI_LINEAGES with conservative per-CLI defaults (grok-cli matches gemini at 2; local defaults to 1, bump in /settings if your endpoint multiplexes). Tests: 845 pass (unchanged), typecheck clean. * fix: PR review — CodeRabbit pass (docs/Grok level, init+quickstart+local edges, regex, tests) Addresses CodeRabbit's first batch of review comments on PR #3: - docs/integrating-a-new-cli.md: contradictory level for Grok — line 3 said "detection-only", line 15 said level 2, line 302 said level 3. 
Normalize to level-3 (the shim ships in this PR) and note that the level-2 orchestrator coexists for the consumer-side wiring. - src/cli/commands/init.ts: `--connect grok` was rejected because the local Name union, ALL_NAMES list, and the `--connect` option help text omitted 'grok' even though detection labels and OrchestratorName already accepted it. Adding 'grok' to all three. - src/cli/commands/quickstart.ts: the "install one of …" guidance printed when no CLIs are detected still listed only 5 — extend to Grok CLI to match the dispatchable set. - src/daemon/agents/local.ts: * Empty `base_url` (e.g. user saved settings with an empty box) was passed through `??` as the URL and surfaced as an opaque fetch error; treat empty / whitespace-only as unset and fall back to DEFAULT_BASE. Strip trailing slashes while at it. * Trailing SSE payload was dropped when the server closed without a final blank-line delimiter (older Ollama, some vLLM configs) — the last text_delta could silently disappear, truncating answers. Extract event-dispatch + payload-extract into local helpers and flush the residual buffer after the read loop exits. - src/lib/cli-detect.ts: grok regex documented "name OR bare-version" but only matched the name. Add the bare-version alternative; the basename guard already prevents cross-vendor matches. - tests/grok-parser.test.ts: 4 cases narrowed event[0] under `if (events[0].type === 'error')` without a prior `expect(...).toBe` on type — a non-error event silently skipped the inner assertions. Add explicit type expectations before the narrowing. Tests: 845 pass (unchanged), typecheck clean. 
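
The two local.ts fixes listed above (empty `base_url` handling and the dropped trailing SSE payload) can be sketched as follows. Both helper names (`normalizeBaseUrl`, `extractSsePayloads`) are hypothetical, the chunk array stands in for a streamed response body, and the sketch only handles `\n` line endings.

```typescript
const DEFAULT_BASE = "http://127.0.0.1:11434/v1";

// Treat empty / whitespace-only values as unset and strip trailing slashes.
function normalizeBaseUrl(raw: string | null | undefined): string {
  const trimmed = (raw ?? "").trim();
  if (trimmed === "") return DEFAULT_BASE;
  return trimmed.replace(/\/+$/, "");
}

// Collect the `data:` payloads from a sequence of streamed text chunks.
function extractSsePayloads(chunks: string[]): string[] {
  const payloads: string[] = [];
  let buffer = "";

  // Pull every `data:` line out of one complete event.
  const dispatch = (event: string): void => {
    for (const line of event.split("\n")) {
      if (line.startsWith("data:")) {
        payloads.push(line.slice("data:".length).trim());
      }
    }
  };

  for (const chunk of chunks) {
    buffer += chunk;
    // Events are delimited by a blank line.
    let idx = buffer.indexOf("\n\n");
    while (idx !== -1) {
      dispatch(buffer.slice(0, idx));
      buffer = buffer.slice(idx + 2);
      idx = buffer.indexOf("\n\n");
    }
  }

  // Flush the residual buffer: some servers close the stream without a
  // final blank-line delimiter.
  if (buffer.trim() !== "") dispatch(buffer);

  return payloads;
}
```

Without the final flush, the last event in a stream that ends without a blank line would be silently lost, which is the truncation the commit describes.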
--------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Greg <7xshadowx7@gmail.com> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> * feat: fold upstream contributor stack — repoPath default + CRLF persona parser (#4) Folds the cross-platform pieces of upstream commit 781bc42 ("Contributor stack: claude orchestrator + repoPath + Windows spawn (#39)") into the fork, intentionally omitting Windows-specific hunks. Included: - src/mcp/tools.ts: add safeCwd() helper + default `repoPath` on create_chat to safeCwd() when caller omits it. Previously the daemon fell back to its own cwd (packageRoot), which caused relative file paths in `files: [...]` to silently resolve to the chorus install dir and miss. MCP servers spawned by Claude Code / Codex / Gemini inherit the host's cwd (= the user's project), so safeCwd() lands at the right path automatically. safeCwd() also catches ENOENT from process.cwd() and falls back to homedir. - src/lib/personas.ts: normalize CRLF → LF in the frontmatter parser so persona .md files checked out with Windows line endings don't fail `missing YAML frontmatter`. Cross-platform safe. - src/daemon/orchestrators/index.ts: drop stale comment block about Claude having a project-config side-effect (the fork's orchestrator long since moved to user-scope). - tests/mcp-create-chat-repo-path.test.ts (+4 tests): cover explicit repoPath, cwd default, full-body forwarding, and ENOENT fallback to homedir. 
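
A small sketch of the two cross-platform pieces listed above. It is not the fork's code: the injected `getCwd` / `getHome` callbacks and the `splitFrontmatter` name are assumptions so the example runs without Node-specific imports, and the fallback here triggers on any thrown error, which is broader than the ENOENT case the commit names.

```typescript
// Return the working directory, or the home directory when reading the
// working directory throws (for example because it was deleted).
function safeCwd(getCwd: () => string, getHome: () => string): string {
  try {
    return getCwd();
  } catch {
    return getHome();
  }
}

// Split a persona file into YAML frontmatter and body. CRLF is normalized
// to LF first so files with Windows line endings still match.
function splitFrontmatter(source: string): { frontmatter: string; body: string } {
  const text = source.replace(/\r\n/g, "\n");
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(text);
  if (!match) {
    throw new Error("missing YAML frontmatter");
  }
  return { frontmatter: match[1] ?? "", body: text.slice(match[0].length) };
}
```

In a Node process the callbacks would typically be `() => process.cwd()` and `os.homedir`; the result of `safeCwd` is what `create_chat` would send as the default `repoPath`.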
Omitted (Windows-only hunks): - src/cli/commands/update.ts (shell: win32 for npm self-update) - src/daemon/routes/system.ts (shell: win32 for opencode probe) - src/daemon/orchestrators/{codex,gemini,kimi}.ts (shell: win32 tweaks) - src/lib/cli-detect.ts (SAFE_WIN_PATH regex + buildVersionSpawn) - src/lib/voices.ts (discoverNpmPrefixes Windows shell) - tests/cli-detect.test.ts (Windows-specific cmd.exe escape tests) Also omitted: - src/daemon/orchestrators/claude.ts: upstream shells out to `claude mcp add --scope user`. Fork already implements user-scope registration via direct ~/.claude.json patch (more robust — no dependency on `claude` binary in PATH at registration time, plus sweeps stale project-scoped entries). Keeping fork's version. - tests/claude-orchestrator.test.ts: tests the upstream shell-out approach the fork doesn't use. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address PR #1 bot review batch — sendError migration, abort+cancel, schema CHECKs, audit + orchestrate hardening Sweep of fixes for CodeRabbit + ChatGPT Codex review on PR #1. Grouped into one commit because the surface is broad but every change is small, review-driven, and verified together (typecheck clean, vitest 849/849, lint 0 errors). Routes — sendError vs errorResponse (CodeRabbit Critical/Major): - chats-from-pr.ts catch → sendError(reply, ...) so 5xx errors carry the right HTTP status instead of bare 200 + ok:false body. - voices.ts GET list / GET :id / POST / PUT / DELETE all migrated; DELETE handler gains the missing reply param. - Drop now-unused errorResponse imports in both files. Quickstart abort propagation (CodeRabbit Major): - pollChat fetch passes the signal so a SIGINT or timeout interrupts the in-flight request instead of waiting for the daemon's response. - 1500 ms inter-poll sleep wakes on abort instead of always blocking its full duration after the signal fires. 
- Timeout path now also POSTs /chats/:id/cancel (extracted shared `cancelRemote` helper), matching the SIGINT handler so timed-out runs don't leave the daemon reviewing in the background. start.ts best-effort openBrowser (CodeRabbit Major): - Both `chorus start` paths catch openBrowser rejection so a failing `open` doesn't fail the whole command when the daemon is already healthy. Matches scheduleAutoOpenBrowser's existing behaviour. Codex headless GitHub transport (CodeRabbit Major): - HeadlessSpawnOptions gains optional `transport` mirroring AgentSpawnOptions. - codex.buildHeadlessArgs flips network_access on for transport === "github", matching buildLaunchCommand. Previously headless GitHub runs couldn't reach github.com or call out via gh. CLI health auth-kind mapping (CodeRabbit Minor): - kindToStatus now maps auth_invalid and auth_missing to "auth_invalid" so Grok auth failures render the right cockpit CTA instead of "unknown". Voice-failure-tracker hasResetAt streak reset (CodeRabbit Major): - When the upstream promises recovery, also clear any prior strike counter. Pre-fix, permanent-fail → resetAt-fail → permanent-fail tripped the threshold on the first permanent strike instead of the second. Schema CHECK constraints (CodeRabbit Major): - schema.sql + connection.ts migrations add CHECKs on bypass_quota (0/1), tier ('low'/'medium'/'high'), and monthly_budget_usd (NULL or >= 0). Guards scheduler inputs at the DB layer for both fresh installs and migrated DBs. MCP createChat dead conditional spread (CodeRabbit Minor): - safeCwd() is the deliberate fallback per upstream contributor PR. Drop the dead `...(parsed.repoPath !== undefined …)` spread that just re-set the same value the unconditional `repoPath` field already sent. 
github-pr.ts ENOENT classifier (ChatGPT Codex Major): - classifyGhFailure now recognises Node's `spawn gh ENOENT` shape so the documented first-run path (paste PR URL before installing gh) returns the actionable gh_not_installed code instead of db_error. Claude orchestrator trailing newline (CodeRabbit Trivial): - registerClaudeMcpServer JSON write gains the trailing "\n" used by connectClaude, keeping ~/.claude.json byte-for-byte stable. runner-multiplex chat-scoped warning persistence (CodeRabbit Major): - cli_warning / cli_error events that arrive without a valid phaseKind (e.g. attached_files_invalid emitted before any phase starts) now skip phaseEvents.create instead of being coerced into a synthetic 'review'/'reviewer' row. The chatLogger path already captured the warning; live subscribers got it from the original onEvent. doer.ts answerFile init guard (CodeRabbit Major): - Wrap the initial fs.writeFileSync(answerFile, "") in try/catch so EACCES/ENOSPC at startup emits a cli_error (kind: answer_init_failed) with a usable CTA instead of bypassing the failure path and leaving the chat dir empty. cli-precheck kimi default_model gate (CodeRabbit Major): - Only enforce ~/.kimi/config.toml default_model when an actual kimi-cli credential file is present. moonshot voices routed via opencode are authed entirely by opencode and never touch ~/.kimi/ — hard-failing them here rejected healthy setups. audit.ts preset-load + id uniqueness (CodeRabbit Major): - loadPresetPrompt now wraps in try/catch and emits phase_failed (reason: preset_load_failed) instead of letting the promise reject after phase_start fires. - AuditItem.id uniqueness is enforced before persisting audit-output.json; duplicates emit phase_failed (reason: invalid_output) since orchestrate selection is id-keyed. orchestrate.ts checkout failure path (CodeRabbit Major): - Capture `git checkout <startingBranch>` result. 
On failure, push a failed manifest entry and emit phase_failed (reason: checkout_failed) instead of silently letting the next worker stack on top of the prior worker's branch and polluting diff stats.

new/page.tsx PR flow stale repoPath (CodeRabbit Major):
- handleStartFromPr no longer forwards the shared repoPath state. The input is cleared + disabled in reviewOnly mode, but the state can still hold a stale value from a mode switch, so never send it.

Lint: react/no-unescaped-entities (CodeRabbit Minor):
- Three apostrophes in JSX text escaped as HTML entities (page.tsx ×2, run-checklist/index.tsx ×1). 0 errors remaining.

orchestrate-manifest URL validation (CodeRabbit Nitpick):
- Validate the `PR opened: <url>` href via new URL() and require http/https before rendering as an anchor; fall back to plain text on parse failure or an unexpected scheme.

Preset markdown H1 (CodeRabbit Minor, MD041):
- architecture-review.md, de-slopify.md, engineering-review.md gain a top-level H1 to satisfy markdown lint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: PR #1 round-2 - Lineage parity, strict body validation, fence escape, Windows paths

Addresses 4 MAJOR + 1 acknowledged issue from CodeRabbit's 2026-05-18 review batch:

- voices.ts + db/voices.ts: extend Lineage enum with `openrouter`, `local`, `grok` so the route validators stop rejecting legitimate rows already supported by cli-precheck and the shim registry. Mirror in the DB schema (z.enum) and the VoiceUpsertInput union: these are three independent declarations of the same set, and all need to track Lineage in agents/types.ts.
- chats-from-pr.ts: tighten request-body validation. Truthiness checks let non-string truthy `url`/`templateId` (e.g. `{}` or `42`) slip through and fail deep inside parsePrUrl as opaque server errors instead of clean 400s. Added strict `typeof === "string" && trim().length > 0` plus optional yolo type check.
- github-pr.ts: dynamic backtick fence around the diff body.
Markdown/docs PRs frequently contain literal ``` fences; a fixed-width fence would close early and let the rest of the diff escape into the artifact prose, corrupting the prompt boundary for review-only chats. Now picks a fence one backtick longer than the longest run in the diff (min 3). - new/page.tsx: accept Windows absolute paths (`C:\repo`, `\\server\share`) alongside POSIX. The audit-a-repo tab was unusable on Windows because the UI hard-coded `startsWith("/")`, even though cli-detect / runtime-path / settings-transport already handle win32 server-side. Declined: CodeRabbit nitpick on formatBranchName (orchestrate.ts:124-133). chatId is a server-issued ULID (generateUlid in lib/db/chats.ts) — all- alphanumeric by construction — and branchPrefix already has a zod regex guard from commit e93ce00. No real injection vector. - pnpm exec tsc --noEmit — clean - pnpm exec vitest run tests/voices.test.ts tests/voices-route-validation.test.ts tests/github-pr.test.ts tests/db.test.ts — 99/99 passing Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: chorus-codes <info@chorus.codes> Co-authored-by: Julien Deudon <deudon.j@gmail.com> Co-authored-by: Lumina Mao <luminamao@mac.lan> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-authored-by: Yura <yurahalych@gmail.com> Co-authored-by: Greg <7xshadowx7@gmail.com>
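
For the github-pr.ts item in the round-2 commit above, the dynamic fence rule ("one backtick longer than the longest run in the diff, min 3") can be sketched as below. `fenceFor` and `wrapDiff` are hypothetical names, and the `diff` info string on the opening fence is an assumption.

```typescript
// Pick a fence that cannot be closed early by anything inside `body`:
// one backtick longer than the longest backtick run, and at least 3.
function fenceFor(body: string): string {
  let longest = 0;
  for (const run of body.match(/`+/g) ?? []) {
    longest = Math.max(longest, run.length);
  }
  return "`".repeat(Math.max(3, longest + 1));
}

// Wrap a diff in the computed fence.
function wrapDiff(diff: string): string {
  const fence = fenceFor(diff);
  return `${fence}diff\n${diff}\n${fence}`;
}
```

Because the fence is always longer than any backtick run in the body, a docs PR that itself contains fenced code cannot terminate the block and leak into the surrounding prompt.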
crypticpy added a commit to crypticpy/chorus that referenced this pull request on May 18, 2026
…iet (#6) * fix: cred detection + Claude MCP user-scope registration Three fixes from chorus-issues.md that prevent a freshly-installed chorus from finding the user's existing CLI credentials, so the daemon starts up cleanly on machines that already have Claude / Kimi / moonshot configured. #1: register Claude MCP at user scope. The chorus MCP entry now writes to the top-level `mcpServers` block in `~/.claude.json` (idempotent), and any stale chorus entry under the project-scoped `projects[homedir].mcpServers` is cleaned up. Previously the project-scoped registration was invisible to Claude Code launched outside that exact cwd. #2: cred-path fallbacks. When the anthropic file check misses (e.g. user authed via Claude Desktop, no `~/.claude/...` JSON), fall back to the macOS Keychain via `security find-generic-password -s "Claude Code-credentials"`. Added `~/.kimi/credentials/kimi-code.json` to the moonshot CRED_PATHS so users who authed through `kimi-code` aren't told to log in again. #3: kimi config-missing precheck. New layer-3 check parses `~/.kimi/config.toml` and surfaces a `config_missing` reason when there's no top-level `default_model` set — the CLI will silently pick whatever backend it likes, which is rarely what the user wants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: reviewer fidelity, verdict surfacing, event/prompt isolation Seven fixes from chorus-issues.md covering the rest of the runner + MCP-surface issues found while reviewing PR #26 of foresight-app. #4: thread `repoPath` through reviewer subprocesses. `runReviewers` → `runReviewer` → `runReviewerHeadless` now accept the chat's repoPath and the reviewer's cwd switches to it when set, so `gh`, file reads, and sandboxed CLIs (Gemini) see the actual code instead of running in an empty per-reviewer scratch dir. #5: surface reviewer answer.md in MCP responses. 
New `readReviewerArtifacts` helper walks `~/.chorus/chats/<id>/round-N/reviewer-*/answer.md`, caps each at 16 KiB, sorts by (round desc, agent asc), and merges the result into `wait_for_chat` and `get_chat_status` payloads under `reviews`. Both the doer and reviewer `participant_done` events now carry `outputPath` so MCP clients can read the on-disk source of truth when they need more than the streamed tail. #6: bump phase_progress output tail from 500 B to 8 KiB. The 500-byte slice clipped reviewer summaries mid-word; full text remains on disk and is pointed to by `outputPath`. Affects both reviewer.ts and doer.ts. #7: tri-review verdict on `max_rounds_exhausted`. When the doer succeeded every round but reviewers kept saying request_changes through the round cap, chat_done now emits `status: completed, verdict: request_changes, reason: max_rounds_exhausted` with the last round's reviewer summary — previously misclassified as a generic doer failure. #8: refactor `CreateChatSchema` and `InvokePersonaSchema` to plain `z.object()` with per-field `.describe()`. The prior `.transform()` wrapped them in `ZodEffects` which strips the `properties` map from MCP introspection — clients saw an empty schema. Legacy `template` alias and the `code-review` default moved into a new `resolveTemplateId()` helper. #9: dedup `participant_done` at the multiplex layer. Same-slot fallbacks or parsers that emit `message_done` twice (the opencode parser historically does this) used to fan duplicate terminal events out to every subscriber; now keyed by `(phaseIdx, round, role, agent)` and later duplicates drop silently. #10: per-instance reviewer prompt isolation. Same-lineage instances (claude-code-2/4/5, etc.) share the chat dir tree at `~/.chorus/chats/<id>/round-N/reviewer-*/`; tool-using CLIs were wandering into a sibling's answer.md mid-flight and short-circuiting ("the review is complete" referring to a different agent's work). 
`buildReviewerAsk` now stamps an Independence directive when more than one reviewer slot exists, naming the slot tag and forbidding cross-slot reads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: replay chat_done from persisted verdict, not status The synthetic chat_done emitted when a terminal chat is re-attached derived `verdict` from `chat.status`, ignoring the `chat.verdict` column. Since the previous commit shipped the `max_rounds_exhausted` branch (chorus-issues.md #7), a chat can finish with `status='approved' verdict='request_changes'` — replay was clobbering that to `approved` on every page reload, hiding reviewer disagreement from the user. Use the persisted column when set; fall back to the old status-derived value only for pre-v0.8.27 rows where verdict is null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: surface dropped attached_files + SSE backpressure; harden ship.ts Three audit follow-ups on the daemon side, all surfacing previously silent failures. attached_files: parseAttachedFiles in runner-multiplex.ts used to swallow JSON parse errors and run the chat with no attachments. Refactor to a tagged result (`empty` / `ok` / `invalid`); on `invalid` the runner logs and emits a `cli_warning` SSE so the cockpit + MCP clients see which chat lost its file list. SSE backpressure: when a subscriber's queue exceeds the 1000-line cap the multiplex used to silently drop the connection. Now writes one `error` frame with code `sse_backpressure` before close, and logs the queue length to daemon.log so an operator tailing logs can see when clients fall behind. gh pr create URL validation: ship.ts captured stdout's last line as the PR URL with no shape check; an empty/malformed stdout produced `{ok: true, prUrl: ''}` and the chat row recorded "shipped" with an unclickable link. Now matches against `^https://github.com/<owner>/<repo>/pull/<n>` before declaring success. 
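The URL shape check described above can be sketched as follows. This is a minimal illustration, not the code in ship.ts: the helper name `parsePrCreateStdout`, the `ShipResult` type, the `invalid_pr_url` reason, and the choice to ignore trailing blank lines are all assumptions made for the example.

```typescript
// Hypothetical result shape; the commit only mentions `{ok: true, prUrl: ''}` as the old failure mode.
type ShipResult = { ok: true; prUrl: string } | { ok: false; reason: string };

// Anchored pattern for https://github.com/<owner>/<repo>/pull/<n>.
const PR_URL_RE = /^https:\/\/github\.com\/[^\/\s]+\/[^\/\s]+\/pull\/\d+/;

// Hypothetical helper: takes the raw stdout of `gh pr create` and validates its last line.
function parsePrCreateStdout(stdout: string): ShipResult {
  // Assumption: blank lines are dropped so a trailing newline does not hide the URL.
  const lines = stdout
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const last = lines.length > 0 ? lines[lines.length - 1] : "";

  // Only declare success when the last line really looks like a PR URL.
  if (!PR_URL_RE.test(last)) {
    return { ok: false, reason: "invalid_pr_url" };
  }
  return { ok: true, prUrl: last };
}
```

With this shape, an empty or malformed stdout yields a failure result instead of a "shipped" row with an empty link.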
detectGitContext parallelization: the five spawnSync probes (is-repo, remote, gh --version, gh auth, HEAD) ran sequentially with a 60s timeout each, so the worst case was 300s before the runner saw a result. Converted to async with a new `runAsync` helper, batched via Promise.all with a 15s per-probe cap; detectDefaultBranch's symref + three branch-existence checks likewise parallelized. detectGitContext is now async; the lone caller in runner.ts awaits it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: bound failure-summary regex; log malformed SSE frames participant-card.tsx: parseFailureSummary ran the multi-step regex chain over the full participant.answer string. Reviewer answers can be up to 256 KB; on every render that's a UI-thread block. Slice to the first 16 KiB before scanning. The failure-header block is always written at the top of answer.md by reviewer.ts/doer.ts, so the cap never loses signal. live-run-real/index.tsx: the SSE onmessage handler already had a try/catch around JSON.parse, but the catch was silent, so a wire-format mismatch dropped events with no trace. Add a console.warn with a preview so devs notice schema drift in DevTools. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: github PR ingestion via gh CLI Adds src/daemon/github-pr.ts: parsePrUrl + fetchPrArtifact run gh pr view/diff plus review and issue comments in parallel, synthesize a Markdown artifact (description, comments capped at 50 newest each, diff capped at 200 KB UTF-8 safe), and classify gh failures into typed reasons. Exports runAsync from ship.ts so the new module can reuse the existing spawn+timeout helper instead of duplicating it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: extract createChatFromValidatedInputs helper Pulls the template lookup, artifact validation, chat row + opening phase event creation, and runner kickoff out of the POST /chats handler into a reusable helper.
POST /chats now only handles its route-specific concerns (body shape, repoPath canonicalization, error response shaping). Sets up reuse from the upcoming POST /chats/from-pr endpoint without duplicating ~150 lines of validation logic. No behavior change — same template checks, same artifact rules, same kickoff path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: POST /chats/from-pr — start a chat from a GitHub PR URL Accepts { url, templateId, repoPath?, yolo? }, parses the PR URL, fetches PR meta + diff + existing comments via gh CLI, synthesizes a Markdown artifact, and creates the chat through the shared createChatFromValidatedInputs helper. gh failures map to typed reasons (invalid_url, gh_not_installed, gh_not_authed, pr_not_found, network_failure, unknown) so the cockpit can render actionable errors instead of generic 500s. Adds tests/github-pr.test.ts covering parsePrUrl edge cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: cockpit "GitHub PR" tab on /new Adds a Free-form / GitHub PR mode toggle on the new-chat page. PR mode swaps the prompt textarea for a URL input and routes through the new POST /chats/from-pr endpoint. Validates client-side that the chosen template is review-only before letting the user submit. createChatFromPr API client surfaces the daemon's typed PR meta (owner/repo/number/title/branches) on the response so callers can display PR context after the chat is created. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: review_pr MCP tool Exposes POST /chats/from-pr through MCP. Orchestrators (Claude Code, Codex, Cursor) can now hand chorus a PR URL and get reviewers running against it without going through the cockpit. Defaults templateId to review-only so a caller can pass just a URL. ReviewPrSchema is a plain z.object (not ZodEffects) so MCP clients can introspect required fields — same hazard documented on CreateChatSchema and InvokePersonaSchema. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: capture multi-identity CLI follow-up idea Idea note for running chorus against multiple paid accounts on the same CLI binary (work + personal Claude Code Max, etc.). Filed as follow-up after audit-presets + quota tiers ship — captures the env-override mechanism, proposed Identity primitive, and open questions on keychain CLIs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: schema for audit + orchestrate phases, voice tier, bypass_quota Adds the foundation for repo-pointed audit-and-orchestrate runs and the orchestrator's task↔voice tier matching. Template schema: - AuditPhase (kind: 'audit') — single reviewer voice + one of five preset lenses (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review). Output schema (AuditItemSchema, AuditOutputSchema) lives next to the phase shape so the structured-output adapter, scheduler, and cockpit checklist agree on the contract. - OrchestratePhase (kind: 'orchestrate') — array of worker voices, default branchPrefix `chorus/{chatId}/worker-{idx}` so each worker gets isolated git state. - templateRequiresRepo() helper for the cockpit's repo-picker gate. Voices: - Adds tier ('high' | 'medium' | 'low', default 'medium') and monthly_budget_usd (nullable) to the row schema, upsert input, and update input. Idempotent migrations on existing DBs. Chats: - bypass_quota INTEGER NOT NULL DEFAULT 0 — set on PR-review chats so the orchestrate scheduler runs every enabled voice at full capacity instead of tier-gating. Runner is stubbed for the new kinds: phase_done emit + continue, so templates that declare an audit/orchestrate phase before the runner logic lands don't crash. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: structured-output adapter for CLI voices Wraps an AgentShim's runHeadless with JSON-formatting prompt scaffold and a one-shot repair loop, returning typed data validated against a caller-supplied zod schema. Used by the upcoming audit phase (which needs typed AuditItem[] instead of free-form prose) and the orchestrate phase (worker results). Keeps each CLI lineage's existing headless transport — the adapter just owns the prompt-shape + parse-and-validate dance. Extraction strategy: prefer direct JSON.parse of finalText; fall back through fenced-block regex variants to a brace-to-brace slice. On parse or schema-violation, retry once with a repair prompt that quotes the validation error. Spawn errors short-circuit (the model never saw the prompt — repair would just retry the same failure). Tests cover happy path, fenced-block extraction, repair-loop success, repair-loop exhaustion, schema violation, and spawn error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cockpit): audit-a-repo tab + checklist approval component /new gets a third tab beside Free-form and GitHub PR. In audit mode the user picks one of five preset lenses (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review) and supplies an absolute repo path. Submit fires createChat with templateId=`audit-<preset>` — those built-in templates land with the audit-phase implementation. RunChecklist component lives at src/components/run-checklist/. It takes the AuditItem[] surfaced by the audit phase's blocking event and renders one row per item with a checkbox, complexity badge, rationale, and file list. Default state has every item selected; the user trims, then submits via the parent's onSubmit which JSON-encodes the selected ids into the existing /chats/:id/resume `answer` field. Wiring into the live-run UI lands with the audit phase. 
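The extraction strategy described for the structured-output adapter earlier in this entry (direct parse, then fenced blocks, then a brace-to-brace slice) might look roughly like the sketch below. The function names and the `undefined` failure value are assumptions for illustration; the repair loop and zod validation are left out.

```typescript
// Hypothetical helper: wraps JSON.parse so callers can branch instead of catching.
function tryParseJson(text: string): { ok: true; value: unknown } | { ok: false } {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

// Built at runtime so this example never contains a literal code fence.
const FENCE = "`".repeat(3);
const FENCED_BLOCK_RE = new RegExp(FENCE + "(?:json)?\\s*\\n([\\s\\S]*?)" + FENCE, "g");

// Hypothetical sketch of the fallback chain; returns undefined when nothing parses.
function extractJsonSketch(finalText: string): unknown {
  // Step 1: prefer a direct parse of the whole model output.
  const direct = tryParseJson(finalText.trim());
  if (direct.ok) return direct.value;

  // Step 2: try each fenced block, with or without a "json" tag.
  for (const match of finalText.matchAll(FENCED_BLOCK_RE)) {
    const fenced = tryParseJson((match[1] ?? "").trim());
    if (fenced.ok) return fenced.value;
  }

  // Step 3: last resort, slice from the first "{" to the last "}".
  const start = finalText.indexOf("{");
  const end = finalText.lastIndexOf("}");
  if (start !== -1 && end > start) {
    const sliced = tryParseJson(finalText.slice(start, end + 1));
    if (sliced.ok) return sliced.value;
  }
  return undefined;
}
```

In the adapter as described, a failed extraction or a schema violation would then trigger the single repair prompt; that step is outside this sketch.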
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: PR-review chats bypass quota + tier surface on /voices PR-review chats automatically set bypass_quota=true so the orchestrate scheduler ignores voice.tier and runs the full fleet at maximum capacity — reviews are short, parallel, and the user wants the strongest opinion possible regardless of model tier. PUT /voices/:id now accepts tier ('high' | 'medium' | 'low') and monthly_budget_usd (non-negative or null), so the cockpit fleet page can label voices by capability for the orchestrate scheduler to route work against. Tests cover both new fields plus a chat round-trip asserting bypass_quota defaults false and persists when set. * feat: audit phase + 5 presets + audit-* templates Wires the audit phase end-to-end: - src/daemon/phases/audit.ts runs the structured-output adapter against the chosen preset, persists the parsed AuditItem[] to <chatDir>/audit-output.json plus raw model output to round-1/audit/output.md, and emits phase_progress with the items. - src/daemon/runner.ts replaces the audit/orchestrate stub: audit invokes runAuditPhase, flips chat status to blocked so the cockpit renders the checklist UI, and exits cleanly. Orchestrate keeps the no-op stub until step 5 lands. - 5 preset prompts (de-slopify, monolith-breakdown, code-review, engineering-review, architecture-review) frame what each lens looks for. The structured-output adapter handles JSON formatting; presets describe the audit lens only. - 5 audit-* templates (one per preset), each a 2-phase audit -> orchestrate shape with three default workers. Auto-loaded by seedBuiltinTemplates. - tests/audit-phase.test.ts covers preset-file presence and the audit-* template parse + shape contract. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: orchestrate phase + audit-resume wiring + tier-aware scheduler Wires the audit→orchestrate handoff: the cockpit POSTs the user's trimmed audit checklist to /chats/:id/resume, the resume handler cross-checks ids against audit-output.json, persists the selection, flips chat to drafting on the orchestrate phase, and re-fires the runner. The runner now starts at chat.current_phase_idx so a resumed chat lands directly on orchestrate. The new orchestrate phase walks the approved AuditItem[] sequentially (parallelism is an explicit non-goal for v1), picks a worker per item via the pure tier-aware scheduler, cuts a per-item branch, dispatches the worker via shim.runHeadless, captures git diff --stat, and persists orchestrate-manifest.json for the diff-apply UI to consume. The scheduler is a pure function with 9 unit tests covering tier matching, bypass override, disabled-voice skipping, empty pool, and unknown voice ids. Resume route has 10 tests exercising body validation, id cross-check, status gating, and the happy path. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: orchestrate manifest UI + checkout/open-pr daemon routes - Run page reads audit-output.json + orchestrate-manifest.json on render - LiveRunReal renders RunChecklist while blocked w/ audit items, then swaps to OrchestrateManifest panel once orchestrate completes - New OrchestrateManifest component shows one row per worker w/ Checkout / Open PR buttons (per-row inline feedback, no global toast) - Daemon: GET /chats/:id/audit-items, GET /chats/:id/orchestrate-manifest, POST /chats/:id/workers/:idx/checkout (refuses on dirty tree), POST /chats/:id/workers/:idx/open-pr (gh pr create, bucketed failures) - OrchestrateManifestSchema added to template-schema.ts; route + UI parse via the same shape Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: harden resume race + branch validation + symlink TOCTOU + extractJson Address /freview findings on the audit + orchestrate flow: - Resume race (BLOCKER): two concurrent POSTs to /chats/:id/resume could both pass the `status=='blocked'` check and double-fire the runner. Guard with `getActiveRun` (catches the audit-finishing window before `.finally` clears the registry) and replace the status flip with an atomic `tryResumeFromBlocked` CAS conditional on `WHERE status = 'blocked'`. - Branch-name argument injection (BLOCKER): tighten zod regexes on `OrchestratePhase.branchPrefix` and `OrchestrateManifestEntry.branch` so values starting with `-` (or containing shell metachars) cannot flow into `git checkout` / `gh pr create` as flags. - Symlink TOCTOU on checkout + open-pr (NON-BLOCKER): re-realpath `existing.repo_path` before passing to execFile cwd, mirroring the rerun-path pattern. Returns a structured validation error if the path no longer resolves. 
- extractJson Path 4 (NON-BLOCKER): try `{...}` and `[...]` slices independently and prefer the longer parse, so prose like "mentions [stuff] before {object}" extracts the object instead of the bracket. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: prod CJS build — drop import.meta + copy presets to dist Two issues blocked `pnpm build:server`: - `audit.ts` used `import.meta.url` for module-relative path resolution, but the server tsconfig compiles to CJS where `import.meta` is a syntax error. Replaced with `__dirname`, which works in both the compiled dist (native CJS) and tsx-driven dev (tsx ≥4 shims it in ESM mode). - The `build:server` script copied `schema.sql` to dist/ but missed the preset markdown files in `src/daemon/presets/`. The audit phase's `loadPresetPrompt` resolves relative to `__dirname`, so a published install was hitting ENOENT on every audit run. Extended the copy step to mirror the preset directory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: fold upstream T1+T2 fixes back into fork (12 commits) (#2) * feat(cli): add diagnose command + crash-hook Bundles two upstream changes that ship a self-service triage path for chorus users hitting opaque failures: - `chorus diagnose` walks the install, daemon, recent failed chats, voice health, and produces a sharable bug report. - Crash hook captures uncaught exceptions in the CLI and writes them to a crash log alongside instructions to attach during a bug report. Folded back from upstream chorus-codes/chorus: 7ea712b feat: chorus diagnose command + crash hook for bug reports (#1) 4a5ea20 fix(diagnose): realpath bin path + filter Next.js SSE noise (#4) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(cli): add quickstart self-test command `chorus quickstart` runs a 30-second activation flow that verifies the daemon comes up, the SQLite DB initializes, and a minimal chat round-trips end-to-end. 
Aimed at first-run users who want to know "is this thing actually working" before authoring a template. Folded back from upstream chorus-codes/chorus: 56610cf feat(cli): chorus quickstart — 30-second activation self-test (#30) Co-Authored-By: chorus-codes <info@chorus.codes> * fix(cli): use dynamic import for open package (Node 22 ERR_REQUIRE_ESM) The `open` package and `chokidar` are both ESM-only as of recent versions. On Node 22 (the daily-driver target) static `require()` calls into them throw ERR_REQUIRE_ESM and crash the CLI at boot. Switch to dynamic import in: - src/cli/commands/start.ts (open browser after boot) - src/cli/open-browser.ts (new helper) - src/cli/index.ts (route open import) - src/daemon/output-watcher.ts (chokidar file watch) Includes upstream's post-merge hardening: the setTimeout that triggers the browser-open no longer wraps an async callback bare, so a missing default browser doesn't surface as an unhandled rejection. Folded back from upstream chorus-codes/chorus: e8ca2ee fix(cli): dynamic import for open package (#14) dcd1837 fix: post-merge hardening for #14 (start.ts portion only; cli-precheck.test.ts portion ships with the Keychain fix) Co-Authored-By: Julien Deudon <deudon.j@gmail.com> Co-Authored-By: chorus-codes <info@chorus.codes> * feat(cockpit): seed empty round-1 so QUEUED renders from t=0 Before: when a chat starts but no reviewer has produced an event yet, enrichRounds returned an empty rounds array and the live-run page showed nothing for several seconds — the user couldn't tell whether their chat had launched. After: seed a synthetic round-1 with QUEUED placeholders for every expected participant so the page renders the per-reviewer cards immediately. Real events overwrite placeholders as they arrive. 
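As a rough illustration of the seeding described above, the sketch below builds a round-1 view in which every expected participant starts as QUEUED and is replaced once a real card exists. The types, status names other than QUEUED, and the function name are hypothetical; the cockpit's enrichRounds has its own shapes.

```typescript
// Hypothetical card and round shapes for illustration only.
type ParticipantStatus = "QUEUED" | "RUNNING" | "DONE" | "FAILED";
interface ParticipantCard {
  agent: string;
  status: ParticipantStatus;
}
interface Round {
  round: number;
  participants: ParticipantCard[];
}

// Hypothetical helper: one card per expected agent, real data wins over the placeholder.
function seedRoundOne(expectedAgents: string[], realCards: ParticipantCard[]): Round {
  const byAgent = new Map<string, ParticipantCard>();
  for (const card of realCards) byAgent.set(card.agent, card);

  return {
    round: 1,
    participants: expectedAgents.map(
      (agent): ParticipantCard => byAgent.get(agent) ?? { agent, status: "QUEUED" },
    ),
  };
}
```

Calling it with an empty `realCards` list gives the t=0 view; later calls with real cards overwrite the placeholders one by one.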
Folded back from upstream chorus-codes/chorus: 53e8fb6 feat(cockpit): seed empty round-1 so QUEUED placeholders render from t=0 (#2) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(daemon): runtime fallback-collision dedup across reviewer slots When two reviewer slots both fall through their per-slot chains to the same template-level fallback target (common case: every slot ends in anthropic/claude-sonnet-4-6), both used to dispatch the same (lineage, model) in parallel — wasted cost and the lineage diversity that's the point of multi-LLM peer review collapsed. Build-time dedup (template-fallback.ts) couldn't catch it because each slot only knows about other slots' PRIMARIES, not their fallback chains. Fix: new per-chat/per-round (lineage, model) registry. reviewer-driver tryClaim's before each chain attempt and releases in a finally. On collision, return null + emit cli_warning(reason='fallback_collision') so runWithChainFallback advances to the next entry and the cockpit can show why the slot skipped. Ported into fork's reviewer-driver.ts surgically so the verdict-isolation refactor (2a2cde2) and per-slot repoPath threading stay intact. Folded back from upstream chorus-codes/chorus: c4751fe feat(daemon): runtime fallback-collision dedup (#3) Co-Authored-By: chorus-codes <info@chorus.codes> * fix(daemon): write REVIEWER FAILED summary on pre-spawn failure Before: when a reviewer's precheck fails (e.g. underlying CLI not installed) or the chat is cancelled while the slot is queued for a CLI semaphore slot, runReviewer used to return null silently — leaving NO on-disk participant directory. The cockpit's enrich-rounds loop then couldn't reconcile the synthesised template slot against any real participant, so the card sat at "Queued — waiting for an open slot." forever and the actual error was invisible. Reproduction: install chorus on a host with only one CLI on PATH (e.g. 
just claude-code), open a template that includes lineages requiring codex/gemini/kimi, fire it. Every reviewer card stayed "Queued" — chat never visibly progressed even though it was already done failing. Fix: - Create the reviewer dir BEFORE the precheck runs. - Add a writePreSpawnFailure helper that writes a `## REVIEWER FAILED` summary in the canonical format (Kind / Lineage / Model / message) that the cockpit's `parseFailureSummary` already understands. - Wire it into the precheck-failed and cancelled-while-queued paths. Card now transitions out of pending and shows the actual error (cli_missing, cancelled, ...). Folded back from upstream chorus-codes/chorus: afc59cc fix(daemon): REVIEWER FAILED summary on pre-spawn failure (#26) Co-Authored-By: chorus-codes <info@chorus.codes> * feat(voices): auto-disable on persistent quota_exhausted + lsof timeout Real pain (upstream #11): a Pro Gemini model on a Flash-only account fails every chorus run with "exhausted your capacity on this model" — but Gemini doesn't return a resetAt because the model isn't going to become available for that account. Without auto-disable, the runner keeps picking the dead voice on every chat and the user keeps seeing the same opaque error. Voice auto-disable: - New src/lib/voice-failure-tracker.ts records per-voice consecutive quota_exhausted strikes in a settings counter. - Trigger: 2 consecutive strikes WITH no resetAt → set voices.enabled=false + disabled_reason='auto_quota'. - Counter resets on participant_done success; rate-limit strikes (hasResetAt=true) bypass the counter entirely so a transient 429 + a later permanent failure can't trip the threshold on the first permanent strike. - Wired into reviewer-driver alongside recordHealth; emits a cli_warning(reason='voice_auto_disabled') so the cockpit can show a one-line explanation. - VoiceDisabledReason union gains 'auto_quota' (schema column was already TEXT — no migration). 
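The strike-counting rules listed above can be sketched as a small in-memory tracker. This is an assumption-laden illustration: the source keeps its counter in a settings store and writes `voices.enabled=false`, while the class and method names here are hypothetical and nothing is persisted.

```typescript
// Threshold taken from the commit message: 2 consecutive strikes with no resetAt.
const STRIKE_THRESHOLD = 2;

// Hypothetical in-memory stand-in for the settings-backed counter.
class VoiceFailureTrackerSketch {
  private strikes = new Map<string, number>();

  /** Records a quota_exhausted failure; returns true when the voice should be auto-disabled. */
  recordQuotaExhausted(voiceId: string, hasResetAt: boolean): boolean {
    // Rate-limit strikes carry a resetAt and bypass the counter entirely.
    if (hasResetAt) return false;
    const next = (this.strikes.get(voiceId) ?? 0) + 1;
    this.strikes.set(voiceId, next);
    return next >= STRIKE_THRESHOLD;
  }

  /** A successful participant_done resets the consecutive-strike count. */
  recordSuccess(voiceId: string): void {
    this.strikes.delete(voiceId);
  }
}
```

Because rate-limit strikes never touch the counter, a transient 429 followed by one permanent failure leaves the count at 1 and the voice stays enabled.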
Lsof timeout (upstream #12): - findPidsOnPort and findPidsOnPortWithSudo now bound execSync / execFileSync to 3s, so a slow-but-functional lsof on a loaded macOS box doesn't hang chorus boot. 3s leaves headroom while still bounding the hang case. Ported into fork's reviewer-driver.ts tmux pollHandle + success path. voices.ts disabled_reason union extended alongside fork's voice-tier column. Folded back from upstream chorus-codes/chorus: 4f6becc v0.8.30 — voice auto-disable (#11) + lsof timeout (#12) (#17) Co-Authored-By: chorus-codes <info@chorus.codes> Co-Authored-By: Lumina Mao <luminamao@mac.lan> * fix(daemon, schema): codex isolation + template-schema validation Two issues caused chats to fail opaquely at run-start: CODEX ISOLATION (#10, #16) The user's ~/.codex/config.toml may declare MCP servers, plugins, or notification hooks. In headless `codex exec` those integrations have caused codex to hang or cancel mid-call — two independent reproductions: codex as our reviewer (#10) and codex as MCP client of chorus (#16). Add --ignore-user-config to every headless codex argv. Extracted to a pure `buildHeadlessArgs(opts)` so the argv shape is unit-testable. TEMPLATE VALIDATION (#15) `reviewer.require > candidates.length` used to surface as "Job moves immediately to failure upon Start press" — the runner queued, failed to grant enough slots, and emitted an opaque chat-failure. Same for `require > distinct lineages` when crossLineage:true. Both now caught at TemplateSchema.parse() time with a clear error message the user can fix before the run starts. ReviewerSchema.superRefine() additions slot in cleanly alongside the fork's audit/orchestrate phase schema work — both are additive constraints on the same ReviewerSchema object. 
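The two template constraints described above can be illustrated with a plain validator. In the source they live in `ReviewerSchema.superRefine()`; the sketch below deliberately avoids zod, and the `Candidate` and `ReviewerRule` shapes, the function name, and the error wording are assumptions.

```typescript
// Hypothetical shapes; the real schema may carry more fields.
interface Candidate {
  lineage: string;
  model: string;
}
interface ReviewerRule {
  require: number;
  candidates: Candidate[];
  crossLineage?: boolean;
}

// Returns human-readable errors; an empty array means the rule is satisfiable.
function validateReviewerRule(rule: ReviewerRule): string[] {
  const errors: string[] = [];

  // Constraint 1: cannot require more reviewers than there are candidates.
  if (rule.require > rule.candidates.length) {
    errors.push(
      `reviewer.require (${rule.require}) exceeds candidates.length (${rule.candidates.length})`,
    );
  }

  // Constraint 2: with crossLineage, each required reviewer needs its own lineage.
  if (rule.crossLineage) {
    const distinctLineages = new Set(rule.candidates.map((c) => c.lineage)).size;
    if (rule.require > distinctLineages) {
      errors.push(
        `reviewer.require (${rule.require}) exceeds distinct lineages (${distinctLineages}) with crossLineage enabled`,
      );
    }
  }
  return errors;
}
```

Running a check like this at parse time is what turns the opaque run-start failure into a message the user can act on before pressing Start.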
Folded back from upstream chorus-codes/chorus: 8ed970b fix(daemon, schema): codex isolation + template validation Co-Authored-By: chorus-codes <info@chorus.codes> * fix(runner): honour iterate.onDisagreement accept-doer/escalate The template schema, cockpit dialog, and SPEC-D-templates have always exposed three values for iterate.onDisagreement — 'continue', 'escalate', 'accept-doer' — but the runner only honoured 'continue'. Picking the other two from the cockpit form was a silent no-op: chats fell through to phase_failed with 'doer_failed_all_rounds' regardless. This wires both new branches into the round loop and the terminal chat_done emission: - 'accept-doer': after maxRounds without consensus, mark doerSucceeded and continue. The chat carries on (subsequent phases, ship, approval) as if reviewers had agreed on the doer's last answer. - 'escalate': halt with status='failed' but verdict='request_changes' and error='escalated_on_disagreement', so cockpits can render "reviewers disagreed, needs human" distinctly from "doer broke." Policy table extracted into a pure decidePhaseOutcome() helper so the 3 × 2 input matrix (policy × disagreement-in-last-round) is unit-tested without standing up the full runChat scaffold. Gated on disagreementInLastRound (reset at top of every round + on doer-crash path) so a partial / empty doer answer can never be silently "accept-doer"'d as final. Preserves the fork's existing standardPhaseRoundsExhausted #7 surfacing for the 'continue' path; the 'escalate' path takes precedence with its own distinct chat_done. Upstream PRs #49, #50 (commit 67572e9). Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(cli-precheck): cover macOS Keychain fallback for Claude Code v2 The fork already implements the Keychain fallback in cli-precheck (hasDarwinKeychainEntry). 
This adds the missing test coverage: - passes when no cred file but keychain entry exists - blocks when no cred file and no keychain entry - skips keychain check when cred file exists (fast-path preserved) - does not consult keychain for non-anthropic lineages vi.mock('node:child_process') uses the importOriginal spread pattern so spawn / exec / etc. keep their real implementations — a bare module replacement would silently break any sibling test that imports from child_process. Upstream PRs #7, #8, plus the dcd1837 test-mock hardening. Co-Authored-By: Yura <yurahalych@gmail.com> Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cockpit): derive candidatesWithModels from snapshot's candidates field Daemon-side TemplateSchema only carries `candidates` on each ReviewerRule. The cockpit Template type expects `candidatesWithModels` populated — enrich-rounds iterates that field to build slot→model mappings for run-page cards. When fromRow parsed template_snapshot and cast it to Template, the cast was a TypeScript lie: at runtime the parsed object lacked candidatesWithModels, enrichRounds iterated zero reviewer slots, and no model name reached the cards (badge appeared empty). Derive candidatesWithModels at the parse seam (chats.fromRow) so the cockpit's Template contract is honoured regardless of which path produced the data. Idempotent — if a future daemon ever serialises the field directly, that wins. Persona forwarded if present. Audit- phase single-voice reviewers (no candidates array) are skipped via a runtime narrow. Upstream PR #6 (chorus-codes/chorus@ac0c7fd). 
Co-Authored-By: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(diagnose): capture failure context (CLI smoke, voice health, recent failed chats) Extends `chorus diagnose` with three signals that triage the most common breakage modes: - **CLI smoke**: spawn `<bin> --version` per detected CLI with a hard 2s SIGKILL timeout (wrapper scripts may trap SIGTERM). Distinguishes `timedOut` from non-zero exit so the report can tell hangs apart from crashes. - **Voice health**: counts `enabled=0` voices grouped by `disabled_reason` ('user' vs 'auto_missing' vs 'quota_exhausted'). Added `idx_voices_enabled` so the `WHERE enabled = 0` scan stays cheap as the table grows. - **Recent failed chats**: last 5 chats whose status is 'failed', 'blocked', or 'cancelled', plus the errored participants pulled from `~/.chorus/chats/<id>/round-*/<part>/_attempts.jsonl`. Only `errorMessageBytes` is exposed; raw error text never leaves the user's machine. `$HOME` is redacted from any embedded path strings via `redactHomePaths`. Adapted from upstream chorus-codes/chorus#19 (0666dca). Preserves the fork's existing diagnose shape and adds tests for smokeOneCli / readLatestAttempt / formatReport rendering of the three new sections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(diagnose): include no_review in recent failed chats query The recent-failed-chats section was meant to surface per-participant failure context from `_attempts.jsonl`, but the WHERE clause only covered 'failed', 'blocked', 'cancelled'. The most common failure shape (every reviewer down for missing CLI / auth / quota) ends the chat in 'no_review', which was being silently filtered out. So the exact case the section exists to diagnose returned an empty list, forcing users back into manual log collection.
Adds 'no_review' to the IN-list and a regression test that asserts both the status and a quota_exhausted errorKind render in the report. Addresses chatgpt-codex review P2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: chorus-codes <info@chorus.codes> Co-authored-by: Julien Deudon <deudon.j@gmail.com> Co-authored-by: Lumina Mao <luminamao@mac.lan> Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Yura <yurahalych@gmail.com> * feat: fold upstream Grok + Local LLM + Keychain dual-probe (4 commits) (#3) * feat(grok): detect Grok Build (xAI) + Level 1 orchestrator Adds Grok Build CLI to detection, onboarding picker, /connect card, diagnose smoke, init listing, and doctor labels. Grok auto-picks chorus MCP from ~/.claude.json (verified empirically via `grok inspect`) — no separate MCP wire needed. The grok orchestrator reports connected=true when both the binary is detected AND chorus is wired in ~/.claude.json (either top-level mcpServers or any project-scoped mcpServers entry). connect() is a no-op that points users at `chorus connect claude` if claude hasn't been wired yet. Quickstart filters CLIs to those with shims, so grok-cli being detected first no longer breaks the doer-pick flow. The cliToLineage map remains the source of truth for reviewer-capable CLIs. `docs/integrating-a-new-cli.md` captures the full Level 1/2/3 integration playbook for future CLIs — written while doing this so the steps are tested. Adapted from upstream chorus-codes/chorus#44 (6a00b00). No conflicts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(local): add Local LLM HTTP shim for OpenAI-compatible endpoints Adds a `local` lineage that dispatches chat completions to any OpenAI-compatible HTTP endpoint (Ollama, llama-swap, LM Studio, vLLM, or anything that speaks `/v1/chat/completions`). 
No external subscription or CLI binary required — only a running local inference server. Configuration: save a JSON secret under key `local` via Settings → Local LLM: {"base_url": "http://127.0.0.1:11434/v1", "api_key": ""} Model ids may use a `local:` prefix (e.g. `local:llama3`) which the shim strips before dispatch, or bare model names directly. When no secret is saved, falls back to Ollama's default port. Wiring sweep (extends every exhaustive enum / Record so templates can declare local voices without Zod errors): - src/daemon/agents/local.ts — new HTTP shim with JSON.parse guard on the secret (yields a typed `config_parse` error event for malformed secrets instead of throwing inside the generator) - src/daemon/agents/index.ts — register localShim, `local:` prefix routing in pickShimForVoice, add to isHttpDispatchedShim - src/daemon/agents/types.ts — `local` in Lineage - src/lib/template-schema.ts — `local` in both lineageEnum + reviewerLineageEnum - src/lib/cli-health.ts — `local` in CliLineage + ALL_LINEAGES - src/lib/cli-precheck.ts — empty CRED_PATHS, LOGIN_HINT, skip the file probe (same pattern as openrouter — auth lives in secrets table) - src/lib/cockpit-types.ts — `local` in ReviewerLineage - src/lib/lineage-maps.ts — `local` in DaemonLineage, UILineage, every label/dot/brand map; UI_LINEAGE_DEFAULT_MODEL[local] = "" (model IDs are endpoint-specific). Teal dot distinguishes local from openrouter's cyan - src/components/phase-editor/constants.ts — LINEAGES list, DAEMON_TO_COCKPIT_LINEAGE - src/components/template-dialog/constants.ts — COCKPIT_TO_DAEMON, DAEMON_TO_COCKPIT, DAEMON_DEFAULT_MODEL, FALLBACK_LINEAGES Adapted from upstream chorus-codes/chorus#41 (716fa3a). The bundled upstream commit also included Keychain dual-probe (#38) and fallback-registry hold-on-success (#42) — those land in follow-up commits in this PR so each concern is reviewable independently. 
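The configuration handling described for the `local` lineage (JSON secret, parse guard, `local:` prefix, Ollama fallback) can be sketched as below. The function and type names are hypothetical, and the default URL is simply the one shown in the example secret above; the real shim in src/daemon/agents/local.ts emits an error event rather than returning a value.

```typescript
// Default taken from the example secret in the commit message (Ollama's default port).
const DEFAULT_LOCAL_BASE_URL = "http://127.0.0.1:11434/v1";

// Hypothetical result type; `config_parse` mirrors the error kind named in the commit.
type LocalConfigResult =
  | { ok: true; baseUrl: string; apiKey: string }
  | { ok: false; error: "config_parse" };

// Hypothetical helper: resolves the saved secret (or its absence) into a usable config.
function resolveLocalConfig(secret: string | null): LocalConfigResult {
  // No secret saved: fall back to the default endpoint with no API key.
  if (secret === null || secret.trim() === "") {
    return { ok: true, baseUrl: DEFAULT_LOCAL_BASE_URL, apiKey: "" };
  }
  try {
    const parsed: unknown = JSON.parse(secret);
    if (typeof parsed !== "object" || parsed === null) {
      return { ok: false, error: "config_parse" };
    }
    const record = parsed as Record<string, unknown>;
    const rawUrl = record["base_url"];
    const rawKey = record["api_key"];
    return {
      ok: true,
      baseUrl: typeof rawUrl === "string" && rawUrl !== "" ? rawUrl : DEFAULT_LOCAL_BASE_URL,
      apiKey: typeof rawKey === "string" ? rawKey : "",
    };
  } catch {
    // Malformed JSON becomes a typed error instead of an exception.
    return { ok: false, error: "config_parse" };
  }
}

// Strips the optional `local:` prefix; bare model names pass through unchanged.
function stripLocalPrefix(modelId: string): string {
  const prefix = "local:";
  return modelId.startsWith(prefix) ? modelId.slice(prefix.length) : modelId;
}
```

Returning a tagged result keeps a malformed secret from throwing inside the dispatch path, which is the property the commit calls out.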
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg <7xshadowx7@gmail.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* feat(grok): Level 3 shim — full reviewer dispatch (happy-path unverified)

Promotes Grok Build from Level 2 (consumer-only) to Level 3 (full reviewer shim). Chorus can now dispatch to grok-build as a doer or reviewer in any template.

What's verified (empirically):
- Detection, headless-mode invocation pattern (`grok -p ... --output-format streaming-json --yolo --max-turns 1`), error event shape, exit-code semantics
- Failure path: free-tier auth produces clean quota_exhausted (SuperGrok Heavy subscription required) → voice auto-disables after N strikes
- All UI surfaces (model boxes, template-editor lineage picker, run-page participant card, cli-status-panel, onboarding picker, connect orchestrator)

What's specced but not run live (needs SuperGrok Heavy):
- Happy-path streaming-json text/end event parsing (followed `~/.grok/docs/user-guide/13-headless-mode.md` spec)
- Token/cost accounting — Grok doesn't surface usage in end event; estimateCostUsd returns 0

New files:
- src/daemon/agents/grok.ts — shim with `--max-turns 1` headless args
- src/daemon/agents/parsers/grok.ts — streaming-json + stderr parser
- tests/grok-parser.test.ts — 18 cases covering happy / error / robustness

Lineage sweep (xai daemon lineage was already a legacy alias to opencode — uses fresh `grok` daemon lineage to avoid colliding with that mapping; old YAML with `lineage:xai` still routes to opencode):
- Lineage / CliLineage / ReviewerLineage / DaemonLineage / UILineage
- LINEAGE_LABEL / LINEAGE_DOT / UI_LINEAGE_* / UI_LINEAGE_BRAND
- UI_LINEAGE_AVAILABLE_MODELS.grok = ['grok-build']
- UI_LINEAGE_DEFAULT_MODEL.grok = 'grok-build'
- template-schema lineageEnum + reviewerLineageEnum
- DB voices row schema (additive — old rows still validate)
- phase-editor LINEAGES + DAEMON_TO_COCKPIT_LINEAGE
- template-dialog COCKPIT_TO_DAEMON + DAEMON_TO_COCKPIT + DAEMON_DEFAULT_MODEL + FALLBACK_LINEAGES
- cli-status-panel + live-run-real helpers
- error-detector auth-prompt regex (SuperGrok signature on its own branch ABOVE the generic auth regex — classifies to quota_exhausted, not auth_invalid)

Voice seeding: grok-cli registered in SINGLE_MODEL_CLIS — auto-creates the grok-cli voice (id=grok-cli, lineage=grok, model_id=grok-build) on first daemon boot when the binary is detected.

Auth flow: ~/.grok/auth.json file probe OR GROK_CODE_XAI_API_KEY env short-circuit. Both verified in tests/cli-precheck.test.ts. Daemon won't spawn grok without one or the other present — prevents the browser-OAuth flow from hanging headless dispatch.

Total tests: 821 → 842 (+21).

Adapted from upstream chorus-codes/chorus#46 (f9dfba5). Conflicts resolved by taking the union of fork's `local`-extended enums and upstream's `grok`-extended enums (every Record / z.enum had to be extended in both dimensions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* fix(cli-precheck): macOS Keychain dual-probe — also check "Claude Code" service

Claude Code v2.x stores OAuth credentials under two service names depending on the auth flow:
- `Claude Code-credentials` — Pro/Max OAuth via `claude login`
- `Claude Code` (no suffix) — API-key auth + some Console-account flows

The previous single-service probe regressed to auth_missing for API-key users on darwin. Refactor hasDarwinKeychainEntry to accept string | string[], iterate candidates, short-circuit on first match. Each probe stays bounded to 1.5s so a misconfigured keychain can't stall every spawn.

Refs upstream issue #38 / commit 716fa3a.
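
The candidate iteration described in the Keychain commit can be sketched roughly as follows. This is an illustration only: `ServiceProbe` and `hasKeychainEntry` are hypothetical names, and the actual keychain lookup (with its 1.5s bound) is replaced by an injected function so the sketch runs on any platform.

```typescript
// Hypothetical probe type: returns true when a keychain entry exists for the
// given service name. A real probe would query the macOS keychain with a
// short timeout; here it is injected by the caller (assumption).
type ServiceProbe = (service: string) => boolean;

// Accept one service name or a list, try each in order, and stop at the
// first match so later candidates are never probed unnecessarily.
function hasKeychainEntry(
  services: string | string[],
  probe: ServiceProbe,
): boolean {
  const candidates = Array.isArray(services) ? services : [services];
  for (const service of candidates) {
    if (probe(service)) {
      return true; // short-circuit on first match
    }
  }
  return false;
}
```

Calling it with `["Claude Code-credentials", "Claude Code"]` covers both auth flows named above while keeping the single-string call shape working.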
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: PR review — local in voices enum, AGENT_TO_LINEAGE for grok/local, separate cred-precheck vs semaphore bypass

Addresses bot review on PR #3:
- Sourcery P2 (src/lib/db/voices.ts): VoiceRowSchema and VoiceUpsertInput only allowed `grok` in the new-lineage slot; `local` voices upserted via the (future) Local LLM connect flow would have failed Zod validation at runtime. Add `local` to both the enum and the union.
- Codex P2 (src/app/api/run-artifacts/[chatId]/route.ts + src/app/runs/[runId]/page.tsx): AGENT_TO_LINEAGE did not map `grok-cli` → `grok` nor `local` → `local`, so a real Grok or Local participant directory (`reviewer-grok-cli-N`, `reviewer-local-N`) resolved to a bogus lineage and rendered as an unbranded extra card while the placeholder slot stayed pending.
- Codex P2 (src/daemon/agents/index.ts + src/daemon/runner/{doer,reviewer}-driver.ts + src/lib/settings/concurrency.ts): the daemon used a single predicate `isHttpDispatchedShim` for two unrelated decisions — bypassing the CLI-credential precheck AND bypassing the local-CLI semaphore. That was safe for OpenRouter (truly remote) but wrong for the Local LLM shim, whose default endpoint is Ollama on 127.0.0.1: N concurrent reviewers + a doer can thrash VRAM/RAM on consumer hardware. Split into `isHttpDispatchedShim` (kept for cred-precheck bypass) and `bypassesLocalCliSemaphore` (only openrouter). Add `grok-cli` and `local` to CLI_LINEAGES with conservative per-CLI defaults (grok-cli matches gemini at 2; local defaults to 1, bump in /settings if your endpoint multiplexes).

Tests: 845 pass (unchanged), typecheck clean.

* fix: PR review — CodeRabbit pass (docs/Grok level, init+quickstart+local edges, regex, tests)

Addresses CodeRabbit's first batch of review comments on PR #3:
- docs/integrating-a-new-cli.md: contradictory level for Grok — line 3 said "detection-only", line 15 said level 2, line 302 said level 3.
  Normalize to level-3 (the shim ships in this PR) and note that the level-2 orchestrator coexists for the consumer-side wiring.
- src/cli/commands/init.ts: `--connect grok` was rejected because the local Name union, ALL_NAMES list, and the `--connect` option help text omitted 'grok' even though detection labels and OrchestratorName already accepted it. Adding 'grok' to all three.
- src/cli/commands/quickstart.ts: the "install one of …" guidance printed when no CLIs are detected still listed only 5 — extend to Grok CLI to match the dispatchable set.
- src/daemon/agents/local.ts:
  * Empty `base_url` (e.g. user saved settings with an empty box) was passed through `??` as the URL and surfaced as an opaque fetch error; treat empty / whitespace-only as unset and fall back to DEFAULT_BASE. Strip trailing slashes while at it.
  * Trailing SSE payload was dropped when the server closed without a final blank-line delimiter (older Ollama, some vLLM configs) — the last text_delta could silently disappear, truncating answers. Extract event-dispatch + payload-extract into local helpers and flush the residual buffer after the read loop exits.
- src/lib/cli-detect.ts: grok regex documented "name OR bare-version" but only matched the name. Add the bare-version alternative; the basename guard already prevents cross-vendor matches.
- tests/grok-parser.test.ts: 4 cases narrowed event[0] under `if (events[0].type === 'error')` without a prior `expect(...).toBe` on type — a non-error event silently skipped the inner assertions. Add explicit type expectations before the narrowing.

Tests: 845 pass (unchanged), typecheck clean.
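
To illustrate the residual-buffer flush mentioned for local.ts, here is a minimal sketch of SSE payload extraction over already-decoded text chunks. The function name `collectSsePayloads` is hypothetical, and the real shim presumably reads from a streaming response body rather than a string array; the `[DONE]` sentinel follows the usual OpenAI-compatible convention.

```typescript
// Collect the `data:` payloads from a sequence of decoded SSE text chunks.
// Events are delimited by a blank line; the final event may arrive without
// one if the server closes early, so the leftover buffer is flushed at the end.
function collectSsePayloads(chunks: string[]): string[] {
  const payloads: string[] = [];
  let buffer = "";

  // Extract every non-empty `data:` line from one raw event.
  const dispatch = (rawEvent: string): void => {
    for (const line of rawEvent.split("\n")) {
      if (!line.startsWith("data:")) continue;
      const data = line.slice("data:".length).trim();
      if (data !== "" && data !== "[DONE]") payloads.push(data);
    }
  };

  for (const chunk of chunks) {
    // Normalize CRLF on the joined buffer so a split "\r" + "\n" is handled.
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    let idx: number;
    while ((idx = buffer.indexOf("\n\n")) !== -1) {
      dispatch(buffer.slice(0, idx));
      buffer = buffer.slice(idx + 2);
    }
  }

  // Flush: without this, a trailing event lacking the blank-line delimiter
  // would be silently dropped.
  if (buffer.trim() !== "") dispatch(buffer);
  return payloads;
}
```

The last step is the one the fix is about: dropping it reproduces the truncated-answer behaviour described above.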
---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg <7xshadowx7@gmail.com>
Co-authored-by: chorus-codes <280607145+chorus-codes@users.noreply.github.com>

* feat: fold upstream contributor stack — repoPath default + CRLF persona parser (#4)

Folds the cross-platform pieces of upstream commit 781bc42 ("Contributor stack: claude orchestrator + repoPath + Windows spawn (#39)") into the fork, intentionally omitting Windows-specific hunks.

Included:
- src/mcp/tools.ts: add safeCwd() helper + default `repoPath` on create_chat to safeCwd() when caller omits it. Previously the daemon fell back to its own cwd (packageRoot), which caused relative file paths in `files: [...]` to silently resolve to the chorus install dir and miss. MCP servers spawned by Claude Code / Codex / Gemini inherit the host's cwd (= the user's project), so safeCwd() lands at the right path automatically. safeCwd() also catches ENOENT from process.cwd() and falls back to homedir.
- src/lib/personas.ts: normalize CRLF → LF in the frontmatter parser so persona .md files checked out with Windows line endings don't fail `missing YAML frontmatter`. Cross-platform safe.
- src/daemon/orchestrators/index.ts: drop stale comment block about Claude having a project-config side-effect (the fork's orchestrator long since moved to user-scope).
- tests/mcp-create-chat-repo-path.test.ts (+4 tests): cover explicit repoPath, cwd default, full-body forwarding, and ENOENT fallback to homedir.
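
A rough sketch of the `repoPath` defaulting described for src/mcp/tools.ts, under stated assumptions: the cwd and homedir lookups are passed in as functions so the example is runnable without Node-specific imports, `resolveRepoPath` is a hypothetical helper name, and this version catches any error from the cwd lookup rather than ENOENT specifically.

```typescript
// Return the current working directory, or fall back to the home directory
// when the lookup throws (for example when the cwd has been deleted).
// Both lookups are injected here; in Node they would typically be
// () => process.cwd() and () => os.homedir().
function safeCwd(getCwd: () => string, getHome: () => string): string {
  try {
    return getCwd();
  } catch {
    return getHome();
  }
}

// Hypothetical helper: an explicit repoPath from the caller wins, otherwise
// default to the (safe) working directory of the MCP server process.
function resolveRepoPath(
  explicit: string | undefined,
  getCwd: () => string,
  getHome: () => string,
): string {
  return explicit ?? safeCwd(getCwd, getHome);
}
```

Because the MCP server inherits the host CLI's working directory, the default resolves to the user's project in the common case, and explicit callers can still override it.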
Omitted (Windows-only hunks):
- src/cli/commands/update.ts (shell: win32 for npm self-update)
- src/daemon/routes/system.ts (shell: win32 for opencode probe)
- src/daemon/orchestrators/{codex,gemini,kimi}.ts (shell: win32 tweaks)
- src/lib/cli-detect.ts (SAFE_WIN_PATH regex + buildVersionSpawn)
- src/lib/voices.ts (discoverNpmPrefixes Windows shell)
- tests/cli-detect.test.ts (Windows-specific cmd.exe escape tests)

Also omitted:
- src/daemon/orchestrators/claude.ts: upstream shells out to `claude mcp add --scope user`. Fork already implements user-scope registration via direct ~/.claude.json patch (more robust — no dependency on `claude` binary in PATH at registration time, plus sweeps stale project-scoped entries). Keeping fork's version.
- tests/claude-orchestrator.test.ts: tests the upstream shell-out approach the fork doesn't use.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: pr-babysit design sketch (judge workflow + state machine)

Three-phase delivery plan for moving the PR babysitter loop out of Claude Code and into the chorus daemon. Covers GH App + webhook architecture, the judge phase (validity/category/confidence + shadow judge pattern), fix routing rules (trivial/targeted/architectural → Kimi/Sonnet/Opus), circuit breakers, merge gate, multi-PR coordination, and proposed DB schema.

Design only — no code in this commit. Five open questions left for team decisions in §"Open questions for the team".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: prime doer/reviewer prompts with AGENTS.md + CLAUDE.md

When a chat carries a repoPath, read AGENTS.md / CLAUDE.md from the repo and prepend them inside a <project_guidelines> fence (between the persona block and the phase header). Same TOCTOU + fence-breakout defences as the persona/attached-file readers: lstat-rejects symlinks, strips </project_guidelines> from contents, truncates each file at 16 KB with a visible marker.
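
The fence-breakout and truncation defences can be sketched as below. This is a simplified illustration, not the repository's reader: the symlink check is omitted, the 16 KB cap is approximated in string length rather than bytes, and `TRUNCATION_MARKER`, `stripClosingFence`, and `buildGuidelinesBlock` are hypothetical names.

```typescript
const CLOSE_FENCE = "</project_guidelines>";
const MAX_GUIDELINE_CHARS = 16 * 1024; // assumption: cap counted in characters
const TRUNCATION_MARKER = "\n[truncated]"; // hypothetical marker text

// Remove every closing fence, repeating until stable so that a nested
// payload cannot reassemble a fence after one pass of removal.
function stripClosingFence(text: string): string {
  let previous: string;
  let current = text;
  do {
    previous = current;
    current = current.split(CLOSE_FENCE).join("");
  } while (current !== previous);
  return current;
}

interface GuidelineFile {
  name: string;
  contents: string;
}

// Wrap the sanitized, size-capped file contents in a single fence.
function buildGuidelinesBlock(files: GuidelineFile[]): string {
  const parts = files.map((file) => {
    let body = stripClosingFence(file.contents);
    if (body.length > MAX_GUIDELINE_CHARS) {
      body = body.slice(0, MAX_GUIDELINE_CHARS) + TRUNCATION_MARKER;
    }
    return `# ${file.name}\n${body}`;
  });
  return `<project_guidelines>\n${parts.join("\n\n")}\n${CLOSE_FENCE}`;
}
```

The repeated strip matters for inputs such as `</project_</project_guidelines>guidelines>`, where a single removal pass would leave a valid closing fence behind.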
Lets users carry project conventions into every doer + reviewer turn by editing a file the rest of their AI tooling already reads, without adding a new chorus-specific storage layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: verify phase — exec package.json chorus.verify, judge with reviewer

Splits verify out of the StandardPhase shape into its own VerifyPhase (no doer, reviewer required). Reads `chorus.verify` from package.json, runs it via execFile in repoPath with a configurable command timeout (default 5 min, max 30 min), captures stdout/stderr/exit, and feeds the fenced artifact through the existing runReviewers flow.

Env is scrubbed to PATH/HOME/LANG/LC_ALL/NODE_ENV so a `chorus.verify` script can't leak inherited credentials into the artifact. Output streams cap at 64 KB each with a visible truncation marker. Timeout detection matches both ETIMEDOUT and (killed && SIGTERM) shapes — node sometimes only sets the signal.

The artifact lands at round-1/doer-verify-runner/answer.md so the cockpit renders it identically to a doer answer. A phase_progress event with kind="verify_command" surfaces the command-level outcome (exitCode, timedOut, duration) without needing a brand-new event type through the SSE multiplex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: TDD loop — verify failure re-prompts named feedback phase doer

Verify phase gains optional `feedbackPhase` + `maxIterations` (default 5, max 20). On verify failure, the runner re-fires the named phase's doer through `runDoer` with the verify output threaded in via `priorRoundFeedback` — same hook a normal disagree-iterate loop uses, so the doer sees the failure in the slot it already knows how to act on. Loops until verify passes or the cap is hit.
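
As a rough picture of the loop shape, the sketch below uses synchronous stand-ins for the verify command and the doer re-prompt. `runTddLoop`, `VerifyResult`, and the callback signatures are hypothetical, and it assumes `maxIterations` caps the number of doer re-prompts; the actual runner is presumably asynchronous and may count iterations differently.

```typescript
interface VerifyResult {
  passed: boolean;
  output: string; // captured command output, fed back to the doer on failure
}

// Run verify; while it fails and the cap is not reached, re-prompt the doer
// with the failing output and verify again.
function runTddLoop(
  verify: () => VerifyResult,
  rerunDoer: (feedback: string, iteration: number) => void,
  maxIterations: number = 5,
): { passed: boolean; iterations: number } {
  let result = verify();
  let iterations = 0;
  while (!result.passed && iterations < maxIterations) {
    iterations += 1;
    rerunDoer(result.output, iterations); // failure output is the feedback
    result = verify();
  }
  return { passed: result.passed, iterations };
}
```

The exit code of the verify command is the only loop signal here, which matches the idea that intermediate iterations need no reviewer pass.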
Reviewers only run on the FINAL iteration (success or final failure); intermediate iterations skip the reviewer pass because exit code is the loop signal and asking the reviewer N times to judge the same class of failure would just burn tokens.

Iterations write to round-1001, round-1002, … (TDD_ROUND_OFFSET=1000) so the synthetic TDD-loop round dirs can't collide with the original feedback phase's rounds in the same chat dir.

Misconfigured templates (feedbackPhase points at a non-existent or non-standard phase) fail loudly at the top of the verify phase, before the first command run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit DB — jobs + decisions tables, query helpers, 25 tests

Foundation for the PR-babysit autonomous review loop (Phase A of docs/pr-babysit-design.md). Two tables:
- babysit_jobs: one row per (repo, pr_number) under review, state-machine tracked (idle → judging → fixing → verifying → pushing → quiet_check → merged | escalated). UNIQUE (repo, pr_number) prevents double-registration. ended_at auto-stamps on first terminal transition and is sticky.
- babysit_decisions: append-only audit trail of every judge call. Two-stage insert — judge writes validity/category/confidence/outcome=NULL, the fix runner stamps outcome (+ commit) when it resolves. getAttemptCount drives the per-comment circuit breaker (same comment hash flagged N+ times → stop trying, escalate).

Schema lives in schema.sql for fresh-DB init AND as idempotent CREATE TABLE IF NOT EXISTS in connection.ts so DBs that pre-date this version pick the tables up on next boot (matches the personas/voices migration pattern).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit comment fetcher — gh CLI pull + author classify + sha256 hash

Pulls PR review (line-anchored) + issue (conversation) comments via `gh api`, normalizes them into the shape the babysit judge consumes:
- author classification: recognises CodeRabbit / Sourcery / Greptile / Codex by login regex; falls back to GitHub user.type=Bot + [bot] suffix for unknown bots. Humans always come through as isBot=false / bot=null.
- sha256(body) keyed so the per-comment circuit breaker can recognise "the same bot re-flagged the same exact text" across polling ticks.
- partial-data tolerance: if one of review/issue endpoints fails we still return what we got from the other (a 500 on one shouldn't blank the whole tick). Only when BOTH fail do we surface a typed reason.
- `since=` parameter so the polling loop doesn't re-hash every comment on every tick.

16 tests covering author classify, sha256 stability, gh shellout via a fake `gh` on PATH, partial-failure, auth/404 classification, since arg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit judge — classify PR-bot comments + pure action router

Reads one PR comment + diff context, asks the model to classify it as valid/invalid/partially_valid/unsure with a category from a fixed menu (apply-trivial | apply-targeted | apply-architectural | reply-disagree | reply-ack | defer-to-human) and a confidence score in [0,1].

Three pieces:
- buildJudgePrompt(comment, ctx): pure prompt construction. Includes PR metadata, comment body, anchored code snippet, and (crucially) prior decisions on the same comment hash — so re-judgements after a failed fix tilt toward reply-disagree rather than re-trying the same fix.
- judgeComment(opts): drives requestStructured against the JudgeOutputSchema, flags judgements below the 0.7 confidence threshold as belowThreshold.
- decideAction(judgement, args): PURE routing function.
  Maps (judgement, attemptCount, belowThreshold) → fix/reply/escalate/skip. State machine in babysit/runner.ts (next session) stays a thin dispatcher.

Routing rules in priority order: per-comment cap → confidence threshold → defer-to-human → reply-* → apply-* (with invalid/unsure self-correction to escalate, since acting on a comment we judged invalid is incoherent).

20 tests: prompt composition (bot vs human, snippet, prior decisions, multi-line bodies, threshold mention, full category menu) + routing table (every category × every priority rule).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit MCP tool + daemon registrar + pr-babysit preset

Phase A MCP entry point for the PR-babysit loop.
- `mcp__chorus__babysit_pr`: registers a PR for autonomous bot-comment judging. Idempotent — re-calling with the same URL returns the existing job without resetting state mid-flight.
- Daemon routes:
    POST /babysit/jobs — upsert idle job
    GET /babysit/jobs — list (filters: ?active=true, ?state=…)
    GET /babysit/jobs/:id — single job + recent decisions
- `templates/pr-babysit.yaml`: declares the judge roster (Haiku primary, Sonnet fallback). Validates against TemplateSchema as a `review_only` phase so seedBuiltinTemplates loads it cleanly; the babysit runner (next release) reads `phase.reviewer.candidates` for model selection but doesn't drive this phase through runner.ts.

13 route tests covering happy path, idempotent re-register, missing/malformed URL, state filter, job-with-decisions detail view. MCP wrapper schema added to tools.ts.

Note: src/daemon/index.ts diff is mostly Prettier rewriting single→double quotes after my import addition; the real semantic change is the two lines wiring registerBabysitRoutes into registerAll().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit GH App auth — RS256 JWT + installation token cache

GitHub App auth bundle for the PR-babysit loop.
Two-tier model: mint a 9-min RS256 JWT from the App private key (Node built-in crypto, no jsonwebtoken dep), then exchange it for a 1-hour installation token cached in-memory with a 5-min refresh buffer so we never present a token about to expire.

Config persisted as a single global row in secrets (provider=github_app, kind=gh_app, value=JSON of appId/privateKey/webhookSecret) — chorus is single-tenant, the App is owned by the daemon operator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit webhook HMAC verify helper

Pure-crypto helper for verifying GitHub's X-Hub-Signature-256 against the raw request body. Constant-time comparison via crypto.timingSafeEqual + a typed discriminated-union failure mode (missing/malformed/mismatch/secret_not_configured) so a caller can log the precise reason without leaking it back to the sender.

Not wired into a route this session — the daemon only polls — but the verifier ships with full coverage now since shipping the route later without it is a sharp footgun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit GH client — App-auth + CLI-fallback request shim

Unified GitHub-API surface for the babysit loop with two routes:
- App auth when installationId is set AND App config persisted: mint/reuse a cached installation token, retry once on 401 (key rotation), retry once on 5xx with backoff.
- gh CLI fallback otherwise. Inherits the developer's local gh auth. Bodies on this path return a typed error pointing the operator at the App-auth on-ramp — postponing the stdin plumbing until the runner actually needs to write through the CLI.

Routing is transparent to the caller; they always get back a normalized {status, body|errorText, authMode} response.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit per-PR worktree manager

Idempotent worktree lifecycle for the fix loop:
- ensureWorktree() — create or reuse ~/.chorus/worktrees/<owner>__<name>/pr-<n>/, fetching + checking out the PR head branch. Wipes a stale directory if one exists from a half-failed previous run.
- pullLatest() — fetch + reset --hard origin/<branch>. Hard reset is safe only because the runner pushes every commit it makes; documented inline so it doesn't get cargo-culted.
- removeWorktree() — git worktree remove --force + rm -rf as belt-and-suspenders for older git versions.

Branch names from webhook payloads are validated against the same shell/path-traversal rules used elsewhere in the daemon before being passed to git.

Tests use real git against a bare-remote fixture per case; mocking runAsync would leave 90% of the surface untested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit scheduler — bounded concurrency + per-job mutex

Tick driver for the babysit loop with three invariants the production daemon needs:
1. Per-job serialization. A Set keyed by job id, checked-and-set atomically inside dispatch(), prevents two ticks on the same PR from racing over the worktree, decisions table, or reply comment.
2. Bounded global concurrency. maxConcurrent (default 3) caps simultaneous jobs so judge-model quotas + gh-API pressure stay predictable as the backlog grows.
3. Clean drain on SIGTERM. stop() clears the interval AND awaits in-flight jobs so we never leave a worktree mid-commit.

Errors thrown from runJob are caught + logged so a single broken PR can't poison the whole loop. The mutex is always released in finally so the next tick can re-dispatch.

Not yet wired into daemon startup — the state-machine runner that becomes runJob ships in the next commits.
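
The three scheduler invariants can be illustrated with a small dispatcher. This is a sketch under assumptions, not the daemon's BabysitScheduler: the class name `BoundedDispatcher` and its methods are hypothetical, it tracks in-flight work in a Map of promises rather than a Set of ids, and the interval timer is left out.

```typescript
class BoundedDispatcher {
  // Job id -> in-flight promise. Doubles as the per-job mutex and the
  // concurrency counter.
  private readonly inFlight = new Map<string, Promise<void>>();
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number = 3) {
    this.maxConcurrent = maxConcurrent;
  }

  get activeCount(): number {
    return this.inFlight.size;
  }

  // Returns true if the job was started, false if it was skipped because it
  // is already running or the global cap is reached. The check and the set
  // happen in one synchronous step, so two ticks cannot both claim a job.
  dispatch(jobId: string, runJob: (id: string) => Promise<void>): boolean {
    if (this.inFlight.has(jobId)) return false;
    if (this.inFlight.size >= this.maxConcurrent) return false;

    const task = Promise.resolve()
      .then(() => runJob(jobId))
      .catch((err: unknown) => {
        // One broken job must not take down the loop.
        console.error(`job ${jobId} failed`, err);
      })
      .finally(() => {
        this.inFlight.delete(jobId); // always release the mutex
      });

    this.inFlight.set(jobId, task);
    return true;
  }

  // Await everything currently running, e.g. from a SIGTERM handler.
  async drain(): Promise<void> {
    await Promise.all(Array.from(this.inFlight.values()));
  }
}
```

A tick would call `dispatch` once per dispatchable job and simply move on when it returns false, leaving the job for a later tick.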
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit state machine — full judge→fix→verify→push→quiet loop

End-to-end driver tying the existing pieces together. One entry point (runJob) the scheduler calls per tick; per-state handlers dispatch the work and return a transition descriptor; the driver owns all babysit_jobs writes so handlers stay pure-ish.

State transitions:
    idle -> judging (provisions worktree)
    judging -> fixing (any apply-* decision)
    judging -> quiet_check (replies only, or empty)
    judging -> escalated (defer-to-human, low-confidence, cap-hit, judge spawn/parse failure)
    fixing -> verifying (doer produced file edits)
    fixing -> escalated (doer failure; mark decision escalated)
    verifying -> pushing (verify passed)
    verifying -> escalated (verify failed; no auto-retry — the per-comment cap path catches genuine stuck)
    pushing -> quiet_check (pushed; record commit sha + fix_commits++)
    pushing -> escalated (git failure)
    quiet_check -> merged (PR merged on GitHub)
    quiet_check -> judging (new bot comments arrived)
    quiet_check -> quiet_check (no change)

Supporting modules added in the same commit since they only exist to serve this state machine:
- pr-metadata.ts: tiny shim over gh client for title/head/base/default branch + PR state projection. Uses CLI fallback when no App config.
- verifier.ts: resolves npm-test → npm-typecheck → tsc --noEmit from package.json/tsconfig; truncates output at 16 KiB for DB-safe escalation reasons.
- fix-executor.ts: doer invocation via structured-output adapter returning {path, new_contents}[]. Full-file rewrites — LLMs are unreliable at diff coordinates and babysit fixes are small. Symlink-aware path safety refuses worktree escape.
- git-push.ts: stage → diff-check → commit → push helper. No --force. Default chorus-babysit identity, overridable.

Tests: 45 new tests across 5 files cover each handler's happy path + every failure-mode transition.
State-machine tests use real DB + mocked external IO; helpers use real shellouts against fixture repos where the value is in the actual git/fs behaviour.

Not yet wired: scheduler.start() at daemon boot — that's the next commit, separate from this so the integration is reviewable on its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: wire babysit scheduler into daemon lifecycle

Start a BabysitScheduler post-listen with the state-machine runner as its job handler. Tick interval defaults to 60s; sourceRepoPath defaults to the daemon's CWD (per-repo overrides will land when the registrar gains a sourceRepoPath field on the babysit job row).

CHORUS_DISABLE_BABYSIT_SCHEDULER=1 skips the start for integration tests that drive ticks manually.

SIGTERM / SIGINT trigger scheduler.stop(), which clears the interval AND awaits in-flight jobs so we never leave a worktree mid-commit on shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: babysit pause/resume route — PATCH /babysit/jobs/:id

Adds operator-driven pause/resume so a registered PR can be taken off the scheduler's tick without losing its decision history.

    PATCH /babysit/jobs/:id { action: 'pause' | 'resume' }

Pause refuses terminal states (merged, escalated) with 409 — there is nothing for the scheduler to skip once a job has ended. Resume refuses non-paused jobs with 409 to make the intent explicit; both verbs are idempotent within their valid state. Resume re-opens ended_at so the job reappears in listActive() / cockpit lists.

The scheduler already treats 'paused' as non-dispatchable (NON_DISPATCHABLE includes paused alongside merged + escalated), so this commit is just the controller — no scheduler change needed.
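
The pause/resume rules read naturally as a small pure function. The sketch below is illustrative only: `applyJobAction` is a hypothetical name, the 400 status for an unknown action and the choice of `idle` as the state after resume are assumptions not stated in the commit message, and `ended_at` handling is left out.

```typescript
type JobState =
  | "idle" | "judging" | "fixing" | "verifying" | "pushing"
  | "quiet_check" | "merged" | "escalated" | "paused";

interface ActionResult {
  status: number; // HTTP-style status code
  state: JobState; // state after the action (unchanged on refusal)
}

const TERMINAL_STATES: ReadonlySet<JobState> = new Set<JobState>(["merged", "escalated"]);

function applyJobAction(current: JobState, action: string): ActionResult {
  if (action !== "pause" && action !== "resume") {
    return { status: 400, state: current }; // assumed validation status
  }
  if (action === "pause") {
    // Nothing to skip once a job has ended.
    if (TERMINAL_STATES.has(current)) return { status: 409, state: current };
    // Pausing an already-paused job is a no-op success (idempotent).
    return { status: 200, state: "paused" };
  }
  // Resume is only meaningful for paused jobs.
  if (current !== "paused") return { status: 409, state: current };
  return { status: 200, state: "idle" }; // assumed post-resume state
}
```

Keeping the rule table pure like this makes the 409 cases easy to cover in route tests without touching the scheduler.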
8 new tests on top of the existing 13 cover: pause happy path, pause idempotency, resume happy path + ended_at clear, conflict on pause-merged + pause-escalated, conflict on resume-when-not-paused, validation on unknown action, 404 on missing job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: chorus babysit CLI — list/show/register/pause/resume

User-facing subcommand group that fronts the existing daemon routes so operators can drive the babysit scheduler without hitting the API directly.

    chorus babysit register <pr-url> [--installation-id <n>]
    chorus babysit list [--active] [--state <s>]
    chorus babysit show <id>
    chorus babysit pause <id>
    chorus babysit resume <id>

All commands talk to the local daemon over /api/v1; a connection-failed envelope surfaces the standard "start with \`chorus start\`" hint so the failure mode is consistent with the rest of the CLI. Job ids are "<owner>/<repo>#<number>" — show/pause/resume URL-encode the segment so shells that treat # as a comment don't strip it.

show prints the job header + decision log (comment id, author, validity, category, outcome) so 'why did this PR get escalated' is one command away. State labels are color-coded (terminal-red escalated, green merged, yellow paused).

src/cli/index.ts also picks up unrelated single→double-quote normalization from the project prettier hook — the only logical change there is the new registerBabysitCommand wire-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.co…
Summary

Bundles three external-contributor PRs into one squash-merge, with chorus self-review fixups on top. All credited via Co-authored-by.

A) #35 — chrisayl — fix(orchestrators): register Chorus at Claude Code user scope
- Replaces the direct `~/.claude.json` write with `claude mcp add --scope user`
- Previously wrote to `projects.<HOMEDIR>.mcpServers.chorus`, so it only surfaced when Claude Code launched from home — invisible from any real project
- … `projectDir` parameter

B) #36 — chrisayl — fix(mcp): forward repoPath from create_chat to the daemon
- Adds `repoPath: z.string().optional()` to `CreateChatSchema`; defaults to `process.cwd()`
- Previously `files: []` attachments resolved against chorus install dir and silently got skipped; doer ran with cwd=scratch instead of user repo

C) #37 — magalz — fix(windows): spawn EINVAL
- Adds `shell: process.platform === 'win32'` on every npm-CLI spawn site (8 files)
- Relaxes `SAFE_WIN_PATH` from whitelist to blacklist (excludes `& | ; " \$ < > % \0 \r \n`) — supports Unicode usernames and `@`-scoped packages

Maintainer fixups on top
- Removes `pr-description.md` committed by #37 (fix(windows): add shell flags to npm shim call sites fixing spawn EINVAL).
- Adds `^` to `SAFE_WIN_PATH` blacklist — cmd.exe escape character can break out of quoted wrap.

Chorus self-review

Ran `review-only` template on the combined diff. 8 reviewers, 5 REQUEST_CHANGES / 3 APPROVE. Convergent blocking findings:
- Missing `shell:win32` on `claude.ts` (same EINVAL bug #37 fixes everywhere else — irony of the year)
- `!` missing from `SAFE_WIN_PATH` (delayed expansion via `setlocal enabledelayedexpansion`)
- `process.cwd()` ENOENT crashes `createChat` (… `safeCwd()`)
- … `--scope user`

Non-blocking findings deferred:
- `system.ts` `opencode.path` injection (path comes from detection, lower severity — follow-up)
- `claude mcp add --scope user` version probe + JSON-patch fallback (over-engineered; clear error message is sufficient)

Test plan
- `pnpm typecheck` — clean
- `pnpm test` — 763/763 green (was 750 pre-stack, +13 new tests)
- `pr-description.md` no longer at repo root

Closes

Co-authored-by: Chris Aylott <chris.aylott@gmail.com>
Co-authored-by: João Pedro Magalhães <pedropeixotomagalhaes@gmail.com>
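
For readers unfamiliar with the whitelist-to-blacklist change, a minimal sketch of a blacklist-style path check follows. The character set is taken from the summary above (including the `^` and `!` additions); the regex name, the helper `isSafeWinPath`, and the exact pattern are assumptions, and the repository's `SAFE_WIN_PATH` may differ.

```typescript
// Characters that cmd.exe may interpret even inside a quoted argument:
// command separators, redirection, variable expansion (% and !), the ^
// escape character, quotes, and control characters. Assumed set, based on
// the PR description rather than the actual source.
const UNSAFE_WIN_PATH_CHARS = /[&|;"$<>%^!\0\r\n]/;

// Blacklist check: any path is accepted unless it contains one of the
// characters above. Unicode letters, spaces, and "@" all pass.
function isSafeWinPath(candidate: string): boolean {
  return candidate.length > 0 && !UNSAFE_WIN_PATH_CHARS.test(candidate);
}
```

Compared with a whitelist of ASCII letters and digits, this accepts paths such as `C:\Users\José\AppData\Roaming\npm\node_modules\@scope\pkg\bin.cmd` while still rejecting anything carrying a shell metacharacter.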