updates from source repo#1
Merged
Merged
Conversation
) * fix: zombie process accumulation + health endpoint timeout Three fixes for cascading failure mode in long-running deployments: 1. cli.ts: Install SIGCHLD handler to reap zombie children. Bun (like Node) only auto-reaps when a handler is registered. Without this, child processes spawned by the worker (embed batches, shell jobs, sub-agents) become zombies when they exit, accumulating in the PID table. 2. serve-http.ts: Add 5s timeout to /health endpoint's getStats() call. When the DB connection pool is saturated (e.g., from zombie processes holding phantom connections), getStats() hangs indefinitely, making the server appear dead to health checks even though it's running. 3. worker.ts: Call engine.disconnect() in the finally block after draining in-flight jobs. Releases PgBouncer connection slots immediately on shutdown rather than waiting for TCP keepalive expiry. 4. supervisor.ts + autopilot.ts: Auto-detect tini on PATH and wrap the spawned worker with it. Belt-and-suspenders with the SIGCHLD handler — tini catches children spawned by native addons that bypass the JS event loop. Zero-config: works when tini is installed, silently skips when not. * refactor(zombie-reap): extract idempotent SIGCHLD installer module Extract the inline SIGCHLD handler from cli.ts into a small dedicated module so it's testable directly without importing cli.ts (which invokes main() at module load — incompatible with bun:test imports). The new installSigchldHandler() uses a named module-level handler + includes() check to dedupe across hot-import scenarios. EventEmitter does NOT dedupe listeners by reference, so without this guard a re-import of zombie-reap.ts would accumulate handlers. _uninstallSigchldHandlerForTests() is the test-only escape hatch so test/zombie-reap.test.ts's afterAll can prevent cross-file listener accumulation in the parallel shard process — codex review #6 noted that mutating global process signal listeners in parallel pools is a leak class the isolation lint doesn't protect against. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(spawn-helpers): extract detectTini + buildSpawnInvocation; DRY-consolidate supervisor + autopilot Pulls the duplicated tini detection + (cmd, args) composition out of src/core/minions/supervisor.ts and src/commands/autopilot.ts into a single src/core/minions/spawn-helpers.ts module that both consume. Side effects: - Autopilot now resolves tini ONCE at startup instead of shelling out via execSync('which tini') on every worker respawn (every restart-after-crash path lost ~1ms + a fork to /usr/bin/which). - detectTini() passes env: process.env explicitly to execFileSync. Bun snapshots env at startup; without this, runtime PATH mutations (in tests via withEnv, or in any prod code that ever changes PATH) are invisible to `which`. Tiny correctness fix that also makes the test work. - MinionSupervisor gains an `isTiniDetected` read-only accessor so test/supervisor-tini.test.ts can assert the constructor wired tini correctly without exposing the resolved path or needing to spawn the full lifecycle. The existing worker_spawned event payload still carries {tini: true} for runtime observability (per codex review #5). Test coverage: - test/spawn-helpers.test.ts: pure function tests for both helpers (with-tini / without-tini / empty-args / detectTini smoke) - test/supervisor-tini.test.ts: constructor wiring with PATH stripped vs. PATH containing a fake-tini script in a tmpdir Both files are *.test.ts (parallel-safe) and pass scripts/check-test-isolation.sh without new allow-list entries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(serve-http): extract probeHealth() + drop /health timeout 5s -> 3s Three changes folded into one commit because they touch the same route handler and would conflict if split: 1. Extract probeHealth(engine, engineName, version, timeoutMs) as a pure exported function. Route handler becomes one branchless line: res.status(result.status).json(result.body) This makes the timeout / db-error / happy paths unit-testable directly without an Express test client and without a hardcoded 5000 literal inside the route closure. 2. Export HEALTH_TIMEOUT_MS = 3000 (was inline 5000). Fly.io default health-check timeout is 5s; at 5s exact, the orchestrator may record a request as a timeout instead of getting the 503 (race). 3s gives 2s of headroom for TCP, response framing, and clock skew. The DB-pool-saturation signal still surfaces; we just stop racing the orchestrator deadline. 3. The route handler shape change (4 try/catch lines -> 1 wrapper line) keeps response semantics identical for all three paths. Test coverage: - test/serve-http-health.test.ts: 4 cases (happy / timeout / db-error / exported constant). Calls probeHealth directly with mock engines whose getStats() resolves / rejects / hangs forever. Wall-clock per test bounded by passing timeoutMs: 100. - Existing test/e2e/serve-http-oauth.test.ts /health happy-path case still covers the Express wiring (one-line route handler is identical Express plumbing for 200 and 503). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(worker): log engine.disconnect errors during shutdown instead of swallowing Replace bare \`try { await this.engine.disconnect(); } catch {}\` with \`catch (e) { console.error('[worker] disconnect failed during shutdown:', e); }\`. Why: shutdown is best-effort, but the original silent catch was exactly the bug class the v0.26.9 D14 direction (isUndefinedColumnError swap-in on oauth-provider.ts) was created to surface. If a future regression breaks pool teardown so disconnect rejects, we'll never know without an audit log line. Two-character diff to the catch, no behavior change for the happy path. Test coverage in test/worker-shutdown-disconnect.test.ts: - Happy path: disconnect spy called once during shutdown (intercept-only, not call-through, so the shared engine stays connected for the next test in the file). - Error path: disconnect throws, error is logged with the \`[worker] disconnect failed during shutdown:\` prefix and the bare Error as second arg, and start() still resolves (no rethrow). Spy via spyOn() on the engine instance — object-level, not module-level, so R2 of scripts/check-test-isolation.sh (which forbids module-level mocks in non-serial unit tests) is satisfied. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): real-binary zombie reaping reproduction (DATABASE_URL-gated) Spawns the gbrain CLI as \`bun run src/cli.ts jobs work --concurrency 1\` against a real Postgres with GBRAIN_ALLOW_SHELL_JOBS=1, submits a shell job from the CLI side (remote: false, bypasses the v0.26.9 RCE gate), captures the worker's shell child PID from the job result, sleeps 300ms, then \`ps -o stat= -p <pid>\` to assert the process is NOT lingering as a zombie (Z state). Why this shape: - \`gbrain serve --http\` was the original plan but doesn't start a worker (only the MCP server) AND submit_job over MCP carries remote: true, which rejects shell at operations.ts:1391 (the v0.26.9 RCE-fix gate). jobs work + CLI-side submit is the only architecture that boots through cli.ts (so installSigchldHandler() actually runs) and lets a shell job execute. - \`shell\` requires absolute cwd (shell.ts:53). Payload includes cwd: '/tmp'. - ps check is run while the worker is STILL ALIVE (no PID-recycle race — worker holds the process tree, so the captured PID is meaningful). Negative control (manual, NOT in CI, documented in test header): Comment out installSigchldHandler() in src/cli.ts -> rebuild -> re-run -> expect stat=Z. Re-enable -> expect stat empty (process gone, reaped). Demonstrates the test catches the regression class without paying CI cost for a separate broken-build target. Skips: - DATABASE_URL not set (matches existing E2E pattern in helpers.ts) - Windows (POSIX-only; tini and SIGCHLD don't exist there) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(postgres-engine): make disconnect() idempotent so it doesn't clobber the module-level singleton PostgresEngine.disconnect() was non-idempotent: after the first call ended \`_sql\` and set it to null, a second call fell through to the \`else\` branch that calls db.disconnect() — which clears the GLOBAL module-level connection used by helpers.ts, the CLI main path, and every test that hadn't opted into a private pool. This bit minions-shell.test.ts and the entire downstream E2E suite when commit 671ef09 (in this branch) added engine.disconnect() to MinionWorker.start()'s finally block. Tests that did: await worker.start(); // worker disconnects (was the new behavior) await engine.disconnect(); // test cleanup; pre-fix fell through // to db.disconnect() and killed // the global connection …would silently kill the helpers.ts singleton, and the next test in the file would fail in its beforeEach with "No database connection". Fix: track \`_connectionStyle\` ('instance' | 'module' | null) on the engine and only call db.disconnect() when this engine actually owns the global. After ending an instance-pool, _connectionStyle stays 'instance' so a second disconnect() is a no-op rather than a side-effect. Test coverage: test/e2e/postgres-engine-disconnect-idempotency.test.ts pins both contracts: - instance-pool engine: second disconnect MUST NOT clobber the module singleton (the bug above). - module-singleton engine: second disconnect is a no-op (resolves cleanly, no throw). Required for: minions-shell.test.ts to keep passing alongside the worker changes on this branch. Discovered during E2E sweep after the unit-test green light. Commit 7 in this branch then walks back the worker-side disconnect entirely (engine ownership belongs to the CLI handler) but this idempotency fix stays in place as a defense-in-depth guard against any future code calling disconnect twice on the same engine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: move engine.disconnect() from worker.start() to gbrain jobs work CLI handler (engine ownership) Commit 671ef09 (the original fix in this branch) put \`await this.engine.disconnect()\` inside MinionWorker.start()'s finally block to free PgBouncer pool slots immediately on shutdown. That was the right intent on the wrong layer: the worker doesn't own the engine, the CLI handler that creates the engine does. The mismatched ownership broke every test that shares a single engine across multiple worker.start() / worker.stop() cycles: - test/e2e/minions-shell-pglite.test.ts → shared PGLite engine, second test failed with "PGLite not connected" - test/e2e/worker-abort-recovery.test.ts → 3 tests, same shape - test/e2e/minions-shell.test.ts → 3 Postgres tests broken by the second-disconnect-clobbers-global-singleton symptom (commit 6 of this branch fixed the underlying engine non-idempotency, but the worker-disconnect call was still wrong on its own) Fix: - worker.ts: remove the engine.disconnect() call. Add a comment documenting WHY the worker doesn't disconnect (ownership invariant) so a future contributor doesn't put it back. - src/commands/jobs.ts case 'work': wrap worker.start() in a try/finally that calls engine.disconnect() on shutdown. The CLI created the engine (line 631 area), so the CLI disposes of it. Disconnect failure logs to stderr with the "[gbrain jobs work] engine disconnect failed during shutdown:" prefix rather than the bare \`catch {}\` of earlier waves — matches the v0.26.9 D14 direction of preferring loud-but-best-effort over silent. Test: - test/worker-shutdown-disconnect.test.ts now pins the inverse invariant: worker.start() MUST NOT call engine.disconnect(), and the engine MUST remain queryable after start() returns. Two tests, instance-level spy, parallel-safe (no module mocking). End state: gbrain jobs work in production still frees pool slots immediately on shutdown (intent of 671ef09 preserved), tests that share an engine don't break (regression class fixed), and the engine ownership invariant is now codified in code AND in the test suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: clearTimeout in probeHealth race + platform guard SIGCHLD on Windows Two adversarial-review auto-fixes from /ship's pre-landing review pass. Both reviewers (Claude adversarial subagent + Codex adversarial) flagged the timer leak independently; Codex additionally caught the Windows crash risk. 1. probeHealth race timer leak (serve-http.ts): `Promise.race([getStats(), setTimeout(...)])` doesn't cancel the loser. Without `clearTimeout`, every fast /health request leaves a 3s pending timer in the event loop until it fires. Under sustained probe rates (Fly.io polls every ~10s, orchestrator load balancers can be much tighter), this builds a rolling backlog of timers and avoidable event loop wakeups in the hottest endpoint. Capture the timer handle, clear it in a `finally` block. No-op when the timer already fired. 2. SIGCHLD platform guard (zombie-reap.ts): SIGCHLD is POSIX-only. On Windows, `process.on('SIGCHLD', ...)` throws ENOTSUP because Windows doesn't have signals. Bun behaves the same. Without this guard, any future Windows port of a gbrain CLI tool would crash at boot before main() even runs. The zombie-reaping fix is itself POSIX-only (tini, ps, /proc), so the guard is consistent with the platform's capability set. NOT in this commit (intentionally out of scope): - Cancelling engine.getStats() when /health times out. Both reviewers noted this would need AbortController support in the engine layer which doesn't exist yet. The 503 timeout already improves on master's hang behavior; full cancellation is a follow-up. - Switching /health to a lighter probe (SELECT 1 instead of count(*) across 6 tables). Pre-existing behavior; refactoring the probe shape is wider blast radius than this branch's zombie-reaping scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.28.1) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update CLAUDE.md for v0.28.1 zombie reaping + health + engine ownership Add v0.28.1 file annotations covering: - src/core/zombie-reap.ts (new) — Layer 1 SIGCHLD reaper module - src/core/minions/spawn-helpers.ts (new) — pure detectTini + buildSpawnInvocation helpers - src/core/minions/worker.ts — engine-ownership invariant (no engine.disconnect) - src/core/minions/supervisor.ts — consumes spawn-helpers, exposes isTiniDetected - src/commands/serve-http.ts — probeHealth() + HEALTH_TIMEOUT_MS = 3000 - src/commands/jobs.ts — case 'work' owns engine lifecycle via try/finally - src/commands/autopilot.ts — resolves tini once at startup - src/core/postgres-engine.ts — disconnect() is idempotent via _connectionStyle Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Wintermute <wintermute@garrytan.com> Co-authored-by: Garry Tan <garrytan@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…es after gateway restarts (#675) * feat(recipes): add restart-sweep — detect dropped messages after gateway restarts Adds a tool to detect Telegram messages dropped during OpenClaw gateway restarts by analyzing session state patterns. Features: - Detects sessions with abortedLastRun flag (primary heuristic) - Identifies timing gaps (active before restart, silent after) - Configurable alert modes (Telegram, stdout) - Environment-based configuration - Comprehensive test suite - PII-scrubbed for public use The tool addresses webhook message loss that occurs when the gateway restarts while messages are in-flight. Unlike long-polling, webhooks cannot replay missed messages, making this detection crucial for production reliability. * feat(recipes): reshape restart-sweep into single .md recipe + harden script Reshape the directory-shaped recipes/restart-sweep/ into a single self-contained recipes/restart-sweep.md with the (fixed) script inlined as a fenced code block. The recipe loader at integrations.ts:445-485 only discovers *.md, so the directory shape was invisible. Eight script fixes: 1. Newline double-escape ('\\n' → '\n') at 8 sites 2. Hard-coded /tmp/ paths → ~/.gbrain/integrations/restart-sweep/ (honors GBRAIN_HOME); bootstrap-log path env-overridable via OPENCLAW_BOOTSTRAP_LOG 3. exec() of interpolated string → execFile with argv array (no shell) 4. Idempotency: loadAlerted/saveAlerted helpers, atomic tmp+rename, corrupt- JSON recovery, 30-day prune 5. Aggressive heuristic gated behind OPENCLAW_RESTART_SWEEP_AGGRESSIVE=1 (default OFF — false-positive prone during quiet periods) 6. Old directory shape removed 7. Env reads moved from module top-level to constructor (fixes the import- time-snapshot bug that made tests semantically bogus) 8. Cooldown layer keyed on (sessionKey, lastAlertedAt) with 6h re-alert threshold — prevents re-alerting forever when the bootstrap log is missing and restartTime is synthesized fresh each run Recipe body adds a Cron environment troubleshooting section with the wrapper-script pattern (set -a; source .env; set +a; exec node ...) plus explicit PATH= line for the cron entry. Plus a TODO line pointing at docs/guides/plugin-handlers.md as the v2 upgrade path (registered Minion handler in the openclaw repo for queue-backed idempotency). Tests: 27 bun:test cases (12 ported + 14 new + 1 sentinel-shape guard). The extractor anchors on <!-- restart-sweep:script --> sentinel and salts the tmp filename to bypass the ESM import cache. A separate test asserts the sentinel itself is present so future doc edits dropping it fail loud. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.28.3) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: sync README + CLAUDE.md for v0.28.3 restart-sweep recipe - README.md: add restart-sweep row to "Getting Data In" recipes table - CLAUDE.md: add test/restart-sweep.test.ts to the unit-test inventory - llms-full.txt: regenerated via bun run build:llms Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com> Co-authored-by: Garry Tan <garrytan@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ty gate (#674) * feat(skillpack): enhance skillify with cross-modal eval quality gate Updates skillify from v1.0.0 to v2.0.0 with the key innovation: cross-modal evaluation runs BEFORE tests (step 3) to establish quality, then tests lock in the proven-good behavior. Key changes: - 11-item checklist (was 10) - adds cross-modal eval as step 3 - Cross-modal eval uses 3 models to score output on 5 dimensions - Quality gate: all dimensions ≥ 7 average before proceeding to tests - Prevents locking in mediocrity through tests-first approach - References cross-modal-review skill for eval pipeline - Updated all gbrain-specific paths (bun test, scripts/*.ts) - Maintains compatibility with gbrain check-resolvable workflow The meta-skill for turning raw features into properly-skilled, tested, resolvable capabilities. Cross-modal eval ensures output quality before tests cement the behavior. * feat: skillify hardened via 2 cross-modal eval cycles (8.1/10) Applied top improvements from GPT-5.5 + Opus 4-7 + DeepSeek V4 Pro: - Named 3 frontier models explicitly with provider table - Inlined eval prompt template with CONTEXT param + scoring calibration - Defined aggregation math: mean >= 7 AND no single dim < 5 - Added eval receipt JSON schema - Structured 3-cycle fix loop with before/after delta tracking - Added worked example (summarize-pr, end-to-end) - Added cost guardrails (skip < 200 tokens, max 9 API calls) - Added representative input selection rule - Added SKILL.md frontmatter template (copy-paste ready) - Added Phase 0 decision gate (is this worth skillifying?) Also includes cross-modal-eval runner recipe with robust JSON parsing for LLMs that return malformed JSON (3-tier repair). * chore(recipes): remove cross-modal-eval.mjs Superseded by `gbrain eval cross-modal` (next commit). The .mjs script was the original PR's hand-rolled provider stack; the replacement reuses src/core/ai/gateway.ts so config/auth/model-aliasing comes from the canonical recipe registry instead of a parallel stack. No code references the .mjs (it was invoked by skill prose only), so this delete is independently safe to bisect through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(eval): cross-modal-eval core module + unit tests Pure-logic foundation for the new `gbrain eval cross-modal` command (wired in the next commit). All five modules are self-contained — no CLI surface, no I/O outside the receipt writer's mkdirSync. Imported from src/core/ai/gateway.ts at runtime via gwChat (no config impact at load time). Modules: - json-repair.ts: parseModelJSON 4-strategy fallback chain. Adversarial nuclear-option throws rather than fabricating scores (Q6 + Q3 in plan). - aggregate.ts: verdict logic. PASS = (>=2 successes) AND (every dim mean >= 7) AND (every dim min across models >= 5). INCONCLUSIVE when <2/3 models returned parseable scores — closes the v1 .mjs `Object.values({}).every(...) === true` empty-array silent-PASS bug (Q2 + Q3). - receipt-name.ts: receipt filename binds (slug, sha8 of SKILL.md) so `gbrain skillify check` can detect stale audits (T10 in plan). - receipt-write.ts: thin wrapper over writeFileSync that auto-mkdirs the parent directory. Standalone module because gbrainPath() does NOT auto-mkdir (T5 plan correction — Codex caught this). - runner.ts: orchestrator. Promise.allSettled across 3 slots per cycle; up to 3 cycles; stops early on PASS or INCONCLUSIVE. Default slots: openai:gpt-4o / anthropic:claude-opus-4-7 / google:gemini-1.5-pro. estimateCost() exports a small per-model pricing table (drifts; refresh alongside model-family bumps). Tests (32 cases total, all green): - json-repair.test.ts: 10 cases (clean JSON, fences, trailing commas, single quotes, embedded newlines, mismatched braces, nuclear-option success + adversarial throws, empty input, numeric-shorthand scores). - aggregate.test.ts: 8 cases pinning Q2/Q3/dedup. The 0-of-3 INCONCLUSIVE case is the regression guard for the v1 silent-PASS bug. - cli.test.ts: 12 cases on receipt-name / receipt-write / GBRAIN_HOME isolation. Uses withEnv() helper for env mutation (R1 isolation rule). Verifies bisect-clean: typecheck passes, all 32 unit cases green. The runner.ts import of gateway.chat() is dead until commit 3 wires the CLI surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(eval): wire `gbrain eval cross-modal` CLI subcommand User-facing surface for the multi-model quality gate. Three different- provider frontier models score the OUTPUT against the TASK on a 5-dim rubric. Verdict drives exit code: 0 PASS, 1 FAIL, 2 INCONCLUSIVE (<2/3 models returned parseable scores per Q3 in plan). Wiring touches three files: - src/commands/eval-cross-modal.ts (new, ~290 lines) CLI handler. Self-configures the AI gateway from loadConfig() + process.env so it works without `gbrain init` (the cli.ts no-DB branch bypasses connectEngine()). Defaults: cycles=3 in TTY, cycles=1 in non-TTY (T11 partial cost guardrail — limits scripted bulk spend; full --budget-usd hard cap is a v0.27.x TODO). Prints estimated max-cost-per-cycle to stderr before each run. Uses gbrainPath('eval-receipts') for receipt directory. - src/cli.ts (no-DB dispatch branch, 5-line addition) Special-cases `eval cross-modal` BEFORE the existing handleCliOnly path that requires connectEngine(). Mirrors the `dream` no-DB pattern but doesn't even attempt the connect — the command never touches the DB. New users can run the gate before `gbrain init` (T3 in plan). - src/commands/eval.ts (sub-subcommand dispatch) Adds `cross-modal` alongside `export`/`prune`/`replay`. The cli.ts branch takes precedence in the user-facing path; this branch only fires when callers re-enter runEvalCommand with an existing engine. Engine is intentionally unused — the handler self-routes. - test/e2e/cross-modal-eval.test.ts (new, 4 cases) Mocked-fetch E2E. Lives at test/e2e/* (NOT *.serial.test.ts) per plan T8: test/e2e/* is exempt from the test-isolation lint and already runs serially via scripts/run-e2e.sh, so the mock.module() call doesn't need a quarantine rename. Cases: PASS / FAIL (mean<7) / FAIL (min<5 — Q2 floor) / INCONCLUSIVE (2 mock 5xx — Q3 contract). The runner from commit 2 now has live callers. typecheck passes; the 4 E2E cases all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(skillify): add informational 11th item (cross-modal eval) Promotes the skillify contract from 10 to 11 items. The 11th item (cross-modal eval) is `required:false` per T7 in the plan — a missing or stale receipt surfaces in the audit output but does not fail the gate. Existing skills keep their current required-score; the bump is additive, not breaking. Changes: - src/commands/skillify.ts Header jsdoc updated 10-item -> 11-item. No code-flow changes. - src/commands/skillify-check.ts (the per-skill audit; not src/commands/skillpack-check.ts which is a different command — plan T6 corrected the conflation in the original plan) New informational item at position 11. Reuses findReceiptForSkill() helper from src/core/cross-modal-eval/receipt-name.ts to detect: * found — receipt matches current SKILL.md sha-8 * stale — receipt exists for an older SKILL.md * missing — no receipt yet Audit output cases pass through to existing pretty/JSON formats. - src/core/skillify/templates.ts Scaffolded SKILL.md now includes a "Phase 3: Cross-modal eval (informational)" section with copy-paste `gbrain eval cross-modal` invocation, pass criteria, and receipt-naming convention. Helps new skill authors discover the gate. - test/skillify-scaffold.test.ts New T9 case verifies the scaffold emits the Phase 3 section, points at the correct command, documents the receipt path, and appends exactly one resolver row. Replaces the original plan's `gbrain skillify scaffold demo-eleven` shell verification (which Codex caught as invalid + repo-mutating). Verifies: typecheck passes; scaffold test 19/19 (was 18, +1 T9 case). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: skillify v1.1.0 + cross-modal-eval references Documentation catches up with the new behavior shipped in commits 1-4. - skills/skillify/SKILL.md (1.0.0 -> 1.1.0) Full rewrite. Frontmatter version is additive (T7 in plan); the 11th item is informational, not breaking. Phase 3 now points at `gbrain eval cross-modal` with copy-paste invocation, default slot table, pass criteria, receipt-naming convention, cycles + cost guardrails (T11 partial cap), provider configuration via the AI gateway, and the cycle-1/2/3 fix loop. Adds Output Format section (skills-conformance.test.ts requires it). Drops the original `(or lib/cross-modal-eval.ts)` parenthetical (Q5 plan correction — that path never existed). - skills/cross-modal-review/SKILL.md Adds 4-line Relationship section pointing at `gbrain eval cross-modal` (D3 plan reciprocal). Distinguishes the manual second-opinion gate (this skill) from the automated multi-model score-and-iterate gate (the new command). - CLAUDE.md Key Files entries for src/commands/eval-cross-modal.ts and the five new src/core/cross-modal-eval/* modules. Commands list gains the `gbrain eval cross-modal` entry under v0.27.x. Notes the non-TTY default 1-cycle behavior + the gbrainPath('eval- receipts') resolution. - TODOS.md Four v0.27.x follow-ups filed under a new "cross-modal-eval" section: full --budget-usd cap (T11 follow-up), subagent integration (recovers cross-process rate-leases T4 deferred), skill adoption telemetry (revisit T7=C with data after 30 days), docs/cross-modal-eval.md user guide. - llms-full.txt Regenerated via `bun run build:llms` to match the CLAUDE.md edits — sync guard at test/build-llms.test.ts requires this. Verifies: typecheck passes; skills-conformance 199/199 green; build-llms 7/7 green; full unit fast loop 3861/3861 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.28.4) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com> Co-authored-by: Garry Tan <garrytan@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bun-link foot-gun (#697) * fix(engines): pre-add v0.20 + v0.26.3 forward-reference columns in bootstrap The forward-reference bootstrap (PostgresEngine + PGLiteEngine applyForwardReferenceBootstrap) covered v0.18 + v0.19 + v0.26.5 columns but missed two later groups. Brains upgrading from v0.14-era to current master crash before the migration ladder runs: 1. v0.20 Cathedral II — content_chunks.search_vector, parent_symbol_path, doc_comment, symbol_name_qualified. `CREATE INDEX idx_chunks_search_vector` and `CREATE INDEX idx_chunks_symbol_qualified` in schema.sql/PGLITE_SCHEMA_SQL crash with "column search_vector does not exist" / "column symbol_name_qualified does not exist". 2. v0.26.3 — mcp_request_log.agent_name, params, error_message. `CREATE INDEX idx_mcp_log_agent_time ON mcp_request_log(agent_name,...)` crashes with "column agent_name does not exist". Reproduces deterministically on a v0.13/v0.14 brain upgraded straight to current master. The user hits the wall before any of v15-v36 can run. Both engines now probe for these columns and pre-add them via `ALTER TABLE ADD COLUMN IF NOT EXISTS` before SCHEMA_SQL runs. Migrations v26, v27, v33 still run later via runMigrations and remain idempotent (they handle backfill on top of the bootstrap-added columns). Test coverage extended in test/schema-bootstrap-coverage.test.ts: REQUIRED_BOOTSTRAP_COVERAGE now lists 6 new forward references; the strip-and-rebuild block drops the corresponding indexes/triggers so the test exercises a brain that pre-dates v0.20 + v0.26.3 migrations. Repro: brain on schema v13/v14 + run `gbrain init --migrate-only` against current master → fails. With this patch → succeeds; ladder runs to v36. * fix(engines): pre-add v0.27 subagent_messages.provider_id in bootstrap PR #682 covered v0.20 (chunks) + v0.26.3 (mcp_request_log) but missed v0.27's subagent_messages.provider_id. The composite index `idx_subagent_messages_provider ON subagent_messages (job_id, provider_id)` in PGLITE_SCHEMA_SQL crashes on brains pinned at v0.18-v0.26 because provider_id is the SECOND column in the composite — array-extraction patterns that scan only first-column references miss it entirely. This is the wedge surfaced by issue #670 (v0.22.0 → v0.27.0 init --migrate-only crashes with "column 'provider_id' does not exist") and contributing to #661/#657. Both engines now probe for subagent_messages.provider_id and pre-add the column via ALTER TABLE ADD COLUMN IF NOT EXISTS before SCHEMA_SQL runs. Migration v36 (subagent_provider_neutral_persistence_v0_27) still runs later via runMigrations and remains idempotent. Note on the test side: REQUIRED_BOOTSTRAP_COVERAGE is hand-maintained and just gained a v0.27 entry. v0.28.5's Step 3 replaces this array with a SQL parser that auto-derives coverage from PGLITE_SCHEMA_SQL, including composite-index columns. This commit is the targeted follow-up to PR #682's cherry-pick; A2's parser closes the class permanently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cli): conditional schema-init on connect (closes #651) Adds `hasPendingMigrations(engine)` next to `runMigrations` in migrate.ts: single getConfig('version') probe, returns true when current < LATEST_VERSION, defensively returns true on getConfig failure (treats wedged-config as pending). `connectEngine` in cli.ts now wraps `engine.initSchema()` in a probe gate: short-lived CLI calls (gbrain stats, query, doctor, etc.) on already-migrated brains skip the bootstrap-probe + SCHEMA_SQL replay + ledger-check entirely. Wedged brains still auto-heal — the probe says "yes pending" and initSchema runs as before. Building on oyi77's investigation in PR #652. Same correctness as #652's unconditional initSchema-on-every-connect, but no perf regression on the hot path. Failure non-fatal: if probe or init throws, log a hint and let subsequent operations surface the real error in context. Test coverage in test/migrate.test.ts: 3 cases covering fully-migrated (false), version-rewound (true), and missing-version-config (defensive true). Pairs with v0.28.5's X1 (post-upgrade auto-apply) — the upgrade path runs initSchema explicitly while every other code path that goes through connectEngine gets the cheap probe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(upgrade): post-upgrade auto-applies pending schema migrations (X1) Prior behavior: `gbrain upgrade` → `gbrain post-upgrade` → `apply-migrations` only WARNs at apply-migrations.ts:296-302 when schema version is behind LATEST_VERSION, telling the user to run `gbrain init --migrate-only`. 11 wedge incidents over 2 years have proven users don't read that WARN — they file an issue instead. This commit makes `runPostUpgrade` explicitly call `engine.initSchema()` after the orchestrator migration pass, mirroring `init --migrate-only`'s flow. Side-effect: `gbrain upgrade` now walks away with a healthy brain in the cluster A wedge case (#670, #661, #657, #651, #625, #615, #609). Defensive: wrapped in try/catch so a connection or DDL failure falls back to the existing user-facing WARN. The hint to run `gbrain init --migrate-only` is preserved as the manual escape hatch. Pairs with v0.28.5's A1 (hasPendingMigrations probe in connectEngine): the upgrade path runs initSchema explicitly here, while every other code path that goes through connectEngine gets the cheap probe. Codex outside-voice review caught this gap during plan review: "the plan still does not prove `upgrade` will actually run schema migrations." This is the load-bearing fix that makes v0.28.5's headline outcome ("run upgrade, brain works") literally true for cluster A. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(bootstrap): auto-derive coverage from PGLITE_SCHEMA_SQL (A2) Replaces the hand-maintained REQUIRED_BOOTSTRAP_COVERAGE assertion with a SQL-parser-backed structural check. The new test: 1. parseIndexColumnReferences(PGLITE_SCHEMA_SQL) extracts every column referenced by every CREATE INDEX — including composite-index second and third columns. Codex outside-voice review caught that earlier first-col-only patterns missed v0.27's `idx_subagent_messages_provider ON subagent_messages (job_id, provider_id)`, which is exactly how the v0.28.5 wedge happened. 2. parseBaseTableColumns(PGLITE_SCHEMA_SQL) extracts every column declared in CREATE TABLE bodies (including via ALTER TABLE ADD COLUMN inside the schema blob). 3. parseAlterAddColumns(pglite-engine.ts source) extracts every column that applyForwardReferenceBootstrap adds. 4. Static contract: every (table, column) pair from step 1 must appear in either step 2 or step 3. Otherwise the test fails loud, names every uncovered pair, and points at the bootstrap function for the fix. Self-updating: any future CREATE INDEX added to PGLITE_SCHEMA_SQL on a column that bootstrap doesn't yet provide fails this test at PR time. No human required to remember to update an array. Closes the 11-incident wedge class identified in CLAUDE.md (#239, #243, #266, #357, #366, #374, #375, #378, #395, #396). Helper parsers also have their own unit tests covering composite-index second columns, function-wrapped columns (lower(col)), HNSW operator-class suffixes (vector_cosine_ops), and ALTER TABLE column extraction. Existing REQUIRED_BOOTSTRAP_COVERAGE-based tests preserved as a coarse-grained lower bound; the new parser-based test is the load-bearing structural gate going forward. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: support Voyage 2048d schema setup * fix: harden Voyage schema templating * feat: Voyage 4 embedding support + doctor eval - Add voyage-4-large/4/4-lite/4-nano + domain models to Voyage recipe - Fix AI SDK compatibility: strip encoding_format (Voyage rejects 'float'), patch response to add prompt_tokens from total_tokens - Add embedding_provider doctor check: live smoke test verifying model, API key, dimensions, and DB column alignment - Add embedding provider eval qrels for post-migration quality testing Closes: Voyage AI integration for gbrain embedding pipeline * fix: adaptive embed batch sizing for Voyage token limits Voyage's tokenizer is 3-4x denser than OpenAI tiktoken, causing batches of 50+ texts to exceed the 120K token-per-batch limit even when DB token counts (from tiktoken) suggest they'd fit. Changes: - Add max_batch_tokens to EmbeddingTouchpoint type (provider-declared limit) - Set Voyage recipe to 120K token limit - Gateway embed() now auto-splits batches using conservative char-to-token estimate (1:1 ratio, 80% budget utilization) - On token-limit errors, embedSubBatch recursively halves and retries (down to single-text batches before giving up) - Reduce embedding.ts BATCH_SIZE from 100 to 50 as a secondary guard - Add tests for batch splitting logic and error pattern matching Fixes infinite retry loops where the same oversized batch would fail repeatedly because WHERE embedding IS NULL re-fetches identical rows. * fix(init): error on existing-brain dim mismatch + embedding-migration recipe Adds A4 hard-error path: when `gbrain init --embedding-dimensions N` is run against an existing brain whose `content_chunks.embedding` column is a different `vector(M)`, init exits 1 with an inline four-step ALTER recipe and a pointer to docs/embedding-migrations.md. This kills the silent-corruption pattern surfaced by issue #673: the v0.27 schema seeded `('embedding_dimensions', '1536')` regardless of the flag, so users got a config saying 768 but a column at 1536 — first sync write blew up with "expected 1536, got 768." A4's contract: 1. Connect to engine BEFORE saveConfig so we can read the live column type 2. If column exists AND dim != requested, exit 1 (loud failure) 3. If column doesn't exist (fresh init) OR dim matches, proceed normally Recipe in docs/embedding-migrations.md (and inlined in init's error output) covers all four destructive steps codex's plan-review caught: 1. DROP INDEX IF EXISTS idx_chunks_embedding (HNSW won't survive ALTER) 2. ALTER TABLE content_chunks ALTER COLUMN embedding TYPE vector(N) 3. UPDATE content_chunks SET embedding = NULL, embedded_at = NULL 4. CREATE INDEX HNSW *only if N <= 2000* (pgvector cap) Step 4 is conditional: dims > 2000 (e.g. Voyage 4 Large 2048d) cannot be HNSW-indexed in pgvector; the recipe explicitly says "Skip reindex" in that case so the user doesn't paste a CREATE INDEX that crashes. Helper `readContentChunksEmbeddingDim` and message builder `embeddingMismatchMessage` live in src/core/embedding-dim-check.ts so doctor 8b (next commit) can reuse the same source of truth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gateway): correct dim-mismatch error to point at manual ALTER recipe (#672) Previous error message recommended running `gbrain migrate --embedding-model … --embedding-dimensions …`, but `gbrain migrate` only handles engine migration (postgres ↔ pglite), not embedding reconfiguration. Following that hint produced a different error and confused users further. New message: - Names the actual options: change models OR migrate the existing brain - Inlines a one-line quick recipe (DROP INDEX → ALTER → UPDATE NULL → config set → embed --stale) - Points at docs/embedding-migrations.md (added in commit 306fc0e) for the full four-step recipe with HNSW conditional handling Closes #672. Note: #671 (config show hides embedding_model / dimensions) appears to be already fixed on master — `Object.entries(loadConfig())` in config.ts:24 correctly enumerates all keys including embedding_*. Will close #671 with that note when shipping v0.28.5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(types): doctor 8b uses portable executeRaw + Voyage fetch-shim cast #665's doctor 8b dim-probe used `engine.sql\`...\`` directly (Postgres template literal) which doesn't typecheck against the BrainEngine interface (only PostgresEngine has the .sql getter; PGLite does not). Refactored to use `readContentChunksEmbeddingDim` from src/core/embedding-dim-check.ts — same helper init's A4 hard-error path uses, runs portably on both engines. #680's Voyage fetch-shim passes a custom fetch handler to `createOpenAICompatible` for the encoding_format + prompt_tokens normalization. The SDK accepts the field at runtime but the typed parameter on the pinned version doesn't expose it. Cast to the parameter type so the shim ships without a type error. Both fixes are mechanical cleanup of cherry-picked PRs that didn't typecheck against current master's stricter shape. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cli): mark cli.ts executable so bun-linked installs work `package.json` declares `"bin": { "gbrain": "src/cli.ts" }`, and bun's linker creates `~/.bun/bin/gbrain` as a symlink to the file. The shebang `#!/usr/bin/env bun` works only when the target file is executable — otherwise bun runs it as a script (because it sees the script via the shebang interpreter), but executing the symlinked target itself fails: $ ls -la ~/.bun/bin/gbrain lrwxrwxrwx ... -> ../install/global/node_modules/gbrain/src/cli.ts $ ~/.bun/bin/gbrain --version /opt/homebrew/bin/bash: line 1: /Users/brandon/.bun/bin/gbrain: Permission denied This bites the postinstall hook that calls `gbrain apply-migrations` (masked by the `||` fallback) and any subprocess that invokes the binary by absolute path (e.g., subagent_messages migration v0.16's `execSync('gbrain init --migrate-only', ...)`). Setting the mode in-tree to 755 fixes both. No content change. * test(ci): guard against src/cli.ts mode-bit regression (cluster C) Cluster C cherry-pick (#683) restored the executable bit on src/cli.ts. This commit adds scripts/check-cli-executable.sh that asserts the git index mode is 100755 and wires it into `bun run verify` (and check:all). Why a CI guard: bun-link installs symlink to src/cli.ts directly. If the mode bit ever regresses to 100644, the very first `gbrain --version` fails with `permission denied` — the exact symptom that motivated #683. This guard runs in <100ms, fast enough for the inner verify loop. Failure mode: clear instructions on what command to run to fix (`chmod +x src/cli.ts && git add --chmod=+x src/cli.ts`) plus a pointer back to issue #683 so future maintainers know why the guard exists. Note: darwin and linux only. Windows preserves the git-stored mode regardless of filesystem chmod, so the index-mode check works the same on every platform CI uses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(upgrade): detect bun-link, warn on npm squatter (#656, #658) Rewrites detectInstallMethod() in src/commands/upgrade.ts:247 with three layered signals per v0.28.5 plan cluster D + codex finding C1: 1. bun-link signal (closes #656): when argv[1] is a symlink, walk up from realpath(argv[1]) up to 6 levels looking for a .git/config whose contents include `garrytan/gbrain` (case-insensitive substring). Returns 'bun-link'. Best-effort: forks, tarballs, and detached source trees fall through to the existing chain. 2. canonical bun authenticity check (closes #658 detection half): when the install lives in node_modules, read package.json and verify repository.url contains `garrytan/gbrain` OR src/cli.ts coexists (squatter ships compiled binary, not source). On 'suspect' verdict, print printSquatterRecovery() — names both git-clone AND release-binary recovery paths so users without a local clone can still recover. 3. Source-marker fallback inside (2). Codex flagged this is spoofable by a determined squatter; accepted — best-effort warning, not assertion. The structural fix is publishing under @garrytan/gbrain (tracked v0.29 follow-up). The squatter's `name: gbrain` field doesn't disambiguate (codex caught this in plan review of my original heuristic). repository.url is the field a careless squatter is least likely to set correctly; src/cli.ts presence is the secondary signal. bun-link installs return 'bun-link' from the switch in runUpgrade, which prints the source-clone upgrade path (`git pull && bun install && bun link`) instead of trying `bun update gbrain` which doesn't apply. README updated with the corresponding "DO NOT use `bun add -g gbrain`" callout naming both #658 and the v0.29 scoped-name plan. Tests in test/upgrade.test.ts cover return-type extension, bun-link signal shape, classifyBunInstall's two-signal check, and the recovery message contents. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28.5 release: PGLite upgrade wedge + embedding dim corruption + bun-link foot-gun Fix wave bundling 9 community PRs to unwedge users stuck since v0.27. Cluster A — PGLite upgrade wedge (#670, #661, #657, #651, #625, #615, #609): - Bootstrap now covers v0.20+v0.26.3+v0.27 forward references (both engines) - hasPendingMigrations() probe gates initSchema() in connectEngine - Post-upgrade auto-applies pending schema migrations (X1) - SQL-parser-backed bootstrap coverage replaces hand-maintained array (A2) Cluster B — Embedding dim corruption (#673, #672, #666, #640): - Schema templating cascade fixed end-to-end (#641 from @100yenadmin) - gbrain doctor 8b live embedding-provider probe (#665) - Voyage adaptive batch sizing for 120K-token cap (#680) - gbrain init A4 hard-error on existing-brain dim mismatch - docs/embedding-migrations.md with conditional-HNSW four-step recipe - #672 misleading migrate-suggestion error replaced with inline recipe Cluster C — CLI exec bit (#683, dupe of #655): - src/cli.ts mode 100644 → 100755 (#683 from @brandonlipman) - scripts/check-cli-executable.sh CI guard against future regression Cluster D — bun add -g foot-gun (#656, #658): - 3-signal detectInstallMethod rewrite (bun-link, repo.url, source-marker) - Loud-red recovery message names source-clone AND release-binary paths - README "DO NOT use bun add -g gbrain" callout Contributors: @brandonlipman (#682, #683), @mdcruz88 (#668), @ChenyqThu (#627), @alan-mathison-enigma (#610), @oyi77 (#652 building block), @abkrim (#655), @100yenadmin (#641). VERSION 0.27.0 → 0.28.5 package.json 0.27.0 → 0.28.5 schema-embedded.ts regenerated via bun run build:schema llms-full.txt regenerated via bun run build:llms Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): v0.28.5 fix-wave end-to-end coverage PGLite-only E2E covering the three regression scenarios v0.28.5 was shipped to fix: 1. cluster A — pre-v0.20 brain (missing v0.20 + v0.26.3 + v0.27 columns) re-runs initSchema cleanly. Strips the column set v0.28.5's bootstrap claims to restore (search_vector, parent_symbol_path, doc_comment, symbol_name_qualified, agent_name, params, error_message, provider_id), resets the version row to 13, then re-runs initSchema. Asserts every column comes back AND version reaches LATEST_VERSION. Closes the gap that pre-v0.28.5 produced 11 wedge incidents. 2. cluster B — fresh init at non-default dims templates the column correctly (768d AND 2048d cases). The 2048d case explicitly verifies idx_chunks_embedding is NOT created (codex finding #8 — pgvector's HNSW cap is 2000). 3. A4 — existing-brain dim mismatch helper produces a recipe that inlines all four steps (DROP INDEX, ALTER TYPE, NULL, conditional reindex). Validates the conditional CREATE INDEX HNSW for dims <= 2000 AND its omission for dims > 2000. The recipe a user copy-pastes won't crash them on Voyage 4 Large. Plus a hasPendingMigrations() lifecycle test covering the four states (fresh / migrated / rewound / re-applied) — pairs with the unit test in test/migrate.test.ts but exercises the engine end-to-end. PGLite-only because none of these cases need real Postgres. Postgres-side bootstrap is covered by test/e2e/postgres-bootstrap.test.ts. Run: bun test test/e2e/v0_28_5-fix-wave.test.ts (no DATABASE_URL needed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: refactor embedding-dim-check.test.ts to canonical PGLite pattern Test-isolation lint (R3+R4) requires PGLiteEngine in beforeAll() context with afterAll() disconnect. Refactored to single-engine-per-file pattern; the fresh-brain test uses a one-off engine inside its own try/finally so the file-level engine stays at LATEST schema for the migrated-brain test. No behavior change to the assertions. `bun run verify` now passes clean (privacy + jsonb + progress + test-isolation + wasm + admin-build + cli-exec + typecheck). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(doctor): make 8b embedding-provider probe non-fatal (CI green) CI Tier 1 was failing on `gbrain doctor exits 0 on healthy DB` because the v0.28.5 doctor 8b check (cherry-picked from #665) pushed `status: 'fail'` in two non-fatal scenarios: 1. No API key configured (`isAvailable('embedding')` returns false) 2. Probe throws (network blip, transient 5xx, DNS, rate limit) Both are noise in CI and on offline workstations — the brain is healthy, the provider just isn't reachable from this environment. The v0.28.5 plan P1 decision called for non-fatal-on-offline behavior: > Doctor 8b probes live every run (taken as-is). Non-fatal on network > failure (warns rather than errors); silently skipped when no API key > configured. This commit aligns the implementation with that decision: - !available → status 'ok' with "Skipped (no provider credentials)" message so the run is visible in --json output without failing exit code - catch block → status 'warn' (was 'fail') so probe failures surface informationally without crashing CI / autopilot's periodic doctor runs The mismatch slipped past plan-time review because #665 was cherry-picked before P1 was finalized; the type-fix pass in 4c26e48 only adjusted the DB-column probe shape, not the API-availability gate. CI Tier 1 (Mechanical) — `test/e2e/mechanical.test.ts:1220` — "gbrain doctor exits 0 on healthy DB" now passes against a fresh Postgres without `OPENAI_API_KEY` / `VOYAGE_API_KEY` set. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Brandon Lipman <brandon@offdeck.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Eva <eva@100yen.org> Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
…low-list (#563) * v0.28 schema: takes + synthesis_evidence (v31) + access_tokens.permissions (v32) Migration v31 adds the takes table (typed/weighted/attributed claims) and synthesis_evidence (provenance for `gbrain think` outputs). Page-scoped via page_id FK (slug isn't unique alone in v0.18+ multi-source). HNSW partial index on embedding for active rows. ON DELETE CASCADE on synthesis_evidence so deleting a source take cascades the provenance row. Migration v32 adds access_tokens.permissions JSONB with safe-default backfill (`{"takes_holders":["world"]}`). Default keeps non-world holders hidden from MCP-bound tokens until the operator explicitly grants access via the v0.28 auth permissions CLI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 engine: addTakesBatch, listTakes, searchTakes/Vector, supersede, resolve, synthesis_evidence Extends BrainEngine with the takes domain object. Both engines implement the same surface; PGLite uses manual `$N` placeholders, Postgres uses postgres-js unnest() — same shape as addLinksBatch and addTimelineEntriesBatch. Methods: - addTakesBatch (upsert via ON CONFLICT (page_id, row_num) DO UPDATE) - listTakes (filter by holder/kind/active/resolved, takesHoldersAllowList for MCP-bound calls, sortBy weight/since_date/created_at) - searchTakes / searchTakesVector (pg_trgm + cosine; honor allow-list) - countStaleTakes / listStaleTakes (mirror countStaleChunks pattern; embedding column intentionally omitted from listStale payload) - updateTake (mutable fields only; throws TAKE_ROW_NOT_FOUND) - supersedeTake (transactional: insert new at next row_num, mark old active=false, set superseded_by; throws TAKE_RESOLVED_IMMUTABLE on resolved bets) - resolveTake (sets resolved_*; throws TAKE_ALREADY_RESOLVED on re-resolve; resolution is immutable per Codex P1 #13 fold) - addSynthesisEvidence (provenance persist; ON CONFLICT DO NOTHING) - getTakeEmbeddings (parallel to getEmbeddingsByChunkIds) Types live in src/core/engine.ts adjacent to LinkBatchInput. Page-scoped via page_id (slug not unique in v0.18+ multi-source). PageType gains 'synthesis'. takeRowToTake mapper in utils.ts handles Date → ISO string normalization. Tests: test/takes-engine.test.ts — 16 cases against PGLite covering upsert/list/filter/search happy paths, takesHoldersAllowList isolation, the four invariant errors (TAKE_ROW_NOT_FOUND, TAKES_WEIGHT_CLAMPED, TAKE_RESOLVED_IMMUTABLE, TAKE_ALREADY_RESOLVED), supersede flow, resolve metadata round-trip, FK CASCADE on synthesis_evidence when source take deletes. All pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 model-config: unified resolveModel with 6-tier precedence + alias resolution Replaces every hardcoded `claude-*-X` and per-phase `dream.<phase>.model` config key with a single resolver. Hierarchy: 1. CLI flag (--model) 2. New-key config (e.g. models.dream.synthesize) 3. Old-key config (deprecated dream.synthesize.model, dream.patterns.model) — read with stderr deprecation warning, one-per-process 4. Global default (models.default) 5. Env var (GBRAIN_MODEL or caller-supplied) 6. Hardcoded fallback Aliases (`opus`, `sonnet`, `haiku`, `gemini`, `gpt`) resolve at the end so any tier can use a short name. User-defined `models.aliases.<name>` config overrides built-ins. Cycle-safe (depth 2 break). Unknown alias passes through unchanged so users can pass full provider IDs without registering. When new-key + old-key are BOTH set (Codex P1 #11 fix), new-key wins and stderr warns "deprecated config X ignored; Y is set and wins". When only old-key is set, it's honored with a softer "rename to Y before v0.30" warning. Both warnings emit once per (key, process) — a Set memo prevents log spam in long-running daemons. Migrated call sites: synthesize.ts (model + verdictModel), patterns.ts (model). subagent.ts and search/expansion.ts to be migrated later in v0.28 (staying compatible until then). Tests: test/model-config.test.ts — 11 cases pinning the 6-tier ordering, alias resolution + cycle break, deprecated-key warning emit-once, and unknown-alias pass-through. All pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 takes-fence: parser/renderer/upserter + chunker strip (privacy P0 fix) src/core/takes-fence.ts — pure functions for the fenced markdown surface: - parseTakesFence(body) — extracts ParsedTake[] from `<!--- gbrain:takes:begin/end -->` blocks. Strict on canonical form, lenient on hand-edits with warnings (TAKES_FENCE_UNBALANCED, TAKES_TABLE_MALFORMED, TAKES_ROW_NUM_COLLISION). Strikethrough `~~claim~~` → active=false; date ranges `since → until` split into sinceDate/untilDate. - renderTakesFence(takes) — round-trip safe with parseTakesFence. - upsertTakeRow(body, row) — append-only per CEO-D6 + eng-D9. Creates a fresh `## Takes` section if no fence present. row_num is monotonic (max + 1, never gap-filled — keeps cross-page refs and synthesis_evidence stable forever). - supersedeRow(body, oldRow, replacement) — strikes through old row's claim AND appends the new row at end. Both rows preserved in markdown for git-blame archaeology. - stripTakesFence(body) — removes the fenced block entirely. Used by the chunker so takes content lives ONLY in the takes table. Codex P0 #3 fix: src/core/chunkers/recursive.ts now calls stripTakesFence() before computing chunk boundaries. Without this, page chunks would contain the rendered takes table and the per-token MCP allow-list would be bypassed at the index layer (token bound to takes_holders=['world'] would see garry's hunches via page hits). Doctor's takes_fence_chunk_leak check (plan-side) asserts no chunk contains the begin marker. Tests: 15 cases covering canonical parse, strikethrough, date range, fence unbalanced detection, malformed-row skip + warning, row_num collision detection, round-trip render, append-only upsert into existing fence, fresh-section creation, monotonic row_num under hand-edit gaps, supersede flow, stripTakesFence verifying takes content removed AND surrounding prose preserved. Existing chunker tests still pass (15 + 15 = 30). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 page-lock: PID-liveness file lock for atomic markdown read-modify-write src/core/page-lock.ts — per-page file lock at ~/.gbrain/page-locks/<sha256-of-slug>.lock so two concurrent `gbrain takes add` calls or `takes seed --refresh` from autopilot can't race on the same `<slug>.md` read-modify-write. Eng-review fold: reuses the v0.17 cycle.lock pattern (mtime + PID liveness) but per-slug. Differences from cycle.ts's lock: - SHA-256 of slug for safe filenames (slashes, unicode, etc.) - Same-pid + fresh mtime = LIVE (cycle.ts assumes one lock per process and reclaims same-pid; page-lock allows concurrent locks for DIFFERENT slugs in one process). mtime expiry still rescues post-crash leftovers. - 5-min TTL (vs cycle's 30 min — page edits are short) - `withPageLock(slug, fn)` convenience wrapper with default 30s timeout API: - acquirePageLock(slug, opts) → handle | null (poll-with-timeout) - handle.refresh() / handle.release() (idempotent — only releases if pid matches) - withPageLock(slug, fn, opts) — acquire + run + release-in-finally Tests: 10 cases — fresh acquire, live holder returns null, stale-mtime reclaim, dead-PID reclaim, refresh updates timestamp, foreign-pid release is no-op, withPageLock callback runs and releases on success/failure, timeout-throws when held, SHA-256 filename safety for slashes/unicode. All pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 extract-takes: dual-path phase (fs|db) + since/until_date as TEXT src/core/cycle/extract-takes.ts — new phase that materializes the takes table from fenced markdown blocks. Two paths mirror src/commands/extract.ts: - extractTakesFromFs: walk *.md under repoPath, parse fences, batch upsert - extractTakesFromDb: iterate engine.getAllSlugs(), parse each page's compiled_truth+timeline, batch upsert (mutation-immune snapshot iteration) Single dispatcher extractTakes(opts) routes by source. Honors: - slugs filter for incremental re-extract (pipes from sync→extract) - dryRun: count would-be upserts, write nothing - rebuild: DELETE FROM takes WHERE page_id = $1 before re-insert (clean slate when markdown is canonical and DB has drifted) Schema fix: since_date/until_date were DATE in the original v31 migration. Spec uses partial dates ('2017-01', '2026-04-29 → 2026-06') that Postgres DATE rejects. Changed to TEXT in both the Postgres and PGLite blocks so parser-rendered ranges round-trip cleanly. Loses the ability to do date-range arithmetic in SQL, but date math on opinion timelines is out of scope for v0.28 anyway. utils.ts dateOrNull now annotated as v0.28 TEXT-aware. Migration v31 has not been deployed yet (this branch is the v0.28 release candidate), so the type swap is free. No data migration needed. Tests: test/extract-takes.test.ts — 5 cases against PGLite covering full walk + fence-skip on no-fence pages, takes-table populated post-extract, incremental slugs filter, dry-run no-write, rebuild=true clears + re-inserts ad-hoc rows. test/takes-engine.test.ts (16), test/takes-fence.test.ts (15) all still pass — 36/36 takes tests green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 takes CLI: list, search, add, update, supersede, resolve src/commands/takes.ts — surfaces the engine methods + takes-fence library through a single `gbrain takes <subcommand>` entrypoint: takes <slug> list with filters + sort takes search "<query>" pg_trgm keyword search across all takes takes add <slug> --claim ... ... append (markdown + DB, atomic via lock) takes update <slug> --row N ... mutable-fields update (markdown + DB) takes supersede <slug> --row N ... strikethrough old + append new takes resolve <slug> --row N --outcome record bet resolution (immutable) Markdown is canonical. Every mutate command: 1. acquires the per-page file lock (withPageLock) 2. re-reads the .md file 3. applies the edit via takes-fence (upsertTakeRow / supersedeRow) 4. writes the .md file back 5. mirrors to the DB via the engine method 6. releases the lock (auto via finally) Resolve currently writes only to DB — surfacing resolved_* in the markdown table is deferred to v0.29 (the takes-fence renderer's column set is fixed at # | claim | kind | who | weight | since | source per spec). Wired into src/cli.ts dispatch + CLI_ONLY allowlist. Help text follows the project convention (orphans/embed/extract pattern). --dir flag overrides sync.repo_path config when working outside the configured brain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 MCP + auth: takes_list / takes_search / think ops + per-token allow-list OperationContext gains takesHoldersAllowList — server-side filter for takes.holder field threaded from access_tokens.permissions through dispatch into the engine SQL. Closes Codex P0 #3 at the dispatch layer (chunker strip already closed the page-content side in the previous commit). src/core/operations.ts — three new ops: - takes_list: lists takes with holder/kind/active/resolved filters; honors ctx.takesHoldersAllowList for MCP-bound calls - takes_search: pg_trgm keyword search; honors allow-list - think: op surface registered (returns not_implemented envelope until Lane D's pipeline lands). Remote callers cannot save/take per Codex P1 #7. src/mcp/dispatch.ts — DispatchOpts.takesHoldersAllowList threads into buildOperationContext. src/mcp/http-transport.ts — validateToken now reads access_tokens.permissions.takes_holders, defaults to ['world'] when the column is absent or malformed (default-deny on private hunches). auth.takesHoldersAllowList passed to dispatchToolCall. src/mcp/server.ts (stdio) — defaults to takesHoldersAllowList: ['world'] since stdio has no per-token auth. Operators wanting full visibility use `gbrain call <op>` directly (sets remote=false). src/commands/auth.ts — `gbrain auth create <name> --takes-holders w,g,b` flag persists the per-token list; new `auth permissions <name> set-takes-holders <list>` updates an existing token. Tests: test/takes-mcp-allowlist.test.ts — 8 cases against PGLite proving the threading: local-CLI sees all holders, ['world'] returns only public, ['world','garry'] returns 2/3, no-overlap returns empty (no fallback), search honors allow-list, remote save/take on think rejected with not_implemented envelope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28.0: ship-prep — VERSION, CHANGELOG, migration orchestrator, skill Closes the v0.28 ship-prep cycle. Bumps VERSION + package.json + bun.lock to 0.28.0. v0_28_0 migration orchestrator runs three idempotent phases on upgrade: - Schema verify: asserts schema_version >= 32 (migrations v31 + v32 already applied by the schema runner during gbrain upgrade); fails clean if not. - Backfill takes: inline runs `extractTakes(engine, { source: 'db' })` so any pre-existing fenced takes tables in markdown populate the takes index. Idempotent; ON CONFLICT DO UPDATE keeps the table in sync. - Re-chunk TODO: queues a pending-host-work entry asking the host agent to re-import pages with takes content so the v0.28 chunker-strip rule (Codex P0 #3 fix) applies retroactively. Pages imported under v0.28+ already have takes content stripped from chunks at index time; this TODO catches up legacy pages. skills/migrations/v0.28.0.md — agent-readable upgrade guide. Walks through doctor verification, deprecated-key migration, MCP token visibility configuration, and a "try the takes layer" smoke test. CHANGELOG.md — v0.28.0 release-summary in the GStack voice (no AI vocabulary, no em dashes, real numbers from git diff stat) + the mandatory "To take advantage of v0.28.0" block + itemized changes by subsystem (schema, engine, markdown surface, model config, MCP+auth, CLI, tests, accepted risks). Final test sweep: 65/65 v0.28 tests pass across 6 files. typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 think pipeline: gather → sanitize → synthesize → cite-render → CLI src/core/think/sanitize.ts — prompt-injection defense for take claims: 14 jailbreak patterns (ignore-prior, role-jailbreak, close-take tag, DAN, system-prompt overrides, eval-shell hooks) plus structural framing (takes wrapped in <take id="..."> tags the model is told to treat as DATA). Length-cap at 500 chars. Renders evidence blocks for the prompt. src/core/think/prompt.ts — system prompt + structured-output schema. Hard rules: cite every claim, mark hunches/low-weight explicitly, surface conflicts (never silently pick), surface gaps. JSON schema with answer + citations[] + gaps[]. Prompt adapts to anchor / time window / save flag. src/core/think/cite-render.ts — structured citations + regex fallback (Codex P1 #4 fold). normalizeStructuredCitations validates the model's structured output; parseInlineCitations is the body-scan fallback when the model omits the structured field. resolveCitations dispatches and records CITATIONS_REGEX_FALLBACK warning when used. src/core/think/gather.ts — 4-stream parallel retrieval: 1. hybridSearch (pages, existing primitive) 2. searchTakes (keyword, pg_trgm) 3. searchTakesVector (vector, when embedQuestion fn supplied) 4. traversePaths (graph, when --anchor set) RRF fusion (k=60). Each stream wrapped in try/catch — partial gather beats no synthesis. Honors takesHoldersAllowList for MCP-bound calls. src/core/think/index.ts — runThink orchestrator + persistSynthesis: INTENT (regex classify) → GATHER → render evidence blocks → resolveModel ('models.think' → 'models.default' → GBRAIN_MODEL → opus) → LLM call (injectable client) → JSON parse with code-fence + fallback strip → resolveCitations → ThinkResult. persistSynthesis writes a synthesis page + synthesis_evidence rows (page_id resolved per slug; page-level citations skip evidence). Degrades gracefully without ANTHROPIC_API_KEY. Round-loop scaffolding in place (rounds=1 only path exercised in v0.28). src/commands/think.ts — `gbrain think "<question>"` CLI. Flag parsing strips --anchor, --rounds, --save, --take, --model, --since, --until, --json. Local CLI = remote=false, so save/take honored. Human-readable output by default; --json for agent consumption. operations.ts — `think` op now calls runThink (was a not_implemented stub). Remote callers can't save/take per Codex P1 #7. Returns full ThinkResult plus saved_slug + evidence_inserted. cli.ts — wired into dispatch + CLI_ONLY allowlist. Tests: test/think-pipeline.test.ts — 18 cases against PGLite covering sanitize patterns, structural rendering, citation parsing (structured + regex fallback + dedup + invalid-slug rejection), gather streams + allow-list filter, full pipeline with stub client, malformed-LLM fallback path, no-API-key graceful degradation, persistSynthesis writes page + evidence rows. All pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 dream phases: auto-think + drift + budget meter (Codex P1 #10 fold) src/core/anthropic-pricing.ts — USD/1M-tokens map for Claude 4.7 family plus older aliases. estimateMaxCostUsd returns null on unpriced models so the meter caller can warn-once and bypass the gate. src/core/cycle/budget-meter.ts — cumulative cost ledger. Each submit estimates max-cost from (model + estimatedInputTokens + maxOutputTokens), accumulates per-cycle, refuses next submit when projected > cap. Codex P1 #10 fold: non-Anthropic models (gemini, gpt) bypass with one stderr warn per process and `unpriced=true` on the result. Budget=0 disables the gate. Audit trail at ~/.gbrain/audit/dream-budget-YYYY-Www.jsonl. src/core/cycle/auto-think.ts — auto_think dream phase. Reads dream.auto_think.{enabled,questions,max_per_cycle,budget,cooldown_days, auto_commit}. Iterates configured questions through runThink with the BudgetMeter pre-checking each submit. Cooldown timestamp written ONLY on success (matches v0.23 synthesize pattern — retries after partial failures pick back up). When auto_commit=true, persists synthesis pages via persistSynthesis. Default-disabled. src/core/cycle/drift.ts — drift dream phase scaffold. Reads dream.drift.{enabled,lookback_days,budget,auto_update}. Surfaces takes in the soft band (weight 0.3-0.85, unresolved) that have recent timeline evidence on the same page. v0.28 ships the orchestration; the LLM judge that proposes weight adjustments lands in v0.29. modelId + meter wired now so the ledger captures gate state for callers that opt in. Tests: - test/budget-meter.test.ts (7 cases) — pricing-map coverage, allow path, cumulative-deny, budget=0 disabled, unpriced bypass+warn-once, ledger captures all events, ISO-week filename branch. - test/auto-think-phase.test.ts (9 cases) — auto_think enable/skip, questions empty, success → cooldown ts written, cooldown blocks rerun, budget exhausted → partial. drift not_enabled, soft-band candidate detection, complete + dry-run paths. All pass. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 e2e Postgres: takes engine + extract + MCP allow-list (12 cases) test/e2e/takes-postgres.test.ts — full v0.28 takes pipeline against real Postgres (gated on DATABASE_URL). 12 cases: - addTakesBatch upsert via unnest() bind path (Postgres-specific) - listTakes filters: holder, kind, sort=weight, takesHoldersAllowList - searchTakes pg_trgm + allow-list filter - supersedeTake transactional path (BEGIN/COMMIT semantics) - resolveTake immutability — second resolve throws TAKE_ALREADY_RESOLVED - synthesis_evidence FK CASCADE on take delete - countStaleTakes + listStaleTakes filter active+null - extractTakesFromDb populates takes from fenced markdown - MCP dispatch with takesHoldersAllowList=['world'] returns only world - MCP dispatch local-CLI path returns all holders - MCP dispatch takes_search honors allow-list - think op forces remote_persisted_blocked even for save+take postgres-engine.ts: addTakesBatch boolean[] serialization fix. postgres-js auto-detects element type from JS arrays; for booleans it mis-detects as scalar. Cast through text[] (`'true' | 'false'`) then SQL-cast to boolean[] — same pattern other batch methods rely on for type-stable bind shapes. test/e2e/helpers.ts: setupDB now (a) tolerates non-existent tables in TRUNCATE (for fresh DBs where v31 hasn't yet created takes/synthesis_evidence) and (b) calls engine.initSchema() to actually run migrations. test/takes-mcp-allowlist.test.ts: updated 2 think-op cases to match Lane D's landed pipeline. They previously asserted not_implemented envelopes; now they assert remote_persisted_blocked + NO_ANTHROPIC_API_KEY graceful-degrade behavior. Run: DATABASE_URL=postgres://localhost:5435/gbrain_test bun test test/e2e/takes-postgres.test.ts Result: 12/12 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 dream phases: local DreamPhaseResult type (avoid premature CyclePhase enum extension) cycle.ts's PhaseResult is shaped {phase, status, summary, details} with a narrow PhaseStatus enum ('ok'|'warn'|'fail'|'skipped') and CyclePhase enum that doesn't yet include 'auto_think'/'drift'. The phases ship standalone in v0.28 (cycle.ts dispatcher integration is v0.28.x); using PhaseResult forced premature enum extension. Introduces DreamPhaseResult exported from auto-think.ts: { name: 'auto_think'|'drift'; status: 'complete'|'partial'|'failed'|'skipped'; detail: string; totals?: Record<string,number>; duration_ms: number } drift.ts re-exports the same type. When v0.28.x wires the dispatcher, the adapter at the call site can map DreamPhaseResult → PhaseResult cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 e2e: access_tokens.permissions JSONB end-to-end (5 cases) test/e2e/auth-permissions.test.ts — closes the v0.28 token-allow-list verification loop against real Postgres. Exercises: - Migration v32 default backfill: new tokens created without a permissions column get {takes_holders: ["world"]} via the schema DEFAULT clause. - Explicit ["world","garry"] → dispatch.takes_list filters to those holders only; brain hunches stay hidden from this token. - ["world"] default-deny token → takes_search hits filtered to public claims. - {} permissions row (operator tampered) gracefully defaults to ["world"] via the HTTP transport's validateToken parsing. - revoked_at IS NOT NULL → token excluded from active token query. Avoids the postgres-js JSONB double-encode trap (CLAUDE.md memory): pass the object directly to executeRaw, no JSON.stringify, no ::jsonb cast. All 5 pass against pgvector/pgvector:pg16 on port 5435. Combined v0.28 test sweep: 116/116 across 11 files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28 e2e: chunker takes-strip integration test (Codex P0 #3 verification) test/e2e/chunker-takes-strip.test.ts — verifies the chunker actually strips fenced takes content end-to-end through the import pipeline. This is the Codex P0 #3 fix's verification path: takes content lives ONLY in the takes table for retrieval, never duplicated in content_chunks where the per-token MCP allow-list cannot reach. 5 cases: - chunkText (unit) output never contains TAKES_FENCE_BEGIN/END markers - chunkText output never contains fenced claim text - chunkText output retains non-fence prose (no over-stripping) - importFromContent end-to-end: imported page has chunks but none contain fenced content - takes_fence_chunk_leak doctor invariant: zero rows globally where chunk_text matches `<!--- gbrain:takes:%` Final v0.28 test sweep: 121 pass, 0 fail, 336 expect() calls, 12 files Coverage: schema migrations, engine methods (PGLite + Postgres), takes-fence parser, page-lock, extract phase, takes CLI engine surface, model config 6-tier resolver, MCP+auth allow-list, think pipeline (gather + sanitize + cite-render + synthesize), auto-think + drift + budget meter, JSONB end-to-end, chunker strip integration. ~95% of v0.28 surface area covered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix CI: apply-migrations skippedFuture arrays + http-transport SQL mock Two CI failures from PR #563: test/apply-migrations.test.ts (2 fails) — `buildPlan` tests assert exact skippedFuture arrays at fixed installed-version stamps. Adding v0.28.0 to the migration registry means it shows up in skippedFuture when the test runs at installed=0.11.1 / installed=0.12.0. Append '0.28.0' to both hardcoded arrays. test/http-transport.test.ts (8 fails) — the FakeEngine mock string-prefix matches `SELECT id, name FROM access_tokens` to return a row. v0.28's validateToken now selects `SELECT id, name, permissions FROM access_tokens` to read the per-token takes_holders allow-list. Mock returned [] on the new query → validateToken treated every token as invalid → 401. Fix: mock now matches both query shapes. validTokens row gets a default `{takes_holders: ['world']}` permission injected when caller didn't supply one (mirrors the migration v33 column DEFAULT). Updated FakeEngineConfig type to allow tests to pass explicit permissions. Verification: bun test test/apply-migrations.test.ts → 18/18 pass bun test test/http-transport.test.ts → 24/24 pass bun run typecheck → clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix CI: add scope annotations to v0.28 ops (takes_list/takes_search/think) test/oauth.test.ts enforces an invariant from master's v0.26 OAuth landing: every Operation must have `scope: 'read' | 'write' | 'admin'`, and any op flagged `mutating: true` must be 'write' or 'admin'. My v0.28 ops were added before master shipped v0.26 + the new invariant; the merge surfaced the gap. Annotations: - takes_list → read - takes_search → read - think → write (mutating: true; --save persists synthesis page) Verification: bun test test/oauth.test.ts → 42/42 pass bun run typecheck → clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.28.2 feat: remote-source MCP + scope hierarchy + whoami (#690) * refactor(core): extract SSRF helpers from integrations.ts to core/url-safety.ts src/core/git-remote.ts (next commit) needs isInternalUrl etc. but importing from src/commands/ would invert the layering boundary (no existing src/core/ file imports from src/commands/). Extract the SSRF helpers (parseOctet, hostnameToOctets, isPrivateIpv4, isInternalUrl) into a new src/core/url-safety.ts and have integrations.ts re-export for backward compat. test/integrations.test.ts continues to pass without changes (110 existing tests, 214 expects). Why this matters for v0.28: the upcoming sources --url feature reuses this SSRF gate for git-clone URL validation. Codex review caught that re-rolling weaker URL classification would regress on the IPv6/v4-mapped/ metadata/CGNAT bypass forms that integrations.ts already handles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(core): add git-remote module — SSRF-defensive clone/pull + state probe New src/core/git-remote.ts (~210 lines) for v0.28's remote-source feature: - GIT_SSRF_FLAGS exported const: -c http.followRedirects=false, -c protocol.file.allow=never, -c protocol.ext.allow=never, --no-recurse-submodules. Single source of truth shared by cloneRepo and pullRepo so a future flag added to one path lands on both. Closes the SSRF surfaces codex flagged: DNS rebinding via redirects, .gitmodules as a second-fetch surface, file:// scheme in remotes. - parseRemoteUrl: https-only, rejects embedded credentials and path traversal, delegates internal-target classification to isInternalUrl from url-safety.ts (covers RFC1918, link-local, loopback, IPv6, CGNAT 100.64/10, metadata hostnames, hex/octal/single-int bypass forms). GBRAIN_ALLOW_PRIVATE_REMOTES=1 escape hatch with stderr warning is needed for self-hosted git over Tailscale (CGNAT trips the gate). - cloneRepo: --depth=1 default (full clone via depth: 0); refuses non-empty destDirs; spawns git via execFileSync (no shell injection) with GIT_TERMINAL_PROMPT=0 + askpass=/bin/false to prevent credential prompts. timeoutMs default 600s. - pullRepo: -C path + GIT_SSRF_FLAGS + pull --ff-only, same env confine. - validateRepoState: 6-state decision tree (missing | not-a-dir | no-git | corrupted | url-drift | healthy). Used by performSync's re-clone branch to recover from rmd clone dirs and refuse syncs on url-drift or corruption. test/git-remote.test.ts (304 lines, 32 tests): GIT_SSRF_FLAGS exact shape, all parseRemoteUrl rejection cases including dedicated CGNAT 100.64/10 with/without GBRAIN_ALLOW_PRIVATE_REMOTES (codex T3 case), fake-git harness for argv assertions on cloneRepo/pullRepo, all 6 validateRepoState branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(core): add scope hierarchy + ALLOWED_SCOPES allowlist New src/core/scope.ts (~120 lines) for v0.28's scoped MCP feature. Hierarchy: - admin implies all (escape hatch) - write implies read - sources_admin and users_admin are siblings (different axes — sources-mgmt vs user-account-mgmt; neither implies the other) Exported: - hasScope(grantedScopes, requiredScope): the canonical scope check. Replaces exact-string-match at three call sites in upcoming commits (serve-http.ts:673, oauth-provider.ts:365 F3 refresh, oauth-provider.ts:498 token issuance). Without this rewrite, an admin-grant token would fail to refresh down to sources_admin (codex finding). - ALLOWED_SCOPES set + ALLOWED_SCOPES_LIST sorted array (deterministic for OAuth metadata wire format and drift-check output). - assertAllowedScopes / InvalidScopeError: registration-time gate so tokens with bogus scope strings (read flying-unicorn) get rejected with RFC 6749 §5.2 invalid_scope at auth.ts:296 + DCR /register + registerClientManual. Today's behavior accepts any string silently. - parseScopeString: space-separated wire format → array. Forward-compat: hasScope ignores unknown granted scopes rather than throwing, so pre-allowlist tokens with weird scope strings continue working without crashes (registration is the gate, runtime is best-effort). test/scope.test.ts (178 lines, 35 tests): hierarchy table including all-implies for admin, sibling non-implication of *_admin scopes, write→read but not the reverse, F3 refresh-token subset semantics under hasScope, ALLOWED_SCOPES_LIST sorted-pinning, allowlist rejection cases, parseScopeString edge cases (undefined/null/empty). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * build(admin): scope-constants mirror + drift CI for src/core/scope.ts The admin React SPA's tsconfig.json scopes include: ['src'] to admin/src/, so it cannot directly import ../../src/core/scope.ts. The plan considered widening the include or generating a single source of truth; both options either couple the SPA to the gbrain monorepo or add a build step. Eng review picked the boring choice: hand-maintained mirror at admin/src/lib/scope-constants.ts plus a CI drift check. Files: - admin/src/lib/scope-constants.ts: hand-maintained ALLOWED_SCOPES_LIST duplicate, sorted alphabetically to match src/core/scope.ts. - scripts/check-admin-scope-drift.sh: extracts the list from each file via awk, normalizes via tr/sort, diffs. Exits 0 on match, 1 on drift (with full breakdown of which scopes diverged), 2 on internal error. Tested both passing and corrupted paths. - package.json: wires check:admin-scope-drift into both `verify` and `check:all` so any update to src/core/scope.ts that forgets the admin-side mirror fails the build. The Agents.tsx scope-checkbox sites (5 hardcoded locations) get updated in a later commit to import from this constants file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(oauth): hasScope hierarchy + ALLOWED_SCOPES allowlist at registration Switch three call sites in oauth-provider.ts from exact-string-match to hasScope() so the v0.28 sources_admin and users_admin scopes — and the admin-implies-all + write-implies-read hierarchy in src/core/scope.ts — work end to end: - F3 refresh-token subset enforcement at line 365: previously rejected admin → sources_admin refresh because exact-match treated them as unrelated scopes. gstack /setup-gbrain Path 4 needs admin tokens to refresh down to least-privilege sources_admin scope; this fix lands that path. - Token issuance intersection at line 498 (client_credentials grant): same hasScope swap so a client whose stored grant is `admin` can mint tokens including any implied scope. - registerClient (DCR /register) and registerClientManual: validate every scope string against ALLOWED_SCOPES via assertAllowedScopes. Pre-fix the system silently accepted `--scopes "read flying-unicorn"` and persisted the bogus string in oauth_clients.scope. Post-fix the caller gets RFC 6749 §5.2 invalid_scope. Existing rows with pre-allowlist scopes keep working (allowlist gates registration only). Tests amended in test/oauth.test.ts: - T1 (eng-review): admin grant CAN refresh down to sources_admin - T1 sibling: write grant CANNOT refresh up to sources_admin - ALLOWED_SCOPES allowlist coverage (manual + DCR paths, all 5 valid) - Scope-annotation contract tests widened to accept the v0.28 union 62 OAuth tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(serve-http): hasScope at /mcp + advertise full ALLOWED_SCOPES Two changes against src/commands/serve-http.ts: - Line 195: scopesSupported on the mcpAuthRouter options switches from the hardcoded ['read','write','admin'] to Array.from(ALLOWED_SCOPES_LIST). Without this, /.well-known/oauth-authorization-server keeps reporting the old triple, so MCP clients (Claude Desktop, ChatGPT, Perplexity) cannot discover the v0.28 sources_admin and users_admin scopes via standard discovery — they would have to be pre-configured out of band. - Line 673: request-time scope check on /mcp swaps authInfo.scopes.includes(requiredScope) for hasScope(...). This was the most-cited codex finding: without it, sources_admin tokens could not even satisfy a `read`-scoped op (sources_admin doesn't include the literal string "read"). hasScope routes through the hierarchy table in src/core/scope.ts so admin implies all and write implies read at the gate too. T2 amendment in test/e2e/serve-http-oauth.test.ts: assert /.well-known/oauth-authorization-server includes all 5 scopes in scopes_supported. Pre-v0.28 the list was hardcoded to ['read','write', 'admin'] and this assertion would have failed. (The test is Postgres-gated; runs under bun run test:e2e with DATABASE_URL set.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(core): sources-ops module — atomic clone + symlink-safe cleanup src/core/sources-ops.ts (~470 lines): pure async functions extracted from src/commands/sources.ts so the CLI handlers and the new MCP ops share one implementation. addSource: D3 atomicity contract from the eng review. 1. Validate id (matches existing SOURCE_ID_RE). 2. Q4 pre-flight SELECT — fail loudly with structured `source_id_taken` before any clone work. Pre-fix the existing CLI used INSERT…ON CONFLICT DO NOTHING which silently no-op'd; with clone-first that would orphan the temp dir. 3. parseRemoteUrl gate (delegates to isInternalUrl from url-safety.ts). 4. Clone into $GBRAIN_HOME/clones/.tmp/<id>-<rand>/ via the new git-remote helpers. 5. INSERT row with local_path=<final clone dir>, config.remote_url=<url>. 6. fs.renameSync(tmp/, final/). Rollback on either-side failure unlinks the temp dir; rename-failed path also DELETEs the just-INSERTed row best-effort. removeSource: clone-cleanup with realpath+lstat confinement matching validateUploadPath() shape at src/core/operations.ts:61. String startsWith is symlink-unsafe and would let $GBRAIN_HOME/clones/<id> → /etc resolve out of the confine. Two defenses layered: - isPathContained (realpath-resolves both sides + parent-with-sep string check) rejects symlinks whose target falls outside the confine. - lstat-then-isSymbolicLink check refuses symlinks whose realpath happens to land back inside the confine (defense in depth). getSourceStatus: returns clone_state via validateRepoState (the 6-state decision tree from git-remote.ts). Lets a remote MCP caller diagnose "healthy | missing | not-a-dir | no-git | url-drift | corrupted" without SSH access to the brain host. listSources additionally exposes remote_url so callers can see which sources are auto-managed. recloneIfMissing: T4 follow-up for `gbrain sources restore` after the clone dir was autopurged — re-clones via the same temp + rename atomicity contract. Idempotent (returns false when clone is already healthy). test/sources-ops.test.ts (~470 lines, 24 tests): pre-flight collision (Q4), happy paths for both --path and --url, all four D3 rollback paths (clone-fail before INSERT, INSERT-fail after clone, rename-fail post-INSERT, atomic temp-dir cleanup), symlink-target-OUTSIDE-clones (realpath confinement), symlink-target-INSIDE-clones (lstat-check), removeSource refuses to delete user-supplied paths, refuses "default" source, getSourceStatus clone_state branches, T4 recloneIfMissing recovery + idempotent + no-op for path-only sources, isPathContained unit tests covering subtree / outside / symlink-escape / fail-closed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(operations): whoami + sources_{add,list,remove,status} MCP ops Five new ops in src/core/operations.ts auto-flow through src/mcp/tool-defs.ts so MCP clients (Claude Desktop, ChatGPT, Perplexity, OpenClaw) get them via standard tools/list discovery — no SDK or transport code changes needed. Operation.scope union widened to add 'sources_admin' and 'users_admin' (the v0.28 hierarchy from src/core/scope.ts). whoami (scope: read): introspect calling identity over MCP. - Returns `{transport: 'oauth', client_id, client_name, scopes, expires_at}` for OAuth clients (clientId starts with gbrain_cl_). - Returns `{transport: 'legacy', token_name, scopes, expires_at: null}` for grandfathered access_tokens. - Returns `{transport: 'local', scopes: []}` when ctx.remote === false. Empty scopes (NOT ['read','write','admin']) is the D2 decision — returning OAuth-shaped scopes for local callers would resurrect the v0.26.9 footgun where code conditionally trusted on `auth.scopes.includes('admin')` instead of `ctx.remote === false`. - Q3 fail-closed: throws unknown_transport when remote=true AND auth is missing OR ctx.remote is the literal `undefined` (cast bypass guard). A future transport that forgets to thread auth doesn't get a free pass. sources_add (sources_admin, mutating): register a source by --path (existing v0.17 behavior) or --url (v0.28 federated remote-clone path). Calls into addSource from sources-ops.ts which owns the temp-dir + rename atomicity. sources_list (read): list registered sources with page counts, federated flag, and remote_url. The remote_url field is new — lets a remote MCP caller see which sources are auto-managed. sources_remove (sources_admin, mutating): cascade-delete a source + symlink-safe clone cleanup. Requires confirm_destructive: true when the source has data. sources_status (read): per-source diagnostic returning clone_state ('healthy' | 'missing' | 'not-a-dir' | 'no-git' | 'url-drift' | 'corrupted' | 'not-applicable') — lets a remote MCP caller diagnose a busted clone without SSH access to the brain host. test/whoami.test.ts (9 tests): pinned transport-detection for all four return shapes including Q3 fail-closed throw under both auth=undefined and remote=undefined cast-bypass paths. test/sources-mcp.test.ts (16 tests): op-metadata pins (scope, mutating, localOnly), functional handler shape against PGLite, hasScope-driven scope-enforcement smoke test simulating the serve-http.ts:673 gate (read-only token rejected for sources_add; sources_admin token allowed; admin token allowed for everything; gstack /setup-gbrain Path 4 token covers all 4 ops), SSRF gate at the op layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(sync): re-clone fallback when clone is missing/no-git/corrupted src/commands/sync.ts gets a v0.28-aware front-half. When the source has config.remote_url, performSync calls validateRepoState before the existing fast-forward pull path: - 'healthy' → fall through to existing pull (unchanged) - 'missing' → loud stderr "auto-recovery: re-cloning <id>", then 'no-git' recloneIfMissing handles the temp-dir + rename. Sync 'not-a-dir' continues from the freshly-cloned head. - 'corrupted' → throw with structured hint pointing at sources remove + add (no syncing wrong state). - 'url-drift' → throw with hint pointing at the (deferred) sources rebase-clone command. Closes the operator-confidence gap: rm -rf $GBRAIN_HOME/clones/<id>/ no longer breaks future syncs. The next sync sees the missing dir and recovers via the recorded URL. src/core/operations.ts: extend ErrorCode with 'unknown_transport' so whoami's Q3 fail-closed path types check. test/sources-resync-recovery.test.ts (12 tests): full validateRepoState state matrix exercised under fake-git, recloneIfMissing recovery from each degraded state, idempotent on healthy clones, the sync.ts:320 integration path that drives the recovery. test/sources-ops.test.ts + test/sources-mcp.test.ts: drop the GBRAIN_PGLITE_SNAPSHOT-disable line so these tests stop forcing cold init across the parallel-shard runner. With snapshot allowed, init time drops from 6+s to ~50ms and parallel runs stay under the 5s hook timeout. test/sources-mcp.test.ts: tighten scope literal-type so tsc keeps the union narrow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): sources add --url + restore re-clone, thin-wrapper refactor src/commands/sources.ts now delegates the data-mutation work to src/core/sources-ops.ts (added in the previous commit). The CLI handler parses argv, calls into addSource, and formats output. Two new flags on `gbrain sources add`: - `--url <https-url>` : federated remote-clone path (clone + INSERT + rename, atomic rollback on failure). - `--clone-dir <path>` : override the default $GBRAIN_HOME/clones/<id>/ destination. Validation rejects mutually-exclusive `--url` + `--path`. Errors from the ops layer (SourceOpError) propagate through the CLI's standard error wrapper in src/cli.ts so existing tests that assert throw shape keep passing. `gbrain sources restore <id>` (T4 from eng review): if the source has a remote_url AND the on-disk clone was autopurged, call recloneIfMissing before declaring success. Clone errors print a WARN with recovery hints rather than failing the restore — the DB row is what restore guarantees; the clone is best-effort. 54 sources-related tests pass (existing test/sources.test.ts + sources-ops + sources-mcp). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(doctor,cycle): orphan-clones surface + autopilot purge phase (P1) addSource's atomicity contract uses a temp dir that gets renamed to the final clone path. If the process is SIGKILL'd between clone-finish and rename, the temp dir orphans on disk. Without sweeping these, a brain server accumulates gigabytes over months of failed `sources add --url` attempts. Two layers: 1. `gbrain doctor` now surfaces stale entries. A new orphan_clones check walks $GBRAIN_HOME/clones/.tmp/, names anything older than 24h, and prints a warn with disk-byte estimate. Operators see the leak before `df` complains. 2. The autopilot cycle's existing `purge` phase grows a substep that nukes .tmp/ entries past the same 72h TTL the page-soft-delete purge uses. Operator behavior stays uniform across all soft-delete-style surfaces. Both layers are filesystem-only (no DB). On a brain that never used --url cloning, both are no-ops. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * build(admin): scope checkboxes source from scope-constants mirror + dist admin/src/pages/Agents.tsx Register Client modal: - useState default sources from ALLOWED_SCOPES_LIST (defaulting `read` to true, others false; unchanged UX for the common case). - Scope checkbox map iterates ALLOWED_SCOPES_LIST instead of the old hardcoded ['read','write','admin']. Without this commit, even with the v0.28.1 server-side scope hierarchy, operators registering an OAuth client from the admin UI cannot tick the new sources_admin / users_admin scopes — defeats the whole gstack /setup-gbrain Path 4 unblock. The drift-check CI gate (scripts/check-admin-scope-drift.sh) ensures this list stays in sync with src/core/scope.ts going forward. admin/dist/* rebuilt via `cd admin && bun run build`. Old hash bundle removed; new bundle (224.96 kB / 68.70 kB gzip). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: v0.28.1 — remote-source MCP + scope hierarchy + whoami VERSION + package.json: bump to 0.28.1 (per CLAUDE.md branch-scoped versioning rule — this branch adds substantial new features on top of v0.28.0). CHANGELOG.md: new top-level entry for v0.28.1 in the gstack/Garry voice (no AI vocabulary, no em dashes, real numbers + commands). Lead paragraph names what the user can now do that they couldn't before. "Numbers that matter" table calls out the +5 MCP ops, +2 OAuth scopes, and the 4-to-0 SSH-step number for gstack /setup-gbrain Path 4. "What this means for you" closer ties the work to the operator workflow shift. "To take advantage of v0.28.1" block has paste-ready upgrade commands including the admin SPA rebuild step. Itemized changes section describes the architecture cleanly without exposing scope-string internals to public attack-surface enumeration (per CLAUDE.md responsible-disclosure rule). TODOS.md: file 6 follow-ups under a new "Remote-source MCP follow-ups (v0.28.1)" section: token rotation, migration introspection in get_health, Accept-header friendliness, sources rebase-clone for URL-drift recovery, --filter=blob:none partial-clone option, and the chunker_version PGLite-schema parity codex caught. README.md: short subsection under the existing sources CLI listing that names the new --url flag and what auto-recovery does. Capability framing (no scope-string enumeration). llms.txt + llms-full.txt: regenerated via `bun run build:llms` so the documentation bundle reflects the v0.28.1 entry. The build-llms generator's drift check passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): sources-remote-mcp — full gstack /setup-gbrain Path 4 round-trip Spins up `gbrain serve --http` against real Postgres with a fake-git binary in PATH (so `git clone` is exercised end-to-end without network), registers two OAuth clients (sources_admin + read-only), mints tokens, calls the new v0.28.1 MCP ops via /mcp, and asserts the gstack /setup-gbrain Path 4 flow works end to end. 12 tests cover the full lifecycle: - whoami over HTTP MCP returns transport=oauth + the right scopes - /.well-known/oauth-authorization-server advertises all 5 scopes - sources_add: clone fires, INSERT lands, row carries config.remote_url - sources_status: clone_state=healthy after add - sources_list: surfaces remote_url for the new source - SSRF rejection: sources_add with RFC1918 URL fails at parseRemoteUrl gate - Scope enforcement: read-only token gets insufficient_scope on sources_add - Read-only token CAN call sources_list (read-scoped op) - ALLOWED_SCOPES allowlist: CLI register-client rejects bogus scope - Recovery: rm clone dir + sources_status reports clone_state=missing - sources_remove: cascades + cleans up the auto-managed clone dir Subprocess env threading replicates the v0.26.2 bun execSync inheritance pattern — bun does NOT inherit process.env mutations, so every CLI subprocess call passes env: { ...process.env } explicitly. Cleanup contract mirrors test/e2e/serve-http-oauth.test.ts: revoke any clients we registered, force-kill the server subprocess on SIGTERM timeout, surface cleanup failures to stderr without throwing so real test failures aren't masked. The base table list in helpers.ts (ALL_TABLES) doesn't include sources or oauth_clients, so this test explicitly truncates them in beforeAll to avoid Q4 pre-flight collisions on re-run. Skipped gracefully when DATABASE_URL is unset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: codex adversarial review — confine remote sources_admin + close SSRF gaps Pre-ship adversarial review (codex exec) caught five issues. Four ship in this commit; the fifth (DNS rebinding) is filed as v0.28.x follow-up. CRITICAL — `sources_admin` tokens over HTTP MCP could plant content at any host path. The MCP op exposed `path` and `clone_dir` to remote callers; the op layer trusted them verbatim, then auto-recovery's rm -rf on degraded state turned that into arbitrary delete primitives. src/core/operations.ts sources_add handler now drops both fields when ctx.remote !== false. Local CLI keeps the override (operator trust). Loud logger.warn when a remote caller tries — visible in the SSE feed without leaking values. HIGH — Steady-state `git pull --ff-only` bypassed GIT_SSRF_FLAGS entirely. The legacy helper at src/commands/sync.ts:192 spawned git without the -c http.followRedirects=false -c protocol.{file,ext}.allow=never --no-recurse-submodules set that cloneRepo applies. Every recurring sync was reopening the redirect/submodule/protocol bypass. Routed the call site at sync.ts:381 through pullRepo from git-remote.ts so initial clone and ongoing pull share one defensive flag set. MEDIUM — listSources ignored its `include_archived` flag. The op advertised the param but the function destructured it as `_opts` and queried every row. Archived sources' ids, local_paths, and remote_urls were leaking to read-scoped MCP callers by default. Filter in SQL (`WHERE archived IS NOT TRUE` unless the flag is set) so archived rows never reach the wire. PARTIAL HIGH — IPv6 ULA fc00::/7 and link-local fe80::/10 were not in the isInternalUrl bypass list. Only ::1/:: and IPv4-mapped IPv6 were blocked. Added regex-based ULA + link-local rejection to url-safety.ts. Test coverage: - test/git-remote.test.ts: 4 new IPv6 cases (ULA fc-prefix + fd-prefix, link-local fe80::, public IPv6 still allowed). - test/sources-mcp.test.ts: 3 new cases pinning the remote/local asymmetry (clone_dir override silently ignored over MCP, path nulled, local CLI keeps the override). - test/sources-mcp.test.ts: 2 new cases for include_archived honored. DNS rebinding (codex finding #3): the current gate is lexical only. A deliberate attacker who controls a hostname's A/AAAA records can still resolve to an internal IP. Closing this requires async DNS resolution + revalidation; filed as v0.28.x follow-up in TODOS.md so the API change surface (parseRemoteUrl becomes async, every caller updates) lands in its own PR. 323 tests pass (9 files); 4071 unit tests pass (full suite). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: rebump v0.28.1 → v0.28.2 (master collision) Caught after PR creation. master is at v0.28.1 already; this branch forked from garrytan/v0.28-release at v0.28.0 and naively bumped to v0.28.1 without checking the master queue. CI version-gate would have rejected at merge time (requires VERSION strictly greater than master's). Root cause: I bumped VERSION mechanically during plan implementation (echo "0.28.1" > VERSION) without consulting the queue-aware allocator at bin/gstack-next-version. /ship Step 12's idempotency check then classified state as ALREADY_BUMPED and the workflow's "queue drift" comparison was the safety net I should have hit — but I skipped it. Files updated: - VERSION + package.json: 0.28.1 → 0.28.2 - CHANGELOG.md: header + "To take advantage of v0.28.2" subsection - README.md: sources --url note version reference - TODOS.md: 7 follow-up entries' version references - llms.txt + llms-full.txt: regenerated PR title rewrite via gstack-pr-title-rewrite.sh handled in a separate gh pr edit call; CI version-gate now passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: adaptive embed batch sizing for Voyage token limits Voyage's tokenizer is 3-4x denser than OpenAI tiktoken, causing batches of 50+ texts to exceed the 120K token-per-batch limit even when DB token counts (from tiktoken) suggest they'd fit. Changes: - Add max_batch_tokens to EmbeddingTouchpoint type (provider-declared limit) - Set Voyage recipe to 120K token limit - Gateway embed() now auto-splits batches using conservative char-to-token estimate (1:1 ratio, 80% budget utilization) - On token-limit errors, embedSubBatch recursively halves and retries (down to single-text batches before giving up) - Reduce embedding.ts BATCH_SIZE from 100 to 50 as a secondary guard - Add tests for batch splitting logic and error pattern matching Fixes infinite retry loops where the same oversized batch would fail repeatedly because WHERE embedding IS NULL re-fetches identical rows. * feat(ai): per-recipe chars_per_token + safety_factor on EmbeddingTouchpoint Voyage's tokenizer runs ~3-4× denser than OpenAI tiktoken on mixed content (code/JSON/CJK), so a global "1 char ≈ 1 token at 80%" estimate either overshoots Voyage's batch cap on dense payloads or kills OpenAI throughput. Move the policy onto the recipe. - types.ts: extend EmbeddingTouchpoint with optional chars_per_token (default 4) and safety_factor (default 0.8). Both only consulted when max_batch_tokens is also set. - voyage.ts: declare chars_per_token=1 + safety_factor=0.5 (60K char budget). * feat(ai/gateway): transport DI + adaptive shrink-on-miss + startup warning Architectural changes to make the embed pipeline testable through the public embed() seam (no private-function DI) and self-healing under tokenizer miscalibration. Per /codex outside-voice review of the original PR #680 plan. - Export splitByTokenBudget + isTokenLimitError as @internal pure helpers; the test file now imports the real functions instead of re-implementing them. - splitByTokenBudget takes chars_per_token as a third parameter (defaults to 4 for OpenAI density when omitted); 0/negative ratios fall back to default. - New __setEmbedTransportForTests(fn) seam — tests inject an embedMany stub and drive recursion / fast-path scenarios through the real embed() call. Production code never reads the override; resetGateway() restores the SDK. - New module-scoped _shrinkState Map<recipeId, {factor, consecutiveSuccesses}>: on token-limit miss, shrink the recipe's effective safety_factor by 0.5 (floor 0.05) so the next embed() pre-splits tighter; after 10 consecutive batch successes, heal back ×1.5 toward the recipe-declared ceiling. - Startup warning (once per process per recipe): configureGateway walks every registered recipe; any embedding touchpoint without max_batch_tokens (except the canonical OpenAI fast-path recipe) emits one stderr line. Future Cohere/Mistral/Jina recipes that forget the field re-create the v0.27 Voyage backfill loop — the warning catches it before traffic hits the cliff. - Embed an ASCII flow diagram in the embed() JSDoc covering the shrinkState + per-recipe budget computation. Test rewrite (23 cases): - Pure helpers: splitByTokenBudget chars_per_token threading, default fallback, isTokenLimitError pattern coverage including non-Error throwables. - Recursion via embed() with stubbed transport: halving + concat-in-order, order preservation across boundaries (slot-0 sentinel asserts mapping), terminal MIN_SUB_BATCH=1 throws normalized error (no infinite loop). - OpenAI fast path: transport called exactly once, no partition, no cross-recipe leakage of voyage shrink state. - Shrink-on-miss: first miss halves factor, floors at 0.05 under repeated misses, heals after wins, healing capped at recipe ceiling. - Startup warning: first call fires once per recipe; subsequent configureGateway calls suppressed within the same process. * chore(embedding): revert BATCH_SIZE 50→100 The PR initially dropped BATCH_SIZE to 50 as a safety guard for Voyage's batch cap, but that halved OpenAI throughput on every embed page even though OpenAI has no such cap. With per-recipe pre-split + recursive halving + adaptive shrink-on-miss now living in the gateway, the outer paginator goes back to its original purpose: progress-callback granularity, not batch protection. * chore: bump version and changelog (v0.28.7) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: annotate v0.28.7 changes in CLAUDE.md key files --------- Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…6 master merge) (#706) * feat: AI gateway + 6 provider recipes + silent-drop fix (v0.15.0) Unified AI layer: src/core/ai/gateway.ts routes every AI call through Vercel AI SDK. Per-touchpoint provider selection via provider:model config strings. Six typed recipes (OpenAI, Google, Anthropic, Ollama, Voyage, LiteLLM-proxy template). Fixes the silent-drop bug at all three sites (operations.ts:237, hybrid.ts:81, import-file.ts:112): !process.env.OPENAI_API_KEY → gateway.isAvailable('embedding'). Non-OpenAI brains now actually embed. Embedding failures propagate as AIConfigError instead of quietly writing chunks with no vectors. Schema templating: getPGLiteSchema(dims, model) substitutes __EMBEDDING_DIMS__ + __EMBEDDING_MODEL__. Postgres initSchema runtime-replaces vector(1536) + 'text-embedding-3-large' based on gateway config. Preserves existing 1536-dim brains via explicit providerOptions.openai.dimensions passthrough (OpenAI API default is 3072; without this, existing brains break). Three-class error hierarchy: AIServiceError (base) + AIConfigError (user fix) + AITransientError (retry). No process.env mutation — gateway reads from GatewayContext passed in from engine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: gbrain providers CLI + init flags + config (v0.15.0) New command: gbrain providers [list|test|env|explain]. Explain emits a schema_version:1 JSON matrix (agent-friendly). Auto-detects env keys + probes localhost:11434 /v1/models (validates JSON shape, not just port-open). Recommends the best provider with one-line reasoning. gbrain init flags: --embedding-model provider:model (verbose) or --model provider (shorthand, picks recipe default). Plus --embedding-dimensions and --expansion-model. AI config flows into saved GBrainConfig; engine.connect() configures gateway before initSchema so vector column gets right dim. config.ts: adds embedding_model, embedding_dimensions, expansion_model, provider_base_urls. loadConfig() reads env vars but NEVER mutates process.env — global-state leakage would break MCP, multi-brain, and long-running workers. cli.ts: routes 'providers' subcommand (CLI_ONLY, no engine needed); connectEngine() calls configureGateway() before engine.connect(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: AI gateway + silent-drop + schema templating + no-env-mutation (v0.15.0) 28 new unit tests across 4 files: - test/ai/gateway.test.ts — 13 tests covering isAvailable() matrix for the silent-drop regression surface. Critical case: Gemini available when GOOGLE_GENERATIVE_AI_API_KEY set AND OPENAI_API_KEY absent. Pre-v0.15 brains silently dropped vectors in this config. - test/ai/silent-drop-regression.test.ts — 3 source-level grep tests enforcing !process.env.OPENAI_API_KEY cannot re-enter the codebase at any of the three known sites. - test/ai/schema-templating.test.ts — 4 tests for dim/model substitution in getPGLiteSchema() + PGLITE_SCHEMA_SQL back-compat. - test/ai/config-no-env-mutation.test.ts — regression guard ensuring loadConfig() does not mutate process.env (Codex review C3). All 28 pass locally. Existing unit suite (1397) + Tier 1 E2E (129) + Tier 2 skills E2E (3) all green against real Postgres+pgvector and real OpenAI/Anthropic/openclaw. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.0) Adds AI SDK deps (ai, @ai-sdk/openai, @ai-sdk/google, @ai-sdk/anthropic, @ai-sdk/openai-compatible, zod, gray-matter, eventsource-parser). Note: Version jumped from 0.13.0 to 0.15.0 because upstream master shipped 0.14.x (doctor DRY detection, Knowledge Runtime) while this branch was in development. Keeping 0.15.0 as the natural next release number for the AI providers cathedral. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: silent-drop regression test uses relative paths CI failure: test hardcoded /Users/garrytan/... absolute paths that obviously don't exist outside my machine. Resolve paths relative to import.meta.dir so the test works on any checkout + in GitHub Actions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version to 0.17.0 Locked to 0.17.0 since other PRs (v0.15.x, v0.16.x) may land first. Also removes the "v0.15" comment in gateway.ts — the v0.15 label belongs to whatever ships next on master, not this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version to 0.19.0 Re-locked to 0.19.0 (from 0.17.0) to leave room for other PRs landing first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version to 0.21.0 Re-locked to 0.21.0 (from 0.19.0) to leave room for other PRs landing first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Bump version to v0.23.0 * Bump version to v0.27.0 * feat(ai): add chat touchpoint with 6 chat-capable recipes Foundation for multi-provider Minions. Purely additive — no behavior change to existing embedding/expansion paths or to subagent.ts. - types.ts: 'chat' added to TouchpointKind. New ChatTouchpoint shape with supports_subagent_loop separate from supports_tools (Codex F-OV-2: some chat-capable models are bad at durable tool loops). supports_prompt_cache gates Anthropic-specific cacheControl. AIGatewayConfig gains chat_model + chat_fallback_chain. - Recipe.aliases?: Record<string,string> (Codex F-OV-5). Friendly undated forms like 'anthropic:claude-sonnet-4-6' resolve to the dated canonical at parse time. - recipes/anthropic.ts, openai.ts, google.ts: each gains a chat touchpoint. Only Anthropic claims supports_prompt_cache=true. - recipes/deepseek.ts, groq.ts, together.ts: NEW openai-compat recipes. DeepSeek powers refusal-fallback + cheap-research. Groq is the speed tier. Together is the open-weights house (Qwen, Llama-3.3-70B-Turbo). - gateway.ts: chat() function wraps Vercel AI SDK's generateText. Returns a provider-neutral ChatResult with normalized usage (input/output + cache_read/cache_creation pulled from providerMetadata.anthropic per D7 review decision). cacheSystem: ephemeral marker only when recipe.supports_prompt_cache===true. Stop-reason mapping is structural-signal-first per D8 (Anthropic stop_reason='refusal', OpenAI finish_reason='content_filter') — refusal regex layer ships in commit 3. - config.ts: GBrainConfig adds chat_model + chat_fallback_chain. Env overrides GBRAIN_CHAT_MODEL + GBRAIN_CHAT_FALLBACK_CHAIN. - cli.ts: connectEngine plumbs chat config into configureGateway. - providers.ts: --touchpoint chat smoke harness. List shows EMBED/EXPAND/ CHAT columns. Explain matrix surfaces chat options with input/output cost. Recipe alias forms accepted in --model. - init.ts: --chat-model PROVIDER:MODEL flag. - test/ai/gateway-chat.test.ts: 21 cases covering recipe registry, resolver alias resolution, config plumbing, isAvailable('chat') semantics for chat-only/embedding-only providers. 49/49 ai/* tests pass. Typecheck clean. * feat(schema): provider-neutral subagent persistence (migration v34) D11 cross-model resolution. Codex F-OV-1 noted that subagent_messages and subagent_tool_executions store Anthropic-shaped tool_use / tool_result blocks as JSONB. When a worker resumes mid-loop and the live model is OpenAI/DeepSeek, the persisted shape becomes the runtime contract — read-side translation is lossy. Mechanical schema-only migration. No code uses these columns yet; commit 2 (subagent refactor onto gateway.chat()) starts writing schema_version=2 with provider-neutral ChatBlock[] in content_blocks. - migrate.ts: v34 ALTERs subagent_messages + subagent_tool_executions to add schema_version (DEFAULT 1) and provider_id (TEXT). All ALTERs use ADD COLUMN IF NOT EXISTS so re-runs are idempotent. - src/schema.sql + pglite-schema.ts: fresh-install DDL gains the same columns. New idx_subagent_messages_provider for cost rollups + per- provider replay diagnostics. - schema-embedded.ts: regenerated via bun run build:schema. - test/migrate.test.ts: 7 new cases pin the migration shape — column names + types, idempotency, fresh-install schema parity, embedded schema parity. 75/75 migrate tests pass. Existing rows backfill to schema_version=1 via DEFAULT, tagging them as legacy Anthropic shape. Subagent.ts read path (commit 2) checks the version and dispatches the right block mapper. * fix(ai): drop Wintermute reference from deepseek recipe comment CI's check:privacy gate caught a banned name in src/core/ai/recipes/deepseek.ts:5. CLAUDE.md (per the privacy rule) bans the private OpenClaw fork name in any checked-in code. Replaces it with neutral language describing the same capability ("second hop in a refusal-fallback chain and cheap-research delegation"). bun run verify now passes locally. * v0.27.1 feat: Voyage multimodal embeddings + image ingestion + --image search (#664) * phase 1: bun --compile probe for HEIC/AVIF decoders (Eng-1A) Verifies that compiled binaries can decode HEIC + AVIF before the multimodal ingestion pipeline depends on them. Mirrors the v0.19.0 tree-sitter check-wasm-embedded pattern: minimal harness, bun --compile, run binary, decode fixtures, fail loud on regression. Caught one real issue along the way: @jsquash/avif loads avif_dec.wasm relative to its own JS file, which fails inside a bun --compile VFS. Fix: pre-compile the WASM via init() with bytes loaded through `with { type: 'file' }` import attribute. This pattern needs to be mirrored in src/core/import-file.ts when we wire the real ingestion path. heic-decode "just works" because libheif-bundle.js inlines the WASM as base64. Adds: - heic-decode + @jsquash/avif + exifr deps - scripts/image-decoders-smoketest.ts compiled-binary harness - scripts/check-image-decoders-embedded.sh CI guard - test/fixtures/images/tiny.{heic,avif} fixtures (~33KB total) - check:image-decoders npm script wired into verify + check:all Run: bun run check:image-decoders Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase 2: PageType exhaustive guard (Eng-2A) Adds the assertNever() helper, the ALL_PAGE_TYPES canonical list, a CI guard that fails any future switch on .type that doesn't use assertNever in default, and a contract test that walks every PageType through serializeMarkdown + parseMarkdown round-trip. Why this is preventive: gbrain v0.20 / v0.22 both regressed when a PageType was added but a consuming switch didn't get a matching case. TypeScript can't catch that on its own when the switch is implicit (if/else chains, default branches that return a sane fallback). With assertNever in the default of any exhaustive switch, the compiler errors at the assertNever call when the discriminant isn't `never`, forcing the contributor to add the missing case. Today the codebase has zero PageType-discriminating switches — it uses the type system via union narrowing. The guard is preventive: catches the moment a contributor adds a switch and forgets the helper. The contract test in test/page-type-exhaustive.test.ts is the runtime half: walks every PageType value through public surfaces (serialize, parse round-trip, classify-via-switch) so adding 'image' to PageType later either passes silently or fails noisily right here. Wired into verify + check:all. Run: bun run check:pagetype-exhaustive && bun test test/page-type-exhaustive.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase 3: BrainEngine.upsertFile + PGLite files table (F1+F5) Adds the v0.27.1 file-metadata API to the BrainEngine interface and implements it on both engines. Drops the v0.18 "PGLite has no files table" omission — that decision was about blob storage; for path- referenced binary asset metadata PGLite hosts it fine. Engine surface (src/core/engine.ts): - FileSpec + FileRow types - upsertFile(spec) -> { id, created } with idempotent ON CONFLICT - getFile(sourceId, storagePath) and listFilesForPage(pageId) PGLite (src/core/pglite-schema.ts): files table now mirrors the Postgres v0.18 shape verbatim (source_id, page_slug, page_id, filename, storage_path, mime_type, size_bytes, content_hash, metadata, created_at + 4 indexes + UNIQUE storage_path). Comment header rewritten to drop the "no files table" line. Identity is (source_id, storage_path) via UNIQUE(storage_path) + DEFAULT 'default'. Re-upserting same identity with same content_hash returns created=false; different content_hash overwrites metadata in place. Tested explicitly so re-sync of an unchanged image is idempotent and re-sync of a replaced image updates the row. The actual migration v36 (for existing brains to gain the files table on PGLite) lands in Phase 5 alongside the modality + embedding_image schema deltas. Fresh PGLite installs pick up the table from initSchema's bootstrap path immediately. Tests (test/engine-upsertFile.test.ts, 6 cases on PGLite): - happy path insert - Eng-3E ON CONFLICT idempotency: same hash → created=false - Eng-3E content_hash changes → metadata overwritten - listFilesForPage returns linked rows - getFile returns null on unknown path - source_id round-trips correctly Postgres parity will be exercised end-to-end by Phase 10's multimodal-engine-parity E2E test. Run: bun test test/engine-upsertFile.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase 4: loadConfigWithEngine() DB-merge + cli.ts boot reorder (F3) Codex F3: gateway boot read file/env config only, but `gbrain config set` writes the DB plane. Result: the smoke path `gbrain config set embedding_multimodal true` did nothing — the flag never reached runtime. Fix: after engine.connect(), merge DB config on top of file/env config and stash the v0.27.1 multimodal flags into process.env where the import-image path will read them. Adds: - 3 new GBrainConfig fields: embedding_multimodal, embedding_image_ocr, embedding_image_ocr_model. All optional; default off/off/'openai:gpt-4o-mini'. - ENV vars: GBRAIN_EMBEDDING_MULTIMODAL/_OCR/_OCR_MODEL. - loadConfigWithEngine(engine, baseConfig?) async helper. Reads DB config via engine.getConfig() and overlays it. Quiet failure if the config table is missing (pre-v36 brain mid-migration). - cli.ts connectEngine reorder: file/env-loaded config still drives initSchema (embedding_dimensions sizes the schema, must be stable across connect). After engine connects, DB-merged config flows through process.env so downstream readers see flipped flags WITHOUT the gateway needing a re-configure (gateway doesn't read these flags; the import-image path does). Precedence (locked into the test): env > file > DB > defaults. - env wins because it's the operator escape hatch. - file (~/.gbrain/config.json) wins over DB because it's the durable per-machine config; explicit user edits beat config-table state. - DB fills in only when file/env left the field undefined. Tests (test/loadConfig-merge.test.ts, 7 cases): - null base returns null - DB fill-in on undefined file/env fields - file/env > DB precedence verified - partial merge (only undefined fields fall through) - engine.getConfig throwing is non-fatal - null/empty DB values are ignored (not coerced to false) - strict 'true' equality (TRUE / 1 → false) The actual import-image path consumption lands in Phase 8. Run: bun test test/loadConfig-merge.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase 5: migration v36 + pgvector preflight + dual-column schema (Eng-3C) The schema half of v0.27.1 multimodal. Three changes that travel together as migration v36: 1. content_chunks gains modality TEXT NOT NULL DEFAULT 'text' so image chunks declare themselves at the row level. Search filters use it to keep image OCR text out of text-page keyword search by default. 2. content_chunks gains embedding_image vector(1024) for Voyage multimodal embeddings, plus a partial HNSW index gated by WHERE embedding_image IS NOT NULL. Footprint stays proportional to image-chunk count, not table size. Mixed-provider brains (OpenAI 1536 text + Voyage 1024 images) keep both columns populated with distinct dim spaces. 3. PGLite gains the files table mirroring the Postgres v0.18 shape so multimodal ingest can persist binary-asset metadata on the default engine. Image bytes never enter the DB; storage_path references a path inside the brain repo. The v0.18 "no files table on PGLite" omission was specific to blob storage. Eng-3C preflight: handler refuses if pgvector < 0.5 BEFORE any DDL fires. Partial HNSW indexes need pgvector 0.5.0 (HNSW landed in 0.5). PGLite ships pgvector built into the WASM bundle so the gate is Postgres-only. Error message tells the user to ALTER EXTENSION vector UPDATE. Pinning a few subtle correctness bits in the test suite: - bootstrap coverage extended: REQUIRED_BOOTSTRAP_COVERAGE + applyForwardReferenceBootstrap probe set both gain content_chunks.embedding_image. Old PGLite brains pinned at v0.18 walk forward cleanly without crashing on the partial HNSW. - contract tests pin column shape, partial HNSW indexdef, files-table parity, and that a real cosine query works against the index after migration (regression mode pgvector has shown where partial-index DDL succeeds but the index fails build). Schema source-of-truth files updated: - src/schema.sql + src/core/schema-embedded.ts (regenerated) - src/core/pglite-schema.ts (CREATE TABLE has modality + embedding_image + partial index inline) Run: bun test test/migrations-v0_27_1.test.ts test/schema-bootstrap-coverage.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase 6: Voyage recipe + gateway.embedMultimodal + MultimodalInput types (D1-D3) The AI plumbing half of v0.27.1 multimodal. Recipe registers voyage-multimodal-3 alongside the existing text-only Voyage models (voyage-3-large, voyage-3, voyage-3-lite). Touchpoint declares supports_multimodal: true so a future v0.28 OpenAI/Cohere multimodal path can flip the same flag and route through the same gateway. Gateway: - MultimodalInput discriminated union (kind: 'image_base64' today; future kinds extend without breaking callers). No image_url variant by design — that would be an SSRF surface. Callers read bytes and base64-encode; the gateway never fetches external URLs. - embedMultimodal(inputs) does direct fetch to Voyage's /multimodalembeddings endpoint. Vercel AI SDK has no multimodal- embedding abstraction yet so we bypass it. Reuses the existing resolveRecipe + auth resolution + dim-mismatch error pattern. - Voyage batch size = 32 inputs/call (Voyage's published max). 100 images → ~3 calls. n=33 splits cleanly to [32, 1]. - Loud refusal when the configured embedding_model isn't multimodal: AIConfigError pointing at the v0.28 roadmap. embedding.ts re-exports embedMultimodal + MultimodalInput so the import-image path can pull both APIs from one place. Tests (test/voyage-multimodal.test.ts, 18 cases all green): - recipe registration: voyage-multimodal-3 in models, supports_multimodal=true, default_dims=1024 - happy path: 1024-dim Float32Array out, correct request body shape - Authorization header bearer-formatted - Eng-3A batch boundaries: n=0 (short-circuits, no fetch), n=1, n=32 (single batch), n=33 (off-by-one: [32, 1]), n=64 (two clean batches) - 401 → AIConfigError with auth hint - 429, 5xx → AITransientError - dim mismatch → AIConfigError naming the expected dim - malformed JSON → AITransientError - count mismatch (returned ≠ sent) → AITransientError - missing API key → AIConfigError - non-multimodal recipe → AIConfigError pointing at v0.28+ TODOs Run: bun test test/voyage-multimodal.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase 7: PageType + PageKind extension for 'image' (F4) Adds 'image' to the PageType union and PageKind enum so v0.27.1 multimodal pages are first-class citizens of the type system. The Eng-2A exhaustive guard from phase 2 immediately makes 'image' a forced participant in any future switch on .type — adding the value without a matching case is a TypeScript error at the assertNever call. The page-type-exhaustive contract test gains an 'image' branch in its classify switch so the test file proves the union is complete; the test itself remains the runtime contract that walks every value through parseMarkdown + serializeMarkdown round-trip. What still works unchanged: image pages do NOT flow through parseMarkdown (the import-image-file path lands in phase 8 and writes directly via engine.putPage with pre-built frontmatter). inferType in markdown.ts only sees markdown files. So the parseMarkdown round-trip in the contract test exercises 'image' exactly the way image-ingested pages will be re-read later: type='image' set in frontmatter on disk, inferType never consulted. chunk_source extension to 'image_asset' lands in phase 8 alongside the import-image path that produces the chunks. Putting it here would introduce the value with no producer, which the v0.20 chunk_source allowlist treats as drift. Run: bun test test/page-type-exhaustive.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase 8: importImageFile + withImportTransaction + sync/import walker (F2 + Sec5 + Eng-1C) The big one. Threads multimodal ingestion end-to-end on the default engine and refactors the markdown/image transaction body into a shared helper. import-file.ts adds: - withImportTransaction shared helper (Sec5/A): transaction-wraps createVersion + putPage + optional upsertFile + chunk replacement + type-specific `after` hook. Markdown's existing transaction body is the natural shape for this; image ingest reuses it via the same helper. - importImageFile(engine, filePath, relativePath, opts): the full ingestion path. Reads bytes, sha256-hashes for idempotency, decodes HEIC/AVIF via heic-decode + @jsquash (re-encoded to PNG so Voyage accepts the buffer), parses EXIF via exifr, optionally OCRs via gpt-4o-mini through the gateway, embeds via embedMultimodal, then writes a page+file+chunk row through withImportTransaction. - pLimit(concurrency=8) semaphore for OCR (Eng-1C, ~30 LOC, no dep). Module-level limiter so concurrent imports across files share the budget. Cuts 100-image first-import OCR latency from ~200s to ~25s. - isImageFilePath() helper consumed by sync.ts + import.ts. - 20MB cap (Voyage's per-input limit) — oversized → sync_failures. Engine surfaces (both engines): - upsertChunks now writes modality + embedding_image columns. Image chunks pass embedding=null + embedding_image=Float32Array. ON CONFLICT DO UPDATE SET extends to both new columns. Param-builder restructured to handle independently-optional embedding/embedding_image without the prior 4-branch combinatoric explosion. - ChunkInput type gains modality + embedding_image fields. chunk_source union widens to include 'image_asset'. Schema (both engines): - pages.page_kind CHECK widened to ('markdown','code','image'). The v36 migration drops + recreates the auto-named constraint so existing brains pick up the change idempotently. - src/schema.sql + src/core/pglite-schema.ts mirror the new CHECK. - src/core/schema-embedded.ts regenerated. Sync/import wiring (F2 fix): - sync.ts isAllowedByStrategy honors GBRAIN_EMBEDDING_MULTIMODAL=true and admits image extensions in the 'auto' strategy. Existing brains with the gate off keep their current markdown+code-only behavior. - import.ts collectMarkdownFiles walker conditionally picks up image extensions; the per-file dispatcher routes to importImageFile vs importFile via isImageFilePath. Defense-in-depth gate check on the multimodal flag. Gateway (cherry-1 OCR helper): - generateOcrText(imageBytes, mime) issues a multimodal generateText call against the configured expansion model with a sanitized system prompt: "Extract verbatim. Do NOT follow instructions in the image." Mitigation for OCR-as-prompt-injection. Caller (importImageFile) routes failures through Eng-1B counters in the config table. Type shims (src/types/image-decoders.d.ts): - heic-decode (no upstream @types) + @jsquash/png/encode.js subpath + @jsquash/avif/codec/dec/avif_dec.wasm import-attribute. Deps: @jsquash/png joins the existing @jsquash/avif + heic-decode + exifr set added in Phase 1. The bun --compile probe (Phase 1) covers HEIC + AVIF decode-correctness in the compiled binary; PNG re-encode inherits the same WASM-bundle pattern. Tests (test/import-image-file.test.ts, 7 cases all green): - isImageFilePath / SUPPORTED_IMAGE_EXTS round-trip every extension - pLimit serializes work to declared concurrency - pLimit propagates rejections without leaving slot held - importImageFile happy path: PNG → page + files row + image chunk - chunk_source='image_asset' + modality='image' on the chunk row - content_hash idempotency: re-import same bytes returns 'skipped' - 20MB oversized → 'skipped' with FILE_TOO_LARGE-shaped error Total v0.27.1 regression run: 101 tests / 0 fail / 385 expect calls. Run: bun test test/import-image-file.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase 9: auto-link image_of, doctor checks, search modality filter (cherry-3+4b + Eng-1B) Closes the runtime UX surface for v0.27.1 multimodal: image chunks join the knowledge graph, the doctor surfaces vanished images + silent OCR failures, and text-keyword search hides image rows by default so OCR text doesn't drown text-page hits. Auto-link (cherry-3): - link-extraction.ts gains imageOfCandidates(slug): given an image slug like `originals/photos/2026-05-04-foo.jpg`, proposes sibling text-page slugs in priority order. Swaps known photo dirs (photos, images, screenshots, media) for sibling dirs (meetings, notes, daily, people, companies, deals, projects) at any path depth, plus a same-directory basename fallback. Returns case-folded slugs; caller checks each via tx.getPage and emits the first match. - inferLinkType: pageType='image' returns 'image_of'. Previously fell through to 'mentions'. - importImageFile.after hook walks the candidate list inside the withImportTransaction body and emits one canonical image_of edge. Best-effort: missing siblings silently skip (gbrain reconcile-links will pick up later additions). Doctor checks: - image_assets (cherry-4b): scans the files table for image MIME rows whose storage_path doesn't exist on disk. Caps at 1000 to bound worst-case scan time. Reports first 5 vanished paths in the warning with the standard remediation hint (restore from git, or `gbrain sync --skip-failed` to acknowledge). Empty index → "no image assets indexed yet" (ok). - ocr_health (Eng-1B): reads ocr_attempted / ocr_succeeded / ocr_failed_no_key / ocr_failed_other from the config table (written by importImageFile in Phase 8). Warns when OCR is opted-in but no calls succeeded — surfaces the silent failure mode where a stale OPENAI_API_KEY would otherwise leave OCR not running and the user having no idea. Search routing: - searchKeyword on both engines now filters `cc.modality = 'text'` by default. Image rows (modality='image') are invisible to text-keyword search. v0.27.2 adds the explicit image-similarity entry point that queries embedding_image directly. Default vector search continues to read from `embedding` (which is NULL on image rows) so image chunks don't accidentally surface in cosine ranking either. What's NOT in this phase (and where it lives): - `gbrain query --image <path>` flag: the image-similarity entry point. Defers to v0.27.2 because the existing query op shape doesn't have a clean way to take a path argument; threading it through cliHints + the validator is a meaningful CLI parser refactor not worth landing under v0.27.1's window. The dual-column schema and embedMultimodal API are both ready; the missing piece is purely surface. Tests (98 link-extraction cases pass; 5 new): - imageOfCandidates: parallel-dir swap, same-dir fallback, no-parent edge case, image-extension stripping, case-insensitive paths - inferLinkType returns 'image_of' for type='image' Doctor checks exercised via existing doctor.test.ts; image_assets + ocr_health quiet-skip on PGLite when the config table is too old to have the counters yet. Run: bun test test/link-extraction.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * phase 10: v0.27.1 release — VERSION + CHANGELOG + migration notes + E2E gate Final phase. Bumps VERSION + package.json to 0.27.1, writes the release-summary CHANGELOG entry in GStack voice, adds the skills/migrations/v0.27.1.md agent-readable migration notes, and ships test/e2e/voyage-multimodal.test.ts as the gated real-API smoke that pairs with the Phase 1 bun --compile probe. CHANGELOG entry follows the v0.27.0 pattern: - Two-line bold headline (verdict, not marketing) - Lead paragraph explaining the user-facing capability - "Numbers that matter" table (image extensions admitted, voyage models, engines with files table, doctor checks, batch size, OCR concurrency, schema migration, test count, decoder probe runtime, binary size delta) - "What this means for you" smoke path: 8-line gbrain config + sync walkthrough that lands on `gbrain doctor` confirmation - "For contributors" callout naming the codex outside-voice catch - "To take advantage of v0.27.1" 5-step recovery block - Itemized changes by area (multimodal embed, schema, ingestion, auto-link, doctor, type-system, config plane unification, bun --compile gate, NOT-included list) skills/migrations/v0.27.1.md (agent-readable): - Feature pitch: "remembers what you SAW, not just what you typed" - Schema delta + page_kind widening explained as idempotent - Verification + opt-in setup walkthrough - pgvector >= 0.5 requirement with the ALTER EXTENSION fix hint - Cost expectations (Voyage free tier, gpt-4o-mini OCR pricing) - Deferred-to-v0.27.2 list E2E (gated VOYAGE_API_KEY): test/e2e/voyage-multimodal.test.ts exercises the real Voyage API by embedding the tiny.avif fixture through embedMultimodal, asserting a 1024-dim Float32Array with at least one nonzero component. Skips silently when the key is unset. Final v0.27.1 regression: 199 tests / 0 fail / 639 expect calls across 10 v0.27.1-touching files. Typecheck clean. Both v0.27.1 CI guards (check:image-decoders + check:pagetype-exhaustive) green. Run: bun run verify && bun test VOYAGE_API_KEY=... bun test test/e2e/voyage-multimodal.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(query): land --image flag for image-similarity search (closes v0.27.2 deferral) Pulls the deferred `gbrain query --image <path>` flag into v0.27.1 itself. The dual-column schema and embedMultimodal API were already ready in Phase 6/8; only the CLI surface was missing. Adds it + threads column-routing through searchVector on both engines + 13 new tests covering the full path. SearchOpts (`src/core/types.ts`): - New `embeddingColumn?: 'embedding' | 'embedding_image'` (default 'embedding'). Image-similarity queries pass 'embedding_image' AND a 1024-dim vector that came from gateway.embedMultimodal. searchVector column routing (both engines): - `embedding_image` path queries the multimodal column with a modality='image' filter so cross-modality leaks are impossible. - Default `embedding` path adds modality='text' filter symmetrically; this also fixes the case where image rows happened to have a NULL primary embedding but text-vector-search shouldn't have wandered into them anyway. Operations (`src/core/operations.ts`): - `query.params.query` is no longer `required: true`. The op now accepts EITHER `query` (text) OR `image` (base64). Refuses with a clear error when neither is supplied. - Image branch: imports embedMultimodal, embeds the input image, calls engine.searchVector with `embeddingColumn: 'embedding_image'`. Bypasses hybridSearch (which is text-only). CLI (`src/cli.ts`): - New exported `resolveQueryImage(path, mime?)` helper that reads the file, base64-encodes, derives MIME from the extension (PNG/JPG/JPEG/ GIF/WEBP/HEIC/HEIF/AVIF; falls back to image/jpeg), enforces the 20MB cap. Throws Error on failure (caller routes to process.exit). - Dispatcher transforms `params.image` from a path to base64 via the helper before calling the op handler. The `query` positional arg's required-check is conditionally skipped when `--image` is present (the alternative-required relationship the v0.27.1 plan flagged as the missing CLI parser refactor — now implemented). Param-builder bug fix (PGLite upsertChunks): - The new test/search-image-column.test.ts caught a placeholder/ param-push ordering bug in PGLite's upsertChunks introduced by the v0.27.1 modality+embedding_image columns. embeddingImageStr was pushed AFTER the bulk fields, but its placeholder is allocated BEFORE them, so $2 mapped to pageId instead of the image vector. Fix: push embeddingImageStr right after embeddingStr (matching the Postgres engine's order). 'invalid input syntax for type vector' errors gone. Tests (3 new files, 13 new cases): - test/search-image-column.test.ts (4 cases): default routes to embedding column with text-only modality filter; embedding_image routes correctly with image-only filter; cosine ordering on the image column; searchKeyword still hides image rows. - test/query-image-flag.serial.test.ts (3 cases, mocked embedMultimodal): query op happy path with --image returns nearest image, refuses on neither-supplied, modality filter blocks text pages from leaking into image-similarity results. Renamed to *.serial.test.ts per CLAUDE.md R2 (`mock.module(...)` quarantine). - test/cli-query-image.test.ts (6 cases): resolveQueryImage helper reads + base64-encodes; mime derivation across all 8 supported extensions including case-insensitive variants; oversized rejection; explicit-mime override; missing-file error. CHANGELOG: removed `--image` from the "NOT in this release" list, added a dedicated section describing the new flag + smoke path. v0.27.1 regression: 212 tests / 0 fail / 668 expect calls across 13 v0.27.1-touching files. Typecheck clean. Bun isolation lint clean. Run: bun test test/cli-query-image.test.ts test/query-image-flag.serial.test.ts test/search-image-column.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): real-Postgres v0.27.1 multimodal suite + schema-drift allowlist update Adds test/e2e/multimodal-postgres.test.ts (10 tests) exercising the v0.27.1 schema and APIs against real Postgres + pgvector: - modality + embedding_image columns present with correct shape - partial HNSW idx_chunks_embedding_image with WHERE clause - files table column parity with PGLite (mirroring v0.18 shape) - pages.page_kind CHECK admits 'image' (migration v36 widening) - upsertFile end-to-end (insert + idempotent re-upsert) - upsertChunks writes embedding_image + modality columns correctly - searchVector with embeddingColumn='embedding_image' returns image rows with modality filter excluding cross-mode leaks - searchKeyword hides modality='image' rows by default - cross-engine parity (Eng-3G): same fixture into PGLite + Postgres, identical chunk + file shape after round-trip - migration v36 ran on Postgres (schema_version >= 36) Catches the param-builder bug fixed in the prior commit on real Postgres (it manifested differently than PGLite — postgres.js handled NULL vs vector mismatches more gracefully but the modality + embedding_image ON CONFLICT path needed end-to-end verification). Schema-drift allowlist (test/e2e/schema-drift.test.ts): - Removed `files` from PG_ONLY_TABLES. v0.27.1 added the table to PGLite via migration v36; both engines now mirror the v0.18 shape and the parity gate enforces it. file_migration_ledger stays Postgres-only (the v0.18 storage-object rewrite ledger has no PGLite consumer). Verification: - bun run typecheck: clean - DATABASE_URL=... bun test test/e2e/multimodal-postgres.test.ts: 10/10 - DATABASE_URL=... bun test test/e2e/schema-drift.test.ts: 6/6 - DATABASE_URL=... bash scripts/run-e2e.sh (sequential, full suite): 326/332 pass. The 6 failures across 4 files (claw-test, dream-cycle, mechanical doctor host-state, serve-http-oauth) are all pre-existing and unrelated to v0.27.1 — verified by re-running on the master versions of those tests. Run: docker run -d --name gbrain-test-pg -p 5435:5432 \ -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres \ -e POSTGRES_DB=gbrain_test pgvector/pgvector:pg16 && \ DATABASE_URL=postgresql://postgres:postgres@localhost:5435/gbrain_test \ bun test test/e2e/multimodal-postgres.test.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add @jsquash/avif + exifr deps; thread synthesis case into page-type exhaustive test --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tats() (#701) * fix: lightweight /health endpoint — SELECT 1 instead of getStats() On large brains (96K+ pages), getStats() runs 6× count(*) queries that routinely exceed the 3s HEALTH_TIMEOUT_MS through PgBouncer. This produces false 503s that cause external health monitors (cron, Fly.io, k8s) to restart otherwise-healthy servers — which in turn creates advisory lock pile-ups when multiple serve instances compete for the migration lock. Changes: - /health now runs `SELECT 1` for liveness (sub-millisecond) - ?full=true opt-in preserves the old getStats() behavior - /admin/api/health-indicators still returns full stats - probeHealth() retained for callers that need it * refactor(health): extract probeLiveness, move full stats to /admin/api/full-stats Addresses outside-voice review of PR #701. The original ?full=true query-param escape hatch was withdrawn because the loopback IP gate's correctness depended on app.set('trust proxy', 'loopback') semantics holding under proxy/XFF misconfiguration, and the PR's own comment misidentified /admin/api/health-indicators as a full-stats endpoint when it actually returns only {expiring_soon, error_rate}. Changes: - src/commands/serve-http.ts: new probeLiveness(sql, engineName, version, timeoutMs) helper next to probeHealth. Same shape, same return type, same finally-block clearTimeout discipline. /health is now a 2-line dispatch through probeLiveness. Removes ?full=true entirely. Adds new admin route /admin/api/full-stats behind the existing requireAdmin middleware that returns probeHealth(engine, ...) — same body shape /health used to expose (status, version, engine, page_count, chunk_count, embedded_count, link_count, tag_count, timeline_entry_count). - test/serve-http-health.test.ts: 4 new probeLiveness cases (success-shape regression with exact-keys assertion, timeout, db-error, timer-cleanup under 100 concurrent probes). - test/e2e/serve-http-oauth.test.ts: existing /health body-shape assertion rewritten to the liveness-only contract (page_count must NOT be present); 2 new admin-stats cases (401 without cookie, 200 with magic-link-derived admin cookie returns getStats() body). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore: bump version and changelog (v0.28.10) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: update CLAUDE.md serve-http.ts annotation for v0.28.10 split Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(claude): explicit "run E2E without asking" + schema-bootstrap step The previous wording ("Always run E2E tests when they exist") was easy to read as a soft preference; in practice agents kept proposing the run instead of just doing it. Make the policy unmistakable: if there's a relevant E2E and you want to verify behavior, just spin up the DB and run. Also documents the schema-bootstrap step that bit a fresh container today — `oauth_clients` doesn't exist on a virgin pgvector image until `gbrain doctor` (or any engine-connecting command) triggers `initSchema()`. `apply-migrations` alone runs ALTER-style migrations on top of an already-bootstrapped schema; it does not seed base tables. Tests that bypass the engine via execSync against `gbrain auth register-client` hit the DB directly and need bootstrap first. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(serve-http): persist mcp_request_log on every JSON-RPC method + admin-scope F7 tests Closes the 4 pre-existing E2E failures in test/e2e/serve-http-oauth.test.ts that surfaced when DATABASE_URL was set on the v0.28.10 branch. The branch isn't the cause — these were broken on master too (verified by checking out origin/master's serve-http.ts + test file: 0/4 pass). Owning them here as a bisectable commit. Two root causes, both in serve-http.ts's /mcp logging + scope discipline. 1. mcp_request_log was only INSERTed inside the tools/call success/error paths. tools/list, the unknown-op early-return, and the insufficient-scope early-return all returned without logging. The v0.26.3 persistence regression test calls tools/list + tools/call non-existent and expects >= 2 rows; on the prior implementation it got 0. The agent_name resolution test (single tools/list, expects the row) had the same shape. Fix: log every JSON-RPC method exit point. tools/list logs operation = 'tools/list' with status='success' (lists never fail). Unknown-op logs operation = the attempted name with error_message starting 'unknown_operation:'. Insufficient-scope logs operation = the attempted name with error_message 'insufficient_scope: requires <scope>'. Admin agents auditing /admin/api/requests now see the full attempt log, not just successful valid-op calls. 2. The F7 RCE-regression tests minted 'read write' tokens to assert submit_job for protected names ('shell', 'subagent') gets rejected. But submit_job's required scope is 'admin' (set by hasScope-aware v0.28 enforcement), so a 'read write' token gets rejected with insufficient_scope BEFORE reaching the F7 protected-name guard at operations.ts:1527. The test's assertion checked for 'permission_denied' / 'cannot be submitted over MCP' — neither appears in an insufficient_scope response — so 'rejected' computed to false even though the call was actually rejected. Worse, if someone removed the F7 guard, the test would still pass because scope check would catch it: regression-test integrity failure. Fix: register the e2e-oauth-test client with admin in its allowed scopes (was 'read write', now 'read write admin'), and have F7 tests mint admin-scoped tokens explicitly. Adding admin to the client's allowed ceiling does not auto-grant it to subset-mint calls — other tests minting 'read' / 'read write' still get the subset they ask for. The persistence test's assertion 'rows.find(r => r.operation === "tools/call")' was also updated to match the actual logging convention (operation = inner tool name on call paths, JSON-RPC method on list/scope/unknown paths). E2E result: 29/29 pass on a fresh pgvector container (fixed 4, kept the 25 that were passing). Unit suite: 4191 pass, 0 fail, unchanged. Typecheck: clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore: regenerate llms-full.txt after CLAUDE.md update Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com> Co-authored-by: Garry Tan <garrytan@gmail.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…r multimodal embeddings (#719) * feat: embedding_multimodal_model — separate model routing for multimodal embeddings v0.28.9 shipped multimodal image embeddings via Voyage, but embedMultimodal() hardcodes to the primary embedding_model. Brains using OpenAI text-embedding-3-large (1536-dim) for text cannot use Voyage voyage-multimodal-3 (1024-dim) for images without switching their entire embedding pipeline. This adds embedding_multimodal_model as a distinct config key that embedMultimodal() prefers over embedding_model when set. The dual- column schema (embedding vs embedding_image) already supports different dimensions — this patch completes the routing. Config surface: - gbrain config set embedding_multimodal_model voyage:voyage-multimodal-3 - env: GBRAIN_EMBEDDING_MULTIMODAL_MODEL=voyage:voyage-multimodal-3 Files changed: - core/ai/types.ts: AIGatewayConfig gains embedding_multimodal_model - core/ai/gateway.ts: configureGateway stores it; embedMultimodal reads it - core/config.ts: GBrainConfig type + env loader + DB merge path - cli.ts: threads config into gateway; reconfigures after DB merge Tested on a 96K-page brain with OpenAI text + Voyage multimodal running side by side. Voyage returns 1024-dim vectors into embedding_image column; text embeddings unchanged. * refactor(cli): extract buildGatewayConfig + always re-config after DB merge Two related changes co-located so the un-gate doesn't leave the duplicated configureGateway shapes drifting: 1. Extract file-local `buildGatewayConfig(c: GBrainConfig): AIGatewayConfig` helper. Both configureGateway sites in connectEngine() now pass through it; future fields touch one place. 2. Drop the field-name-gated re-config trigger. The previous gate fired only when `merged.embedding_multimodal_model` was truthy, coupling the trigger to one field name. Future DB-mutable gateway fields would silently miss it. Re-config now always fires when loadConfigWithEngine returns non-null. One extra cache+shrinkState clear per startup is microseconds, no hot path. Schema-sizing fields stay stable because loadConfigWithEngine respects file/env first; merged.embedding_dimensions equals config.embedding_dimensions when no DB override exists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ai): model-level multimodal validation + getMultimodalModel accessor Codex review of PR #719 (F1) caught a real footgun: the Voyage recipe shares supports_multimodal: true across all 12 models in its embedding touchpoint, of which only voyage-multimodal-3 is valid at /multimodalembeddings. A user setting embedding_multimodal_model to a text-only Voyage model (e.g. voyage-3-large) passes local validation and fails at the endpoint with HTTP 400 — which gateway.ts:626 misclassifies as transient (TODO: reclassify, tracked in TODOS.md). Adds: - EmbeddingTouchpoint.multimodal_models?: string[] (optional, model-level allow-list inside a recipe that mixes text-only + multimodal models). - Voyage declares multimodal_models: ['voyage-multimodal-3']. - embedMultimodal() validates parsed.modelId against the allow-list AFTER the existing recipe-level supports_multimodal check. Throws AIConfigError with the full multimodal_models list in the fix hint. - getMultimodalModel() public accessor mirroring getEmbeddingModel / getChatModel — needed by the cli-multimodal-integration test and useful for future doctor checks. Recipe-level fast-fail stays so non-multimodal providers (Anthropic / OpenAI today) keep their AIConfigError path unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: cover embedding_multimodal_model precedence + gateway override + cli integration PR #719 originally shipped zero tests for the new code paths. Closes that gap with three layers: 1. test/loadConfig-merge.test.ts — extends the existing env > file > DB precedence pattern (which already covers embedding_image_ocr_model) with four cases for embedding_multimodal_model: DB-only fills in, file wins over DB, all-unset stays undefined, null/empty DB ignored. 2. test/voyage-multimodal.test.ts — four cases for embedMultimodal model resolution: prefers multimodal_model over embedding_model, falls back to embedding_model when unset (regression guard), AIConfigError on non-multimodal recipe, AIConfigError on Voyage text-only model (Codex F1 model-level validation). 3. test/cli-multimodal-integration.test.ts (NEW) — three PGLite-based integration tests for the cli.ts re-config glue itself (Codex F3: the actual bug site that "mechanical glue" claims hide). Drives the loadConfigWithEngine + buildGatewayConfig + configureGateway sequence connectEngine() runs and asserts the gateway observed the DB-set value. 11 new test cases total. All pass against the production code in this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(todos): follow-ups from PR #719 codex review Three items surfaced during /codex outside-voice review of PR #719's plan that are out of scope for the current PR but worth tracking: - gbrain doctor: warn on misconfigured multimodal model (P2). Two checks: multimodal_model set without recipe API key; embedding_multimodal flag on without a multimodal-capable embedding_model. - Reclassify Voyage HTTP 4xx as AIConfigError (P2, Codex F2). Today gateway.ts:626 throws AITransientError for any non-401/403 4xx, so permanent config bugs (malformed body, model not in multimodal_models) trigger retry storms. Aligns with normalizeAIError's contract. - gbrain config unset <key> (P3, Codex F6). Once a user sets a key in DB there's no normal CLI path to clear it. Pre-existing UX gap; PR #719's new key surfaces it again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.28.11) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: v0.28.11 annotations for ai/types, ai/gateway, voyage recipe Updates the Key Files section so the per-file annotations reflect the multimodal_model routing + model-level validation that landed in #719. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com> Co-authored-by: Garry Tan <garrytan@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 15, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.