BYK · May 21, 2026
diff --git a/.lore.md b/.lore.md
@@ -32,16 +32,13 @@
 * **Pi plugin: which providers can be proxied through the gateway**: Pi plugin gateway proxy compatibility by wire protocol. \*\*Proxiable\*\*: \`anthropic\` → \`/v1/messages\`: \`anthropic\`, \`fireworks\`, \`github-copilot\`; \`openai-completions\` → \`/v1/chat/completions\`: \`deepseek\`, \`xai\`, \`groq\`, \`cerebras\`, \`openrouter\`, \`huggingface\`, \`opencode\`, \`opencode-go\`; \`openai-responses\` → \`/v1/responses\`: \`openai\`, \`azure-openai-responses\`, \`openai-codex\`, \`azure-openai\`, \`lm-studio\`, \`ollama\`. \*\*Cannot proxy\*\*: \`google\`, \`google-vertex\`, \`amazon-bedrock\`, \`mistral\`. \`registerProvider(name, { baseUrl })\` overrides base URL. Gateway routes by URL path only. OpenAI streaming clients receive true incremental SSE (\`stream/openai.ts\`).
 
 <!-- lore:019e49be-a3bc-7f5b-90f7-d859411f3a4d -->
-* **SaaS deployment: local-only vs hosted-only modes, no simultaneous hybrid**: SaaS deployment: two mutually exclusive modes — fully local (local gateway + Turso embedded replica, LLM calls direct) and fully hosted (SaaS gateway, env var only, horizontal scaling + sticky sessions or Redis). No hybrid mode: rejected for complexity/surface area. Corporate environments prefer hosted; power/privacy users prefer local. 95%+ of code shared (\`packages/core/\`, \`pipeline.ts\`, \`server.ts\`). Hosted requires horizontal scaling from day one, API key security (HSM/Vault-level), ~20-100ms added latency. Local requires local CLI. DB-per-org on Turso; control plane maps users→orgs→teams→databases. General principle: always prefer clean mutually exclusive modes over simultaneous hybrid architectures — smaller surface area, easier reasoning.
+* **SaaS deployment: local-only vs hosted-only modes, no simultaneous hybrid**: SaaS deployment: two mutually exclusive modes — fully local (local gateway + Turso embedded replica, LLM calls direct) and fully hosted (SaaS gateway, env var only, horizontal scaling + sticky sessions or Redis). No hybrid mode: rejected for complexity/surface area. Corporate environments prefer hosted; power/privacy users prefer local. 95%+ of code shared (\`packages/core/\`, \`pipeline.ts\`, \`server.ts\`). Hosted requires horizontal scaling from day one, API key security (HSM/Vault-level), ~20-100ms added latency. DB-per-org on Turso; control plane maps users→orgs→teams→databases. General principle: always prefer clean mutually exclusive modes over hybrid architectures.
 
 ### Gotcha
 
 <!-- lore:019e1c27-967c-7eb4-bd0e-afb195823970 -->
 * **Bun NAPI crash on process.exit() — use safeExit() via libc \_exit()**: Bun NAPI crash on process.exit() with fastembed — use safeExit(): Loading fastembed (onnxruntime NAPI bindings) causes a C++ panic on \`process.exit()\` because Bun runs NAPI teardown destructors that throw. Fix: \`packages/gateway/src/cli/exit.ts\` exports \`safeExit(code)\` — uses \`\_exit()\` from libc via \`bun:ffi\` under Bun, falls back to \`process.exit()\` under Node.js. All gateway exit paths must use \`safeExit()\`. Do NOT call \`embedding.resetProvider()\` in test teardown \`resetPipelineState()\` — move \`resetProvider()\` to \`shutdown()\` in \`start.ts\` only. \`resetPipelineState()\` must preserve the 'fastembed unavailable' cached state.
 
-<!-- lore:019e49d2-d096-7dca-81df-04837eb3f247 -->
-* **exportLoreFile() missing existsSync guard — fires ENOENT every 30s idle tick**: Trap: \`exportLoreFile()\` in \`packages/core/src/agents-file.ts\` line 543 calls \`writeFileSync(fp, content)\` with no \`existsSync(projectPath)\` check. The idle worker (idle.ts line 330) calls it every 30s — the catch logs the error and retries next tick, causing high-volume Sentry events. Contrast: \`exportToFile()\` uses \`mkdirSync(dirname(...), {recursive:true})\` at line 367; \`data.ts\` callers guard with \`existsSync\` at lines 324/475. Fix: add \`if (!existsSync(projectPath)) return;\` before line 543 — \`existsSync\` already imported at line 11. Don't create missing dirs (user may have intentionally deleted them).
-
 <!-- lore:019e2b12-6ea6-76dc-ab7a-a1532c60b312 -->
 * **git remote -v in hosted gateway — skip when header present, never run with client-controlled cwd**: \`LORE\_HOSTED\_MODE=1\` makes all FS-touching functions no-op: \`getGitRemote()\` returns null, \`config.load()\` skips \`.lore.json\`, agents-file/lat-reader/knowledge-watcher are no-ops. Activation: \`lore start\` (headless) enables hosted mode by default; opt-out via \`--local\` or \`LORE\_HOSTED\_MODE=0\`. \`lore run\` is always local. Flag set in \`initIfNeeded()\` from \`GatewayConfig.hostedMode\`. Never run \`git remote -v\` with client-controlled cwd. \`LORE\_REMOTE\_URL\` + local CLI: \`lore run\`/\`lore start\` skips local gateway and proxies to remote. Local CLI injects \`X-Lore-Git-Remote\`; remote gateway trusts it. CLI-less/SaaS: \`ANTHROPIC\_CUSTOM\_HEADERS\` requires a local \`lore\` CLI process — pure SaaS alternative not yet implemented.
 
@@ -64,7 +61,7 @@
 * **TTL downgrade hysteresis: downgradeStreak field prevents compounding cache busts**: Auto-TTL downgrade hysteresis in \`packages/gateway/src/pipeline.ts\`: downgrade from 1h→5m TTL requires 3 consecutive short-gap turns (\`ttlDowngradeStreak\` in \`SessionState\`). Block downgrade if >50% of session tokens are cached. Reset streak on any long-gap turn. Subagent turns and tool-use continuations excluded from gap recording — capture \`prevStopReason\` before line 1667 overwrites it, skip when \`prevStopReason === 'tool\_use'\` or \`isSubagentTurn\`. State persistence: immediate (session identity), per-turn (cost snapshot), 30s periodic (gradient EMAs + cache warming via dirty flag). Max data loss on crash: ~30s. Recall follow-up requests must set \`cacheConversation: false\` — otherwise modified message array triggers full cache write at 5m TTL pricing.
 
 <!-- lore:019e49d3-470f-7814-9971-fa485a78d9b5 -->
-* **Unprotected JSON.parse sites in gateway: config.ts, remote.ts, api.ts, cache-warmer.ts**: Unprotected \`JSON.parse\` sites in gateway (no try-catch): (1) \`packages/core/src/config.ts:260\` — \`.lore.json\` file read; (2) \`packages/gateway/src/cli/remote.ts:56\` — \`res.json()\` when Content-Type claims JSON but body isn't; (3) \`packages/gateway/src/api.ts:87\` — zstd-decompressed request body; (4) \`packages/gateway/src/cache-warmer.ts:609\` — stored request body from DB. Pipeline SSE parses (lines 866, 1326, 1359, 1812) are already protected. Also: \`load(projectPath)\` in \`cli/import.ts:87\` is called without \`await\` — if \`JSON.parse\` throws on malformed \`.lore.json\`, it becomes an unhandled rejection (LOREAI-GATEWAY-Y). Fix: (1) \`await load(projectPath)\` in import.ts:87; (2) wrap config.ts:260 in try-catch, fall through to default config; (3) wrap all four unprotected sites in try-catch with error logging.
+* **Unprotected JSON.parse sites in gateway: config.ts, remote.ts, api.ts, cache-warmer.ts**: Unprotected \`JSON.parse\` sites in gateway (no try-catch): (1) \`packages/core/src/config.ts:260\` — \`.lore.json\` file read — FIXED: added JSONC support via \`stripJsonComments()\` + try-catch, falls through to defaults on failure (LOREAI-GATEWAY-Y); (2) \`packages/gateway/src/cli/remote.ts:56\` — \`res.json()\` when Content-Type claims JSON but body isn't; (3) \`packages/gateway/src/api.ts:87\` — zstd-decompressed request body; (4) \`packages/gateway/src/cache-warmer.ts:609\` — stored request body from DB. Pipeline SSE parses (lines 866, 1326, 1359, 1812) are already protected. Sites 2-4 remain unprotected (no Sentry events yet). Related: \`exportLoreFile()\` in \`agents-file.ts:543\` called \`writeFileSync\` without handling missing project dirs — FIXED: try-catch swallows ENOENT silently (project dir deleted mid-session), re-throws all other FS errors (LOREAI-GATEWAY-K).
 
 <!-- lore:019e1cd6-05d2-74c5-aea8-fd827a4a45e7 -->
 * **vectorSearch() is unscoped — test cleanup must delete all embedding rows**: \`vectorSearch()\` in \`packages/core/src/ltm.ts\` queries \`knowledge WHERE embedding IS NOT NULL AND confidence > 0.2\` with no \`project\_id\` filter (intentional for cross-project search). Two gotchas: (1) Test suites scoped to one project leak embedding rows into other vectorSearch tests — \`beforeEach\` must \`DELETE FROM knowledge WHERE embedding IS NOT NULL\`. (2) \`vectorSearch()\` has no \`excludeCategories\` param — category exclusions from \`forSession()\` callers have no effect; add optional \`excludeCategories\` param and propagate from callers. Also: global entries (pid=null) force \`crossProject=true\`; confidence is clamped to \[0.0, 1.0] in \`update()\`.
@@ -83,10 +80,10 @@
 * **Always fix cache memory leaks with TTL eviction, size cap, and scheduled pruning**: Cache memory leak fix pattern: (1) TTL check in \`.get()\` — delete and return undefined if expired; (2) LRU eviction in \`.set()\` — delete oldest key when \`store.size >= maxEntries\`; (3) \`setInterval(() => this.prune(), 60\_000)\` in constructor. Defaults: \`maxEntries=10\_000\`, \`ttlMs=300\_000\`. Note: \`prune()\` is NOT currently scheduled in existing code. Locking: use \`flock\` advisory locking instead of \`proper-lockfile\` — \`proper-lockfile@4.1.2\` fails in containerized environments where PID namespaces reset on restart, leaving stale locks. \`flock\` is automatically released on process exit. Upgrade lock double-acquisition bug (\`binary.ts\`): \`downloadBinaryToTemp()\` acquires lock on \`\<execPath>.lock\`, then \`installBinary()\` tries to re-acquire same lock. Fix: in \`handleExistingLock\`, allow re-entry when \`existingPid === process.pid\`. Double \`releaseLock()\` is safe.
 
 <!-- lore:019e4422-5b29-77a8-8956-488233ef16a4 -->
-* **Always request critical code reviews with specific file paths, line numbers, and severity classifications**: Code review & investigation standards: (1) Reviews: exact file paths, line numbers, severity (C/M/L), root causes, concrete fixes. Check state-not-cleared, consume-once flags, circuit breaker bypass, concurrency edges. Critical+Medium fixed before merge. (2) Investigation: read actual source, trace full call chain (file+line), enumerate 2-4 candidates, report confirmed/falsified verdict. Distinguish co-required bugs. (3) PR discipline: critical self-review before merge, CI green, amend+force-push. (4) After bug fix: add tests (4-6 edge cases) referencing issue number. Worker test files follow a consistent 7-case spec. (5) Sentry IDs start with \`LOREAI-GATEWAY-\`. (6) Run lint, typecheck, full test suite before committing. Use Vitest (\`import { describe, it, expect } from 'vitest'\`; migrated from Mocha+Chai May 2026). Use kebab-case file naming. (7) Document process/workflow decisions in AGENTS.md before proceeding. (8) For new infrastructure/tooling: analyze tradeoffs, sketch architecture, wait for confirmation before implementing.
+* **Always request critical code reviews with specific file paths, line numbers, and severity classifications**: Code review & investigation standards: (1) Reviews: exact file paths, line numbers, severity (C/M/L), root causes, concrete fixes. Check state-not-cleared, consume-once flags, circuit breaker bypass, concurrency edges. Critical+Medium fixed before merge. (2) Investigation: read actual source, trace full call chain, enumerate 2-4 candidates, report confirmed/falsified verdict. (3) PR discipline: critical self-review before merge, CI green, amend+force-push. (4) After bug fix: add tests (4-6 edge cases) referencing issue number. (5) Sentry IDs start with \`LOREAI-GATEWAY-\`. (6) Run lint, typecheck, full test suite before committing. Use Vitest (\`import { describe, it, expect } from 'vitest'\`; migrated from Mocha+Chai May 2026). Use kebab-case file naming. (7) Document process/workflow decisions in AGENTS.md before proceeding. (8) For new infrastructure/tooling: analyze tradeoffs, sketch architecture, wait for confirmation before implementing.
 
 <!-- lore:019e498a-c0e4-70c5-ad40-d4d6d9d26ff5 -->
-* **CI/PR cycle: check failing jobs, wait for bots, resolve all comments before merging**: CI/PR cycle: After every push: (1) check failing jobs via \`gh run view --log-failed --job $(gh pr checks $PR\_NO --json state,link -q '.\[] | select(.state == "FAILURE").link | split("/")\[-1]')\`; (2) wait for 'Sentry Seer' and 'Cursor BugBot'; (3) fix all failures; (4) use \`gh api graphql\` with \`reviewThreads\` filtering \`isResolved==false, isMinimized==false\` for unresolved comments; (5) address all bot/human comments, respond or mark resolved; repeat until clean. PR creation: check if already on a relevant branch; follow repo branch/commit conventions; base PR description on implementation plan; add plan as \`git notes\`; create as draft initially. Always call \`plan\_exit\` when done planning. If BugBot finds nothing, merge and move on.
+* **CI/PR cycle: check failing jobs, wait for bots, resolve all comments before merging**: CI/PR cycle: After every push: (1) check failing jobs via \`gh run view --log-failed\`; (2) wait for 'Sentry Seer' and 'Cursor BugBot'; (3) fix all failures; (4) use \`gh api graphql\` with \`reviewThreads\` filtering \`isResolved==false, isMinimized==false\` for unresolved comments; (5) address all bot/human comments, respond or mark resolved; repeat until clean. PR creation: check if already on a relevant branch; follow repo branch/commit conventions; base PR description on implementation plan; add plan as \`git notes\`; create as draft initially. Always call \`plan\_exit\` when done planning. If BugBot finds nothing, merge and move on.
 
 <!-- lore:019e3cd7-97d3-7053-8f02-bb13d727662e -->
 * **Lore eval scores must beat or match tail-window — scoring below it means lost information**: Lore eval system: \`inflateScenario(scenario, opts?)\` in \`packages/eval/src/inflate.ts\` — opts is \`{ targetTokens?, excludeKeywords? }\`, NOT positional args. Token estimation: chars/4 (inflate), chars/3 (baselines.ts). 8 replay fixtures, 16 scenarios, 130 questions, 6 baselines in CI. \`--inflate\` incompatible with replay mode. Three baselines: (1) \`tailWindowBaseline()\`: backward scan, 80K token budget, drops prefix silently. (2) \`compactionBaseline()\`: multi-pass LLM summarization at 83.5% autoCompactThreshold. (3) \`buildLoreContext()\`: 25% distilled (40K) + 40% raw (64K). Filler turns (\`isFiller:true\`) skipped during gateway replay but included in \`allTurns\` for baseline context. Scores must beat or match tail-window — scoring below means lost information (treat as bug). QA contamination fixed via \`X-Lore-No-Store\`. Non-deterministic LLM output causes variance: re-run before concluding regression.

diff --git a/packages/core/src/agents-file.ts b/packages/core/src/agents-file.ts
@@ -540,9 +540,16 @@ export function exportLoreFile(projectPath: string): void {
   }
 
   // Content changed — write and update cache.
-  writeFileSync(fp, content, "utf8");
-  const { mtimeMs } = statSync(fp);
-  setCache(fp, { mtimeMs, hash: contentHash });
+  // Wrap in try-catch to silently handle ENOENT (project dir deleted/renamed
+  // mid-session). Other FS errors (EACCES, EIO) still propagate.
+  try {
+    writeFileSync(fp, content, "utf8");
+    const { mtimeMs } = statSync(fp);
+    setCache(fp, { mtimeMs, hash: contentHash });
+  } catch (e: unknown) {
+    if ((e as NodeJS.ErrnoException).code === "ENOENT") return;
+    throw e;
+  }
 }
 
 /**
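
The ENOENT handling in this hunk can be exercised in isolation. The following is a minimal sketch, not the PR's code: `writeIfDirExists` is a hypothetical helper name, and the temp-directory setup exists only to reproduce the "project dir deleted mid-session" case.

```typescript
import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper (assumption, not in the PR): returns true when the file
// was written, false when the target directory no longer exists.
function writeIfDirExists(fp: string, content: string): boolean {
  try {
    writeFileSync(fp, content, "utf8");
    return true;
  } catch (e: unknown) {
    // Only ENOENT is swallowed; EACCES, EIO, etc. still propagate.
    if ((e as { code?: string }).code === "ENOENT") return false;
    throw e;
  }
}

// Reproduce the mid-session deletion: write once, remove the dir, write again.
const dir = mkdtempSync(join(tmpdir(), "lore-sketch-"));
const wroteWhilePresent = writeIfDirExists(join(dir, "a.md"), "x");
rmSync(dir, { recursive: true, force: true });
const wroteAfterDelete = writeIfDirExists(join(dir, "a.md"), "x");
```

Returning a boolean here is a choice made for the sketch so the outcome is observable; the function in the diff returns `void` and simply exits early.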

diff --git a/packages/core/src/config.ts b/packages/core/src/config.ts
@@ -2,6 +2,18 @@ import { z } from "zod";
 import { existsSync, readFileSync } from "node:fs";
 import { join } from "node:path";
 import { isHostedMode } from "./hosted";
+import { warn } from "./log";
+
+/**
+ * Strip JS-style comments from a JSON string, enabling JSONC support for
+ * `.lore.json`. Preserves `//` and block-comment delimiters inside quoted strings.
+ * Also removes trailing commas before `}` or `]`.
+ */
+function stripJsonComments(str: string): string {
+  return str
+    .replace(/("(?:[^"\\]|\\.)*")|\/\/[^\n]*|\/\*[\s\S]*?\*\//g, (m, s) => s ?? "")
+    .replace(/("(?:[^"\\]|\\.)*")|,(?=\s*[}\]])/g, (m, s) => s ?? "");
+}
 
 export const LoreConfig = z.object({
   model: z
@@ -257,9 +269,15 @@ export async function load(directory: string): Promise<LoreConfig> {
   if (!isHostedMode()) {
     const path = join(directory, ".lore.json");
     if (existsSync(path)) {
-      const raw = JSON.parse(readFileSync(path, "utf8"));
-      current = LoreConfig.parse(raw);
-      return current;
+      try {
+        const raw = JSON.parse(stripJsonComments(readFileSync(path, "utf8")));
+        current = LoreConfig.parse(raw);
+        return current;
+      } catch (e) {
+        warn(
+          `Failed to parse ${path}: ${e instanceof Error ? e.message : e}. Using defaults.`,
+        );
+      }
     }
   }
   current = LoreConfig.parse({});
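
To see what JSONC input the new `load()` path is meant to tolerate, here is a small self-contained sketch. It assumes a variant of the stripper in which both passes skip quoted strings, and `parseJsonc` is a hypothetical wrapper name that does not appear in the PR.

```typescript
// Sketch only: strips // and /* */ comments plus trailing commas, leaving
// anything inside double-quoted strings untouched in both passes.
function stripJsonComments(str: string): string {
  return str
    // Pass 1: keep strings (group 1), drop line and block comments.
    .replace(/("(?:[^"\\]|\\.)*")|\/\/[^\n]*|\/\*[\s\S]*?\*\//g, (_m, s) => s ?? "")
    // Pass 2: keep strings (group 1), drop commas directly before } or ].
    .replace(/("(?:[^"\\]|\\.)*")|,(?=\s*[}\]])/g, (_m, s) => s ?? "");
}

// Hypothetical wrapper (assumption): parse JSONC text into a plain value.
function parseJsonc(text: string): unknown {
  return JSON.parse(stripJsonComments(text));
}

const sample = `{
  // line comment
  "url": "https://example.com/a", /* block */
  "note": "keep, ]",
  "list": [1, 2,],
}`;

const parsed = parseJsonc(sample) as { url: string; note: string; list: number[] };
```

The `url` and `note` values are the interesting cases: one contains `//` and the other contains `, ]`, and both should survive unchanged.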

diff --git a/packages/gateway/instrument.ts b/packages/gateway/instrument.ts
@@ -72,6 +72,19 @@ const sentryEnabled =
   sentryEnvVar === "1" ? true : sentryEnvVar === "0" ? false : !isDev;
 
 if (sentryEnabled && !Sentry.isInitialized()) {
+  // Transient network errors that are expected in a long-running LLM proxy.
+  // These are not actionable bugs — they occur when clients disconnect,
+  // upstreams are temporarily unavailable, or network conditions degrade.
+  const TRANSIENT_ERROR_PATTERNS = [
+    /\bEPIPE\b/,
+    /socket connection was closed unexpectedly/i,
+    /ZlibError/,
+    /The operation timed out/i,
+    /Worker upstream exhausted \d+ retries/,
+    /\bECONNRESET\b/,
+    /\bECONNREFUSED\b/,
+  ];
+
   Sentry.init({
     dsn: "https://0282201d6a3df3bc46423e61012ae62b@o275100.ingest.us.sentry.io/4511355222622208",
 
@@ -85,6 +98,20 @@ if (sentryEnabled && !Sentry.isInitialized()) {
     // Capture 100% of transactions and logs
     tracesSampleRate: 1.0,
     enableLogs: true,
+
+    // Drop transient network errors that are not actionable bugs.
+    // Each exception in the chain is tested independently so a real bug
+    // wrapping a transient cause isn't accidentally silenced.
+    beforeSend(event) {
+      const values = event.exception?.values;
+      if (values?.length && values.every((v) => {
+        const msg = `${v.type}: ${v.value}`;
+        return TRANSIENT_ERROR_PATTERNS.some((re) => re.test(msg));
+      })) {
+        return null;
+      }
+      return event;
+    },
   });
 
   // Bridge core's log.* calls → Sentry structured logs + error capture
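
The intent stated in the `beforeSend` comment, that a real bug wrapping a transient cause should not be silenced, can be checked without the Sentry SDK. The sketch below uses local stand-in types (`ExceptionValue`, `EventLike`) and a shortened pattern list; all of these names are assumptions for illustration, and the filter drops an event only when every exception in the chain is transient.

```typescript
// Local stand-ins for the parts of the event shape the filter reads.
interface ExceptionValue { type?: string; value?: string }
interface EventLike { exception?: { values?: ExceptionValue[] } }

// Shortened pattern list for the sketch.
const TRANSIENT = [/\bEPIPE\b/, /\bECONNRESET\b/, /The operation timed out/i];

function isTransient(v: ExceptionValue): boolean {
  const msg = `${v.type}: ${v.value}`;
  return TRANSIENT.some((re) => re.test(msg));
}

// Returns null (drop) only when the chain is non-empty and fully transient.
function filterEvent<E extends EventLike>(event: E): E | null {
  const values = event.exception?.values;
  if (values && values.length > 0 && values.every(isTransient)) return null;
  return event;
}

const pureTransient: EventLike = {
  exception: { values: [{ type: "Error", value: "write EPIPE" }] },
};
const wrappedBug: EventLike = {
  exception: {
    values: [
      { type: "Error", value: "read ECONNRESET" },
      { type: "TypeError", value: "cannot read properties of undefined" },
    ],
  },
};

const dropped = filterEvent(pureTransient); // chain is entirely transient
const kept = filterEvent(wrappedBug); // chain contains a non-transient error
```

Keeping the predicate as a plain function makes it straightforward to unit-test the pattern list separately from SDK initialization.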

diff --git a/packages/gateway/script/bundle.ts b/packages/gateway/script/bundle.ts
@@ -331,6 +331,8 @@ export declare function startServer(config: GatewayConfig): {
   stop: () => void;
   port: number;
   hosts: string[];
+  /** Resolves when the server is listening. Present under Node.js; absent under Bun. */
+  ready?: Promise<void>;
 };
 
 /**

diff --git a/packages/gateway/script/node-polyfills.ts b/packages/gateway/script/node-polyfills.ts
@@ -85,6 +85,17 @@ if (typeof globalThis.Bun === "undefined") {
       }
     });
 
+    // Node's server.listen() is async — EADDRINUSE is emitted as an 'error'
+    // event, not thrown synchronously. Expose a `ready` promise so callers
+    // (startGateway) can await successful bind and catch port conflicts.
+    const ready = new Promise<void>((resolve, reject) => {
+      server.once("listening", resolve);
+      server.once("error", reject);
+    });
+    // Prevent UnhandledPromiseRejection if the caller never awaits `ready`.
+    // The real error surfaces when startGateway() awaits it.
+    ready.catch(() => {});
+
     server.listen(opts.port, opts.hostname);
 
     return {
@@ -94,6 +105,7 @@ if (typeof globalThis.Bun === "undefined") {
         if (typeof addr === "object" && addr !== null) return addr.port;
         return opts.port;
       },
+      ready,
     };
   }
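
A caller-side view of the `ready` promise may help here. The following sketch uses plain `node:http` rather than the polyfill itself; `listenWithReady` and `demo` are hypothetical names, and binding a second server to the first server's port is just a convenient way to provoke `EADDRINUSE`.

```typescript
import { createServer, Server } from "node:http";

// Hypothetical helper (assumption): start listening and expose a promise that
// settles on the first 'listening' or 'error' event.
function listenWithReady(port: number, hostname = "127.0.0.1"): { server: Server; ready: Promise<void> } {
  const server = createServer((_req, res) => {
    res.end("ok");
  });
  const ready = new Promise<void>((resolve, reject) => {
    server.once("listening", () => resolve());
    server.once("error", reject);
  });
  // Avoid an unhandled rejection if the caller never awaits `ready`.
  ready.catch(() => {});
  server.listen(port, hostname);
  return { server, ready };
}

// Bind once on an ephemeral port, then try to bind the same port again.
async function demo(): Promise<string> {
  const first = listenWithReady(0);
  await first.ready;
  const addr = first.server.address();
  const port = typeof addr === "object" && addr !== null ? addr.port : 0;
  const second = listenWithReady(port);
  try {
    await second.ready;
    return "unexpected bind";
  } catch (e) {
    // The port conflict arrives as a rejected promise, not a synchronous throw.
    return (e as { code?: string }).code ?? "unknown";
  } finally {
    first.server.close();
  }
}
```

Because `ready` is optional in the declared return type, a caller that also runs under Bun would presumably guard it, for example with `await handle.ready` only when the property is defined.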