LLM-1285: Web fetch tool#1
Conversation
…or web-fetch tool
…b-fetch tool Implement Phase 4 of the web-fetch extension: Playwright as primary fetcher with native fetch fallback when unavailable. Browser pool manages a single Chromium instance with lazy init, reuse, idle timeout, and crash recovery. Class-based BrowserPool design with constructor injection for testability.
…ed cache
Cache output was built using string.replace("Cache: miss", "Cache: hit") on the
full output including page body — if the page contained the literal string
"Cache: miss", the cached version was corrupted. Now builds cache output by
swapping the metadata array element before joining.
Also fixes unsafe `err as FetchError` cast with proper instanceof narrowing,
extracts EMPTY_DETAILS constant, and adds session-scoped cache with TTL,
cache integration tests, and session_shutdown cleanup.
Only set playwrightAvailable=false when the error indicates Chromium is genuinely not installed (e.g. "Executable doesn't exist"). Transient errors like ENOMEM or file locks now allow retry on the next getBrowser() call instead of permanently disabling Playwright for the session.
Wrap domino.createDocument in try/catch so malformed HTML returns a fallback string instead of crashing. Classify Playwright TimeoutError by error name rather than relying solely on message substring matching.
…tests into verify Converts the cache module from module-level state with free functions to a Cache class with constructor-injected options (ttlMs, maxEntries), matching the BrowserPool pattern. Adds LRU eviction at 100 entries, vitest to the extension, and a test:extensions script in the root verify pipeline.
- Clamp timeout parameter to [0, 120] to prevent negative values reaching Playwright/native fetch - Call shutdownBrowserPool() on session_shutdown alongside cache clear - Add comment clarifying truncated responses are cached intentionally
Wiz Scan Summary
To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension. |
|
|
||
| let response: Response; | ||
| try { | ||
| response = await fetch(url, { |
There was a problem hiding this comment.
Server-Side Request Forgery (SSRF) via User-Controlled URLs in Node.js HTTP Client Libraries (CWE-918)
Analyzed by AI as
More Details
This rule detects instances where user-controlled URLs are passed directly to HTTP client libraries, leading to a Server-Side Request Forgery (SSRF) vulnerability. SSRF allows an attacker to abuse server functionality to make requests to internal systems or services that are not directly exposed to the internet. This can result in unauthorized access to sensitive data, internal network reconnaissance, and potential pivoting into the internal network.
| Attribute | Value |
|---|---|
| Impact | |
| Likelihood |
AI Analysis
The code snippet shows a fetch() call with a url parameter, but there's insufficient context to determine if this URL comes from user input or is properly validated. The snippet doesn't show the source of the url variable or any validation logic. Without seeing how url is populated or if there are security controls elsewhere in the function, it's impossible to definitively classify this as exploitable or safe.
Remediation
Server-Side Request Forgery (SSRF) is a vulnerability that allows an attacker to force the server to make unintended requests to other systems or internal resources. This can lead to unauthorized access, data theft, and potentially further compromise of the internal network.
To mitigate SSRF, user input should never be directly passed to HTTP client libraries. Instead, implement strict input validation and whitelisting to ensure that only trusted URLs are allowed. This can be done by maintaining a list of approved domains or URLs, and rejecting any user input that does not match this whitelist.
Code examples:
// VULNERABLE CODE - User input is directly passed to the HTTP client library
const axios = require('axios');
const userUrl = req.query.url;
axios.get(userUrl)
.then(response => {
// Handle response
})
.catch(error => {
// Handle error
});
// SECURE CODE - User input is validated against a whitelist before making the request
const axios = require('axios');
const userUrl = req.query.url;
const allowedUrls = ['https://example.com', 'https://example.org'];
if (allowedUrls.includes(userUrl)) {
axios.get(userUrl)
.then(response => {
// Handle response
})
.catch(error => {
// Handle error
});
} else {
// Reject the request with an appropriate error message
}
Additional recommendations:
- Follow the principle of least privilege and only grant the minimum required permissions to the server.
- Implement strict input validation and sanitization for all user input, not just URLs.
- Keep software up-to-date and apply security patches promptly.
- Adhere to the OWASP Server-Side Request Forgery Prevention Cheat Sheet and other relevant security standards.
- Consider using a secure web gateway or web application firewall (WAF) to provide an additional layer of protection.
Rule ID: WS-I007-JAVASCRIPT-00019
To ignore this finding as an exception, reply to this conversation with #wiz_ignore reason
If you'd like to ignore this finding in all future scans, add an exception in the .wiz file (learn more) or create an Ignore Rule (learn more).
To get more details on how to remediate this issue using AI, reply to this conversation with #wiz remediate
…rompt When a user pasted an image and submitted the prompt, the user-message component rendered only the typed text — the image silently rode along to the model with no visible reference, and there was no way for the user to say "look at the second image" in their prompt. Match Claude's UX: prepend a `[Image #N]` marker per attached image to the text we forward to the model, with N being a session-wide running counter. A turn that pastes one image and types "is this a dog?" now becomes `[Image #1] is this a dog?` in both the rendered transcript and the model's input. Two-image turns become `[Image #N] [Image #M] <prompt>`. Empty prompts with images become just the markers, so the user message renders at all (UserMessageComponent is skipped when text is empty). The counter: - starts at 1 per session; - resets on `session_start`; - runs continuously across all turns and across all attached image sources (paste queue + any `event.images` an upstream extension may have populated), so reference numbers stay stable for the entire conversation. Verified against 8 input scenarios (single paste + text, multi-paste, empty text, whitespace-only text, pre-attached images, mixed sources, no-images bail-out, counter persistence across turns) — all pass. Existing 749-test suite still green. Co-Authored-By: Kimchi <noreply@kimchi.dev>
…bytes, not system icon (#148) ## Summary Two related bugs in the paste-image flow, fixed together: 1. **Wrong bytes attached to the model.** When a user copied an image file in Finder (`Cmd+C` on the file) and pasted into the Kimchi prompt, the harness either bailed with *"No image found on clipboard"* or attached the **macOS generic-PNG-document icon** (a 1024×1024 flat document glyph) instead of the file's actual contents. Vision models correctly described what they got — *"a generic PNG file icon"* — but the user's intent was lost. 2. **Pasted images had no textual reference in the prompt.** The user-message component renders only the typed text, so a turn with one paste and a caption rendered as just *"is this a dog?"* with the image riding along invisibly. There was no way for the user to write *"compare image 2 to image 3"*. This PR fixes both: the harness now reads the file's real bytes from disk when a Finder file URL is on the pasteboard, and it prepends Claude-style `[Image #N]` markers to the submitted text so each attached image has a stable reference. ## Bug 1 — root cause and fix A plain Finder `Cmd+C` of a file puts only `public.file-url` on the macOS pasteboard. The native clipboard addon (`@mariozechner/clipboard-darwin-*`, a wrapper around `clipboard-rs`) only consults `public.png` / `Apple PNG pasteboard type` for `hasImage()`, so: - Clean Finder file copy → `hasImage()` is false → handler bailed with *"No image found"*. - When some intermediary (Get Info → click icon → `Cmd+C`, or a clipboard manager) had re-rendered the file URL into a 1024×1024 NSImage system-icon rep → `hasImage()` returned true and `getImageBinary()` cheerfully handed back that icon as if it were user content. In `readClipboardImage()`: 1. Read `clipboard.availableFormats()` first (synchronous, cheap). 2. If `public.file-url` is present **and** we're on macOS, resolve the path via `osascript -e 'POSIX path of (the clipboard as «class furl»)'`. 3. Validate the path: image-typed extension (`.png`/`.jpg`/`.jpeg`/`.gif`/`.webp`), regular file, ≤ 50 MB, readable. 4. Return the file's bytes with the correct mime type — skipping the addon's image rep entirely. 5. Fall through to the addon for screenshots, image-data-only paste, and non-macOS platforms (unchanged behavior). The `availableFormats()` gate is essential: AppleScript will coerce an arbitrary text clipboard (`"hello"`) into a fake `/hello` path, so we only invoke it when a real file URL is advertised. ## Bug 2 — `[Image #N]` markers In the `input` event handler, count incoming + pending images and prepend `[Image #N]` markers to `event.text` (one marker per image, separated by spaces). The counter: - starts at 1 per session; - resets on `session_start`; - runs continuously across all turns and across all attached image sources (paste queue + any `event.images` an upstream extension may have populated). A turn that pastes one image and types *"is this a dog?"* now becomes `[Image #1] is this a dog?` in both the rendered transcript and the model's input. Two-image turns become `[Image #N] [Image #M] <prompt>`. Empty prompts with images become just the markers (so the user message renders at all — `UserMessageComponent` is otherwise skipped on empty text). ## Verification **Bug 1 — pasteboard scenarios** (5/5): | scenario | result | |---|---| | File-URL only (the bug) | reads `dog copy.png` from disk: **752 670 bytes**, `image/png` ✅ | | Real PNG bytes (screenshot-like) | addon path: 665 009 bytes ✅ | | Plain text `"hello"` | `null` (does **not** fabricate `/hello` as a path) ✅ | | File URL → non-image (`.txt`) | `null` ✅ | | File URL → missing file | `null` ✅ | **Bug 2 — marker scenarios** (8/8): | input | output text | counter after | |---|---|---| | 1 paste + `"is this a dog?"` | `[Image #1] is this a dog?` | 1 | | follow-up paste + `"what about now?"` | `[Image #2] what about now?` | 2 | | 2 pastes + `"compare these"` | `[Image #3] [Image #4] compare these` | 4 | | empty text + 1 paste | `[Image #5]` | 5 | | whitespace-only text + 1 paste | `[Image #6]` | 6 | | pre-attached `event.images` + caption | `[Image #7] see attached` | 7 | | 2 pre-attached + 1 pending + caption | `[Image #8] [Image #9] [Image #10] look` | 10 | | no images + plain text | bail out, counter unchanged | 10 | Plus: - `pnpm run check` (biome lint + tsc) — clean (one pre-existing `lsp/client.ts` warning is unrelated). - `pnpm test` — 749 tests across 40 files, all pass. - `pnpm run build:binary` + `pnpm run install:local` — clean. ## What else changed The PR also lands the supporting infrastructure that the new extension needs: - **`src/extensions/clipboard-image.ts`** — new file. Hosts both the file-URL-aware paste handler and the `[Image #N]` marker injection. - **`src/cli.ts`** — wires `clipboardImageExtension` into the default extension list. - **`src/extensions/ui.ts`** — exports `setPasteImageHandler` and `setPendingImageIndicator` so the new extension can hook the editor. - **`src/components/editor.ts`** — `PromptEditor.setPendingImageIndicator()` and a render path that pins a right-aligned status segment to the first prompt row, with content-width recomputed so typed text never collides with the indicator. - **`package.json`** — adds the prebuilt `@mariozechner/clipboard-{darwin,linux,win32}-*` addons as `optionalDependencies` so install picks the right platform binary. - **`.gitignore`** — ignores `kimchi-session-*.html` viewers produced by `/export`. ## Status Draft. Opening for early visibility while I'm: - still validating end-to-end with the running binary in everyday paste flows; - considering whether the same file-URL-prefer treatment should also apply on Linux (Wayland exposes file lists via `wl-paste --list-types` → `text/uri-list`, X11 via `xclip -selection clipboard -t TARGETS` — straightforward but currently out of scope). Reviews welcome on the macOS code path, the marker UX (`[Image #N]` placement and counter scope), and the editor indicator changes; the rest is mechanical. Closes LLM-1499 Co-Authored-By: Kimchi <noreply@kimchi.dev> --- ### Kimchi Summary ### What changed Adds support for pasting images directly from the clipboard into chat turns. When images are pasted, a right-aligned indicator appears in the prompt editor showing the number and total size of pending attachments, and `[Image #N]` markers are prepended to the submitted text for multimodal model compatibility. ### Why Enables users to attach screenshots and image files to conversations without saving them to disk first, matching the UX of multimodal chat interfaces like Claude. ### Key changes - **`src/extensions/clipboard-image.ts`**: New extension implementing cross-platform clipboard image reading via native NAPI bindings (`@mariozechner/clipboard-*` packages). Handles platform detection (Darwin/Linux/Windows), macOS file URL fallback via AppleScript, 50 MB size limits, and `[Image #N]` marker generation. - **`src/components/editor.ts`**: Extended `PromptEditor` with `setPendingImageIndicator()` and `computeContentWidth()` to reserve right-aligned space for status indicators without colliding with typed text; updates rendering to wrap content earlier when indicators are active. - **`src/extensions/ui.ts`**: Added `setPasteImageHandler()` and `setPendingImageIndicator()` bridge functions to wire clipboard events into the editor component. - **`package.json`**: Added nine platform-specific `optionalDependencies` for clipboard native addons covering Darwin (arm64/x64/universal), Linux (arm64/x64 with glibc/musl), and Windows (arm64/x64). - **`.gitignore`**: Added ignore rule for `kimchi-session-*.html` files generated by the `/export` command. ### Impact - **Dependencies**: Requires the matching `@mariozechner/clipboard-<platform>-<arch>` optional dependency; gracefully degrades with a warning if unavailable, unsupported (Termux/headless), or if the active model lacks image input capabilities. - **Size limits**: Pasted images are capped at 50 MB raw bytes to prevent excessive memory usage and API payload sizes. - ** UX**: Images are held in a pending queue until message submission, allowing multiple pastes before sending; the indicator updates in real-time to show accumulated size. --------- Co-authored-by: Kimchi <noreply@kimchi.dev>
Tier 0 of docs/ferment-review.md tracked six correctness items the post-review audit said had to land before any further feature work. This commit closes all six, plus a Critical bug from the original docs/ferment-audit.md that survived the entire rewrite and was surfaced by the new property-based test on its first real run. T0#1 handleRefinePhase running-step destruction (audit doc #4) ───────────────────────────────────────────────────────────────── refine_phase rebuilds the entire steps array, so allowing it through while a step is `running` would silently delete the in-flight step and the subagent's later complete_step would hit STEP_NOT_FOUND. The bug pre-dated this branch and was already documented in the project's own audit; it survived the rewrite anyway. Added a guard in handleRefinePhase + a new STEP_RUNNING TransitionError variant carrying { phaseId, runningStepId, runningStepIndex, runningDescription, message }. Three new state-machine tests cover reject-when-running, allow-after-running-completes, and the mixed terminal/pending case. T0#5 Distinguish unavailable judge from grade "B" ───────────────────────────────────────────────────────────────── When the judge LLM is unreachable or returns unparseable JSON, every grader used to return `{ grade: "B", rationale: "Judge unavailable — assumed good." }`. The self-improvement loop's strategy-change triggers fire only on grades C+, so a fully-down judge looked like a string of Bs and the loop never adapted. Added an `unavailable: true` flag to JudgeGrade (placeholder grade preserved for legacy consumers). Three downstream consumers updated: - stats.computeStats: skip unavailable grades when computing averageGrade and averageStepGrade so judge outages don't inflate the metric - self-improve.evaluatePhaseFeedback: short-circuit with a neutral adjustment, all triggers off, even if synthetic deltas were present - tools/phases.complete_phase: skip the corrective-step request for unavailable D/F grades — we don't trust the rationale enough to spend an extra LLM call on a fix - format.formatFermentStatus: append "?" to the grade tag (e.g. "📊 B?") so the UI distinguishes "judged B" from "unjudged" Two new self-improve tests cover the short-circuit and that a synthetic deltas array is also ignored when unavailable. T0#6 Lock the snapshot file with proper-lockfile ───────────────────────────────────────────────────────────────── Two subagents in a parallel group calling complete_step concurrently used to race on the snapshot file. The dependency was already in package.json; only used by skills-manager, never wired for ferments. Added FermentEventStore.withLock(id, fn) wrapping every mutation entry: writeWithEvents, create, delete, addDecision, addMemory, updateWorktree, migrateLegacy. proper-lockfile's lockSync API doesn't accept retries (the async API does), so a small hand-rolled exponential-backoff retry loop covers contention. Stale-recovery release errors are swallowed to keep the mutation result intact. Two new tests cover lock release between successive writes and that sibling mutations don't deadlock from the same process. T0#4 Property-based fold-equivalence test ───────────────────────────────────────────────────────────────── Generates random valid command sequences (respecting state-machine guards and the new STEP_RUNNING guard) using a seeded xorshift32 RNG, applies each via the production mutation path (applyCommand + commandToEvents + writeWithEvents), and after every command asserts: stateHash(eventStore.get(id)) === stateHash(parentStorage.get(id)) This is the contract the four state-bearing layers — state machine, command-mapper, event fold, snapshot store — must collectively satisfy. It is the Tier-0 #4 item from docs/ferment-review.md and was originally promised as the test that would have caught §4.2 and §4.3 the day they were written. Added 5 deterministic seeds (1, 7, 42, 137, 31337), each running a 40-step sequence. The legalCommands generator weights pause at 10% and abandon-from-paused at 10% so sequences don't short-circuit to terminal status. Non-PAUSED sequences exercise scope, activate_phase, activate_phase_group, refine_phase, start_step, complete_step, skip_step, fail_step, complete_phase, skip_phase, add_decision, add_memory, pause/resume, and grade-bearing variants. Bonus: audit doc finding #1 (Critical, unfixed) ───────────────────────────────────────────────────────────────── The property test caught a bug on its first real run that the audit flagged but no commit had addressed: handleActivatePhaseGroup did NOT deactivate a previously-active non-group phase. Trace: 1. activate_phase(P3) → P3 active, status=running 2. activate_phase_group(group=1) → group activates, but P3 STAYS active (no demotion sweep) 3. snapshot has 3 active phases (G1A, G1B, P3); a fresh fold has 2 (G1A, G1B). The "active phases are exactly the target group" invariant is broken in the snapshot. Added a deactivation sweep mirroring handleActivatePhase's pattern: demote any active phase whose groupIndex doesn't match the target group. New state-machine test "deactivates a previously-active non-group phase (audit #1)" covers it. Top-level scope-field drift, also caught by the property test ───────────────────────────────────────────────────────────────── The state machine writes both Ferment.scoping.{goal,criteria, constraints} (typed answers + confirmedAt) AND the denormalized top-level Ferment.{goal, successCriteria, constraints} (plain strings/arrays). The fold only reproduced the typed sub-record, so every scope command left the fold's top-level fields undefined while the snapshot had the real values. Added an optional `plain` field to ScopingGoalSetPayload, ScopingCriteriaSetPayload, ScopingConstraintsSetPayload. Mapper populates it from `post.{goal,successCriteria,constraints}`. Fold applies it to the corresponding top-level slot, falling back to `p.{goal,criteria,constraints}.answer` for legacy events. Refactor: applyEvent extracted as exported applyFermentEvent ───────────────────────────────────────────────────────────────── Already shipped earlier in §4.3 work — calling it out here because the property test depends on it. The fold and the mapper now share one source of truth for "what does this event do", which is the preconditon that makes hash-chain equivalence even tractable. Verification ───────────────────────────────────────────────────────────────── - pnpm run typecheck — 0 errors across 297 files - pnpm run lint — 0 errors across 304 files - pnpm run test — 89 files, 1608 passed, 3 pre-existing skips, 0 failures (up from 1595) Net new tests this commit: 13 - 3 in state-machine.test.ts (refine_phase running-step guard) - 1 in state-machine.test.ts (audit #1 deactivation sweep) - 2 in self-improve.test.ts (unavailable short-circuit) - 2 in event-store.test.ts (lock release / no nested deadlock) - 5 in event-store.property.test.ts (seeded random sequences)
Tier 0 of docs/ferment-review.md tracked six correctness items the post-review audit said had to land before any further feature work. This commit closes all six, plus a Critical bug from the original docs/ferment-audit.md that survived the entire rewrite and was surfaced by the new property-based test on its first real run. T0#1 handleRefinePhase running-step destruction (audit doc #4) ───────────────────────────────────────────────────────────────── refine_phase rebuilds the entire steps array, so allowing it through while a step is `running` would silently delete the in-flight step and the subagent's later complete_step would hit STEP_NOT_FOUND. The bug pre-dated this branch and was already documented in the project's own audit; it survived the rewrite anyway. Added a guard in handleRefinePhase + a new STEP_RUNNING TransitionError variant carrying { phaseId, runningStepId, runningStepIndex, runningDescription, message }. Three new state-machine tests cover reject-when-running, allow-after-running-completes, and the mixed terminal/pending case. T0#5 Distinguish unavailable judge from grade "B" ───────────────────────────────────────────────────────────────── When the judge LLM is unreachable or returns unparseable JSON, every grader used to return `{ grade: "B", rationale: "Judge unavailable — assumed good." }`. The self-improvement loop's strategy-change triggers fire only on grades C+, so a fully-down judge looked like a string of Bs and the loop never adapted. Added an `unavailable: true` flag to JudgeGrade (placeholder grade preserved for legacy consumers). Three downstream consumers updated: - stats.computeStats: skip unavailable grades when computing averageGrade and averageStepGrade so judge outages don't inflate the metric - self-improve.evaluatePhaseFeedback: short-circuit with a neutral adjustment, all triggers off, even if synthetic deltas were present - tools/phases.complete_phase: skip the corrective-step request for unavailable D/F grades — we don't trust the rationale enough to spend an extra LLM call on a fix - format.formatFermentStatus: append "?" to the grade tag (e.g. "📊 B?") so the UI distinguishes "judged B" from "unjudged" Two new self-improve tests cover the short-circuit and that a synthetic deltas array is also ignored when unavailable. T0#6 Lock the snapshot file with proper-lockfile ───────────────────────────────────────────────────────────────── Two subagents in a parallel group calling complete_step concurrently used to race on the snapshot file. The dependency was already in package.json; only used by skills-manager, never wired for ferments. Added FermentEventStore.withLock(id, fn) wrapping every mutation entry: writeWithEvents, create, delete, addDecision, addMemory, updateWorktree, migrateLegacy. proper-lockfile's lockSync API doesn't accept retries (the async API does), so a small hand-rolled exponential-backoff retry loop covers contention. Stale-recovery release errors are swallowed to keep the mutation result intact. Two new tests cover lock release between successive writes and that sibling mutations don't deadlock from the same process. T0#4 Property-based fold-equivalence test ───────────────────────────────────────────────────────────────── Generates random valid command sequences (respecting state-machine guards and the new STEP_RUNNING guard) using a seeded xorshift32 RNG, applies each via the production mutation path (applyCommand + commandToEvents + writeWithEvents), and after every command asserts: stateHash(eventStore.get(id)) === stateHash(parentStorage.get(id)) This is the contract the four state-bearing layers — state machine, command-mapper, event fold, snapshot store — must collectively satisfy. It is the Tier-0 #4 item from docs/ferment-review.md and was originally promised as the test that would have caught §4.2 and §4.3 the day they were written. Added 5 deterministic seeds (1, 7, 42, 137, 31337), each running a 40-step sequence. The legalCommands generator weights pause at 10% and abandon-from-paused at 10% so sequences don't short-circuit to terminal status. Non-PAUSED sequences exercise scope, activate_phase, activate_phase_group, refine_phase, start_step, complete_step, skip_step, fail_step, complete_phase, skip_phase, add_decision, add_memory, pause/resume, and grade-bearing variants. Bonus: audit doc finding #1 (Critical, unfixed) ───────────────────────────────────────────────────────────────── The property test caught a bug on its first real run that the audit flagged but no commit had addressed: handleActivatePhaseGroup did NOT deactivate a previously-active non-group phase. Trace: 1. activate_phase(P3) → P3 active, status=running 2. activate_phase_group(group=1) → group activates, but P3 STAYS active (no demotion sweep) 3. snapshot has 3 active phases (G1A, G1B, P3); a fresh fold has 2 (G1A, G1B). The "active phases are exactly the target group" invariant is broken in the snapshot. Added a deactivation sweep mirroring handleActivatePhase's pattern: demote any active phase whose groupIndex doesn't match the target group. New state-machine test "deactivates a previously-active non-group phase (audit #1)" covers it. Top-level scope-field drift, also caught by the property test ───────────────────────────────────────────────────────────────── The state machine writes both Ferment.scoping.{goal,criteria, constraints} (typed answers + confirmedAt) AND the denormalized top-level Ferment.{goal, successCriteria, constraints} (plain strings/arrays). The fold only reproduced the typed sub-record, so every scope command left the fold's top-level fields undefined while the snapshot had the real values. Added an optional `plain` field to ScopingGoalSetPayload, ScopingCriteriaSetPayload, ScopingConstraintsSetPayload. Mapper populates it from `post.{goal,successCriteria,constraints}`. Fold applies it to the corresponding top-level slot, falling back to `p.{goal,criteria,constraints}.answer` for legacy events. Refactor: applyEvent extracted as exported applyFermentEvent ───────────────────────────────────────────────────────────────── Already shipped earlier in §4.3 work — calling it out here because the property test depends on it. The fold and the mapper now share one source of truth for "what does this event do", which is the preconditon that makes hash-chain equivalence even tractable. Verification ───────────────────────────────────────────────────────────────── - pnpm run typecheck — 0 errors across 297 files - pnpm run lint — 0 errors across 304 files - pnpm run test — 89 files, 1608 passed, 3 pre-existing skips, 0 failures (up from 1595) Net new tests this commit: 13 - 3 in state-machine.test.ts (refine_phase running-step guard) - 1 in state-machine.test.ts (audit #1 deactivation sweep) - 2 in self-improve.test.ts (unavailable short-circuit) - 2 in event-store.test.ts (lock release / no nested deadlock) - 5 in event-store.property.test.ts (seeded random sequences)
…ssification (#225) ## Summary Two related issues blocked `jetbrains_get_all_open_file_paths` (and other MCP direct tools) from being callable in a kimchi-dev session, even when the JetBrains MCP server was running and reachable. 1. **First-run registration gap** — when the on-disk `mcp-cache.json` had a stale `configHash` for a server with `directTools: true`, the cached metadata was rejected, no direct tools registered at module load, and the bootstrap-on-connect path only emitted *"available after restart"*. A naive first-time user (or anyone who edited `.kimchi/mcp.json`) saw a permanent **Tool not found** failure. 2. **Plan-mode classifier blind spot** — `permissions/taxonomy.ts` only matched the read-verb (`get`/`list`/etc.) at position 0 of the tool name. MCP direct tools arrive prefix-flattened (`jetbrains_get_all_open_file_paths`), so the verb sits at position 1 and every read-only MCP tool got classified `unknown` → stripped from the plan-mode active set → reported as *"not available"* or *"not found"* depending on timing. This PR fixes both. After landing, the user-reported flow works on the first run, in plan mode, without a restart. ## Root cause walkthrough ### Why the cache rejection happens - `src/extensions/mcp-adapter/index.ts` at module load runs `resolveDirectTools(earlyConfig, earlyCache, …)` (line 75 pre-patch). For each server with `directTools: true`, it iterates the cache entry and registers one `pi.registerTool({ … })` per cached tool. - `direct-tools.ts:83` skips the server when `!isServerCacheValid(serverCache, definition)`. - `isServerCacheValid` (`metadata-cache.ts:106`) compares `entry.configHash` to `computeServerHash(definition)`. `computeServerHash` only hashes identity-affecting fields (`command`, `args`, `env`, `cwd`, `url`, `headers`, `auth`, `bearerToken`, `bearerTokenEnv`, `exposeResources`, `excludeTools`). - The cache file `~/.config/kimchi/harness/mcp-cache.json` is **shared across every project** (path derives from `getAgentDir()`), so an entry written by a different project's `mcp.json` lingers and silently fails the hash check. - The bootstrap path at `init.ts:171-203` could re-cache after a successful connect, but it only fired the *"available after restart"* notification; direct tools register synchronously at module load, so they only became visible on the **next** restart. Combined with the unconditional `if (!state) return []` guard in `registerAndActivate`, even adding a runtime registration callback wouldn't have worked. ### Why plan mode hid the registered tools - Reproduced from the session log: call #0 returned *"Plan mode: tool jetbrains_get_all_open_file_paths is not available. Use /permissions mode default to enable writes."* and call #1 returned *"Tool jetbrains_get_all_open_file_paths not found"*. - `permissions/index.ts:192` (`applyPlanModeTools`) runs at session start and rebuilds the active set to **only** tools where `isReadOnlyTool(name)` is true or `name` is in `PLAN_MODE_TOOLS`. - `taxonomy.ts:20` had `READ_ONLY_NAME_HINT = /^(read|get|list|search|query|describe|find|grep|ls|loki_|view|show)/i`. For `jetbrains_get_all_open_file_paths`, the regex anchors at `^` and only sees `jetbrains` → no match → classified `unknown` → not read-only → dropped from the plan-mode active set. ## Changes ### `src/extensions/mcp-adapter/metadata-cache.ts` - `overwriteMetadataCache(cache)` — writes the on-disk cache verbatim (no merge with existing). `saveMetadataCache` always merges, so it cannot delete entries; the new function fills that gap. - `purgeStaleEntries(cache, mcpServers)` — drops entries whose server **is** in the current `mcp.json` but whose `configHash` no longer matches the live definition. **Orphan entries (servers not in the current project's config) are preserved** because the cache is shared across projects; dropping them would silently nuke teammates' caches. ### `src/extensions/mcp-adapter/index.ts` - Run `purgeStaleEntries` at module load right after `loadMetadataCache()`. If any entry was removed, persist via `overwriteMetadataCache` and continue with the cleaned cache. - `registerAndActivate(specs, opts?)` now accepts a `markDynamic` flag (default `true`, preserves existing proxy describe/search callers). When `false`, the new tool names are **not** added to `state.dynamicToolNames`, so the `pi.on("input", …)` hook leaves them in the active set across turns. - Removed the early `if (!state) return []` guard so the new bootstrap callback can run from inside `initializeMcp` before its promise resolves. The lazy `() => state` closure inside `createDirectToolExecutor` resolves the state correctly at call time. `state.dynamicToolNames` is only touched when both `markDynamic` and `state` are truthy. - Added `registerBootstrappedDirectTools(specs)` — thin wrapper that calls `registerAndActivate(specs, { markDynamic: false })`. Passed as the third argument to `initializeMcp`. ### `src/extensions/mcp-adapter/init.ts` - `initializeMcp` learned an optional third argument: `registerBootstrappedDirectTools?: (specs: DirectToolSpec[]) => string[]`. - After `loadMetadataCache()` inside `initializeMcp`, run `purgeStaleEntries` again and persist via `overwriteMetadataCache`. Covers callers that don't go through `index.ts` module-load (tests, embedded use). - After the bootstrap loop at `init.ts:171-203`, for each successfully bootstrapped server name, resolve fresh specs via `resolveDirectTools(config, freshCache, prefix, envOverride)` filtered to `bootstrapped`, then invoke the callback. On non-empty injection, emit the new *"MCP: N direct tool(s) from … are now available"* notification; fall back to the previous *"available after restart"* message when the callback is absent or returns empty. ### `src/extensions/permissions/taxonomy.ts` - Added `READ_ONLY_VERB_SEGMENTS` (`get`, `list`, `search`, `query`, `describe`, `find`, `grep`, `ls`, `view`, `show`, `preview`, `inspect`, plus `read`). - `hasReadOnlyVerbSegment(toolName)` splits on `_` and checks if **any segment after the first** is an exact match for a read-verb. The first segment is intentionally skipped so a server namespace like `list` or `get` doesn't blanket-mark every tool under it as read-only. - `classifyTool` consults the new helper as a fallback after `STATIC_CATEGORIES`, the `mcp__` branch, and the original `READ_ONLY_NAME_HINT` start-anchored regex. Worked out by inspection of the `_refreshToolRegistry` flow in the bundled pi-coding-agent — tools must live in `_toolRegistry` (active set) **and** match `currentContext.tools` at agent-loop dispatch (`agent-loop.js:332`) or the agent returns `Tool X not found` before any extension `tool_call` hook runs. ## Verified classifications | Tool | Before | After | Reason | | --------------------------------------------- | --------- | --------- | ------------------------------------------- | | `jetbrains_get_all_open_file_paths` | `unknown` | `readOnly`| verb `get` in segment 1 | | `jetbrains_get_run_configurations` | `unknown` | `readOnly`| verb `get` in segment 1 | | `jetbrains_xdebug_get_stack` | `unknown` | `readOnly`| verb `get` in segment 2 | | `supabase_list_tables`, `supabase_search_docs`| `unknown` | `readOnly`| verb `list`/`search` in segment 1 | | `jetbrains_execute_run_configuration` | `unknown` | `unknown` | no read-verb segment — stays blocked | | `jetbrains_build_project` | `unknown` | `unknown` | no read-verb segment — stays blocked | | `jetbrains_create_new_file` | `unknown` | `unknown` | `create` is not in the read-verb set | | `jetbrains_rename_refactoring` | `unknown` | `unknown` | `rename` is not in the read-verb set | | `playwright_browser_click` | `unknown` | `unknown` | no read-verb segment | | `mcp`, `bash`, `read` | unchanged | unchanged | hit `STATIC_CATEGORIES` | ## Test plan - [x] `pnpm run typecheck` — clean. - [x] `pnpm run lint` — clean. - [x] `pnpm test` — 1891/1891 pass. - [x] Dry-run `purgeStaleEntries` against `~/.config/kimchi/harness/mcp-cache.json` with the project's current `.kimchi/mcp.json` — removes only the stale `jetbrains` entry, preserves `supabase`, `sentry`, `playwright` orphans. - [x] Sanity-tested classifier against the table above (all expectations matched). - [x] Manual smoke: with JetBrains IDE listening on `127.0.0.1:64342`, stop `pnpm dev`, `jq 'del(.servers.jetbrains)' ~/.config/kimchi/harness/mcp-cache.json` to force the bootstrap path, start `pnpm dev`, ask for `jetbrains_get_all_open_file_paths` in plan mode → expect the call to succeed (cache rewritten with matching hash, tool registered in-session, plan-mode allow-list now includes it). - [x] Manual smoke: kill JetBrains, repeat — expect the *"available after restart"* fallback notify and a clean `mcp({ tool: "get_all_open_file_paths", server: "jetbrains" })` *"server not available"* error. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- ### Kimchi Summary ### What changed Fixes stale MCP metadata cache entries blocking direct-tool registration on startup, enables immediate registration of bootstrapped direct tools without a restart, and corrects the permission taxonomy for flattened MCP tool names. ### Why When a server definition changed, the on-disk cache retained an old `configHash` that caused `resolveDirectTools` to silently reject every tool on load, forcing users to restart. Additionally, the permissions layer only looked for read-only verbs at the start of a tool name, so flattened direct-tool names like `jetbrains_get_all_open_file_paths` were incorrectly left as `unknown`. ### Key changes - `metadata-cache.ts` - Adds `purgeStaleEntries` to drop cache entries whose `configHash` no longer matches the current server definition, while preserving orphans from other projects. - Adds `overwriteMetadataCache` for atomic replace-without-merge writes, with best-effort temp-file cleanup and swallowed I/O errors so a bad cache never crashes the extension host. - Refactors cache I/O into `atomicWriteCache`, `loadExistingCache`, `cleanupTempFile`, and `isValidCache`. - `index.ts` - Purges stale cache and overwrites it during early extension load. - Extends `registerAndActivate` with an optional `{ markDynamic?: boolean }` flag; when `markDynamic` is `true` and `state` is not yet initialized, registration is skipped to prevent leaked dynamic tools. - Introduces `registerBootstrappedDirectTools` (always `markDynamic: false`) and passes it into `initializeMcp` so bootstrapped tools persist across turns. - `init.ts` - Mirrors the stale-cache purge inside `initializeMcp` for non-Pi entry points. - After successful bootstrap, immediately calls `registerBootstrappedDirectTools` with resolved direct-tool specs instead of showing a "restart required" notification. - `taxonomy.ts` and `taxonomy.test.ts` - Adds `hasReadOnlyVerbSegment`, which scans underscore-separated segments after the first (server prefix) to classify namespaced direct tools (e.g., `supabase_list_tables`) as `readOnly`. - `cache-resolver.integration.test.ts` (new) - Adds integration tests exercising stale-entry purge, overwrite-vs-merge semantics, end-to-end stale→purge→resolve flow, and `excludeTools` / `directTools=false` interactions. ### Impact - Bootstrapped MCP direct tools are now available in the current session without restarting. - Dynamic-proxy tools registered before `state` is ready are no longer leaked. - Cache file corruption or write failures are handled gracefully. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Adds a
web_fetchtool to kimchi-code as an in-repo extension (extensions/web-fetch/). The tool fetches web pages — including JavaScript-rendered SPAs — and converts HTML to clean markdown for the coding agent to consume.What it does
fetch()when Playwright/Chromium is not installedTool parameters
urlhttp://orhttps://)formatmarkdown|text|htmlmarkdowntimeout