
fix(audit): set pathToClaudeCodeExecutable on all direct SDK call sites (B-006)#105

Merged
George-iam merged 1 commit into main from
fix/audit-worker-claude-path-20260414
Apr 14, 2026


@George-iam George-iam commented Apr 14, 2026

Summary

Fixes B-006: audit worker in the bundled CJS build crashed on every session close with TypeError: fileURLToPath(undefined). Auto-learning (memories, decisions, safety rules, handoff extraction) was effectively dead on v0.2.7. Telemetry showed 14 consecutive audit_complete=failed events on 2026-04-13 from the developer machine; any external user on v0.2.7 hits the same crash.

Root cause

The Claude Agent SDK resolves its own executable via import.meta.url, which is undefined in the bundled CJS output and crashes inside fileURLToPath(). D-121 fixed this for the main MCP server by setting pathToClaudeCodeExecutable in the shared buildAgentQueryOptions() helper — but session-auditor and memory-extractor built their own options objects by hand and bypassed the helper, so the fix never reached the audit path.

Stack trace from .axme-code/audit-worker-logs/:

TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string or an instance of URL. Received undefined
    at fileURLToPath (node:internal/url:1487:11)
    at HL (/home/georgeb/.local/bin/axme-code:10771:43)
    at runSingleAuditCall (axme-code:57739:18)
    at runSessionAudit (axme-code:57628:25)
    at runSessionCleanup (axme-code:58693:21)
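
The failure mode in this trace can be reproduced in isolation. The block below is a minimal sketch, not the SDK's actual code: `resolveExecutable` and `lookupErrorCode` are hypothetical names, and `moduleUrl` simply stands in for `import.meta.url`. Only `fileURLToPath` from `node:url` is a real API here.

```typescript
import { fileURLToPath } from "node:url";

// Hypothetical stand-in for the SDK's internal executable lookup.
// `moduleUrl` plays the role of import.meta.url; `explicitPath` plays the
// role of the pathToClaudeCodeExecutable option.
function resolveExecutable(moduleUrl: string | undefined, explicitPath?: string): string {
  if (explicitPath !== undefined) {
    return explicitPath; // an explicit path means the module URL is never consulted
  }
  // With moduleUrl undefined (as in the bundled CJS output) this call throws
  // TypeError [ERR_INVALID_ARG_TYPE].
  return fileURLToPath(moduleUrl as unknown as string);
}

// Returns the Node error code raised by the lookup, or null if it succeeds.
function lookupErrorCode(moduleUrl: string | undefined, explicitPath?: string): string | null {
  try {
    resolveExecutable(moduleUrl, explicitPath);
    return null;
  } catch (err) {
    return (err as { code?: string }).code ?? "unknown";
  }
}
```

Under these assumptions, `lookupErrorCode(undefined)` yields the same `ERR_INVALID_ARG_TYPE` code seen in the worker log, while supplying any explicit path skips the failing branch entirely.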

Changes

  • src/utils/agent-options.ts — export findClaudePath() so ad-hoc queryOpts can set pathToClaudeCodeExecutable without going through the full builder (different roles want different tool allowlists; the builder is too opinionated for the strict auditor/format/memory paths).
  • src/agents/session-auditor.ts — import findClaudePath, set pathToClaudeCodeExecutable: claudePath on both runSingleAuditCall and formatAuditResult queryOpts.
  • src/agents/memory-extractor.ts — same fix on runMemoryExtraction queryOpts. Currently tree-shaken (no callers), but fixed to keep future reuse safe and consistent.
  • test/agent-sdk-paths.test.ts (new) — static regression guard. Walks src/agents/**.ts, finds every file calling sdk.query(, and fails if neither buildAgentQueryOptions nor findClaudePath is imported. No SDK, no claude binary, no network — safe in CI.
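
The core check of the static regression guard described in the last bullet could look roughly like this. It is a sketch under assumptions: the real test walks files on disk, whereas this version takes the source text as a string, and `importsHelper` and `violatesSdkPathRule` are hypothetical names. Only the two helper names and the `sdk.query(` marker come from the PR.

```typescript
// Helper names taken from the PR description; everything else is illustrative.
const REQUIRED_HELPERS = ["buildAgentQueryOptions", "findClaudePath"] as const;

// True if `source` contains a named import of `name`, e.g.
//   import { findClaudePath } from "../utils/agent-options.js";
function importsHelper(source: string, name: string): boolean {
  const pattern = new RegExp(`import\\s*\\{[^}]*\\b${name}\\b[^}]*\\}\\s*from\\s*["']`);
  return pattern.test(source);
}

// A file violates the rule when it calls sdk.query( without importing
// either helper that can supply pathToClaudeCodeExecutable.
function violatesSdkPathRule(source: string): boolean {
  if (!source.includes("sdk.query(")) return false;
  return !REQUIRED_HELPERS.some((name) => importsHelper(source, name));
}
```

A check of this shape only proves the helper is imported, not that its result is actually passed to every call, which is presumably an accepted trade-off for a test that needs no SDK, binary, or network.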

Verification

Static:

  • npm test → 478/478 pass (3 new tests in agent-sdk-paths.test.ts)
  • npx tsc --noEmit — clean
  • npm run build — clean
  • grep pathToClaudeCodeExecutable dist/cli.mjs → 3 hits (helper + both session-auditor sites)

End-to-end smoke test on a real failed session (cb524f4b-8cc9-4e5c-9c50-21930f9001ac):

Before fix:

auditStatus: failed
auditAttempts: 1
lastAuditError: [runSessionAudit] The "path" argument must be of type string or an instance of URL. Received undefined

Ran node dist/cli.mjs audit-session --workspace ... --session cb524f4b... against the freshly-built bundle. Result:

{
  "sessionId": "cb524f4b-...",
  "auditRan": true,
  "memories": 0,
  "decisions": 0,
  "safetyRules": 0,
  "handoffSaved": false,
  "worklogSummary": true,
  "oracleRescanned": false,
  "costUsd": 1.0823125500000004
}

After fix:

auditStatus: done
auditedAt: 2026-04-14T07:38:31.094Z
lastAuditError: (none)

Audit log (.axme-code/audit-logs/2026-04-14T07-36-55_cb524f4b.json): phase: finished, promptTokens: 95120, chunks: 1, durationMs: 95181, costUsd: 1.08. Full LLM round-trip, no crash.

Test plan

  • Local bundle audits a real failed session without crashing
  • Merge, bump to v0.2.8, cut binary release
  • After install of v0.2.8, telemetry: `audit_complete` outcome=failed count drops to 0 on v0.2.8 machines

Fixes B-006.

🤖 Generated with Claude Code

Audit worker in the bundled CJS build crashed on every session close with
`TypeError: fileURLToPath(undefined)`. The Claude Agent SDK resolves its
own executable via `import.meta.url`, which is undefined in bundled CJS,
so every direct `sdk.query()` must pass `pathToClaudeCodeExecutable`
explicitly (D-121).

D-121 was applied to the shared `buildAgentQueryOptions()` helper, but
session-auditor and memory-extractor built their options objects by hand
and bypassed the helper. Result: main MCP server worked, but every
detached audit worker failed silently (14 consecutive
audit_complete=failed in telemetry, 2026-04-13). Auto-learning was
effectively dead on v0.2.7.

Changes:
- Export `findClaudePath()` from src/utils/agent-options.ts so ad-hoc
  queryOpts can set pathToClaudeCodeExecutable without going through
  the full builder (different roles want different tool allowlists).
- Set `pathToClaudeCodeExecutable: claudePath` on the three direct
  SDK call sites: runSingleAuditCall, formatAuditResult,
  runMemoryExtraction.
- Add regression test (test/agent-sdk-paths.test.ts) that walks
  src/agents/**.ts and fails if any file calling `sdk.query(` does
  not import `buildAgentQueryOptions` or `findClaudePath`. Static —
  no SDK, no claude binary, no network.

Verified:
- 478/478 unit tests pass
- `tsc --noEmit` clean
- `npm run build` clean
- `grep pathToClaudeCodeExecutable dist/cli.mjs` = 3 hits
  (helper + both session-auditor call sites; memory-extractor is
  currently tree-shaken but fixed for future reuse).

Fixes B-006.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> #!axme pr=none repo=AxmeAI/axme-code
George-iam added a commit that referenced this pull request Apr 14, 2026
…led audits

B-007 addressed the two root causes that made the admin "Top error classes"
panel useless for triage:

1) `classifyError()` had no pattern for `ERR_INVALID_ARG_TYPE` /
   `fileURLToPath(undefined)` — every B-006 failure collapsed into `unknown`.
   All 16 failed audits in the last 30 days on prod landed in the same
   opaque bucket, impossible to distinguish from unrelated regressions.

2) `audit_complete` with `outcome="failed"` never set `category` or `fatal`,
   so failed audits did not hit the (category, error_class) composite index
   on the backend. They showed up with NULL category on the dashboard.

Vocabulary changes (src/telemetry.ts):
- Add specific node codes: `node_invalid_arg`, `module_not_found`,
  `spawn_error`, `out_of_memory`. These match BEFORE the generic
  fallbacks below so B-006-class failures keep their triage signal.
- Add generic JS kinds as last resort before `unknown`:
  `type_error`, `reference_error` — dispatched via `err.name` rather
  than message text, so a bare `TypeError: x is not a function` at
  least lands in a non-empty bucket.
- Drop the `msg.includes("enoent") → transcript_not_found` shortcut;
  a bare ENOENT is a generic missing-file hit, not a transcript issue.
  Kept `transcript not found` literal as the transcript-specific matcher
  and generic ENOENT now falls through to `transcript_not_found` only
  after `spawn ENOENT` is checked.
- Network: `econnreset` added alongside `econnrefused`.
- Doc-comment the load-bearing match order.
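
The match order described above can be sketched as follows. This is an illustration under assumptions, not the contents of src/telemetry.ts: the six new slugs and `transcript_not_found` come from this commit message, but the exact match strings, the `network_error` slug, and the function signature are guesses, and bare ENOENT handling is left out of the sketch.

```typescript
type ErrorClass =
  | "node_invalid_arg" | "module_not_found" | "spawn_error" | "out_of_memory"
  | "transcript_not_found" | "network_error" // "network_error" is an assumed slug
  | "type_error" | "reference_error" | "unknown";

function classifyError(err: unknown): ErrorClass {
  const e = err as { name?: string; code?: string; message?: string } | null;
  const msg = String(e?.message ?? err).toLowerCase();
  const code = String(e?.code ?? "").toLowerCase();

  // 1) Specific Node codes first, so they are not swallowed by the
  //    generic name-based fallbacks at the bottom.
  if (code === "err_invalid_arg_type" || msg.includes("err_invalid_arg_type") ||
      msg.includes('the "path" argument must be of type string')) return "node_invalid_arg";
  if (code.includes("module_not_found") || msg.includes("cannot find module")) return "module_not_found";
  if (/\bspawn\b.*\benoent\b/.test(msg)) return "spawn_error";
  if (msg.includes("heap out of memory")) return "out_of_memory";

  // 2) Domain-specific matchers.
  if (msg.includes("transcript not found")) return "transcript_not_found";
  if (msg.includes("econnrefused") || msg.includes("econnreset")) return "network_error";

  // 3) Generic JS kinds, dispatched on err.name rather than message text.
  if (e?.name === "TypeError") return "type_error";
  if (e?.name === "ReferenceError") return "reference_error";

  return "unknown";
}
```

With this ordering, the B-006 message is a `TypeError` by name but is caught in step 1, so it never degrades into the generic `type_error` bucket.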

audit_complete event (src/session-cleanup.ts):
- When `outcome === "failed"`, stamp `category: "audit"` and
  `fatal: false`. Audit failures are non-fatal — the session still
  closes and the user's work is unaffected; only background extraction
  is lost until the next attempt. Setting these fields lets the
  existing backend index surface failed audits in the same panel as
  other categorized errors.
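
As a rough illustration of the stamping described above, the event payload could be built like this. The field names `outcome`, `error_class`, `category`, and `fatal` come from this commit message; the `AuditCompleteEvent` type, the `buildAuditCompleteEvent` helper, and the `"success"` outcome value are assumptions made for the sketch.

```typescript
type AuditOutcome = "success" | "failed"; // "success" is an assumed value

interface AuditCompleteEvent {
  event: "audit_complete";
  outcome: AuditOutcome;
  error_class?: string;
  category?: string;
  fatal?: boolean;
}

// Hypothetical builder: only failed audits carry category and fatal, so
// they can be found through the (category, error_class) composite index.
function buildAuditCompleteEvent(outcome: AuditOutcome, errorClass?: string): AuditCompleteEvent {
  const base: AuditCompleteEvent = { event: "audit_complete", outcome };
  if (outcome !== "failed") return base;
  return {
    ...base,
    error_class: errorClass ?? "unknown",
    category: "audit",
    fatal: false, // the session still closes; only background extraction is lost
  };
}
```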

Tests (test/telemetry.test.ts):
- B-006 reproducer: exact TypeError message from the audit-worker-logs
  → `node_invalid_arg`.
- Order guard: the same message through the `TypeError` path must NOT
  degrade into the generic `type_error` fallback.
- One test per new class (module_not_found, spawn_error, out_of_memory,
  type_error, reference_error fallback).

Verified:
- 481/481 unit tests pass (6 new cases; full suite was 478 before)
- `tsc --noEmit` clean
- `npm run build` clean
- `grep` in `dist/cli.mjs` shows all 6 new slugs present in the bundle

Follow-up for v0.2.8 release: re-query
  SELECT error_class, COUNT(*)
  FROM telemetry_events
  WHERE event='audit_complete' AND outcome='failed'
  GROUP BY error_class
  ORDER BY COUNT(*) DESC
`unknown` should drop from 100% to a small minority; `node_invalid_arg`
should be 0 on v0.2.8 (B-006 fixed in PR #105).

Fixes B-007.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> #!axme pr=none repo=AxmeAI/axme-code
@George-iam George-iam merged commit 601c01c into main Apr 14, 2026
George-iam added a commit that referenced this pull request Apr 14, 2026
Patch release containing three bug fixes already merged on main:

- B-006 (#105): audit worker fileURLToPath(undefined) crash on every
  session close. pathToClaudeCodeExecutable now set on all three
  direct sdk.query() call sites in session-auditor + memory-extractor.
- B-007 (#106): classifyError vocabulary extended with node_invalid_arg
  / module_not_found / spawn_error / out_of_memory / type_error /
  reference_error. audit_complete failures now stamp category="audit"
  and fatal=false so they index correctly on the backend.
- B-008 (#107): #!axme safety gate regex tightened so a closing quote
  from a surrounding -m "..." string no longer gets glued onto the
  parsed repo name. Hook stops false-blocking commits on every retry.
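
The B-008 symptom can be illustrated with a small sketch. Both patterns below are hypothetical, since the gate's real regex is not shown here; they only demonstrate how a greedy non-whitespace capture picks up the closing quote of a surrounding `-m "..."` string, and how restricting the character class avoids it.

```typescript
// Illustrative patterns only; the actual safety-gate regex may differ.
const LOOSE_REPO = /repo=(\S+)/; // captures everything up to whitespace
const TIGHT_REPO = /repo=([A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+)/; // owner/name only

// Returns the captured repo name, or null if the pattern does not match.
function parseRepo(command: string, pattern: RegExp): string | null {
  const match = pattern.exec(command);
  return match?.[1] ?? null;
}

// Example command line with the #!axme trailer inside a quoted -m string.
const exampleCommand = 'git commit -m "fix: typo #!axme pr=none repo=AxmeAI/axme-code"';
```

Here `parseRepo(exampleCommand, LOOSE_REPO)` returns the repo name with a trailing `"` attached, while the tighter pattern returns `AxmeAI/axme-code` on its own.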

Files bumped:
- package.json
- .claude-plugin/plugin.json
- templates/plugin-README.md (version badge)

CHANGELOG entry added under [0.2.8] - 2026-04-14.

Verified: 489/489 unit tests pass, npx tsc --noEmit clean,
npm run build clean.

Release flow after this PR merges:
1. user runs: git tag v0.2.8 && git push origin v0.2.8
2. release-binary.yml workflow auto-runs the chain:
   build (4 platforms) -> GitHub Release ->
   npm publish @axme/code@0.2.8 -> sync to axme-code-plugin

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>