fix(telemetry): extend error classifier + stamp category/fatal on failed audits (B-007)#106
Merged
George-iam merged 1 commit intomainfrom Apr 14, 2026
Merged
Conversation
…led audits
B-007 addressed the two root causes that made the admin "Top error classes"
panel useless for triage:
1) `classifyError()` had no pattern for `ERR_INVALID_ARG_TYPE` /
`fileURLToPath(undefined)` — every B-006 failure collapsed into `unknown`.
All 16 failed audits in the last 30 days on prod landed in the same
opaque bucket, impossible to distinguish from unrelated regressions.
2) `audit_complete` with `outcome="failed"` never set `category` or `fatal`,
so failed audits did not hit the (category, error_class) composite index
on the backend. They showed up with NULL category on the dashboard.
Vocabulary changes (src/telemetry.ts):
- Add specific node codes: `node_invalid_arg`, `module_not_found`,
`spawn_error`, `out_of_memory`. These match BEFORE the generic
fallbacks below so B-006-class failures keep their triage signal.
- Add generic JS kinds as last resort before `unknown`:
`type_error`, `reference_error` — dispatched via `err.name` rather
than message text, so a bare `TypeError: x is not a function` at
least lands in a non-empty bucket.
- Drop the `msg.includes("enoent") → transcript_not_found` shortcut;
a bare ENOENT is a generic missing-file hit, not a transcript issue.
Kept `transcript not found` literal as the transcript-specific matcher
and generic ENOENT now falls through to `transcript_not_found` only
after `spawn ENOENT` is checked.
- Network: `econnreset` added alongside `econnrefused`.
- Doc-comment the load-bearing match order.
audit_complete event (src/session-cleanup.ts):
- When `outcome === "failed"`, stamp `category: "audit"` and
`fatal: false`. Audit failures are non-fatal — the session still
closes and the user's work is unaffected; only background extraction
is lost until the next attempt. Setting these fields lets the
existing backend index surface failed audits in the same panel as
other categorized errors.
Tests (test/telemetry.test.ts):
- B-006 reproducer: exact TypeError message from the audit-worker-logs
→ `node_invalid_arg`.
- Order guard: the same message through the `TypeError` path must NOT
degrade into the generic `type_error` fallback.
- One test per new class (module_not_found, spawn_error, out_of_memory,
type_error, reference_error fallback).
Verified:
- 481/481 unit tests pass (6 new cases; full suite was 478 before)
- `tsc --noEmit` clean
- `npm run build` clean
- `grep` in `dist/cli.mjs` shows all 6 new slugs present in the bundle
Follow-up for v0.2.8 release: re-query
SELECT error_class, COUNT(*)
FROM telemetry_events
WHERE event='audit_complete' AND outcome='failed'
GROUP BY error_class
ORDER BY COUNT(*) DESC
`unknown` should drop from 100% to a small minority; `node_invalid_arg`
should be 0 on v0.2.8 (B-006 fixed in PR #105).
Fixes B-007.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> #!axme pr=none repo=AxmeAI/axme-code
George-iam
added a commit
that referenced
this pull request
Apr 14, 2026
Patch release containing three bug fixes already merged on main: - B-006 (#105): audit worker fileURLToPath(undefined) crash on every session close. pathToClaudeCodeExecutable now set on all three direct sdk.query() call sites in session-auditor + memory-extractor. - B-007 (#106): classifyError vocabulary extended with node_invalid_arg / module_not_found / spawn_error / out_of_memory / type_error / reference_error. audit_complete failures now stamp category="audit" and fatal=false so they index correctly on the backend. - B-008 (#107): #!axme safety gate regex tightened so a closing quote from a surrounding -m "..." string no longer gets glued onto the parsed repo name. Hook stops false-blocking commits on every retry. Files bumped: - package.json - .claude-plugin/plugin.json - templates/plugin-README.md (version badge) CHANGELOG entry added under [0.2.8] - 2026-04-14. Verified: 489/489 unit tests pass, npx tsc --noEmit clean, npm run build clean. Release flow after this PR merges: 1. user runs: git tag v0.2.8 && git push origin v0.2.8 2. release-binary.yml workflow auto-runs the chain: build (4 platforms) -> GitHub Release -> npm publish @axme/code@0.2.8 -> sync to axme-code-plugin Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes B-007: admin "Top error classes" panel was useless for triage — every failed audit collapsed into `error_class='unknown'` with `category=NULL` and `fatal=NULL`.
Root cause (two parts)
Prod query for the last 30 days:
All 16 failures — same opaque bucket, impossible to distinguish B-006 from anything else.
Changes
`src/telemetry.ts` — vocabulary
Added specific Node codes (matched before generic fallbacks):
Added generic JS kinds as last resort (via `err.name` so bland messages still classify):
Also: network catches `econnreset` alongside `econnrefused`. Load-bearing match order documented in code comment.
`src/session-cleanup.ts` — stamp category/fatal on failed audits
When `outcome === 'failed'`, stamp `category: 'audit'` and `fatal: false`. Audit failures are non-fatal — session still closes, only background extraction is lost until next attempt. This gets failed audits onto the backend's composite index so they show up in the error panel alongside other categories.
`test/telemetry.test.ts` — regression tests
Verification
Test plan
```sql
SELECT error_class, COUNT() FROM telemetry_events
WHERE event='audit_complete' AND outcome='failed' AND ts > 'v0.2.8 release'
GROUP BY error_class ORDER BY COUNT() DESC
```
`unknown` should drop from 100% to a small minority. `node_invalid_arg` specifically should be ~0 on v0.2.8 (B-006 fixed by PR fix(audit): set pathToClaudeCodeExecutable on all direct SDK call sites (B-006) #105).
Related
Fixes B-007.
🤖 Generated with Claude Code