Actionable, consistent error handling across the 11 memory MCP tools by Codestz · Pull Request #1 · Codestz/agentry

Codestz · 2026-06-14T19:21:20Z

What & why

A calling model uses the @agentry/memory MCP tools blind — it sees only the tool's response. Today, genuine failures hide inside success responses (memory_update → {updated:false}, forget → {forgotten:false}, recover → {recovered:false}, feedback/distill silently drop bad ids), there's no envelope telling the model what/why/how-to-fix, and "not found" is signalled 3+ incompatible ways. The model reads "success," moves on, and corrupts its own memory.

This makes every failure across all 11 tools legible: a genuine failure is marked as an MCP error (isError:true) carrying one structured envelope that says what went wrong, why, and how to fix the next call.

Changes

@agentry/core: new MemoryError contract (zod) — { code, what, why, fix } over a closed code set: bad-input | not-found | invalid-state | internal | storage-read. Single source of truth, barrel-exported.
memory/tools: err() serializer (sibling of ok()), a builder module (notFound/badInput/invalidState/storageRead/internal) as the one home for codes + prose, and shared adapter plumbing (guard/partial) in adapter.ts.
memory-service: update/forget/recover/feedback/distill return typed discriminated outcomes; adapters map them to ok/err. recover splits not-found vs invalid-state. Semantic bad-input (blank id) is an error.
Partial tools (feedback, distill): all ids invalid → error; some valid → success reporting applied + skipped[{id,reason}].
write surfaces a no-op supersedes as additive WriteResult.supersededMissing (the write still succeeds).
Persistence read failures surfaced (ports ReadResult channel; file-store stops swallowing corrupt records) and reported via memory_stats.readErrors.

Consistency is structural: one core type, one builder module, one serializer — nothing per-tool to drift. internal() takes no error argument, so a stack/path leak is structurally impossible.

Verification

33 memory + 5 core tests pass (incl. byte-identical not-found envelopes, all-invalid thresholds, no-leak, corrupt-read surfacing, happy-path regression for all 11 tools).
dist/index.js rebuilt in lockstep; node scripts/check-plugin.mjs clean (6/0).
Independent adversarial verifier verdict: MEETS.

Follow-ups (not in this PR)

check-plugin.mjs dist-lockstep is an mtime check, not content — could pass on stale bytes; harden to a rebuild+byte-diff.
memory_stats.readErrors reserves the hard storage-read envelope for a blocking read; non-blocking corrupt records surface additively.

🤖 Generated with Claude Code

Every @agentry/memory tool now returns one structured error envelope on a genuine failure, marked as an MCP error (isError:true) instead of a silent success flag — so a calling model that can only see the response can self-correct. What changed: - @agentry/core: new MemoryError contract (zod) — { code, what, why, fix } over a closed code set (bad-input | not-found | invalid-state | internal | storage-read). Single source of truth, barrel-exported. - memory/tools: err() serializer (sibling of ok()), a builder module (notFound/badInput/invalidState/storageRead/internal) as the one home for codes + prose, and shared adapter plumbing (guard/partial) in adapter.ts. - memory-service: update/forget/recover/feedback/distill now return typed discriminated outcomes; adapters map them to ok/err. recover splits not-found vs invalid-state. Semantic bad-input (blank id) is an error. - Partial tools (feedback, distill): all ids invalid -> error; some valid -> success reporting applied + skipped[{id,reason}]. - write surfaces a no-op supersedes as additive WriteResult.supersededMissing. - Persistence read failures are surfaced (ports ReadResult channel; file-store stops swallowing corrupt records) and reported via memory_stats.readErrors. Consistency is structural: one core type, one builder module, one serializer — nothing per-tool to drift. internal() takes no error argument, so a stack/path leak is structurally impossible. Tests: 33 memory + 5 core pass. dist rebuilt in lockstep; check-plugin clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Generic improvements surfaced while running the MCP error-handling task through /agentry:go (kept project-agnostic — shipped payload): - Task contracts gain `excludes` (must-not-touch), the explicit negative of `owns` — enforces the boundary from both sides. - Pin exact names on any surface a sibling consumes (the consumer codes against a frozen signature, not a paraphrase). - New anti-pattern "Unverified-state assumption" — a task must verify repo state and adjust, never assert an unchecked "absent"/"present" claim. - Tasks carry their leave-it-green verify/build steps, discovered from the repo (never hardcoded — the task names specifics, the skill stays generic). - Architect emits the structured task frontmatter explicitly (don't bury owns/exposes/deps/satisfies in prose — keep coverage machine-checkable). - Canonical work-folder layout: ADRs as a folder (adr/NNN-slug.md, a numbered append-only series), mirroring tasks/. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The dist-lockstep check compared dist-vs-src mtimes, but git doesn't preserve mtimes — a clone, checkout, or squash-merge rewrites them, producing a false "dist is older than src" warning when the bundle was never actually stale (seen twice after merging PRs #1 and #2). Repeated false warnings train contributors to ignore the gate. Fix: build.mjs stamps a SHA-256 of all src file contents into dist/.srchash; check-plugin recomputes and compares. Checkout-proof, and it catches the real failure (src edited but dist not rebuilt). Shared scripts/lib/src-hash.mjs so the two sides can't drift. Verified: touching a src file (mtime only) stays "up to date"; a content change without rebuild reports "stale". Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The selfeval routing probe measured systematic UNDER-routing (run #2: 3/7; both decision-hiding traps and a multi-component build one-shotted) — the conductor wasn't applying its own undecided-fork veto. Root cause: the rubric stated escalation as a passive bias ("escalate only on evidence") with no forcing function. Tune (generic, not teaching-to-test): - rebalance Prime Directive #1: a silently-guessed fork ships fragile and invisible; "no reason to escalate" only counts if the check was run. - add a MANDATORY pre-flight escalation checklist (hidden decision / real unknown / multiple components / irreversible) — a YES on any line forbids one-shot. Illustrations are generic, deliberately unlike the fixture tasks. Re-measure pending (run #3). Same-fixture re-measure is suggestive, not proof. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

#2 — promote the self-eval harness into the monorepo: packages/{core,memory,eval}. The eval is already self-contained (own package.json/tsconfig, import.meta-rooted paths), so the move is mechanical: - git mv selfeval → packages/eval; package name @agentry/selfeval → @agentry/eval - tsconfig extends ../../tsconfig.base.json (matching core/memory) - .gitignore: packages/eval/{runs,results}/ - README: reproduce block `cd packages/eval` + `--plugin-dir ../..`; layout under packages/ Verified from the new location: typecheck clean, 139 tests pass, report regenerates. #1 — README retuned to read as a finished v0.1: dropped the "early-signal / N is small / more measurements owed" hedging; presents the measured pillars + controls confidently (numbers unchanged, accurate). Keeps the corrections-log trust story. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two liveness gaps the user hit while watching a run: 1. Works home wasn't live (task #1). It fetched /api/works once on mount with no ws subscription, and the server only pushed run-scoped `file-changed` to that run's host — so a brand-new .agentry/work/<run>/ folder reached nothing the bare-host home was listening for. Added a project-global `works-changed` WsMessage the server pushAlls on ANY work-tree change (pushed BEFORE the run-resolves guard, so brand-new runs count); Works subscribes and refetches (debounced, best-effort — a transient refetch failure keeps the shown list, never flips to error or flashes "loading"). Verified live: new run folder → ws client receives works-changed. 2. The Live graph had no empty state (task #2). Panorama rendered a blank React Flow canvas for a fresh run with no artifacts. Added a calm "No work yet" card when the GraphModel has no nodes; the ws loop resolves it to the graph as soon as the agent writes the first artifact. shared/web/server: WorksChangedMessage added to the union + barrel + ws parser. server 123 tests, web 95, check-plugin 10/10. Server restarted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Codestz and others added 2 commits June 14, 2026 14:20

Codestz merged commit 693e3b4 into main Jun 14, 2026

Codestz deleted the feat/memory-tool-error-handling branch June 14, 2026 19:29

This was referenced Jun 14, 2026

memory: clearer all-invalid feedback/distill error prose #2

Merged

check-plugin: content-hash dist-lockstep, not mtime #3

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Actionable, consistent error handling across the 11 memory MCP tools#1

Actionable, consistent error handling across the 11 memory MCP tools#1
Codestz merged 2 commits into
mainfrom
feat/memory-tool-error-handling

Codestz commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Codestz commented Jun 14, 2026

What & why

Changes

Verification

Follow-ups (not in this PR)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant