Skip to content

Actionable, consistent error handling across the 11 memory MCP tools#1

Merged
Codestz merged 2 commits into
mainfrom
feat/memory-tool-error-handling
Jun 14, 2026
Merged

Actionable, consistent error handling across the 11 memory MCP tools#1
Codestz merged 2 commits into
mainfrom
feat/memory-tool-error-handling

Conversation

@Codestz

@Codestz Codestz commented Jun 14, 2026

Copy link
Copy Markdown
Owner

What & why

A calling model uses the @agentry/memory MCP tools blind — it sees only the tool's response. Today, genuine failures hide inside success responses (memory_update{updated:false}, forget{forgotten:false}, recover{recovered:false}, feedback/distill silently drop bad ids), there's no envelope telling the model what/why/how-to-fix, and "not found" is signalled 3+ incompatible ways. The model reads "success," moves on, and corrupts its own memory.

This makes every failure across all 11 tools legible: a genuine failure is marked as an MCP error (isError:true) carrying one structured envelope that says what went wrong, why, and how to fix the next call.

Changes

  • @agentry/core: new MemoryError contract (zod) — { code, what, why, fix } over a closed code set: bad-input | not-found | invalid-state | internal | storage-read. Single source of truth, barrel-exported.
  • memory/tools: err() serializer (sibling of ok()), a builder module (notFound/badInput/invalidState/storageRead/internal) as the one home for codes + prose, and shared adapter plumbing (guard/partial) in adapter.ts.
  • memory-service: update/forget/recover/feedback/distill return typed discriminated outcomes; adapters map them to ok/err. recover splits not-found vs invalid-state. Semantic bad-input (blank id) is an error.
  • Partial tools (feedback, distill): all ids invalid → error; some valid → success reporting applied + skipped[{id,reason}].
  • write surfaces a no-op supersedes as additive WriteResult.supersededMissing (the write still succeeds).
  • Persistence read failures surfaced (ports ReadResult channel; file-store stops swallowing corrupt records) and reported via memory_stats.readErrors.

Consistency is structural: one core type, one builder module, one serializer — nothing per-tool to drift. internal() takes no error argument, so a stack/path leak is structurally impossible.

Verification

  • 33 memory + 5 core tests pass (incl. byte-identical not-found envelopes, all-invalid thresholds, no-leak, corrupt-read surfacing, happy-path regression for all 11 tools).
  • dist/index.js rebuilt in lockstep; node scripts/check-plugin.mjs clean (6/0).
  • Independent adversarial verifier verdict: MEETS.

Follow-ups (not in this PR)

  • check-plugin.mjs dist-lockstep is an mtime check, not content — could pass on stale bytes; harden to a rebuild+byte-diff.
  • memory_stats.readErrors reserves the hard storage-read envelope for a blocking read; non-blocking corrupt records surface additively.

🤖 Generated with Claude Code

Codestz and others added 2 commits June 14, 2026 14:20
Every @agentry/memory tool now returns one structured error envelope on a
genuine failure, marked as an MCP error (isError:true) instead of a silent
success flag — so a calling model that can only see the response can
self-correct.

What changed:
- @agentry/core: new MemoryError contract (zod) — { code, what, why, fix }
  over a closed code set (bad-input | not-found | invalid-state | internal |
  storage-read). Single source of truth, barrel-exported.
- memory/tools: err() serializer (sibling of ok()), a builder module
  (notFound/badInput/invalidState/storageRead/internal) as the one home for
  codes + prose, and shared adapter plumbing (guard/partial) in adapter.ts.
- memory-service: update/forget/recover/feedback/distill now return typed
  discriminated outcomes; adapters map them to ok/err. recover splits
  not-found vs invalid-state. Semantic bad-input (blank id) is an error.
- Partial tools (feedback, distill): all ids invalid -> error; some valid ->
  success reporting applied + skipped[{id,reason}].
- write surfaces a no-op supersedes as additive WriteResult.supersededMissing.
- Persistence read failures are surfaced (ports ReadResult channel; file-store
  stops swallowing corrupt records) and reported via memory_stats.readErrors.

Consistency is structural: one core type, one builder module, one serializer —
nothing per-tool to drift. internal() takes no error argument, so a stack/path
leak is structurally impossible.

Tests: 33 memory + 5 core pass. dist rebuilt in lockstep; check-plugin clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Generic improvements surfaced while running the MCP error-handling task
through /agentry:go (kept project-agnostic — shipped payload):

- Task contracts gain `excludes` (must-not-touch), the explicit negative of
  `owns` — enforces the boundary from both sides.
- Pin exact names on any surface a sibling consumes (the consumer codes
  against a frozen signature, not a paraphrase).
- New anti-pattern "Unverified-state assumption" — a task must verify repo
  state and adjust, never assert an unchecked "absent"/"present" claim.
- Tasks carry their leave-it-green verify/build steps, discovered from the
  repo (never hardcoded — the task names specifics, the skill stays generic).
- Architect emits the structured task frontmatter explicitly (don't bury
  owns/exposes/deps/satisfies in prose — keep coverage machine-checkable).
- Canonical work-folder layout: ADRs as a folder (adr/NNN-slug.md, a
  numbered append-only series), mirroring tasks/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Codestz Codestz merged commit 693e3b4 into main Jun 14, 2026
@Codestz Codestz deleted the feat/memory-tool-error-handling branch June 14, 2026 19:29
Codestz added a commit that referenced this pull request Jun 14, 2026
The dist-lockstep check compared dist-vs-src mtimes, but git doesn't preserve
mtimes — a clone, checkout, or squash-merge rewrites them, producing a false
"dist is older than src" warning when the bundle was never actually stale
(seen twice after merging PRs #1 and #2). Repeated false warnings train
contributors to ignore the gate.

Fix: build.mjs stamps a SHA-256 of all src file contents into dist/.srchash;
check-plugin recomputes and compares. Checkout-proof, and it catches the real
failure (src edited but dist not rebuilt). Shared scripts/lib/src-hash.mjs so
the two sides can't drift.

Verified: touching a src file (mtime only) stays "up to date"; a content
change without rebuild reports "stale".

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codestz added a commit that referenced this pull request Jun 17, 2026
The selfeval routing probe measured systematic UNDER-routing (run #2: 3/7;
both decision-hiding traps and a multi-component build one-shotted) — the
conductor wasn't applying its own undecided-fork veto. Root cause: the rubric
stated escalation as a passive bias ("escalate only on evidence") with no
forcing function. Tune (generic, not teaching-to-test):
- rebalance Prime Directive #1: a silently-guessed fork ships fragile and
  invisible; "no reason to escalate" only counts if the check was run.
- add a MANDATORY pre-flight escalation checklist (hidden decision / real
  unknown / multiple components / irreversible) — a YES on any line forbids
  one-shot. Illustrations are generic, deliberately unlike the fixture tasks.

Re-measure pending (run #3). Same-fixture re-measure is suggestive, not proof.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codestz added a commit that referenced this pull request Jun 18, 2026
#2 — promote the self-eval harness into the monorepo: packages/{core,memory,eval}.
The eval is already self-contained (own package.json/tsconfig, import.meta-rooted
paths), so the move is mechanical:
- git mv selfeval → packages/eval; package name @agentry/selfeval → @agentry/eval
- tsconfig extends ../../tsconfig.base.json (matching core/memory)
- .gitignore: packages/eval/{runs,results}/
- README: reproduce block `cd packages/eval` + `--plugin-dir ../..`; layout under packages/
Verified from the new location: typecheck clean, 139 tests pass, report regenerates.

#1 — README retuned to read as a finished v0.1: dropped the "early-signal / N is
small / more measurements owed" hedging; presents the measured pillars + controls
confidently (numbers unchanged, accurate). Keeps the corrections-log trust story.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codestz added a commit that referenced this pull request Jun 21, 2026
Two liveness gaps the user hit while watching a run:

1. Works home wasn't live (task #1). It fetched /api/works once on mount with no ws
   subscription, and the server only pushed run-scoped `file-changed` to that run's
   host — so a brand-new .agentry/work/<run>/ folder reached nothing the bare-host
   home was listening for. Added a project-global `works-changed` WsMessage the server
   pushAlls on ANY work-tree change (pushed BEFORE the run-resolves guard, so brand-new
   runs count); Works subscribes and refetches (debounced, best-effort — a transient
   refetch failure keeps the shown list, never flips to error or flashes "loading").
   Verified live: new run folder → ws client receives works-changed.

2. The Live graph had no empty state (task #2). Panorama rendered a blank React Flow
   canvas for a fresh run with no artifacts. Added a calm "No work yet" card when the
   GraphModel has no nodes; the ws loop resolves it to the graph as soon as the agent
   writes the first artifact.

shared/web/server: WorksChangedMessage added to the union + barrel + ws parser. server
123 tests, web 95, check-plugin 10/10. Server restarted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant