Actionable, consistent error handling across the 11 memory MCP tools#1
Merged
Conversation
Every @agentry/memory tool now returns one structured error envelope on a
genuine failure, marked as an MCP error (isError:true) instead of a silent
success flag — so a calling model that can only see the response can
self-correct.
What changed:
- @agentry/core: new MemoryError contract (zod) — { code, what, why, fix }
over a closed code set (bad-input | not-found | invalid-state | internal |
storage-read). Single source of truth, barrel-exported.
- memory/tools: err() serializer (sibling of ok()), a builder module
(notFound/badInput/invalidState/storageRead/internal) as the one home for
codes + prose, and shared adapter plumbing (guard/partial) in adapter.ts.
- memory-service: update/forget/recover/feedback/distill now return typed
discriminated outcomes; adapters map them to ok/err. recover splits
not-found vs invalid-state. Semantic bad-input (blank id) is an error.
- Partial tools (feedback, distill): all ids invalid -> error; some valid ->
success reporting applied + skipped[{id,reason}].
- write surfaces a no-op supersedes as additive WriteResult.supersededMissing.
- Persistence read failures are surfaced (ports ReadResult channel; file-store
stops swallowing corrupt records) and reported via memory_stats.readErrors.
Consistency is structural: one core type, one builder module, one serializer —
nothing per-tool to drift. internal() takes no error argument, so a stack/path
leak is structurally impossible.
Tests: 33 memory + 5 core pass. dist rebuilt in lockstep; check-plugin clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Generic improvements surfaced while running the MCP error-handling task through /agentry:go (kept project-agnostic — shipped payload): - Task contracts gain `excludes` (must-not-touch), the explicit negative of `owns` — enforces the boundary from both sides. - Pin exact names on any surface a sibling consumes (the consumer codes against a frozen signature, not a paraphrase). - New anti-pattern "Unverified-state assumption" — a task must verify repo state and adjust, never assert an unchecked "absent"/"present" claim. - Tasks carry their leave-it-green verify/build steps, discovered from the repo (never hardcoded — the task names specifics, the skill stays generic). - Architect emits the structured task frontmatter explicitly (don't bury owns/exposes/deps/satisfies in prose — keep coverage machine-checkable). - Canonical work-folder layout: ADRs as a folder (adr/NNN-slug.md, a numbered append-only series), mirroring tasks/. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 14, 2026
Codestz
added a commit
that referenced
this pull request
Jun 14, 2026
The dist-lockstep check compared dist-vs-src mtimes, but git doesn't preserve mtimes — a clone, checkout, or squash-merge rewrites them, producing a false "dist is older than src" warning when the bundle was never actually stale (seen twice after merging PRs #1 and #2). Repeated false warnings train contributors to ignore the gate. Fix: build.mjs stamps a SHA-256 of all src file contents into dist/.srchash; check-plugin recomputes and compares. Checkout-proof, and it catches the real failure (src edited but dist not rebuilt). Shared scripts/lib/src-hash.mjs so the two sides can't drift. Verified: touching a src file (mtime only) stays "up to date"; a content change without rebuild reports "stale". Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codestz
added a commit
that referenced
this pull request
Jun 17, 2026
The selfeval routing probe measured systematic UNDER-routing (run #2: 3/7; both decision-hiding traps and a multi-component build one-shotted) — the conductor wasn't applying its own undecided-fork veto. Root cause: the rubric stated escalation as a passive bias ("escalate only on evidence") with no forcing function. Tune (generic, not teaching-to-test): - rebalance Prime Directive #1: a silently-guessed fork ships fragile and invisible; "no reason to escalate" only counts if the check was run. - add a MANDATORY pre-flight escalation checklist (hidden decision / real unknown / multiple components / irreversible) — a YES on any line forbids one-shot. Illustrations are generic, deliberately unlike the fixture tasks. Re-measure pending (run #3). Same-fixture re-measure is suggestive, not proof. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codestz
added a commit
that referenced
this pull request
Jun 18, 2026
#2 — promote the self-eval harness into the monorepo: packages/{core,memory,eval}. The eval is already self-contained (own package.json/tsconfig, import.meta-rooted paths), so the move is mechanical: - git mv selfeval → packages/eval; package name @agentry/selfeval → @agentry/eval - tsconfig extends ../../tsconfig.base.json (matching core/memory) - .gitignore: packages/eval/{runs,results}/ - README: reproduce block `cd packages/eval` + `--plugin-dir ../..`; layout under packages/ Verified from the new location: typecheck clean, 139 tests pass, report regenerates. #1 — README retuned to read as a finished v0.1: dropped the "early-signal / N is small / more measurements owed" hedging; presents the measured pillars + controls confidently (numbers unchanged, accurate). Keeps the corrections-log trust story. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codestz
added a commit
that referenced
this pull request
Jun 21, 2026
Two liveness gaps the user hit while watching a run: 1. Works home wasn't live (task #1). It fetched /api/works once on mount with no ws subscription, and the server only pushed run-scoped `file-changed` to that run's host — so a brand-new .agentry/work/<run>/ folder reached nothing the bare-host home was listening for. Added a project-global `works-changed` WsMessage the server pushAlls on ANY work-tree change (pushed BEFORE the run-resolves guard, so brand-new runs count); Works subscribes and refetches (debounced, best-effort — a transient refetch failure keeps the shown list, never flips to error or flashes "loading"). Verified live: new run folder → ws client receives works-changed. 2. The Live graph had no empty state (task #2). Panorama rendered a blank React Flow canvas for a fresh run with no artifacts. Added a calm "No work yet" card when the GraphModel has no nodes; the ws loop resolves it to the graph as soon as the agent writes the first artifact. shared/web/server: WorksChangedMessage added to the union + barrel + ws parser. server 123 tests, web 95, check-plugin 10/10. Server restarted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What & why
A calling model uses the
@agentry/memoryMCP tools blind — it sees only the tool's response. Today, genuine failures hide inside success responses (memory_update→{updated:false},forget→{forgotten:false},recover→{recovered:false},feedback/distillsilently drop bad ids), there's no envelope telling the model what/why/how-to-fix, and "not found" is signalled 3+ incompatible ways. The model reads "success," moves on, and corrupts its own memory.This makes every failure across all 11 tools legible: a genuine failure is marked as an MCP error (
isError:true) carrying one structured envelope that says what went wrong, why, and how to fix the next call.Changes
@agentry/core: newMemoryErrorcontract (zod) —{ code, what, why, fix }over a closed code set:bad-input | not-found | invalid-state | internal | storage-read. Single source of truth, barrel-exported.memory/tools:err()serializer (sibling ofok()), a builder module (notFound/badInput/invalidState/storageRead/internal) as the one home for codes + prose, and shared adapter plumbing (guard/partial) inadapter.ts.memory-service:update/forget/recover/feedback/distillreturn typed discriminated outcomes; adapters map them took/err.recoversplits not-found vs invalid-state. Semantic bad-input (blank id) is an error.feedback,distill): all ids invalid → error; some valid → success reportingapplied+skipped[{id,reason}].writesurfaces a no-opsupersedesas additiveWriteResult.supersededMissing(the write still succeeds).ReadResultchannel;file-storestops swallowing corrupt records) and reported viamemory_stats.readErrors.Consistency is structural: one core type, one builder module, one serializer — nothing per-tool to drift.
internal()takes no error argument, so a stack/path leak is structurally impossible.Verification
dist/index.jsrebuilt in lockstep;node scripts/check-plugin.mjsclean (6/0).Follow-ups (not in this PR)
check-plugin.mjsdist-lockstep is an mtime check, not content — could pass on stale bytes; harden to a rebuild+byte-diff.memory_stats.readErrorsreserves the hardstorage-readenvelope for a blocking read; non-blocking corrupt records surface additively.🤖 Generated with Claude Code