Memory Bank v0 — Backend (den-db + den-api + agent prompt + eval)#2436
Conversation
Stage 1 of Memory Bank v0 (TASK-1). - Register `memory`/`memctx` typeid prefixes in packages/utils [B5] - Add `memory` + `memory_context` tables. No DB-level FK (den-db avoids FK constraints per PlanetScale/Vitess convention); the memory->context cascade is app-enforced in the Stage 2 delete handler, backed by an index on memory_id. - Migration 0028 (tables + indexes) with a hand-appended CREATE FULLTEXT INDEX for the migrate path (Drizzle mysql-core has no FULLTEXT DSL, drizzle-orm#1495). - ensureMemoryFulltextIndex / assertMemoryFulltextIndexExists + a bootstrap hook, so the FULLTEXT index is created idempotently on BOTH the fresh-install (drizzle-kit push + baseline, which drops migration-only indexes) and the migrate paths [B2]. - verify-memory-schema.ts: proof harness (schema, FULLTEXT on both paths, NL MATCH, typeid round-trip, explicit cascade, idempotency). - docs/memory-bank-architecture.md (authoritative architecture). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Zero-findings pass after a 5-lens review of the Stage 1 schema commit. - mig-0: run the idempotent FULLTEXT ensure on ALL apply paths, not just bootstrap. `db:migrate` and `db:push` now chain `ensure-fulltext-indexes.ts`, so a `db:migrate`-only run on an already-baselined DB (the prod path in den-db-migrate.yml, and the local db:push dev path) still creates the index. - arch-0: expose the proof harness as `db:verify-memory` (was only runnable via a raw node invocation, weakening its CI-gate claim). - arch-1: single `ensureFulltextIndexes()` seam in fulltext.ts called by both apply paths, so they cannot drift; bootstrap log is now accurate. - arch-2: extract createExecutor/Executor into scripts/db-executor.ts, shared by bootstrap.ts and the ensure runner (removes memory-scoped triplication). - ts-0/int-0: roll back before best-effort cleanup in verify-memory-schema.ts so a mid-transaction failure cannot leak marker rows. - ts-1: make isRecord file-private (dead export surface). Re-proven against MySQL 8.4: bootstrap, db:push, and the db:migrate chain each create the FULLTEXT index; second ensure is a no-op; db:verify-memory passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…SK-3) Stage 2 of Memory Bank v0. Routes (routes/memory): postMemory, getMemorySearch, getMemory (list), deleteMemoryById — all owner-scoped on user_id as the PRIMARY predicate (NOT the org-scoped worker pattern) [B4]: - getMemoryByIdForUser + 404 (non-leaking) for ids the caller does not own. - Flattened, mostly-optional save payload; shape encoded in the OpenAPI summary; server sets source and forces scope='user' (ignores any client scope) [B3]. - One-transaction save (memory + N context rows); input bounds on content/tags/ contexts/query/limit. - Bound MATCH(content) AGAINST(:q IN NATURAL LANGUAGE MODE) combined with user_id in a single and(); never sql.raw. Empty result set -> 200 (not an error). - Explicit memory->context cascade in the delete handler (den-db has no DB FKs). - Structured save/search/delete events (JSON log lines) for measurement/debugging. Guarded with authenticatedRoute(); org id read from the session (works for the MCP internal-principal path without re-resolving org membership). MCP (TASK-3): "Memory" added to SAFE_INCLUDED_TAGS; routes tagged ["Memory"], so the four capabilities surface via search_capabilities / execute_capability. The operationIds are exactly postMemory/getMemorySearch/getMemory/deleteMemoryById (never memory_save/memory_search). Tests (bun): test/memory-routes.test.ts (operationIds, MCP allow+scopes, payload bounds, scope/source not client-settable) and test/memory-ownership.test.ts — the cross-user IDOR regression merge gate [B4] proving save -> Bob 404 -> Alice recall -> cascade delete against the real app + MySQL. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-1, ts-1/2/3)
Zero-findings pass after a 5-lens review (security · perf · TS · integrity · arch).
- int-1/arch-2: save now sets created_at/updated_at explicitly to a single `now` and
returns that same row via toMemoryResponse, so the 201 body is DB-authoritative
(matches a later getMemory/getMemorySearch) — MySQL has no INSERT ... RETURNING.
- conv-1: one `const row = {...} satisfies typeof MemoryTable.$inferInsert` is the
single source for both the insert and the response — they cannot drift.
- int-3: context rows share the same `now`, so a memory and its provenance carry one
timestamp; memory.created_at is the app-clock ordering authority (documented).
- ts-1: POST response routed through toMemoryResponse (was an inline literal).
- ts-2: DELETE 204 uses emptyResponse(); ts-3: dropped dead schema exports.
- sec-1: comment — reads are user-scoped, intentionally org-spanning (§8).
- arch-1: comment — org id from the resolved session; desktop hydrates via /me/orgs.
- int-2: architecture §9 records account/org-deletion memory reaping as deferred pre-GA.
Accepted-in-writing (no code change): sec-2 (session ids already normalized → can't
throw), perf-1 (inherent MySQL fulltext post-filter), perf-2 (composite index deferred,
small v0 corpus), ts-4/arch-3 (intentional, required by the mapper).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Append a static, distinct `## Memory Bank` section to OPENWORK_AGENT_PROMPT (separate from the credential-hygiene `## Memory` section), injected unconditionally. Search-first [B1]: reach memory only by discovering a capability via search_capabilities then running it via execute_capability — never a bespoke memory_save/memory_search tool; it is NOT a local file. Covers draft → human-confirm/edit → execute save, explicit lexical recall (no auto-recall), a list/delete manage note, and no-secrets guidance (the only v0 plaintext-at-rest mitigation) extended to cited snippets. Also corrects the capabilities-knowledge plugin, which previously told the agent there was "no separate long-term memory store" — it now points at the Memory Bank capability. Test: openwork-runtime-config.test.ts asserts the section is present, distinct from `## Memory`, names search_capabilities/execute_capability, contains neither memory_save nor memory_search, and includes no-secrets guidance (4 pass). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror mcp-search-capabilities.flow.mjs. Proves, on the live meta-MCP + a signed-in desktop app, that the agent discovers the memory-save capability search-first (no memory_save tool), execute_capability persists a scope='user' memory, a FRESH mcp call recalls it by natural-language query (cross-session lexical recall), and the desktop Memory panel shows it. Fails loudly on B1 (search-first) and B3 (flattened save body) regressions. Proven end-to-end against the running stack: postMemory discovered search-first (hasBody:true) → saved mem_… (scope forced user) → recalled via "Acme renewal deal" (FULLTEXT score 0.18) → deleted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MCP clients (LLMs) frequently pass `path`/`query` as a JSON-encoded string
rather than an object, which failed execute_capability input validation
("expected record, received string") — the same hazard normalizeToolBody
already guarded for the body. Loosen the path/query schema to accept a string
and normalize it via a new normalizeToolRecord (mirrors normalizeToolBody), so
stringified args like `"{\"q\":\"acme\"}"` work. Unblocks getMemorySearch recall
via the agent. General improvement — applies to every capability, not just memory.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
@jthorare is attempting to deploy a commit to the Different AI Team on Vercel. A member of the Team first needs to authorize it. |
…rgMemberRoute) POST /v1/memory previously took org_id from session.activeOrganizationId with no membership check, so a stale/forged active org (or MCP principal) could write a memory tagged to an org the user isn't in. Switch the save guard from authenticatedRoute() to orgMemberRoute(): resolveOrganizationContextMiddleware resolves the active org AND verifies membership via getOrganizationContextForUser (404 otherwise) on both the cookie/bearer and MCP-principal paths. org_id now comes from the verified organizationContext.organization.id. - 400 organization_unavailable → the middleware returns 404 organization_not_found for a non-member / no-active-org caller (before the handler runs). - Reads (search/list) and delete stay authenticatedRoute() — user_id-scoped and cross-org by design (v0 recalls across all of the caller's orgs; org_id is only recorded for the future org-shared bank, §8). - memory-ownership.test.ts seeds real org + membership rows (required by the gate) and adds a non-member save → 404 assertion. Proven live via the meta-MCP (member save→recall still works) and by the ownership integration test (5 pass: member 201, non-member 404, IDOR 404, cascade delete). den-api tsc: 0 errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
benjaminshafii
left a comment
There was a problem hiding this comment.
Validated end-to-end on a combined worktree (pr-2436 + pr-2437 merged into dev — clean merge, no conflicts).
Tests run (all pass)
bun test test/memory-routes.test.ts test/memory-ownership.test.ts test/mcp-invoke-body.test.ts test/route-access-policy.test.ts→ 19 pass, 0 fail (against real MySQL 8.4openwork_test, incl. the cross-user IDOR suite)pnpm run db:verify-memory→ 10/10 checks (FULLTEXT present, NL MATCH, cascade, idempotent ensure)apps/serverbun test src/openwork-runtime-config.test.ts→ 4 pass; tsc clean- Live e2e via
pnpm fraimz --flow memory-save-recall --stack den→ PASSED, all 10 steps (search-first discovery, save via execute_capability with scope forced touser, cross-session NL recall, desktop panel shows the memory, cleanup via delete capability). fraimz.html + validated screenshot produced.
One real finding — the committed eval flow fails as-is:
evals/flows/memory-save-recall.flow.mjs mints its MCP token with body: JSON.stringify({}). Since #2280 (already in this branch's merge-base), default token scopes are read-only, so the save step deterministically fails with insufficient_mcp_scope: mcp:write. The product path is unaffected (the desktop mints with ["mcp:read","mcp:write"], apps/app/src/app/lib/den.ts:2009). One-line fix:
body: JSON.stringify({ scopes: ["mcp:read", "mcp:write"] }),With that fix the whole flow passes. Please apply before merge.
Code review — real production code is ~850 lines (82% of the diff is the generated drizzle 0028 snapshot); follows existing den-api/den-db route + schema conventions (routes/org/ layout, orgMemberRoute, no DB FKs per Vitess convention); no any/as; owner scoping on user_id everywhere, membership-gated save, non-leaking 404s, parameterized MATCH…AGAINST. The mcp/invoke.ts stringified-args fix mirrors the existing normalizeToolBody guard and fails safe. Two trivial nits: the unrelated .gitignore .playwright-mcp/ line, and verify-memory-schema.ts isn't referenced by CI (wire it in or drop it eventually).
Default token scopes are read-only since different-ai#2280, so the save step failed with insufficient_mcp_scope. Mirrors the desktop client's mint (den.ts).
Memory Bank v0 — Backend (den-db + den-api + agent prompt + eval)
The per-user Memory Bank server side: schema, owner-scoped REST routes surfaced through the existing meta-MCP, the agent prompt that primes search-first save/recall, a save→recall eval, and a general meta-MCP robustness fix. Frontend (desktop UI) is a separate PR.
Authoritative design:
docs/memory-bank-architecture.md.What's here
Schema (den-db) [TASK-1] —
memory+memory_contexttables;memory/memctxtypeids registered inpackages/utils; migration0028. FULLTEXT index onmemory.contentcreated idempotently on all apply paths (bootstrap,db:push,db:migratechain) viaensureMemoryFulltextIndex— becausedrizzle-kit push+ baseline silently drop a migration-only FULLTEXT index on fresh installs [B2]. No DB-level FK (PlanetScale/Vitess convention); the memory→context cascade is app-enforced.Routes (den-api) [TASK-2/3] —
postMemory/getMemorySearch/getMemory/deleteMemoryById. Reads (search/list) and delete are owner-scoped onuser_id(not the org pattern) with non-leaking 404 [B4] — v0 recalls a user's memories across all of their orgs. Save usesorgMemberRoute(): the memory is only written to an org the caller is verified to be a member of (resolveOrganizationContextMiddleware→getOrganizationContextForUser, 404 otherwise, on both the cookie/bearer and MCP-principal paths);org_idcomes from the verifiedorganizationContext.organization.id, never a raw/forgeablesession.activeOrganizationId. Also: flattened, mostly-optional save payload with the shape in the OpenAPI summary, server-setsource, forcedscope='user', one-transaction save, input bounds [B3]; boundMATCH … AGAINSTNL search (neversql.raw), empty→200; structured save/search/delete events."Memory"added toSAFE_INCLUDED_TAGSso the four capabilities surface viasearch_capabilities/execute_capability.Agent prompt (apps/server) [TASK-4] — static, search-first
## Memory Banksection (never namesmemory_save); draft→human-confirm→execute save, explicit lexical recall, no-secrets guidance (the only v0 plaintext-at-rest mitigation). Also corrects the capabilities-knowledge plugin that previously said there was "no separate memory store."Eval (evals) [TASK-9] —
memory-save-recall.flow.mjs: discover→save→recall in a fresh session via the live meta-MCP.Meta-MCP fix (den-api) —
execute_capabilitynow acceptspath/querypassed as a JSON string (LLMs stringify args), mirroring the existingnormalizeToolBody. Fixes a real-32602 "expected record, received string"and unblocksgetMemorySearchrecall. General — applies to every capability.Tests run — all pass
fraimz / end-to-end proof
The save→recall flow was proven live end-to-end against the running stack (den-api + MySQL + a real MCP token):
evals/flows/memory-save-recall.flow.mjsdrives this via the eval runner (pnpm fraimz --flow memory-save-recall, needs a signed-in app +--stack den). Backend-only PR — no desktop UI surface here.Review
Every stage driven to zero findings at every severity via multi-lens reviews (security/IDOR · data-migration · data-integrity · TypeScript · architecture · performance), each with an address-and-re-review round. Accepted-in-writing LOWs documented in code/§9.
Coding guidelines
No
any/as/typecasts (only requiredas conston enum tuples + onesatisfies); pnpm/bun; additive; smallest diff.🤖 Generated with Claude Code