MCP server plus Cursor hooks for governed workspace memory with token-budget packs: agents propose facts, you (or policy) confirm, then a bounded summary loads at session start—with preCompact nudges, post-tool reminders, and optional user-message triggers so long sessions still persist durable context.
Coding agents (Cursor / Claude / Copilot) forget everything between sessions. Every new chat means:
- The agent re-reads the same files to understand your project (5–20k wasted tokens per session).
- You re-explain the same architectural decisions, conventions, build commands, and known bugs.
- Long-running sessions silently lose context to summarization and the agent starts repeating mistakes you already fixed.
- "Just put it in `CLAUDE.md`/`AGENTS.md`" doesn't scale: the file grows unbounded, eats context every turn, and has no provenance — you can't tell which lines are stale, who proposed them, or what actually got applied.
The shortcut of dumping everything into a markdown rules file works until it doesn't; you end up paying token rent on context the agent doesn't even need this turn.
A two-tier memory with a strict token budget and a human-in-the-loop gate:
- Confirmed store (`store.json`) — small, curated, typed facts (decisions, fixes, APIs, build commands, conventions). Loaded at session start as a bounded markdown pack (default ≤1500 tokens). The agent proposes, you (or policy) confirm — nothing gets in by accident.
- RAG layer (`rag.sqlite`) — large chunks (chat distills, session capsules, design docs) stored compressed with embeddings. Pulled on demand via `rag_query`, ranked by hybrid score (embeddings + BM25-style lexical), returned already compressed within a token budget. Verbatim text only when explicitly fetched via `rag_get_full`.
Both layers respect a token budget on every read — you never pay for context the agent isn't using right now.
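As a rough illustration, the bounded pack described above amounts to a greedy fill against the budget. This is a hypothetical sketch (the real builder lives in `src/pack.ts`), using the same ~4-characters-per-token estimate the session hook documents:

```typescript
type Memory = { id: string; type: string; importance: number; summary: string };

// Rough token estimate: ~4 characters per token (char/4 heuristic).
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Greedy pack builder sketch: sort confirmed memories by importance, then
// add one line per memory until the hard token cap would be exceeded.
function buildPack(confirmed: Memory[], maxTokens = 1500): string {
  const lines: string[] = [];
  let used = 0;
  for (const m of [...confirmed].sort((a, b) => b.importance - a.importance)) {
    const line = `- [${m.type}] ${m.summary}`;
    const cost = estimateTokens(line);
    if (used + cost > maxTokens) break; // hard cap: never exceed the budget
    lines.push(line);
    used += cost;
  }
  return lines.join("\n");
}
```

The point of the cap is determinism: the same confirmed store and budget always yield the same pack, so session-start cost is predictable.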
| Benefit | Mechanism | Example impact |
|---|---|---|
| Cross-session memory | `sessionStart` hook injects confirmed pack automatically | Critical facts (build cmd, security fix, race condition root cause) survive across all future Cursor windows. |
| Big token savings on context loads | RAG returns compressed, ranked snippets instead of file reads | A query that would cost ~3000–5000 tokens in Read calls cost 364 tokens of compressed RAG hits in this session (~85% reduction). |
| Bounded session-start cost | Pack hard-capped at `max_tokens`, deterministic ordering | This repo's pack: 397 tok / 1500 budget = 26%; fits in any `sessionStart` without crowding out the user prompt. |
| Compression for free | `compress_candidate` shrinks bodies to ~1.6× ratio | 37.8% size reduction on stored bodies; same retrieval quality. |
| No silent drift | propose → confirm gate, dedupe by hash + similarity, audit log | Every fact has a status, timestamp, and reason; you can prune with confidence. |
| Causal graph | Typed links (SOLVES, CAUSES, BUILDS_ON, …) + BFS traverse | Ask "what fixed X?" and get back the commit that solved it and the build that depends on it, in one hop. |
| Local-first, private | Embeddings via Transformers.js (ONNX, ~25 MB model, downloaded once) | No API keys required. Optional: Ollama or OpenAI for embeddings. Your facts never leave the repo. |
| Workspace-scoped | Each project gets its own `.memory-token/` directory | No bleeding of facts between unrelated repos; gitignored by default but can be versioned. |
| Agent steering, not vibes | Hooks inject skill + policy + pack; skill is a decision tree the agent follows | Agents actually call propose/confirm/`rag_query` consistently instead of "maybe sometimes if they remember". |
| Prevents the CLAUDE.md bloat trap | Token-budget pack + RAG fallback | Old facts move to RAG (compressed, on-demand) instead of permanently inflating the rules file. |
- ✅ Multi-week projects where you reopen Cursor often and don't want to re-explain context.
- ✅ Codebases big enough that re-reading "to understand structure" costs noticeable tokens per session.
- ✅ Teams that want provenance and control over what the agent "remembers".
- ✅ Long debugging sessions where you want the root cause + fix to survive into next week's session.
Skip if your project is a one-off script or you don't run more than 1–2 chat sessions on the same codebase.
| Layer | Role |
|---|---|
| Store (`store.json`) | Typed memories (decision, fix, api, …), statuses (proposed → confirmed / rejected), links between memories, audit log. |
| RAG (`rag.sqlite`) | Chunked text with hybrid retrieval (embeddings + lexical). Compressed snippets in query results; verbatim text only via `memory_token_rag_get_full`. |
| MCP | Eighteen tools: pack, propose/confirm/reject, search, prune, link graph, export/import, RAG ingest/query/delete, stats, audit. |
| Hooks | Inject skill path + policy + pack on `sessionStart`; hints on `beforeSubmitPrompt` / `preCompact` / `postToolUse`. Hooks do not call the MCP (shell → Node CLIs only). Flow: hook → skill → MCP. |
Workspace root comes from MEMORY_TOKEN_WORKSPACE, CURSOR_PROJECT_DIR, or CLAUDE_PROJECT_DIR, else process.cwd() (see src/workspace.ts).
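That resolution order can be sketched as follows (illustrative only; see `src/workspace.ts` for the actual logic):

```typescript
// First env var that is set wins; otherwise fall back to the process's
// working directory (process.cwd() in the real code).
function resolveWorkspace(
  env: Record<string, string | undefined>,
  cwd: string
): string {
  return (
    env.MEMORY_TOKEN_WORKSPACE ??
    env.CURSOR_PROJECT_DIR ??
    env.CLAUDE_PROJECT_DIR ??
    cwd
  );
}
```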
- Node.js ≥ 22.5.0 (uses the experimental `node:sqlite` API)
No clone, no global install. Cursor pulls the package on demand:
Restart Cursor. The first call downloads the package (~52 KB tarball). The first semantic embedding call additionally downloads the local ONNX model (~25 MB) into <workspace>/.memory-token/transformers-cache/.
To use the hooks or the skill with this install method, copy them once from the installed package into your ~/.cursor/:
```sh
PKG=$(npm root -g 2>/dev/null)/@barbozaa/memory-token # or use `npm pack` to extract locally
cp -r "$PKG/.cursor/hooks" ~/.cursor/
cp "$PKG/.cursor/hooks.json" ~/.cursor/
cp -r "$PKG/.cursor/skills" ~/.cursor/
chmod +x ~/.cursor/hooks/*.sh
```

Global install:

```sh
npm install -g @barbozaa/memory-token
```

Then register the server in your MCP config:

```json
{
  "mcpServers": {
    "memory-token": {
      "command": "memory-token-mcp",
      "env": { "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}" }
    }
  }
}
```

From source:

```sh
git clone https://github.com/barbozaa/memory-token.git
cd memory-token
npm install
npm run build
chmod +x .cursor/hooks/*.sh
npm run smoke
```

Point Cursor at `dist/mcp/index.js` with `MEMORY_TOKEN_WORKSPACE=${workspaceFolder}` (see Cursor MCP). Merge `.cursor/hooks.json` into your project if you use hooks elsewhere.
Use user-level config so you do not copy hooks into every repo:
- `~/.cursor/mcp.json` — add the `memory-token` server with `MEMORY_TOKEN_WORKSPACE: ${workspaceFolder}` (each project still gets its own `.memory-token/`).
- `~/.cursor/hooks.json` + `~/.cursor/hooks/memory-token/*.sh` — wrappers that set `MEMORY_TOKEN_CLI_ROOT` to your clone and call `dist/hook-*.js`. See `~/.cursor/hooks/memory-token/README.md` after install.
- `~/.cursor/skills/memory-token/SKILL.md` — global skill; the session hook sets `MEMORY_TOKEN_SKILL_PATH` so the banner points here.
- User Rules — Cursor reads them from Settings → Rules, not from `~/.cursor/rules/`. Use `~/.cursor/rules/memory-token.mdc` only as a reference to paste into User Rules, or add the same rule under each repo's `.cursor/rules/`.
Restart Cursor after changing MCP or hooks. Re-run npm run build in the memory-token clone when you pull updates.
All under <workspace>/.memory-token/ (typically gitignored):
| Path | Contents |
|---|---|
| `store.json` | Memories, `links[]`, `audit[]` |
| `rag.sqlite` | RAG chunks + vectors |
| `transformers-cache/` | Downloaded ONNX model (first semantic embed run) |
Remove .memory-token/ from .gitignore if you want the store versioned.
```
src/
  mcp/index.ts            # MCP server + tool handlers
  store.ts                # JSON store + lockfile (O_EXCL) for concurrent writes
  pack.ts                 # Token-budget pack builder (used by MCP + session hook)
  types.ts                # Memory types, links, audit shapes
  session-policy.ts       # Injected policy text for hooks
  compress-lite.ts        # Lightweight compression helpers
  memory-dedupe.ts        # Near-duplicate detection on propose
  workspace.ts            # Resolve workspace root from env
  hook-session-start.ts   # sessionStart payload
  hook-post-tool-nudge.ts
  hook-user-message.ts
  hook-git-commit.ts      # Optional git post-commit integration
  mcp-nudge-messages.ts
  rag/                    # db, query, embed, embed-local, lexical, paths, compress-default
scripts/
  smoke-mcp.mjs           # MCP smoke (listTools, pack, propose, RAG, …)
  metrics.mjs             # Store/pack/link/RAG health report
  install-git-hook.sh     # Install post-commit hook in another repo
.cursor/
  hooks.json              # Cursor hook wiring
  hooks/*.sh              # Shell wrappers → dist/*.js
  skills/memory-token/SKILL.md  # Agent workflow (read at session start)
  rules/memory-token.mdc  # Optional Cursor rules
```
For npx or global installs see Install. The block below is for the from-source workflow:
```json
"memory-token": {
  "command": "node",
  "args": [
    "--disable-warning=ExperimentalWarning",
    "/ABS/PATH/TO/memory-token/dist/mcp/index.js"
  ],
  "env": {
    "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}"
  }
}
```

Replace `/ABS/PATH/TO/memory-token` with this repo's path. Restart Cursor after edits.
CLI / other clients: npm run mcp runs the server with cwd as workspace unless env overrides. memory-token-mcp is the package bin entry (same script).
Shipped in .cursor/hooks.json:
| Event | Script | Purpose |
|---|---|---|
| `sessionStart` | `.cursor/hooks/session-memory.sh` | Skill banner (`MEMORY_TOKEN_SKILL_PATH` or `.cursor/skills/memory-token/SKILL.md`), policy cheat sheet, token-capped confirmed pack. |
| `preCompact` | `.cursor/hooks/precompact-memory-hint.sh` | Reminder to `rag_ingest` / propose / confirm before context compaction. |
| `postToolUse` | `.cursor/hooks/post-tool-mcp-nudge.sh` | Every `MEMORY_TOKEN_NUDGE_EVERY` tool calls (default 10), a short `[memory-token]` nudge (skips tools already named like `memory_token_*`). |
| `beforeSubmitPrompt` | `.cursor/hooks/user-message-triggers.sh` (matcher: `UserPromptSubmit`) | Same as the legacy "user send" nudge: phrases like "remember this", "root cause", or a long paste add `additional_context` for propose / `rag_ingest`. |
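The post-tool nudge cadence amounts to a simple modulo check. This is an illustrative sketch, not the shipped script:

```typescript
// Emit a nudge every N non-memory tool calls (default interval of 10,
// matching MEMORY_TOKEN_NUDGE_EVERY). Tools already named memory_token_*
// are skipped, since the agent is clearly using the MCP already.
function shouldNudge(callCount: number, toolName: string, every = 10): boolean {
  if (toolName.startsWith("memory_token_")) return false;
  return callCount > 0 && callCount % every === 0;
}
```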
Hook env (optional)
| Variable | Default | Meaning |
|---|---|---|
| `MEMORY_TOKEN_SESSION_MAX_TOKENS` | 2400 | Total rough budget for policy + pack in the session hook (char/4 estimate). |
| `MEMORY_TOKEN_NUDGE_EVERY` | 10 | Post-tool nudge interval. |
| `MEMORY_TOKEN_CLI_ROOT` | auto | Absolute path to this repo when hooks live in another project (so `dist/hook-*.js` resolve). |
| `MEMORY_TOKEN_SKILL_PATH` | — | Custom `SKILL.md` path. |
| Tool | Purpose |
|---|---|
| `memory_token_get_context_pack` | Confirmed memories within `max_tokens`; optional `query` / `tags` rerank. |
| `memory_token_propose` | Create a proposed memory (`type`, `importance`, optional `body_compressed`, `force` to bypass dedupe). |
| `memory_token_confirm` / `memory_token_reject` | Promote or drop proposals. |
| `memory_token_compress_candidate` | Deterministic squeeze for `body_compressed`. |
| `memory_token_search` | Substring search over memories. |
| `memory_token_list_audit` | Recent audit entries. |
| `memory_token_prune` | Remove old/matching proposed (and optionally confirmed) rows; `dry_run` preview. |
| `memory_token_link` / `memory_token_unlink` | Directed edges between memories (`relation_type`, optional `bidirectional`). |
| `memory_token_traverse` | BFS from a memory id over the link graph. |
| `memory_token_stats` | Workspace counts / health-style summary. |
| `memory_token_export` / `memory_token_import` | JSON backup / import (`include_rag` on export). |
| `memory_token_rag_ingest` | Store a chunk (`text_raw`, optional `text_compressed`, `source`); dedupe by hash of raw. |
| `memory_token_rag_query` | Ranked compressed chunks within a token budget (hybrid score). |
| `memory_token_rag_get_full` | Verbatim `text_raw` for an id (use before quoting or patching from RAG hits). |
| `memory_token_rag_delete` | Remove a chunk. |
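The hash dedupe mentioned for `memory_token_rag_ingest` can be sketched like this. The FNV-1a hash below stands in for whatever content digest the store actually uses; it is an assumption for illustration only:

```typescript
// FNV-1a over the raw text: identical chunks map to the same key,
// so re-ingesting the same text is a no-op.
function chunkKey(textRaw: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < textRaw.length; i++) {
    h ^= textRaw.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // unsigned 32-bit multiply
  }
  return h.toString(16);
}

// Returns true if the chunk was new, false if it was a duplicate.
function ingestOnce(store: Map<string, string>, textRaw: string): boolean {
  const key = chunkKey(textRaw);
  if (store.has(key)) return false; // duplicate: skip
  store.set(key, textRaw);
  return true;
}
```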
Recommended link relation strings include SOLVES, CAUSES, BUILDS_ON, CONTRADICTS, SUPERSEDES, BLOCKS, REQUIRES, ALTERNATIVE_TO, RELATED_TO (see src/types.ts).
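A minimal sketch of the BFS behind `memory_token_traverse`, assuming directed edges walked outward from the start id (edge direction and relation handling in the shipped tool may differ):

```typescript
type Link = { from: string; to: string; relation: string };

// Walk outgoing edges from `start` up to maxDepth hops; return reached ids.
function traverse(start: string, links: Link[], maxDepth = 2): string[] {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const l of links) {
        if (l.from === id && !visited.has(l.to)) {
          visited.add(l.to);
          next.push(l.to);
        }
      }
    }
    frontier = next;
  }
  visited.delete(start); // report only reached nodes, not the start
  return [...visited];
}
```

For example, with `build-1 BUILDS_ON fix-1` and `fix-1 SOLVES bug-1`, a two-hop traverse from `build-1` reaches both the fix and the bug it solved.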
Default: local embeddings via @xenova/transformers + SQLite vectors (model Xenova/all-MiniLM-L6-v2). First run downloads into .memory-token/transformers-cache/ (needs network once).
| Env | Effect |
|---|---|
| `MEMORY_TOKEN_LOCAL_EMBED_MODEL` | Override Hugging Face model id (ONNX / Transformers.js). |
| `MEMORY_TOKEN_LOCAL_EMBED_THREADS` | ONNX WASM threads (default 2). |
| `MEMORY_TOKEN_NO_LOCAL_EMBED=1` | Skip local embedder → lexical-only unless Ollama/OpenAI is configured. |
| `MEMORY_TOKEN_OLLAMA_EMBED_MODEL` | Use Ollama embeddings. |
| `MEMORY_TOKEN_OLLAMA_URL` | Ollama base URL (default `http://127.0.0.1:11434`). |
| `OPENAI_API_KEY` / `MEMORY_TOKEN_OPENAI_API_KEY` | Optional cloud embeddings. |
This is not ChromaDB; the DB format is project-specific.
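For intuition, a hybrid score of the kind `rag_query` uses can be sketched as a weighted blend of embedding cosine similarity and lexical term overlap. The weight and the lexical formula here are illustrative, not the shipped BM25-style scorer:

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard zero vectors
}

// Fraction of query terms that also appear in the document.
function lexicalOverlap(query: string, doc: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const d = new Set(doc.toLowerCase().split(/\W+/).filter(Boolean));
  let hits = 0;
  for (const t of q) if (d.has(t)) hits++;
  return q.size ? hits / q.size : 0;
}

// Blend: alpha weights the semantic signal, (1 - alpha) the lexical one.
function hybridScore(qVec: number[], cVec: number[], query: string, chunk: string, alpha = 0.7): number {
  return alpha * cosine(qVec, cVec) + (1 - alpha) * lexicalOverlap(query, chunk);
}
```

Chunks are then sorted by this score and emitted (compressed) until the caller's token budget is spent.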
| Script | Command | Purpose |
|---|---|---|
| `build` | `tsc` | Compile `src/` → `dist/`. |
| `mcp` | `node … dist/mcp/index.js` | Run the MCP stdio server. |
| `smoke` | `node … scripts/smoke-mcp.mjs` | End-to-end MCP smoke test against the repo root. |
| `metrics` | `node … scripts/metrics.mjs` | Human or `--json` report: store, pack usage, links, RAG, audit. Optional `--root /path/to/workspace`. |
Install into another git repo:
```sh
./scripts/install-git-hook.sh /path/to/target/repo
```

Uses `MEMORY_TOKEN_CLI_ROOT` if the memory-token repo is not next to the script. On each commit, runs `dist/hook-git-commit.js` with `MEMORY_TOKEN_WORKSPACE` set to the target repo root (see `scripts/install-git-hook.sh`).
Stock VS Code has no Cursor-style sessionStart / postToolUse hooks. Options: run node dist/hook-session-start.js with MEMORY_TOKEN_WORKSPACE set, use a task, or an extension that exposes similar lifecycle events.
Kiro IDE: full install (MCP, steering, hooks, spec-task ideas, Cursor parity table) is in KIRO_SETUP.md. Example hook YAML lives under .kiro/hooks/.
- Read `.cursor/skills/memory-token/SKILL.md` (injected path on session start).
- Call `memory_token_get_context_pack` early; use `memory_token_rag_query` before reading many files or huge pasted context.
- Treat RAG snippets as non-verbatim until `memory_token_rag_get_full`.
- Persist stable facts with propose → confirm; link related memories when useful; prune junk or stale proposals.
MIT © 2026 barbozaa