MCP server plus Cursor hooks for governed workspace memory with token-budget packs: agents propose facts, you (or policy) confirm, then a bounded summary loads at session start—with preCompact nudges, post-tool reminders, and optional user-message triggers so long sessions still persist durable context.
Coding agents (Cursor / Claude / Copilot) forget everything between sessions. Every new chat means:
- The agent re-reads the same files to understand your project (5–20k wasted tokens per session).
- You re-explain the same architectural decisions, conventions, build commands, and known bugs.
- Long-running sessions silently lose context to summarization and the agent starts repeating mistakes you already fixed.
- "Just put it in `CLAUDE.md`/`AGENTS.md`" doesn't scale: the file grows unbounded, eats context every turn, and has no provenance — you can't tell which lines are stale, who proposed them, or what actually got applied.
The shortcut of dumping everything into a markdown rules file works until it doesn't; you end up paying token rent on context the agent doesn't even need this turn.
A two-tier memory with a strict token budget and a human-in-the-loop gate:
- Confirmed store (`store.json`) — small, curated, typed facts (decisions, fixes, APIs, build commands, conventions). Loaded at session start as a bounded markdown pack (default ≤1500 tokens). The agent proposes, you (or policy) confirm — nothing gets in by accident.
- RAG layer (`rag.sqlite`) — large chunks (chat distills, session capsules, design docs) stored compressed with embeddings. Pulled on demand via `rag_query`, ranked by hybrid score (embeddings + BM25-style lexical), returned already compressed within a token budget. Verbatim text only when explicitly fetched via `rag_get_full`.
Both layers respect a token budget on every read — you never pay for context the agent isn't using right now.
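As a rough illustration, the bounded pack described above amounts to a greedy fill against the budget. This is a hypothetical sketch (the real builder lives in `src/pack.ts`), using the same ~4-characters-per-token estimate the session hook documents:

```typescript
type Memory = { id: string; type: string; importance: number; summary: string };

// Rough token estimate: ~4 characters per token (char/4 heuristic).
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Greedy pack builder sketch: sort confirmed memories by importance, then
// add one line per memory until the hard token cap would be exceeded.
function buildPack(confirmed: Memory[], maxTokens = 1500): string {
  const lines: string[] = [];
  let used = 0;
  for (const m of [...confirmed].sort((a, b) => b.importance - a.importance)) {
    const line = `- [${m.type}] ${m.summary}`;
    const cost = estimateTokens(line);
    if (used + cost > maxTokens) break; // hard cap: never exceed the budget
    lines.push(line);
    used += cost;
  }
  return lines.join("\n");
}
```

The point of the cap is determinism: the same confirmed store and budget always yield the same pack, so session-start cost is predictable.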
| Benefit | Mechanism | Example impact |
|---|---|---|
| Cross-session memory | `sessionStart` hook injects confirmed pack automatically | Critical facts (build cmd, security fix, race condition root cause) survive across all future Cursor windows. |
| Big token savings on context loads | RAG returns compressed, ranked snippets instead of file reads | A query that would cost ~3000–5000 tokens in Read calls cost 364 tokens of compressed RAG hits in this session (~85% reduction). |
| Bounded session-start cost | Pack hard-capped at `max_tokens`, deterministic ordering | This repo's pack: 397 tok / 1500 budget = 26%; fits in any `sessionStart` without crowding out the user prompt. |
| Compression for free | `compress_candidate` shrinks bodies to ~1.6× ratio | 37.8% size reduction on stored bodies; same retrieval quality. |
| No silent drift | propose → confirm gate, dedupe by hash + similarity, audit log | Every fact has a status, timestamp, and reason; you can prune with confidence. |
| Causal graph | Typed links (SOLVES, CAUSES, BUILDS_ON, …) + BFS traverse | Ask "what fixed X?" and get back the commit that solved it and the build that depends on it, in one hop. |
| Local-first, private | Embeddings via Transformers.js (ONNX, ~25 MB model, downloaded once) | No API keys required. Optional: Ollama or OpenAI for embeddings. Your facts never leave the repo. |
| Workspace-scoped | Each project gets its own `.memory-token/` directory | No bleeding of facts between unrelated repos; gitignored by default but can be versioned. |
| Agent steering, not vibes | Hooks inject skill + policy + pack; skill is a decision tree the agent follows | Agents actually call propose/confirm/`rag_query` consistently instead of "maybe sometimes if they remember". |
| Prevents the CLAUDE.md bloat trap | Token-budget pack + RAG fallback | Old facts move to RAG (compressed, on-demand) instead of permanently inflating the rules file. |
- ✅ Multi-week projects where you reopen Cursor often and don't want to re-explain context.
- ✅ Codebases big enough that re-reading "to understand structure" costs noticeable tokens per session.
- ✅ Teams that want provenance and control over what the agent "remembers".
- ✅ Long debugging sessions where you want the root cause + fix to survive into next week's session.
Skip if your project is a one-off script or you don't run more than 1–2 chat sessions on the same codebase.
| Layer | Role |
|---|---|
| Store (`store.json`) | Typed memories (decision, fix, api, …), statuses (proposed → confirmed / rejected), links between memories, audit log. |
| RAG (`rag.sqlite`) | Chunked text with hybrid retrieval (embeddings + lexical). Compressed snippets in query results; verbatim text only via `memory_token_rag_get_full`. |
| MCP | Eighteen tools: pack, propose/confirm/reject, search, prune, link graph, export/import, RAG ingest/query/delete, stats, audit. |
| Hooks | Inject skill path + policy + pack on `sessionStart`; hints on `beforeSubmitPrompt` / `preCompact` / `postToolUse`. Hooks do not call the MCP (shell → Node CLIs only). Flow: hook → skill → MCP. |
Workspace root comes from MEMORY_TOKEN_WORKSPACE, CURSOR_PROJECT_DIR, or CLAUDE_PROJECT_DIR, else process.cwd() (see src/workspace.ts).
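That resolution order can be sketched as follows (illustrative only; see `src/workspace.ts` for the actual logic):

```typescript
// First env var that is set wins; otherwise fall back to the process's
// working directory (process.cwd() in the real code).
function resolveWorkspace(
  env: Record<string, string | undefined>,
  cwd: string
): string {
  return (
    env.MEMORY_TOKEN_WORKSPACE ??
    env.CURSOR_PROJECT_DIR ??
    env.CLAUDE_PROJECT_DIR ??
    cwd
  );
}
```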
- Node.js ≥ 22.5.0 (uses the experimental `node:sqlite` API)
No clone, no global install. Cursor pulls the package on demand:
Restart Cursor. The first call downloads the package (~52 KB tarball). The first semantic embedding call additionally downloads the local ONNX model (~25 MB) into <workspace>/.memory-token/transformers-cache/.
To use the hooks or the skill with this install method, copy them once from the installed package into your ~/.cursor/:
```sh
PKG=$(npm root -g 2>/dev/null)/@barbozaa/memory-token # or use `npm pack` to extract locally
cp -r "$PKG/.cursor/hooks" ~/.cursor/
cp "$PKG/.cursor/hooks.json" ~/.cursor/
cp -r "$PKG/.cursor/skills" ~/.cursor/
chmod +x ~/.cursor/hooks/*.sh
```

Global install:

```sh
npm install -g @barbozaa/memory-token
```

Then register the server in your MCP config:

```json
{
  "mcpServers": {
    "memory-token": {
      "command": "memory-token-mcp",
      "env": { "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}" }
    }
  }
}
```

From source:

```sh
git clone https://github.com/barbozaa/memory-token.git
cd memory-token
npm install
npm run build
chmod +x .cursor/hooks/*.sh
npm run smoke
```

Point Cursor at `dist/mcp/index.js` with `MEMORY_TOKEN_WORKSPACE=${workspaceFolder}` (see Cursor MCP). Merge `.cursor/hooks.json` into your project if you use hooks elsewhere.
Use user-level config so you do not copy hooks into every repo:
- `~/.cursor/mcp.json` — add the `memory-token` server with `MEMORY_TOKEN_WORKSPACE: ${workspaceFolder}` (each project still gets its own `.memory-token/`).
- `~/.cursor/hooks.json` + `~/.cursor/hooks/memory-token/*.sh` — wrappers that set `MEMORY_TOKEN_CLI_ROOT` to your clone and call `dist/hook-*.js`. See `~/.cursor/hooks/memory-token/README.md` after install.
- `~/.cursor/skills/memory-token/SKILL.md` — global skill; the session hook sets `MEMORY_TOKEN_SKILL_PATH` so the banner points here.
- User Rules — Cursor reads them from Settings → Rules, not from `~/.cursor/rules/`. Use `~/.cursor/rules/memory-token.mdc` only as a reference to paste into User Rules, or add the same rule under each repo's `.cursor/rules/`.
Restart Cursor after changing MCP or hooks. Re-run npm run build in the memory-token clone when you pull updates.
All under <workspace>/.memory-token/ (typically gitignored):
| Path | Contents |
|---|---|
| `store.json` | Memories, `links[]`, `audit[]` |
| `rag.sqlite` | RAG chunks + vectors |
| `transformers-cache/` | Downloaded ONNX model (first semantic embed run) |
Remove .memory-token/ from .gitignore if you want the store versioned.
```
src/
  mcp/index.ts            # MCP server + tool handlers
  store.ts                # JSON store + lockfile (O_EXCL) for concurrent writes
  pack.ts                 # Token-budget pack builder (used by MCP + session hook)
  types.ts                # Memory types, links, audit shapes
  session-policy.ts       # Injected policy text for hooks
  compress-lite.ts        # Lightweight compression helpers
  memory-dedupe.ts        # Near-duplicate detection on propose
  workspace.ts            # Resolve workspace root from env
  hook-session-start.ts   # sessionStart payload
  hook-post-tool-nudge.ts
  hook-user-message.ts
  hook-git-commit.ts      # Optional git post-commit integration
  mcp-nudge-messages.ts
  rag/                    # db, query, embed, embed-local, lexical, paths, compress-default
scripts/
  smoke-mcp.mjs           # MCP smoke (listTools, pack, propose, RAG, …)
  metrics.mjs             # Store/pack/link/RAG health report
  install-git-hook.sh     # Install post-commit hook in another repo
.cursor/
  hooks.json              # Cursor hook wiring
  hooks/*.sh              # Shell wrappers → dist/*.js
  skills/memory-token/SKILL.md  # Agent workflow (read at session start)
  rules/memory-token.mdc  # Optional Cursor rules
```
For npx or global installs see Install. The block below is for the from-source workflow:
```json
"memory-token": {
  "command": "node",
  "args": [
    "--disable-warning=ExperimentalWarning",
    "/ABS/PATH/TO/memory-token/dist/mcp/index.js"
  ],
  "env": {
    "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}"
  }
}
```

Replace `/ABS/PATH/TO/memory-token` with this repo's path. Restart Cursor after edits.
CLI / other clients: npm run mcp runs the server with cwd as workspace unless env overrides. memory-token-mcp is the package bin entry (same script).
Shipped in .cursor/hooks.json:
| Event | Script | Purpose |
|---|---|---|
| `sessionStart` | `.cursor/hooks/session-memory.sh` | Skill banner (`MEMORY_TOKEN_SKILL_PATH` or `.cursor/skills/memory-token/SKILL.md`), policy cheat sheet, token-capped confirmed pack. |
| `preCompact` | `.cursor/hooks/precompact-memory-hint.sh` | Reminder to `rag_ingest` / propose / confirm before context compaction. |
| `postToolUse` | `.cursor/hooks/post-tool-mcp-nudge.sh` | Every `MEMORY_TOKEN_NUDGE_EVERY` tool calls (default 10), a short `[memory-token]` nudge (skips tools already named like `memory_token_*`). |
| `beforeSubmitPrompt` | `.cursor/hooks/user-message-triggers.sh` (matcher: `UserPromptSubmit`) | Same as the legacy "user send" nudge: phrases like "remember this", "root cause", or a long paste add `additional_context` for propose / `rag_ingest`. |
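The post-tool nudge cadence amounts to a simple modulo check. This is an illustrative sketch, not the shipped script:

```typescript
// Emit a nudge every N non-memory tool calls (default interval of 10,
// matching MEMORY_TOKEN_NUDGE_EVERY). Tools already named memory_token_*
// are skipped, since the agent is clearly using the MCP already.
function shouldNudge(callCount: number, toolName: string, every = 10): boolean {
  if (toolName.startsWith("memory_token_")) return false;
  return callCount > 0 && callCount % every === 0;
}
```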
Hook env (optional)
| Variable | Default | Meaning |
|---|---|---|
| `MEMORY_TOKEN_SESSION_MAX_TOKENS` | 2400 | Total rough budget for policy + pack in the session hook (char/4 estimate). |
| `MEMORY_TOKEN_NUDGE_EVERY` | 10 | Post-tool nudge interval. |
| `MEMORY_TOKEN_CLI_ROOT` | auto | Absolute path to this repo when hooks live in another project (so `dist/hook-*.js` resolve). |
| `MEMORY_TOKEN_SKILL_PATH` | — | Custom `SKILL.md` path. |
| Tool | Purpose |
|---|---|
| `memory_token_get_context_pack` | Confirmed memories within `max_tokens`; optional `query` / `tags` rerank. |
| `memory_token_propose` | Create a proposed memory (`type`, `importance`, optional `body_compressed`, `force` to bypass dedupe). |
| `memory_token_confirm` / `memory_token_reject` | Promote or drop proposals. |
| `memory_token_compress_candidate` | Deterministic squeeze for `body_compressed`. |
| `memory_token_search` | Substring search over memories. |
| `memory_token_list_audit` | Recent audit entries. |
| `memory_token_prune` | Remove old/matching proposed (and optionally confirmed) rows; `dry_run` preview. |
| `memory_token_link` / `memory_token_unlink` | Directed edges between memories (`relation_type`, optional `bidirectional`). |
| `memory_token_traverse` | BFS from a memory id over the link graph. |
| `memory_token_stats` | Workspace counts / health-style summary. |
| `memory_token_export` / `memory_token_import` | JSON backup / import (`include_rag` on export). |
| `memory_token_rag_ingest` | Store a chunk (`text_raw`, optional `text_compressed`, `source`); dedupe by hash of raw. |
| `memory_token_rag_query` | Ranked compressed chunks within a token budget (hybrid score). |
| `memory_token_rag_get_full` | Verbatim `text_raw` for an id (use before quoting or patching from RAG hits). |
| `memory_token_rag_delete` | Remove a chunk. |
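The hash dedupe mentioned for `memory_token_rag_ingest` can be sketched like this. The FNV-1a hash below stands in for whatever content digest the store actually uses; it is an assumption for illustration only:

```typescript
// FNV-1a over the raw text: identical chunks map to the same key,
// so re-ingesting the same text is a no-op.
function chunkKey(textRaw: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < textRaw.length; i++) {
    h ^= textRaw.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // unsigned 32-bit multiply
  }
  return h.toString(16);
}

// Returns true if the chunk was new, false if it was a duplicate.
function ingestOnce(store: Map<string, string>, textRaw: string): boolean {
  const key = chunkKey(textRaw);
  if (store.has(key)) return false; // duplicate: skip
  store.set(key, textRaw);
  return true;
}
```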
Recommended link relation strings include SOLVES, CAUSES, BUILDS_ON, CONTRADICTS, SUPERSEDES, BLOCKS, REQUIRES, ALTERNATIVE_TO, RELATED_TO (see src/types.ts).
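A minimal sketch of the BFS behind `memory_token_traverse`, assuming directed edges walked outward from the start id (edge direction and relation handling in the shipped tool may differ):

```typescript
type Link = { from: string; to: string; relation: string };

// Walk outgoing edges from `start` up to maxDepth hops; return reached ids.
function traverse(start: string, links: Link[], maxDepth = 2): string[] {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const l of links) {
        if (l.from === id && !visited.has(l.to)) {
          visited.add(l.to);
          next.push(l.to);
        }
      }
    }
    frontier = next;
  }
  visited.delete(start); // report only reached nodes, not the start
  return [...visited];
}
```

For example, with `build-1 BUILDS_ON fix-1` and `fix-1 SOLVES bug-1`, a two-hop traverse from `build-1` reaches both the fix and the bug it solved.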
Default: local embeddings via @xenova/transformers + SQLite vectors (model Xenova/all-MiniLM-L6-v2). First run downloads into .memory-token/transformers-cache/ (needs network once).
| Env | Effect |
|---|---|
| `MEMORY_TOKEN_LOCAL_EMBED_MODEL` | Override Hugging Face model id (ONNX / Transformers.js). |
| `MEMORY_TOKEN_LOCAL_EMBED_THREADS` | ONNX WASM threads (default 2). |
| `MEMORY_TOKEN_NO_LOCAL_EMBED=1` | Skip local embedder → lexical-only unless Ollama/OpenAI is configured. |
| `MEMORY_TOKEN_OLLAMA_EMBED_MODEL` | Use Ollama embeddings. |
| `MEMORY_TOKEN_OLLAMA_URL` | Ollama base URL (default `http://127.0.0.1:11434`). |
| `OPENAI_API_KEY` / `MEMORY_TOKEN_OPENAI_API_KEY` | Optional cloud embeddings. |
This is not ChromaDB; the DB format is project-specific.
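For intuition, a hybrid score of the kind `rag_query` uses can be sketched as a weighted blend of embedding cosine similarity and lexical term overlap. The weight and the lexical formula here are illustrative, not the shipped BM25-style scorer:

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard zero vectors
}

// Fraction of query terms that also appear in the document.
function lexicalOverlap(query: string, doc: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const d = new Set(doc.toLowerCase().split(/\W+/).filter(Boolean));
  let hits = 0;
  for (const t of q) if (d.has(t)) hits++;
  return q.size ? hits / q.size : 0;
}

// Blend: alpha weights the semantic signal, (1 - alpha) the lexical one.
function hybridScore(qVec: number[], cVec: number[], query: string, chunk: string, alpha = 0.7): number {
  return alpha * cosine(qVec, cVec) + (1 - alpha) * lexicalOverlap(query, chunk);
}
```

Chunks are then sorted by this score and emitted (compressed) until the caller's token budget is spent.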
| Script | Command | Purpose |
|---|---|---|
| `build` | `tsc` | Compile `src/` → `dist/`. |
| `mcp` | `node … dist/mcp/index.js` | Run the MCP stdio server. |
| `smoke` | `node … scripts/smoke-mcp.mjs` | End-to-end MCP smoke test against the repo root. |
| `metrics` | `node … scripts/metrics.mjs` | Human or `--json` report: store, pack usage, links, RAG, audit. Optional `--root /path/to/workspace`. |
Install into another git repo:
```sh
./scripts/install-git-hook.sh /path/to/target/repo
```

Uses `MEMORY_TOKEN_CLI_ROOT` if the memory-token repo is not next to the script. On each commit, runs `dist/hook-git-commit.js` with `MEMORY_TOKEN_WORKSPACE` set to the target repo root (see `scripts/install-git-hook.sh`).
Stock VS Code has no Cursor-style sessionStart / postToolUse hooks. Options: run node dist/hook-session-start.js with MEMORY_TOKEN_WORKSPACE set, use a task, or an extension that exposes similar lifecycle events.
Kiro IDE: full install (MCP, steering, hooks, spec-task ideas, Cursor parity table) is in KIRO_SETUP.md. Example hook YAML lives under .kiro/hooks/.
- Read `.cursor/skills/memory-token/SKILL.md` (injected path on session start).
- Call `memory_token_get_context_pack` early; use `memory_token_rag_query` before reading many files or huge pasted context.
- Treat RAG snippets as non-verbatim until `memory_token_rag_get_full`.
- Persist stable facts with propose → confirm; link related memories when useful; prune junk or stale proposals.
MIT © 2026 barbozaa