# memory-token

MCP server plus Cursor hooks for governed workspace memory with token-budget packs: agents propose facts, you (or policy) confirm them, and a bounded summary loads at session start. preCompact nudges, post-tool reminders, and optional user-message triggers keep durable context persisting through long sessions.

## The problem this solves

Coding agents (Cursor / Claude / Copilot) forget everything between sessions. Every new chat means:

- The agent re-reads the same files to understand your project (5–20k wasted tokens per session).
- You re-explain the same architectural decisions, conventions, build commands, and known bugs.
- Long-running sessions silently lose context to summarization, and the agent starts repeating mistakes you already fixed.
- "Just put it in CLAUDE.md / AGENTS.md" doesn't scale: the file grows unbounded, eats context every turn, and has no provenance; you can't tell which lines are stale, who proposed them, or what actually got applied.

The shortcut of dumping everything into a markdown rules file works until it doesn't; you end up paying token rent on context the agent doesn't even need this turn.

## How memory-token solves it

A two-tier memory with a strict token budget and a human-in-the-loop gate:

  1. Confirmed store (store.json) — small, curated, typed facts (decisions, fixes, APIs, build commands, conventions). Loaded at session start as a bounded markdown pack (default ≤1500 tokens). The agent proposes, you (or policy) confirm — nothing gets in by accident.
  2. RAG layer (rag.sqlite) — large chunks (chat distills, session capsules, design docs) stored compressed with embeddings. Pulled on demand via rag_query, ranked by hybrid score (embeddings + BM25-style lexical), returned already compressed within a token budget. Verbatim text only when explicitly fetched via rag_get_full.

Both layers respect a token budget on every read — you never pay for context the agent isn't using right now.
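As a rough illustration of the budget idea (not the actual `src/pack.ts` code), a pack builder can order confirmed facts deterministically and greedily fill a char/4 token estimate, the same rough estimate the hooks use. All names here are illustrative:

```typescript
// Hypothetical sketch of a token-budget pack builder; field names and the
// char/4 token estimate are assumptions, not the real src/pack.ts API.
interface Fact { id: string; text: string; importance: number; }

// Rough token estimate: ~4 characters per token.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function buildPack(facts: Fact[], maxTokens = 1500): string {
  // Deterministic ordering: highest importance first, id as tiebreaker.
  const ordered = [...facts].sort(
    (a, b) => b.importance - a.importance || a.id.localeCompare(b.id)
  );
  const lines: string[] = [];
  let used = 0;
  for (const f of ordered) {
    const cost = estimateTokens(f.text);
    if (used + cost > maxTokens) continue; // skip facts that would bust the budget
    lines.push(`- ${f.text}`);
    used += cost;
  }
  return lines.join("\n");
}
```

Anything that does not fit stays in the store (or moves to RAG) and costs nothing at session start.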

## Benefits (measured on this repo)

| Benefit | Mechanism | Example impact |
| --- | --- | --- |
| Cross-session memory | sessionStart hook injects the confirmed pack automatically | Critical facts (build cmd, security fix, race condition root cause) survive across all future Cursor windows. |
| Big token savings on context loads | RAG returns compressed, ranked snippets instead of file reads | A query that would cost ~3000–5000 tokens in Read calls cost 364 tokens of compressed RAG hits in this session (~85% reduction). |
| Bounded session-start cost | Pack hard-capped at `max_tokens`, deterministic ordering | This repo's pack: 397 tok / 1500 budget = 26%; fits in any sessionStart without crowding out the user prompt. |
| Compression for free | `compress_candidate` shrinks bodies to ~1.6× ratio | 37.8% size reduction on stored bodies; same retrieval quality. |
| No silent drift | propose → confirm gate, dedupe by hash + similarity, audit log | Every fact has a status, timestamp, and reason; you can prune with confidence. |
| Causal graph | Typed links (SOLVES, CAUSES, BUILDS_ON, …) + BFS traverse | Ask "what fixed X?" and get back the commit that solved it and the build that depends on it, in one hop. |
| Local-first, private | Embeddings via Transformers.js (ONNX, ~25 MB model, downloaded once) | No API keys required. Optional: Ollama or OpenAI for embeddings. Your facts never leave the repo. |
| Workspace-scoped | Each project gets its own `.memory-token/` directory | No bleeding of facts between unrelated repos; gitignored by default but can be versioned. |
| Agent steering, not vibes | Hooks inject skill + policy + pack; the skill is a decision tree the agent follows | Agents actually call propose/confirm/rag_query consistently instead of "maybe sometimes if they remember". |
| Prevents the CLAUDE.md bloat trap | Token-budget pack + RAG fallback | Old facts move to RAG (compressed, on-demand) instead of permanently inflating the rules file. |
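The dedupe-by-hash part of the propose gate can be pictured like this. The normalization and function names are illustrative; the real `src/memory-dedupe.ts` also layers a similarity check on top:

```typescript
import { createHash } from "node:crypto";

// Toy dedupe-on-propose: a candidate fact whose normalized body hashes to an
// existing entry is treated as a duplicate. Normalization here (trim +
// lowercase) is an assumption for illustration.
const bodyHash = (text: string): string =>
  createHash("sha256").update(text.trim().toLowerCase()).digest("hex");

function isDuplicate(existingBodies: string[], candidate: string): boolean {
  const seen = new Set(existingBodies.map(bodyHash));
  return seen.has(bodyHash(candidate));
}
```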

## When to use it

- ✅ Multi-week projects where you reopen Cursor often and don't want to re-explain context.
- ✅ Codebases big enough that re-reading "to understand structure" costs noticeable tokens per session.
- ✅ Teams that want provenance and control over what the agent "remembers".
- ✅ Long debugging sessions where you want the root cause + fix to survive into next week's session.

Skip if your project is a one-off script or you don't run more than 1–2 chat sessions on the same codebase.

## What it does

| Layer | Role |
| --- | --- |
| Store (`store.json`) | Typed memories (`decision`, `fix`, `api`, …), statuses (`proposed` / `confirmed` / `rejected`), links between memories, audit log. |
| RAG (`rag.sqlite`) | Chunked text with hybrid retrieval (embeddings + lexical). Compressed snippets in query results; verbatim text only via `memory_token_rag_get_full`. |
| MCP | Eighteen tools: pack, propose/confirm/reject, search, prune, link graph, export/import, RAG ingest/query/delete, stats, audit. |
| Hooks | Inject skill path + policy + pack on `sessionStart`; hints on `beforeSubmitPrompt` / `preCompact` / `postToolUse`. Hooks do not call the MCP (shell → Node CLIs only). Flow: hook → skill → MCP. |

Workspace root comes from MEMORY_TOKEN_WORKSPACE, CURSOR_PROJECT_DIR, or CLAUDE_PROJECT_DIR, else process.cwd() (see src/workspace.ts).
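The fallback chain reads as a one-liner; this sketch mirrors the description above, though the actual implementation lives in `src/workspace.ts`:

```typescript
// Resolve the workspace root from environment variables, in priority order,
// falling back to the current working directory. Mirrors the documented
// chain; not the literal src/workspace.ts code.
function resolveWorkspaceRoot(
  env: Record<string, string | undefined> = process.env
): string {
  return (
    env.MEMORY_TOKEN_WORKSPACE ||
    env.CURSOR_PROJECT_DIR ||
    env.CLAUDE_PROJECT_DIR ||
    process.cwd()
  );
}
```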

## Requirements

- Node.js ≥ 22.5.0 (uses the experimental `node:sqlite` API)

## Install

### Option A — npx (zero-install, recommended)

No clone, no global install. Cursor pulls the package on demand:

```jsonc
// ~/.cursor/mcp.json
{
  "mcpServers": {
    "memory-token": {
      "command": "npx",
      "args": ["-y", "@barbozaa/memory-token"],
      "env": { "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}" }
    }
  }
}
```

Restart Cursor. The first call downloads the package (~52 KB tarball). The first semantic embedding call additionally downloads the local ONNX model (~25 MB) into <workspace>/.memory-token/transformers-cache/.

To use the hooks or the skill with this install method, copy them once from the installed package into your ~/.cursor/:

```bash
PKG=$(npm root -g 2>/dev/null)/@barbozaa/memory-token   # or use `npm pack` to extract locally
cp -r "$PKG/.cursor/hooks"      ~/.cursor/
cp    "$PKG/.cursor/hooks.json" ~/.cursor/
cp -r "$PKG/.cursor/skills"     ~/.cursor/
chmod +x ~/.cursor/hooks/*.sh
```

### Option B — Global install

```bash
npm install -g @barbozaa/memory-token
```

Then point `~/.cursor/mcp.json` at the installed bin:

```json
{
  "mcpServers": {
    "memory-token": {
      "command": "memory-token-mcp",
      "env": { "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}" }
    }
  }
}
```

### Option C — From source (development)

```bash
git clone https://github.com/barbozaa/memory-token.git
cd memory-token
npm install
npm run build
chmod +x .cursor/hooks/*.sh
npm run smoke
```

Point Cursor at dist/mcp/index.js with MEMORY_TOKEN_WORKSPACE=${workspaceFolder} (see Cursor MCP). Merge .cursor/hooks.json into your project if you use hooks elsewhere.

## Global Cursor install (all new windows / workspaces)

Use user-level config so you do not copy hooks into every repo:

  1. ~/.cursor/mcp.json — add the memory-token server with MEMORY_TOKEN_WORKSPACE: ${workspaceFolder} (each project still gets its own .memory-token/).
  2. ~/.cursor/hooks.json + ~/.cursor/hooks/memory-token/*.sh — wrappers that set MEMORY_TOKEN_CLI_ROOT to your clone and call dist/hook-*.js. See ~/.cursor/hooks/memory-token/README.md after install.
  3. ~/.cursor/skills/memory-token/SKILL.md — global skill; session hook sets MEMORY_TOKEN_SKILL_PATH so the banner points here.
  4. User Rules — Cursor reads them from Settings → Rules, not from ~/.cursor/rules/. Use ~/.cursor/rules/memory-token.mdc only as a reference to paste into User Rules, or add the same rule under each repo’s .cursor/rules/.

Restart Cursor after changing MCP or hooks. Re-run npm run build in the memory-token clone when you pull updates.

## Data locations

All under <workspace>/.memory-token/ (typically gitignored):

| Path | Contents |
| --- | --- |
| `store.json` | Memories, `links[]`, `audit[]` |
| `rag.sqlite` | RAG chunks + vectors |
| `transformers-cache/` | Downloaded ONNX model (first semantic embed run) |

Remove .memory-token/ from .gitignore if you want the store versioned.

## Repository layout

```text
src/
  mcp/index.ts          # MCP server + tool handlers
  store.ts              # JSON store + lockfile (O_EXCL) for concurrent writes
  pack.ts               # Token-budget pack builder (used by MCP + session hook)
  types.ts              # Memory types, links, audit shapes
  session-policy.ts     # Injected policy text for hooks
  compress-lite.ts      # Lightweight compression helpers
  memory-dedupe.ts      # Near-duplicate detection on propose
  workspace.ts          # Resolve workspace root from env
  hook-session-start.ts # sessionStart payload
  hook-post-tool-nudge.ts
  hook-user-message.ts
  hook-git-commit.ts    # Optional git post-commit integration
  mcp-nudge-messages.ts
  rag/                  # db, query, embed, embed-local, lexical, paths, compress-default
scripts/
  smoke-mcp.mjs         # MCP smoke test (listTools, pack, propose, RAG, …)
  metrics.mjs           # Store/pack/link/RAG health report
  install-git-hook.sh   # Install post-commit hook in another repo
.cursor/
  hooks.json            # Cursor hook wiring
  hooks/*.sh            # Shell wrappers → dist/*.js
  skills/memory-token/SKILL.md   # Agent workflow (read at session start)
  rules/memory-token.mdc         # Optional Cursor rules
```

## Cursor MCP

For npx or global installs see Install. The block below is for the from-source workflow:

"memory-token": {
  "command": "node",
  "args": [
    "--disable-warning=ExperimentalWarning",
    "/ABS/PATH/TO/memory-token/dist/mcp/index.js"
  ],
  "env": {
    "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}"
  }
}

Replace /ABS/PATH/TO/memory-token with this repo’s path. Restart Cursor after edits.

CLI / other clients: npm run mcp runs the server with cwd as workspace unless env overrides. memory-token-mcp is the package bin entry (same script).

## Cursor hooks

Shipped in .cursor/hooks.json:

| Event | Script | Purpose |
| --- | --- | --- |
| sessionStart | `.cursor/hooks/session-memory.sh` | Skill banner (`MEMORY_TOKEN_SKILL_PATH` or `.cursor/skills/memory-token/SKILL.md`), policy cheat sheet, token-capped confirmed pack. |
| preCompact | `.cursor/hooks/precompact-memory-hint.sh` | Reminder to `rag_ingest` / `propose` / `confirm` before context compaction. |
| postToolUse | `.cursor/hooks/post-tool-mcp-nudge.sh` | Every `MEMORY_TOKEN_NUDGE_EVERY` tool calls (default 10), a short `[memory-token]` nudge (skips tools already named like `memory_token_*`). |
| beforeSubmitPrompt | `.cursor/hooks/user-message-triggers.sh` (matcher: `UserPromptSubmit`) | Same as the legacy "user send" nudge: phrases like "remember this", "root cause", or a long paste add `additional_context` for `propose` / `rag_ingest`. |

### Hook env (optional)

| Variable | Default | Meaning |
| --- | --- | --- |
| `MEMORY_TOKEN_SESSION_MAX_TOKENS` | 2400 | Total rough budget for policy + pack in the session hook (char/4 estimate). |
| `MEMORY_TOKEN_NUDGE_EVERY` | 10 | Post-tool nudge interval. |
| `MEMORY_TOKEN_CLI_ROOT` | auto | Absolute path to this repo when hooks live in another project (so `dist/hook-*.js` resolves). |
| `MEMORY_TOKEN_SKILL_PATH` | (unset) | Custom `SKILL.md` path. |

## MCP tools (full list)

| Tool | Purpose |
| --- | --- |
| `memory_token_get_context_pack` | Confirmed memories within `max_tokens`; optional `query` / `tags` rerank. |
| `memory_token_propose` | Create a proposed memory (`type`, `importance`, optional `body_compressed`, `force` to bypass dedupe). |
| `memory_token_confirm` / `memory_token_reject` | Promote or drop proposals. |
| `memory_token_compress_candidate` | Deterministic squeeze for `body_compressed`. |
| `memory_token_search` | Substring search over memories. |
| `memory_token_list_audit` | Recent audit entries. |
| `memory_token_prune` | Remove old/matching proposed (and optionally confirmed) rows; `dry_run` preview. |
| `memory_token_link` / `memory_token_unlink` | Directed edges between memories (`relation_type`, optional `bidirectional`). |
| `memory_token_traverse` | BFS from a memory id over the link graph. |
| `memory_token_stats` | Workspace counts / health-style summary. |
| `memory_token_export` / `memory_token_import` | JSON backup / import (`include_rag` on export). |
| `memory_token_rag_ingest` | Store a chunk (`text_raw`, optional `text_compressed`, `source`); dedupe by hash of raw text. |
| `memory_token_rag_query` | Ranked compressed chunks within a token budget (hybrid score). |
| `memory_token_rag_get_full` | Verbatim `text_raw` for an id (use before quoting or patching from RAG hits). |
| `memory_token_rag_delete` | Remove a chunk. |

Recommended link relation strings include SOLVES, CAUSES, BUILDS_ON, CONTRADICTS, SUPERSEDES, BLOCKS, REQUIRES, ALTERNATIVE_TO, RELATED_TO (see src/types.ts).
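A traverse over the link graph is plain BFS on directed edges. This toy version shows the shape; the field names are assumptions, not the tool's actual schema:

```typescript
// Illustrative BFS over directed typed links, in the spirit of
// memory_token_traverse. Returns every memory id reachable from `start`
// within `maxHops` hops (excluding the start itself).
interface Link { from: string; to: string; relation: string; }

function traverse(links: Link[], start: string, maxHops = 2): string[] {
  // Build an adjacency list from the directed edges.
  const adj = new Map<string, string[]>();
  for (const l of links) {
    if (!adj.has(l.from)) adj.set(l.from, []);
    adj.get(l.from)!.push(l.to);
  }
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const nb of adj.get(id) ?? []) {
        if (!visited.has(nb)) { visited.add(nb); next.push(nb); }
      }
    }
    frontier = next;
  }
  visited.delete(start);
  return [...visited];
}
```

For example, with edges build-2 → fix-1 (BUILDS_ON) and fix-1 → bug-7 (SOLVES), a two-hop traverse from build-2 reaches both fix-1 and bug-7.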

## RAG embeddings (local first)

Default: local embeddings via @xenova/transformers + SQLite vectors (model Xenova/all-MiniLM-L6-v2). First run downloads into .memory-token/transformers-cache/ (needs network once).

| Env | Effect |
| --- | --- |
| `MEMORY_TOKEN_LOCAL_EMBED_MODEL` | Override Hugging Face model id (ONNX / Transformers.js). |
| `MEMORY_TOKEN_LOCAL_EMBED_THREADS` | ONNX WASM threads (default 2). |
| `MEMORY_TOKEN_NO_LOCAL_EMBED=1` | Skip the local embedder; falls back to lexical unless Ollama/OpenAI is configured. |
| `MEMORY_TOKEN_OLLAMA_EMBED_MODEL` | Ollama embeddings. |
| `MEMORY_TOKEN_OLLAMA_URL` | Ollama base URL (default `http://127.0.0.1:11434`). |
| `OPENAI_API_KEY` / `MEMORY_TOKEN_OPENAI_API_KEY` | Optional cloud embeddings. |

This is not ChromaDB; the DB format is project-specific.
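To make "hybrid score (embeddings + BM25-style lexical)" concrete, here is a toy blend of cosine similarity and token overlap. The 0.7/0.3 weights and the overlap measure are placeholders, not the shipped ranking formula:

```typescript
// Cosine similarity between two embedding vectors.
const cosine = (a: number[], b: number[]): number => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
};

// Fraction of query tokens that appear in the document (crude stand-in for
// a BM25-style lexical score).
const lexicalOverlap = (query: string, doc: string): number => {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const d = new Set(doc.toLowerCase().split(/\W+/).filter(Boolean));
  let hit = 0;
  for (const t of q) if (d.has(t)) hit++;
  return q.size ? hit / q.size : 0;
};

// Weighted blend; weights are illustrative assumptions.
const hybridScore = (qVec: number[], dVec: number[], qText: string, dText: string) =>
  0.7 * cosine(qVec, dVec) + 0.3 * lexicalOverlap(qText, dText);
```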

## npm scripts

| Script | Command | Purpose |
| --- | --- | --- |
| `build` | `tsc` | Compile `src/` to `dist/`. |
| `mcp` | `node … dist/mcp/index.js` | Run the MCP stdio server. |
| `smoke` | `node … scripts/smoke-mcp.mjs` | End-to-end MCP smoke test against the repo root. |
| `metrics` | `node … scripts/metrics.mjs` | Human-readable or `--json` report: store, pack usage, links, RAG, audit. Optional `--root /path/to/workspace`. |

## Optional git post-commit hook

Install into another git repo:

```bash
./scripts/install-git-hook.sh /path/to/target/repo
```

Uses MEMORY_TOKEN_CLI_ROOT if the memory-token repo is not next to the script. On each commit, runs dist/hook-git-commit.js with MEMORY_TOKEN_WORKSPACE set to the target repo root (see scripts/install-git-hook.sh).

## VS Code and Kiro

Stock VS Code has no Cursor-style sessionStart / postToolUse hooks. Options: run `node dist/hook-session-start.js` with `MEMORY_TOKEN_WORKSPACE` set, wire it into a task, or use an extension that exposes similar lifecycle events.

Kiro IDE: full install (MCP, steering, hooks, spec-task ideas, Cursor parity table) is in KIRO_SETUP.md. Example hook YAML lives under .kiro/hooks/.

## Agent workflow

  1. Read .cursor/skills/memory-token/SKILL.md (injected path on session start).
  2. Call memory_token_get_context_pack early; use memory_token_rag_query before reading many files or huge pasted context.
  3. Treat RAG snippets as non-verbatim until memory_token_rag_get_full.
  4. Persist stable facts with propose → confirm; link related memories when useful; prune junk or stale proposals.
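The propose → confirm gate in step 4 can be modeled as a tiny status machine; statuses mirror the store table above, but these functions are illustrative, not the MCP tool API:

```typescript
// Toy model of the propose → confirm lifecycle with an audit trail.
// Not the store.json implementation; shapes are assumptions.
type Status = "proposed" | "confirmed" | "rejected";
interface Memory { id: string; body: string; status: Status; audit: string[]; }

function propose(id: string, body: string): Memory {
  return { id, body, status: "proposed", audit: [`proposed:${id}`] };
}

function confirm(m: Memory, reason: string): Memory {
  // Only proposals can be promoted; nothing gets in by accident.
  if (m.status !== "proposed") throw new Error("only proposals can be confirmed");
  return { ...m, status: "confirmed", audit: [...m.audit, `confirmed:${reason}`] };
}
```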

## License

MIT © 2026 barbozaa
