claude-rag

Local RAG over your Claude Code conversation history, exposed as an MCP server so Claude can recall past sessions inside the IDE.

The transcripts Claude Code writes to ~/.claude/projects/**/*.jsonl get parsed, chunked, embedded, and indexed into a local Chroma database. An MCP server then lets Claude query the index — solving the "Claude forgets last week's work" problem without sending anything to a cloud service.

Requirements

macOS or Linux
Python 3.11+
uv (recommended) or pip
~250 MB free disk: ~50 MB for the index after a few months of use, ~220 MB for the embedding model cache

Everything runs locally. No API keys.

Install

git clone <this-repo> ~/dev/claude-rag   # or anywhere
cd ~/dev/claude-rag
uv venv
uv pip install -e .

This creates .venv/ and installs two console entry points:

claude-rag-index — (re)builds the index
claude-rag-server — the MCP server

First-time index

.venv/bin/claude-rag-index

The first run downloads the embedding model (sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2, ~220 MB, cached in ~/.cache/fastembed/) and embeds every user/assistant turn under ~/.claude/projects/. On a typical machine with ~6 months of history, expect 5–10 minutes on CPU.

Subsequent runs are incremental — only changed JSONL files get re-embedded. State is tracked in ~/.claude-rag/state.json.

Force a full rebuild with --force.

Register with Claude Code

.venv/bin/python scripts/register_mcp.py

This writes an mcpServers.claude-rag entry into ~/.claude.json (a backup is left at ~/.claude.json.bak-claude-rag). It auto-detects the server binary in this order:

$CLAUDE_RAG_SERVER_BIN if set
<repo>/.venv/bin/claude-rag-server
claude-rag-server on $PATH

Restart Claude Code (or run /mcp and reconnect) to pick up the new server.

Usage in the IDE

Once registered, Claude has three new tools:

Tool	What it does
`search_conversations(query, project?, limit?)`	Semantic search across all indexed turns
`get_session_window(session_id, around_timestamp?, radius?)`	Expand context around a hit
`reindex(force?)`	Trigger an incremental reindex from inside the IDE

Just ask Claude things like "do you remember what we decided about X" or "回忆一下我之前是怎么处理 Y 的" — it should call search_conversations automatically.

CLI usage

The same store can be queried directly without an MCP client:

.venv/bin/python -c "
from claude_rag.store import Store
for h in Store().query('your query here', limit=5):
    print(h['metadata']['timestamp'][:19], h['metadata']['project'], '-', h['text'][:120])
"

Configuration

Env var	Default	Purpose
`CLAUDE_RAG_DATA`	`~/.claude-rag`	Where the Chroma DB and `state.json` live
`CLAUDE_RAG_SERVER_BIN`	(auto-detect)	Override path to `claude-rag-server` for `register_mcp.py`
`CLAUDE_RAG_LOG`	`WARNING`	Server log level

To change chunking or the embedding model, edit src/claude_rag/config.py.

Where data lives

Path	Contents	Sensitive?
`~/.claude/projects/`	Source: Claude Code's own JSONL transcripts	yes
`~/.claude-rag/chroma/`	Vector DB with full chunk text	yes — never commit or sync
`~/.claude-rag/state.json`	Per-file mtime/size for incremental indexing	low
`~/.cache/fastembed/`	ONNX embedding model	no

The repo's .gitignore already covers .chroma/ and data/ as belt-and-suspenders, but by default the data lives outside the repo entirely.

Architecture

~/.claude/projects/**/*.jsonl
        |
        v
   jsonl.py        strips IDE/system-reminder wrappers,
   (parse +        skips queue/snapshot/tool rows,
    chunk)         char-based chunks (1500/200 overlap)
        |
        v
   store.py        fastembed multilingual MiniLM (384 dim)
   (embed)         -> Chroma PersistentClient (cosine)
        |
        v
   server.py       MCP stdio server: 3 tools
                   <- Claude Code (in-IDE)

About 350 lines of Python total.

Troubleshooting

claude-rag doesn't show up in /mcp: confirm the entry exists in ~/.claude.json under mcpServers.claude-rag and that the command path points to an existing executable. Restart Claude Code.

Model ... is not supported: fastembed's supported model list changes between versions. Run python -c "from fastembed import TextEmbedding; print(*[m['model'] for m in TextEmbedding.list_supported_models()], sep='\n')" to see what's available, then update EMBEDDING_MODEL in config.py.

Search recall is poor for very short queries: the multilingual MiniLM model is decent but not state of the art. For better quality, swap in intfloat/multilingual-e5-large (2.2 GB, 1024 dim) — bigger model, better recall, slower index. Edit EMBEDDING_MODEL and run claude-rag-index --force.

Reset everything: rm -rf ~/.claude-rag/, then re-run claude-rag-index.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
scripts		scripts
src/claude_rag		src/claude_rag
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

claude-rag

Requirements

Install

First-time index

Register with Claude Code

Usage in the IDE

CLI usage

Configuration

Where data lives

Architecture

Troubleshooting

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

claude-rag

Requirements

Install

First-time index

Register with Claude Code

Usage in the IDE

CLI usage

Configuration

Where data lives

Architecture

Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages