An MCP server that turns Claude Code into a Recursive Language Model (RLM) for processing long contexts.
Instead of calling external APIs, fast-rlm provides a Python REPL, context chunking, and session management — letting Claude drive recursive decomposition loops using its own inference quota at zero additional API cost.
- Load context — feed any long text/data into a session
- Chunk — split it into manageable pieces (by size or separator)
- Process — Claude reads each chunk, runs Python code to analyze it, and stores findings
- Aggregate — combine results across chunks into a final answer
Claude acts as both the primary agent and sub-agent, recursively decomposing complex tasks.
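
The chunking step above, including re-chunking pieces that are still too large, could look roughly like the following. This is a minimal sketch and not fast-rlm's actual implementation: `chunk_text`, `chunk_recursive`, and their parameters are hypothetical names chosen for illustration.

```python
from typing import List, Optional


def chunk_text(text: str, size: int = 2000, separator: Optional[str] = None) -> List[str]:
    """Split by separator if one is given, otherwise into fixed-size character windows."""
    if separator is not None:
        # Separators are dropped; empty pieces from repeated separators are skipped.
        return [part for part in text.split(separator) if part]
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_recursive(text: str, max_chars: int, separators: List[str]) -> List[str]:
    """Split on coarse separators first, then re-chunk any piece still over max_chars."""
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left to try: fall back to fixed-size windows.
        return chunk_text(text, size=max_chars)
    head, rest = separators[0], separators[1:]
    pieces: List[str] = []
    for part in chunk_text(text, separator=head):
        pieces.extend(chunk_recursive(part, max_chars, rest))
    return pieces


if __name__ == "__main__":
    doc = "alpha beta\n\ngamma " + "x" * 30 + "\n\ndelta"
    for piece in chunk_recursive(doc, max_chars=12, separators=["\n\n", " "]):
        print(len(piece), repr(piece))
```

Splitting on paragraph breaks before falling back to character windows keeps related text together where possible. Only oversized pieces get cut at arbitrary positions.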
| Tool | Description |
|---|---|
| `rlm_start` | Initialize a session with context |
| `rlm_chunk` | Split context into chunks (by char count or separator) |
| `rlm_read_chunk` | Read a specific chunk |
| `rlm_exec` | Execute Python in the persistent REPL |
| `rlm_store` | Store intermediate findings |
| `rlm_finish` | Finalize session and return all results |
| `rlm_system_prompt` | Get the RLM workflow instructions |

- Python 3.10+
- `uv` (recommended) or `pip`

```bash
claude mcp add fast-rlm \
  -- uv run --with "mcp[cli]" --python 3.14 \
  /path/to/fast-rlm/server.py
```

Or add to `~/.claude.json` manually:

```json
{
  "mcpServers": {
    "fast-rlm": {
      "type": "stdio",
      "command": "uv",
      "args": [
        "run", "--with", "mcp[cli]", "--python", "3.14",
        "/path/to/fast-rlm/server.py"
      ]
    }
  }
}
```

Tell Claude Code to "use fast-rlm" and give it a long-context task. Example:

```
use fast-rlm. summarize this 200-page document: <paste>
```

Claude will automatically:
- Call `rlm_start()` to load the context
- Call `rlm_chunk()` to split it
- Process each chunk with `rlm_read_chunk()` + `rlm_exec()`
- Store findings with `rlm_store()`
- Aggregate and return via `rlm_finish()`

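
To make the sequence above concrete, here is an in-memory stand-in for the session lifecycle. It is a sketch under assumptions: the `Session` class, its method signatures, and the `count_errors` helper are hypothetical. They mirror the tool names, but not fast-rlm's real parameters or return values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    """Hypothetical stand-in for one session; not fast-rlm's actual code."""
    context: str
    chunks: List[str] = field(default_factory=list)
    findings: Dict[str, int] = field(default_factory=dict)

    def chunk(self, lines_per_chunk: int) -> int:
        # Group whole lines so that no token is cut in half.
        lines = self.context.splitlines()
        self.chunks = [
            "\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)
        ]
        return len(self.chunks)

    def read_chunk(self, index: int) -> str:
        return self.chunks[index]

    def store(self, key: str, value: int) -> None:
        self.findings[key] = value

    def finish(self) -> Dict[str, int]:
        return dict(self.findings)


def count_errors(log: str, lines_per_chunk: int) -> Dict[str, int]:
    session = Session(context=log)                           # rlm_start
    n_chunks = session.chunk(lines_per_chunk)                # rlm_chunk
    for i in range(n_chunks):
        text = session.read_chunk(i)                         # rlm_read_chunk
        session.store(f"chunk_{i}", text.count("ERROR"))     # rlm_exec + rlm_store
    session.store("total", sum(session.findings.values()))   # aggregate
    return session.finish()                                  # rlm_finish


if __name__ == "__main__":
    log = "INFO boot\nERROR disk\nERROR net\nINFO ok\nINFO idle\nERROR db"
    print(count_errors(log, lines_per_chunk=3))
```

In the real workflow Claude performs the per-chunk analysis itself. The `text.count("ERROR")` line only marks where that step would happen.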
The rlm_exec tool provides a persistent Python environment (like Jupyter). Variables persist across calls:

```python
# First call
rlm_exec("items = [line for line in context.split('\\n') if 'ERROR' in line]")
# Second call: `items` is still available
rlm_exec("print(f'Found {len(items)} errors')")
```

- Zero external cost: all inference uses Claude's own quota
- Recursive: chunks can be re-chunked if still too large
- Parallel: process multiple chunks concurrently
- Stateful: REPL persists variables across calls within a session
- Lightweight: single file, no dependencies beyond `mcp`

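
The persistent REPL behavior described above can be approximated with a namespace dictionary that outlives each call. This is a minimal sketch, not the code in `server.py`: `PersistentRepl` and its `run` method are hypothetical names. Pre-loading a `context` variable is an assumption based on the `rlm_exec` example, and plain `exec` provides no sandboxing.

```python
import contextlib
import io
import traceback
from typing import Any, Dict


class PersistentRepl:
    """Minimal exec-based REPL whose namespace survives between calls."""

    def __init__(self, context: str = "") -> None:
        # Assumption: the loaded text is exposed to user code as `context`.
        self.namespace: Dict[str, Any] = {"context": context}

    def run(self, code: str) -> str:
        """Execute code in the shared namespace and return captured stdout."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            try:
                exec(code, self.namespace)
            except Exception:
                # Return the traceback as text instead of crashing the caller.
                buffer.write(traceback.format_exc())
        return buffer.getvalue()


if __name__ == "__main__":
    repl = PersistentRepl(context="ok\nERROR disk full\nok\nERROR timeout")
    repl.run("items = [line for line in context.split('\\n') if 'ERROR' in line]")
    print(repl.run("print(f'Found {len(items)} errors')"), end="")
```

Because both calls share `self.namespace`, `items` defined in the first call is still visible in the second, which is the Jupyter-like behavior the example relies on.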
License: MIT