Run LLM-powered agents in a REPL loop, benchmark them, and compare results.
RLM Code implements the Recursive Language Models (RLM) approach from the 2025 paper. Instead of stuffing your entire document into the LLM's context window, RLM stores it as a Python variable and lets the LLM write code to analyze it, chunk by chunk, iteration by iteration. This is dramatically more token-efficient for large inputs.
RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.
```
uv tool install "rlm-code[tui,llm-all]"
```

This installs rlm-code as a globally available command with its own isolated environment. You get the TUI and all LLM provider clients (OpenAI, Anthropic, Gemini).
Requirements:
- Python 3.11+
- uv (recommended) or pip
- one model route (BYOK API key or local server like Ollama)
- one secure execution backend (Docker recommended; Monty optional)
Don't have uv? Install it first:
```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Alternative: install with pip

```
pip install rlm-code[tui,llm-all]
```

Then launch the TUI from a project directory:

```
mkdir -p ~/my-project && cd ~/my-project
rlm-code
```

This opens the terminal UI. You'll see a chat input at the bottom and tabs across the top.
Type one of these in the chat input:
/connect anthropic claude-opus-4-6
or
/connect openai gpt-5.3-codex
or
/connect gemini gemini-2.5-flash
or for a free local model via Ollama:
/connect ollama llama3.2
You need the matching API key in your environment (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`) or in a `.env` file in your project directory. Ollama needs no key, just a running Ollama server.
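If you are unsure whether a key is visible to the process, a quick stand-alone check like the one below can help. This is plain Python, not an rlm-code API; it only looks for the variable names listed above, in the environment or in a local `.env` file.

```python
# Optional sanity check before launching the TUI (plain Python, not part of rlm-code).
import os
from pathlib import Path

def has_key(name: str) -> bool:
    """True if the key is set in the environment or appears in a local .env file."""
    if os.environ.get(name):
        return True
    env_file = Path(".env")
    if env_file.exists():
        return any(line.strip().startswith(f"{name}=") for line in env_file.read_text().splitlines())
    return False

for key in ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY"):
    print(f"{key}: {'found' if has_key(key) else 'missing'}")
```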
Prefer the interactive path? Run /connect with no arguments instead. Check it worked:
/status
/rlm run "Write a Python function that finds the longest common subsequence of two strings"
This starts the RLM loop: the LLM writes code in a sandboxed REPL, executes it, sees the output, writes more code, and iterates until it calls FINAL(answer) with the result.
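Conceptually, the loop has the shape sketched below. This is an illustration of the control flow only, not RLM Code's actual implementation; `generate_code` and `execute_in_sandbox` are hypothetical callables standing in for the model call and the sandboxed REPL.

```python
# Illustrative shape of an RLM-style loop (not rlm-code's real code).
from typing import Callable

def rlm_loop(
    generate_code: Callable[[str], str],       # LLM: transcript so far -> next Python snippet
    execute_in_sandbox: Callable[[str], str],  # sandbox: snippet -> captured output
    task: str,
    max_steps: int = 8,
) -> str | None:
    history = f"Task: {task}"
    for _ in range(max_steps):
        code = generate_code(history)
        output = execute_in_sandbox(code)
        # The model signals completion by calling FINAL(...) in its code;
        # here we just look for that marker in the captured output.
        if output.startswith("FINAL:"):
            return output.removeprefix("FINAL:").strip()
        history += f"\n>>> {code}\n{output}"
    return None  # step budget exhausted without a FINAL answer
```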
Benchmarks let you measure how well a model performs on a set of tasks:
/rlm bench preset=pure_rlm_smoke
This runs 3 test cases through the RLM loop and scores the results.
See all available benchmarks:
/rlm bench list
Use the Research tab (Ctrl+5) for live benchmark and trajectory views.
After at least two benchmark runs, export a compare report:
/rlm bench report candidate=latest baseline=previous format=markdown
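As a mental model, a compare report boils down to per-metric deltas between two runs. The toy calculation below is purely illustrative; the field names and numbers are made up and do not reflect the report's actual schema or any real results.

```python
# Toy illustration of "candidate vs baseline": a per-metric delta between two runs.
baseline = {"pass_rate": 0.67, "avg_tokens": 5400}   # made-up numbers
candidate = {"pass_rate": 1.00, "avg_tokens": 4900}  # made-up numbers

for metric in baseline:
    delta = candidate[metric] - baseline[metric]
    print(f"{metric}: {baseline[metric]} -> {candidate[metric]} ({delta:+.2f})")
```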
/rlm status
/rlm replay <run_id>
Walk through the last run one step at a time and see what code the LLM wrote, what output it got, and what it did next.
RLM Code can also be used as a coding-agent harness in the TUI, much like Claude Code or Codex. It provides a minimal harness that steers the model to write code.
/harness tools
/harness run "fix failing tests and add regression test" steps=8 mcp=on
ACP is supported too:
/connect acp
/harness run "implement feature X with tests" steps=8 mcp=on
Notes:
- In Local/BYOK connection modes, chat prompts that look like coding tasks can auto-route to the harness.
- In ACP mode, auto-routing is intentionally off; use /harness run ... explicitly.
Traditional LLM usage: paste your document into the prompt, ask a question, hope the model doesn't lose details in the middle.
RLM approach:
- Your document is stored as a Python variable `context` in a REPL
- The LLM writes code to process it (e.g., `len(context)`, `context[:5000]`, `context.split('\n')`)
- The code runs, and the LLM sees the output
- The LLM writes more code based on what it learned
- Repeat until the LLM calls `FINAL("here is my answer")`
This means the LLM can handle documents much larger than its context window, because it reads them in chunks through code rather than all at once through the prompt.
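To make that concrete, here is the flavor of code a model might write inside the REPL against `context`. This is a stand-alone sketch: `context` is synthetic data and `FINAL` is a stub so the snippet runs on its own; inside RLM Code's REPL both are provided.

```python
# The kind of exploration code an RLM step might produce (illustrative only).
# `context` and FINAL are stubbed here so the sketch is self-contained.
context = "\n".join(
    f"line {i}: revenue grew in Q{i % 4 + 1}" if i % 7 == 0 else f"line {i}: routine log entry"
    for i in range(10_000)
)

def FINAL(answer: str) -> None:
    print("FINAL:", answer)

# Step 1: get a feel for the size and shape of the input.
print(len(context), context[:200])

# Step 2: scan in slices instead of loading everything into the prompt.
lines = context.split("\n")
chunk = 1_000
hits = []
for start in range(0, len(lines), chunk):
    batch = lines[start:start + chunk]
    hits += [ln for ln in batch if "revenue" in ln]

# Step 3: answer from the evidence gathered so far.
FINAL(f"Found {len(hits)} lines mentioning revenue; first: {hits[0]!r}")
```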
RLM Code is:
- a research playground for recursive/model-assisted coding workflows
- a benchmarking and replay tool for reproducible experiments
RLM Code is not:
- a no-config consumer chat app
- guaranteed cheap (recursive runs can be expensive)
- safe to run with unrestricted execution settings
Use secure backend defaults (/sandbox profile secure) for normal use.
| Command | What it does |
|---|---|
| `/connect <provider> <model>` | Connect to an LLM |
| `/model` | Interactive model picker |
| `/status` | Show connection status |
| `/sandbox profile secure` | Apply secure sandbox defaults (Docker-first + strict pure RLM) |
| `/rlm run "<task>"` | Run a task through the RLM loop |
| `/rlm bench preset=<name>` | Run a benchmark preset |
| `/rlm bench list` | List available benchmarks |
| `/rlm bench compare` | Compare latest benchmark run with previous run |
| `/rlm abort [run_id\|all]` | Cancel active run(s) cooperatively |
| `/harness run "<task>"` | Run tool-using coding harness loop |
| `/rlm replay` | Step through the last run |
| `/rlm chat "<question>"` | Ask the LLM a question about your project |
| `/help` | Show all available commands |
Start bounded:
/rlm run "small scoped task" steps=4 timeout=30 budget=60
For benchmarks, start with small limits:
/rlm bench preset=dspy_quick limit=1
If a run is getting out of hand:
/rlm abort all
- Analyze large documents: Feed in a 500-page PDF and ask questions; the LLM reads it in chunks via code
- Compare models: Run the same benchmark with different providers and see who scores higher
- Compare paradigms: Test Pure RLM vs CodeAct vs Traditional approaches on the same task
- Debug agent behavior: Replay any run step-by-step to see exactly what the agent did
- Track experiments: Every run is logged with metrics, tokens used, and trajectory
| Provider | Latest Models | Setup |
|---|---|---|
| Anthropic | claude-opus-4-6, claude-sonnet-4-5-20250929 | `ANTHROPIC_API_KEY` env var |
| OpenAI | gpt-5.3-codex, gpt-5.2-pro | `OPENAI_API_KEY` env var |
| Gemini | gemini-2.5-pro, gemini-2.5-flash | `GEMINI_API_KEY` or `GOOGLE_API_KEY` env var |
| Ollama | llama3.2, qwen2.5-coder:7b | Running Ollama server at localhost:11434 |
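For the local route, you can confirm an Ollama server is listening on the default port before connecting. This is a stdlib-only Python check against the base URL, not an rlm-code API; 11434 is the default port from the table above.

```python
# Quick reachability check for a local Ollama server (default port 11434).
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:11434", timeout=2) as resp:
        print("Ollama reachable, HTTP status:", resp.status)
except OSError as exc:
    print("Ollama not reachable:", exc)
```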
Create an `rlm_config.yaml` in your project directory to customize settings:

```yaml
name: my-project
models:
  openai_api_key: null
  openai_model: gpt-5.3-codex
  default_model: gpt-5.3-codex
sandbox:
  runtime: docker
  superbox_profile: secure
  superbox_auto_fallback: true
  superbox_fallback_runtimes: [docker, daytona, e2b]
  pure_rlm_backend: docker
  pure_rlm_strict: true
  pure_rlm_allow_unsafe_exec: false
rlm:
  default_benchmark_preset: dspy_quick
  benchmark_pack_paths: []
```

Or generate a full sample config:
/init
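If you want to inspect your settings programmatically, the file is ordinary YAML. A minimal sketch, assuming PyYAML is installed (`pip install pyyaml`) and using the key names from the sample above:

```python
# Read rlm_config.yaml back as plain YAML (assumes PyYAML is available).
import yaml

with open("rlm_config.yaml") as fh:
    cfg = yaml.safe_load(fh)

# Key names follow the sample config shown above.
print("default model:", cfg.get("models", {}).get("default_model"))
print("sandbox runtime:", cfg.get("sandbox", {}).get("runtime"))
```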
```
git clone https://github.com/SuperagenticAI/rlm-code.git
cd rlm-code
uv sync --all-extras
uv run pytest
```

Project layout:

```
rlm_code/
    rlm/      # Core RLM engine (runner, environments, policies)
    ui/       # Terminal UI (Textual-based TUI)
    mcp/      # MCP server for tool integration
    models/   # LLM provider adapters
    sandbox/  # Sandboxed code execution
    harness/  # Tool-using coding harness (/harness)
```
Full docs: https://superagenticai.github.io/rlm-code/
See CONTRIBUTING.md.
Apache-2.0
Brought to you by Superagentic AI
