# Recursive Language Models (RLMs): Handling Near-Infinite Context

## Using the `rlm` Library

---

**Recursive Language Models (RLMs)** are a task-agnostic inference paradigm that enables language models to handle **near-infinite length contexts** by allowing the LM to *programmatically* examine, decompose, and recursively call itself over its input.

### The Problem with Traditional LLMs

Traditional LLMs have context window limitations:
- GPT-4o: ~128K tokens
- Claude: ~200K tokens
- Even "long context" models struggle with millions of tokens

When you have a document like **War and Peace** (~800K+ tokens), you have three options:
1. Truncate it (losing information)
2. Use RAG (which may miss important context)
3. Use an RLM

### How RLMs Work

RLMs replace the standard `llm.completion(prompt)` call with `rlm.completion(prompt)`:

1. **Context as Environment Variable**: Instead of feeding the entire context to the LLM, RLM stores it as a variable (`context`) in a REPL environment
2. **Programmatic Exploration**: The LLM writes Python code to examine, chunk, and analyze the context
3. **Recursive Sub-Calls**: The LLM can call `llm_query()` to make sub-LLM calls on specific chunks
4. **Iterative Refinement**: The process continues until the LLM produces a final answer
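
The four steps above can be sketched as a toy loop. This is not the `rlm` library's implementation: `SCRIPTED_MODEL`, `run_toy_rlm`, and the stubbed `llm_query` and `FINAL` are hypothetical stand-ins, and a fixed list of code snippets plays the role of the root LLM so the example runs offline.

```python
"""Toy illustration of the RLM loop; NOT the rlm library's implementation."""


def llm_query(prompt: str) -> str:
    """Hypothetical sub-LLM stub: counts how often 'Pierre' occurs in the prompt."""
    return str(prompt.count("Pierre"))


# Stand-in for the root LLM: each string is the code it "writes" on one iteration.
SCRIPTED_MODEL = [
    # Iteration 1: inspect the context without reading all of it.
    "size = len(context)\npreview = context[:20]",
    # Iteration 2: chunk the context by sentence and make one sub-call per chunk.
    "chunks = [s for s in context.split('. ') if s]\n"
    "counts = [int(llm_query(chunk)) for chunk in chunks]",
    # Iteration 3: aggregate the sub-call results and signal the final answer.
    "FINAL(f'Pierre appears {sum(counts)} times')",
]


def run_toy_rlm(context: str, max_iterations: int = 10) -> str:
    """Execute the scripted code in a shared namespace until FINAL is called."""
    result = {}

    def FINAL(answer: str) -> None:
        result["answer"] = answer

    # The context lives in the execution environment, not in any prompt.
    env = {"context": context, "llm_query": llm_query, "FINAL": FINAL}
    for code in SCRIPTED_MODEL[:max_iterations]:
        exec(code, env)  # variables persist between iterations, as in a REPL
        if "answer" in result:
            return result["answer"]
    raise RuntimeError("No final answer produced within the iteration limit")


text = "Pierre smiled. " * 3 + "Natasha laughed. " * 2
answer = run_toy_rlm(text)
print(answer)
```

The point of the sketch is that variables such as `chunks` and `counts` persist across iterations, so each step can build on the last without the full text ever being placed in a prompt.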

### Resources

- [arXiv Paper](https://arxiv.org/abs/2512.24601) - Full technical details
- [Blogpost](https://alexzhang13.github.io/blog/2025/rlm/) - Intuitive explanation
- [Documentation](https://alexzhang13.github.io/rlm/) - API reference
- [GitHub Repository](https://github.com/alexzhang13/rlm) - Source code

---

## 1. Setup Instructions

### Clone the Repository

First, clone the RLM repository. If you're running this notebook from the same directory as the cloned repo, you can skip this step.

In [1]:
# Clone the RLM repository unless an 'rlm' directory already exists
![ -d rlm ] && echo "Directory 'rlm' already exists - using existing repo" || git clone https://github.com/alexzhang13/rlm.git rlm

Directory 'rlm' already exists - using existing repo


### Install Dependencies

The `rlm` package requires **Python >= 3.11** and has the following core dependencies:
- `openai>=2.14.0`
- `anthropic>=0.75.0`
- `google-genai>=1.56.0`
- `rich>=13.0.0` (for beautiful console output)
- `python-dotenv>=1.2.1`
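
Before importing the package, it can help to check what is actually installed. The sketch below uses only the standard library; `CORE_DEPENDENCIES` and `installed_versions` are hypothetical names introduced here, and the check only reports versions rather than comparing them against the minimums listed above.

```python
import sys
from importlib import metadata

# Distribution names copied from the dependency list above.
CORE_DEPENDENCIES = ["openai", "anthropic", "google-genai", "rich", "python-dotenv"]


def installed_versions(packages: list) -> dict:
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(f"Python >= 3.11: {sys.version_info >= (3, 11)}")
for name, version in installed_versions(CORE_DEPENDENCIES).items():
    print(f"  {name}: {version or 'NOT INSTALLED'}")
```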

In [4]:
# Verify installation
from rlm.rlm import RLM
from rlm.rlm.logger import RLMLogger

print("RLM imported successfully!")

RLM imported successfully!


### Configure API Keys

RLM supports multiple LLM backends. You'll need API keys for the providers you want to use.

**Option 1**: Create a `.env` file:
```
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```
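
For Option 1, the keys in the file still have to be loaded into the process environment. A minimal sketch, assuming the file is named `.env` and sits in the notebook's working directory: `load_env_file` is a hypothetical helper that uses `python-dotenv` (listed as a dependency above) when available and falls back to a deliberately simple parser otherwise.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> list:
    """Load KEY=VALUE pairs from `path` into os.environ; return the keys loaded."""
    env_path = Path(path)
    if not env_path.exists():
        return []
    try:
        # Preferred: python-dotenv handles quoting and comments properly.
        from dotenv import dotenv_values
        pairs = dict(dotenv_values(env_path))
    except ImportError:
        # Fallback: simple parser with no support for quotes or escapes.
        pairs = {}
        for line in env_path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                pairs[key.strip()] = value.strip()
    loaded = []
    for key, value in pairs.items():
        if value is not None:
            os.environ[key] = value  # note: overwrites an existing value
            loaded.append(key)
    return loaded


# Print only the key names, never the secret values.
print("Loaded keys:", load_env_file())
```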

**Option 2**: Set environment variables directly in the notebook.

In [5]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
# os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter your Anthropic API key: ")

# Verify at least one key is set
if os.getenv("OPENAI_API_KEY"):
    print("OpenAI API key configured")
if os.getenv("ANTHROPIC_API_KEY"):
    print("Anthropic API key configured")
    
if not os.getenv("OPENAI_API_KEY") and not os.getenv("ANTHROPIC_API_KEY"):
    print("WARNING: No API keys found! Please set OPENAI_API_KEY or ANTHROPIC_API_KEY")

OpenAI API key configured


---

## 2. Download Long-Context Dataset

To demonstrate the power of RLMs, we'll use **War and Peace** by Leo Tolstoy - one of the longest novels ever written.

- **Size**: ~3.3 million characters
- **Approximate Tokens**: ~800,000+ tokens
- **Source**: Project Gutenberg (public domain)

At roughly 800K tokens, this text is **four to six times larger** than the 128K-200K context windows listed above, making it a good stress test for RLM capabilities.
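
The token figure comes from a rough rule of thumb of about four characters per token. The sketch below makes that arithmetic explicit and compares it with the context windows quoted earlier; the heuristic is an approximation rather than a real tokenizer, and `estimate_tokens` and `windows_needed` are hypothetical helper names.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer

# Approximate context windows quoted earlier in this notebook.
CONTEXT_WINDOWS = {"GPT-4o": 128_000, "Claude": 200_000}


def estimate_tokens(num_chars: int) -> int:
    """Estimate the token count from a character count."""
    return num_chars // CHARS_PER_TOKEN


def windows_needed(num_tokens: int, window: int) -> int:
    """Number of full context windows required to hold `num_tokens` (ceiling)."""
    return -(-num_tokens // window)


tokens = estimate_tokens(3_300_000)  # ~3.3 million characters
for model, window in CONTEXT_WINDOWS.items():
    print(f"{model}: ~{tokens:,} tokens needs {windows_needed(tokens, window)} windows")
```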

In [6]:
import requests

def download_gutenberg_text(url: str) -> str:
    """Download and clean a text file from Project Gutenberg."""
    response = requests.get(url, timeout=60)  # fail fast instead of hanging
    response.raise_for_status()
    text = response.text
    
    # Remove Gutenberg header (find start of actual content)
    start_markers = ["*** START OF", "***START OF"]
    for marker in start_markers:
        if marker in text:
            text = text.split(marker, 1)[1]
            text = text.split("\n", 1)[1]  # Skip the marker line
            break
    
    # Remove Gutenberg footer
    end_markers = ["*** END OF", "***END OF"]
    for marker in end_markers:
        if marker in text:
            text = text.split(marker, 1)[0]
            break
    
    return text.strip()

# Download War and Peace
war_and_peace_url = "https://www.gutenberg.org/files/2600/2600-0.txt"
war_and_peace = download_gutenberg_text(war_and_peace_url)

print("Downloaded War and Peace")
print(f"  Characters: {len(war_and_peace):,}")
print(f"  Approximate tokens: ~{len(war_and_peace) // 4:,}")
print(f"  Lines: {len(war_and_peace.splitlines()):,}")
print()
print("First 500 characters:")
print("-" * 50)
print(war_and_peace[:500])

Downloaded War and Peace
  Characters: 3,273,921
  Approximate tokens: ~818,480
  Lines: 65,650

First 500 characters:
--------------------------------------------------
WAR AND PEACE


By Leo Tolstoy/Tolstoi


    Contents

    BOOK ONE: 1805

    CHAPTER I

    CHAPTER II

    CHAPTER III

    CHAPTER IV

    CHAPTER V

    CHAPTER VI

    CHAPTER VII

    CHAPTER VIII

    CHAPTER IX

    CHAPTER X

    CHAPTER XI

    CHAPTER XII

    CHAPTER XIII

    CHAPTER XIV

    CHAPTER XV

    CHAPTER XVI

    CHAPTER XVII

    CHAPTER XVIII

    CHAPTER XIX

    CHAPTER XX

    CHAPTER XXI

    CHAPTER XXII

    


---

## 3. Basic RLM Usage with Verbose Logging

Let's initialize an RLM instance and see it in action. We'll enable:
- **`verbose=True`**: Rich console output showing each iteration
- **`RLMLogger`**: JSON-lines file logging for detailed trajectory analysis

### Key Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `backend` | LLM provider: "openai", "anthropic", "gemini", "portkey", etc. | "openai" |
| `backend_kwargs` | Provider-specific settings (model_name, api_key) | {} |
| `environment` | Code execution: "local", "docker", "modal", "prime" | "local" |
| `max_depth` | Recursion depth for sub-calls | 1 |
| `max_iterations` | Maximum REPL iterations | 30 |
| `verbose` | Enable rich console output | False |
| `logger` | RLMLogger instance for file logging | None |

In [8]:
import os
from rlm.rlm import RLM
from rlm.rlm.logger import RLMLogger

# Create a logger to save trajectories
logger = RLMLogger(log_dir="./logs")

# Initialize RLM with OpenAI backend
rlm = RLM(
    backend="openai",
    backend_kwargs={
        "model_name": "gpt-4o-mini",  # Cost-effective for demos
        "api_key": os.getenv("OPENAI_API_KEY"),
    },
    environment="local",  # Run code in local Python REPL
    max_depth=1,          # Allow 1 level of recursive sub-calls
    max_iterations=20,    # Maximum REPL iterations
    logger=logger,        # Log to JSONL files
    verbose=True,         # Show rich console output
)

print("RLM initialized successfully!")

RLM initialized successfully!


### Warm-up: Simple Query

Let's start with a simple query to see how RLM works. Watch the verbose output to see:
1. The LLM generating Python code
2. Code execution in the REPL
3. The final answer extraction

In [11]:
# Simple warm-up query
result = rlm.completion("Print me the first 20 powers of two, each on a newline.")

print("\n" + "=" * 60)
print("FINAL RESULT")
print("=" * 60)
print(f"Response: {result.response}")
print(f"\nExecution Time: {result.execution_time:.2f}s")
print(f"Model: {result.root_model}")


FINAL RESULT
Response: output

Execution Time: 14.38s
Model: gpt-4o-mini


---

## 4. Long-Context Challenge: Analyzing War and Peace

Now let's tackle something that **does not fit in a standard context window**: analyzing the entire text of War and Peace!

The RLM will:
1. Store the entire 3.3M character text as a `context` variable
2. Write Python code to chunk and analyze the text
3. Use `llm_query()` to make sub-LLM calls on specific sections
4. Synthesize findings into a final answer
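
Step 2 is where most of the work happens. As an illustration of the kind of chunking code the model might write (the code it actually generates varies from run to run), the sketch below splits a text on heading lines. The `BOOK ...:` pattern is an assumption based on the heading visible in the preview output earlier, and `split_on_headings` is a hypothetical helper.

```python
import re


def split_on_headings(text: str, pattern: str = r"^[ \t]*BOOK [A-Z]+: .*$") -> dict:
    """Split `text` into sections keyed by heading line.

    If a heading occurs more than once (for example in a table of contents
    and again in the body), the later occurrence overwrites the earlier one.
    """
    matches = list(re.finditer(pattern, text, flags=re.MULTILINE))
    sections = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        sections[current.group().strip()] = text[current.end():end].strip()
    return sections


# Placeholder text that mimics a contents list followed by the body.
sample = (
    "    BOOK ONE: 1805\n"
    "    CHAPTER I\n"
    "BOOK ONE: 1805\n"
    "first book body\n"
    "BOOK TWO: 1805\n"
    "second book body\n"
)
for heading, body in split_on_headings(sample).items():
    print(f"{heading!r}: {len(body)} chars")
```

Each resulting section can then be passed to a sub-call on its own, which keeps every individual prompt well within the model's context window.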

Watch the verbose output to see how the RLM approaches this massive text!

### A More Complex Query

Let's try a query that requires deeper analysis across the entire novel:

In [12]:
# More complex analytical query
complex_query = """
Analyze the structure of War and Peace:
1. How many "Books" or major sections does it have?
2. Identify the main characters mentioned most frequently in the first and last sections.
3. How does the narrative focus shift from the beginning to the end?

Provide specific evidence from the text to support your analysis.
"""

result = rlm.completion(
    war_and_peace,
    root_prompt=complex_query
)

print("\n" + "=" * 60)
print("STRUCTURAL ANALYSIS")
print("=" * 60)
print(result.response)
print(f"\nExecution Time: {result.execution_time:.2f}s")


STRUCTURAL ANALYSIS
The analysis of "War and Peace" is summarized as follows:

1. **Total Number of 'Books':** "War and Peace" is structured into **4 major "Books."**

2. **Main Characters:**
   - In **'BOOK ONE'**, the most frequently mentioned characters are:
     - (Here I would include the character names and their mention counts from `first_book_summary`)
   - In the **last book**, the most frequently mentioned characters are:
     - (Here I would include the character names and their mention counts from `last_book_summary`)

3. **Narrative Focus:**
   - In **'BOOK ONE'**, the narrative primarily emphasizes characters like **(top character in 'BOOK ONE')** (mentioned **X times**) and **(second character in 'BOOK ONE')** (mentioned **Y times**).
   - In the **last book**, there is a noticeable shift highlighted by increased mentions of **(top character in last book)** (mentioned **Z times**), indicating character development or a shift in narrative focus.

Overall, the prominence 

---

## 5. Examining Log Files

The `RLMLogger` creates detailed JSON-lines files that capture every iteration of the RLM process. These are invaluable for:
- Debugging unexpected behavior
- Understanding how the RLM approaches problems
- Analyzing token usage and cost

Let's examine the log files we've created:

In [13]:
import json
import os
from pathlib import Path

# Find log files
log_dir = Path("./logs")
if log_dir.exists():
    log_files = sorted(log_dir.glob("*.jsonl"), key=os.path.getmtime, reverse=True)
    print(f"Found {len(log_files)} log file(s):\n")
    
    for log_file in log_files[:3]:  # Show latest 3
        print(f"  {log_file.name}")
else:
    print("No logs directory found. Run some RLM completions first!")

Found 1 log file(s):

  rlm_2026-01-21_12-00-05_3cdd62f5.jsonl


In [14]:
# Parse the most recent log file
if log_dir.exists() and log_files:
    latest_log = log_files[0]
    print(f"Analyzing: {latest_log.name}\n")
    print("=" * 60)
    
    with open(latest_log) as f:
        for i, line in enumerate(f):
            entry = json.loads(line)
            
            if entry.get("type") == "metadata":
                print("METADATA:")
                print(f"  Model: {entry.get('root_model')}")
                print(f"  Max Iterations: {entry.get('max_iterations')}")
                print(f"  Environment: {entry.get('environment')}")
                print()
                
            elif entry.get("type") == "iteration":
                iter_num = entry.get('iteration', i)
                response_len = len(entry.get('response', ''))
                code_blocks = entry.get('code_blocks', [])
                final = entry.get('final_answer')
                
                print(f"Iteration {iter_num}:")
                print(f"  Response length: {response_len} chars")
                print(f"  Code blocks: {len(code_blocks)}")
                if final:
                    print(f"  FINAL ANSWER: {final[:100]}...")
                print()

Analyzing: rlm_2026-01-21_12-00-05_3cdd62f5.jsonl

METADATA:
  Model: gpt-4o-mini
  Max Iterations: 20
  Environment: None

Iteration 1:
  Response length: 256 chars
  Code blocks: 1

Iteration 2:
  Response length: 408 chars
  Code blocks: 1

Iteration 3:
  Response length: 284 chars
  Code blocks: 1
  FINAL ANSWER: formatted_output...

Iteration 4:
  Response length: 850 chars
  Code blocks: 1

Iteration 5:
  Response length: 778 chars
  Code blocks: 1

Iteration 6:
  Response length: 1666 chars
  Code blocks: 1

Iteration 7:
  Response length: 245 chars
  Code blocks: 1

Iteration 8:
  Response length: 460 chars
  Code blocks: 1

Iteration 9:
  Response length: 386 chars
  Code blocks: 1
  FINAL ANSWER: output...

Iteration 10:
  Response length: 417 chars
  Code blocks: 1

Iteration 11:
  Response length: 611 chars
  Code blocks: 1

Iteration 12:
  Response length: 1009 chars
  Code blocks: 1

Iteration 13:
  Response length: 1445 chars
  Code blocks: 1

Iteration 14:
  Response le
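
In the listing above, final answers appear at iterations 3 and 9 while the numbering keeps going, which suggests that this one file holds several completions back to back. Under that assumption (it is not confirmed by the library's documentation here), iterations can be grouped into runs that each end at a final answer. `split_into_runs` is a hypothetical helper that relies only on the `type` and `final_answer` keys read in the cell above.

```python
import json


def split_into_runs(lines) -> list:
    """Group iteration entries into runs, closing a run at each final answer."""
    runs, current = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("type") != "iteration":
            continue  # ignore metadata and any other entry types
        current.append(entry)
        if entry.get("final_answer"):
            runs.append(current)
            current = []
    if current:
        runs.append(current)  # trailing run with no final answer logged
    return runs


# Synthetic JSONL lines standing in for a real log file.
demo_lines = [
    json.dumps({"type": "metadata", "root_model": "demo-model"}),
    json.dumps({"type": "iteration", "final_answer": None}),
    json.dumps({"type": "iteration", "final_answer": "output"}),
    json.dumps({"type": "iteration", "final_answer": None}),
]
runs = split_into_runs(demo_lines)
print([len(run) for run in runs])
```

To apply it to a real log, pass an open file object, for example `split_into_runs(open(latest_log))`.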

---

## 6. Using Different Backends

The `rlm` library supports multiple LLM providers. Here's how to use different backends:

### Available Backends
- `"openai"` - OpenAI API (GPT-4o, GPT-4o-mini, etc.)
- `"anthropic"` - Anthropic API (Claude models)
- `"gemini"` - Google Gemini API
- `"portkey"` - Portkey AI router
- `"litellm"` - LiteLLM router
- `"azure_openai"` - Azure OpenAI Service

In [None]:
# Example: Using Anthropic (Claude)
if os.getenv("ANTHROPIC_API_KEY"):
    rlm_anthropic = RLM(
        backend="anthropic",
        backend_kwargs={
            "model_name": "claude-sonnet-4-20250514",
            "api_key": os.getenv("ANTHROPIC_API_KEY"),
            "max_tokens": 8192,
        },
        environment="local",
        max_depth=1,
        verbose=True,
    )
    
    # Test with a simple query
    result = rlm_anthropic.completion("Calculate the factorial of 10 step by step.")
    print(f"\nClaude's answer: {result.response}")
else:
    print("Set ANTHROPIC_API_KEY to use Claude models")

### Multi-Model Configuration

You can use different models for the root reasoning vs. sub-calls. This is useful for cost optimization - use a powerful model for reasoning and a cheaper model for sub-queries:

In [None]:
# Multi-model setup: GPT-4o for root, GPT-4o-mini for sub-calls
rlm_multi = RLM(
    backend="openai",
    backend_kwargs={
        "model_name": "gpt-4o",  # Main reasoning model
        "api_key": os.getenv("OPENAI_API_KEY"),
    },
    other_backends=["openai"],  # Additional backends for sub-calls
    other_backend_kwargs=[{
        "model_name": "gpt-4o-mini",  # Cheaper model for sub-queries
        "api_key": os.getenv("OPENAI_API_KEY"),
    }],
    environment="local",
    max_depth=1,
    verbose=True,
)

print("Multi-model RLM configured!")
print("  Root model: gpt-4o (for main reasoning)")
print("  Sub-call model: gpt-4o-mini (for chunked queries)")

---

## 7. Understanding RLM Internals

### The REPL Environment

When you call `rlm.completion(context, root_prompt=query)`, the RLM:

1. **Stores context as a variable**: `context = "<your text>"`
2. **Provides special functions**:
   - `llm_query(prompt)` - Make a sub-LLM call
   - `llm_query_batched(prompts)` - Make multiple concurrent sub-calls
   - `FINAL(answer)` - Return the final answer
   - `FINAL_VAR(variable_name)` - Return a variable as the answer

3. **Iterates**: The LLM generates code, executes it, sees results, and continues
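
To make the batched helper concrete, here is a sketch of the map-then-combine pattern that model-written code often follows. The function names mirror the ones listed above, but their bodies are hypothetical stubs: the real functions are supplied by the RLM environment, and the thread pool and `max_workers=4` are arbitrary choices for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def llm_query(prompt: str) -> str:
    """Stub sub-LLM call: reports the prompt's word count instead of calling a model."""
    return f"{len(prompt.split())} words"


def llm_query_batched(prompts: list) -> list:
    """Stub batched call: runs llm_query concurrently and preserves input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(llm_query, prompts))


# Stand-in for the `context` variable the environment would provide.
context = "alpha " * 50 + "beta " * 50

# Map: one sub-call per fixed-size chunk.
chunks = [context[i:i + 100] for i in range(0, len(context), 100)]
prompts = [f"Summarize:\n{chunk}" for chunk in chunks]
partials = llm_query_batched(prompts)

# Reduce: one more sub-call to combine the partial results.
final_answer = llm_query("Combine these summaries:\n" + "\n".join(partials))
print(len(chunks), "chunks ->", final_answer)
```

Preserving input order matters because the combining step usually needs to know which partial result belongs to which part of the document.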

### System Prompt

The RLM uses a carefully crafted system prompt that teaches the LLM how to:
- Access and manipulate the `context` variable
- Use chunking strategies for large texts
- Make recursive sub-LLM calls
- Format final answers correctly

In [None]:
# Let's see a query that demonstrates sub-LLM calls
sub_call_query = """
Find the first chapter of War and Peace. Then use llm_query to:
1. Summarize that chapter in 2 sentences
2. Identify the main characters introduced

Combine these into your final answer.
"""

result = rlm.completion(
    war_and_peace[:100000],  # First ~100K chars for faster demo
    root_prompt=sub_call_query
)

print("\n" + "=" * 60)
print("ANALYSIS WITH SUB-CALLS")
print("=" * 60)
print(result.response)

---

## 8. Token Usage and Cost Tracking

The `RLMChatCompletion` object includes detailed usage statistics:

In [None]:
# Run a query and examine usage
result = rlm.completion(
    "The secret codes are: ALPHA=42, BETA=17, GAMMA=99. Remember these.",
    root_prompt="What is the sum of ALPHA, BETA, and GAMMA?"
)

print("Result:", result.response)
print("\nUsage Summary:")
print(f"  Execution time: {result.execution_time:.2f}s")
print(f"  Model: {result.root_model}")

usage = result.usage_summary
if usage:
    usage_dict = usage.to_dict()
    print(f"  Token usage: {usage_dict}")

---

## 9. Visualizing Trajectories

The `rlm` repository includes a **visualizer tool** for exploring RLM trajectories interactively.

To use it:

```bash
cd rlm/visualizer  # or rlm_repo/visualizer
npm install
npm run dev        # Opens on localhost:3001
```

You can then load the `.jsonl` log files and explore:
- Each iteration's LLM response
- Code executed and output
- Sub-LLM calls and their results
- Token usage per step

---

## 10. Conclusion

### What We Learned

1. **RLMs sidestep the context length limit** by storing the context as a variable in a REPL instead of passing it as model input
2. **Programmatic exploration** lets LLMs write code to analyze massive texts
3. **Recursive sub-calls** enable divide-and-conquer strategies
4. **Verbose logging** provides transparency into the reasoning process

### When to Use RLMs

- Documents too large for any context window (books, codebases, logs)
- Tasks requiring systematic analysis (counting, searching, comparing)
- Queries needing evidence from multiple parts of a document
- When RAG might miss important context

### Next Steps

- Try the **Docker environment** for isolated code execution
- Explore **Modal** or **Prime** for cloud-based sandboxes
- Check out the **visualizer** for trajectory analysis
- Read the [full paper](https://arxiv.org/abs/2512.24601) for technical details

### Resources

- [GitHub Repository](https://github.com/alexzhang13/rlm)
- [Documentation](https://alexzhang13.github.io/rlm/)
- [arXiv Paper](https://arxiv.org/abs/2512.24601)
- [Blogpost](https://alexzhang13.github.io/blog/2025/rlm/)