# RLM with Modal Sandbox (DSPy 3.1.3)

This tutorial shows how to use **`dspy.RLM`** (Recursive Language Model) with [Modal](https://modal.com) for secure, sandboxed code execution in the cloud.

**What is RLM?** RLM is an inference strategy where the LLM writes Python code to programmatically explore data, call sub-LLMs over snippets, and iteratively build up answers — instead of feeding long contexts directly into the model.

**Why Modal?** By default, `dspy.RLM` uses a local Deno/Pyodide WASM sandbox. Modal lets you run that code in an isolated cloud container with configurable resources, dependencies, and secrets.

**What we'll do:**
1. Implement a `ModalInterpreter` that satisfies DSPy's `CodeInterpreter` protocol
2. Use `modal.Sandbox` to execute code inside an ephemeral cloud container
3. Run an RLM agent that writes and executes code remotely

## Prerequisites

- **Python 3.10+**
- **Modal account**: Sign up at [modal.com](https://modal.com) and run `modal setup`
- **Modal secret**: Create a secret named `LITELLM` that contains the environment variables used by DSPy/LiteLLM:
  - `DSPY_LM_MODEL` (e.g., `openai/gemini-3-flash-preview`)
  - `DSPY_LM_API_BASE` (your LiteLLM proxy base URL)
  - `DSPY_LLM_API_KEY` (API key for the proxy/provider)
  - optional: `DSPY_LM_MAX_TOKENS`

  Example (run in a terminal):
  ```bash
  modal secret create LITELLM \
    DSPY_LM_MODEL=... \
    DSPY_LM_API_BASE=... \
    DSPY_LLM_API_KEY=... \
    DSPY_LM_MAX_TOKENS=...
  ```

- **Security note**: don’t hard-code API keys in notebooks, and don’t print them. If a key was ever pasted into a notebook/chat, rotate it.

## 1. Install Dependencies

In [3]:
!uv pip install -qU "dspy==3.1.3" modal

## 2. Imports and Configuration

We configure one LM locally for the *planner* (the model that writes Python code each iteration).

This notebook expects the following environment variables to be set **locally** (for the planner):
- `DSPY_LM_MODEL`
- `DSPY_LM_API_BASE`
- `DSPY_LLM_API_KEY`
- optional: `DSPY_LM_MAX_TOKENS`

The same variables are also injected into the Modal sandbox via the `LITELLM` secret, so any sandbox-side LM calls (via tool-bridged `llm_query`) use identical credentials without hard-coding secrets in the notebook.

**Important**: Modal secrets are only available *inside* Modal containers/sandboxes. They do **not** automatically set environment variables for your local notebook kernel.
This notebook will try to load a local `.env` from the project root (if present) to configure the planner LM.

In [4]:
import os
import sys
from pathlib import Path

import dspy


# ---- Locate project root and add src/ to path ----
def _find_project_root(start: Path) -> Path:
    for p in [start, *start.parents]:
        if (p / "pyproject.toml").exists():
            return p
    return start


PROJECT_ROOT = _find_project_root(Path.cwd())
src_path = PROJECT_ROOT / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

# ---- Configure the planner LM using the package ----
from fleet_rlm import configure_planner_from_env  # noqa: E402

PLANNER_READY = configure_planner_from_env(env_file=PROJECT_ROOT / ".env")

if PLANNER_READY:
    print(f"✓ Planner LM configured: {dspy.settings.lm.model}")
    print("  (Tip: never print API keys.)")
else:
    print(
        "⚠ Planner LM not configured. Set DSPY_LM_MODEL and DSPY_LLM_API_KEY\n"
        "  in your .env file or environment, then re-run this cell."
    )

# We'll pass `modal.Secret.from_name('LITELLM')` into the sandbox so the *remote*
# Python REPL can access the same environment variables without hard-coding them.

✓ Planner LM configured: openai/gemini-3-flash-preview
  (Tip: never print API keys.)


### Optional: sanity-check the Modal secret (without leaking it)

The snippet below confirms that the `LITELLM` secret is mounted in Modal by checking for the *presence* of environment variables. It deliberately does **not** print secret values.

In [5]:
import modal

# Sandboxes require an App when created from a local environment.
app = modal.App.lookup("dspy-rlm-secret-check", create_if_missing=True)

sb = modal.Sandbox.create(
    app=app,
    secrets=[modal.Secret.from_name("LITELLM")],
    timeout=60,
)
try:
    code = r"""
import json, os
keys = [
  'DSPY_LM_MODEL',
  'DSPY_LM_API_BASE',
  'DSPY_LLM_API_KEY',
  'DSPY_LM_MAX_TOKENS',
]
print(json.dumps({k: bool(os.environ.get(k)) for k in keys}))
"""
    p = sb.exec("python", "-c", code, timeout=60)
    p.wait()
    print("Secret env presence:", p.stdout.read().strip())
finally:
    sb.terminate()

Secret env presence: {"DSPY_LM_MODEL": true, "DSPY_LM_API_BASE": true, "DSPY_LLM_API_KEY": true, "DSPY_LM_MAX_TOKENS": true}


### Don’t print secrets

This is **unsafe**:
- `print(os.environ["DSPY_LLM_API_KEY"])`

Instead, verify the secret is present (and optionally its length), without revealing the value.

In [6]:
app = modal.App.lookup("dspy-rlm-secret-check", create_if_missing=True)

sb = modal.Sandbox.create(
    app=app,
    secrets=[modal.Secret.from_name("LITELLM")],
    timeout=60,
)
try:
    code = r"""
import json, os
key = os.environ.get('DSPY_LLM_API_KEY', '')
print(json.dumps({'present': bool(key), 'length': len(key)}))
"""
    p = sb.exec("python", "-c", code, timeout=60)
    p.wait()
    print("DSPY_LLM_API_KEY:", p.stdout.read().strip())
finally:
    sb.terminate()

DSPY_LLM_API_KEY: {"present": true, "length": 67}


## 3. The Modal Sandbox Driver

Modal Sandboxes are ephemeral containers. We use a **driver program** pattern (from [Modal's code interpreter example](https://modal.com/docs/examples/simple_code_interpreter)):

1. A Python driver script runs inside the sandbox, reading JSON commands from `stdin`.
2. For each command, it `exec()`s the code, captures stdout/stderr, and checks for `SUBMIT()` calls.
3. It writes the result as JSON to `stdout`.

This keeps state between iterations (variables persist in the `globals` dict) — exactly what RLM needs.

The driver implementation is in the `fleet_rlm.driver` module, which we'll import and use.

In [7]:
# Import the full fleet_rlm package
from fleet_rlm.interpreter import ModalInterpreter
from fleet_rlm.driver import sandbox_driver
from fleet_rlm.signatures import (
    ExtractArchitecture,
    ExtractAPIEndpoints,
    FindErrorPatterns,
    ExtractWithCustomTool,
    AnalyzeLongDocument,
    SummarizeLongDocument,
)
from fleet_rlm.tools import regex_extract
from fleet_rlm.chunking import (
    chunk_by_size,
    chunk_by_headers,
    chunk_by_timestamps,
    chunk_by_json_keys,
)

import modal

# Create the sandbox Image (DSPy is NOT needed inside — llm_query() is bridged to the host)
# NOTE: We no longer create a global MODAL_APP here.
# modal.App.lookup() returns a transient handle whose _client expires between
# Jupyter cells.  Instead, ModalInterpreter defers the lookup to start() time
# via the `app_name` parameter, keeping the client fresh.
SANDBOX_IMAGE = modal.Image.debian_slim().pip_install("numpy", "pandas")
MODAL_APP_NAME = "dspy-rlm-modal"

print("✓ Imports from fleet_rlm package:")
print(f"  ModalInterpreter   – {ModalInterpreter.__module__}")
print(f"  sandbox_driver     – {sandbox_driver.__module__}")
print("  Signatures         – ExtractArchitecture, ExtractAPIEndpoints,")
print("                       FindErrorPatterns, ExtractWithCustomTool,")
print("                       AnalyzeLongDocument, SummarizeLongDocument")
print("  Tools              – regex_extract")
print("  Chunking           – chunk_by_size, chunk_by_headers,")
print("                       chunk_by_timestamps, chunk_by_json_keys")
print()
print(f"✓ Modal config: App={MODAL_APP_NAME}, Image=debian_slim + numpy, pandas")

✓ Imports from fleet_rlm package:
  ModalInterpreter   – fleet_rlm.interpreter
  sandbox_driver     – fleet_rlm.driver
  Signatures         – ExtractArchitecture, ExtractAPIEndpoints,
                       FindErrorPatterns, ExtractWithCustomTool,
                       AnalyzeLongDocument, SummarizeLongDocument
  Tools              – regex_extract
  Chunking           – chunk_by_size, chunk_by_headers,
                       chunk_by_timestamps, chunk_by_json_keys

✓ Modal config: App=dspy-rlm-modal, Image=debian_slim + numpy, pandas


## 4. The `ModalInterpreter`

This class implements DSPy's [`CodeInterpreter`](https://github.com/stanfordnlp/dspy/blob/main/dspy/primitives/code_interpreter.py) protocol. The protocol requires:

| Method | Purpose |
|---|---|
| `tools` (property) | Dict of callable tools available in the sandbox |
| `start()` | Initialize resources (idempotent) |
| `execute(code, variables)` | Run code, return stdout or `FinalOutput` |
| `shutdown()` | Release resources |

Our implementation creates a `modal.Sandbox`, launches the driver program, and communicates via stdin/stdout JSON messages.

### Key Features
- **Context manager** — use `with ModalInterpreter(...) as interp:` for automatic cleanup
- **Sandbox-side helpers** — `peek()`, `grep()`, `chunk_by_size()`, `chunk_by_headers()`, buffers, volume I/O are injected into the sandbox automatically
- **Volume support** — Volumes V2 mount at `/data/` for persistent storage
- **Sensitive data redaction** — API keys are automatically masked in logs
- **Timeout control** — configurable `timeout` and `idle_timeout` for sandboxes

In [8]:
# The ModalInterpreter class has been imported from the package above.
# It implements DSPy's CodeInterpreter protocol with these key features:

print("ModalInterpreter Configuration")
print("=" * 60)
print(f"Image: {SANDBOX_IMAGE}")
print(f"App name: {MODAL_APP_NAME}")
print()
print("Features:")
print("  • Context manager support (with ... as interp:)")
print("  • Deferred App.lookup() — avoids stale _client across cells")
print("  • Modal sandbox lifecycle management (idempotent start)")
print("  • JSON protocol communication with driver")
print("  • Sandbox-side helpers (peek, grep, chunk, buffers, volume)")
print("  • Custom tools support")
print("  • Volume support (Volumes V2) mounted at /data/")
print("  • upload_to_volume() — batch upload local dirs/files")
print("  • Timeout & idle_timeout configuration")
print("  • Sensitive text redaction (API keys masked)")
print("=" * 60)

ModalInterpreter Configuration
Image: Image(<function _Image.pip_install.<locals>.build_dockerfile at 0x1156034c0>)
App name: dspy-rlm-modal

Features:
  • Context manager support (with ... as interp:)
  • Deferred App.lookup() — avoids stale _client across cells
  • Modal sandbox lifecycle management (idempotent start)
  • JSON protocol communication with driver
  • Sandbox-side helpers (peek, grep, chunk, buffers, volume)
  • Custom tools support
  • Volume support (Volumes V2) mounted at /data/
  • upload_to_volume() — batch upload local dirs/files
  • Timeout & idle_timeout configuration
  • Sensitive text redaction (API keys masked)


### Sandbox-Side Helpers

The driver automatically injects these helper functions into the sandbox's `globals`, so the LLM-generated code can call them directly:

| Helper | Purpose |
|---|---|
| `peek(text, start, length)` | Inspect a slice of a large string |
| `grep(text, pattern, context)` | Case-insensitive line search with optional context |
| `chunk_by_size(text, size, overlap)` | Fixed-size chunking inside the sandbox |
| `chunk_by_headers(text, pattern)` | Header-based section splitting |
| `add_buffer(name, value)` | Accumulate data across iterations |
| `get_buffer(name)` | Retrieve buffered data |
| `clear_buffer(name)` | Reset a named buffer |
| `save_to_volume(path, content)` | Persist data to `/data/` volume |
| `load_from_volume(path)` | Load data from `/data/` volume |

These helpers follow the RLM principle: the LLM writes code that **navigates** data programmatically, sending only relevant snippets to `llm_query()` for semantic understanding.

In [9]:
# Demonstrate sandbox-side helpers by running code that uses them directly.
# This cell shows how the LLM-generated code can use these helpers inside the sandbox.

with ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME) as interp:
    interp.start()

    # 1. peek() — inspect a slice of the document
    code_peek = '''
doc = """Line 1: Introduction to DSPy
Line 2: DSPy is a framework for programming language models
Line 3: It provides modules like ChainOfThought, Predict
Line 4: Optimizers include BootstrapFewShot, MIPRO
Line 5: RLM treats context as an external environment"""

result = peek(doc, 0, 80)
print("peek(doc, 0, 80):", repr(result))
'''
    out = interp.execute(code_peek)
    print("1. peek():", out)

    # 2. grep() — search for patterns
    code_grep = """
matches = grep(doc, "dspy", context=0)
print("grep(doc, 'dspy'):", matches)
"""
    out = interp.execute(code_grep)
    print("2. grep():", out)

    # 3. chunk_by_size() — split text into fixed-size chunks
    code_chunk = """
chunks = chunk_by_size(doc, 80, 10)
print(f"chunk_by_size: {len(chunks)} chunks")
for i, c in enumerate(chunks):
    print(f"  Chunk {i}: {len(c)} chars")
"""
    out = interp.execute(code_chunk)
    print("3. chunk_by_size():", out)

    # 4. Buffers — accumulate data across iterations
    code_buffers = """
add_buffer("findings", "Found module: ChainOfThought")
add_buffer("findings", "Found optimizer: BootstrapFewShot")
result = get_buffer("findings")
print("Buffered findings:", result)
clear_buffer("findings")
print("After clear:", get_buffer("findings"))
"""
    out = interp.execute(code_buffers)
    print("4. Buffers:", out)

print("\n✓ All sandbox-side helpers working correctly")

1. peek(): peek(doc, 0, 80): 'Line 1: Introduction to DSPy\nLine 2: DSPy is a framework for programming languag'

2. grep(): grep(doc, 'dspy'): ['Line 1: Introduction to DSPy', 'Line 2: DSPy is a framework for programming language models']

3. chunk_by_size(): chunk_by_size: 4 chunks
  Chunk 0: 80 chars
  Chunk 1: 80 chars
  Chunk 2: 80 chars
  Chunk 3: 40 chars

4. Buffers: Buffered findings: ['Found module: ChainOfThought', 'Found optimizer: BootstrapFewShot']
After clear: []


✓ All sandbox-side helpers working correctly


### Host-Side Chunking Strategies

The `fleet_rlm.chunking` module provides four pure-function chunking strategies that work both on the host and inside the sandbox (stdlib-only, no external dependencies):

| Strategy | Function | Best for |
|---|---|---|
| Fixed-size | `chunk_by_size(text, size, overlap)` | Generic text, logs |
| Header-based | `chunk_by_headers(text, pattern)` | Markdown, structured docs |
| Timestamp-based | `chunk_by_timestamps(text, pattern)` | Log files with timestamps |
| JSON keys | `chunk_by_json_keys(text)` | JSON objects |

In [10]:
# Demonstrate host-side chunking strategies from fleet_rlm.chunking

sample_md = """# Introduction
DSPy is a framework for programming language models.

## Modules
ChainOfThought, Predict, and ReAct are core modules.

## Optimizers
BootstrapFewShot, MIPRO, and BayesianSignatureOptimizer help tune prompts.

## Advanced
RLM treats long context as an external environment.
"""

# 1. chunk_by_size — fixed-size chunks with overlap (returns list[str])
print("1. chunk_by_size(text, 80, overlap=20):")
chunks = chunk_by_size(sample_md, 80, overlap=20)
for i, c in enumerate(chunks):
    print(f"   Chunk {i} ({len(c)} chars): {c[:50]}...")

# 2. chunk_by_headers — split by markdown headers (returns list[dict])
print("\n2. chunk_by_headers(text):")
sections = chunk_by_headers(sample_md)
for i, s in enumerate(sections):
    print(f"   Section {i}: {s['header'] or '(preamble)'} ({len(s['content'])} chars)")

# 3. chunk_by_timestamps — split log text by timestamp (returns list[dict])
sample_log = """2024-01-15 10:00:00 Starting process
Processing item 1
Processing item 2
2024-01-15 10:01:00 Checkpoint reached
Processing item 3
2024-01-15 10:02:00 Process complete
"""
print("\n3. chunk_by_timestamps(log_text):")
log_chunks = chunk_by_timestamps(sample_log)
for i, c in enumerate(log_chunks):
    print(f"   Entry {i}: ts={c['timestamp']!r} ({len(c['content'])} chars)")

# 4. chunk_by_json_keys — split JSON (returns list[dict])
sample_json = '{"name": "DSPy", "version": "3.1", "modules": ["Predict", "CoT"]}'
print("\n4. chunk_by_json_keys(json_text):")
json_chunks = chunk_by_json_keys(sample_json)
for c in json_chunks:
    print(f"   {c['key']} ({c['value_type']}): {c['content']}")

print("\n✓ All chunking strategies demonstrated")

1. chunk_by_size(text, 80, overlap=20):
   Chunk 0 (80 chars): # Introduction
DSPy is a framework for programming...
   Chunk 1 (80 chars): models.

## Modules
ChainOfThought, Predict, and R...
   Chunk 2 (80 chars): ore modules.

## Optimizers
BootstrapFewShot, MIPR...
   Chunk 3 (80 chars): esianSignatureOptimizer help tune prompts.

## Adv...
   Chunk 4 (48 chars): treats long context as an external environment.
...

2. chunk_by_headers(text):
   Section 0: # Introduction (52 chars)
   Section 1: ## Modules (52 chars)
   Section 2: ## Optimizers (74 chars)
   Section 3: ## Advanced (51 chars)

3. chunk_by_timestamps(log_text):
   Entry 0: ts='2024-01-15' (72 chars)
   Entry 1: ts='2024-01-15' (56 chars)
   Entry 2: ts='2024-01-15' (36 chars)

4. chunk_by_json_keys(json_text):
   name (str): "DSPy"
   version (str): "3.1"
   modules (list): [
  "Predict",
  "CoT"
]

✓ All chunking strategies demonstrated


## 5. Basic RLM Demo: Code Generation

A simple example showing RLM writing Python code to solve a problem.

> **Note:** We use `with ModalInterpreter(...) as interp:` context manager for automatic cleanup.
> This ensures the sandbox is shut down even if an error occurs.

In [11]:
# Ensure the planner LM is configured
if not PLANNER_READY and dspy.settings.lm is None:
    raise RuntimeError("Planner LM not configured")

# Context manager pattern — interpreter.shutdown() is called automatically
with ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME) as interpreter:
    rlm = dspy.RLM(
        signature="question -> answer",
        interpreter=interpreter,
        max_iterations=15,
        max_llm_calls=30,
        verbose=True,
    )
    result = rlm(
        question="What are the first 12 Fibonacci numbers? Return as comma-separated."
    )
    print("\nFINAL ANSWER:", result.answer)

2026/02/08 12:38:17 INFO dspy.predict.rlm: RLM iteration 1/15
Reasoning: The question asks for the first 12 Fibonacci numbers, separated by commas. I will write a simple Python script to calculate these numbers. By convention, the Fibonacci sequence starts with 0 and 1, or sometimes 1 and 1. I will provide the sequence starting from 0, 1, 1, 2... and confirm if 12 numbers are generated.
Code:
```python
def fibonacci(n):
    fib_sequence = [0, 1]
    while len(fib_sequence) < n:
        fib_sequence.append(fib_sequence[-1] + fib_sequence[-2])
    return fib_sequence[:n]

first_12 = fibonacci(12)
result = ", ".join(map(str, first_12))
print(result)
```
2026/02/08 12:38:17 INFO dspy.predict.rlm: RLM iteration 2/15
Reasoning: The previous step successfully calculated the first 12 Fibonacci numbers starting from 0: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89. I will now submit this result as requested.
Code:
```python
SUBMIT("0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89")
```



FINAL ANSWER: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89


## 6. Core Capability: Long Document Analysis

RLM treats long documents as an external environment. The document lives in the sandbox,
code navigates and extracts relevant sections, and only snippets are sent to llm_query().

### Use Case: Extract DSPy Architecture

In [12]:
# ExtractArchitecture was imported from fleet_rlm.signatures above.
# Let's inspect its fields:
print("ExtractArchitecture signature fields:")
for name, field in ExtractArchitecture.model_fields.items():
    prefix = (
        "→"
        if field.json_schema_extra
        and field.json_schema_extra.get("__dspy_field_type") == "output"
        else "←"
    )
    print(
        f"  {prefix} {name}: {field.annotation.__name__ if hasattr(field.annotation, '__name__') else field.annotation}"
    )

docs_path = PROJECT_ROOT / "rlm_content" / "dspy-knowledge" / "dspy-doc.txt"
with open(docs_path, "r") as f:
    dspy_docs = f.read()

print(f"\n✓ Loaded DSPy docs: {len(dspy_docs):,} chars from {docs_path.name}")

ExtractArchitecture signature fields:
  ← docs: str
  ← query: str
  → modules: list
  → optimizers: list
  → design_principles: str

✓ Loaded DSPy docs: 81,397 chars from dspy-doc.txt


## 7. Parallel Processing with llm_query_batched()

Process multiple chunks in parallel for dramatic speedup.

In [13]:
# ExtractAPIEndpoints was imported from fleet_rlm.signatures above.
# Uses context manager for automatic cleanup.

with ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME) as interpreter:
    rlm = dspy.RLM(
        signature=ExtractAPIEndpoints,
        interpreter=interpreter,
        max_iterations=20,
        max_llm_calls=30,
        verbose=True,
    )
    result = rlm(docs=dspy_docs)
    print(f"Found {len(result.api_endpoints)} endpoints")
    for ep in result.api_endpoints[:5]:
        print(f"  - {ep}")

2026/02/08 12:38:19 INFO dspy.predict.rlm: RLM iteration 1/20
Reasoning: I will start by exploring the content of `docs` to understand how the API documentation is structured. Since the total length is about 81k characters, I can read a significant portion of it, but I'll begin with the first few thousand characters to get a sense of the formatting and identify where the actual endpoint/method definitions are located. I'll also check if there are distinct sections for classes, methods, or REST endpoints.
Code:
```python
print(f"Total length: {len(docs)}")
print("First 2000 characters:")
print(docs[:2000])
```
2026/02/08 12:38:20 INFO dspy.predict.rlm: RLM iteration 2/20
Reasoning: The `docs` variable appears to be a directory listing followed by the content of various Markdown files. Since the goal is to extract API endpoints (classes, methods, and their parameters/usage), I need to split the document into logical sections (likely per file) and then use the LLM to process these section

Found 60 endpoints
  - {'name': 'dspy.Adapter', 'description': 'The base class for DSPy adapters, responsible for formatting signatures, demos, and interactions into prompts and parsing the model outputs back into structured data.', 'parameters': [], 'usage': 'import dspy\n# Creating a custom adapter by inheriting from dspy.Adapter\nclass MyAdapter(dspy.Adapter):\n    def format(self, signature, demos, inputs):\n        # Custom formatting logic\n        pass'}
  - {'name': 'dspy.ChatAdapter', 'description': 'An adapter designed for Chat-based Language Models. It formats the signature and data into a structured conversation history (system, user, and assistant messages).', 'parameters': [], 'usage': 'import dspy\n# Using ChatAdapter with a model\nchat_adapter = dspy.ChatAdapter()\n# Usually passed to dspy.settings.configure(adapter=chat_adapter)'}
  - {'name': 'dspy.JSONAdapter', 'description': 'An adapter that formats the input and output requirements into JSON structures, often used 

## 8. Stateful Multi-Step Reasoning

RLM maintains state across iterations. Variables persist, enabling multi-step workflows.

In [14]:
# FindErrorPatterns was imported from fleet_rlm.signatures above.


def get_errors(docs: str, verbose: bool = True) -> dict:
    """Find and categorize error patterns in documentation using RLM.

    Args:
        docs: The documentation text to analyze
        verbose: Whether to print RLM traces

    Returns:
        Dict with 'total' count and 'categories' dict
    """
    with ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME) as interpreter:
        rlm = dspy.RLM(
            signature=FindErrorPatterns,
            interpreter=interpreter,
            max_iterations=30,
            max_llm_calls=40,
            verbose=verbose,
        )
        result = rlm(docs=docs)
        return {
            "total": result.total_errors_found,
            "categories": result.error_categories,
        }


# Example usage
error_data = get_errors(dspy_docs)

print(f"Found {error_data['total']} error patterns")
for cat, errors in error_data["categories"].items():
    print(f"{cat}: {len(errors)} errors")

2026/02/08 12:38:23 INFO dspy.predict.rlm: RLM iteration 1/30
Reasoning: I need to find and categorize error patterns from the documentation provided in `docs`. The `docs` variable seems to contain a directory structure followed by actual documentation content. 

First, I will explore the content of `docs` to identify sections related to troubleshooting, common errors, or guides that might contain error information. I'll print the first and last few thousand characters and look for keywords like "error", "exception", "troubleshoot", "fix", or "issue".

Plan:
1.  Read the first 5000 and last 5000 characters of `docs` to understand the structure.
2.  Search for keywords related to errors.
3.  Identify specific files or sections that contain error patterns.
4.  Use `llm_query` to extract structured data from those sections.
Code:
```python
print(f"Total length of docs: {len(docs)}")
print("--- Start of docs ---")
print(docs[:2000])
print("\n--- End of docs ---")
print(docs[-2000:])

# Che

Found 5 error patterns
Adapter Formatting & Parsing Errors: 2 errors
Optimizer Compilation Failure: 2 errors
State Persistence & Configuration Mismatch: 2 errors
Program Flow & Execution Errors: 2 errors
Evaluation & Metric Failures: 2 errors


## 9. Inspecting the Trajectory

Every RLM result includes a trajectory - complete history of reasoning, code, and outputs.

In [15]:
with ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME) as interpreter:
    rlm = dspy.RLM(
        signature="text -> summary",
        interpreter=interpreter,
        max_iterations=10,
        max_llm_calls=10,
        verbose=False,
    )
    text_sample = dspy_docs[:3000]
    result = rlm(text=text_sample)

    print(f"Trajectory ({len(result.trajectory)} steps):\n")
    for i, step in enumerate(result.trajectory):
        print(f"\nStep {i + 1}:")
        print(f"  Reasoning: {step.get('reasoning', 'N/A')[:100]}...")
        print(f"  Code: {step.get('code', '')[:60]}...")

Trajectory (3 steps):


Step 1:
  Reasoning: I need to summarize the content provided in the `text` variable, which appears to be a directory str...
  Code: print(f"Total length: {len(text)}")
print("--- Full Text Sta...

Step 2:
  Reasoning: The text provides a detailed directory structure of the `stanfordnlp-dspy` repository, specifically ...
  Code: prompt = f"""Summarize the purpose and structure of the proj...

Step 3:
  Reasoning: The previous step successfully generated a comprehensive and well-structured summary of the `stanfor...
  Code: # The summary was already generated in the previous step.
# ...


## 10. Long-Context Analysis with Dedicated Signatures

The package provides two specialized signatures for long-document workflows:

| Signature | Purpose |
|---|---|
| `AnalyzeLongDocument` | Navigate → query → synthesize findings from a long document |
| `SummarizeLongDocument` | Chunk-based summarization with controllable focus |

Both signatures are designed so the **LLM-generated sandbox code** uses helpers like `peek()`, `grep()`, `chunk_by_headers()`, and `llm_query()` to explore the document programmatically — only sending relevant snippets to sub-LLMs for semantic understanding.

In [16]:
# 9a. Long-Context Analysis — AnalyzeLongDocument

analyze_sig = AnalyzeLongDocument
rlm_analyze = dspy.RLM(
    signature=analyze_sig,
    interpreter=interp,
    max_iterations=20,
    verbose=True,
)

result = rlm_analyze(
    document=dspy_docs[:30_000],
    query="What are the main design decisions and core abstractions?",
)
print(f"\nFindings ({len(result.findings)} items):")
for i, f in enumerate(result.findings[:5]):
    print(f"  {i + 1}. {f}")
print(f"\nAnswer: {result.answer[:300]}...")
print(f"Sections examined: {result.sections_examined}")

2026/02/08 12:38:27 INFO dspy.predict.rlm: RLM iteration 1/20
Reasoning: I will start by exploring the document's structure. The preview shows a directory structure, suggesting the document might be a collection of files or a repository overview. I'll print the first 2000 characters to understand the format and then look for sections likely to contain information about "main design decisions" and "core abstractions" (e.g., introduction, architecture, or core modules).
Code:
```python
print(f"Document length: {len(document)}")
print("--- Document Start ---")
print(document[:2000])
```
2026/02/08 12:38:29 INFO dspy.predict.rlm: RLM iteration 2/20
Reasoning: The document appears to be a documentation dump for the `dspy` library, starting with a directory structure of its API. To understand the "main design decisions and core abstractions," I need to find the high-level introductory or conceptual sections. Based on the file tree, `docs/api/index.md`, `docs/api/modules/Module.md`, `docs/api


Findings (9 items):
  1. {'concept': 'Declarative Programming', 'type': 'Design Decision', 'description': 'Defining the intent (what) rather than the prompt string (how).'}
  2. {'concept': 'Separation of Logic and Implementation', 'type': 'Design Decision', 'description': "Separating the program's control flow from its prompts and model weights."}
  3. {'concept': 'PyTorch-Inspired Paradigm', 'type': 'Design Decision', 'description': 'Structuring LLM programs as modular, hierarchical, and stateful components similar to neural networks.'}
  4. {'concept': 'Systematic Optimization', 'type': 'Design Decision', 'description': "Replacing manual prompt engineering with data-driven 'compilation' via Optimizers."}
  5. {'concept': 'Signatures', 'type': 'Core Abstraction', 'description': "Declarative specifications of a task's input/output behavior (e.g., 'question -> answer')."}

Answer: 
The DSPy framework is built on a fundamental shift from manual 'prompt engineering' to systematic 'langu

In [17]:
# 9b. SummarizeLongDocument — chunk-based summarization with controllable focus

with ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME) as interp:
    rlm = dspy.RLM(
        signature=SummarizeLongDocument,
        interpreter=interp,
        max_iterations=20,
        max_llm_calls=30,
        verbose=True,
    )
    result = rlm(
        document=dspy_docs[:30_000],
        focus="DSPy optimizers and how they work",
    )
    print(f"\nKey points ({len(result.key_points)} items):")
    for i, kp in enumerate(result.key_points[:5]):
        print(f"  {i + 1}. {kp}")
    print(f"\nSummary: {result.summary[:300]}...")
    print(f"Coverage: ~{result.coverage_pct}%")

2026/02/08 12:38:33 INFO dspy.predict.rlm: RLM iteration 1/20
Reasoning: I need to summarize a long document (30,000 characters) with a focus on "DSPy optimizers and how they work". 
First, I will inspect the document structure and content to determine how to chunk it effectively. Since the document seems to be a directory structure followed by text, I'll print the first 2000 and last 2000 characters to get a sense of the layout.

Plan:
1. Examine document structure.
2. Split the document into manageable chunks (e.g., 5000-8000 characters).
3. Query the LLM for each chunk to extract information related to the focus.
4. Consolidate the findings into a summary and key points.
Code:
```python
print(f"Document length: {len(document)}")
print(f"Focus: {focus}")
print("-" * 20)
print("Document Start Preview:")
print(document[:2000])
print("-" * 20)
print("Document End Preview:")
print(document[-2000:])
```
2026/02/08 12:38:33 INFO dspy.predict.rlm: RLM iteration 2/20
Reasoning: The document 


Key points (6 items):
  1. Optimizers automate the process of tuning prompts and module parameters in DSPy programs.
  2. The 'compile' method is the universal interface used to transform a standard module into an optimized one.
  3. Optimizers identify tunable parts of a program by accessing a module's internal 'parameters' and 'predictors'.
  4. BootstrapFewShot works by generating new few-shot training examples through a teacher-student execution flow.
  5. BootstrapFewShotWithRandomSearch enhances basic bootstrapping by searching across multiple configurations to identify the highest-performing version.

Summary: DSPy optimizers (formerly known as teleprompters) are specialized algorithms designed to automate the refinement of DSPy programs. They work by tuning the parameters of a module—such as prompts and demonstration examples—to improve performance against a specific metric. 

The primary mechanism for o...
Coverage: ~40%


## 11. Persistent Storage with Modal Volumes V2

Modal Volumes V2 provide persistent storage across sandbox sessions with better performance and consistency. This is useful for:
- Caching large documents to avoid re-uploading
- Storing intermediate results
- Sharing data between multiple RLM runs
- **Hosting knowledge files** (RLM paper, DSPy docs) directly in the sandbox filesystem

### Volume Setup
First, create a V2 volume (one-time setup):
```bash
modal volume create --version=2 rlm-volume-dspy
```

### Volumes V2 Key Features
- Uses `modal.Volume.from_name(name, create_if_missing=True, version=2)`
- Mounts via `volumes={"/data": volume}` in `Sandbox.create()`
- **No file count limit** (V1 had 500K inode limit)
- **Concurrent writes** from hundreds of containers
- Commit via `sync /data` from inside the sandbox (V2 only)
- Background commits persist data automatically on container shutdown

### Uploading Files from Local Machine
```python
vol = modal.Volume.from_name("my-volume", create_if_missing=True, version=2)
with vol.batch_upload() as batch:
    batch.put_directory("/local/dir", "/remote/dir")
    batch.put_file("local.txt", "/remote/file.txt")
```

Or via `ModalInterpreter.upload_to_volume()`:
```python
interpreter.upload_to_volume(
    local_dirs={"rlm_content/dspy-knowledge": "/dspy-knowledge"},
)
```

In [18]:
# Upload rlm-knowledge/ and dspy-knowledge/ to the Modal Volume
# This makes files available at /data/rlm-knowledge/ and /data/dspy-knowledge/
# inside every sandbox that mounts the volume.

VOLUME_NAME = "rlm-volume-dspy"

rlm_knowledge_dir = str(PROJECT_ROOT / "rlm_content" / "rlm-knowledge")
dspy_knowledge_dir = str(PROJECT_ROOT / "rlm_content" / "dspy-knowledge")

# Verify local directories exist
for d in [rlm_knowledge_dir, dspy_knowledge_dir]:
    assert os.path.isdir(d), f"Directory not found: {d}"

# Use ModalInterpreter's upload helper
upload_interpreter = ModalInterpreter(
    image=SANDBOX_IMAGE,
    app_name=MODAL_APP_NAME,
    volume_name=VOLUME_NAME,
)

upload_interpreter.upload_to_volume(
    local_dirs={
        rlm_knowledge_dir: "/rlm-knowledge",
        dspy_knowledge_dir: "/dspy-knowledge",
    },
)

print(f"✓ Uploaded knowledge directories to volume '{VOLUME_NAME}':")
print(f"  {rlm_knowledge_dir} → /data/rlm-knowledge/")
print(f"  {dspy_knowledge_dir} → /data/dspy-knowledge/")
print()

# List uploaded files for confirmation
vol = modal.Volume.from_name(VOLUME_NAME, create_if_missing=True, version=2)
print("Volume contents:")
for entry in vol.listdir("/"):
    print(f"  /{entry.path}")
    if entry.type.name == "DIRECTORY":
        for sub in vol.listdir(f"/{entry.path}"):
            size_str = (
                f" ({sub.stat().size:,} bytes)"
                if hasattr(sub, "stat") and sub.stat()
                else ""
            )
            print(f"    /{entry.path}/{sub.path}{size_str}")

Volume: '/rlm-knowledge' exists, skipping upload.
Volume: '/dspy-knowledge' exists, skipping upload.
✓ Uploaded knowledge directories to volume 'rlm-volume-dspy':
  /Volumes/Samsung-SSD-T7/Workspaces/Github/qredence/agent-framework/v0.5/_WORLD/_RLM/fleet-rlm-dspy/rlm_content/rlm-knowledge → /data/rlm-knowledge/
  /Volumes/Samsung-SSD-T7/Workspaces/Github/qredence/agent-framework/v0.5/_WORLD/_RLM/fleet-rlm-dspy/rlm_content/dspy-knowledge → /data/dspy-knowledge/

Volume contents:
  /dspy-knowledge
    /dspy-knowledge/dspy-knowledge/dspy-RLM.md
    /dspy-knowledge/dspy-knowledge/dspy-doc.txt
    /dspy-knowledge/dspy-knowledge/mermaid.md
  /rlm-knowledge
    /rlm-knowledge/rlm-knowledge/rlm-pape.pdf
    /rlm-knowledge/rlm-knowledge/rlm-paper.md
  /notebook-demo.json
  /dspy-doc-cached.txt


In [19]:
# Demonstrate reading uploaded knowledge files from the volume inside a sandbox
volume_interpreter = ModalInterpreter(
    image=SANDBOX_IMAGE,
    app_name=MODAL_APP_NAME,
    volume_name=VOLUME_NAME,
    timeout=600,
)

print("✓ Created ModalInterpreter with Volumes V2 support")
print(f"  Volume: {VOLUME_NAME} → mounted at /data/")
print()

# Start sandbox and verify knowledge files are accessible
volume_interpreter.start()

code = """
import pathlib, json

data_dir = pathlib.Path("/data")
print(f"Volume mounted: {data_dir.exists()}")

# List top-level contents
top_level = sorted([p.name for p in data_dir.iterdir()])
print(f"Top-level dirs: {top_level}")

# List dspy-knowledge contents
dspy_dir = data_dir / "dspy-knowledge"
if dspy_dir.exists():
    files = sorted(p.name for p in dspy_dir.iterdir())
    print(f"dspy-knowledge/: {files}")
    for f in dspy_dir.iterdir():
        size = f.stat().st_size
        print(f"  {f.name}: {size:,} bytes")

# List rlm-knowledge contents
rlm_dir = data_dir / "rlm-knowledge"
if rlm_dir.exists():
    files = sorted(p.name for p in rlm_dir.iterdir())
    print(f"rlm-knowledge/: {files}")
    for f in rlm_dir.iterdir():
        size = f.stat().st_size
        print(f"  {f.name}: {size:,} bytes")

# Read a snippet of dspy-RLM.md to confirm content
rlm_doc = data_dir / "dspy-knowledge" / "dspy-RLM.md"
if rlm_doc.exists():
    content = rlm_doc.read_text()
    print(f"\\ndspy-RLM.md preview ({len(content):,} chars):")
    print(content[:300])
    print("...")

SUBMIT(result="volume knowledge files verified")
"""

result = volume_interpreter.execute(code)
print(f"\nSandbox result: {result}")
volume_interpreter.shutdown()
print("\n✓ Knowledge files accessible inside sandbox via volume mount.")

✓ Created ModalInterpreter with Volumes V2 support
  Volume: rlm-volume-dspy → mounted at /data/


Sandbox result: FinalOutput({'result': 'volume knowledge files verified'})

✓ Knowledge files accessible inside sandbox via volume mount.


### RLM on Volume-Hosted Docs

The key benefit of hosting knowledge files on a Volume: the RLM sandbox can read
documents directly from `/data/` instead of serializing them through Python variables.
This is the canonical RLM pattern — treat context as an **external environment**, not input.

In [20]:
# RLM task that reads docs directly from the Volume filesystem
# instead of passing them as a Python variable.
# The sandbox reads /data/dspy-knowledge/dspy-RLM.md and extracts info.


class ExtractRLMCapabilities(dspy.Signature):
    """Extract RLM capabilities from documentation stored on a Volume.

    Strategy:
    1. Read /data/dspy-knowledge/dspy-RLM.md from the volume
    2. Search for constructor parameters, built-in tools, and usage patterns
    3. Use llm_query() on relevant sections for semantic extraction
    """

    query: str = dspy.InputField(desc="What to extract from the RLM docs")
    parameters: list = dspy.OutputField(desc="List of RLM constructor parameters")
    builtin_tools: list = dspy.OutputField(desc="List of built-in sandbox tools")
    key_concepts: str = dspy.OutputField(desc="Summary of key RLM concepts")


with ModalInterpreter(
    image=SANDBOX_IMAGE,
    app_name=MODAL_APP_NAME,
    volume_name=VOLUME_NAME,
    timeout=600,
) as vol_interp:
    rlm = dspy.RLM(
        signature=ExtractRLMCapabilities,
        interpreter=vol_interp,
        max_iterations=15,
        max_llm_calls=20,
        verbose=True,
    )
    # NOTE: We do NOT pass docs as input — the RLM reads them from /data/ in the sandbox!
    result = rlm(
        query="Read /data/dspy-knowledge/dspy-RLM.md and extract all constructor parameters, built-in tools, and key concepts about how RLM works."
    )
    print(f"\nParameters: {result.parameters}")
    print(f"Built-in tools: {result.builtin_tools}")
    print(f"Key concepts: {result.key_concepts[:300]}...")

2026/02/08 12:38:39 INFO dspy.predict.rlm: RLM iteration 1/15
Reasoning: I will start by reading the contents of the file `/data/dspy-knowledge/dspy-RLM.md` to understand the documentation structure and identify the relevant sections for constructor parameters, built-in tools, and key concepts.
Code:
```python
import os

file_path = "/data/dspy-knowledge/dspy-RLM.md"

if os.path.exists(file_path):
    with open(file_path, 'r') as f:
        content = f.read()
    print(f"File length: {len(content)}")
    print("First 1000 characters:")
    print(content[:1000])
else:
    print(f"File {file_path} not found.")
```
2026/02/08 12:38:39 INFO dspy.predict.rlm: RLM iteration 2/15
Reasoning: The file is roughly 8KB, which is small enough to read entirely into memory. I'll read the full content and then use `llm_query` to extract the specific lists of constructor parameters, built-in tools, and key concepts as requested. I'll structure the prompt to ensure I get structured data for the lists.
C


Parameters: ["signature: Defines inputs and outputs (e.g., 'context, query -> answer')", 'max_iterations: Maximum REPL interaction loops before fallback extraction', 'max_llm_calls: Maximum llm_query/llm_query_batched calls per execution', 'max_output_chars: Maximum characters to include from REPL output', 'verbose: Log detailed execution info', 'tools: Additional tool functions callable from interpreter code', 'sub_lm: LM for sub-queries (defaults to dspy.settings.lm)', 'interpreter: Custom interpreter (defaults to PythonInterpreter via Deno/Pyodide WASM)']
Built-in tools: ['llm_query(prompt): Query a sub-LLM for semantic analysis', 'llm_query_batched(prompts): Query multiple prompts concurrently', 'print(): Required to see results in the REPL output', 'SUBMIT(...): Submit final output and end execution', 'Standard library: re, json, collections, math, etc.']
Key concepts: ["Context Separation: Solves 'context rot' by separating the variable space (stored in the REPL) from the token 

In [21]:
# CLI Usage Examples
print("=" * 60)
print("CLI Commands — fleet-rlm Package")
print("=" * 60)

print("""
The fleet-rlm package provides a Typer CLI with the following commands:

1. Basic Code Generation:
   $ uv run fleet-rlm run-basic \\
       --question "What are the first 10 Fibonacci numbers?" \\
       --volume-name rlm-volume-dspy

2. Architecture Extraction:
   $ uv run fleet-rlm run-architecture \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \\
       --query "Extract all modules and optimizers" \\
       --volume-name rlm-volume-dspy

3. API Endpoint Extraction:
   $ uv run fleet-rlm run-api-endpoints \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \\
       --volume-name rlm-volume-dspy

4. Error Pattern Analysis:
   $ uv run fleet-rlm run-error-patterns \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \\
       --volume-name rlm-volume-dspy

5. Execution Trajectory:
   $ uv run fleet-rlm run-trajectory \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt

6. Custom Tool Demo:
   $ uv run fleet-rlm run-custom-tool \\
       --text "Extract emails from [email protected] and [email protected]"

7. Long-Context Analysis (NEW):
   $ uv run fleet-rlm run-long-context \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \\
       --query "What are the main design decisions?" \\
       --mode analyze

8. Long-Context Summarization (NEW):
   $ uv run fleet-rlm run-long-context \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \\
       --query "DSPy optimizers" \\
       --mode summarize

9. Check Modal Secrets:
   $ uv run fleet-rlm check-secret
   $ uv run fleet-rlm check-secret-key --key DSPY_LLM_API_KEY

Volume Setup (One-time):
   $ uv run modal volume create rlm-volume-dspy
""")

print("=" * 60)
print("All run-* Commands Support --volume-name")
print("=" * 60)
print("Use --volume-name to enable persistent storage for:")
print("  • Document caching")
print("  • Intermediate results")
print("  • Data sharing between runs")
print()
print("Data persists at /data/ inside the Modal sandbox")
print("using Modal Volumes V2.")

CLI Commands — fleet-rlm Package

The fleet-rlm package provides a Typer CLI with the following commands:

1. Basic Code Generation:
   $ uv run fleet-rlm run-basic \
       --question "What are the first 10 Fibonacci numbers?" \
       --volume-name rlm-volume-dspy

2. Architecture Extraction:
   $ uv run fleet-rlm run-architecture \
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \
       --query "Extract all modules and optimizers" \
       --volume-name rlm-volume-dspy

3. API Endpoint Extraction:
   $ uv run fleet-rlm run-api-endpoints \
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \
       --volume-name rlm-volume-dspy

4. Error Pattern Analysis:
   $ uv run fleet-rlm run-error-patterns \
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \
       --volume-name rlm-volume-dspy

5. Execution Trajectory:
   $ uv run fleet-rlm run-trajectory \
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt

6. Custom Tool Demo:
   $ uv run fleet-rlm run-c

## 12. Custom Tools

RLM supports custom Python tools that the generated code can call. The `regex_extract()` tool and `ExtractWithCustomTool` signature are imported from the `fleet_rlm` package.

In [22]:
# regex_extract and ExtractWithCustomTool were imported from the package above.
# Let's see how they work:
print("regex_extract tool:")
print(f"  Module: {regex_extract.__module__}")
print(f"  Doc: {regex_extract.__doc__.strip().split(chr(10))[0]}")
print()

# Quick local test of the tool

test_matches = regex_extract(
    "Hello [email protected] and [email protected]", r"[\w.+-]+@[\w-]+\.[\w.]+"
)
print(f"  Local test: regex_extract('...', email_pattern) = {test_matches}")
print()

# Now use it in an RLM task
with ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME) as interpreter:
    rlm = dspy.RLM(
        signature=ExtractWithCustomTool,
        interpreter=interpreter,
        tools=[regex_extract],  # Pass custom tool here
        max_iterations=15,
        max_llm_calls=20,
        verbose=True,
    )
    result = rlm(docs=dspy_docs[:10000])  # First 10KB for demo
    print(f"\nFound {len(result.headers)} headers")
    print(f"Found {len(result.code_blocks)} code blocks")
    print(f"\nStructure: {result.structure_summary[:200]}...")

regex_extract tool:
  Module: fleet_rlm.tools
  Doc: Extract all regex matches from text using a compiled pattern.

  Local test: regex_extract('...', email_pattern) = []



2026/02/08 12:38:42 INFO dspy.predict.rlm: RLM iteration 1/15
Reasoning: I will start by exploring the content of the `docs` variable to understand its structure. I'll print the first 2000 characters and check the total length to confirm what I'm dealing with. Then I will define regex patterns to extract markdown headers (e.g., `# Header`) and code blocks (e.g., ```python ... ```).
Code:
```python
print(f"Total length: {len(docs)}")
print("--- Sample (first 1000 chars) ---")
print(docs[:1000])
print("--- Sample (last 500 chars) ---")
print(docs[-500:])
```
2026/02/08 12:38:42 INFO dspy.predict.rlm: RLM iteration 2/15
Reasoning: The document appears to be a directory structure combined with API documentation details. I need to extract headers and code blocks.
1. **Headers**: Markdown headers typically start with one or more `#` characters at the beginning of a line. I will use `regex_extract` with a multiline approach.
2. **Code Blocks**: Code blocks are delimited by triple backticks (`


Found 3 headers
Found 0 code blocks

Structure: Based on the provided text, the document is organized as a technical reference guide for the `stanfordnlp-dspy` repository, specifically focusing on its API documentation. The structure is broken down...


## 13. RLM vs Direct LLM Comparison

| Aspect | Direct LLM | RLM |
|--------|-----------|-----|
| **Context size** | ~128K tokens | Virtually unlimited |
| **Attention** | Dilutes over long context | Focused (code selects snippets) |
| **Cost** | High (all tokens in context) | Lower (targeted sub-LLM calls) |
| **Accuracy** | Lower on long docs | Higher (targeted analysis) |
| **Verifiability** | Black box | Transparent (full trajectory) |
| **Tool use** | Limited | Full Python + custom tools |
| **Iterative refinement** | Manual (chat) | Automated (code loops) |
| **Structured output** | Prompt-dependent | Type-enforced via Signature |

### When to use RLM:
- Documents > 50KB
- Need structured extraction (lists, dicts, nested data)
- Multi-step analysis (filter → extract → validate)
- Need programmatic validation or computation
- Repetitive analysis across many documents

### When NOT to use RLM:
- Simple Q&A on short text (< 1K tokens)
- Creative writing or brainstorming
- Tasks that don't benefit from code execution

## 14. RLM Best Practices

### Signature Design

1. **Describe the strategy** in the docstring:
   ```python
   class MySignature(dspy.Signature):
       """Extract X from Y.
       
       Strategy:
       1. Use peek()/grep() to locate relevant sections
       2. Use chunk_by_headers() to split the document
       3. Use llm_query() on matching sections
       4. Aggregate results
       """
   ```

2. **Use typed output fields** — `list`, `dict`, `int` guide the code:
   ```python
   items: list = dspy.OutputField(desc="List of found items")
   count: int = dspy.OutputField(desc="Total count")
   ```

3. **Import from the package** rather than redefining inline:
   ```python
   from fleet_rlm.signatures import AnalyzeLongDocument, SummarizeLongDocument
   ```

### Sandbox-Side Helpers

The driver injects these helpers automatically — the LLM code can call them:
- `peek(text, start, length)` — inspect a slice without loading the whole string
- `grep(text, pattern, context=0)` — case-insensitive line search
- `chunk_by_size(text, size, overlap)` / `chunk_by_headers(text, pattern)` — split text
- `add_buffer(name, value)` / `get_buffer(name)` / `clear_buffer(name)` — accumulate across iterations
- `save_to_volume(path, content)` / `load_from_volume(path)` — persist data to `/data/`

### Tuning Parameters

| Parameter | Typical Range | Notes |
|-----------|---------------|-------|
| `max_iterations` | 10-50 | Complex docs need more iterations |
| `max_llm_calls` | 20-100 | Primary cost control |
| `max_output_chars` | 10K-100K | Prevents output flooding |

### Context Manager Pattern

Always prefer the context manager for automatic cleanup:
```python
with ModalInterpreter(image=img, app_name=name) as interp:
    rlm = dspy.RLM(signature=MySig, interpreter=interp, ...)
    result = rlm(...)
```

### Debugging Workflow

1. **Start with `verbose=True`**: See real-time reasoning and code
2. **Inspect `result.trajectory`**: Full execution history
3. **Test on subsets**: Use `docs[:5000]` before full runs
4. **Check sandbox logs**: Modal shows actual execution
5. **Validate tools**: Test custom tools independently
6. **Use host-side chunking** (`fleet_rlm.chunking`) to pre-process before passing to RLM

## 15. Summary

This notebook demonstrated the full capabilities of **dspy.RLM** with the `fleet-rlm` package:

1. **Basic code generation** — LLM writes and executes Python
2. **Long document analysis** — Process 80KB+ documents efficiently
3. **Parallel processing** — `llm_query_batched()` for speed
4. **Stateful reasoning** — Multi-step workflows with persistent variables
5. **Trajectory inspection** — Full transparency into reasoning
6. **Sandbox-side helpers** — `peek()`, `grep()`, `chunk_by_*()`, buffers, volume I/O
7. **Host-side chunking** — `chunk_by_size`, `chunk_by_headers`, `chunk_by_timestamps`, `chunk_by_json_keys`
8. **Long-context signatures** — `AnalyzeLongDocument`, `SummarizeLongDocument`
9. **Persistent storage** — Modal Volumes V2 for caching and persistence
10. **Custom tools** — Extend sandbox capabilities with user-defined functions
11. **Context manager** — `with ModalInterpreter(...) as interp:` for safe cleanup

### Key Takeaways

- RLM treats long context as an **environment**, not input
- Code navigates data; `llm_query()` understands semantics
- **Sandbox helpers** (`peek`, `grep`, `chunk_*`, buffers) enable the LLM to explore data programmatically
- The **trajectory** provides unprecedented observability
- **Modal Volumes V2** enable persistent storage across sandbox sessions
- **Import signatures, tools, chunking** from the `fleet_rlm` package — avoid inline redefinition
- All capabilities are available via both notebook and CLI (`fleet-rlm`)