# RLM with Modal Sandbox (DSPy 3.1.3)

This tutorial shows how to use **`dspy.RLM`** (Recursive Language Model) with [Modal](https://modal.com) for secure, sandboxed code execution in the cloud.

**What is RLM?** RLM is an inference strategy where the LLM writes Python code to programmatically explore data, call sub-LLMs over snippets, and iteratively build up answers — instead of feeding long contexts directly into the model.

**Why Modal?** By default, `dspy.RLM` uses a local Deno/Pyodide WASM sandbox. Modal lets you run that code in an isolated cloud container with configurable resources, dependencies, and secrets.

**What we'll do:**
1. Implement a `ModalInterpreter` that satisfies DSPy's `CodeInterpreter` protocol
2. Use `modal.Sandbox` to execute code inside an ephemeral cloud container
3. Run an RLM agent that writes and executes code remotely

## Prerequisites

- **Python 3.10+**
- **Modal account**: Sign up at [modal.com](https://modal.com) and run `modal setup`
- **Modal secret**: Create a secret named `LITELLM` that contains the environment variables used by DSPy/LiteLLM:
  - `DSPY_LM_MODEL` (e.g., `openai/gemini-3-flash-preview`)
  - `DSPY_LM_API_BASE` (your LiteLLM proxy base URL)
  - `DSPY_LLM_API_KEY` (API key for the proxy/provider)
  - optional: `DSPY_LM_MAX_TOKENS`

  Example (run in a terminal):
  ```bash
  modal secret create LITELLM \
    DSPY_LM_MODEL=... \
    DSPY_LM_API_BASE=... \
    DSPY_LLM_API_KEY=... \
    DSPY_LM_MAX_TOKENS=...
  ```

- **Security note**: don’t hard-code API keys in notebooks, and don’t print them. If a key was ever pasted into a notebook/chat, rotate it.

## 1. Install Dependencies

In [1]:
!uv pip install -qU "dspy==3.1.3" modal

## 2. Imports and Configuration

We configure one LM locally for the *planner* (the model that writes Python code each iteration).

This notebook expects the following environment variables to be set **locally** (for the planner):
- `DSPY_LM_MODEL`
- `DSPY_LM_API_BASE`
- `DSPY_LLM_API_KEY`
- optional: `DSPY_LM_MAX_TOKENS`

The same variables are also injected into the Modal sandbox via the `LITELLM` secret, so any sandbox-side LM calls (via tool-bridged `llm_query`) use identical credentials without hard-coding secrets in the notebook.

**Important**: Modal secrets are only available *inside* Modal containers/sandboxes. They do **not** automatically set environment variables for your local notebook kernel.
This notebook will try to load a local `.env` from the project root (if present) to configure the planner LM.

In [2]:
import os
import sys
from pathlib import Path

import dspy

# ---- Load local .env (for the planner LM) ----
# Modal secrets are only available *inside* Modal; they do not configure your local kernel.
def _find_project_root(start: Path) -> Path:
    for p in [start, *start.parents]:
        if (p / "pyproject.toml").exists():
            return p
    return start

def _load_dotenv(path: Path) -> None:
    if not path.exists():
        return
    try:
        for raw in path.read_text().splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            k, v = line.split("=", 1)
            k, v = k.strip(), v.strip()
            if len(v) >= 2 and ((v[0] == v[-1] == '\"') or (v[0] == v[-1] == "'")):
                v = v[1:-1]
            if k and k not in os.environ:
                os.environ[k] = v
    except Exception as e:
        print(f"Warning: could not load {path}: {e}")

PROJECT_ROOT = _find_project_root(Path.cwd())
_load_dotenv(PROJECT_ROOT / ".env")

# ---- Guard against module shadowing ----
# A local `modal.py` (or even a stale compiled `__pycache__/modal.*.pyc`) in the
# notebook's working directory can shadow the third-party `modal` package.
shadow_py = Path.cwd() / "modal.py"
shadow_pyc_dir = Path.cwd() / "__pycache__"
shadow_pycs = list(shadow_pyc_dir.glob("modal.*.pyc")) if shadow_pyc_dir.exists() else []

if shadow_py.exists():
    raise RuntimeError(
        f"Found {shadow_py} which shadows the 'modal' package. "
        "Rename/delete it (e.g., modal_get_started.py) and restart the kernel."
    )

if shadow_pycs:
    removed: list[str] = []
    failed: list[str] = []
    for p in shadow_pycs:
        try:
            p.unlink()
            removed.append(str(p))
        except Exception:
            failed.append(str(p))

    if removed:
        print("Removed shadowing bytecode files:\n" + "\n".join(removed))
    if failed:
        raise RuntimeError(
            "Found shadowing bytecode files but could not remove them:\n"
            + "\n".join(failed)
            + "\nDelete them manually and restart the kernel."
        )

# If a previous import attempt loaded a bad `modal` module, clear modal-related
# modules to avoid weird partially-initialized states.
#
# Note: Modal uses a generated `modal_proto` package under the hood; when upgrading
# modal in a running kernel, stale `modal_proto` modules can cause type mismatches.
MODULE_PREFIXES_TO_PURGE = (
    "modal",
    "modal_proto",
    "grpclib",
    "fleet_rlm",
)

for name in list(sys.modules.keys()):
    if name in MODULE_PREFIXES_TO_PURGE or any(name.startswith(p + ".") for p in MODULE_PREFIXES_TO_PURGE):
        sys.modules.pop(name, None)

# Import modal after purging to ensure clean import
import modal  # noqa: E402


def configure_planner_from_env() -> bool:
    """Configure DSPy planner LM from environment variables.

    Expected (local):
      - DSPY_LM_MODEL
      - DSPY_LLM_API_KEY (or DSPY_LM_API_KEY)
      - optional: DSPY_LM_API_BASE, DSPY_LM_MAX_TOKENS

    Returns True if configured, False if required env vars are missing.
    """

    api_key = os.environ.get("DSPY_LLM_API_KEY") or os.environ.get("DSPY_LM_API_KEY")
    missing: list[str] = []
    if not os.environ.get("DSPY_LM_MODEL"):
        missing.append("DSPY_LM_MODEL")
    if not api_key:
        # This notebook's convention is DSPY_LLM_API_KEY; DSPY_LM_API_KEY is accepted as a fallback.
        missing.append("DSPY_LLM_API_KEY")
    if missing:
        print(
            "Planner LM not configured yet. Missing env vars: "
            + ", ".join(missing)
            + "\nSet them locally (e.g., export in your shell before starting Jupyter, or create a .env at the project root) and re-run this cell." 
        )
        return False

    planner_lm = dspy.LM(
        os.environ["DSPY_LM_MODEL"],
        api_base=os.environ.get("DSPY_LM_API_BASE"),
        api_key=api_key,
        max_tokens=int(os.environ.get("DSPY_LM_MAX_TOKENS", "16000")),
    )

    dspy.configure(lm=planner_lm)
    print(f"Planner LM configured: {planner_lm.model}")
    print("(Tip: don’t print API keys.)")
    return True


PLANNER_READY = configure_planner_from_env()

# We’ll pass `modal.Secret.from_name('LITELLM')` into the sandbox so the *remote*
# Python REPL can access the same environment variables without hard-coding them.

Planner LM configured: openai/gemini-3-flash-preview
(Tip: don’t print API keys.)


### Optional: sanity-check the Modal secret (without leaking it)

The snippet below confirms that the `LITELLM` secret is mounted in Modal by checking for the *presence* of environment variables. It deliberately does **not** print secret values.

In [3]:
# Sandboxes require an App when created from a local environment.
app = modal.App.lookup("dspy-rlm-secret-check", create_if_missing=True)

sb = modal.Sandbox.create(
    app=app,
    secrets=[modal.Secret.from_name("LITELLM")],
    timeout=60,
)
try:
    code = r"""
import json, os
keys = [
  'DSPY_LM_MODEL',
  'DSPY_LM_API_BASE',
  'DSPY_LLM_API_KEY',
  'DSPY_LM_MAX_TOKENS',
]
print(json.dumps({k: bool(os.environ.get(k)) for k in keys}))
"""
    p = sb.exec("python", "-c", code, timeout=60)
    p.wait()
    print("Secret env presence:", p.stdout.read().strip())
finally:
    sb.terminate()

Secret env presence: {"DSPY_LM_MODEL": true, "DSPY_LM_API_BASE": true, "DSPY_LLM_API_KEY": true, "DSPY_LM_MAX_TOKENS": true}


### Don’t print secrets

This is **unsafe**:
- `print(os.environ["DSPY_LLM_API_KEY"])`

Instead, verify the secret is present (and optionally its length), without revealing the value.

In [4]:
app = modal.App.lookup("dspy-rlm-secret-check", create_if_missing=True)

sb = modal.Sandbox.create(
    app=app,
    secrets=[modal.Secret.from_name("LITELLM")],
    timeout=60,
)
try:
    code = r"""
import json, os
key = os.environ.get('DSPY_LLM_API_KEY', '')
print(json.dumps({'present': bool(key), 'length': len(key)}))
"""
    p = sb.exec("python", "-c", code, timeout=60)
    p.wait()
    print("DSPY_LLM_API_KEY:", p.stdout.read().strip())
finally:
    sb.terminate()

DSPY_LLM_API_KEY: {"present": true, "length": 67}


## 3. The Modal Sandbox Driver

Modal Sandboxes are ephemeral containers. We use a **driver program** pattern (from [Modal's code interpreter example](https://modal.com/docs/examples/simple_code_interpreter)):

1. A Python driver script runs inside the sandbox, reading JSON commands from `stdin`.
2. For each command, it `exec()`s the code, captures stdout/stderr, and checks for `SUBMIT()` calls.
3. It writes the result as JSON to `stdout`.

This keeps state between iterations (variables persist in the `globals` dict) — exactly what RLM needs.

The driver implementation is in the `fleet_rlm.driver` module, which we'll import and use.
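
To make the three steps above concrete, here is a minimal sketch of such a driver loop. It is not the code in `fleet_rlm.driver`: the names `SubmitSignal`, `run_command`, and `driver_loop`, and the result keys `stdout`, `stderr`, and `final`, are assumptions chosen for illustration.

```python
import contextlib
import io
import json


class SubmitSignal(BaseException):
    """Hypothetical signal raised by SUBMIT() to stop execution with a final payload."""

    def __init__(self, payload):
        super().__init__("SUBMIT called")
        self.payload = payload


def _submit(*args, **kwargs):
    # Keyword arguments become the final payload; otherwise keep the positional ones.
    raise SubmitSignal(kwargs if kwargs else list(args))


def run_command(command: dict, sandbox_globals: dict) -> dict:
    """Execute one {"code": ...} command and return a JSON-serializable result."""
    sandbox_globals.setdefault("SUBMIT", _submit)
    stdout, stderr = io.StringIO(), io.StringIO()
    final = None
    with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
        try:
            exec(command["code"], sandbox_globals)
        except SubmitSignal as signal:
            final = signal.payload
        except Exception as exc:  # report errors to the caller instead of crashing
            print(f"{type(exc).__name__}: {exc}", file=stderr)
    return {"stdout": stdout.getvalue(), "stderr": stderr.getvalue(), "final": final}


def driver_loop(in_stream, out_stream) -> None:
    """Read one JSON command per line and write one JSON result per line."""
    sandbox_globals: dict = {}  # shared across commands, so variables persist
    for line in in_stream:
        if not line.strip():
            continue
        result = run_command(json.loads(line), sandbox_globals)
        out_stream.write(json.dumps(result) + "\n")
        out_stream.flush()
```

Inside a sandbox, a loop like this would be started with `driver_loop(sys.stdin, sys.stdout)`. Because every command runs against the same `sandbox_globals` dict, a variable assigned in one command is still defined in the next.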

In [5]:
# Add src/ to path to import the real ModalInterpreter with volume support
import sys
src_path = PROJECT_ROOT / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

# Import the real ModalInterpreter from the package
from fleet_rlm.interpreter import ModalInterpreter  # noqa: E402
from fleet_rlm.driver import sandbox_driver  # noqa: E402

# Create the sandbox Image (DSPy is NOT needed inside — llm_query() is bridged to the host)
import modal  # noqa: E402

# NOTE: We no longer create a global MODAL_APP here.
# modal.App.lookup() returns a transient handle whose _client expires between
# Jupyter cells.  Instead, ModalInterpreter defers the lookup to start() time
# via the `app_name` parameter, keeping the client fresh.
SANDBOX_IMAGE = modal.Image.debian_slim().pip_install("numpy", "pandas")

MODAL_APP_NAME = "dspy-rlm-modal"

print("✓ Set up Modal Image and App name")
print(f"  App name: {MODAL_APP_NAME} (resolved lazily at sandbox start)")
print("  Image: debian_slim + numpy, pandas")
print()
print("✓ Imported ModalInterpreter and sandbox_driver from package")
print(f"  ModalInterpreter: {ModalInterpreter.__module__}")
print(f"  sandbox_driver: {sandbox_driver.__module__}")

✓ Set up Modal Image and App name
  App name: dspy-rlm-modal (resolved lazily at sandbox start)
  Image: debian_slim + numpy, pandas

✓ Imported ModalInterpreter and sandbox_driver from package
  ModalInterpreter: fleet_rlm.interpreter
  sandbox_driver: fleet_rlm.driver


## 4. Implement the `ModalInterpreter`

This class implements DSPy's [`CodeInterpreter`](https://github.com/stanfordnlp/dspy/blob/main/dspy/primitives/code_interpreter.py) protocol. The protocol requires:

| Method | Purpose |
|---|---|
| `tools` (property) | Dict of callable tools available in the sandbox |
| `start()` | Initialize resources (idempotent) |
| `execute(code, variables)` | Run code, return stdout or `FinalOutput` |
| `shutdown()` | Release resources |

Our implementation creates a `modal.Sandbox`, launches the driver program, and communicates via stdin/stdout JSON messages.
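
For orientation, the sketch below shows the four members from the table on a toy, in-process class. `LocalExecInterpreter` is a hypothetical name, the class is not sandboxed, and it omits the `FinalOutput` return path that the real interpreter takes when `SUBMIT()` is called. Treat it as an illustration of the call pattern only.

```python
import contextlib
import io
from typing import Any, Callable, Dict, Optional


class LocalExecInterpreter:
    """Toy interpreter exposing tools, start(), execute(), and shutdown().

    Hypothetical and NOT sandboxed: code runs in the current process.
    """

    def __init__(self, tools: Optional[Dict[str, Callable[..., Any]]] = None):
        self._tools = dict(tools or {})
        self._globals: Optional[Dict[str, Any]] = None

    @property
    def tools(self) -> Dict[str, Callable[..., Any]]:
        return self._tools

    def start(self) -> None:
        # Idempotent: a second call leaves the existing state untouched.
        if self._globals is None:
            self._globals = dict(self._tools)

    def execute(self, code: str, variables: Optional[Dict[str, Any]] = None) -> str:
        self.start()
        self._globals.update(variables or {})  # inject inputs as plain variables
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self._globals)
        return buffer.getvalue()

    def shutdown(self) -> None:
        # Drop all state; a later start() begins from a clean namespace.
        self._globals = None
```

The Modal-backed version follows the same shape, but `start()` creates the sandbox and `execute()` sends the code to the driver instead of calling `exec()` locally.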

In [6]:
# The ModalInterpreter class has been imported from the package above.
# It implements DSPy's CodeInterpreter protocol with these key features:

print("ModalInterpreter Configuration")
print("=" * 60)
print(f"Image: {SANDBOX_IMAGE}")
print(f"App name: {MODAL_APP_NAME}")
print("Features:")
print("  • Deferred App.lookup() — avoids stale _client across cells")
print("  • Modal sandbox lifecycle management (idempotent start)")
print("  • JSON protocol communication with driver")
print("  • Custom tools support")
print("  • Volume support (Volumes V2)")
print("  • upload_to_volume() — batch upload local dirs/files")
print("  • Timeout & idle_timeout configuration")
print("  • Sensitive text redaction")
print("=" * 60)

ModalInterpreter Configuration
Image: Image(<function _Image.pip_install.<locals>.build_dockerfile at 0x1132fec00>)
App name: dspy-rlm-modal
Features:
  • Deferred App.lookup() — avoids stale _client across cells
  • Modal sandbox lifecycle management (idempotent start)
  • JSON protocol communication with driver
  • Custom tools support
  • Volume support (Volumes V2)
  • upload_to_volume() — batch upload local dirs/files
  • Timeout & idle_timeout configuration
  • Sensitive text redaction


## 5. Basic RLM Demo: Code Generation

A simple example showing RLM writing Python code to solve a problem.

In [7]:
# Ensure the planner LM is configured
if not PLANNER_READY and dspy.settings.lm is None:
    raise RuntimeError("Planner LM not configured")

interpreter = ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME)

rlm = dspy.RLM(
    signature="question -> answer",
    interpreter=interpreter,
    max_iterations=15,
    max_llm_calls=30,
    verbose=True,
)

try:
    result = rlm(question="What are the first 12 Fibonacci numbers? Return as comma-separated.")
    print("\nFINAL ANSWER:", result.answer)
finally:
    interpreter.shutdown()

2026/02/07 23:36:04 INFO dspy.predict.rlm: RLM iteration 1/15
Reasoning: The question asks for the first 12 Fibonacci numbers, separated by commas. I will write a simple Python script to calculate these numbers. By convention, the Fibonacci sequence starts with 0 and 1, or sometimes 1 and 1. I will provide the sequence starting from 0, 1, 1, 2... and confirm if 12 numbers are generated.
Code:
```python
def fibonacci(n):
    fib_sequence = [0, 1]
    while len(fib_sequence) < n:
        fib_sequence.append(fib_sequence[-1] + fib_sequence[-2])
    return fib_sequence[:n]

first_12 = fibonacci(12)
result = ", ".join(map(str, first_12))
print(result)
```
2026/02/07 23:36:07 INFO dspy.predict.rlm: RLM iteration 2/15
Reasoning: The previous step successfully calculated the first 12 Fibonacci numbers starting from 0: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89. I will now submit this result as requested.
Code:
```python
SUBMIT("0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89")
```



FINAL ANSWER: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89


## 6. Core Capability: Long Document Analysis

RLM treats a long document as an external environment: the document lives in the sandbox,
generated code navigates it and extracts the relevant sections, and only those snippets are sent to `llm_query()`.
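
As a rough illustration of that navigation step, the sketch below pulls short windows of text around keyword matches and turns them into prompts. The helpers `keyword_windows` and `build_prompts` are hypothetical; code that RLM actually generates will differ from run to run.

```python
import re


def keyword_windows(text: str, keyword: str, radius: int = 200, limit: int = 5) -> list:
    """Return up to `limit` windows of `text` centred on matches of `keyword`."""
    windows = []
    last_end = -1
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        if match.start() < last_end:
            continue  # this match is already covered by the previous window
        start = max(0, match.start() - radius)
        end = min(len(text), match.end() + radius)
        windows.append(text[start:end])
        last_end = end
        if len(windows) >= limit:
            break
    return windows


def build_prompts(text: str, keyword: str, question: str) -> list:
    """Pair the question with each extracted window, ready for a sub-LLM call."""
    return [f"{question}\n\n---\n{window}" for window in keyword_windows(text, keyword)]
```

Only the strings returned by `build_prompts` would be passed to `llm_query()`; the full document never enters a prompt.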

### Use Case: Extract DSPy Architecture

In [8]:
class ExtractArchitecture(dspy.Signature):
    """Extract architectural information from DSPy documentation."""
    
    docs: str = dspy.InputField(desc="Full DSPy documentation text")
    query: str = dspy.InputField(desc="What to extract")
    modules: list = dspy.OutputField(desc="List of DSPy modules")
    optimizers: list = dspy.OutputField(desc="List of optimizers")
    design_principles: str = dspy.OutputField(desc="Key design principles")


docs_path = PROJECT_ROOT / "rlm_content" / "dspy-knowledge" / "dspy-doc.txt"
with open(docs_path, "r") as f:
    dspy_docs = f.read()

print(f"✓ Loaded DSPy docs: {len(dspy_docs):,} chars from {docs_path.name}")

✓ Loaded DSPy docs: 81,397 chars from dspy-doc.txt


## 7. Parallel Processing with llm_query_batched()

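Independent chunks can be analyzed concurrently: the generated code builds a list of prompts and passes it to `llm_query_batched()`, so total latency is roughly that of the slowest call rather than the sum of all calls.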
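
The pattern can be sketched locally with a thread pool. Here `chunk_text` and `query_batched` are hypothetical stand-ins, and `query_fn` takes the place of a real sub-LLM call; inside the sandbox, the built-in `llm_query_batched(prompts)` fills that role.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, chunk_size: int) -> list:
    """Split `text` into consecutive chunks of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def query_batched(prompts: list, query_fn, max_workers: int = 8) -> list:
    """Apply `query_fn` to every prompt concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(query_fn, prompts))
```

For example, `query_batched(chunk_text(docs, 8000), my_query_fn)` returns one answer per chunk, in the same order as the chunks, where `my_query_fn` is whatever callable sends a prompt to a model.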

In [9]:
class ExtractAPIEndpoints(dspy.Signature):
    """Extract API endpoints using batched analysis."""
    
    docs: str = dspy.InputField(desc="API documentation")
    api_endpoints: list = dspy.OutputField(desc="List of API endpoints")


interpreter = ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME)

rlm = dspy.RLM(
    signature=ExtractAPIEndpoints,
    interpreter=interpreter,
    max_iterations=20,
    max_llm_calls=30,
    verbose=True,
)

try:
    result = rlm(docs=dspy_docs)
    print(f"Found {len(result.api_endpoints)} endpoints")
    for ep in result.api_endpoints[:5]:
        print(f"  - {ep}")
finally:
    interpreter.shutdown()

2026/02/07 23:36:07 INFO dspy.predict.rlm: RLM iteration 1/20
Reasoning: The goal is to extract API endpoints from the `docs` variable. The preview shows a directory structure of the DSPy documentation. API endpoints in this context likely refer to class/method definitions or REST endpoints, but given this is a Python library (DSPy), it's more likely to be documentation of programmatic APIs (classes, functions, etc.).

I will start by exploring the contents of `docs` to understand the structure and how the endpoints are documented. Since the documentation is large (~81k characters), I'll print the first few thousand characters and check if there are explicit lists of endpoints or if they are scattered across markdown files.

Plan:
1. Print the beginning of `docs` to understand the format.
2. Search for common API keywords (e.g., "GET", "POST", "class", "def", "endpoint").
3. Determine if the documentation contains REST endpoints or programmatic API references.
4. Use `llm_query` or `ll

Found 359 endpoints
  - __call__
  - acall
  - adapt_to_native_lm_feature
  - aforward
  - append


## 8. Stateful Multi-Step Reasoning

RLM maintains state across iterations: variables defined in one step remain available in later steps, which enables multi-step workflows.

In [10]:
class FindErrorPatterns(dspy.Signature):
    """Find and categorize error patterns."""
    
    docs: str = dspy.InputField(desc="Documentation text")
    error_categories: dict = dspy.OutputField(desc="Error types mapped to solutions")
    total_errors_found: int = dspy.OutputField(desc="Total errors identified")


def get_errors(docs: str, verbose: bool = True) -> dict:
    """Find and categorize error patterns in documentation using RLM.
    
    Args:
        docs: The documentation text to analyze
        verbose: Whether to print RLM traces
        
    Returns:
        Dict with 'total' count and 'categories' dict
    """
    interpreter = ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME)

    rlm = dspy.RLM(
        signature=FindErrorPatterns,
        interpreter=interpreter,
        max_iterations=30,
        max_llm_calls=40,
        verbose=verbose,
    )

    try:
        result = rlm(docs=docs)
        return {
            "total": result.total_errors_found,
            "categories": result.error_categories
        }
    finally:
        interpreter.shutdown()


# Example usage
error_data = get_errors(dspy_docs)

print(f"Found {error_data['total']} error patterns")
for cat, errors in error_data['categories'].items():
    print(f"{cat}: {len(errors)} errors")

2026/02/07 23:36:15 INFO dspy.predict.rlm: RLM iteration 1/30

Plan:
1. Explore the structure of `docs`.
2. Extract sections or lines containing error-related keywords.
3. Use `llm_query` to categorize these errors if the data is dense or complex.
Code:
```python
# Initial exploration of the docs content
print(f"Total length of docs: {len(docs)}")
print("First 2000 characters of docs:")
print(docs[:2000])

# Look for specific keywords related to errors
import re

found_keywords = {}
for kw in keywords:
    matches = re.findall(kw, docs, re.IGNORECASE)
    found_keywords[kw] = len(matches)

print("\nKeyword frequency:")
print(found_keywords)

# Let's try to find some snippets containing 'error' or 'exception' to see context
snippets = []
    snippets.append(match.group(0))
    if len(snippets) >= 10:
        break

print("\nSample snippets containing keywords:")
for s in snippets:
    print(f"- {s}")
```
2026/02/07 23:36:19 INFO dspy.predict.rlm: RLM iteration 2/30
Reasoning: The keywor

Found 18 error patterns
Input & Serialization Failures: 3 errors
Model Output & Parsing Failures: 5 errors
Execution & Logic Failures: 4 errors
Evaluation & Reliability Failures: 3 errors
Configuration & Meta-Programming Failures: 3 errors


## 9. Inspecting the Trajectory

Every RLM result includes a trajectory: the complete history of reasoning, code, and outputs for each iteration.
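
A small helper can make long trajectories easier to scan. The sketch below assumes each step is a dict with `reasoning` and `code` keys, as the cell in this section does; `summarize_trajectory` is a hypothetical name, not part of DSPy.

```python
def summarize_trajectory(trajectory, width: int = 80) -> list:
    """Return one line per step: truncated reasoning plus the first line of code."""
    lines = []
    for index, step in enumerate(trajectory, start=1):
        # Collapse newlines and repeated spaces so each step fits on one line.
        reasoning = " ".join(str(step.get("reasoning", "")).split())
        code = str(step.get("code", "")).strip()
        first_code_line = code.splitlines()[0] if code else ""
        lines.append(f"{index}. {reasoning[:width]} | {first_code_line[:width]}")
    return lines
```

Calling `print("\n".join(summarize_trajectory(result.trajectory)))` would give a compact overview before drilling into individual steps.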

In [11]:
interpreter = ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME)

rlm = dspy.RLM(
    signature="text -> summary",
    interpreter=interpreter,
    max_iterations=10,
    max_llm_calls=10,
    verbose=False,
)

try:
    text_sample = dspy_docs[:3000]
    result = rlm(text=text_sample)
    
    print(f"Trajectory ({len(result.trajectory)} steps):\n")
    for i, step in enumerate(result.trajectory):
        print(f"\nStep {i+1}:")
        print(f"  Reasoning: {step.get('reasoning', 'N/A')[:100]}...")
        print(f"  Code: {step.get('code', '')[:60]}...")
finally:
    interpreter.shutdown()

Trajectory (3 steps):


Step 1:
  Reasoning: I need to summarize the content provided in the `text` variable, which appears to be a directory str...
  Code: print(f"Total length: {len(text)}")
print("--- Full Text Sta...

Step 2:
  Reasoning: The text provides a detailed directory structure of the `stanfordnlp-dspy` repository, specifically ...
  Code: prompt = f"""Summarize the purpose and structure of the proj...

Step 3:
  Reasoning: The previous step successfully generated a comprehensive and well-structured summary of the `stanfor...
  Code: # The summary was already generated in the previous step.
# ...


## 10. Persistent Storage with Modal Volumes V2

Modal Volumes V2 provide persistent storage across sandbox sessions with better performance and consistency. This is useful for:
- Caching large documents to avoid re-uploading
- Storing intermediate results
- Sharing data between multiple RLM runs
- **Hosting knowledge files** (RLM paper, DSPy docs) directly in the sandbox filesystem

### Volume Setup
First, create a V2 volume (one-time setup):
```bash
modal volume create --version=2 rlm-volume-dspy
```

### Volumes V2 Key Features
- Uses `modal.Volume.from_name(name, create_if_missing=True, version=2)`
- Mounts via `volumes={"/data": volume}` in `Sandbox.create()`
- **No file count limit** (V1 had a 500K inode limit)
- **Concurrent writes** from hundreds of containers
- Commit via `sync /data` from inside the sandbox (V2 only)
- Background commits persist data automatically on container shutdown

### Uploading Files from Local Machine
```python
vol = modal.Volume.from_name("my-volume", create_if_missing=True, version=2)
with vol.batch_upload() as batch:
    batch.put_directory("/local/dir", "/remote/dir")
    batch.put_file("local.txt", "/remote/file.txt")
```

Or via `ModalInterpreter.upload_to_volume()`:
```python
interpreter.upload_to_volume(
    local_dirs={"rlm_content/dspy-knowledge": "/dspy-knowledge"},
)
```

In [12]:
# Upload rlm-knowledge/ and dspy-knowledge/ to the Modal Volume
# This makes files available at /data/rlm-knowledge/ and /data/dspy-knowledge/
# inside every sandbox that mounts the volume.

VOLUME_NAME = "rlm-volume-dspy"

rlm_knowledge_dir = str(PROJECT_ROOT / "rlm_content" / "rlm-knowledge")
dspy_knowledge_dir = str(PROJECT_ROOT / "rlm_content" / "dspy-knowledge")

# Verify local directories exist
for d in [rlm_knowledge_dir, dspy_knowledge_dir]:
    assert os.path.isdir(d), f"Directory not found: {d}"

# Use ModalInterpreter's upload helper
upload_interpreter = ModalInterpreter(
    image=SANDBOX_IMAGE,
    app_name=MODAL_APP_NAME,
    volume_name=VOLUME_NAME,
)

upload_interpreter.upload_to_volume(
    local_dirs={
        rlm_knowledge_dir: "/rlm-knowledge",
        dspy_knowledge_dir: "/dspy-knowledge",
    },
)

print(f"✓ Uploaded knowledge directories to volume '{VOLUME_NAME}':")
print(f"  {rlm_knowledge_dir} → /data/rlm-knowledge/")
print(f"  {dspy_knowledge_dir} → /data/dspy-knowledge/")
print()

# List uploaded files for confirmation
vol = modal.Volume.from_name(VOLUME_NAME, create_if_missing=True, version=2)
print("Volume contents:")
for entry in vol.listdir("/"):
    print(f"  /{entry.path}")
    if entry.type.name == "DIRECTORY":
        for sub in vol.listdir(f"/{entry.path}"):
            size_str = f" ({sub.stat().size:,} bytes)" if hasattr(sub, 'stat') and sub.stat() else ""
            # `sub.path` is already relative to the volume root, so don't prefix the parent again.
            print(f"    /{sub.path}{size_str}")

Volume: '/rlm-knowledge' exists, skipping upload.
Volume: '/dspy-knowledge' exists, skipping upload.
✓ Uploaded knowledge directories to volume 'rlm-volume-dspy':
  /Volumes/Samsung-SSD-T7/Workspaces/Github/qredence/agent-framework/v0.5/_WORLD/_RLM/fleet-rlm-dspy/rlm_content/rlm-knowledge → /data/rlm-knowledge/
  /Volumes/Samsung-SSD-T7/Workspaces/Github/qredence/agent-framework/v0.5/_WORLD/_RLM/fleet-rlm-dspy/rlm_content/dspy-knowledge → /data/dspy-knowledge/

Volume contents:
  /dspy-knowledge
    /dspy-knowledge/dspy-RLM.md
    /dspy-knowledge/dspy-doc.txt
    /dspy-knowledge/mermaid.md
  /rlm-knowledge
    /rlm-knowledge/rlm-pape.pdf
    /rlm-knowledge/rlm-paper.md
  /notebook-demo.json
  /dspy-doc-cached.txt


In [13]:
# Demonstrate reading uploaded knowledge files from the volume inside a sandbox
volume_interpreter = ModalInterpreter(
    image=SANDBOX_IMAGE,
    app_name=MODAL_APP_NAME,
    volume_name=VOLUME_NAME,
    timeout=600,
)

print("✓ Created ModalInterpreter with Volumes V2 support")
print(f"  Volume: {VOLUME_NAME} → mounted at /data/")
print()

# Start sandbox and verify knowledge files are accessible
volume_interpreter.start()

code = """
import pathlib, json

data_dir = pathlib.Path("/data")
print(f"Volume mounted: {data_dir.exists()}")

# List top-level contents
top_level = sorted([p.name for p in data_dir.iterdir()])
print(f"Top-level dirs: {top_level}")

# List dspy-knowledge contents
dspy_dir = data_dir / "dspy-knowledge"
if dspy_dir.exists():
    files = sorted(p.name for p in dspy_dir.iterdir())
    print(f"dspy-knowledge/: {files}")
    for f in dspy_dir.iterdir():
        size = f.stat().st_size
        print(f"  {f.name}: {size:,} bytes")

# List rlm-knowledge contents
rlm_dir = data_dir / "rlm-knowledge"
if rlm_dir.exists():
    files = sorted(p.name for p in rlm_dir.iterdir())
    print(f"rlm-knowledge/: {files}")
    for f in rlm_dir.iterdir():
        size = f.stat().st_size
        print(f"  {f.name}: {size:,} bytes")

# Read a snippet of dspy-RLM.md to confirm content
rlm_doc = data_dir / "dspy-knowledge" / "dspy-RLM.md"
if rlm_doc.exists():
    content = rlm_doc.read_text()
    print(f"\\ndspy-RLM.md preview ({len(content):,} chars):")
    print(content[:300])
    print("...")

SUBMIT(result="volume knowledge files verified")
"""

# Always release the sandbox, even if execution fails.
try:
    result = volume_interpreter.execute(code)
    print(f"\nSandbox result: {result}")
finally:
    volume_interpreter.shutdown()
print("\n✓ Knowledge files accessible inside sandbox via volume mount.")

✓ Created ModalInterpreter with Volumes V2 support
  Volume: rlm-volume-dspy → mounted at /data/


Sandbox result: FinalOutput({'result': 'volume knowledge files verified'})

✓ Knowledge files accessible inside sandbox via volume mount.


### RLM on Volume-Hosted Docs

The key benefit of hosting knowledge files on a Volume: the RLM sandbox can read
documents directly from `/data/` instead of serializing them through Python variables.
This is the canonical RLM pattern — treat context as an **external environment**, not input.

In [14]:
# RLM task that reads docs directly from the Volume filesystem
# instead of passing them as a Python variable.
# The sandbox reads /data/dspy-knowledge/dspy-RLM.md and extracts info.

class ExtractRLMCapabilities(dspy.Signature):
    """Extract RLM capabilities from documentation stored on a Volume.
    
    Strategy:
    1. Read /data/dspy-knowledge/dspy-RLM.md from the volume
    2. Search for constructor parameters, built-in tools, and usage patterns
    3. Use llm_query() on relevant sections for semantic extraction
    """
    
    query: str = dspy.InputField(desc="What to extract from the RLM docs")
    parameters: list = dspy.OutputField(desc="List of RLM constructor parameters")
    builtin_tools: list = dspy.OutputField(desc="List of built-in sandbox tools")
    key_concepts: str = dspy.OutputField(desc="Summary of key RLM concepts")


vol_rlm_interpreter = ModalInterpreter(
    image=SANDBOX_IMAGE,
    app_name=MODAL_APP_NAME,
    volume_name=VOLUME_NAME,
    timeout=600,
)

rlm = dspy.RLM(
    signature=ExtractRLMCapabilities,
    interpreter=vol_rlm_interpreter,
    max_iterations=15,
    max_llm_calls=20,
    verbose=True,
)

try:
    # NOTE: We do NOT pass docs as input — the RLM reads them from /data/ in the sandbox!
    result = rlm(
        query="Read /data/dspy-knowledge/dspy-RLM.md and extract all constructor parameters, built-in tools, and key concepts about how RLM works."
    )
    print(f"\nParameters: {result.parameters}")
    print(f"Built-in tools: {result.builtin_tools}")
    print(f"Key concepts: {result.key_concepts[:300]}...")
finally:
    vol_rlm_interpreter.shutdown()

2026/02/07 23:36:34 INFO dspy.predict.rlm: RLM iteration 1/15
Reasoning: I will start by reading the contents of the file `/data/dspy-knowledge/dspy-RLM.md` to understand the documentation structure and identify the relevant sections for constructor parameters, built-in tools, and key concepts.
Code:
```python
import os

file_path = "/data/dspy-knowledge/dspy-RLM.md"

if os.path.exists(file_path):
    with open(file_path, 'r') as f:
        content = f.read()
    print(f"File length: {len(content)}")
    print("First 1000 characters:")
    print(content[:1000])
else:
    print(f"File {file_path} not found.")
```
2026/02/07 23:36:36 INFO dspy.predict.rlm: RLM iteration 2/15
Reasoning: The file is roughly 8KB, which is small enough to read entirely into memory. I'll read the full content and then use `llm_query` to extract the specific lists of constructor parameters, built-in tools, and key concepts as requested. I'll structure the prompt to ensure I get structured data for the lists.
C


Parameters: ["signature: Defines inputs and outputs (e.g., 'context, query -> answer')", 'max_iterations: Maximum REPL interaction loops before fallback extraction', 'max_llm_calls: Maximum llm_query/llm_query_batched calls per execution', 'max_output_chars: Maximum characters to include from REPL output', 'verbose: Log detailed execution info', 'tools: Additional tool functions callable from interpreter code', 'sub_lm: LM for sub-queries (defaults to dspy.settings.lm)', 'interpreter: Custom interpreter (defaults to PythonInterpreter via Deno/Pyodide WASM)']
Built-in tools: ['llm_query(prompt): Query a sub-LLM for semantic analysis', 'llm_query_batched(prompts): Query multiple prompts concurrently', 'print(): Required to see results in the REPL output', 'SUBMIT(...): Submit final output and end execution', 'Standard library: re, json, collections, math, etc.']
Key concepts: ["Context Separation: Solves 'context rot' by separating the variable space (stored in the REPL) from the token 

In [15]:
# CLI Usage Examples
print("="*60)
print("CLI Commands - Using the Package")
print("="*60)

print("""
The fleet-rlm package provides a Typer CLI with the following commands:

1. Basic Code Generation:
   $ uv run fleet-rlm run-basic \\
       --question "What are the first 10 Fibonacci numbers?" \\
       --volume-name rlm-volume-dspy

2. Architecture Extraction:
   $ uv run fleet-rlm run-architecture \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \\
       --query "Extract all modules and optimizers" \\
       --volume-name rlm-volume-dspy

3. API Endpoint Extraction:
   $ uv run fleet-rlm run-api-endpoints \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \\
       --volume-name rlm-volume-dspy

4. Error Pattern Analysis:
   $ uv run fleet-rlm run-error-patterns \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt \\
       --volume-name rlm-volume-dspy

5. Execution Trajectory:
   $ uv run fleet-rlm run-trajectory \\
       --docs-path rlm_content/dspy-knowledge/dspy-doc.txt

6. Custom Tool Demo:
   $ uv run fleet-rlm run-custom-tool \\
       --text "Extract emails from [email protected] and [email protected]"

7. Check Modal Secrets:
   $ uv run fleet-rlm check-secret
   $ uv run fleet-rlm check-secret-key --key DSPY_LLM_API_KEY

Volume Setup (One-time):
   $ uv run modal volume create rlm-volume-dspy
""")

print("=" * 60)
print("All Examples Support --volume-name Parameter")
print("=" * 60)
print("Use --volume-name to enable persistent storage for:")
print("  • Document caching")
print("  • Intermediate results")
print("  • Data sharing between runs")
print()
print("Data is persisted at /data/ inside the Modal sandbox")
print("and survives sandbox shutdown using Modal Volumes V2")


## 11. Custom Tools

RLM supports custom Python tools that the generated code can call. Below we define a `regex_extract()` tool and use it in an RLM signature.

In [16]:
# Define a custom tool function
def regex_extract(text: str, pattern: str, flags: int = 0) -> list:
    """Extract all matches of regex pattern from text.
    
    Args:
        text: Source text to search
        pattern: Regex pattern string
        flags: Regex flags (e.g., re.IGNORECASE=2)
    
    Returns:
        List of match groups or full matches
    """
    import re
    compiled = re.compile(pattern, flags)
    matches = compiled.findall(text)
    return matches


class ExtractWithCustomTool(dspy.Signature):
    """Extract specific patterns using custom regex tool.
    
    Strategy:
    1. Use regex_extract() to find all markdown headers
    2. Use regex_extract() to find all code blocks
    3. Summarize structure
    """
    
    docs: str = dspy.InputField(desc="Documentation to analyze")
    headers: list = dspy.OutputField(desc="All markdown headers found")
    code_blocks: list = dspy.OutputField(desc="All code block languages found")
    structure_summary: str = dspy.OutputField(desc="Summary of document structure")


interpreter = ModalInterpreter(image=SANDBOX_IMAGE, app_name=MODAL_APP_NAME)

rlm = dspy.RLM(
    signature=ExtractWithCustomTool,
    interpreter=interpreter,
    tools=[regex_extract],  # Pass custom tool here
    max_iterations=15,
    max_llm_calls=20,
    verbose=True,
)

try:
    result = rlm(docs=dspy_docs[:10000])  # First 10,000 characters (~10 KB) for the demo
    print(f"\nFound {len(result.headers)} headers")
    print(f"Found {len(result.code_blocks)} code blocks")
    print(f"\nStructure: {result.structure_summary[:200]}...")
finally:
    interpreter.shutdown()

2026/02/07 23:36:36 INFO dspy.predict.rlm: RLM iteration 1/15
Reasoning: I will start by exploring the content of `docs` to understand its structure and how headers and code blocks are formatted. This will help me craft accurate regex patterns. I'll print the first 2000 characters and check for common markdown patterns.
Code:
```python
print(f"Total length: {len(docs)}")
print("First 2000 characters:")
print(docs[:2000])
```
2026/02/07 23:36:38 INFO dspy.predict.rlm: RLM iteration 2/15
Reasoning: The initial exploration shows a directory structure but not the actual markdown content yet. I need to see more of the document to find the markdown headers (e.g., `# Header`, `## Header`) and code blocks (e.g., ` ```python ... ``` `). I'll print a segment from the middle and the end of the string to see where the actual content starts.
Code:
```python
# The first part was just a directory tree. Let's look further in.
print("Middle segment (4000-6000):")
print(docs[4000:6000])
print("\nEnd seg
```


Found 3 headers
Found 2 code blocks

Structure: The document consists of a directory tree followed by specific file contents for the stanfordnlp-dspy project. It includes API documentation for dspy.Adapter and dspy.ChatAdapter using custom ::: bloc...


## 12. RLM vs Direct LLM Comparison

| Aspect | Direct LLM | RLM |
|--------|-----------|-----|
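| **Context size** | Limited by the model's context window (e.g., ~128K tokens) | Limited by sandbox memory rather than the context window |
| **Attention** | Dilutes over long context | Focused (code selects snippets) |
| **Cost** | High (every token is sent to the model) | Often lower (targeted sub-LLM calls, capped by `max_llm_calls`) |
| **Accuracy** | Tends to degrade on long documents | Typically higher on long documents (targeted analysis) |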
| **Verifiability** | Black box | Transparent (full trajectory) |
| **Tool use** | Limited | Full Python + custom tools |
| **Iterative refinement** | Manual (chat) | Automated (code loops) |
| **Structured output** | Prompt-dependent | Type-enforced via Signature |

### When to use RLM:
- Documents > 50KB
- Need structured extraction (lists, dicts, nested data)
- Multi-step analysis (filter → extract → validate)
- Need programmatic validation or computation
- Repetitive analysis across many documents

### When NOT to use RLM:
- Simple Q&A on short text (< 1K tokens)
- Creative writing or brainstorming
- Tasks that don't benefit from code execution
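
The two lists above can be folded into a rough routing check. The sketch below is only an illustration: `should_use_rlm` and its constants are hypothetical names, the 50KB and 1K-token thresholds come from the lists above, and the 4-characters-per-token ratio is an assumed approximation for English text.

```python
RLM_MIN_BYTES = 50_000      # "Documents > 50KB" from the list above
SHORT_TEXT_TOKENS = 1_000   # "short text (< 1K tokens)" from the list above
CHARS_PER_TOKEN = 4         # assumption: rough average for English text


def should_use_rlm(
    text: str,
    needs_structured_output: bool = False,
    needs_computation: bool = False,
) -> bool:
    """Hypothetical heuristic: return True when RLM is likely worth its overhead."""
    approx_tokens = len(text) / CHARS_PER_TOKEN
    # Short inputs go straight to the LLM unless code execution is required.
    if approx_tokens < SHORT_TEXT_TOKENS and not needs_computation:
        return False
    size_bytes = len(text.encode("utf-8"))
    return size_bytes > RLM_MIN_BYTES or needs_structured_output or needs_computation


print(should_use_rlm("What is DSPy?"))
print(should_use_rlm("x" * 60_000))
```

Treat the result as a starting point; creative tasks, for example, are better served by a direct call regardless of input size.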

## 13. RLM Best Practices

### Signature Design

1. **Describe the strategy** in the docstring:
   ```python
   class MySignature(dspy.Signature):
       """Extract X from Y.
       
       Strategy:
       1. Search for headers containing 'X'
       2. Use llm_query() on matching sections
       3. Aggregate results
       """
   ```

2. **Use typed output fields** — `list`, `dict`, `int` guide the code:
   ```python
   items: list = dspy.OutputField(desc="List of found items")
   count: int = dspy.OutputField(desc="Total count")
   ```

### Tuning Parameters

| Parameter | Typical Range | Notes |
|-----------|---------------|-------|
| `max_iterations` | 10-50 | Complex docs need more iterations |
| `max_llm_calls` | 20-100 | Primary cost control |
| `max_output_chars` | 10K-100K | Prevents output flooding |
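
One way to pick values inside these ranges is to scale them with document size. The helper below is a minimal sketch: `rlm_budget` is a hypothetical name, and the scaling rules (one iteration per roughly 5,000 characters, two sub-LLM calls per iteration, output capped at half the document length) are assumptions for illustration, not recommendations from DSPy.

```python
def rlm_budget(doc_chars: int) -> dict:
    """Hypothetical helper: scale RLM limits with document size.

    Every value is clamped to the typical ranges in the table above.
    """

    def clamp(value: int, low: int, high: int) -> int:
        return max(low, min(high, value))

    # Assumed rule of thumb: one iteration per ~5,000 characters.
    iterations = clamp(doc_chars // 5_000, 10, 50)
    return {
        "max_iterations": iterations,
        # Assumed rule of thumb: about two sub-LLM calls per iteration.
        "max_llm_calls": clamp(iterations * 2, 20, 100),
        "max_output_chars": clamp(doc_chars // 2, 10_000, 100_000),
    }


small = rlm_budget(10_000)    # short document: every value hits its lower bound
large = rlm_budget(500_000)   # long document: every value hits its upper bound
print(small)
print(large)
```

The returned keys match the `dspy.RLM` parameter names used in this notebook, so the dict can be unpacked into the constructor, e.g. `dspy.RLM(signature=..., interpreter=..., **rlm_budget(len(docs)))`.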

### Debugging Workflow

1. **Start with `verbose=True`**: See real-time reasoning and code
2. **Inspect `result.trajectory`**: Full execution history
3. **Test on subsets**: Use `docs[:5000]` before full runs
4. **Check sandbox logs**: Modal's logs show what actually executed in the sandbox
5. **Validate tools**: Test custom tools independently
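
For the last step, a custom tool can be exercised locally before it is handed to the RLM. The sketch below redefines `regex_extract` from section 11 so that it runs on its own; the sample text and the two patterns are illustrative assumptions, not the patterns the model generated in the run above.

```python
import re


def regex_extract(text: str, pattern: str, flags: int = 0) -> list:
    """Same behavior as the tool in section 11: return all matches of pattern."""
    return re.compile(pattern, flags).findall(text)


FENCE = "`" * 3  # build the code-fence marker without writing it literally

# Small synthetic markdown document (an assumption for this test).
sample = "\n".join([
    "# Title",
    "Intro text.",
    "## Usage",
    FENCE + "python",
    "print('hi')",
    FENCE,
    FENCE + "bash",
    "ls",
    FENCE,
])

# ATX headers: one to six '#' characters at the start of a line.
headers = regex_extract(sample, r"^#{1,6}\s+(.+)$", re.MULTILINE)

# Opening fences that carry a language tag; bare closing fences do not match.
languages = regex_extract(sample, r"^`{3}(\w+)", re.MULTILINE)

assert headers == ["Title", "Usage"]
assert languages == ["python", "bash"]

# Flags pass through unchanged, e.g. case-insensitive matching.
assert regex_extract("Hello HELLO", "hello", re.IGNORECASE) == ["Hello", "HELLO"]
```

Checks like these separate regex mistakes from sandbox or model problems before any remote execution is involved.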

## 14. Summary

This notebook demonstrated the main capabilities of **dspy.RLM**:

1. **Basic code generation** - LLM writes and executes Python
2. **Long document analysis** - Process 80KB+ documents efficiently
3. **Parallel processing** - `llm_query_batched()` for speed
4. **Stateful reasoning** - Multi-step workflows with persistent variables
5. **Trajectory inspection** - Full transparency into reasoning
6. **Persistent storage** - Modal Volumes V2 for caching and persistence
7. **Custom tools** - Extend sandbox capabilities with user-defined functions

### Key Takeaways

- RLM treats long context as an **environment**, not input
- Code navigates data; `llm_query()` understands semantics
- The **trajectory** records each iteration's reasoning, code, and output, so every run can be inspected and audited
- **Modal Volumes V2** enable persistent storage across sandbox sessions
- All capabilities are available via both notebook and CLI (`fleet-rlm`)