Phase 2: RLM runtime primitives (llm_query, rlm_query, js, FINAL) #42

@Hmbown

Parent tracking issue: #40
Depends on: #41

Goal

Implement the execution runtime for repl statements so that llm_query, rlm_query, js, FINAL, etc. become first-class engine operations that bypass the JSON function-calling schema.

Primitives to implement

Statement                           Behavior                                                      Model used
----------------------------------  ------------------------------------------------------------  --------------------------
llm_query name = expr               One-shot completion, same depth, no tool access               Session model
rlm_query name = expr               Spawns a child RLM frame (depth+1) with its own repl loop     Default: deepseek-v4-flash
llm_query_batched name = a | b | c  Parallel one-shot completions                                 Session model
rlm_query_batched name = a | b | c  Parallel child RLM frames                                     Default: deepseek-v4-flash
js name = "..."                     Runs code in a sandboxed JS environment (Node vm or QuickJS)
FINAL(expr)                         Returns expr as the result of the current repl block
FINAL_VAR(name)                     Returns the value of name
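
To make the statement shapes in the table concrete, here is a rough sketch of how they might be represented and recognized. The `Statement` enum and `parse_statement` function are hypothetical names for illustration only; the real parser presumably belongs to #41, and this version splits batched prompts naively on `|` and strips quotes from `js` code without handling escapes.

```rust
/// One parsed repl statement (hypothetical shape, not the project's type).
#[derive(Debug, PartialEq)]
pub enum Statement {
    LlmQuery { name: String, expr: String },
    RlmQuery { name: String, expr: String },
    LlmQueryBatched { name: String, exprs: Vec<String> },
    RlmQueryBatched { name: String, exprs: Vec<String> },
    Js { name: String, code: String },
    Final(String),
    FinalVar(String),
}

/// Returns the text between `func(` and the trailing `)`, if the line has that form.
fn strip_call<'a>(line: &'a str, func: &str) -> Option<&'a str> {
    line.strip_prefix(func)?.strip_prefix('(')?.strip_suffix(')')
}

/// Parse a single line; returns None for anything unrecognized.
pub fn parse_statement(line: &str) -> Option<Statement> {
    let line = line.trim();
    if let Some(inner) = strip_call(line, "FINAL_VAR") {
        return Some(Statement::FinalVar(inner.trim().to_string()));
    }
    if let Some(inner) = strip_call(line, "FINAL") {
        return Some(Statement::Final(inner.trim().to_string()));
    }
    // Remaining forms look like `<keyword> <name> = <right-hand side>`.
    let (keyword, rest) = line.split_once(char::is_whitespace)?;
    let (name, rhs) = rest.split_once('=')?;
    let name = name.trim().to_string();
    let rhs = rhs.trim();
    if name.is_empty() {
        return None;
    }
    // Naive split: a `|` inside a quoted string would be split too.
    let split = |s: &str| s.split('|').map(|p| p.trim().to_string()).collect::<Vec<_>>();
    match keyword {
        "llm_query" => Some(Statement::LlmQuery { name, expr: rhs.to_string() }),
        "rlm_query" => Some(Statement::RlmQuery { name, expr: rhs.to_string() }),
        "llm_query_batched" => Some(Statement::LlmQueryBatched { name, exprs: split(rhs) }),
        "rlm_query_batched" => Some(Statement::RlmQueryBatched { name, exprs: split(rhs) }),
        "js" => Some(Statement::Js { name, code: rhs.trim_matches('"').to_string() }),
        _ => None,
    }
}
```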

Runtime architecture

ReplContext

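use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Per-frame state for one repl loop.
pub struct ReplContext {
    pub variables: HashMap<String, String>, // values bound by repl statements, keyed by name
    pub depth: usize,                       // recursion depth of this frame
    pub max_depth: usize,                   // rlm_query stores an error once depth >= max_depth
    pub max_iterations: usize,              // iteration limit for the repl loop
    pub iteration: usize,                   // iterations used so far
    pub root_prompt: String,                // the original user prompt for this frame
    pub parent_client: DeepSeekClient,
    pub child_model: String,                // default "deepseek-v4-flash"
    pub child_client: DeepSeekClient,       // configured with child_model
    pub usage_accumulator: Arc<Mutex<UsageAccumulator>>, // shared so the user sees one total
}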

llm_query execution

  1. Resolve expr to a string.
  2. Call child_client.create_message(...) with the string as the user content and no tools (flat completion).
  3. Store the response text in variables[name].
  4. Accumulate tokens into usage_accumulator.
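
The four steps above could be sketched roughly as follows. Everything except the field names `variables` and `usage_accumulator` is an assumption made for illustration: `CompletionClient` is a stand-in trait for whatever `DeepSeekClient::create_message` really looks like (the sketch is synchronous, the real call is presumably async), `Completion` and the `UsageAccumulator` fields are invented, and "resolve expr" is reduced to a variable lookup with a literal fallback. Which client the caller passes in is left open here.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical token counters; the real UsageAccumulator may track more.
#[derive(Debug, Default)]
pub struct UsageAccumulator {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Hypothetical reply shape for a flat completion.
pub struct Completion {
    pub text: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Stand-in for the real client: one user message in, no tools, text out.
pub trait CompletionClient {
    fn complete(&self, user_content: &str) -> Result<Completion, String>;
}

/// Test double that echoes the prompt and counts whitespace-separated words as tokens.
pub struct EchoClient;

impl CompletionClient for EchoClient {
    fn complete(&self, user_content: &str) -> Result<Completion, String> {
        let words = user_content.split_whitespace().count() as u64;
        Ok(Completion {
            text: format!("echo: {user_content}"),
            input_tokens: words,
            output_tokens: words + 1,
        })
    }
}

pub fn exec_llm_query(
    client: &dyn CompletionClient,
    variables: &mut HashMap<String, String>,
    usage: &Arc<Mutex<UsageAccumulator>>,
    name: &str,
    expr: &str,
) -> Result<(), String> {
    // Step 1 (assumption): a known variable name resolves to its value, anything else is literal text.
    let prompt = variables.get(expr).cloned().unwrap_or_else(|| expr.to_string());
    // Step 2: flat completion with no tools.
    let reply = client.complete(&prompt)?;
    // Step 3: bind the response text to the target variable.
    variables.insert(name.to_string(), reply.text);
    // Step 4: add this call's tokens to the shared accumulator.
    let mut acc = usage.lock().map_err(|_| "usage accumulator poisoned".to_string())?;
    acc.input_tokens += reply.input_tokens;
    acc.output_tokens += reply.output_tokens;
    Ok(())
}
```

Passing the client as a trait object keeps the primitive testable without network access, which may or may not match how the engine wires in `DeepSeekClient`.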

rlm_query execution

  1. Resolve expr to a string (this becomes the child's context).
  2. If depth >= max_depth, store an error string in variables[name] and continue.
  3. Otherwise, create a new ReplContext with:
    • depth = parent.depth + 1
    • root_prompt = resolved_expr
    • same clients and limits
  4. Send the prompt to the child model.
  5. If the child response contains repl blocks, recurse into repl_runtime.execute().
  6. If the child response contains FINAL(...), that value is stored in variables[name].
  7. If the child response contains neither, the raw text is stored in variables[name].
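
A minimal sketch of the depth guard (step 2), child frame creation (step 3), and the FINAL-or-raw-text fallback (steps 6 and 7). Step 5, recursing into the child's own repl blocks, is deliberately left out. `Frame` is a trimmed-down, hypothetical stand-in for `ReplContext`, the child model is modelled as a plain closure, and `extract_final` matches parentheses naively without understanding quoted strings.

```rust
use std::collections::HashMap;

/// Trimmed-down stand-in for ReplContext (hypothetical).
pub struct Frame {
    pub depth: usize,
    pub max_depth: usize,
    pub root_prompt: String,
    pub variables: HashMap<String, String>,
}

/// Returns the argument of the first `FINAL(...)` in `text`, balancing nested parentheses.
pub fn extract_final(text: &str) -> Option<&str> {
    let start = text.find("FINAL(")? + "FINAL(".len();
    let mut open = 1usize;
    for (i, ch) in text[start..].char_indices() {
        match ch {
            '(' => open += 1,
            ')' => {
                open -= 1;
                if open == 0 {
                    return Some(text[start..start + i].trim());
                }
            }
            _ => {}
        }
    }
    None // unbalanced: treat as if no FINAL was present
}

pub fn exec_rlm_query(
    parent: &mut Frame,
    child_model: &dyn Fn(&Frame) -> String,
    name: &str,
    resolved_expr: &str,
) {
    // Step 2: at the depth limit, store an error string and let the caller continue.
    if parent.depth >= parent.max_depth {
        let msg = format!("error: max_depth {} reached", parent.max_depth);
        parent.variables.insert(name.to_string(), msg);
        return;
    }
    // Step 3: the child runs one level deeper with the resolved expression as its prompt.
    let child = Frame {
        depth: parent.depth + 1,
        max_depth: parent.max_depth,
        root_prompt: resolved_expr.to_string(),
        variables: HashMap::new(),
    };
    // Step 4: send the prompt to the child model.
    let response = child_model(&child);
    // Steps 6 and 7: prefer the FINAL(...) value, otherwise keep the raw text.
    let final_value: Option<String> = extract_final(&response).map(|v| v.to_string());
    let value = final_value.unwrap_or(response);
    parent.variables.insert(name.to_string(), value);
}
```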

rlm_query_batched execution

  1. Resolve all prompt expressions.
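  2. Spawn N rlm_query futures concurrently. Because N is only known at runtime, use futures::future::join_all (results come back in input order) or FuturesUnordered (results arrive in completion order, so tag each future with its index). tokio::join! only accepts a fixed set of futures written out at compile time.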
  3. Collect results into a single indexed string:
    [0] <result 0>
    [1] <result 1>
    ...
    
  4. Store the concatenated string in variables[name].
  5. Crucial: accumulate usage from all children into the shared usage_accumulator so the user sees one total cost.
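
The fan-out, indexed collection, and shared usage accounting could look roughly like this. To stay dependency-free the sketch uses scoped OS threads from the standard library as a stand-in for async futures, so it illustrates the data flow rather than the intended tokio implementation. `run_batched`, the `run_one` callback, and the single `total_tokens` counter are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical accumulator with a single counter for brevity.
#[derive(Debug, Default)]
pub struct UsageAccumulator {
    pub total_tokens: u64,
}

/// Runs `run_one` for every prompt concurrently and returns the indexed, newline-joined results.
/// `run_one` returns (result text, tokens used) for one child.
pub fn run_batched<F>(
    prompts: &[String],
    usage: &Arc<Mutex<UsageAccumulator>>,
    run_one: F,
) -> String
where
    F: Fn(&str) -> (String, u64) + Sync,
{
    let results: Vec<String> = thread::scope(|scope| {
        // Spawn every child first so they all run concurrently.
        let handles: Vec<_> = prompts
            .iter()
            .map(|prompt| {
                let run_one = &run_one;
                scope.spawn(move || {
                    let (text, tokens) = run_one(prompt);
                    // Every child reports into the same accumulator.
                    usage.lock().unwrap().total_tokens += tokens;
                    text
                })
            })
            .collect();
        // Join in spawn order so result i always belongs to prompt i.
        handles
            .into_iter()
            .map(|h| h.join().expect("child panicked"))
            .collect()
    });
    results
        .iter()
        .enumerate()
        .map(|(i, r)| format!("[{i}] {r}"))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Joining handles in spawn order is what keeps the `[i]` labels aligned with the input prompts even when children finish out of order.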

js execution

For Phase 2, use a minimal sandbox:

  • Option A: Shell out to node -e "..." with a timeout (simplest, matches zigrlm's current approach).
  • Option B: Embed a JS engine (much heavier, defer to later phase).

The JS code should have access to a context global. If the code calls FINAL(...), that value is the result; otherwise the result is the value of the last expression.
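
A sketch of Option A, assuming a `node` binary on PATH. `wrap_js`, `js_string_literal`, and `run_node` are hypothetical helpers: the wrapper injects `context` from a JSON string, defines `FINAL`, and uses `eval` so the last expression's value is available as a fallback. This is only a child process with a timeout, not a real sandbox, and it does not inspect stderr or the exit status.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Quote `s` as a double-quoted JavaScript string literal.
pub fn js_string_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            _ => out.push(ch),
        }
    }
    out.push('"');
    out
}

/// Build the script handed to `node -e`. `context_json` must already be valid JSON.
pub fn wrap_js(code: &str, context_json: &str) -> String {
    let code_literal = js_string_literal(code);
    format!(
        "const context = {context_json};\n\
         let __final;\n\
         const FINAL = (v) => {{ __final = v; }};\n\
         const __last = eval({code_literal});\n\
         console.log(String(__final !== undefined ? __final : __last));\n"
    )
}

/// Run the script under node. Returns Ok(None) if the timeout expired and the child was killed.
pub fn run_node(script: &str, timeout: Duration) -> io::Result<Option<String>> {
    // Arguments are passed directly, so no shell quoting is involved.
    let mut child = Command::new("node")
        .arg("-e")
        .arg(script)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;
    let deadline = Instant::now() + timeout;
    // Poll for exit. Caveat: output larger than the pipe buffer blocks the child until the timeout.
    while child.try_wait()?.is_none() {
        if Instant::now() >= deadline {
            child.kill()?;
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
    let mut out = String::new();
    if let Some(mut stdout) = child.stdout.take() {
        stdout.read_to_string(&mut out)?;
    }
    Ok(Some(out.trim_end().to_string()))
}
```

A caller would combine the two, for example `run_node(&wrap_js("context.n * 2", "{\"n\": 21}"), Duration::from_secs(5))`, and store the returned string in `variables[name]`.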

Configuration

Add to ConfigToml / Settings:

[rlm]
enabled = true               # whether the engine checks for repl blocks
max_depth = 2
max_iterations = 20
child_model = "deepseek-v4-flash"
main_model = "deepseek-v4-pro"  # optional override
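
On the Rust side, the `[rlm]` table might map onto a struct like the one below. `RlmConfig` and `main_model_or` are hypothetical names, the defaults simply mirror the example values above (only the `child_model` default is stated elsewhere in this issue), and the serde derives a real config crate would need are omitted to keep the sketch dependency-free.

```rust
/// Hypothetical in-memory form of the `[rlm]` table.
#[derive(Debug, Clone, PartialEq)]
pub struct RlmConfig {
    pub enabled: bool,              // whether the engine checks for repl blocks
    pub max_depth: usize,
    pub max_iterations: usize,
    pub child_model: String,
    pub main_model: Option<String>, // optional override
}

impl Default for RlmConfig {
    fn default() -> Self {
        // Assumption: defaults copy the example table.
        Self {
            enabled: true,
            max_depth: 2,
            max_iterations: 20,
            child_model: "deepseek-v4-flash".to_string(),
            main_model: None,
        }
    }
}

impl RlmConfig {
    /// The override if one is set, otherwise the model the session already uses.
    pub fn main_model_or<'a>(&'a self, session_model: &'a str) -> &'a str {
        self.main_model.as_deref().unwrap_or(session_model)
    }
}
```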

Open questions

  • Should llm_query have tool access? For phase 2, no — keep it as a flat completion primitive. Tool access is what the root engine loop provides.
  • How do we prevent infinite rlm_query recursion? Depth limit + iteration limit + a max_calls budget (shared across the tree).
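
One possible shape for the shared max_calls budget is an atomic counter behind an `Arc`, cloned into every child frame so the whole tree draws from the same pool. `CallBudget` is a hypothetical type, not something that exists in the codebase, and where it would be checked (before each llm_query or rlm_query call) is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Call budget shared by every frame in one RLM tree (hypothetical).
#[derive(Debug, Clone)]
pub struct CallBudget {
    remaining: Arc<AtomicUsize>,
}

impl CallBudget {
    pub fn new(max_calls: usize) -> Self {
        Self { remaining: Arc::new(AtomicUsize::new(max_calls)) }
    }

    /// Consume one call. Returns false once the budget is exhausted.
    pub fn try_acquire(&self) -> bool {
        // checked_sub yields None at zero, which makes fetch_update leave the value untouched.
        self.remaining
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }

    /// Calls still available across the whole tree.
    pub fn remaining(&self) -> usize {
        self.remaining.load(Ordering::Acquire)
    }
}
```

Because clones share the same counter, a batched fan-out cannot exceed the limit even when children run concurrently.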

Files to touch

  • crates/tui/src/core/repl_runtime.rs (new)
  • crates/tui/src/core/mod.rs
  • crates/tui/src/core/engine.rs (wire in the runtime)
  • crates/config/src/lib.rs (add [rlm] table)
  • crates/tui/src/settings.rs (RLM UI settings)

Labels: enhancement
