Phase 2: RLM runtime primitives (llm_query, rlm_query, js, FINAL) #42

Parent tracking issue: #40
Depends on: #41

## Goal

Implement the execution runtime for `repl` statements so that `llm_query`, `rlm_query`, `js`, `FINAL`, and the other statements listed below become first-class engine operations that bypass the JSON function-calling schema.

## Primitives to implement

| Statement | Behavior | Model used |
|---|---|---|
| `llm_query name = expr` | One-shot completion, same depth, no tool access | Session model |
| `rlm_query name = expr` | Spawns a child RLM frame (depth+1) with its own `repl` loop | **Default: `deepseek-v4-flash`** |
| `llm_query_batched name = a \| b \| c` | Parallel one-shot completions | Session model |
| `rlm_query_batched name = a \| b \| c` | Parallel child RLM frames | **Default: `deepseek-v4-flash`** |
| `js name = "..."` | Runs code in a sandboxed JS environment (Node `vm` or QuickJS) | — |
| `FINAL(expr)` | Returns `expr` as the result of the current `repl` block | — |
| `FINAL_VAR(name)` | Returns the value of `name` | — |
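
The difference between `FINAL(expr)` and `FINAL_VAR(name)` could be sketched as below. `Terminator` and `resolve_terminator` are hypothetical names, not part of the codebase, and the sketch assumes `expr` has already been resolved to a string by the statement parser.

```rust
use std::collections::HashMap;

/// How a `repl` block ends (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum Terminator {
    /// `FINAL(expr)`: carries the already-resolved expression text.
    Final(String),
    /// `FINAL_VAR(name)`: the value is looked up in the frame's variables.
    FinalVar(String),
}

/// Resolve a terminator against the frame's variables.
/// Returns `Err` when `FINAL_VAR` names a variable that was never set.
pub fn resolve_terminator(
    term: &Terminator,
    variables: &HashMap<String, String>,
) -> Result<String, String> {
    match term {
        Terminator::Final(value) => Ok(value.clone()),
        Terminator::FinalVar(name) => variables
            .get(name)
            .cloned()
            .ok_or_else(|| format!("FINAL_VAR: unknown variable `{}`", name)),
    }
}
```

Whether an unknown `FINAL_VAR` name should be a hard error or an error string returned to the model is left open here.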

## Runtime architecture

### `ReplContext`

```rust
pub struct ReplContext {
    pub variables: HashMap<String, String>,
    pub depth: usize,
    pub max_depth: usize,
    pub max_iterations: usize,
    pub iteration: usize,
    pub root_prompt: String,         // the original user prompt for this frame
    pub parent_client: DeepSeekClient,
    pub child_model: String,         // default "deepseek-v4-flash"
    pub child_client: DeepSeekClient, // configured with child_model
    pub usage_accumulator: Arc<Mutex<UsageAccumulator>>,
}
```
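
`UsageAccumulator` is not defined in this issue. One possible shape, purely as an assumption (the field names and `record_shared` helper are hypothetical), showing how parent and child frames would share a single total through the `Arc<Mutex<...>>` handle:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical fields; the issue does not specify what is tracked.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct UsageAccumulator {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub calls: u64,
}

impl UsageAccumulator {
    /// Add one completed call to the running totals.
    pub fn record(&mut self, input_tokens: u64, output_tokens: u64) {
        self.input_tokens += input_tokens;
        self.output_tokens += output_tokens;
        self.calls += 1;
    }
}

/// Record one call's usage through the shared handle.
pub fn record_shared(acc: &Arc<Mutex<UsageAccumulator>>, input: u64, output: u64) {
    // A poisoned lock only means another frame panicked while holding it;
    // the counters themselves are still usable, so recover the guard.
    let mut guard = acc.lock().unwrap_or_else(|e| e.into_inner());
    guard.record(input, output);
}
```

Because every frame clones the same `Arc`, a child at any depth adds to the total the user sees.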

### `llm_query` execution

1. Resolve `expr` to a string.
2. Call `parent_client.create_message(...)` (the session-model client, per the primitives table) with the string as the user content and **no tools** (flat completion).
3. Store the response text in `variables[name]`.
4. Accumulate tokens into `usage_accumulator`.
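
A minimal sketch of these four steps. `CompletionClient`, `complete`, and `EchoClient` are hypothetical stand-ins for the real client and its `create_message` call, and the sketch is synchronous for brevity where the real runtime would presumably be async. Storing failures as an error string is also an assumption, chosen to mirror how `rlm_query` reports a depth-limit hit.

```rust
use std::collections::HashMap;

/// Stand-in for the real client: a tool-free, one-shot completion.
pub trait CompletionClient {
    /// Returns (response_text, tokens_used).
    fn complete(&self, user_content: &str) -> Result<(String, u64), String>;
}

/// Run one `llm_query name = expr` statement.
/// `resolved_expr` is the result of step 1 (expression already resolved).
pub fn exec_llm_query(
    client: &dyn CompletionClient,
    variables: &mut HashMap<String, String>,
    tokens_used: &mut u64,
    name: &str,
    resolved_expr: &str,
) {
    match client.complete(resolved_expr) {
        Ok((text, tokens)) => {
            variables.insert(name.to_string(), text); // step 3
            *tokens_used += tokens; // step 4
        }
        // Assumption: errors become a value the model can read and react to.
        Err(e) => {
            variables.insert(name.to_string(), format!("[llm_query error] {}", e));
        }
    }
}

/// Test double that echoes the prompt in upper case.
pub struct EchoClient;

impl CompletionClient for EchoClient {
    fn complete(&self, user_content: &str) -> Result<(String, u64), String> {
        Ok((user_content.to_uppercase(), user_content.len() as u64))
    }
}
```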

### `rlm_query` execution

1. Resolve `expr` to a string (this becomes the child's `context`).
2. If `depth >= max_depth`, store an error string in `variables[name]` and continue.
3. Otherwise, create a **new** `ReplContext` with:
   - `depth = parent.depth + 1`
   - `root_prompt = resolved_expr`
   - same clients and limits
4. Send the prompt to the child model.
5. If the child response contains `repl` blocks, recurse into `repl_runtime.execute()`.
6. If the child response contains `FINAL(...)`, that value is stored in `variables[name]`.
7. If the child response contains neither, the raw text is stored in `variables[name]`.
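
The depth guard and the `FINAL(...)` fallback could look roughly like this. `Frame` is a cut-down stand-in for `ReplContext`, and `run_child` is a hypothetical callback covering steps 4 and 5 (calling the child model and executing any `repl` blocks). `extract_final` is a naive parenthesis-balancing scan that does not understand string literals, so treat it as illustrative only.

```rust
/// Minimal slice of `ReplContext` needed for the depth logic.
#[derive(Debug, Clone)]
pub struct Frame {
    pub depth: usize,
    pub max_depth: usize,
    pub root_prompt: String,
}

/// Return the argument of the first `FINAL(...)` in `text`, if any.
/// Nested parentheses are balanced; an unclosed call counts as absent.
pub fn extract_final(text: &str) -> Option<&str> {
    let start = text.find("FINAL(")? + "FINAL(".len();
    let mut open = 1usize;
    for (i, ch) in text[start..].char_indices() {
        match ch {
            '(' => open += 1,
            ')' => {
                open -= 1;
                if open == 0 {
                    return Some(&text[start..start + i]);
                }
            }
            _ => {}
        }
    }
    None
}

/// Run one `rlm_query` and return the string to store in `variables[name]`.
pub fn rlm_query(
    parent: &Frame,
    resolved_expr: &str,
    run_child: &dyn Fn(&Frame) -> String,
) -> String {
    // Step 2: refuse to go deeper than the configured limit.
    if parent.depth >= parent.max_depth {
        return format!("[rlm_query error] max_depth {} reached", parent.max_depth);
    }
    // Step 3: the child gets its own frame, one level down.
    let child = Frame {
        depth: parent.depth + 1,
        max_depth: parent.max_depth,
        root_prompt: resolved_expr.to_string(),
    };
    let response = run_child(&child);
    // Steps 6-7: prefer the FINAL(...) value, fall back to the raw text.
    let final_value = extract_final(&response).map(|v| v.trim().to_string());
    final_value.unwrap_or(response)
}
```

Returning an error string instead of a Rust `Err` keeps the parent loop running, which matches the "store an error string and continue" wording in step 2.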

### `rlm_query_batched` execution

1. Resolve all prompt expressions.
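2. Spawn N `rlm_query` futures concurrently using `futures::future::join_all` or `FuturesUnordered` (`tokio::join!` only accepts a fixed number of futures). `FuturesUnordered` yields results in completion order, so keep each result paired with its prompt index.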
3. Collect results into a single indexed string:
   ```
   [0] <result 0>
   [1] <result 1>
   ...
   ```
4. Store the concatenated string in `variables[name]`.
5. **Crucial**: accumulate usage from all children into the shared `usage_accumulator` so the user sees one total cost.
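
A dependency-free sketch of the fan-out, ordering, and shared accounting. It swaps in scoped OS threads (`std::thread::scope`) for the async futures the runtime would actually use, and a bare `u64` for `UsageAccumulator`. `run_batched` and the `query` callback are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Run `query` for every prompt in parallel and join the results as
/// `[i] <result i>` lines. `query` returns (text, tokens_used).
pub fn run_batched<F>(prompts: &[String], total_tokens: &Arc<Mutex<u64>>, query: F) -> String
where
    F: Fn(&str) -> (String, u64) + Sync,
{
    let query = &query;
    let results: Vec<String> = thread::scope(|scope| {
        let handles: Vec<_> = prompts
            .iter()
            .map(|prompt| {
                scope.spawn(move || {
                    let (text, tokens) = query(prompt.as_str());
                    // Every child adds to the same shared total.
                    *total_tokens.lock().unwrap() += tokens;
                    text
                })
            })
            .collect();
        // Joining in spawn order keeps results aligned with prompt order,
        // regardless of which child finishes first.
        handles
            .into_iter()
            .map(|h| {
                h.join()
                    .unwrap_or_else(|_| "[rlm_query error] child panicked".to_string())
            })
            .collect()
    });
    results
        .iter()
        .enumerate()
        .map(|(i, r)| format!("[{}] {}", i, r))
        .collect::<Vec<_>>()
        .join("\n")
}
```

One failed child yields an error entry at its index rather than failing the whole batch; whether that is the desired behavior is a design choice this sketch assumes.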

### `js` execution

For Phase 2, use a **minimal sandbox**:
- Option A: Shell out to `node -e "..."` with a timeout (simplest, matches zigrlm's current approach).
- Option B: Embed a JS engine (much heavier, defer to later phase).

The JS code should have access to a `context` global. Its result is the value passed to `FINAL(...)`; if the code never calls `FINAL(...)`, the result is the value of the last expression.
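
Option A could be sketched as below, using only the standard library. All function names are hypothetical. The wrapper script assumes the user code is run through a direct `eval` so that its last expression value can be captured; that is one way to get the behavior described above, not necessarily how zigrlm does it. The timeout is a simple polling loop.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Encode `s` as a double-quoted JavaScript string literal.
pub fn js_string_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            c if (c as u32) < 0x20 || c == '\u{2028}' || c == '\u{2029}' => {
                out.push_str(&format!("\\u{:04x}", c as u32));
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Wrap user code so `context` and `FINAL` exist and the result is printed.
pub fn build_node_script(code: &str, context: &str) -> String {
    format!(
        "const context = {ctx};\n\
         let __final;\n\
         const FINAL = (v) => {{ __final = v; }};\n\
         const __last = eval({code});\n\
         console.log(String(__final !== undefined ? __final : __last));\n",
        ctx = js_string_literal(context),
        code = js_string_literal(code),
    )
}

/// Run the script with `node -e`, killing the process after `timeout`.
/// Limitation: output larger than the pipe buffer would block the child
/// and surface as a timeout; a real implementation should drain stdout.
pub fn run_node(script: &str, timeout: Duration) -> Result<String, String> {
    let mut child = Command::new("node")
        .arg("-e")
        .arg(script)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
        .map_err(|e| format!("failed to start node: {}", e))?;
    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait() {
            Ok(Some(_status)) => break,
            Ok(None) if Instant::now() >= deadline => {
                let _ = child.kill();
                let _ = child.wait();
                return Err("js timed out".to_string());
            }
            Ok(None) => std::thread::sleep(Duration::from_millis(10)),
            Err(e) => return Err(format!("wait failed: {}", e)),
        }
    }
    let mut out = String::new();
    if let Some(mut stdout) = child.stdout.take() {
        stdout.read_to_string(&mut out).map_err(|e| e.to_string())?;
    }
    Ok(out.trim_end().to_string())
}
```

Passing the script as a single argument avoids shell quoting entirely, since `Command` does not go through a shell.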

## Configuration

Add to `ConfigToml` / `Settings`:

```toml
[rlm]
enabled = true               # whether the engine checks for repl blocks
max_depth = 2
max_iterations = 20
child_model = "deepseek-v4-flash"
main_model = "deepseek-v4-pro"  # optional override
```
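
On the Rust side, the table might map to a struct like the following. `RlmConfig` and `validate` are hypothetical; the defaults simply mirror the example values above (treating `enabled = true` as the default is an assumption), and the serde derives the real `ConfigToml` would need are left out to keep the sketch dependency-free.

```rust
/// Hypothetical in-memory form of the `[rlm]` table.
#[derive(Debug, Clone, PartialEq)]
pub struct RlmConfig {
    pub enabled: bool,
    pub max_depth: usize,
    pub max_iterations: usize,
    pub child_model: String,
    /// Optional override; `None` means "use the session model".
    pub main_model: Option<String>,
}

impl Default for RlmConfig {
    fn default() -> Self {
        Self {
            enabled: true,
            max_depth: 2,
            max_iterations: 20,
            child_model: "deepseek-v4-flash".to_string(),
            main_model: None,
        }
    }
}

impl RlmConfig {
    /// Reject values that would make the runtime unusable.
    /// `max_depth = 0` is allowed: it just disables `rlm_query` recursion.
    pub fn validate(&self) -> Result<(), String> {
        if self.max_iterations == 0 {
            return Err("rlm.max_iterations must be at least 1".to_string());
        }
        if self.child_model.trim().is_empty() {
            return Err("rlm.child_model must not be empty".to_string());
        }
        Ok(())
    }
}
```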

## Open questions

- Should `llm_query` have tool access? For phase 2, **no** — keep it as a flat completion primitive. Tool access is what the root engine loop provides.
- How do we prevent infinite `rlm_query` recursion? Depth limit + iteration limit + a `max_calls` budget (shared across the tree).
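
The tree-wide `max_calls` budget could be a shared atomic counter that every frame clones. This is only a sketch of the idea; `CallBudget` and its methods are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// A call budget shared by every frame in one RLM tree.
#[derive(Debug, Clone)]
pub struct CallBudget {
    remaining: Arc<AtomicUsize>,
}

impl CallBudget {
    pub fn new(max_calls: usize) -> Self {
        Self { remaining: Arc::new(AtomicUsize::new(max_calls)) }
    }

    /// Try to reserve one call. Returns `false` once the budget is spent,
    /// and never underflows even when frames race on the last slot.
    pub fn try_acquire(&self) -> bool {
        self.remaining
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }

    /// Calls still available across the whole tree.
    pub fn remaining(&self) -> usize {
        self.remaining.load(Ordering::Acquire)
    }
}
```

Each `llm_query` or `rlm_query` would call `try_acquire()` before contacting the model and store an error string when it returns `false`, so batched children cannot collectively exceed the limit.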

## Files to touch

- `crates/tui/src/core/repl_runtime.rs` (new)
- `crates/tui/src/core/mod.rs`
- `crates/tui/src/core/engine.rs` (wire in the runtime)
- `crates/config/src/lib.rs` (add `[rlm]` table)
- `crates/tui/src/settings.rs` (RLM UI settings)

