Parent tracking issue: #40
Depends on: #41
Goal
Implement the execution runtime for repl statements so that llm_query, rlm_query, js, FINAL, etc. become first-class engine operations that bypass the JSON function-calling schema.
Primitives to implement
| Statement | Behavior | Model used |
|---|---|---|
| llm_query name = expr | One-shot completion, same depth, no tool access | Session model |
| rlm_query name = expr | Spawns a child RLM frame (depth+1) with its own repl loop | Default: deepseek-v4-flash |
| llm_query_batched name = a \| b \| c | Parallel one-shot completions | Session model |
| rlm_query_batched name = a \| b \| c | Parallel child RLM frames | Default: deepseek-v4-flash |
| js name = "..." | Runs code in a sandboxed JS environment (Node vm or QuickJS) | n/a |
| FINAL(expr) | Returns expr as the result of the current repl block | n/a |
| FINAL_VAR(name) | Returns the value of name | n/a |
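One way to carry these statements from the parser to the runtime is a plain enum that the executor matches on. The sketch below is only an illustration: the type name `ReplStatement`, its variants, and the two helper methods are hypothetical, and expressions are kept as unresolved strings.

```rust
/// Hypothetical AST for the repl statements listed in the table.
/// Expressions stay as raw strings; resolving them is the runtime's job.
#[derive(Debug, Clone, PartialEq)]
pub enum ReplStatement {
    LlmQuery { name: String, expr: String },
    RlmQuery { name: String, expr: String },
    LlmQueryBatched { name: String, exprs: Vec<String> },
    RlmQueryBatched { name: String, exprs: Vec<String> },
    Js { name: String, code: String },
    Final { expr: String },
    FinalVar { name: String },
}

impl ReplStatement {
    /// True for statements that spawn child RLM frames (depth + 1).
    pub fn spawns_child_frame(&self) -> bool {
        matches!(
            self,
            ReplStatement::RlmQuery { .. } | ReplStatement::RlmQueryBatched { .. }
        )
    }

    /// True for statements that return a result from the current repl block.
    pub fn is_terminal(&self) -> bool {
        matches!(
            self,
            ReplStatement::Final { .. } | ReplStatement::FinalVar { .. }
        )
    }
}
```

Keeping the batched forms as separate variants lets the executor decide on concurrency by matching on the variant rather than by inspecting the expression list.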
Runtime architecture
ReplContext
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

pub struct ReplContext {
    pub variables: HashMap<String, String>,
    pub depth: usize,
    pub max_depth: usize,
    pub max_iterations: usize,
    pub iteration: usize,
    pub root_prompt: String,         // the original user prompt for this frame
    pub parent_client: DeepSeekClient,
    pub child_model: String,         // default "deepseek-v4-flash"
    pub child_client: DeepSeekClient, // configured with child_model
    pub usage_accumulator: Arc<Mutex<UsageAccumulator>>, // shared across the whole frame tree
}
llm_query execution
- Resolve expr to a string.
- Call child_client.create_message(...) with the string as the user content and no tools (flat completion).
- Store the response text in variables[name].
- Accumulate tokens into usage_accumulator.
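The four steps above can be sketched as a single function. This is a minimal, synchronous illustration under several assumptions: the `FlatCompletion` trait stands in for `DeepSeekClient::create_message` (whose real signature is not shown in this issue), the `Completion` struct and the fields of `UsageAccumulator` are hypothetical, and expression resolution is reduced to a variable lookup with a literal fallback. The client is passed in generically, so the sketch does not decide which model serves llm_query.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical token counters; the real UsageAccumulator fields are not specified.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct UsageAccumulator {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Hypothetical response shape: the text plus token counts.
pub struct Completion {
    pub text: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Stand-in for a tool-free create_message call (assumption, not the real API).
pub trait FlatCompletion {
    fn complete(&self, user_content: &str) -> Result<Completion, String>;
}

/// Runs one llm_query: resolve, flat completion, store text, accumulate usage.
pub fn exec_llm_query<C: FlatCompletion>(
    client: &C,
    variables: &mut HashMap<String, String>,
    usage: &Arc<Mutex<UsageAccumulator>>,
    name: &str,
    expr: &str,
) -> Result<(), String> {
    // Assumption: an expression naming an existing variable resolves to its
    // value; anything else is treated as a literal string.
    let prompt = variables
        .get(expr)
        .cloned()
        .unwrap_or_else(|| expr.to_string());

    let reply = client.complete(&prompt)?;
    variables.insert(name.to_string(), reply.text);

    // Add this call's tokens to the shared total.
    let mut total = usage.lock().map_err(|e| e.to_string())?;
    total.input_tokens += reply.input_tokens;
    total.output_tokens += reply.output_tokens;
    Ok(())
}
```

A real implementation would presumably be async and would reuse the engine's existing expression resolver; the order of operations is the part this sketch is meant to show.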
rlm_query execution
- Resolve expr to a string (this becomes the child's context).
- If depth >= max_depth, store an error string in variables[name] and continue.
- Otherwise, create a new ReplContext with:
  - depth = parent.depth + 1
  - root_prompt = resolved_expr
  - same clients and limits
- Send the prompt to the child model.
- If the child response contains repl blocks, recurse into repl_runtime.execute().
- If the child response contains FINAL(...), that value is stored in variables[name].
- If the child response contains neither, the raw text is stored in variables[name].
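The depth check, child-frame construction, and FINAL handling above might look roughly like this. It is a sketch under stated assumptions: `Frame` is a trimmed-down stand-in for ReplContext, the child starts with an empty variable map (the issue does not say whether variables are inherited), `ask_child` is a hypothetical callback that stands in for sending the prompt to the child model, and recursion into nested repl blocks is left out.

```rust
use std::collections::HashMap;

/// Trimmed-down frame state; only the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct Frame {
    pub variables: HashMap<String, String>,
    pub depth: usize,
    pub max_depth: usize,
    pub root_prompt: String,
}

impl Frame {
    /// Child frame: depth + 1, same limit, resolved expression as root_prompt.
    /// Assumption: the child does not inherit the parent's variables.
    pub fn child(&self, resolved_expr: &str) -> Frame {
        Frame {
            variables: HashMap::new(),
            depth: self.depth + 1,
            max_depth: self.max_depth,
            root_prompt: resolved_expr.to_string(),
        }
    }
}

/// Returns the argument of the first FINAL(...) in a response, if any.
/// Parentheses are matched by counting; string literals are not parsed.
pub fn extract_final(response: &str) -> Option<String> {
    let start = response.find("FINAL(")? + "FINAL(".len();
    let mut open = 1usize;
    for (i, ch) in response[start..].char_indices() {
        match ch {
            '(' => open += 1,
            ')' => {
                open -= 1;
                if open == 0 {
                    return Some(response[start..start + i].trim().to_string());
                }
            }
            _ => {}
        }
    }
    None
}

/// Runs one rlm_query against an already-resolved expression.
pub fn exec_rlm_query(
    parent: &mut Frame,
    name: &str,
    resolved_expr: &str,
    ask_child: &dyn Fn(&Frame) -> String,
) {
    if parent.depth >= parent.max_depth {
        // Store an error string and let the parent loop continue.
        let msg = format!("error: max depth {} reached", parent.max_depth);
        parent.variables.insert(name.to_string(), msg);
        return;
    }
    let child = parent.child(resolved_expr);
    let response = ask_child(&child);
    // FINAL(...) wins; otherwise fall back to the raw response text.
    let value = extract_final(&response).unwrap_or(response);
    parent.variables.insert(name.to_string(), value);
}
```

Storing the depth error as an ordinary variable value, rather than returning an `Err`, matches the "store an error string and continue" behavior described in the list.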
rlm_query_batched execution
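- Resolve all prompt expressions.
- Spawn N rlm_query futures concurrently using futures::future::join_all or FuturesUnordered. (tokio::join! only accepts a fixed number of futures known at compile time, so it does not fit a runtime-sized batch. FuturesUnordered yields results in completion order, so each future must carry its prompt index.)
- Collect results into a single indexed string:
  [0] <result 0>
  [1] <result 1>
  ...
- Store the concatenated string in variables[name].
- Crucial: accumulate usage from all children into the shared usage_accumulator so the user sees one total cost.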
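The fan-out, ordered collection, and shared usage total can be illustrated without an async runtime. Note the substitution: this sketch uses scoped OS threads from the standard library in place of async futures so that it stays self-contained. The names `format_indexed` and `run_batched` are hypothetical, `run_child` stands in for a full rlm_query, and a single `u64` stands in for the usage accumulator.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Formats results as the indexed string described above: "[0] ...\n[1] ...".
pub fn format_indexed(results: &[String]) -> String {
    results
        .iter()
        .enumerate()
        .map(|(i, r)| format!("[{}] {}", i, r))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Runs one child per prompt in parallel and returns the indexed string.
/// `run_child` returns (result text, tokens used) for a single prompt.
pub fn run_batched<F>(
    prompts: &[String],
    total_tokens: &Arc<Mutex<u64>>,
    run_child: F,
) -> String
where
    F: Fn(&str) -> (String, u64) + Sync,
{
    let results: Vec<String> = thread::scope(|scope| {
        let handles: Vec<_> = prompts
            .iter()
            .map(|prompt| {
                let run_child = &run_child;
                scope.spawn(move || {
                    let (text, tokens) = run_child(prompt.as_str());
                    // Every child adds to the same shared total.
                    *total_tokens.lock().unwrap() += tokens;
                    text
                })
            })
            .collect();
        // Joining in spawn order keeps result i aligned with prompt i.
        handles
            .into_iter()
            .map(|h| h.join().expect("child frame panicked"))
            .collect()
    });
    format_indexed(&results)
}
```

With async futures the same ordering concern applies: results either need to be awaited in prompt order or tagged with their index before formatting.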
js execution
For Phase 2, use a minimal sandbox:
- Option A: Shell out to node -e "..." with a timeout (simplest, matches zigrlm's current approach).
- Option B: Embed a JS engine (much heavier, defer to later phase).
The JS code should have access to a context global. It should end with FINAL(...); if it does not, the result is the value of the last expression.
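Option A could be sketched with the standard library alone by polling the child process until a deadline. The function names `run_with_timeout` and `run_js` are hypothetical, and the sketch only returns whatever the script writes to stdout: injecting the context global and mapping FINAL(...) or the last expression to a result are not shown. It also assumes small outputs, since stdout is read only after the process exits.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Runs a command with a wall-clock timeout and returns its trimmed stdout.
/// Limitation: output larger than the OS pipe buffer would block the child
/// until the timeout fires, because stdout is drained only after exit.
pub fn run_with_timeout(mut cmd: Command, timeout: Duration) -> Result<String, String> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
        .map_err(|e| format!("spawn failed: {}", e))?;

    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait().map_err(|e| e.to_string())? {
            Some(status) => {
                let mut out = String::new();
                if let Some(mut stdout) = child.stdout.take() {
                    stdout.read_to_string(&mut out).map_err(|e| e.to_string())?;
                }
                return if status.success() {
                    Ok(out.trim().to_string())
                } else {
                    Err(format!("exited with {}", status))
                };
            }
            None if Instant::now() >= deadline => {
                // Kill the process and reap it so it does not linger as a zombie.
                let _ = child.kill();
                let _ = child.wait();
                return Err("timed out".to_string());
            }
            None => thread::sleep(Duration::from_millis(10)),
        }
    }
}

/// Shells out to `node -e <code>`; assumes `node` is on PATH.
pub fn run_js(code: &str, timeout: Duration) -> Result<String, String> {
    let mut cmd = Command::new("node");
    cmd.arg("-e").arg(code);
    run_with_timeout(cmd, timeout)
}
```

Errors are returned as strings so the caller can store them in variables[name], in the same way the depth-limit error is handled for rlm_query.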
Configuration
Add to ConfigToml / Settings:
[rlm]
enabled = true # whether the engine checks for repl blocks
max_depth = 2
max_iterations = 20
child_model = "deepseek-v4-flash"
main_model = "deepseek-v4-pro" # optional override
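On the Rust side, the [rlm] table could map to a struct like the one below. This is a sketch: the name `RlmConfig` and the `validate` method are hypothetical, the defaults simply mirror the example values above (whether those are the intended defaults is an assumption), and deserialization from TOML is left out to keep the block dependency-free.

```rust
/// Hypothetical in-memory form of the [rlm] table.
#[derive(Debug, Clone, PartialEq)]
pub struct RlmConfig {
    /// Whether the engine checks for repl blocks.
    pub enabled: bool,
    pub max_depth: usize,
    pub max_iterations: usize,
    pub child_model: String,
    /// Optional override for the main model.
    pub main_model: Option<String>,
}

impl Default for RlmConfig {
    fn default() -> Self {
        // Values copied from the example table above.
        RlmConfig {
            enabled: true,
            max_depth: 2,
            max_iterations: 20,
            child_model: "deepseek-v4-flash".to_string(),
            main_model: None,
        }
    }
}

impl RlmConfig {
    /// Rejects values that would leave the runtime unable to run a single step.
    pub fn validate(&self) -> Result<(), String> {
        if self.max_iterations == 0 {
            return Err("rlm.max_iterations must be at least 1".to_string());
        }
        if self.child_model.trim().is_empty() {
            return Err("rlm.child_model must not be empty".to_string());
        }
        Ok(())
    }
}
```

A max_depth of 0 is accepted here on the reading that it disables child frames while still allowing flat llm_query calls; that interpretation is not stated in the issue.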
Open questions
- Should llm_query have tool access? For Phase 2, no: keep it as a flat completion primitive. Tool access is what the root engine loop provides.
- How do we prevent infinite rlm_query recursion? Depth limit + iteration limit + a max_calls budget (shared across the tree).
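A tree-wide max_calls budget can be modelled as one atomic counter that every frame holds a handle to. The `CallBudget` type below is a hypothetical sketch of that idea, not an agreed design: cloning the handle shares the same counter, so calls made by any child reduce what is left for the whole tree.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical call budget shared by every frame in the RLM tree.
#[derive(Debug, Clone)]
pub struct CallBudget {
    remaining: Arc<AtomicUsize>,
}

impl CallBudget {
    pub fn new(max_calls: usize) -> Self {
        CallBudget {
            remaining: Arc::new(AtomicUsize::new(max_calls)),
        }
    }

    /// Reserves one call. Returns false once the budget is exhausted.
    pub fn try_acquire(&self) -> bool {
        // checked_sub returns None at zero, which makes fetch_update fail
        // without modifying the counter.
        self.remaining
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }

    /// Calls still available across the whole tree.
    pub fn remaining(&self) -> usize {
        self.remaining.load(Ordering::Acquire)
    }
}
```

Each llm_query or rlm_query would call `try_acquire` before contacting a model and store an error string in variables[name] when it returns false. Batched children running in parallel cannot overspend, because the decrement is atomic.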
Files to touch
crates/tui/src/core/repl_runtime.rs (new)
crates/tui/src/core/mod.rs
crates/tui/src/core/engine.rs (wire in the runtime)
crates/config/src/lib.rs (add [rlm] table)
crates/tui/src/settings.rs (RLM UI settings)