Graph Agents
Graph-based agents are a declarative, YAML-driven workflow engine layered on top of Loki's existing agent system. Where a normal agent runs as a single LLM loop driven by tool calls, a graph agent is a directed graph of typed nodes. Each node performs one well-defined step (call an LLM, run a script, ask the user a question, spawn a child agent, etc.) and routes to the next node based on its result.
Graph agents are best for workflows that:
- Have a fixed shape (e.g. parse -> query -> grade -> synthesize -> verify)
- Mix LLM calls with deterministic steps (scripts, user prompts)
- Need explicit human-in-the-loop checkpoints
- Benefit from per-step model / tool / temperature overrides
If you just want an agent that takes a goal and figures out the steps on its own, stick with a regular agent.
A graph agent is defined by a single `graph.yaml`. It holds both the
agent-level config (model, tools, MCP servers) and the workflow:

```
<loki-config-dir>/agents
└── my-graph-agent
    ├── graph.yaml            # agent config + workflow definition
    ├── tools.sh              # optional custom tools
    ├── <rag-node-id>.yaml    # auto-built knowledge base for a rag node
    └── scripts/              # optional script-node implementations
        ├── decide.py
        └── verify.py
```
<rag-node-id>.yaml files are generated by Loki at agent load time - one
per rag node - and should not be hand-edited.
An agent directory must contain either a config.yaml (a normal
LLM-loop agent; see Agents) or a graph.yaml (a graph agent), never
both. The presence of graph.yaml is what marks an agent as a graph
agent; when Loki runs it, execution is driven entirely by the graph.
Both files present is an error. If an agent directory contains both
config.yaml and graph.yaml, Loki refuses to load it and tells you to
remove one. Pick the model that fits: config.yaml for an open-ended
LLM-loop agent, graph.yaml for a fixed-shape workflow.
```yaml
name: my-graph-agent
description: |
  Plain prose describing what the workflow does.
version: "1.0"

# --- agent-level config ---
model: anthropic:claude-sonnet-4-6   # default model for llm nodes
temperature: 0.0                     # default sampling temperature
top_p: null                          # default sampling top-p
global_tools:                        # global tools available to nodes
  - web_search_loki.sh
mcp_servers:                         # MCP servers available to nodes
  - pubmed-search
conversation_starters:               # suggested prompts in the UI
  - "Research WebAssembly outside of the browser"

settings:
  max_loop_iterations: 100           # PER-NODE visit cap; default 100 (see below)
  log_state_snapshots: true          # log state JSON before each node executes
  validate_before_run: true          # run the graph validator on startup
  timeout: 600                       # optional overall timeout in seconds

initial_state:                       # optional seed state for the run
  topic: "auth"

start: parse_input                   # required: ID of the first node to run
nodes:
  parse_input: { ... }
  ...
```
- `version`: Currently only `"1.0"` is accepted by the parser. Anything else fails at startup. This is the graph schema version, not your agent's version.
- Agent-level config (`model`, `temperature`, `top_p`, `global_tools`, `mcp_servers`, `conversation_starters`): all optional. These are the same fields a normal agent's `config.yaml` carries; in a graph agent they live at the top of `graph.yaml` instead. `model` / `temperature` / `top_p` act as the defaults for `llm` nodes that don't set their own. `global_tools` and `mcp_servers` define the tool universe that an `llm` node's `tools:` whitelist selects from (a node with no `tools:` field gets none of them).
- `can_spawn_agents` is derived, not declared. A graph agent can spawn child agents if and only if its graph contains at least one `agent` node. You don't set a flag; the `agent` node's presence is the declaration.
- `max_loop_iterations`: This is a per-node visit cap, not a total graph-step cap. If the same node id is entered more than this many times, execution aborts with `Node 'X' visited N times (max_loop_iterations=...)`. Default: 100.
- `timeout`: Wall-clock cap on the entire graph run. The executor checks this between node transitions, so a node that blocks past the deadline still finishes before the check fires.
- `initial_state`: A JSON-compatible object. Values are seeded into state before any node runs and can be referenced from any node via `{{key}}` templates.
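The two run-wide limits above can be pictured with a small executor loop. This is only a sketch of the behavior described here, not Loki's implementation: `run_graph`, `GraphError`, and the `nodes` mapping are hypothetical names, and each node is modeled as a plain function that takes the state and returns the next node id (or `None` to stop).

```python
import time


class GraphError(RuntimeError):
    """Raised when a run-wide limit is exceeded (hypothetical name)."""


def run_graph(nodes, start, state, max_loop_iterations=100, timeout=None):
    """Walk the graph from `start`, enforcing the per-node visit cap and
    checking the overall timeout only between node transitions."""
    visits = {}
    deadline = None if timeout is None else time.monotonic() + timeout
    current = start
    while current is not None:
        # The timeout is checked between nodes, so a node that is already
        # running is never interrupted by this check.
        if deadline is not None and time.monotonic() > deadline:
            raise GraphError(f"graph timed out after {timeout}s")
        visits[current] = visits.get(current, 0) + 1
        # The cap applies to each node id separately, not to total steps.
        if visits[current] > max_loop_iterations:
            raise GraphError(
                f"Node '{current}' visited {visits[current]} times "
                f"(max_loop_iterations={max_loop_iterations})"
            )
        current = nodes[current](state)
    return state
```

Because the cap is counted per node id, a long linear graph never trips it; only a node that is re-entered repeatedly does.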
When Loki invokes a graph agent with a user prompt (whether from the
command line loki -a my-agent "what is X?", from the REPL, or from a
parent agent that spawned it as a sub-agent), the dispatcher automatically
seeds the prompt text into state under the key initial_prompt before
any node runs.
This means every graph agent's first node can reference the user's request
via {{initial_prompt}}:
```yaml
parse_input:
  id: parse_input
  type: llm
  prompt: "{{initial_prompt}}"   # the user's command-line / REPL text
  ...
```

You do not need to (and should not) put `initial_prompt` in `initial_state`, as it is overwritten by the dispatcher.
There are seven node types: agent, script, approval, input, llm, rag, and end. Every node has these common fields:
```yaml
my_node:
  id: my_node              # must match the map key
  type: <one of the seven>
  description: optional    # free-form
  next: another_node       # optional default next node; semantics vary per type
```

The `next` field defines the default routing edge. Node types interpret it
differently (some types ignore it in favor of internal routing; see each type
below).
An `agent` node spawns a Loki sub-agent and waits for it to finish. This is how a graph agent delegates a sub-goal to a fully autonomous Loki agent (with its own tool loop and configuration).

```yaml
research_topic:
  id: research_topic
  type: agent
  agent: deep-researcher          # name of an existing Loki agent
  prompt: "Research {{topic}}"    # interpolated against state
  timeout: 600                    # optional, in seconds (default 300)
  state_updates:
    findings: "{{output}}"
  output_schema: { ... }          # optional, see "Structured Output" below
  next: render
```

- `agent`: Name of the child agent to spawn. Must exist in `<loki-config-dir>/agents/`.
- `prompt`: The user message sent to the child agent. Templated against the current graph state.
- `timeout`: Hard wall-clock cap. If the child agent exceeds it, the whole graph fails (there is no built-in fallback path on agent nodes).
- `state_updates`: Map of `state_key: "{{template}}"`. The child agent's final text is available inside this map as `{{output}}`.
A `script` node runs a Bash, Python, or TypeScript script and merges its
JSON-object stdout into state. Script files live under the agent's
`scripts/` directory.

Supported extensions and runtimes:

| Extension | Runtime invoked | Notes |
|---|---|---|
| `.sh` | `bash <script>` | |
| `.py` | `python3 <script>` | Not `python`; must be Python 3 |
| `.ts` | `npx tsx <script>` | Requires Node and `tsx` available on PATH |

`.js` / `.mjs` / other extensions are not supported. The shebang line
inside the script is not used for script-node dispatch (it is for normal
custom tools); the file extension is the source of truth.
```yaml
route_after_parse:
  id: route_after_parse
  type: script
  script: scripts/route_after_parse.py
  timeout: 30                  # seconds, default 30
  fallback: handle_error       # optional: where to route on script failure
  state_updates:               # applied after stdout merge
    last_run: "{{some_value}}"
```

The script receives the current state through one of two environment variables, depending on the size of the serialized state:

| Env var | Contents |
|---|---|
| `GRAPH_STATE` | Inline JSON when serialized state is <= 32 KiB |
| `GRAPH_STATE_FILE` | Path to a temp JSON file when serialized state exceeds 32 KiB |

Exactly one of the two is set per script invocation, so a script should always check both. The temp file (when used) is cleaned up automatically after the graph finishes.
The script must print a single JSON object on stdout. All keys merge into
state; the reserved _next key is extracted and overrides the default next
routing.
```python
#!/usr/bin/env python3
import json
import os


def load_state():
    # Large state arrives as a temp-file path; small state arrives inline.
    if path := os.environ.get("GRAPH_STATE_FILE"):
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


state = load_state()
codes = (state.get("web_search_results") or "").strip()
next_node = "query_db" if codes else "ask_for_code"
# Stdout must be a single JSON object; _next overrides the default route.
print(json.dumps({"_next": next_node, "trimmed_codes": codes}))
```

Tolerant-fail: if the script exits non-zero or produces invalid JSON, the
node routes to `fallback` (if set) or to `next` (if set). Without either,
the graph errors.
An `approval` node prompts the user with a question and a list of options, then routes based on their answer. This is the human-in-the-loop checkpoint.

```yaml
approve:
  id: approve
  type: approval
  question: |
    Final report:
    {{report}}
    Approve?
  options:
    - "yes"
    - "no"
  routes:
    "yes": end_accepted
    "no": end_rejected
  on_other: clarify        # Required - see below
  state_updates:
    decision: "{{choice}}"
```

The `on_other` field is required and easy to miss. Loki's `user__ask` tool always
gives the user a "type your own answer" option in addition to the listed
options. There is no way to disable this. Without `on_other`, a user who
types something other than the listed options would crash the graph at
runtime.
on_other says where to route when the user's answer does not match any
routes key. The free-form text they typed is available downstream via
the {{choice}} template variable inside state_updates.
Common patterns:
- Free-form means "I want to clarify" -> `on_other: clarify_node`, where `clarify_node` is an `input` or `llm` node that processes their text.
- Free-form means "rejection by default" -> `on_other: end_rejected`.
An `input` node collects a free-form string from the user.

```yaml
ask_for_code:
  id: ask_for_code
  type: input
  question: "Enter a search term:"
  default: "{{last_used_code}}"     # optional, interpolated against state
  validation: "len(input) > 0"      # optional, see below
  state_updates:
    web_search_result: "{{input}}"
  next: query_db
```

- `default`: If the user submits an empty response, this template is used. It is interpolated against state on its own, independently of `question` (which is also templated).
- `validation`: A length predicate of the form `len(input) <op> <integer>`, where `<op>` is `>`, `>=`, `<`, `<=`, or `==`. This is a deliberately narrow grammar; regex / type / range validation are not yet supported. If validation fails, the node fails (no fallback).
- The user's text is exposed to `state_updates` as `{{input}}`.
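To make the narrow `validation` grammar concrete, here is a minimal sketch of a checker for it. It is an illustration under assumptions, not Loki's parser: `check_validation` is a hypothetical name, whitespace tolerance is guessed, and length is measured with Python's `len` (characters), which may differ from how Loki counts.

```python
import operator
import re

# The five comparison operators the grammar allows.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}
# Two-character operators come first so ">=" is not read as ">".
_PATTERN = re.compile(r"^\s*len\(input\)\s*(>=|<=|==|>|<)\s*(\d+)\s*$")


def check_validation(expr, user_input):
    """Return True if `user_input` satisfies `len(input) <op> <integer>`."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"unsupported validation expression: {expr!r}")
    op, limit = match.group(1), int(match.group(2))
    return _OPS[op](len(user_input), limit)
```

Anything outside the grammar (a regex, a type check) is rejected up front rather than evaluated, which is why richer checks belong in a follow-up `script` node.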
An `llm` node is a one-shot LLM call with an optional bounded tool-call loop.
Unlike agent nodes, this does NOT spawn a sub-agent; it runs in a fresh isolated
context with a caller-supplied system prompt and user prompt. Tool access is
strictly opt-in: an llm node gets no tools at all unless its tools field
explicitly lists them (see below).
```yaml
grade_research:
  id: grade_research
  type: llm
  instructions: |              # optional system prompt
    You decide whether research is needed for {{topic}}.
  prompt: |                    # required user prompt
    Research context:
    {{research_text}}
    Reply with YES or NO.
  tools: []                    # see below
  model: anthropic:haiku       # optional override
  temperature: 0.0
  top_p: null
  max_attempts: 1              # transient-error retries (default 1)
  max_iterations: 10           # tool-call-loop turn cap (default 10)
  fallback: skip               # routes here if all attempts fail
  state_updates:
    grade: "{{output}}"
  output_schema: { ... }       # optional, see "Structured Output" below
  timeout: 120                 # optional; node wall-clock cap in seconds (unset = no timeout)
  next: synthesize
```

The `tools` field is a strict opt-in whitelist: an `llm` node receives
only the tools it explicitly lists, never the agent's full tool set.
Three modes:
- Unset (field omitted) -> no tools. The LLM produces output but cannot make any tool calls. This is identical to `tools: []`. Leaving the field out does not inherit the agent's tools.
- `tools: []` -> no tools. Same as unset.
- `tools: [a, b, mcp:server-name]` -> only those specific tools, and nothing else. Entries are either exact tool names (matching `global_tools`, agent custom tools, or individual MCP function names) or the shorthand `mcp:<server-name>` (which enables all functions for that MCP server).
Even when tools lists entries, the LLM receives exactly that set. The
whitelist is enforced against global tools, agent custom tools, and MCP
alike. Each entry is validated at startup against the active agent's tool
list; an unknown entry is a startup error.
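The whitelist rules can be summarized in a few lines of code. The sketch below is illustrative only: `resolve_tools` and its argument shapes are assumptions made for this example (a set of global/custom tool names and a dict from MCP server name to its function names), not Loki's internal data structures.

```python
def resolve_tools(whitelist, plain_tools, mcp_functions):
    """Return the exact tool names an llm node may call.

    whitelist     -- the node's `tools:` list, or None when the field is omitted
    plain_tools   -- set of global and agent custom tool names
    mcp_functions -- dict mapping MCP server name -> list of function names
    """
    if not whitelist:
        # An omitted field and an empty list both mean "no tools".
        return []
    selected = []
    for entry in whitelist:
        if entry.startswith("mcp:"):
            # Shorthand: enable every function of the named MCP server.
            server = entry[len("mcp:"):]
            if server not in mcp_functions:
                raise ValueError(f"unknown MCP server in tools whitelist: {server}")
            selected.extend(mcp_functions[server])
        elif entry in plain_tools or any(
            entry in functions for functions in mcp_functions.values()
        ):
            selected.append(entry)
        else:
            # Mirrors the documented behavior: unknown entries fail at startup.
            raise ValueError(f"unknown tool in tools whitelist: {entry}")
    return selected
```

For example, with a hypothetical `pubmed-search` server exposing two functions, `resolve_tools(["mcp:pubmed-search"], set(), {...})` yields exactly those two function names and nothing from the rest of the agent's tool set.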
| Outcome | Routes to |
|---|---|
| Success | `next` |
| Failure WITH `fallback` set | `fallback` |
| Failure WITHOUT `fallback` | `next` (output is `"LLM node failed: ..."`) |
state_updates are always applied (success or failure). On failure,
{{output}} resolves to an error description so downstream nodes can detect
it.
`max_attempts` retries the LLM call only on transient errors. An error
counts as transient when its failure message contains one of: `timed out`,
`rate limit`, `429`, `Connection reset`, `Connection refused`, or
`produced no output`. Any other error fails immediately without consuming
further attempts. The default is 1 (no retries).
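The retry rule can be sketched as a substring check plus a bounded loop. This is a hedged illustration, not Loki's code: `is_transient` and `call_with_retries` are hypothetical names, and case-sensitive matching is an assumption, since the text does not say how the message is compared.

```python
# Substrings that mark a failure message as transient (from the list above).
TRANSIENT_MARKERS = (
    "timed out",
    "rate limit",
    "429",
    "Connection reset",
    "Connection refused",
    "produced no output",
)


def is_transient(message):
    """True if the failure message contains any transient marker."""
    return any(marker in message for marker in TRANSIENT_MARKERS)


def call_with_retries(call, max_attempts=1):
    """Invoke `call` up to `max_attempts` times, retrying only on
    transient errors; any other error is re-raised immediately."""
    last_error = None
    for _ in range(max(1, max_attempts)):
        try:
            return call()
        except Exception as err:
            last_error = err
            if not is_transient(str(err)):
                break
    raise last_error
```

With the default `max_attempts=1` the loop body runs once, which matches "no retries".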
A `rag` node runs a hybrid (vector + keyword) retrieval against a per-node
knowledge base and writes the result into state. This is how a graph agent does
Retrieval-Augmented Generation: the rag node retrieves context, and downstream
llm/agent nodes inject it into their prompts via normal templating.
```yaml
research_context:
  id: research_context
  type: rag
  documents:                        # required; the knowledge sources
    - ./knowledge/
    - https://example.com/spec
  query: "{{initial_prompt}}"       # templated; defaults to "{{initial_prompt}}"
  top_k: 5                          # optional; default = the knowledge base's own top_k
  timeout: 120                      # optional; retrieval timeout in seconds (default 120)
  state_updates:                    # required in practice (see below)
    rag_context: "{{output.context}}"
    rag_sources: "{{output.sources}}"
  next: answer

answer:
  type: llm
  prompt: |
    Use this context to answer:
    {{rag_context}}
    Question: {{initial_prompt}}
```

- `documents`: Knowledge sources: files, directories, URLs, or loader-protocol paths. Required; it's what makes the node a `rag` node. Relative paths resolve against the agent's directory.
- `query`: The retrieval query, templated against state. Defaults to `{{initial_prompt}}`. Set it to `{{refined_query}}` to retrieve against a query an upstream `llm` node produced.
- `top_k`: Number of chunks to retrieve. Defaults to the knowledge base's own configured `top_k`.
- `timeout`: Retrieval timeout in seconds. Default 120.
- `state_updates`: Where the result goes. A `rag` node with no `state_updates` discards its result (the validator warns).
Knowledge-base build config (all optional; used only when the knowledge base is first built):
- `embedding_model`: Embedding model for the corpus.
- `chunk_size`: Document chunk size.
- `chunk_overlap`: Overlap between chunks.
- `reranker_model`: Reranker applied to hybrid-search results.
- `batch_size`: Embedding-request batch size.

Each falls back to the app-level `rag_*` config when omitted. When
`embedding_model`, `chunk_size`, and `chunk_overlap` are all set, the
knowledge base builds with no interactive prompts, so a fully-specified
rag node works in non-interactive runs.
Inside state_updates, {{output}} is a JSON object:
```json
{
  "context": "[Source: ./knowledge/a.md]\n...chunk...",
  "sources": ["./knowledge/a.md", "https://example.com/spec"]
}
```

- `{{output.context}}`: The retrieved context block, ready to inject into a prompt.
- `{{output.sources}}`: An array of source paths; `{{output.sources[0]}}` indexes individual sources (useful for downstream citation/verification nodes).
Each rag node's knowledge base is built once, at agent load time, into
<agent-dir>/<node-id>.yaml:
- If that file exists -> it is loaded (no prompt; works non-interactively).
- If it's missing and the node is fully specified (`embedding_model` + `chunk_size` + `chunk_overlap` all set) -> it is built directly, no prompts. Works in non-interactive runs.
- If it's missing, not fully specified, and Loki is interactive -> you are asked to initialize it, then prompted for the missing build values; declining is a hard error.
- If it's missing, not fully specified, and Loki is non-interactive (no TTY) -> hard error, with a hint to set the build-config fields or run the agent once interactively.
A graph with a rag node whose knowledge base isn't built cannot run.
This is deliberate fail-fast behavior. (In --info mode the agent is only
inspected, not run, so knowledge-base building is skipped entirely.)
Retrieval at execution time is fast (no re-embedding of the corpus). It's the same hybrid vector + keyword search normal Loki RAG uses. The corpus embedding/chunking cost is paid once, at load time.
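The four load-time cases for a rag node's knowledge base reduce to a small decision function. The sketch below only restates those cases; `knowledge_base_action` and its return labels are hypothetical names chosen for this example, not part of Loki.

```python
def knowledge_base_action(kb_file_exists, build_config, interactive):
    """Decide what happens to a rag node's knowledge base at agent load time.

    Returns one of: "load", "build", "prompt", "error".
    """
    if kb_file_exists:
        # An existing <node-id>.yaml is loaded as-is, with no prompts.
        return "load"
    required = ("embedding_model", "chunk_size", "chunk_overlap")
    if all(build_config.get(key) is not None for key in required):
        # Fully specified: build directly, even without a TTY.
        return "build"
    if interactive:
        # The user is asked for the missing values; declining is an error.
        return "prompt"
    # Non-interactive and under-specified: fail fast.
    return "error"
```

Note that the existing file wins even if the node's `documents` changed, which is why deleting the file is the way to force a rebuild.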
An `end` node terminates execution and returns a final result.

```yaml
end_accepted:
  id: end_accepted
  type: end
  output: |
    Approved report:
    {{report}}
  state_updates:            # optional last state mutations
    completed_at: "now"
```

- `output`: Templated against state, printed as the graph's final result.
- Multiple `end` nodes are fine; upstream routing decides which one a given run reaches.
Graph state is a serde_json::Value map. Templates use {{path}} syntax
inside any string field.
| Form | Resolves to |
|---|---|
| `{{key}}` | top-level value |
| `{{a.b.c}}` | nested object path |
| `{{arr[0]}}` | array index |
| `{{matrix[0][1]}}` | nested array indices |
| `{{users[0].name}}` | object field via index |
| `{{a.b.arr[2].field}}` | mixed path |
Rendering rules per value type:
- String -> as-is
- Number / bool / null -> stringified (`true`, `42`, `null`)
- Array / Object -> JSON-encoded compactly (`["a","b"]`, `{"k":"v"}`)

Missing keys / paths behave differently per template-evaluation site:

- Inside a node's primary fields (`prompt`, `instructions`, `question`, `output`) -> strict mode, missing keys raise an error.
- Inside `state_updates` values -> lenient mode, missing keys become empty strings.
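The path forms, rendering rules, and strict/lenient split above fit in a short interpolation routine. This is a minimal sketch that mirrors the documented behavior, not Loki's (Rust) implementation; `lookup`, `render_value`, and `render` are hypothetical names, and tolerance for spaces inside the braces is an assumption.

```python
import json
import re

_TEMPLATE = re.compile(r"\{\{\s*(.+?)\s*\}\}")
# A path segment is either a key name or a bracketed array index.
_SEGMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")
_MISSING = object()


def lookup(state, path):
    """Resolve a path such as a.b.arr[2].field against nested state."""
    value = state
    for name, index in _SEGMENT.findall(path):
        if name and isinstance(value, dict) and name in value:
            value = value[name]
        elif index and isinstance(value, list) and int(index) < len(value):
            value = value[int(index)]
        else:
            return _MISSING
    return value


def render_value(value):
    """Strings render as-is; everything else as compact JSON."""
    if isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"))


def render(template, state, strict=True):
    """Interpolate {{path}} placeholders. Strict mode raises on a missing
    path; lenient mode substitutes an empty string."""
    def substitute(match):
        value = lookup(state, match.group(1))
        if value is _MISSING:
            if strict:
                raise KeyError(f"missing template key: {match.group(1)}")
            return ""
        return render_value(value)

    return _TEMPLATE.sub(substitute, template)
```

In this model a node's `prompt` would be rendered with `strict=True` and each `state_updates` value with `strict=False`.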
Every node type (except end, which has a slightly different shape) accepts
an optional state_updates map:
```yaml
state_updates:
  some_key: "{{template}}"
  other_key: "literal text with {{var}}"
```

After the node body executes, each template is interpolated against state and
the result is stored under the corresponding key. Three scoped variables are
available only inside `state_updates`:

| Variable | Available in | Resolves to |
|---|---|---|
| `{{output}}` | `agent`, `llm`, `rag` | The node's primary text output (or the parsed JSON value if `output_schema` is set); for `rag` nodes, the retrieval result object |
| `{{choice}}` | `approval` | The option the user picked, or their free-form text |
| `{{input}}` | `input` | The user's text (or the interpolated `default` if they submitted empty) |
These variables are cleared after state_updates runs, so they don't leak
into the next node's templates.
End nodes are different. An `end` node's `state_updates` runs with plain lenient interpolation. There is no scoped `{{output}}` because there is no node-body output to scope. After `state_updates` apply, the `end` node's own `output` template is interpolated against the resulting state and returned as the graph's final result.
Nodes route via three mechanisms in priority order:
1. Script `_next` override: `script` nodes can set `"_next": "node_id"` in their stdout JSON to dynamically choose the next node.
2. Internal routing: `approval` routes via its `routes` map (or `on_other` when the answer matches no listed option).
3. Default `next` edge: the `next` field on the node.
| Node type | Needs `next`? |
|---|---|
| `agent` | Yes - `next` is required (unless the agent node is unreachable). Error at runtime if missing. |
| `script` | Either `_next` from script output OR static `next` (or `fallback` on failure). Error if neither. |
| `approval` | No - routing is via `routes` and `on_other`. `next` is ignored. |
| `input` | Yes - `next` is the success route. |
| `llm` | Yes - `next` is the success route (and the default for failures without `fallback`). |
| `rag` | Yes - `next` is required. Error at runtime if missing. |
| `end` | No - terminal. |
The `fallback` field (tolerant-fail routing) is currently honored by `script` and `llm` nodes:

- Success -> default routing
- Failure with `fallback` set -> `fallback` target
- Failure without `fallback` -> default routing, with the error description exposed in state so the next node can react
agent and input nodes do NOT have a tolerant-fail fallback path;
their failures propagate as graph failures.
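Putting the routing priority and the tolerant-fail rules together, next-node selection can be sketched as one function. This is an illustration only: `next_node` and the `outcome` dict (with optional `failed`, `script_next`, and `choice` keys) are hypothetical, and treating a failed `rag` node like `agent`/`input` is an assumption, since the text does not specify it.

```python
def next_node(node, outcome):
    """Pick the next node id for `node` (a dict of its YAML fields).

    outcome -- dict with optional keys:
      "failed"      -- True if the node body failed
      "script_next" -- the _next value a script printed, if any
      "choice"      -- the user's answer for an approval node
    Returns None for end nodes.
    """
    kind = node["type"]
    if kind == "end":
        return None
    if kind == "approval":
        # Internal routing: a listed option, else on_other.
        return node["routes"].get(outcome["choice"], node["on_other"])
    if outcome.get("failed", False):
        if kind not in ("script", "llm"):
            # agent / input failures propagate as graph failures
            # (rag is handled the same way here by assumption).
            raise RuntimeError(f"node '{node['id']}' failed")
        target = node.get("fallback") or node.get("next")
    elif kind == "script":
        # A script's _next overrides the static next edge.
        target = outcome.get("script_next") or node.get("next")
    else:
        target = node.get("next")
    if target is None:
        raise RuntimeError(f"node '{node['id']}' has no route")
    return target
```

The order of the branches encodes the priority: script override first, then internal approval routing, then the plain `next` edge.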
Both llm and agent nodes can specify an output_schema field: a JSON
Schema (written inline in YAML) describing the expected shape of the node's
output:
```yaml
extract_task:
  type: llm
  prompt: 'Parse: "{{raw_task}}"'
  output_schema:
    type: object
    properties:
      action: { type: string }
      items:
        type: array
        items: { type: string }
      time_minutes: { type: ["integer", "null"] }
      priority:
        type: string
        enum: [low, medium, high]
    required: [action, items, priority]
```

When `output_schema` is set:

- The node body runs normally.
- The raw text output is tried as JSON first (with light cleanup of markdown code fences). This is the fast path: if parsing succeeds, that's the structured output.
- Otherwise Loki invokes a built-in `__structured_output__` role (constructed inline; not visible in the user's role list) to extract a JSON object matching the schema, with one repair retry on extractor failure.
- When the parsed value is a JSON object, its top-level keys auto-merge into state permanently (a non-object result is still reachable via `{{output}}` but has no top-level keys to merge).
- `{{output}}` (inside `state_updates`) resolves to the full parsed value.
- Explicit `state_updates` win over auto-merge if the same key is set in both.
After the example above, downstream nodes can use {{action}}, {{items}},
{{items[0]}}, {{priority}}, etc. directly.
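The fast path and the merge precedence can be illustrated in a few lines. This sketch makes assumptions: the function names are hypothetical, the "light cleanup" is modeled as stripping one surrounding code fence with an optional language tag, and `state_updates` is passed in already rendered. The extractor-role fallback is deliberately left out.

```python
import json

FENCE = "`" * 3  # a markdown code-fence marker


def strip_code_fence(text):
    """Remove one surrounding code fence (and its language tag), if present."""
    text = text.strip()
    if (len(text) >= 2 * len(FENCE)
            and text.startswith(FENCE) and text.endswith(FENCE)):
        body = text[len(FENCE):-len(FENCE)]
        first_newline = body.find("\n")
        if first_newline != -1:
            # Drop the rest of the opening line (e.g. a "json" tag).
            body = body[first_newline + 1:]
        return body.strip()
    return text


def try_fast_path(raw_text):
    """Return (True, value) if the raw output parses as JSON, else (False, None)."""
    try:
        return True, json.loads(strip_code_fence(raw_text))
    except json.JSONDecodeError:
        return False, None


def apply_structured_output(state, parsed, state_updates):
    """Auto-merge an object's top-level keys, then apply explicit updates,
    so explicit state_updates win when the same key appears in both."""
    if isinstance(parsed, dict):
        state.update(parsed)
    state.update(state_updates)
    return state
```

When `try_fast_path` returns `False`, the documented behavior is to hand the text to the built-in extractor role instead.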
This is the most important behavioral difference between the two node
types when output_schema is set:
- LLM nodes: Loki automatically appends a schema hint to the prompt (to the system prompt if `instructions` is set, otherwise to the user prompt). The hint tells the model to respond with JSON matching the schema. This means the main LLM call usually emits valid JSON directly -> the fast path succeeds -> the extractor LLM call is skipped entirely (cheaper, faster, more reliable).
- Agent nodes: Loki does NOT inject any schema hint. Agents are multi-turn with their own tool-use loop; stuffing a schema into the initial prompt risks the agent fixating on JSON output instead of doing its actual work. The agent runs to completion freely, and the extractor converts its final text to JSON afterward.

If you need an agent to emit JSON-shaped output, include schema language in its prompt yourself. The auto-injected hint for LLM nodes uses this form:

```
Respond with a JSON object that matches this schema. Output ONLY the JSON
object with no surrounding prose or markdown fences.

Schema:
{...}
```
When extraction itself fails, the two node types also differ:

- LLM node: extraction failure = node failure -> routes via `fallback` or `next`.
- Agent node: extraction failure propagates as a graph error (agent nodes have no `fallback`).
A compact illustrative graph (input -> llm with output_schema -> end)
exercising structured output and all template-path forms. For a
full-featured reference covering every node type and field, see the
heavily-commented graph.example.yaml at the root of the Loki repository.
Illustrative graph.yaml:
```yaml
name: structured-test
version: "1.0"
start: ask_task
nodes:
  ask_task:
    id: ask_task
    type: input
    question: "Describe a task in free-form text."
    validation: "len(input) > 0"
    state_updates:
      raw_task: "{{input}}"
    next: extract_task

  extract_task:
    id: extract_task
    type: llm
    instructions: |
      You are a task parser. If a field cannot be determined, use a sensible
      default (empty array, null, or "medium" for priority).
    prompt: 'Parse this task description: "{{raw_task}}"'
    tools: []
    output_schema:
      type: object
      properties:
        action: { type: string }
        items:
          type: array
          items: { type: string }
        time_minutes: { type: ["integer", "null"] }
        priority:
          type: string
          enum: [low, medium, high]
        details:
          type: object
          properties:
            urgent: { type: boolean }
            deadline: { type: ["string", "null"] }
          required: [urgent]
      required: [action, items, priority, details]
    next: done

  done:
    id: done
    type: end
    output: |
      Action: {{action}}
      Priority: {{priority}}
      Time: {{time_minutes}} min
      Urgent? {{details.urgent}}
      First item: {{items[0]}}
      All items: {{items}}
```

With the sample input `Buy groceries: milk, eggs, bread. About 15 minutes. Urgent.`
Sample state after `extract_task`:

```json
{
  "raw_task": "Buy groceries: milk, eggs, bread. About 15 minutes. Urgent.",
  "action": "buy",
  "items": ["milk", "eggs", "bread"],
  "time_minutes": 15,
  "priority": "high",
  "details": { "urgent": true, "deadline": null }
}
```

When `validate_before_run: true` (the default), Loki validates the graph at
startup.
Errors (abort startup):
- Start node missing or pointing to a non-existent node
- Any `next` / `routes` / `fallback` / `on_other` target pointing to a non-existent node
- Any cycle in declared static edges (cycles are always errors; the per-node `max_loop_iterations` is a runtime safety net for dynamically-routed loops, not a license for static cycles)
- Graph has zero `end` nodes (execution would never terminate)
- `approval` option without a matching `routes` entry
- `script` file path does not exist relative to the agent's directory
- `agent` node references an agent name that doesn't exist in the Loki agents directory, or that exists but has neither a `config.yaml` nor a `graph.yaml`
- `rag` node with no `documents` (at least one knowledge source is required)
- `llm` node referencing an unknown tool or `mcp:<server>` in its `tools` whitelist, or an unknown `model` (validated against the agent's tool, MCP-server, and model sets)
Warnings (printed, execution continues):
- Any node unreachable from the start via declared static edges
- No `end` node reachable from the start via declared static edges
- `approval` `routes` entry without a matching option
- `rag` node with no `state_updates` (its retrieval result goes nowhere)

Why some of these are warnings and not errors: the validator only follows declared static edges (`next`, `routes`, `fallback`, `on_other`). Script nodes can also route dynamically at runtime via `_next` in their JSON output, and those edges are invisible to static analysis. To avoid false positives against dynamically-routed graphs, "unreachable" and "no reachable end" are reported as warnings, not errors.
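The static-edge checks can be sketched with a depth-first search for cycles and a reachability walk from the start node. This covers only a subset of the checks listed above and is not Loki's validator: `static_edges`, `validate`, and the message strings are hypothetical, and nodes are modeled as plain dicts of their YAML fields.

```python
def static_edges(node):
    """Collect the declared static targets of one node."""
    targets = [node[key] for key in ("next", "fallback", "on_other") if node.get(key)]
    targets.extend((node.get("routes") or {}).values())
    return targets


def validate(nodes, start):
    """Return (errors, warnings) for a mapping of node id -> node dict."""
    errors, warnings = [], []
    if start not in nodes:
        return [f"start node '{start}' does not exist"], warnings
    for node_id, node in nodes.items():
        for target in static_edges(node):
            if target not in nodes:
                errors.append(f"{node_id}: target '{target}' does not exist")
    if not any(node.get("type") == "end" for node in nodes.values()):
        errors.append("graph has no end node")

    # Three-color depth-first search: reaching a GREY node means a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node_id: WHITE for node_id in nodes}

    def has_cycle(node_id):
        color[node_id] = GREY
        for target in static_edges(nodes[node_id]):
            if target not in nodes:
                continue
            if color[target] == GREY:
                return True
            if color[target] == WHITE and has_cycle(target):
                return True
        color[node_id] = BLACK
        return False

    if any(color[n] == WHITE and has_cycle(n) for n in nodes):
        errors.append("cycle in declared static edges")

    # Reachability only follows static edges, so its findings are warnings.
    reachable, stack = set(), [start]
    while stack:
        node_id = stack.pop()
        if node_id in reachable or node_id not in nodes:
            continue
        reachable.add(node_id)
        stack.extend(static_edges(nodes[node_id]))
    for node_id in nodes:
        if node_id not in reachable:
            warnings.append(f"{node_id}: unreachable from start via static edges")
    if not any(nodes[n].get("type") == "end" for n in reachable):
        warnings.append("no end node reachable from start via static edges")
    return errors, warnings
```

A node that is reached only through a script's `_next` shows up here as "unreachable", which is exactly why that finding is a warning rather than an error.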
A graph agent can be entered from three places, all of which seed the
caller's prompt into state as {{initial_prompt}}:
- Top-level CLI: `loki -a my-graph-agent "user prompt here"`
- REPL: When the active agent has a `graph.yaml`, every user message in the REPL runs the graph fresh; the message becomes `{{initial_prompt}}`.
- Child-agent spawn: When another (graph or normal) agent invokes this one via Loki's sub-agent mechanism, the parent's request becomes `{{initial_prompt}}` for the child graph.
After the graph finishes, any sub-agents this graph spawned via
agent-type nodes are cancelled, so a graph cannot leak background tool
loops. The graph's final end node output is what's returned to the
caller.
Graph execution has two observability channels:
1. stderr narration: Dimmed `▸` lines you follow along with in real
   time, regardless of log level:

   ```
   ▸ graph: my-agent (start: extract_task)
   ▸ extract_task (llm)
   ▸ llm call: model=<active> tools=<none>
   ▸ extract_task -> done
   ▸ done (end)
   ▸ graph done in 2.41s
   ```

2. tracing logs: Structured `info!`/`debug!`/`warn!`/`error!`
   records gated by `RUST_LOG` (see Configuration below).
   This is the developer-facing channel and includes:
   - Graph start / completion / failure
   - Per-node entry and routing decisions (`debug`)
   - A performance summary at completion: every node's visit count,
     total/avg/max wall-clock time, slowest first:

   ```
   [graph:my-agent] performance summary (slowest first):
   [graph:my-agent] deep_research: 1 visit(s), total 8200ms, avg 8200ms, max 8200ms
   [graph:my-agent] extract_task: 1 visit(s), total 1400ms, avg 1400ms, max 1400ms
   ```
State snapshots: when `log_state_snapshots: true` (the default), before
each node runs Loki logs the state's byte size and key list at `debug`
level, and the full state at `trace` level. The full state is
deliberately kept at `trace` because graph state can contain secrets, so
be careful when sharing trace-level logs.
Control the tracing channel with RUST_LOG:
```shell
RUST_LOG=loki::graph=debug loki -a my-agent "..."   # graph debug logs
RUST_LOG=loki::graph=trace loki -a my-agent "..."   # + full state snapshots
RUST_LOG=loki::graph=info loki -a my-agent "..."    # start/end/perf summary
```

The stderr `▸` narration is always shown and is not affected by `RUST_LOG`.
A short, honest list of things that bite people:
- A graph agent is `graph.yaml`-only. It must not also have a `config.yaml`. Both files present is a hard load error.
- Graph agents do not support sessions. A graph manages its own state (`GraphState`), so there is no conversational history to persist. Explicitly requesting a session is a hard error: `--session` on the CLI, a session name passed to `.agent` in the REPL, or running `.session` while inside a graph agent. Any app-level `agent_session` default is silently skipped for graph agents rather than applied.
- RAG is per-node, not agent-wide. Graph agents do RAG via `rag` nodes (each with its own knowledge base); there is no agent-wide `documents` field at the `graph.yaml` top level.
- A `rag` node's knowledge base is built once, at load time. Changing a `rag` node's `documents` does not rebuild it. Delete `<agent-dir>/<node-id>.yaml` to force a fresh build on next run.
- `on_other` is required on every `approval` node because `user__ask` always permits free-form responses (see the approval section).
- `validation` on `input` nodes is length-only. The grammar is `len(input) <op> <integer>` with `<op>` in `>` `>=` `<` `<=` `==`. No regex, no type coercion, no range checks. Use a follow-up `script` node for richer validation.
- An `input` node's `default` is not re-validated. When the user submits an empty response and the `default` is substituted in, that substituted value is not checked against `validation`. Make sure any `default` you set would itself satisfy the `validation` predicate.
- Tool whitelist is `llm`-only. `agent` nodes always use the child agent's full tool universe. They ignore any `tools:` field. This is by design: child agents own their tool surface.
- `{{output}}`, `{{choice}}`, `{{input}}` are scoped to `state_updates`. Outside `state_updates` (e.g. in another node's `prompt`), these scoped variables are not available unless the previous node explicitly stored them via `state_updates`. `end` nodes do NOT get a scoped `{{output}}`. They have no node body output to scope.
- Schema-hint auto-injection happens for `llm` nodes only, not `agent` nodes (see Structured Output).
- Script-output JSON must be an object, not an array or primitive, even if you only want to set `_next`.
- Cycles in declared static edges are always errors. The per-node `max_loop_iterations` is a runtime safety net for cycles built via dynamic script `_next` routing, not permission to write static cycles.
- Schema version is fixed at `"1.0"` today. Any other value is a startup error.
- Script extensions are exactly `.sh`, `.py`, `.ts`. No JavaScript, no Ruby, no Lua. Python must be available as `python3` and TypeScript requires `npx tsx` on PATH.
- `graph.example.yaml` - A fully-commented, full-featured reference graph agent at the root of the Loki repository (every top-level field, every node type).
- Agents - non-graph agent system (`config.yaml` + LLM loop)
- Custom Tools - building `tools.sh` / `tools.py` / `tools.ts` files for use in graph nodes
- Roles - note that the built-in `__structured_output__` role used by `output_schema` is intentionally internal and is not user-visible
- MCP Servers - `mcp:<server>` shorthand inside an `llm` node's `tools:` whitelist