Add agent-loop orchestration / toolMode: 'auto' to scope.models
Context
The model-access API (#510) declares toolMode: 'return' | 'auto' on GenerateOpts. 'return' mode is trivial (backend returns tool-call requests; caller resolves them externally). 'auto' mode is the in-process agent loop — scope.models resolves model-issued tool calls against scope.resources and registered MCP tools, executes them, and re-invokes the backend until the model produces a terminal answer.
That loop is substantial enough to deserve its own issue: tool resolution conventions, MCP integration surface, streaming behavior across tool-call boundaries, audit, error handling, and safety limits all need design work. Implementing it inside #510 would gate the model-access primitives behind 1-2 weeks of agent-loop design. This issue carries that work separately.
Once shipped, toolMode: 'auto' is what unlocks Harper's compressed-stack agent story: tool calls resolve in-process against local Resources, audited via the standard transaction log. The loop itself is the same regardless of target; the invocation latency depends on the tool target — sub-millisecond for Resource-backed tools (in-process dispatch), normal HTTP for tools the unified registry resolves from external MCP servers. The compressed-stack win applies to the Resource-backed case.
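API contract from #510 that this fulfills
'return' mode lands with #510 itself. 'auto' is type-declared by #510 but throws "not yet implemented" until this issue ships.
Proposed semantics
The auto loop:
1. The caller invokes scope.models.generate(input, { tools, toolMode: 'auto', ...opts }).
2. The orchestrator invokes the backend with the input and the exposed tools.
3. If the result has no tool calls, return it to the caller. Done.
4. If tool calls are present, validate each call's arguments against the tool's JSON Schema (opts.toolArgValidation: 'strict' | 'lenient' | 'none', default 'strict'; models hallucinate JSON shape often enough that this catches real bugs cheaply). Resolve each call against scope.resources (or the unified MCP tool registry), execute it, and capture the results.
5. Append the assistant's tool-call message and the tool results to the message list.
6. Re-invoke the backend with the updated message list.
7. Repeat from step 3 until terminal (no more tool calls) or until any safety limit trips.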
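The loop can be sketched as follows. This is a minimal illustration, not the scope.models implementation: Message, ToolCall, Backend, ToolRegistry, and runAutoLoop are hypothetical stand-ins, the backend is injected as a plain async function, argument validation and the token/cost limits are omitted, and an unknown tool name is turned into an error result instead of failing the loop.

```typescript
// Hypothetical shapes; the real scope.models / tool-registry types may differ.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: ToolCall[];
  toolCallId?: string;
}

interface BackendResult {
  content: string;
  toolCalls: ToolCall[];
}

type Backend = (messages: Message[]) => Promise<BackendResult>;
type ToolRegistry = Map<string, (args: Record<string, unknown>) => Promise<unknown>>;

async function runAutoLoop(
  input: string,
  backend: Backend,
  tools: ToolRegistry,
  maxToolIterations = 10,
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: input }];
  for (let round = 0; ; round++) {
    // Invoke (or re-invoke) the backend with the full message list so far.
    const result = await backend(messages);
    // No tool calls: the model produced a terminal answer.
    if (result.toolCalls.length === 0) return result.content;
    // Safety limit: refuse to start round maxToolIterations + 1.
    if (round >= maxToolIterations) {
      throw new Error(`budget exceeded: more than ${maxToolIterations} tool-call rounds`);
    }
    // Record the assistant's tool-call message, then each tool's result.
    messages.push({ role: "assistant", content: result.content, toolCalls: result.toolCalls });
    for (const call of result.toolCalls) {
      const tool = tools.get(call.name);
      const output = tool ? await tool(call.args) : { error: `unknown tool: ${call.name}` };
      messages.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(output) });
    }
  }
}
```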
Safety limits
The orchestrator enforces three independent limits, any one of which trips a structured budget-exceeded abort with the partial conversation for debugging:
opts.maxToolIterations: bounds the count of tool-call rounds. Default 10.
opts.maxTokens: bounds the cumulative prompt + completion + tool-output tokens across the whole loop. No default (uncapped). Lets dependent issues (e.g. Built-in Harper Agent Component #626's built-in agent enforcing per-session token budgets) bound spend at the orchestrator rather than via post-hoc inspection of analytics.model_call.
opts.maxCostUsd: bounds cumulative USD cost. Per-call cost is computed from analytics.model_call.gpu_ms or token counts, multiplied by the configured per-model rate card. No default (uncapped). Resolves the per-session cost-cap ask from Built-in Harper Agent Component #626.
A single iteration with a large prompt + large tool result + large response can blow a budget on its own — count is not a proxy for spend.
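A sketch of how the three limits could be tracked independently, assuming token-based pricing only (the gpu_ms path is left out). LoopBudget, RateCard, and BudgetExceededError are hypothetical names. One large round-trip can trip maxCostUsd or maxTokens without coming anywhere near maxToolIterations.

```typescript
// Hypothetical per-model rate card (token-based pricing only).
interface RateCard {
  usdPerPromptToken: number;
  usdPerCompletionToken: number;
}

interface BudgetOpts {
  maxToolIterations?: number; // default 10
  maxTokens?: number; // undefined = uncapped
  maxCostUsd?: number; // undefined = uncapped
}

type LimitName = "maxToolIterations" | "maxTokens" | "maxCostUsd";

class BudgetExceededError extends Error {
  limit: LimitName;
  used: number;
  constructor(limit: LimitName, used: number) {
    super(`budget exceeded: ${limit} (used ${used})`);
    this.name = "BudgetExceededError";
    this.limit = limit;
    this.used = used;
  }
}

class LoopBudget {
  private iterations = 0;
  private tokens = 0;
  private costUsd = 0;
  private opts: BudgetOpts;
  private rates: RateCard;

  constructor(opts: BudgetOpts, rates: RateCard) {
    this.opts = opts;
    this.rates = rates;
  }

  // Call after every backend round-trip; tokens are summed as the issue
  // describes (prompt + completion + tool output).
  recordCall(promptTokens: number, completionTokens: number, toolOutputTokens = 0): void {
    this.tokens += promptTokens + completionTokens + toolOutputTokens;
    this.costUsd +=
      promptTokens * this.rates.usdPerPromptToken + completionTokens * this.rates.usdPerCompletionToken;
    const { maxTokens, maxCostUsd } = this.opts;
    if (maxTokens !== undefined && this.tokens > maxTokens) {
      throw new BudgetExceededError("maxTokens", this.tokens);
    }
    if (maxCostUsd !== undefined && this.costUsd > maxCostUsd) {
      throw new BudgetExceededError("maxCostUsd", this.costUsd);
    }
  }

  // Call before executing each round of tool calls.
  startToolRound(): void {
    this.iterations += 1;
    const max = this.opts.maxToolIterations ?? 10;
    if (this.iterations > max) throw new BudgetExceededError("maxToolIterations", this.iterations);
  }
}
```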
Parallel tool calls
Both OpenAI and Anthropic backends emit multiple tool calls in a single assistant message. Default execution: parallel — matches model expectations, latency-optimal. Configurable via:
opts.toolParallelism: 'parallel' | 'serial'
Within a parallel batch, mutations do not share a transaction boundary — each tool call is its own Resource dispatch and commits independently (see "Transaction scope per turn" below). Apps that need atomicity across multiple steps should wrap the multi-step intent in a single Resource method exposed as one tool, not rely on parallel-batch atomicity.
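One way to express the two toolParallelism modes, as a hypothetical executeBatch helper. Each run() stands for an independent Resource dispatch, so neither mode shares a transaction across calls; the only difference is whether the calls overlap in time. In both modes, results come back in the order the model emitted the calls.

```typescript
type ToolParallelism = "parallel" | "serial";

// Hypothetical: one pending tool call from a single assistant message.
interface PendingCall<T> {
  name: string;
  run: () => Promise<T>;
}

async function executeBatch<T>(calls: PendingCall<T>[], mode: ToolParallelism): Promise<T[]> {
  if (mode === "parallel") {
    // Start every call immediately; Promise.all preserves input order in its result.
    return Promise.all(calls.map((c) => c.run()));
  }
  // Serial: each call starts only after the previous one has settled.
  const results: T[] = [];
  for (const c of calls) results.push(await c.run());
  return results;
}
```

Promise.all rejects as soon as one call fails, so an orchestrator that reports per-call errors back to the model would likely want Promise.allSettled instead.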
Tool result size cap
A tool returning 10MB inflates the next round-trip's prompt and can exceed the model's context window. The orchestrator enforces a per-tool-result size cap with a truncation marker:
opts.toolResultMaxBytes: default 64KB. A result exceeding the cap is truncated and a marker (e.g. […truncated; full result <N> bytes]) is appended; the truncated form is what feeds back into the model.
Cheap to add now; awkward to retrofit once apps depend on full results flowing through.
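A minimal sketch of the cap, assuming "64KB" means 64 × 1024 bytes of UTF-8 and that the marker is appended after the cut (so the returned string can slightly exceed the cap). capToolResult is a hypothetical name. A byte-based cut can split a multi-byte character; the sketch drops the resulting replacement character rather than feeding a broken glyph to the model.

```typescript
// Assumption: 64KB = 64 * 1024 bytes.
const DEFAULT_TOOL_RESULT_MAX_BYTES = 64 * 1024;

function capToolResult(serialized: string, maxBytes = DEFAULT_TOOL_RESULT_MAX_BYTES): string {
  const bytes = new TextEncoder().encode(serialized);
  if (bytes.length <= maxBytes) return serialized;
  // Decode only the first maxBytes; an incomplete trailing UTF-8 sequence
  // decodes to U+FFFD, which we strip.
  let head = new TextDecoder().decode(bytes.subarray(0, maxBytes));
  if (head.endsWith("\uFFFD")) head = head.slice(0, -1);
  // Marker text follows the example format in the issue.
  return `${head}[…truncated; full result ${bytes.length} bytes]`;
}
```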
AbortSignal propagation across the loop
When the caller aborts a generate mid-loop:
Cancel the upstream LLM call in flight.
Propagate the abort into in-flight tool executions via the tool's invocation context (AbortSignal on the dispatched Resource call, when supported).
Already-committed writes from completed tool calls stay committed — no implicit rollback. The orchestrator returns a structured abort error.
Important for long-running agent endpoints (e.g. SSE-served chat completions where the client disconnects).
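A sketch of these abort semantics, assuming tools are dispatched serially and accept an AbortSignal (AbortableTool, runToolsWithAbort, and LoopAbortedError are hypothetical names). The same signal would also be passed to the upstream LLM request, for example via the signal option that fetch accepts. Once the signal fires, no further tools start; results of calls that already completed stay in the error (their writes are not rolled back), and the caller gets a structured error.

```typescript
type AbortableTool = (signal: AbortSignal) => Promise<string>;

class LoopAbortedError extends Error {
  completed: string[]; // results of tool calls that finished (and committed)
  constructor(completed: string[]) {
    super("generate aborted by caller");
    this.name = "LoopAbortedError";
    this.completed = completed;
  }
}

async function runToolsWithAbort(tools: AbortableTool[], signal: AbortSignal): Promise<string[]> {
  const completed: string[] = [];
  for (const tool of tools) {
    // Do not start new work after an abort.
    if (signal.aborted) throw new LoopAbortedError(completed);
    try {
      // The tool receives the signal so it can stop in-flight work if it supports that.
      completed.push(await tool(signal));
    } catch (err) {
      // A tool that rejected because of the abort surfaces as the structured error.
      if (signal.aborted) throw new LoopAbortedError(completed);
      throw err;
    }
  }
  return completed;
}
```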
Tool resolution
Once the native MCP server in #465 lands, the scope.resources registry and "registered MCP tools" collapse into one unified tool registry — the same set of tools is exposed to external MCP clients (via #618 Application profile) and to in-process scope.models orchestration.
Sources of tools in 'auto' mode:
Auto-generated from the Resources registry via #618. Each @export-ed Resource produces get_*, search_*, create_*, update_*, delete_* tools (only the verbs the class implements, RBAC-filtered).
Custom Resource methods opted in via static mcpTools declaration per #622. For non-verb methods or computed tools.
Tool resolution is a lookup in the central tool registry from #615, which already handles RBAC-aware filtering. scope.models consults the same registry external MCP clients see.
Tool name convention
Inherited from #618: tool names are <verb>_<resource> (get_product, create_order, etc.) with sanitization for path separators. This orchestration uses the same naming so external MCP clients and in-process scope.models callers reason about the same set of tools.
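A sketch of the naming rule. The sanitization details here are assumptions, not taken from #618 (which defines the actual rule): leading and trailing path separators are dropped, any remaining character outside [A-Za-z0-9_-] maps to "_", and the resource segment is lower-cased. toolName, toolNamesFor, and Verb are hypothetical names.

```typescript
type Verb = "get" | "search" | "create" | "update" | "delete";

function toolName(verb: Verb, resourcePath: string): string {
  const resource = resourcePath
    .replace(/^\/+|\/+$/g, "") // drop leading and trailing separators
    .replace(/[^A-Za-z0-9_-]+/g, "_") // map remaining separators and other characters to "_"
    .toLowerCase();
  return `${verb}_${resource}`;
}

// Only the verbs a class actually implements become tools.
function toolNamesFor(resourcePath: string, implemented: Verb[]): string[] {
  return implemented.map((verb) => toolName(verb, resourcePath));
}
```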
Auto-discovery via the unified registry
With #618 landed, every @export-ed Resource is automatically a candidate tool — no explicit registration in GenerateOpts.tools required. The caller can still narrow the exposed set per call (tools: ['search_product', 'get_product']) or pass an explicit list for tighter control. Default behavior in 'auto' mode: the full RBAC-filtered registry the calling user can see.
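A sketch of how the exposed set could be computed, assuming a registry entry simply lists the roles allowed to call it (RegisteredTool and exposedTools are hypothetical; the real RBAC filtering lives in the #615 registry). The design choice illustrated: an explicit tools list only narrows the RBAC-filtered set, so naming a tool the caller cannot see does not expose it.

```typescript
// Hypothetical registry entry: real entries carry schemas, descriptions, etc.
interface RegisteredTool {
  name: string;
  allowedRoles: string[];
}

function exposedTools(registry: RegisteredTool[], callerRole: string, requested?: string[]): string[] {
  // Default: everything the caller is permitted to see.
  const visible = registry.filter((t) => t.allowedRoles.includes(callerRole)).map((t) => t.name);
  if (requested === undefined) return visible;
  // Per-call narrowing: intersect with the requested names, never widen.
  return visible.filter((name) => requested.includes(name));
}
```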
Streaming with tool calls
For generateStream with toolMode: 'auto':
Yields content deltas as they arrive from the backend.
When the model emits a tool-call delta, the iterator yields that delta (signaling the consumer; useful for UI "agent is calling X" indicators).
The orchestrator suspends the upstream stream, executes the tool call(s), feeds results back to the backend.
The new stream from the backend resumes, yielding more deltas.
Loop until terminal.
The existing GenerateChunk shape (deltaContent?, deltaToolCalls?, finishReason?) supports this transition without changes.
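A sketch of the streaming transition as an async generator. StreamingBackend, generateStreamAuto, and the StreamToolCall shape are hypothetical, and for brevity each deltaToolCalls entry is treated as a complete call rather than an incremental fragment to be assembled. Every chunk, including tool-call deltas, is forwarded to the consumer; tools run only after the current upstream stream ends, and a fresh stream is opened with the accumulated tool results.

```typescript
interface StreamToolCall {
  id: string;
  name: string;
  args: string; // JSON-encoded arguments
}

// Mirrors the GenerateChunk fields named in the issue.
interface GenerateChunk {
  deltaContent?: string;
  deltaToolCalls?: StreamToolCall[];
  finishReason?: string;
}

type StreamingBackend = (toolResults: string[]) => AsyncIterable<GenerateChunk>;

async function* generateStreamAuto(
  backend: StreamingBackend,
  runTool: (call: StreamToolCall) => Promise<string>,
  maxToolIterations = 10,
): AsyncGenerator<GenerateChunk> {
  const toolResults: string[] = [];
  for (let round = 0; ; round++) {
    const pending: StreamToolCall[] = [];
    for await (const chunk of backend(toolResults)) {
      yield chunk; // content and tool-call deltas both reach the consumer
      if (chunk.deltaToolCalls) pending.push(...chunk.deltaToolCalls);
    }
    if (pending.length === 0) return; // terminal: this stream carried no tool calls
    if (round >= maxToolIterations) {
      throw new Error(`budget exceeded: more than ${maxToolIterations} tool-call rounds`);
    }
    // The upstream stream has ended; run the tools, then open a fresh stream.
    for (const call of pending) toolResults.push(await runTool(call));
  }
}
```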
Audit and replication
Tool calls inherit Harper's standard machinery for free:
Each tool call goes through the Resource dispatch — auth chain runs (tool calls have the caller's permissions), the call is logged in the transaction log, mutations replicate.
The model call itself is logged in analytics.model_call (per Add unified model-access API (scope.models) #510). The model call records the tool-call count; individual tool calls show up in the transaction audit log alongside data changes.
Querying "what did this agent actually do" is a regular Harper query against the audit log filtered by request / conversation / tenant.
This is the compressed-stack advantage made concrete: an agent's actions are recorded the same way as a human user's actions, in the same store, queryable with the same primitives.
Transaction scope per turn
Each tool call commits its own transaction. If the agent calls create_order and then update_inventory and the second fails, the first is already committed. This is the right default for composability with non-agent callers (tools-as-Resources behave identically whether called by an agent or directly), but worth calling out explicitly:
Multi-step atomic intent should be wrapped in a single Resource method (exposed as one tool), not relied on across multiple tool calls.
The orchestrator does not provide automatic rollback or saga semantics across the loop.
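A toy illustration of the per-call commit behavior, using an in-memory ToyStore whose transaction() commits only if its callback returns normally. ToyStore, createOrder, updateInventory, and placeOrder are hypothetical stand-ins for Resource dispatch, not Harper's API. Two tools mean two transactions, so a failing update_inventory leaves the committed order behind; a single tool that performs both writes commits both or neither.

```typescript
// Toy stand-in for Resource dispatch: writes go to a draft that is committed
// only if the callback returns without throwing.
class ToyStore {
  data = new Map<string, number>();
  transaction(fn: (draft: Map<string, number>) => void): void {
    const draft = new Map(this.data);
    fn(draft); // a throw here discards the draft: nothing from this call commits
    this.data = draft;
  }
}

function decrementStock(draft: Map<string, number>, sku: string): void {
  const left = (draft.get(`stock:${sku}`) ?? 0) - 1;
  if (left < 0) throw new Error(`out of stock: ${sku}`);
  draft.set(`stock:${sku}`, left);
}

// Two tools, two transactions: create_order can commit even if update_inventory fails.
const createOrder = (s: ToyStore, id: string) =>
  s.transaction((d) => {
    d.set(`order:${id}`, 1);
  });
const updateInventory = (s: ToyStore, sku: string) => s.transaction((d) => decrementStock(d, sku));

// One tool wrapping the whole intent: both writes commit together or not at all.
const placeOrder = (s: ToyStore, id: string, sku: string) =>
  s.transaction((d) => {
    d.set(`order:${id}`, 1);
    decrementStock(d, sku);
  });
```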
ConversationResource integration
Optional affordance for callers using ConversationResource (#511): pass opts.conversation?: ConversationResource and the orchestrator appends each turn (user message, assistant message, tool calls, tool results) as it runs. Without it, conversation persistence is the caller's responsibility — they'd reimplement the same write pattern around every agent loop invocation.
The orchestrator does not require a ConversationResource. Stateless agent calls work fine. When provided, the integration:
Appends the input as a user/system turn before the loop starts.
For each tool call: appends the assistant's tool-call turn and the tool result(s) as a role: 'tool' turn.
Appends the final assistant response as a turn before returning.
The contract: the orchestrator writes turns as it runs; the caller decides whether to provide a conversation at all.
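A sketch of the append order, assuming a ConversationLike interface with a single append(turn) method standing in for ConversationResource (#511), whose real methods may differ. TurnRecorder is a hypothetical helper the loop would call at each stage, so turns are written as the loop runs; with no conversation supplied, every call is a no-op.

```typescript
interface Turn {
  role: "user" | "system" | "assistant" | "tool";
  content: string;
  toolCalls?: { id: string; name: string }[];
  toolCallId?: string;
}

// Hypothetical stand-in for ConversationResource.
interface ConversationLike {
  append(turn: Turn): Promise<void>;
}

class TurnRecorder {
  private conv: ConversationLike | undefined;
  constructor(conv?: ConversationLike) {
    this.conv = conv;
  }

  // Before the loop starts: the caller's input.
  async start(input: string): Promise<void> {
    await this.conv?.append({ role: "user", content: input });
  }

  // After each tool round: the assistant's tool-call turn, then one 'tool' turn per result.
  async toolRound(
    assistantContent: string,
    calls: { id: string; name: string }[],
    results: { id: string; content: string }[],
  ): Promise<void> {
    const conv = this.conv;
    if (!conv) return;
    await conv.append({ role: "assistant", content: assistantContent, toolCalls: calls });
    for (const r of results) await conv.append({ role: "tool", content: r.content, toolCallId: r.id });
  }

  // Before returning: the final assistant response.
  async finish(answer: string): Promise<void> {
    await this.conv?.append({ role: "assistant", content: answer });
  }
}
```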
Error handling per tool call
When a tool call fails, default behavior: append the error as the tool result (so the model can react and choose to retry, work around, or abort). Configurable via opts.toolErrorMode:
'recover' (default) — return error to model, continue loop
'abort' — abort the loop, return error to caller
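A sketch of the two modes around a single tool invocation (callTool and ToolAbortError are hypothetical names). In 'recover' mode the failure is serialized as an ordinary tool result the model can read and react to; in 'abort' mode it becomes a structured error for the caller.

```typescript
type ToolErrorMode = "recover" | "abort";

class ToolAbortError extends Error {
  toolName: string;
  constructor(toolName: string, cause: unknown) {
    super(`tool ${toolName} failed: ${String(cause)}`);
    this.name = "ToolAbortError";
    this.toolName = toolName;
  }
}

async function callTool(name: string, run: () => Promise<unknown>, mode: ToolErrorMode): Promise<string> {
  try {
    return JSON.stringify(await run());
  } catch (err) {
    if (mode === "abort") throw new ToolAbortError(name, err);
    // 'recover': the error becomes the tool result so the model can retry or work around it.
    return JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
  }
}
```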
Return shape
On success, scope.models.generate(input, { toolMode: 'auto' }) returns the final assistant response: the same GenerateResult shape callers get from 'return' mode, just without toolCalls (those were resolved internally). Low overhead; matches the "I just want the answer" common case.
Opt in to the full sequence (assistant turns, tool calls, tool results, intermediate responses) via opts.includeToolTrace: true. Returns the response plus an ordered trace: ToolTraceEntry[] for inspection and debugging.
On error or safety-abort (budget exceeded, abort signal, tool error with 'abort' mode), the trace is included regardless of includeToolTrace. Operators need that trace to diagnose; hiding debug info on failure is an antipattern.
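A sketch of the result-shaping rule. The issue names only ToolTraceEntry and trace; the entry fields, the trace field on GenerateResult, and LoopFailure are hypothetical. On success the trace is attached only when includeToolTrace is set; on failure it is always attached.

```typescript
interface ToolTraceEntry {
  kind: "assistant" | "tool_call" | "tool_result";
  detail: string;
}

interface GenerateResult {
  content: string;
  trace?: ToolTraceEntry[];
}

class LoopFailure extends Error {
  trace: ToolTraceEntry[];
  constructor(reason: string, trace: ToolTraceEntry[]) {
    super(reason);
    this.name = "LoopFailure";
    this.trace = trace;
  }
}

// Success: the cheap default omits the trace unless the caller opted in.
function shapeSuccess(content: string, trace: ToolTraceEntry[], includeToolTrace = false): GenerateResult {
  return includeToolTrace ? { content, trace } : { content };
}

// Failure (budget exceeded, abort signal, 'abort'-mode tool error): always carry the trace.
function shapeFailure(reason: string, trace: ToolTraceEntry[]): LoopFailure {
  return new LoopFailure(reason, trace);
}
```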
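Acceptance
toolMode: 'auto' orchestration loop implemented end-to-end.
Tool resolution against scope.resources works for an @export-ed Resource.
opts.maxToolIterations, opts.maxTokens, and opts.maxCostUsd independently enforced; any trip returns a structured budget-exceeded abort with the partial trace.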
Parallel tool-call execution by default; opts.toolParallelism: 'serial' runs serially.
Per-tool-result size cap (opts.toolResultMaxBytes, default 64KB) with truncation marker.
Transaction scope per turn explicitly documented — each tool call commits independently.
Streaming + tool calls: chunks yield correctly through tool-execution transitions.
Tool calls audited in the transaction log alongside data writes (inherited from Resource dispatch).
Permissions enforced — tool calls run with caller's identity.
Error handling per opts.toolErrorMode; trace returned on error/abort even when includeToolTrace would otherwise be off.
Tool argument validation strict by default; configurable per opts.toolArgValidation.
Optional opts.conversation?: ConversationResource integration appends turns as the loop runs.
Documented: building an agent with auto mode; how to register a Resource as a tool; tool error handling; budget enforcement; abort semantics; conversation integration.
Per-tenant accounting on the model calls is preserved through the loop (each round-trip writes its own analytics.model_call row).
Dependencies
Hard: model-access API (Add unified model-access API (scope.models) #510), specifically Phase 3 — needs a backend that natively supports tools (openai backend, then anthropic / bedrock follow). ollama does not currently expose tools in a portable way; if/when it does, this orchestration picks it up automatically.
Hard: native MCP server family (#465 umbrella), specifically:
#615: Tool registry + class-level introspection + RBAC-aware filtering (the registry this orchestration consults)
#622: static mcpTools declaration (for non-verb methods)
Previously this dependency was framed against the external HarperFast/mcp-server addon; that path is superseded by the native MCP server (and HarperFast/mcp-server is being archived per #623).
Related
Add unified model-access API (scope.models) #510, where toolMode: 'auto' is declared.
Pairs with Add ConversationResource for agent memory and conversation state #511 (ConversationResource): turns.tool_calls and turns.tool_results record what the orchestrator did; optional integration via opts.conversation per the "ConversationResource integration" section above.
Downstream consumer: Built-in Harper Agent Component #626, which uses this orchestration as its loop. It surfaced the per-call cost-cap and per-turn cost-surfacing asks that informed the safety-limits section.
Out of scope
Tool catalog / discovery UX in Studio.
Cross-tenant tool invocation.
A full multi-agent orchestration layer (this issue does single-agent tool-call loops; multi-agent coordination is its own design space).
Per-model-call permission scoping (opts.toolPermissions to run the loop with a tighter permission set than the caller's). Tool calls inherit the caller's permissions in v1; tighter scoping is a v1.1 / follow-up consideration.
Structural cycle detection. maxToolIterations already catches infinite loops eventually. A "same tool, same args, N times consecutively" check would fail faster with a more actionable error; fast-follow if needed, not load-bearing for v1.
🤖 Generated with Claude Code