
# Add agent-loop orchestration / `toolMode: 'auto'` to `scope.models`

## Context

The model-access API (#510) declares `toolMode: 'return' | 'auto'` on `GenerateOpts`. `'return'` mode is trivial (backend returns tool-call requests; caller resolves them externally). `'auto'` mode is the in-process agent loop — `scope.models` resolves model-issued tool calls against `scope.resources` and registered MCP tools, executes them, and re-invokes the backend until the model produces a terminal answer.

That loop is substantial enough to deserve its own issue: tool resolution conventions, MCP integration surface, streaming behavior across tool-call boundaries, audit, error handling, and safety limits all need design work. Implementing it inside #510 would gate the model-access primitives behind 1-2 weeks of agent-loop design. This issue carries that work separately.

Once shipped, `toolMode: 'auto'` is what unlocks Harper's compressed-stack agent story: tool calls resolve in-process against local Resources, audited via the standard transaction log. The **loop itself** is the same regardless of target; the **invocation latency** depends on the tool target — sub-millisecond for Resource-backed tools (in-process dispatch), normal HTTP for tools the unified registry resolves from external MCP servers. The compressed-stack win applies to the Resource-backed case.

## API contract from #510 that this fulfills

```ts
type GenerateOpts = {
  tools?: ToolDef[];
  toolMode?: 'return' | 'auto';
  // ... other fields from #510
};
```

`'return'` mode lands with #510 itself. `'auto'` is type-declared by #510 but throws "not yet implemented" until this issue ships.

## Proposed semantics

The auto loop:

1. Caller invokes `scope.models.generate(input, { tools, toolMode: 'auto', ...opts })`.
2. Backend returns a result.
3. If the result has no tool calls, return to caller. Done.
4. If tool calls are present, **validate each call's arguments against the tool's JSON Schema** (`opts.toolArgValidation: 'strict' | 'lenient' | 'none'`, default `'strict'`; models emit arguments that violate the schema often enough that this check catches real bugs cheaply). Then resolve each call against `scope.resources` (or the unified MCP tool registry), execute it, and capture the result.
5. Append the assistant's tool-call message and the tool results to the message list.
6. Re-invoke the backend with the updated message list.
7. Repeat from step 3 until terminal (no more tool calls) or any safety limit trips.
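The numbered steps can be sketched as a single loop. This is a minimal illustration, not the #510 API: `ToolCall`, `Message`, `Backend`, `ToolExecutor`, and `runAutoLoop` are hypothetical names, argument validation is omitted, and a plain `Error` stands in for the structured budget-exceeded abort.

```typescript
// Hypothetical message and backend shapes for illustration; #510's real types may differ.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };
type BackendResult = { content: string; toolCalls?: ToolCall[] };
type Backend = (messages: Message[]) => Promise<BackendResult>;
type ToolExecutor = (call: ToolCall) => Promise<string>;

async function runAutoLoop(
  input: string,
  backend: Backend,
  execute: ToolExecutor,
  maxToolIterations = 10,
): Promise<{ content: string; messages: Message[] }> {
  const messages: Message[] = [{ role: "user", content: input }];
  for (let round = 0; ; round++) {
    // Steps 2 and 6: (re-)invoke the backend with the full message list.
    const result = await backend(messages);
    const calls = result.toolCalls ?? [];
    // Step 3: no tool calls means the model produced a terminal answer.
    if (calls.length === 0) {
      messages.push({ role: "assistant", content: result.content });
      return { content: result.content, messages };
    }
    // Step 7: refuse to run another round once the iteration cap is reached.
    if (round >= maxToolIterations) {
      throw new Error(`maxToolIterations (${maxToolIterations}) exceeded`);
    }
    // Step 5: record the assistant's tool-call turn, then one tool turn per call.
    messages.push({ role: "assistant", content: result.content, toolCalls: calls });
    for (const call of calls) {
      // Step 4: resolve and execute (schema validation omitted in this sketch).
      messages.push({ role: "tool", toolCallId: call.id, content: await execute(call) });
    }
  }
}
```

The check against `maxToolIterations` sits before tool execution, so at most that many tool rounds ever run, and a model that answers on the final allowed round still returns normally.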

### Safety limits

The orchestrator enforces three independent limits. Tripping any one of them aborts the loop with a structured budget-exceeded error that includes the partial conversation for debugging:

- **`opts.maxToolIterations`** — bounds the *count* of tool-call rounds. Default 10.
- **`opts.maxTokens`** — bounds the cumulative prompt + completion + tool-output tokens across the whole loop. No default (uncapped). Lets dependent issues (e.g. #626's built-in agent enforcing per-session token budgets) bound spend at the orchestrator rather than via post-hoc inspection of `analytics.model_call`.
- **`opts.maxCostUsd`** — bounds cumulative USD cost. Per-call cost is computed from `analytics.model_call.gpu_ms` or token counts, multiplied by the configured per-model rate card. No default (uncapped). Resolves the per-session cost-cap ask from #626.

Iteration count is not a proxy for spend: a single iteration with a large prompt, a large tool result, and a large response can exceed a token or cost budget on its own.
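A minimal sketch of three independent limits checked by one tracker, assuming a hypothetical rate card expressed in USD per million tokens for a single model (GPU-time pricing via `gpu_ms` is not modeled). `LoopBudget`, `Rate`, `Usage`, and `BudgetExceededError` are illustrative names, not part of #510.

```typescript
// Hypothetical shapes: a per-model rate card in USD per million tokens, and per-round usage.
type Rate = { inputUsdPerMTok: number; outputUsdPerMTok: number };
type Usage = { promptTokens: number; completionTokens: number; toolOutputTokens: number };
type Limits = { maxToolIterations?: number; maxTokens?: number; maxCostUsd?: number };

class BudgetExceededError extends Error {
  readonly limit: keyof Limits;
  constructor(limit: keyof Limits, detail: string) {
    super(`budget exceeded: ${limit} (${detail})`);
    this.name = "BudgetExceededError";
    this.limit = limit;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Tracks the three limits independently; whichever is exceeded first throws.
class LoopBudget {
  private iterations = 0;
  private tokens = 0;
  private costUsd = 0;
  private readonly limits: Limits;
  private readonly rate: Rate;

  constructor(limits: Limits, rate: Rate) {
    this.limits = limits;
    this.rate = rate;
  }

  // Call once per tool-call round; the cap defaults to 10.
  startToolRound(): void {
    this.iterations += 1;
    const max = this.limits.maxToolIterations ?? 10;
    if (this.iterations > max) {
      throw new BudgetExceededError("maxToolIterations", `${this.iterations} > ${max}`);
    }
  }

  // Call after every model round-trip with that round's usage.
  record(u: Usage): void {
    this.tokens += u.promptTokens + u.completionTokens + u.toolOutputTokens;
    // Only model input/output is priced here; tool output is billed when it re-enters a prompt.
    this.costUsd +=
      (u.promptTokens * this.rate.inputUsdPerMTok + u.completionTokens * this.rate.outputUsdPerMTok) / 1e6;
    const { maxTokens, maxCostUsd } = this.limits;
    if (maxTokens !== undefined && this.tokens > maxTokens) {
      throw new BudgetExceededError("maxTokens", `${this.tokens} > ${maxTokens}`);
    }
    if (maxCostUsd !== undefined && this.costUsd > maxCostUsd) {
      throw new BudgetExceededError("maxCostUsd", `$${this.costUsd.toFixed(4)} > $${maxCostUsd}`);
    }
  }
}
```

With `maxTokens: 50000`, a first round of 40,000 prompt, 5,000 completion, and 8,000 tool-output tokens trips the token cap even though only one iteration has run.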

### Parallel tool calls

Both the OpenAI and Anthropic APIs can return multiple tool calls in a single assistant message. **Default execution: parallel**, which matches model expectations and minimizes latency. Configurable via:

- `opts.toolParallelism: 'parallel' | 'serial'`

Within a parallel batch, mutations **do not share a transaction boundary** — each tool call is its own Resource dispatch and commits independently (see "Transaction scope per turn" below). Apps that need atomicity across multiple steps should wrap the multi-step intent in a single Resource method exposed as one tool, not rely on parallel-batch atomicity.
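A minimal sketch of `toolParallelism`, assuming a hypothetical `run` callback that stands in for one Resource dispatch. Each call settles independently, so one failure neither cancels nor rolls back its siblings; results come back in the model's original call order in both modes.

```typescript
type ToolParallelism = "parallel" | "serial";
type Call = { id: string; name: string };
type Outcome = { id: string; ok: boolean; result: string };

// Each call is dispatched on its own; a failure in one does not affect the others.
async function executeBatch(
  calls: Call[],
  run: (call: Call) => Promise<string>,
  mode: ToolParallelism = "parallel",
): Promise<Outcome[]> {
  const settle = async (call: Call): Promise<Outcome> => {
    try {
      return { id: call.id, ok: true, result: await run(call) };
    } catch (err) {
      return { id: call.id, ok: false, result: String(err) };
    }
  };
  // Promise.all preserves input order even though calls finish in any order.
  if (mode === "parallel") return Promise.all(calls.map(settle));
  const outcomes: Outcome[] = [];
  for (const call of calls) outcomes.push(await settle(call));
  return outcomes;
}
```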

### Tool result size cap

A tool returning 10MB inflates the next round-trip's prompt and can exceed the model's context window. The orchestrator enforces a per-tool-result size cap with a truncation marker:

- `opts.toolResultMaxBytes`, default **64KB**. A result exceeding the cap is truncated and a marker (e.g. `[…truncated; full result <N> bytes]`) is appended; the truncated form is what feeds back into the model.

Cheap to add now; awkward to retrofit once apps depend on full results flowing through.
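A minimal sketch of the cap, assuming it measures UTF-8 bytes, that 64KB means 64 × 1024, and that the marker is appended on top of the capped body. The hypothetical `truncateToolResult` backs off to a character boundary so a multi-byte character is never split.

```typescript
// Truncate a tool result to at most `maxBytes` of UTF-8, then append a marker.
function truncateToolResult(result: string, maxBytes = 64 * 1024): string {
  const bytes = new TextEncoder().encode(result);
  if (bytes.length <= maxBytes) return result;
  let cut = maxBytes;
  // UTF-8 continuation bytes have the form 0b10xxxxxx; step back until a character starts.
  while (cut > 0 && (bytes[cut] & 0xc0) === 0x80) cut--;
  const head = new TextDecoder().decode(bytes.subarray(0, cut));
  return `${head}\n[…truncated; full result ${bytes.length} bytes]`;
}
```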

### AbortSignal propagation across the loop

When the caller aborts a `generate` mid-loop:

1. Cancel the upstream LLM call in flight.
2. Propagate the abort into in-flight tool executions via the tool's invocation context (`AbortSignal` on the dispatched Resource call, when supported).
3. Already-committed writes from completed tool calls **stay committed** — no implicit rollback. The orchestrator returns a structured abort error.

This matters for long-running agent endpoints, e.g. SSE-served chat completions where the client disconnects mid-loop.
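A minimal sketch of the three abort rules, assuming tools receive the caller's `AbortSignal` through a hypothetical `(signal) => Promise<string>` signature. The same signal would also be handed to the upstream LLM request (for example via the `signal` option of `fetch`), which is not shown. Calls run serially here for brevity.

```typescript
// Structured abort error carrying the results that were already committed.
class LoopAbortedError extends Error {
  readonly committed: string[];
  constructor(committed: string[]) {
    super("agent loop aborted by caller");
    this.name = "LoopAbortedError";
    this.committed = committed;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical tool signature: the caller's signal is part of the invocation context.
type AbortableTool = (signal: AbortSignal) => Promise<string>;

// Resolves after `ms`, or rejects as soon as `signal` aborts.
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) return reject(new Error("aborted"));
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error("aborted"));
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

async function runToolsWithAbort(tools: AbortableTool[], signal: AbortSignal): Promise<string[]> {
  const committed: string[] = [];
  for (const tool of tools) {
    if (signal.aborted) throw new LoopAbortedError(committed);
    try {
      committed.push(await tool(signal)); // completed calls stay committed: no rollback
    } catch (err) {
      if (signal.aborted) throw new LoopAbortedError(committed);
      throw err;
    }
  }
  return committed;
}
```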

## Tool resolution

Once the native MCP server in [#465](https://github.com/HarperFast/harper/issues/465) lands, the `scope.resources` registry and "registered MCP tools" collapse into one unified tool registry — the same set of tools is exposed to external MCP clients (via [#618](https://github.com/HarperFast/harper/issues/618) Application profile) and to in-process `scope.models` orchestration.

Sources of tools in `'auto'` mode:

1. **Auto-generated from the Resources registry** via [#618](https://github.com/HarperFast/harper/issues/618). Each `@export`-ed Resource produces `get_*`, `search_*`, `create_*`, `update_*`, `delete_*` tools (only the verbs the class implements, RBAC-filtered).
2. **Custom Resource methods** opted in via a static `mcpTools` declaration per [#622](https://github.com/HarperFast/harper/issues/622). This covers non-verb methods and computed tools.

Tool resolution is a lookup in the central tool registry from [#615](https://github.com/HarperFast/harper/issues/615), which already handles RBAC-aware filtering. `scope.models` consults the same registry external MCP clients see.

### Tool name convention

Inherited from [#618](https://github.com/HarperFast/harper/issues/618): tool names are `<verb>_<resource>` (`get_product`, `create_order`, etc.) with sanitization for path separators. This orchestration uses the same naming so external MCP clients and in-process `scope.models` callers reason about the same set of tools.

### Auto-discovery via the unified registry

With #618 landed, every `@export`-ed Resource is automatically a candidate tool — no explicit registration in `GenerateOpts.tools` required. The caller can still narrow the exposed set per call (`tools: ['search_product', 'get_product']`) or pass an explicit list for tighter control. Default behavior in `'auto'` mode: the full RBAC-filtered registry the calling user can see.
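A minimal sketch of the default-versus-narrowed selection, assuming a per-call `tools` list can only narrow the RBAC-filtered set and never widen it. `RegisteredTool` and `selectTools` are hypothetical, and RBAC is reduced to a single required role per tool; the real #615 registry is richer.

```typescript
// Hypothetical registry entry; the #615 registry's real shape and RBAC model may differ.
type RegisteredTool = { name: string; requiredRole: string };

// Default: every tool the caller may see. A per-call list only narrows that set.
function selectTools(registry: RegisteredTool[], callerRoles: Set<string>, requested?: string[]): string[] {
  const visible = registry.filter((t) => callerRoles.has(t.requiredRole)).map((t) => t.name);
  if (requested === undefined) return visible;
  const allowed = new Set(visible);
  // Names the caller cannot see are dropped rather than granted.
  return requested.filter((name) => allowed.has(name));
}
```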

## Streaming with tool calls

For `generateStream` with `toolMode: 'auto'`:

- Yields content deltas as they arrive from the backend.
- When the model emits a tool-call delta, the iterator yields that delta (signaling the consumer; useful for UI "agent is calling X" indicators).
- The orchestrator suspends the upstream stream, executes the tool call(s), feeds results back to the backend.
- The new stream from the backend resumes, yielding more deltas.
- Loop until terminal.

The existing `GenerateChunk` shape (`deltaContent?`, `deltaToolCalls?`, `finishReason?`) supports this transition without changes.
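A minimal sketch of the streaming loop as an async generator, using the `GenerateChunk` field names from #510. It assumes, for simplicity, that each `deltaToolCalls` entry carries one complete call (real backends stream partial arguments that must be assembled), and `generateStreamAuto` and `StreamBackend` are hypothetical names.

```typescript
// Simplified: one complete tool call per delta entry.
type ToolCallDelta = { id: string; name: string; args: string };
type GenerateChunk = { deltaContent?: string; deltaToolCalls?: ToolCallDelta[]; finishReason?: string };
type Msg = { role: "user" | "assistant" | "tool"; content: string };
type StreamBackend = (messages: Msg[]) => AsyncIterable<GenerateChunk>;

async function* generateStreamAuto(
  input: string,
  backend: StreamBackend,
  execute: (call: ToolCallDelta) => Promise<string>,
  maxToolIterations = 10,
): AsyncGenerator<GenerateChunk> {
  const messages: Msg[] = [{ role: "user", content: input }];
  for (let round = 0; ; round++) {
    const pending: ToolCallDelta[] = [];
    let text = "";
    for await (const chunk of backend(messages)) {
      yield chunk; // content and tool-call deltas both reach the consumer
      text += chunk.deltaContent ?? "";
      pending.push(...(chunk.deltaToolCalls ?? []));
    }
    if (pending.length === 0) return; // terminal: the model answered without tools
    if (round >= maxToolIterations) throw new Error(`maxToolIterations (${maxToolIterations}) exceeded`);
    // Upstream stream has ended; run the tools, then open a fresh stream with their results.
    messages.push({ role: "assistant", content: text });
    for (const call of pending) messages.push({ role: "tool", content: await execute(call) });
  }
}
```

From the consumer's side this is one uninterrupted iterator; the pause while tools run simply shows up as a gap between the tool-call delta and the next content delta.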

## Audit and replication

Tool calls inherit Harper's standard machinery for free:

- Each tool call goes through the Resource dispatch — auth chain runs (tool calls have the caller's permissions), the call is logged in the transaction log, mutations replicate.
- The model call itself is logged in `analytics.model_call` (per #510). The model call records the tool-call count; individual tool calls show up in the transaction audit log alongside data changes.
- Querying "what did this agent actually do" is a regular Harper query against the audit log filtered by request / conversation / tenant.

This is the compressed-stack advantage made concrete: an agent's actions are recorded the same way as a human user's actions, in the same store, queryable with the same primitives.

### Transaction scope per turn

**Each tool call commits its own transaction.** If the agent calls `create_order` and then `update_inventory` and the second fails, the first is already committed. This is the right default for composability with non-agent callers (tools-as-Resources behave identically whether called by an agent or directly), but worth calling out explicitly:

- Multi-step atomic intent should be wrapped in a single Resource method (exposed as one tool), not relied on across multiple tool calls.
- The orchestrator does not provide automatic rollback or saga semantics across the loop.

## ConversationResource integration

Optional affordance for callers using `ConversationResource` (#511): pass `opts.conversation?: ConversationResource` and the orchestrator appends each turn (user message, assistant message, tool calls, tool results) as it runs. Without it, conversation persistence is the caller's responsibility — they'd reimplement the same write pattern around every agent loop invocation.

The orchestrator does **not require** a `ConversationResource`. Stateless agent calls work fine. When provided, the integration:

- Appends the input as a user/system turn before the loop starts.
- For each tool call: appends the assistant's tool-call turn and the tool result(s) as a `role: 'tool'` turn.
- Appends the final assistant response as a turn before returning.
- Uses #511's streaming-append protocol (open → chunk → commit) on the streaming path.

The contract: the orchestrator writes turns as it runs; the caller decides whether to provide a conversation at all.
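A minimal sketch of the optional integration on the non-streaming path. `ConversationLike` is a hypothetical stand-in for #511's `ConversationResource` that models only a single `appendTurn` call; the open/chunk/commit streaming protocol is not shown. When no conversation is passed, every recorder method is a no-op.

```typescript
// Hypothetical stand-in for #511's ConversationResource; only turn appends are modeled.
type Turn =
  | { role: "user" | "system"; content: string }
  | { role: "assistant"; content: string; toolCalls?: string[] }
  | { role: "tool"; toolCallId: string; content: string };
interface ConversationLike {
  appendTurn(turn: Turn): Promise<void>;
}

// Writes turns as the loop runs; does nothing when no conversation was provided.
class TurnRecorder {
  private readonly conversation?: ConversationLike;
  constructor(conversation?: ConversationLike) {
    this.conversation = conversation;
  }
  async input(content: string): Promise<void> {
    await this.conversation?.appendTurn({ role: "user", content });
  }
  async toolRound(calls: { id: string; name: string }[], results: string[]): Promise<void> {
    await this.conversation?.appendTurn({ role: "assistant", content: "", toolCalls: calls.map((c) => c.name) });
    for (let i = 0; i < calls.length; i++) {
      await this.conversation?.appendTurn({ role: "tool", toolCallId: calls[i].id, content: results[i] });
    }
  }
  async final(content: string): Promise<void> {
    await this.conversation?.appendTurn({ role: "assistant", content });
  }
}
```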

## Error handling per tool call

When a tool call fails, default behavior: append the error as the tool result (so the model can react and choose to retry, work around, or abort). Configurable via `opts.toolErrorMode`:

- `'recover'` (default) — return error to model, continue loop
- `'abort'` — abort the loop, return error to caller
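A minimal sketch of the two modes for a single call. The JSON error encoding fed back to the model and the names `invokeTool` and `ToolAbortError` are assumptions for illustration, not a settled wire format.

```typescript
type ToolErrorMode = "recover" | "abort";

class ToolAbortError extends Error {
  readonly toolName: string;
  constructor(toolName: string, message: string) {
    super(`tool ${toolName} failed: ${message}`);
    this.name = "ToolAbortError";
    this.toolName = toolName;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// 'recover': the failure becomes the tool result the model sees next round.
// 'abort': the failure ends the loop and surfaces to the caller.
async function invokeTool(name: string, run: () => Promise<string>, mode: ToolErrorMode = "recover"): Promise<string> {
  try {
    return await run();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (mode === "abort") throw new ToolAbortError(name, message);
    return JSON.stringify({ error: message, tool: name }); // hypothetical error encoding
  }
}
```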

## Return shape

On success, `scope.models.generate({ toolMode: 'auto' })` returns the final assistant response — same `GenerateResult` shape callers get from `'return'` mode, just without `toolCalls` (those were resolved internally). Low overhead; matches the "I just want the answer" common case.

Opt in to the full sequence (assistant turns, tool calls, tool results, intermediate responses) via `opts.includeToolTrace: true`. Returns the response plus an ordered `trace: ToolTraceEntry[]` for inspection and debugging.

**On error or safety-abort (budget exceeded, abort signal, tool error with `'abort'` mode), the trace is included regardless** of `includeToolTrace`. Operators need that trace to diagnose; hiding debug info on failure is an antipattern.
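A minimal sketch of the return-shape rule: the trace is opt-in on success and always attached on failure. `ToolTraceEntry` is the name used above, but its variants here are a hypothetical shape, and `AutoResult` keeps only `content` from `GenerateResult`.

```typescript
// Hypothetical trace and result shapes; #510's GenerateResult carries more than `content`.
type ToolTraceEntry =
  | { kind: "assistant"; content: string }
  | { kind: "tool_call"; id: string; name: string; args: unknown }
  | { kind: "tool_result"; id: string; content: string };
type AutoResult = { content: string; trace?: ToolTraceEntry[] };

// Failures always carry the trace, regardless of includeToolTrace.
class AutoLoopError extends Error {
  readonly trace: ToolTraceEntry[];
  constructor(reason: string, trace: ToolTraceEntry[]) {
    super(reason);
    this.name = "AutoLoopError";
    this.trace = trace;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Success path: the trace is opt-in, so the common case returns just the answer.
function finishAutoLoop(content: string, trace: ToolTraceEntry[], includeToolTrace = false): AutoResult {
  return includeToolTrace ? { content, trace } : { content };
}
```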

## Acceptance

- [ ] `toolMode: 'auto'` orchestration loop implemented end-to-end.
- [ ] Tool resolution against `scope.resources` works for an `@export`-ed Resource.
- [ ] Tool resolution against MCP tools registered via the native MCP server family (#615 / #618 / #622) works.
- [ ] `opts.maxToolIterations`, `opts.maxTokens`, and `opts.maxCostUsd` independently enforced; any trip returns a structured budget-exceeded abort with the partial trace.
- [ ] Parallel tool-call execution by default; `opts.toolParallelism: 'serial'` runs serially.
- [ ] Per-tool-result size cap (`opts.toolResultMaxBytes`, default 64KB) with truncation marker.
- [ ] AbortSignal propagation: caller-side abort cancels upstream LLM call AND in-flight tool executions; already-committed writes stay committed.
- [ ] Transaction scope per turn explicitly documented — each tool call commits independently.
- [ ] Streaming + tool calls: chunks yield correctly through tool-execution transitions.
- [ ] Tool calls audited in the transaction log alongside data writes (inherited from Resource dispatch).
- [ ] Permissions enforced — tool calls run with caller's identity.
- [ ] Error handling per `opts.toolErrorMode`; trace returned on error/abort even when `includeToolTrace` would otherwise be off.
- [ ] Tool argument validation strict by default; configurable per `opts.toolArgValidation`.
- [ ] Optional `opts.conversation?: ConversationResource` integration appends turns as the loop runs.
- [ ] Documented: building an agent with auto mode; how to register a Resource as a tool; tool error handling; budget enforcement; abort semantics; conversation integration.
- [ ] Per-tenant accounting on the model calls is preserved through the loop (each round-trip writes its own `analytics.model_call` row).

## Dependencies

- **Hard**: model-access API (#510), specifically Phase 3 — needs a backend that natively supports tools (`openai` backend, then `anthropic` / `bedrock` follow). `ollama` does not currently expose tools in a portable way; if/when it does, this orchestration picks it up automatically.
- **Hard**: native MCP server family ([#465](https://github.com/HarperFast/harper/issues/465) umbrella), specifically:
  - [#615](https://github.com/HarperFast/harper/issues/615) — Tool registry + class-level introspection + RBAC-aware filtering (the registry this orchestration consults)
  - [#618](https://github.com/HarperFast/harper/issues/618) — Application profile (auto-generates per-Resource tools the orchestration resolves against)
  - [#622](https://github.com/HarperFast/harper/issues/622) — Custom Resource opt-in via static `mcpTools` declaration (for non-verb methods)

  Previously this dependency was framed against the external `HarperFast/mcp-server` addon; that path is superseded by the native MCP server (and `HarperFast/mcp-server` is being archived per [#623](https://github.com/HarperFast/harper/issues/623)).

## Related

- Pairs with #510 — this implements `toolMode: 'auto'` declared there.
- Pairs with #465 (native MCP server umbrella) — the tool registry this orchestration consults.
- Pairs with #511 (ConversationResource) — `turns.tool_calls` and `turns.tool_results` record what the orchestrator did; optional integration via `opts.conversation` per the "ConversationResource integration" section above.
- Downstream consumer: #626 (Built-in Harper Agent Component) — uses this orchestration as its loop; surfaced the per-call cost-cap and per-turn cost-surfacing asks that informed the safety-limits section.

## Out of scope

- Tool catalog / discovery UX in Studio.
- Cross-tenant tool invocation.
- A full multi-agent orchestration layer (this issue does single-agent tool-call loops; multi-agent coordination is its own design space).
- **Per-model-call permission scoping** (`opts.toolPermissions` to run the loop with a tighter permission set than the caller's). Tool calls inherit the caller's permissions in v1; tighter scoping is a v1.1 / follow-up consideration.
- **Structural cycle detection.** `maxToolIterations` already catches infinite loops eventually. A "same tool, same args, N times consecutively" check would fail faster with a more actionable error — fast-follow if needed; not load-bearing for v1.


---

🤖 Generated with [Claude Code](https://claude.com/claude-code)


Add agent-loop orchestration / `toolMode: 'auto'` to `scope.models` #612
