diff --git a/CHANGELOG.md b/CHANGELOG.md index da1b259..e616943 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,10 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The ## [Unreleased] +### Changed + +- **Docs sweep: stale references and em-dash normalization.** Fixed three definite stale references (`spec_version='0.26.0'` in the Langfuse example output now reads `'0.38.0'`; the dangling `v0.16.1` qualifier dropped from the parallel-branches concept page; `compiled.attach_observer` corrected to `graph.attach_observer` in `non-obvious-shapes.md` for variable-name consistency with the rest of the docs). Swept em dashes out of the user-facing docs (130 instances across 17 files) per the convention set during the patterns expansion. mkdocs strict build clean; no broken intra-docs links. + ### Added - **vLLM production deployment notes.** `docs/model-providers/vllm.md` grows a "Production deployment" section covering the `VLLM_HTTP_TIMEOUT_KEEP_ALIVE` gotcha (vLLM's stock 5s uvicorn keep-alive lapses pooled OA-side httpx connections and surfaces as `ProviderUnavailable`; widen to roughly 300s), a systemd unit skeleton, and the three throughput knobs that interact with OA's shared connection pool (`--max-model-len`, `--max-num-seqs`, `--gpu-memory-utilization`). The existing "Tool calling" section grows a `--tool-call-parser` family table verified against vLLM's docs (Llama 3.x / Llama 4 / Mistral / Hermes / Qwen3 / DeepSeek V3 / GPT-OSS), plus explicit "not supported here" callouts for Anthropic / Gemini (proprietary cloud) and mainstream Gemma (no vLLM parser). diff --git a/docs/agent/non-obvious-shapes.md b/docs/agent/non-obvious-shapes.md index e89a9ea..f582c5c 100644 --- a/docs/agent/non-obvious-shapes.md +++ b/docs/agent/non-obvious-shapes.md @@ -4,7 +4,7 @@ Recipes that aren't deducible from the API surface alone. The primitives docs te ### Declare a non-clobbering reducer on accumulator list fields -State fields default to `last_write_wins` — each node's write replaces the prior value for that field. For scalar fields (`status: str`, `count: int`) that's usually what you want. For list fields that accumulate contributions across multiple nodes (`messages: list[Message]`, `events: list[Event]`, `results: list[Result]`), it's the wrong default — every node's contribution silently clobbers everything before it. +State fields default to `last_write_wins`: each node's write replaces the prior value for that field. For scalar fields (`status: str`, `count: int`) that's usually what you want. For list fields that accumulate contributions across multiple nodes (`messages: list[Message]`, `events: list[Event]`, `results: list[Result]`), it's the wrong default; every node's contribution silently clobbers everything before it. Declare `append` (or another non-clobbering reducer) at the state class: @@ -19,15 +19,15 @@ class WorkflowState(State): final_status: str = "pending" # last_write_wins is fine here ``` -The failure mode without `append` is silent and easy to misdiagnose — the final state shows only the last node's contribution to the list, with no error. Common "why is my accumulator empty?" question. `merge` is the equivalent for `dict[str, V]` fields that accumulate keys across nodes. +The failure mode without `append` is silent and easy to misdiagnose: the final state shows only the last node's contribution to the list, with no error. Common "why is my accumulator empty?" question. `merge` is the equivalent for `dict[str, V]` fields that accumulate keys across nodes. ### Branch on `Response.finish_reason` before reading `message.content` After `await provider.complete(messages, tools=[...])` returns, the shape of `Response` varies by `finish_reason`: -- `finish_reason == "stop"` — assistant produced a content response. `message.content` carries the text; `message.tool_calls` is empty. -- `finish_reason == "tool_calls"` — assistant emitted tool calls. `message.tool_calls` carries the list; `message.content` is typically empty (model didn't say anything beyond the tool calls). -- `finish_reason == "length"` / `"content_filter"` / `"error"` — completion was cut off or refused; `message.content` may be partial or empty. +- `finish_reason == "stop"`: assistant produced a content response. `message.content` carries the text; `message.tool_calls` is empty. +- `finish_reason == "tool_calls"`: assistant emitted tool calls. `message.tool_calls` carries the list; `message.content` is typically empty (model didn't say anything beyond the tool calls). +- `finish_reason == "length"` / `"content_filter"` / `"error"`: completion was cut off or refused; `message.content` may be partial or empty. Post-LLM logic that reads `message.content` without checking `finish_reason` misses the entire tool-calling path: @@ -48,11 +48,11 @@ else: The discriminator is one branch; missing it gives you empty data on tool-call responses and silently wrong behavior on truncations. -### `disable_llm_payload` defaults to `True` — flip it for LLM-aware observability backends +### `disable_llm_payload` defaults to `True`: flip it for LLM-aware observability backends The `OTelObserver` (and any spec-conformant observer reading LLM events) defaults `disable_llm_payload: bool = True` per spec §5.5's "default-off by privacy" framing. Without flipping the flag, LLM spans carry GenAI semconv attributes (token counts, model name, finish reason) but NOT the message payload (input messages, response content, request extras). -That's the right default for general OpenArmature use — payloads may contain PII the user hasn't audited, and storage cost grows with prompt size. But it's the WRONG default if you're wiring up an LLM-aware observability backend (Langfuse, Phoenix, Honeycomb's LLM lens) that renders the message stream as part of its generation view. Backends will show "empty" generations and you'll wonder why. +That's the right default for general OpenArmature use: payloads may contain PII the user hasn't audited, and storage cost grows with prompt size. But it's the WRONG default if you're wiring up an LLM-aware observability backend (Langfuse, Phoenix, Honeycomb's LLM lens) that renders the message stream as part of its generation view. Backends will show "empty" generations and you'll wonder why. Flip the flag once at observer construction: @@ -63,36 +63,36 @@ observer = OTelObserver( span_processor=your_exporter, disable_llm_payload=False, # opt in to message-payload attributes ) -compiled.attach_observer(observer) +graph.attach_observer(observer) ``` -The companion `disable_genai_semconv` flag defaults to `False` — GenAI semconv attributes emit by default since they're how LLM-aware backends render anything at all. Don't flip that one unless you're routing GenAI emission through a different layer. +The companion `disable_genai_semconv` flag defaults to `False`: GenAI semconv attributes emit by default since they're how LLM-aware backends render anything at all. Don't flip that one unless you're routing GenAI emission through a different layer. ### Use the bundled `FilesystemCheckpointer` or `SQLiteCheckpointer`, not a hand-rolled serializer -The temptation when persisting graph state is to `json.dumps(state.model_dump())` and write to a file. Don't. The shipped Checkpointer backends handle every contract `openarmature.checkpoint.Checkpointer` defines — round-trip integrity, `parent_states` for inner-save resume, fan-out progress tracking, schema-version migration, listing by `correlation_id`, `CheckpointRecordInvalid` on shape drift. A hand-rolled serializer that "works" on the happy path silently fails the moment a fan-out crash leaves an in-flight save record, and you'll be debugging it for hours before realizing the bundled backend exists. +The temptation when persisting graph state is to `json.dumps(state.model_dump())` and write to a file. Don't. The shipped Checkpointer backends handle every contract `openarmature.checkpoint.Checkpointer` defines: round-trip integrity, `parent_states` for inner-save resume, fan-out progress tracking, schema-version migration, listing by `correlation_id`, `CheckpointRecordInvalid` on shape drift. A hand-rolled serializer that "works" on the happy path silently fails the moment a fan-out crash leaves an in-flight save record, and you'll be debugging it for hours before realizing the bundled backend exists. -If your storage requirement isn't local disk (`FilesystemCheckpointer`) or local SQLite (`SQLiteCheckpointer` — also supports `:memory:` and arbitrary file paths), implement the `Checkpointer` Protocol against your backend rather than wrapping state serialization yourself. Custom backends inherit the spec's correctness contract for free. +If your storage requirement isn't local disk (`FilesystemCheckpointer`) or local SQLite (`SQLiteCheckpointer`, which also supports `:memory:` and arbitrary file paths), implement the `Checkpointer` Protocol against your backend rather than wrapping state serialization yourself. Custom backends inherit the spec's correctness contract for free. ### Subgraphs > conditional-edge spaghetti when branches don't share state A common shape is "after this LLM call, route to either a JSON-extraction node or a tool-dispatch node depending on `finish_reason`." The naive solution is two conditional edges from the LLM node, one to each downstream. That works for two branches; it scales poorly past three. -When the branches operate on different sub-shapes of state — e.g., one path is "extract JSON, then validate" while another is "dispatch tools, loop until done, then summarize" — encapsulate each as a `SubgraphNode` and route from the LLM node to the right subgraph. Each subgraph has its own state schema (projected from the parent), its own entry node, and its own internal topology. The parent graph becomes a switchboard with a few edges; the complexity lives one layer down where it composes cleanly. +When the branches operate on different sub-shapes of state (e.g., one path is "extract JSON, then validate" while another is "dispatch tools, loop until done, then summarize"), encapsulate each as a `SubgraphNode` and route from the LLM node to the right subgraph. Each subgraph has its own state schema (projected from the parent), its own entry node, and its own internal topology. The parent graph becomes a switchboard with a few edges; the complexity lives one layer down where it composes cleanly. ### `OpenAIProvider.ready()` exercises `chat/completions` by default; opt back into the catalog-only probe for cost-sensitive callers -`OpenAIProvider(..., readiness_probe=...)` accepts `"chat_completions"` (default), `"models"`, or `"both"`. The default issues `POST /v1/chat/completions` with a `max_tokens=1` body so a green `ready()` actually proves the inference wire path works, not just that the catalog endpoint answers. The motivating failure class: OpenAI-compatible proxies (Bifrost is the field-reported case) that return 200 on `GET /v1/models` while 405'ing the completions endpoint — the previous catalog-only default reported ready and every real call broke. The `"models"` opt-in is the old behavior, useful for cost-sensitive cloud callers where every `ready()` would otherwise bill one prompt's worth of tokens. `"both"` runs catalog then chat — strongest signal at double the cost. Non-200 responses on either probe route through `classify_http_error`, so the canonical error categories (`ProviderAuthentication`, `ProviderUnavailable`, `ProviderInvalidModel`, etc.) surface consistently regardless of which probe ran. +`OpenAIProvider(..., readiness_probe=...)` accepts `"chat_completions"` (default), `"models"`, or `"both"`. The default issues `POST /v1/chat/completions` with a `max_tokens=1` body so a green `ready()` actually proves the inference wire path works, not just that the catalog endpoint answers. The motivating failure class: OpenAI-compatible proxies (Bifrost is the field-reported case) that return 200 on `GET /v1/models` while 405'ing the completions endpoint, so the previous catalog-only default reported ready and every real call broke. The `"models"` opt-in is the old behavior, useful for cost-sensitive cloud callers where every `ready()` would otherwise bill one prompt's worth of tokens. `"both"` runs catalog then chat, giving the strongest signal at double the cost. Non-200 responses on either probe route through `classify_http_error`, so the canonical error categories (`ProviderAuthentication`, `ProviderUnavailable`, `ProviderInvalidModel`, etc.) surface consistently regardless of which probe ran. ### Be explicit with `tool_choice`; don't trust the provider's default -`Provider.complete(messages, tools, tool_choice=...)` accepts `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. When you omit `tool_choice`, the OpenAI provider's own default applies — usually `"auto"` when `tools` is non-empty, but documented per-provider. A pipeline that wants deterministic tool-calling (a routing node that MUST produce a tool call, a guarded LLM call that MUST NOT call tools) should pin `tool_choice` explicitly rather than relying on the provider default. +`Provider.complete(messages, tools, tool_choice=...)` accepts `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. When you omit `tool_choice`, the OpenAI provider's own default applies (usually `"auto"` when `tools` is non-empty, but documented per-provider). A pipeline that wants deterministic tool-calling (a routing node that MUST produce a tool call, a guarded LLM call that MUST NOT call tools) should pin `tool_choice` explicitly rather than relying on the provider default. -Pre-send validation catches the three §5 failure modes (`required` with empty tools, `ForceTool` with empty tools, `ForceTool.name` not in tools) and raises `ProviderInvalidRequest` before the HTTP call. Not all providers honor `tool_choice` — confirm with your provider's docs — but the OpenAI-compatible mapping is in `OpenAIProvider`. +Pre-send validation catches the three §5 failure modes (`required` with empty tools, `ForceTool` with empty tools, `ForceTool.name` not in tools) and raises `ProviderInvalidRequest` before the HTTP call. Not all providers honor `tool_choice` (confirm with your provider's docs), but the OpenAI-compatible mapping is in `OpenAIProvider`. ### Always `await graph.drain()` in short-lived processes; supply a `timeout` if observers might hang -`CompiledGraph.invoke()` returns when the graph reaches END or raises; observer events are dispatched onto a per-invocation queue and delivered by a background worker. The graph's execution loop never awaits observer processing. In a long-running service this is invisible — the worker drains naturally. In a CLI, script, or serverless function, the process exits before the worker finishes, and any late observer events (typically the last node's `completed` event plus any `checkpoint_saved` events) get dropped. +`CompiledGraph.invoke()` returns when the graph reaches END or raises; observer events are dispatched onto a per-invocation queue and delivered by a background worker. The graph's execution loop never awaits observer processing. In a long-running service this is invisible: the worker drains naturally. In a CLI, script, or serverless function, the process exits before the worker finishes, and any late observer events (typically the last node's `completed` event plus any `checkpoint_saved` events) get dropped. Always call `await graph.drain()` before the short-lived process exits. If your observer set includes anything that might hang (a metrics observer with a flaky network endpoint, an OTel exporter behind a slow OTLP collector), supply a `timeout`: @@ -102,32 +102,32 @@ if summary.timeout_reached: log.warning("drain truncated: %d events undelivered", summary.undelivered_count) ``` -The compiled graph stays usable for subsequent invocations after a timed-out drain — workers are cancelled cleanly, no partial state leaks. +The compiled graph stays usable for subsequent invocations after a timed-out drain: workers are cancelled cleanly, no partial state leaks. ### `install_log_bridge` skips its own handler when the application already attached one to the same `LoggerProvider` Two distinct classes both named `LoggingHandler` exist in the OTel Python ecosystem and both bridge stdlib log records to the OTel Logs SDK: -- `opentelemetry.sdk._logs.LoggingHandler` (the SDK class). Typically attached by an application's own logging setup — e.g., a FastAPI `setup_logging(...)` step that wires up an OTLP-backed `LoggerProvider` for log export. +- `opentelemetry.sdk._logs.LoggingHandler` (the SDK class). Typically attached by an application's own logging setup, e.g., a FastAPI `setup_logging(...)` step that wires up an OTLP-backed `LoggerProvider` for log export. - `opentelemetry.instrumentation.logging.handler.LoggingHandler` (the instrumentation class). What `openarmature.observability.otel.install_log_bridge` attaches when it runs. -Different classes, same OTel-Logs export path. If both are attached against the same `LoggerProvider`, every stdlib log record fires through both handlers, both call `provider.get_logger(...).emit(...)`, and `BatchLogRecordProcessor` ships the record TWICE to the OTLP endpoint. The duplication is OTLP-only — a console handler attached separately is unaffected, which makes "OTLP rows are doubled, console isn't" a head-scratcher to diagnose. +Different classes, same OTel-Logs export path. If both are attached against the same `LoggerProvider`, every stdlib log record fires through both handlers, both call `provider.get_logger(...).emit(...)`, and `BatchLogRecordProcessor` ships the record TWICE to the OTLP endpoint. The duplication is OTLP-only; a console handler attached separately is unaffected, which makes "OTLP rows are doubled, console isn't" a head-scratcher to diagnose. -`install_log_bridge` detects either handler class against the same provider and skips its own `addHandler` accordingly; the `openarmature.correlation_id` LogRecord factory still installs. The check is provider-scoped, so an application that intentionally attaches a handler against a DIFFERENT `LoggerProvider` (a separate logs pipeline) still gets the OA bridge against the OA provider — the helper only dedups when the SAME provider would receive duplicate emissions. +`install_log_bridge` detects either handler class against the same provider and skips its own `addHandler` accordingly; the `openarmature.correlation_id` LogRecord factory still installs. The check is provider-scoped, so an application that intentionally attaches a handler against a DIFFERENT `LoggerProvider` (a separate logs pipeline) still gets the OA bridge against the OA provider; the helper only dedups when the SAME provider would receive duplicate emissions. ### Three exception hierarchies; know which one your code catches `openarmature` exceptions split across three sibling hierarchies: -- `RuntimeGraphError` (in `openarmature.graph`) — node execution failures: `NodeException`, `RoutingError`, `EdgeException`, `ReducerError`, `StateValidationError`. Each has a `category` string matching the spec's canonical error categories. -- `CheckpointError` (in `openarmature.checkpoint`) — persistence failures: `CheckpointNotFound`, `CheckpointSaveFailed`, `CheckpointRecordInvalid`, `CheckpointStateMigrationMissing`, `CheckpointStateMigrationFailed`, `CheckpointStateMigrationChainAmbiguous`. -- `LlmProviderError` (in `openarmature.llm`) — provider call failures: `ProviderAuthentication`, `ProviderInvalidRequest`, `ProviderInvalidResponse`, `ProviderInvalidModel`, `ProviderModelNotLoaded`, `ProviderRateLimit`, `ProviderUnavailable`, `ProviderUnsupportedContentBlock`, `StructuredOutputInvalid`. +- `RuntimeGraphError` (in `openarmature.graph`): node execution failures: `NodeException`, `RoutingError`, `EdgeException`, `ReducerError`, `StateValidationError`. Each has a `category` string matching the spec's canonical error categories. +- `CheckpointError` (in `openarmature.checkpoint`): persistence failures: `CheckpointNotFound`, `CheckpointSaveFailed`, `CheckpointRecordInvalid`, `CheckpointStateMigrationMissing`, `CheckpointStateMigrationFailed`, `CheckpointStateMigrationChainAmbiguous`. +- `LlmProviderError` (in `openarmature.llm`): provider call failures: `ProviderAuthentication`, `ProviderInvalidRequest`, `ProviderInvalidResponse`, `ProviderInvalidModel`, `ProviderModelNotLoaded`, `ProviderRateLimit`, `ProviderUnavailable`, `ProviderUnsupportedContentBlock`, `StructuredOutputInvalid`. -Catching `Exception` works but is too broad; catching one hierarchy misses the other two. If you want to branch on category strings (e.g., for retry logic), catch the relevant base — `RuntimeGraphError` covers all five spec runtime categories, `LlmProviderError` covers all nine provider categories, `CheckpointError` covers all six checkpoint categories. The `TRANSIENT_CATEGORIES` frozenset in `openarmature.llm` enumerates which provider categories are retriable. +Catching `Exception` works but is too broad; catching one hierarchy misses the other two. If you want to branch on category strings (e.g., for retry logic), catch the relevant base: `RuntimeGraphError` covers all five spec runtime categories, `LlmProviderError` covers all nine provider categories, `CheckpointError` covers all six checkpoint categories. The `TRANSIENT_CATEGORIES` frozenset in `openarmature.llm` enumerates which provider categories are retriable. ### Filter `openarmature.*`-namespaced events when your observer only cares about user nodes -OA emits observer events under sentinel node-names for its own internal dispatch: `openarmature.llm.complete` for LLM provider calls (proposal 0024), `openarmature.checkpoint.migrate` for state-migration runs (proposal 0014), `openarmature.checkpoint.save` for checkpoint saves (proposal 0010). These events let the OTel / Langfuse observers emit LLM-provider spans, checkpoint-migrate spans, etc. — but a custom observer that only cares about user-defined node activity sees them as noise: +OA emits observer events under sentinel node-names for its own internal dispatch: `openarmature.llm.complete` for LLM provider calls (proposal 0024), `openarmature.checkpoint.migrate` for state-migration runs (proposal 0014), `openarmature.checkpoint.save` for checkpoint saves (proposal 0010). These events let the OTel / Langfuse observers emit LLM-provider spans, checkpoint-migrate spans, etc., but a custom observer that only cares about user-defined node activity sees them as noise: ```python async def __call__(self, event: NodeEvent) -> None: @@ -137,11 +137,11 @@ async def __call__(self, event: NodeEvent) -> None: # … user-node handling ``` -`event.namespace[0]` is the safest discriminator (the leaf `event.node_name` would also work for LLM events but won't match the checkpoint sentinels since those repurpose `node_name` differently). Don't try to filter on `current_invocation_id() is None` — OA-internal events are dispatched within the same invocation context as user-node events, so `invocation_id` is set for both; the namespace-prefix check is the stable contract. +`event.namespace[0]` is the safest discriminator (the leaf `event.node_name` would also work for LLM events but won't match the checkpoint sentinels since those repurpose `node_name` differently). Don't try to filter on `current_invocation_id() is None`: OA-internal events are dispatched within the same invocation context as user-node events, so `invocation_id` is set for both; the namespace-prefix check is the stable contract. ### Fan-out subgraphs that emit `list[X]` per instance produce `list[list[X]]` at `target_field` -When a fan-out's per-instance state collects a `list[X]` as its `collect_field` (e.g., each instance produces 0..N records), the engine's contribution step is `[s[cfg.collect_field] for s in successes]` — every instance's value becomes one element of the outer list. With `list[X]` per-instance, the parent receives `list[list[X]]`, and the default `append` reducer on the parent's `Annotated[list[X], append]` field preserves the nesting verbatim. Pydantic then fails to validate each `list[X]` element against `X`: +When a fan-out's per-instance state collects a `list[X]` as its `collect_field` (e.g., each instance produces 0..N records), the engine's contribution step is `[s[cfg.collect_field] for s in successes]`; every instance's value becomes one element of the outer list. With `list[X]` per-instance, the parent receives `list[list[X]]`, and the default `append` reducer on the parent's `Annotated[list[X], append]` field preserves the nesting verbatim. Pydantic then fails to validate each `list[X]` element against `X`: ``` attributed_candidates.0 Input should be a valid dictionary or @@ -149,7 +149,7 @@ attributed_candidates.0 Input should be a valid dictionary or input_type=list] ``` -The fix is the `concat_flatten` built-in reducer (proposal 0036) — the list-of-lists analog of `append`. Declare it on the parent's collection field: +The fix is the `concat_flatten` built-in reducer (proposal 0036), the list-of-lists analog of `append`. Declare it on the parent's collection field: ```python from typing import Annotated @@ -162,7 +162,7 @@ class PipelineState(State): attributed_candidates: Annotated[list[ClaimCandidate], concat_flatten] = Field(default_factory=list) ``` -`concat_flatten` folds the per-instance lists into one flat list (`[*prior, *(item for sublist in update for item in sublist)]`), strict like `append` — it raises `ReducerError` if any element of the update isn't itself a list. +`concat_flatten` folds the per-instance lists into one flat list (`[*prior, *(item for sublist in update for item in sublist)]`), strict like `append`: it raises `ReducerError` if any element of the update isn't itself a list. The dict-shaped analog is `merge_all` (also proposal 0036): when each fan-out instance contributes a `dict[str, X]`, the parent's `target_field` receives `list[dict]`, which plain `merge` can't consume. `merge_all` folds the sequence of mappings into the prior with shallow last-write-wins per key: @@ -177,6 +177,6 @@ class PipelineState(State): keyed_results: Annotated[dict[str, Result], merge_all] = Field(default_factory=dict) ``` -Single-record-per-instance fan-outs (`collect_field: str`, parent field `Annotated[list[X], append]`) don't hit this — the engine still wraps each instance's value as one element, but `append` flattens it correctly since each element is already an `X`. The two non-flat shapes emerge only when the per-instance value is itself a container: a `list[X]` per instance lands `list[list[X]]` (use `concat_flatten`), and a `dict[str, X]` per instance lands `list[dict]` (use `merge_all`). +Single-record-per-instance fan-outs (`collect_field: str`, parent field `Annotated[list[X], append]`) don't hit this; the engine still wraps each instance's value as one element, but `append` flattens it correctly since each element is already an `X`. The two non-flat shapes emerge only when the per-instance value is itself a container: a `list[X]` per instance lands `list[list[X]]` (use `concat_flatten`), and a `dict[str, X]` per instance lands `list[dict]` (use `merge_all`). -If a parent field is populated by BOTH direct node writes AND fan-out collection, that's an architectural ambiguity worth fixing upstream — split into two fields, or pick one path. +If a parent field is populated by BOTH direct node writes AND fan-out collection, that's an architectural ambiguity worth fixing upstream: split into two fields, or pick one path. diff --git a/docs/agent/tldr.md b/docs/agent/tldr.md index 592c931..f348524 100644 --- a/docs/agent/tldr.md +++ b/docs/agent/tldr.md @@ -1,3 +1,3 @@ -OpenArmature is a workflow framework for LLM pipelines and tool-calling agents — typed state, compile-time topology checks, observability, and crash-safe checkpoints baked into a graph engine. The graph layer has no concept of LLMs or tools; the same primitives drive deterministic ETL pipelines and tool-calling agents alike. Nodes return partial updates; the engine merges into a frozen state snapshot. Behavior is defined by [openarmature-spec](https://openarmature.org/capabilities/) and verified by conformance fixtures; this package is the reference Python implementation. +OpenArmature is a workflow framework for LLM pipelines and tool-calling agents: typed state, compile-time topology checks, observability, and crash-safe checkpoints baked into a graph engine. The graph layer has no concept of LLMs or tools; the same primitives drive deterministic ETL pipelines and tool-calling agents alike. Nodes return partial updates; the engine merges into a frozen state snapshot. Behavior is defined by [openarmature-spec](https://openarmature.org/capabilities/) and verified by conformance fixtures; this package is the reference Python implementation. **What OpenArmature is NOT:** not a chat framework (no built-in messages channel), not an LLM SDK (Provider is the abstraction layer; OpenAIProvider is the canonical impl), not a state-management library (state is per-invocation, not application-wide), not an evaluation framework (deferred to `openarmature-eval`). diff --git a/docs/concepts/checkpointing.md b/docs/concepts/checkpointing.md index f1318a1..9862008 100644 --- a/docs/concepts/checkpointing.md +++ b/docs/concepts/checkpointing.md @@ -117,7 +117,7 @@ Field framing worth getting right: per-instance entry carries an explicit `result_is_error` boolean that discriminates success contributions (roll forward into `target_field`) from `collect`-mode error contributions (roll - forward into `errors_field`) — the engine reads the explicit field + forward into `errors_field`). The engine reads the explicit field on resume rather than inferring routing from the shape of `result`. Empty tuple when no fan-outs are in flight. See [Resume semantics](fan-out.md#resume-semantics) on the fan-out @@ -222,7 +222,7 @@ deserializes the result into your current state class. **Canonical source for `schema_version`.** The framework reads `schema_version` from the state class declared at graph construction -time — the class passed to `GraphBuilder(...)`. If you pass a State +time: the class passed to `GraphBuilder(...)`. If you pass a State subclass instance at runtime whose `schema_version` shadows the declared class's value, the saved record still carries the declared class's value. This rule keeps every save site within an invocation diff --git a/docs/concepts/fan-out.md b/docs/concepts/fan-out.md index b445e30..ef2aade 100644 --- a/docs/concepts/fan-out.md +++ b/docs/concepts/fan-out.md @@ -111,7 +111,7 @@ into prior state: declare `append` on `Annotated[list[X], append]`. Each instance's value is already an `X`; `append` concatenates cleanly. - Each instance emits a `list[X]` (0..N records per instance) → the - engine lands `list[list[X]]`. Declare `concat_flatten` instead — + engine lands `list[list[X]]`. Declare `concat_flatten` instead; it flattens one level so the parent field stays `list[X]`. Plain `append` would leave the nesting and fail Pydantic validation. - Each instance emits a `dict[str, X]` → the engine lands @@ -119,7 +119,7 @@ into prior state: the parent dict with last-write-wins per key. Plain `merge` can't consume a `list[dict]`. -`concat_flatten` and `merge_all` are strict — they raise +`concat_flatten` and `merge_all` are strict: they raise `ReducerError` if an update element isn't the expected list/mapping shape. See [state and reducers](state-and-reducers.md#five-built-in-reducers). diff --git a/docs/concepts/llms.md b/docs/concepts/llms.md index cdcd4c7..4118df9 100644 --- a/docs/concepts/llms.md +++ b/docs/concepts/llms.md @@ -99,42 +99,42 @@ async def startup() -> None: try: await provider.ready() except ProviderAuthentication: - # Bad API key — fail fast at boot. + # Bad API key: fail fast at boot. raise except ProviderInvalidModel: - # Bound model isn't served by this endpoint — same. + # Bound model isn't served by this endpoint: same. raise except ProviderUnavailable: - # Endpoint is down or unreachable — fail fast too. + # Endpoint is down or unreachable: fail fast too. raise ``` `OpenAIProvider` ships three probe shapes selected via the `readiness_probe` constructor kwarg: -- **`"chat_completions"`** (default) — issues `POST /v1/chat/completions` +- **`"chat_completions"`** (default): issues `POST /v1/chat/completions` with a `max_tokens=1` body. Actually exercises the inference wire path. Strongest signal at the cost of one prompt's worth of tokens on cloud endpoints. -- **`"models"`** — issues `GET /v1/models` and verifies the bound +- **`"models"`**: issues `GET /v1/models` and verifies the bound model appears in the catalog. Cheaper (no completion billing) but blind to proxy wire-mismatch cases: some OpenAI-compatible proxies (Bifrost is the motivating example) serve `/v1/models` correctly while 405'ing the completions endpoint, so a green catalog probe doesn't prove `complete()` will work. -- **`"both"`** — runs the catalog probe first (cheap fail-fast on +- **`"both"`**: runs the catalog probe first (cheap fail-fast on model-not-in-catalog with the cleaner `seen_ids` diagnostic), then the chat probe. Strongest signal at double the round-trip cost. ```python -# Local server (LM Studio, vLLM, llama.cpp) — chat probe is free. +# Local server (LM Studio, vLLM, llama.cpp): chat probe is free. provider = OpenAIProvider( base_url="http://localhost:8000", model="qwen2.5-coder", readiness_probe="chat_completions", # default ) -# Cloud endpoint, cost-sensitive — opt back into the catalog-only probe. +# Cloud endpoint, cost-sensitive: opt back into the catalog-only probe. provider = OpenAIProvider( base_url="https://api.openai.com", model="gpt-4o-mini", @@ -342,14 +342,14 @@ shape. By default the model decides whether and which tools to call. `tool_choice` constrains that decision per call. Four modes: -- `"auto"` — the model decides. Equivalent to omitting the parameter +- `"auto"`: the model decides. Equivalent to omitting the parameter when `tools` is non-empty. -- `"required"` — the model MUST call at least one tool. Useful for +- `"required"`: the model MUST call at least one tool. Useful for routing nodes that branch on tool selection. -- `"none"` — the model MUST NOT call tools, even if `tools` is +- `"none"`: the model MUST NOT call tools, even if `tools` is supplied. Useful for guarded LLM calls or for explicitly disabling tool-calling without rebuilding a tools-less request. -- `ForceTool(name=...)` — the model MUST call the named tool exactly. +- `ForceTool(name=...)`: the model MUST call the named tool exactly. Pre-send validation catches the three failure modes (`required` with empty tools, `ForceTool` with empty tools, `ForceTool.name` not in @@ -371,7 +371,7 @@ response = await provider.complete( ) ``` -Not all providers honor `tool_choice` — confirm with your provider's +Not all providers honor `tool_choice`; confirm with your provider's documentation. The `OpenAIProvider` maps the spec shape onto OpenAI's wire shape per the §8.1.1 mapping table. Whether the model actually honored the constraint is observable from the returned diff --git a/docs/concepts/observability.md b/docs/concepts/observability.md index 408238b..bedd6a6 100644 --- a/docs/concepts/observability.md +++ b/docs/concepts/observability.md @@ -244,16 +244,16 @@ or missing log entries. ### Bounded drain (optional timeout) `drain()` accepts an optional `timeout` parameter (non-negative -seconds) — `await compiled.drain(timeout=5.0)` bounds the wait at five +seconds): `await compiled.drain(timeout=5.0)` bounds the wait at five seconds. When the deadline fires, in-flight workers are cancelled -cleanly so the compiled graph stays usable for subsequent invocations -— partial delivery state from one drain does NOT leak into the next. +cleanly so the compiled graph stays usable for subsequent invocations; +partial delivery state from one drain does NOT leak into the next. The returned `DrainSummary` carries: -- `timeout_reached: bool` — `True` only when the timeout actually +- `timeout_reached: bool`: `True` only when the timeout actually fired. A drain that finishes before the deadline reports `False`. -- `undelivered_count: int` — events dispatched but not fully delivered +- `undelivered_count: int`: events dispatched but not fully delivered to every subscribed observer before the deadline. Always `0` when `timeout_reached is False`. @@ -305,8 +305,8 @@ the IDs explicitly. ## Caller-supplied invocation metadata `correlation_id` is one string; if you also need to attach -business-domain identifiers — tenant IDs, request IDs, feature -flags, A/B cohort labels — pass them as a structured mapping at +business-domain identifiers (tenant IDs, request IDs, feature +flags, A/B cohort labels), pass them as a structured mapping at `invoke()` time: ```python @@ -324,7 +324,7 @@ await compiled.invoke( Every observability backend picks the entries up: - **OTel** emits each entry as an `openarmature.user.` - cross-cutting span attribute on every span — invocation, node, + cross-cutting span attribute on every span: invocation, node, subgraph wrapper, fan-out instance, LLM provider, retry attempt. Backends that consume OTel attributes (Phoenix / Arize, Honeycomb, Datadog APM, HyperDX, Grafana Tempo, custom collectors) see them @@ -345,7 +345,7 @@ Two rules: `int`, `float`, `bool`) or homogeneous arrays of those types. `None`, nested objects, and mixed-type arrays are rejected. -Violations raise `ValueError` synchronously — no spans emitted, no +Violations raise `ValueError` synchronously: no spans emitted, no work runs. ### Adding entries mid-invocation @@ -371,7 +371,7 @@ node's `started`, any LLM call inside) pick up the new entries. **Per-async-context scoping.** The metadata mapping lives in a `ContextVar`, which Python copies on async-task creation. Fan-out instances and parallel-branches each receive their own copy at -dispatch time — an instance that calls `set_invocation_metadata` +dispatch time; an instance that calls `set_invocation_metadata` does NOT leak its augmentation to sibling instances. This is the canonical pattern for per-instance identifiers: @@ -472,7 +472,7 @@ Cross-vendor attribute names every LLM-aware backend reads (Langfuse, Phoenix, Honeycomb's LLM lens, OpenInference-aware tools). Emitted alongside the OA namespace: -- `gen_ai.system` — `"openai"` by default; override per provider +- `gen_ai.system`: `"openai"` by default; override per provider instance to `"vllm"` / `"lm_studio"` / `"llama_cpp"` / etc. when the OpenAI Chat Completions wire format is hitting a non-OpenAI endpoint: @@ -485,16 +485,16 @@ tools). Emitted alongside the OA namespace: ) ``` -- `gen_ai.request.model` / `gen_ai.response.model` — the bound +- `gen_ai.request.model` / `gen_ai.response.model`: the bound model and (when the provider returns one) the more-specific identifier in the response body. - `gen_ai.request.temperature` / `max_tokens` / `top_p` / `seed` / - `frequency_penalty` / `presence_penalty` / `stop_sequences` — + `frequency_penalty` / `presence_penalty` / `stop_sequences`: only emitted for fields the caller actually set; absence on the span means "not supplied," distinct from a zero value. -- `gen_ai.usage.input_tokens` / `output_tokens` — token counts. -- `gen_ai.response.finish_reasons` — single-element string array. -- `gen_ai.response.id` — when the provider returns one. +- `gen_ai.usage.input_tokens` / `output_tokens`: token counts. +- `gen_ai.response.finish_reasons`: single-element string array. +- `gen_ai.response.id`: when the provider returns one. Disable the GenAI semconv set with `OTelObserver(disable_genai_semconv=True)` when an external auto-instrumentation library (OpenInference, @@ -515,12 +515,12 @@ observer = OTelObserver( This surfaces three attributes: -- `openarmature.llm.input.messages` — JSON-encoded message array +- `openarmature.llm.input.messages`: JSON-encoded message array (the spec §3 message shape: `{role, content, tool_calls?, …}`). -- `openarmature.llm.output.content` — the assistant's response +- `openarmature.llm.output.content`: the assistant's response content string verbatim. Omitted for tool-call-only responses with empty content. -- `openarmature.llm.request.extras` — JSON-encoded `RuntimeConfig` +- `openarmature.llm.request.extras`: JSON-encoded `RuntimeConfig` extras bag (provider-specific pass-through fields like `repetition_penalty` for vLLM, or `top_k` for HuggingFace endpoints). Omitted when empty. @@ -543,7 +543,7 @@ that fits within `cap - len(marker)` bytes followed by the marker: ``` where M is the pre-truncation byte length. The marker is appended -outside any JSON encoding — a truncated attribute is *not* parseable +outside any JSON encoding, so a truncated attribute is *not* parseable JSON, which is the clean signal backend code can use to detect truncation without a separate flag. @@ -563,7 +563,7 @@ provider, *before* the payload reaches the observer: The `media_type` and `detail` fields are preserved at the image-block level (per llm-provider §3.1.2); only `source` is replaced. URL-form -images pass through unchanged — the URL is a short string and is +images pass through unchanged: the URL is a short string and is informative for trace readers. Redaction is **not** gated by `disable_llm_payload` and is **not** @@ -626,7 +626,7 @@ observer = OTelObserver( ``` Each enricher receives the live `Span` plus the `NodeEvent` that -triggered the close (or `None` on synthetic close sites — subgraph +triggered the close (or `None` on synthetic close sites: subgraph dispatch, detached root, fan-out instance, invocation span, shutdown drain). Setting attributes inside this hook works correctly; doing it from a `SpanProcessor.on_end` callback does @@ -668,9 +668,9 @@ full pattern. `OTelObserver.shutdown()` calls `provider.shutdown()` on the private `TracerProvider`, which per OTel SDK contract flushes every -registered span processor. Under unusual teardown orderings — for +registered span processor. Under unusual teardown orderings (for example, FastAPI's `TestClient` teardown that closes the event loop -before a `BatchSpanProcessor`'s export thread finishes — spans can +before a `BatchSpanProcessor`'s export thread finishes), spans can appear dropped. Two workarounds: - Call `observer._provider.force_flush(timeout_millis=...)` @@ -682,7 +682,7 @@ appear dropped. Two workarounds: ## Langfuse mapping (opt-in) A second sibling observer maps the same `NodeEvent` stream onto -Langfuse's native Trace + Observation data model — Traces at the +Langfuse's native Trace + Observation data model: Traces at the top, Span observations for graph nodes, Generation observations for LLM calls. Use it instead of (or alongside) the OTel observer when your trace UI is Langfuse and you want first-class Generation @@ -699,7 +699,7 @@ observer = LangfuseObserver(client=client) graph.attach_observer(observer) ``` -The `client` is anything matching the `LangfuseClient` Protocol — +The `client` is anything matching the `LangfuseClient` Protocol: the bundled `InMemoryLangfuseClient` (used by the conformance harness, useful for unit tests), or a real `langfuse.Langfuse()` instance wrapped in `LangfuseSDKAdapter` for production. Install @@ -749,7 +749,7 @@ for a runnable demo. matching the `LangfuseClient` Protocol's four methods. A runtime `isinstance(adapter, LangfuseClient)` check ships in - the unit suite — if a future v4 patch breaks the Protocol's + the unit suite, so if a future v4 patch breaks the Protocol's surface, the test fails loudly. ### What Langfuse sees @@ -772,7 +772,7 @@ for a runnable demo. ### Payload + truncation -`disable_llm_payload` mirrors the OTel observer's flag — defaults +`disable_llm_payload` mirrors the OTel observer's flag and defaults to `True` for the same privacy reason. Flip to `False` to populate `generation.input` / `output` / `metadata.request_extras` from the LLM event payload. @@ -804,7 +804,7 @@ the Generation observation links to that entity natively (spec The two observers are independent §6 event consumers and can be attached together. They share the `correlation_id` as the -cross-backend join key — find a slow Generation in Langfuse, search +cross-backend join key: find a slow Generation in Langfuse, search for its `correlation_id` in OTel logs, see the surrounding infrastructure activity. diff --git a/docs/concepts/parallel-branches.md b/docs/concepts/parallel-branches.md index 6255e21..d55e426 100644 --- a/docs/concepts/parallel-branches.md +++ b/docs/concepts/parallel-branches.md @@ -105,7 +105,7 @@ wrap that branch's whole subgraph invocation as a unit. Retry middleware on a branch retries the **whole branch**: a fresh subgraph invocation each time, fresh inner-node execution. The wrapping retry's attempt counter propagates to events emitted from -inner nodes (per graph-engine §6 v0.16.1), so observer events +inner nodes (per graph-engine §6), so observer events inside the branch correctly show `attempt_index` ticking across retries. diff --git a/docs/concepts/prompts.md b/docs/concepts/prompts.md index 9a89867..bb99691 100644 --- a/docs/concepts/prompts.md +++ b/docs/concepts/prompts.md @@ -125,7 +125,7 @@ strict default is actively wrong for your workflow. is a single user instruction and you don't need role tagging. - A `ChatPrompt` carries `chat_template: list[ChatSegment]`. Each segment is either a `ContentSegment` (a role-tagged content - block — `system`, `user`, or `assistant`, carrying a text + block: `system`, `user`, or `assistant`, carrying a text template OR a list of content-block templates for multimodal user messages) or a `PlaceholderSegment` (a slot the caller fills at render time with a `list[Message]`, useful for chat history @@ -137,7 +137,7 @@ kwarg is ignored. For `ChatPrompt` each content segment renders with the strict-undefined rule applied independently; placeholder segments inject their caller-supplied message lists in order. -Backends can return either variant — the `LangfusePromptBackend` +Backends can return either variant: the `LangfusePromptBackend` maps Langfuse text prompts to `TextPrompt` and Langfuse chat prompts to `ChatPrompt` with one `ContentSegment` per Langfuse chat message. Discriminate at the call site with @@ -146,7 +146,7 @@ behavior; most callers just pass the prompt back into `render()`. ## Per-prompt sampling parameters -A `Prompt` carries an optional `sampling` field — a `SamplingConfig` +A `Prompt` carries an optional `sampling` field: a `SamplingConfig` sub-record mirroring `RuntimeConfig`'s seven declared fields (`temperature`, `max_tokens`, `top_p`, `seed`, `frequency_penalty`, `presence_penalty`, `stop_sequences`) plus the extras pass-through @@ -177,7 +177,7 @@ once at construction, keyed by prompt name). `PromptManager.fetch(name)` without an explicit `label` consults a configured `LabelResolver` and falls back to `"production"`. This -lets one prompt be A/B-tested or canaried without code changes — +lets one prompt be A/B-tested or canaried without code changes: edit the resolver's data, not the call sites. ```python @@ -190,9 +190,9 @@ resolver = MappingLabelResolver({ }) manager = PromptManager(backend, label_resolver=resolver) -# Resolver returns "staging" — staging template fetched. +# Resolver returns "staging", staging template fetched. classify = await manager.fetch("experimental_classifier") -# Resolver returns "production" (the default) — production fetched. +# Resolver returns "production" (the default), production fetched. greet = await manager.fetch("greet") # Explicit label bypasses the resolver entirely. audit = await manager.fetch("greet", "audit") diff --git a/docs/concepts/state-and-reducers.md b/docs/concepts/state-and-reducers.md index 20b6dac..51959db 100644 --- a/docs/concepts/state-and-reducers.md +++ b/docs/concepts/state-and-reducers.md @@ -126,7 +126,7 @@ shapes: when a fan-out subgraph emits `list[X]` per instance, the parent's `target_field` receives `list[list[X]]` (which `append` would leave nested); when it emits `dict[str, X]`, the parent receives `list[dict]` (which `merge` can't consume). Both are -strict like their single-level counterparts — they raise +strict like their single-level counterparts: they raise `ReducerError` when an update element isn't the expected list/mapping shape. See the [fan-out](fan-out.md) page for the full pattern. diff --git a/docs/examples/07-multimodal-prompt.md b/docs/examples/07-multimodal-prompt.md index d9c0a11..48ccdbd 100644 --- a/docs/examples/07-multimodal-prompt.md +++ b/docs/examples/07-multimodal-prompt.md @@ -123,7 +123,7 @@ Lunar-mission image analysis (surface + equipment) `PromptGroup`. Inside the `with_active_prompt_group` scope, the attached OTel observer stamps `openarmature.prompt.group_name` on every LLM-call span. The console exporter prints those spans - alongside the human-readable output above — search the JSON blobs + alongside the human-readable output above; search the JSON blobs for `openarmature.prompt.group_name` to confirm. - **Per-call prompt scope**. The inner `with_active_prompt(rendered)` block adds the per-call identifiers (name, version, label, @@ -134,7 +134,7 @@ Lunar-mission image analysis (surface + equipment) - **GenAI semantic conventions**. The same LLM spans also carry the cross-vendor `gen_ai.*` attributes (`gen_ai.system`, `gen_ai.request.model`, `gen_ai.response.model`, - `gen_ai.usage.{input,output}_tokens`, etc.) — Langfuse, Phoenix, + `gen_ai.usage.{input,output}_tokens`, etc.). Langfuse, Phoenix, or Honeycomb's LLM lens would render the generation correctly without any per-service attribute-mapping shim. - **Fallback path** isn't visible in a clean run because the diff --git a/docs/examples/10-langfuse-observability.md b/docs/examples/10-langfuse-observability.md index c47e5ed..5899a79 100644 --- a/docs/examples/10-langfuse-observability.md +++ b/docs/examples/10-langfuse-observability.md @@ -1,6 +1,6 @@ # 10 - Langfuse observability -Send LLM call observability to Langfuse natively — Trace at the top, +Send LLM call observability to Langfuse natively: Trace at the top, Span observations for graph nodes, Generation observations with input, output, token usage, model parameters, and a native link back to the prompt entity the call rendered from. @@ -16,7 +16,7 @@ shape as the graph runs. The demo's prompt backend stubs a Langfuse-source by attaching a sentinel `langfuse_prompt` reference to the rendered prompt. The Generation observation reads that reference and links back to the -prompt entity — exactly what you'd see in a production Langfuse +prompt entity, exactly what you'd see in a production Langfuse dashboard threading "this generation came from prompt v7" without any manual wiring at the call site. @@ -28,7 +28,7 @@ manual wiring at the call site. - The `LangfuseClient` Protocol decouples the observer from the SDK. The bundled `InMemoryLangfuseClient` recorder is the test/demo shape; production passes a real `langfuse.Langfuse()` instance (or - a thin adapter — see [Reading the output](#reading-the-output) + a thin adapter; see [Reading the output](#reading-the-output) below). - Prompt linkage through [`Prompt.observability_entities`](../concepts/prompts.md#backend-keyed-observability-entity-references): @@ -40,7 +40,7 @@ manual wiring at the call site. output content on Generation observations. Default-off is the privacy posture; the demo deliberately flips it. - `correlation_id` cross-cutting metadata on the Trace and every - Observation — the join key if you're also running an OTel observer + Observation: the join key if you're also running an OTel observer alongside. ## How to run @@ -67,7 +67,7 @@ flowchart TD A single-node graph: fetch the prompt, render with the question, call the LLM under `with_active_prompt(...)`, store the response. The -single node is deliberate — the value is in the captured Trace shape, +single node is deliberate; the value is in the captured Trace shape, not the graph topology. ## Reading the output @@ -83,7 +83,7 @@ prompt: mission-briefing v7 ─── captured Langfuse trace ───────────────────────────────── Trace id=01234567-89ab-... name='answer_briefing' - metadata={correlation_id='...', entry_node='answer_briefing', spec_version='0.26.0'} + metadata={correlation_id='...', entry_node='answer_briefing', spec_version='0.38.0'} [span] 'answer_briefing' level=DEFAULT metadata={attempt_index=0, correlation_id='...', namespace=['answer_briefing'], step=0} [generation] 'openarmature.llm.complete' level=DEFAULT @@ -106,7 +106,7 @@ Trace id=01234567-89ab-... `output`, and the prompt-identity metadata. In a production Langfuse dashboard this is what the "Generation" detail view renders. - **`prompt_entity_link`** is the value `Prompt.observability_entities['langfuse_prompt']` - carried — a sentinel string in this demo, a real Langfuse SDK Prompt + carried: a sentinel string in this demo, a real Langfuse SDK Prompt object in production. When the backend doesn't surface the reference (e.g., a filesystem backend), the link is absent but the `metadata.prompt` map (name, version, label, hashes) still appears @@ -151,7 +151,7 @@ display logic from clobbering the trace's display name when later observations land without the attribute set. Validated against `langfuse>=4.6,<5`. v2.x and v3.x are NOT -supported — supply your own adapter against the same four-method +supported; supply your own adapter against the same four-method Protocol if you need to stay on an older version. For prompt linkage: in production, the @@ -173,6 +173,6 @@ graph.attach_observer(LangfuseObserver(client=langfuse_client)) Their `disable_llm_spans` / `disable_llm_payload` flags are independent. The `correlation_id` cross-cutting attribute is the join -key — find a slow Generation in Langfuse, search for the +key: find a slow Generation in Langfuse, search for the `correlation_id` in OTel logs to see the surrounding infrastructure activity. diff --git a/docs/model-providers/authoring.md b/docs/model-providers/authoring.md index 495214e..4fc81a4 100644 --- a/docs/model-providers/authoring.md +++ b/docs/model-providers/authoring.md @@ -277,7 +277,7 @@ of: Inline image bytes MUST be redacted in the provider's serialization step before reaching the payload (see - [Observability — Inline image + [Observability: Inline image redaction](../concepts/observability.md#inline-image-redaction-always-on)) so custom observers consuming `LlmEventPayload` cannot leak raw bytes. diff --git a/docs/model-providers/vllm.md b/docs/model-providers/vllm.md index 3fd2634..81f800a 100644 --- a/docs/model-providers/vllm.md +++ b/docs/model-providers/vllm.md @@ -1,7 +1,7 @@ # Self-hosted vLLM `OpenAIProvider` talks to any server that implements OpenAI's Chat -Completions wire format (`POST /v1/chat/completions`) — including +Completions wire format (`POST /v1/chat/completions`), including self-hosted [vLLM](https://github.com/vllm-project/vllm). This page walks through the configuration nuances specific to vLLM deployments. @@ -16,7 +16,7 @@ from openarmature.llm import OpenAIProvider, RuntimeConfig, UserMessage async def main() -> None: provider = OpenAIProvider( - base_url="http://localhost:8000", # host root only — no /v1 + base_url="http://localhost:8000", # host root only, no /v1 model="meta-llama/Llama-3.1-8B-Instruct", api_key=None, # vLLM doesn't require auth by default genai_system="vllm", # surfaces on observability spans @@ -36,24 +36,24 @@ asyncio.run(main()) That's it for the happy path. The rest of the page covers the config nuances you'll hit in real deployments. -## `base_url` shape — host root only +## `base_url` shape: host root only vLLM serves on `http://:/v1/chat/completions` and `http://:/v1/models`. `OpenAIProvider` appends the `/v1/...` paths itself, so the `base_url` you pass MUST be the host -root — no `/v1` suffix: +root, with no `/v1` suffix: ```python from openarmature.llm import OpenAIProvider -# Correct — host root only, provider appends /v1/... +# Correct: host root only, provider appends /v1/... provider = OpenAIProvider( base_url="http://localhost:8000", model="meta-llama/Llama-3.1-8B-Instruct", api_key=None, ) -# Rejected at construction time — raises ValueError +# Rejected at construction time, raises ValueError try: OpenAIProvider( base_url="http://localhost:8000/v1", @@ -61,7 +61,7 @@ try: api_key=None, ) except ValueError as exc: - # "base_url must not end with '/v1' — the provider appends …" + # "base_url must not end with '/v1'; the provider appends …" _ = exc ``` @@ -75,7 +75,7 @@ Trailing slashes are stripped; other non-empty paths (proxy prefixes like `/api/openai-proxy`) are left intact for intentional reverse- proxy setups. -## Authentication — typically off, optionally on +## Authentication: typically off, optionally on vLLM ships with auth off by default. Pass `api_key=None` for that case. To enable auth on the vLLM side, launch with `--api-key @@ -99,7 +99,7 @@ provider = OpenAIProvider( ``` A wrong or missing key surfaces as `ProviderAuthentication` (mapped -from 401/403) — the same error category as OpenAI cloud auth +from 401/403), the same error category as OpenAI cloud auth failures, so retry / surface logic is portable across backends. ## `genai_system="vllm"` for the observability layer @@ -122,10 +122,10 @@ provider = OpenAIProvider( Standard values for other backends running the same wire format: `"vllm"`, `"lmstudio"`, `"llamacpp"`, `"sglang"`. No `base_url` -sniffing is done — the same host:port could be any of those servers, +sniffing is done; the same host:port could be any of those servers, and a wrong inference would be worse than the explicit opt-in. -## Older vLLM releases — `force_prompt_augmentation_fallback` +## Older vLLM releases: `force_prompt_augmentation_fallback` OpenAI's native structured-output path uses the `response_format` field on the request body. Older vLLM releases either reject this @@ -149,19 +149,19 @@ directive into the system message instead of using `response_format`. The wire body never carries `response_format`; the model sees the schema in the prompt and is asked to produce conforming JSON. Validation against the schema still runs on the -returned text — `StructuredOutputInvalid` surfaces when the model's +returned text: `StructuredOutputInvalid` surfaces when the model's output doesn't match. Recent vLLM releases (>=0.5.x) support `response_format` natively; leave the flag at its default `False` for those. -## Readiness probe — `GET /v1/models` +## Readiness probe: `GET /v1/models` `provider.ready()` hits `GET /v1/models` and: - Matches the bound model against the returned `data[].id` entries; raises `ProviderInvalidModel` if absent. -- Consults an optional per-entry `status` field — if it contains +- Consults an optional per-entry `status` field: if it contains `loading` or `not_loaded`, raises `ProviderModelNotLoaded`. Local servers that report load state (some LM Studio / vLLM builds) get a real not-loaded signal through this path. @@ -169,7 +169,7 @@ leave the flag at its default `False` for those. `ProviderUnavailable`. **Limitation for vLLM specifically.** vLLM's `/v1/models` doesn't -populate a `status` field — it returns the configured model with a +populate a `status` field; it returns the configured model with a 200 even during a slow first-load. So the `status`-based not-loaded detection above doesn't fire for vLLM; the probe confirms the model name matches but can't tell warmed from cold. For deployments where @@ -182,7 +182,7 @@ from openarmature.llm import OpenAIProvider, RuntimeConfig, UserMessage async def warm_up(provider: OpenAIProvider) -> None: await provider.ready() - # Synthetic warm-up — sends a 1-token request to force the model + # Synthetic warm-up: sends a 1-token request to force the model # to finish loading before lifespan startup completes. await provider.complete( [UserMessage(content="ok")], diff --git a/docs/patterns/bypass-if-output-exists.md b/docs/patterns/bypass-if-output-exists.md index a4ad701..1365a3c 100644 --- a/docs/patterns/bypass-if-output-exists.md +++ b/docs/patterns/bypass-if-output-exists.md @@ -81,7 +81,7 @@ per-fan-out-instance, depending on the scope of the bypass. addressable output, downloading a file). - The "does it exist" check is cheap (a filesystem `stat`, a Redis `EXISTS`, a database key lookup). -- You're OK with the node being skipped silently — the partial +- You're OK with the node being skipped silently; the partial update returned by the middleware is indistinguishable from a successful node run. @@ -91,7 +91,7 @@ per-fan-out-instance, depending on the scope of the bypass. the node. The cost model inverts; the pattern is wrong. - You need to *force* re-execution on demand (cache invalidation). Add a `force_rerun: bool` field on state that the middleware - consults — but if you're doing that often, the bypass logic + consults. But if you're doing that often, the bypass logic belongs in the node itself, gated on a state field, not in middleware. - The cached output's freshness depends on inputs the middleware @@ -101,10 +101,10 @@ per-fan-out-instance, depending on the scope of the bypass. ## Cross-references -- [Middleware](../concepts/middleware.md) — middleware shape, the +- [Middleware](../concepts/middleware.md): middleware shape, the four registration sites, composition. - Spec: [pipeline-utilities](https://openarmature.org/capabilities/pipeline-utilities/) This pattern is explicitly called out in proposal 0008's *Alternatives considered* section as a userland recipe rather than -spec'd behavior — this page is its canonical home. +spec'd behavior; this page is its canonical home. diff --git a/docs/patterns/index.md b/docs/patterns/index.md index e42be87..3ab3159 100644 --- a/docs/patterns/index.md +++ b/docs/patterns/index.md @@ -4,9 +4,9 @@ Recipes for things people keep asking the framework to do but that compose cleanly from existing primitives. The split between [Concepts](../concepts/index.md) and Patterns is -intentional: Concepts explain *what OpenArmature is* — typed state, -nodes, edges, middleware, checkpointing, observers. Patterns -explain *ways to use it* — opinionated shapes for common +intentional: Concepts explain *what OpenArmature is* (typed state, +nodes, edges, middleware, checkpointing, observers). Patterns +explain *ways to use it*: opinionated shapes for common downstream questions like "how do I run an agent loop?" or "how do I skip work that's already been done?". @@ -18,27 +18,27 @@ I skip work that's already been done?". them?" → look here. Patterns are user-level recipes, not framework contracts. New -patterns can be added without spec coordination — they're how-to +patterns can be added without spec coordination; they're how-to docs composing existing primitives. ## The catalog -- [Parameterized entry point](parameterized-entry-point.md) — +- [Parameterized entry point](parameterized-entry-point.md): start the graph at an arbitrary node via state-driven routing. -- [Tool dispatch as node](tool-dispatch-as-node.md) — model an +- [Tool dispatch as node](tool-dispatch-as-node.md): model an agent tool-call loop as a graph cycle. -- [Session as checkpoint resume](session-as-checkpoint-resume.md) — +- [Session as checkpoint resume](session-as-checkpoint-resume.md): carry multi-turn agent state across turns using the existing checkpointer. -- [Bypass if output exists](bypass-if-output-exists.md) — +- [Bypass if output exists](bypass-if-output-exists.md): short-circuit a node whose external output already exists, via middleware. -- [State migration on resume](state-migration-on-resume.md) — let +- [State migration on resume](state-migration-on-resume.md): let older in-flight checkpoints resume against an evolved state schema without each node body having to handle multiple shapes. -- [Caller-supplied trace identifiers](caller-supplied-trace-identifiers.md) - — propagate tenant ID / request ID / feature flags into every +- [Caller-supplied trace identifiers](caller-supplied-trace-identifiers.md): + propagate tenant ID / request ID / feature flags into every observability span via `invoke(metadata=...)`. -- [Custom observer: reconciling started → completed pairs](observer-state-reconciliation.md) - — thread per-call state between paired events using a per- +- [Custom observer: reconciling started → completed pairs](observer-state-reconciliation.md): + thread per-call state between paired events using a per- invocation dict keyed on the spec's uniqueness tuple. diff --git a/docs/patterns/parameterized-entry-point.md b/docs/patterns/parameterized-entry-point.md index b0b03d2..3c28a4b 100644 --- a/docs/patterns/parameterized-entry-point.md +++ b/docs/patterns/parameterized-entry-point.md @@ -11,7 +11,7 @@ execution should begin. The graph stays a single graph; what differs across runs is which branch the conditional edge takes. Combine with [checkpointing](../concepts/checkpointing.md) if you -want resume-style behavior — skip nodes whose work is already +want resume-style behavior: skip nodes whose work is already captured in state. ## Snippet @@ -78,7 +78,7 @@ fields the chosen branch needs) and the graph routes accordingly. - You have a few canonical entry points and the choice between them is data, not control flow. -- You want to skip work already done in a prior run — combine with +- You want to skip work already done in a prior run; combine with [checkpointing](../concepts/checkpointing.md) to pick up where you left off. - Your "different entry points" share state structure and most of @@ -90,7 +90,7 @@ fields the chosen branch needs) and the graph routes accordingly. it's a different compiled graph. Don't bend one graph into two; two graphs are easier to test and reason about. - The number of entry points grows unboundedly. Then you're - reimplementing routing — consider a higher-level dispatch layer + reimplementing routing; consider a higher-level dispatch layer that picks which graph to invoke. ## Cross-references diff --git a/docs/patterns/session-as-checkpoint-resume.md b/docs/patterns/session-as-checkpoint-resume.md index 5e5c634..1370534 100644 --- a/docs/patterns/session-as-checkpoint-resume.md +++ b/docs/patterns/session-as-checkpoint-resume.md @@ -87,7 +87,7 @@ state and the session table holds the join keys. - Your application has long-lived sessions with multiple LLM turns and you want the prior state to be the starting point of the next turn. -- You're already running a checkpointer for crash resume — this +- You're already running a checkpointer for crash resume; this pattern is "use it more." - Cross-turn state has clean reducer semantics: `merge` for accumulating dicts, `append` for growing lists. @@ -97,7 +97,7 @@ state and the session table holds the join keys. - A session's "state" is bigger than fits comfortably in a single graph state shape. Split into multiple graphs and share an external store keyed by session. -- Turns are completely independent — there's no value in carrying +- Turns are completely independent; there's no value in carrying state across them. Then just run each turn as a fresh invoke. - The application already has its own state-management layer that conflicts with OA's frozen-state model. Use OA per-turn without @@ -105,10 +105,10 @@ state and the session table holds the join keys. ## Cross-references -- [Checkpointing](../concepts/checkpointing.md) — backend wiring, +- [Checkpointing](../concepts/checkpointing.md): backend wiring, `resume_invocation`, schema migration. -- [State and reducers](../concepts/state-and-reducers.md) — `merge` +- [State and reducers](../concepts/state-and-reducers.md): `merge` and `append` reducer strategies. -- [`examples/08-checkpointing-and-migration`](../examples/08-checkpointing-and-migration.md) — +- [`examples/08-checkpointing-and-migration`](../examples/08-checkpointing-and-migration.md): single-resume baseline. - Spec: [pipeline-utilities](https://openarmature.org/capabilities/pipeline-utilities/) diff --git a/docs/patterns/tool-dispatch-as-node.md b/docs/patterns/tool-dispatch-as-node.md index 3a385ba..a852174 100644 --- a/docs/patterns/tool-dispatch-as-node.md +++ b/docs/patterns/tool-dispatch-as-node.md @@ -13,7 +13,7 @@ LLM node if the model wants more turns. The exit is the conditional edge routing to a `present` node (or `END`) when the assistant returns no `tool_calls`. -No "agent framework" abstraction — the loop is just a graph cycle +No "agent framework" abstraction; the loop is just a graph cycle on top of [`Tool`, `ToolCall`, `ToolMessage`](../concepts/llms.md). ## Snippet @@ -99,7 +99,7 @@ for malformed `ToolCall.arguments`, and trace output. - The model needs to call local Python functions and react to their results. -- The loop is bounded — either by `MAX_TURNS`, by the model +- The loop is bounded, either by `MAX_TURNS`, by the model signaling it's done, or by both. - Tool results are textual or JSON-serializable and fit cleanly into `ToolMessage.content`. @@ -120,11 +120,11 @@ for malformed `ToolCall.arguments`, and trace output. ## Cross-references -- [LLMs concept page](../concepts/llms.md) — `Tool`, `ToolCall`, +- [LLMs concept page](../concepts/llms.md): `Tool`, `ToolCall`, `ToolMessage` types and the `complete(messages, tools=...)` contract. -- [State and reducers](../concepts/state-and-reducers.md) — +- [State and reducers](../concepts/state-and-reducers.md): `append` reducer semantics. -- [`examples/09-tool-use`](../examples/09-tool-use.md) — runnable +- [`examples/09-tool-use`](../examples/09-tool-use.md): runnable reference implementation. - Spec: [llm-provider](https://openarmature.org/capabilities/llm-provider/) diff --git a/src/openarmature/AGENTS.md b/src/openarmature/AGENTS.md index c38e119..e1211ad 100644 --- a/src/openarmature/AGENTS.md +++ b/src/openarmature/AGENTS.md @@ -4,7 +4,7 @@ ## TL;DR -OpenArmature is a workflow framework for LLM pipelines and tool-calling agents — typed state, compile-time topology checks, observability, and crash-safe checkpoints baked into a graph engine. The graph layer has no concept of LLMs or tools; the same primitives drive deterministic ETL pipelines and tool-calling agents alike. Nodes return partial updates; the engine merges into a frozen state snapshot. Behavior is defined by [openarmature-spec](https://openarmature.org/capabilities/) and verified by conformance fixtures; this package is the reference Python implementation. +OpenArmature is a workflow framework for LLM pipelines and tool-calling agents: typed state, compile-time topology checks, observability, and crash-safe checkpoints baked into a graph engine. The graph layer has no concept of LLMs or tools; the same primitives drive deterministic ETL pipelines and tool-calling agents alike. Nodes return partial updates; the engine merges into a frozen state snapshot. Behavior is defined by [openarmature-spec](https://openarmature.org/capabilities/) and verified by conformance fixtures; this package is the reference Python implementation. **What OpenArmature is NOT:** not a chat framework (no built-in messages channel), not an LLM SDK (Provider is the abstraction layer; OpenAIProvider is the canonical impl), not a state-management library (state is per-invocation, not application-wide), not an evaluation framework (deferred to `openarmature-eval`). @@ -486,7 +486,7 @@ per-fan-out-instance, depending on the scope of the bypass. addressable output, downloading a file). - The "does it exist" check is cheap (a filesystem `stat`, a Redis `EXISTS`, a database key lookup). -- You're OK with the node being skipped silently — the partial +- You're OK with the node being skipped silently; the partial update returned by the middleware is indistinguishable from a successful node run. @@ -496,7 +496,7 @@ per-fan-out-instance, depending on the scope of the bypass. the node. The cost model inverts; the pattern is wrong. - You need to *force* re-execution on demand (cache invalidation). Add a `force_rerun: bool` field on state that the middleware - consults — but if you're doing that often, the bypass logic + consults. But if you're doing that often, the bypass logic belongs in the node itself, gated on a state field, not in middleware. - The cached output's freshness depends on inputs the middleware @@ -506,13 +506,13 @@ per-fan-out-instance, depending on the scope of the bypass. #### Cross-references -- [Middleware](https://openarmature.ai/concepts/middleware/) — middleware shape, the +- [Middleware](https://openarmature.ai/concepts/middleware/): middleware shape, the four registration sites, composition. - Spec: [pipeline-utilities](https://openarmature.org/capabilities/pipeline-utilities/) This pattern is explicitly called out in proposal 0008's *Alternatives considered* section as a userland recipe rather than -spec'd behavior — this page is its canonical home. +spec'd behavior; this page is its canonical home. ### Caller-supplied trace identifiers @@ -802,7 +802,7 @@ execution should begin. The graph stays a single graph; what differs across runs is which branch the conditional edge takes. Combine with [checkpointing](https://openarmature.ai/concepts/checkpointing/) if you -want resume-style behavior — skip nodes whose work is already +want resume-style behavior: skip nodes whose work is already captured in state. #### Snippet @@ -869,7 +869,7 @@ fields the chosen branch needs) and the graph routes accordingly. - You have a few canonical entry points and the choice between them is data, not control flow. -- You want to skip work already done in a prior run — combine with +- You want to skip work already done in a prior run; combine with [checkpointing](https://openarmature.ai/concepts/checkpointing/) to pick up where you left off. - Your "different entry points" share state structure and most of @@ -881,7 +881,7 @@ fields the chosen branch needs) and the graph routes accordingly. it's a different compiled graph. Don't bend one graph into two; two graphs are easier to test and reason about. - The number of entry points grows unboundedly. Then you're - reimplementing routing — consider a higher-level dispatch layer + reimplementing routing; consider a higher-level dispatch layer that picks which graph to invoke. #### Cross-references @@ -979,7 +979,7 @@ state and the session table holds the join keys. - Your application has long-lived sessions with multiple LLM turns and you want the prior state to be the starting point of the next turn. -- You're already running a checkpointer for crash resume — this +- You're already running a checkpointer for crash resume; this pattern is "use it more." - Cross-turn state has clean reducer semantics: `merge` for accumulating dicts, `append` for growing lists. @@ -989,7 +989,7 @@ state and the session table holds the join keys. - A session's "state" is bigger than fits comfortably in a single graph state shape. Split into multiple graphs and share an external store keyed by session. -- Turns are completely independent — there's no value in carrying +- Turns are completely independent; there's no value in carrying state across them. Then just run each turn as a fresh invoke. - The application already has its own state-management layer that conflicts with OA's frozen-state model. Use OA per-turn without @@ -997,11 +997,11 @@ state and the session table holds the join keys. #### Cross-references -- [Checkpointing](https://openarmature.ai/concepts/checkpointing/) — backend wiring, +- [Checkpointing](https://openarmature.ai/concepts/checkpointing/): backend wiring, `resume_invocation`, schema migration. -- [State and reducers](https://openarmature.ai/concepts/state-and-reducers/) — `merge` +- [State and reducers](https://openarmature.ai/concepts/state-and-reducers/): `merge` and `append` reducer strategies. -- [`examples/08-checkpointing-and-migration`](https://openarmature.ai/examples/08-checkpointing-and-migration/) — +- [`examples/08-checkpointing-and-migration`](https://openarmature.ai/examples/08-checkpointing-and-migration/): single-resume baseline. - Spec: [pipeline-utilities](https://openarmature.org/capabilities/pipeline-utilities/) @@ -1151,7 +1151,7 @@ LLM node if the model wants more turns. The exit is the conditional edge routing to a `present` node (or `END`) when the assistant returns no `tool_calls`. -No "agent framework" abstraction — the loop is just a graph cycle +No "agent framework" abstraction; the loop is just a graph cycle on top of [`Tool`, `ToolCall`, `ToolMessage`](https://openarmature.ai/concepts/llms/). #### Snippet @@ -1237,7 +1237,7 @@ for malformed `ToolCall.arguments`, and trace output. - The model needs to call local Python functions and react to their results. -- The loop is bounded — either by `MAX_TURNS`, by the model +- The loop is bounded, either by `MAX_TURNS`, by the model signaling it's done, or by both. - Tool results are textual or JSON-serializable and fit cleanly into `ToolMessage.content`. @@ -1258,12 +1258,12 @@ for malformed `ToolCall.arguments`, and trace output. #### Cross-references -- [LLMs concept page](https://openarmature.ai/concepts/llms/) — `Tool`, `ToolCall`, +- [LLMs concept page](https://openarmature.ai/concepts/llms/): `Tool`, `ToolCall`, `ToolMessage` types and the `complete(messages, tools=...)` contract. -- [State and reducers](https://openarmature.ai/concepts/state-and-reducers/) — +- [State and reducers](https://openarmature.ai/concepts/state-and-reducers/): `append` reducer semantics. -- [`examples/09-tool-use`](https://openarmature.ai/examples/09-tool-use/) — runnable +- [`examples/09-tool-use`](https://openarmature.ai/examples/09-tool-use/): runnable reference implementation. - Spec: [llm-provider](https://openarmature.org/capabilities/llm-provider/) @@ -1273,7 +1273,7 @@ Recipes that aren't deducible from the API surface alone. The primitives docs te ### Declare a non-clobbering reducer on accumulator list fields -State fields default to `last_write_wins` — each node's write replaces the prior value for that field. For scalar fields (`status: str`, `count: int`) that's usually what you want. For list fields that accumulate contributions across multiple nodes (`messages: list[Message]`, `events: list[Event]`, `results: list[Result]`), it's the wrong default — every node's contribution silently clobbers everything before it. +State fields default to `last_write_wins`: each node's write replaces the prior value for that field. For scalar fields (`status: str`, `count: int`) that's usually what you want. For list fields that accumulate contributions across multiple nodes (`messages: list[Message]`, `events: list[Event]`, `results: list[Result]`), it's the wrong default; every node's contribution silently clobbers everything before it. Declare `append` (or another non-clobbering reducer) at the state class: @@ -1288,15 +1288,15 @@ class WorkflowState(State): final_status: str = "pending" # last_write_wins is fine here ``` -The failure mode without `append` is silent and easy to misdiagnose — the final state shows only the last node's contribution to the list, with no error. Common "why is my accumulator empty?" question. `merge` is the equivalent for `dict[str, V]` fields that accumulate keys across nodes. +The failure mode without `append` is silent and easy to misdiagnose: the final state shows only the last node's contribution to the list, with no error. Common "why is my accumulator empty?" question. `merge` is the equivalent for `dict[str, V]` fields that accumulate keys across nodes. ### Branch on `Response.finish_reason` before reading `message.content` After `await provider.complete(messages, tools=[...])` returns, the shape of `Response` varies by `finish_reason`: -- `finish_reason == "stop"` — assistant produced a content response. `message.content` carries the text; `message.tool_calls` is empty. -- `finish_reason == "tool_calls"` — assistant emitted tool calls. `message.tool_calls` carries the list; `message.content` is typically empty (model didn't say anything beyond the tool calls). -- `finish_reason == "length"` / `"content_filter"` / `"error"` — completion was cut off or refused; `message.content` may be partial or empty. +- `finish_reason == "stop"`: assistant produced a content response. `message.content` carries the text; `message.tool_calls` is empty. +- `finish_reason == "tool_calls"`: assistant emitted tool calls. `message.tool_calls` carries the list; `message.content` is typically empty (model didn't say anything beyond the tool calls). +- `finish_reason == "length"` / `"content_filter"` / `"error"`: completion was cut off or refused; `message.content` may be partial or empty. Post-LLM logic that reads `message.content` without checking `finish_reason` misses the entire tool-calling path: @@ -1317,11 +1317,11 @@ else: The discriminator is one branch; missing it gives you empty data on tool-call responses and silently wrong behavior on truncations. -### `disable_llm_payload` defaults to `True` — flip it for LLM-aware observability backends +### `disable_llm_payload` defaults to `True`: flip it for LLM-aware observability backends The `OTelObserver` (and any spec-conformant observer reading LLM events) defaults `disable_llm_payload: bool = True` per spec §5.5's "default-off by privacy" framing. Without flipping the flag, LLM spans carry GenAI semconv attributes (token counts, model name, finish reason) but NOT the message payload (input messages, response content, request extras). -That's the right default for general OpenArmature use — payloads may contain PII the user hasn't audited, and storage cost grows with prompt size. But it's the WRONG default if you're wiring up an LLM-aware observability backend (Langfuse, Phoenix, Honeycomb's LLM lens) that renders the message stream as part of its generation view. Backends will show "empty" generations and you'll wonder why. +That's the right default for general OpenArmature use: payloads may contain PII the user hasn't audited, and storage cost grows with prompt size. But it's the WRONG default if you're wiring up an LLM-aware observability backend (Langfuse, Phoenix, Honeycomb's LLM lens) that renders the message stream as part of its generation view. Backends will show "empty" generations and you'll wonder why. Flip the flag once at observer construction: @@ -1332,36 +1332,36 @@ observer = OTelObserver( span_processor=your_exporter, disable_llm_payload=False, # opt in to message-payload attributes ) -compiled.attach_observer(observer) +graph.attach_observer(observer) ``` -The companion `disable_genai_semconv` flag defaults to `False` — GenAI semconv attributes emit by default since they're how LLM-aware backends render anything at all. Don't flip that one unless you're routing GenAI emission through a different layer. +The companion `disable_genai_semconv` flag defaults to `False`: GenAI semconv attributes emit by default since they're how LLM-aware backends render anything at all. Don't flip that one unless you're routing GenAI emission through a different layer. ### Use the bundled `FilesystemCheckpointer` or `SQLiteCheckpointer`, not a hand-rolled serializer -The temptation when persisting graph state is to `json.dumps(state.model_dump())` and write to a file. Don't. The shipped Checkpointer backends handle every contract `openarmature.checkpoint.Checkpointer` defines — round-trip integrity, `parent_states` for inner-save resume, fan-out progress tracking, schema-version migration, listing by `correlation_id`, `CheckpointRecordInvalid` on shape drift. A hand-rolled serializer that "works" on the happy path silently fails the moment a fan-out crash leaves an in-flight save record, and you'll be debugging it for hours before realizing the bundled backend exists. +The temptation when persisting graph state is to `json.dumps(state.model_dump())` and write to a file. Don't. The shipped Checkpointer backends handle every contract `openarmature.checkpoint.Checkpointer` defines: round-trip integrity, `parent_states` for inner-save resume, fan-out progress tracking, schema-version migration, listing by `correlation_id`, `CheckpointRecordInvalid` on shape drift. A hand-rolled serializer that "works" on the happy path silently fails the moment a fan-out crash leaves an in-flight save record, and you'll be debugging it for hours before realizing the bundled backend exists. -If your storage requirement isn't local disk (`FilesystemCheckpointer`) or local SQLite (`SQLiteCheckpointer` — also supports `:memory:` and arbitrary file paths), implement the `Checkpointer` Protocol against your backend rather than wrapping state serialization yourself. Custom backends inherit the spec's correctness contract for free. +If your storage requirement isn't local disk (`FilesystemCheckpointer`) or local SQLite (`SQLiteCheckpointer`, which also supports `:memory:` and arbitrary file paths), implement the `Checkpointer` Protocol against your backend rather than wrapping state serialization yourself. Custom backends inherit the spec's correctness contract for free. ### Subgraphs > conditional-edge spaghetti when branches don't share state A common shape is "after this LLM call, route to either a JSON-extraction node or a tool-dispatch node depending on `finish_reason`." The naive solution is two conditional edges from the LLM node, one to each downstream. That works for two branches; it scales poorly past three. -When the branches operate on different sub-shapes of state — e.g., one path is "extract JSON, then validate" while another is "dispatch tools, loop until done, then summarize" — encapsulate each as a `SubgraphNode` and route from the LLM node to the right subgraph. Each subgraph has its own state schema (projected from the parent), its own entry node, and its own internal topology. The parent graph becomes a switchboard with a few edges; the complexity lives one layer down where it composes cleanly. +When the branches operate on different sub-shapes of state (e.g., one path is "extract JSON, then validate" while another is "dispatch tools, loop until done, then summarize"), encapsulate each as a `SubgraphNode` and route from the LLM node to the right subgraph. Each subgraph has its own state schema (projected from the parent), its own entry node, and its own internal topology. The parent graph becomes a switchboard with a few edges; the complexity lives one layer down where it composes cleanly. ### `OpenAIProvider.ready()` exercises `chat/completions` by default; opt back into the catalog-only probe for cost-sensitive callers -`OpenAIProvider(..., readiness_probe=...)` accepts `"chat_completions"` (default), `"models"`, or `"both"`. The default issues `POST /v1/chat/completions` with a `max_tokens=1` body so a green `ready()` actually proves the inference wire path works, not just that the catalog endpoint answers. The motivating failure class: OpenAI-compatible proxies (Bifrost is the field-reported case) that return 200 on `GET /v1/models` while 405'ing the completions endpoint — the previous catalog-only default reported ready and every real call broke. The `"models"` opt-in is the old behavior, useful for cost-sensitive cloud callers where every `ready()` would otherwise bill one prompt's worth of tokens. `"both"` runs catalog then chat — strongest signal at double the cost. Non-200 responses on either probe route through `classify_http_error`, so the canonical error categories (`ProviderAuthentication`, `ProviderUnavailable`, `ProviderInvalidModel`, etc.) surface consistently regardless of which probe ran. +`OpenAIProvider(..., readiness_probe=...)` accepts `"chat_completions"` (default), `"models"`, or `"both"`. The default issues `POST /v1/chat/completions` with a `max_tokens=1` body so a green `ready()` actually proves the inference wire path works, not just that the catalog endpoint answers. The motivating failure class: OpenAI-compatible proxies (Bifrost is the field-reported case) that return 200 on `GET /v1/models` while 405'ing the completions endpoint, so the previous catalog-only default reported ready and every real call broke. The `"models"` opt-in is the old behavior, useful for cost-sensitive cloud callers where every `ready()` would otherwise bill one prompt's worth of tokens. `"both"` runs catalog then chat, giving the strongest signal at double the cost. Non-200 responses on either probe route through `classify_http_error`, so the canonical error categories (`ProviderAuthentication`, `ProviderUnavailable`, `ProviderInvalidModel`, etc.) surface consistently regardless of which probe ran. ### Be explicit with `tool_choice`; don't trust the provider's default -`Provider.complete(messages, tools, tool_choice=...)` accepts `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. When you omit `tool_choice`, the OpenAI provider's own default applies — usually `"auto"` when `tools` is non-empty, but documented per-provider. A pipeline that wants deterministic tool-calling (a routing node that MUST produce a tool call, a guarded LLM call that MUST NOT call tools) should pin `tool_choice` explicitly rather than relying on the provider default. +`Provider.complete(messages, tools, tool_choice=...)` accepts `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. When you omit `tool_choice`, the OpenAI provider's own default applies (usually `"auto"` when `tools` is non-empty, but documented per-provider). A pipeline that wants deterministic tool-calling (a routing node that MUST produce a tool call, a guarded LLM call that MUST NOT call tools) should pin `tool_choice` explicitly rather than relying on the provider default. -Pre-send validation catches the three §5 failure modes (`required` with empty tools, `ForceTool` with empty tools, `ForceTool.name` not in tools) and raises `ProviderInvalidRequest` before the HTTP call. Not all providers honor `tool_choice` — confirm with your provider's docs — but the OpenAI-compatible mapping is in `OpenAIProvider`. +Pre-send validation catches the three §5 failure modes (`required` with empty tools, `ForceTool` with empty tools, `ForceTool.name` not in tools) and raises `ProviderInvalidRequest` before the HTTP call. Not all providers honor `tool_choice` (confirm with your provider's docs), but the OpenAI-compatible mapping is in `OpenAIProvider`. ### Always `await graph.drain()` in short-lived processes; supply a `timeout` if observers might hang -`CompiledGraph.invoke()` returns when the graph reaches END or raises; observer events are dispatched onto a per-invocation queue and delivered by a background worker. The graph's execution loop never awaits observer processing. In a long-running service this is invisible — the worker drains naturally. In a CLI, script, or serverless function, the process exits before the worker finishes, and any late observer events (typically the last node's `completed` event plus any `checkpoint_saved` events) get dropped. +`CompiledGraph.invoke()` returns when the graph reaches END or raises; observer events are dispatched onto a per-invocation queue and delivered by a background worker. The graph's execution loop never awaits observer processing. In a long-running service this is invisible: the worker drains naturally. In a CLI, script, or serverless function, the process exits before the worker finishes, and any late observer events (typically the last node's `completed` event plus any `checkpoint_saved` events) get dropped. Always call `await graph.drain()` before the short-lived process exits. If your observer set includes anything that might hang (a metrics observer with a flaky network endpoint, an OTel exporter behind a slow OTLP collector), supply a `timeout`: @@ -1371,32 +1371,32 @@ if summary.timeout_reached: log.warning("drain truncated: %d events undelivered", summary.undelivered_count) ``` -The compiled graph stays usable for subsequent invocations after a timed-out drain — workers are cancelled cleanly, no partial state leaks. +The compiled graph stays usable for subsequent invocations after a timed-out drain: workers are cancelled cleanly, no partial state leaks. ### `install_log_bridge` skips its own handler when the application already attached one to the same `LoggerProvider` Two distinct classes both named `LoggingHandler` exist in the OTel Python ecosystem and both bridge stdlib log records to the OTel Logs SDK: -- `opentelemetry.sdk._logs.LoggingHandler` (the SDK class). Typically attached by an application's own logging setup — e.g., a FastAPI `setup_logging(...)` step that wires up an OTLP-backed `LoggerProvider` for log export. +- `opentelemetry.sdk._logs.LoggingHandler` (the SDK class). Typically attached by an application's own logging setup, e.g., a FastAPI `setup_logging(...)` step that wires up an OTLP-backed `LoggerProvider` for log export. - `opentelemetry.instrumentation.logging.handler.LoggingHandler` (the instrumentation class). What `openarmature.observability.otel.install_log_bridge` attaches when it runs. -Different classes, same OTel-Logs export path. If both are attached against the same `LoggerProvider`, every stdlib log record fires through both handlers, both call `provider.get_logger(...).emit(...)`, and `BatchLogRecordProcessor` ships the record TWICE to the OTLP endpoint. The duplication is OTLP-only — a console handler attached separately is unaffected, which makes "OTLP rows are doubled, console isn't" a head-scratcher to diagnose. +Different classes, same OTel-Logs export path. If both are attached against the same `LoggerProvider`, every stdlib log record fires through both handlers, both call `provider.get_logger(...).emit(...)`, and `BatchLogRecordProcessor` ships the record TWICE to the OTLP endpoint. The duplication is OTLP-only; a console handler attached separately is unaffected, which makes "OTLP rows are doubled, console isn't" a head-scratcher to diagnose. -`install_log_bridge` detects either handler class against the same provider and skips its own `addHandler` accordingly; the `openarmature.correlation_id` LogRecord factory still installs. The check is provider-scoped, so an application that intentionally attaches a handler against a DIFFERENT `LoggerProvider` (a separate logs pipeline) still gets the OA bridge against the OA provider — the helper only dedups when the SAME provider would receive duplicate emissions. +`install_log_bridge` detects either handler class against the same provider and skips its own `addHandler` accordingly; the `openarmature.correlation_id` LogRecord factory still installs. The check is provider-scoped, so an application that intentionally attaches a handler against a DIFFERENT `LoggerProvider` (a separate logs pipeline) still gets the OA bridge against the OA provider; the helper only dedups when the SAME provider would receive duplicate emissions. ### Three exception hierarchies; know which one your code catches `openarmature` exceptions split across three sibling hierarchies: -- `RuntimeGraphError` (in `openarmature.graph`) — node execution failures: `NodeException`, `RoutingError`, `EdgeException`, `ReducerError`, `StateValidationError`. Each has a `category` string matching the spec's canonical error categories. -- `CheckpointError` (in `openarmature.checkpoint`) — persistence failures: `CheckpointNotFound`, `CheckpointSaveFailed`, `CheckpointRecordInvalid`, `CheckpointStateMigrationMissing`, `CheckpointStateMigrationFailed`, `CheckpointStateMigrationChainAmbiguous`. -- `LlmProviderError` (in `openarmature.llm`) — provider call failures: `ProviderAuthentication`, `ProviderInvalidRequest`, `ProviderInvalidResponse`, `ProviderInvalidModel`, `ProviderModelNotLoaded`, `ProviderRateLimit`, `ProviderUnavailable`, `ProviderUnsupportedContentBlock`, `StructuredOutputInvalid`. +- `RuntimeGraphError` (in `openarmature.graph`): node execution failures: `NodeException`, `RoutingError`, `EdgeException`, `ReducerError`, `StateValidationError`. Each has a `category` string matching the spec's canonical error categories. +- `CheckpointError` (in `openarmature.checkpoint`): persistence failures: `CheckpointNotFound`, `CheckpointSaveFailed`, `CheckpointRecordInvalid`, `CheckpointStateMigrationMissing`, `CheckpointStateMigrationFailed`, `CheckpointStateMigrationChainAmbiguous`. +- `LlmProviderError` (in `openarmature.llm`): provider call failures: `ProviderAuthentication`, `ProviderInvalidRequest`, `ProviderInvalidResponse`, `ProviderInvalidModel`, `ProviderModelNotLoaded`, `ProviderRateLimit`, `ProviderUnavailable`, `ProviderUnsupportedContentBlock`, `StructuredOutputInvalid`. -Catching `Exception` works but is too broad; catching one hierarchy misses the other two. If you want to branch on category strings (e.g., for retry logic), catch the relevant base — `RuntimeGraphError` covers all five spec runtime categories, `LlmProviderError` covers all nine provider categories, `CheckpointError` covers all six checkpoint categories. The `TRANSIENT_CATEGORIES` frozenset in `openarmature.llm` enumerates which provider categories are retriable. +Catching `Exception` works but is too broad; catching one hierarchy misses the other two. If you want to branch on category strings (e.g., for retry logic), catch the relevant base: `RuntimeGraphError` covers all five spec runtime categories, `LlmProviderError` covers all nine provider categories, `CheckpointError` covers all six checkpoint categories. The `TRANSIENT_CATEGORIES` frozenset in `openarmature.llm` enumerates which provider categories are retriable. ### Filter `openarmature.*`-namespaced events when your observer only cares about user nodes -OA emits observer events under sentinel node-names for its own internal dispatch: `openarmature.llm.complete` for LLM provider calls (proposal 0024), `openarmature.checkpoint.migrate` for state-migration runs (proposal 0014), `openarmature.checkpoint.save` for checkpoint saves (proposal 0010). These events let the OTel / Langfuse observers emit LLM-provider spans, checkpoint-migrate spans, etc. — but a custom observer that only cares about user-defined node activity sees them as noise: +OA emits observer events under sentinel node-names for its own internal dispatch: `openarmature.llm.complete` for LLM provider calls (proposal 0024), `openarmature.checkpoint.migrate` for state-migration runs (proposal 0014), `openarmature.checkpoint.save` for checkpoint saves (proposal 0010). These events let the OTel / Langfuse observers emit LLM-provider spans, checkpoint-migrate spans, etc., but a custom observer that only cares about user-defined node activity sees them as noise: ```python async def __call__(self, event: NodeEvent) -> None: @@ -1406,11 +1406,11 @@ async def __call__(self, event: NodeEvent) -> None: # … user-node handling ``` -`event.namespace[0]` is the safest discriminator (the leaf `event.node_name` would also work for LLM events but won't match the checkpoint sentinels since those repurpose `node_name` differently). Don't try to filter on `current_invocation_id() is None` — OA-internal events are dispatched within the same invocation context as user-node events, so `invocation_id` is set for both; the namespace-prefix check is the stable contract. +`event.namespace[0]` is the safest discriminator (the leaf `event.node_name` would also work for LLM events but won't match the checkpoint sentinels since those repurpose `node_name` differently). Don't try to filter on `current_invocation_id() is None`: OA-internal events are dispatched within the same invocation context as user-node events, so `invocation_id` is set for both; the namespace-prefix check is the stable contract. ### Fan-out subgraphs that emit `list[X]` per instance produce `list[list[X]]` at `target_field` -When a fan-out's per-instance state collects a `list[X]` as its `collect_field` (e.g., each instance produces 0..N records), the engine's contribution step is `[s[cfg.collect_field] for s in successes]` — every instance's value becomes one element of the outer list. With `list[X]` per-instance, the parent receives `list[list[X]]`, and the default `append` reducer on the parent's `Annotated[list[X], append]` field preserves the nesting verbatim. Pydantic then fails to validate each `list[X]` element against `X`: +When a fan-out's per-instance state collects a `list[X]` as its `collect_field` (e.g., each instance produces 0..N records), the engine's contribution step is `[s[cfg.collect_field] for s in successes]`; every instance's value becomes one element of the outer list. With `list[X]` per-instance, the parent receives `list[list[X]]`, and the default `append` reducer on the parent's `Annotated[list[X], append]` field preserves the nesting verbatim. Pydantic then fails to validate each `list[X]` element against `X`: ``` attributed_candidates.0 Input should be a valid dictionary or @@ -1418,7 +1418,7 @@ attributed_candidates.0 Input should be a valid dictionary or input_type=list] ``` -The fix is the `concat_flatten` built-in reducer (proposal 0036) — the list-of-lists analog of `append`. Declare it on the parent's collection field: +The fix is the `concat_flatten` built-in reducer (proposal 0036), the list-of-lists analog of `append`. Declare it on the parent's collection field: ```python from typing import Annotated @@ -1431,7 +1431,7 @@ class PipelineState(State): attributed_candidates: Annotated[list[ClaimCandidate], concat_flatten] = Field(default_factory=list) ``` -`concat_flatten` folds the per-instance lists into one flat list (`[*prior, *(item for sublist in update for item in sublist)]`), strict like `append` — it raises `ReducerError` if any element of the update isn't itself a list. +`concat_flatten` folds the per-instance lists into one flat list (`[*prior, *(item for sublist in update for item in sublist)]`), strict like `append`: it raises `ReducerError` if any element of the update isn't itself a list. The dict-shaped analog is `merge_all` (also proposal 0036): when each fan-out instance contributes a `dict[str, X]`, the parent's `target_field` receives `list[dict]`, which plain `merge` can't consume. `merge_all` folds the sequence of mappings into the prior with shallow last-write-wins per key: @@ -1446,9 +1446,9 @@ class PipelineState(State): keyed_results: Annotated[dict[str, Result], merge_all] = Field(default_factory=dict) ``` -Single-record-per-instance fan-outs (`collect_field: str`, parent field `Annotated[list[X], append]`) don't hit this — the engine still wraps each instance's value as one element, but `append` flattens it correctly since each element is already an `X`. The two non-flat shapes emerge only when the per-instance value is itself a container: a `list[X]` per instance lands `list[list[X]]` (use `concat_flatten`), and a `dict[str, X]` per instance lands `list[dict]` (use `merge_all`). +Single-record-per-instance fan-outs (`collect_field: str`, parent field `Annotated[list[X], append]`) don't hit this; the engine still wraps each instance's value as one element, but `append` flattens it correctly since each element is already an `X`. The two non-flat shapes emerge only when the per-instance value is itself a container: a `list[X]` per instance lands `list[list[X]]` (use `concat_flatten`), and a `dict[str, X]` per instance lands `list[dict]` (use `merge_all`). -If a parent field is populated by BOTH direct node writes AND fan-out collection, that's an architectural ambiguity worth fixing upstream — split into two fields, or pick one path. +If a parent field is populated by BOTH direct node writes AND fan-out collection, that's an architectural ambiguity worth fixing upstream: split into two fields, or pick one path. ## Example index diff --git a/src/openarmature/_patterns/bypass-if-output-exists.md b/src/openarmature/_patterns/bypass-if-output-exists.md index 6b0f215..f371016 100644 --- a/src/openarmature/_patterns/bypass-if-output-exists.md +++ b/src/openarmature/_patterns/bypass-if-output-exists.md @@ -81,7 +81,7 @@ per-fan-out-instance, depending on the scope of the bypass. addressable output, downloading a file). - The "does it exist" check is cheap (a filesystem `stat`, a Redis `EXISTS`, a database key lookup). -- You're OK with the node being skipped silently — the partial +- You're OK with the node being skipped silently; the partial update returned by the middleware is indistinguishable from a successful node run. @@ -91,7 +91,7 @@ per-fan-out-instance, depending on the scope of the bypass. the node. The cost model inverts; the pattern is wrong. - You need to *force* re-execution on demand (cache invalidation). Add a `force_rerun: bool` field on state that the middleware - consults — but if you're doing that often, the bypass logic + consults. But if you're doing that often, the bypass logic belongs in the node itself, gated on a state field, not in middleware. - The cached output's freshness depends on inputs the middleware @@ -101,10 +101,10 @@ per-fan-out-instance, depending on the scope of the bypass. ## Cross-references -- [Middleware](https://openarmature.ai/concepts/middleware/) — middleware shape, the +- [Middleware](https://openarmature.ai/concepts/middleware/): middleware shape, the four registration sites, composition. - Spec: [pipeline-utilities](https://openarmature.org/capabilities/pipeline-utilities/) This pattern is explicitly called out in proposal 0008's *Alternatives considered* section as a userland recipe rather than -spec'd behavior — this page is its canonical home. +spec'd behavior; this page is its canonical home. diff --git a/src/openarmature/_patterns/parameterized-entry-point.md b/src/openarmature/_patterns/parameterized-entry-point.md index 7823479..81f44dc 100644 --- a/src/openarmature/_patterns/parameterized-entry-point.md +++ b/src/openarmature/_patterns/parameterized-entry-point.md @@ -11,7 +11,7 @@ execution should begin. The graph stays a single graph; what differs across runs is which branch the conditional edge takes. Combine with [checkpointing](https://openarmature.ai/concepts/checkpointing/) if you -want resume-style behavior — skip nodes whose work is already +want resume-style behavior: skip nodes whose work is already captured in state. ## Snippet @@ -78,7 +78,7 @@ fields the chosen branch needs) and the graph routes accordingly. - You have a few canonical entry points and the choice between them is data, not control flow. -- You want to skip work already done in a prior run — combine with +- You want to skip work already done in a prior run; combine with [checkpointing](https://openarmature.ai/concepts/checkpointing/) to pick up where you left off. - Your "different entry points" share state structure and most of @@ -90,7 +90,7 @@ fields the chosen branch needs) and the graph routes accordingly. it's a different compiled graph. Don't bend one graph into two; two graphs are easier to test and reason about. - The number of entry points grows unboundedly. Then you're - reimplementing routing — consider a higher-level dispatch layer + reimplementing routing; consider a higher-level dispatch layer that picks which graph to invoke. ## Cross-references diff --git a/src/openarmature/_patterns/session-as-checkpoint-resume.md b/src/openarmature/_patterns/session-as-checkpoint-resume.md index 84390bd..77c98ea 100644 --- a/src/openarmature/_patterns/session-as-checkpoint-resume.md +++ b/src/openarmature/_patterns/session-as-checkpoint-resume.md @@ -87,7 +87,7 @@ state and the session table holds the join keys. - Your application has long-lived sessions with multiple LLM turns and you want the prior state to be the starting point of the next turn. -- You're already running a checkpointer for crash resume — this +- You're already running a checkpointer for crash resume; this pattern is "use it more." - Cross-turn state has clean reducer semantics: `merge` for accumulating dicts, `append` for growing lists. @@ -97,7 +97,7 @@ state and the session table holds the join keys. - A session's "state" is bigger than fits comfortably in a single graph state shape. Split into multiple graphs and share an external store keyed by session. -- Turns are completely independent — there's no value in carrying +- Turns are completely independent; there's no value in carrying state across them. Then just run each turn as a fresh invoke. - The application already has its own state-management layer that conflicts with OA's frozen-state model. Use OA per-turn without @@ -105,10 +105,10 @@ state and the session table holds the join keys. ## Cross-references -- [Checkpointing](https://openarmature.ai/concepts/checkpointing/) — backend wiring, +- [Checkpointing](https://openarmature.ai/concepts/checkpointing/): backend wiring, `resume_invocation`, schema migration. -- [State and reducers](https://openarmature.ai/concepts/state-and-reducers/) — `merge` +- [State and reducers](https://openarmature.ai/concepts/state-and-reducers/): `merge` and `append` reducer strategies. -- [`examples/08-checkpointing-and-migration`](https://openarmature.ai/examples/08-checkpointing-and-migration/) — +- [`examples/08-checkpointing-and-migration`](https://openarmature.ai/examples/08-checkpointing-and-migration/): single-resume baseline. - Spec: [pipeline-utilities](https://openarmature.org/capabilities/pipeline-utilities/) diff --git a/src/openarmature/_patterns/tool-dispatch-as-node.md b/src/openarmature/_patterns/tool-dispatch-as-node.md index 6ed6bc5..7145a70 100644 --- a/src/openarmature/_patterns/tool-dispatch-as-node.md +++ b/src/openarmature/_patterns/tool-dispatch-as-node.md @@ -13,7 +13,7 @@ LLM node if the model wants more turns. The exit is the conditional edge routing to a `present` node (or `END`) when the assistant returns no `tool_calls`. -No "agent framework" abstraction — the loop is just a graph cycle +No "agent framework" abstraction; the loop is just a graph cycle on top of [`Tool`, `ToolCall`, `ToolMessage`](https://openarmature.ai/concepts/llms/). ## Snippet @@ -99,7 +99,7 @@ for malformed `ToolCall.arguments`, and trace output. - The model needs to call local Python functions and react to their results. -- The loop is bounded — either by `MAX_TURNS`, by the model +- The loop is bounded, either by `MAX_TURNS`, by the model signaling it's done, or by both. - Tool results are textual or JSON-serializable and fit cleanly into `ToolMessage.content`. @@ -120,11 +120,11 @@ for malformed `ToolCall.arguments`, and trace output. ## Cross-references -- [LLMs concept page](https://openarmature.ai/concepts/llms/) — `Tool`, `ToolCall`, +- [LLMs concept page](https://openarmature.ai/concepts/llms/): `Tool`, `ToolCall`, `ToolMessage` types and the `complete(messages, tools=...)` contract. -- [State and reducers](https://openarmature.ai/concepts/state-and-reducers/) — +- [State and reducers](https://openarmature.ai/concepts/state-and-reducers/): `append` reducer semantics. -- [`examples/09-tool-use`](https://openarmature.ai/examples/09-tool-use/) — runnable +- [`examples/09-tool-use`](https://openarmature.ai/examples/09-tool-use/): runnable reference implementation. - Spec: [llm-provider](https://openarmature.org/capabilities/llm-provider/)