LunarCommand · chris-colinsky · Jun 1, 2026 · Jun 1, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,10 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 
 ## [Unreleased]
 
+### Changed
+
+- **Docs sweep: stale references and em-dash normalization.** Fixed three definite stale references (`spec_version='0.26.0'` in the Langfuse example output now reads `'0.38.0'`; the dangling `v0.16.1` qualifier dropped from the parallel-branches concept page; `compiled.attach_observer` corrected to `graph.attach_observer` in `non-obvious-shapes.md` for variable-name consistency with the rest of the docs). Swept em dashes out of the user-facing docs (130 instances across 17 files) per the convention set during the patterns expansion. mkdocs strict build clean; no broken intra-docs links.
+
 ### Added
 
 - **vLLM production deployment notes.** `docs/model-providers/vllm.md` grows a "Production deployment" section covering the `VLLM_HTTP_TIMEOUT_KEEP_ALIVE` gotcha (vLLM's stock 5s uvicorn keep-alive lapses pooled OA-side httpx connections and surfaces as `ProviderUnavailable`; widen to roughly 300s), a systemd unit skeleton, and the three throughput knobs that interact with OA's shared connection pool (`--max-model-len`, `--max-num-seqs`, `--gpu-memory-utilization`). The existing "Tool calling" section grows a `--tool-call-parser` family table verified against vLLM's docs (Llama 3.x / Llama 4 / Mistral / Hermes / Qwen3 / DeepSeek V3 / GPT-OSS), plus explicit "not supported here" callouts for Anthropic / Gemini (proprietary cloud) and mainstream Gemma (no vLLM parser).

diff --git a/docs/agent/non-obvious-shapes.md b/docs/agent/non-obvious-shapes.md
diff --git a/docs/agent/tldr.md b/docs/agent/tldr.md
@@ -1,3 +1,3 @@
-OpenArmature is a workflow framework for LLM pipelines and tool-calling agents — typed state, compile-time topology checks, observability, and crash-safe checkpoints baked into a graph engine. The graph layer has no concept of LLMs or tools; the same primitives drive deterministic ETL pipelines and tool-calling agents alike. Nodes return partial updates; the engine merges into a frozen state snapshot. Behavior is defined by [openarmature-spec](https://openarmature.org/capabilities/) and verified by conformance fixtures; this package is the reference Python implementation.
+OpenArmature is a workflow framework for LLM pipelines and tool-calling agents: typed state, compile-time topology checks, observability, and crash-safe checkpoints baked into a graph engine. The graph layer has no concept of LLMs or tools; the same primitives drive deterministic ETL pipelines and tool-calling agents alike. Nodes return partial updates; the engine merges into a frozen state snapshot. Behavior is defined by [openarmature-spec](https://openarmature.org/capabilities/) and verified by conformance fixtures; this package is the reference Python implementation.
 
 **What OpenArmature is NOT:** not a chat framework (no built-in messages channel), not an LLM SDK (Provider is the abstraction layer; OpenAIProvider is the canonical impl), not a state-management library (state is per-invocation, not application-wide), not an evaluation framework (deferred to `openarmature-eval`).
diff --git a/docs/concepts/checkpointing.md b/docs/concepts/checkpointing.md
@@ -117,7 +117,7 @@ Field framing worth getting right:
   per-instance entry carries an explicit `result_is_error` boolean
   that discriminates success contributions (roll forward into
   `target_field`) from `collect`-mode error contributions (roll
-  forward into `errors_field`) — the engine reads the explicit field
+  forward into `errors_field`). The engine reads the explicit field
   on resume rather than inferring routing from the shape of `result`.
   Empty tuple when no fan-outs are in flight. See
   [Resume semantics](fan-out.md#resume-semantics) on the fan-out
@@ -222,7 +222,7 @@ deserializes the result into your current state class.
 
 **Canonical source for `schema_version`.** The framework reads
 `schema_version` from the state class declared at graph construction
-time — the class passed to `GraphBuilder(...)`. If you pass a State
+time: the class passed to `GraphBuilder(...)`. If you pass a State
 subclass instance at runtime whose `schema_version` shadows the
 declared class's value, the saved record still carries the declared
 class's value. This rule keeps every save site within an invocation

diff --git a/docs/concepts/fan-out.md b/docs/concepts/fan-out.md
@@ -111,15 +111,15 @@ into prior state:
   declare `append` on `Annotated[list[X], append]`. Each instance's
   value is already an `X`; `append` concatenates cleanly.
 - Each instance emits a `list[X]` (0..N records per instance) → the
-  engine lands `list[list[X]]`. Declare `concat_flatten` instead —
+  engine lands `list[list[X]]`. Declare `concat_flatten` instead;
   it flattens one level so the parent field stays `list[X]`. Plain
   `append` would leave the nesting and fail Pydantic validation.
 - Each instance emits a `dict[str, X]` → the engine lands
   `list[dict]`. Declare `merge_all`, which folds the mappings into
   the parent dict with last-write-wins per key. Plain `merge` can't
   consume a `list[dict]`.
 
-`concat_flatten` and `merge_all` are strict — they raise
+`concat_flatten` and `merge_all` are strict: they raise
 `ReducerError` if an update element isn't the expected list/mapping
 shape. See [state and reducers](state-and-reducers.md#five-built-in-reducers).
 

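The fan-out hunk above describes `concat_flatten` and `merge_all` only in prose. A minimal sketch of the behavior it describes, assuming a plain `(current, update)` reducer signature, could look like the following. The functions and the `ReducerError` class here are local stand-ins written for illustration, not imports from OpenArmature, and the real signatures may differ.

```python
from collections.abc import Mapping


class ReducerError(TypeError):
    """Local stand-in for the error type named in the docs (assumption)."""


def concat_flatten(current: list, update: list) -> list:
    """Flatten one level: every element of `update` must itself be a list."""
    merged = list(current)
    for element in update:
        if not isinstance(element, list):
            raise ReducerError(f"expected list, got {type(element).__name__}")
        merged.extend(element)
    return merged


def merge_all(current: dict, update: list) -> dict:
    """Fold a list of mappings into one dict; the last write wins per key."""
    merged = dict(current)
    for element in update:
        if not isinstance(element, Mapping):
            raise ReducerError(f"expected mapping, got {type(element).__name__}")
        merged.update(element)
    return merged


# Three fan-out instances emitting 2, 0, and 1 records respectively.
flat = concat_flatten(["a"], [["b", "c"], [], ["d"]])

# Two instances emitting overlapping keys; later instances overwrite earlier ones.
folded = merge_all({"x": 1}, [{"x": 2, "y": 3}, {"y": 4}])


def is_strict(reducer, current, update) -> bool:
    """Return True when the reducer rejects a wrongly shaped update."""
    try:
        reducer(current, update)
    except ReducerError:
        return True
    return False
```

The strictness check is the part worth noticing: a bare `X` passed where a `list[X]` is expected is rejected rather than silently appended, which matches the "strict" wording in the hunk.
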
diff --git a/docs/concepts/llms.md b/docs/concepts/llms.md
@@ -99,42 +99,42 @@ async def startup() -> None:
     try:
         await provider.ready()
     except ProviderAuthentication:
-        # Bad API key — fail fast at boot.
+        # Bad API key: fail fast at boot.
         raise
     except ProviderInvalidModel:
-        # Bound model isn't served by this endpoint — same.
+        # Bound model isn't served by this endpoint: fail fast as well.
         raise
     except ProviderUnavailable:
-        # Endpoint is down or unreachable — fail fast too.
+        # Endpoint is down or unreachable: fail fast too.
         raise
 ```
 
 `OpenAIProvider` ships three probe shapes selected via the
 `readiness_probe` constructor kwarg:
 
-- **`"chat_completions"`** (default) — issues `POST /v1/chat/completions`
+- **`"chat_completions"`** (default): issues `POST /v1/chat/completions`
   with a `max_tokens=1` body. Actually exercises the inference wire
   path. Strongest signal at the cost of one prompt's worth of tokens
   on cloud endpoints.
-- **`"models"`** — issues `GET /v1/models` and verifies the bound
+- **`"models"`**: issues `GET /v1/models` and verifies the bound
   model appears in the catalog. Cheaper (no completion billing) but
   blind to proxy wire-mismatch cases: some OpenAI-compatible proxies
   (Bifrost is the motivating example) serve `/v1/models` correctly
   while 405'ing the completions endpoint, so a green catalog probe
   doesn't prove `complete()` will work.
-- **`"both"`** — runs the catalog probe first (cheap fail-fast on
+- **`"both"`**: runs the catalog probe first (cheap fail-fast on
   model-not-in-catalog with the cleaner `seen_ids` diagnostic), then
   the chat probe. Strongest signal at double the round-trip cost.
 
 ```python
-# Local server (LM Studio, vLLM, llama.cpp) — chat probe is free.
+# Local server (LM Studio, vLLM, llama.cpp): chat probe is free.
 provider = OpenAIProvider(
     base_url="http://localhost:8000",
     model="qwen2.5-coder",
     readiness_probe="chat_completions",  # default
 )
 
-# Cloud endpoint, cost-sensitive — opt back into the catalog-only probe.
+# Cloud endpoint, cost-sensitive: opt back into the catalog-only probe.
 provider = OpenAIProvider(
     base_url="https://api.openai.com",
     model="gpt-4o-mini",
@@ -342,14 +342,14 @@ shape.
 By default the model decides whether and which tools to call.
 `tool_choice` constrains that decision per call. Four modes:
 
-- `"auto"` — the model decides. Equivalent to omitting the parameter
+- `"auto"`: the model decides. Equivalent to omitting the parameter
   when `tools` is non-empty.
-- `"required"` — the model MUST call at least one tool. Useful for
+- `"required"`: the model MUST call at least one tool. Useful for
   routing nodes that branch on tool selection.
-- `"none"` — the model MUST NOT call tools, even if `tools` is
+- `"none"`: the model MUST NOT call tools, even if `tools` is
   supplied. Useful for guarded LLM calls or for explicitly disabling
   tool-calling without rebuilding a tools-less request.
-- `ForceTool(name=...)` — the model MUST call the named tool exactly.
+- `ForceTool(name=...)`: the model MUST call the named tool exactly.
 
 Pre-send validation catches the three failure modes (`required` with
 empty tools, `ForceTool` with empty tools, `ForceTool.name` not in
@@ -371,7 +371,7 @@ response = await provider.complete(
 )
 ```
 
-Not all providers honor `tool_choice` — confirm with your provider's
+Not all providers honor `tool_choice`; confirm with your provider's
 documentation. The `OpenAIProvider` maps the spec shape onto OpenAI's
 wire shape per the §8.1.1 mapping table. Whether the model actually
 honored the constraint is observable from the returned

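The `tool_choice` hunk above mentions pre-send validation but cuts off before the third failure mode is fully stated. The sketch below therefore covers only the two modes that are fully visible (`"required"` with empty tools, and `ForceTool` with empty tools). `ForceTool` is redefined locally as a hypothetical stand-in, and raising `ValueError` is an assumption; the library may use its own error type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForceTool:
    """Hypothetical stand-in for the `ForceTool(name=...)` shape in the docs."""

    name: str


def validate_tool_choice(tool_choice, tools: list) -> None:
    """Reject the two fully stated failure modes before any request is sent."""
    if tool_choice == "required" and not tools:
        raise ValueError('tool_choice="required" needs at least one tool')
    if isinstance(tool_choice, ForceTool) and not tools:
        raise ValueError("ForceTool needs at least one tool")


def rejects(tool_choice, tools: list) -> bool:
    """Return True when validation raises for this combination."""
    try:
        validate_tool_choice(tool_choice, tools)
    except ValueError:
        return True
    return False


# Tools are represented as bare names here purely for illustration.
tools = ["search", "calculator"]
outcomes = {
    "required_without_tools": rejects("required", []),
    "force_without_tools": rejects(ForceTool(name="search"), []),
    "required_with_tools": rejects("required", tools),
    "none_without_tools": rejects("none", []),
}
```

Failing before the request leaves the process keeps the error local and cheap, instead of surfacing as a provider-side rejection.
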
diff --git a/docs/concepts/observability.md b/docs/concepts/observability.md
@@ -244,16 +244,16 @@ or missing log entries.
 ### Bounded drain (optional timeout)
 
 `drain()` accepts an optional `timeout` parameter (non-negative
-seconds) — `await compiled.drain(timeout=5.0)` bounds the wait at five
+seconds): `await compiled.drain(timeout=5.0)` bounds the wait at five
 seconds. When the deadline fires, in-flight workers are cancelled
-cleanly so the compiled graph stays usable for subsequent invocations
-— partial delivery state from one drain does NOT leak into the next.
+cleanly so the compiled graph stays usable for subsequent invocations;
+partial delivery state from one drain does NOT leak into the next.
 
 The returned `DrainSummary` carries:
 
-- `timeout_reached: bool` — `True` only when the timeout actually
+- `timeout_reached: bool`: `True` only when the timeout actually
   fired. A drain that finishes before the deadline reports `False`.
-- `undelivered_count: int` — events dispatched but not fully delivered
+- `undelivered_count: int`: events dispatched but not fully delivered
   to every subscribed observer before the deadline. Always `0` when
   `timeout_reached is False`.
 
@@ -305,8 +305,8 @@ the IDs explicitly.
 ## Caller-supplied invocation metadata
 
 `correlation_id` is one string; if you also need to attach
-business-domain identifiers — tenant IDs, request IDs, feature
-flags, A/B cohort labels — pass them as a structured mapping at
+business-domain identifiers (tenant IDs, request IDs, feature
+flags, A/B cohort labels), pass them as a structured mapping at
 `invoke()` time:
 
 ```python
@@ -324,7 +324,7 @@ await compiled.invoke(
 Every observability backend picks the entries up:
 
 - **OTel** emits each entry as an `openarmature.user.<key>`
-  cross-cutting span attribute on every span — invocation, node,
+  cross-cutting span attribute on every span: invocation, node,
   subgraph wrapper, fan-out instance, LLM provider, retry attempt.
   Backends that consume OTel attributes (Phoenix / Arize, Honeycomb,
   Datadog APM, HyperDX, Grafana Tempo, custom collectors) see them
@@ -345,7 +345,7 @@ Two rules:
   `int`, `float`, `bool`) or homogeneous arrays of those types.
   `None`, nested objects, and mixed-type arrays are rejected.
 
-Violations raise `ValueError` synchronously — no spans emitted, no
+Violations raise `ValueError` synchronously: no spans emitted, no
 work runs.
 
 ### Adding entries mid-invocation
@@ -371,7 +371,7 @@ node's `started`, any LLM call inside) pick up the new entries.
 **Per-async-context scoping.** The metadata mapping lives in a
 `ContextVar`, which Python copies on async-task creation. Fan-out
 instances and parallel-branches each receive their own copy at
-dispatch time — an instance that calls `set_invocation_metadata`
+dispatch time; an instance that calls `set_invocation_metadata`
 does NOT leak its augmentation to sibling instances. This is the
 canonical pattern for per-instance identifiers:
 
@@ -472,7 +472,7 @@ Cross-vendor attribute names every LLM-aware backend reads
 (Langfuse, Phoenix, Honeycomb's LLM lens, OpenInference-aware
 tools). Emitted alongside the OA namespace:
 
-- `gen_ai.system` — `"openai"` by default; override per provider
+- `gen_ai.system`: `"openai"` by default; override per provider
   instance to `"vllm"` / `"lm_studio"` / `"llama_cpp"` / etc. when
   the OpenAI Chat Completions wire format is hitting a non-OpenAI
   endpoint:
@@ -485,16 +485,16 @@ tools). Emitted alongside the OA namespace:
   )
   ```
 
-- `gen_ai.request.model` / `gen_ai.response.model` — the bound
+- `gen_ai.request.model` / `gen_ai.response.model`: the bound
   model and (when the provider returns one) the more-specific
   identifier in the response body.
 - `gen_ai.request.temperature` / `max_tokens` / `top_p` / `seed` /
-  `frequency_penalty` / `presence_penalty` / `stop_sequences` —
+  `frequency_penalty` / `presence_penalty` / `stop_sequences`:
   only emitted for fields the caller actually set; absence on
   the span means "not supplied," distinct from a zero value.
-- `gen_ai.usage.input_tokens` / `output_tokens` — token counts.
-- `gen_ai.response.finish_reasons` — single-element string array.
-- `gen_ai.response.id` — when the provider returns one.
+- `gen_ai.usage.input_tokens` / `output_tokens`: token counts.
+- `gen_ai.response.finish_reasons`: single-element string array.
+- `gen_ai.response.id`: when the provider returns one.
 
 Disable the GenAI semconv set with `OTelObserver(disable_genai_semconv=True)`
 when an external auto-instrumentation library (OpenInference,
@@ -515,12 +515,12 @@ observer = OTelObserver(
 
 This surfaces three attributes:
 
-- `openarmature.llm.input.messages` — JSON-encoded message array
+- `openarmature.llm.input.messages`: JSON-encoded message array
   (the spec §3 message shape: `{role, content, tool_calls?, …}`).
-- `openarmature.llm.output.content` — the assistant's response
+- `openarmature.llm.output.content`: the assistant's response
   content string verbatim. Omitted for tool-call-only responses
   with empty content.
-- `openarmature.llm.request.extras` — JSON-encoded `RuntimeConfig`
+- `openarmature.llm.request.extras`: JSON-encoded `RuntimeConfig`
   extras bag (provider-specific pass-through fields like
   `repetition_penalty` for vLLM, or `top_k` for HuggingFace
   endpoints). Omitted when empty.
@@ -543,7 +543,7 @@ that fits within `cap - len(marker)` bytes followed by the marker:
 ```
 
 where M is the pre-truncation byte length. The marker is appended
-outside any JSON encoding — a truncated attribute is *not* parseable
+outside any JSON encoding, so a truncated attribute is *not* parseable
 JSON, which is the clean signal backend code can use to detect
 truncation without a separate flag.
 
@@ -563,7 +563,7 @@ provider, *before* the payload reaches the observer:
 
 The `media_type` and `detail` fields are preserved at the image-block
 level (per llm-provider §3.1.2); only `source` is replaced. URL-form
-images pass through unchanged — the URL is a short string and is
+images pass through unchanged: the URL is a short string and is
 informative for trace readers.
 
 Redaction is **not** gated by `disable_llm_payload` and is **not**
@@ -626,7 +626,7 @@ observer = OTelObserver(
 ```
 
 Each enricher receives the live `Span` plus the `NodeEvent` that
-triggered the close (or `None` on synthetic close sites — subgraph
+triggered the close (or `None` on synthetic close sites: subgraph
 dispatch, detached root, fan-out instance, invocation span,
 shutdown drain). Setting attributes inside this hook works
 correctly; doing it from a `SpanProcessor.on_end` callback does
@@ -668,9 +668,9 @@ full pattern.
 
 `OTelObserver.shutdown()` calls `provider.shutdown()` on the private
 `TracerProvider`, which per OTel SDK contract flushes every
-registered span processor. Under unusual teardown orderings — for
+registered span processor. Under unusual teardown orderings (for
 example, FastAPI's `TestClient` teardown that closes the event loop
-before a `BatchSpanProcessor`'s export thread finishes — spans can
+before a `BatchSpanProcessor`'s export thread finishes), spans can
 appear dropped. Two workarounds:
 
 - Call `observer._provider.force_flush(timeout_millis=...)`
@@ -682,7 +682,7 @@ appear dropped. Two workarounds:
 ## Langfuse mapping (opt-in)
 
 A second sibling observer maps the same `NodeEvent` stream onto
-Langfuse's native Trace + Observation data model — Traces at the
+Langfuse's native Trace + Observation data model: Traces at the
 top, Span observations for graph nodes, Generation observations for
 LLM calls. Use it instead of (or alongside) the OTel observer when
 your trace UI is Langfuse and you want first-class Generation
@@ -699,7 +699,7 @@ observer = LangfuseObserver(client=client)
 graph.attach_observer(observer)
 ```
 
-The `client` is anything matching the `LangfuseClient` Protocol —
+The `client` is anything matching the `LangfuseClient` Protocol:
 the bundled `InMemoryLangfuseClient` (used by the conformance
 harness, useful for unit tests), or a real `langfuse.Langfuse()`
 instance wrapped in `LangfuseSDKAdapter` for production. Install
@@ -749,7 +749,7 @@ for a runnable demo.
     matching the `LangfuseClient` Protocol's four methods.
 
     A runtime `isinstance(adapter, LangfuseClient)` check ships in
-    the unit suite — if a future v4 patch breaks the Protocol's
+    the unit suite, so if a future v4 patch breaks the Protocol's
     surface, the test fails loudly.
 
 ### What Langfuse sees
@@ -772,7 +772,7 @@ for a runnable demo.
 
 ### Payload + truncation
 
-`disable_llm_payload` mirrors the OTel observer's flag — defaults
+`disable_llm_payload` mirrors the OTel observer's flag and defaults
 to `True` for the same privacy reason. Flip to `False` to populate
 `generation.input` / `output` / `metadata.request_extras` from the
 LLM event payload.
@@ -804,7 +804,7 @@ the Generation observation links to that entity natively (spec
 
 The two observers are independent §6 event consumers and can be
 attached together. They share the `correlation_id` as the
-cross-backend join key — find a slow Generation in Langfuse, search
+cross-backend join key: find a slow Generation in Langfuse, search
 for its `correlation_id` in OTel logs, see the surrounding
 infrastructure activity.
 

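The per-async-context scoping paragraph in the observability diff above relies on standard `contextvars` behavior: each asyncio task runs in a copy of the context it was created from. The sketch below demonstrates that with the standard library only. The `set_metadata` helper and the `_metadata` variable are hypothetical stand-ins, not OpenArmature's `set_invocation_metadata`.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for the framework's metadata storage.
_metadata: ContextVar[dict] = ContextVar("metadata", default={})


def set_metadata(**entries) -> None:
    # Rebind to a new dict instead of mutating in place; an in-place
    # mutation would be visible through every context sharing the object.
    _metadata.set({**_metadata.get(), **entries})


async def instance(item_id: str) -> dict:
    # Each task augments its own copy of the context.
    set_metadata(item_id=item_id)
    await asyncio.sleep(0)  # yield so sibling tasks interleave
    return _metadata.get()


async def main():
    set_metadata(tenant="acme")
    tasks = [asyncio.create_task(instance(i)) for i in ("a", "b")]
    results = await asyncio.gather(*tasks)
    # The parent context never sees the per-instance entries.
    return results, _metadata.get()


results, parent = asyncio.run(main())
```

Each task inherits `tenant` from the parent but keeps its own `item_id`, and the parent's mapping is unchanged after the tasks finish.
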
diff --git a/docs/concepts/parallel-branches.md b/docs/concepts/parallel-branches.md
@@ -105,7 +105,7 @@ wrap that branch's whole subgraph invocation as a unit. Retry
 middleware on a branch retries the **whole branch**: a fresh
 subgraph invocation each time, fresh inner-node execution. The
 wrapping retry's attempt counter propagates to events emitted from
-inner nodes (per graph-engine §6 v0.16.1), so observer events
+inner nodes (per graph-engine §6), so observer events
 inside the branch correctly show `attempt_index` ticking across
 retries.