LunarCommand · chris-colinsky · May 23, 2026 · May 23, 2026 · May 23, 2026 · May 23, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,39 @@ All notable changes to `openarmature-python` are documented in this file.
 
 The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The package follows [Semantic Versioning](https://semver.org/); pre-1.0 minor bumps may carry behavioral changes per [spec governance](https://github.com/LunarCommand/openarmature-spec/blob/main/GOVERNANCE.md).
 
+## [Unreleased]
+
+LLM-provider span payload and GenAI semconv release. Pinned spec
+jumps from v0.16.1 to v0.17.0 (proposal 0024 / observability §5.5
+expansion). The trigger was a friction report from a downstream
+agent integrating OA with Langfuse over OTLP: LLM spans rendered
+"naked" (model + tokens only), prompt linkage silently dropped at
+the dispatch-worker task boundary, and every backend needed a
+per-service attribute-mapping shim. This release clears all eight
+items in that report.
+
+### Added
+
+- **`openarmature.llm.input.messages` / `openarmature.llm.output.content` / `openarmature.llm.request.extras` span attributes (spec §5.5.1).** When the OTel observer is constructed with `disable_llm_payload=False`, LLM spans carry the messages sent, the assistant response content, and the `RuntimeConfig` extras bag — JSON-encoded with sorted keys, no insignificant whitespace, UTF-8. Default-off (the flag is `disable_llm_payload: bool = True`) because the payload may contain PII the user hasn't audited; opt in deliberately. Subject to the §5.5.5 truncation contract.
+- **GenAI semantic-conventions attributes (spec §5.5.2 + §5.5.3).** LLM spans now carry `gen_ai.system`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons` (single-element string array), `gen_ai.response.id`, and per-set `gen_ai.request.{temperature,max_tokens,top_p,seed}` (only set fields — absence is meaningful per §5.5.2). The existing `openarmature.llm.*` attribute set is preserved alongside; both namespaces emit. Default-on (`disable_genai_semconv: bool = False`); opt out when an external auto-instrumentation library (OpenInference, opentelemetry-instrumentation-openai, etc.) is the canonical source of GenAI attributes for your stack.
+- **`OTelObserver(resource=...)` constructor argument.** Optional `opentelemetry.sdk.resources.Resource` passed to the private `TracerProvider`. Lets callers set `service.name` / `service.version` directly rather than via `OTEL_SERVICE_NAME` / `OTEL_RESOURCE_ATTRIBUTES` environment variables (which had to be set BEFORE constructing the observer to take effect — a footgun the explicit kwarg avoids).
+- **Multi-processor support on `OTelObserver`.** The `span_processor` constructor argument now accepts a `SpanProcessor | Sequence[SpanProcessor]`. Multi-destination export (e.g., HyperDX + Langfuse on one observer) becomes a one-line constructor call instead of a per-service `CompoundSpanProcessor` workaround.
+- **`OTelObserver(attribute_enrichers=...)` hook.** Sequence of `Callable[[Span, NodeEvent | None], None]` invoked just before the observer ends each span. Lets users add backend-specific attributes (custom `langfuse.*` keys, vendor span kinds, etc.) without subclassing or mutating `span._attributes` post-`on_end`. The event is `None` on synthetic close sites (subgraph dispatch, detached root, fan-out instance, invocation span, shutdown drain); enrichers that need per-event context short-circuit on `None`. Exceptions are caught and warned, never propagated to the dispatch worker.
+- **`OTelObserver(payload_max_bytes=...)` truncation cap.** Per-attribute byte cap for the §5.5.1 payload attributes. Default 65,536 (64 KiB) per attribute; minimum 256 bytes (rejected at construction). The truncation algorithm (spec §5.5.5) emits the largest UTF-8 code-point-aligned prefix that fits within `cap - len(marker)` bytes followed by the marker `…[truncated, M bytes total]`. Inline image bytes are unconditionally redacted at the provider before any cap applies (see Image redaction below).
+- **`OpenAIProvider(genai_system="openai")` constructor argument.** Default `"openai"`; override for non-OpenAI endpoints that speak the OpenAI Chat Completions wire format (vLLM, LM Studio, llama.cpp, sglang). Surfaces as the `gen_ai.system` span attribute. No base-URL sniffing happens — the same host:port could be any of several servers, and a wrong inference is worse than the explicit opt-in.
+- **`openarmature.observability.LLM_NAMESPACE` and `openarmature.observability.LlmEventPayload` public exports.** The `("openarmature.llm.complete",)` sentinel namespace used by the LLM-provider hook and the payload shape backend observers consume. Third-party `Provider` implementations can dispatch their own LLM events via `current_dispatch()(NodeEvent(..., namespace=LLM_NAMESPACE, pre_state=LlmEventPayload(...)))`; custom observers can recognize the same sentinel and read attributes off the payload. Previously private (`_LLM_NAMESPACE`, `_LlmEventState`); the old underscore-prefixed names are no longer exported.
+- **`Response.response_id` and `Response.response_model` typed fields.** Mirror the wire response's `id` and `model` fields when the provider returns them. Surface as `gen_ai.response.id` and `gen_ai.response.model` per spec §5.5.3; also useful for downstream cross-referencing with provider-side billing or audit logs without reaching into `Response.raw`.
+
+### Changed
+
+- **Prompt-context attribute propagation now survives the dispatch-worker task boundary.** Previously the OTel observer read `current_prompt_result()` / `current_prompt_group()` from inside `_handle_llm_event`, which runs in the engine's delivery-worker task. `asyncio.create_task(deliver_loop(queue))` snapshots the current Context at task creation, before any node body runs — so the ContextVars set by `with_active_prompt(...)` were never visible to the worker. `openarmature.prompt.*` attributes silently went missing on the LLM span. Fixed by capturing both ContextVars at dispatch time inside the `OpenAIProvider.complete()` call (which runs in the node task, where `with_active_prompt` IS active) and threading the snapshots through the `LlmEventPayload`. The observer reads from the payload, not the ContextVar.
+- **Inline image bytes are redacted at the provider, not the observer.** Image content blocks with `ImageSourceInline` are serialized with `source` replaced by `{type: "inline_redacted", byte_count: N}` per §5.5.5 *before* the payload reaches the observability dispatch queue. Defense-in-depth: bytes never leave the provider in event form, so custom observers subscribing to the LLM event (enabled by `LlmEventPayload` being public) cannot accidentally leak raw image bytes regardless of their implementation. `media_type` and `detail` are preserved at the image-block level per llm-provider §3.1.2. URL-form images pass through unchanged.
+- **`OTelObserver.shutdown()` docstring documents the `BatchSpanProcessor` flush gotcha.** Under fast or unusual teardown orderings (e.g., FastAPI TestClient teardown that closes the event loop before the batch processor's export thread finishes), spans can appear dropped. Documented workarounds: call `provider.force_flush(timeout_millis=…)` explicitly before `shutdown()`, or use `SimpleSpanProcessor` in tests.
+
+### Notes
+
+- **Pinned spec version bumped to v0.17.0.** Per the additive-only governance rule (proposal 0024 adds; never renames), implementations passing v0.16.1 conformance fixtures continue to pass under v0.17.0; the new fixtures (012-021) add cases without modifying existing ones.
+
 ## [0.7.0] — 2026-05-23
 
 Docs-and-examples release. Pinned spec stays at v0.16.1; no

diff --git a/README.md b/README.md
@@ -53,6 +53,9 @@ The engine awaits each save before advancing. A crash immediately after a `compl
 **Observability that doesn't double-export.**<br>
 The OpenTelemetry mapping mandates a private `TracerProvider`. That prevents the trap where global-provider auto-instrumentation libraries (OpenInference, Langfuse v3, etc.) emit duplicate spans alongside the framework's. Your spans flow exactly where you point them; no surprise fan-out to vendor backends you didn't configure.
 
+**LLM spans LLM-aware backends can actually read.**<br>
+Each `provider.complete()` call emits a dedicated `openarmature.llm.complete` span carrying both the framework's `openarmature.llm.*` attributes and the cross-vendor OpenTelemetry GenAI semantic conventions (`gen_ai.system`, `gen_ai.request.*`, `gen_ai.response.*`, `gen_ai.usage.*`). Langfuse, Phoenix, Honeycomb's LLM lens — they render generations correctly out of the box, no per-service attribute-mapping shim required. Input/output payload emission is opt-in (`disable_llm_payload=False`), default-off because the payload may contain PII; image bytes are unconditionally redacted at the provider so they never enter the observability stream.
+
 ## Hello World
 
 About a hundred lines that show the engine in action. Three reducer policies declared on one state class. Three LLM calls each returning typed structured output (Pydantic class on two, raw JSON Schema dict on the third). Conditional routing as a pure function of state, not a hidden state machine. An observer attached at compile time that sees every node boundary the engine emits. Requires Python 3.12 or later and an OpenAI-compatible endpoint (defaults to OpenAI public API; works against any local server too).

diff --git a/docs/concepts/observability.md b/docs/concepts/observability.md
@@ -333,3 +333,226 @@ join semantics survive even when trace boundaries don't.
 The non-detached default is what you want most of the time: one
 trace per outermost invocation, with subgraphs and fan-out instances
 as nested spans.
+
+### LLM provider spans
+
+When an `OpenAIProvider` (or any [custom Provider](../model-providers/authoring.md)
+that wires the dispatch hook) is used inside a graph with `OTelObserver`
+attached, each `provider.complete()` call emits a dedicated span named
+`openarmature.llm.complete`, parented under the calling node's span.
+The span carries two attribute families.
+
+**`openarmature.llm.*` (always on).** The framework's canonical
+namespace: model identifier, finish reason, token counts, prompt
+identity from `with_active_prompt(...)`, error category on failure.
+Set unconditionally whenever the LLM span itself emits.
+
+**`gen_ai.*` (OpenTelemetry GenAI semantic conventions, default on).**
+Cross-vendor attribute names every LLM-aware backend reads
+(Langfuse, Phoenix, Honeycomb's LLM lens, OpenInference-aware
+tools). Emitted alongside the OA namespace:
+
+- `gen_ai.system` — `"openai"` by default; override per provider
+  instance to `"vllm"` / `"lm_studio"` / `"llama_cpp"` / etc. when
+  the OpenAI Chat Completions wire format is hitting a non-OpenAI
+  endpoint:
+
+  ```python
+  provider = OpenAIProvider(
+      base_url="http://vllm.internal:8000",
+      model="meta-llama/Llama-3-8B-Instruct",
+      genai_system="vllm",
+  )
+  ```
+
+- `gen_ai.request.model` / `gen_ai.response.model` — the bound
+  model and (when the provider returns one) the more-specific
+  identifier in the response body.
+- `gen_ai.request.temperature` / `max_tokens` / `top_p` / `seed`
+  — only emitted for fields the caller actually set; absence on
+  the span means "not supplied," distinct from a zero value.
+- `gen_ai.usage.input_tokens` / `output_tokens` — token counts.
+- `gen_ai.response.finish_reasons` — single-element string array.
+- `gen_ai.response.id` — when the provider returns one.
+
+Disable the GenAI semconv set with `OTelObserver(disable_genai_semconv=True)`
+when an external auto-instrumentation library (OpenInference,
+`opentelemetry-instrumentation-openai`) is already the canonical
+source on your stack.
+
+### LLM payload attributes
+
+By default, LLM spans do **not** carry the messages sent or the
+response content. Opt in with `disable_llm_payload=False`:
+
+```python
+observer = OTelObserver(
+    span_processor=SimpleSpanProcessor(exporter),
+    disable_llm_payload=False,
+)
+```
+
+This surfaces three attributes:
+
+- `openarmature.llm.input.messages` — JSON-encoded message array
+  (the spec §3 message shape: `{role, content, tool_calls?, …}`).
+- `openarmature.llm.output.content` — the assistant's response
+  content string verbatim. Omitted for tool-call-only responses
+  with empty content.
+- `openarmature.llm.request.extras` — JSON-encoded `RuntimeConfig`
+  extras bag (provider-specific pass-through fields like
+  `frequency_penalty`). Omitted when empty.
+
+**Default-off is deliberate.** The payload may contain PII the user
+hasn't audited; opting in is a separate decision from opting into
+observability. The flag name keeps symmetry with `disable_llm_spans`:
+the default value (`True`) reads as "the observer disables payload
+emission by default."
+
+#### Truncation
+
+Each payload attribute is capped at `payload_max_bytes` UTF-8 bytes
+(default 64 KiB, minimum 256). When the serialized value exceeds the
+cap, the observer emits the largest UTF-8-code-point-aligned prefix
+that fits within `cap - len(marker)` bytes followed by the marker:
+
+```
+…[truncated, M bytes total]
+```
+
+where M is the pre-truncation byte length. The marker is appended
+outside any JSON encoding — a truncated attribute is *not* parseable
+JSON, which is the clean signal backend code can use to detect
+truncation without a separate flag.
+
+#### Inline image redaction (always on)
+
+Image content blocks with `ImageSourceInline` are redacted at the
+provider, *before* the payload reaches the observer:
+
+```json
+{
+  "type": "image",
+  "source": {"type": "inline_redacted", "byte_count": 4096},
+  "media_type": "image/png",
+  "detail": "auto"
+}
+```
+
+The `media_type` and `detail` fields are preserved at the image-block
+level (per llm-provider §3.1.2); only `source` is replaced. URL-form
+images pass through unchanged — the URL is a short string and is
+informative for trace readers.
+
+Redaction is **not** gated by `disable_llm_payload` and is **not**
+configurable. Inline image bytes never leave the provider in event
+form, so custom observers consuming
+[`LlmEventPayload`](#publishing-llm-events-for-custom-observers)
+cannot accidentally leak raw bytes regardless of how they're
+written.
+
+### Identifying the service: `Resource`
+
+Pass an `opentelemetry.sdk.resources.Resource` to set
+`service.name` / `service.version` / etc. without relying on the
+`OTEL_SERVICE_NAME` / `OTEL_RESOURCE_ATTRIBUTES` environment
+variables (which had to be set *before* `OTelObserver()`
+construction to take effect):
+
+```python
+from opentelemetry.sdk.resources import Resource
+
+observer = OTelObserver(
+    span_processor=SimpleSpanProcessor(exporter),
+    resource=Resource.create({"service.name": "claims-pipeline"}),
+)
+```
+
+### Fanning out to multiple backends
+
+The `span_processor` argument accepts either a single processor or
+a sequence. Multi-destination export (HyperDX + Langfuse from one
+observer) is a one-line construct:
+
+```python
+observer = OTelObserver(
+    span_processor=[
+        BatchSpanProcessor(OTLPSpanExporter(endpoint=HYPERDX_URL)),
+        BatchSpanProcessor(OTLPSpanExporter(endpoint=LANGFUSE_URL)),
+    ],
+)
+```
+
+Every registered processor receives every span.
+
+### Adding backend-specific attributes: `attribute_enrichers`
+
+When a backend needs attributes the framework doesn't emit
+(custom `langfuse.observation.*` keys, Honeycomb derived fields,
+etc.), the `attribute_enrichers` hook fires just before every
+`span.end()` call:
+
+```python
+def langfuse_observation_kind(span, event):
+    if span.name == "openarmature.llm.complete":
+        span.set_attribute("langfuse.observation.type", "generation")
+
+observer = OTelObserver(
+    span_processor=processor,
+    attribute_enrichers=[langfuse_observation_kind],
+)
+```
+
+Each enricher receives the live `Span` plus the `NodeEvent` that
+triggered the close (or `None` on synthetic close sites — subgraph
+dispatch, detached root, fan-out instance, invocation span,
+shutdown drain). Setting attributes inside this hook works
+correctly; doing it from a `SpanProcessor.on_end` callback does
+not, because the framework has already called `span.end()` and the
+OTel SDK silently drops `set_attribute` on ended spans.
+
+Exceptions raised by an enricher are caught and warned, never
+propagated.
+
+### Publishing LLM events for custom observers
+
+`openarmature.observability.LLM_NAMESPACE` and
+`openarmature.observability.LlmEventPayload` are part of the public
+API. A custom observer subscribing to the dispatch stream can
+recognize the LLM-event sentinel namespace and read the typed
+payload directly:
+
+```python
+from openarmature.observability import LLM_NAMESPACE, LlmEventPayload
+
+async def my_llm_observer(event):
+    if event.namespace != LLM_NAMESPACE:
+        return
+    payload = event.pre_state
+    if not isinstance(payload, LlmEventPayload):
+        return
+    # payload.model, payload.input_messages (already image-redacted),
+    # payload.output_content, payload.request_params,
+    # payload.response_id, payload.active_prompt, ...
+```
+
+A custom `Provider` that wants to participate in the same span
+emission protocol dispatches `NodeEvent(namespace=LLM_NAMESPACE,
+pre_state=LlmEventPayload(...))` via `current_dispatch()`. See
+[Authoring providers](../model-providers/authoring.md) for the
+full pattern.
+
+### Flushing under fast teardown
+
+`OTelObserver.shutdown()` calls `provider.shutdown()` on the private
+`TracerProvider`, which per OTel SDK contract flushes every
+registered span processor. Under unusual teardown orderings — for
+example, FastAPI's `TestClient` teardown that closes the event loop
+before a `BatchSpanProcessor`'s export thread finishes — spans can
+appear dropped. Two workarounds:
+
+- Call `observer._provider.force_flush(timeout_millis=...)`
+  explicitly before `shutdown()`.
+- Use `SimpleSpanProcessor` instead of `BatchSpanProcessor` in
+  tests; it exports synchronously and is unaffected by teardown
+  timing.