Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 33 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,39 @@ All notable changes to `openarmature-python` are documented in this file.

The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The package follows [Semantic Versioning](https://semver.org/); pre-1.0 minor bumps may carry behavioral changes per [spec governance](https://github.com/LunarCommand/openarmature-spec/blob/main/GOVERNANCE.md).

## [Unreleased]

LLM-provider span payload and GenAI semconv release. Pinned spec
jumps from v0.16.1 to v0.17.0 (proposal 0024 / observability §5.5
expansion). The trigger was a friction report from a downstream
agent integrating OA with Langfuse over OTLP: LLM spans rendered
"naked" (model + tokens only), prompt linkage silently dropped at
the dispatch-worker task boundary, and every backend needed a
per-service attribute-mapping shim. This release clears all eight
items in that report.

### Added

- **`openarmature.llm.input.messages` / `openarmature.llm.output.content` / `openarmature.llm.request.extras` span attributes (spec §5.5.1).** When the OTel observer is constructed with `disable_llm_payload=False`, LLM spans carry the messages sent, the assistant response content, and the `RuntimeConfig` extras bag — JSON-encoded with sorted keys, no insignificant whitespace, UTF-8. Default-off (the flag is `disable_llm_payload: bool = True`) because the payload may contain PII the user hasn't audited; opt in deliberately. Subject to the §5.5.5 truncation contract.
- **GenAI semantic-conventions attributes (spec §5.5.2 + §5.5.3).** LLM spans now carry `gen_ai.system`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons` (single-element string array), `gen_ai.response.id`, and per-set `gen_ai.request.{temperature,max_tokens,top_p,seed}` (only set fields — absence is meaningful per §5.5.2). The existing `openarmature.llm.*` attribute set is preserved alongside; both namespaces emit. Default-on (`disable_genai_semconv: bool = False`); opt out when an external auto-instrumentation library (OpenInference, opentelemetry-instrumentation-openai, etc.) is the canonical source of GenAI attributes for your stack.
- **`OTelObserver(resource=...)` constructor argument.** Optional `opentelemetry.sdk.resources.Resource` passed to the private `TracerProvider`. Lets callers set `service.name` / `service.version` directly rather than via `OTEL_SERVICE_NAME` / `OTEL_RESOURCE_ATTRIBUTES` environment variables (which had to be set BEFORE constructing the observer to take effect — a footgun the explicit kwarg avoids).
- **Multi-processor support on `OTelObserver`.** The `span_processor` constructor argument now accepts a `SpanProcessor | Sequence[SpanProcessor]`. Multi-destination export (e.g., HyperDX + Langfuse on one observer) becomes a one-line constructor call instead of a per-service `CompoundSpanProcessor` workaround.
- **`OTelObserver(attribute_enrichers=...)` hook.** Sequence of `Callable[[Span, NodeEvent | None], None]` invoked just before the observer ends each span. Lets users add backend-specific attributes (custom `langfuse.*` keys, vendor span kinds, etc.) without subclassing or mutating `span._attributes` post-`on_end`. The event is `None` on synthetic close sites (subgraph dispatch, detached root, fan-out instance, invocation span, shutdown drain); enrichers that need per-event context short-circuit on `None`. Exceptions are caught and warned, never propagated to the dispatch worker.
- **`OTelObserver(payload_max_bytes=...)` truncation cap.** Per-attribute byte cap for the §5.5.1 payload attributes. Default 65,536 (64 KiB) per attribute; minimum 256 bytes (rejected at construction). The truncation algorithm (spec §5.5.5) emits the largest UTF-8 code-point-aligned prefix that fits within `cap - len(marker)` bytes followed by the marker `…[truncated, M bytes total]`. Inline image bytes are unconditionally redacted at the provider before any cap applies (see Image redaction below).
- **`OpenAIProvider(genai_system="openai")` constructor argument.** Default `"openai"`; override for non-OpenAI endpoints that speak the OpenAI Chat Completions wire format (vLLM, LM Studio, llama.cpp, sglang). Surfaces as the `gen_ai.system` span attribute. No base-URL sniffing happens — the same host:port could be any of several servers, and a wrong inference is worse than the explicit opt-in.
- **`openarmature.observability.LLM_NAMESPACE` and `openarmature.observability.LlmEventPayload` public exports.** The `("openarmature.llm.complete",)` sentinel namespace used by the LLM-provider hook and the payload shape backend observers consume. Third-party `Provider` implementations can dispatch their own LLM events via `current_dispatch()(NodeEvent(..., namespace=LLM_NAMESPACE, pre_state=LlmEventPayload(...)))`; custom observers can recognize the same sentinel and read attributes off the payload. Previously private (`_LLM_NAMESPACE`, `_LlmEventState`); the old underscore-prefixed names are no longer exported.
- **`Response.response_id` and `Response.response_model` typed fields.** Mirror the wire response's `id` and `model` fields when the provider returns them. Surface as `gen_ai.response.id` and `gen_ai.response.model` per spec §5.5.3; also useful for downstream cross-referencing with provider-side billing or audit logs without reaching into `Response.raw`.

### Changed

- **Prompt-context attribute propagation now survives the dispatch-worker task boundary.** Previously the OTel observer read `current_prompt_result()` / `current_prompt_group()` from inside `_handle_llm_event`, which runs in the engine's delivery-worker task. `asyncio.create_task(deliver_loop(queue))` snapshots the current Context at task creation, before any node body runs — so the ContextVars set by `with_active_prompt(...)` were never visible to the worker. `openarmature.prompt.*` attributes silently went missing on the LLM span. Fixed by capturing both ContextVars at dispatch time inside the `OpenAIProvider.complete()` call (which runs in the node task, where `with_active_prompt` IS active) and threading the snapshots through the `LlmEventPayload`. The observer reads from the payload, not the ContextVar.
- **Inline image bytes are redacted at the provider, not the observer.** Image content blocks with `ImageSourceInline` are serialized with `source` replaced by `{type: "inline_redacted", byte_count: N}` per §5.5.5 *before* the payload reaches the observability dispatch queue. Defense-in-depth: bytes never leave the provider in event form, so custom observers subscribing to the LLM event (enabled by `LlmEventPayload` being public) cannot accidentally leak raw image bytes regardless of their implementation. `media_type` and `detail` are preserved at the image-block level per llm-provider §3.1.2. URL-form images pass through unchanged.
- **`OTelObserver.shutdown()` docstring documents the `BatchSpanProcessor` flush gotcha.** Under fast or unusual teardown orderings (e.g., FastAPI TestClient teardown that closes the event loop before the batch processor's export thread finishes), spans can appear dropped. Documented workarounds: call `provider.force_flush(timeout_millis=…)` explicitly before `shutdown()`, or use `SimpleSpanProcessor` in tests.

### Notes

- **Pinned spec version bumped to v0.17.0.** Per the additive-only governance rule (proposal 0024 adds; never renames), implementations passing v0.16.1 conformance fixtures continue to pass under v0.17.0; the new fixtures (012-021) add cases without modifying existing ones.

## [0.7.0] — 2026-05-23

Docs-and-examples release. Pinned spec stays at v0.16.1; no
Expand Down
3 changes: 3 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,9 @@ The engine awaits each save before advancing. A crash immediately after a `compl
**Observability that doesn't double-export.**<br>
The OpenTelemetry mapping mandates a private `TracerProvider`. That prevents the trap where global-provider auto-instrumentation libraries (OpenInference, Langfuse v3, etc.) emit duplicate spans alongside the framework's. Your spans flow exactly where you point them; no surprise fan-out to vendor backends you didn't configure.

**LLM spans LLM-aware backends can actually read.**<br>
Each `provider.complete()` call emits a dedicated `openarmature.llm.complete` span carrying both the framework's `openarmature.llm.*` attributes and the cross-vendor OpenTelemetry GenAI semantic conventions (`gen_ai.system`, `gen_ai.request.*`, `gen_ai.response.*`, `gen_ai.usage.*`). Langfuse, Phoenix, Honeycomb's LLM lens — they render generations correctly out of the box, no per-service attribute-mapping shim required. Input/output payload emission is opt-in (`disable_llm_payload=False`), default-off because the payload may contain PII; image bytes are unconditionally redacted at the provider so they never enter the observability stream.

## Hello World

About a hundred lines that show the engine in action. Three reducer policies declared on one state class. Three LLM calls each returning typed structured output (Pydantic class on two, raw JSON Schema dict on the third). Conditional routing as a pure function of state, not a hidden state machine. An observer attached at compile time that sees every node boundary the engine emits. Requires Python 3.12 or later and an OpenAI-compatible endpoint (defaults to OpenAI public API; works against any local server too).
Expand Down
223 changes: 223 additions & 0 deletions docs/concepts/observability.md
Original file line number Diff line number Diff line change
Expand Up @@ -333,3 +333,226 @@ join semantics survive even when trace boundaries don't.
The non-detached default is what you want most of the time: one
trace per outermost invocation, with subgraphs and fan-out instances
as nested spans.

### LLM provider spans

When an `OpenAIProvider` (or any [custom Provider](../model-providers/authoring.md)
that wires the dispatch hook) is used inside a graph with `OTelObserver`
attached, each `provider.complete()` call emits a dedicated span named
`openarmature.llm.complete`, parented under the calling node's span.
The span carries two attribute families.

**`openarmature.llm.*` (always on).** The framework's canonical
namespace: model identifier, finish reason, token counts, prompt
identity from `with_active_prompt(...)`, error category on failure.
Set unconditionally whenever the LLM span itself emits.

**`gen_ai.*` (OpenTelemetry GenAI semantic conventions, default on).**
Cross-vendor attribute names every LLM-aware backend reads
(Langfuse, Phoenix, Honeycomb's LLM lens, OpenInference-aware
tools). Emitted alongside the OA namespace:

- `gen_ai.system` — `"openai"` by default; override per provider
instance to `"vllm"` / `"lm_studio"` / `"llama_cpp"` / etc. when
the OpenAI Chat Completions wire format is hitting a non-OpenAI
endpoint:

```python
provider = OpenAIProvider(
base_url="http://vllm.internal:8000",
model="meta-llama/Llama-3-8B-Instruct",
genai_system="vllm",
)
```

- `gen_ai.request.model` / `gen_ai.response.model` — the bound
model and (when the provider returns one) the more-specific
identifier in the response body.
- `gen_ai.request.temperature` / `max_tokens` / `top_p` / `seed`
— only emitted for fields the caller actually set; absence on
the span means "not supplied," distinct from a zero value.
- `gen_ai.usage.input_tokens` / `output_tokens` — token counts.
- `gen_ai.response.finish_reasons` — single-element string array.
- `gen_ai.response.id` — when the provider returns one.

Disable the GenAI semconv set with `OTelObserver(disable_genai_semconv=True)`
when an external auto-instrumentation library (OpenInference,
`opentelemetry-instrumentation-openai`) is already the canonical
source on your stack.

### LLM payload attributes

By default, LLM spans do **not** carry the messages sent or the
response content. Opt in with `disable_llm_payload=False`:

```python
observer = OTelObserver(
span_processor=SimpleSpanProcessor(exporter),
disable_llm_payload=False,
)
```

This surfaces three attributes:

- `openarmature.llm.input.messages` — JSON-encoded message array
(the spec §3 message shape: `{role, content, tool_calls?, …}`).
- `openarmature.llm.output.content` — the assistant's response
content string verbatim. Omitted for tool-call-only responses
with empty content.
- `openarmature.llm.request.extras` — JSON-encoded `RuntimeConfig`
extras bag (provider-specific pass-through fields like
`frequency_penalty`). Omitted when empty.

**Default-off is deliberate.** The payload may contain PII the user
hasn't audited; opting in is a separate decision from opting into
observability. The flag name keeps symmetry with `disable_llm_spans`:
the default value (`True`) reads as "the observer disables payload
emission by default."

#### Truncation

Each payload attribute is capped at `payload_max_bytes` UTF-8 bytes
(default 64 KiB, minimum 256). When the serialized value exceeds the
cap, the observer emits the largest UTF-8-code-point-aligned prefix
that fits within `cap - len(marker)` bytes followed by the marker:

```
…[truncated, M bytes total]
```

where M is the pre-truncation byte length. The marker is appended
outside any JSON encoding — a truncated attribute is *not* parseable
JSON, which is the clean signal backend code can use to detect
truncation without a separate flag.

#### Inline image redaction (always on)

Image content blocks with `ImageSourceInline` are redacted at the
provider, *before* the payload reaches the observer:

```json
{
"type": "image",
"source": {"type": "inline_redacted", "byte_count": 4096},
"media_type": "image/png",
"detail": "auto"
}
```

The `media_type` and `detail` fields are preserved at the image-block
level (per llm-provider §3.1.2); only `source` is replaced. URL-form
images pass through unchanged — the URL is a short string and is
informative for trace readers.

Redaction is **not** gated by `disable_llm_payload` and is **not**
configurable. Inline image bytes never leave the provider in event
form, so custom observers consuming
[`LlmEventPayload`](#publishing-llm-events-for-custom-observers)
cannot accidentally leak raw bytes regardless of how they're
written.

### Identifying the service: `Resource`

Pass an `opentelemetry.sdk.resources.Resource` to set
`service.name` / `service.version` / etc. without relying on the
`OTEL_SERVICE_NAME` / `OTEL_RESOURCE_ATTRIBUTES` environment
variables (which had to be set *before* `OTelObserver()`
construction to take effect):

```python
from opentelemetry.sdk.resources import Resource

observer = OTelObserver(
span_processor=SimpleSpanProcessor(exporter),
resource=Resource.create({"service.name": "claims-pipeline"}),
)
```

### Fanning out to multiple backends

The `span_processor` argument accepts either a single processor or
a sequence. Multi-destination export (HyperDX + Langfuse from one
observer) is a one-line construct:

```python
observer = OTelObserver(
span_processor=[
BatchSpanProcessor(OTLPSpanExporter(endpoint=HYPERDX_URL)),
BatchSpanProcessor(OTLPSpanExporter(endpoint=LANGFUSE_URL)),
],
)
```

Every registered processor receives every span.

### Adding backend-specific attributes: `attribute_enrichers`

When a backend needs attributes the framework doesn't emit
(custom `langfuse.observation.*` keys, Honeycomb derived fields,
etc.), the `attribute_enrichers` hook fires just before every
`span.end()` call:

```python
def langfuse_observation_kind(span, event):
if span.name == "openarmature.llm.complete":
span.set_attribute("langfuse.observation.type", "generation")

observer = OTelObserver(
span_processor=processor,
attribute_enrichers=[langfuse_observation_kind],
)
```

Each enricher receives the live `Span` plus the `NodeEvent` that
triggered the close (or `None` on synthetic close sites — subgraph
dispatch, detached root, fan-out instance, invocation span,
shutdown drain). Setting attributes inside this hook works
correctly; doing it from a `SpanProcessor.on_end` callback does
not, because the framework has already called `span.end()` and the
OTel SDK silently drops `set_attribute` on ended spans.

Exceptions raised by an enricher are caught and warned, never
propagated.

### Publishing LLM events for custom observers

`openarmature.observability.LLM_NAMESPACE` and
`openarmature.observability.LlmEventPayload` are part of the public
API. A custom observer subscribing to the dispatch stream can
recognize the LLM-event sentinel namespace and read the typed
payload directly:

```python
from openarmature.observability import LLM_NAMESPACE, LlmEventPayload

async def my_llm_observer(event):
if event.namespace != LLM_NAMESPACE:
return
payload = event.pre_state
if not isinstance(payload, LlmEventPayload):
return
# payload.model, payload.input_messages (already image-redacted),
# payload.output_content, payload.request_params,
# payload.response_id, payload.active_prompt, ...
```

A custom `Provider` that wants to participate in the same span
emission protocol dispatches `NodeEvent(namespace=LLM_NAMESPACE,
pre_state=LlmEventPayload(...))` via `current_dispatch()`. See
[Authoring providers](../model-providers/authoring.md) for the
full pattern.

### Flushing under fast teardown

`OTelObserver.shutdown()` calls `provider.shutdown()` on the private
`TracerProvider`, which per OTel SDK contract flushes every
registered span processor. Under unusual teardown orderings — for
example, FastAPI's `TestClient` teardown that closes the event loop
before a `BatchSpanProcessor`'s export thread finishes — spans can
appear dropped. Two workarounds:

- Call `observer._provider.force_flush(timeout_millis=...)`
explicitly before `shutdown()`.
- Use `SimpleSpanProcessor` instead of `BatchSpanProcessor` in
tests; it exports synchronously and is unaffected by teardown
timing.
Loading