Observability LLM payload + GenAI semconv (v0.17.0) by chris-colinsky · Pull Request #61 · LunarCommand/openarmature-python

chris-colinsky · 2026-05-23T17:29:57Z

Summary

Realizes spec v0.17.0 §5.5 expansion (proposal 0024) and the five python-only ergonomics items from the observability friction roundup. Triggered by a downstream agent integrating OA with Langfuse over OTLP: LLM spans rendered "naked" (model + tokens only), prompt linkage silently dropped at the dispatch-worker task boundary, and every backend needed a per-service attribute-mapping shim. All eight friction items resolved here.

What's in

LLM span attribute expansion (spec v0.17.0):

§5.5.1 input/output payload: openarmature.llm.input.messages, openarmature.llm.output.content, openarmature.llm.request.extras. JSON-encoded with sorted keys, default-off via disable_llm_payload: bool = True, truncated at payload_max_bytes (default 64 KiB, minimum 256). Inline image bytes are redacted at the provider before they reach the event payload. Redaction is not gated by any observer flag; it is defence-in-depth that also applies to every custom observer.
§5.5.2 / §5.5.3 GenAI semconv: gen_ai.system, gen_ai.request.model, gen_ai.response.{model,id,finish_reasons}, gen_ai.usage.{input,output}_tokens, plus gen_ai.request.{temperature,max_tokens,top_p,seed} when set. Opt out via disable_genai_semconv. gen_ai.system is caller-set per OpenAIProvider instance (default "openai"; override to "vllm", "lm_studio", etc. when hitting non-OpenAI endpoints that speak the OpenAI wire format).
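For anyone reproducing this in a custom observer, here is a minimal stdlib-only sketch of the payload pipeline described above: redact inline image data URLs in OpenAI-style messages, JSON-encode with sorted keys, then cut to a byte budget with the 256-byte floor. The helper names (redact_inline_images, encode_attribute), the "<redacted>" placeholder, and the default json separators are assumptions for illustration. The normative truncation rule (including any marker it may append) lives in spec §5.5.5 and is not reproduced here.

```python
from __future__ import annotations

import json
import re
from typing import Any

DEFAULT_PAYLOAD_MAX_BYTES = 64 * 1024  # 64 KiB default, as described above
MIN_PAYLOAD_MAX_BYTES = 256            # documented lower bound
REDACTED = "<redacted>"                # hypothetical placeholder text

# Matches the prefix of an inline base64 image data URL, e.g. "data:image/png;base64,".
_DATA_URL = re.compile(r"data:(image/[\w.+-]+);base64,", re.ASCII)


def _redact_part(part: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one content part with any inline image bytes replaced."""
    if part.get("type") != "image_url":
        return part
    image = part.get("image_url") or {}
    match = _DATA_URL.match(image.get("url", ""))
    if match is None:
        return part  # remote http(s) image URLs carry no inline bytes
    redacted_url = f"data:{match.group(1)};base64,{REDACTED}"
    return {**part, "image_url": {**image, "url": redacted_url}}


def redact_inline_images(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Redact OpenAI-style multimodal messages without mutating the caller's list."""
    redacted = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            msg = {**msg, "content": [_redact_part(p) for p in content]}
        redacted.append(msg)
    return redacted


def encode_attribute(value: Any, max_bytes: int = DEFAULT_PAYLOAD_MAX_BYTES) -> str:
    """JSON-encode with sorted keys, then truncate to at most max_bytes of UTF-8."""
    if max_bytes < MIN_PAYLOAD_MAX_BYTES:
        raise ValueError(f"payload_max_bytes must be at least {MIN_PAYLOAD_MAX_BYTES}")
    text = json.dumps(value, sort_keys=True, ensure_ascii=False)
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # Cut on the byte budget; errors="ignore" drops a multi-byte character split by the cut.
    return raw[:max_bytes].decode("utf-8", errors="ignore")
```

Redaction runs before encoding, so the size budget is never spent on image bytes, and truncating on bytes rather than characters keeps the attribute within the configured limit even for non-ASCII prompts.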

OTelObserver ergonomics:

resource=Resource(...) constructor field.
span_processor now accepts SpanProcessor | Sequence[SpanProcessor].
attribute_enrichers: Sequence[Callable[[Span, NodeEvent | None], None]] hook fires before every span.end() the observer issues. Exceptions are caught and warned, never propagated.
shutdown() docstring documents the BatchSpanProcessor flush gotcha under fast teardown.
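The two constructor behaviours above (one processor or a sequence of them, and enrichers that run before end() without ever raising) can be sketched without the OTel SDK. SpanProcessor and RecordingSpan below are hypothetical stand-ins, not the opentelemetry-sdk classes, and normalize_processors / run_enrichers / end_span are illustrative names rather than OTelObserver internals.

```python
from __future__ import annotations

import warnings
from collections.abc import Callable, Sequence
from typing import Any


class SpanProcessor:
    """Hypothetical stand-in for opentelemetry.sdk.trace.SpanProcessor."""


class RecordingSpan:
    """Hypothetical stand-in for an OTel Span: records attributes and end()."""

    def __init__(self) -> None:
        self.attributes: dict[str, Any] = {}
        self.ended = False

    def set_attribute(self, key: str, value: Any) -> None:
        if self.ended:
            raise RuntimeError("span already ended")
        self.attributes[key] = value

    def end(self) -> None:
        self.ended = True


# (span, event_or_None) -> None, mirroring the attribute_enrichers element type.
Enricher = Callable[[Any, Any], None]


def normalize_processors(
    span_processor: SpanProcessor | Sequence[SpanProcessor] | None,
) -> list[SpanProcessor]:
    """Accept one processor or a sequence and return a flat list to register."""
    if span_processor is None:
        return []
    if isinstance(span_processor, SpanProcessor):
        return [span_processor]
    return list(span_processor)


def run_enrichers(span: Any, event: Any, enrichers: Sequence[Enricher]) -> None:
    """Call each enricher; failures become warnings and never reach the caller."""
    for enrich in enrichers:
        try:
            enrich(span, event)
        except Exception as exc:
            warnings.warn(
                f"attribute enricher {enrich!r} raised {exc!r}", RuntimeWarning, stacklevel=2
            )


def end_span(span: RecordingSpan, event: Any, enrichers: Sequence[Enricher]) -> None:
    """Run enrichers while the span is still writable, then end it."""
    run_enrichers(span, event, enrichers)
    span.end()
```

Running enrichers before end() matters because attributes set on an ended span are dropped by real SDK spans; catching every exception keeps a buggy enricher from failing the graph run it is only meant to annotate.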

Public surface:

openarmature.observability.LLM_NAMESPACE and openarmature.observability.LlmEventPayload. The payload type now lives at openarmature.observability.llm_event (moved from the provider module to break a circular import that the lazy __getattr__ was masking; subclasses plain Pydantic BaseModel since NodeEvent.pre_state is Any). _LlmEventState retained as a deprecated alias on the provider module for one release.
Response.response_id and Response.response_model typed fields sourcing the new gen_ai.response.* attributes.
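A sketch of how typed response fields such as response_id and response_model could feed the gen_ai.* set listed under §5.5.2 / §5.5.3, with the sampling parameters emitted only when the caller set them. CallRecord and genai_attributes are hypothetical; the real observer reads LlmEventPayload and Response, whose exact field layout is not shown in this PR text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class CallRecord:
    """Hypothetical flattened view of one completed LLM call."""

    system: str                            # caller-set, e.g. "openai" or "vllm"
    request_model: str
    response_model: str | None = None      # cf. Response.response_model
    response_id: str | None = None         # cf. Response.response_id
    finish_reasons: list[str] = field(default_factory=list)
    input_tokens: int | None = None
    output_tokens: int | None = None
    params: dict[str, Any] = field(default_factory=dict)


_REQUEST_PARAMS = ("temperature", "max_tokens", "top_p", "seed")


def genai_attributes(call: CallRecord) -> dict[str, Any]:
    """Project a call onto gen_ai.* attributes, skipping anything unset."""
    attrs: dict[str, Any] = {
        "gen_ai.system": call.system,
        "gen_ai.request.model": call.request_model,
    }
    optional = {
        "gen_ai.response.model": call.response_model,
        "gen_ai.response.id": call.response_id,
        "gen_ai.usage.input_tokens": call.input_tokens,
        "gen_ai.usage.output_tokens": call.output_tokens,
    }
    attrs.update({k: v for k, v in optional.items() if v is not None})
    if call.finish_reasons:
        attrs["gen_ai.response.finish_reasons"] = list(call.finish_reasons)
    # Request parameters appear only when the caller explicitly set them.
    for name in _REQUEST_PARAMS:
        if call.params.get(name) is not None:
            attrs[f"gen_ai.request.{name}"] = call.params[name]
    return attrs
```

Omitting unset parameters, rather than emitting nulls or provider defaults, keeps the span honest about what was actually sent on the wire.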

Cross-task ContextVar bug fix:

The dispatch worker (asyncio.create_task(deliver_loop(queue))) snapshots its Context at invoke() entry — before any node body opens a with_active_prompt(...) block. The observer's current_prompt_result() / current_prompt_group() reads from the worker task saw None even when a node body had set them. Fixed by capturing both ContextVars at dispatch time inside OpenAIProvider.complete() (in the node task, where the writes are live) and putting the snapshot on LlmEventPayload. Observer reads from the payload.
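The bug reproduces with the standard library alone, because asyncio.create_task() copies the current Context at the moment the task is created. In this sketch a plain ContextVar stands in for the prompt context that with_active_prompt(...) sets, and Event.prompt_snapshot plays the role of the snapshot now carried on LlmEventPayload; all names are hypothetical.

```python
from __future__ import annotations

import asyncio
import contextvars
from dataclasses import dataclass

# Hypothetical stand-in for the prompt ContextVar that with_active_prompt(...) sets.
current_prompt: contextvars.ContextVar[str | None] = contextvars.ContextVar(
    "current_prompt", default=None
)


@dataclass(frozen=True)
class Event:
    name: str
    prompt_snapshot: str | None  # captured in the emitting (node) task at dispatch time


async def deliver_loop(queue: asyncio.Queue[Event | None], seen: list[tuple]) -> None:
    """Worker task: records what it can read itself vs. what the event carried."""
    while (event := await queue.get()) is not None:
        seen.append((event.name, current_prompt.get(), event.prompt_snapshot))


async def invoke() -> list[tuple]:
    queue: asyncio.Queue[Event | None] = asyncio.Queue()
    seen: list[tuple] = []
    # create_task() copies the *current* Context here, before any node body runs.
    worker = asyncio.create_task(deliver_loop(queue, seen))

    # "Node body": opens the prompt context after the worker already exists.
    token = current_prompt.set("lunar-image-analysis")
    try:
        queue.put_nowait(Event("llm.complete", prompt_snapshot=current_prompt.get()))
    finally:
        current_prompt.reset(token)

    queue.put_nowait(None)  # sentinel: stop the worker
    await worker
    return seen


if __name__ == "__main__":
    print(asyncio.run(invoke()))
    # [('llm.complete', None, 'lunar-image-analysis')]
```

The worker's own read returns None because its Context copy predates the set() call, while the snapshot taken in the emitting task carries the value. That is the reasoning behind moving the read to dispatch time.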

Tests:

10 new conformance fixture drivers (012-021) consuming the new harness primitives (content_repeat, base64_data_synthetic, attribute_parses_as_messages, attribute_does_not_contain, attribute_truncation, etc.).
tests/conformance/harness/llm_attribute_assertions.py assertion-helper module.
End-to-end #3 regression test exercising the real cross-task boundary via provider.complete() inside a node body inside invoke().
Fixed a pre-existing test-isolation issue: OTel SDK 1.x makes set_tracer_provider one-shot, so the unit-test finally blocks silently failed to restore the global, leaking it to fixture 005 sub-case 3. Reset via the SDK's private _TRACER_PROVIDER_SET_ONCE primitive in both call sites.

Docs + examples:

CHANGELOG [Unreleased] section.
docs/concepts/observability.md — seven new subsections under OTel coverage.
docs/model-providers/authoring.md — expanded "Observability spans" bullet to a runnable dispatch sketch.
README — sibling pitch bullet under the existing "doesn't double-export" framing.
Example 03 (observer-hooks) — added Resource for service.name.
Example 07 (multimodal-prompt) — wired OTelObserver so the example's headline claim ("OTel observers stamp openarmature.prompt.group_name onto every LLM-call span") is now actually demonstrable end-to-end.

What's out

Per the coord-thread scope (discuss-observability-friction-roundup 02-03):

#8 Langfuse-native backend and #11 langfuse.trace.name on invocation root: a separate proposal is coming in discuss-observability-langfuse-mapping.
#10 maybe_install_log_bridge wrapper: kept the explicit two-line caller-side conditional.
Tool-call observability — out of scope for v0.17.0 per spec scope answers Q3.

Test plan

uv run pytest tests/ -q: full suite (729 passed, 70 skipped on this branch).
uv run pytest tests/conformance/test_observability.py -v: confirm all 21 observability fixtures are green (11 v0.7.0 + 10 new).
uv run pytest tests/unit/test_observability_otel.py::test_prompt_context_propagates_cross_task_via_provider_complete -v: the #3 regression test (would have failed on pre-fix code).
uv run pyright and uv run ruff check clean.
Spot-check LLM_NAMESPACE and LlmEventPayload import cleanly from openarmature.observability in both load orders (observability-first, llm-first).
Run example 07 against any OpenAI-compatible endpoint and confirm the console-exporter JSON for the two openarmature.llm.complete spans carries openarmature.prompt.group_name = "lunar-image-analysis" plus the per-call openarmature.prompt.* attributes and the new gen_ai.* set.

Pin to v0.17.0 (proposal 0024 accepted — LLM span payload + GenAI semconv). Submodule, pyproject.toml [tool.openarmature].spec_version, and the runtime __spec_version__ move in lockstep per the three-pin drift guard in tests/test_smoke.py.

Realize spec v0.17.0 §5.5 expansion and the five python-only items from the observability friction roundup.

§5.5.1 input/output payload: openarmature.llm.input.messages, openarmature.llm.output.content, openarmature.llm.request.extras, JSON-encoded with sorted keys, default-off via disable_llm_payload, truncated per §5.5.5 at payload_max_bytes (default 64 KiB, minimum 256). Inline image bytes are redacted at the provider before reaching the event payload, not gated by any observer flag.

§5.5.2 / §5.5.3 GenAI semconv: gen_ai.system (caller-overridable per OpenAIProvider for non-OpenAI endpoints), gen_ai.request.model, gen_ai.response.{model,id,finish_reasons}, gen_ai.usage.{input,output}_tokens, plus gen_ai.request.{temperature,max_tokens,top_p,seed} when set. Opt out via disable_genai_semconv.

OTelObserver knobs: resource= for service.name, span_processor accepts SpanProcessor | Sequence[SpanProcessor], and an attribute_enrichers hook fires before every span.end() the observer issues.

LlmEventPayload (renamed from _LlmEventState, moved to openarmature.observability.llm_event) and LLM_NAMESPACE are now public, so third-party Provider impls and custom observers can interoperate against a stable shape. _LlmEventState retained as a deprecated alias for one release. Response gains response_id and response_model typed fields sourcing the new gen_ai.response.* attributes from the wire payload.

Fix the prompt-context cross-task propagation bug: the worker task running deliver_loop snapshots its Context at invoke() entry, so with_active_prompt(...) blocks opened later inside node bodies were invisible to the observer. Capture current_prompt_result() / current_prompt_group() at dispatch time in _make_llm_event (running in the node task, where the ContextVars are set) and put the snapshot on LlmEventPayload; the observer reads from the payload.
Tests: 10 new conformance fixture drivers (012-021), assertion-helper module, RuntimeConfigSpec directive shape, end-to-end #3 regression test exercising the real cross-task boundary. Reset OTel global tracer provider state in test finally blocks via the SDK's private Once primitive so cross-suite runs no longer leak the global provider into subsequent tests.

CHANGELOG [Unreleased] entry describing the eight friction-roundup items shipped against spec v0.17.0. README adds a sibling pitch bullet under the existing "doesn't double-export" framing covering the dual openarmature.llm.* + gen_ai.* attribute story and the default-off / privacy posture on payload emission. Concepts/observability page gains seven new subsections under the existing OpenTelemetry coverage: the LLM provider span, the default-off payload attributes with truncation + image-redaction subsections, the Resource constructor knob, multi-processor fan-out, attribute_enrichers, the public LlmEventPayload / LLM_NAMESPACE contract, and BatchSpanProcessor flush behaviour under fast teardown. Model-providers/authoring page expands the "Observability spans" bullet from one-line claim to a runnable dispatch sketch that third-party Provider authors can copy directly.

Example 03 (observer-hooks) gains a Resource carrying service.name so its OTel spans match the shape every production backend expects. Its comment now also mentions the auto-emitted gen_ai.* attributes surfacing on openarmature.llm.complete spans. Example 07 (multimodal-prompt) gets the OTelObserver wiring its docstring had been claiming all along: the example's headline lesson is that with_active_prompt_group plus with_active_prompt stamps openarmature.prompt.* attributes on LLM-call spans, but without an attached observer the propagation was unobservable. The console exporter now prints those spans, so the prompt-context story is visible end to end. The cross-task ContextVar fix is what makes this work for real (previously the worker task's stale Context snapshot silently dropped both attribute families). The per-example docs page and the examples index are updated for the new --all-extras requirement on demo 07.

Copilot

Pull request overview

Implements OpenArmature observability spec v0.17.0 §5.5 expansion by enriching openarmature.llm.complete spans with optional input/output payload attributes and default-on OpenTelemetry GenAI semantic conventions, plus several OTelObserver ergonomics improvements (resource, multi span processors, enrichers) and a cross-task prompt-context propagation fix. The PR also updates conformance + unit tests and expands docs/examples to reflect the new observability behavior.

Changes:

Add LLM payload attributes (default-off, truncation, provider-side inline image redaction) and GenAI semconv (gen_ai.*) on LLM spans.
Fix prompt-context propagation across the dispatch-worker task boundary by snapshotting prompt ContextVars into the LLM event payload.
Improve OTelObserver ergonomics (Resource support, multiple SpanProcessors, pre-end attribute enrichers) and extend conformance/unit coverage + documentation.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.


File	Description
`src/openarmature/observability/otel/observer.py`	Adds payload/semconv emission, truncation, Resource support, multi-processors, and attribute enrichers.
`src/openarmature/observability/llm_event.py`	Introduces the typed LLM event payload model used between providers and observers.
`src/openarmature/llm/providers/openai.py`	Captures prompt context at dispatch, emits expanded LLM payload fields, adds `genai_system`, and records response id/model.
`src/openarmature/llm/response.py`	Adds typed `response_id` and `response_model` fields.
`src/openarmature/observability/__init__.py`	Exports LLM event contract symbols (but currently imports from OTel backend).
`tests/conformance/test_observability.py`	Adds v0.17.0 fixture drivers and helper logic for payload/semconv fixtures; fixes tracer provider global reset in fixture 005.
`tests/conformance/harness/llm_attribute_assertions.py`	New assertion helpers for LLM payload/semconv fixtures (shape, truncation, redaction).
`tests/conformance/harness/directives.py`	Adds `calls_llm.config` schema support for RuntimeConfig parameters/extras.
`tests/conformance/harness/fixtures.py`	Extends fixture schema with observer/provider flags for new v0.17.0 attributes.
`tests/unit/test_observability_otel.py`	Adds tracer-provider reset helper and a cross-task prompt propagation regression test; updates prompt attribute unit test to use payload snapshot.
`tests/test_smoke.py`	Updates pinned spec version assertion to `0.17.0`.
`src/openarmature/__init__.py`	Bumps `__spec_version__` to `0.17.0`.
`pyproject.toml`	Bumps `[tool.openarmature].spec_version` to `0.17.0`.
`README.md`	Documents improved LLM span readability via GenAI semconv + optional payloads.
`docs/concepts/observability.md`	Adds detailed sections on LLM spans, payloads (truncation/redaction), Resource, multi-export, enrichers, and teardown flushing.
`docs/model-providers/authoring.md`	Documents public LLM event dispatch contract (`LLM_NAMESPACE`, `LlmEventPayload`).
`docs/examples/index.md`	Updates examples install instructions for OTel SDK usage.
`docs/examples/07-multimodal-prompt.md`	Updates example doc to describe OTelObserver output and GenAI attributes.
`examples/03-observer-hooks/main.py`	Adds Resource usage and updates commentary for GenAI semconv.
`examples/07-multimodal-prompt/main.py`	Wires OTelObserver + Resource so prompt/group attributes are visible in exported spans.
`CHANGELOG.md`	Adds an Unreleased entry describing the v0.17.0 observability changes.


- Drop the speculative _LlmEventState backwards-compat alias and the two test-side imports + constructor sites that referenced it (no real downstream consumer to protect). CodeQL's unused-global warnings clear as a side effect.
- Move LLM_NAMESPACE from otel/observer.py to llm_event.py so the core openarmature.observability package no longer pulls the OTel backend (and opentelemetry-sdk) into its import chain. Users without the [otel] extra can now import LLM_NAMESPACE / LlmEventPayload cleanly.
- Reconcile the stale "subclasses State" header comment on llm_event.py with the actual BaseModel-based implementation.
- Widen attribute_enrichers' type to Sequence[Callable[[Span, NodeEvent | None], None]] to match how _run_enrichers calls it; drop the cast("Any", event) workaround.
- Add try/finally + await provider.aclose() to the new #3 regression test so its httpx.AsyncClient doesn't leak.
- Collapse the vestigial if/isinstance/else branch in the payload-fixture driver to one append.

729 passed, pyright + ruff clean.

chris-colinsky added 4 commits May 23, 2026 10:24

Copilot AI review requested due to automatic review settings May 23, 2026 17:29

Copilot started reviewing on behalf of chris-colinsky May 23, 2026 17:30

github-code-quality Bot found potential problems May 23, 2026


Comment thread src/openarmature/llm/providers/openai.py Fixed

github-advanced-security AI found potential problems May 23, 2026


Comment thread src/openarmature/llm/providers/openai.py Fixed

Copilot AI reviewed May 23, 2026


chris-colinsky merged commit 052b839 into main May 23, 2026
6 checks passed

chris-colinsky deleted the feature/observability-payload-and-semconv branch May 23, 2026 17:50

This was referenced May 23, 2026:

chore(release): v0.8.0-rc1 #62 (merged)
chore(release): v0.8.0 #63 (merged)

chris-colinsky merged 5 commits into main from feature/observability-payload-and-semconv


3 participants
