Observability LLM payload + GenAI semconv (v0.17.0) #61
Merged
Pin to v0.17.0 (proposal 0024 accepted — LLM span payload + GenAI semconv). Submodule, pyproject.toml [tool.openarmature].spec_version, and the runtime __spec_version__ move in lockstep per the three-pin drift guard in tests/test_smoke.py.
Realize spec v0.17.0 §5.5 expansion and the five python-only items
from the observability friction roundup.
§5.5.1 input/output payload: openarmature.llm.input.messages,
openarmature.llm.output.content, openarmature.llm.request.extras —
JSON-encoded with sorted keys, default-off via disable_llm_payload,
truncated per §5.5.5 at payload_max_bytes (default 64 KiB, minimum
256). Inline image bytes are redacted at the provider before
reaching the event payload, not gated by any observer flag.
§5.5.2 / §5.5.3 GenAI semconv: gen_ai.system (caller-overridable
per OpenAIProvider for non-OpenAI endpoints), gen_ai.request.model,
gen_ai.response.{model,id,finish_reasons}, gen_ai.usage.{input,
output}_tokens, plus gen_ai.request.{temperature,max_tokens,top_p,
seed} when set. Opt-out via disable_genai_semconv.
OTelObserver knobs: resource= for service.name, span_processor
accepts SpanProcessor | Sequence[SpanProcessor], attribute_enrichers
hook fires before every span.end() the observer issues.
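The enricher hook can be pictured with the stdlib-only sketch below. `FakeSpan` is a stand-in for an OTel span and `end_span_with_enrichers` is a hypothetical name, not the observer's actual internals. Per the PR summary, exceptions raised by an enricher are caught and warned rather than propagated, which the sketch mirrors.

```python
import warnings
from typing import Any, Callable, Sequence


class FakeSpan:
    """Stand-in for an OTel span (assumption: only these two methods matter here)."""

    def __init__(self) -> None:
        self.attributes: dict[str, Any] = {}
        self.ended = False

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value

    def end(self) -> None:
        self.ended = True


Enricher = Callable[[FakeSpan, Any], None]


def end_span_with_enrichers(span: FakeSpan, event: Any, enrichers: Sequence[Enricher]) -> None:
    """Run every enricher, warn on failures, then end the span."""
    for enricher in enrichers:
        try:
            enricher(span, event)
        except Exception as exc:  # a broken enricher must not break the run
            warnings.warn(f"attribute enricher failed: {exc!r}", RuntimeWarning, stacklevel=2)
    span.end()


def add_tenant(span: FakeSpan, event: Any) -> None:
    span.set_attribute("app.tenant", "acme")  # hypothetical attribute


def broken(span: FakeSpan, event: Any) -> None:
    raise RuntimeError("boom")


span = FakeSpan()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    end_span_with_enrichers(span, None, [broken, add_tenant])
```

The failing enricher produces one warning, the later enricher still runs, and the span is ended regardless.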
LlmEventPayload (renamed from _LlmEventState, moved to
openarmature.observability.llm_event) and LLM_NAMESPACE are now
public — third-party Provider impls and custom observers can
interoperate against a stable shape. _LlmEventState retained as a
deprecated alias for one release.
Response gains response_id and response_model typed fields sourcing
the new gen_ai.response.* attributes from the wire payload.
Fix the prompt-context cross-task propagation bug: the worker task
running deliver_loop snapshots its Context at invoke()-entry, so
with_active_prompt(...) blocks opened later inside node bodies were
invisible to the observer. Capture current_prompt_result() /
current_prompt_group() at dispatch time in _make_llm_event (running
in the node task where the ContextVars are set) and put the snapshot
on LlmEventPayload; observer reads from the payload.
Tests: 10 new conformance fixture drivers (012-021), assertion-helper
module, RuntimeConfigSpec directive shape, end-to-end #3 regression
test exercising the real cross-task boundary. Reset OTel global
tracer provider state in test finally blocks via the SDK's private
Once primitive so cross-suite runs no longer leak the global
provider into subsequent tests.
CHANGELOG [Unreleased] entry describing the eight friction-roundup items shipped against spec v0.17.0. README adds a sibling pitch bullet under the existing "doesn't double-export" framing covering the dual openarmature.llm.* + gen_ai.* attribute story and the default-off / privacy posture on payload emission. Concepts/observability page gains seven new subsections under the existing OpenTelemetry coverage: the LLM provider span, the default-off payload attributes with truncation + image-redaction subsections, the Resource constructor knob, multi-processor fan-out, attribute_enrichers, the public LlmEventPayload / LLM_NAMESPACE contract, and BatchSpanProcessor flush behaviour under fast teardown. Model-providers/authoring page expands the "Observability spans" bullet from one-line claim to a runnable dispatch sketch that third-party Provider authors can copy directly.
Example 03 (observer-hooks) gains a Resource carrying service.name so its OTel spans match the shape every production backend expects. Comment expanded to mention the auto-emitted gen_ai.* attributes now surfacing on openarmature.llm.complete spans. Example 07 (multimodal-prompt) gets the OTelObserver wiring its docstring has been claiming all along — the example's headline teach was that with_active_prompt_group plus with_active_prompt stamps openarmature.prompt.* attributes on LLM-call spans, but without an attached observer the propagation was unobservable. The console exporter now prints those spans so the prompt-context story is end-to-end visible. The cross-task ContextVar fix makes this work for real (previously the worker task's stale Context snapshot silently dropped both attribute families). Per-example docs page and the examples index updated for the new --all-extras requirement on demo 07.
Pull request overview
Implements OpenArmature observability spec v0.17.0 §5.5 expansion by enriching openarmature.llm.complete spans with optional input/output payload attributes and default-on OpenTelemetry GenAI semantic conventions, plus several OTelObserver ergonomics improvements (resource, multi span processors, enrichers) and a cross-task prompt-context propagation fix. The PR also updates conformance + unit tests and expands docs/examples to reflect the new observability behavior.
Changes:
- Add LLM payload attributes (default-off, truncation, provider-side inline image redaction) and GenAI semconv (gen_ai.*) on LLM spans.
- Fix prompt-context propagation across the dispatch-worker task boundary by snapshotting prompt ContextVars into the LLM event payload.
- Improve OTelObserver ergonomics (Resource support, multiple SpanProcessors, pre-end attribute enrichers) and extend conformance/unit coverage + documentation.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/openarmature/observability/otel/observer.py | Adds payload/semconv emission, truncation, Resource support, multi-processors, and attribute enrichers. |
| src/openarmature/observability/llm_event.py | Introduces the typed LLM event payload model used between providers and observers. |
| src/openarmature/llm/providers/openai.py | Captures prompt context at dispatch, emits expanded LLM payload fields, adds genai_system, and records response id/model. |
| src/openarmature/llm/response.py | Adds typed response_id and response_model fields. |
| src/openarmature/observability/__init__.py | Exports LLM event contract symbols (but currently imports from OTel backend). |
| tests/conformance/test_observability.py | Adds v0.17.0 fixture drivers and helper logic for payload/semconv fixtures; fixes tracer provider global reset in fixture 005. |
| tests/conformance/harness/llm_attribute_assertions.py | New assertion helpers for LLM payload/semconv fixtures (shape, truncation, redaction). |
| tests/conformance/harness/directives.py | Adds calls_llm.config schema support for RuntimeConfig parameters/extras. |
| tests/conformance/harness/fixtures.py | Extends fixture schema with observer/provider flags for new v0.17.0 attributes. |
| tests/unit/test_observability_otel.py | Adds tracer-provider reset helper and a cross-task prompt propagation regression test; updates prompt attribute unit test to use payload snapshot. |
| tests/test_smoke.py | Updates pinned spec version assertion to 0.17.0. |
| src/openarmature/__init__.py | Bumps __spec_version__ to 0.17.0. |
| pyproject.toml | Bumps [tool.openarmature].spec_version to 0.17.0. |
| README.md | Documents improved LLM span readability via GenAI semconv + optional payloads. |
| docs/concepts/observability.md | Adds detailed sections on LLM spans, payloads (truncation/redaction), Resource, multi-export, enrichers, and teardown flushing. |
| docs/model-providers/authoring.md | Documents public LLM event dispatch contract (LLM_NAMESPACE, LlmEventPayload). |
| docs/examples/index.md | Updates examples install instructions for OTel SDK usage. |
| docs/examples/07-multimodal-prompt.md | Updates example doc to describe OTelObserver output and GenAI attributes. |
| examples/03-observer-hooks/main.py | Adds Resource usage and updates commentary for GenAI semconv. |
| examples/07-multimodal-prompt/main.py | Wires OTelObserver + Resource so prompt/group attributes are visible in exported spans. |
| CHANGELOG.md | Adds an Unreleased entry describing the v0.17.0 observability changes. |
- Drop the speculative _LlmEventState backwards-compat alias and
the two test-side imports + constructor sites that referenced it
(no real downstream consumer to protect). CodeQL's unused-global
warnings clear as a side-effect.
- Move LLM_NAMESPACE from otel/observer.py to llm_event.py so the
core openarmature.observability package no longer pulls the OTel
backend (and opentelemetry-sdk) into its import chain. Users
without the [otel] extra can now import LLM_NAMESPACE /
LlmEventPayload cleanly.
- Reconcile the stale "subclasses State" header comment on
llm_event.py with the actual BaseModel-based implementation.
- Widen attribute_enrichers' type to
Sequence[Callable[[Span, NodeEvent | None], None]] to match how
_run_enrichers calls it; drop the cast("Any", event) workaround.
- Add try/finally + await provider.aclose() to the new #3
regression test so its httpx.AsyncClient doesn't leak.
- Collapse the vestigial if/isinstance/else branch in the payload-
fixture driver to one append.
729 passed, pyright + ruff clean.
This was referenced May 23, 2026
Merged
Summary
Realizes spec v0.17.0 §5.5 expansion (proposal 0024) and the five python-only ergonomics items from the observability friction roundup. Triggered by a downstream agent integrating OA with Langfuse over OTLP: LLM spans rendered "naked" (model + tokens only), prompt linkage silently dropped at the dispatch-worker task boundary, and every backend needed a per-service attribute-mapping shim. All eight friction items resolved here.
What's in
LLM span attribute expansion (spec v0.17.0):
- openarmature.llm.input.messages, openarmature.llm.output.content, openarmature.llm.request.extras. JSON-encoded with sorted keys, default-off via disable_llm_payload: bool = True, truncated at payload_max_bytes (default 64 KiB, minimum 256). Inline image bytes are redacted at the provider before reaching the event payload; this is not gated by any observer flag, a defence-in-depth measure that applies to every custom observer too.
- gen_ai.system, gen_ai.request.model, gen_ai.response.{model,id,finish_reasons}, gen_ai.usage.{input,output}_tokens, plus gen_ai.request.{temperature,max_tokens,top_p,seed} when set. Opt out via disable_genai_semconv. gen_ai.system is caller-set per OpenAIProvider instance (default "openai"; override to "vllm" / "lm_studio" / etc. when hitting non-OpenAI endpoints that speak the OpenAI wire format).

OTelObserver ergonomics:
- resource=Resource(...) constructor field.
- span_processor now accepts SpanProcessor | Sequence[SpanProcessor].
- attribute_enrichers: Sequence[Callable[[Span, NodeEvent | None], None]] hook fires before every span.end() the observer issues. Exceptions are caught and warned, never propagated.
- shutdown() docstring documents the BatchSpanProcessor flush gotcha under fast teardown.

Public surface:
- openarmature.observability.LLM_NAMESPACE and openarmature.observability.LlmEventPayload. The payload type now lives at openarmature.observability.llm_event (moved from the provider module to break a circular import that the lazy __getattr__ was masking; subclasses plain Pydantic BaseModel since NodeEvent.pre_state is Any). _LlmEventState retained as a deprecated alias on the provider module for one release.
- Response.response_id and Response.response_model typed fields sourcing the new gen_ai.response.* attributes.

Cross-task ContextVar bug fix:
- The worker task running deliver_loop (asyncio.create_task(deliver_loop(queue))) snapshots its Context at invoke() entry, before any node body opens a with_active_prompt(...) block. The observer's current_prompt_result() / current_prompt_group() reads from the worker task saw None even when a node body had set them. Fixed by capturing both ContextVars at dispatch time inside OpenAIProvider.complete() (in the node task, where the writes are live) and putting the snapshot on LlmEventPayload. Observer reads from the payload.

Tests:
- 10 new conformance fixture drivers (012-021; … content_repeat, base64_data_synthetic, attribute_parses_as_messages, attribute_does_not_contain, attribute_truncation, etc.).
- tests/conformance/harness/llm_attribute_assertions.py assertion-helper module.
- End-to-end #3 regression test exercising the real cross-task boundary: provider.complete() inside a node body inside invoke().
- OTel's global set_tracer_provider is one-shot, so the unit-test finally blocks silently failed to restore the global, leaking it to fixture 005 sub-case 3. Reset via the SDK's private _TRACER_PROVIDER_SET_ONCE primitive in both call sites.

Docs + examples:
- CHANGELOG.md [Unreleased] section.
- docs/concepts/observability.md: seven new subsections under OTel coverage.
- docs/model-providers/authoring.md: expanded "Observability spans" bullet to a runnable dispatch sketch.
- Example 03 gains a Resource for service.name.
- Example 07 wires OTelObserver so the example's headline claim ("OTel observers stamp openarmature.prompt.group_name onto every LLM-call span") is now actually demonstrable end-to-end.

What's out
Per the coord-thread scope (discuss-observability-friction-roundup 02-03):
- langfuse.trace.name on invocation root: separate proposal coming in discuss-observability-langfuse-mapping.
- maybe_install_log_bridge wrapper: kept the explicit two-line caller-side conditional.

Test plan
- uv run pytest tests/ -q: full suite (729 passed, 70 skipped on this branch).
- uv run pytest tests/conformance/test_observability.py -v: confirm all 21 observability fixtures green (11 v0.7.0 + 10 new).
- uv run pytest tests/unit/test_observability_otel.py::test_prompt_context_propagates_cross_task_via_provider_complete -v: the #3 regression test (would have failed on pre-fix code).
- uv run pyright and uv run ruff check clean.
- LLM_NAMESPACE and LlmEventPayload import cleanly from openarmature.observability in both load orders (observability-first, llm-first).
- … openarmature.llm.complete spans carries openarmature.prompt.group_name = "lunar-image-analysis" plus the per-call openarmature.prompt.* attributes and the new gen_ai.* set.