Add production-observability example #117
Merged
New examples/12-production-observability/ demonstrates the
production-grade observability stack end-to-end. Pairs the dual-
observer pattern (the README pitch's "no SaaS lock-in" claim) with
the caller-hook surface from proposal 0043 and the canonical
TimingMiddleware so a reader sees what each piece does in one
place.
What's wired:
- Both OTelObserver and LangfuseObserver attached to the same
graph (proposal 0031). Each consumes the same NodeEvent stream
independently; nothing in node code knows there are two.
- trace_input_from_state / trace_output_from_state caller hooks on
the LangfuseObserver (proposal 0043 §8.4.1). Hooks return domain
dicts shaped for the Langfuse UI viewer; raw State stays out of
trace payloads.
- Built-in TimingMiddleware wrapping the respond node. The
on_complete callback receives a TimingRecord and prints a
one-line summary; production callbacks would queue to a metrics
backend (StatsD / Prometheus / OTLP metrics) instead.
- invoke(metadata={...}) carrying multi-tenant identifiers (tenantId
/ requestId / featureFlag). Both observers pick them up in one
call: OTel as openarmature.user.* span attributes, Langfuse as
top-level trace.metadata keys.
- InMemoryLangfuseClient + InMemorySpanExporter capture in-process
so the demo prints what both backends would have ingested without
needing real cloud credentials. Walk-through doc shows the
production swap (LangfuseSDKAdapter, BatchSpanProcessor +
OTLPSpanExporter).
- disable_llm_payload=False on BOTH observers so the captured LLM
input messages and output content appear in both backends (the
whole point of the example would be undercut by leaving the
payload capture asymmetric).
- try/except NodeException at the invoke() boundary surfaces the
underlying LlmProviderError category so a reader sees the
production-shape error path. Both observer captures still print
on failure so the dual-observer story extends to failure modes.
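The fan-out described in the list above can be pictured with a small, library-free sketch. Every class and function here (`NodeEvent` as a dataclass, `SpanAttributeObserver`, `TraceMetadataObserver`, `emit`) is a hypothetical stand-in written for illustration, not the openarmature API; only the `openarmature.user.*` prefix and the three metadata keys come from this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class NodeEvent:
    """Hypothetical stand-in for the event a node emits."""
    node_name: str
    metadata: dict[str, Any]


class Observer(Protocol):
    def on_event(self, event: NodeEvent) -> None: ...


@dataclass
class SpanAttributeObserver:
    """OTel-like stand-in: flattens metadata into prefixed span attributes."""
    spans: list[dict[str, Any]] = field(default_factory=list)

    def on_event(self, event: NodeEvent) -> None:
        attrs = {f"openarmature.user.{k}": v for k, v in event.metadata.items()}
        self.spans.append({"name": event.node_name, "attributes": attrs})


@dataclass
class TraceMetadataObserver:
    """Langfuse-like stand-in: keeps metadata as top-level trace keys."""
    traces: list[dict[str, Any]] = field(default_factory=list)

    def on_event(self, event: NodeEvent) -> None:
        self.traces.append({"name": event.node_name, "metadata": dict(event.metadata)})


def emit(event: NodeEvent, observers: list[Observer]) -> None:
    # The emitter fans out; node code never learns how many observers exist.
    for observer in observers:
        observer.on_event(event)


otel_like = SpanAttributeObserver()
langfuse_like = TraceMetadataObserver()
emit(
    NodeEvent("respond", {"tenantId": "acme", "requestId": "req-1", "featureFlag": "beta"}),
    [otel_like, langfuse_like],
)
```

Both stand-ins receive the same event object and shape it for their own backend, which is the property the dual-observer wiring relies on.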
Complementary to example 03 (observer hooks at finer granularity)
and example 10 (Langfuse + LangfusePromptBackend prompt linkage).
This example's headline is the production-shape wiring, not the
hook surface or prompt management.
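For readers unfamiliar with the caller hooks mentioned in this commit message, here is a minimal sketch of what such hooks might look like. The hook names come from the commit message; the `State` dataclass, its fields, and the one-argument signature are assumptions made for illustration and may not match the real example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class State:
    """Hypothetical graph state; the real example's fields may differ."""
    question: str
    answer: str = ""
    model: str = ""
    scratch: dict[str, Any] = field(default_factory=dict)  # internal, never traced


def trace_input_from_state(state: State) -> dict[str, Any]:
    # Expose only the domain-level input, not the whole State.
    return {"question": state.question}


def trace_output_from_state(state: State) -> dict[str, Any]:
    # Expose only the fields a trace viewer needs to show the result.
    return {"answer": state.answer, "model": state.model}
```

Because each hook builds a fresh dict from named fields, anything else on the state (such as the `scratch` field in this sketch) stays out of the trace payload.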
Pull request overview
Adds a new example #12 that wires both the OTel and Langfuse observers on a single graph, with caller hooks shaping trace.input/trace.output, the canonical TimingMiddleware, and invoke(metadata=...) propagation to demonstrate a production-shape observability stack end-to-end. Includes companion documentation, catalog/nav entries, and a CHANGELOG note.
Changes:
- New runnable example at examples/12-production-observability/main.py using in-memory captures for both backends.
- New docs page docs/examples/12-production-observability.md plus catalog/nav entries.
- AGENTS.md and CHANGELOG entries updated for the new example.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| examples/12-production-observability/main.py | New single-node graph wired with dual observers, timing middleware, and metadata propagation. |
| docs/examples/12-production-observability.md | New walk-through doc with sample output and production-swap recipe. |
| docs/examples/index.md | Adds catalog entry for example 12. |
| mkdocs.yml | Adds example 12 to the docs nav. |
| src/openarmature/AGENTS.md | Mentions example 12 in the bundled examples index. |
| CHANGELOG.md | Adds an "Added" bullet for the new example. |
Summary
Second of three examples picked from the audit (after PR #116's chat-with-multimodal; crash-and-resume on example 08 still to come).
New examples/12-production-observability/ demonstrates the production-grade observability stack end-to-end. Pairs the dual-observer pattern (the README pitch's "no SaaS lock-in" claim) with the caller-hook surface from proposal 0043 and the canonical TimingMiddleware so a reader sees what each piece does in one place.

What's wired:
- OTelObserver + LangfuseObserver attached simultaneously (proposal 0031). Each consumes the same NodeEvent stream independently; nothing in node code knows there are two.
- Caller hooks shaping trace.input / trace.output (proposal 0043 §8.4.1). Hooks return domain dicts like {"question": ...} / {"answer": ..., "model": ...} shaped for the Langfuse UI viewer; raw State stays out of trace payloads.
- TimingMiddleware wrapping the respond node. The on_complete callback receives a TimingRecord(node_name, duration_ms, outcome, exception_category) and prints a one-line summary; production callbacks would queue to a metrics backend (StatsD / Prometheus / OTLP metrics) instead.
- invoke(metadata={...}) carrying multi-tenant identifiers (tenantId / requestId / featureFlag). Both observers pick them up in one call: OTel as openarmature.user.* span attributes, Langfuse as top-level trace.metadata keys plus per-observation metadata.
- InMemoryLangfuseClient + InMemorySpanExporter capture in-process so the demo prints what both backends would have ingested without needing real cloud credentials. Walk-through doc shows the production swap (LangfuseSDKAdapter, BatchSpanProcessor + OTLPSpanExporter).
- disable_llm_payload=False on BOTH observers so the captured LLM input messages and output content appear in both backends. The example's whole point would be undercut by leaving the payload capture asymmetric.
- try / except NodeException at the invoke() boundary surfaces the underlying LlmProviderError category so a reader sees the production-shape error path. Both observer captures still print on failure so the dual-observer story extends to failure modes too.

Complementary to example 03 (observer hooks at finer granularity) and example 10 (Langfuse + LangfusePromptBackend prompt linkage). This example's headline is the production-shape wiring, not the hook surface or prompt management.
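To make the timing piece concrete, the sketch below shows one way a timing wrapper with an on_complete callback could work. The `TimingRecord` field names are taken from this description; the `with_timing` and `format_record` helpers, the synchronous node signature, the `"error"` outcome value, and the use of the exception class name as the category are all assumptions, not the library's actual middleware.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class TimingRecord:
    node_name: str
    duration_ms: float
    outcome: str  # "success" or "error" in this sketch
    exception_category: Optional[str] = None


def with_timing(
    node_name: str,
    node_fn: Callable[[Any], Any],
    on_complete: Callable[[TimingRecord], None],
) -> Callable[[Any], Any]:
    """Wrap a node so on_complete fires once per call, on success or failure."""

    def wrapped(state: Any) -> Any:
        start = time.perf_counter()
        try:
            result = node_fn(state)
        except Exception as exc:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            # Assumption: the class name stands in for a real error category.
            on_complete(TimingRecord(node_name, elapsed_ms, "error", type(exc).__name__))
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        on_complete(TimingRecord(node_name, elapsed_ms, "success"))
        return result

    return wrapped


def format_record(record: TimingRecord) -> str:
    # One-line summary in the same shape as the demo output.
    return f"[timing] {record.node_name}: {record.duration_ms:.1f}ms ({record.outcome})"
```

A demo callback would simply `print(format_record(record))`; a production callback would hand the record to a metrics client instead of printing.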
Verified end-to-end
Real run against gpt-4o-mini: TimingMiddleware fired with [timing] respond: 4217.6ms (success), both captures show the three caller-supplied metadata entries, Langfuse trace shows input/output from the caller hooks, LLM payload captured on the Generation (input messages + output content), three distinct identifiers (requestId from the caller, correlation_id from OA, Trace id = invocation_id).
Test plan
Out of scope
Reviewer notes
One observation from the manual run: the OTel side captured two spans (respond + openarmature.llm.complete) but didn't include the openarmature.invocation root span. Worth investigating later but not in this PR's scope: the spans that DO appear carry the metadata correctly, so the headline behavior is still demonstrated. May file a follow-on if it turns out the InMemorySpanExporter captures aren't reaching the invocation span specifically.
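If a follow-on does get filed, a small check like the one below could turn the observation into a regression test. This is only a sketch: `missing_span_names` is a hypothetical helper, and the span names are the three mentioned in this note. With the real exporter, the captured list would presumably be built from the names of the finished spans it holds.

```python
from typing import Iterable


def missing_span_names(captured: Iterable[str], expected: Iterable[str]) -> list[str]:
    """Return expected span names that never showed up, in expected order."""
    seen = set(captured)
    return [name for name in expected if name not in seen]


# Names observed in the manual run described above.
captured = ["respond", "openarmature.llm.complete"]
expected = ["openarmature.invocation", "respond", "openarmature.llm.complete"]

missing = missing_span_names(captured, expected)
```

In the manual run this would report `openarmature.invocation` as the only missing span, matching the note above.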