feat(observability): align traces and metrics with OTel GenAI semantic conventions (#125) #142
Merged
cchinchilla-dev merged 9 commits into main on May 2, 2026
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull request overview
This PR modernizes AgentLoom's observability layer to use OpenTelemetry GenAI semantic conventions, centralizing telemetry names and expanding trace/metric coverage so external OTel backends can consume AgentLoom data with less custom relabeling.
Changes:
- Adds a centralized observability schema module for span names, span attributes, metric names, and provider-name translation.
- Refactors observer, gateway, engine, and LLM step code to emit GenAI-aligned spans/attributes/metrics, including provider-attempt spans and prompt metadata.
- Updates tests, docs, changelog, and Grafana queries to reflect the new telemetry surface and breaking hook/signature changes.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `tests/providers/test_gateway.py` | Adds fallback-span and `step_id` handling tests for gateway completions. |
| `tests/observability/test_schema.py` | Adds regression tests for centralized schema constants and observer behavior. |
| `tests/observability/test_observer.py` | Updates observer tests for renamed semantic-convention attributes/hooks. |
| `tests/observability/test_noop.py` | Updates noop observer coverage for the new hook surface. |
| `tests/observability/test_metrics.py` | Updates metric tests for histogram-based GenAI telemetry. |
| `tests/core/test_engine_integration.py` | Adjusts engine/observer lifecycle assertions for `run_id` and hook changes. |
| `src/agentloom/steps/llm_call.py` | Adds prompt metadata generation, prompt-capture events, and forwards `step_id` to the gateway. |
| `src/agentloom/steps/base.py` | Extends `StepContext` with `capture_prompts`. |
| `src/agentloom/providers/gateway.py` | Adds per-attempt provider observer hooks for complete/stream paths and consumes `step_id`. |
| `src/agentloom/observability/schema.py` | Introduces centralized span/attribute/metric constants and provider-name mapping. |
| `src/agentloom/observability/observer.py` | Refactors tracing/metrics emission to the new schema and adds provider start/end spans. |
| `src/agentloom/observability/noop.py` | Mirrors the expanded observer API with no-op implementations. |
| `src/agentloom/observability/metrics.py` | Renames GenAI metrics, switches token counting to histograms, and rewrites emitted attributes. |
| `src/agentloom/core/results.py` | Adds `PromptMetadata` and threads it through `StepResult`. |
| `src/agentloom/core/protocols.py` | Removes the old observer hook surface from the protocol. |
| `src/agentloom/core/models.py` | Adds workflow-level `capture_prompts` config. |
| `src/agentloom/core/engine.py` | Passes `run_id`, prompt metadata, and new token fields into observer hooks. |
| `docs/workflow-yaml.md` | Documents the new `capture_prompts` workflow config. |
| `docs/observability.md` | Documents the new span hierarchy, attributes, and metric names. |
| `deploy/grafana/dashboards/agentloom.json` | Rewrites dashboard queries to use renamed metrics and labels. |
| `CHANGELOG.md` | Documents the breaking observability schema changes and new telemetry behavior. |
Comments suppressed due to low confidence (1)
src/agentloom/core/protocols.py:118

`ObserverProtocol` no longer describes the hooks that the gateway/steps actually use after this refactor (`on_provider_call_start`, `on_provider_call_end`, `attach_step_event`). That makes the protocol's own guarantee here inaccurate: an observer can satisfy `ObserverProtocol` and still silently miss the new observability callbacks.
    # Provider-level hooks called by engine + gateway. Listed here so an
    # ``isinstance(obs, ObserverProtocol)`` check fails for observers that
    # would crash mid-run on a missing method.
    def on_tokens(
        self,
        provider: str,
        model: str,
        prompt_tokens: int,
        completion_tokens: int,
        **kwargs: Any,
    ) -> None: ...

    def on_stream_response(
What
Aligns the AgentLoom telemetry surface with the OpenTelemetry GenAI semantic conventions (May 2026 registry). Spans, attributes, and metrics now use canonical `gen_ai.*` names, so any OTel-aware backend (Grafana GenAI dashboards, Jaeger plugins, third-party collectors) auto-correlates AgentLoom traces without per-site relabeling.
- `observability/schema.py` is now the single source of truth for every span name, attribute, and metric. No raw telemetry literals anywhere else in the codebase — guarded by a drift-detection regression test.
- Inference spans are named `{operation_name} {model}` (e.g. `chat gpt-4o-mini`); tool-call spans are renamed to `execute_tool {tool_name}`. Provider attempts emit child spans under each step span, including failed fallback attempts.
- Provider names are translated to canonical values (`google` → `gcp.gemini`, etc.); the custom values `ollama` and `mock` are documented as local extensions.
- Token counting is consolidated into a single histogram (`gen_ai.client.token.usage`) with a `gen_ai.token.type` dimension; latency and TTFT are migrated to `gen_ai.client.operation.duration` and `gen_ai.client.operation.time_to_first_chunk`. Grafana dashboard queries are rewritten to match.
- Prompt capture is opt-in via `WorkflowConfig.capture_prompts` and emitted as a span event to avoid attribute-size limits.
- `workflow.run_id` is propagated through the span tree; `error.type` is set alongside `step.error` on failed inference spans.

Why
The previous schema used ad-hoc names (`gen_ai.system`, `step.tokens`, `gen_ai.server.time_to_first_token`) that drift from the OTel registry, blocking downstream consumers from using off-the-shelf dashboards. Centralizing the names also unblocks the planned `agentloom-contracts` package extraction — that work becomes a `git mv` instead of a grep-and-replace.

Closes #125
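The PR view above doesn't show `observability/schema.py` itself. A minimal sketch of the centralization idea: the metric names, the `google` → `gcp.gemini` mapping, and the `{operation_name} {model}` span pattern are quoted from this PR's description, while the constant names and helper functions are illustrative, not the actual module contents:

```python
"""Sketch of a single source of truth for telemetry names."""
from typing import Final

# Metric names quoted from the PR description (OTel GenAI conventions).
METRIC_TOKEN_USAGE: Final = "gen_ai.client.token.usage"
METRIC_OPERATION_DURATION: Final = "gen_ai.client.operation.duration"
METRIC_TIME_TO_FIRST_CHUNK: Final = "gen_ai.client.operation.time_to_first_chunk"

# Attribute keys (illustrative constant names).
ATTR_TOKEN_TYPE: Final = "gen_ai.token.type"
ATTR_ERROR_TYPE: Final = "error.type"
ATTR_RUN_ID: Final = "workflow.run_id"

# Provider-name translation to canonical OTel values. Unknown names
# (e.g. the documented local extensions "ollama" and "mock") pass through.
_PROVIDER_CANONICAL: Final[dict[str, str]] = {"google": "gcp.gemini"}


def canonical_provider(name: str) -> str:
    """Map an internal provider name to its canonical OTel value."""
    return _PROVIDER_CANONICAL.get(name, name)


def inference_span_name(operation: str, model: str) -> str:
    """OTel GenAI span-name convention: '{operation_name} {model}'."""
    return f"{operation} {model}"


assert inference_span_name("chat", "gpt-4o-mini") == "chat gpt-4o-mini"
assert canonical_provider("google") == "gcp.gemini"
```

Keeping every literal behind a constant is what makes the drift-detection test mentioned above possible: the test can grep the rest of the codebase for raw `gen_ai.` strings and fail if any appear outside this module.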
Testing
- `uv run pytest` — 1095 passed
- `uv run ruff check src/ tests/` — clean
- `uv run mypy src/` — clean

Notes
No backwards compatibility is retained. The legacy `on_provider_call` observer hook and the `tokens` positional argument on `on_step_end` are removed; the previous `agentloom_tokens_total` / `agentloom_provider_latency_seconds` / `agentloom_time_to_first_token_seconds` metrics no longer exist. AgentLoom-specific metrics (workflow lifecycle, cost, circuit breaker, webhooks, approvals, recordings) keep the `agentloom_*` prefix as an application namespace, per OTel naming guidance.