
feat(observability): align traces and metrics with OTel GenAI semantic conventions (#125)#142

Merged
cchinchilla-dev merged 9 commits into main from feat/otel-schema-125 on May 2, 2026

@cchinchilla-dev (Owner)

What

Aligns the AgentLoom telemetry surface with the OpenTelemetry GenAI semantic conventions (May 2026 registry). Spans, attributes, and metrics now use canonical gen_ai.* names, so any OTel-aware backend (Grafana GenAI dashboards, Jaeger plugins, third-party collectors) auto-correlates AgentLoom traces without per-site relabeling.

  • New observability/schema.py as the single source of truth for every span name, attribute, and metric. No raw telemetry literals appear anywhere else in the codebase; a drift-detection regression test guards this.
  • Inference spans renamed to {operation_name} {model} (e.g. chat gpt-4o-mini); tool-call spans to execute_tool {tool_name}. Provider attempts emit child spans under each step span, including failed fallback attempts.
  • Provider-name translation helper maps internal short names to registry values (google → gcp.gemini, etc.); the custom values ollama and mock are documented as local extensions.
  • Token metric is now a histogram (gen_ai.client.token.usage) with a gen_ai.token.type dimension; latency and TTFT migrated to gen_ai.client.operation.duration and gen_ai.client.operation.time_to_first_chunk. Grafana dashboard queries rewritten to match.
  • Prompt metadata captured by default (hash, length, template id, template vars). Full-prompt capture stays opt-in via WorkflowConfig.capture_prompts, emitted as a span event to avoid attribute-size limits.
  • workflow.run_id propagated through the span tree; error.type set alongside step.error on failed inference spans.
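A minimal sketch of what the centralized schema module could look like, assuming plain string constants plus small helpers. The attribute and metric names are the registry names cited above; the constant names, the helpers provider_name, inference_span_name, and tool_span_name, and the assumption that the internal short names for OpenAI and Anthropic are openai and anthropic are illustrative, not AgentLoom's actual API.

```python
"""Hypothetical sketch of a single-source-of-truth telemetry schema module."""

from typing import Final

# Span attributes (OTel GenAI registry names).
GEN_AI_OPERATION_NAME: Final = "gen_ai.operation.name"
GEN_AI_PROVIDER_NAME: Final = "gen_ai.provider.name"
GEN_AI_REQUEST_MODEL: Final = "gen_ai.request.model"
GEN_AI_TOOL_NAME: Final = "gen_ai.tool.name"
GEN_AI_TOKEN_TYPE: Final = "gen_ai.token.type"
ERROR_TYPE: Final = "error.type"

# Metric names.
METRIC_TOKEN_USAGE: Final = "gen_ai.client.token.usage"
METRIC_OPERATION_DURATION: Final = "gen_ai.client.operation.duration"
METRIC_TIME_TO_FIRST_CHUNK: Final = "gen_ai.client.operation.time_to_first_chunk"

# Internal short names -> registry values (assumed internal names).
# Names with no registry entry, such as "ollama" and "mock", pass through
# unchanged as local extensions.
_PROVIDER_NAMES: Final[dict[str, str]] = {
    "openai": "openai",
    "anthropic": "anthropic",
    "google": "gcp.gemini",
}


def provider_name(internal: str) -> str:
    """Translate an internal provider name to a gen_ai.provider.name value."""
    return _PROVIDER_NAMES.get(internal, internal)


def inference_span_name(operation_name: str, model: str) -> str:
    """Inference spans are named '{operation_name} {model}'."""
    return f"{operation_name} {model}"


def tool_span_name(tool_name: str) -> str:
    """Tool-call spans are named 'execute_tool {tool_name}'."""
    return f"execute_tool {tool_name}"
```

With this shape, inference_span_name("chat", "gpt-4o-mini") yields chat gpt-4o-mini and provider_name("google") yields gcp.gemini, while unmapped names such as ollama fall through unchanged. Keeping the mapping as data rather than branching logic also makes it trivial to move into a separate contracts package later.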

Why

The previous schema used ad-hoc names (gen_ai.system, step.tokens, gen_ai.server.time_to_first_token) that drift from the OTel registry, blocking downstream consumers from using off-the-shelf dashboards. Centralizing the names also unblocks the planned agentloom-contracts package extraction: that work becomes a git mv instead of a grep-and-replace.

Closes #125

Testing

  • uv run pytest: 1095 passed
  • uv run ruff check src/ tests/: clean
  • uv run mypy src/: clean
  • Validated end-to-end against OpenAI gpt-4o-mini, Anthropic claude-haiku-4-5, and Google gemini-2.5-flash from CLI, Docker compose, and Kubernetes
  • Grafana stack brought up locally; 22 of 23 panels populated after running 5 diverse workflows
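The drift-detection regression test mentioned under What could work roughly like this: parse every module with ast, flag string literals that start with a telemetry prefix, and exempt schema.py. This is a sketch under assumptions; the prefixes, the function name find_raw_telemetry_literals, and the exemption rule are illustrative rather than the PR's actual test.

```python
import ast
from pathlib import Path

# Assumed prefixes; extend with span-name literals as needed.
TELEMETRY_PREFIXES: tuple[str, ...] = ("gen_ai.", "agentloom_")
SCHEMA_FILENAME = "schema.py"


def find_raw_telemetry_literals(src_root: Path) -> list[str]:
    """Return 'relative/path.py:LINE: literal' for each telemetry string
    literal found outside the schema module."""
    offenders: list[str] = []
    for path in sorted(src_root.rglob("*.py")):
        # Simplification: any file named schema.py is exempt.
        if path.name == SCHEMA_FILENAME:
            continue
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if (
                isinstance(node, ast.Constant)
                and isinstance(node.value, str)
                and node.value.startswith(TELEMETRY_PREFIXES)
            ):
                rel = path.relative_to(src_root).as_posix()
                offenders.append(f"{rel}:{node.lineno}: {node.value}")
    return offenders
```

In a test suite, the assertion would simply be that find_raw_telemetry_literals(Path("src/agentloom")) returns an empty list (the path is an assumption). Walking the AST instead of grepping avoids false positives from comments.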

Notes

Zero backwards compatibility retained. The legacy on_provider_call observer hook and the tokens positional argument on on_step_end are removed; the previous agentloom_tokens_total / agentloom_provider_latency_seconds / agentloom_time_to_first_token_seconds metrics no longer exist. AgentLoom-specific metrics (workflow lifecycle, cost, circuit breaker, webhooks, approvals, recordings) keep the agentloom_* prefix as application namespace per OTel naming guidance.
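For dashboards or alerts built on the removed metrics, the What and Notes sections imply the replacements below. The sketch also records tokens the way the new histogram does, one observation per gen_ai.token.type, using a toy in-memory histogram as a stand-in for an OTel Histogram instrument (an assumption made to keep the example dependency-free). ToyHistogram and record_token_usage are hypothetical names.

```python
from collections import defaultdict

# Removed metric -> replacement, as listed in this PR. Exported names can
# differ per backend (for example, Prometheus exporters replace dots with
# underscores).
LEGACY_METRIC_REPLACEMENTS: dict[str, str] = {
    "agentloom_tokens_total": "gen_ai.client.token.usage",
    "agentloom_provider_latency_seconds": "gen_ai.client.operation.duration",
    "agentloom_time_to_first_token_seconds": "gen_ai.client.operation.time_to_first_chunk",
}

AttrKey = tuple[tuple[str, str], ...]


class ToyHistogram:
    """Dependency-free stand-in for an OTel Histogram instrument."""

    def __init__(self) -> None:
        self.observations: defaultdict[AttrKey, list[int]] = defaultdict(list)

    def record(self, value: int, attributes: dict[str, str]) -> None:
        # One series per distinct attribute set.
        key = tuple(sorted(attributes.items()))
        self.observations[key].append(value)

    def total(self) -> int:
        """Sum over all series: what the old token counter reported."""
        return sum(sum(values) for values in self.observations.values())


def record_token_usage(
    hist: ToyHistogram, provider: str, model: str, input_tokens: int, output_tokens: int
) -> None:
    base = {"gen_ai.provider.name": provider, "gen_ai.request.model": model}
    hist.record(input_tokens, {**base, "gen_ai.token.type": "input"})
    hist.record(output_tokens, {**base, "gen_ai.token.type": "output"})
```

Because input and output tokens land in separate series, a panel that previously read agentloom_tokens_total needs to sum the histogram across gen_ai.token.type to show the same total.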

Copilot AI review requested due to automatic review settings May 2, 2026 18:07
@cchinchilla-dev added the enhancement (New feature or request) and observability (Tracing, metrics, logging) labels May 2, 2026
@github-actions Bot added the documentation (Documentation improvements), providers (Provider gateway and adapters), infrastructure (Deployment, Docker, K8s, Helm, Terraform), and core (Core engine, DAG, state) labels May 2, 2026
@codecov Bot commented May 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Copilot AI left a comment


Pull request overview

This PR modernizes AgentLoom's observability layer to use OpenTelemetry GenAI semantic conventions, centralizing telemetry names and expanding trace/metric coverage so external OTel backends can consume AgentLoom data with less custom relabeling.

Changes:

  • Adds a centralized observability schema module for span names, span attributes, metric names, and provider-name translation.
  • Refactors observer, gateway, engine, and LLM step code to emit GenAI-aligned spans/attributes/metrics, including provider-attempt spans and prompt metadata.
  • Updates tests, docs, changelog, and Grafana queries to reflect the new telemetry surface and breaking hook/signature changes.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| tests/providers/test_gateway.py | Adds fallback-span and step_id handling tests for gateway completions. |
| tests/observability/test_schema.py | Adds regression tests for centralized schema constants and observer behavior. |
| tests/observability/test_observer.py | Updates observer tests for renamed semantic-convention attributes/hooks. |
| tests/observability/test_noop.py | Updates noop observer coverage for new hook surface. |
| tests/observability/test_metrics.py | Updates metric tests for histogram-based GenAI telemetry. |
| tests/core/test_engine_integration.py | Adjusts engine/observer lifecycle assertions for run_id and hook changes. |
| src/agentloom/steps/llm_call.py | Adds prompt metadata generation, prompt-capture events, and forwards step_id to gateway. |
| src/agentloom/steps/base.py | Extends StepContext with capture_prompts. |
| src/agentloom/providers/gateway.py | Adds per-attempt provider observer hooks for complete/stream paths and consumes step_id. |
| src/agentloom/observability/schema.py | Introduces centralized span/attribute/metric constants and provider-name mapping. |
| src/agentloom/observability/observer.py | Refactors tracing/metrics emission to the new schema and adds provider start/end spans. |
| src/agentloom/observability/noop.py | Mirrors the expanded observer API with no-op implementations. |
| src/agentloom/observability/metrics.py | Renames GenAI metrics, switches token counting to histograms, and rewrites emitted attributes. |
| src/agentloom/core/results.py | Adds PromptMetadata and threads it through StepResult. |
| src/agentloom/core/protocols.py | Removes old observer hook surface from the protocol. |
| src/agentloom/core/models.py | Adds workflow-level capture_prompts config. |
| src/agentloom/core/engine.py | Passes run_id, prompt metadata, and new token fields into observer hooks. |
| docs/workflow-yaml.md | Documents the new capture_prompts workflow config. |
| docs/observability.md | Documents the new span hierarchy, attributes, and metric names. |
| deploy/grafana/dashboards/agentloom.json | Rewrites dashboard queries to use renamed metrics and labels. |
| CHANGELOG.md | Documents the breaking observability schema changes and new telemetry behavior. |
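The prompt-metadata path summarized above (PromptMetadata on StepResult, capture_prompts on StepContext) might look like this minimal sketch. The field names, the choice of SHA-256, recording only template variable names, and the event name prompt.captured are all assumptions; the PR only states that hash, length, template id, and template vars are captured by default, and that full text is emitted as a span event when capture_prompts is enabled.

```python
import hashlib
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class PromptMetadata:
    """Hypothetical shape; AgentLoom's real field names may differ."""

    prompt_hash: str
    prompt_length: int
    template_id: Optional[str] = None
    template_vars: tuple[str, ...] = ()


def build_prompt_metadata(
    prompt: str,
    template_id: Optional[str] = None,
    template_vars: Optional[Mapping[str, Any]] = None,
) -> PromptMetadata:
    """Always-on metadata: no prompt text leaves the process."""
    return PromptMetadata(
        prompt_hash=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        prompt_length=len(prompt),  # length in characters
        template_id=template_id,
        # Assumption: keep variable names only, since values may hold user data.
        template_vars=tuple(sorted(template_vars or {})),
    )


def prompt_span_events(
    prompt: str, capture_prompts: bool
) -> list[tuple[str, dict[str, str]]]:
    """Full text is opt-in and goes into a span event, not an attribute."""
    if not capture_prompts:
        return []
    # "prompt.captured" is a hypothetical event name.
    return [("prompt.captured", {"content": prompt})]
```

A stable hash lets identical prompts be grouped across runs without storing their text, which is why it can be safe to emit by default while the full prompt stays behind the capture_prompts flag.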
Comments suppressed due to low confidence (1)

src/agentloom/core/protocols.py:118

  • ObserverProtocol no longer describes the hooks that the gateway/steps actually use after this refactor (on_provider_call_start, on_provider_call_end, attach_step_event). That makes the protocol's own guarantee here inaccurate: an observer can satisfy ObserverProtocol and still silently miss the new observability callbacks.
    # Provider-level hooks called by engine + gateway. Listed here so an
    # ``isinstance(obs, ObserverProtocol)`` check fails for observers that
    # would crash mid-run on a missing method.

    def on_tokens(
        self,
        provider: str,
        model: str,
        prompt_tokens: int,
        completion_tokens: int,
        **kwargs: Any,
    ) -> None: ...

    def on_stream_response(
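The review comment's point rests on how runtime-checkable protocols behave: isinstance only checks that every listed method exists, never the signatures. A minimal sketch, with placeholder **kwargs signatures (the real ones live in src/agentloom/core/protocols.py) and hypothetical class names, shows why the new hooks need to be listed on the protocol for the check to catch observers that lack them.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ObserverProtocolSketch(Protocol):
    """Trimmed, hypothetical protocol listing the hooks the review names."""

    def on_provider_call_start(self, **kwargs: Any) -> None: ...

    def on_provider_call_end(self, **kwargs: Any) -> None: ...

    def attach_step_event(self, **kwargs: Any) -> None: ...


class LegacyObserver:
    """Implements only a pre-refactor hook."""

    def on_tokens(self, **kwargs: Any) -> None:
        pass


class CurrentObserver(LegacyObserver):
    """Adds the hooks the gateway and steps now call."""

    def on_provider_call_start(self, **kwargs: Any) -> None:
        pass

    def on_provider_call_end(self, **kwargs: Any) -> None:
        pass

    def attach_step_event(self, **kwargs: Any) -> None:
        pass
```

Here isinstance(LegacyObserver(), ObserverProtocolSketch) is False and isinstance(CurrentObserver(), ObserverProtocolSketch) is True. Because signatures are not checked, the isinstance guard catches missing methods but not mismatched parameters; mypy covers the latter statically.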

Comment thread src/agentloom/providers/gateway.py
Comment thread src/agentloom/core/engine.py
Comment thread src/agentloom/observability/metrics.py
Comment thread src/agentloom/observability/metrics.py
Comment thread src/agentloom/steps/llm_call.py Outdated
Comment thread deploy/grafana/dashboards/agentloom.json Outdated
Comment thread deploy/grafana/dashboards/agentloom.json Outdated
Comment thread deploy/grafana/dashboards/agentloom.json Outdated
Comment thread deploy/grafana/dashboards/agentloom.json Outdated
Comment thread src/agentloom/observability/metrics.py
@github-actions Bot added the cli (CLI commands) label May 2, 2026
@cchinchilla-dev merged commit 74ec02c into main May 2, 2026
10 checks passed
@cchinchilla-dev deleted the feat/otel-schema-125 branch May 2, 2026 19:24

Labels

  • cli: CLI commands
  • core: Core engine, DAG, state
  • documentation: Documentation improvements
  • enhancement: New feature or request
  • infrastructure: Deployment, Docker, K8s, Helm, Terraform
  • observability: Tracing, metrics, logging
  • providers: Provider gateway and adapters

Development

Successfully merging this pull request may close these issues.

expand OTel span schema, align with GenAI semantic conventions, centralize names

2 participants