feat(otel): adopt OTel GenAI semantic conventions for trace attributes #298

@christso

Context

AgentV's OTel exporter (#277) currently uses custom agentv.* attribute namespaces (e.g. agentv.trace.duration_ms, agentv.trace.cost_usd, agentv.test_id). The OpenTelemetry community is converging on GenAI semantic conventions that observability platforms are increasingly adopting.

Using non-standard attribute names means AgentV traces require custom dashboards and queries on every backend instead of populating the built-in GenAI views that platforms such as Langfuse, Braintrust, and Datadog provide.

Note: GenAI semantic conventions are currently experimental (status: Development), so attribute names may change before stabilization. This is acceptable: early adoption positions AgentV correctly, and attribute names can be updated in lockstep with the spec.

Exact attribute mapping

Based on the GenAI Spans spec (verified 2026-02-22):

Root eval span (agentv.eval)

| Current (AgentV) | Spec attribute | Type | Required | Action |
| --- | --- | --- | --- | --- |
| (missing) | gen_ai.operation.name | string | Required | Add; value: "evaluate" |
| (missing) | gen_ai.provider.name | string | Required | Add; value from target config (e.g. "anthropic", "azure") |
| (missing) | gen_ai.request.model | string | Conditionally required | Add; value from target config |
| agentv.trace.cost_usd | (no spec equivalent) | double | | Keep as agentv.*; GenAI spec has no cost attribute |
| agentv.trace.event_count | (no spec equivalent) | int | | Keep as agentv.* |
| agentv.trace.llm_call_count | (no spec equivalent) | int | | Keep as agentv.* (or derive from child span count) |
| agentv.test_id | (no spec equivalent) | string | | Keep as agentv.*; eval-specific |
| agentv.target | (no spec equivalent) | string | | Keep as agentv.*; eval-specific |
| agentv.dataset | (no spec equivalent) | string | | Keep as agentv.*; eval-specific |
| agentv.score | (no spec equivalent) | double | | Keep as agentv.*; eval-specific |
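
As a rough illustration of the root span mapping, the attributes could be assembled along these lines. This is only a sketch: the TargetConfig and EvalSummary shapes and the buildRootSpanAttributes helper are hypothetical names invented for the example, not existing AgentV types.

```typescript
// Hypothetical input shapes; AgentV's real config and result types may differ.
interface TargetConfig {
  provider: string; // e.g. "anthropic", "azure"
  model?: string;   // may be unknown for some targets
}

interface EvalSummary {
  testId: string;
  target: string;
  score: number;
  dataset?: string;
  costUsd?: number;
}

// Builds root span attributes: required gen_ai.* keys plus retained agentv.* keys.
function buildRootSpanAttributes(
  target: TargetConfig,
  summary: EvalSummary,
): Record<string, string | number> {
  const attrs: Record<string, string | number> = {
    'gen_ai.operation.name': 'evaluate',
    'gen_ai.provider.name': target.provider,
    'agentv.test_id': summary.testId,
    'agentv.target': summary.target,
    'agentv.score': summary.score,
  };
  // Conditionally required: only set when the model is actually known.
  if (target.model !== undefined) attrs['gen_ai.request.model'] = target.model;
  if (summary.dataset !== undefined) attrs['agentv.dataset'] = summary.dataset;
  if (summary.costUsd !== undefined) attrs['agentv.trace.cost_usd'] = summary.costUsd;
  return attrs;
}
```

Optional values are omitted rather than set to empty strings, so backends do not index placeholder data.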

Child LLM spans (gen_ai.generation)

| Current (AgentV) | Spec attribute | Type | Required | Action |
| --- | --- | --- | --- | --- |
| gen_ai.request.model | gen_ai.request.model | string | Conditionally required | Already correct |
| gen_ai.duration_ms | (use standard span timing) | | | Remove; OTel spans have native start/end time |
| gen_ai.content | gen_ai.output.messages | any | Opt-in | Rename; gate behind --otel-capture-content |
| (missing) | gen_ai.operation.name | string | Required | Add; value: "chat" |
| (missing) | gen_ai.provider.name | string | Required | Add; value from provider |
| (missing) | gen_ai.response.model | string | Recommended | Add; actual model that responded |
| (missing) | gen_ai.response.finish_reasons | string[] | Recommended | Add if available from provider |
| (missing) | gen_ai.usage.input_tokens | int | Recommended | Add (see #299) |
| (missing) | gen_ai.usage.output_tokens | int | Recommended | Add (see #299) |
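
The LLM span changes might be expressed as a small migration over the existing attribute bag. The sketch below assumes a hypothetical LlmSpanOptions object whose captureContent field mirrors --otel-capture-content; the function name and option names are illustrative only.

```typescript
// Hypothetical options; captureContent is assumed to mirror --otel-capture-content.
interface LlmSpanOptions {
  providerName: string;
  captureContent: boolean;
  responseModel?: string;
}

// Maps legacy LLM span attributes to the spec names from the table above.
function migrateLlmSpanAttributes(
  legacy: Record<string, unknown>,
  opts: LlmSpanOptions,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(legacy)) {
    // Dropped: native span start/end time already carries the duration.
    if (key === 'gen_ai.duration_ms') continue;
    if (key === 'gen_ai.content') {
      // Renamed, and only emitted when content capture is opted in.
      if (opts.captureContent) out['gen_ai.output.messages'] = value;
      continue;
    }
    out[key] = value;
  }
  out['gen_ai.operation.name'] = 'chat';
  out['gen_ai.provider.name'] = opts.providerName;
  if (opts.responseModel !== undefined) {
    out['gen_ai.response.model'] = opts.responseModel;
  }
  return out;
}
```

With captureContent set to false, neither gen_ai.content nor gen_ai.output.messages appears in the result, which matches the opt-in requirement in the table.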

Child tool spans (gen_ai.tool)

| Current (AgentV) | Spec attribute | Action |
| --- | --- | --- |
| gen_ai.tool.name | gen_ai.tool.name | Already correct |
| gen_ai.tool.call.id | gen_ai.tool.call.id | Already correct |
| gen_ai.tool.input | gen_ai.tool.call.arguments | Rename |
| gen_ai.tool.output | gen_ai.tool.call.result | Rename |
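
The two tool renames are simple enough to drive from a lookup table. A minimal sketch follows, where TOOL_ATTRIBUTE_RENAMES and renameToolAttributes are hypothetical names chosen for the example:

```typescript
// Legacy key -> spec key, taken from the tool span table above.
const TOOL_ATTRIBUTE_RENAMES: Readonly<Record<string, string>> = {
  'gen_ai.tool.input': 'gen_ai.tool.call.arguments',
  'gen_ai.tool.output': 'gen_ai.tool.call.result',
};

// Returns a copy with legacy keys renamed; all other keys pass through unchanged.
function renameToolAttributes(legacy: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(legacy)) {
    out[TOOL_ATTRIBUTE_RENAMES[key] ?? key] = value;
  }
  return out;
}
```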

Span naming

Current: gen_ai.generation, gen_ai.message.{role}, gen_ai.tool
Spec: {gen_ai.operation.name} {gen_ai.request.model} for LLM spans, execute_tool {gen_ai.tool.name} for tools

| Current span name | Spec span name | Action |
| --- | --- | --- |
| gen_ai.generation | chat {model} (e.g. chat claude-sonnet-4-5-20250929) | Update |
| gen_ai.message.user | No standard; keep or drop | Keep as-is |
| gen_ai.tool | execute_tool {tool_name} (e.g. execute_tool Read) | Update |
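
The naming rules above reduce to a small pure function. The sketch below uses a hypothetical SpanDescriptor union to distinguish LLM and tool spans; it is not a type from the exporter.

```typescript
// Hypothetical descriptor; the exporter's internal span model may look different.
type SpanDescriptor =
  | { kind: 'llm'; operationName: string; requestModel: string }
  | { kind: 'tool'; toolName: string };

// Derives the spec span name: "{operation} {model}" or "execute_tool {tool}".
function specSpanName(span: SpanDescriptor): string {
  switch (span.kind) {
    case 'llm':
      return `${span.operationName} ${span.requestModel}`;
    case 'tool':
      return `execute_tool ${span.toolName}`;
  }
}

console.log(specSpanName({ kind: 'llm', operationName: 'chat', requestModel: 'claude-sonnet-4-5-20250929' }));
// chat claude-sonnet-4-5-20250929
console.log(specSpanName({ kind: 'tool', toolName: 'Read' }));
// execute_tool Read
```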

Files to modify

  1. packages/core/src/observability/otel-exporter.ts — All attribute and span name changes
  2. packages/core/src/observability/types.ts — Add providerName to OtelExportOptions if needed
  3. packages/core/test/observability/otel-exporter.test.ts — Update assertions

Acceptance criteria

  • Root span has required gen_ai.operation.name and gen_ai.provider.name attributes
  • LLM child spans use chat {model} naming convention
  • Tool child spans use execute_tool {tool_name} naming convention
  • Tool spans use gen_ai.tool.call.arguments / gen_ai.tool.call.result (not .input / .output)
  • Content attributes gated behind --otel-capture-content use gen_ai.output.messages (not gen_ai.content)
  • gen_ai.duration_ms removed (redundant with native span timing)
  • All agentv.* eval-specific attributes retained unchanged
  • Existing tests updated, no regressions
  • Traces verified manually against Langfuse GenAI dashboard (confirm they populate built-in views)
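
Several of the criteria above amount to "these legacy keys must be gone". One way to check that in a test is sketched below; LEGACY_GEN_AI_KEYS and findLegacyAttributeKeys are hypothetical helpers, not part of the exporter.

```typescript
// Keys this issue removes or renames; agentv.* keys are intentionally not listed.
const LEGACY_GEN_AI_KEYS = [
  'gen_ai.duration_ms',
  'gen_ai.content',
  'gen_ai.tool.input',
  'gen_ai.tool.output',
] as const;

// Returns the legacy keys still present on a span's attributes (empty means compliant).
function findLegacyAttributeKeys(attributes: Record<string, unknown>): string[] {
  return LEGACY_GEN_AI_KEYS.filter((key) => key in attributes);
}
```

A test could call this for every finished span and expect an empty array, while separately asserting that the agentv.* attributes are still present.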

Testing Approach

Unit Tests (InMemorySpanExporter)

Use InMemorySpanExporter from @opentelemetry/sdk-trace-base to capture spans in-process and assert attributes:

import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';

const exporter = new InMemorySpanExporter();
// Wire into OtelTraceExporter with this exporter, e.g. wrapped in new SimpleSpanProcessor(exporter)
// Run a mock eval case through exportResult()

const spans = exporter.getFinishedSpans();

// Root span keeps the agentv.eval name and gains the required GenAI attributes
const root = spans.find(s => s.name === 'agentv.eval');
expect(root?.attributes['gen_ai.operation.name']).toBe('evaluate');
expect(root?.attributes['gen_ai.provider.name']).toBeDefined();

// LLM spans are named "chat {model}", so match on the prefix
const genSpan = spans.find(s => s.name.startsWith('chat '));
expect(genSpan?.attributes['gen_ai.operation.name']).toBe('chat');
expect(genSpan?.attributes['gen_ai.request.model']).toBeDefined();

Manual Validation (Jaeger)

docker run -d -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest
# Run: agentv eval examples/features/tool-trajectory-simple/evals/dataset.eval.yaml --target mock_agent --export-otel
# Open http://localhost:16686 — verify span names, attribute keys match GenAI conventions

What to Assert

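  • Root span (agentv.eval) has gen_ai.operation.name: evaluate and gen_ai.provider.name
  • All gen_ai.* attributes present per mapping table above
  • No legacy gen_ai.* names remain (gen_ai.duration_ms, gen_ai.content, gen_ai.tool.input, gen_ai.tool.output); agentv.* eval-specific attributes are retained
  • Tool spans are named execute_tool {tool_name}
  • gen_ai.provider.name set on root and LLM spans (the value comes from the target config or provider, not a fixed "agentv")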
