Context
AgentV's OTel exporter (#277) currently uses a custom agentv.* attribute namespace (e.g. agentv.trace.duration_ms, agentv.trace.cost_usd, agentv.test_id). The OpenTelemetry community is converging on GenAI semantic conventions that observability platforms are increasingly adopting.
Using non-standard attribute names means AgentV traces require custom dashboards/queries on every backend rather than working with built-in GenAI views (Langfuse, Braintrust, Datadog, etc.).
Note: GenAI semantic conventions are currently experimental (status: Development), so attribute names may change before stabilization. This is acceptable: early adoption positions AgentV correctly, and attribute names can be updated in lockstep with the spec.
Exact attribute mapping
Based on the GenAI Spans spec (verified 2026-02-22):
Root eval span (agentv.eval)
| Current (AgentV) | Spec attribute | Type | Required | Action |
|---|---|---|---|---|
| (missing) | `gen_ai.operation.name` | string | Required | Add; value: `"evaluate"` |
| (missing) | `gen_ai.provider.name` | string | Required | Add; value from target config (e.g. `"anthropic"`, `"azure"`) |
| (missing) | `gen_ai.request.model` | string | Conditionally required | Add; value from target config |
| `agentv.trace.cost_usd` | (no spec equivalent) | double | n/a | Keep as `agentv.*`; GenAI spec has no cost attribute |
| `agentv.trace.event_count` | (no spec equivalent) | int | n/a | Keep as `agentv.*` |
| `agentv.trace.llm_call_count` | (no spec equivalent) | int | n/a | Keep as `agentv.*` (or derive from child span count) |
| `agentv.test_id` | (no spec equivalent) | string | n/a | Keep as `agentv.*` (eval-specific) |
| `agentv.target` | (no spec equivalent) | string | n/a | Keep as `agentv.*` (eval-specific) |
| `agentv.dataset` | (no spec equivalent) | string | n/a | Keep as `agentv.*` (eval-specific) |
| `agentv.score` | (no spec equivalent) | double | n/a | Keep as `agentv.*` (eval-specific) |
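As a rough illustration of the root-span mapping, the attribute set could be assembled by a small pure function. This is only a sketch: `EvalSummary`, `TargetInfo`, and `buildRootEvalAttributes` are hypothetical names invented here, not AgentV's actual types or exporter API.

```typescript
// Hypothetical input shapes for illustration; not AgentV's actual types.
type Attributes = Record<string, string | number>;

interface EvalSummary {
  testId: string;
  target: string;
  dataset?: string;
  score: number;
  costUsd?: number;
  eventCount?: number;
  llmCallCount?: number;
}

interface TargetInfo {
  providerName: string; // e.g. "anthropic", "azure"
  model?: string; // gen_ai.request.model is only conditionally required
}

function buildRootEvalAttributes(summary: EvalSummary, target: TargetInfo): Attributes {
  const attrs: Attributes = {
    // Spec attributes added by this change
    "gen_ai.operation.name": "evaluate",
    "gen_ai.provider.name": target.providerName,
    // Eval-specific attributes stay in the agentv.* namespace
    "agentv.test_id": summary.testId,
    "agentv.target": summary.target,
    "agentv.score": summary.score,
  };
  // Optional values are set only when known, so missing data is simply absent
  const optional: Array<[string, string | number | undefined]> = [
    ["gen_ai.request.model", target.model],
    ["agentv.dataset", summary.dataset],
    ["agentv.trace.cost_usd", summary.costUsd],
    ["agentv.trace.event_count", summary.eventCount],
    ["agentv.trace.llm_call_count", summary.llmCallCount],
  ];
  for (const [key, value] of optional) {
    if (value !== undefined) attrs[key] = value;
  }
  return attrs;
}
```

Keeping the builder free of OTel imports makes the mapping easy to unit test on its own; the resulting record can then be handed to whatever span-creation call the exporter already uses.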
Child LLM spans (gen_ai.generation)
| Current (AgentV) | Spec attribute | Type | Required | Action |
|---|---|---|---|---|
| `gen_ai.request.model` | `gen_ai.request.model` | string | Conditionally required | Already correct |
| `gen_ai.duration_ms` | (use standard span timing) | n/a | n/a | Remove; OTel spans have native start/end time |
| `gen_ai.content` | `gen_ai.output.messages` | any | Opt-in | Rename; gate behind `--otel-capture-content` |
| (missing) | `gen_ai.operation.name` | string | Required | Add; value: `"chat"` |
| (missing) | `gen_ai.provider.name` | string | Required | Add; value from provider |
| (missing) | `gen_ai.response.model` | string | Recommended | Add; actual model that responded |
| (missing) | `gen_ai.response.finish_reasons` | string[] | Recommended | Add if available from provider |
| (missing) | `gen_ai.usage.input_tokens` | int | Recommended | Add (see #299) |
| (missing) | `gen_ai.usage.output_tokens` | int | Recommended | Add (see #299) |
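The LLM-span mapping, including the opt-in content gate, might look like the sketch below. `LlmCall` and `buildLlmSpanAttributes` are hypothetical names, and serializing the messages to a JSON string is an assumption made here so the value fits a string-typed span attribute.

```typescript
// Hypothetical record of one LLM call; field names are illustrative.
interface LlmCall {
  provider: string;
  requestModel?: string;
  responseModel?: string;
  finishReasons?: string[];
  inputTokens?: number;
  outputTokens?: number;
  outputMessages?: unknown; // structured model output
}

type LlmAttributes = Record<string, string | number | string[]>;

function buildLlmSpanAttributes(call: LlmCall, captureContent: boolean): LlmAttributes {
  const attrs: LlmAttributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.provider.name": call.provider,
  };
  if (call.requestModel !== undefined) attrs["gen_ai.request.model"] = call.requestModel;
  if (call.responseModel !== undefined) attrs["gen_ai.response.model"] = call.responseModel;
  if (call.finishReasons !== undefined && call.finishReasons.length > 0) {
    attrs["gen_ai.response.finish_reasons"] = call.finishReasons;
  }
  if (call.inputTokens !== undefined) attrs["gen_ai.usage.input_tokens"] = call.inputTokens;
  if (call.outputTokens !== undefined) attrs["gen_ai.usage.output_tokens"] = call.outputTokens;
  // Content is opt-in: emitted only when the capture flag is on.
  // JSON serialization is an assumption of this sketch.
  if (captureContent && call.outputMessages !== undefined) {
    attrs["gen_ai.output.messages"] = JSON.stringify(call.outputMessages);
  }
  // No gen_ai.duration_ms: the span's own start/end timestamps carry the duration.
  return attrs;
}
```

Here `captureContent` stands in for whatever option the `--otel-capture-content` flag sets; with it off, no message content reaches the backend.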
Child tool spans (gen_ai.tool)
| Current (AgentV) | Spec attribute | Action |
|---|---|---|
| `gen_ai.tool.name` | `gen_ai.tool.name` | Already correct |
| `gen_ai.tool.call.id` | `gen_ai.tool.call.id` | Already correct |
| `gen_ai.tool.input` | `gen_ai.tool.call.arguments` | Rename |
| `gen_ai.tool.output` | `gen_ai.tool.call.result` | Rename |
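The two tool-span renames could be expressed as a lookup table applied over an attribute record. A minimal sketch, with `TOOL_ATTRIBUTE_RENAMES` and `renameToolAttributes` as hypothetical helper names:

```typescript
// Old key -> spec key, taken from the tool-span mapping above.
const TOOL_ATTRIBUTE_RENAMES: Readonly<Record<string, string>> = {
  "gen_ai.tool.input": "gen_ai.tool.call.arguments",
  "gen_ai.tool.output": "gen_ai.tool.call.result",
};

function renameToolAttributes(attrs: Record<string, unknown>): Record<string, unknown> {
  const renamed: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(attrs)) {
    // Keys without a rename entry (e.g. gen_ai.tool.name) pass through unchanged
    const target = Object.prototype.hasOwnProperty.call(TOOL_ATTRIBUTE_RENAMES, key)
      ? TOOL_ATTRIBUTE_RENAMES[key]
      : key;
    renamed[target] = value;
  }
  return renamed;
}
```

In practice the exporter would more likely write the new keys directly; a table like this is mainly useful for migrating stored traces or for asserting that no old key survives.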
Span naming
Current: gen_ai.generation, gen_ai.message.{role}, gen_ai.tool
Spec: {gen_ai.operation.name} {gen_ai.request.model} for LLM spans, execute_tool {gen_ai.tool.name} for tools
| Current span name | Spec span name | Action |
|---|---|---|
| `gen_ai.generation` | `chat {model}` (e.g. `chat claude-sonnet-4-5-20250929`) | Update |
| `gen_ai.message.user` | No standard (keep or drop) | Keep as-is |
| `gen_ai.tool` | `execute_tool {tool_name}` (e.g. `execute_tool Read`) | Update |
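The naming rules reduce to two small string helpers. The function names below are hypothetical, and falling back to the bare operation name when no model is known is an assumption of this sketch rather than something stated above.

```typescript
// "{gen_ai.operation.name} {gen_ai.request.model}" for LLM spans.
// Assumption: when the model is unknown, use the operation name alone.
function llmSpanName(operation: string, model?: string): string {
  return model ? `${operation} ${model}` : operation;
}

// "execute_tool {gen_ai.tool.name}" for tool spans.
function toolSpanName(toolName: string): string {
  return `execute_tool ${toolName}`;
}
```

For example, `llmSpanName("chat", "claude-sonnet-4-5-20250929")` yields `chat claude-sonnet-4-5-20250929`, and `toolSpanName("Read")` yields `execute_tool Read`.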
Files to modify
packages/core/src/observability/otel-exporter.ts — All attribute and span name changes
packages/core/src/observability/types.ts — Add providerName to OtelExportOptions if needed
packages/core/test/observability/otel-exporter.test.ts — Update assertions
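Acceptance criteria
- Root and LLM spans include `gen_ai.operation.name` and `gen_ai.provider.name` attributes
- LLM spans use the `chat {model}` naming convention
- Tool spans use the `execute_tool {tool_name}` naming convention
- Tool spans use `gen_ai.tool.call.arguments` / `gen_ai.tool.call.result` (not `.input` / `.output`)
- Content attributes gated behind `--otel-capture-content` use `gen_ai.output.messages` (not `gen_ai.content`)
- `gen_ai.duration_ms` removed (redundant with native span timing)
- `agentv.*` eval-specific attributes retained unchanged
References
- `packages/core/src/observability/otel-exporter.ts`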
Testing Approach
Unit Tests (InMemorySpanExporter)
Use InMemorySpanExporter from @opentelemetry/sdk-trace-base to capture spans in-process and assert attributes:
import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
const exporter = new InMemorySpanExporter();
// Wire into OtelTraceExporter with this exporter (via a SimpleSpanProcessor)
// Run a mock eval case through exportResult()
const spans = exporter.getFinishedSpans();
// Root eval span: keeps its agentv.eval name, gains the required spec attributes
const root = spans.find(s => s.name === 'agentv.eval');
expect(root?.attributes['gen_ai.operation.name']).toBe('evaluate');
expect(root?.attributes['gen_ai.provider.name']).toBeDefined();
// LLM spans are named "chat {model}"
const genSpan = spans.find(s => s.name.startsWith('chat '));
expect(genSpan?.attributes['gen_ai.operation.name']).toBe('chat');
expect(genSpan?.attributes['gen_ai.request.model']).toBeDefined();
Manual Validation (Jaeger)
docker run -d -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest
# Run: agentv eval examples/features/tool-trajectory-simple/evals/dataset.eval.yaml --target mock_agent --export-otel
# Open http://localhost:16686 — verify span names, attribute keys match GenAI conventions
What to Assert
- Span names follow the spec pattern (`chat {model}` for LLM spans, `execute_tool {tool_name}` for tool spans)
- `gen_ai.*` attributes present per the mapping tables above
- `agentv.*` attributes remain
- Tool spans carry `gen_ai.operation.name`: `execute_tool`
- `gen_ai.provider.name` set on root and LLM spans
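Checks of this kind could be factored into two small helpers that work on any object with a `name` and an `attributes` record, which finished spans from an in-memory exporter provide. This is only a sketch: `SpanLike`, `missingAttributes`, and `legacyAttributes` are hypothetical names, and the legacy key list is taken from the attributes this issue removes or renames.

```typescript
// Minimal structural view of a finished span; real spans expose more fields.
interface SpanLike {
  name: string;
  attributes: Record<string, unknown>;
}

// Attribute keys that this change removes or renames.
const LEGACY_KEYS: readonly string[] = [
  "gen_ai.duration_ms",
  "gen_ai.content",
  "gen_ai.tool.input",
  "gen_ai.tool.output",
];

// Returns the required keys that the span does not carry.
function missingAttributes(span: SpanLike, required: readonly string[]): string[] {
  return required.filter((key) => span.attributes[key] === undefined);
}

// Returns any pre-migration keys still present on the span.
function legacyAttributes(span: SpanLike): string[] {
  return LEGACY_KEYS.filter((key) => key in span.attributes);
}
```

In a unit test, asserting that both helpers return an empty array for every captured span gives a failure message that names the offending keys directly.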