
feat(otel): richer span hierarchy — Turn-level grouping for multi-turn evals #302

@christso


Context

AgentV's OTel exporter currently creates a flat span hierarchy, with every LLM message and user message attached directly to the root eval span:

agentv.eval (root)
├── chat claude-sonnet-4-5 (LLM message 1)
│   ├── execute_tool Read
│   └── execute_tool Edit
├── gen_ai.message.user (user message)
└── chat claude-sonnet-4-5 (LLM message 2)

For multi-turn agentic evaluations (Copilot CLI, Claude SDK), an intermediate Turn-level grouping makes it easier to see which LLM calls and tool calls belong to which user turn:

agentv.eval (root)
├── agentv.turn.1
│   ├── chat claude-sonnet-4-5
│   ├── execute_tool Read
│   └── chat claude-sonnet-4-5
├── agentv.turn.2
│   └── chat claude-sonnet-4-5

Proposal

Add an --otel-group-turns CLI flag that groups each eval's output messages into Turn spans.

Turn detection

A turn boundary occurs at each user message:

  1. First user message → start Turn 1
  2. All assistant messages and tool calls until the next user message → children of Turn 1
  3. Next user message → end Turn 1, start Turn 2

For single-turn evaluations (only one user message), the turn span is omitted and messages attach directly to the root span, as they do today.
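The turn-detection and single-turn rules above can be sketched as a pure function that decides, before any spans are created, whether to group at all and which message range each agentv.turn.N span covers. This is a hypothetical sketch: planTurnSpans, TurnPlan and the minimal Msg type are illustrative names, not existing AgentV code, and folding messages that precede the first user message (such as a system prompt) into Turn 1 is an assumption the proposal does not specify.

```typescript
// Hypothetical minimal message shape; the real AgentV Message type has more fields.
type Role = 'system' | 'user' | 'assistant' | 'tool';
interface Msg {
  role: Role;
}

// One planned turn span covering messages[start, end).
interface TurnPlan {
  spanName: string;
  start: number;
  end: number;
}

// Returns null when the output should stay flat (zero or one user message),
// otherwise one plan entry per user message.
function planTurnSpans(messages: readonly Msg[]): TurnPlan[] | null {
  const userIdx: number[] = [];
  messages.forEach((m, i) => {
    if (m.role === 'user') userIdx.push(i);
  });
  if (userIdx.length <= 1) return null; // single turn: attach messages to the root span

  return userIdx.map((start, t) => ({
    spanName: `agentv.turn.${t + 1}`,
    // Assumption: Turn 1 also absorbs anything before the first user message.
    start: t === 0 ? 0 : start,
    // A turn ends where the next user message begins, or at the end of the output.
    end: userIdx[t + 1] ?? messages.length,
  }));
}

const msgs: Msg[] = [
  { role: 'system' }, { role: 'user' }, { role: 'assistant' }, { role: 'tool' },
  { role: 'assistant' }, { role: 'user' }, { role: 'assistant' },
];
console.log(planTurnSpans(msgs));
// returns [{ spanName: 'agentv.turn.1', start: 0, end: 5 }, { spanName: 'agentv.turn.2', start: 5, end: 7 }]
```

Returning null for the single-turn case keeps the "omit the turn layer" decision in one place, so the exporter only has to branch on whether a plan exists.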

Implementation

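// In exportResult(), after the root span and parentCtx (the context carrying rootSpan) exist:
const messages = result.output ?? [];
const turns = this.options.groupTurns ? groupMessagesIntoTurns(messages) : [];

if (turns.length > 1) {
  // --otel-group-turns with a multi-turn output: one agentv.turn.N span per turn
  for (const [i, turn] of turns.entries()) {
    tracer.startActiveSpan(
      `agentv.turn.${i + 1}`,
      // Attribute name is a suggestion; it records the sequential turn number
      { attributes: { 'agentv.turn.index': i + 1 } },
      parentCtx, // parent the turn span explicitly to the root eval span
      (turnSpan) => {
        // Children need a context that carries turnSpan; passing the root
        // context here would attach them to the root and skip the turn layer
        const turnCtx = api.trace.setSpan(parentCtx, turnSpan);
        for (const msg of turn.messages) {
          this.exportMessage(tracer, api, turnCtx, msg, captureContent);
        }
        turnSpan.end();
      },
    );
  }
} else {
  // Flag off, or a single turn: flat hierarchy (current behavior)
  for (const msg of messages) {
    this.exportMessage(tracer, api, parentCtx, msg, captureContent);
  }
}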

groupMessagesIntoTurns helper

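interface Turn {
  messages: Message[];
}

// A new turn starts at every user message after the first one.
function groupMessagesIntoTurns(messages: readonly Message[]): Turn[] {
  const turns: Turn[] = [];
  let current: Message[] = [];
  let currentHasUser = false;

  for (const msg of messages) {
    if (msg.role === 'user') {
      // Only close the current turn once it already holds a user message, so a
      // leading system message does not become a turn of its own (which would
      // wrongly make a single-turn eval look multi-turn)
      if (currentHasUser) {
        turns.push({ messages: current });
        current = [];
      }
      currentHasUser = true;
    }
    current.push(msg);
  }
  if (current.length > 0) turns.push({ messages: current });
  return turns;
}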

Files to modify

  1. packages/core/src/observability/otel-exporter.ts: add the groupTurns option and turn-grouping logic
  2. packages/core/src/observability/types.ts: add groupTurns?: boolean to OtelExportOptions
  3. apps/cli/src/commands/eval/commands/run.ts: add the --otel-group-turns flag
  4. Tests: verify turn grouping with multi-turn and single-turn inputs

Acceptance criteria

  • --otel-group-turns creates intermediate agentv.turn.N spans for multi-turn outputs
  • Single-turn outputs (one user message) skip the turn layer (flat, same as today)
  • Without --otel-group-turns, behavior is unchanged
  • Turn spans have correct parent (root eval span) and children (message/tool spans)
  • Skills and docs updated to mention --otel-group-turns


Testing Approach

Unit Tests (InMemorySpanExporter)

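import { InMemorySpanExporter } from '@opentelemetry/sdk-trace-base';

const exporter = new InMemorySpanExporter();
// Span names follow the hierarchy above. parentSpanId is the SDK 1.x field;
// on SDK 2.x read s.parentSpanContext?.spanId instead.

// Test 1: Default (flat), no groupTurns option
// Run multi-turn eval
const flatSpans = exporter.getFinishedSpans();
const flatRoot = flatSpans.find((s) => s.name === 'agentv.eval')!;
expect(flatSpans.some((s) => s.name.startsWith('agentv.turn.'))).toBe(false);
// Flat: every chat span is a direct child of the root (tool spans sit under chat spans)
const flatChats = flatSpans.filter((s) => s.name.startsWith('chat '));
expect(flatChats.length).toBeGreaterThan(0);
for (const s of flatChats) expect(s.parentSpanId).toBe(flatRoot.spanContext().spanId);

// Test 2: With groupTurns enabled
exporter.reset();
// Run same eval with groupTurns: true
const groupedSpans = exporter.getFinishedSpans();
// Look the root up again: the second run produces a new root span ID
const groupedRoot = groupedSpans.find((s) => s.name === 'agentv.eval')!;
const rootId = groupedRoot.spanContext().spanId;
const turnSpans = groupedSpans.filter((s) => s.name.startsWith('agentv.turn.'));
expect(turnSpans.length).toBeGreaterThan(1);
for (const t of turnSpans) expect(t.parentSpanId).toBe(rootId);
// Tool spans are nested under a turn span (directly or via a chat span), not the root
const turnIds = new Set(turnSpans.map((t) => t.spanContext().spanId));
const byId = new Map(groupedSpans.map((s) => [s.spanContext().spanId, s] as const));
const toolSpan = groupedSpans.find((s) => s.name.startsWith('execute_tool '))!;
expect(toolSpan.parentSpanId).not.toBe(rootId);
let ancestor = toolSpan.parentSpanId;
while (ancestor && !turnIds.has(ancestor)) ancestor = byId.get(ancestor)?.parentSpanId;
expect(turnIds.has(ancestor ?? '')).toBe(true);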

What to Assert

  • Default behavior unchanged (flat hierarchy, no turn spans)
  • --otel-group-turns creates intermediate turn spans
  • Tool/LLM spans nested under their respective turn span
  • Turn spans have sequential numbering in attributes
  • Root span unaffected by grouping option
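The checklist above can be folded into one reusable helper that works on plain span records, which keeps the assertions independent of the OTel SDK version. A minimal sketch, assuming the span names used in this issue (agentv.eval, agentv.turn.N, chat <model>, execute_tool <tool>); SpanRecord and checkTurnHierarchy are hypothetical names, and sequential numbering is checked on the span-name suffix rather than on an attribute.

```typescript
// Hypothetical, SDK-agnostic span record. From an OTel JS ReadableSpan:
// id = span.spanContext().spanId; parentId = span.parentSpanId (SDK 1.x)
// or span.parentSpanContext?.spanId (SDK 2.x).
interface SpanRecord {
  id: string;
  parentId?: string;
  name: string;
}

// Returns the list of violations; an empty list means the trace matches
// the grouped-turn acceptance criteria.
function checkTurnHierarchy(spans: readonly SpanRecord[]): string[] {
  const root = spans.find((s) => s.name === 'agentv.eval');
  if (!root) return ['missing agentv.eval root span'];

  const errors: string[] = [];
  const byId = new Map(spans.map((s) => [s.id, s] as const));
  const turns = spans.filter((s) => s.name.startsWith('agentv.turn.'));

  // Turn spans are numbered 1..N with no gaps
  const numbers = turns
    .map((t) => Number(t.name.slice('agentv.turn.'.length)))
    .sort((a, b) => a - b);
  numbers.forEach((n, i) => {
    if (n !== i + 1) errors.push(`turn numbering: expected ${i + 1}, got ${n}`);
  });

  // Turn spans are direct children of the root eval span
  for (const t of turns) {
    if (t.parentId !== root.id) errors.push(`${t.name} is not a child of the root span`);
  }

  // Every chat/tool span has a turn span somewhere among its ancestors
  const turnIds = new Set(turns.map((t) => t.id));
  for (const s of spans) {
    if (!s.name.startsWith('chat ') && !s.name.startsWith('execute_tool ')) continue;
    let p = s.parentId;
    while (p !== undefined && !turnIds.has(p)) p = byId.get(p)?.parentId;
    if (p === undefined) errors.push(`${s.name} (${s.id}) is not nested under a turn span`);
  }
  return errors;
}

const spans: SpanRecord[] = [
  { id: 'r', name: 'agentv.eval' },
  { id: 't1', parentId: 'r', name: 'agentv.turn.1' },
  { id: 'c1', parentId: 't1', name: 'chat claude-sonnet-4-5' },
  { id: 'x1', parentId: 'c1', name: 'execute_tool Read' },
  { id: 't2', parentId: 'r', name: 'agentv.turn.2' },
  { id: 'c2', parentId: 't2', name: 'chat claude-sonnet-4-5' },
];
console.log(checkTurnHierarchy(spans)); // []
```

The helper targets grouped output: on a flat trace it reports one violation per chat or tool span, since none of them has a turn ancestor.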
