feat: Claude Code session tracing plugin (hook-based observability) #300

@christso

Context

Braintrust's trace-claude-code plugin demonstrates hook-based instrumentation that traces all Claude Code sessions to an observability backend. AgentV currently only traces during agentv eval runs.

Per AgentV's Principle 1 (Lightweight Core, Plugin Extensibility), this belongs as a plugin, not core code. The plugin uses the core OtelTraceExporter for actual span export.

What Braintrust does

Five Claude Code lifecycle hooks create a Session > Turn > [Tool + LLM] span hierarchy:

| Hook | Creates | Key behavior |
|---|---|---|
| SessionStart | Root session span | Async, creates per-session state file |
| UserPromptSubmit | Turn span (child of session) | Increments turn counter |
| PostToolUse | Tool span (child of turn) | Smart naming: "Read: file.py", "Terminal: git status" |
| Stop | LLM span(s) (child of turn) | Parses JSONL transcript to reconstruct individual model calls |
| SessionEnd | Cleanup | Removes stale state files |

Key constraint: Braintrust's hooks are bash scripts using curl to hit a REST API. AgentV needs OTLP export, which requires the OTel SDK (TypeScript). This means AgentV's hooks should be TypeScript scripts (run via bun), not bash.

Proposal

Create a Claude Code plugin at plugins/agentv-trace/ that exports session traces via OTLP.

Plugin structure

plugins/agentv-trace/
├── .claude-plugin/
│   └── plugin.json
├── hooks/
│   ├── hooks.json            # Hook registration (all async)
│   ├── session-start.ts      # Root span via OtelTraceExporter
│   ├── user-prompt-submit.ts # Turn span
│   ├── post-tool-use.ts      # Tool span
│   ├── stop.ts               # LLM span(s) from transcript
│   └── session-end.ts        # Flush + cleanup
├── lib/
│   ├── state.ts              # Per-session state management
│   └── transcript-parser.ts  # JSONL transcript → LLM call extraction
├── setup.ts                  # Interactive config (backend, API keys)
└── package.json              # Depends on @agentv/core (for OtelTraceExporter)

hooks.json

{
  "hooks": [
    {"event": "SessionStart", "command": "bun run hooks/session-start.ts", "async": true},
    {"event": "UserPromptSubmit", "command": "bun run hooks/user-prompt-submit.ts", "async": true},
    {"event": "PostToolUse", "command": "bun run hooks/post-tool-use.ts", "matcher": "*", "async": true},
    {"event": "Stop", "command": "bun run hooks/stop.ts", "async": true},
    {"event": "SessionEnd", "command": "bun run hooks/session-end.ts", "async": true}
  ]
}
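
Each hook script receives its event payload as JSON on stdin. The following is a minimal sketch of that parsing step, not a definitive implementation. It assumes the payload carries `session_id` and `hook_event_name`, plus `tool_name` and `tool_input` for PostToolUse. The `HookInput` type and `parseHookInput` helper are hypothetical names, not part of Claude Code or AgentV.

```typescript
// Hypothetical shape of the JSON a hook receives on stdin.
// Field names are assumptions for illustration.
interface HookInput {
  session_id: string;
  hook_event_name: string;
  cwd?: string;
  transcript_path?: string;
  tool_name?: string;
  tool_input?: Record<string, unknown>;
}

// Parse raw stdin text. Returns null instead of throwing so that a
// malformed payload can never break the Claude Code session.
function parseHookInput(raw: string): HookInput | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const rec = data as Record<string, unknown>;
    if (typeof rec.session_id !== "string") return null;
    if (typeof rec.hook_event_name !== "string") return null;
    return rec as unknown as HookInput;
  } catch {
    return null;
  }
}
```

In a hook script run via bun, `raw` would typically come from reading all of stdin (for example `await Bun.stdin.text()`), after which the script creates its span and exits.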

Configuration

Environment variables (set during setup.ts or manually in .env):

AGENTV_TRACE_BACKEND=langfuse    # Uses OtelTraceExporter backend presets
LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...
# Or:
AGENTV_TRACE_BACKEND=braintrust
BRAINTRUST_API_KEY=bt-...
# Or custom:
AGENTV_TRACE_ENDPOINT=http://localhost:4318/v1/traces
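
As a sketch of how a hook might resolve these variables, the helper below returns either a backend preset or a custom endpoint, and `null` when nothing usable is configured (so the hook can skip export instead of crashing). `TraceConfig`, `REQUIRED_KEYS`, and `resolveTraceConfig` are hypothetical names. Letting `AGENTV_TRACE_BACKEND` take precedence over `AGENTV_TRACE_ENDPOINT` is an assumption, not something this issue specifies.

```typescript
// Hypothetical result type: a named backend preset or a raw OTLP endpoint.
type TraceConfig =
  | { kind: "preset"; backend: string }
  | { kind: "endpoint"; url: string };

type Env = Record<string, string | undefined>;

// Credentials each preset needs. Only the two backends shown above are listed.
const REQUIRED_KEYS: Record<string, string[]> = {
  langfuse: ["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"],
  braintrust: ["BRAINTRUST_API_KEY"],
};

function resolveTraceConfig(env: Env): TraceConfig | null {
  const backend = env.AGENTV_TRACE_BACKEND;
  if (backend) {
    const required = REQUIRED_KEYS[backend];
    if (!required) return null; // unknown preset
    const missing = required.filter((key) => !env[key]);
    return missing.length === 0 ? { kind: "preset", backend } : null;
  }
  // Assumption: the custom endpoint is only consulted when no preset is set.
  const url = env.AGENTV_TRACE_ENDPOINT;
  return url ? { kind: "endpoint", url } : null;
}
```

A hook would call `resolveTraceConfig(process.env)` once at startup and log a warning when the result is `null`.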

State management

Per-session state stored in ~/.claude/state/agentv-trace/{session_id}.json:

{
  "rootSpanId": "abc123",
  "currentTurnSpanId": "def456",
  "turnCount": 0,
  "toolCount": 0,
  "turnLastLine": 0
}
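
Because every hook runs as a separate process, span IDs have to be generated once and carried between hooks in this file. The sketch below shows one way the state could evolve across SessionStart, UserPromptSubmit, and PostToolUse. The helper names are hypothetical, and treating `toolCount` as a session-wide counter is an assumption.

```typescript
import { randomBytes } from "node:crypto";

// Mirrors the state file fields shown above.
interface SessionState {
  rootSpanId: string;
  currentTurnSpanId: string;
  turnCount: number;
  toolCount: number;
  turnLastLine: number;
}

// OTel span IDs are 8 bytes, rendered here as 16 lowercase hex characters.
function newSpanId(): string {
  return randomBytes(8).toString("hex");
}

// SessionStart: create the initial state with a fresh root span ID.
function initState(): SessionState {
  return {
    rootSpanId: newSpanId(),
    currentTurnSpanId: "",
    turnCount: 0,
    toolCount: 0,
    turnLastLine: 0,
  };
}

// UserPromptSubmit: open a new turn under the session root.
function startTurn(state: SessionState): SessionState {
  return {
    ...state,
    currentTurnSpanId: newSpanId(),
    turnCount: state.turnCount + 1,
  };
}

// PostToolUse: count one more tool call (assumed session-wide).
function recordTool(state: SessionState): SessionState {
  return { ...state, toolCount: state.toolCount + 1 };
}
```

Each function returns a new object instead of mutating its input, which keeps the read, update, write cycle in each hook easy to test.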

Use atomic file operations (write to temp file, rename) to avoid corruption from concurrent hooks.
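
A minimal sketch of the write-to-temp-then-rename pattern, using only Node's built-in `fs` and `path` modules (also available under bun). `writeStateAtomic` and `readState` are hypothetical helper names.

```typescript
import { mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Write JSON to a temp file in the same directory, then rename it over the
// target. A rename within one filesystem is atomic on POSIX, so readers see
// either the old file or the new one, never a partially written file.
function writeStateAtomic(path: string, state: unknown): void {
  const dir = dirname(path);
  mkdirSync(dir, { recursive: true });
  const tmp = join(dir, `.${process.pid}.${Date.now()}.tmp`);
  writeFileSync(tmp, JSON.stringify(state));
  renameSync(tmp, path);
}

// Read state back. Returns null when the file is missing or unparseable.
function readState<T>(path: string): T | null {
  try {
    return JSON.parse(readFileSync(path, "utf8")) as T;
  } catch {
    return null;
  }
}
```

Note that an atomic rename only prevents torn files. Two hooks that read, modify, and write the same state at the same time can still overwrite each other's update, so counters such as `toolCount` may need a lock file or a tolerance for occasional drift.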

Span hierarchy (using GenAI conventions from #298)

Session: "agentv session {workspace}" (kind=INTERNAL)
├── Turn 1 (gen_ai.operation.name="chat")
│   ├── chat claude-sonnet-4-5-20250929 (LLM span, with token metrics)
│   ├── execute_tool Read (tool span)
│   ├── execute_tool Edit (tool span)
│   └── chat claude-sonnet-4-5-20250929 (second LLM call)
├── Turn 2
│   └── ...
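
The naming in the hierarchy above could be produced by two small helpers like the following. This is a sketch only: the attribute keys come from the OpenTelemetry GenAI semantic conventions, the exact set adopted in #298 may differ, and `SpanDescriptor`, `toolSpan`, and `chatSpan` are hypothetical names.

```typescript
type Attributes = Record<string, string | number>;

interface SpanDescriptor {
  name: string;
  attributes: Attributes;
}

// Tool span: named "execute_tool {tool}", as in the hierarchy above.
function toolSpan(toolName: string): SpanDescriptor {
  return {
    name: `execute_tool ${toolName}`,
    attributes: {
      "gen_ai.operation.name": "execute_tool",
      "gen_ai.tool.name": toolName,
    },
  };
}

// LLM span: named "chat {model}", with token usage as span attributes.
function chatSpan(model: string, inputTokens: number, outputTokens: number): SpanDescriptor {
  return {
    name: `chat ${model}`,
    attributes: {
      "gen_ai.operation.name": "chat",
      "gen_ai.request.model": model,
      "gen_ai.usage.input_tokens": inputTokens,
      "gen_ai.usage.output_tokens": outputTokens,
    },
  };
}
```

Keeping name and attribute construction separate from span export makes the GenAI mapping unit-testable without an OTel SDK or a running backend.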

Phased implementation

| Phase | Scope | Complexity |
|---|---|---|
| Phase 1 | Session + Turn spans only | Low: just stdin JSON parsing + span creation |
| Phase 2 | Tool spans from PostToolUse | Low: tool name/input/output from stdin |
| Phase 3 | LLM spans from transcript parsing | High: parse ~/.claude/projects/.../session.jsonl, extract assistant messages, token usage, model info |

Start with Phase 1+2, defer Phase 3 to a follow-up.
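
To give a sense of what the deferred Phase 3 involves, here is a rough sketch of incremental transcript parsing. It assumes each transcript line is a JSON object of the form `{"type": "assistant", "message": {"model": ..., "usage": {"input_tokens": ..., "output_tokens": ...}}}`. That shape is an assumption and should be checked against real transcripts. `extractLlmCalls` is a hypothetical helper, and `fromLine` plays the role of `turnLastLine` in the state file.

```typescript
interface LlmCall {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Assumed shape of one transcript line; all fields optional for safety.
interface TranscriptEntry {
  type?: string;
  message?: {
    model?: string;
    usage?: { input_tokens?: number; output_tokens?: number };
  };
}

// Parse transcript text starting at line index `fromLine` (0-based).
// Returns the assistant calls found and the index to resume from next time.
function extractLlmCalls(jsonl: string, fromLine = 0): { calls: LlmCall[]; lastLine: number } {
  const calls: LlmCall[] = [];
  let lastLine = fromLine;
  jsonl.split("\n").slice(fromLine).forEach((rawLine, offset) => {
    const line = rawLine.trim();
    if (line === "") return;
    lastLine = fromLine + offset + 1;
    let entry: TranscriptEntry | null;
    try {
      entry = JSON.parse(line) as TranscriptEntry | null;
    } catch {
      return; // skip malformed lines
    }
    const message = entry?.message;
    if (entry?.type !== "assistant" || typeof message?.model !== "string") return;
    calls.push({
      model: message.model,
      inputTokens: message.usage?.input_tokens ?? 0,
      outputTokens: message.usage?.output_tokens ?? 0,
    });
  });
  return { calls, lastLine };
}
```

The Stop hook would read the transcript, call `extractLlmCalls(text, state.turnLastLine)`, emit one LLM span per returned call, and store `lastLine` back into the state so the next turn does not re-emit earlier calls.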

Acceptance criteria (Phase 1+2)

  • Plugin installs via Claude Code plugin marketplace
  • setup.ts configures backend and validates API connection
  • Session root span created on SessionStart
  • Turn spans created on each UserPromptSubmit
  • Tool spans created on each PostToolUse
  • All hooks are async (non-blocking)
  • Spans appear in configured backend (verified with Langfuse)
  • State files cleaned up for sessions older than 24h
  • Skills and docs updated
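
The 24h cleanup criterion could be met with something like the sketch below, run from the SessionEnd hook. It uses file modification time as the age signal, which is an assumption. `isStale` and `cleanupStaleState` are hypothetical names.

```typescript
import { readdirSync, statSync, unlinkSync } from "node:fs";
import { join } from "node:path";

const MAX_AGE_MS = 24 * 60 * 60 * 1000;

// Pure predicate, kept separate so it can be unit-tested without touching disk.
function isStale(mtimeMs: number, nowMs: number, maxAgeMs = MAX_AGE_MS): boolean {
  return nowMs - mtimeMs > maxAgeMs;
}

// Delete .json state files not modified within the last 24h.
// Returns the names of the files that were removed.
function cleanupStaleState(dir: string, nowMs = Date.now()): string[] {
  const removed: string[] = [];
  let names: string[];
  try {
    names = readdirSync(dir);
  } catch {
    return removed; // directory does not exist yet
  }
  for (const name of names) {
    if (!name.endsWith(".json")) continue;
    const file = join(dir, name);
    try {
      if (isStale(statSync(file).mtimeMs, nowMs)) {
        unlinkSync(file);
        removed.push(name);
      }
    } catch {
      // Another hook may have removed the file already; ignore.
    }
  }
  return removed;
}
```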

Testing Approach

Unit Tests

// Test 1: Hook script generates valid OTel spans
// Simulate Claude Code hook lifecycle: SessionStart → UserPromptSubmit → PostToolUse → Stop → SessionEnd
// Assert spans are created with correct GenAI attributes

// Test 2: Session span hierarchy
// Simulate: session start → turn 1 (tool calls) → turn 2 (tool calls) → session end
// Assert: session root span → turn spans → tool/LLM child spans

// Test 3: Plugin handles missing env vars gracefully
// Run hook with neither AGENTV_TRACE_BACKEND nor AGENTV_TRACE_ENDPOINT set
// Assert: no crash, warning logged, spans discarded

Integration Test (Jaeger)

docker run -d -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest

# Configure Claude Code hooks to use the plugin
# Run a Claude Code session
# Open http://localhost:16686 — verify session trace appears with tool spans

Manual Validation

# ConsoleSpanExporter mode for quick debugging
OTEL_TRACES_EXPORTER=console claude 'list files in current directory'
# Verify spans printed to stderr with correct session/turn/tool hierarchy

What to Assert

  • Plugin installs as Claude Code hooks (SessionStart, UserPromptSubmit, PostToolUse, Stop, SessionEnd)
  • Session root span created on first hook invocation
  • Turn spans group tool calls correctly
  • Spans use GenAI semantic conventions from #298 (feat(otel): adopt OTel GenAI semantic conventions for trace attributes)
  • Works with any OTel backend (Jaeger, Langfuse, Grafana Tempo)
  • Graceful degradation when no OTel endpoint configured
