feat(instrument): OpenTelemetry GenAI semantic conventions for all LLM-call adapters (spec 07) #125

Closed
mmercuri wants to merge 1 commit into feat/instrument-multitenancy-org-id-propagation from feat/instrument-otel-genai-semconv

Conversation

@mmercuri

Contributor

Summary

Implements spec 07-otel-genai-semantic-conventions.md: every LLM-call event emitted by every adapter (9 providers + 16 frameworks + embedding) is additively stamped with the canonical OpenTelemetry GenAI gen_ai.* attribute set, alongside the existing custom LayerLens attributes.

  • New shared helper: src/layerlens/instrument/adapters/_base/genai_semconv.py (constants + stamp_genai_attributes + detect_gen_ai_system)
  • Centralised hook in BaseAdapter.emit_dict_event covers every framework adapter automatically
  • Per-adapter explicit stamping in LLMProviderAdapter._emit_model_invoke and _emit_tool_calls for accurate gen_ai.system / gen_ai.tool.* resolution
  • Stamping is idempotent, additive (no legacy field removed), and safe (never raises on partial payloads)
  • Documentation: docs/adapters/otel-genai-conventions.md — contract + per-adapter coverage matrix
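The "idempotent, additive, safe" contract above can be illustrated with a minimal sketch. It assumes idempotence is achieved by never overwriting a key that is already present; the PR does not say how the real stamp_genai_attributes does it. The function stamp_sketch, its reduced signature, and the sample payload are hypothetical; only the attribute key spellings come from this PR.

```python
from typing import Any, Mapping, MutableMapping, Optional

# Attribute keys as listed in this PR. Everything else here is a
# hypothetical stand-in, not the real stamp_genai_attributes.
GEN_AI_SYSTEM = "gen_ai.system"
GEN_AI_REQUEST_MODEL = "gen_ai.request.model"
GEN_AI_REQUEST_TEMPERATURE = "gen_ai.request.temperature"


def stamp_sketch(
    payload: MutableMapping[str, Any],
    request_kwargs: Optional[Mapping[str, Any]] = None,
    *,
    system: str,
) -> None:
    """Add gen_ai.* keys in place without touching any existing key."""
    kwargs = request_kwargs or {}
    candidates = {
        GEN_AI_SYSTEM: system,
        GEN_AI_REQUEST_MODEL: kwargs.get("model", payload.get("model")),
        GEN_AI_REQUEST_TEMPERATURE: kwargs.get("temperature"),
    }
    for key, value in candidates.items():
        # Skip values we could not resolve (partial payloads) and never
        # overwrite a key that is already set, so a second call is a no-op.
        if value is not None and key not in payload:
            payload[key] = value


# Made-up legacy payload: the legacy fields must survive stamping.
event = {"model": "gpt-4o", "provider": "openai"}
stamp_sketch(event, {"temperature": 0.2}, system="openai")
first_pass = dict(event)

# A second call with different inputs leaves the payload unchanged.
stamp_sketch(event, {"temperature": 0.9}, system="anthropic")
```

After both calls, event still carries the legacy model and provider fields next to the gen_ai.* keys written by the first call.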

gen_ai.* Attribute Set Stamped

Required (always present on model.invoke):
gen_ai.system, gen_ai.provider.name, gen_ai.operation.name

Request: gen_ai.request.{model, max_tokens, temperature, top_p, top_k, frequency_penalty, presence_penalty, stop_sequences, seed, choice.count, encoding_formats}

Response: gen_ai.response.{id, model, finish_reasons}

Usage: gen_ai.usage.{input_tokens, output_tokens}

Tool: gen_ai.tool.{name, call.id, description, type}

Agent: gen_ai.agent.{id, name, description}

Provider-specific:

  • OpenAI: gen_ai.openai.{request.service_tier, request.response_format, response.service_tier, response.system_fingerprint}
  • Anthropic: gen_ai.anthropic.{cache_creation_input_tokens, cache_read_input_tokens}
  • AWS Bedrock (per spec §4.3 namespace): aws.bedrock.{guardrail.id, knowledge_base.id, agent.id}
  • Google Vertex: gen_ai.google.safety_ratings

Per-Adapter Wiring Coverage

| Layer | Adapters | Wiring path |
| --- | --- | --- |
| Providers (9) | OpenAI, AzureOpenAI, Anthropic, AWSBedrock, GoogleVertex, Cohere, Mistral, Ollama, LiteLLM | explicit stamp in _emit_model_invoke + centralised hook |
| Frameworks (16) | agno, autogen, bedrock_agents, crewai, google_adk, langchain, langfuse, langgraph, llama_index, ms_agent_framework, openai_agents, pydantic_ai, semantic_kernel, smolagents, strands, salesforce_agentforce | centralised hook (every emission goes through BaseAdapter.emit_dict_event) |
| Embedding | EmbeddingAdapter (embedding.create → gen_ai.operation.name=embeddings) | centralised hook |

Test plan

  • uv run pytest tests/instrument/adapters/_base/test_genai_semconv.py -x — 49 passed
  • uv run pytest tests/instrument/adapters/test_genai_semconv_per_adapter.py — 35 passed
  • uv run pytest tests/instrument/adapters/frameworks/ — 258 passed, 1 skipped (no regression)
  • uv run mypy --strict src/layerlens/instrument/adapters/_base/genai_semconv.py — clean
  • uv run ruff check — clean

CLAUDE.md Compliance

  • All LLM-call adapters wired (no "planned backlog" deferrals)
  • gen_ai.* attribute names match official OTel spec verbatim (constant table is the source of truth; TestGenAiAttributeNamesMatchSpec pins each one)
  • Additive contract preserved — no legacy attribute removed; tests assert side-by-side coexistence
  • No co-author trailers
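A constant-pinning check of the kind attributed to TestGenAiAttributeNamesMatchSpec could look roughly like the sketch below. The constant names and spellings are the ones quoted in this PR; the EXPECTED table and the helper find_mismatches are hypothetical and do not reproduce the real test class.

```python
from typing import Any, Dict, List, Mapping

# Constants as named in this PR, with the attribute spellings it lists.
GEN_AI_SYSTEM = "gen_ai.system"
GEN_AI_REQUEST_MODEL = "gen_ai.request.model"
GEN_AI_USAGE_INPUT_TOKENS = "gen_ai.usage.input_tokens"
GEN_AI_RESPONSE_FINISH_REASONS = "gen_ai.response.finish_reasons"
GEN_AI_TOOL_NAME = "gen_ai.tool.name"
AWS_BEDROCK_GUARDRAIL_ID = "aws.bedrock.guardrail.id"

# Hypothetical pin table: constant name -> spelling expected by the spec.
EXPECTED: Dict[str, str] = {
    "GEN_AI_SYSTEM": "gen_ai.system",
    "GEN_AI_REQUEST_MODEL": "gen_ai.request.model",
    "GEN_AI_USAGE_INPUT_TOKENS": "gen_ai.usage.input_tokens",
    "GEN_AI_RESPONSE_FINISH_REASONS": "gen_ai.response.finish_reasons",
    "GEN_AI_TOOL_NAME": "gen_ai.tool.name",
    "AWS_BEDROCK_GUARDRAIL_ID": "aws.bedrock.guardrail.id",
}


def find_mismatches(namespace: Mapping[str, Any], expected: Mapping[str, str]) -> List[str]:
    """Return the constant names whose value is missing or spelled differently."""
    return [name for name, spelling in expected.items() if namespace.get(name) != spelling]


mismatches = find_mismatches(globals(), EXPECTED)
# A renamed or misspelled constant shows up by name in the result.
drifted = find_mismatches({"GEN_AI_SYSTEM": "genai.system"}, {"GEN_AI_SYSTEM": "gen_ai.system"})
```

Pinning each constant individually means a silent rename fails one named check rather than surfacing later as a missing dashboard field.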

…M-call adapters (spec 07)

Implements spec ``07-otel-genai-semantic-conventions.md``: every
LLM-call event emitted by every adapter (9 providers + 16 frameworks +
embedding) is additively stamped with the canonical OpenTelemetry
GenAI ``gen_ai.*`` attribute set, alongside the existing custom
LayerLens attribute set (CLAUDE.md "complete means complete" — no
attribute removed, no dashboard broken).

The wiring is two-layered:

1. Centralised hook in ``BaseAdapter.emit_dict_event`` — recognises
   ``model.invoke`` / ``embedding.create`` / ``model.{request,response}``
   event types and stamps via the shared
   ``stamp_genai_attributes(...)`` helper. Covers every framework
   adapter automatically (no per-site plumbing needed).
2. Per-adapter explicit stamping in
   ``LLMProviderAdapter._emit_model_invoke`` — gives the most
   accurate ``gen_ai.system`` resolution by passing the concrete
   ``provider`` argument through ``detect_gen_ai_system``. Tool calls
   additionally carry ``gen_ai.tool.name`` / ``gen_ai.tool.call.id``.

Stamping is **idempotent**, **additive**, and **safe** (never raises
on partial / malformed payloads; the centralised hook swallows
stamping errors at DEBUG so the circuit-breaker path keeps running).
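The centralised hook and its error handling can be sketched as follows. This is a minimal illustration, not the actual ``BaseAdapter.emit_dict_event``: the function ``emit_with_hook``, the ``stamp`` and ``sink`` callables, and the logger name are hypothetical. The recognised event types are the ones listed above.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("genai_semconv_sketch")  # hypothetical logger name

# Event types the commit message says the centralised hook recognises.
_LLM_EVENT_TYPES = frozenset(
    {"model.invoke", "embedding.create", "model.request", "model.response"}
)


def emit_with_hook(
    event_type: str,
    payload: Dict[str, Any],
    stamp: Callable[[Dict[str, Any]], None],
    sink: Callable[[str, Dict[str, Any]], None],
) -> None:
    """Stamp LLM-call events, then always hand the event to the sink."""
    if event_type in _LLM_EVENT_TYPES:
        try:
            stamp(payload)
        except Exception:
            # A stamping failure must never block emission: log at DEBUG
            # and carry on so the downstream path still sees the event.
            logger.debug("gen_ai stamping failed for %s", event_type, exc_info=True)
    sink(event_type, payload)


def broken_stamp(payload: Dict[str, Any]) -> None:
    """Stand-in stamper that always fails, to exercise the error path."""
    raise ValueError("malformed payload")


emitted: List[Tuple[str, Dict[str, Any]]] = []


def record(event_type: str, payload: Dict[str, Any]) -> None:
    emitted.append((event_type, payload))


emit_with_hook("model.invoke", {"model": "m"}, broken_stamp, record)  # stamp raises, event still emitted
emit_with_hook("tool.call", {"name": "search"}, broken_stamp, record)  # not an LLM event, stamp not called
```

Both events reach the sink: the first despite the stamping error, the second because its event type is not one the hook stamps.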

New module
``src/layerlens/instrument/adapters/_base/genai_semconv.py`` exports:

* Module-level constants for every spec attribute key
  (``GEN_AI_SYSTEM``, ``GEN_AI_REQUEST_MODEL``,
  ``GEN_AI_USAGE_INPUT_TOKENS``, ``GEN_AI_RESPONSE_FINISH_REASONS``,
  ``GEN_AI_TOOL_NAME``, ``AWS_BEDROCK_GUARDRAIL_ID``, …).
* ``detect_gen_ai_system(name)`` — maps adapter / framework /
  provider strings to the canonical OTel ``gen_ai.system`` value
  (``openai``, ``anthropic``, ``aws.bedrock``, ``gcp.vertex_ai``,
  ``gcp.gemini``, ``cohere``, ``mistral_ai``, ``ollama``,
  ``litellm``, ``azure.openai``, with documented ``_OTHER``
  fallback).
* ``stamp_genai_attributes(payload, request_kwargs, response_obj,
  *, system, operation)`` — additive in-place stamping, supports
  flat and nested ``model`` layouts (langchain), reads usage from
  payload OR ``token_usage`` dict OR response object, normalises
  finish-reasons to ``list[str]`` per spec.
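Two of the behaviours described above, system detection with a fallback and finish-reason normalisation, are sketched below. The canonical values and the ``_OTHER`` fallback are the ones named in this commit message. The alias keys, the name-folding rule, the choice of ``gcp.vertex_ai`` for GoogleVertex, and both function names are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Canonical gen_ai.system values as listed in this commit message.
# The alias keys (folded adapter names) are assumptions for illustration.
_CANONICAL: Dict[str, str] = {
    "openai": "openai",
    "azureopenai": "azure.openai",
    "anthropic": "anthropic",
    "awsbedrock": "aws.bedrock",
    "googlevertex": "gcp.vertex_ai",  # assumed; the PR also lists gcp.gemini
    "cohere": "cohere",
    "mistral": "mistral_ai",
    "ollama": "ollama",
    "litellm": "litellm",
}
_OTHER = "_OTHER"


def detect_system_sketch(name: Optional[str]) -> str:
    """Fold a name to lowercase alphanumerics and look it up, else fall back."""
    key = "".join(ch for ch in (name or "").lower() if ch.isalnum())
    return _CANONICAL.get(key, _OTHER)


def normalise_finish_reasons(value: Any) -> List[str]:
    """Coerce a scalar, sequence, or None into a list of strings."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value if item is not None]
    return [str(value)]


azure = detect_system_sketch("AzureOpenAI")
bedrock = detect_system_sketch("aws_bedrock")
unknown = detect_system_sketch("some-new-provider")
single = normalise_finish_reasons("stop")
several = normalise_finish_reasons(["stop", None, "length"])
```

Folding names before lookup lets ``AWSBedrock`` and ``aws_bedrock`` resolve to the same value, and anything unrecognised falls through to ``_OTHER`` instead of raising.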

Provider-specific extensions:
* OpenAI / Azure OpenAI — ``system_fingerprint``, ``service_tier``,
  ``response_format``.
* Anthropic — cache creation / read input tokens.
* AWS Bedrock — guardrail / knowledge-base / agent IDs (per spec §4.3
  ``aws.bedrock.*`` namespace, not ``gen_ai.*``).

Tests:
* ``tests/instrument/adapters/_base/test_genai_semconv.py`` — 49
  tests pinning every constant to the spec spelling, exercising
  every helper code path, asserting the additive contract, and
  verifying the centralised hook stamps automatically.
* ``tests/instrument/adapters/test_genai_semconv_per_adapter.py`` —
  35 tests parameterised across all 9 providers and all 16
  framework adapters, asserting ``gen_ai.*`` attributes appear on
  emitted ``model.invoke`` events alongside legacy fields.

Documentation:
* ``docs/adapters/otel-genai-conventions.md`` — contract,
  implementation, per-adapter coverage matrix.

No regression in existing 258 framework tests; full helper +
per-adapter suite (84 tests) green.


2 participants