Skip to content

feat(observability): trace span for embeddings api#84

Merged
bzp2010 merged 3 commits intomainfrom
bzp/feat-o11y-embed-span
May 4, 2026
Merged

feat(observability): trace span for embeddings api#84
bzp2010 merged 3 commits intomainfrom
bzp/feat-o11y-embed-span

Conversation

@bzp2010
Copy link
Copy Markdown
Collaborator

@bzp2010 bzp2010 commented May 4, 2026

Summary by CodeRabbit

  • New Features

    • Distributed tracing for embedding operations with richer telemetry and error tagging
  • Improvements

    • Unified telemetry attributes across chat, embedding, and message flows
    • Better error tracking for gateway failures and timeouts
    • Consistent naming and telemetry standards across handlers
  • Tests

    • Added tests validating telemetry property keys and values for embedding telemetry

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1794c894-40b3-4907-8d03-447cc2e85f88

📥 Commits

Reviewing files that changed from the base of the PR and between cb4b8ae and 2df03ac.

📒 Files selected for processing (1)
  • src/proxy/handlers/embeddings/span_attributes.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/proxy/handlers/embeddings/span_attributes.rs

📝 Walkthrough

Walkthrough

Standardizes telemetry attribute keys by replacing hardcoded strings with OpenTelemetry semantic convention constants, enables the semconv_experimental feature, and adds telemetry span construction and wiring for embeddings including error/timeouts and tests.

Changes

OpenTelemetry Semantic Convention Standardization & Embeddings Telemetry

Layer / File(s) Summary
Dependency
Cargo.toml
Enable semconv_experimental feature for opentelemetry-semantic-conventions = "0.31.0".
Semantic constants import
src/proxy/.../span_attributes/telemetry.rs, src/proxy/utils/trace/span_attributes.rs
Import opentelemetry_semantic_conventions::attribute::{GEN_AI_*, SERVER_ADDRESS, SERVER_PORT, USER_ID, ...} and replace literal "gen_ai.*", "server.*", "user.id" keys with these constants.
Core telemetry helpers
src/proxy/utils/trace/span_attributes.rs
Switch finish-reasons and usage token keys to GEN_AI_RESPONSE_FINISH_REASONS, GEN_AI_USAGE_INPUT_TOKENS, GEN_AI_USAGE_OUTPUT_TOKENS.
Chat completions telemetry
src/proxy/handlers/chat_completions/span_attributes/telemetry.rs, src/proxy/handlers/chat_completions/mod.rs
Replace request/response/chunk/usage keys with semantic constants; update span name from "aisix.llm.chat_completion" to "aisix.llm.chat_completions" and use GEN_AI_RESPONSE_FINISH_REASONS for finish reasons matching.
Messages telemetry
src/proxy/handlers/messages/span_attributes/telemetry.rs
Replace request/response/chunk/usage keys with semantic-convention constants (GEN_AI_*, SERVER_ADDRESS, SERVER_PORT, USER_ID).
Embeddings telemetry: data & wiring
src/proxy/handlers/embeddings/span_attributes.rs, src/proxy/handlers/embeddings/mod.rs
Add span_attributes module with request_span_properties(request, provider, base_url) and response_span_properties(response, usage) that emit semantic-convention keys; create span in embeddings handler, apply properties, wrap gateway embed call with WithSpan + maybe_timeout, record response properties on success and distinct error.type properties for gateway errors vs timeouts.
Tests
src/proxy/handlers/embeddings/span_attributes.rs (tests inside)
Add unit tests asserting emitted embedding request/response span property keys/values (encoding formats, invocation parameters, server fields, vectors, dimension and usage values).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
E2e Test Quality Review ⚠️ Warning PR's primary observability feature (tracing spans for embeddings API) lacks E2E test validation of span creation, recording, and telemetry properties. Add E2E test cases that verify span creation, recorded traces, span names, and span attributes (gen_ai.operation.name, gen_ai.request.model) in telemetry output.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective of the pull request, which is to add trace span instrumentation for the embeddings API.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Check ✅ Passed Security analysis reveals no critical, high, or medium-severity vulnerabilities. OpenTelemetry tracing additions properly handle legitimate observability data without exposing sensitive credentials.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bzp/feat-o11y-embed-span

Review rate limit: 3/5 reviews remaining, refill in 16 minutes and 23 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/proxy/handlers/chat_completions/mod.rs (1)

76-76: Span name rename needs downstream observability coordination.

aisix.llm.chat_completionaisix.llm.chat_completions can break existing dashboards/alerts until queries are updated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/handlers/chat_completions/mod.rs` at line 76, The span name was
changed to "aisix.llm.chat_completions" in the Span::enter_with_local_parent
call which will break downstream dashboards/alerts expecting
"aisix.llm.chat_completion"; either revert the literal back to the previous name
or, if the rename is required, emit the legacy name as well (e.g., create the
new span under the existing name or attach the old name as an attribute) and add
a const for both names so callers use a single source of truth; update any
observability mapping/docs and coordinate with downstream teams before landing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/proxy/handlers/embeddings/span_attributes.rs`:
- Around line 89-99: The code currently serializes and pushes full embeddings
via the blocks that call serde_json::to_string(response) and
serde_json::to_string(&data.embedding) inside the if-let and the for (index,
data) in response.data.iter().enumerate() loop; replace those pushes with
compact metadata only: emit output-level metadata such as total item count
(e.g., "output.count") and a source/model id if available, and per-item metadata
such as embedding length/dimension (e.g.,
"embedding.embeddings.{index}.embedding.length" or "embedding.{index}.dim")
rather than the full vector; remove or guard against pushing "output.value" and
the long "embedding.embeddings.{index}.embedding.vector" entries to avoid huge
span attributes.

---

Nitpick comments:
In `@src/proxy/handlers/chat_completions/mod.rs`:
- Line 76: The span name was changed to "aisix.llm.chat_completions" in the
Span::enter_with_local_parent call which will break downstream dashboards/alerts
expecting "aisix.llm.chat_completion"; either revert the literal back to the
previous name or, if the rename is required, emit the legacy name as well (e.g.,
create the new span under the existing name or attach the old name as an
attribute) and add a const for both names so callers use a single source of
truth; update any observability mapping/docs and coordinate with downstream
teams before landing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e4dbf074-b2ed-4b5f-8cf1-1ac3e099ef3a

📥 Commits

Reviewing files that changed from the base of the PR and between 0429787 and cb4b8ae.

📒 Files selected for processing (7)
  • Cargo.toml
  • src/proxy/handlers/chat_completions/mod.rs
  • src/proxy/handlers/chat_completions/span_attributes/telemetry.rs
  • src/proxy/handlers/embeddings/mod.rs
  • src/proxy/handlers/embeddings/span_attributes.rs
  • src/proxy/handlers/messages/span_attributes/telemetry.rs
  • src/proxy/utils/trace/span_attributes.rs

Comment on lines +89 to +99
if let Some(value) = serde_json::to_string(response).ok() {
properties.push(("output.value".into(), value));
}

for (index, data) in response.data.iter().enumerate() {
if let Ok(value) = serde_json::to_string(&data.embedding) {
properties.push((
format!("embedding.embeddings.{index}.embedding.vector"),
value,
));
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid emitting full embedding vectors into span attributes.

output.value plus per-vector properties can make spans extremely large for high-dimension/batched embeddings, which risks dropped traces and high telemetry overhead. Prefer compact metadata (counts/dimensions) instead.

💡 Suggested change (compact span payload)
-    if let Some(value) = serde_json::to_string(response).ok() {
-        properties.push(("output.value".into(), value));
-    }
-
-    for (index, data) in response.data.iter().enumerate() {
-        if let Ok(value) = serde_json::to_string(&data.embedding) {
-            properties.push((
-                format!("embedding.embeddings.{index}.embedding.vector"),
-                value,
-            ));
-        }
-    }
+    properties.push((
+        "gen_ai.embeddings.count".into(),
+        response.data.len().to_string(),
+    ));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if let Some(value) = serde_json::to_string(response).ok() {
properties.push(("output.value".into(), value));
}
for (index, data) in response.data.iter().enumerate() {
if let Ok(value) = serde_json::to_string(&data.embedding) {
properties.push((
format!("embedding.embeddings.{index}.embedding.vector"),
value,
));
}
properties.push((
"gen_ai.embeddings.count".into(),
response.data.len().to_string(),
));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/handlers/embeddings/span_attributes.rs` around lines 89 - 99, The
code currently serializes and pushes full embeddings via the blocks that call
serde_json::to_string(response) and serde_json::to_string(&data.embedding)
inside the if-let and the for (index, data) in response.data.iter().enumerate()
loop; replace those pushes with compact metadata only: emit output-level
metadata such as total item count (e.g., "output.count") and a source/model id
if available, and per-item metadata such as embedding length/dimension (e.g.,
"embedding.embeddings.{index}.embedding.length" or "embedding.{index}.dim")
rather than the full vector; remove or guard against pushing "output.value" and
the long "embedding.embeddings.{index}.embedding.vector" entries to avoid huge
span attributes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is known; it currently conforms to the specification but lacks redaction support.
I will add redaction support in the future; please create an issue for this upcoming task.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo api7/aisix \
  --title "feat(observability): add redaction support for embedding span attributes" \
  --body "## Summary

Currently, the embeddings tracing span emits full embedding vectors and serialized response payloads as span attributes (e.g., \`output.value\`, \`embedding.embeddings.{index}.embedding.vector\`). While this conforms to the OpenInference/OpenTelemetry specification, it can result in very large span payloads for high-dimension or batched embeddings, risking dropped traces and high telemetry overhead.

Redaction support should be added to allow users to optionally suppress or truncate these large attributes (e.g., replace full vectors with compact metadata such as count and dimension).

## Context

- Introduced in: `#84`
- Comment: https://github.com/api7/aisix/pull/84#discussion_r3180010194

## Acceptance Criteria

- [ ] Provide a redaction/filtering mechanism for span attributes that contain large payloads (e.g., \`output.value\`, \`embedding.embeddings.{index}.embedding.vector\`)
- [ ] Ensure the default behavior still conforms to the specification
- [ ] Allow users/operators to opt in to compact metadata output (e.g., counts/dimensions only)

/cc `@bzp2010`" \
  --assignee bzp2010

Length of output: 40


@bzp2010, understood! I've created a follow-up issue to track the redaction support work: #85


✏️ Learnings added
Learnt from: bzp2010
Repo: api7/aisix PR: 84
File: src/proxy/handlers/embeddings/span_attributes.rs:89-99
Timestamp: 2026-05-04T07:56:26.382Z
Learning: In api7/aisix, the embeddings tracing span intentionally emits full embedding vectors (`embedding.embeddings.{index}.embedding.vector`) and serialized response payloads (`output.value`) as span attributes to conform to the OpenInference/OpenTelemetry specification. Redaction support is planned as a future enhancement (tracked in https://github.com/api7/aisix/issues/85). Do not flag these large span attributes as a bug during code review.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: bzp2010
Repo: api7/aisix PR: 34
File: src/proxy/hooks2/authorization/mod.rs:1-1
Timestamp: 2026-04-10T09:35:05.745Z
Learning: In Rust code using the `anyhow` crate, treat `use anyhow::Result;` as potentially used even if the code uses non-default error types like `Result<T, SomeOtherError>` (i.e., `Result<T, E>` where `E` is not `anyhow::Error`). Because `anyhow::Result` is an alias for `std::result::Result<T, E=anyhow::Error>`, the import is referenced by the `Result<..., ...>` syntax. Do not flag `use anyhow::Result;` as unused solely due to the presence of `Result<T, SomeOtherError>` usages.

Learnt from: AlinsRan
Repo: api7/aisix PR: 46
File: src/lib.rs:122-126
Timestamp: 2026-04-17T09:32:54.079Z
Learning: Avoid adding `#[fastrace::trace]` to top-level process-lifetime entrypoints in `src/lib.rs`—specifically `run()`, `run_with_config()`, and `run_with_provider()`. These functions cover the entire server lifetime (startup → shutdown), while Fastrace spans should be used for request-scoped boundaries (e.g., HTTP/RPC handler functions) rather than global process entrypoints.

Learnt from: bzp2010
Repo: api7/aisix PR: 80
File: src/main.rs:9-9
Timestamp: 2026-05-03T12:45:26.883Z
Learning: In this repository (api7/aisix), `aisix::utils::instance::init()` is intentionally **not** guaranteed to be called by the library entrypoints (`src/lib.rs` via `run()`, `run_with_config()`, `run_with_provider()`). Treat it as the embedder/binary’s responsibility to call `instance::init()` before handling any requests. Also note that `instance_id()` and `run_id()` in `src/utils/instance/mod.rs` intentionally panic (via `.expect()`) when the `OnceLock`s are uninitialized—this is by design. During code review, do **not** flag missing `instance::init()` wiring inside the library startup path as a bug and do **not** suggest moving `instance::init()` into the library automatically.

@bzp2010 bzp2010 merged commit ee53cd6 into main May 4, 2026
3 checks passed
@bzp2010 bzp2010 deleted the bzp/feat-o11y-embed-span branch May 4, 2026 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant