feat(observability): trace span for embeddings api #84
Conversation
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
Standardizes telemetry attribute keys by replacing hardcoded strings with OpenTelemetry semantic convention constants, enables the `semconv_experimental` feature, and adds telemetry span construction and wiring for embeddings, including error/timeout handling and tests.
Changes: OpenTelemetry Semantic Convention Standardization & Embeddings Telemetry
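The constant swap the walkthrough describes follows a simple pattern. The real keys come from the `opentelemetry-semantic-conventions` crate, whose experimental `gen_ai.*` attributes sit behind its `semconv_experimental` feature; the local consts below only mirror those key names so the sketch is self-contained, and the helper function is hypothetical rather than the PR's actual code.

```rust
// Mirrors of OpenTelemetry semantic-convention keys. In real code these
// would come from the `opentelemetry-semantic-conventions` crate (with the
// `semconv_experimental` feature enabled for the gen_ai.* namespace).
pub const GEN_AI_OPERATION_NAME: &str = "gen_ai.operation.name";
pub const GEN_AI_REQUEST_MODEL: &str = "gen_ai.request.model";

/// Build the base attribute set for an embeddings span using shared
/// constants instead of ad-hoc string literals scattered across handlers.
pub fn embeddings_span_attributes(model: &str) -> Vec<(String, String)> {
    vec![
        (GEN_AI_OPERATION_NAME.to_string(), "embeddings".to_string()),
        (GEN_AI_REQUEST_MODEL.to_string(), model.to_string()),
    ]
}
```

Centralizing the keys this way means a renamed convention only has to change in one place, which is the point of the standardization pass.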
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 5 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/proxy/handlers/chat_completions/mod.rs (1)
76-76: Span name rename needs downstream observability coordination.
Renaming `aisix.llm.chat_completion` → `aisix.llm.chat_completions` can break existing dashboards/alerts until queries are updated.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/handlers/chat_completions/mod.rs` at line 76, The span name was changed to "aisix.llm.chat_completions" in the Span::enter_with_local_parent call which will break downstream dashboards/alerts expecting "aisix.llm.chat_completion"; either revert the literal back to the previous name or, if the rename is required, emit the legacy name as well (e.g., create the new span under the existing name or attach the old name as an attribute) and add a const for both names so callers use a single source of truth; update any observability mapping/docs and coordinate with downstream teams before landing.
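The "const for both names" suggestion could be implemented roughly as follows. This is a sketch, not the PR's actual code: the helper function and the `aisix.span.legacy_name` attribute key are hypothetical choices.

```rust
/// Single source of truth for the span name rename. The two names come
/// from the review thread; keeping both as consts stops literals drifting.
pub const SPAN_CHAT_COMPLETIONS: &str = "aisix.llm.chat_completions";
pub const SPAN_CHAT_COMPLETION_LEGACY: &str = "aisix.llm.chat_completion";

/// Returns the new span name plus an attribute carrying the legacy name,
/// so dashboards keyed on the old name can migrate via an attribute query.
pub fn chat_completions_span() -> (&'static str, Vec<(String, String)>) {
    (
        SPAN_CHAT_COMPLETIONS,
        vec![(
            "aisix.span.legacy_name".to_string(),
            SPAN_CHAT_COMPLETION_LEGACY.to_string(),
        )],
    )
}
```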
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/proxy/handlers/embeddings/span_attributes.rs`:
- Around line 89-99: The code currently serializes and pushes full embeddings
via the blocks that call serde_json::to_string(response) and
serde_json::to_string(&data.embedding) inside the if-let and the for (index,
data) in response.data.iter().enumerate() loop; replace those pushes with
compact metadata only: emit output-level metadata such as total item count
(e.g., "output.count") and a source/model id if available, and per-item metadata
such as embedding length/dimension (e.g.,
"embedding.embeddings.{index}.embedding.length" or "embedding.{index}.dim")
rather than the full vector; remove or guard against pushing "output.value" and
the long "embedding.embeddings.{index}.embedding.vector" entries to avoid huge
span attributes.
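The compact-metadata replacement the prompt describes might be sketched like this. The response types are simplified stand-ins for the real embeddings types, and the attribute keys (`output.count`, `embedding.embeddings.{index}.embedding.length`) are taken from the prompt's own examples.

```rust
/// Minimal stand-ins for the embeddings response types (hypothetical shape).
pub struct EmbeddingData {
    pub embedding: Vec<f32>,
}
pub struct EmbeddingsResponse {
    pub data: Vec<EmbeddingData>,
}

/// Emit compact metadata instead of serialized vectors: a total item count
/// plus a per-item dimension, keeping span attributes small and bounded.
pub fn compact_embedding_attributes(response: &EmbeddingsResponse) -> Vec<(String, String)> {
    let mut properties = Vec::new();
    properties.push(("output.count".to_string(), response.data.len().to_string()));
    for (index, data) in response.data.iter().enumerate() {
        properties.push((
            format!("embedding.embeddings.{index}.embedding.length"),
            data.embedding.len().to_string(),
        ));
    }
    properties
}
```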
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e4dbf074-b2ed-4b5f-8cf1-1ac3e099ef3a
📒 Files selected for processing (7)
- Cargo.toml
- src/proxy/handlers/chat_completions/mod.rs
- src/proxy/handlers/chat_completions/span_attributes/telemetry.rs
- src/proxy/handlers/embeddings/mod.rs
- src/proxy/handlers/embeddings/span_attributes.rs
- src/proxy/handlers/messages/span_attributes/telemetry.rs
- src/proxy/utils/trace/span_attributes.rs
```rust
if let Some(value) = serde_json::to_string(response).ok() {
    properties.push(("output.value".into(), value));
}

for (index, data) in response.data.iter().enumerate() {
    if let Ok(value) = serde_json::to_string(&data.embedding) {
        properties.push((
            format!("embedding.embeddings.{index}.embedding.vector"),
            value,
        ));
    }
}
```
Avoid emitting full embedding vectors into span attributes.
`output.value` plus per-vector properties can make spans extremely large for high-dimension/batched embeddings, which risks dropped traces and high telemetry overhead. Prefer compact metadata (counts/dimensions) instead.
💡 Suggested change (compact span payload)

```diff
-    if let Some(value) = serde_json::to_string(response).ok() {
-        properties.push(("output.value".into(), value));
-    }
-
-    for (index, data) in response.data.iter().enumerate() {
-        if let Ok(value) = serde_json::to_string(&data.embedding) {
-            properties.push((
-                format!("embedding.embeddings.{index}.embedding.vector"),
-                value,
-            ));
-        }
-    }
+    properties.push((
+        "gen_ai.embeddings.count".into(),
+        response.data.len().to_string(),
+    ));
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
properties.push((
    "gen_ai.embeddings.count".into(),
    response.data.len().to_string(),
));
```
This is known; it currently conforms to the specification but lacks redaction support.
I will add redaction support in the future; please create an issue for this upcoming task.
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
gh issue create \
  --repo api7/aisix \
  --title "feat(observability): add redaction support for embedding span attributes" \
  --body "## Summary
Currently, the embeddings tracing span emits full embedding vectors and serialized response payloads as span attributes (e.g., \`output.value\`, \`embedding.embeddings.{index}.embedding.vector\`). While this conforms to the OpenInference/OpenTelemetry specification, it can result in very large span payloads for high-dimension or batched embeddings, risking dropped traces and high telemetry overhead.

Redaction support should be added to allow users to optionally suppress or truncate these large attributes (e.g., replace full vectors with compact metadata such as count and dimension).

## Context
- Introduced in: \`#84\`
- Comment: https://github.com/api7/aisix/pull/84#discussion_r3180010194

## Acceptance Criteria
- [ ] Provide a redaction/filtering mechanism for span attributes that contain large payloads (e.g., \`output.value\`, \`embedding.embeddings.{index}.embedding.vector\`)
- [ ] Ensure the default behavior still conforms to the specification
- [ ] Allow users/operators to opt in to compact metadata output (e.g., counts/dimensions only)

/cc \`@bzp2010\`" \
  --assignee bzp2010
```
@bzp2010, understood! I've created a follow-up issue to track the redaction support work: #85
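As for what the redaction work tracked in #85 might eventually look like, one hypothetical shape is a length-based pass over already-collected attributes. Nothing below exists in aisix today; the function name, the threshold parameter, and the replacement format are illustrative only.

```rust
/// Hypothetical redaction pass for oversized span attributes (a sketch of
/// the follow-up in issue #85, not existing aisix code). Attributes whose
/// value exceeds `max_len` bytes are replaced with a short placeholder.
pub fn redact_large_attributes(
    properties: Vec<(String, String)>,
    max_len: usize,
) -> Vec<(String, String)> {
    properties
        .into_iter()
        .map(|(key, value)| {
            if value.len() > max_len {
                // Keep the key and the payload size so the span stays small
                // while remaining inspectable by operators.
                let placeholder = format!("<redacted: {} bytes>", value.len());
                (key, placeholder)
            } else {
                (key, value)
            }
        })
        .collect()
}
```

A design like this keeps the default (spec-conforming) output untouched and lets operators opt in, matching the acceptance criteria in the issue body above.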
✏️ Learnings added
Learnt from: bzp2010
Repo: api7/aisix PR: 84
File: src/proxy/handlers/embeddings/span_attributes.rs:89-99
Timestamp: 2026-05-04T07:56:26.382Z
Learning: In api7/aisix, the embeddings tracing span intentionally emits full embedding vectors (`embedding.embeddings.{index}.embedding.vector`) and serialized response payloads (`output.value`) as span attributes to conform to the OpenInference/OpenTelemetry specification. Redaction support is planned as a future enhancement (tracked in https://github.com/api7/aisix/issues/85). Do not flag these large span attributes as a bug during code review.
🧠 Learnings used
Learnt from: bzp2010
Repo: api7/aisix PR: 34
File: src/proxy/hooks2/authorization/mod.rs:1-1
Timestamp: 2026-04-10T09:35:05.745Z
Learning: In Rust code using the `anyhow` crate, treat `use anyhow::Result;` as potentially used even if the code uses non-default error types like `Result<T, SomeOtherError>` (i.e., `Result<T, E>` where `E` is not `anyhow::Error`). Because `anyhow::Result` is an alias for `std::result::Result<T, E=anyhow::Error>`, the import is referenced by the `Result<..., ...>` syntax. Do not flag `use anyhow::Result;` as unused solely due to the presence of `Result<T, SomeOtherError>` usages.
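The learning above rests on Rust's default type parameters for type aliases; a self-contained illustration with a local alias that mirrors `anyhow::Result` (the error types here are made up):

```rust
#[derive(Debug)]
pub struct AppError;

#[derive(Debug, PartialEq)]
pub struct OtherError;

/// Local stand-in for `anyhow::Result`: a Result alias with a defaulted
/// error type, just like `anyhow::Result<T, E = anyhow::Error>`.
pub type Result<T, E = AppError> = std::result::Result<T, E>;

/// Uses the default error type: `Result<u32>` means `Result<u32, AppError>`.
pub fn default_err() -> Result<u32> {
    Err(AppError)
}

/// Still resolves through the alias despite the non-default error type,
/// which is why the analogous `use anyhow::Result;` import counts as used.
pub fn other_err() -> Result<u32, OtherError> {
    Err(OtherError)
}
```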
Learnt from: AlinsRan
Repo: api7/aisix PR: 46
File: src/lib.rs:122-126
Timestamp: 2026-04-17T09:32:54.079Z
Learning: Avoid adding `#[fastrace::trace]` to top-level process-lifetime entrypoints in `src/lib.rs`—specifically `run()`, `run_with_config()`, and `run_with_provider()`. These functions cover the entire server lifetime (startup → shutdown), while Fastrace spans should be used for request-scoped boundaries (e.g., HTTP/RPC handler functions) rather than global process entrypoints.
Learnt from: bzp2010
Repo: api7/aisix PR: 80
File: src/main.rs:9-9
Timestamp: 2026-05-03T12:45:26.883Z
Learning: In this repository (api7/aisix), `aisix::utils::instance::init()` is intentionally **not** guaranteed to be called by the library entrypoints (`src/lib.rs` via `run()`, `run_with_config()`, `run_with_provider()`). Treat it as the embedder/binary’s responsibility to call `instance::init()` before handling any requests. Also note that `instance_id()` and `run_id()` in `src/utils/instance/mod.rs` intentionally panic (via `.expect()`) when the `OnceLock`s are uninitialized—this is by design. During code review, do **not** flag missing `instance::init()` wiring inside the library startup path as a bug and do **not** suggest moving `instance::init()` into the library automatically.
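The `OnceLock` + `.expect()` contract described in that learning can be illustrated with a stripped-down sketch; the names mirror the ones mentioned, but this is not the repository's actual code.

```rust
use std::sync::OnceLock;

static INSTANCE_ID: OnceLock<String> = OnceLock::new();

/// Embedder-side initialization: the binary must call this before handling
/// any requests (the library deliberately does not call it for you).
pub fn init(id: &str) {
    // A second call is a no-op rather than an error in this sketch.
    let _ = INSTANCE_ID.set(id.to_string());
}

/// Panics by design when `init` was never called: a missing init is treated
/// as a programming error in the embedder, not a recoverable condition.
pub fn instance_id() -> &'static str {
    INSTANCE_ID
        .get()
        .expect("instance::init() must be called before instance_id()")
}
```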
Summary by CodeRabbit
- New Features
- Improvements
- Tests