feat(otel): propagate session.id to spans and log records #7490
codefromthecrypt merged 1 commit into main
Pull request overview
Adds end-to-end propagation of session.id into OpenTelemetry traces and logs so all telemetry produced during a session can be queried by that attribute.
Changes:
- Introduces `SessionIdBridge` to copy `session.id` from tracing span ancestry onto OTel log record attributes.
- Moves/standardizes span creation to higher-level call sites (agent/provider entrypoints) and fixes `reply_stream` span handling by switching to `.instrument()` for async streams.
- Adds a new integration-style test that validates `session.id` appears on exported OTLP log records.
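
The first bullet describes copying `session.id` from span ancestry onto log records. A minimal sketch of that lookup, in plain Rust with no `tracing` or OpenTelemetry types, might look like the following. `SpanNode`, `inherited_attr`, `log_attrs`, `demo_spans`, and the value `abc123` are all hypothetical names invented for illustration; this is not the PR's `SessionIdBridge` and not the upstream API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a span: some attributes plus a link to its parent.
struct SpanNode {
    parent: Option<usize>,
    attrs: HashMap<&'static str, String>,
}

/// Walks from `start` toward the root and returns the nearest value of `key`.
fn inherited_attr<'a>(spans: &'a [SpanNode], start: usize, key: &str) -> Option<&'a str> {
    let mut current = Some(start);
    while let Some(idx) = current {
        let node = spans.get(idx)?;
        if let Some(value) = node.attrs.get(key) {
            return Some(value.as_str());
        }
        current = node.parent;
    }
    None
}

/// Builds the attributes for a log record emitted inside span `current`,
/// copying only allowlisted keys found on the span or one of its ancestors.
fn log_attrs(
    spans: &[SpanNode],
    current: usize,
    allowlist: &[&'static str],
) -> HashMap<&'static str, String> {
    let mut out = HashMap::new();
    for &key in allowlist {
        if let Some(value) = inherited_attr(spans, current, key) {
            out.insert(key, value.to_string());
        }
    }
    out
}

/// Three nested spans; only the root carries `session.id` (made-up value).
fn demo_spans() -> Vec<SpanNode> {
    let mut root_attrs = HashMap::new();
    root_attrs.insert("session.id", "abc123".to_string());
    vec![
        SpanNode { parent: None, attrs: root_attrs },
        SpanNode { parent: Some(0), attrs: HashMap::new() },
        SpanNode { parent: Some(1), attrs: HashMap::new() },
    ]
}
```

With `demo_spans()`, a log record emitted in the innermost span (index 2) still picks up the root's `session.id`, because the lookup climbs through both parents before giving up.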
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/goose/src/otel/otlp.rs | Adds SessionIdBridge layer + log processor and wires it into OTLP log export. |
| crates/goose/src/agents/agent.rs | Ensures reply_stream span is re-entered safely across awaits; adds session.id to key spans. |
| crates/goose/src/agents/reply_parts.rs | Instruments provider streaming entrypoint with session.id and model name. |
| crates/goose/src/providers/base.rs | Instruments Provider::complete with session.id and model; documents span ownership expectations. |
| crates/goose/src/tracing/rate_limiter.rs | Simplifies span/metric processing by removing unnecessary async/.await. |
| crates/goose/src/providers/{bedrock,codex,cursor_agent,gemini_cli,litellm,sagemaker_tgi,snowflake,venice}.rs | Removes redundant per-provider #[instrument] on stream(). |
| crates/goose/tests/session_id_propagation_test.rs | Adds OTLP log export test asserting session.id is present on log records. |
| crates/goose/Cargo.toml | Adds tracing-futures and test-only deps for OTLP log decoding. |
| crates/goose-test-support/src/otel.rs | Adds shared helper to lock/clear OTEL env and restore global providers for tests. |
| crates/goose-test-support/src/lib.rs | Exposes new otel test-support module. |
| crates/goose-test-support/Cargo.toml | Adds deps needed by the new OTEL test-support utilities. |
| Cargo.toml | Adds workspace deps for opentelemetry-proto, prost, and tracing-futures. |
| Cargo.lock | Locks new transitive dependencies. |
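
The `agent.rs` row concerns keeping a span entered correctly across awaits. The underlying problem can be modeled with plain threads, on the assumption that the "current span" lives in thread-local state: a span entered on one thread is invisible once the work resumes on another, while a span that travels with the task and is re-entered on each poll stays correct. The sketch below is a simplified model only; `CURRENT_SPAN`, `enter`, `exit`, `current`, `resumed_on_other_thread`, and `polled_with_span` are hypothetical helpers, not `tracing` or tokio APIs.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Per-thread "current span" slot; a simplified model of thread-local span state.
    static CURRENT_SPAN: RefCell<Option<String>> = RefCell::new(None);
}

/// Marks `name` as the current span on the calling thread.
fn enter(name: &str) {
    CURRENT_SPAN.with(|slot| *slot.borrow_mut() = Some(name.to_string()));
}

/// Clears the current span on the calling thread.
fn exit() {
    CURRENT_SPAN.with(|slot| *slot.borrow_mut() = None);
}

/// Returns the span that is current on the calling thread, if any.
fn current() -> Option<String> {
    CURRENT_SPAN.with(|slot| slot.borrow().clone())
}

/// Models holding an enter guard across a suspension point: the span is
/// entered here, but the rest of the work runs on a different thread.
fn resumed_on_other_thread() -> Option<String> {
    enter("reply_stream");
    thread::spawn(current).join().expect("worker thread panicked")
}

/// Models an instrumented task: the span is carried with the work and
/// entered and exited around each poll, on whichever thread runs it.
fn polled_with_span(span: &str) -> Option<String> {
    let span = span.to_string();
    thread::spawn(move || {
        enter(&span);
        let seen = current();
        exit();
        seen
    })
    .join()
    .expect("worker thread panicked")
}
```

In this model `resumed_on_other_thread()` observes no span on the second thread, and the first thread is left with a stale `reply_stream` entry that unrelated work could inherit. `polled_with_span` avoids both effects, which is the property the row attributes to the new span handling.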
Force-pushed from 581bef4 to 1066a86.
@michaelneale mind having a look? This is in the line of fire for MCP correlation in logs (should do this first). Meanwhile, sometime today I will look at and try to progress the FS capabilities PR from @rabi, and/or follow up on my flight back to Asia today.
Force-pushed from b2da812 to caca818.
@alexhancock in case you have time to look. I have an ElasticON presentation in <2 weeks that I'd like to have otel in top shape for.
I'll refactor this, as otel seems able to release a new dep containing the code needed for this (reducing the manual session sync thing): open-telemetry/opentelemetry-rust#3408. So this stays in draft until that's out, maybe early next week.
Pull request was converted to draft
Force-pushed from caca818 to 3dce87c.
Per otel guidance, I used a pinned commit until the next release, as they have no near-term date for it.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3dce87ca16
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
/// Handles toolshim transformations if needed
#[tracing::instrument(
    skip(provider, session_id, system_prompt, messages, tools, toolshim_tools),
    fields(session.id = %session_id, gen_ai.request.model = %provider.get_model_config().model_name)
Tag span with active model instead of provider default
gen_ai.request.model is now derived from provider.get_model_config().model_name, but wrapper providers can change models per turn. In LeadWorkerProvider, get_model_config() always returns the lead model while stream() may route requests to the worker/fallback model, so this span field will be wrong for those turns and model-level telemetry/experiments become misleading. This regression was introduced when per-provider stream() instrumentation was removed and replaced by this call-site field.
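
The mismatch described in this comment can be reproduced with a small plain-Rust model. Everything here is a hypothetical simplification: `LeadWorker`, `configured_model`, `active_model`, `span_model_field`, the model names, and the "lead for the first N turns, then worker" routing rule are assumptions made for illustration, not the real `LeadWorkerProvider` logic.

```rust
/// Hypothetical, simplified stand-in for a wrapper provider.
struct LeadWorker {
    lead_model: String,
    worker_model: String,
    /// Number of initial turns served by the lead model (assumed routing rule).
    lead_turns: usize,
}

impl LeadWorker {
    /// Mirrors the behavior described in the review: always reports the lead model.
    fn configured_model(&self) -> &str {
        &self.lead_model
    }

    /// The model that would actually serve the given 0-based turn.
    fn active_model(&self, turn: usize) -> &str {
        if turn < self.lead_turns {
            &self.lead_model
        } else {
            &self.worker_model
        }
    }
}

/// Value a call site would record as `gen_ai.request.model` for this turn.
/// With `use_active == false` it reads the provider default, as in the span
/// field under review; with `true` it reads the routed model.
fn span_model_field(provider: &LeadWorker, turn: usize, use_active: bool) -> String {
    if use_active {
        provider.active_model(turn).to_string()
    } else {
        provider.configured_model().to_string()
    }
}
```

For a provider with `lead_turns = 2`, turn 3 is labeled with the lead model when the field comes from the provider default, even though the worker model served it. Any fix needs the routed model to reach the code that fills in the span field, for example by recording it once routing has happened.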
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Force-pushed from 3dce87c to 93ea855.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 93ea855e85
Summary
OTel spans and log records have no `session.id`, making it impossible to query all telemetry for a single session. This adds `session.id` to `reply`, `reply_stream`, `dispatch_tool_call`, `complete`, and `stream_response_from_provider` spans, and propagates it to log records via upstream `with_span_attribute_allowlist`.
- `reply_stream` used `Span::enter()` inside `try_stream!`, which is invalid across `.await` (tokio can resume on a different thread). Replaced with `tracing-futures` `.instrument()`.
- Removed `#[instrument]` from `stream()` impls, consolidated to `complete()` and `stream_response_from_provider` call sites.
- `process_span`/`process_metric` in rate_limiter: removed unnecessary `async` (no `.await`).
- Added `clear_otel_env` test helper to `goose-test-support`.
All otel crates pinned to git rev `345cd74a` until 0.32.0 release (tracking issue).
Type of Change
AI Assistance
Testing
Before (no spans or logs carry `session.id`):
After (7 occurrences across spans and logs):
Related Issues
Continues OTel work from #7271, #7144