feat(otel): propagate session.id to spans and log records#7490

Merged
codefromthecrypt merged 1 commit into main from adrian/session-id-otel
Mar 16, 2026

@codefromthecrypt
Collaborator

@codefromthecrypt codefromthecrypt commented Feb 24, 2026

Summary

OTel spans and log records have no session.id, making it impossible to query all telemetry for a single session. This adds session.id to reply, reply_stream, dispatch_tool_call, complete, and stream_response_from_provider spans, and propagates it to log records via upstream with_span_attribute_allowlist.

  • reply_stream used Span::enter() inside try_stream!, which is invalid across .await: the guard stays entered while the task is suspended, and tokio can resume the task on a different thread. Replaced with tracing-futures .instrument().
  • Removed per-provider #[instrument] from stream() impls, consolidated to complete() and stream_response_from_provider call sites.
  • process_span/process_metric in rate_limiter: removed unnecessary async (no .await).
  • Moved clear_otel_env test helper to goose-test-support.
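
To illustrate the reply_stream fix in the first bullet, here is a minimal std-only sketch of what an .instrument()-style wrapper does: it enters the span at the start of every poll and exits before control returns to the executor, so nothing stays entered while the task is suspended. This is not the tracing or tracing-futures implementation. Instrumented, CURRENT_SESSION, YieldOnce, NoopWake and demo are hypothetical names, and a thread-local stands in for the subscriber's current-span state.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    // Hypothetical stand-in for the subscriber's "current span" state.
    static CURRENT_SESSION: RefCell<Option<String>> = RefCell::new(None);
}

fn current_session() -> Option<String> {
    CURRENT_SESSION.with(|c| c.borrow().clone())
}

/// Wrapper that enters the "span" for the duration of each poll only.
struct Instrumented<F> {
    session_id: String,
    inner: Pin<Box<F>>,
}

impl<F: Future> Future for Instrumented<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let this = self.get_mut();
        // Enter: remember what was current, then install our session id.
        let previous = CURRENT_SESSION.with(|c| c.replace(Some(this.session_id.clone())));
        let result = this.inner.as_mut().poll(cx);
        // Exit: restore the previous value before yielding to the executor.
        CURRENT_SESSION.with(|c| *c.borrow_mut() = previous);
        result
    }
}

/// Returns Pending once, simulating an `.await` suspension point.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls an instrumented future by hand. Returns the session id observed
/// inside the future (before and after the suspension) and the values
/// observed outside it, between polls.
fn demo(session_id: &str) -> (Vec<Option<String>>, Vec<Option<String>>) {
    let body = async {
        let before = current_session();
        YieldOnce(false).await;
        let after = current_session();
        vec![before, after]
    };
    let mut fut = Instrumented {
        session_id: session_id.to_string(),
        inner: Box::pin(body),
    };
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut between_polls = Vec::new();
    loop {
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Ready(inside) => return (inside, between_polls),
            Poll::Pending => between_polls.push(current_session()),
        }
    }
}
```

Calling demo("20260314_5") should see the session id on both sides of the suspension inside the future, and nothing current between polls. A guard held across the .await would instead leave the thread-local set while the task is parked.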

All otel crates pinned to git rev 345cd74a until 0.32.0 release (tracking issue).

Type of Change

  • Feature
  • Refactor / Code quality

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Before — no spans or logs carry session.id:

$ OTEL_TRACES_EXPORTER=console OTEL_LOGS_EXPORTER=console \
    goose run --with-streamable-http-extension 'https://mcp.kiwi.com' \
    -t "Use the kiwi search-flight tool to find fastest itinerary from BKI to SYD tomorrow." \
    > /tmp/out.txt 2>&1
$ grep "session\.id" /tmp/out.txt
$

After — 7 occurrences across spans and logs:

$ OTEL_TRACES_EXPORTER=console OTEL_LOGS_EXPORTER=console \
    ./target/release/goose run --with-streamable-http-extension 'https://mcp.kiwi.com' \
    -t "Use the kiwi search-flight tool to find fastest itinerary from BKI to SYD tomorrow." \
    > /tmp/out.txt 2>&1
$ grep "session\.id" /tmp/out.txt
		 ->  session.id: String(Owned("20260314_5"))
		 ->  session.id: String(Owned("20260314_5"))
		 ->  session.id: String(Owned("20260314_5"))
		 ->  session.id: String(Owned("20260314_5"))
		 ->  session.id: String(Owned("20260314_5"))
		 ->  session.id: String(Owned("20260314_5"))
		 ->  session.id: String(Owned("20260314_5"))
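
The grep check above can also be done programmatically, for instance to confirm that every occurrence carries the same session. A rough sketch, assuming the console exporter prints attributes in the `->  session.id: String(Owned("..."))` shape shown above; session_id_of and count_session_ids are hypothetical helpers, not part of goose:

```rust
use std::collections::BTreeMap;

/// Extracts the quoted value from a console-exporter line such as
/// `->  session.id: String(Owned("20260314_5"))`.
/// Returns None for any other line.
fn session_id_of(line: &str) -> Option<&str> {
    let rest = line.trim_start().strip_prefix("->")?.trim_start();
    let rest = rest.strip_prefix("session.id:")?;
    let start = rest.find('"')? + 1;
    let end = rest.rfind('"')?;
    if start <= end {
        Some(&rest[start..end])
    } else {
        None
    }
}

/// Counts how many times each session id appears in the captured output.
fn count_session_ids(output: &str) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for id in output.lines().filter_map(session_id_of) {
        *counts.entry(id).or_insert(0) += 1;
    }
    counts
}
```

For the run above, the map would be expected to hold a single key with a count of 7. More than one key would mean telemetry from different sessions ended up in the same capture.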

Related Issues

Continues OTel work from #7271, #7144

Copilot AI review requested due to automatic review settings February 24, 2026 20:18
Contributor

Copilot AI left a comment

Pull request overview

Adds end-to-end propagation of session.id into OpenTelemetry traces and logs so all telemetry produced during a session can be queried by that attribute.

Changes:

  • Introduces SessionIdBridge to copy session.id from tracing span ancestry onto OTel log record attributes.
  • Moves/standardizes span creation to higher-level call sites (agent/provider entrypoints) and fixes reply_stream span handling by switching to .instrument() for async streams.
  • Adds a new integration-style test that validates session.id appears on exported OTLP log records.
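
As a rough model of the first bullet (copying session.id from span ancestry onto log records), the sketch below walks from the current span up through its parents and copies only allowlisted attributes. It is a std-only illustration, not the SessionIdBridge code or the upstream with_span_attribute_allowlist API. Span, log_attributes, sample_spans and the tool.name attribute are assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal span: a name, string attributes, and a parent index.
struct Span {
    name: &'static str,
    attrs: BTreeMap<&'static str, String>,
    parent: Option<usize>,
}

/// Walks from `current` up through its ancestors and copies each allowlisted
/// attribute from the nearest span that defines it.
fn log_attributes(
    spans: &[Span],
    current: usize,
    allowlist: &[&'static str],
) -> BTreeMap<&'static str, String> {
    let mut out = BTreeMap::new();
    let mut cursor = Some(current);
    while let Some(idx) = cursor {
        let span = &spans[idx];
        for key in allowlist {
            if let Some(value) = span.attrs.get(key) {
                // Keep the value from the closest span; ancestors do not override it.
                out.entry(*key).or_insert_with(|| value.clone());
            }
        }
        cursor = span.parent;
    }
    out
}

/// Two spans named after the PR: `reply` carries session.id, and its child
/// `dispatch_tool_call` carries an illustrative attribute that is not allowlisted.
fn sample_spans() -> Vec<Span> {
    vec![
        Span {
            name: "reply",
            attrs: BTreeMap::from([("session.id", "20260314_5".to_string())]),
            parent: None,
        },
        Span {
            name: "dispatch_tool_call",
            attrs: BTreeMap::from([("tool.name", "search-flight".to_string())]),
            parent: Some(0),
        },
    ]
}
```

A log emitted inside dispatch_tool_call (index 1) with an allowlist of ["session.id"] would pick up the id from the reply ancestor and nothing else. The allowlist keeps unrelated span attributes from leaking onto every log record.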

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| crates/goose/src/otel/otlp.rs | Adds SessionIdBridge layer + log processor and wires it into OTLP log export. |
| crates/goose/src/agents/agent.rs | Ensures reply_stream span is re-entered safely across awaits; adds session.id to key spans. |
| crates/goose/src/agents/reply_parts.rs | Instruments provider streaming entrypoint with session.id and model name. |
| crates/goose/src/providers/base.rs | Instruments Provider::complete with session.id and model; documents span ownership expectations. |
| crates/goose/src/tracing/rate_limiter.rs | Simplifies span/metric processing by removing unnecessary async/.await. |
| crates/goose/src/providers/{bedrock,codex,cursor_agent,gemini_cli,litellm,sagemaker_tgi,snowflake,venice}.rs | Removes redundant per-provider #[instrument] on stream(). |
| crates/goose/tests/session_id_propagation_test.rs | Adds OTLP log export test asserting session.id is present on log records. |
| crates/goose/Cargo.toml | Adds tracing-futures and test-only deps for OTLP log decoding. |
| crates/goose-test-support/src/otel.rs | Adds shared helper to lock/clear OTEL env and restore global providers for tests. |
| crates/goose-test-support/src/lib.rs | Exposes new otel test-support module. |
| crates/goose-test-support/Cargo.toml | Adds deps needed by the new OTEL test-support utilities. |
| Cargo.toml | Adds workspace deps for opentelemetry-proto, prost, and tracing-futures. |
| Cargo.lock | Locks new transitive dependencies. |

@codefromthecrypt
Collaborator Author

@michaelneale mind having a look? This is in the line of fire for MCP correlation in logs (should do this first).

Meanwhile, sometime today I will take a look at and try to progress the FS capabilities PR from @rabi and/or do follow up on my flight back to Asia today.

@codefromthecrypt codefromthecrypt force-pushed the adrian/session-id-otel branch 2 times, most recently from b2da812 to caca818 March 5, 2026 05:59
@codefromthecrypt
Collaborator Author

@alexhancock in case you have time to look. I have an ElasticON presentation I'd like to have otel in top shape for in <2 weeks

@codefromthecrypt
Collaborator Author

I'll refactor this, as otel seems able to release a new dep that contains the code needed for this (reducing the manual session sync thing): open-telemetry/opentelemetry-rust#3408. So this stays draft until that's out, maybe early next week.

@codefromthecrypt codefromthecrypt marked this pull request as draft March 6, 2026 21:27
auto-merge was automatically disabled March 6, 2026 21:27

Pull request was converted to draft

@codefromthecrypt codefromthecrypt marked this pull request as ready for review March 14, 2026 02:15
@codefromthecrypt
Collaborator Author

Per otel guidance, I pinned to a commit until the next release, as they have no near-term date for it.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3dce87ca16

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

/// Handles toolshim transformations if needed
#[tracing::instrument(
    skip(provider, session_id, system_prompt, messages, tools, toolshim_tools),
    fields(session.id = %session_id, gen_ai.request.model = %provider.get_model_config().model_name)

[P2] Tag span with active model instead of provider default

gen_ai.request.model is now derived from provider.get_model_config().model_name, but wrapper providers can change models per turn. In LeadWorkerProvider, get_model_config() always returns the lead model while stream() may route requests to the worker/fallback model, so this span field will be wrong for those turns and model-level telemetry/experiments become misleading. This regression was introduced when per-provider stream() instrumentation was removed and replaced by this call-site field.
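
A minimal sketch of the mismatch described above, assuming a wrapper that serves a fixed number of turns from the lead model and later turns from the worker. The Provider trait, Single, LeadWorker, the lead_turns routing rule and span_label_vs_actual are all hypothetical simplifications, not goose's real types.

```rust
/// Hypothetical, simplified provider interface (not goose's real trait).
trait Provider {
    /// Model reported by the provider's static configuration.
    fn configured_model(&self) -> String;
    /// Model that actually serves the given turn.
    fn model_for_turn(&self, turn: usize) -> String;
}

/// A provider backed by exactly one model.
struct Single(&'static str);

impl Provider for Single {
    fn configured_model(&self) -> String {
        self.0.to_string()
    }

    fn model_for_turn(&self, _turn: usize) -> String {
        self.0.to_string()
    }
}

/// Wrapper that uses the lead model for the first `lead_turns` turns and the
/// worker model afterwards (an assumed routing rule for illustration).
struct LeadWorker {
    lead: Single,
    worker: Single,
    lead_turns: usize,
}

impl Provider for LeadWorker {
    fn configured_model(&self) -> String {
        // Always reports the lead model, whichever model serves the turn.
        self.lead.configured_model()
    }

    fn model_for_turn(&self, turn: usize) -> String {
        if turn < self.lead_turns {
            self.lead.model_for_turn(turn)
        } else {
            self.worker.model_for_turn(turn)
        }
    }
}

/// Returns (label a call-site span field would record, model that served the turn).
fn span_label_vs_actual(provider: &dyn Provider, turn: usize) -> (String, String) {
    (provider.configured_model(), provider.model_for_turn(turn))
}
```

With lead_turns set to 1, turn 0 yields matching values, while turn 1 is labelled with the lead model although the worker served it. One possible remedy is to record the model field at the point where the routing decision is known rather than at the call site.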

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 93ea855e85

@codefromthecrypt codefromthecrypt added this pull request to the merge queue Mar 16, 2026
Merged via the queue into main with commit 2631095 Mar 16, 2026
24 checks passed
@codefromthecrypt codefromthecrypt deleted the adrian/session-id-otel branch March 16, 2026 05:42
jh-block added a commit that referenced this pull request Mar 16, 2026
* main: (65 commits)
  feat(otel): propagate session.id to spans and log records (#7490)
  fix(test): add env_lock to is_openai_reasoning_model tests (#7917)
  fix(acp): pass session_id when loading extensions so skills are discovered (#7868)
  updated canonical models (#7920)
  feat(autovisualiser): Migrate the autovisualiser extension to MCP Apps  (#7852)
  fix: add tool_choice and parallel_tool_calls to chatgpt_codex provider (#7867)
  fix: tool confirmation handling for multiple requests (#7856)
  Remove dead OllamaSetup onboarding flow (#7861)
  fix: resolve tokio::sync::Mutex deadlock in recipe retry path (#7832)
  Upgrade Electron 40.6.0 → 41.0.0 (#7851)
  Only show up to 50 lines of source code (#7578)
  fix: stop writing without error when hitting broken pipe for goose session list (#7858)
  feat(acp): add session/set_mode handler (#7801)
  Keep messages in sync (#7850)
  More acp tools (#7843)
  fix: skip upgrade-insecure-requests CSP for external HTTP backends (#7714)
  fix(shell): prevent hang when command backgrounds a child process (#7689)
  Remove include from Cargo.toml in goose-mcp (#7838)
  Exit agent loop when tool call JSON fails to parse (#7840)
  chore: remove redundant husky prepare script (#7829)
  ...
wpfleger96 added a commit that referenced this pull request Mar 16, 2026
…oken-retry

* origin/main: (21 commits)
  Remove java/.ai-usage-marker directory (#7925)
  test(acp): add terminal delegation fixtures and fix shell singleton (#7923)
  fix: bump pctx_code_mode to 0.3.0 for iterator type checking fix (#7892)
  feat: persist GooseMode per-session via session DB (#7854)
  feat(otel): propagate session.id to spans and log records (#7490)
  fix(test): add env_lock to is_openai_reasoning_model tests (#7917)
  fix(acp): pass session_id when loading extensions so skills are discovered (#7868)
  updated canonical models (#7920)
  feat(autovisualiser): Migrate the autovisualiser extension to MCP Apps  (#7852)
  fix: add tool_choice and parallel_tool_calls to chatgpt_codex provider (#7867)
  fix: tool confirmation handling for multiple requests (#7856)
  Remove dead OllamaSetup onboarding flow (#7861)
  fix: resolve tokio::sync::Mutex deadlock in recipe retry path (#7832)
  Upgrade Electron 40.6.0 → 41.0.0 (#7851)
  Only show up to 50 lines of source code (#7578)
  fix: stop writing without error when hitting broken pipe for goose session list (#7858)
  feat(acp): add session/set_mode handler (#7801)
  Keep messages in sync (#7850)
  More acp tools (#7843)
  fix: skip upgrade-insecure-requests CSP for external HTTP backends (#7714)
  ...
wpfleger96 added a commit that referenced this pull request Mar 16, 2026
* origin/main: (72 commits)
  No Check do Check (#7942)
  Log 500 errors and also show error for direct download (#7936)
  fix: retry on authentication failure with credential refresh (#7812)
  Remove java/.ai-usage-marker directory (#7925)
  test(acp): add terminal delegation fixtures and fix shell singleton (#7923)
  fix: bump pctx_code_mode to 0.3.0 for iterator type checking fix (#7892)
  feat: persist GooseMode per-session via session DB (#7854)
  feat(otel): propagate session.id to spans and log records (#7490)
  fix(test): add env_lock to is_openai_reasoning_model tests (#7917)
  fix(acp): pass session_id when loading extensions so skills are discovered (#7868)
  updated canonical models (#7920)
  feat(autovisualiser): Migrate the autovisualiser extension to MCP Apps  (#7852)
  fix: add tool_choice and parallel_tool_calls to chatgpt_codex provider (#7867)
  fix: tool confirmation handling for multiple requests (#7856)
  Remove dead OllamaSetup onboarding flow (#7861)
  fix: resolve tokio::sync::Mutex deadlock in recipe retry path (#7832)
  Upgrade Electron 40.6.0 → 41.0.0 (#7851)
  Only show up to 50 lines of source code (#7578)
  fix: stop writing without error when hitting broken pipe for goose session list (#7858)
  feat(acp): add session/set_mode handler (#7801)
  ...
lifeizhou-ap added a commit that referenced this pull request Mar 17, 2026
* main:
  Add DCO git commit command to AGENTS.md (#7945)
  fix(claude-code): remove incorrect agent_visible filter on user message (#7931)
  No Check do Check (#7942)
  Log 500 errors and also show error for direct download (#7936)
  fix: retry on authentication failure with credential refresh (#7812)
  Remove java/.ai-usage-marker directory (#7925)
  test(acp): add terminal delegation fixtures and fix shell singleton (#7923)
  fix: bump pctx_code_mode to 0.3.0 for iterator type checking fix (#7892)
  feat: persist GooseMode per-session via session DB (#7854)
  feat(otel): propagate session.id to spans and log records (#7490)
  fix(test): add env_lock to is_openai_reasoning_model tests (#7917)
  fix(acp): pass session_id when loading extensions so skills are discovered (#7868)

4 participants