diff --git a/docs/index.md b/docs/index.md
index a153fef8..4d7cc2a3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -68,6 +68,7 @@ Use the reading path that matches your task:
 | Observe a local coding-agent CLI | [NeMo Relay CLI](nemo-relay-cli/about.md) |
 | Package reusable behavior | [Build Plugins](build-plugins/about.md) |
 | Export traces or trajectories | [Observability](plugins/observability/about.md) |
+| Debug trace incidents | [Trace Incident Runbook](troubleshooting/trace-incident-runbook.md) |
 | Tune performance with adaptive behavior | [Adaptive](plugins/adaptive/about.md) |
 | Look up symbols | [APIs](reference/api/index.md) |
 
@@ -270,6 +271,7 @@ reference/performance
 :maxdepth: 2
 
 Troubleshooting Guide <troubleshooting/troubleshooting-guide>
+Trace Incident Runbook <troubleshooting/trace-incident-runbook>
 ```
 
 ```{toctree}
diff --git a/docs/plugins/observability/about.md b/docs/plugins/observability/about.md
index 81b8741d..2b49f832 100644
--- a/docs/plugins/observability/about.md
+++ b/docs/plugins/observability/about.md
@@ -57,9 +57,13 @@ Choose the exporter based on the downstream system:
 | Generic OTLP traces | [OpenTelemetry](opentelemetry.md) |
 | OpenInference-oriented agent and LLM spans | [OpenInference](openinference.md) |
 
-Start with local event inspection before production export. Add sanitize
+Start with in-process event inspection before exporting externally. Add sanitize
 guardrails before exporters receive sensitive payloads.
 
+For trace incidents involving missing traces, wrong scope attachment, export
+failures, duplicate events, or sensitive telemetry, use the
+[Trace Incident Runbook](../../troubleshooting/trace-incident-runbook.md).
+
 ## Correlating Trajectories And Traces
 
 When ATIF and trace exporters observe the same NeMo Relay events, they share
diff --git a/docs/troubleshooting/trace-incident-runbook.md b/docs/troubleshooting/trace-incident-runbook.md
new file mode 100644
index 00000000..d8d31682
--- /dev/null
+++ b/docs/troubleshooting/trace-incident-runbook.md
@@ -0,0 +1,210 @@
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# Trace Incident Runbook
+
+Use this runbook when a NeMo Relay application has missing traces, partial
+traces, incorrect scope parentage, exporter failures, duplicate events, or
+sensitive data in telemetry. It assumes that the application already has
+baseline scope and call instrumentation in place.
+
+For first-time setup problems, start with the
+[Troubleshooting Guide](troubleshooting-guide.md). For conceptual grounding,
+refer to [Agent Runtime Primer](../getting-started/agent-runtime-primer.md),
+[Scopes](../about/concepts/scopes.md), [Events](../about/concepts/events.md),
+and [Subscribers](../about/concepts/subscribers.md).
+
+## Protect Sensitive Data First
+
+Do not collect raw prompts, model responses, authorization headers, tokens,
+customer records, tool arguments, or provider payloads while triaging an
+incident. Capture the smallest sanitized event sample that proves the failure.
+
+Before exporting incident artifacts outside the current trust boundary, verify
+that sanitize guardrails or exporter filters remove sensitive fields. Sanitize
+guardrails change emitted telemetry payloads only; they do not change the live
+request or response passed to the tool, model provider, or application. Refer to
+[Middleware](../about/concepts/middleware.md) and
+[Add Middleware](../instrument-applications/advanced-guide.md) for the
+guardrail boundary.
+
+## Triage By Symptom
+
+Use this table to choose the first check for the symptom you see.
+
+| Symptom | Likely Area | Start With |
+|---|---|---|
+| No traces | Missing instrumentation boundary or inactive exporter | [Confirm Instrumentation Boundary](#confirm-instrumentation-boundary) |
+| Partial traces | Unwrapped calls, dropped streams, or late subscriber registration | [Confirm Managed Calls](#confirm-managed-calls) |
+| Wrong parent or child scope | Scope propagation or shared scope stack issue | [Confirm Active Scope](#confirm-active-scope) |
+| Events appear in process but external export fails | Exporter config, endpoint, environment, or flush path | [Confirm Exporter Setup](#confirm-exporter-setup) |
+| Duplicate events | Duplicate subscribers, duplicate wrappers, or mixed manual and managed lifecycle calls | [Check For Duplicate Event Sources](#check-for-duplicate-event-sources) |
+| Sensitive data appears in telemetry | Missing sanitize guardrails before subscribers or exporters | [Confirm Sanitization Before Export](#confirm-sanitization-before-export) |
+
+## Run The Ordered Checks
+
+Run these checks in order before changing exporter or application code.
+
+1. Confirm the instrumentation boundary.
+2. Confirm the active scope and root scope ownership.
+3. Confirm managed tool and LLM calls.
+4. Confirm subscriber or exporter registration timing.
+5. Confirm exporter endpoint, environment, and flush behavior.
+6. Confirm sanitization before export.
+
+## Confirm Instrumentation Boundary
+
+Start with the code path that owns the real work.
+
+- If application code calls the tool or model provider directly, verify that the
+  call path uses [Instrument Applications](../instrument-applications/about.md)
+  guidance.
+- If a framework owns scheduling, retries, callbacks, or provider payloads,
+  verify that the integration uses
+  [Integrate into Frameworks](../integrate-frameworks/about.md) guidance.
+- If a plugin installs runtime behavior, verify that the plugin is activated
+  before the request path starts.
+
+Do not debug an exporter first if no in-process subscriber sees events. Add or
+enable a sanitized in-process subscriber at the same boundary and confirm that
+scope, tool, or LLM events exist before investigating external export.
+
+## Confirm Active Scope
+
+Trace gaps and wrong parent-child relationships usually start with scope
+ownership. Verify these conditions:
+
+- Each request, agent run, or workflow starts under the intended top-level scope.
+- Detached tasks, worker threads, callbacks, and async jobs receive the intended
+  scope stack when they should remain part of the same logical run.
+- Independent requests receive fresh isolated scope stacks.
+- Scope-local middleware and subscribers are registered on the owning scope or
+  an ancestor scope.
+
+Use [Adding Scopes and Marks](../instrument-applications/adding-scopes-and-marks.md)
+and [Scopes](../about/concepts/scopes.md) to compare the intended root scope
+with the emitted event `uuid` and `parent_uuid` values.
+
+## Confirm Managed Calls
+
+Partial traces often mean some work bypasses the runtime helpers. Check these
+areas:
+
+- Tool calls that should emit tool start and end events use the managed tool
+  call path.
+- Model calls that should emit LLM start and end events use the managed LLM call
+  path or an integration wrapper that emits equivalent lifecycle events.
+- Manual lifecycle calls emit matched start and end events with the same
+  lifecycle UUID.
+- Streaming LLM responses are drained until completion so final events,
+  collectors, and subscribers can observe the completed output.
+
+Refer to [Instrument a Tool Call](../instrument-applications/instrument-tool-call.md),
+[Instrument an LLM Call](../instrument-applications/instrument-llm-call.md),
+[Wrap Tool Calls](../integrate-frameworks/wrap-tool-calls.md), and
+[Wrap LLM Calls](../integrate-frameworks/wrap-llm-calls.md).
+
+## Confirm Subscriber And Exporter Registration
+
+A subscriber that registers after an event is emitted never receives it,
+because events are not buffered for late subscribers. Verify these conditions:
+
+- Plugin-managed observability components are loaded before the request path.
+- Manual subscribers are registered before the scope, tool, or LLM events they
+  need to observe.
+- Scope-local subscribers are registered on a scope that is active for the work
+  they should observe.
+- Exporter filters match the intended root scope or event category.
+- Shutdown, teardown, or request completion calls flush owned exporters before
+  the process exits or the container stops.
+
+Use [Observability](../plugins/observability/about.md),
+[Observability Configuration](../plugins/observability/configuration.md), and
+[Subscribers](../about/concepts/subscribers.md) to verify the registration
+lifecycle.
+
+## Confirm Exporter Setup
+
+If in-process event inspection works but external export fails, isolate
+exporter transport and configuration from runtime instrumentation.
+
+For file or trajectory export, confirm these settings:
+
+- Output paths are writable by the running process.
+- The application shuts down or clears the exporter through a code path that
+  flushes partial output.
+- ATIF export is scoped to the intended agent root and does not mix concurrent
+  root scopes.
+
+For OpenTelemetry or OpenInference export, confirm these settings:
+
+- The OpenTelemetry Protocol (OTLP) endpoint, headers, credentials, and network
+  egress are available in the target environment.
+- The exporter is enabled in the active configuration file or plugin document.
+- The backend receives spans with `nemo_relay.uuid` and
+  `nemo_relay.parent_uuid` attributes.
+- The application flushes and shuts down the subscriber during graceful
+  termination.
+
+Refer to [Agent Trajectory Observability Format (ATOF)](../plugins/observability/atof.md),
+[Agent Trajectory Interchange Format (ATIF)](../plugins/observability/atif.md),
+[OpenTelemetry](../plugins/observability/opentelemetry.md), and
+[OpenInference](../plugins/observability/openinference.md).
+
+## Check For Duplicate Event Sources
+
+Duplicate events usually mean the same boundary is instrumented more than once.
+Check these areas:
+
+- The application does not wrap a call that a framework integration already
+  wraps.
+- Manual lifecycle calls are not emitted around the same call that already uses
+  managed tool or LLM helpers.
+- Plugin-managed exporters and manually registered exporters are not both
+  active for the same output path or backend.
+- Retries issued by the framework or application are real repeated calls and
+  are not mistaken for duplicate telemetry from a single call.
+
+If duplicate events are expected because a retry or fallback actually executed
+more than once, preserve the events and add stable names or metadata that let
+the downstream backend distinguish attempts.
+
+## Confirm Sanitization Before Export
+
+Sensitive data in telemetry is an incident. Use this order:
+
+1. Stop or disable the affected exporter if sensitive data is leaving the
+   intended trust boundary.
+2. Keep the application path stable unless the live request itself is unsafe.
+3. Add or fix sanitize-request and sanitize-response guardrails before
+   subscribers and exporters receive events.
+4. Validate the sanitized event with ATOF JSONL or an in-process subscriber
+   before re-enabling external export.
+5. Re-enable one exporter at a time and confirm the downstream backend no
+   longer receives sensitive fields.
+
+Use a request intercept only when the real request to the tool or provider must
+change. Use a sanitize guardrail when only the recorded telemetry should change.
+
+## Escalation Capture Checklist
+
+Collect this information before escalating an incident:
+
+- NeMo Relay version and binding package version.
+- Language binding and runtime version.
+- Whether instrumentation is direct application code, a framework integration,
+  or plugin-managed behavior.
+- Exporter type, configuration source, and activation path.
+- Sanitized event sample that shows `uuid`, `parent_uuid`, `category`,
+  `scope_category`, name, and redacted metadata.
+- Runtime shape, such as single process, worker pool, async tasks, sidecar, job
+  queue, or container orchestration.
+- Reproduction scope, including whether the failure occurs for one request, one
+  tenant, one service, or all requests.
+- Recent changes to instrumentation, plugin configuration, exporter endpoints,
+  runtime environment, or tracing backend configuration.
+
+Do not attach raw prompts, model responses, credentials, customer records,
+authorization headers, or unredacted tool arguments to escalation artifacts.
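
As an illustration of the escalation checklist above, here is a minimal Python sketch of how a sanitized event sample could be produced before it is attached to an escalation. It assumes events are available as plain dictionaries. Only the field names `uuid`, `parent_uuid`, `category`, `scope_category`, and `name` come from the runbook; the `metadata` key layout, the `KEPT_FIELDS` and `REDACTED` constants, and the `redact` and `escalation_sample` helpers are hypothetical and are not part of any NeMo Relay API.

```python
from typing import Any

# Identity fields the escalation checklist asks for (names from the runbook).
KEPT_FIELDS = ("uuid", "parent_uuid", "category", "scope_category", "name")

# Hypothetical placeholder written in place of every metadata value.
REDACTED = "[REDACTED]"


def redact(value: Any) -> Any:
    """Replace every leaf value with a placeholder, keeping keys and nesting."""
    if isinstance(value, dict):
        return {key: redact(inner) for key, inner in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return REDACTED


def escalation_sample(event: dict[str, Any]) -> dict[str, Any]:
    """Build an allowlisted copy of one event for an escalation artifact.

    Fields outside KEPT_FIELDS (for example raw inputs or outputs) are
    dropped. Metadata keeps its key structure but loses every value.
    Assumption: metadata lives under a "metadata" key of the event dict.
    """
    sample = {field: event.get(field) for field in KEPT_FIELDS}
    sample["metadata"] = redact(event.get("metadata", {}))
    return sample
```

The sketch uses an allowlist rather than a denylist, so a field that nobody anticipated is dropped by default instead of leaking. Metadata keys are preserved, so review the key names themselves before sharing if they could identify a customer or tenant.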
diff --git a/docs/troubleshooting/troubleshooting-guide.md b/docs/troubleshooting/troubleshooting-guide.md
index c54b21bc..5ddd4f3b 100644
--- a/docs/troubleshooting/troubleshooting-guide.md
+++ b/docs/troubleshooting/troubleshooting-guide.md
@@ -7,6 +7,10 @@ SPDX-License-Identifier: Apache-2.0
 
 Use this page when a NeMo Relay setup, build, or runtime workflow does not behave as expected.
 
+For trace incidents involving missing traces, wrong scope attachment, export
+failures, duplicate events, or sensitive telemetry, start with the
+[Trace Incident Runbook](trace-incident-runbook.md).
+
 ## Package Or Build Setup Fails
 
 Confirm that your environment matches [Prerequisites](../getting-started/prerequisites.md), then rerun the binding-specific setup command from [Installation](../getting-started/installation.md).