
docs: document LLM request intercept outcomes #341

Open
bbednarski9 wants to merge 1 commit into NVIDIA:main from bbednarski9:docs/llm-intercept-pending-marks


@bbednarski9 bbednarski9 commented Jul 1, 2026

Contributor

Summary

  • update Python and Rust plugin examples for the canonical LLM request-intercept outcome
  • update standalone helper examples to consume the returned outcome object
  • document codec authority and raw-content rejection rules
  • add a consolidated reference for pending marks, lifecycle ordering, binding contracts, and migration

Context

This is the documentation companion to #327. It intentionally contains only the six docs/** changes so implementation review and Docs code-owner review can proceed independently.

Depends on #327 and should merge immediately after it. The implementation PR retains the package and native-plugin README updates alongside the code they describe.

Developer impact

These updates replace obsolete tuple/request-only examples and document the breaking request-authority contract introduced by #327. There are no runtime changes in this PR.

Validation

Summary by CodeRabbit

  • Documentation
    • Added a new reference page describing the standard format for request-intercept outcomes.
    • Updated code examples across Python, Rust, Node.js, and related guides to use the newer outcome-based return style.
    • Clarified how request changes, headers, and annotation updates should be handled in codec-aware workflows.
    • Expanded guidance on validation, error cases, and lifecycle behavior for intercept processing.

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jul 1, 2026


Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: d52c36c2-e4cb-4cba-961d-37a3d9aeab16

📥 Commits

Reviewing files that changed from the base of the PR and between 2d66a41 and 279e53e.

📒 Files selected for processing (6)
  • docs/build-plugins/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
📜 Recent review details
⚠️ CI failures not shown inline (3)

GitHub Actions: Fern Docs / 0_Preview docs.txt: docs: document LLM request intercept outcomes

Conclusion: failure


##[group]Run bail() {
bail() {
  printf '::error::install-action: %s\n' "$*"

GitHub Actions: Build pull request / 0_pr-builder _ run.txt: docs: document LLM request intercept outcomes

Conclusion: failure


##[group]Run if grep -n -R -E 'alternative\-gh\-token\-secret\-name\:' ./.github; then
if grep -n -R -E 'alternative\-gh\-token\-secret\-name\:' ./.github; then
  echo "::error::$ERROR_MSG"

GitHub Actions: Build pull request / 12_Check _ Run.txt: docs: document LLM request intercept outcomes

Conclusion: failure


##[group]Run bail() {
bail() {
  printf '::error::install-action: %s\n' "$*"
🧰 Additional context used
📓 Path-based instructions (13)
{docs/**,README.md,CONTRIBUTING.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

{docs/**,README.md,CONTRIBUTING.md}: For docs-only changes, run targeted checks only if commands, package names, or examples changed. Use just docs for docs-site builds and just docs-linkcheck when links changed
Run docs site build with just docs

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
{docs/**,README.md,CONTRIBUTING.md,**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Run docs link validation with just docs-linkcheck when links change

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
{docs/**,README.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Verify README and docs entry points still match current package names and paths for large or public-facing changes

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
{docs/**,examples/**,README.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Verify examples still run with documented commands for large or public-facing changes

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
{docs/**,README.md,**/Cargo.toml,**/package.json,**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Ensure renamed public surfaces are reflected consistently in manifests and docs for large or public-facing changes

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
**/*.{md,mdx,py,sh,yaml,yml,toml,json}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep package names, repo references, and build commands current

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
**/*.mdx

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

In MDX files, top-of-file comments must use JSX comment delimiters: {/* to open and */} to close. Do not use HTML comments for MDX SPDX headers.

MDX top-of-file SPDX comments must use {/* ... */} delimiters instead of HTML comment delimiters (Must-Fix)

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
**/*.{html,md,mdx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license header in HTML and Markdown files using HTML comment syntax

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
docs/**/*.{md,mdx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update embedded documentation snippets, patch docs, and binding-support notes if examples or supported bindings changed

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
docs/**

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run just docs or ./scripts/build-docs.sh html to regenerate ignored Fern API reference pages before validation for documentation site changes

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
**

⚙️ CodeRabbit configuration file

**:

AGENTS.md

This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.

Project Overview

NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.

The shared runtime model is:

  1. Scope stacks decide where work belongs and which scope-local behavior is visible.
  2. Middleware registries decide what guardrails and intercepts run around managed calls.
  3. Plugins install reusable runtime behavior from configuration.
  4. Events record runtime behavior in ATOF form.
  5. Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.

Repository Structure

The repository layout separates the Rust runtime, language bindings,
documentation, integrations, and agent-facing skills.

crates/
  core/       # Rust core runtime crate, published as nemo-relay
  adaptive/   # Adaptive runtime primitives and plugin components
  python/     # PyO3 native extension for the Python package
  ffi/        # Raw C ABI layer used by downstream bindings such as Go
  node/       # NAPI Node.js binding and JavaScript/TypeScript entry points
  wasm/       # wasm-bindgen WebAssembly binding and JS wrappers
python/
  nemo_relay/  # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers
  tests/      # Python tests
go/
  nemo_relay/  # Experimental Go CGo binding and tests
fern/         # Fern documentation site
scripts/      # Stable wrappers and helper scripts; build/test/docs entry points live in justfile
skills/       # Publishe...

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}

⚙️ CodeRabbit configuration file

{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.

Files:

  • docs/build-plugins/code-examples.mdx
  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/integrate-into-frameworks/provider-codecs.mdx
  • docs/instrument-applications/code-examples.mdx
  • docs/build-plugins/register-behavior.mdx
  • docs/integrate-into-frameworks/code-examples.mdx
docs/reference/**

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update relevant reference documentation for any public API changes

Files:

  • docs/reference/llm-request-intercept-outcomes.mdx
🔇 Additional comments (11)
docs/reference/llm-request-intercept-outcomes.mdx (4)

1-6: LGTM!


8-72: LGTM!


83-97: LGTM!


76-82: 🎯 Functional Correctness

break_chain is not a failure path. break_chain only stops later request intercepts after the current one returns; accumulated marks are discarded only when the intercept fails or returns a malformed boundary result.

> Likely an incorrect or invalid review comment.
docs/build-plugins/code-examples.mdx (1)

40-47: LGTM!

docs/build-plugins/register-behavior.mdx (2)

54-60: LGTM!


105-105: 🎯 Functional Correctness

Confirm the Rust API here matches #327. nemo_relay::api::llm::LlmRequestInterceptOutcome is not defined in this tree, so the import and LlmRequestInterceptOutcome::new(request, annotated) need the exact path and constructor signature from the merged implementation.

docs/instrument-applications/code-examples.mdx (1)

263-264: LGTM!

Also applies to: 282-283, 300-301

docs/integrate-into-frameworks/code-examples.mdx (1)

187-191: LGTM!

Also applies to: 200-201, 213-214

docs/integrate-into-frameworks/provider-codecs.mdx (2)

44-54: LGTM!


97-97: LGTM!

Also applies to: 106-106


Walkthrough

Documentation across build-plugins, instrument-applications, integrate-into-frameworks, and provider-codecs pages is updated to reflect a new LLMRequestInterceptOutcome return contract for LLM request intercepts, replacing raw tuple returns. A new reference page documents the canonical outcome contract, request authority rules, and migration guidance.

Changes

Intercept Outcome Documentation

| Layer | File(s) | Summary |
|---|---|---|
| Canonical outcome reference page | docs/reference/llm-request-intercept-outcomes.mdx | New page defines the required/optional outcome fields, request authority semantics based on active codec, a Mermaid flowchart of intercept chain execution, cross-language callback mappings, lifecycle/event timing rules, and native ABI/grpc-v1 migration notes. |
| Build-plugins intercept examples | docs/build-plugins/code-examples.mdx, docs/build-plugins/register-behavior.mdx | Python and Rust add_header intercept examples now construct and return LLMRequestInterceptOutcome/LlmRequestInterceptOutcome objects instead of raw (request, annotated) tuples. |
| Consumer-side outcome usage | docs/instrument-applications/code-examples.mdx, docs/integrate-into-frameworks/code-examples.mdx | Python, Node.js, and Rust examples now capture intercept results as an outcome object and derive the request via outcome.request before passing it to conditional execution or downstream logic. |
| Provider-codecs workflow and example | docs/integrate-into-frameworks/provider-codecs.mdx | Documents read-only raw request content during active codecs, annotation-based provider-body edits, and rejection rules; updates the Python system-message example to return LLMRequestInterceptOutcome. |

Estimated code review effort: 2 (Simple) | ~12 minutes

Suggested labels: DO NOT MERGE

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title follows Conventional Commits and accurately summarizes the docs-only change. |
| Description check | ✅ Passed | The description covers summary, context, impact, and validation, but omits template sections like Overview checkboxes and reviewer start. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@github-actions

github-actions Bot commented Jul 1, 2026


@willkill07 willkill07 added this to the 0.5 milestone Jul 1, 2026
rapids-bot Bot pushed a commit that referenced this pull request Jul 1, 2026
#### Overview

Finalize one canonical LLM request-intercept outcome across the Rust runtime, built-in and adaptive plugins, native ABI v1, `grpc-v1` workers, public C FFI, Go, Python, Node.js, and WebAssembly.

Request intercepts can rewrite the provider request, carry an optional normalized annotation, and schedule ordered marks for the managed LLM lifecycle:

```json
{
  "request": {"headers": {}, "content": {}},
  "annotated_request": null,
  "pending_marks": []
}
```

`request` is required. `annotated_request` defaults to `null`, and `pending_marks` defaults to an empty list. Each pending mark contains only its name, optional category and category profile, data, and metadata; Relay continues to own event UUIDs, parent UUIDs, and timestamps.
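As a rough illustration of the defaults described above, the following sketch parses an outcome document and fills in the optional fields. `OutcomeSketch`, `PendingMarkSketch`, and `parse_outcome` are hypothetical stand-ins written for this example; they are not the types exported by the Python package, and the field types for `category`, `category_profile`, `data`, and `metadata` are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class PendingMarkSketch:
    """Hypothetical pending mark: it carries no UUIDs or timestamps,
    because Relay continues to own those."""
    name: str
    category: Any = None
    category_profile: Any = None
    data: Any = None
    metadata: Any = None


@dataclass
class OutcomeSketch:
    """Hypothetical stand-in for the canonical outcome shape."""
    request: dict
    annotated_request: Optional[dict] = None
    pending_marks: List[PendingMarkSketch] = field(default_factory=list)


def parse_outcome(raw: str) -> OutcomeSketch:
    """Parse outcome JSON, applying the documented defaults."""
    doc = json.loads(raw)
    if not isinstance(doc, dict) or "request" not in doc:
        raise ValueError("outcome must be an object with a 'request' field")
    # Unknown mark keys (for example a UUID) raise TypeError in this sketch.
    marks = [PendingMarkSketch(**m) for m in doc.get("pending_marks") or []]
    return OutcomeSketch(
        request=doc["request"],
        annotated_request=doc.get("annotated_request"),
        pending_marks=marks,
    )


# Only `request` is supplied; the other two fields take their defaults.
minimal = parse_outcome('{"request": {"headers": {}, "content": {}}}')
```

How the runtime treats unknown or extra fields is not stated in this description; the strictness shown here is a choice made for the sketch.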

The finalized contract also defines one provider-body source of truth. Without a request codec, `outcome.request.content` is authoritative. With a codec, `outcome.annotated_request` is required and authoritative, `outcome.request.content` is read-only context, and `outcome.request.headers` remains writable.
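The authority rule can be pictured as a single decision function. This is a minimal sketch under stated assumptions: `provider_body_source` and its parameters are hypothetical names, the equality comparison is only one possible way to detect a raw-content edit, and the real runtime would go on to encode the annotation through the codec rather than return it directly.

```python
from typing import Any, Optional


def provider_body_source(
    incoming_content: Any,
    outcome_request: dict,
    annotated_request: Optional[dict],
    codec_active: bool,
) -> Any:
    """Return the value that is authoritative for the provider body.

    Hypothetical helper, not part of any published API.
    """
    if not codec_active:
        # No codec: the raw request content is the source of truth.
        return outcome_request["content"]
    if annotated_request is None:
        raise ValueError("codec path requires annotated_request")
    if outcome_request["content"] != incoming_content:
        # With a codec, raw content is read-only context.
        raise ValueError("request.content is read-only while a codec is active")
    # Headers are not checked: they stay writable on both paths.
    return annotated_request
```

A header-only edit on the codec path passes this check, while an edit to `content` on the same path is rejected instead of being silently dropped.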

- [x] I confirm this contribution is my own work, or I have the right to submit it under this project's license.
- [x] I searched existing issues and open pull requests, and this does not duplicate existing work.

#### Why

Request intercepts run before Relay creates the managed LLM handle. A mark emitted directly from an intercept therefore cannot reliably attach to that future LLM scope. Returning pending mark specifications lets the lifecycle owner emit them at the correct boundary without leaking control data into provider requests, annotations, codecs, sanitizers, or execution intercepts.
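To make the deferred emission concrete, here is a small sketch of a lifecycle owner that receives mark names collected during interception and only then creates the LLM identity they attach to. It borrows the `T` / `T + 1µs` ordering listed under Details. `EventSketch` and `emit_llm_lifecycle` are hypothetical, the event kinds are illustrative labels, and the parent of the LLM scope itself is left out for brevity.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional

ONE_MICROSECOND_NS = 1_000


@dataclass
class EventSketch:
    kind: str
    name: str
    uuid: str
    parent_uuid: Optional[str]
    timestamp_ns: int


def emit_llm_lifecycle(
    pending_mark_names: List[str], start_ns: int, end_ns: int
) -> List[EventSketch]:
    """Build start, mark, and end events for one managed LLM call."""
    # The LLM UUID exists only now, after interception has finished.
    llm_uuid = str(uuid.uuid4())
    events = [EventSketch("llm_start", "llm", llm_uuid, None, start_ns)]
    mark_ts = start_ns + ONE_MICROSECOND_NS
    for name in pending_mark_names:
        # Every mark shares T + 1µs and is parented to the LLM scope;
        # list order preserves the order the intercepts returned them in.
        events.append(
            EventSketch("mark", name, str(uuid.uuid4()), llm_uuid, mark_ts)
        )
    # The end event is clamped so it never precedes the marks.
    events.append(
        EventSketch("llm_end", "llm", llm_uuid, None, max(end_ns, mark_ts))
    )
    return events
```

Because the intercept only returns names and payloads, nothing about the future scope leaks into the provider request or annotation.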

Codec-aware interception also previously allowed two conflicting provider-body representations: an intercept could change both the raw request content and its normalized annotation, while Relay later encoded only the annotation. Making authority explicit prevents raw content edits from being silently discarded.

#### Details

- Make `LlmRequestInterceptOutcome` the only Rust callback result and keep one `register_llm_request_intercept` registration family for global, scope-local, plugin-context, and adaptive paths.
- Propagate each accepted request and annotation to the next intercept while appending pending marks in effective middleware order.
- Without a request codec, use `outcome.request.content` as the provider body.
- With a request codec, require `outcome.annotated_request`, encode the provider body from it, and allow header changes only through `outcome.request.headers`.
- Reject raw `request.content` mutations or missing annotations at the offending codec-path intercept, before later middleware, LLM lifecycle creation, mark emission, or provider invocation.
- Preserve marks from an intercept that breaks the chain; discard all accumulated marks if any intercept fails.
- Return the complete outcome from standalone request-intercept helpers. These helpers expose pending marks but do not emit them because they do not own an LLM lifecycle.
- After successful interception, create the LLM handle and capture one subscriber snapshot before emitting lifecycle events.
- Emit LLM start at `T`, every pending mark at `T + 1µs` in returned order with the LLM UUID as parent, and LLM end at or after `T + 1µs`.
- Apply the same behavior to streaming and non-streaming managed execution, including provider errors and stream finalization.
- Keep pending marks separate from provider-visible requests and annotations.
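The chaining rules in the list above (propagate each accepted request and annotation, append marks in order, keep marks when the chain is broken, drop them on failure) can be sketched as follows. `StepResult` and `run_chain` are hypothetical, and carrying `break_chain` as a field on the result is an assumption made for this example; the description does not say how the real API signals it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    """Hypothetical per-intercept result used only in this sketch."""
    request: dict
    annotated_request: Optional[dict] = None
    pending_marks: List[str] = field(default_factory=list)
    break_chain: bool = False  # assumed placement of the break signal


Intercept = Callable[[dict, Optional[dict]], StepResult]


def run_chain(
    intercepts: List[Intercept],
    request: dict,
    annotated: Optional[dict] = None,
) -> Tuple[dict, Optional[dict], List[str]]:
    """Run intercepts in order and return the final request,
    annotation, and accumulated pending marks."""
    marks: List[str] = []
    for intercept in intercepts:
        # If this call raises, the exception propagates and the marks
        # collected so far are never returned, so none are emitted.
        result = intercept(request, annotated)
        # Each accepted request and annotation feeds the next intercept.
        request, annotated = result.request, result.annotated_request
        marks.extend(result.pending_marks)
        if result.break_chain:
            # Breaking skips later intercepts but keeps the marks,
            # including those from the intercept that broke the chain.
            break
    return request, annotated, marks
```

Codec-authority validation would sit inside the loop, immediately after each intercept returns, so that a rejected outcome stops the chain before later middleware runs.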

#### Boundary contracts

- **Native ABI v1:** return one host-owned outcome JSON string. Remove the private annotation-envelope transport and append required outcome-contract version fields to both host and plugin descriptor tables so stale binaries fail before callback invocation.
- **`grpc-v1`:** return one `JsonEnvelope` using schema `nemo.relay.LlmRequestInterceptOutcome@1`.
- **Public C FFI:** return one owned `char **out_outcome_json` and add `nemo_relay_llm_request_intercept_outcome_json_new`.
- **Go:** return `(LLMRequestInterceptOutcome, error)` and expose request, outcome, and pending-mark DTOs.
- **Python:** return `LLMRequestInterceptOutcome` and export `PendingMarkSpec`.
- **Node.js and WebAssembly:** return `{ request, annotated?, pendingMarks? }`. Binding-owned pending-mark DTOs use `categoryProfile`; canonical event and outcome JSON retains `category_profile`.
- **Rust native and worker SDKs:** expose only the canonical callback and registration method.
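The naming split between binding DTOs and the wire format can be illustrated with a small key-mapping sketch. It is written in Python purely to show the mapping; `mark_to_wire` and `mark_from_wire` are hypothetical helpers, and rejecting the wire spelling on binding input mirrors what the WebAssembly tests are described as covering rather than a confirmed rule for every binding.

```python
# Binding-side key -> canonical wire key for a pending mark.
_BINDING_TO_WIRE = {
    "name": "name",
    "category": "category",
    "categoryProfile": "category_profile",
    "data": "data",
    "metadata": "metadata",
}
_WIRE_TO_BINDING = {wire: binding for binding, wire in _BINDING_TO_WIRE.items()}


def mark_to_wire(mark: dict) -> dict:
    """Convert a camelCase binding mark into canonical outcome JSON keys."""
    if "category_profile" in mark:
        raise ValueError("binding DTOs use categoryProfile, not category_profile")
    if "name" not in mark:
        raise ValueError("a pending mark requires a name")
    return {
        _BINDING_TO_WIRE[key]: value
        for key, value in mark.items()
        if key in _BINDING_TO_WIRE
    }


def mark_from_wire(mark: dict) -> dict:
    """Convert canonical outcome JSON keys back into binding DTO keys."""
    return {
        _WIRE_TO_BINDING[key]: value
        for key, value in mark.items()
        if key in _WIRE_TO_BINDING
    }
```

Keeping the conversion in one table makes the round trip symmetric, which is the property the binding tests above are said to exercise.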

#### Breaking changes

This intentionally finalizes unpublished contracts in place:

- Rust and Python tuple results are removed.
- C and Go split outputs are removed.
- Mark-specific parallel registration variants are removed.
- The native annotation metadata envelope and fallback parser are removed.
- Native ABI host and plugin tables require the finalized outcome-contract field.
- The `grpc-v1` request-intercept result is replaced by the canonical outcome envelope.
- Codec-path intercepts must return an annotation and may no longer mutate raw `request.content`; malformed outcomes fail before lifecycle creation.
- Node.js and WebAssembly pending-mark objects use `categoryProfile` instead of the Rust/wire name `category_profile`.

All development native plugins and workers must rebuild against this version.
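For Python callbacks that still return a `(request, annotated)` tuple, a thin wrapper is one possible migration aid while call sites are updated. This is a sketch only: `OutcomeSketch` is a hypothetical stand-in for the real outcome type, and the callback signature `(request, annotated)` is an assumption, not the documented one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class OutcomeSketch:
    """Hypothetical stand-in for the canonical outcome type."""
    request: dict
    annotated_request: Optional[dict] = None
    pending_marks: List[dict] = field(default_factory=list)


LegacyIntercept = Callable[[dict, Optional[dict]], Tuple[dict, Optional[dict]]]


def adapt_legacy(legacy: LegacyIntercept) -> Callable[..., OutcomeSketch]:
    """Wrap a tuple-returning intercept so it returns an outcome object."""

    def wrapped(request: dict, annotated: Optional[dict] = None) -> OutcomeSketch:
        new_request, new_annotated = legacy(request, annotated)
        # Legacy callbacks had no way to schedule marks, so the list is empty.
        return OutcomeSketch(request=new_request, annotated_request=new_annotated)

    return wrapped


def legacy_add_header(request, annotated):
    """Example legacy intercept that only adds a header."""
    headers = {**request["headers"], "x-demo": "1"}
    return {**request, "headers": headers}, annotated


add_header = adapt_legacy(legacy_add_header)
```

The wrapper does not make a codec-path intercept valid on its own: a legacy callback that edits raw content or returns no annotation would still be rejected under the new rules.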

#### Where should the reviewer start?

1. `crates/types/src/api/event.rs` and `crates/types/src/api/llm.rs` for the canonical data contract.
2. `crates/core/src/api/runtime/state.rs`, `crates/core/src/api/shared.rs`, `crates/core/src/api/llm.rs`, and `crates/core/src/stream.rs` for chaining, codec authority, validation, and lifecycle behavior.
3. `crates/plugin/src/lib.rs`, `crates/core/src/plugin/dynamic/native.rs`, and `crates/core/src/plugin/dynamic/worker.rs` for native and worker boundaries.
4. `crates/ffi`, `go/nemo_relay`, `crates/python`, `crates/node`, and `crates/wasm` for binding contracts and DTO conversion.
5. `crates/core/tests/integration/middleware_tests.rs`, `crates/core/tests/integration/pipeline_tests.rs`, `crates/plugin/tests/typed_callbacks.rs`, and the binding tests for lifecycle, codec-authority, and boundary coverage.

The full contract, request-authority diagram, and migration notes are tracked in [companion documentation PR #341](#341), which should merge immediately after this PR.

#### Testing

- `cargo test --workspace --all-targets`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo fmt --all -- --check`
- Python codec and worker SDK coverage passes, including malformed codec-path outcomes and canonical worker envelopes.
- Node.js LLM suite: **38 passed**, including `categoryProfile` input/output conversion and codec-authority rejection.
- Go: all `go/nemo_relay/...` packages passed, including codec-authority coverage; `go vet ./...` passes.
- Native SDK: **52 passed**.
- Worker SDK: **9 passed**; worker protocol tests: **6 passed**.
- C FFI: unit and integration suites passed, including owned outcome allocation and malformed/null input coverage.
- WebAssembly native Rust tests: **13 passed**, including camelCase pending-mark DTO round trips and rejection of the wire-only `category_profile` spelling.
- Repository formatting, strict Clippy, Ruff, Prettier, type, lockfile, FFI-header, and applicable pre-commit checks pass.

`wasm-pack` and the `wasm32-unknown-unknown` Rust target were not available for the package-level Wasm suite. Environment-dependent socket and external-network tests were not used to validate these binding changes.

#### Related Issues

- Relates to #296



## Summary by CodeRabbit

* **New Features**
  * LLM request intercepts can now return a unified outcome that includes the rewritten request, optional annotated request, and pending marks.
  * Pending marks are now emitted alongside LLM lifecycle events and supported across SDKs and plugins.

* **Bug Fixes**
  * Improved consistency of LLM event timing and parent/child relationships.
  * Added stricter validation so intercepts that modify raw request content or omit required annotations are rejected when needed.

Authors:
  - Bryan Bednarski (https://github.com/bbednarski9)

Approvers:
  - Will Killian (https://github.com/willkill07)

URL: #327
@bbednarski9 bbednarski9 marked this pull request as ready for review July 1, 2026 17:59
@bbednarski9 bbednarski9 requested a review from lvojtku as a code owner July 1, 2026 17:59
@coderabbitai coderabbitai Bot added the DO NOT MERGE label (PR should not be merged; see PR for details) Jul 1, 2026

Labels

  • DO NOT MERGE: PR should not be merged; see PR for details
  • Documentation: documentation-related
  • size:M: PR is medium


2 participants