docs: document LLM request intercept outcomes #341
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
No actionable comments were generated in the recent review. 🎉
Run configuration: Path: `.coderabbit.yaml` · Review profile: ASSERTIVE · Plan: Enterprise
Files selected for processing: 6
| Layer / File(s) | Summary |
|---|---|
| **Canonical outcome reference page**<br>`docs/reference/llm-request-intercept-outcomes.mdx` | New page defines the required/optional outcome fields, request authority semantics based on active codec, a Mermaid flowchart of intercept chain execution, cross-language callback mappings, lifecycle/event timing rules, and native ABI/grpc-v1 migration notes. |
| **Build-plugins intercept examples**<br>`docs/build-plugins/code-examples.mdx`, `docs/build-plugins/register-behavior.mdx` | Python and Rust `add_header` intercept examples now construct and return `LLMRequestInterceptOutcome`/`LlmRequestInterceptOutcome` objects instead of raw `(request, annotated)` tuples. |
| **Consumer-side outcome usage**<br>`docs/instrument-applications/code-examples.mdx`, `docs/integrate-into-frameworks/code-examples.mdx` | Python, Node.js, and Rust examples now capture intercept results as an outcome object and derive the request via `outcome.request` before passing it to conditional execution or downstream logic. |
| **Provider-codecs workflow and example**<br>`docs/integrate-into-frameworks/provider-codecs.mdx` | Documents read-only raw request content during active codecs, annotation-based provider-body edits, and rejection rules; updates the Python system-message example to return `LLMRequestInterceptOutcome`. |
Estimated code review effort: 2 (Simple) | ~12 minutes
Suggested labels: DO NOT MERGE
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title follows Conventional Commits and accurately summarizes the docs-only change. |
| Description check | ✅ Passed | The description covers summary, context, impact, and validation, but omits template sections like Overview checkboxes and reviewer start. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
#### Overview
Finalize one canonical LLM request-intercept outcome across the Rust runtime, built-in and adaptive plugins, native ABI v1, `grpc-v1` workers, public C FFI, Go, Python, Node.js, and WebAssembly.
Request intercepts can rewrite the provider request, carry an optional normalized annotation, and schedule ordered marks for the managed LLM lifecycle:
```json
{
"request": {"headers": {}, "content": {}},
"annotated_request": null,
"pending_marks": []
}
```
`request` is required. `annotated_request` defaults to `null`, and `pending_marks` defaults to an empty list. Each pending mark contains only its name, optional category and category profile, data, and metadata; Relay continues to own event UUIDs, parent UUIDs, and timestamps.
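As a minimal sketch of the shape and defaults just described, the following Python models the outcome JSON with standard-library dataclasses. The names `PendingMark`, `Outcome`, and `parse_outcome` are illustrative stand-ins, not the SDK's `LLMRequestInterceptOutcome` or `PendingMarkSpec` types, and the types chosen for `category_profile`, `data`, and `metadata` are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PendingMark:
    # Only caller-owned fields; Relay assigns UUIDs, parents, and timestamps.
    name: str
    category: Optional[str] = None
    category_profile: Optional[Any] = None  # assumed type
    data: Any = None                        # assumed type
    metadata: Any = None                    # assumed type


@dataclass
class Outcome:
    request: dict                                      # required
    annotated_request: Optional[dict] = None           # defaults to null
    pending_marks: list = field(default_factory=list)  # defaults to []


def parse_outcome(raw: str) -> Outcome:
    """Parse outcome JSON, applying the documented defaults."""
    doc = json.loads(raw)
    if "request" not in doc:
        raise ValueError("outcome is missing the required 'request' field")
    marks = [PendingMark(**m) for m in doc.get("pending_marks") or []]
    return Outcome(doc["request"], doc.get("annotated_request"), marks)


# Only `request` is supplied; the other two fields fall back to defaults.
minimal = parse_outcome('{"request": {"headers": {}, "content": {}}}')
```

Parsing the minimal document leaves `annotated_request` as `None` and `pending_marks` as an empty list, which mirrors the defaults in the JSON example.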
The finalized contract also defines one provider-body source of truth. Without a request codec, `outcome.request.content` is authoritative. With a codec, `outcome.annotated_request` is required and authoritative, `outcome.request.content` is read-only context, and `outcome.request.headers` remains writable.
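The authority rule can be sketched as a single decision function. This is an illustration under stated assumptions, not Relay's implementation: `resolve_provider_body`, its parameters, and the `encode` callable are hypothetical, and the outcome is passed as a plain dict in the JSON shape shown earlier.

```python
from typing import Any, Callable, Optional


def resolve_provider_body(
    original_content: Any,
    outcome: dict,
    codec_active: bool,
    encode: Optional[Callable[[dict], Any]] = None,
) -> Any:
    """Pick the provider body according to the request-authority rule."""
    request = outcome["request"]
    annotated = outcome.get("annotated_request")
    if not codec_active:
        # No codec: the raw request content is authoritative.
        return request["content"]
    if annotated is None:
        raise ValueError("codec path requires annotated_request")
    if request["content"] != original_content:
        raise ValueError("request.content is read-only while a codec is active")
    if encode is None:
        raise ValueError("an encoder is required when a codec is active")
    # Codec active: the body is encoded from the annotation.
    # Header edits in request["headers"] are still allowed.
    return encode(annotated)
```

Comparing the returned content against the original is one simple way to detect a raw-content mutation; the actual runtime may detect it differently.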
- [x] I confirm this contribution is my own work, or I have the right to submit it under this project's license.
- [x] I searched existing issues and open pull requests, and this does not duplicate existing work.
#### Why
Request intercepts run before Relay creates the managed LLM handle. A mark emitted directly from an intercept therefore cannot reliably attach to that future LLM scope. Returning pending mark specifications lets the lifecycle owner emit them at the correct boundary without leaking control data into provider requests, annotations, codecs, sanitizers, or execution intercepts.
Codec-aware interception also previously allowed two conflicting provider-body representations: an intercept could change both the raw request content and its normalized annotation, while Relay later encoded only the annotation. Making authority explicit prevents raw content edits from being silently discarded.
#### Details
- Make `LlmRequestInterceptOutcome` the only Rust callback result and keep one `register_llm_request_intercept` registration family for global, scope-local, plugin-context, and adaptive paths.
- Propagate each accepted request and annotation to the next intercept while appending pending marks in effective middleware order.
- Without a request codec, use `outcome.request.content` as the provider body.
- With a request codec, require `outcome.annotated_request`, encode the provider body from it, and allow header changes only through `outcome.request.headers`.
- Reject raw `request.content` mutations or missing annotations at the offending codec-path intercept, before later middleware, LLM lifecycle creation, mark emission, or provider invocation.
- Preserve marks from an intercept that breaks the chain; discard all accumulated marks if any intercept fails.
- Return the complete outcome from standalone request-intercept helpers. These helpers expose pending marks but do not emit them because they do not own an LLM lifecycle.
- After successful interception, create the LLM handle and capture one subscriber snapshot before emitting lifecycle events.
- Emit LLM start at `T`, every pending mark at `T + 1µs` in returned order with the LLM UUID as parent, and LLM end at or after `T + 1µs`.
- Apply the same behavior to streaming and non-streaming managed execution, including provider errors and stream finalization.
- Keep pending marks separate from provider-visible requests and annotations.
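The chaining and timing rules in this list can be sketched in a few lines of Python. Everything here is hypothetical scaffolding for illustration: `StepResult`, its `stop` flag (how an intercept signals a chain break is not specified in this description), `run_chain`, and `schedule_events` are not Relay APIs, and timestamps are modelled as integer microseconds.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StepResult:
    request: dict
    annotated_request: Optional[dict] = None
    pending_marks: list = field(default_factory=list)
    stop: bool = False  # hypothetical flag: this intercept breaks the chain


def run_chain(intercepts, request, annotated=None):
    """Run intercepts in order, threading request/annotation and collecting marks."""
    marks = []
    for intercept in intercepts:
        # An exception propagates from here, so accumulated marks are discarded.
        result = intercept(request, annotated)
        request, annotated = result.request, result.annotated_request
        marks.extend(result.pending_marks)  # effective middleware order
        if result.stop:
            break  # marks from the breaking intercept are preserved
    return request, annotated, marks


def schedule_events(start_us: int, marks: list, end_us: int):
    """Lay out (timestamp_us, event) pairs: start at T, marks at T+1, end at or after T+1."""
    events = [(start_us, "llm_start")]
    events += [(start_us + 1, f"mark:{name}") for name in marks]
    events.append((max(end_us, start_us + 1), "llm_end"))
    return events
```

Clamping the end timestamp with `max` keeps the end event from preceding the marks even when the provider call returns within the same microsecond as the start.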
#### Boundary contracts
- **Native ABI v1:** return one host-owned outcome JSON string. Remove the private annotation-envelope transport and append required outcome-contract version fields to both host and plugin descriptor tables so stale binaries fail before callback invocation.
- **`grpc-v1`:** return one `JsonEnvelope` using schema `nemo.relay.LlmRequestInterceptOutcome@1`.
- **Public C FFI:** return one owned `char **out_outcome_json` and add `nemo_relay_llm_request_intercept_outcome_json_new`.
- **Go:** return `(LLMRequestInterceptOutcome, error)` and expose request, outcome, and pending-mark DTOs.
- **Python:** return `LLMRequestInterceptOutcome` and export `PendingMarkSpec`.
- **Node.js and WebAssembly:** return `{ request, annotated?, pendingMarks? }`. Binding-owned pending-mark DTOs use `categoryProfile`; canonical event and outcome JSON retains `category_profile`.
- **Rust native and worker SDKs:** expose only the canonical callback and registration method.
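To illustrate the `categoryProfile` / `category_profile` split at the Node.js and WebAssembly boundary, here is a rough Python sketch of the key conversion. The helper names are hypothetical, and rejecting the wire spelling on binding-side input is an assumption based on the WebAssembly test note in the Testing section; the real bindings perform this conversion in their own languages.

```python
def mark_to_wire(dto: dict) -> dict:
    """Convert a binding-style pending-mark DTO to canonical wire keys."""
    if "category_profile" in dto:
        # Assumed behavior: binding DTOs accept only the camelCase spelling.
        raise ValueError("binding DTOs use 'categoryProfile', not 'category_profile'")
    wire = dict(dto)  # shallow copy so the caller's DTO is not modified
    if "categoryProfile" in wire:
        wire["category_profile"] = wire.pop("categoryProfile")
    return wire


def mark_from_wire(wire: dict) -> dict:
    """Convert canonical wire JSON back to a binding-style DTO."""
    dto = dict(wire)
    if "category_profile" in dto:
        dto["categoryProfile"] = dto.pop("category_profile")
    return dto
```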
#### Breaking changes
This intentionally finalizes unpublished contracts in place:
- Rust and Python tuple results are removed.
- C and Go split outputs are removed.
- Mark-specific parallel registration variants are removed.
- The native annotation metadata envelope and fallback parser are removed.
- Native ABI host and plugin tables require the finalized outcome-contract field.
- The `grpc-v1` request-intercept result is replaced by the canonical outcome envelope.
- Codec-path intercepts must return an annotation and may no longer mutate raw `request.content`; malformed outcomes fail before lifecycle creation.
- Node.js and WebAssembly pending-mark objects use `categoryProfile` instead of the Rust/wire name `category_profile`.
All development native plugins and workers must rebuild against this version.
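For callbacks written against the removed tuple result, one possible migration aid is a small adapter. The sketch below is hypothetical throughout: `OutcomeSketch` is a local stand-in rather than the SDK's `LLMRequestInterceptOutcome`, and the callback signature `(request, annotated)` is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class OutcomeSketch:
    # Stand-in for the SDK outcome type; not the real class.
    request: Any
    annotated_request: Optional[Any] = None
    pending_marks: list = field(default_factory=list)


def adapt_legacy(legacy: Callable[..., tuple]) -> Callable[..., OutcomeSketch]:
    """Wrap a callback that returns (request, annotated) so it returns an outcome."""
    def wrapper(*args, **kwargs) -> OutcomeSketch:
        request, annotated = legacy(*args, **kwargs)
        return OutcomeSketch(request=request, annotated_request=annotated)
    return wrapper


def add_header(request, annotated=None):
    # Legacy-style intercept: returns a raw tuple.
    headers = {**request["headers"], "x-example": "1"}
    return {**request, "headers": headers}, annotated


outcome = adapt_legacy(add_header)({"headers": {}, "content": {}})
```

An adapter like this never produces pending marks, so it only covers the tuple-to-outcome shape change; intercepts that need marks would construct the outcome directly.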
#### Where should the reviewer start?
1. `crates/types/src/api/event.rs` and `crates/types/src/api/llm.rs` for the canonical data contract.
2. `crates/core/src/api/runtime/state.rs`, `crates/core/src/api/shared.rs`, `crates/core/src/api/llm.rs`, and `crates/core/src/stream.rs` for chaining, codec authority, validation, and lifecycle behavior.
3. `crates/plugin/src/lib.rs`, `crates/core/src/plugin/dynamic/native.rs`, and `crates/core/src/plugin/dynamic/worker.rs` for native and worker boundaries.
4. `crates/ffi`, `go/nemo_relay`, `crates/python`, `crates/node`, and `crates/wasm` for binding contracts and DTO conversion.
5. `crates/core/tests/integration/middleware_tests.rs`, `crates/core/tests/integration/pipeline_tests.rs`, `crates/plugin/tests/typed_callbacks.rs`, and the binding tests for lifecycle, codec-authority, and boundary coverage.
The full contract, request-authority diagram, and migration notes are tracked in [companion documentation PR #341](#341), which should merge immediately after this PR.
#### Testing
- `cargo test --workspace --all-targets`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo fmt --all -- --check`
- Python codec and worker SDK coverage passes, including malformed codec-path outcomes and canonical worker envelopes.
- Node.js LLM suite: **38 passed**, including `categoryProfile` input/output conversion and codec-authority rejection.
- Go: all `go/nemo_relay/...` packages passed, including codec-authority coverage; `go vet ./...` passes.
- Native SDK: **52 passed**.
- Worker SDK: **9 passed**; worker protocol tests: **6 passed**.
- C FFI: unit and integration suites passed, including owned outcome allocation and malformed/null input coverage.
- WebAssembly native Rust tests: **13 passed**, including camelCase pending-mark DTO round trips and rejection of the wire-only `category_profile` spelling.
- Repository formatting, strict Clippy, Ruff, Prettier, type, lockfile, FFI-header, and applicable pre-commit checks pass.
`wasm-pack` and the `wasm32-unknown-unknown` Rust target were not available for the package-level Wasm suite. Environment-dependent socket and external-network tests were not used to validate these binding changes.
#### Related Issues
- Relates to #296
## Summary by CodeRabbit
* **New Features**
  * LLM request intercepts can now return a unified outcome that includes the rewritten request, optional annotated request, and pending marks.
  * Pending marks are now emitted alongside LLM lifecycle events and supported across SDKs and plugins.
* **Bug Fixes**
  * Improved consistency of LLM event timing and parent/child relationships.
  * Added stricter validation so intercepts that modify raw request content or omit required annotations are rejected when needed.
Authors:
- Bryan Bednarski (https://github.com/bbednarski9)
Approvers:
- Will Killian (https://github.com/willkill07)
URL: #327
Summary
Context
This is the documentation companion to #327. It intentionally contains only the six `docs/**` changes so implementation review and Docs code-owner review can proceed independently.
Depends on #327 and should merge immediately after it. The implementation PR retains the package and native-plugin README updates alongside the code they describe.
Developer impact
These updates replace obsolete tuple/request-only examples and document the breaking request-authority contract introduced by #327. There are no runtime changes in this PR.
Validation
- `fern check --warnings` completed with zero errors (redirect validation was skipped without a Fern token)
- `Preview docs` check on feat(plugin)!: support pending marks from LLM intercepts #327
- `git diff --check`