Skip to content

fix: preserve Hermes error-path export consistency#229

Merged
rapids-bot[bot] merged 4 commits into
NVIDIA:mainfrom
mnajafian-nv:fix/hermes-error-export-consistency-refresh
Jun 5, 2026
Merged

fix: preserve Hermes error-path export consistency#229
rapids-bot[bot] merged 4 commits into
NVIDIA:mainfrom
mnajafian-nv:fix/hermes-error-export-consistency-refresh

Conversation

@mnajafian-nv
Copy link
Copy Markdown
Contributor

@mnajafian-nv mnajafian-nv commented Jun 5, 2026

Overview

This PR tightens Hermes error-path observability consistency by adding exporter-visible api_request_error coverage for ATOF, ATIF, and OpenInference, and by fixing OpenInference metadata so mixed-fidelity Hermes spans reflect the completed event rather than the request start.

  • I confirm this contribution is my own work, or I have the right to submit it under this project's license.
  • I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

  • Adds Hermes ATOF validation that api_request_error exports a lossy error-path LLM end event with the expected api_call_id, status, retry fields, error payload, and fidelity metadata.
  • Adds Hermes ATIF validation that api_request_error produces an exportable agent step with the expected error response fields and observed-event fidelity markers.
  • Adds Hermes OpenInference validation that api_request_error produces JSON output, JSON mime type, and final-event fidelity metadata on the finished LLM span.
  • Fixes OpenInference LLM span metadata handling so finished spans reflect end-event metadata instead of retaining start-event metadata for mixed-fidelity Hermes flows.
  • Keeps the change focused on Hermes error-path consistency without expanding into unrelated tool-result or routed-provider follow-up work.

Validated with:

  • cargo test -p nemo-relay-cli serve_listener_hermes_api_request_error_writes_lossy_atof_error_event -- --nocapture
  • cargo test -p nemo-relay-cli hermes_api_request_error_writes_atif_error_step_and_fidelity -- --nocapture
  • cargo test -p nemo-relay hermes_api_request_error_emits_openinference_json_output_and_metadata -- --nocapture
  • cargo fmt --all --check
  • uv run pre-commit run --all-files

Where should the reviewer start?

Start in crates/core/src/observability/openinference.rs for the metadata handling fix, then review crates/cli/tests/coverage/server_tests.rs for the Hermes ATOF error-path regression, crates/cli/tests/coverage/session_tests.rs for the ATIF error-path regression, and crates/core/tests/unit/observability/openinference_tests.rs for the OpenInference mixed-fidelity error-path coverage.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Relates to Hermes observability consistency work.

Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>
Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>
@mnajafian-nv mnajafian-nv requested review from a team as code owners June 5, 2026 01:50
@github-actions github-actions Bot added size:L PR is large Bug issue describes bug; PR fixes bug lang:rust PR changes/introduces Rust code labels Jun 5, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

Walkthrough

This PR updates Rust crate license attributions to reflect Apache-2.0-only licensing and adds comprehensive observability coverage for Hermes API request error handling via OpenInference span metadata repositioning and integration tests that validate error event propagation, fidelity tracking, and metadata serialization across ATOF and ATIF observability pipelines.

Changes

License Attribution Updates

Layer / File(s) Summary
Crate license type and content consolidation
ATTRIBUTIONS-Rust.md
License type fields change from dual MIT/Apache-2.0 to Apache-2.0-only for block-buffer, crypto-common, digest, md-5, and version_check across versions; associated MIT license blocks are removed or consolidated, leaving only Apache 2.0 content.

Hermes API Error Observability

Layer / File(s) Summary
OpenInference span metadata deferred to end attributes
crates/core/src/observability/openinference.rs, crates/core/tests/unit/observability/openinference_tests.rs
LLM span start attributes no longer include oi::METADATA; end attributes now include metadata serialized from event.metadata(). Unit test verifies JSON output for error events and confirms metadata reflects fidelity source and provider payload exactness.
Integration test coverage for Hermes error observability
crates/cli/tests/coverage/server_tests.rs, crates/cli/tests/coverage/session_tests.rs
ATOF integration test sends pre_api_request and api_request_error hooks with 502 error/retry metadata and validates exactly two LLM event exports with correct fidelity and error fields. ATIF test verifies error step trajectory generation and distinguishes sanitized request payloads from unsanitized error payloads via fidelity source markers.

Possibly related PRs

  • NVIDIA/NeMo-Relay#207: Both PRs modify crates/core/src/observability/openinference.rs LLM span attribute handling in start_attributes/end_attributes, directly intertwining span metadata and OpenInference attribute emission logic.
  • NVIDIA/NeMo-Relay#219: Both PRs add Hermes ATIF trajectory tests asserting provider_payload_exact and fidelity_source behavior for API request scenarios.
  • NVIDIA/NeMo-Relay#215: Both PRs extend Hermes observability regression coverage with OpenInference/ATOF fidelity assertions and metadata validation across crates/cli/tests/coverage/server_tests.rs and crates/core/tests/unit/observability/openinference_tests.rs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed Title follows Conventional Commits format with 'fix' type and clear imperative summary under 72 characters.
Docstring Coverage ✅ Passed Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description provides a comprehensive overview, detailed changes, reviewer guidance, and validation steps that align well with the template structure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>
@github-actions github-actions Bot added size:M PR is medium and removed size:L PR is large labels Jun 5, 2026
@willkill07 willkill07 removed the request for review from a team June 5, 2026 12:31
@willkill07
Copy link
Copy Markdown
Member

/merge

@rapids-bot rapids-bot Bot merged commit 559cce7 into NVIDIA:main Jun 5, 2026
72 checks passed
@mnajafian-nv mnajafian-nv deleted the fix/hermes-error-export-consistency-refresh branch June 5, 2026 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug issue describes bug; PR fixes bug lang:rust PR changes/introduces Rust code size:M PR is medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants