Skip to content

[AgentServer] Platform headers, persistence resilience, x-request-id, logging fix#46429

Merged
RaviPidaparthi merged 4 commits intomainfrom
feat/responses-platform-headers-request-id
Apr 22, 2026
Merged

[AgentServer] Platform headers, persistence resilience, x-request-id, logging fix#46429
RaviPidaparthi merged 4 commits intomainfrom
feat/responses-platform-headers-request-id

Conversation

@RaviPidaparthi
Copy link
Copy Markdown
Member

@RaviPidaparthi RaviPidaparthi commented Apr 20, 2026

Summary

This PR delivers improvements to the Azure AI AgentServer Python libraries focused on observability, resilience, and hardening.

🔧 Platform headers & request tracing

  • Added _platform_headers module centralizing all HTTP header name constants (x-request-id, x-platform-server, x-agent-session-id, isolation keys, traceparent, x-ms-client-request-id) — replaces scattered string literals
  • Added RequestIdMiddleware — pure-ASGI middleware that sets x-request-id response header for end-to-end request correlation (priority: OTEL trace ID → incoming header → UUID fallback)
  • Resolved request IDs stored in ASGI scope state and enriched into error.additionalInfo.request_id on error responses
  • Middleware deduplicates headers and guards against non-dict scope state

🔒 Persistence failure resilience

  • Graceful degradation when storage operations fail — responses complete with a storage_error status instead of crashing
  • Enhanced error messaging: "An internal error occurred while storing the response. Subsequent retrieval is not guaranteed."
  • Covers all response modes: streaming, background+streaming, background+non-streaming, and synchronous
  • Buffer-then-persist-then-yield pattern: terminal SSE events are buffered, persistence attempted, and on failure the terminal is replaced with response.failed carrying error_code="storage_error"
  • Sync persistence failure surfaces as HTTP 500 with runtime record eviction to prevent unbounded memory growth
  • Phase 1 (create) and Phase 2 (update) failures handled independently per spec §3.1–§3.4

🐛 Logging policy fix

  • Fixed FoundryStorageLoggingPolicy to safely handle transport failures before an HTTP response is available
  • Transport-failure ERROR log is intentionally minimal and omits isolation flags

🛡️ Hardening (from review feedback)

  • Terminal events replaced in-place (preserving sequence numbers) to avoid gaps for replay subscribers
  • subject.publish() for terminal events deferred until after persistence resolves — replay subscribers see the correct final terminal
  • _enrich_error_payload guards against non-dict additionalInfo values
  • _get_scope_request_id wired into handle_create error responses for consistent enrichment

Test results

All 942 tests pass (Core: 89, Responses: 853). pyright, pylint, ruff all clean.

Copilot AI review requested due to automatic review settings April 20, 2026 22:55
@github-actions github-actions Bot added the Hosted Agents sdk/agentserver/* label Apr 20, 2026
@RaviPidaparthi RaviPidaparthi force-pushed the feat/responses-platform-headers-request-id branch 3 times, most recently from 32f398d to 4e75827 Compare April 20, 2026 23:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ports AgentServer platform header and request-correlation behavior from the referenced .NET change into the Python AgentServer packages, adding centralized header constants, an ASGI x-request-id middleware, and updated logging/error-enrichment behavior to improve traceability.

Changes:

  • Added RequestIdMiddleware (core) to set x-request-id on all HTTP responses and enrich JSON error bodies with error.additionalInfo.request_id.
  • Introduced _platform_headers (responses) and migrated multiple modules/tests from string literals to shared constants.
  • Updated Foundry storage logging to include traceparent, remove x-ms-request-id, and handle transport failures without masking the original exception.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_request_id.py New ASGI middleware for x-request-id + error JSON enrichment.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_base.py Wires RequestIdMiddleware into the default Starlette middleware stack.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/init.py Re-exports RequestIdMiddleware and REQUEST_ID_STATE_KEY.
sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/_platform_headers.py New centralized header-name constants (request/session/isolation/tracing).
sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/store/_foundry_provider.py Uses platform header constants for isolation headers.
sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/store/_foundry_logging_policy.py Improves correlation logging (traceparent) and transport-failure behavior.
sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/hosting/_validation.py Adds request-id-aware error builders + payload enrichment helper.
sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/hosting/_observability.py Switches request-id extraction to shared REQUEST_ID constant.
sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/hosting/_endpoint_handler.py Migrates header literals to constants; introduces request-id helper.
sdk/agentserver/azure-ai-agentserver-responses/tests/unit/test_request_id_middleware.py New unit tests for request-id header + middleware enrichment behavior.
sdk/agentserver/azure-ai-agentserver-responses/tests/unit/test_error_enrichment.py New tests validating request-id enrichment in validation error responses.
sdk/agentserver/azure-ai-agentserver-responses/tests/unit/test_platform_headers.py New tests for _platform_headers constant wire values.
sdk/agentserver/azure-ai-agentserver-responses/tests/unit/test_foundry_logging_policy.py Updated tests for revised log levels/correlation fields and transport failures.
sdk/agentserver/azure-ai-agentserver-responses/tests/unit/test_foundry_storage_provider.py Updates imports to use _platform_headers constants.
sdk/agentserver/azure-ai-agentserver-responses/CHANGELOG.md Documents new headers/middleware/logging and behavior changes for next release.
Comments suppressed due to low confidence (3)

sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_request_id.py:102

  • Error-body enrichment runs on the first http.response.body message after deferring http.response.start, but it doesn’t check more_body. For streaming/chunked error responses, this would attempt to JSON-parse a partial body and can also send the deferred start too early, potentially producing corrupted output. Consider only attempting enrichment when more_body is false (full body), or if more_body is true then flush the deferred start immediately and skip enrichment for that response.
    if trace_id and trace_id != "0" * 32:
        return trace_id

    # Priority 2: Incoming x-request-id header
    incoming = _extract_header(raw_headers, b"x-request-id")
    if incoming:
        return incoming

    # Priority 3: New UUID
    return uuid.uuid4().hex

sdk/agentserver/azure-ai-agentserver-responses/tests/unit/test_request_id_middleware.py:130

  • test_different_requests_get_different_ids doesn’t actually assert the two request IDs are different; it only asserts they’re non-empty. Either assert id1 != id2 (and adjust for any known determinism), or rename/reword the test to reflect what it validates.
    sdk/agentserver/azure-ai-agentserver-responses/tests/unit/test_request_id_middleware.py:101
  • The docstring for test_incoming_x_request_id_is_used_as_fallback says the client-supplied x-request-id “should be used”, but the assertion only checks the header is non-empty. If OTEL trace IDs can override the incoming header in this test environment, consider updating the docstring to match the behavior under test, or make the test deterministic (e.g., disable trace-id resolution) so it can assert the fallback is actually used.
        client = TestClient(_build_app())
        response = client.get(path)
        assert "x-request-id" in response.headers

@RaviPidaparthi RaviPidaparthi force-pushed the feat/responses-platform-headers-request-id branch from 4e75827 to c3eedba Compare April 20, 2026 23:18
- Added _platform_headers module centralizing all platform HTTP header
  name constants (x-request-id, x-platform-server, x-agent-session-id,
  isolation keys, traceparent, x-ms-client-request-id).

- Added RequestIdMiddleware (core) that sets x-request-id response header
  on every HTTP response. Value resolved: OTEL trace ID > incoming header > UUID.

- Error responses (4xx/5xx) with JSON error body are automatically enriched
  with error.additionalInfo.request_id matching the x-request-id header.

- Foundry storage logging now includes traceparent in all log messages.

- Fixed FoundryStorageLoggingPolicy crash on transport-level failures
  (DNS, connection refused) when no response is available. Transport
  failures logged at ERROR level; original exception propagates cleanly.

- Removed x-ms-request-id from Foundry storage response logging.

- Migrated all header string literals to _platform_headers constants.
@RaviPidaparthi RaviPidaparthi force-pushed the feat/responses-platform-headers-request-id branch from c3eedba to a3e96be Compare April 20, 2026 23:22
…-then-yield

Implement persistence failure resilience per spec. Terminal SSE events
are now buffered, persistence attempted, and on failure the terminal
is replaced with response.failed carrying error_code='storage_error'.

Key changes:
- _process_handler_events: buffer terminal events to state.pending_terminal
- _persist_and_resolve_terminal: new method — attempts persistence, replaces
  terminal on failure with storage_error response.failed
- _apply_storage_error_replacement: helper for storage error substitution
- _finalize_stream: stripped of persistence (moved to resolve_terminal),
  eviction guarded by not persistence_failed
- run_sync: persistence failure → _HandlerError → HTTP 500
- _run_background_non_stream: Phase 2 failure sets persistence_failed
- _register_bg_execution: Phase 1 failure → standalone error event (§3.3)
- ResponseExecution: added persistence_failed/persistence_exception fields
- build_failed_response: added error_code parameter

Tests: 8 new contract tests in test_persistence_failure.py covering all
modes (streaming, sync, bg+stream, bg+non-stream) and failure scenarios.
All 942 tests pass.
@RaviPidaparthi RaviPidaparthi changed the title [AgentServer] Platform headers, x-request-id middleware, logging fixes [AgentServer] Platform headers, x-request-id, and persistence failure resilience Apr 21, 2026
@RaviPidaparthi RaviPidaparthi requested a review from Copilot April 21, 2026 04:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 13 comments.

- _request_id.py: guard scope['state'] with isinstance(MutableMapping),
  filter existing x-request-id header to prevent duplicates
- _validation.py: fix docstrings to use camelCase 'additionalInfo',
  guard _enrich_error_payload against non-dict additionalInfo values
- _endpoint_handler.py: wire _get_scope_request_id into handle_create
  error responses for consistent error body enrichment
- _orchestrator.py: replace terminal event in-place (preserve seq num),
  evict runtime record on sync persistence failure to avoid memory leak,
  evict on bg+stream Phase 1 failure for §3.3 compliance,
  defer subject.publish for terminal events until persistence resolves
- test_request_id_middleware.py: remove unused PlainTextResponse import,
  fix _error_plain docstring, add id1 != id2 assertion,
  monkeypatch _get_trace_id in fallback test for deterministic assertion
- test_foundry_logging_policy.py: rename test to match actual behavior
- test_persistence_failure.py: add error payload assertions (code check)
@RaviPidaparthi RaviPidaparthi changed the title [AgentServer] Platform headers, x-request-id, and persistence failure resilience [AgentServer] Platform headers, persistence resilience, x-request-id, logging fix Apr 22, 2026
@RaviPidaparthi RaviPidaparthi enabled auto-merge (squash) April 22, 2026 18:50
@RaviPidaparthi
Copy link
Copy Markdown
Member Author

/check-enforcer override

@RaviPidaparthi RaviPidaparthi merged commit dccabe3 into main Apr 22, 2026
30 of 38 checks passed
@RaviPidaparthi RaviPidaparthi deleted the feat/responses-platform-headers-request-id branch April 22, 2026 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Hosted Agents sdk/agentserver/*

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants