feat(server): default MCP streamable-http to stateful with idle eviction #636

Merged
bokelley merged 1 commit into main from bokelley/mcp-sse-idle-eviction
May 10, 2026
@bokelley

Summary

  • Flips the MCP streamable-http default from stateless to stateful with session_idle_timeout=1800.0. Stateless was the production-broken default — upstream MCP holds GET-SSE streams open without idle eviction, causing connection accumulation under load (the issue this branch was opened against). Stateful + the session_idle_timeout knob added in mcp 1.27.0 is the production-safe shape.
  • Plumbs the originating Starlette Request into RequestMetadata.request_context so context_factory can read auth state through the stateful session-task boundary that breaks ContextVar propagation. The bundled BearerTokenAuthMiddleware + auth_context_factory are updated to use request.state (working in both modes) with a ContextVar fallback for backward compat.
  • Bumps mcp pin to >=1.27.0,<2.0.

Why

Two architectural failures surfaced together when this branch was opened:

  1. The connection leak: stateless mode in upstream MCP routes GET requests to _handle_get_request which establishes an EventSourceResponse with no idle timer, no max-lifetime, no heartbeat-driven reap. In stateless mode this is also semantically useless — every request gets a fresh transport, so the standalone SSE writer can never deliver server-initiated messages. Pure resource sink.
  2. The latency gap vs A2A: stateless pays per-request transport-construction + app.run() setup + session task spawn for every tools/call. Stateful keeps a session task alive and just dispatches into it.
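To make "idle eviction" concrete, here is a minimal sketch of the idea: track when each session was last used and reap those that exceed the timeout. It is an illustration only, not upstream MCP's implementation; `SessionRegistry`, `touch`, and `evict_idle` are hypothetical names, and the clock is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SessionRegistry:
    """Hypothetical registry; not part of mcp or the SDK."""

    idle_timeout: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def touch(self, session_id: str, now: float) -> None:
        # Record activity; any request on the session resets its idle clock.
        self.last_seen[session_id] = now

    def evict_idle(self, now: float) -> List[str]:
        # Collect sessions idle for at least idle_timeout, then drop them.
        expired = [
            sid
            for sid, seen in self.last_seen.items()
            if now - seen >= self.idle_timeout
        ]
        for sid in expired:
            del self.last_seen[sid]
        return expired


registry = SessionRegistry(idle_timeout=1800.0)
registry.touch("a", now=0.0)
registry.touch("b", now=1000.0)
evicted = registry.evict_idle(now=1900.0)  # "a" idle 1900s, "b" idle 900s
```

Without a reap step of this kind, an open GET-SSE stream has nothing that ever closes it, which is the accumulation described in item 1.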

Stateless became the default in PR #296 because it was bundled with json_response=True to dodge an upstream FastMCP SSE-streaming bug, but the two flags are orthogonal. This PR decouples them: streaming_responses (the public knob) now controls only json_response; stateless_http is a separate kwarg with False as the new default.

What unblocks the default flip

The first attempt at this flip uncovered a real architectural issue: the stateful session task is a separate async task from the HTTP request task, so middleware-set ContextVars don't propagate to dispatch. That broke the documented auth pattern in examples/mcp_with_auth_middleware.py.
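The propagation failure can be reproduced with the standard library alone. The sketch below is not SDK code: `session_task` stands in for the long-lived stateful session task and `handle_request` for the per-request HTTP task, both hypothetical names. An asyncio task copies the current context when it is created, so a value set later in a different task is invisible to it.

```python
import asyncio
import contextvars

# Stand-in for a middleware-set auth ContextVar (name is hypothetical).
principal_var = contextvars.ContextVar("principal", default=None)


async def session_task(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    # Long-lived task: its context was copied when the task was created,
    # before any request ran.
    while True:
        msg = await inbox.get()
        if msg is None:
            break
        await outbox.put((msg, principal_var.get()))


async def handle_request(inbox, outbox, principal):
    # What auth middleware does: set the ContextVar in the request task.
    principal_var.set(principal)
    await inbox.put("tools/call")
    return await outbox.get()


async def main():
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    worker = asyncio.create_task(session_task(inbox, outbox))
    # The request runs in its own task, like an HTTP request handler.
    result = await asyncio.create_task(handle_request(inbox, outbox, "alice"))
    await inbox.put(None)  # shut the session task down
    await worker
    return result


result = asyncio.run(main())
```

Here `result` carries `None` for the principal even though the request task set it to `"alice"`, which is the same shape of failure the stateful dispatch path hit.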

This PR solves it by:

  • Adding RequestMetadata.request_context: Any (the originating Starlette Request).
  • Populating it from mcp.server.lowlevel.server.request_ctx.get().request — the upstream MCP contextvar that's reliably set in both dispatch paths inside the dispatch sub-task.
  • Updating BearerTokenAuthMiddleware and A2ABearerAuthMiddleware to mirror principal/tenant onto request.state alongside the ContextVars.
  • Updating auth_context_factory to prefer request.state (works in both stateless and stateful) and fall back to ContextVars (works in stateless and A2A only).
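The lookup order in the last bullet can be sketched as follows. This is a simplified stand-in, not the SDK's auth_context_factory: `read_auth`, the ContextVar names, and the `principal` / `tenant` attribute names on `request.state` are assumptions, and `SimpleNamespace` objects stand in for RequestMetadata and the Starlette Request.

```python
import contextvars
from types import SimpleNamespace

# Hypothetical ContextVars of the kind the middleware sets.
current_principal = contextvars.ContextVar("current_principal", default=None)
current_tenant = contextvars.ContextVar("current_tenant", default=None)


def read_auth(meta):
    """Prefer request.state; fall back to ContextVars if nothing is there."""
    request = getattr(meta, "request_context", None)
    state = getattr(request, "state", None)
    principal = getattr(state, "principal", None)
    tenant = getattr(state, "tenant", None)
    if principal is None and tenant is None:
        # Nothing mirrored onto request.state: use the legacy ContextVar path.
        principal = current_principal.get()
        tenant = current_tenant.get()
    return {"caller_identity": principal, "tenant_id": tenant}


# Stateful path: the middleware mirrored auth onto request.state.
stateful_meta = SimpleNamespace(
    request_context=SimpleNamespace(
        state=SimpleNamespace(principal="alice", tenant="acme")
    )
)
stateful = read_auth(stateful_meta)

# Legacy path: no request attached, ContextVars set in the same task.
current_principal.set("bob")
current_tenant.set("globex")
legacy = read_auth(SimpleNamespace(request_context=None))
```

Reading request.state first means the same factory works whether or not the ContextVars survived the hop into the session task.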

An end-to-end test (tests/test_mcp_stateful_session.py::test_stateful_auth_propagates_via_request_state) wires the real middleware, session manager, factory, and handler, and asserts that caller_identity / tenant_id arrive correctly through stateful streamable-http.

Migration

Adopters using the bundled BearerTokenAuthMiddleware + auth_context_factory: no action — the fix is wired in.

Adopters with custom context_factory using ContextVars: works on stateless, breaks on stateful. Migrate to read meta.request_context.state.<your_key>. The SDK still threads ContextVars on stateless mode and A2A for backward compat.

Adopters who genuinely need stateless (multi-replica without sticky LB on Mcp-Session-Id): opt in with stateless_http=True. The SDK suppresses session_idle_timeout in that combo to honor the upstream RuntimeError contract.
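The interplay between the kwargs can be sketched as a small resolver. This is a hedged illustration, not the SDK's code: `resolve_transport_settings` is a hypothetical helper, the mapping `json_response = not streaming_responses` and the `streaming_responses=False` default are assumptions, and only the `stateless_http=False` default, the 1800.0 timeout, and the suppression rule come from this PR.

```python
from typing import Optional, TypedDict

DEFAULT_SESSION_IDLE_TIMEOUT = 1800.0  # seconds, the default named in this PR


class TransportSettings(TypedDict):
    json_response: bool
    stateless_http: bool
    session_idle_timeout: Optional[float]


def resolve_transport_settings(
    streaming_responses: bool = False,  # assumed default
    stateless_http: bool = False,
    session_idle_timeout: Optional[float] = DEFAULT_SESSION_IDLE_TIMEOUT,
) -> TransportSettings:
    return {
        # Assumed mapping: streaming off means plain JSON responses.
        "json_response": not streaming_responses,
        # Independent of streaming_responses after this PR.
        "stateless_http": stateless_http,
        # Stateless plus a timeout is rejected upstream, so drop the timeout.
        "session_idle_timeout": None if stateless_http else session_idle_timeout,
    }


default_settings = resolve_transport_settings()
stateless_settings = resolve_transport_settings(stateless_http=True)
```

With the defaults, the result is stateful with a 1800.0 second idle timeout; passing `stateless_http=True` clears the timeout and leaves `json_response` untouched.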

Test plan

  • pytest tests/ --ignore=tests/integration --ignore=tests/conformance — 3807 passed, 17 skipped, 1 xfailed (one wall-clock-flake test deselected — pre-existing, unrelated to this branch)
  • ruff check src/ tests/ — clean
  • mypy src/adcp/server/serve.py src/adcp/server/auth.py — clean
  • New: 10 stateful-mode tests including end-to-end auth propagation through real BearerTokenAuthMiddleware + auth_context_factory + real StreamableHTTPSessionManager
  • Migrated tests/test_mcp_middleware_composition.py from ContextVar-only to request.state pattern (proves new path works under default stateful)
  • CI green across all supported Python versions

Expert review

Two passes by code-reviewer, one each by ad-tech-protocol-expert and python-expert:

  • Protocol expert confirmed Mcp-Session-Id and AdCP context_id are orthogonal; idempotency / ctx_metadata echo unaffected because RequestContext is built per-request inside fn().
  • Python expert validated that there is no public hook for session_idle_timeout in mcp 1.27.x — pre-creating _session_manager is the only viable extension point. Required the <2.0 pin upper bound (which is included).
  • Code-reviewer's final pass: no blockers. Validated request_ctx is reliably set across both dispatch paths, request.state survives Starlette's BaseHTTPMiddleware (it's a live view onto scope["state"]), and the all-three-None fallback heuristic is safe (each request has a fresh scope["state"]).

🤖 Generated with Claude Code

Stateless was the production-broken default — upstream MCP holds GET-SSE
streams open with no idle eviction, causing connection accumulation
under load. Stateful mode + session_idle_timeout=1800s (the knob added
in mcp 1.27.0) is the production-safe shape.

To make stateful safe by default, plumb the originating Starlette
Request into RequestMetadata.request_context (sourced from upstream's
mcp.server.lowlevel.server.request_ctx, set reliably in both dispatch
paths). BearerTokenAuthMiddleware mirrors principal/tenant onto
request.state in addition to ContextVars; auth_context_factory reads
request.state first and falls back to ContextVars. Adopters using the
bundled middleware + factory get the fix for free. Adopters with
custom factories using ContextVars need to migrate to
meta.request_context.state for stateful — documented in the docstrings.

Adopters who genuinely need stateless (multi-replica without sticky LB
on Mcp-Session-Id) opt back in via stateless_http=True; the SDK
suppresses session_idle_timeout in that combination to honor the
upstream constructor's stateless+timeout RuntimeError contract.

Bumps mcp pin to >=1.27.0,<2.0 — the upper bound is required because
create_mcp_server pre-creates StreamableHTTPSessionManager (FastMCP
doesn't expose session_idle_timeout in its settings) which reads four
FastMCP private attrs whose contract is not preserved across majors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bokelley force-pushed the bokelley/mcp-sse-idle-eviction branch from 2128b6b to 9583ff1 on May 10, 2026 14:24
bokelley merged commit 3173a54 into main on May 10, 2026
16 checks passed