[None][fix] Consolidate aiohttp session management in disagg router#13408
[None][fix] Consolidate aiohttp session management in disagg router#13408reasonsolo merged 5 commits intoNVIDIA:mainfrom
Conversation
📝 WalkthroughWalkthroughRefactors Router to use a shared, lazily-initialized Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/serve/router.py (1)
74-79:⚠️ Potential issue | 🟡 MinorPotential
AttributeErrorwhen_sessionisNone.If
_sessionisNone, callingself._session.get(...)raisesAttributeError, which is silently caught and returnsFalse. This masks a configuration error as "unhealthy". Consider adding an explicit check.🛡️ Proposed fix
async def is_healthy(self) -> bool: + if self._session is None: + return False try: async with self._session.get(self._server + "/health") as response: return response.status == 200 except Exception: return False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/serve/router.py` around lines 74 - 79, The is_healthy method currently swallows an AttributeError when self._session is None; add an explicit pre-check at the top of is_healthy to detect missing configuration (e.g., if self._session is None) and raise a clear exception (RuntimeError or AttributeError) with a descriptive message indicating the session/server is not configured instead of silently returning False; keep the existing try/except around the network call (self._session.get(self._server + "/health")) to handle real network errors, and optionally also validate self._server is set before making the request.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tensorrt_llm/serve/router.py`:
- Around line 74-79: The is_healthy method currently swallows an AttributeError
when self._session is None; add an explicit pre-check at the top of is_healthy
to detect missing configuration (e.g., if self._session is None) and raise a
clear exception (RuntimeError or AttributeError) with a descriptive message
indicating the session/server is not configured instead of silently returning
False; keep the existing try/except around the network call
(self._session.get(self._server + "/health")) to handle real network errors, and
optionally also validate self._server is set before making the request.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3ed70456-0fde-4f67-bc7e-8241e78c9779
📒 Files selected for processing (2)
tensorrt_llm/serve/router.pytests/unittest/disaggregated/test_router.py
|
/bot run --disable-fail-fast |
|
PR_Github #45347 [ run ] triggered by Bot. Commit: |
f8b5b00 to
e742cfc
Compare
- Fix TypeError crash: KvCacheAwareRouter.finish_request was passing session= kwarg to decrement_load which no longer accepts it - Replace fragile __del__ with explicit async close() method, called from stop_server_monitoring for clean session teardown - Add assert in KvCacheAwareServerState.decrement_load to enforce session is set before polling kv_cache_events - Remove redundant _create_server_state override in LoadBalancingRouter (identical to LoadBalancingMixin base) - Remove vestigial **kwargs from _unregister_request and decrement_load call (no callers pass extra arguments) - Deduplicate mock_aiohttp_session test fixture (single autouse fixture replaces autouse + non-autouse pair that double-patched) - Add test verifying /kv_cache_events is queried on finish_request Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
e742cfc to
086ebc3
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #45359 [ run ] triggered by Bot. Commit: |
JunyiXu-nv
left a comment
There was a problem hiding this comment.
Changes LGTM.
BTW, is this shared session targeting to improve the performance? If so, do we have any benchmarking result?
No, this is to fix a bug, http session is not passed to finish_request, so the kvcacheaware_router doesn't really fetch kvcache_events |
|
I cherry-picked this fix onto a downstream branch and hit a server-startup crash that I believe also affects this PR against the current SymptomRoot causeStarting
All of that runs before The existing unit tests don't catch it because they mock Suggested fix — pass a session provider, not the sessionDefer session materialization to first async-context access. Minimal diff: class ServerState:
def __init__(
self,
server: str,
use_tokens: bool = False,
session_provider: Optional[Callable[[], aiohttp.ClientSession]] = None):
...
# Store the callable, not the session. The router's `session` property
# creates an aiohttp.ClientSession on first access; deferring until
# an async method (decrement_load / is_healthy) runs guarantees the
# event loop is live.
self._session_provider = session_provider
@property
def _session(self) -> Optional[aiohttp.ClientSession]:
return self._session_provider() if self._session_provider else None# LoadBalancingMixin._create_server_state
def _create_server_state(self, server: str) -> ServerState:
return self._server_state_class(
server, self._use_tokens,
lambda: self.session) # ← callable, materialized later
# KvCacheAwareRouter._create_server_state
def _create_server_state(self, server: str) -> KvCacheAwareServerState:
return KvCacheAwareServerState(
server, self._use_tokens, self._tokens_per_block,
lambda: self.session)With this change, I verified this fix locally end-to-end: |
Move poll_events HTTP call out of the finish_request critical path by firing it as a background asyncio task. This eliminates serialized 2-16s blocking per request caused by poll_events being called under the router lock. Also adds integration tests for load_balancing, kv_cache_aware, and conversation routers, and fixes missing http:// prefix in poll_events and is_healthy URLs. Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
PR_Github #45359 [ run ] completed with state
|
Remove extra session argument from finish_request calls since the session is now managed internally via session_provider. Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
8364dc3 to
8027f83
Compare
…ish_request perf - Add eager_poll config to KvCacheAwareRouter for test determinism - Make finish_request non-blocking by firing poll_and_update as background task - Add _base_url property to avoid double http:// prefix - Add ConversationRouterTester with explicit conversation_id and implicit prefix matching tests in test_workers.py - Add conversation router test to l0_dgx_h100 and QA test lists Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
8027f83 to
b0f0b5d
Compare
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
/bot run --disable-fail-fast |
|
PR_Github #45483 [ run ] triggered by Bot. Commit: |
|
PR_Github #45483 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
1 similar comment
|
/bot run --disable-fail-fast |
|
PR_Github #45545 [ run ] triggered by Bot. Commit: |
|
PR_Github #45545 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #45567 [ run ] triggered by Bot. Commit: |
|
PR_Github #45567 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #45584 [ run ] triggered by Bot. Commit: |
|
PR_Github #45584 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #45636 [ run ] triggered by Bot. Commit: |
|
PR_Github #45636 [ run ] completed with state |
Refreshed all 16 files under docs/overview/ to reflect TRT-LLM v1.3.0rc14 (upstream/main 3b7af1c, ~3 weeks of changes since the 2026-04-06 baseline 2b80f8d), competing frameworks (vLLM v0.20.0, SGLang v0.5.10.post1, LMCache v0.4.4, NVIDIA Dynamo v1.0.0), and the current hardware (Vera Rubin in production, AMD MI355X MLPerf 6.0, Google TPU v7 GA) and academic landscape (GOOSE, StreamServe, PrfaaS, FlowKV). Highlights: - Spec-dec algorithm count corrected 7 to 8 (DFlash added; EAGLE3 dynamic tree re-enabled; LoRA + spec-dec generic and EAGLE3-specific now work). - Block reuse + overlap scheduler combined (NVIDIA#12816), removing a long-standing internal gap. - First-class lmcache and kvbm KV connectors (NVIDIA#12626). - Production observability stack (Prometheus NVIDIA#12545, modular logger NVIDIA#13202, NvTelemetry NVIDIA#12384, per-iteration aggregate counters NVIDIA#13199). - Disagg reliability fail-fast wave (NVIDIA#13119, NVIDIA#13408, NVIDIA#12718, NVIDIA#12888). - AutoDeploy onboarded DeepSeek-R1, Gemma-4 + 4-31B NVFP4, MiniMax-M2.7; standalone-package-ready; legacy EdgeLLM ONNX export removed. - KV cache V2 still default OFF (multiple V2 fixes; tracked via new V2-default-on milestone in §06). - New 4 framework comparison column for Dynamo v1.0; updated feature matrix and gap analysis to reflect vLLM v0.20 (FA4 MLA prefill default, TurboQuant 2-bit KV, vLLM IR foundation, Model Runner V2). - Strategic prioritization quadrant chart fully re-ranked; new Tier 1 items: TTFT re-benchmark vs vLLM v0.20, low-bit KV, MLA prefill kernel default, disagg chaos-test harness; new Tier 2: Dynamo Snapshot integration, TRT-LLM IR strategy, adaptive spec-dec depth. Snapshot of pre-refresh content saved at docs/overview/.snapshots/2026-04-06/. Per-file diff highlights, priority-shift table, sources, and blocked/skipped notes are in docs/overview/CHANGELOG.md. Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
@coderabbitai summary
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.