[None][fix] Consolidate aiohttp session management in disagg router by reasonsolo · Pull Request #13408 · NVIDIA/TensorRT-LLM

reasonsolo · 2026-04-24T04:42:56Z

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-04-24T04:47:49Z

📝 Walkthrough

Walkthrough

Refactors Router to use a shared, lazily-initialized aiohttp.ClientSession instead of creating per-call sessions. The shared session is passed to ServerState instances during construction, eliminating per-request session creation and teardown overhead while centralizing session lifecycle management.
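As a rough illustration of the pattern the walkthrough describes (one lazily created session owned by the router, handed to each ServerState, and torn down in one place), here is a minimal sketch. It is not the PR's code: `StubSession` stands in for `aiohttp.ClientSession` so the example runs without aiohttp, and `session_factory` and `_server_states` are hypothetical names. With the real aiohttp class, touching `self.session` inside a synchronous `__init__` requires a running event loop, which is the startup crash reported later in this thread.

```python
import asyncio
from typing import Callable, Dict, List, Optional


class StubSession:
    """Stand-in for aiohttp.ClientSession so the sketch runs without aiohttp."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class ServerState:
    def __init__(self, server: str, session: Optional[StubSession] = None) -> None:
        self._server = server
        self._session = session  # shared object, owned by the router


class Router:
    def __init__(self, servers: List[str],
                 session_factory: Callable[[], StubSession] = StubSession) -> None:
        self._session: Optional[StubSession] = None
        self._session_factory = session_factory  # hypothetical injection point
        # Every ServerState receives the same router-owned session.
        self._server_states: Dict[str, ServerState] = {
            s: ServerState(s, self.session) for s in servers
        }

    @property
    def session(self) -> StubSession:
        # Lazily create the session on first access, then reuse it.
        if self._session is None:
            self._session = self._session_factory()
        return self._session

    async def close(self) -> None:
        # Single teardown point instead of one open/close per request.
        if self._session is not None:
            await self._session.close()
            self._session = None
```

Every server state ends up holding the same object, so connections are pooled once and `close()` only has to release one session.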

Changes

Cohort / File(s)	Summary
Router Session Management `tensorrt_llm/serve/router.py`	Adds `session` property and `close()` method to Router for lazy session initialization and cleanup. Updates ServerState and KvCacheAwareServerState constructors to accept optional session parameter. Removes session parameter from `KvCacheAwareServerState.decrement_load()` and `KvCacheAwareRouter.finish_request()` to use shared session. Updates server state creation calls to pass `self.session`.
Router Tests `tests/unittest/disaggregated/test_router.py`	Adds new async unit test validating KV-cache event polling in `KvCacheAwareRouter.finish_request()`. Mocks aiohttp.ClientSession response and verifies block hash is correctly applied to router's per-server KV cache state.
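The new test is described as mocking the `aiohttp.ClientSession` response. A minimal sketch of that mocking style follows, assuming the poll is issued as `async with session.post(url) as resp` and that each event carries a single `block_hash` field. Both the event schema and the `poll_kv_events` helper are hypothetical; the real test in `tests/unittest/disaggregated/test_router.py` may be structured differently.

```python
import asyncio
from typing import Dict, Set
from unittest.mock import AsyncMock, MagicMock


def make_mock_session(events: list) -> MagicMock:
    """Mock whose post() works as `async with session.post(url) as resp`."""
    response = MagicMock()
    response.json = AsyncMock(return_value=events)
    session = MagicMock()
    # MagicMock supports async context-manager magic methods (Python 3.8+).
    session.post.return_value.__aenter__.return_value = response
    return session


async def poll_kv_events(session, server: str, table: Dict[str, Set[int]]) -> None:
    # Hypothetical event schema: each event carries one "block_hash".
    async with session.post(server + "/kv_cache_events") as resp:
        for event in await resp.json():
            table.setdefault(server, set()).add(event["block_hash"])
```

No network I/O happens: `session.post(...)` returns a mock whose `__aenter__` yields the canned response.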

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: consolidating aiohttp session management in the disaggregated router, which is the primary objective of this PR.
Description check	✅ Passed	The PR description covers the core changes and test plan, but omits the required 'Description' and 'Test Coverage' sections from the template structure.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tensorrt_llm/serve/router.py (1)

74-79: ⚠️ Potential issue | 🟡 Minor

Potential AttributeError when _session is None.

If _session is None, calling self._session.get(...) raises AttributeError, which is silently caught and returns False. This masks a configuration error as "unhealthy". Consider adding an explicit check.

🛡️ Proposed fix

     async def is_healthy(self) -> bool:
+        if self._session is None:
+            return False
         try:
             async with self._session.get(self._server + "/health") as response:
                 return response.status == 200
         except Exception:
             return False

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/router.py` around lines 74 - 79, The is_healthy method
currently swallows an AttributeError when self._session is None; add an explicit
pre-check at the top of is_healthy to detect missing configuration (e.g., if
self._session is None) and raise a clear exception (RuntimeError or
AttributeError) with a descriptive message indicating the session/server is not
configured instead of silently returning False; keep the existing try/except
around the network call (self._session.get(self._server + "/health")) to handle
real network errors, and optionally also validate self._server is set before
making the request.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3ed70456-0fde-4f67-bc7e-8241e78c9779

📥 Commits

Reviewing files that changed from the base of the PR and between 371e38d and f8b5b00.

📒 Files selected for processing (2)

tensorrt_llm/serve/router.py
tests/unittest/disaggregated/test_router.py

reasonsolo · 2026-04-24T06:04:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-24T06:11:30Z

PR_Github #45347 [ run ] triggered by Bot. Commit: f8b5b00 Link to invocation

- Fix TypeError crash: KvCacheAwareRouter.finish_request was passing session= kwarg to decrement_load which no longer accepts it
- Replace fragile __del__ with explicit async close() method, called from stop_server_monitoring for clean session teardown
- Add assert in KvCacheAwareServerState.decrement_load to enforce session is set before polling kv_cache_events
- Remove redundant _create_server_state override in LoadBalancingRouter (identical to LoadBalancingMixin base)
- Remove vestigial **kwargs from _unregister_request and decrement_load call (no callers pass extra arguments)
- Deduplicate mock_aiohttp_session test fixture (single autouse fixture replaces autouse + non-autouse pair that double-patched)
- Add test verifying /kv_cache_events is queried on finish_request

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
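The `__del__` to `close()` change in this commit can be sketched as follows. A finalizer cannot `await`, so closing an async session there is unreliable; an explicit coroutine called from `stop_server_monitoring` runs inside the event loop. `StubSession`, `start_server_monitoring`, and `_monitor` are hypothetical stand-ins; only `stop_server_monitoring` and `close()` are named in the commit message.

```python
import asyncio
from typing import Optional


class StubSession:
    """Stand-in for aiohttp.ClientSession."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class Router:
    def __init__(self) -> None:
        self._session: Optional[StubSession] = None
        self._monitor_task: Optional[asyncio.Task] = None

    @property
    def session(self) -> StubSession:
        if self._session is None:
            self._session = StubSession()
        return self._session

    async def _monitor(self, interval: float) -> None:
        # Hypothetical health-check loop; real code would probe each server.
        while True:
            await asyncio.sleep(interval)

    async def start_server_monitoring(self, interval: float = 0.01) -> None:
        self._monitor_task = asyncio.create_task(self._monitor(interval))

    async def stop_server_monitoring(self) -> None:
        if self._monitor_task is not None:
            self._monitor_task.cancel()
            try:
                await self._monitor_task
            except asyncio.CancelledError:
                pass
            self._monitor_task = None
        # Unlike __del__, this runs inside the event loop and can await close().
        await self.close()

    async def close(self) -> None:
        if self._session is not None:
            await self._session.close()
            self._session = None
```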

reasonsolo · 2026-04-24T06:53:31Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-24T06:59:16Z

PR_Github #45359 [ run ] triggered by Bot. Commit: 086ebc3 Link to invocation

JunyiXu-nv

Changes LGTM.

BTW, is this shared session targeting to improve the performance? If so, do we have any benchmarking result?

reasonsolo · 2026-04-24T07:21:55Z

> Changes LGTM.
>
> BTW, is this shared session targeting to improve the performance? If so, do we have any benchmarking result?

No, this is to fix a bug: the HTTP session is not passed to finish_request, so KvCacheAwareRouter never actually fetches kv_cache_events.
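The failure mode described in this reply can be reduced to a few lines. This is a hypothetical reduction, not the pre-PR router code: when the session is an optional argument that defaults to `None`, a caller that forgets to forward it turns the `/kv_cache_events` poll into a silent no-op rather than an error. `KvAwareState`, `StubSession`, and the free function `finish_request` are illustrative names.

```python
import asyncio
from typing import List, Optional


class StubSession:
    """Stand-in for aiohttp.ClientSession that records requested URLs."""

    def __init__(self) -> None:
        self.requested: List[str] = []

    async def post(self, url: str) -> list:
        self.requested.append(url)
        return []


class KvAwareState:
    """Hypothetical reduction of the pre-fix pattern, not the actual router code."""

    def __init__(self, server: str) -> None:
        self._server = server
        self.polls = 0

    async def decrement_load(self, session: Optional[StubSession] = None) -> None:
        if session is None:
            return  # silently skips the /kv_cache_events poll
        await session.post(self._server + "/kv_cache_events")
        self.polls += 1


async def finish_request(state: KvAwareState) -> None:
    # The caller forgets to forward a session, so polling never happens.
    await state.decrement_load()
```

Keeping the session on the state object, as this PR does, removes the argument that can be forgotten.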

lishicheng1996-nv · 2026-04-24T08:32:32Z

I cherry-picked this fix onto a downstream branch and hit a server-startup crash that I believe also affects this PR against the current aiohttp>=3.13.3 constraint. Flagging it here so you can reproduce / fix before merge.

Symptom

File "/project/TensorRT-LLM/tensorrt_llm/serve/router.py", line 655, in __init__
    self.session)
File "/project/TensorRT-LLM/tensorrt_llm/serve/router.py", line 192, in session
    self._session = aiohttp.ClientSession()
File "/usr/local/lib/python3.12/dist-packages/aiohttp/client.py", line 321, in __init__
    loop = loop or asyncio.get_running_loop()
RuntimeError: no running event loop

Root cause

Starting with aiohttp 3.10, ClientSession.__init__ calls asyncio.get_running_loop() and raises RuntimeError when no loop is running. The router construction path is synchronous:

tensorrt_llm/commands/serve.py:disaggregated() — sync click handler, no loop yet.
OpenAIDisaggServer.__init__ → create_router(...) → KvCacheAwareRouter.__init__ → _init_load_balancing() → _create_server_state() → self.session property → aiohttp.ClientSession() ❌

All of that runs before asyncio.run(server(...)) in the same file. With aiohttp>=3.13.3 pinned in requirements.txt this is a hard crash at server startup.
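The loop precondition can be reproduced without aiohttp. In this sketch, the hypothetical `LoopBoundSession` calls `asyncio.get_running_loop()` in its constructor, as the traceback shows aiohttp's `ClientSession.__init__` doing; constructing it from synchronous code fails, while constructing it inside `asyncio.run(...)` succeeds.

```python
import asyncio


class LoopBoundSession:
    """Hypothetical stand-in mimicking the loop check in ClientSession.__init__."""

    def __init__(self) -> None:
        # Same call as in the traceback: raises RuntimeError without a running loop.
        self.loop = asyncio.get_running_loop()


def construct_sync() -> str:
    """Mimics router construction inside the synchronous click handler."""
    try:
        LoopBoundSession()
        return "ok"
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


async def construct_async() -> str:
    """Mimics first use from inside asyncio.run(...)."""
    LoopBoundSession()
    return "ok"
```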

The existing unit tests don't catch it because they mock aiohttp.ClientSession, so ClientSession.__init__ never actually runs and the loop precondition is never checked.
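The sketch below, using the same hypothetical `LoopBoundSession` stand-in, shows why a mocked class hides the bug: a `MagicMock` is callable and never executes the real constructor. Passing a `MagicMock` here has the same effect as patching `aiohttp.ClientSession` in the unit tests. `build_state` and `sync_construction_fails` are illustrative helpers, not functions from the repository.

```python
import asyncio
from typing import Callable
from unittest import mock


class LoopBoundSession:
    """Hypothetical stand-in mimicking the loop check in ClientSession.__init__."""

    def __init__(self) -> None:
        self.loop = asyncio.get_running_loop()


def build_state(session_cls: Callable[[], object]) -> object:
    # Synchronous construction path, like Router.__init__ -> _create_server_state().
    return session_cls()


def sync_construction_fails(session_cls: Callable[[], object]) -> bool:
    try:
        build_state(session_cls)
        return False
    except RuntimeError:
        return True
```

A regression test for this class of bug would need to construct the router outside any event loop without mocking the session class, or with a mock that performs the same loop check.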

Suggested fix — pass a session provider, not the session

Defer session materialization to first async-context access. Minimal diff:

from typing import Callable, Optional

import aiohttp


class ServerState:
    def __init__(
            self,
            server: str,
            use_tokens: bool = False,
            session_provider: Optional[Callable[[], aiohttp.ClientSession]] = None):
        ...
        # Store the callable, not the session. The router's `session` property
        # creates an aiohttp.ClientSession on first access; deferring until
        # an async method (decrement_load / is_healthy) runs guarantees the
        # event loop is live.
        self._session_provider = session_provider

    @property
    def _session(self) -> Optional[aiohttp.ClientSession]:
        return self._session_provider() if self._session_provider else None

# LoadBalancingMixin._create_server_state
def _create_server_state(self, server: str) -> ServerState:
    return self._server_state_class(
        server, self._use_tokens,
        lambda: self.session)   # ← callable, materialized later

# KvCacheAwareRouter._create_server_state
def _create_server_state(self, server: str) -> KvCacheAwareServerState:
    return KvCacheAwareServerState(
        server, self._use_tokens, self._tokens_per_block,
        lambda: self.session)

With this change, self.session is invoked for the first time from inside an async def (e.g. the first decrement_load call), by which point the loop is running and aiohttp.ClientSession() succeeds.
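A runnable sketch of the provider approach, under stated assumptions: `LoopBoundSession` stands in for `aiohttp.ClientSession` (it fails outside a running loop), and `ServerState`, `Router`, and `decrement_load` are trimmed down to only what is needed to show when the session is first created. The router can be constructed synchronously because only lambdas are captured; the session materializes on the first awaited call and is shared by every state.

```python
import asyncio
from typing import Callable, Dict, List, Optional


class LoopBoundSession:
    """Stand-in for aiohttp.ClientSession: needs a running event loop."""

    def __init__(self) -> None:
        self.loop = asyncio.get_running_loop()


class ServerState:
    def __init__(self, server: str,
                 session_provider: Optional[Callable[[], LoopBoundSession]] = None) -> None:
        self._server = server
        self._session_provider = session_provider  # the callable, not a session

    @property
    def _session(self) -> Optional[LoopBoundSession]:
        return self._session_provider() if self._session_provider else None

    async def decrement_load(self) -> LoopBoundSession:
        session = self._session  # first access happens inside a coroutine
        assert session is not None, "session provider not configured"
        # Real code would POST to f"{self._server}/kv_cache_events" here.
        return session


class Router:
    def __init__(self, servers: List[str]) -> None:
        self._session: Optional[LoopBoundSession] = None
        # Safe in synchronous code: only lambdas are captured here.
        self.states: Dict[str, ServerState] = {
            s: ServerState(s, lambda: self.session) for s in servers
        }

    @property
    def session(self) -> LoopBoundSession:
        if self._session is None:
            self._session = LoopBoundSession()
        return self._session
```

One consequence to keep in mind: with aiohttp, the session is bound to whichever event loop first touches it, so all later use has to happen on that same loop.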

I verified this fix locally end-to-end: /kv_cache_events is now actually POSTed, and _kv_cache_block_table populates across iterations (the original intent of the PR). Happy to open a follow-up PR or you can pull the one-liner here.

Move poll_events HTTP call out of the finish_request critical path by firing it as a background asyncio task. This eliminates serialized 2-16s blocking per request caused by poll_events being called under the router lock. Also adds integration tests for load_balancing, kv_cache_aware, and conversation routers, and fixes missing http:// prefix in poll_events and is_healthy URLs. Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
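Moving the poll off the critical path can be sketched like this. The lock only guards the load bookkeeping; the slow HTTP call runs in a background task created after the lock is released, and a reference to each task is kept so it is not garbage-collected before it finishes. `KvAwareRouterSketch`, `_poll_events`, and the fixed block hash `42` are hypothetical, and the sleep stands in for the HTTP round trip.

```python
import asyncio
from typing import Dict, List, Set


class KvAwareRouterSketch:
    """Hypothetical router: the lock guards bookkeeping only; polling runs later."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._load: Dict[str, int] = {}
        self.block_table: Dict[str, Set[int]] = {}
        self._bg_tasks: Set[asyncio.Task] = set()

    async def _poll_events(self, server: str) -> List[int]:
        await asyncio.sleep(0.05)  # stands in for the slow HTTP round trip
        return [42]

    async def _poll_and_update(self, server: str) -> None:
        hashes = await self._poll_events(server)
        self.block_table.setdefault(server, set()).update(hashes)

    async def finish_request(self, server: str) -> None:
        async with self._lock:
            # Only cheap, in-memory bookkeeping happens under the lock.
            self._load[server] = max(0, self._load.get(server, 0) - 1)
        # The poll runs after the lock is released, so other requests are not
        # serialized behind it; keep a reference until the task finishes.
        task = asyncio.create_task(self._poll_and_update(server))
        self._bg_tasks.add(task)
        task.add_done_callback(self._bg_tasks.discard)
```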

tensorrt-cicd · 2026-04-25T01:44:24Z

PR_Github #45359 [ run ] completed with state SUCCESS. Commit: 086ebc3
/LLM/main/L0_MergeRequest_PR pipeline #35603 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Remove extra session argument from finish_request calls since the session is now managed internally via session_provider. Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

…ish_request perf

- Add eager_poll config to KvCacheAwareRouter for test determinism
- Make finish_request non-blocking by firing poll_and_update as background task
- Add _base_url property to avoid double http:// prefix
- Add ConversationRouterTester with explicit conversation_id and implicit prefix matching tests in test_workers.py
- Add conversation router test to l0_dgx_h100 and QA test lists

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
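Two of the items above, the `_base_url` property and the `eager_poll` switch, can be illustrated together. This is a sketch under assumptions: `ServerStateSketch` is a hypothetical class, the scheme check is one plausible way to avoid a doubled `http://` prefix, and `eager_poll=True` simply awaits the poll inline so a test can assert on its effect immediately, while the default path schedules it as a background task.

```python
import asyncio
from typing import List, Set


class ServerStateSketch:
    """Hypothetical server state showing the _base_url and eager_poll ideas."""

    def __init__(self, server: str, eager_poll: bool = False) -> None:
        self._server = server
        self._eager_poll = eager_poll
        self.polled_urls: List[str] = []
        self._tasks: Set[asyncio.Task] = set()

    @property
    def _base_url(self) -> str:
        # Add a scheme only when it is missing, so "http://" is never doubled.
        if self._server.startswith(("http://", "https://")):
            return self._server
        return f"http://{self._server}"

    async def _poll_and_update(self) -> None:
        await asyncio.sleep(0)  # stands in for POST {base_url}/kv_cache_events
        self.polled_urls.append(f"{self._base_url}/kv_cache_events")

    async def finish_request(self) -> None:
        if self._eager_poll:
            # Deterministic for tests: the poll has completed on return.
            await self._poll_and_update()
        else:
            # Non-blocking path: schedule the poll and keep a task reference.
            task = asyncio.create_task(self._poll_and_update())
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
```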

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

reasonsolo · 2026-04-25T05:16:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-25T05:24:03Z

PR_Github #45483 [ run ] triggered by Bot. Commit: f523158 Link to invocation

tensorrt-cicd · 2026-04-25T15:12:51Z

PR_Github #45483 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35713 completed with status: 'FAILURE'

CI Report

Link to invocation

reasonsolo · 2026-04-25T15:30:35Z

/bot run --disable-fail-fast

reasonsolo · 2026-04-26T03:18:05Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-26T03:25:31Z

PR_Github #45545 [ run ] triggered by Bot. Commit: f523158 Link to invocation

tensorrt-cicd · 2026-04-26T10:46:46Z

PR_Github #45545 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35763 completed with status: 'FAILURE'

CI Report

Link to invocation

reasonsolo · 2026-04-26T11:07:57Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-26T11:14:17Z

PR_Github #45567 [ run ] triggered by Bot. Commit: f523158 Link to invocation

tensorrt-cicd · 2026-04-26T13:15:02Z

PR_Github #45567 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35785 completed with status: 'FAILURE'

CI Report

Link to invocation

reasonsolo · 2026-04-26T14:50:55Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-26T14:57:40Z

PR_Github #45584 [ run ] triggered by Bot. Commit: f523158 Link to invocation

tensorrt-cicd · 2026-04-26T16:59:34Z

PR_Github #45584 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35800 completed with status: 'FAILURE'

CI Report

Link to invocation

reasonsolo · 2026-04-27T03:19:45Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-27T03:26:26Z

PR_Github #45636 [ run ] triggered by Bot. Commit: f523158 Link to invocation

tensorrt-cicd · 2026-04-27T05:17:15Z

PR_Github #45636 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35851 completed with status: 'SUCCESS'

CI Report

Link to invocation

Refreshed all 16 files under docs/overview/ to reflect TRT-LLM v1.3.0rc14 (upstream/main 3b7af1c, ~3 weeks of changes since the 2026-04-06 baseline 2b80f8d), competing frameworks (vLLM v0.20.0, SGLang v0.5.10.post1, LMCache v0.4.4, NVIDIA Dynamo v1.0.0), and the current hardware (Vera Rubin in production, AMD MI355X MLPerf 6.0, Google TPU v7 GA) and academic landscape (GOOSE, StreamServe, PrfaaS, FlowKV). Highlights:

- Spec-dec algorithm count corrected 7 to 8 (DFlash added; EAGLE3 dynamic tree re-enabled; LoRA + spec-dec generic and EAGLE3-specific now work).
- Block reuse + overlap scheduler combined (NVIDIA#12816), removing a long-standing internal gap.
- First-class lmcache and kvbm KV connectors (NVIDIA#12626).
- Production observability stack (Prometheus NVIDIA#12545, modular logger NVIDIA#13202, NvTelemetry NVIDIA#12384, per-iteration aggregate counters NVIDIA#13199).
- Disagg reliability fail-fast wave (NVIDIA#13119, NVIDIA#13408, NVIDIA#12718, NVIDIA#12888).
- AutoDeploy onboarded DeepSeek-R1, Gemma-4 + 4-31B NVFP4, MiniMax-M2.7; standalone-package-ready; legacy EdgeLLM ONNX export removed.
- KV cache V2 still default OFF (multiple V2 fixes; tracked via new V2-default-on milestone in §06).
- New 4 framework comparison column for Dynamo v1.0; updated feature matrix and gap analysis to reflect vLLM v0.20 (FA4 MLA prefill default, TurboQuant 2-bit KV, vLLM IR foundation, Model Runner V2).
- Strategic prioritization quadrant chart fully re-ranked; new Tier 1 items: TTFT re-benchmark vs vLLM v0.20, low-bit KV, MLA prefill kernel default, disagg chaos-test harness; new Tier 2: Dynamo Snapshot integration, TRT-LLM IR strategy, adaptive spec-dec depth.

Snapshot of pre-refresh content saved at docs/overview/.snapshots/2026-04-06/. Per-file diff highlights, priority-shift table, sources, and blocked/skipped notes are in docs/overview/CHANGELOG.md.

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

reasonsolo requested a review from a team as a code owner April 24, 2026 04:42

reasonsolo requested a review from hchings April 24, 2026 04:42

github-actions Bot assigned reasonsolo Apr 24, 2026

coderabbitai Bot reviewed Apr 24, 2026

reasonsolo requested a review from lishicheng1996-nv April 24, 2026 06:04

reasonsolo force-pushed the bug6093712-session-fix branch from f8b5b00 to e742cfc April 24, 2026 06:45

reasonsolo force-pushed the bug6093712-session-fix branch from e742cfc to 086ebc3 April 24, 2026 06:51

reasonsolo requested a review from JunyiXu-nv April 24, 2026 06:58

JunyiXu-nv approved these changes Apr 24, 2026

zhenhuaw-me approved these changes Apr 24, 2026

reasonsolo requested review from a team as code owners April 24, 2026 13:23

[None][fix] Update test_workers.py to match consolidated session API

302c41c

Remove extra session argument from finish_request calls since the session is now managed internally via session_provider. Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

reasonsolo force-pushed the bug6093712-session-fix branch 6 times, most recently from 8364dc3 to 8027f83 April 25, 2026 05:11

reasonsolo force-pushed the bug6093712-session-fix branch from 8027f83 to b0f0b5d April 25, 2026 05:14

Merge branch 'main' into bug6093712-session-fix

f523158

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

Shixiaowei02 approved these changes Apr 27, 2026

reasonsolo enabled auto-merge (squash) April 27, 2026 05:29

xinhe-nv approved these changes Apr 27, 2026

reasonsolo merged commit f3e458e into NVIDIA:main Apr 27, 2026
5 checks passed

7 participants
