[None][fix] Consolidate aiohttp session management in disagg router#13408

Merged
reasonsolo merged 5 commits into NVIDIA:main from reasonsolo:bug6093712-session-fix
Apr 27, 2026

@reasonsolo
Collaborator

@reasonsolo reasonsolo commented Apr 24, 2026

@coderabbitai summary

Description

Test Coverage


@reasonsolo reasonsolo requested a review from a team as a code owner April 24, 2026 04:42
@reasonsolo reasonsolo requested a review from hchings April 24, 2026 04:42
@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Refactors Router to use a shared, lazily-initialized aiohttp.ClientSession instead of creating per-call sessions. The shared session is passed to ServerState instances during construction, eliminating per-request session creation and teardown overhead while centralizing session lifecycle management.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Router Session Management: tensorrt_llm/serve/router.py | Adds session property and close() method to Router for lazy session initialization and cleanup. Updates ServerState and KvCacheAwareServerState constructors to accept optional session parameter. Removes session parameter from KvCacheAwareServerState.decrement_load() and KvCacheAwareRouter.finish_request() to use shared session. Updates server state creation calls to pass self.session. |
| Router Tests: tests/unittest/disaggregated/test_router.py | Adds new async unit test validating KV-cache event polling in KvCacheAwareRouter.finish_request(). Mocks aiohttp.ClientSession response and verifies block hash is correctly applied to router's per-server KV cache state. |
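
For illustration, the lazy initialization and explicit cleanup described in the summary might look roughly like the sketch below. `StubSession` is a hypothetical stand-in for `aiohttp.ClientSession` so the snippet runs without third-party packages; the `session` and `close()` names follow the summary, but the bodies are assumptions rather than the merged code.

```python
import asyncio
from typing import Optional


class StubSession:
    """Hypothetical stand-in for aiohttp.ClientSession."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class Router:
    def __init__(self) -> None:
        # Nothing is created here; the session is built on first access.
        self._session: Optional[StubSession] = None

    @property
    def session(self) -> StubSession:
        if self._session is None or self._session.closed:
            self._session = StubSession()
        return self._session

    async def close(self) -> None:
        # Explicit teardown, so cleanup does not depend on __del__.
        if self._session is not None and not self._session.closed:
            await self._session.close()
        self._session = None


async def demo() -> bool:
    router = Router()
    first = router.session
    second = router.session
    shared = first is second  # the same object is reused across calls
    await router.close()
    return shared and first.closed
```

Calling `asyncio.run(demo())` exercises both halves: repeated access returns one shared object, and `close()` tears it down exactly once.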

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: consolidating aiohttp session management in the disaggregated router, which is the primary objective of this PR. |
| Description check | ✅ Passed | The PR description covers the core changes and test plan, but omits the required 'Description' and 'Test Coverage' sections from the template structure. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/serve/router.py (1)

74-79: ⚠️ Potential issue | 🟡 Minor

Potential AttributeError when _session is None.

If _session is None, calling self._session.get(...) raises AttributeError, which is silently caught and returns False. This masks a configuration error as "unhealthy". Consider adding an explicit check.

🛡️ Proposed fix
     async def is_healthy(self) -> bool:
+        if self._session is None:
+            return False
         try:
             async with self._session.get(self._server + "/health") as response:
                 return response.status == 200
         except Exception:
             return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/router.py` around lines 74 - 79, The is_healthy method
currently swallows an AttributeError when self._session is None; add an explicit
pre-check at the top of is_healthy to detect missing configuration (e.g., if
self._session is None) and raise a clear exception (RuntimeError or
AttributeError) with a descriptive message indicating the session/server is not
configured instead of silently returning False; keep the existing try/except
around the network call (self._session.get(self._server + "/health")) to handle
real network errors, and optionally also validate self._server is set before
making the request.
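
A minimal sketch of the variant described in the prompt above, where a missing session raises instead of being reported as "unhealthy", could look like this. `FakeSession` and `FakeResponse` are hypothetical stand-ins that mimic only the `session.get(...)` async-context-manager shape; the choice of `RuntimeError` is one of the two options the prompt mentions.

```python
import asyncio
from typing import Optional


class FakeResponse:
    """Hypothetical response object usable with `async with`."""

    def __init__(self, status: int) -> None:
        self.status = status

    async def __aenter__(self) -> "FakeResponse":
        return self

    async def __aexit__(self, *exc) -> bool:
        return False


class FakeSession:
    """Hypothetical stand-in exposing only .get()."""

    def __init__(self, status: int = 200, fail: bool = False) -> None:
        self._status = status
        self._fail = fail

    def get(self, url: str) -> FakeResponse:
        if self._fail:
            raise ConnectionError(f"cannot reach {url}")
        return FakeResponse(self._status)


class ServerState:
    def __init__(self, server: str,
                 session: Optional[FakeSession] = None) -> None:
        self._server = server
        self._session = session

    async def is_healthy(self) -> bool:
        # Misconfiguration is reported loudly, outside the try block.
        if self._session is None:
            raise RuntimeError(
                f"no HTTP session configured for {self._server}")
        try:
            async with self._session.get(self._server + "/health") as response:
                return response.status == 200
        except Exception:
            # Genuine network failures still map to "unhealthy".
            return False
```

With this shape, a non-200 status or a connection failure returns `False`, while a missing session surfaces immediately as an error.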
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3ed70456-0fde-4f67-bc7e-8241e78c9779

📥 Commits

Reviewing files that changed from the base of the PR and between 371e38d and f8b5b00.

📒 Files selected for processing (2)
  • tensorrt_llm/serve/router.py
  • tests/unittest/disaggregated/test_router.py

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45347 [ run ] triggered by Bot. Commit: f8b5b00 Link to invocation

@reasonsolo reasonsolo force-pushed the bug6093712-session-fix branch from f8b5b00 to e742cfc Compare April 24, 2026 06:45
- Fix TypeError crash: KvCacheAwareRouter.finish_request was passing
  session= kwarg to decrement_load which no longer accepts it
- Replace fragile __del__ with explicit async close() method, called
  from stop_server_monitoring for clean session teardown
- Add assert in KvCacheAwareServerState.decrement_load to enforce
  session is set before polling kv_cache_events
- Remove redundant _create_server_state override in LoadBalancingRouter
  (identical to LoadBalancingMixin base)
- Remove vestigial **kwargs from _unregister_request and decrement_load
  call (no callers pass extra arguments)
- Deduplicate mock_aiohttp_session test fixture (single autouse fixture
  replaces autouse + non-autouse pair that double-patched)
- Add test verifying /kv_cache_events is queried on finish_request

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
@reasonsolo reasonsolo force-pushed the bug6093712-session-fix branch from e742cfc to 086ebc3 Compare April 24, 2026 06:51
@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@reasonsolo reasonsolo requested a review from JunyiXu-nv April 24, 2026 06:58
@tensorrt-cicd
Collaborator

PR_Github #45359 [ run ] triggered by Bot. Commit: 086ebc3 Link to invocation

Collaborator

@JunyiXu-nv JunyiXu-nv left a comment

Changes LGTM.

BTW, is this shared session intended to improve performance? If so, do we have any benchmark results?

@reasonsolo
Collaborator Author

> Changes LGTM.
>
> BTW, is this shared session intended to improve performance? If so, do we have any benchmark results?

No, this fixes a bug: the HTTP session was not passed to finish_request, so the KvCacheAwareRouter never actually fetched /kv_cache_events.

@lishicheng1996-nv
Collaborator

I cherry-picked this fix onto a downstream branch and hit a server-startup crash that I believe also affects this PR against the current aiohttp>=3.13.3 constraint. Flagging it here so you can reproduce / fix before merge.

Symptom

File "/project/TensorRT-LLM/tensorrt_llm/serve/router.py", line 655, in __init__
    self.session)
File "/project/TensorRT-LLM/tensorrt_llm/serve/router.py", line 192, in session
    self._session = aiohttp.ClientSession()
File "/usr/local/lib/python3.12/dist-packages/aiohttp/client.py", line 321, in __init__
    loop = loop or asyncio.get_running_loop()
RuntimeError: no running event loop

Root cause

Starting with aiohttp 3.10, ClientSession.__init__ calls asyncio.get_running_loop() and raises RuntimeError when no loop is running. The router construction path is synchronous:

  • tensorrt_llm/commands/serve.py:disaggregated() — sync click handler, no loop yet.
  • OpenAIDisaggServer.__init__ → create_router(...) → KvCacheAwareRouter.__init__ → _init_load_balancing() → _create_server_state() → self.session property → aiohttp.ClientSession()

All of that runs before asyncio.run(server(...)) in the same file. With aiohttp>=3.13.3 pinned in requirements.txt this is a hard crash at server startup.

The existing unit tests don't catch it because they mock aiohttp.ClientSession, so ClientSession.__init__ never actually runs and the loop precondition goes unchecked.
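
The loop precondition can be reproduced with the standard library alone. In the sketch below, `LoopBoundClient` is a hypothetical class that binds to the running loop in its constructor, as the traceback above shows `ClientSession.__init__` doing; it is not aiohttp itself.

```python
import asyncio


class LoopBoundClient:
    """Hypothetical stand-in that requires a running loop at construction."""

    def __init__(self) -> None:
        # Raises RuntimeError when called outside a running event loop.
        self.loop = asyncio.get_running_loop()


def build_sync() -> str:
    # Mirrors the synchronous router-construction path.
    try:
        LoopBoundClient()
    except RuntimeError as exc:
        return str(exc)
    return "ok"


async def build_async() -> str:
    # Inside a coroutine the loop is live, so construction succeeds.
    LoopBoundClient()
    return "ok"
```

Calling `build_sync()` from ordinary synchronous code returns the error text instead of `"ok"`, while `asyncio.run(build_async())` succeeds, which is why a test that mocks the constructor never sees the failure.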

Suggested fix — pass a session provider, not the session

Defer session materialization to first async-context access. Minimal diff:

class ServerState:
    def __init__(
            self,
            server: str,
            use_tokens: bool = False,
            session_provider: Optional[Callable[[], aiohttp.ClientSession]] = None):
        ...
        # Store the callable, not the session. The router's `session` property
        # creates an aiohttp.ClientSession on first access; deferring until
        # an async method (decrement_load / is_healthy) runs guarantees the
        # event loop is live.
        self._session_provider = session_provider

    @property
    def _session(self) -> Optional[aiohttp.ClientSession]:
        return self._session_provider() if self._session_provider else None

# LoadBalancingMixin._create_server_state
def _create_server_state(self, server: str) -> ServerState:
    return self._server_state_class(
        server, self._use_tokens,
        lambda: self.session)   # ← callable, materialized later

# KvCacheAwareRouter._create_server_state
def _create_server_state(self, server: str) -> KvCacheAwareServerState:
    return KvCacheAwareServerState(
        server, self._use_tokens, self._tokens_per_block,
        lambda: self.session)

With this change, self.session is invoked for the first time from inside an async def (e.g. the first decrement_load call), by which point the loop is running and aiohttp.ClientSession() succeeds.

I verified this fix locally end-to-end: /kv_cache_events is now actually POSTed, and _kv_cache_block_table populates across iterations (the original intent of the PR). Happy to open a follow-up PR or you can pull the one-liner here.
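
To show the deferral end to end, here is a self-contained sketch of the provider pattern suggested above. `LoopBoundSession` is a hypothetical stand-in that, like the real session in the traceback, needs a running loop at construction; `build_router` and `first_use` are illustrative helpers, not functions from the PR.

```python
import asyncio
from typing import Callable, Dict, List, Optional


class LoopBoundSession:
    """Hypothetical stand-in that needs a running loop to be constructed."""

    def __init__(self) -> None:
        self.loop = asyncio.get_running_loop()


class ServerState:
    def __init__(
        self,
        server: str,
        session_provider: Optional[Callable[[], LoopBoundSession]] = None,
    ) -> None:
        self._server = server
        # Store the callable, not the session.
        self._session_provider = session_provider

    @property
    def _session(self) -> Optional[LoopBoundSession]:
        return self._session_provider() if self._session_provider else None

    async def decrement_load(self) -> bool:
        # First touch of the session happens here, inside a coroutine.
        return self._session is not None


class Router:
    def __init__(self, servers: List[str]) -> None:
        self._session: Optional[LoopBoundSession] = None
        self._states: Dict[str, ServerState] = {
            s: ServerState(s, lambda: self.session) for s in servers
        }

    @property
    def session(self) -> LoopBoundSession:
        if self._session is None:
            self._session = LoopBoundSession()
        return self._session


def build_router() -> Router:
    # Synchronous construction: no loop is running, and none is needed.
    return Router(["host-a:8001", "host-b:8002"])


async def first_use(router: Router) -> bool:
    results = [await st.decrement_load() for st in router._states.values()]
    return all(results) and router._session is not None
```

Construction stays synchronous and side-effect free; the session only materializes once `asyncio.run(first_use(router))` enters a coroutine.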

Move poll_events HTTP call out of the finish_request critical path by
firing it as a background asyncio task. This eliminates serialized
2-16s blocking per request caused by poll_events being called under
the router lock.

Also adds integration tests for load_balancing, kv_cache_aware, and
conversation routers, and fixes missing http:// prefix in poll_events
and is_healthy URLs.

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
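
The background-task approach in this commit message can be sketched as follows. This is an assumption-laden illustration: the `Router` class, its `_poll_events` body (a short sleep standing in for the HTTP call), and the `drain` helper are hypothetical and only demonstrate the `asyncio.create_task` pattern.

```python
import asyncio
from typing import Set


class Router:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._background: Set[asyncio.Task] = set()
        self.active = 1
        self.polled = 0

    async def _poll_events(self) -> None:
        await asyncio.sleep(0.05)  # stands in for the HTTP round trip
        self.polled += 1

    async def finish_request(self) -> None:
        async with self._lock:
            self.active -= 1  # only cheap bookkeeping under the lock
        task = asyncio.create_task(self._poll_events())
        # Keep a strong reference so the task is not garbage-collected early.
        self._background.add(task)
        task.add_done_callback(self._background.discard)

    async def drain(self) -> None:
        # Wait for outstanding polls, e.g. at shutdown or in tests.
        if self._background:
            await asyncio.gather(*self._background)


async def demo():
    router = Router()
    await router.finish_request()
    polled_at_return = router.polled  # poll has not run yet
    await router.drain()
    return polled_at_return, router.polled
```

`finish_request` returns before the poll completes, so slow polling no longer serializes requests behind the lock; `drain` gives callers a way to wait when determinism matters.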
@reasonsolo reasonsolo requested review from a team as code owners April 24, 2026 13:23
@tensorrt-cicd
Collaborator

PR_Github #45359 [ run ] completed with state SUCCESS. Commit: 086ebc3
/LLM/main/L0_MergeRequest_PR pipeline #35603 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Remove extra session argument from finish_request calls since the
session is now managed internally via session_provider.

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
@reasonsolo reasonsolo force-pushed the bug6093712-session-fix branch 6 times, most recently from 8364dc3 to 8027f83 Compare April 25, 2026 05:11
…ish_request perf

- Add eager_poll config to KvCacheAwareRouter for test determinism
- Make finish_request non-blocking by firing poll_and_update as background task
- Add _base_url property to avoid double http:// prefix
- Add ConversationRouterTester with explicit conversation_id and implicit
  prefix matching tests in test_workers.py
- Add conversation router test to l0_dgx_h100 and QA test lists

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
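
One way a `_base_url` property could avoid the double `http://` prefix is sketched below. The class and the scheme check are assumptions for illustration; only the property name comes from the commit message.

```python
class ServerState:
    def __init__(self, server: str) -> None:
        self._server = server

    @property
    def _base_url(self) -> str:
        # Add a scheme only when the configured address lacks one.
        if self._server.startswith(("http://", "https://")):
            return self._server
        return "http://" + self._server

    def health_url(self) -> str:
        # Hypothetical helper showing how endpoints build on _base_url.
        return self._base_url + "/health"
```

Both `ServerState("localhost:8001")` and `ServerState("http://localhost:8001")` then resolve to the same health URL.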
@reasonsolo reasonsolo force-pushed the bug6093712-session-fix branch from 8027f83 to b0f0b5d Compare April 25, 2026 05:14
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45483 [ run ] triggered by Bot. Commit: f523158 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45483 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35713 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45545 [ run ] triggered by Bot. Commit: f523158 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45545 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35763 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45567 [ run ] triggered by Bot. Commit: f523158 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45567 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35785 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45584 [ run ] triggered by Bot. Commit: f523158 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45584 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35800 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45636 [ run ] triggered by Bot. Commit: f523158 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45636 [ run ] completed with state SUCCESS. Commit: f523158
/LLM/main/L0_MergeRequest_PR pipeline #35851 completed with status: 'SUCCESS'

CI Report

Link to invocation

@reasonsolo reasonsolo enabled auto-merge (squash) April 27, 2026 05:29
@reasonsolo reasonsolo merged commit f3e458e into NVIDIA:main Apr 27, 2026
5 checks passed
chienchunhung added a commit to chienchunhung/TensorRT-LLM that referenced this pull request Apr 29, 2026
Refreshed all 16 files under docs/overview/ to reflect TRT-LLM v1.3.0rc14
(upstream/main 3b7af1c, ~3 weeks of changes since the 2026-04-06
baseline 2b80f8d), competing frameworks (vLLM v0.20.0, SGLang
v0.5.10.post1, LMCache v0.4.4, NVIDIA Dynamo v1.0.0), and the current
hardware (Vera Rubin in production, AMD MI355X MLPerf 6.0, Google TPU v7
GA) and academic landscape (GOOSE, StreamServe, PrfaaS, FlowKV).

Highlights:
- Spec-dec algorithm count corrected 7 to 8 (DFlash added; EAGLE3 dynamic
  tree re-enabled; LoRA + spec-dec generic and EAGLE3-specific now work).
- Block reuse + overlap scheduler combined (NVIDIA#12816), removing a
  long-standing internal gap.
- First-class lmcache and kvbm KV connectors (NVIDIA#12626).
- Production observability stack (Prometheus NVIDIA#12545, modular logger
  NVIDIA#13202, NvTelemetry NVIDIA#12384, per-iteration aggregate counters NVIDIA#13199).
- Disagg reliability fail-fast wave (NVIDIA#13119, NVIDIA#13408, NVIDIA#12718, NVIDIA#12888).
- AutoDeploy onboarded DeepSeek-R1, Gemma-4 + 4-31B NVFP4, MiniMax-M2.7;
  standalone-package-ready; legacy EdgeLLM ONNX export removed.
- KV cache V2 still default OFF (multiple V2 fixes; tracked via new
  V2-default-on milestone in §06).
- New 4 framework comparison column for Dynamo v1.0; updated feature
  matrix and gap analysis to reflect vLLM v0.20 (FA4 MLA prefill default,
  TurboQuant 2-bit KV, vLLM IR foundation, Model Runner V2).
- Strategic prioritization quadrant chart fully re-ranked; new Tier 1
  items: TTFT re-benchmark vs vLLM v0.20, low-bit KV, MLA prefill kernel
  default, disagg chaos-test harness; new Tier 2: Dynamo Snapshot
  integration, TRT-LLM IR strategy, adaptive spec-dec depth.

Snapshot of pre-refresh content saved at docs/overview/.snapshots/2026-04-06/.
Per-file diff highlights, priority-shift table, sources, and
blocked/skipped notes are in docs/overview/CHANGELOG.md.

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
7 participants