
Conversation

@Superjomn (Collaborator) commented Dec 14, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added RPC-based retrieval methods for execution statistics and KV cache events with synchronous (get_stats, get_kv_events) and asynchronous (aget_stats, aget_kv_events) interfaces.
  • Refactor

    • Improved internal statistics and KV cache event handling through RPC-based communication for better performance and scalability.
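The new retrieval surface named above can be consumed roughly as follows. This is an illustrative sketch only: `_FakeExecutor` and `read_iteration_stats` are invented stand-ins, and the only assumptions taken from the PR are that `get_stats(timeout)` returns a list and that the worker serializes stats to JSON strings before sending them over RPC.

```python
import json

class _FakeExecutor:
    """Stand-in for the real executor proxy; for illustration only."""

    def get_stats(self, timeout=0.5):
        # The worker serializes IterationStats to JSON strings before RPC.
        return [json.dumps({"iter": 0, "numActiveRequests": 1})]

def read_iteration_stats(executor, timeout: float = 0.5):
    """Fetch stats via get_stats and normalize JSON strings to dicts."""
    stats_list = executor.get_stats(timeout=timeout)
    return [json.loads(s) if isinstance(s, str) else s for s in stats_list]

stats = read_iteration_stats(_FakeExecutor())
print(stats)
```

Normalizing at the call site keeps callers agnostic to whether items arrive as dicts or as serialized JSON strings.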


Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since insufficient care and validation can break top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since insufficient care and validation can break top of tree.

@Superjomn Superjomn requested a review from a team as a code owner December 14, 2025 11:06
@Superjomn Superjomn requested a review from hchings December 14, 2025 11:06
@Superjomn Superjomn changed the title from "[None][refactor] get_stats and get_kvcache_events with rpc" to "[None][refactor] simplify get_stats and get_kvcache_events with rpc" Dec 14, 2025
@Superjomn

/bot run --disable-fail-fast

@coderabbitai

coderabbitai bot commented Dec 14, 2025

📝 Walkthrough

This change migrates statistics and KV cache events retrieval from IPC-based continuous streaming to RPC-based on-demand fetching. The proxy layer generates RPC addresses and creates clients; the worker acts as an RPC server and serializes stats/events; dispatch threads for IPC are removed; tests shift to polling patterns.

Changes

  • RPC Proxy Layer — tensorrt_llm/executor/proxy.py, tensorrt_llm/executor/rpc_proxy.py: Added four public retrieval methods: get_stats(timeout) returning List[dict], aget_stats(timeout) returning IterationResult, get_kv_events(timeout) returning List[dict], and aget_kv_events(timeout) returning IterationResult. Introduced RPC client initialization and removed IPC-based stats/KV event dispatch threads.
  • RPC Proxy Mixin — tensorrt_llm/executor/rpc_proxy_mixin.py: Removed the continuous streaming loops _fetch_stats_loop_async and _fetch_kv_cache_events_loop_async and their associated handlers handle_stats, handle_kv_cache_events, and _handle_iteration_data. The main loop now runs only the responses dispatch.
  • RPC Worker Mixin — tensorrt_llm/executor/rpc_worker_mixin.py: Updated fetch_stats_async and fetch_kv_cache_events_async to serialize results before returning. Removed the loop-based methods fetch_stats_loop_async, fetch_kv_cache_events_loop_async, and _generic_fetch_loop_async.
  • Worker — tensorrt_llm/executor/worker.py: Integrated RpcWorkerMixin as a base class; added rpc_addr and hmac_key parameters to initialization. Removed IPC dispatch thread startup/shutdown logic for stats and KV events. Made mp_stats_queue and kv_cache_events_queue optional, conditionally wired only when IPC addresses are present.
  • Executor Tests — tests/unittest/llmapi/test_executor.py: Updated aget_stats usage to iterate async results and validate the first yielded item. Wrapped get_stats calls in polling loops with timeout and validation.
  • LLM Tests — tests/unittest/llmapi/test_llm.py: Added a get_stats_with_wait helper for polling-based stat retrieval with timeout. Replaced direct synchronous calls with the polling helper throughout; updated the async stats retrieval call signature.
  • LLM PyTorch Tests — tests/unittest/llmapi/test_llm_pytorch.py: Added two new tests, test_llm_rpc_get_stats and test_llm_rpc_get_stats_async, for RPC-proxy LLM instances. Integrated the polling helper for existing stat validation tests; added a json import for parsing.
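The polling helper the tests introduce can be sketched roughly as follows. The signature matches the diff shown later in the review; the body, the poll interval, the stopping condition, and `_FakeLLM` are assumptions for illustration, not the actual test code.

```python
import json
import time
from typing import Any, List, Optional

class _FakeLLM:
    """Stand-in for the real LLM object; for illustration only."""

    def __init__(self):
        self._polls = 0

    def get_stats(self, timeout=0.5):
        # Simulate RPC-based get_stats returning nothing for a few polls.
        self._polls += 1
        if self._polls < 3:
            return []
        return [json.dumps({"iter": self._polls})]

def get_stats_with_wait(llm: Any,
                        timeout: float = 2,
                        expected_len: Optional[int] = None) -> List[dict]:
    """Poll llm.get_stats until stats arrive or the timeout elapses."""
    deadline = time.monotonic() + timeout
    results: List[dict] = []
    while time.monotonic() < deadline:
        for item in llm.get_stats(timeout=0.5):
            results.append(json.loads(item) if isinstance(item, str) else item)
        if results and (expected_len is None or len(results) >= expected_len):
            break
        time.sleep(0.05)
    return results

stats = get_stats_with_wait(_FakeLLM(), timeout=2, expected_len=1)
print(stats)
```

Polling is needed because on-demand RPC calls return immediately, even before the first iteration's stats exist.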

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Proxy as GenerationExecutorProxy
    participant RpcClient as RPCClient
    participant Worker as GenerationExecutorWorker
    participant RpcServer as RPC Server

    rect rgb(200, 220, 240)
    note over Proxy,RpcServer: Old Flow (IPC-based Streaming)
    Proxy->>Proxy: Create IPC queues<br/>(mp_stats_queue, kv_events_queue)
    Proxy->>Proxy: Start dispatch threads<br/>(stats, kv_events)
    Worker->>Proxy: Stream stats/events continuously<br/>via IPC queues
    Proxy->>Proxy: Dispatch threads receive & buffer
    User->>Proxy: get_stats() / get_kv_events()
    Proxy->>Proxy: Return buffered results
    end

    rect rgb(220, 240, 200)
    note over Proxy,RpcServer: New Flow (RPC-based On-Demand)
    Proxy->>Proxy: Generate unique RPC address & HMAC key
    Proxy->>Worker: Pass rpc_addr, hmac_key during init
    Worker->>RpcServer: start_rpc_server(rpc_addr, hmac_key)
    RpcServer-->>Worker: RPC server running
    Proxy->>RpcClient: Create RPCClient(rpc_addr, hmac_key)
    RpcClient-->>Proxy: Connected
    User->>Proxy: get_stats(timeout) / aget_stats(timeout)
    Proxy->>RpcClient: Call fetch_stats_async()
    RpcClient->>RpcServer: RPC request to Worker
    Worker->>Worker: Serialize stats via _stats_serializer
    Worker-->>RpcServer: Return serialized stats
    RpcServer-->>RpcClient: RPC response
    RpcClient-->>Proxy: Return List[dict]
    Proxy-->>User: Return stats/events
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • tensorrt_llm/executor/worker.py: Architectural change to worker initialization and lifecycle (RPC server startup, optional IPC path logic) — verify conditional handling of rpc_addr/hmac_key does not break existing IPC-only code paths.
  • tensorrt_llm/executor/proxy.py and tensorrt_llm/executor/rpc_proxy.py: Four new public methods with timeout and async handling — ensure consistent behavior between sync/async variants and proper error handling for RPC failures.
  • tensorrt_llm/executor/rpc_proxy_mixin.py: Removal of streaming loops changes the execution model significantly — verify no callers depend on continuous stats availability or event streaming semantics.
  • Serialization logic in tensorrt_llm/executor/rpc_worker_mixin.py: Changed from raw stats to serialized form — confirm serializers are properly initialized and serialization roundtrips correctly.
  • Test changes: Multiple test patterns changed from synchronous to polling-based — verify polling timeouts are adequate and no race conditions introduced with stat availability.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check — ⚠️ Warning: The PR description is mostly empty, with only template placeholders visible and no actual implementation details, rationale, or test coverage information provided. Resolution: Fill in the Description section with details about the RPC refactoring, the problem solved, and the migration approach; complete the Test Coverage section with specific test names and coverage details.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 44.00%, below the required threshold of 80.00%. Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title check — ✅ Passed: The title "[None][refactor] simplify get_stats and get_kvcache_events with rpc" clearly summarizes the main change: refactoring get_stats and get_kvcache_events to use RPC instead of IPC.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (11)
tests/unittest/llmapi/test_llm_pytorch.py (3)

1010-1040: Duplicate import inside async test function.

Line 1014 imports json inside the test function, but json is already imported at the module level (line 1). Remove the redundant import.

 @skip_ray
 @pytest.mark.asyncio
 async def test_llm_rpc_get_stats_async():
     """Test that get_stats_async works with RPC orchestrator."""
-    import json
-
     with LLM(model=llama_model_path,

1082-1095: Fix implicit Optional and consider extracting helper to reduce duplication.

  1. Per PEP 484, expected_len: int = None should be expected_len: Optional[int] = None.
  2. This helper is duplicated at line 1173. Consider extracting it to module level or a shared utility.

Based on static analysis hint (RUF013).

-    def get_stats_with_wait(llm, timeout: float = 2, expected_len: int = None):
+    def get_stats_with_wait(llm, timeout: float = 2, expected_len: Optional[int] = None):

1173-1186: Duplicate helper function - same issues as line 1082.

This is an exact duplicate of the helper defined at lines 1082-1095. Apply the same Optional[int] fix. Consider refactoring both to a shared module-level helper to reduce duplication.

Based on static analysis hint (RUF013).

-    def get_stats_with_wait(llm, timeout: float = 2, expected_len: int = None):
+    def get_stats_with_wait(llm, timeout: float = 2, expected_len: Optional[int] = None):
tests/unittest/llmapi/test_llm.py (2)

2178-2191: Polling helper implementation looks correct.

The helper properly implements timeout-based polling for stats retrieval with JSON parsing. However, this same helper is duplicated in test_llm_pytorch.py and again at line 2267. Consider extracting to a shared test utility module.


2267-2280: Duplicate helper function.

This is identical to the helper at lines 2178-2191. Consider extracting to a shared location to reduce maintenance burden.

tests/unittest/llmapi/test_executor.py (1)

230-242: Move time import to module level and verify assertion stability.

  1. The import time at line 231 should be moved to the module level for consistency and performance.
  2. The assertion stats["iter"] == 1 at line 242 might be fragile depending on timing. Consider using >= 1 for robustness, similar to the async test above which uses >= 0.
+import time  # Add at module level (near line 4)

 ...

     # Poll for stats since RPC calls return immediately
-    import time
     stats_list = []
     for _ in range(10):
         stats_list = executor.get_stats(timeout=0.5)
         if stats_list:
             break
         time.sleep(0.1)

     assert len(stats_list) > 0
     stats = json.loads(stats_list[0]) if isinstance(stats_list[0],
                                                     str) else stats_list[0]
-    assert stats["iter"] == 1
+    assert stats["iter"] >= 1
tensorrt_llm/executor/proxy.py (2)

364-383: Consider narrowing exception type or documenting rationale.

The broad Exception catch is flagged by static analysis (BLE001). While catching broad exceptions may be intentional for RPC robustness (network errors, timeouts, etc.), consider either:

  1. Catching more specific exceptions (e.g., (TimeoutError, ConnectionError, RuntimeError))
  2. Adding a brief comment explaining why broad exception handling is needed

The debug-level logging is appropriate for transient RPC failures.

         try:
             # Stats are already serialized by the worker's fetch_stats_async()
             stats = self.rpc_client.fetch_stats_async().remote(timeout=timeout)
             return stats
-        except Exception as e:
+        except Exception as e:  # Broad catch for RPC robustness (network, timeout, etc.)
             logger.debug(f"Error fetching stats via RPC: {e}")
             return []

411-431: Same exception handling pattern as get_stats.

Consider applying the same comment or narrowing suggestion as for get_stats above for consistency.

tensorrt_llm/executor/rpc_proxy.py (2)

101-125: Consider documenting the synchronous fetch behavior in async method.

The aget_stats method performs a synchronous get_stats() call (line 120) and then populates the queue for async iteration. This differs from true asynchronous streaming where data arrives incrementally. While this may be intentional for the RPC-based on-demand fetch model, consider documenting this behavior in the docstring to set correct expectations for callers.


145-167: Consider documenting the synchronous fetch behavior in async method.

Similar to aget_stats, the aget_kv_events method performs a synchronous get_kv_events() call (line 162) and then populates the queue. Consider documenting this batch-fetch-then-stream behavior in the docstring.
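The batch-fetch-then-stream behavior described in these two comments can be sketched as below. This is a toy illustration under stated assumptions: `_fetch_stats_via_rpc` is an invented stand-in for the synchronous RPC call, and the real methods return an IterationResult backed by a queue rather than a bare async generator.

```python
import asyncio
import json

def _fetch_stats_via_rpc():
    """Stand-in for the synchronous RPC fetch; for illustration only."""
    return [json.dumps({"iter": i}) for i in range(2)]

async def aget_stats():
    """Batch-fetch once over RPC, then expose the items as an async stream."""
    for raw in _fetch_stats_via_rpc():
        yield json.loads(raw)

async def _consume():
    # Callers iterate as if data streamed in, but it arrived in one batch.
    return [stat async for stat in aget_stats()]

collected = asyncio.run(_consume())
print(collected)
```

Documenting this in the docstring matters because callers expecting incremental arrival would otherwise assume each iteration reflects a fresh fetch.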

tensorrt_llm/executor/rpc_worker_mixin.py (1)

112-120: Remove or document unused timeout parameter.

The timeout parameter is declared but not used in either fetch_stats_async or fetch_kv_cache_events_async. If it's intended for future use or API consistency with fetch_responses_async, add a docstring note. Otherwise, remove it to avoid confusion.

Apply this diff if removing:

-    async def fetch_stats_async(self, timeout: Optional[float] = None) -> list:
+    async def fetch_stats_async(self) -> list:
         """Async version of fetch_stats using asyncio.to_thread.
 
         This method is exposed via RPC and can be called directly by the proxy.
         Returns serialized stats (JSON strings) that can be sent over RPC.
         """
         stats = await asyncio.to_thread(self.fetch_stats)
         # Serialize stats before sending over RPC (IterationStats objects are not picklable)
         return [self._stats_serializer(s) for s in stats]
 
-    async def fetch_kv_cache_events_async(self, timeout: Optional[float] = None) -> list:
+    async def fetch_kv_cache_events_async(self) -> list:
         """Async version of fetch_kv_cache_events using asyncio.to_thread.
 
         This method is exposed via RPC and can be called directly by the proxy.
         Returns serialized events (JSON strings) that can be sent over RPC.
         """
         events = await asyncio.to_thread(self.fetch_kv_cache_events)
         # Serialize events before sending over RPC
         return [self._kv_cache_events_serializer(e) for e in events]
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1375910 and ce8dec6.

📒 Files selected for processing (8)
  • tensorrt_llm/executor/proxy.py (7 hunks)
  • tensorrt_llm/executor/rpc_proxy.py (2 hunks)
  • tensorrt_llm/executor/rpc_proxy_mixin.py (3 hunks)
  • tensorrt_llm/executor/rpc_worker_mixin.py (1 hunks)
  • tensorrt_llm/executor/worker.py (9 hunks)
  • tests/unittest/llmapi/test_executor.py (1 hunks)
  • tests/unittest/llmapi/test_llm.py (5 hunks)
  • tests/unittest/llmapi/test_llm_pytorch.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use from package.subpackage import foo and then foo.SomeClass() instead of from package.subpackage.foo import SomeClass)
Python filenames should use snake_case (e.g., some_file.py)
Python class names should use PascalCase (e.g., class SomeClass)
Python function and method names should use snake_case (e.g., def my_awesome_function():)
Python local variable names should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile = ...)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL = ...)
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description (e.g., self.x = 5 followed by """<type>: Description of 'x'""" )
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic

Files:

  • tests/unittest/llmapi/test_llm.py
  • tensorrt_llm/executor/rpc_proxy_mixin.py
  • tensorrt_llm/executor/proxy.py
  • tensorrt_llm/executor/rpc_worker_mixin.py
  • tensorrt_llm/executor/rpc_proxy.py
  • tensorrt_llm/executor/worker.py
  • tests/unittest/llmapi/test_llm_pytorch.py
  • tests/unittest/llmapi/test_executor.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top

Files:

  • tests/unittest/llmapi/test_llm.py
  • tensorrt_llm/executor/rpc_proxy_mixin.py
  • tensorrt_llm/executor/proxy.py
  • tensorrt_llm/executor/rpc_worker_mixin.py
  • tensorrt_llm/executor/rpc_proxy.py
  • tensorrt_llm/executor/worker.py
  • tests/unittest/llmapi/test_llm_pytorch.py
  • tests/unittest/llmapi/test_executor.py
🧠 Learnings (2)
📚 Learning: 2025-07-17T09:01:27.402Z
Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

Applied to files:

  • tensorrt_llm/executor/rpc_worker_mixin.py
  • tensorrt_llm/executor/worker.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/llmapi/test_llm_pytorch.py
🧬 Code graph analysis (8)
tests/unittest/llmapi/test_llm.py (3)
tensorrt_llm/llmapi/llm.py (3)
  • LLM (1139-1155)
  • get_stats (522-533)
  • generate (262-344)
tensorrt_llm/executor/proxy.py (1)
  • get_stats (364-383)
tensorrt_llm/executor/rpc_proxy.py (1)
  • get_stats (84-99)
tensorrt_llm/executor/rpc_proxy_mixin.py (1)
tensorrt_llm/llmapi/utils.py (1)
  • _SyncQueue (450-517)
tensorrt_llm/executor/proxy.py (4)
tensorrt_llm/executor/rpc/rpc_client.py (1)
  • RPCClient (80-720)
tensorrt_llm/executor/rpc/rpc_common.py (1)
  • get_unique_ipc_addr (9-16)
tensorrt_llm/llmapi/llm.py (1)
  • get_stats (522-533)
tensorrt_llm/executor/result.py (2)
  • IterationResult (824-874)
  • set_timeout (840-841)
tensorrt_llm/executor/rpc_worker_mixin.py (1)
tensorrt_llm/executor/base_worker.py (4)
  • fetch_stats (299-306)
  • _stats_serializer (652-664)
  • fetch_kv_cache_events (308-312)
  • _kv_cache_events_serializer (668-670)
tensorrt_llm/executor/rpc_proxy.py (4)
tensorrt_llm/executor/result.py (3)
  • result (750-761)
  • IterationResult (824-874)
  • set_timeout (840-841)
tensorrt_llm/llmapi/llm.py (1)
  • get_stats (522-533)
examples/apps/fastapi_server.py (1)
  • stats (47-49)
tensorrt_llm/executor/rpc_worker_mixin.py (2)
  • fetch_stats_async (102-110)
  • fetch_kv_cache_events_async (112-120)
tensorrt_llm/executor/worker.py (2)
tensorrt_llm/executor/base_worker.py (2)
  • start (364-366)
  • _set_iteration_result_queue (324-330)
tensorrt_llm/executor/rpc_worker_mixin.py (2)
  • RpcWorkerMixin (12-124)
  • start_rpc_server (41-47)
tests/unittest/llmapi/test_llm_pytorch.py (2)
tensorrt_llm/executor/rpc_proxy.py (2)
  • GenerationExecutorRpcProxy (15-236)
  • get_stats (84-99)
tests/unittest/llmapi/test_llm.py (2)
  • get_stats_with_wait (2178-2191)
  • get_stats_with_wait (2267-2280)
tests/unittest/llmapi/test_executor.py (4)
tensorrt_llm/executor/proxy.py (2)
  • aget_stats (385-409)
  • get_stats (364-383)
tensorrt_llm/executor/rpc_proxy.py (2)
  • aget_stats (101-125)
  • get_stats (84-99)
examples/apps/fastapi_server.py (1)
  • stats (47-49)
tensorrt_llm/llmapi/utils.py (2)
  • AsyncQueue (363-447)
  • EventLoopShutdownError (369-370)
🪛 Ruff (0.14.8)
tensorrt_llm/executor/proxy.py

380-380: Consider moving this statement to an else block

(TRY300)


381-381: Do not catch blind exception: Exception

(BLE001)


428-428: Consider moving this statement to an else block

(TRY300)


429-429: Do not catch blind exception: Exception

(BLE001)

tensorrt_llm/executor/rpc_worker_mixin.py

112-112: Unused method argument: timeout

(ARG002)

tensorrt_llm/executor/rpc_proxy.py

96-96: Consider moving this statement to an else block

(TRY300)


97-97: Do not catch blind exception: Exception

(BLE001)


140-140: Consider moving this statement to an else block

(TRY300)


141-141: Do not catch blind exception: Exception

(BLE001)

tests/unittest/llmapi/test_llm_pytorch.py

1082-1082: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


1173-1173: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (25)
tensorrt_llm/executor/rpc_proxy_mixin.py (3)

9-9: LGTM - Import aligned with usage.

The _SyncQueue import is correctly used in handle_responses method for queue type checking and batch notification.


49-53: Good documentation of architectural change.

The docstring clearly explains that stats and kv_events are now fetched on-demand via direct RPC calls rather than streaming loops, which helps future maintainers understand the design decision.


168-170: Helpful contextual comment for removed functionality.

The NOTE comment provides good context explaining why the streaming loops were removed and where the functionality now resides.

tests/unittest/llmapi/test_llm_pytorch.py (2)

1-1: LGTM - Import added for JSON parsing.

The json import is correctly added for parsing stats JSON strings in the new RPC stats tests.


980-1007: Well-structured test for synchronous RPC stats retrieval.

The test properly validates RPC-based stats retrieval with appropriate polling, assertions for required fields, and JSON parsing. Good coverage for the new get_stats API.

tests/unittest/llmapi/test_llm.py (4)

2211-2211: LGTM - Updated to use polling helper.

Correctly replaced direct get_stats call with the polling helper to accommodate RPC-based retrieval semantics.


2221-2221: LGTM - Polling with short timeout for negative test.

Appropriately uses a short timeout (0.5s) to verify no additional stats are available.


2225-2225: LGTM - Verifies stats availability after generation.

Correctly tests that stats become available after a new generation completes.


2289-2289: LGTM - Updated to use polling helper.

Correctly uses the polling helper for stats retrieval in the queued stats test.

tests/unittest/llmapi/test_executor.py (1)

216-224: LGTM - Async stats retrieval updated correctly.

The test properly uses the new aget_stats(timeout) API, iterates over the async iterable, validates stats fields, and breaks after the first result. Good handling of the IterationResult pattern.

tensorrt_llm/executor/proxy.py (8)

3-3: LGTM - New imports for RPC functionality.

The imports are correctly added for RPC client, unique IPC address generation, and os module (for os.urandom).

Also applies to: 25-29


94-103: LGTM - RPC address and key initialization.

Properly generates a unique IPC address and cryptographically secure HMAC key for RPC communication. Passing these to worker_kwargs ensures the worker can start the RPC server at the correct address.


109-113: LGTM - RPC client creation after worker startup.

The RPC client is correctly created after workers are started, ensuring the RPC server is ready before the client attempts to connect.


141-147: LGTM - Stats and KV events queues disabled.

Setting stats_queue_addr and kv_cache_events_queue_addr to None correctly disables the IPC-based stats and KV events channels in favor of RPC-based retrieval.


211-212: Helpful comment documenting removed functionality.

Good documentation explaining why the dispatch tasks were removed.


318-321: LGTM - Proper RPC client cleanup on shutdown.

The RPC client is properly closed and set to None during shutdown, ensuring resources are released.


385-409: LGTM - Async stats retrieval implementation.

The aget_stats method correctly:

  1. Initializes iteration results if needed
  2. Returns an empty async iterable if stats result is not available
  3. Fetches stats via RPC and populates the queue
  4. Sets the timeout on the result

Good defensive programming with the None check.


433-455: LGTM - Async KV events retrieval implementation.

Mirrors the aget_stats implementation correctly for KV events. Consistent pattern and proper handling.

tensorrt_llm/executor/worker.py (4)

6-6: LGTM! Clean integration of RPC mixin.

The import additions and class hierarchy changes appropriately integrate RPC functionality. The method resolution order (MRO) with RpcWorkerMixin before BaseWorker ensures RPC-specific methods are prioritized correctly.

Also applies to: 19-19, 26-27, 34-34


82-84: LGTM! Comment accurately reflects the architectural change.

The updated comment correctly documents that stats and KV events are now fetched on-demand via RPC rather than through continuous dispatch threads.


152-153: LGTM! Solid conditional logic for hybrid IPC/RPC support.

The changes properly support both IPC-based and RPC-based operation paths:

  • Parameters are correctly propagated through worker_main to the worker constructor
  • IPC queues for stats and KV events are created only when addresses are provided
  • Conditional signaling prevents errors when queues don't exist
  • Iteration result queues are initialized only for IPC paths

This allows the system to operate in RPC-only mode when IPC addresses are not provided.

Also applies to: 201-214, 241-245, 291-293, 319-325


46-47: Rank initialization is properly ordered. The super().__init__() call at line 49 completes before start_rpc_server() is invoked at line 67. Since BaseWorker.__init__() initializes self.rank via mpi_rank() before returning, self.rank is safely available when start_rpc_server() checks if self.rank == 0. The partial RPC initialization pattern is safe.

tensorrt_llm/executor/rpc_proxy.py (1)

2-2: LGTM! Clean import additions and accurate docstring update.

The imports support the new RPC-based stats/events retrieval functionality, and the docstring correctly documents the architectural shift from streaming loops to on-demand RPC calls.

Also applies to: 5-5, 9-9, 73-77

tensorrt_llm/executor/rpc_worker_mixin.py (2)

102-110: LGTM! Proper serialization for RPC transmission.

The method now correctly serializes IterationStats objects into JSON strings before RPC transmission, as these objects are not picklable. The docstring clearly documents this RPC-exposed behavior.
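The serialize-before-send approach can be illustrated with a minimal sketch. `IterationStats` here is a hypothetical dataclass, not the real TensorRT-LLM type; the idea is simply that JSON strings are plain `str` and cross the RPC boundary even when the original objects are not picklable.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class IterationStats:
    iter: int
    num_active_requests: int


def get_stats_payload(stats: List[IterationStats]) -> List[str]:
    # Serialize each stats object to a JSON string for RPC transmission.
    return [json.dumps(asdict(s)) for s in stats]


payload = get_stats_payload([IterationStats(iter=1, num_active_requests=4)])
print(payload)  # → ['{"iter": 1, "num_active_requests": 4}']
```

The receiving side would deserialize with `json.loads`, reconstructing plain dicts rather than the original objects.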


122-124: LGTM! Clear documentation of architectural change.

The note effectively documents the removal of continuous streaming loops in favor of on-demand RPC fetching, helping future maintainers understand the design decision.

@Superjomn Superjomn force-pushed the refactor.get_stats_on_mpi branch from ce8dec6 to 455f758 on December 14, 2025 11:11
@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #28239 [ run ] triggered by Bot. Commit: 455f758

@tensorrt-cicd
Collaborator

PR_Github #28240 [ run ] triggered by Bot. Commit: 455f758

@Superjomn Superjomn force-pushed the refactor.get_stats_on_mpi branch from 455f758 to 44e2d7f on December 14, 2025 11:20
@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@Superjomn Superjomn force-pushed the refactor.get_stats_on_mpi branch from 44e2d7f to 98996c7 on December 14, 2025 11:26
@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #28241 [ run ] triggered by Bot. Commit: 98996c7

@tensorrt-cicd
Collaborator

PR_Github #28242 [ run ] triggered by Bot. Commit: 98996c7

@tensorrt-cicd
Collaborator

PR_Github #28242 [ run ] completed with state FAILURE. Commit: 98996c7
/LLM/main/L0_MergeRequest_PR pipeline #21595 completed with status: 'FAILURE'

@Superjomn Superjomn force-pushed the refactor.get_stats_on_mpi branch from 98996c7 to c0b7a3e on December 14, 2025 13:40
@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #28251 [ run ] triggered by Bot. Commit: c0b7a3e

@tensorrt-cicd
Collaborator

PR_Github #28251 [ run ] completed with state SUCCESS. Commit: c0b7a3e
/LLM/main/L0_MergeRequest_PR pipeline #21604 completed with status: 'FAILURE'

@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #28401 [ run ] triggered by Bot. Commit: f918ca4

@tensorrt-cicd
Collaborator

PR_Github #28401 [ run ] completed with state SUCCESS. Commit: f918ca4
/LLM/main/L0_MergeRequest_PR pipeline #21733 completed with status: 'FAILURE'

@Superjomn Superjomn force-pushed the refactor.get_stats_on_mpi branch from 9cdcd05 to 3792aa0 on December 16, 2025 09:19
@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #28556 [ run ] triggered by Bot. Commit: 3792aa0

@Superjomn Superjomn force-pushed the refactor.get_stats_on_mpi branch from 3792aa0 to a9c5afc on December 16, 2025 09:37
@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #28558 [ run ] triggered by Bot. Commit: a9c5afc

@tensorrt-cicd
Collaborator

PR_Github #29233 [ ] completed with state ABORTED. Commit: 9f4adb8

@tensorrt-cicd
Collaborator

PR_Github #29232 [ run ] completed with state FAILURE. Commit: 9f4adb8
/LLM/main/L0_MergeRequest_PR pipeline #22437 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@Superjomn
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #29236 [ run ] triggered by Bot. Commit: 9f4adb8

@tensorrt-cicd
Collaborator

PR_Github #29236 [ run ] completed with state SUCCESS. Commit: 9f4adb8
/LLM/main/L0_MergeRequest_PR pipeline #22440 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #29239 [ run ] triggered by Bot. Commit: 9f4adb8

@tensorrt-cicd
Collaborator

PR_Github #29239 [ run ] completed with state SUCCESS. Commit: 9f4adb8
/LLM/main/L0_MergeRequest_PR pipeline #22443 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
@Superjomn Superjomn force-pushed the refactor.get_stats_on_mpi branch from 9f4adb8 to dcf449b on December 21, 2025 02:36
@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #29272 [ run ] triggered by Bot. Commit: dcf449b

@tensorrt-cicd
Collaborator

PR_Github #29272 [ run ] completed with state SUCCESS. Commit: dcf449b
/LLM/main/L0_MergeRequest_PR pipeline #22471 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@Superjomn
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #29330 [ run ] triggered by Bot. Commit: dcf449b

@tensorrt-cicd
Collaborator

PR_Github #29330 [ run ] completed with state SUCCESS. Commit: dcf449b
/LLM/main/L0_MergeRequest_PR pipeline #22522 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@Superjomn
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #29377 [ run ] triggered by Bot. Commit: dcf449b

@tensorrt-cicd
Collaborator

PR_Github #29377 [ run ] completed with state SUCCESS. Commit: dcf449b
/LLM/main/L0_MergeRequest_PR pipeline #22567 completed with status: 'SUCCESS'

@nv-guomingz nv-guomingz merged commit ea6cd76 into NVIDIA:main Dec 22, 2025
5 checks passed
codego7250 pushed a commit to codego7250/TensorRT-LLM that referenced this pull request Dec 22, 2025
…VIDIA#9980)

Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
JunyiXu-nv pushed a commit to JunyiXu-nv/TensorRT-LLM that referenced this pull request Dec 30, 2025
…VIDIA#9980)

Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>