
Conversation


@chuangz0 chuangz0 commented Dec 8, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved robustness of cache transmission connection lifecycle management across multiple backend implementations.
    • Added safeguards to gracefully handle operations when connections are not in a running state.
  • Tests

    • Extended test coverage to validate cache transmission behavior with multiple backends (NIXL and UCX).


Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
@chuangz0 chuangz0 requested a review from a team as a code owner December 8, 2025 10:23
@chuangz0 chuangz0 requested review from Tabrizian, pcastonguay and schetlur-nv and removed request for Tabrizian and schetlur-nv December 8, 2025 10:23

chuangz0 commented Dec 8, 2025

/bot run


coderabbitai bot commented Dec 8, 2025

📝 Walkthrough

This change introduces a unified isRunning() interface across all ConnectionManager implementations to track and query running state. Atomic flags are added to each manager, lifecycle management clears the flag in destructors, and early-exit guards are added to critical request/response processing paths.
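The lifecycle pattern the walkthrough describes — an atomic running flag, a [[nodiscard]] query, and a destructor that clears the flag before teardown — can be sketched in a minimal, self-contained form. The class and method names below are illustrative stand-ins, not the actual TensorRT-LLM types:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical base interface mirroring the described change: a pure-virtual
// isRunning() query whose result callers should not ignore.
class ConnectionManager
{
public:
    virtual ~ConnectionManager() = default;
    [[nodiscard]] virtual bool isRunning() const = 0;
};

// One concrete manager following the pattern: an atomic flag initialized to
// true, cleared at the start of the destructor so concurrent threads can
// detect shutdown before resources are torn down.
class DemoConnectionManager : public ConnectionManager
{
public:
    ~DemoConnectionManager() override
    {
        mIsRunning = false; // signal shutdown first, then clean up resources
    }

    [[nodiscard]] bool isRunning() const override
    {
        return mIsRunning;
    }

    // A request-processing path guarded by the flag: it exits early instead
    // of touching a manager that is shutting down.
    bool tryProcessRequest()
    {
        if (!mIsRunning)
        {
            return false; // early exit, mirroring the guards added in the PR
        }
        return true; // ... real processing would happen here ...
    }

private:
    std::atomic<bool> mIsRunning{true};
};
```

The sketch follows the repository conventions quoted later in this review (Allman braces, m-prefixed members); the real managers additionally stop worker threads and close sockets in their destructors.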

Changes

Cohort / File(s) → Summary

  • Base Interface (cpp/include/tensorrt_llm/executor/cacheCommunicator.h): Added a pure-virtual isRunning() const method to the ConnectionManager interface.
  • Agent Connection Manager (cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h, cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp): Implemented isRunning() const override; added an mIsRunning atomic flag; set the flag to false in destructors; added early-exit checks when not running.
  • MPI Connection Manager (cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h, cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp): Implemented isRunning() const override; added an mIsRunning atomic flag with a destructor that clears the flag.
  • UCX Connection Manager (cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.h, cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp): Implemented isRunning() const override; added an mIsRunning atomic flag; set the flag to false in the destructor.
  • Request/Response Guards (cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp): Added null-connection and running-state checks in receiveRequestInfo and response processing loops; logs a warning and returns early if not running.
  • Test Parametrization (tests/unittest/llmapi/test_llm_pytorch.py): Added @pytest.mark.parametrize with NIXL and UCX backends for test_llm_context_only_timed_out_kv_cache_exhausted; updated the function signature and hardcoded backend reference to use the parameter.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • The pattern of adding isRunning() is consistently applied across four similar ConnectionManager implementations with minimal variation
  • Guard additions in dataTransceiver are straightforward null and state checks
  • Test parametrization is a standard pytest pattern
  • Despite touching nine files, the homogeneity of changes reduces review complexity

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Description check ⚠️ Warning: The PR description is entirely empty except for the template placeholder comments, missing the required Description, Test Coverage, and detailed implementation details. Resolution: fill in the Description section explaining the issue and solution, add Test Coverage listing relevant tests, and complete the other required sections beyond the template boilerplate.
✅ Passed checks (1 passed)
  • Title check ✅ Passed: The title describes a concrete fix — terminating the NIXL running state when exiting — which aligns with the primary change across multiple files.

@coderabbitai bot left a review comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp (1)

507-507: Consider setting mIsRunning = false earlier in destructor.

Currently, mIsRunning = false is set after stopping progress threads and joining the ZMQ thread (line 507). Consider moving it to the very beginning of the destructor (before line 483) to signal shutdown state immediately, allowing any concurrent operations to detect and exit early before waiting for thread cleanup.

Apply this diff to signal shutdown earlier:

 UcxConnectionManager::~UcxConnectionManager()
 {
     TLLM_LOG_DEBUG(mRank, "UcxConnectionManager::~UcxConnectionManager");
+    mIsRunning = false;
 
     for (auto& worker : mWorkersPool)
     {
         worker->stopProgressThread();
     }
     // ... rest of cleanup ...
-    mIsRunning = false;
     mZmqRepSocket.close();
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96d9b67 and 46b109b.

📒 Files selected for processing (9)
  • cpp/include/tensorrt_llm/executor/cacheCommunicator.h (1 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (3 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.h (2 hunks)
  • tests/unittest/llmapi/test_llm_pytorch.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,h,cu}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{cpp,h,cu}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #define whenever possible, as the latter are not visible to the compiler
A variable that is not modified after its initialization should be declared as const
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization and should be replaced with named constants
Use Allman indentation style for braces in C++
Put the semicolon for an empty for or while loop in a new line
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements)
If and else should always be followed by brace-delimited statements, even if empty or a single statement
C++ filenames should use camel case with first letter lowercase (e.g., thisIsASubDir and thisIsAFilename.cpp)
All filenames involved in compilation of a compilation target must have case-insensitive unique filenames
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass)
Local variables, methods and namespaces should use camel case with first letter lowercase (e.g., localFooBar)
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos)
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal)
Locally visible static variables should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;)
Public, private and protected class member variables should use camel case prefixed with 'm' (e.g., mNbFooValues), though the 'm' pre...

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.h
  • cpp/include/tensorrt_llm/executor/cacheCommunicator.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
**/*.h

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.h: Use a preprocessor guard in C++ header files with the guard name format TRTLLM_ followed by the filename in all caps (e.g., TRTLLM_FOO_BAR_HELLO_H for file FooBarHello.h); do not include directory names in the symbol
Do not use underscore prefix or suffix in C++ preprocessor guard symbols; they are reserved in C++ standard for compilers or implementation

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.h
  • cpp/include/tensorrt_llm/executor/cacheCommunicator.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h
**/*.{cpp,h,cu,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.h
  • cpp/include/tensorrt_llm/executor/cacheCommunicator.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h
  • tests/unittest/llmapi/test_llm_pytorch.py
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use from package.subpackage import foo and then foo.SomeClass() instead of from package.subpackage.foo import SomeClass)
Python filenames should use snake_case (e.g., some_file.py)
Python class names should use PascalCase (e.g., class SomeClass)
Python function and method names should use snake_case (e.g., def my_awesome_function():)
Python local variable names should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile = ...)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL = ...)
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description (e.g., self.x = 5 followed by """<type>: Description of 'x'""" )
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic

Files:

  • tests/unittest/llmapi/test_llm_pytorch.py
🧠 Learnings (5)
📚 Learning: 2025-08-06T08:18:28.669Z
Learnt from: zhengd-nv
Repo: NVIDIA/TensorRT-LLM PR: 6633
File: cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp:145-155
Timestamp: 2025-08-06T08:18:28.669Z
Learning: In cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp, the existing `mMtxForMap` mutex in DataSenderImpl is sufficient to synchronize measurement file operations in the `release` method, as all file operations occur within the same critical section that protects the `mRequestToSession` map access.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
📚 Learning: 2025-08-20T06:56:02.889Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:577-579
Timestamp: 2025-08-20T06:56:02.889Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, maxSequenceLength is now enforced as a non-optional argument in the BlockManager constructor, so concerns about std::nullopt defaulting to 0 are not applicable. When windowSize > maxSequenceLength, a warning should be added instead of handling optional parameter cases.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
📚 Learning: 2025-08-21T09:41:49.347Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:2010-2045
Timestamp: 2025-08-21T09:41:49.347Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, updateSequenceCacheBlockOffsets is specifically for updating bookkeeping when blocks are added during the context phase, not for refreshing offsets after detach operations. During detach operations, GenerationRequest::removeFrontBlock handles the necessary cache block bookkeeping internally.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
📚 Learning: 2025-08-15T06:46:54.897Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6767
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-15T06:46:54.897Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp addToken function, newly allocated blocks are unshared by design. The beam search path in addToken (when sequence.getNumTokens() > windowSize) is currently broken/non-functional with SWA, so the block allocation doesn't follow a shared-then-unshared pattern.

Applied to files:

  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/llmapi/test_llm_pytorch.py
🧬 Code graph analysis (7)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.h (3)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2)
  • isRunning (603-606)
  • isRunning (603-603)
cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (2)
  • isRunning (80-83)
  • isRunning (80-80)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp (2)
  • isRunning (676-679)
  • isRunning (676-676)
cpp/include/tensorrt_llm/executor/cacheCommunicator.h (2)
cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (2)
  • isRunning (80-83)
  • isRunning (80-80)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp (2)
  • isRunning (676-679)
  • isRunning (676-676)
cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (3)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2)
  • isRunning (603-606)
  • isRunning (603-603)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp (2)
  • isRunning (676-679)
  • isRunning (676-676)
cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h (1)
  • MpiConnectionManager (41-55)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp (2)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2)
  • isRunning (603-606)
  • isRunning (603-603)
cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (2)
  • isRunning (80-83)
  • isRunning (80-80)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (4)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (3)
  • nodiscard (395-406)
  • isRunning (603-606)
  • isRunning (603-603)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.h (2)
  • nodiscard (75-78)
  • nodiscard (83-88)
cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (2)
  • isRunning (80-83)
  • isRunning (80-80)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp (2)
  • isRunning (676-679)
  • isRunning (676-676)
tests/unittest/llmapi/test_llm_pytorch.py (3)
tests/unittest/llmapi/apps/_test_openai_chat.py (1)
  • backend (26-27)
tests/unittest/llmapi/apps/_test_openai_completions.py (1)
  • backend (21-22)
tests/unittest/llmapi/apps/_test_openai_misc.py (1)
  • backend (14-15)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2)
cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (2)
  • isRunning (80-83)
  • isRunning (80-80)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp (2)
  • isRunning (676-679)
  • isRunning (676-676)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (16)
cpp/include/tensorrt_llm/executor/cacheCommunicator.h (1)

69-69: LGTM! Well-designed interface addition.

The new isRunning() pure virtual method provides a clean way for external components to query connection manager lifecycle state. The [[nodiscard]] attribute ensures callers don't ignore the state, which is appropriate for this safety-critical query.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (2)

625-628: Appropriate early-exit guard for graceful shutdown.

The dual check of mTerminate || !mManager->isRunning() correctly handles both internal shutdown signals and external manager lifecycle changes. This prevents the response thread from attempting operations on a shutting-down manager.
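The dual-condition guard described here can be sketched in isolation. The struct and member names below are assumed for illustration only; the actual dataTransceiver code is structured differently:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical response-loop state combining an internal terminate flag with
// the manager's running state, as the review comment above describes.
struct ResponseLoop
{
    std::atomic<bool> mTerminate{false};
    std::atomic<bool> mManagerRunning{true};

    // Returns false when the loop should stop, so the thread exits early
    // instead of issuing further operations on a shutting-down manager.
    bool shouldContinue() const
    {
        return !(mTerminate || !mManagerRunning);
    }
};
```

Either condition flipping — an internal shutdown signal or the external manager leaving the running state — is enough to stop the loop.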


363-367: Verify that callers of recvRequestInfo() safely handle the returned info object during shutdown scenarios.

When connection is null and the manager is not running, the function returns info without signaling its validity to callers. Ensure all call sites either validate the returned object's state or use a mechanism (e.g., optional, status field) to indicate shutdown conditions where the result is invalid.
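One way to make validity explicit at call sites, along the lines the comment suggests, is an optional return value. This is a hypothetical sketch — the RequestInfo type and function name here are stand-ins, not the project's actual recvRequestInfo signature:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-in for the request metadata type.
struct RequestInfo
{
    std::string payload;
};

// Returning std::optional makes the shutdown case unambiguous at call sites:
// std::nullopt means "manager not running", so callers cannot mistake a
// default-constructed info object for a valid one.
std::optional<RequestInfo> recvRequestInfoChecked(bool managerRunning)
{
    if (!managerRunning)
    {
        return std::nullopt;
    }
    return RequestInfo{"data"};
}
```

A status field on the returned object would serve the same purpose; the key point is that callers get a signal they are forced to check.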

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (1)

299-299: LGTM! Consistent implementation of running state tracking.

The isRunning() override and mIsRunning atomic flag follow the same pattern as other connection managers (MpiConnectionManager, UcxConnectionManager). The atomic initialization to true is correct, indicating the manager starts in a running state.

Also applies to: 313-313

cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h (1)

45-45: LGTM! Complete lifecycle management interface.

The additions provide proper lifecycle management with:

  • Destructor to clean up and signal shutdown
  • isRunning() query method for external state checks
  • Atomic mIsRunning flag for thread-safe state tracking

Consistent with the broader pattern across all connection manager implementations.

Also applies to: 49-49, 55-55

cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.h (1)

65-65: LGTM! Consistent with other connection manager implementations.

The mIsRunning atomic flag and isRunning() override maintain consistency across all connection manager implementations (Agent, MPI, UCX), enabling uniform lifecycle management throughout the cache transmission layer.

Also applies to: 90-90

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (4)

498-501: Appropriate early-exit for graceful shutdown.

The early return when !mIsRunning allows waiting threads to cleanly exit during shutdown. This prevents indefinite blocking when the manager is being destroyed.


599-599: Correct lifecycle management in destructor.

Setting mIsRunning = false at the beginning of the destructor ensures that any concurrent operations can detect shutdown and exit early before resources are cleaned up. This ordering is critical for preventing use-after-free issues.


603-606: LGTM! Simple and thread-safe accessor.

The isRunning() implementation correctly returns the atomic flag, providing thread-safe state queries without requiring additional synchronization.


322-325: Verify nullptr handling in callers of recvConnectionAndRequestInfo.

When mIsRunning is false, this method returns nullptr without setting requestInfo. Ensure all callers (particularly in dataTransceiver.cpp) properly handle the null return and potentially uninitialized requestInfo.

cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (2)

80-83: LGTM! Thread-safe state accessor.

The implementation correctly returns the atomic mIsRunning flag, consistent with other connection manager implementations (AgentConnectionManager, UcxConnectionManager).


85-88: LGTM! Proper lifecycle signaling in destructor.

Setting mIsRunning = false in the destructor ensures that any threads checking the running state can detect shutdown and exit gracefully. This follows the same pattern as the other connection managers.

cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/ucxCacheCommunicator.cpp (1)

676-679: LGTM! Consistent with other connection manager implementations.

The isRunning() implementation correctly returns the atomic flag, providing thread-safe state queries. This matches the pattern used in AgentConnectionManager and MpiConnectionManager.

tests/unittest/llmapi/test_llm_pytorch.py (3)

1055-1055: LGTM! Good test coverage for both backends.

The parameterization correctly extends the test to cover both NIXL and UCX backends, ensuring that the KV cache exhaustion and timeout behavior works correctly for both implementations. This aligns with the PR's goal of introducing unified running state tracking across ConnectionManager implementations.


1056-1057: Function signature correctly updated.

The backend parameter is properly added to accept the parameterized values from the decorator, maintaining compatibility with the existing sender_future_timeout_ms parameter.


1077-1077: Backend parameter correctly applied.

The change from hardcoded "UCX" to the parameterized backend variable correctly enables the test to run against both NIXL and UCX backends, validating that the new unified running state tracking works properly for both implementations.

@tensorrt-cicd (Collaborator)

PR_Github #27292 [ run ] triggered by Bot. Commit: 46b109b

@tensorrt-cicd (Collaborator)

PR_Github #27292 [ run ] completed with state SUCCESS. Commit: 46b109b
/LLM/main/L0_MergeRequest_PR pipeline #20843 completed with status: 'FAILURE'

@pcastonguay (Collaborator)

/bot run --disable-fail-fast

@pcastonguay pcastonguay changed the title [https://nvbugs//5716787][fix] terminate nixl running when existing [https://nvbugs//5716787][fix] terminate nixl running when exiting Dec 8, 2025
@pcastonguay pcastonguay changed the title [https://nvbugs//5716787][fix] terminate nixl running when exiting [https://nvbugs/5716787][fix] terminate nixl running when exiting Dec 8, 2025
@tensorrt-cicd (Collaborator)

PR_Github #27349 [ run ] triggered by Bot. Commit: 46b109b

@tensorrt-cicd (Collaborator)

PR_Github #27349 [ run ] completed with state SUCCESS. Commit: 46b109b
/LLM/main/L0_MergeRequest_PR pipeline #20894 completed with status: 'FAILURE'

@pcastonguay (Collaborator)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #27515 [ run ] triggered by Bot. Commit: 46b109b

@tensorrt-cicd (Collaborator)

PR_Github #27515 [ run ] completed with state SUCCESS. Commit: 46b109b
/LLM/main/L0_MergeRequest_PR pipeline #20994 completed with status: 'FAILURE'

@pcastonguay (Collaborator)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #27546 [ run ] triggered by Bot. Commit: b5be7da

@tensorrt-cicd (Collaborator)

PR_Github #27546 [ run ] completed with state FAILURE. Commit: b5be7da
LLM/main/L0_MergeRequest_PR #21023 (Blue Ocean) completed with status: ABORTED

@pcastonguay (Collaborator)

/bot run --disable-fail-fast

1 similar comment
@pcastonguay (Collaborator)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #27733 [ run ] triggered by Bot. Commit: b5be7da

@pcastonguay (Collaborator)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #27860 [ run ] triggered by Bot. Commit: b5be7da

@tensorrt-cicd (Collaborator)

PR_Github #27860 [ run ] completed with state SUCCESS. Commit: b5be7da
/LLM/main/L0_MergeRequest_PR pipeline #21268 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@pcastonguay pcastonguay merged commit 4cc4cbe into NVIDIA:main Dec 12, 2025
5 checks passed
sherry-1001 pushed a commit to sherry-1001/TensorRT-LLM that referenced this pull request Dec 16, 2025
…IDIA#9785)

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
codego7250 pushed a commit to codego7250/TensorRT-LLM that referenced this pull request Dec 19, 2025
…IDIA#9785)

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>