
Conversation

**@Shixiaowei02** (Collaborator) commented on Nov 21, 2025

Summary by CodeRabbit

  • Configuration Changes

    • Default backend for KV cache transmission changed from NIXL to UCX
    • Updated environment variable fallback logic for backend selection priority
  • Documentation

    • Updated configuration documentation to reflect new default backend selection


**@coderabbitai** (bot, Contributor) commented on Nov 21, 2025

📝 Walkthrough

Walkthrough

The default backend for KV cache transceiver is changed from NIXL to UCX across the codebase. This includes updates to the C++ implementation, Python implementation, and corresponding documentation to reflect the new default backend selection behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **C++ Cache Transceiver**<br>`cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp` | Default backend selection updated: when `CacheTransceiverConfig::BackendType` is `DEFAULT`, the fallback now selects UCX instead of NIXL |
| **Python Cache Transceiver**<br>`tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py` | Default backend changed from NIXL to UCX; environment variable fallback mapping reordered to prioritize `TRTLLM_USE_NIXL_KVCACHE` before MPI when applicable (see the sketch after this table) |
| **Documentation**<br>`docs/source/features/disagg-serving.md` | Updated default backend description in `cache_transceiver_config` docs from "NIXL" to "UCX" |
| **Example Configuration**<br>`examples/disaggregated/README.md` | Updated example YAML mapping for DEFAULT backend from "(i.e., NIXL)" to "(i.e., UCX)" |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Changes follow a consistent pattern of replacing the default backend value across multiple files
  • Documentation updates are straightforward descriptive changes
  • Environment variable fallback logic is reordered but maintains the same conditional structure and warning behavior

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is incomplete; it contains only `@coderabbitai summary`, without any explanation of the issue, solution, test coverage, or checklist items required by the template. | Add a comprehensive description explaining why the backend is being reverted to UCX, list relevant tests that safeguard the changes, and complete the PR checklist items. |
| Docstring coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: reverting the transport backend default from NIXL back to UCX, which is the primary change across all modified files. |




**@coderabbitai** (bot) left a review

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39e6418 and 673f1bb.

📒 Files selected for processing (4)
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (1 hunks)
  • docs/source/features/disagg-serving.md (1 hunks)
  • examples/disaggregated/README.md (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-08-20T06:56:02.889Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:577-579
Timestamp: 2025-08-20T06:56:02.889Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, maxSequenceLength is now enforced as a non-optional argument in the BlockManager constructor, so concerns about std::nullopt defaulting to 0 are not applicable. When windowSize > maxSequenceLength, a warning should be added instead of handling optional parameter cases.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
📚 Learning: 2025-08-21T09:41:49.347Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:2010-2045
Timestamp: 2025-08-21T09:41:49.347Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, updateSequenceCacheBlockOffsets is specifically for updating bookkeeping when blocks are added during the context phase, not for refreshing offsets after detach operations. During detach operations, GenerationRequest::removeFrontBlock handles the necessary cache block bookkeeping internally.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
📚 Learning: 2025-09-23T15:12:38.312Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/thop/allreduceOp.cpp:352-446
Timestamp: 2025-09-23T15:12:38.312Z
Learning: In TensorRT-LLM NCCL device implementation, NCCL version 2.28+ requirements are handled at runtime in the nccl_device/config layer rather than with compile-time guards. This allows the allreduceOp to remain version-agnostic and delegates version compatibility validation to the appropriate lower-level components that can gracefully handle unsupported configurations.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py
📚 Learning: 2025-08-15T06:46:54.897Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6767
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-15T06:46:54.897Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp addToken function, newly allocated blocks are unshared by design. The beam search path in addToken (when sequence.getNumTokens() > windowSize) is currently broken/non-functional with SWA, so the block allocation doesn't follow a shared-then-unshared pattern.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
📚 Learning: 2025-08-20T06:48:45.368Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h:0-0
Timestamp: 2025-08-20T06:48:45.368Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, updateSequenceCacheBlockOffsets is only called when adding a sequence, not during detach operations. During detach, the cache block bookkeeping is handled by GenerationRequest::removeFrontBlock.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
📚 Learning: 2025-07-17T09:01:27.402Z
Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (1)

92-92: LGTM! Default backend correctly changed to UCX.

The fallback logic is clear and consistent with the PR objective to revert from NIXL to UCX as the default backend.

docs/source/features/disagg-serving.md (1)

121-121: LGTM! Documentation accurately updated.

The documentation now correctly states that UCX is the default backend, consistent with the code changes.

examples/disaggregated/README.md (1)

15-15: LGTM! Documentation accurately updated.

The comment now correctly indicates that DEFAULT maps to UCX, consistent with the code changes.
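For context, the README example now maps the default roughly as follows. This is a paraphrased snippet assuming the usual `cache_transceiver_config` shape, not the verbatim file contents:

```yaml
cache_transceiver_config:
  backend: DEFAULT  # DEFAULT now resolves to UCX (previously NIXL)
```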

**@reasonsolo** (Collaborator) commented:

Should we unwaive some (maybe not all) disagg tests?

**@Shixiaowei02** (Collaborator, Author) replied:

> Should we unwaive some (maybe not all) disagg tests?

I suggest we first make the failure clues more explicit (and establish the causal relationship) before unwaiving, to avoid interference and churn in CI.

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
@Shixiaowei02 force-pushed the user/xiaoweis/revert-nixl branch from 76f7959 to 25e0c8e on Nov 21, 2025 at 08:22
**@Shixiaowei02** (Collaborator, Author) commented:

/bot run

**@tensorrt-cicd** (Collaborator) commented:

PR_Github #25334 [ run ] triggered by Bot. Commit: 25e0c8e

```diff
     else
     {
-        backendType = executor::CacheTransceiverConfig::BackendType::NIXL;
+        backendType = executor::CacheTransceiverConfig::BackendType::UCX;
```
A collaborator commented on this diff:

Can we only revert #9247?

**@Shixiaowei02** (Collaborator, Author) replied on Nov 21, 2025:

> Can we only revert #9247?

OK, let's not rush into a full revert for now. We'll try to find more clues before making a decision. Thanks! @bo-nv

**@tensorrt-cicd** (Collaborator) commented:

PR_Github #25334 [ run ] completed with state SUCCESS. Commit: 25e0c8e
/LLM/main/L0_MergeRequest_PR pipeline #19162 completed with status: 'FAILURE'

