2 changes: 1 addition & 1 deletion cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
```diff
@@ -89,7 +89,7 @@ std::unique_ptr<BaseCacheTransceiver> CacheTransceiverFactory::createCacheTransc
         }
         else
         {
-            backendType = executor::CacheTransceiverConfig::BackendType::NIXL;
+            backendType = executor::CacheTransceiverConfig::BackendType::UCX;
```
**Collaborator:** Can we only revert #9247?

**@Shixiaowei02 (Collaborator, Author), Nov 21, 2025:**

> Can we only revert #9247?

OK, let’s not rush to fully revert for now. We'll try to find more clues before making a decision. Thanks! @bo-nv

**Collaborator:** Let's compare the e2e time when running the accuracy tests, and if we see that NIXL is slower, let's make UCX the default.

```diff
         }
     }
     cacheTransceiverConfig.value().setBackendType(backendType);
```
2 changes: 1 addition & 1 deletion docs/source/features/disagg-serving.md
````diff
@@ -118,7 +118,7 @@ cache_transceiver_config:
   max_tokens_in_buffer: <int>
 ```
-`backend` specifies the communication backend for transferring the kvCache. Valid options include `DEFAULT`, `UCX`, `NIXL`, and `MPI`; the default backend is NIXL.
+`backend` specifies the communication backend for transferring the kvCache. Valid options include `DEFAULT`, `UCX`, `NIXL`, and `MPI`; the default backend is UCX.

 `max_tokens_in_buffer` defines the buffer size for kvCache transfers; it is recommended to set this value greater than or equal to the maximum ISL (Input Sequence Length) of all requests for optimal performance.
````
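Putting the two documented fields together, a filled-in `cache_transceiver_config` block might look like the following (the concrete values are illustrative only, not recommendations from this change):

```yaml
cache_transceiver_config:
  # Pin the backend explicitly rather than relying on DEFAULT (now UCX)
  backend: UCX
  # Illustrative value: choose >= the maximum expected ISL of your requests
  max_tokens_in_buffer: 8192
```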
2 changes: 1 addition & 1 deletion examples/disaggregated/README.md
````diff
@@ -12,7 +12,7 @@ The `trtllm-serve` command supports the `extra-llm-config.yaml` parameter. In th

 ```yaml
 cache_transceiver_config:
-  # KV cache transmission backend. Valid options include `DEFAULT` (i.e., NIXL), `UCX`, `NIXL`.
+  # KV cache transmission backend. Valid options include `DEFAULT` (i.e., UCX), `UCX`, `NIXL`.
   backend: <str>
   # KV cache buffer size. Set it ≥ the maximum ISL (Input Sequence Length) for best performance.
   max_tokens_in_buffer: <int>
````
6 changes: 3 additions & 3 deletions tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py
```diff
@@ -38,10 +38,10 @@ def create_kv_cache_transceiver(

     if cache_transceiver_config.backend == "DEFAULT":
         # When cache_transceiver_config.backend is not set, fallback to env_vars settings
-        # NIXL is the default backend
-        cache_transceiver_config.backend = "NIXL"
+        # UCX is the default backend
+        cache_transceiver_config.backend = "UCX"
         # Ordered by priority
-        env_vars = [("TRTLLM_USE_UCX_KVCACHE", "UCX"),
+        env_vars = [("TRTLLM_USE_NIXL_KVCACHE", "NIXL"),
                     ("TRTLLM_USE_MPI_KVCACHE", "MPI")]
         for env_var, be_type in env_vars:
             if getenv(env_var) == "1":
```
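The fallback logic in the hunk above can be sketched as a standalone function. This is a minimal sketch, not the actual TensorRT-LLM code: `resolve_backend` is a hypothetical helper name, and the assumption that the first matching environment variable wins is inferred from the "Ordered by priority" comment.

```python
from os import getenv


def resolve_backend(configured: str) -> str:
    """Pick the KV-cache transfer backend, mirroring the priority above.

    An explicitly configured backend wins. "DEFAULT" falls back to
    environment-variable overrides, then to UCX (the default after this
    change). Illustrative only, not the TensorRT-LLM API.
    """
    if configured != "DEFAULT":
        return configured
    backend = "UCX"  # UCX is the default backend
    # Ordered by priority: the first env var set to "1" wins
    for env_var, be_type in [("TRTLLM_USE_NIXL_KVCACHE", "NIXL"),
                             ("TRTLLM_USE_MPI_KVCACHE", "MPI")]:
        if getenv(env_var) == "1":
            backend = be_type
            break
    return backend
```

With neither environment variable set, `resolve_backend("DEFAULT")` yields `"UCX"`, which is exactly the behavior change this PR makes.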