[None][feat] KVConnector shorthand paths for "lmcache" and "kvbm" with examples #12626
Conversation
📝 Walkthrough

This pull request adds LMCache as a configurable KV cache connector backend for TensorRT-LLM. It introduces a connector registry system, updates the KvCacheConnectorConfig to support preset-based configuration, adds a CLI option to the serve command, and provides example configuration and demonstration scripts.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CLI as serve CLI
    participant Config as KvCacheConnectorConfig
    participant Registry as CONNECTOR_REGISTRY
    participant LLM as TensorRT-LLM
    participant Connector as LMCache Connector
    User->>CLI: invoke serve --kv-connector lmcache
    CLI->>Config: create KvCacheConnectorConfig(connector="lmcache")
    Config->>Registry: resolve preset "lmcache"
    Registry-->>Config: return connector_module, scheduler_class, worker_class
    Config->>Config: validate and populate fields
    Config-->>CLI: return resolved config
    CLI->>LLM: initialize with kv_connector_config + block_reuse=True
    LLM->>Connector: initialize LMCache connector
    Connector-->>LLM: ready for KV cache operations
    LLM-->>User: serve ready
```
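The preset lookup in the diagram above can be sketched roughly as follows. This is an illustrative stand-in, not TensorRT-LLM's actual registry: the module path and class names in the dictionary are placeholders.

```python
# Hypothetical sketch of preset resolution; the real TensorRT-LLM registry,
# module paths, and class names may differ.
CONNECTOR_REGISTRY = {
    "lmcache": {
        "connector_module": "lmcache.integration.tensorrt_llm",  # illustrative path
        "scheduler_class": "LMCacheConnectorScheduler",  # illustrative name
        "worker_class": "LMCacheConnectorWorker",  # illustrative name
    },
}


def resolve_preset(name):
    """Map a shorthand like "lmcache" to its resolved connector fields."""
    try:
        return CONNECTOR_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(CONNECTOR_REGISTRY))
        raise ValueError(f"unknown KV connector preset {name!r}; known presets: {known}")


print(resolve_preset("lmcache")["worker_class"])  # → LMCacheConnectorWorker
```

A shorthand that is not registered fails fast with the list of known presets, which is friendlier than a bare import error deep inside engine initialization.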
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm-api/llm_lmcache_connector.py`:
- Around line 78-102: The teardown call destroy_engine() can be skipped if
LLM.generate or the assertion fails; wrap the generation, prints and assertion
(the block using LLM, generate, output0/output1, text0/text1 and the assert) in
a try/finally so destroy_engine() is always executed in the finally block;
re-raise any caught exception after teardown to preserve failing behavior.
Ensure you reference the existing LLM instance, the generate calls, and the
assert when moving them into the try block and place destroy_engine() only in
finally.
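The restructuring this comment asks for can be sketched as below. `FakeLLM` and the lambda teardown are stand-ins for the example script's LLM instance and `destroy_engine()`; the point is that the teardown runs even when generation or the assertion raises.

```python
def run_demo(llm, destroy_engine):
    """Placeholder flow mirroring the example script: generate, compare, assert."""
    try:
        output0 = llm.generate("prompt A")
        output1 = llm.generate("prompt A")
        print(output0, output1)
        assert output0 == output1, "cache hit should reproduce the same text"
    finally:
        # Teardown now runs even if generate() or the assert raises;
        # the exception still propagates to preserve failing behavior.
        destroy_engine()


# Minimal stand-ins to show the teardown is reached on failure.
class FakeLLM:
    def generate(self, prompt):
        raise RuntimeError("boom")


calls = []
try:
    run_demo(FakeLLM(), lambda: calls.append("destroyed"))
except RuntimeError:
    pass
print(calls)  # → ['destroyed']
```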
In `@tensorrt_llm/commands/serve.py`:
- Around line 902-906: The current injection of KvCacheConnectorConfig
unconditionally sets llm_args['kv_connector_config'] when kv_connector is not
None, which can break non-PyTorch backends; before creating
KvCacheConnectorConfig (the block referencing kv_connector,
KvCacheConnectorConfig, and llm_args), add a guard that checks the configured
backend (e.g., the variable or llm_args['backend'] / backend_name used in this
module) and only inject the connector when the backend is a supported one (e.g.,
"pytorch"); if the backend is unsupported, raise a clear CLI error or exit with
a helpful message instead of setting llm_args['kv_connector_config'].
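A guard along the lines the comment describes might look like this sketch. The supported-backend set and the dict-based config are assumptions standing in for the real `serve.py` variables and `KvCacheConnectorConfig`:

```python
# Sketch of the backend guard the comment asks for; SUPPORTED_BACKENDS and the
# dict-based config stand in for the real KvCacheConnectorConfig machinery.
SUPPORTED_BACKENDS = {"pytorch"}


def inject_kv_connector(llm_args, kv_connector):
    """Only inject a KV connector config when the backend supports it."""
    if kv_connector is None:
        return
    backend = llm_args.get("backend")
    if backend not in SUPPORTED_BACKENDS:
        # Fail loudly instead of silently configuring an unsupported backend.
        raise SystemExit(
            f"--kv-connector requires a supported backend (pytorch), got {backend!r}"
        )
    llm_args["kv_connector_config"] = {"connector": kv_connector}


args = {"backend": "pytorch"}
inject_kv_connector(args, "lmcache")
print(args["kv_connector_config"])  # → {'connector': 'lmcache'}
```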
- Around line 907-915: The current code uses
kv_cc.setdefault('enable_block_reuse', True) which will not override an explicit
False and can leave enable_block_reuse disabled; change this to explicitly set
kv_cc['enable_block_reuse'] = True after converting/ensuring kv_cc is a dict so
that enable_block_reuse is always enforced when preparing
llm_args['kv_cache_config']; update the block handling around llm_args, kv_cc
and the KvCacheConfig conversion accordingly.
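The pitfall behind this comment is easy to demonstrate with a plain dict standing in for the `kv_cc` mapping in `serve.py`:

```python
# Why setdefault() is not enough: it only fills in *missing* keys, so an
# explicit user-provided False survives and leaves block reuse disabled.
kv_cc = {"enable_block_reuse": False}  # user explicitly disabled reuse
kv_cc.setdefault("enable_block_reuse", True)
print(kv_cc["enable_block_reuse"])  # → False: the explicit value wins

# Explicit assignment enforces the invariant the connector needs.
kv_cc["enable_block_reuse"] = True
print(kv_cc["enable_block_reuse"])  # → True
```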
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c96e9196-75c3-4b02-b7f6-6f5480637a95
📒 Files selected for processing (6)
- examples/llm-api/configs/trtllm_lmcache_connector_extra.yaml
- examples/llm-api/llm_lmcache_connector.py
- tensorrt_llm/commands/serve.py
- tensorrt_llm/connectors/__init__.py
- tensorrt_llm/connectors/registry.py
- tensorrt_llm/llmapi/llm_args.py
addressed all coderabbit comments too
/bot run

PR_Github #40982 [ run ] triggered by Bot. Commit:

Force-pushed from 57c2965 to f425bc5 (Compare)
/bot help
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

- run — Launch build/test pipelines. All previously running jobs will be killed.
- kill — Kill all running builds associated with the pull request.
- skip — Skip testing for the latest commit on the pull request.
- reuse-pipeline — Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot reuse-pipeline

PR_Github #40987 [ reuse-pipeline ] triggered by Bot. Commit:

PR_Github #40982 [ run ] completed with state

PR_Github #40987 [ reuse-pipeline ] completed with state

/bot run

PR_Github #40989 [ run ] triggered by Bot. Commit:

PR_Github #40989 [ run ] completed with state

PR_Github #42157 [ run ] completed with state
Force-pushed from ed7895e to b300924 (Compare)
sorry for the ping again, could a maintainer trigger the CI please, thank you!
Signed-off-by: Ubuntu <ubuntu@g294.voltagepark.net>
Signed-off-by: samuel <slshen@uchicago.edu>
…nector
Signed-off-by: samuel <slshen@uchicago.edu>
… kvbm
- Remove --kv-connector CLI option from trtllm-serve (YAML-only config)
- Move connectors/ to tensorrt_llm/_torch/pyexecutor/connectors/
- Add kvbm preset to connector registry (dynamo KVBM)
- Add trtllm_kvbm_connector_extra.yaml example
Signed-off-by: samuel <slshen@uchicago.edu>
Move kv_cache_connector.py into tensorrt_llm/_torch/pyexecutor/connectors/ alongside registry.py, as requested in review. Update all import paths. Fix pre-existing line-length lint violations in the moved file. Signed-off-by: samuel <slshen@uchicago.edu>
After moving kv_cache_connector.py into the connectors/ subdirectory, the relative imports for llm_request, scheduler, and resource_manager need to reference the parent package (..) instead of the current one (.). Signed-off-by: samuel <slshen@uchicago.edu>
Register "lmcache-mp" preset in the connector registry pointing to
LMCache's multi-process adapter (tensorrt_mp_adapter). This enables
process-isolated KV caching via a standalone LMCache ZMQ server.
Usage:

```yaml
kv_connector_config:
  connector: lmcache-mp
```
Signed-off-by: samuel <slshen@uchicago.edu>
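Registering such a preset could look like the sketch below. The registry API shown is illustrative, and only the adapter name `tensorrt_mp_adapter` comes from the commit message; the full module path is a placeholder.

```python
# Illustrative registry-style registration for the "lmcache-mp" preset; the
# real TensorRT-LLM registry API and the full adapter module path may differ.
CONNECTOR_REGISTRY = {}


def register_preset(name, **fields):
    """Register a shorthand connector name, refusing silent overwrites."""
    if name in CONNECTOR_REGISTRY:
        raise ValueError(f"preset {name!r} already registered")
    CONNECTOR_REGISTRY[name] = fields


register_preset("lmcache-mp", connector_module="tensorrt_mp_adapter")
print(sorted(CONNECTOR_REGISTRY))  # → ['lmcache-mp']
```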
Force-pushed from ee23d85 to ef93cf5 (Compare)

/bot run --disable-fail-fast

PR_Github #42332 [ run ] triggered by Bot. Commit:
Optional field for connectors that run in multi-process mode
(e.g. lmcache-mp). Allows specifying the cache server URL
directly in the YAML config instead of environment variables.
Usage:

```yaml
kv_connector_config:
  connector: lmcache-mp
  server_url: tcp://localhost:5555
```
Signed-off-by: samuel <slshen@uchicago.edu>
PR_Github #42332 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #42548 [ run ] triggered by Bot. Commit:

PR_Github #42548 [ run ] completed with state
surely something in this PR is offensive to the CI? 😅

/bot run --disable-fail-fast
Our CI has been flaky recently, sorry for the inconvenience. Hopefully we can get this merged soon. Thx for your contribution.

PR_Github #43037 [ run ] triggered by Bot. Commit:

PR_Github #43037 [ run ] completed with state
…h examples (NVIDIA#12626)
Signed-off-by: Ubuntu <ubuntu@g294.voltagepark.net>
Signed-off-by: samuel <slshen@uchicago.edu>
Co-authored-by: Ubuntu <ubuntu@g294.voltagepark.net>
LMCache side co-PR: LMCache/LMCache#2920 (merge the LMCache side first so faulty code does not land in TRT-LLM)
Keep changes minimal and as non-intrusive as possible. This PR avoids touching any core TRT files and only engages with configurations and examples.
The high level goal is just:
what exactly each change is doing:
Summary by CodeRabbit
Release Notes
- `--kv-connector` CLI option to dynamically select cache connectors in the serve command.