Enabling Safe KVConnector #19972


Open · wants to merge 1 commit into base: main

Conversation

@prashant182 commented Jun 23, 2025

PR Description: Add SafeLMCacheConnectorV1 for Non-Critical LMCache Integration

Purpose

Add SafeLMCacheConnectorV1, a circuit-breaker wrapper that prevents vLLM failures when LMCache is unavailable. This makes LMCache non-critical: vLLM continues serving requests using its native caching when LMCache fails, and automatically recovers when the service is restored.

Fixes the issue where LMCache service problems cause complete vLLM system failures.
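
In sketch form, the idea is that every connector call is routed through a guard that swallows LMCache errors and substitutes a harmless fallback. The names below (SafeConnectorSketch, _safe_call, the breaker interface) are illustrative assumptions, not necessarily those used in the actual diff:

class SafeConnectorSketch:
    """Illustrative only: forward calls to the real connector, but never
    let an LMCache exception escape into vLLM."""

    def __init__(self, inner, breaker):
        self._inner = inner      # the real LMCacheConnectorV1
        self._breaker = breaker  # tracks failures, decides open/closed

    def _safe_call(self, operation: str, fallback, *args, **kwargs):
        if not self._breaker.allow_call():
            return fallback  # circuit open: skip LMCache entirely
        try:
            result = getattr(self._inner, operation)(*args, **kwargs)
            self._breaker.on_success()
            return result
        except Exception:
            self._breaker.on_failure()
            return fallback  # degrade to native vLLM caching

    # e.g. a "no external hit" answer keeps the scheduler on the
    # native prefix-cache path
    def get_num_new_matched_tokens(self, *args, **kwargs):
        return self._safe_call("get_num_new_matched_tokens", (0, False),
                               *args, **kwargs)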

Test Plan

# Test with safe connector
--kv-transfer-config '{"kv_connector":"SafeLMCacheConnectorV1", "kv_role":"kv_both"}'

# Break LMCache service (stop daemon, network issues)
# Verify vLLM continues working

# Restore LMCache service  
# Verify automatic recovery
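
# For a fuller picture, a hypothetical end-to-end launch might look like
# the following (the model path is a placeholder; only the
# --kv-transfer-config value comes from this PR):
vllm serve /models/my-model \
    --kv-transfer-config '{"kv_connector":"SafeLMCacheConnectorV1", "kv_role":"kv_both"}'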

Test Result

Circuit Breaker Working:

  • Opens after 3 LMCache failures
  • vLLM continues processing requests without crashes
  • Automatic recovery when LMCache service restored
  • Sub-millisecond overhead during normal operation

System Resilience:

  • Zero system failures during LMCache outages
  • Graceful fallback to native vLLM caching
  • Self-healing with exponential backoff (30-300s); see the sketch below
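
The PR description only states the 30-300s range; assuming a standard doubling schedule, the recovery window might be computed like this sketch:

def recovery_timeout(consecutive_opens: int,
                     base: float = 30.0, cap: float = 300.0) -> float:
    """Double the wait after each failed recovery attempt, capped at 300s."""
    return min(base * (2 ** consecutive_opens), cap)

# 0 opens -> 30s, 1 -> 60s, 2 -> 120s, 3 -> 240s, 4+ -> 300s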

(Optional) Documentation Update

None required. Drop-in replacement for LMCacheConnectorV1 with identical API.
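
For context, v1 connectors are looked up by name through KVConnectorFactory, so a drop-in registration would presumably mirror the existing LMCacheConnectorV1 entry in vllm/distributed/kv_transfer/kv_connector/factory.py (the module path below is a guess):

# Sketch: the module path for the new class is an assumption.
KVConnectorFactory.register_connector(
    "SafeLMCacheConnectorV1",
    "vllm.distributed.kv_transfer.kv_connector.v1.safe_lmcache_connector",
    "SafeLMCacheConnectorV1")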

@gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @prashant182, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the resilience of vLLM's integration with LMCache by introducing a new SafeLMCacheConnectorV1. This connector acts as a circuit breaker, ensuring that vLLM can gracefully handle LMCache service outages by falling back to native caching, thereby preventing system failures and enabling automatic recovery when the LMCache service is restored.

Highlights

  • Enhanced Resilience: Implemented SafeLMCacheConnectorV1 to wrap the existing LMCacheConnectorV1, providing a circuit breaker pattern that prevents vLLM from crashing when LMCache is unavailable.
  • Automatic Fallback & Recovery: When LMCache experiences failures, the system automatically falls back to vLLM's native caching. The circuit breaker includes self-healing logic with exponential backoff to attempt recovery once the LMCache service is restored.
  • Circuit Breaker Logic: The new connector manages its state (CLOSED, OPEN, HALF_OPEN) based on LMCache operation successes and failures, with configurable thresholds (e.g., 3 failures to open the circuit) and recovery timeouts.
  • Graceful Degradation: During LMCache outages, operations are short-circuited, and predefined fallback results are returned, ensuring vLLM continues to function without critical dependencies on LMCache.
  • New Configuration Option: Users can now enable this safe connector via the --kv-transfer-config flag, specifying "kv_connector":"SafeLMCacheConnectorV1".
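
To make the highlighted CLOSED/OPEN/HALF_OPEN cycle concrete, here is a minimal sketch of such a state machine (class and method names are assumptions; the thresholds match the ones stated above):

import enum
import time

class CircuitState(enum.Enum):
    CLOSED = "closed"        # normal operation; calls reach LMCache
    OPEN = "open"            # LMCache bypassed; fallbacks returned
    HALF_OPEN = "half_open"  # probe calls allowed to test recovery

class CircuitBreakerSketch:
    def __init__(self, failure_threshold: int = 3,
                 recovery_timeout: float = 30.0):
        self.state = CircuitState.CLOSED
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.opened_at = 0.0

    def allow_call(self) -> bool:
        if self.state is CircuitState.OPEN:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                return False  # still cooling down; short-circuit
            self.state = CircuitState.HALF_OPEN  # let probes through
        return True

    def on_success(self) -> None:
        self.state = CircuitState.CLOSED  # probe or normal call succeeded
        self.failures = 0

    def on_failure(self) -> None:
        self.failures += 1
        if (self.state is CircuitState.HALF_OPEN
                or self.failures >= self.failure_threshold):
            self.state = CircuitState.OPEN
            self.opened_at = time.monotonic()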


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, a small and essential subset of tests designed to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces SafeLMCacheConnectorV1, a robust circuit-breaker wrapper for LMCacheConnectorV1, to enhance system resilience against LMCache service failures. The implementation is solid and well-documented.

My review focuses on improving the maintainability and robustness of the new connector. I've identified a piece of unused code and a magic number that could be refactored. I've also suggested a change to make the fallback mechanism for failed operations more explicit and less prone to silent errors in the future.

Overall, this is a great contribution to improving the stability of vLLM when using LMCache.

Comment on lines 180 to 194
def _get_fallback_result(self, operation: str):
    """Get appropriate fallback result for each operation type."""
    fallbacks = {
        'get_num_new_matched_tokens': (0, False),
        'update_state_after_alloc': None,
        'build_connector_meta': KVConnectorMetadata(),
        'request_finished': (False, None),
        'start_load_kv': None,
        'wait_for_layer_load': None,
        'save_kv_layer': None,
        'wait_for_save': None,
        'get_finished': (None, None),
        'register_kv_caches': None,
    }
    return fallbacks.get(operation)
@gemini-code-assist bot (Contributor)

high

Using fallbacks.get(operation) can silently return None for operations not defined in the fallbacks dictionary. This could lead to hard-to-debug TypeError exceptions if a new method is added to the base connector interface and a fallback is forgotten here. A more robust approach is to explicitly check for the operation and raise an error if it's not found, ensuring all operations are handled.

Suggested change

Before:

def _get_fallback_result(self, operation: str):
    """Get appropriate fallback result for each operation type."""
    fallbacks = {
        'get_num_new_matched_tokens': (0, False),
        'update_state_after_alloc': None,
        'build_connector_meta': KVConnectorMetadata(),
        'request_finished': (False, None),
        'start_load_kv': None,
        'wait_for_layer_load': None,
        'save_kv_layer': None,
        'wait_for_save': None,
        'get_finished': (None, None),
        'register_kv_caches': None,
    }
    return fallbacks.get(operation)

After:

def _get_fallback_result(self, operation: str):
    """Get appropriate fallback result for each operation type."""
    fallbacks = {
        'get_num_new_matched_tokens': (0, False),
        'update_state_after_alloc': None,
        'build_connector_meta': KVConnectorMetadata(),
        'request_finished': (False, None),
        'start_load_kv': None,
        'wait_for_layer_load': None,
        'save_kv_layer': None,
        'wait_for_save': None,
        'get_finished': (None, None),
        'register_kv_caches': None,
    }
    if operation not in fallbacks:
        raise NotImplementedError(
            f"Fallback for operation '{operation}' is not implemented.")
    return fallbacks[operation]

@prashant182 (Author)

The suggestion to raise NotImplementedError for unknown operations contradicts our design goal. This circuit breaker is specifically designed to make LMCache non-critical to vLLM operation. Raising exceptions would crash vLLM when encountering unknown operations, defeating the entire purpose of graceful degradation.
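
For what it's worth, a middle ground between the reviewer's suggestion and the author's goal (purely illustrative, not part of this PR) would be to warn loudly on unknown operations while still returning a safe default:

import logging

logger = logging.getLogger(__name__)

# Same fallback table as in the PR; only the lookup policy differs.
_FALLBACKS = {
    'get_num_new_matched_tokens': (0, False),
    'update_state_after_alloc': None,
    'build_connector_meta': None,   # KVConnectorMetadata() in the PR
    'request_finished': (False, None),
    'start_load_kv': None,
    'wait_for_layer_load': None,
    'save_kv_layer': None,
    'wait_for_save': None,
    'get_finished': (None, None),
    'register_kv_caches': None,
}

def get_fallback_result(operation: str):
    """Warn (but never raise) when an operation has no registered fallback."""
    if operation not in _FALLBACKS:
        logger.warning("No fallback registered for %r; returning None",
                       operation)
    return _FALLBACKS.get(operation)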

@prashant182 force-pushed the main branch 9 times, most recently from 684782a to 8a8edb1 on June 23, 2025 18:56
Signed-off-by: Prashant Patel <pnp249@nyu.edu>
@maobaolong (Contributor) commented Jun 26, 2025

@prashant182 Thanks for this improvement! This SafeLMCacheConnectorV1 is better and more robust than the default LMCache connector, but there is one bad case that this PR cannot cover:

  • The scheduler calls get_num_new_matched_tokens and it returns a non-zero integer as normal, but then something goes wrong when the worker calls wait_for_layer_load or start_load_kv for the same request; by that point the scheduler has already committed to the externally matched tokens, so there is no safe fallback.

I guess this case is covered by #19330, what do you think?

@gargnipungarg commented Jun 30, 2025

Hey @prashant182,
I am using this command to test out local Redis integration for LMCache, but it does not work:

LMCACHE_USE_EXPERIMENTAL=True LMCACHE_LOCAL_CPU=False LMCACHE_CHUNK_SIZE=256 LMCACHE_REMOTE_URL="redis://localhost:6379" LMCACHE_REMOTE_SERDE="naive" vllm serve /models --max-model-len 16384 --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'

Is the safe connector created to handle this?

Error:

                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 53, in __init__
    self._init_executor()
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
    self.collective_rpc("init_device")
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/utils.py", line 2671, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 606, in init_device
    self.worker.init_device()  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker.py", line 193, in init_device
    init_worker_distributed_environment(self.vllm_config, self.rank,
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker.py", line 537, in init_worker_distributed_environment
    ensure_kv_transfer_initialized(vllm_config)
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
    _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector_v0(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 49, in create_connector_v0
    assert issubclass(connector_cls, KVConnectorBase)
