Enabling Safe KVConnector #19972


Open · wants to merge 1 commit into base: main

Conversation

@prashant182 commented Jun 23, 2025

PR Description: Add SafeLMCacheConnectorV1 for Non-Critical LMCache Integration

Purpose

Add SafeLMCacheConnectorV1, a circuit-breaker wrapper that prevents vLLM failures when LMCache is unavailable. This makes LMCache non-critical: vLLM continues serving requests using its native caching when LMCache fails, and automatically recovers when the service is restored.

Fixes the issue where LMCache service problems cause complete vLLM system failures.
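
In sketch form, the idea is that every connector call is routed through a guard that swallows LMCache errors and substitutes a harmless fallback. The names below (SafeConnectorSketch, _safe_call, the breaker interface) are illustrative assumptions, not necessarily those used in the actual diff:

class SafeConnectorSketch:
    """Illustrative only: forward calls to the real connector, but never
    let an LMCache exception escape into vLLM."""

    def __init__(self, inner, breaker):
        self._inner = inner      # the real LMCacheConnectorV1
        self._breaker = breaker  # tracks failures, decides open/closed

    def _safe_call(self, operation: str, fallback, *args, **kwargs):
        if not self._breaker.allow_call():
            return fallback  # circuit open: skip LMCache entirely
        try:
            result = getattr(self._inner, operation)(*args, **kwargs)
            self._breaker.on_success()
            return result
        except Exception:
            self._breaker.on_failure()
            return fallback  # degrade to native vLLM caching

    # e.g. a "no external hit" answer keeps the scheduler on the
    # native prefix-cache path
    def get_num_new_matched_tokens(self, *args, **kwargs):
        return self._safe_call("get_num_new_matched_tokens", (0, False),
                               *args, **kwargs)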

Test Plan

# Test with safe connector
--kv-transfer-config '{"kv_connector":"SafeLMCacheConnectorV1", "kv_role":"kv_both"}'

# Break LMCache service (stop daemon, network issues)
# Verify vLLM continues working

# Restore LMCache service  
# Verify automatic recovery
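
# For a fuller picture, a hypothetical end-to-end launch might look like
# the following (the model path is a placeholder; only the
# --kv-transfer-config value comes from this PR):
vllm serve /models/my-model \
    --kv-transfer-config '{"kv_connector":"SafeLMCacheConnectorV1", "kv_role":"kv_both"}'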

Test Result

Circuit Breaker Working:

  • Opens after 3 LMCache failures
  • vLLM continues processing requests without crashes
  • Automatic recovery when LMCache service restored
  • Sub-millisecond overhead during normal operation

System Resilience:

  • Zero system failures during LMCache outages
  • Graceful fallback to native vLLM caching
  • Self-healing with exponential backoff (30-300s); see the sketch below
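
The PR description only states the 30-300s range; assuming a standard doubling schedule, the recovery window might be computed like this sketch:

def recovery_timeout(consecutive_opens: int,
                     base: float = 30.0, cap: float = 300.0) -> float:
    """Double the wait after each failed recovery attempt, capped at 300s."""
    return min(base * (2 ** consecutive_opens), cap)

# 0 opens -> 30s, 1 -> 60s, 2 -> 120s, 3 -> 240s, 4+ -> 300s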

(Optional) Documentation Update

None required. Drop-in replacement for LMCacheConnectorV1 with identical API.
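
For context, v1 connectors are looked up by name through KVConnectorFactory, so a drop-in registration would presumably mirror the existing LMCacheConnectorV1 entry in vllm/distributed/kv_transfer/kv_connector/factory.py (the module path below is a guess):

# Sketch: the module path for the new class is an assumption.
KVConnectorFactory.register_connector(
    "SafeLMCacheConnectorV1",
    "vllm.distributed.kv_transfer.kv_connector.v1.safe_lmcache_connector",
    "SafeLMCacheConnectorV1")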

@gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @prashant182, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the resilience of vLLM's integration with LMCache by introducing a new SafeLMCacheConnectorV1. This connector acts as a circuit breaker, ensuring that vLLM can gracefully handle LMCache service outages by falling back to native caching, thereby preventing system failures and enabling automatic recovery when the LMCache service is restored.

Highlights

  • Enhanced Resilience: Implemented SafeLMCacheConnectorV1 to wrap the existing LMCacheConnectorV1, providing a circuit breaker pattern that prevents vLLM from crashing when LMCache is unavailable.
  • Automatic Fallback & Recovery: When LMCache experiences failures, the system automatically falls back to vLLM's native caching. The circuit breaker includes self-healing logic with exponential backoff to attempt recovery once the LMCache service is restored.
  • Circuit Breaker Logic: The new connector manages its state (CLOSED, OPEN, HALF_OPEN) based on LMCache operation successes and failures, with configurable thresholds (e.g., 3 failures to open the circuit) and recovery timeouts.
  • Graceful Degradation: During LMCache outages, operations are short-circuited, and predefined fallback results are returned, ensuring vLLM continues to function without critical dependencies on LMCache.
  • New Configuration Option: Users can now enable this safe connector via the --kv-transfer-config flag, specifying "kv_connector":"SafeLMCacheConnectorV1".
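
To make the highlighted CLOSED/OPEN/HALF_OPEN cycle concrete, here is a minimal sketch of such a state machine (class and method names are assumptions; the thresholds match the ones stated above):

import enum
import time

class CircuitState(enum.Enum):
    CLOSED = "closed"        # normal operation; calls reach LMCache
    OPEN = "open"            # LMCache bypassed; fallbacks returned
    HALF_OPEN = "half_open"  # probe calls allowed to test recovery

class CircuitBreakerSketch:
    def __init__(self, failure_threshold: int = 3,
                 recovery_timeout: float = 30.0):
        self.state = CircuitState.CLOSED
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.opened_at = 0.0

    def allow_call(self) -> bool:
        if self.state is CircuitState.OPEN:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                return False  # still cooling down; short-circuit
            self.state = CircuitState.HALF_OPEN  # let probes through
        return True

    def on_success(self) -> None:
        self.state = CircuitState.CLOSED  # probe or normal call succeeded
        self.failures = 0

    def on_failure(self) -> None:
        self.failures += 1
        if (self.state is CircuitState.HALF_OPEN
                or self.failures >= self.failure_threshold):
            self.state = CircuitState.OPEN
            self.opened_at = time.monotonic()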


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, a small and essential subset of tests designed to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces SafeLMCacheConnectorV1, a robust circuit-breaker wrapper for LMCacheConnectorV1, to enhance system resilience against LMCache service failures. The implementation is solid and well-documented.

My review focuses on improving the maintainability and robustness of the new connector. I've identified a piece of unused code and a magic number that could be refactored. I've also suggested a change to make the fallback mechanism for failed operations more explicit and less prone to silent errors in the future.

Overall, this is a great contribution to improving the stability of vLLM when using LMCache.

Comment on lines 180 to 194
def _get_fallback_result(self, operation: str):
    """Get appropriate fallback result for each operation type."""
    fallbacks = {
        'get_num_new_matched_tokens': (0, False),
        'update_state_after_alloc': None,
        'build_connector_meta': KVConnectorMetadata(),
        'request_finished': (False, None),
        'start_load_kv': None,
        'wait_for_layer_load': None,
        'save_kv_layer': None,
        'wait_for_save': None,
        'get_finished': (None, None),
        'register_kv_caches': None,
    }
    return fallbacks.get(operation)
@gemini-code-assist bot (Contributor)

high

Using fallbacks.get(operation) can silently return None for operations not defined in the fallbacks dictionary. This could lead to hard-to-debug TypeError exceptions if a new method is added to the base connector interface and a fallback is forgotten here. A more robust approach is to explicitly check for the operation and raise an error if it's not found, ensuring all operations are handled.

Suggested change

Before:

def _get_fallback_result(self, operation: str):
    """Get appropriate fallback result for each operation type."""
    fallbacks = {
        'get_num_new_matched_tokens': (0, False),
        'update_state_after_alloc': None,
        'build_connector_meta': KVConnectorMetadata(),
        'request_finished': (False, None),
        'start_load_kv': None,
        'wait_for_layer_load': None,
        'save_kv_layer': None,
        'wait_for_save': None,
        'get_finished': (None, None),
        'register_kv_caches': None,
    }
    return fallbacks.get(operation)

After:

def _get_fallback_result(self, operation: str):
    """Get appropriate fallback result for each operation type."""
    fallbacks = {
        'get_num_new_matched_tokens': (0, False),
        'update_state_after_alloc': None,
        'build_connector_meta': KVConnectorMetadata(),
        'request_finished': (False, None),
        'start_load_kv': None,
        'wait_for_layer_load': None,
        'save_kv_layer': None,
        'wait_for_save': None,
        'get_finished': (None, None),
        'register_kv_caches': None,
    }
    if operation not in fallbacks:
        raise NotImplementedError(
            f"Fallback for operation '{operation}' is not implemented.")
    return fallbacks[operation]

@prashant182 (Author)

The suggestion to raise NotImplementedError for unknown operations contradicts our design goal. This circuit breaker is specifically designed to make LMCache non-critical to vLLM operation. Raising exceptions would crash vLLM when encountering unknown operations, defeating the entire purpose of graceful degradation.
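
For what it's worth, a middle ground between the reviewer's suggestion and the author's goal (purely illustrative, not part of this PR) would be to warn loudly on unknown operations while still returning a safe default:

import logging

logger = logging.getLogger(__name__)

# Same fallback table as in the PR; only the lookup policy differs.
_FALLBACKS = {
    'get_num_new_matched_tokens': (0, False),
    'update_state_after_alloc': None,
    'build_connector_meta': None,   # KVConnectorMetadata() in the PR
    'request_finished': (False, None),
    'start_load_kv': None,
    'wait_for_layer_load': None,
    'save_kv_layer': None,
    'wait_for_save': None,
    'get_finished': (None, None),
    'register_kv_caches': None,
}

def get_fallback_result(operation: str):
    """Warn (but never raise) when an operation has no registered fallback."""
    if operation not in _FALLBACKS:
        logger.warning("No fallback registered for %r; returning None",
                       operation)
    return _FALLBACKS.get(operation)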

@prashant182 force-pushed the main branch 9 times, most recently from 684782a to 8a8edb1 on June 23, 2025 18:56
Signed-off-by: Prashant Patel <pnp249@nyu.edu>
@maobaolong (Contributor) commented Jun 26, 2025

@prashant182 Thanks for this improvement! This SafeLMCacheConnectorV1 is better and more robust than the default LMCache connector, but there is one bad case that this PR cannot cover:

  • The scheduler calls get_num_new_matched_tokens and it returns a non-zero integer as normal, but then something goes wrong when the worker calls wait_for_layer_load or start_load_kv for the same request; by that point the scheduler has already committed to the externally matched tokens, so there is no safe fallback.

I guess this case is covered by #19330, what do you think?

@gargnipungarg commented Jun 30, 2025

Hey @prashant182,
I am using this command to test out local Redis integration for LMCache, but it does not work:

LMCACHE_USE_EXPERIMENTAL=True LMCACHE_LOCAL_CPU=False LMCACHE_CHUNK_SIZE=256 LMCACHE_REMOTE_URL="redis://localhost:6379" LMCACHE_REMOTE_SERDE="naive" vllm serve /models --max-model-len 16384 --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'

Is the safe connector created to handle this?

Error:

                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 53, in __init__
    self._init_executor()
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
    self.collective_rpc("init_device")
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/utils.py", line 2671, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 606, in init_device
    self.worker.init_device()  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker.py", line 193, in init_device
    init_worker_distributed_environment(self.vllm_config, self.rank,
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker.py", line 537, in init_worker_distributed_environment
    ensure_kv_transfer_initialized(vllm_config)
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
    _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector_v0(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/vllm/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 49, in create_connector_v0
    assert issubclass(connector_cls, KVConnectorBase)
