Enabling Safe KVConnector #19972
Conversation
Summary of Changes
Hello @prashant182, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the resilience of vLLM's integration with LMCache by introducing a new SafeLMCacheConnectorV1. This connector acts as a circuit breaker, ensuring that vLLM can gracefully handle LMCache service outages by falling back to native caching, thereby preventing system failures and enabling automatic recovery when the LMCache service is restored.
Highlights
- Enhanced Resilience: Implemented SafeLMCacheConnectorV1 to wrap the existing LMCacheConnectorV1, providing a circuit breaker pattern that prevents vLLM from crashing when LMCache is unavailable.
- Automatic Fallback & Recovery: When LMCache experiences failures, the system automatically falls back to vLLM's native caching. The circuit breaker includes self-healing logic with exponential backoff to attempt recovery once the LMCache service is restored.
- Circuit Breaker Logic: The new connector manages its state (CLOSED, OPEN, HALF_OPEN) based on LMCache operation successes and failures, with configurable thresholds (e.g., 3 failures to open the circuit) and recovery timeouts.
- Graceful Degradation: During LMCache outages, operations are short-circuited and predefined fallback results are returned, ensuring vLLM continues to function without a critical dependency on LMCache.
- New Configuration Option: Users can now enable this safe connector via the --kv-transfer-config flag, specifying "kv_connector":"SafeLMCacheConnectorV1" (see the example below).
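A sketch of how enabling the connector might look with vLLM's offline API (the model name is a placeholder; "SafeLMCacheConnectorV1" is the connector added by this PR, while `KVTransferConfig` and `kv_role="kv_both"` follow the existing LMCache setup):

```python
from vllm import LLM
from vllm.config import KVTransferConfig

# Swap LMCacheConnectorV1 for the safe wrapper; if LMCache fails, the
# circuit breaker opens and vLLM falls back to its native caching.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    kv_transfer_config=KVTransferConfig(
        kv_connector="SafeLMCacheConnectorV1",
        kv_role="kv_both",
    ),
)
```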
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Code Review
This pull request introduces SafeLMCacheConnectorV1, a robust circuit-breaker wrapper for LMCacheConnectorV1, to enhance system resilience against LMCache service failures. The implementation is solid and well-documented.
My review focuses on improving the maintainability and robustness of the new connector. I've identified a piece of unused code and a magic number that could be refactored. I've also suggested a change to make the fallback mechanism for failed operations more explicit and less prone to silent errors in the future.
Overall, this is a great contribution to improving the stability of vLLM when using LMCache.
```python
def _get_fallback_result(self, operation: str):
    """Get appropriate fallback result for each operation type."""
    fallbacks = {
        'get_num_new_matched_tokens': (0, False),
        'update_state_after_alloc': None,
        'build_connector_meta': KVConnectorMetadata(),
        'request_finished': (False, None),
        'start_load_kv': None,
        'wait_for_layer_load': None,
        'save_kv_layer': None,
        'wait_for_save': None,
        'get_finished': (None, None),
        'register_kv_caches': None,
    }
    return fallbacks.get(operation)
```
Using `fallbacks.get(operation)` can silently return `None` for operations not defined in the `fallbacks` dictionary. This could lead to hard-to-debug `TypeError` exceptions if a new method is added to the base connector interface and a fallback is forgotten here. A more robust approach is to explicitly check for the operation and raise an error if it's not found, ensuring all operations are handled.
Suggested change:

```python
def _get_fallback_result(self, operation: str):
    """Get appropriate fallback result for each operation type."""
    fallbacks = {
        'get_num_new_matched_tokens': (0, False),
        'update_state_after_alloc': None,
        'build_connector_meta': KVConnectorMetadata(),
        'request_finished': (False, None),
        'start_load_kv': None,
        'wait_for_layer_load': None,
        'save_kv_layer': None,
        'wait_for_save': None,
        'get_finished': (None, None),
        'register_kv_caches': None,
    }
    if operation not in fallbacks:
        raise NotImplementedError(
            f"Fallback for operation '{operation}' is not implemented.")
    return fallbacks[operation]
```
The suggestion to raise NotImplementedError for unknown operations contradicts our design goal. This circuit breaker is specifically designed to make LMCache non-critical to vLLM operation. Raising exceptions would crash vLLM when encountering unknown operations, defeating the entire purpose of graceful degradation.
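A minimal, self-contained sketch of the pattern being defended here (class and method names are illustrative, not necessarily the PR's exact code; the 3-failure threshold and exponential backoff match the behavior described above). Every delegated call catches LMCache errors, records them with the breaker, and returns the predefined fallback instead of raising:

```python
import logging
import time

logger = logging.getLogger(__name__)


class SafeCallSketch:
    """Illustrative circuit breaker: LMCache errors are recorded and
    converted into predefined fallbacks instead of propagating."""

    def __init__(self, failure_threshold=3, base_timeout=1.0):
        self.state = "CLOSED"          # CLOSED -> OPEN -> HALF_OPEN
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.base_timeout = base_timeout
        self.opened_at = 0.0
        self.open_count = 0            # drives exponential backoff

    def _recovery_timeout(self):
        # Exponential backoff between recovery probes: 1s, 2s, 4s, ...
        return self.base_timeout * (2 ** max(self.open_count - 1, 0))

    def _allow(self):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at < self._recovery_timeout():
                return False           # still open: short-circuit
            self.state = "HALF_OPEN"   # let one trial call through
        return True

    def safe_call(self, operation, func, fallback, *args, **kwargs):
        """Run func; on any LMCache error return fallback, never raise."""
        if not self._allow():
            return fallback
        try:
            result = func(*args, **kwargs)
            if self.state == "HALF_OPEN":
                logger.info("LMCache recovered; closing circuit")
            self.state, self.failures = "CLOSED", 0
            return result
        except Exception:
            self.failures += 1
            if (self.state == "HALF_OPEN"
                    or self.failures >= self.failure_threshold):
                self.state = "OPEN"
                self.opened_at = time.monotonic()
                self.open_count += 1
                logger.warning("LMCache %r failed; circuit opened,"
                               " falling back to native caching", operation)
            return fallback            # vLLM keeps serving
```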
(Two review comments on vllm/distributed/kv_transfer/kv_connector/v1/safe_lmcache_connector.py — outdated, resolved.)
Force-pushed from 684782a to 8a8edb1.
Signed-off-by: Prashant Patel <pnp249@nyu.edu>
@prashant182 Thanks for this improvement! I guess this case is covered by #19330, what do you think?
Hey @prashant182
Is the safe connector created to handle this? Error -
PR Description: Add SafeLMCacheConnectorV1 for Non-Critical LMCache Integration
Purpose
Add SafeLMCacheConnectorV1, a circuit breaker wrapper that prevents vLLM failures when LMCache is unavailable. This makes LMCache non-critical - vLLM continues serving requests using native caching when LMCache fails, and automatically recovers when service is restored.
Fixes the issue where LMCache service problems cause complete vLLM system failures.
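A rough illustration of that behavior (hypothetical harness: `SafeCallSketch` is the breaker sketch from the review thread above, and the flaky function stands in for a real LMCache call):

```python
breaker = SafeCallSketch(failure_threshold=3)

def flaky_lmcache_lookup():
    raise ConnectionError("LMCache service unreachable")

# The first three failures are counted, then the circuit opens; every
# call still returns the fallback (0, False) -- "no cached tokens" --
# so vLLM keeps serving with native caching throughout.
for _ in range(5):
    result = breaker.safe_call(
        "get_num_new_matched_tokens", flaky_lmcache_lookup, (0, False))
    print(breaker.state, result)
```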
Test Plan
Test Result
Circuit Breaker Working:
System Resilience:
(Optional) Documentation Update
None required. Drop-in replacement for LMCacheConnectorV1 with identical API.