Fix vllm to omit ctx size by default allowing for auto detection #2452

rhatdan merged 4 commits into containers:main from …
Conversation
… llama.cpp Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
Reviewer's Guide

Adjusts the vLLM engine command to omit --max_model_len when context size is 0 (enabling auto-detection), adds a corresponding engine spec fixture, and introduces tests verifying the generated command line for different context sizes.
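For orientation, here is a minimal sketch of what the resulting spec entry could look like. Only the quoted `if: "{{ args.ctx_size > 0 }}"` condition and the --max_model_len flag come from this thread; the surrounding field names (`args`, `name`, `value`) are assumptions about the engine spec schema, not taken from the repository.

```yaml
# Hypothetical excerpt of inference-spec/engines/vllm.yaml; field names other
# than the quoted `if:` condition and the flag are illustrative assumptions.
args:
  - name: "--max_model_len"
    value: "{{ args.ctx_size }}"
    if: "{{ args.ctx_size > 0 }}"   # when false (ctx_size == 0), the flag is omitted and vLLM auto-detects
```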
Hey - I've left some high-level feedback:

- The new `if: "{{ args.ctx_size > 0 }}"` condition assumes `ctx_size` is always defined and numeric; consider guarding for `None`/missing or non-numeric values (e.g., `args.ctx_size and args.ctx_size > 0`) to avoid template evaluation issues (see the sketch after this list).
- There are now two vLLM engine specs (`inference-spec/engines/vllm.yaml` and `test/unit/command/data/engines/vllm.yaml`) with very similar content; consider refactoring to reduce duplication and the risk of these definitions drifting apart.
- The tests cover `ctx_size` explicitly set to `0` and `4096`, but not the case where it is omitted altogether; please confirm that the default value path (whatever `CLIArgs` sets when no context is provided) behaves correctly with the new conditional logic.
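A minimal sketch of that guard, assuming the same hypothetical spec layout as in the Reviewer's Guide sketch above:

```yaml
# Guarded variant suggested in the first bullet; `and` short-circuits, so a
# None/missing ctx_size never reaches the numeric comparison.
if: "{{ args.ctx_size and args.ctx_size > 0 }}"
```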
Code Review
This pull request correctly modifies the vLLM command generation to omit the context size by default, allowing for auto-detection. The change is accompanied by a new unit test to verify the behavior. My review includes a suggestion to improve the maintainability of the test code by reducing duplication.
vLLM was wired to use a ctx size of 2048 by default instead of allowing the runtime to pick up the model's default context size. This fix omits the argument when the user leaves it unspecified and adds tests.
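As a rough sketch of the intended behavior (the list schema here is hypothetical; only the flag name and the tested values come from this PR):

```yaml
# Illustrative ctx_size -> generated-argument mapping; not the actual fixture format.
- ctx_size: 0          # or left unspecified: flag omitted, vLLM auto-detects
  vllm_args: []
- ctx_size: 4096
  vllm_args: ["--max_model_len", "4096"]
```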
Summary by Sourcery
Adjust vLLM command configuration to omit context size by default and only pass it when explicitly set, and add tests and test data to validate the behavior.
Bug Fixes:

- Omit `--max_model_len` from the generated vLLM command when no context size is set, so vLLM can auto-detect the model's context length.

Tests:

- Add unit tests and an engine spec fixture verifying the generated command line with and without an explicit context size.