
Fix vllm to omit ctx size by default allowing for auto detection#2452

Merged
rhatdan merged 4 commits into containers:main from ramalama-labs:feat/vllm-default-ctx
Feb 24, 2026
Conversation

@ieaves (Collaborator) commented Feb 23, 2026

vLLM was wired to a default ctx size of 2048 instead of letting the runtime pick up the model's default context size. This fix omits the argument when the user does not specify one, and adds tests.

Summary by Sourcery

Adjust vLLM command configuration to omit context size by default and only pass it when explicitly set, and add tests and test data to validate the behavior.

Bug Fixes:

  • Ensure vLLM server commands no longer default to a fixed max_model_len and instead rely on auto-detection when no context size is provided.

Tests:

  • Add unit tests and test spec data for vLLM engine command generation covering cases with and without an explicit context size.

… llama.cpp

Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
@sourcery-ai bot (Contributor) commented Feb 23, 2026

Reviewer's Guide

Adjusts the vLLM engine command to omit --max_model_len when context size is 0 (enabling auto-detection), adds a corresponding engine spec fixture, and introduces tests verifying the generated command line for different context sizes.

File-Level Changes

Change Details Files
Make vLLM runtime omit --max_model_len when context size is zero to allow engine-side auto-detection.
  • Changed vllm engine spec to set --max_model_len value directly from args.ctx_size
  • Added a conditional so --max_model_len is only included when args.ctx_size > 0 instead of always defaulting to 2048
inference-spec/engines/vllm.yaml
Add unit coverage for vLLM command generation with and without explicit context size.
  • Introduced parameterized test to assert generated vLLM serve command when ctx_size is 0 vs 4096
  • Constructed Ramalama*Context objects and used CommandFactory to build the command based on CLIArgs
test/unit/command/test_factory.py
Provide a dedicated vLLM engine spec fixture for unit tests.
  • Added a test-only vLLM engine YAML spec mirroring the runtime spec behavior
  • Ensured the spec uses conditional --max_model_len emission based on args.ctx_size > 0
test/unit/command/data/engines/vllm.yaml
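The conditional emission described above can be sketched in plain Python. The actual change lives in the YAML engine spec, and the builder below is a hypothetical stand-in, not the project's real API; it only illustrates the intended behavior:

```python
def vllm_args(ctx_size=0):
    """Build vLLM serve arguments, omitting --max_model_len unless set."""
    args = ["serve"]
    if ctx_size and ctx_size > 0:
        # The user explicitly requested a context size: pass it through.
        args += ["--max_model_len", str(ctx_size)]
    # Otherwise the flag is omitted and vLLM auto-detects the context
    # length from the model's own configuration.
    return args
```

With `ctx_size=0` (or unset) the flag never appears, which is exactly the auto-detection path the spec change enables.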


@ieaves changed the title from "fix vllm to omit ctx size by default allowing for auto detection like…" to "Fix vllm to omit ctx size by default allowing for auto detection" Feb 23, 2026
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @ieaves, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the vLLM inference engine configuration to enable automatic context size detection. By making the --max_model_len argument conditional, the system now defaults to allowing vLLM to determine the optimal context length when no specific size is provided, mirroring the behavior of llama.cpp. This change improves flexibility and reduces the need for explicit context size configuration, accompanied by new tests to ensure correct functionality.

Highlights

  • vLLM Context Size Handling: Modified the vllm.yaml engine specification to conditionally include the --max_model_len argument only when a context size (args.ctx_size) greater than 0 is provided. This allows vLLM to auto-detect the context size by default, aligning with llama.cpp's behavior.
  • New vLLM Test Specification: Introduced a new test specification file test/unit/command/data/engines/vllm.yaml to reflect the updated vLLM engine configuration.
  • Unit Test for vLLM Context Size: Added a new parameterized unit test in test_factory.py to validate that the --max_model_len argument is correctly omitted or included based on the provided context size for vLLM commands.


Changelog
  • inference-spec/engines/vllm.yaml
    • Removed the default value for --max_model_len and added a conditional if: "{{ args.ctx_size > 0 }}" to its inclusion.
  • test/unit/command/data/engines/vllm.yaml
    • Added a new YAML file defining the vllm engine specification for testing purposes, including the conditional --max_model_len argument.
  • test/unit/command/test_factory.py
    • Introduced test_vllm_ctx_size, a parameterized test function to verify the command generation for vllm with and without a specified context size.
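The parameterized test described in the changelog can be sketched framework-free. The real test uses pytest parameterization together with `CommandFactory` and `Ramalama*Context` objects; the stand-in builder here is hypothetical and only mirrors the asserted behavior:

```python
def build_vllm_serve_command(ctx_size):
    # Hypothetical stand-in for the CommandFactory-generated command.
    cmd = ["vllm", "serve", "model"]
    if ctx_size and ctx_size > 0:
        cmd += ["--max_model_len", str(ctx_size)]
    return cmd

# The two parameterized cases: 0 omits the flag, 4096 includes it.
for ctx_size, expect_flag in [(0, False), (4096, True)]:
    cmd = build_vllm_serve_command(ctx_size)
    assert ("--max_model_len" in cmd) == expect_flag
```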

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@ieaves ieaves temporarily deployed to macos-installer February 23, 2026 21:48 — with GitHub Actions Inactive
@sourcery-ai bot (Contributor) left a comment

Hey - I've left some high level feedback:

  • The new if: "{{ args.ctx_size > 0 }}" condition assumes ctx_size is always defined and numeric; consider guarding for None/missing or non-numeric values (e.g., args.ctx_size and args.ctx_size > 0) to avoid template evaluation issues.
  • There are now two vLLM engine specs (inference-spec/engines/vllm.yaml and test/unit/command/data/engines/vllm.yaml) with very similar content; consider refactoring to reduce duplication and the risk of these definitions drifting apart.
  • The tests cover ctx_size explicitly set to 0 and 4096, but not the case where it is omitted altogether; please confirm that the default value path (whatever CLIArgs sets when no context is provided) behaves correctly with the new conditional logic.
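The guard suggested in the first bullet relies on short-circuit evaluation, so the condition stays safe when `ctx_size` is `None`. A small Python illustration (the helper name is invented for this sketch):

```python
def include_max_model_len(ctx_size):
    # `ctx_size and ctx_size > 0` short-circuits: None and 0 are falsy,
    # so the `> 0` comparison only runs on a real positive value and a
    # missing/None ctx_size can never raise a TypeError here.
    return bool(ctx_size and ctx_size > 0)
```

`include_max_model_len(None)` and `include_max_model_len(0)` both return `False`, while `include_max_model_len(4096)` returns `True`.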

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly modifies the vLLM command generation to omit the context size by default, allowing for auto-detection. The change is accompanied by a new unit test to verify the behavior. My review includes a suggestion to improve the maintainability of the test code by reducing duplication.

Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
@ieaves ieaves temporarily deployed to macos-installer February 23, 2026 21:53 — with GitHub Actions Inactive
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
@ieaves ieaves temporarily deployed to macos-installer February 23, 2026 22:08 — with GitHub Actions Inactive
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
@ieaves ieaves temporarily deployed to macos-installer February 24, 2026 17:06 — with GitHub Actions Inactive
@rhatdan (Member) left a comment

LGTM

@rhatdan rhatdan merged commit 421ee0a into containers:main Feb 24, 2026
35 checks passed