
Fix vllm to omit ctx size by default allowing for auto detection#2452

Merged
rhatdan merged 4 commits into containers:main from ramalama-labs:feat/vllm-default-ctx
Feb 24, 2026
Conversation

@ieaves (Collaborator) commented Feb 23, 2026

vLLM was wired to a default ctx size of 2048 instead of letting the runtime pick up the model's default context size. This fix omits the argument when the user does not specify one, and adds tests.

Summary by Sourcery

Adjust vLLM command configuration to omit context size by default and only pass it when explicitly set, and add tests and test data to validate the behavior.

Bug Fixes:

  • Ensure vLLM server commands no longer default to a fixed max_model_len and instead rely on auto-detection when no context size is provided.

Tests:

  • Add unit tests and test spec data for vLLM engine command generation covering cases with and without an explicit context size.

… llama.cpp

Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
@sourcery-ai bot (Contributor) commented Feb 23, 2026

Reviewer's Guide

Adjusts the vLLM engine command to omit --max_model_len when context size is 0 (enabling auto-detection), adds a corresponding engine spec fixture, and introduces tests verifying the generated command line for different context sizes.

File-Level Changes

Change Details Files
Make vLLM runtime omit --max_model_len when context size is zero to allow engine-side auto-detection.
  • Changed vllm engine spec to set --max_model_len value directly from args.ctx_size
  • Added a conditional so --max_model_len is only included when args.ctx_size > 0 instead of always defaulting to 2048
inference-spec/engines/vllm.yaml
Add unit coverage for vLLM command generation with and without explicit context size.
  • Introduced parameterized test to assert generated vLLM serve command when ctx_size is 0 vs 4096
  • Constructed Ramalama*Context objects and used CommandFactory to build the command based on CLIArgs
test/unit/command/test_factory.py
Provide a dedicated vLLM engine spec fixture for unit tests.
  • Added a test-only vLLM engine YAML spec mirroring the runtime spec behavior
  • Ensured the spec uses conditional --max_model_len emission based on args.ctx_size > 0
test/unit/command/data/engines/vllm.yaml
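The conditional emission described above can be sketched in plain Python. The actual change lives in the YAML engine spec, and the builder below is a hypothetical stand-in, not the project's real API; it only illustrates the intended behavior:

```python
def vllm_args(ctx_size=0):
    """Build vLLM serve arguments, omitting --max_model_len unless set."""
    args = ["serve"]
    if ctx_size and ctx_size > 0:
        # The user explicitly requested a context size: pass it through.
        args += ["--max_model_len", str(ctx_size)]
    # Otherwise the flag is omitted and vLLM auto-detects the context
    # length from the model's own configuration.
    return args
```

With `ctx_size=0` (or unset) the flag never appears, which is exactly the auto-detection path the spec change enables.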


@ieaves changed the title from "fix vllm to omit ctx size by default allowing for auto detection like…" to "Fix vllm to omit ctx size by default allowing for auto detection" Feb 23, 2026
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @ieaves, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the vLLM inference engine configuration to enable automatic context size detection. By making the --max_model_len argument conditional, the system now defaults to allowing vLLM to determine the optimal context length when no specific size is provided, mirroring the behavior of llama.cpp. This change improves flexibility and reduces the need for explicit context size configuration, accompanied by new tests to ensure correct functionality.

Highlights

  • vLLM Context Size Handling: Modified the vllm.yaml engine specification to conditionally include the --max_model_len argument only when a context size (args.ctx_size) greater than 0 is provided. This allows vLLM to auto-detect the context size by default, aligning with llama.cpp's behavior.
  • New vLLM Test Specification: Introduced a new test specification file test/unit/command/data/engines/vllm.yaml to reflect the updated vLLM engine configuration.
  • Unit Test for vLLM Context Size: Added a new parameterized unit test in test_factory.py to validate that the --max_model_len argument is correctly omitted or included based on the provided context size for vLLM commands.


Changelog
  • inference-spec/engines/vllm.yaml
    • Removed the default value for --max_model_len and added a conditional if: "{{ args.ctx_size > 0 }}" to its inclusion.
  • test/unit/command/data/engines/vllm.yaml
    • Added a new YAML file defining the vllm engine specification for testing purposes, including the conditional --max_model_len argument.
  • test/unit/command/test_factory.py
    • Introduced test_vllm_ctx_size, a parameterized test function to verify the command generation for vllm with and without a specified context size.
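The parameterized test described in the changelog can be sketched framework-free. The real test uses pytest parameterization together with `CommandFactory` and `Ramalama*Context` objects; the stand-in builder here is hypothetical and only mirrors the asserted behavior:

```python
def build_vllm_serve_command(ctx_size):
    # Hypothetical stand-in for the CommandFactory-generated command.
    cmd = ["vllm", "serve", "model"]
    if ctx_size and ctx_size > 0:
        cmd += ["--max_model_len", str(ctx_size)]
    return cmd

# The two parameterized cases: 0 omits the flag, 4096 includes it.
for ctx_size, expect_flag in [(0, False), (4096, True)]:
    cmd = build_vllm_serve_command(ctx_size)
    assert ("--max_model_len" in cmd) == expect_flag
```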

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@ieaves ieaves temporarily deployed to macos-installer February 23, 2026 21:48 — with GitHub Actions Inactive
@sourcery-ai bot (Contributor) left a comment

Hey - I've left some high level feedback:

  • The new if: "{{ args.ctx_size > 0 }}" condition assumes ctx_size is always defined and numeric; consider guarding for None/missing or non-numeric values (e.g., args.ctx_size and args.ctx_size > 0) to avoid template evaluation issues.
  • There are now two vLLM engine specs (inference-spec/engines/vllm.yaml and test/unit/command/data/engines/vllm.yaml) with very similar content; consider refactoring to reduce duplication and the risk of these definitions drifting apart.
  • The tests cover ctx_size explicitly set to 0 and 4096, but not the case where it is omitted altogether; please confirm that the default value path (whatever CLIArgs sets when no context is provided) behaves correctly with the new conditional logic.
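The guard suggested in the first bullet relies on short-circuit evaluation, so the condition stays safe when `ctx_size` is `None`. A small Python illustration (the helper name is invented for this sketch):

```python
def include_max_model_len(ctx_size):
    # `ctx_size and ctx_size > 0` short-circuits: None and 0 are falsy,
    # so the `> 0` comparison only runs on a real positive value and a
    # missing/None ctx_size can never raise a TypeError here.
    return bool(ctx_size and ctx_size > 0)
```

`include_max_model_len(None)` and `include_max_model_len(0)` both return `False`, while `include_max_model_len(4096)` returns `True`.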

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly modifies the vLLM command generation to omit the context size by default, allowing for auto-detection. The change is accompanied by a new unit test to verify the behavior. My review includes a suggestion to improve the maintainability of the test code by reducing duplication.

Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
@ieaves ieaves temporarily deployed to macos-installer February 23, 2026 21:53 — with GitHub Actions Inactive
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
@ieaves ieaves temporarily deployed to macos-installer February 23, 2026 22:08 — with GitHub Actions Inactive
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
@ieaves ieaves temporarily deployed to macos-installer February 24, 2026 17:06 — with GitHub Actions Inactive
@rhatdan (Member) left a comment

LGTM

@rhatdan rhatdan merged commit 421ee0a into containers:main Feb 24, 2026
35 checks passed