
chore: update vllm-metal version to v0.2.0 and vllm version to 0.19.1 (#876)

Merged

ericcurtin merged 7 commits into main from bump-vllm-and-vllm-metal on Apr 21, 2026
Conversation

@ilopezluna
Contributor

This pull request updates the vLLM Metal backend to use the latest upstream and internal release versions. The main changes are version bumps to ensure compatibility with the newest features and bug fixes.

Dependency version updates:

  • Updated VLLM_UPSTREAM_VERSION to 0.19.1 and VLLM_METAL_RELEASE to v0.2.0-20260420-142150 in .versions to align with the latest vLLM releases.
  • Updated the vllmMetalVersion constant in pkg/inference/backends/vllm/vllm_metal.go to match the new internal release tag.

vLLM: https://github.com/vllm-project/vllm/releases/tag/v0.19.1
vLLM Metal: https://github.com/vllm-project/vllm-metal/releases/tag/v0.2.0-20260420-142150
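
For reference, the updated entries in `.versions` would look like this (assuming the file uses a shell-style KEY=VALUE format; the values are the ones stated above):

```
VLLM_UPSTREAM_VERSION=0.19.1
VLLM_METAL_RELEASE=v0.2.0-20260420-142150
```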

@ilopezluna ilopezluna requested a review from a team April 20, 2026 19:50
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the vLLM upstream version to 0.19.1 and the vLLM metal release to v0.2.0-20260420-142150 across the configuration and source files. A critical review comment identifies a version inconsistency where VLLM_VERSION remains at 0.19.0 despite the upstream update, which could lead to mismatched components within the system.

Comment thread on .versions (outdated)
@ericcurtin
Contributor

Might be worth bumping vllm-metal to 0.19.1 before we do this. 0.19.0 might be better for this PR as .1 is not implemented in vllm-metal.

vllm-metal v0.2.0 caches compiled Metal kernels in
~/.cache/vllm-metal. Add this path to the Python sandbox
file-read* and file-write* allowlists so the engine can
start without a PermissionError.

vllm-metal v0.2.0 compiles Metal kernels at runtime and needs
the Python 3.12 include headers. Instead of removing the entire
include/ directory, only remove non-Python entries to keep the
tarball as small as possible.

vllm-metal v0.2.0 JIT-compiles a paged_ops C++ extension using clang++
at runtime. This fails inside the macOS sandbox, which blocks compiler
invocations. Instead, compile the extension during the tarball build
(where the Xcode CLT is available) and ship the .so in a prebuilt/ dir.

At install time, model-runner copies the pre-built .so into the user's
~/.cache/vllm-metal/ cache directory. vllm-metal's build.py sees the
cached .so is newer than the sources and skips JIT compilation.

This also reverts the include/ directory preservation since the Python
headers are only needed for compilation, which now happens at build time.
@ilopezluna ilopezluna marked this pull request as draft April 21, 2026 08:38
The pre-compiled paged_ops .so extension needs to be dlopen()'d at
runtime, which requires mmap with execute permissions. Add a targeted
file-map-executable allowance for ~/.cache/vllm-metal where the
pre-built extension is cached.
@ilopezluna ilopezluna marked this pull request as ready for review April 21, 2026 09:26
@ilopezluna ilopezluna requested a review from ericcurtin April 21, 2026 09:26
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The prebuilt kernel copy logic in downloadAndExtract swallows most failure modes (missing prebuilt dir, read/write errors, etc.); consider logging when the prebuilt directory is absent or when os.ReadFile fails so it’s easier to diagnose why JIT is still happening at runtime.
  • When copying prebuilt extensions, you currently ReadFile and WriteFile the entire contents with a fixed 0755 mode; using io.Copy with os.Open/os.Create and preserving the original file mode from entry.Info() would avoid loading whole files into memory and better respect upstream permissions.
  • In build-vllm-metal-tarball.sh, the glob cp "$WORK_DIR/.cache/vllm-metal/"*_paged_ops* will fail if no matching files exist or if the cache layout changes; it might be safer to check that at least one matching file exists before copying and emit a clear error message if not.
Individual comment on pkg/inference/backends/vllm/vllm_metal.go (line 191):

```go
homeDir, err := os.UserHomeDir()
```

suggestion: Consider surfacing or aggregating non-transient errors during prebuilt cache setup instead of fully swallowing them.

This block currently ignores all errors except a WARN on failed writes. Since this cache helps avoid JIT failures in the sandbox, it'd be helpful to separate best-effort failures from configuration issues. At minimum, consider logging when `os.UserHomeDir`, `os.ReadDir`, or `os.MkdirAll` fail, or logging once when prebuilt cache setup is skipped entirely, to ease debugging of misconfigured environments.


The code in question (excerpt):

```go
for _, entry := range entries {
	src := filepath.Join(prebuiltDir, entry.Name())
	dst := filepath.Join(cacheDir, entry.Name())
	if data, cpErr := os.ReadFile(src); cpErr == nil {
```

@ericcurtin ericcurtin merged commit 6178207 into main Apr 21, 2026
14 checks passed
@ericcurtin ericcurtin deleted the bump-vllm-and-vllm-metal branch April 21, 2026 09:31