chore: update vllm-metal version to v0.2.0 and vllm version to 0.19.1#876
chore: update vllm-metal version to v0.2.0 and vllm version to 0.19.1#876ericcurtin merged 7 commits intomainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the vLLM upstream version to 0.19.1 and the vLLM metal release to v0.2.0-20260420-142150 across the configuration and source files. A critical review comment identifies a version inconsistency where VLLM_VERSION remains at 0.19.0 despite the upstream update, which could lead to mismatched components within the system.
|
Might be worth bumping vllm-metal to 0.19.1 before we do this. 0.19.0 might be better for this PR as .1 is not implemented in vllm-metal. |
vllm-metal v0.2.0 caches compiled Metal kernels in ~/.cache/vllm-metal. Add this path to the Python sandbox file-read* and file-write* allowlists so the engine can start without a PermissionError.
vllm-metal v0.2.0 compiles Metal kernels at runtime and needs the Python 3.12 include headers. Instead of removing the entire include/ directory, only remove non-Python entries to keep the tarball as small as possible.
vllm-metal v0.2.0 JIT-compiles a paged_ops C++ extension using clang++ at runtime. This fails inside the macOS sandbox which blocks compiler invocations. Instead, compile the extension during the tarball build (where Xcode CLT is available) and ship the .so in a prebuilt/ dir. At install time, model-runner copies the pre-built .so into the user's ~/.cache/vllm-metal/ cache directory. vllm-metal's build.py sees the cached .so is newer than the sources and skips JIT compilation. This also reverts the include/ directory preservation since the Python headers are only needed for compilation, which now happens at build time.
The pre-compiled paged_ops .so extension needs to be dlopen()'d at runtime, which requires mmap with execute permissions. Add a targeted file-map-executable allowance for ~/.cache/vllm-metal where the pre-built extension is cached.
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The prebuilt kernel copy logic in
downloadAndExtractswallows most failure modes (missingprebuiltdir, read/write errors, etc.); consider logging when the prebuilt directory is absent or whenos.ReadFilefails so it’s easier to diagnose why JIT is still happening at runtime. - When copying prebuilt extensions, you currently
ReadFileandWriteFilethe entire contents with a fixed0755mode; usingio.Copywithos.Open/os.Createand preserving the original file mode fromentry.Info()would avoid loading whole files into memory and better respect upstream permissions. - In
build-vllm-metal-tarball.sh, the globcp "$WORK_DIR/.cache/vllm-metal/"*_paged_ops*will fail if no matching files exist or if the cache layout changes; it might be safer to check that at least one matching file exists before copying and emit a clear error message if not.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The prebuilt kernel copy logic in `downloadAndExtract` swallows most failure modes (missing `prebuilt` dir, read/write errors, etc.); consider logging when the prebuilt directory is absent or when `os.ReadFile` fails so it’s easier to diagnose why JIT is still happening at runtime.
- When copying prebuilt extensions, you currently `ReadFile` and `WriteFile` the entire contents with a fixed `0755` mode; using `io.Copy` with `os.Open`/`os.Create` and preserving the original file mode from `entry.Info()` would avoid loading whole files into memory and better respect upstream permissions.
- In `build-vllm-metal-tarball.sh`, the glob `cp "$WORK_DIR/.cache/vllm-metal/"*_paged_ops*` will fail if no matching files exist or if the cache layout changes; it might be safer to check that at least one matching file exists before copying and emit a clear error message if not.
## Individual Comments
### Comment 1
<location path="pkg/inference/backends/vllm/vllm_metal.go" line_range="191" />
<code_context>
+ homeDir, err := os.UserHomeDir()
</code_context>
<issue_to_address>
**suggestion:** Consider surfacing or aggregating non-transient errors during prebuilt cache setup instead of fully swallowing them.
This block currently ignores all errors except a WARN on failed writes. Since this cache helps avoid JIT failures in the sandbox, it’d be helpful to separate best-effort failures from configuration issues. At minimum, consider logging when `os.UserHomeDir`, `os.ReadDir`, or `os.MkdirAll` fail, or logging once when prebuilt cache setup is skipped entirely, to ease debugging of misconfigured environments.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| for _, entry := range entries { | ||
| src := filepath.Join(prebuiltDir, entry.Name()) | ||
| dst := filepath.Join(cacheDir, entry.Name()) | ||
| if data, cpErr := os.ReadFile(src); cpErr == nil { |
There was a problem hiding this comment.
suggestion: Consider surfacing or aggregating non-transient errors during prebuilt cache setup instead of fully swallowing them.
This block currently ignores all errors except a WARN on failed writes. Since this cache helps avoid JIT failures in the sandbox, it’d be helpful to separate best-effort failures from configuration issues. At minimum, consider logging when os.UserHomeDir, os.ReadDir, or os.MkdirAll fail, or logging once when prebuilt cache setup is skipped entirely, to ease debugging of misconfigured environments.
This pull request updates the vLLM Metal backend to use the latest upstream and internal release versions. The main changes are version bumps to ensure compatibility with the newest features and bug fixes.
Dependency version updates:
VLLM_UPSTREAM_VERSIONto0.19.1andVLLM_METAL_RELEASEtov0.2.0-20260420-142150in.versionsto align with the latest vLLM releases.vllmMetalVersionconstant inpkg/inference/backends/vllm/vllm_metal.goto match the new internal release tag.vLLM: https://github.com/vllm-project/vllm/releases/tag/v0.19.1
vLLM Metal: https://github.com/vllm-project/vllm-metal/releases/tag/v0.2.0-20260420-142150