Skip to content

vLLM v0.14.2 — Pre-built Windows Binary

Choose a tag to compare

@aivrar aivrar released this 28 Feb 17:39
· 16 commits to master since this release

vLLM for Windows — One-Click Install

Pre-built vLLM v0.14.2 with all CUDA kernels compiled. No build tools needed.

Download

vllm-0.14.2-win.zip (371 MB)

Usage

  1. Extract the zip anywhere
  2. Double-click launch.bat
  3. On first run, it auto-installs Python 3.10, PyTorch 2.9.1+cu126, and vLLM (~2.5 GB download)
  4. Select a model from the interactive picker, or pass --model path\to\model
launch.bat                                    # interactive model selector
launch.bat --model E:\models\Qwen2.5-1.5B    # direct launch
launch.bat --model E:\models\Phi-4 --port 8000 --gpu-memory-utilization 0.8

What's included

File Description
launch.bat One-click launcher (start here)
install.bat Portable multi-stage installer (Python, PyTorch, vLLM)
vllm_launcher.py OpenAI-compatible server with interactive model selector
build_wheel.py Re-package script (advanced, for rebuilding the wheel)
dist/vllm-*.whl Pre-built vLLM wheel (380 MB, all 5 compiled .pyd extensions)
vllm-windows.patch Source patch for building from scratch

Compiled extensions included

  • vllm/_C.pyd (142 MB) — core CUDA ops
  • vllm/_moe_C.pyd (91 MB) — mixture of experts
  • vllm/cumem_allocator.pyd — CUDA memory allocator
  • vllm/vllm_flash_attn/_vllm_fa2_C.pyd (426 MB) — Flash Attention 2
  • vllm/vllm_flash_attn/_vllm_fa3_C.pyd (626 MB) — Flash Attention 3

Requirements

  • Windows 10/11 (64-bit)
  • NVIDIA GPU with CUDA Compute Capability 7.0+ (RTX 20xx or newer)
  • CUDA 12.6 runtime (driver 560+)
  • ~5 GB disk space (after install)
  • Internet connection (first run only)

API endpoints

GET  /health                 → {"status": "ok"}
GET  /v1/models              → list loaded models
POST /v1/chat/completions    → OpenAI-compatible chat (with tool calling)
POST /v1/embeddings          → text embeddings (--task embed)

Works with any OpenAI-compatible client — just point base_url at http://127.0.0.1:8100/v1.