Release vLLM v0.14.2 — Pre-built Windows Binary · aivrar/vllm-windows-build

vLLM for Windows — One-Click Install

Pre-built vLLM v0.14.2 with all CUDA kernels compiled. No build tools needed.

Download

Usage

Extract the zip anywhere
Double-click launch.bat
On first run, it auto-installs Python 3.10, PyTorch 2.9.1+cu126, and vLLM (~2.5 GB download)
Select a model from the interactive picker, or pass --model path\to\model

launch.bat                                    # interactive model selector
launch.bat --model E:\models\Qwen2.5-1.5B    # direct launch
launch.bat --model E:\models\Phi-4 --port 8000 --gpu-memory-utilization 0.8

What's included

File	Description
`launch.bat`	One-click launcher (start here)
`install.bat`	Portable multi-stage installer (Python, PyTorch, vLLM)
`vllm_launcher.py`	OpenAI-compatible server with interactive model selector
`build_wheel.py`	Re-package script (advanced, for rebuilding the wheel)
`dist/vllm-*.whl`	Pre-built vLLM wheel (380 MB, all 5 compiled .pyd extensions)
`vllm-windows.patch`	Source patch for building from scratch

Compiled extensions included

vllm/_C.pyd (142 MB) — core CUDA ops
vllm/_moe_C.pyd (91 MB) — mixture of experts
vllm/cumem_allocator.pyd — CUDA memory allocator
vllm/vllm_flash_attn/_vllm_fa2_C.pyd (426 MB) — Flash Attention 2
vllm/vllm_flash_attn/_vllm_fa3_C.pyd (626 MB) — Flash Attention 3

Requirements

Windows 10/11 (64-bit)
NVIDIA GPU with CUDA Compute Capability 7.0+ (RTX 20xx or newer)
CUDA 12.6 runtime (driver 560+)
~5 GB disk space (after install)
Internet connection (first run only)

API endpoints

GET  /health                 → {"status": "ok"}
GET  /v1/models              → list loaded models
POST /v1/chat/completions    → OpenAI-compatible chat (with tool calling)
POST /v1/embeddings          → text embeddings (--task embed)

Works with any OpenAI-compatible client — just point base_url at http://127.0.0.1:8100/v1.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vLLM v0.14.2 — Pre-built Windows Binary

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

vLLM for Windows — One-Click Install

Download

Usage

What's included

Compiled extensions included

Requirements

API endpoints

Uh oh!