This project provides a PowerShell script to automate the setup of the llama.cpp development environment on Windows. It installs all required prerequisites silently, selects an appropriate CUDA Toolkit version, and builds llama.cpp from source.
- No more `-CudaArch` flag. The script auto-detects your GPU's SM (compute capability) via the NVIDIA driver (NVML). If detection isn't possible, it falls back to `CMAKE_CUDA_ARCHITECTURES=native`.
- Headless VS Build Tools install (via winget). Includes the Windows SDK and required C++ components; no GUI.
- Sane CUDA selection. If your GPU is pre-Turing (SM < 70), the script uses CUDA 12.4 for compatibility; otherwise it uses the latest installed (≥ 12.4).
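For illustration, here is a minimal sketch of that selection logic. It is not the script's actual code: `Get-GpuSm` is a hypothetical helper, and it assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field (the script itself uses NVML).

```powershell
# Hypothetical sketch of the CUDA-selection heuristic, not the script's exact code.
function Get-GpuSm {
    # Recent NVIDIA drivers expose the compute capability via nvidia-smi.
    if (-not (Get-Command nvidia-smi -ErrorAction SilentlyContinue)) { return $null }
    $out = & nvidia-smi --query-gpu=compute_cap --format=csv,noheader
    if ($LASTEXITCODE -ne 0 -or -not $out) { return $null }
    [int]([double]@($out)[0].Trim() * 10)   # "8.6" -> 86
}

$sm = Get-GpuSm
if ($null -eq $sm) {
    $cudaArch = 'native'                 # unknown GPU: let CMake probe the hardware
} elseif ($sm -lt 70) {
    $cudaArch = $sm; $cuda = '12.4'      # pre-Turing: pin CUDA 12.4 for compatibility
} else {
    $cudaArch = $sm                      # Turing or newer: latest installed CUDA (>= 12.4)
}
```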
Requirements:
- Windows 10/11 x64
- PowerShell 7
- Recent NVIDIA driver (no CUDA toolkit required)
- ~20 GB free disk space
- App Installer / winget available (to install dependencies)
- Administrator rights (elevated PowerShell)
The GPU SM auto-detect uses `nvml.dll` from the NVIDIA driver. If NVML isn't available, the script falls back to a WMI-based heuristic and then to `CMAKE_CUDA_ARCHITECTURES=native`.
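For the curious, this is roughly how NVML can be queried from PowerShell via P/Invoke. It is a minimal sketch, not the script's exact implementation; it assumes `nvml.dll` is resolvable on the DLL search path (the driver installs it under System32) and elides error handling.

```powershell
# Minimal NVML P/Invoke sketch: read the first GPU's compute capability.
Add-Type -Namespace Nvml -Name Api -MemberDefinition @'
[DllImport("nvml.dll")] public static extern int nvmlInit_v2();
[DllImport("nvml.dll")] public static extern int nvmlShutdown();
[DllImport("nvml.dll")] public static extern int nvmlDeviceGetHandleByIndex_v2(uint index, out IntPtr device);
[DllImport("nvml.dll")] public static extern int nvmlDeviceGetCudaComputeCapability(IntPtr device, out int major, out int minor);
'@

if ([Nvml.Api]::nvmlInit_v2() -eq 0) {           # 0 == NVML_SUCCESS
    $dev = [IntPtr]::Zero
    if ([Nvml.Api]::nvmlDeviceGetHandleByIndex_v2(0, [ref]$dev) -eq 0) {
        $major = 0; $minor = 0
        if ([Nvml.Api]::nvmlDeviceGetCudaComputeCapability($dev, [ref]$major, [ref]$minor) -eq 0) {
            "SM $($major)$($minor)"               # e.g. "SM 86" on an Ampere GPU
        }
    }
    [Nvml.Api]::nvmlShutdown() | Out-Null
}
```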
What the script does:

- Admin check (must be elevated).
- Installs prerequisites if missing (silent, via `winget`; a sketch of these installs follows this list):
  - Git
  - CMake
  - Visual Studio 2022 Build Tools (with C++ toolchain and Windows SDK)
  - Ninja (and a portable fallback if needed)
- Chooses the CUDA Toolkit:
  - Detects your GPU's SM via NVML.
  - SM < 70 (pre-Turing) → installs/uses CUDA 12.4.
  - SM ≥ 70 or unknown → uses latest installed CUDA (≥ 12.4).
- Clones and builds `llama.cpp` under `vendor\llama.cpp`.
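The silent installs look roughly like the sketch below. The package IDs are the usual winget IDs for these tools, but the script's exact IDs, flags, and VS Build Tools override may differ.

```powershell
# Illustrative sketch of the silent prerequisite installs.
$wingetArgs = @(
    '--exact'
    '--silent'
    '--accept-package-agreements'
    '--accept-source-agreements'
)

winget install --id Git.Git           @wingetArgs
winget install --id Kitware.CMake     @wingetArgs
winget install --id Ninja-build.Ninja @wingetArgs

# VS Build Tools needs an --override to add the C++ workload (and the
# Windows SDK it recommends) headlessly, i.e. with no installer GUI.
winget install --id Microsoft.VisualStudio.2022.BuildTools @wingetArgs `
    --override '--quiet --wait --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended'
```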
Run from an elevated PowerShell prompt:

```powershell
# Allow script execution for this session
Set-ExecutionPolicy Bypass -Scope Process

# Run the installer (auto-detects GPU SM; falls back to native)
./install_llama_cpp.ps1
```

Optional: skip the build step (installs prerequisites + CUDA only):

```powershell
./install_llama_cpp.ps1 -SkipBuild
```

The built binaries will be in `vendor\llama.cpp\build\bin`.
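For a quick sanity check that the build runs (assuming the default build layout above), the binaries can print their build info:

```powershell
# Print version/build info from the freshly built binary.
& .\vendor\llama.cpp\build\bin\llama-cli.exe --version
```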
To uninstall, run from an elevated PowerShell prompt:

```powershell
Set-ExecutionPolicy Bypass -Scope Process
./uninstall_llama_cpp.ps1
```

This removes the winget-installed prerequisites and the `vendor` directory (and portable Ninja if it was created).
The `run_llama_cpp_server.ps1` script provides a convenient way to start the llama.cpp server with the Qwen3-4B model.
- Downloads the Model: It automatically downloads the `Qwen3-4B-Instruct-2507-Q8_0-GGUF` model to a `models` subdirectory if it's not already present.
- Starts the Server: It launches `llama-server.exe` with parameters optimized for the Qwen3-4B model.
- Opens Web UI: After starting the server, it automatically opens `http://localhost:8080` in your default web browser.
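For orientation, here is a condensed sketch of those three steps. The download URL is a placeholder and the server flags are illustrative tuning choices, not the script's exact values (though `-m`, `--port`, and `-ngl` are standard `llama-server` options).

```powershell
# Illustrative sketch of run_llama_cpp_server.ps1; URL and flags are placeholders.
$modelDir = Join-Path $PSScriptRoot 'models'
$model    = Join-Path $modelDir 'Qwen3-4B-Instruct-2507-Q8_0.gguf'

if (-not (Test-Path $model)) {
    New-Item -ItemType Directory -Force -Path $modelDir | Out-Null
    # Hypothetical download URL -- the script resolves the real one.
    Invoke-WebRequest -Uri 'https://example.com/Qwen3-4B-Instruct-2507-Q8_0.gguf' `
                      -OutFile $model
}

# -ngl 99 offloads all layers to the GPU; --port 8080 matches the web UI URL.
Start-Process .\vendor\llama.cpp\build\bin\llama-server.exe `
    -ArgumentList '-m', $model, '--port', '8080', '-ngl', '99'

Start-Process 'http://localhost:8080'   # open the web UI in the default browser
```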
To run the server, use the following command in PowerShell:

```powershell
./run_llama_cpp_server.ps1
```

You can also specify the number of CPU threads to use with the `-Threads` parameter:

```powershell
./run_llama_cpp_server.ps1 -Threads 12
```

Troubleshooting:

- winget not found: Install "App Installer" from the Microsoft Store, then re-run.
- Pending reboot: Some installs require a reboot (Windows Update/VS Installer). Reboot and re-run.
- CUDA side-by-side: Multiple CUDA toolkits can co-exist; the uninstaller can remove them via winget.
- NVML missing: The script falls back to a heuristic and then to `CMAKE_CUDA_ARCHITECTURES=native`.
- Locked files: Stop `llama-server`/`llama-cli` before uninstalling.
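For the pending-reboot case, one common heuristic (an assumption here, not something the script is documented to do) is to check the registry markers Windows Update and component servicing leave behind:

```powershell
# Heuristic pending-reboot check via well-known registry keys.
$keys = @(
    'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending'
    'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired'
)
if ($keys | Where-Object { Test-Path $_ }) {
    Write-Warning 'A reboot is pending; reboot before re-running the installer.'
}
```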
This project is licensed under the MIT License. See the LICENSE file for details.
This project is a simplified version of the local-qwen3-coder-env repository, focusing solely on the installation of llama.cpp.