llama.cpp b9601 with CUDA

@github-actions github-actions released this 12 Jun 05:23
20acfc9

llama.cpp b9601 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9601
Commit: 4c6595503fe45d5a39f88d194e270f64c7424677

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)
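
The right suffix can be derived from the machine name reported by `uname -m`. The following is a minimal sketch, not part of the release itself: the helper name `arch_suffix` is hypothetical, and the file name pattern is taken from the tarball names shown under Usage.

```shell
# Hypothetical helper: map the kernel's machine name to the tarball suffix
# used in this release. Fails (returns 1) for any other architecture.
arch_suffix() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             return 1 ;;
  esac
}

# Print the tarball name for the current host, if one is published.
if suffix="$(arch_suffix "$(uname -m)")"; then
  echo "llama.cpp-b9601-cuda-12.8-${suffix}.tar.gz"
else
  echo "no pre-built tarball for $(uname -m)" >&2
fi
```

On an x86_64 machine this prints the amd64 tarball name. On an architecture other than the two listed above it prints a message to stderr instead.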

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series
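
If you are unsure which entry applies to your card, the driver can usually report it. The sketch below assumes a reasonably recent NVIDIA driver, since the `compute_cap` query field is not available in older `nvidia-smi` versions. The helper `cc_supported` and the variable `SUPPORTED_CC` are hypothetical names. The check is a plain exact match against the list published for the CUDA 12.8 build, so a "not listed" result is a prompt to investigate, not proof that the binaries will fail.

```shell
# Compute capabilities listed for the CUDA 12.8 build in this release.
SUPPORTED_CC="7.5 8.0 8.6 8.9 9.0 10.0 12.0"

# Hypothetical helper: succeeds if the given capability (e.g. "8.6")
# appears as a whole entry in SUPPORTED_CC.
cc_supported() {
  case " $SUPPORTED_CC " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Assumption: the installed driver supports the compute_cap query field.
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
  while IFS=, read -r name cc; do
    cc="${cc# }"   # drop the space that follows the comma in CSV output
    if cc_supported "$cc"; then
      echo "$name ($cc): listed"
    else
      echo "$name ($cc): not listed"
    fi
  done
else
  echo "nvidia-smi not found; cannot query the GPU" >&2
fi
```

The surrounding spaces in the `case` pattern keep a value such as `2.0` from matching inside `12.0`.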

Usage

Download the tarball that matches your host CPU architecture and CUDA version, extract it, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9601-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9601-cuda-12.8-arm64.tar.gz
# either host: confirm the binary runs
./llama-cli --help