TurboQuant macOS ARM64 (c419fd5)

TurboQuant KV Cache — macOS ARM64 (Metal)

Built from the feature/turboquant-kv-cache branch at commit c419fd5.

What's included

  • llama-server with --cache-type-k turbo3 / turbo4 support
  • llama-cli, llama-bench, llama-perplexity
  • Metal backend with BF16 + embedded shader library

Usage

# Option 1: zip (notarized + stapled)
unzip llama-turboquant-macos-arm64.zip
# Option 2: tar.gz
tar -xzf llama-turboquant-macos-arm64.tar.gz

# Start the server with the turbo3 cache type for both K and V
./build/bin/llama-server -m model.gguf --cache-type-k turbo3 --cache-type-v turbo3

For Atomic Chat integration

Replace the existing llama-server binary with the one from this release at:

~/Library/Application Support/Atomic Chat/data/llamacpp/backends/<version>/macos-arm64/build/bin/llama-server
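
The replacement step could be scripted roughly as follows. This is a minimal sketch, not part of the release: the function name replace_backend and the .bak backup suffix are hypothetical choices made for illustration, and <version> in the path still has to be filled in with whatever directory exists on your machine.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this release): back up the existing
# llama-server binary, then copy the new one over it.
# Usage: replace_backend NEW_BINARY TARGET_BINARY
replace_backend() {
    new_bin=$1
    target=$2

    # Refuse to continue if either file is missing, so nothing is overwritten blindly.
    [ -f "$new_bin" ] || { echo "missing new binary: $new_bin" >&2; return 1; }
    [ -f "$target" ] || { echo "missing target binary: $target" >&2; return 1; }

    # Keep a copy of the original next to it (the .bak suffix is an assumption).
    cp -p "$target" "$target.bak" || return 1

    # Install the new binary and make sure it is executable.
    cp "$new_bin" "$target" || return 1
    chmod +x "$target"
}

# Example call (substitute the real directory name for <version> first):
# replace_backend ./build/bin/llama-server \
#   "$HOME/Library/Application Support/Atomic Chat/data/llamacpp/backends/<version>/macos-arm64/build/bin/llama-server"
```

Keeping the .bak copy makes it easy to restore the original backend by copying it back over the target.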