TurboQuant macOS ARM64 (c419fd5)
TurboQuant KV Cache — macOS ARM64 (Metal)
Built from the feature/turboquant-kv-cache branch at commit c419fd5.
What's included
- llama-server with --cache-type-k turbo3/turbo4 support
- llama-cli, llama-bench, llama-perplexity
- Metal backend with BF16 + embedded shader library
Usage
# Option 1: zip (notarized + stapled)
unzip llama-turboquant-macos-arm64.zip
# Option 2: tar.gz
tar -xzf llama-turboquant-macos-arm64.tar.gz
./build/bin/llama-server -m model.gguf --cache-type-k turbo3 --cache-type-v turbo3

For Atomic Chat integration
Replace the bundled llama-server binary at:
~/Library/Application Support/Atomic Chat/data/llamacpp/backends/<version>/macos-arm64/build/bin/llama-server
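The replacement step can be scripted roughly as below. This is a minimal sketch, not part of the release: the function name install_turboquant_server and the llama-server.orig backup file are hypothetical. It assumes you run it from the directory where the archive was extracted, and that the first argument is the name of the <version> directory that already exists in your Atomic Chat install.

```shell
# Hypothetical helper: install_turboquant_server VERSION [SRC]
# VERSION: the existing <version> directory under .../llamacpp/backends/
# SRC:     path to the TurboQuant llama-server (defaults to the extracted build)
install_turboquant_server() {
  local version="$1"
  local src="${2:-./build/bin/llama-server}"
  local dest="$HOME/Library/Application Support/Atomic Chat/data/llamacpp/backends/${version}/macos-arm64/build/bin/llama-server"

  # Refuse to touch anything if the new binary is missing or not executable.
  if [ ! -x "$src" ]; then
    echo "error: $src not found or not executable" >&2
    return 1
  fi

  # Only replace a binary that is already installed at the expected path.
  if [ ! -f "$dest" ]; then
    echo "error: no existing binary at $dest" >&2
    return 1
  fi

  # Back up the bundled binary once, so re-running never overwrites the original.
  if [ ! -e "${dest}.orig" ]; then
    cp -p "$dest" "${dest}.orig"
  fi

  # Copy the TurboQuant build over it, preserving permissions (executable bit).
  cp -p "$src" "$dest"
  echo "installed $src -> $dest"
}
```

Call it as install_turboquant_server followed by the existing version directory name. To undo the swap, copy llama-server.orig back over llama-server in the same directory.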