Skylark llama-server v1.0
Pre-releaseInitial binary release of llama-server with proprietary TBQ3 KV cache compression and EAGLE/MTP speculative decoding extensions.
Hardware: NVIDIA GPUs with compute capability 6.1 (Pascal) through 9.0 (Hopper) — Pascal P40/P100, Volta V100, Turing T4/RTX 20-series, Ampere A100/RTX 30-series, Ada RTX 40-series, Hopper H100/H200.
System: Linux x86_64, glibc 2.35+ (Ubuntu 22.04 / RHEL 9 / etc.), CUDA 12+ runtime (provided by your NVIDIA driver install — libraries not bundled).
Bundle contents: stripped llama-server binary + ggml/llama/mtmd shared libraries (PTX removed, SASS only — Pascal through Hopper), NOTICE (MIT attribution + Skylark proprietary terms), README.md, SETUP.md (step-by-step install guide), and kv-cache-guide.md (KV preset selection).
Usage: drop-in replacement for upstream llama-server. See SETUP.md in the bundle for first-run instructions and kv-cache-guide.md for the new KV preset options (tbq3_1, tbq3_2).
See bundled NOTICE for licensing — the binary is proprietary; the source on this branch (eagle-public) is MIT. Inquiries: info@skylarksoftware.me
SHA256: fd99d81e0d46047c890ea13b188b046b28a2f7411c9a13e41d771321cf14addb