Release Skylark llama-server v1.0 · Skylark-Software/EagleBranch

Initial binary release of llama-server with proprietary TBQ3 KV cache compression and EAGLE/MTP speculative decoding extensions.

Hardware: NVIDIA GPUs with compute capability 6.1 (Pascal) through 9.0 (Hopper) — Pascal P40/P100, Volta V100, Turing T4/RTX 20-series, Ampere A100/RTX 30-series, Ada RTX 40-series, Hopper H100/H200.

System: Linux x86_64, glibc 2.35+ (Ubuntu 22.04 / RHEL 9 / etc.), CUDA 12+ runtime (provided by your NVIDIA driver install — libraries not bundled).

Bundle contents: stripped llama-server binary + ggml/llama/mtmd shared libraries (PTX removed, SASS only — Pascal through Hopper), NOTICE (MIT attribution + Skylark proprietary terms), README.md, SETUP.md (step-by-step install guide), and kv-cache-guide.md (KV preset selection).

Usage: drop-in replacement for upstream llama-server. See SETUP.md in the bundle for first-run instructions and kv-cache-guide.md for the new KV preset options (tbq3_1, tbq3_2).

See bundled NOTICE for licensing — the binary is proprietary; the source on this branch (eagle-public) is MIT. Inquiries: info@skylarksoftware.me

SHA256: fd99d81e0d46047c890ea13b188b046b28a2f7411c9a13e41d771321cf14addb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skylark llama-server v1.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Uh oh!