v1.1.1

@wizzense wizzense released this 04 Apr 16:17
· 12 commits to main since this release

What's New

  • Benchmark tables in README: KV cache memory by model, throughput on RTX 5090, maximum context window, and quantization quality across all modes
  • vLLM quickstart notebook (notebooks/vllm_quickstart.ipynb): a step-by-step walkthrough from installation to graph-aware eviction
  • GitHub Discussions enabled for Q&A
  • Community section in README linking Discussions and Issues

No code changes in this release; it contains documentation and usability improvements only.