v1.1.1
What's New
- Benchmark tables in README — KV cache memory by model, throughput on RTX 5090, max context window, quantization quality across all modes
- vLLM quickstart notebook (notebooks/vllm_quickstart.ipynb): step-by-step from install to graph-aware eviction
- GitHub Discussions enabled for Q&A
- Community section in README linking Discussions and Issues
No code changes — docs and usability improvements only.