v0.8.0 — Graphable fused decode, CUDA graph capture

@wizzense wizzense released this 31 Mar 15:31
· 24 commits to main since this release

What's New

  • CUDA graph capture: The fused TQ attention path is now graphable, so torch.compile and CUDA graphs can capture the decode kernel
  • Throughput: 87.9 tok/s aggregate at 5 concurrent sequences (up from 51 tok/s before graph capture)
  • Algorithm file sync from AitherOS production
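
To put the throughput figures above in perspective, here is a small arithmetic sketch. It uses only the two numbers quoted in these notes (87.9 and 51 tok/s) and the stated concurrency of 5. The per-sequence figure assumes throughput is split evenly across sequences, which these notes do not state, and the variable names are illustrative only.

```python
# Figures quoted in the release notes.
GRAPH_TOK_S = 87.9       # aggregate tok/s with CUDA graph capture
PRE_GRAPH_TOK_S = 51.0   # aggregate tok/s before graph capture
CONCURRENT_SEQS = 5      # concurrency at which 87.9 tok/s was reported

# Relative speedup and percentage gain of the graphed path.
speedup = GRAPH_TOK_S / PRE_GRAPH_TOK_S
gain_pct = (GRAPH_TOK_S - PRE_GRAPH_TOK_S) / PRE_GRAPH_TOK_S * 100

# Assumption: throughput is shared evenly across the 5 sequences.
per_seq_tok_s = GRAPH_TOK_S / CONCURRENT_SEQS

print(f"speedup: {speedup:.2f}x")
print(f"gain: {gain_pct:.1f}%")
print(f"per-sequence average: {per_seq_tok_s:.2f} tok/s")
```

This works out to roughly a 1.7x improvement in aggregate throughput, or about 17.6 tok/s per sequence under the even-split assumption.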

Install

pip install aither-kvcache==0.8.0
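
After installing, one way to confirm which version is present is to query the package metadata with the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper name, and the only thing taken from these notes is the distribution name `aither-kvcache`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution name as used in the pip command above.
found = installed_version("aither-kvcache")
print("aither-kvcache:", found or "not installed")
```

If the pinned install succeeded, the printed version should read 0.8.0.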