v0.5.0 — Blackwell SM_120 validated
What's New
Fused Triton decode validated on NVIDIA Blackwell (RTX 5090, SM_120)
- Triton kernels generate valid PTX for SM_120 without modification
- AITHER_TQ_FORCE_TRITON=1 overrides the conservative SM_100+ guard
- 26.1 tok/s decode throughput at 5 concurrent requests
- TQGPUCache: 36L x 856B @ TQ4, 512 MB VRAM + 512 MB DDR5
- Zero runtime errors under concurrent load
Effective Context Budget (RTX 5090, 32 GiB)
| Tier | Format | Capacity |
|---|---|---|
| Hot (VRAM) | TQ4 | ~280K tokens |
| Cold (DDR5) | TQ4 | ~3.9M tokens |
Env Vars
AITHER_TQ_BITS=4 # 2, 3, or 4
AITHER_TQ_FUSED=1 # fused Triton decode
AITHER_TQ_FORCE_TRITON=1 # required on Blackwell SM_100+
Install
pip install aither-kvcache==0.5.0
pip install "aither-kvcache[vllm]==0.5.0" # quoted so shells such as zsh do not expand the brackets
Full blog post: https://demo.aitherium.com/blog/turboquant-sub-byte-kv-cache-from-paper-to-production
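As a rough illustration of the Env Vars section, the settings could be derived from the GPU's compute capability before the package is loaded. This is a minimal sketch, not part of aither-kvcache: the helper name `tq_env` is hypothetical, and it assumes the variables are read at import or startup time and that "SM_100+" corresponds to a compute capability major version of 10 or higher.

```python
import os

# Bit widths listed for AITHER_TQ_BITS in these release notes.
VALID_BITS = (2, 3, 4)


def tq_env(capability, bits=4, fused=True):
    """Build AITHER_TQ_* settings for a GPU with (major, minor) compute capability.

    Hypothetical helper for illustration; it is not part of aither-kvcache.
    """
    if bits not in VALID_BITS:
        raise ValueError(f"AITHER_TQ_BITS must be one of {VALID_BITS}, got {bits}")

    env = {"AITHER_TQ_BITS": str(bits)}
    if fused:
        env["AITHER_TQ_FUSED"] = "1"
        major, _minor = capability
        # Assumption: "SM_100+" means compute capability major >= 10,
        # so SM_120 (RTX 5090) is passed in as (12, 0).
        if major >= 10:
            env["AITHER_TQ_FORCE_TRITON"] = "1"
    return env


if __name__ == "__main__":
    # Apply the settings for an RTX 5090 before importing the cache package.
    os.environ.update(tq_env((12, 0)))
    for name, value in sorted(tq_env((12, 0)).items()):
        print(f"{name}={value}")
```

If PyTorch is installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple that this helper expects.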