v0.5.0 — Blackwell SM_120 validated

wizzense released this 30 Mar 17:09

Commit: d281c46

What's New

Fused Triton decode validated on NVIDIA Blackwell (RTX 5090, SM_120)

Triton kernels generate valid PTX for SM_120 without modification
AITHER_TQ_FORCE_TRITON=1 overrides the conservative guard that disables the Triton path on SM_100+ GPUs
26.1 tok/s decode throughput at 5 concurrent requests
TQGPUCache: 36 layers x 856 B at TQ4, with 512 MB VRAM + 512 MB DDR5
Zero runtime errors under concurrent load
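The notes don't say how the SM_100+ guard is implemented. Here is a minimal sketch of one plausible decision rule. It assumes that fused decode is opt-in via AITHER_TQ_FUSED=1 and that the guard keys off the CUDA compute-capability major version. `use_fused_triton` and `GUARD_MIN_MAJOR` are hypothetical names, not aither-kvcache API.

```python
from typing import Mapping, Tuple

# Hypothetical threshold: compute capability major >= 10 (SM_100+) is
# treated as "not yet validated" by the conservative guard.
GUARD_MIN_MAJOR = 10


def use_fused_triton(capability: Tuple[int, int], env: Mapping[str, str]) -> bool:
    """Decide whether to take the fused Triton decode path (sketch).

    capability: (major, minor), e.g. (12, 0) for an RTX 5090 (SM_120).
    env: an environment mapping such as os.environ.
    """
    # Assumption: fused decode must be requested explicitly.
    if env.get("AITHER_TQ_FUSED") != "1":
        return False
    major, _minor = capability
    # Conservative guard: refuse SM_100+ unless Triton is forced.
    if major >= GUARD_MIN_MAJOR:
        return env.get("AITHER_TQ_FORCE_TRITON") == "1"
    return True


print(use_fused_triton((12, 0), {"AITHER_TQ_FUSED": "1"}))  # False
print(use_fused_triton((12, 0), {"AITHER_TQ_FUSED": "1",
                                 "AITHER_TQ_FORCE_TRITON": "1"}))  # True
print(use_fused_triton((8, 9), {"AITHER_TQ_FUSED": "1"}))  # True
```

On a real machine, PyTorch's `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple, and `os.environ` can be passed as `env`.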

Effective Context Budget (RTX 5090, 32 GiB)

Tier	Format	Capacity
Hot (VRAM)	TQ4	~280K tokens
Cold (DDR5)	TQ4	~3.9M tokens
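The notes give tier capacities but not the policy for moving data between VRAM and DDR5. As a mental model only, here is a toy Python sketch of a two-tier cache. It assumes least-recently-used demotion from the hot tier to the cold tier and promotion back on a cold hit. `TwoTierKVCache` is a hypothetical name, not the aither-kvcache or TQGPUCache API, and capacities are counted in abstract blocks rather than tokens.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class TwoTierKVCache:
    """Toy hot/cold block cache with LRU demotion (hypothetical sketch).

    `hot` stands in for VRAM and `cold` for DDR5; both are ordered
    from least to most recently used.
    """

    def __init__(self, hot_capacity: int, cold_capacity: int) -> None:
        self.hot_capacity = hot_capacity
        self.cold_capacity = cold_capacity
        self.hot = OrderedDict()
        self.cold = OrderedDict()

    def put(self, key: Hashable, block: bytes) -> None:
        # New or updated blocks always land in the hot tier.
        self.cold.pop(key, None)
        self.hot[key] = block
        self.hot.move_to_end(key)
        self._rebalance()

    def get(self, key: Hashable) -> Optional[bytes]:
        if key in self.hot:
            self.hot.move_to_end(key)  # refresh recency
            return self.hot[key]
        if key in self.cold:
            # Promote a cold hit back to the hot tier.
            block = self.cold.pop(key)
            self.put(key, block)
            return block
        return None

    def _rebalance(self) -> None:
        # Demote least-recently-used hot blocks to the cold tier.
        while len(self.hot) > self.hot_capacity:
            k, v = self.hot.popitem(last=False)
            self.cold[k] = v
            self.cold.move_to_end(k)
        # Evict least-recently-used cold blocks once the cold tier is full.
        while len(self.cold) > self.cold_capacity:
            self.cold.popitem(last=False)


cache = TwoTierKVCache(hot_capacity=2, cold_capacity=2)
for i in range(4):
    cache.put(i, bytes([i]))
print(list(cache.hot), list(cache.cold))  # [2, 3] [0, 1]
cache.get(0)  # cold hit: block 0 is promoted, block 2 is demoted
print(list(cache.hot), list(cache.cold))  # [3, 0] [1, 2]
```

LRU is used here only because it is the simplest recency policy to show. The real cache may use a different policy or work on fixed-size token blocks.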

Env Vars

export AITHER_TQ_BITS=4           # 2, 3, or 4
export AITHER_TQ_FUSED=1          # fused Triton decode
export AITHER_TQ_FORCE_TRITON=1   # required on Blackwell (SM_100+)
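At 2, 3, or 4 bits per element, quantized codes are sub-byte, and 3-bit codes do not even align to nibbles, so they have to be packed across byte boundaries. The release doesn't document the on-device layout. The sketch below uses plain LSB-first bit packing as an assumed illustration of what `AITHER_TQ_BITS` controls; it is not the library's actual format. `pack_codes` and `unpack_codes` are hypothetical helpers.

```python
from typing import List


def pack_codes(codes: List[int], bits: int) -> bytes:
    """Pack unsigned `bits`-wide integer codes LSB-first into bytes."""
    if bits not in (2, 3, 4):
        raise ValueError("bits must be 2, 3, or 4")
    mask = (1 << bits) - 1
    out = bytearray()
    acc = 0   # bit accumulator
    nacc = 0  # number of valid bits currently held in acc
    for c in codes:
        if not 0 <= c <= mask:
            raise ValueError(f"code {c} does not fit in {bits} bits")
        acc |= c << nacc
        nacc += bits
        while nacc >= 8:  # emit every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nacc -= 8
    if nacc:
        out.append(acc & 0xFF)  # flush the partial final byte
    return bytes(out)


def unpack_codes(data: bytes, bits: int, count: int) -> List[int]:
    """Inverse of pack_codes: read `count` codes of width `bits`."""
    mask = (1 << bits) - 1
    codes = []
    acc = 0
    nacc = 0
    it = iter(data)
    for _ in range(count):
        while nacc < bits:  # refill from the next byte
            acc |= next(it) << nacc
            nacc += 8
        codes.append(acc & mask)
        acc >>= bits
        nacc -= bits
    return codes


# Packed size of a 128-element vector at each supported width.
for b in (2, 3, 4):
    print(b, len(pack_codes([0] * 128, b)))  # 2 32 / 3 48 / 4 64
```

Real KV-cache quantizers typically also store per-vector or per-group metadata such as scales or norms, which this sketch ignores. Actual per-token footprints are therefore larger than `dim * bits / 8`.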

Install

pip install aither-kvcache==0.5.0            # core package
pip install "aither-kvcache[vllm]==0.5.0"    # with the vLLM extra (quoted for zsh)
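After installing, the standard-library `importlib.metadata` offers a quick way to confirm which version the current environment actually sees. This is a generic sketch. `installed_version` is a hypothetical helper, and the check reads package metadata only; it does not import or exercise the kvcache code.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "aither-kvcache") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("aither-kvcache is not installed")
    elif v != "0.5.0":
        print(f"aither-kvcache {v} is installed; these notes cover 0.5.0")
    else:
        print("aither-kvcache 0.5.0 is installed")
```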

Full blog post: https://demo.aitherium.com/blog/turboquant-sub-byte-kv-cache-from-paper-to-production