What's New
- tq35-primary mode: 329,072 tokens of KV cache on a single consumer GPU (4x compression over standard fp8 pages)
- CUDA graph diagnosis: identified corrupted CUDA graph capture as the root cause of gibberish output after speculative-decoding experiments
- Recommended config: AITHER_TQ_MODE=tq4-primary + AITHER_TQ_EAGER=1 (309K tokens, coherent output, 6-8 tok/s)
Configuration
# Option 1: maximum capacity (~329K tokens, slower decode at ~2-5 tok/s)
AITHER_TQ_MODE=tq35-primary
# Option 2: best balance (~309K tokens, ~6-8 tok/s, coherent output)
AITHER_TQ_MODE=tq4-primary
AITHER_TQ_EAGER=1
# Install this release
pip install aither-kvcache==0.8.1
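The two options above are alternatives, not settings to combine. The following minimal Python sketch switches between them from a launcher script. The preset names `max-capacity` and `balanced` and the `apply_preset` helper are hypothetical. The sketch also assumes that aither-kvcache reads these variables from the process environment at startup, so they must be set before the engine is created, or exported in the shell instead.

```python
import os
from typing import Dict, MutableMapping, Optional

# Hypothetical preset names; the variable values come from the options above.
PRESETS: Dict[str, Dict[str, str]] = {
    # Option 1: ~329K tokens, ~2-5 tok/s
    "max-capacity": {"AITHER_TQ_MODE": "tq35-primary"},
    # Option 2: ~309K tokens, ~6-8 tok/s, coherent output
    "balanced": {"AITHER_TQ_MODE": "tq4-primary", "AITHER_TQ_EAGER": "1"},
}

# Every key any preset may set, so stale values can be cleared.
_KEYS = ("AITHER_TQ_MODE", "AITHER_TQ_EAGER")


def apply_preset(
    name: str, env: Optional[MutableMapping[str, str]] = None
) -> MutableMapping[str, str]:
    """Write one preset's variables into env (os.environ by default)."""
    if name not in PRESETS:
        raise ValueError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    target = os.environ if env is None else env
    # Remove both keys first so switching presets never leaves a
    # leftover AITHER_TQ_EAGER from the other option behind.
    for key in _KEYS:
        target.pop(key, None)
    target.update(PRESETS[name])
    return target


if __name__ == "__main__":
    # Assumption: call this before the aither-kvcache engine is constructed.
    apply_preset("balanced")
    print({k: os.environ.get(k) for k in _KEYS})
```

Clearing both keys before applying a preset keeps `tq35-primary` exactly as listed, without an eager flag carried over from the balanced option.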