v2.0.0 — TriAttention spectral KV compression
What's New in v2.0.0
TriAttention — Spectral KV Cache Compression (NEW)
Trigonometric series attention for ~10× KV cache compression. Exploits the fact that RoPE attention is naturally a trig series in position difference — store only the dominant frequency pairs per token instead of full vectors.
Modules:
- \�ither_kvcache.triattention.config\ — \TriAttentionConfig\
- \�ither_kvcache.triattention.encoder\ — \SpectralKVEncoder\ (encode/decode)
- \�ither_kvcache.triattention.scorer\ — \TrigSeriesScorer\ (pre-RoPE trig series scoring)
- \�ither_kvcache.triattention.cache\ — \SpectralKVCache\ (paged block storage)
- \�ither_kvcache.triattention.attention\ — \TriAttention\ (full pipeline)
- \�ither_kvcache.triattention.calibration\ — Qwen3.5 family profiles (0.6B–32B)
Compression:
| Mode | Bytes/Token | vs FP16 |
|---|---|---|
| F=12, int4 | 26 | 9.85× |
| F=12, int8 | 38 | 6.74× |
| F=8, int4 | 18 | 14.2× |
\\python
from aither_kvcache.triattention import TriAttention, TriAttentionConfig
config = TriAttentionConfig(head_dim=128, num_freqs=12, coeff_bits=4,
num_kv_heads=8, num_query_heads=32)
tri = TriAttention(config, device='cuda')
k_enc, v_enc = tri.encode_kv(keys, values)
output = tri.decode_step(query, k_enc, v_enc, query_pos, key_positions)
\\
TurboQuant Core Sync
- Updated quantizer, hybrid_quantizer, fused_attention, fused_kv_update, triton_ops, block_selector from monorepo
Tests
- 184 passed, 5 skipped (GPU-only tests)
- 32 new TriAttention tests covering spectral math, encoder roundtrip, trig series scoring, paged cache, Qwen3.5 calibration
Full Changelog: v1.3.1...v2.0.0