v2.0.1

wizzense released this 07 Apr 23:18

3d9cae0

Fixed

TriAttention exports: from aither_kvcache.triattention import TriAttention, get_profile, get_config_for_model now works
Multi-family calibration profiles: 13 models across Qwen3.5, Nemotron (Qwen3), DeepSeek-R1 (Qwen2), Llama 3.1
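
One way to confirm that the restored exports are importable is a small check like the sketch below. The helper missing_exports is hypothetical and not part of aither_kvcache; it uses only the standard library's importlib. The module path and the three names are taken from the Fixed entry above, and nothing is assumed about their signatures.

```python
import importlib


def missing_exports(module_name, names):
    """Return the names not found on the module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too (package not installed).
        return None
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    result = missing_exports(
        "aither_kvcache.triattention",
        ["TriAttention", "get_profile", "get_config_for_model"],
    )
    if result is None:
        print("aither_kvcache.triattention is not importable in this environment")
    elif result:
        print("missing exports:", ", ".join(result))
    else:
        print("all three exports are available")
```

On a v2.0.1 install, the last branch is the one the release note implies should be reached.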

Changed

README: Native vLLM integration (--kv-cache-dtype tq-t4nc) as primary workflow
Notebook: Complete rewrite for v2.0 APIs, memory calculator, TriAttention quick start

Production Verified

Nemotron-Orchestrator-8B-AWQ on RTX 5090: 250,928 tokens, 6.1x concurrent sessions at 40K context
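
The concurrency figure depends on how many bytes of KV cache a single session needs. The sketch below shows that arithmetic in general terms. It is not the notebook's memory calculator and does not reproduce the 6.1x result. The layer count, KV head count, head dimension, memory budget and the 4-bit width are all illustrative assumptions. In particular, 4 bits is not the documented width of tq-t4nc, and per-block overhead such as scales is ignored.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bits_per_element):
    """Bytes needed to cache keys and values for one session."""
    # Factor 2: one key tensor and one value tensor per layer.
    total_bits = 2 * num_layers * num_kv_heads * head_dim * num_tokens * bits_per_element
    return total_bits // 8


def max_sessions(budget_bytes, per_session_bytes):
    """Whole sessions that fit in a fixed KV-cache memory budget."""
    return budget_bytes // per_session_bytes


# Hypothetical model shape and budget, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128
CONTEXT_TOKENS = 40_000
BUDGET_BYTES = 16 * 1024**3  # assumed 16 GiB left for the KV cache

fp16_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, 16)
q4_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, 4)

print(f"16-bit cache: {fp16_bytes / 1e9:.2f} GB per session, "
      f"{max_sessions(BUDGET_BYTES, fp16_bytes)} sessions")
print(f" 4-bit cache: {q4_bytes / 1e9:.2f} GB per session, "
      f"{max_sessions(BUDGET_BYTES, q4_bytes)} sessions")
```

Because only whole sessions fit, the gain in concurrent sessions need not equal the raw compression ratio. Measured results will also differ once quantization metadata and the memory used by weights and activations are taken into account.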