v2.0.1

@wizzense released this 07 Apr 23:18

Fixed

  • TriAttention exports: from aither_kvcache.triattention import TriAttention, get_profile, get_config_for_model now works
  • Multi-family calibration profiles: 13 models across Qwen3.5, Nemotron (Qwen3), DeepSeek-R1 (Qwen2), Llama 3.1
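A quick way to confirm the export fix after upgrading is a small smoke check. The sketch below only verifies that the three names listed above can be resolved from the module. It does not call them, since their signatures are not described in these notes. The helper name check_exports is hypothetical and not part of the package.

```python
import importlib

# Names taken from the release note above.
EXPECTED = ("TriAttention", "get_profile", "get_config_for_model")


def check_exports(module_name="aither_kvcache.triattention", names=EXPECTED):
    """Report which of `names` are exposed by `module_name`.

    Returns None if the module cannot be imported (for example, the
    package is not installed), otherwise a dict of name -> bool.
    check_exports is an illustrative helper, not a library API.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return {name: hasattr(module, name) for name in names}


if __name__ == "__main__":
    result = check_exports()
    if result is None:
        print("aither_kvcache.triattention is not importable in this environment")
    else:
        for name, present in result.items():
            print(f"{name}: {'ok' if present else 'MISSING'}")
```

On v2.0.1 all three names should be reported as present. A missing name suggests an older release is still installed.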

Changed

  • README: Native vLLM integration (--kv-cache-dtype tq-t4nc) as primary workflow
  • Notebook: Complete rewrite for v2.0 APIs, memory calculator, TriAttention quick start
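As a rough illustration of the native vLLM workflow named above, the sketch below assembles a server launch command that passes --kv-cache-dtype tq-t4nc. It assumes the standard vllm serve entry point and that the tq-t4nc dtype is registered in the environment. The model identifier is a placeholder and build_vllm_command is a hypothetical helper. See the README for the supported invocation.

```python
import shlex


def build_vllm_command(model, kv_cache_dtype="tq-t4nc"):
    """Build an argv list for launching vLLM with a given KV cache dtype.

    `model` is whatever model path or identifier you serve (placeholder here).
    The default dtype value comes from the release note. Whether vLLM
    accepts it depends on the installed integration (assumption).
    """
    return ["vllm", "serve", model, "--kv-cache-dtype", kv_cache_dtype]


if __name__ == "__main__":
    # "your-org/your-model" is a placeholder, not a real model identifier.
    argv = build_vllm_command("your-org/your-model")
    # Print a shell-safe command line. Run it yourself, or pass argv to subprocess.run.
    print(shlex.join(argv))
```

Building the argument list rather than a single string keeps the command safe to hand to subprocess.run without shell quoting issues.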

Production Verified

  • Nemotron-Orchestrator-8B-AWQ on RTX 5090: 250,928 tokens, 6.1x concurrent sessions at 40K context