
v1.3.0


Alberto-Codes released this 31 Mar 23:54
b02544d

1.3.0 (2026-03-31)

Features

  • kv-cache: add sliding window attention bypass for Gemma models (#53) (ee5300e)
  • triton: add head_dim 64/96 kernel support with non-pow2 padding (#52) (44edd60)
  • verify: add head_dim 256 support and validate Gemma-2/Gemma-3 (#55) (e0f5d45)
  • verify: validate Phi-3-mini compression quality (#54) (f5fec46)
  • verify: validate Qwen2.5-3B and Phi-4 compression quality (#51) (48243c9)
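Some background for the "non-pow2 padding" item: Triton's `tl.arange` only accepts power-of-two ranges, so a common approach is to round the head dimension up to the next power of two and mask off the extra lanes. The sketch below assumes that approach. It is plain Python, not the project's kernel code, and `next_power_of_2` / `padded_head_dim` are hypothetical helpers. For head_dim 96 it pads to a block of 128, which leaves 32 of 128 lanes (25%) idle. That idle work is presumably related to the throughput penalty documented in this release.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n (hypothetical helper, for n >= 1)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


def padded_head_dim(head_dim: int) -> tuple[int, list[bool]]:
    """Return the padded block width and a per-lane validity mask.

    Assumption: lanes at index >= head_dim are padding. A kernel would
    load them with a masked load (fill value 0) and skip storing them.
    """
    block = next_power_of_2(head_dim)
    mask = [i < head_dim for i in range(block)]
    return block, mask


for d in (64, 96, 128, 256):
    block, mask = padded_head_dim(d)
    idle = block - sum(mask)
    print(f"head_dim={d:3d} -> block={block:3d}, padded lanes={idle}")
```

Only non-power-of-two sizes such as 96 pay for padding. The sizes 64, 128 and 256 map to blocks of the same width.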

Bug Fixes

  • benchmark: add head_dim detection division guard (#56) (380a7eb)
  • triton: add defensive assertions and throughput penalty docs (44edd60)
  • vllm: guard forward() against kv_cache=None during profiling (#57) (3c5fdde)
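The benchmark fix is described only as a "division guard" on head_dim detection. The following is a minimal sketch of such a guard. It assumes head_dim is derived as `hidden_size // num_attention_heads` from a config mapping. The function name `detect_head_dim` and the dict keys are illustrative assumptions, not taken from the project. The sketch uses an explicit `head_dim` when one is present, because some Gemma configs set a head_dim that differs from that quotient.

```python
def detect_head_dim(config: dict) -> int:
    """Infer attention head size from a config dict (hypothetical helper).

    Keys are assumed to mirror Hugging Face config attribute names:
    "head_dim", "hidden_size", "num_attention_heads".
    """
    # Prefer an explicit head_dim: it can differ from hidden_size / heads.
    explicit = config.get("head_dim")
    if explicit:
        return int(explicit)

    hidden = config.get("hidden_size")
    heads = config.get("num_attention_heads")

    # Division guard: never divide by a missing or zero head count.
    if not hidden or not heads:
        raise ValueError(
            "cannot infer head_dim: hidden_size or num_attention_heads "
            "is missing or zero"
        )
    if hidden % heads != 0:
        raise ValueError(
            f"hidden_size {hidden} is not divisible by "
            f"num_attention_heads {heads}"
        )
    return hidden // heads


print(detect_head_dim({"hidden_size": 4096, "num_attention_heads": 32}))
```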

Performance Improvements

  • experiments: add experiment 022 clip duration comparison (#47) (55503a9)
  • experiments: add experiment 023 frame count sweep (#49) (d025208)
  • experiments: add experiment 024 zero-change model probe (#50) (d11fc3a)

Documentation

  • roadmap: add experiment 023 frame count sweep findings (d025208)