CTranslate2 2.5.0

@guillaumekln released this 01 Oct 12:22

New features

  • Add an 8-bit GEMM backend on AArch64 using Ruy
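For readers unfamiliar with the term, an 8-bit GEMM multiplies matrices whose values have been quantized to 8-bit integers, accumulates the products in a wider integer type, and rescales the result back to floating point. The pure-Python sketch below only illustrates that idea. It is not how Ruy or CTranslate2 implement it: the function names `quantize` and `int8_gemm` and the symmetric per-tensor scaling scheme are assumptions made for this example.

```python
# Illustrative sketch of an 8-bit GEMM. The helper names and the symmetric
# per-tensor scaling are assumptions for this example, not CTranslate2 or Ruy code.

def quantize(matrix):
    """Map a float matrix to integers in [-127, 127] with one shared scale."""
    max_abs = max((abs(x) for row in matrix for x in row), default=0.0)
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    quantized = [
        [max(-127, min(127, round(x * scale))) for x in row]
        for row in matrix
    ]
    return quantized, scale


def int8_gemm(a, b):
    """Approximate a @ b using quantized inputs and integer accumulation."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    rows, inner, cols = len(qa), len(qb), len(qb[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # stands in for the 32-bit integer accumulator
            for k in range(inner):
                acc += qa[i][k] * qb[k][j]
            # Undo both input scales to return to floating point.
            out[i][j] = acc / (scale_a * scale_b)
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[0.5, -1.0], [1.5, 2.0]]
    # The exact product is [[3.5, 3.0], [7.5, 5.0]]; the result here is close
    # to it but not identical because of quantization error.
    print(int8_gemm(a, b))
```

The result differs slightly from the exact floating-point product, which is the usual trade-off of 8-bit inference: less memory traffic and faster integer arithmetic in exchange for a small quantization error.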

Fixes and improvements

  • Skip unnecessary transpositions of the projected decoder queries in the multi-head attention
  • Use 32-bit indexing in all CUDA kernels to slightly improve performance
  • Let the compiler auto-vectorize the LayerNorm CPU kernel
  • Update Intel oneAPI to 2021.4