CTranslate2 2.5.0
New features
- Add an 8-bit GEMM backend on AArch64 using Ruy
Fixes and improvements
- Skip unnecessary transpositions of the projected decoder queries in the multi-head attention
- Use 32-bit indexing in all CUDA kernels to slightly improve performance
- Let the compiler auto-vectorize the LayerNorm CPU kernel
- Update Intel oneAPI to 2021.4