CTranslate2 2.5.0

guillaumekln released this 01 Oct 12:22

New features

Add an 8-bit GEMM backend on AArch64 using Ruy
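For readers unfamiliar with the term, the arithmetic behind an 8-bit GEMM can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the Ruy API or the CTranslate2 implementation: the helper names (`symmetric_scale`, `quantize`, `int8_gemm`, `dequantize`) are hypothetical, and the symmetric per-tensor scaling shown here is one common scheme chosen for simplicity.

```python
# Illustrative sketch only: hypothetical helpers, not Ruy or CTranslate2 code.

def symmetric_scale(matrix):
    """Return a scale that maps the largest absolute value to 127."""
    max_abs = max(abs(v) for row in matrix for v in row)
    return 127.0 / max_abs if max_abs else 1.0


def quantize(matrix, scale):
    """Map floats to integers in [-127, 127] using the given scale."""
    return [[max(-127, min(127, round(v * scale))) for v in row] for row in matrix]


def int8_gemm(a_q, b_q):
    """Multiply two quantized matrices, accumulating products in wider integers."""
    rows, inner, cols = len(a_q), len(b_q), len(b_q[0])
    return [
        [sum(a_q[i][k] * b_q[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def dequantize(c_q, scale_a, scale_b):
    """Convert the integer accumulators back to floats."""
    return [[v / (scale_a * scale_b) for v in row] for row in c_q]


a = [[1.0, -2.0], [0.5, 4.0]]
b = [[0.25, 1.0], [-1.0, 2.0]]

scale_a, scale_b = symmetric_scale(a), symmetric_scale(b)
a_q, b_q = quantize(a, scale_a), quantize(b, scale_b)
c = dequantize(int8_gemm(a_q, b_q), scale_a, scale_b)
```

The result `c` approximates the float product of `a` and `b`; the small difference is the quantization error that an 8-bit backend trades for speed and memory savings.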

Fixes and improvements

Skip unnecessary transpositions of the projected decoder queries in multi-head attention
Use 32-bit indexing in all CUDA kernels to slightly improve performance
Let the compiler auto-vectorize the LayerNorm CPU kernel
Update Intel oneAPI to 2021.4
