CTranslate2 2.11.0

guillaumekln released this 11 Jan 12:44

Changes

  • With CUDA >= 11.2, the environment variable CT2_CUDA_ALLOCATOR now defaults to cuda_malloc_async, which should improve performance on GPU.
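
If the new default does not suit a given setup, the allocator can still be chosen explicitly through the same variable. The snippet below is a minimal sketch, assuming that cub_caching is the other accepted value (as listed in the CTranslate2 documentation) and that setting the variable before the library is imported is enough for it to be picked up.

```python
import os

# Assumption: "cub_caching" is the alternative allocator name accepted by
# CT2_CUDA_ALLOCATOR; "cuda_malloc_async" is the new default with CUDA >= 11.2.
os.environ["CT2_CUDA_ALLOCATOR"] = "cub_caching"

# Import the library only after the variable is set, so the choice is
# visible when CTranslate2 initializes its CUDA allocator.
# import ctranslate2
```

The import is left commented out so the snippet runs without the package installed; the variable can equally be exported in the shell that launches the process.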

New features

  • Build Python wheels for AArch64 Linux

Fixes and improvements

  • Improve performance of Gather CUDA kernel by using vectorized copy
  • Update Intel oneAPI to 2022.1
  • Update oneDNN to 2.5.1
  • Log some additional information with CT2_VERBOSE >= 1:
    • Location and compute type of loaded models
    • Version of the dynamically loaded cuBLAS library
    • Selected CUDA memory allocator
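
To see the additional log lines listed above, CT2_VERBOSE has to be at least 1. A minimal sketch, assuming the variable is read when the library initializes and is therefore best set before the import:

```python
import os

# CT2_VERBOSE >= 1 enables the extra information described in this release:
# model location and compute type, cuBLAS version, selected CUDA allocator.
os.environ["CT2_VERBOSE"] = "1"

# Import the library only after the variable is set.
# import ctranslate2
```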