CTranslate2 3.19.0

@guillaumekln released this 31 Aug 14:36

Changes

  • Binary wheels for Python 3.7 are no longer built

New features

  • Build wheels for Python 3.12
  • Update the Transformers converter to support more model architectures:
    • Falcon-RW
    • DistilBERT
    • Llama with linear RoPE scaling (e.g. Vicuna v1.5)
    • Llama with a non-default RoPE base period (e.g. CodeLlama)
  • Accept the token type IDs as inputs for encoder models
  • Add property GenerationStepResult.hypothesis_id to identify the different hypotheses when running random sampling with num_hypotheses > 1
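
As a rough illustration of how the new hypothesis_id property might be used, the sketch below regroups interleaved step results into one token sequence per hypothesis. StepResult is a hypothetical stand-in that mirrors only a few GenerationStepResult fields (batch_id, hypothesis_id, token), and group_tokens is an illustrative helper, not part of the library. In real use the results would arrive through the generation callback rather than from a hand-built list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical stand-in for a subset of GenerationStepResult fields."""
    batch_id: int
    hypothesis_id: int
    token: str


def group_tokens(step_results):
    """Collect tokens per (batch_id, hypothesis_id) in arrival order."""
    sequences = defaultdict(list)
    for result in step_results:
        sequences[(result.batch_id, result.hypothesis_id)].append(result.token)
    return dict(sequences)


# Simulated stream: two sampled hypotheses for the same batch entry,
# with their steps interleaved as they could be during sampling.
steps = [
    StepResult(batch_id=0, hypothesis_id=0, token="Hello"),
    StepResult(batch_id=0, hypothesis_id=1, token="Hi"),
    StepResult(batch_id=0, hypothesis_id=0, token="world"),
    StepResult(batch_id=0, hypothesis_id=1, token="there"),
]
grouped = group_tokens(steps)
```

Keying on batch_id alone would merge the tokens of both hypotheses into a single sequence, which is the ambiguity that hypothesis_id resolves.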

Fixes and improvements

  • Improve performance of 8-bit models on CPU:
    • Vectorize the GEMM output dequantization
    • Fuse the GEMM output dequantization with bias and activation
  • Allow inputs shorter than 30 seconds in Whisper methods
  • Fix incorrect batch_id values passed to the callback function
  • Fix a shape error in models using both MQA and relative positions
  • Fix compilation error related to AVX512 when using GCC 7
  • Call .detach() on PyTorch tensors before getting the NumPy array in converters
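
To make the 8-bit fusion item above more concrete, here is a minimal pure-Python sketch of the idea, not the library's actual kernel. It assumes the integer GEMM accumulator is dequantized by dividing by the product of a per-row input scale and a per-column weight scale, and it uses ReLU as the example activation. All function names are hypothetical. The unfused version makes three passes over the output, while the fused version does the same arithmetic in a single pass.

```python
def dequantize_unfused(acc, input_scales, weight_scales, bias):
    """Reference: three separate passes over the GEMM output."""
    # Pass 1: dequantize the integer accumulators to floats.
    out = [
        [value / (a_scale * w_scale) for value, w_scale in zip(row, weight_scales)]
        for row, a_scale in zip(acc, input_scales)
    ]
    # Pass 2: add the bias.
    out = [[value + b for value, b in zip(row, bias)] for row in out]
    # Pass 3: apply the activation (ReLU here).
    return [[max(0.0, value) for value in row] for row in out]


def dequantize_fused(acc, input_scales, weight_scales, bias):
    """Fused: dequantization, bias and activation in one pass per element."""
    return [
        [
            max(0.0, value / (a_scale * w_scale) + b)
            for value, w_scale, b in zip(row, weight_scales, bias)
        ]
        for row, a_scale in zip(acc, input_scales)
    ]


# One output row with two columns (illustrative values only).
acc = [[200, -100]]
input_scales = [10.0]
weight_scales = [2.0, 5.0]
bias = [0.5, 0.25]

fused = dequantize_fused(acc, input_scales, weight_scales, bias)
unfused = dequantize_unfused(acc, input_scales, weight_scales, bias)
```

Both functions produce the same values. The benefit of fusing is that each output element is read and written once instead of three times, which matters when the output does not fit in cache.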