CTranslate2 4.2.0

@minhthuc2502 minhthuc2502 released this 10 Apr 11:41
e491a51

New features

  • Support Flash Attention (#1651)
  • Implement GEMM for the FLOAT32 compute type with the RUY backend (#1598)
  • Conv1D quantization on CPU only (the DNNL and CUDA backends are not supported) (#1601)

Fixes and improvements

  • Fix a bug in tensor parallelism (#1643)
  • Use BestSampler when temperature is 0 (#1659)
  • Fix a bug in Gemma (#1660)
  • Optimize loading/unloading time for Translator with cache (#1645)
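
To illustrate the temperature change (#1659), here is a minimal Python sketch of why a temperature of 0 is usually handled as a special case. It is not CTranslate2's implementation: the function `sample_token` and its parameters are hypothetical, and it assumes that BestSampler simply picks the highest-scoring token. Dividing logits by a temperature of 0 is undefined, so the sketch falls back to a greedy argmax in that case.

```python
import math
import random


def sample_token(logits, temperature=1.0, rng=random):
    """Pick a token index from raw logits (hypothetical helper, not library code)."""
    if temperature == 0:
        # Greedy choice: avoids dividing by zero and makes the output deterministic.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax weighting.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]

    # Draw one index in proportion to the softmax weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# With temperature 0 the highest logit always wins.
greedy_choice = sample_token([0.1, 2.5, -1.0], temperature=0)
```

With this approach, callers who pass a temperature of 0 get the same result on every call, while any positive temperature still samples randomly.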