v4.0 - CUDA 12.1+ support!

@BBC-Esq released this 22 Feb 18:00
· 280 commits to main since this release
5410c1b

NOTE:

This release is only for Windows. Linux and macOS users should continue to use v3.5.2 until I can get those versions up and running. Download the ZIP file from Release 3.5.2 and follow the instructions in its readme.md, INCLUDING the prerequisites, which are different from the ones for this release.

CUDA 12.1+ support finally brings support for Flash Attention 2 and other improvements, which will be implemented in subsequent incremental releases. For this initial release, the following major improvements have been made:

The transcribe tool has received a major improvement now that CUDA 12.1+ is supported, which made it possible to switch from faster-whisper to the amazing new WhisperS2T library (only ~75 stars so far), located here:

https://github.com/shashikg/WhisperS2T

In summary, this library enables "batch" processing of audio using ctranslate2 version 4.0, which supports CUDA 12.1+ (a minimal usage sketch appears after the benchmarks below). Here is a comparison of transcribing a long audio file under Release 3.5.2 versus this release:

Release 3.5.2

large-v2 model, float16 = 10 minutes 1 second

This release:

large-v2, float16, speed set at 50 = 54 seconds
medium.en, float16, speed set at 75 = 32 seconds
small.en, float16, speed set at 100 = 15 seconds!!!

The importance of this cannot be overstated; batch processing is a feature that faster-whisper, while great, has been lacking for quite some time.
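
For anyone curious what this batch processing looks like in code, here is a minimal sketch based on WhisperS2T's own README. It is not the exact code used by the transcribe tool; the audio path and batch_size are placeholders, and my guess that the GUI's "speed" setting roughly corresponds to the batch size is just that, a guess.

```python
import whisper_s2t

# Load a Whisper model through the CTranslate2 backend (ctranslate2 >= 4.0),
# which is what brings CUDA 12.1+ support.
model = whisper_s2t.load_model(model_identifier="large-v2", backend="CTranslate2")

files = ["long_audio.wav"]   # placeholder path
lang_codes = ["en"]
tasks = ["transcribe"]
initial_prompts = [None]

# transcribe_with_vad() splits the audio with VAD and decodes the chunks in
# batches, which is where the big speed-up over sequential decoding comes from.
out = model.transcribe_with_vad(
    files,
    lang_codes=lang_codes,
    tasks=tasks,
    initial_prompts=initial_prompts,
    batch_size=24,  # placeholder value
)

print(out[0][0])  # first utterance of the first file
```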

KNOWN ISSUES:

  1. Bark models will still run, but with errors printed to the command prompt. This will be fixed once Flash Attention 2 is implemented and an option to NOT use FA2 is made available, which will prevent the errors (a rough sketch of such a toggle appears after this list).

  2. The voice transcriber sometimes takes much longer than in release 3.5.2 and/or prints multiple transcriptions. This is due to issues with the faster-whisper library itself - not ctranslate2 - since the improved transcribe-file tool, which already relies on ctranslate2, works just fine. If this isn't addressed in the future, it may require switching from faster-whisper to something else like WhisperS2T, but for shorter audio faster-whisper is probably just as good, and I'd rather keep it if possible.

  3. The transcriber tool no longer lets you choose a quantization or compute device (e.g. cuda or cpu). This was a deliberate choice in order to get initial CUDA 12+ support out as soon as possible. It'll be addressed in subsequent releases.
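
Regarding issue 1: here is only a rough sketch, not this app's actual code, of what such an opt-out could look like with the Hugging Face transformers Bark implementation (assuming a recent transformers version and a CUDA GPU); the use_flash_attention_2 flag and model choice are hypothetical.

```python
import torch
from transformers import AutoProcessor, BarkModel

# Hypothetical flag the GUI could expose; "eager" avoids the Flash Attention 2
# code path entirely, while "flash_attention_2" would enable it once supported.
use_flash_attention_2 = False

model = BarkModel.from_pretrained(
    "suno/bark-small",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2" if use_flash_attention_2 else "eager",
).to("cuda")

processor = AutoProcessor.from_pretrained("suno/bark-small")
inputs = processor("A quick Bark smoke test.").to("cuda")

audio = model.generate(**inputs)  # waveform tensor
```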

Please contact me if you want to help out, or if you want faster releases for Linux and macOS, as I don't own those systems. Just like before, I plan to bring back support for Linux systems using Nvidia and AMD GPUs, as well as macOS support.