Utilizing CTranslate2 for a several-fold speedup #836

alexkrasov · 2026-02-17T00:20:56Z

alexkrasov
Feb 17, 2026

Hey folks, I wanted to throw out an idea: adding a CTranslate2 backend to Handy.

In plain terms, this could make local transcription feel way snappier, especially on CPU-only machines.

On my own older i7 (no GPU), I tested Whisper Large V3 Turbo, 8-bit quantized, and got almost real-time speed: about 10 seconds of audio processed in ~10 seconds.
That was roughly 10 times faster than a similar “regular” setup I'd tried before (all large Whisper models, including Turbo, are basically unusable on my machine).

If Handy could support this engine, it could be a big quality-of-life win for a lot of people, especially anyone not
running a strong GPU.

If anyone wants to play with this stack, check out faster-whisper - it’s a great way to see these speed gains in
practice.

In the end, this could make Whisper models genuinely usable on CPU, which is basically not practical right now for
many users.

cjpais · 2026-02-17T01:15:38Z

cjpais
Feb 17, 2026
Maintainer

I didn't realize it had good gains on CPU only. I'll take a closer look, if only for that reason.

I've played with it before and it's quite good. The problem is that it's potentially more to bundle.

4 replies

alexkrasov Feb 17, 2026
Author

That's great, thanks. The performance difference with 8-bit int is quite amazing, and I think it's worth the extra bundling :)

cjpais Feb 17, 2026
Maintainer

What is the time on CPU with Handy for 10 seconds of audio right now? It does use whisper.cpp under the hood, but I would expect it to be maybe 1/2 real time, like 20 sec for 10 sec of audio? Just trying to estimate.

For what it's worth, the non-Whisper models are CPU-only and typically much, much faster than Whisper. I generally don't recommend Whisper unless you need the multilingual capability. Parakeet v3 runs at 5x real time on an old i5 I had.
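To keep the "x real time" figures in this thread comparable, here is a minimal sketch of the arithmetic, assuming the convention used above: seconds of audio divided by seconds of processing. The function name `realtime_speed` is made up for illustration and is not part of Handy or whisper.cpp.

```python
def realtime_speed(audio_seconds: float, processing_seconds: float) -> float:
    """Return seconds of audio processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# The estimate above: 20 s to process 10 s of audio is half real time.
half_real_time = realtime_speed(10, 20)

# "5x real time" means 10 s of audio would take about 2 s to process.
five_x = realtime_speed(10, 2)

print(half_real_time, five_x)
```

Values below 1 mean transcription is slower than the audio itself, and values above 1 mean it is faster.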

alexkrasov Feb 17, 2026
Author

I tested both on the same 9-second audio clip (pulled from Handy history). For the comparison, I used a CTranslate2 setup via a small Python CLI I wrote with faster-whisper (https://github.com/SYSTRAN/faster-whisper).

Both runs were CPU-only on an Intel Core i7-8700. Handy used the Whisper Turbo model. The CTranslate2 run used Whisper large-v3-turbo, 8-bit quantized, in CPU execution mode.

The difference was large: 68 seconds with Handy versus 16 seconds with CTranslate2, roughly a 4x speedup.
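For anyone who wants to try a similar measurement, here is a rough sketch of a CPU-only, int8 run with faster-whisper. It is not the CLI used for the numbers above. The function names `transcribe_timed` and `speedup` are hypothetical, and it assumes the installed faster-whisper version recognizes the `large-v3-turbo` model name. The timer covers transcription only, not model loading.

```python
import time


def transcribe_timed(audio_path: str, model_name: str = "large-v3-turbo"):
    """Transcribe one file on CPU with int8 weights; return (text, seconds)."""
    # Imported lazily so speedup() below works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumption: model_name is a size/name known to the installed version.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")

    start = time.perf_counter()
    segments, _info = model.transcribe(audio_path)
    # `segments` is a lazy generator; consuming it performs the decoding.
    text = " ".join(segment.text.strip() for segment in segments)
    elapsed = time.perf_counter() - start
    return text, elapsed


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate run is than the baseline."""
    return baseline_seconds / candidate_seconds


# The two timings reported above: 68 s (Handy) versus 16 s (CTranslate2).
print(f"{speedup(68, 16):.2f}x")
```

Calling `transcribe_timed("clip.wav")` (the path is a placeholder) downloads the model on first use, so the first call takes noticeably longer overall than later ones.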

cjpais Feb 18, 2026
Maintainer

Thanks for the info, wow.
