Support for Distil-Whisper #533

Open
sanchit-gandhi opened this issue Nov 1, 2023 · 12 comments

Comments

@sanchit-gandhi
Contributor

sanchit-gandhi commented Nov 1, 2023

Hey @guillaumekln! Thanks for this fantastic resource. We're looking at supporting the Distil-Whisper checkpoints in faster-whisper.

The checkpoints are fairly easy to convert: we always pin the number of decoder layers to 2 and load all 32 encoder layers.
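
A minimal conversion sketch using CTranslate2's Python converter API (the model ID, output directory, and copied files are illustrative, and a CTranslate2 release with distil-whisper support is assumed):

```python
from ctranslate2.converters import TransformersConverter

# Convert the Hugging Face checkpoint to the CTranslate2 format that
# faster-whisper loads. float16 quantization roughly halves the model size.
converter = TransformersConverter(
    "distil-whisper/distil-large-v2",
    copy_files=["tokenizer.json"],  # assumes the repo ships a tokenizer.json
)
converter.convert("distil-large-v2-ct2", quantization="float16")
```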

For inference, we found a chunk length of 15-20s to be optimal for the WER of the distilled model; see Table 23 of the paper:

[Screenshot: Table 23 of the Distil-Whisper paper, showing WER as a function of chunk length]

Would you be open to a PR allowing the user to specify the chunk length and also the maximum generation length? This would enable full support of Distil-Whisper in faster-whisper!
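
A sketch of what the requested options might look like on faster-whisper's transcribe call; the chunk_length and max_new_tokens parameter names are hypothetical illustrations of the proposal, not an existing API:

```python
from faster_whisper import WhisperModel

# Load the converted distil-whisper model (output of the conversion sketch above).
model = WhisperModel("distil-large-v2-ct2", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "audio.wav",
    language="en",       # distil-large-v2 is English-only
    chunk_length=15,     # hypothetical parameter: chunk size in seconds (15-20s optimal per Table 23)
    max_new_tokens=128,  # hypothetical parameter: cap on generated tokens per chunk
)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```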

@Axbon

Axbon commented Nov 2, 2023

This would be amazing: one step closer to something that feels realtime-ish.

@ozancaglayan
Contributor

Is the distilled large-v2 model still multilingual, or does it lose that attribute due to how the distillation was done?

@AmgadHasan

> Is the distilled large-v2 model still multilingual, or does it lose that attribute due to how the distillation was done?

It was trained on English audio only, so it has most likely lost its multilingual capabilities.

@guilhermehge

> Is the distilled large-v2 model still multilingual, or does it lose that attribute due to how the distillation was done?

From their repo: "Note: Distil-Whisper is currently only available for English speech recognition. Multilingual support will be provided soon."

@hoonlight
Contributor

Waiting for a multilingual model for this task. Looking forward to it!

@martinkallstrom

Subscribing for updates!

@silvacarl2

Can't wait to test it; this will be awesome.

@WikiLucas00

Hi @sanchit-gandhi. FYI, @guillaumekln's account has appeared inactive since September, which coincides with his departure from his former company. I don't know if he plans to continue maintaining this repo, or whether other users such as the OpenNMT devs (cc @vince62s @homink @nguyendc-systran) have ownership of the repo or have forked it. Maybe consider forking it yourself under HF's GitHub namespace?

@guillaumekln
Contributor

Hi, I confirm that I'm no longer actively maintaining this repo, but other people can still move it forward. Please ping @nguyendc-systran to merge changes in faster-whisper. For anything related to CTranslate2, please ping @vince62s.

@BBC-Esq
Contributor

BBC-Esq commented Nov 9, 2023

> Hi, I confirm that I'm no longer actively maintaining this repo, but other people can still move it forward. Please ping @nguyendc-systran to merge changes in faster-whisper. For anything related to CTranslate2, please ping @vince62s.

Great job on CTranslate2 and faster-whisper; I'm glad I came across them a while ago. Good luck in the future!

@WikiLucas00

FYI, distil-whisper should now be supported by CTranslate2: https://github.com/OpenNMT/CTranslate2/releases/tag/v3.21.0

We "just" need to adapt faster-whisper in order to have faster-distil-whisper :)

@metame-none
Contributor

metame-none commented Nov 11, 2023

FYI, I created PR #557 to support distil-whisper; hope it helps.
