forked from ggerganov/whisper.cpp
Commit
support tdrz via simple hack overriding solm tokens
Showing 1 changed file with 9 additions and 6 deletions.
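The "simple hack" in the commit title can be pictured roughly as follows. This is a hedged sketch, not the actual whisper.cpp diff: the struct, the id value, and the `[SPEAKER TURN]` text are placeholders standing in for the otherwise unused start-of-LM (`solm`) token that the tinydiarize checkpoint is fine-tuned to emit at speaker changes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical vocab subset; the real ids live in whisper.cpp's whisper_vocab.
struct vocab {
    int32_t token_solm = 50361; // start-of-LM token, repurposed by tdrz as a speaker-turn marker
};

// Map a decoded token id to output text. With tdrz enabled, the repurposed
// solm token is rendered as a speaker-turn marker instead of being skipped.
std::string token_to_text(const vocab &v, int32_t id, bool tdrz_enable,
                          const std::string &piece) {
    if (id == v.token_solm) {
        return tdrz_enable ? std::string(" [SPEAKER TURN]") : std::string("");
    }
    return piece;
}
```

The point of the hack is that no new vocabulary entry is needed: the fine-tuned model emits an existing-but-unused token, and the decoder only has to special-case it when printing.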
Comment on 50c822c:
Because token_transcribe shares its id with token_solm in this change, it did not work immediately. However, I followed your comments around lines 387-388 regarding the technically correct .en model ids and set the translate and transcribe ids accordingly. After that, whisper.cpp showed [SPEAKER TURN] in the output for my test audio. Clever proof of concept, fine-tuning that unused token!
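The id collision described above can be sketched like this. The numbers are illustrative placeholders only, not the real whisper vocab values: the point is that if token_transcribe and token_solm share an id, the speaker-turn check fires on every transcribe token, so the two ids must be distinct.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative token ids only -- placeholders, not the real whisper vocab values.
struct task_tokens {
    int32_t token_translate;
    int32_t token_transcribe;
    int32_t token_solm; // repurposed by tdrz as the speaker-turn marker
};

// The speaker-turn check keys on the solm id; if token_transcribe shares
// that id, every transcribe token is misread as a speaker turn.
bool is_speaker_turn(const task_tokens &t, int32_t id) {
    return id == t.token_solm;
}
```

With colliding ids the check returns a false positive for the transcribe token; once the task ids are set correctly the marker only fires on the genuine solm token.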
Comment on 50c822c:
Thanks for giving it a spin! I've fixed the incorrect task token ids in a follow-up commit.
Comment on 50c822c:
This is nice. How would I go about using this with the larger models, i.e. whisper-large? I assume I would need to create my own ggml version following your approach?
Comment on 50c822c:
Thanks! That's correct, it will need another fine-tuned checkpoint. Releasing the fine-tuning code is on the roadmap, which should provide a reference. I anticipate things will be a little trickier with the multi-task/multilingual models, so I'd say large-v2 support isn't imminent. That said, contributions are always welcome.