
Commit

support tdrz via simple hack overriding solm tokens
akashmjn committed Jun 19, 2023
1 parent 12778c4 commit 50c822c
Showing 1 changed file with 9 additions and 6 deletions.
whisper.cpp

```diff
@@ -378,14 +378,14 @@ struct whisper_vocab {
 
     id token_eot  = 50256;
     id token_sot  = 50257;
+    id token_solm = 50359; // ?? TODO@Akash - rename appropriately
     id token_prev = 50360;
-    id token_solm = 50361; // ??
     id token_not  = 50362; // no timestamps
-    id token_beg  = 50363;
+    id token_beg  = 50363; // begin timestamps
 
     // available tasks
-    static const id token_translate  = 50358;
-    static const id token_transcribe = 50359;
+    static const id token_translate  = 50358; // TODO@Akash - technically it's 50357 for .en models
+    static const id token_transcribe = 50359; // TODO@Akash - technically it's 50358 for .en models
 
     bool is_multilingual() const {
         return n_vocab == 51865;
@@ -3545,7 +3545,7 @@ static void whisper_process_logits(
 
         // suppress sot and solm tokens
         logits[vocab.token_sot]  = -INFINITY;
-        logits[vocab.token_solm] = -INFINITY;
+        // logits[vocab.token_solm] = -INFINITY;
 
         // suppress task tokens
         logits[vocab.token_translate]  = -INFINITY;
@@ -4524,7 +4524,6 @@ int whisper_full_with_state(
                     prompt_past.push_back(tokens_cur[i].id);
                 }
 
-                // store the text from this iteration
                 if (!tokens_cur.empty() && ctx->model.n_loaded > 0) {
                     int i0 = 0;
                     auto t0 = seek + 2*(tokens_cur.front().tid - whisper_token_beg(ctx));
@@ -4541,6 +4540,10 @@ int whisper_full_with_state(
                         text += whisper_token_to_str(ctx, tokens_cur[i].id);
                     }
 
+                    if (tokens_cur[i].id == whisper_token_solm(ctx)){
+                        text += " [SPEAKER TURN]";
+                    };
+
                     if (tokens_cur[i].id > whisper_token_beg(ctx) && !params.single_segment) {
                         const auto t1 = seek + 2*(tokens_cur[i].tid - whisper_token_beg(ctx));
```

4 comments on commit 50c822c

@bryanlavergne

Because token_transcribe shares its id with token_solm in this change, it did not work immediately. However, following your comments on Lines 387-388 about the technically correct .en model ids, I set the translate and transcribe ids as follows:

static const id token_translate = 50357;
static const id token_transcribe = 50358; 

Then, whisper.cpp was showing [SPEAKER TURN] in the output for my test audio.

Clever proof-of-concept, fine-tuning for that unused token!

@akashmjn
Owner Author


Thanks for giving it a spin! I've fixed the incorrect task token ids in a follow up commit.

@stevelizcano

This is nice. How would I go about using this with the larger models, e.g. whisper-large? I assume I need to create my own ggml version of your approach?

@akashmjn
Owner Author

Commented on 50c822c, Jun 22, 2023


Thanks! That's correct — it will need another finetuned checkpoint. Releasing the finetuning code is on the roadmap, which should provide a reference. I anticipate things being a little trickier with the multi-task/multilingual models, so I'd say large-v2 support won't land very soon. Unless, of course, someone beats me to it: contributions are always welcome.
