
Feature/add hotwords #731

Merged
7 commits merged into SYSTRAN:master on May 4, 2024

Conversation

jax-explorer
Contributor

hello!
During the transcription process, I often encounter some proprietary or new vocabulary, and Whisper cannot handle it well. I searched for solutions, and the community provided two options:

Fine-tuning the model: This approach is costly, and it's not practical to fine-tune the model every time a new term emerges.

Using initial_prompt: However, initial_prompt only applies to the first window. If specialized terms don't appear at the beginning, this method is ineffective.

Upon reviewing other transcription models, I found it's common practice to support hotwords, so I implemented this feature. My approach is to add a hotword-related prompt before each transcription window. Since the prompt has a maximum length limit, the hotwords occupy the space normally used by the prefix, so they take effect only when no prefix is set. In my testing, this resolved the issue with specialized vocabulary in my scenario.
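To make the mechanism concrete, here is a minimal sketch of the idea (illustrative only: the function and tokenizer names are assumptions, not the actual faster-whisper internals, and full prefix handling is omitted):

```python
def build_prompt(tokenizer, previous_tokens, max_length, prefix=None, hotwords=None):
    """Sketch of per-window prompt construction with hotwords.

    Unlike initial_prompt (which only conditions the first window),
    the hotwords are injected into *every* window's prompt, reusing
    the token budget normally reserved for the prefix.
    """
    prompt = [tokenizer.sot_prev]

    if hotwords is not None and prefix is None:
        # Hotwords take the slot normally used by the prefix,
        # truncated to half of the prompt's token budget.
        hotword_tokens = tokenizer.encode(" " + hotwords.strip())
        prompt.extend(hotword_tokens[: max_length // 2 - 1])

    # Condition on the tail of the previous window's output as usual.
    prompt.extend(previous_tokens[-(max_length // 2 - 1):])
    return prompt
```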

The following is the community discussion on this issue:
openai/whisper#1477
https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311
https://stackoverflow.com/questions/73833916/how-can-i-give-some-hint-phrases-to-openais-whisper-asr

Since my project uses faster-whisper, I'm submitting the change to this repository first. If it's accepted, I'll port it to the openai/whisper project as well.

@jax-explorer
Contributor Author

@nguyendc-systran hello, please review this PR.

@trungkienbkhn
Collaborator

@jax-explorer, hello. Can you please provide an example of your test cases?

@jax-explorer
Contributor Author

@trungkienbkhn

OK. comfyUI is a new term; it is "the most powerful and modular stable diffusion GUI and backend".

The test video is https://www.youtube.com/watch?v=Ybu6qTbEsew

Without hotwords:

```python
segments, info = model.transcribe(
    input_file,
    beam_size=5,
    language="en",
    vad_filter=False,
    vad_parameters=dict(min_silence_duration_ms=1000),
)
```

the result is:
“[261.76s -> 263.12s] The first thing you need to do is,
[263.12s -> 265.36s] of course, to copy the web address
[265.36s -> 266.12s] up here.
[266.12s -> 267.84s] Then you go into your Conf UI
[267.84s -> 270.04s] folder, again in the Conf UI
[270.04s -> 272.08s] folder, in there in the custom
[272.08s -> 274.08s] nodes folder and then up here
[274.08s -> 276.28s] in the address bar type CMD,
[276.28s -> 277.40s] hit enter.
[277.40s -> 279.40s] This opens up your command
[279.40s -> 281.24s] window. In here you type
[281.24s -> 283.36s] git clone and then
[283.36s -> 285.32s] put the web address and hit
[285.32s -> 287.36s] enter to clone the git
[287.36s -> 289.68s] project into your custom
[289.68s -> 290.56s] nodes folder.
[290.56s -> 291.60s] After you've done this, you're going
[291.60s -> 293.32s] to find in here the Conf UI”

comfyUI is incorrectly transcribed as "Conf UI".

With hotwords:

```python
segments, info = model.transcribe(
    input_file,
    hotwords="the video is about comfyUI",
    beam_size=5,
    language="en",
    vad_filter=False,
    vad_parameters=dict(min_silence_duration_ms=1000),
)
```

the result is:
"
[261.76s -> 263.12s] The first thing you need to do is,
[263.12s -> 264.84s] of course, to copy the web
[264.84s -> 266.68s] address up here, then you go
[266.68s -> 268.48s] into your comfyUI folder,
[268.48s -> 270.80s] again in the comfyUI folder,
[270.80s -> 272.48s] in there in the custom nodes
[272.48s -> 274.28s] folder, and then up here in the
[274.28s -> 276.28s] address bar type cmd,
[276.28s -> 277.40s] hit enter.
[277.40s -> 279.40s] This opens up your command
[279.40s -> 281.20s] window. In here you type
[281.20s -> 283.08s] git clone and
[283.08s -> 285.00s] then put the web address and
[285.00s -> 286.92s] hit enter to clone
[286.92s -> 288.88s] the git project into
[288.88s -> 290.56s] your custom nodes folder.
[290.56s -> 291.48s] After you've done this, you're
[291.48s -> 293.32s] going to find in here the comfyUI
"
It is now correctly transcribed as comfyUI.

@jax-explorer
Contributor Author

@trungkienbkhn hello,
Please let me know if this PR needs any other changes or information; if not, I'm going to submit the same change to Whisper.

@trungkienbkhn
Collaborator

@jax-explorer , thanks for your PR. LGTM.

@jax-explorer
Contributor Author

ok, thanks.

@RichardQin1

@jax-explorer I encountered an issue when using faster-whisper where a person's name in initial_prompt only takes effect in the first part. Can your method solve this problem? How should I use it? Thanks.

@jax-explorer
Contributor Author

@RichardQin1 Yes, this PR will solve your problem. Here is an example:

```python
segments, info = model.transcribe(
    input_file,
    hotwords="the video is about comfyUI",
    beam_size=5,
    language="en",
    vad_filter=False,
    vad_parameters=dict(min_silence_duration_ms=1000),
)
```

@arabcoders

arabcoders commented Mar 9, 2024

I tested the patch and it does seem to improve recognition of the vocabulary when given appropriate words.

Edit:

I've noticed a small side effect: when the model is hallucinating, it outputs the hotwords. I can clean this up with a post-processor, but it's worth mentioning: the hallucinated line is an exact copy of the hotwords given.

Edit2:

After longer testing, the hallucination still happens, with some variety: sometimes it's an exact copy of the hotwords, other times a slight variation of them.
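For anyone hitting the same artifact, a post-processor like the one mentioned above could be sketched as follows (a hypothetical helper, not part of this PR): drop segments whose text is essentially a copy of the hotwords string, e.g. by fuzzy matching.

```python
import difflib

def drop_hotword_hallucinations(segment_texts, hotwords, threshold=0.85):
    """Filter out segment texts that closely match the hotwords string,
    which is the hallucination pattern described above."""
    cleaned = []
    for text in segment_texts:
        similarity = difflib.SequenceMatcher(
            None, text.strip().lower(), hotwords.strip().lower()
        ).ratio()
        if similarity < threshold:
            cleaned.append(text)
    return cleaned
```

The threshold is a judgment call: 1.0 would only catch exact copies, while a lower value also catches the slight variations reported here.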

@jax-explorer
Contributor Author

@arabcoders hello. It is true that hallucinated output is affected by this setting: it changes from repeating the last few sentences of the previous window to hotword-related sentences. But I don't think this is a side effect, because when a hallucination occurs we should be concerned with resolving the hallucination itself (e.g. by using VAD), not with its content.

@arabcoders

> @arabcoders hello. It is true that hallucinated output is affected by this setting: it changes from repeating the last few sentences of the previous window to hotword-related sentences. But I don't think this is a side effect, because when a hallucination occurs we should be concerned with resolving the hallucination itself (e.g. by using VAD), not with its content.

Hi, this was with Silero VAD filtering out the silence segments. I noticed it occurs when the voice pitch changes, e.g. right before a song. This rarely happens without this patch. The prompt resets when that happens, and because this patch adds hotwords whenever the prompt is empty, the hallucination occurs more frequently.

I suggest implementing a more state-aware injection instead of blindly adding the hotwords whenever the prompt is empty.

Thank you.
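One rough sketch of such a state-aware injection (hypothetical names and logic; not this PR's actual code) would be to inject hotwords only when the previous window produced real speech, so that a prompt reset, e.g. right before a song, does not immediately re-seed hallucination material:

```python
def maybe_inject_hotwords(hotword_tokens, previous_tokens, last_window_had_speech):
    """Hypothetical sketch of a state-aware hotword injection: only add
    the hotwords when the decoder just produced real speech, instead of
    whenever the prompt is empty (the behavior criticized above)."""
    if last_window_had_speech and previous_tokens:
        return hotword_tokens + previous_tokens
    # After a prompt reset (no prior speech), skip the hotwords so a
    # hallucinating window has nothing hotword-shaped to copy.
    return previous_tokens
```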

@jax-explorer
Contributor Author

@arabcoders hi, got it. So hotwords are triggering hallucinations where none appeared before, right? Is there a link to an audio file that reproduces this problem? I'll try to modify and test.

@arabcoders

> @arabcoders hi, got it. So hotwords are triggering hallucinations where none appeared before, right? Is there a link to an audio file that reproduces this problem? I'll try to modify and test.

Sure, try this partial clip. I couldn't upload the entire thing as it's 2h+; this is a 10-minute clip of that concert, and it shows the problem I'm talking about. You can download this clip. The hotwords I used were "the video is about #Babababambi an all girls idol group from Japan". The parameters were:

```json
{
  "task": "translate",
  "language": "Japanese",
  "temperature": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
  "best_of": 5,
  "beam_size": 5,
  "patience": 2,
  "length_penalty": null,
  "suppress_tokens": "-1",
  "initial_prompt": null,
  "condition_on_previous_text": true,
  "compression_ratio_threshold": 2.4,
  "logprob_threshold": -1.0,
  "no_speech_threshold": 0.6,
  "word_timestamps": false,
  "prepend_punctuations": "\"'“¿([{-",
  "append_punctuations": "\"'.。,,!!??::”)]}、"
}
```

@JH90iOS

JH90iOS commented Apr 30, 2024

Thanks for your PR! It's very useful to me.
I have tested it with more than 300 speech samples, and it works fine. I also did not find any significant increase in hallucinations.

@jax-explorer
Contributor Author

@JH90iOS Thanks for the confirmation. I've been busy lately and haven't yet looked into the earlier feedback about added hallucinations.

@trungkienbkhn trungkienbkhn merged commit 847fec4 into SYSTRAN:master May 4, 2024
3 checks passed
@WeiFangping

WeiFangping commented May 23, 2024

@jax-explorer Thanks for your PR! This helps me a lot. But I also encountered these two problems; please tell me if there is any solution:

  1. Hallucination: sometimes hallucinated text appears in the transcription.
  2. The timestamps change when hotwords are set: the whole speech is divided into only 2 or 3 segments instead of being divided by sentence. I only added the hotwords setting; I tried changing the VAD parameters, but it doesn't help.
     [screenshot: timestamps before setting hotwords]
     [screenshot: timestamps after setting hotwords]
     Do you have any idea about this problem?
