
hallucinated words in the output #9

Open
noah-george opened this issue Jan 18, 2024 · 10 comments

@noah-george

While running the program, irrelevant output like "okay" and "thank you" appears during periods of silence.
Is there a way to fix this, or is it inherent to faster-whisper?

@noah-george noah-george changed the title irrelevant output hallucinated words in the output Jan 18, 2024
@alesaccoia
Owner

alesaccoia commented Jan 19, 2024

I have also noticed that behaviour.

I think the particular VAD model we're using, while it fits the project well conceptually, is almost useless at the moment; it could be worth experimenting with other models.

For the time being, what I do is read the language_probability. After a bit of experimenting, I've found that a 0.9 threshold prevents basically all of the false positives.

// Open the recognition socket and filter results by language confidence.
websocketRecognition = new WebSocket(recognitionWebSocketAddress);
websocketRecognition.onmessage = function(event) {
    const response = JSON.parse(event.data);

    console.log(response.language_probability);

    // Only accept transcripts the model is confident about;
    // low-probability results are usually silence or hallucinations.
    if (response.language_probability > 0.9) {
        doSomethingWith(response.text);
    } else {
        console.log("Speech not recognized. Could be just noise or hallucinations");
    }
};

@AI-General

@alesaccoia

But when I use this for English only, language_probability is always 1.

What is the solution in this case?

@alesaccoia
Owner

I see, I didn't try that. Then maybe you could take a look at the word-level probabilities to isolate the problem?
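A minimal sketch of what that word-level filtering could look like. faster-whisper's transcribe() accepts word_timestamps=True, after which each segment exposes words with a per-word probability; the filter_words helper and the 0.5 threshold below are assumptions for illustration, not part of the project.

```python
# Sketch: drop low-confidence words from faster-whisper output.
# filter_words and the 0.5 threshold are illustrative, not from the project.

def filter_words(words, threshold=0.5):
    """Keep only (word, probability) pairs at or above the threshold."""
    return [w for w, p in words if p >= threshold]

# With faster-whisper this would be used roughly like:
#   segments, info = model.transcribe(path, word_timestamps=True)
#   for segment in segments:
#       kept = filter_words(
#           [(w.word, w.probability) for w in segment.words])

if __name__ == "__main__":
    sample = [(" okay", 0.12), (" hello", 0.97),
              (" thank", 0.08), (" world", 0.91)]
    print(filter_words(sample))
```

As the follow-up comment notes, a single threshold that separates hallucinations from real speech may not exist, so this is only a starting point for experimentation.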

@AI-General

I understand. I looked at the word-level probabilities but couldn't find a workable threshold.

@AI-General

So, do you suggest I use the multilanguage option?

@AI-General

@alesaccoia
Also, when I use multilanguage mode, will I incur additional latency?

@gongouveia

Yes, the multilingual model does add latency, since it first detects the language of the audio.
Although this extra latency is very small compared to the transcription itself.

@KZyred

KZyred commented Feb 20, 2024

Please add vad_filter=True to fix the problem.
Example:
model.transcribe(fileName, beam_size=10, language="vi", vad_filter=True)

@alesaccoia
Owner

@KZyred I didn't know that argument existed. Did you test it?

@gongouveia

Indeed it helps, but it is not a miracle option.
It does decrease latency for long audios, since silent stretches are skipped.
Other types of hallucination may still appear; play with the VAD filter options.
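For reference, faster-whisper lets you tune the Silero VAD via the vad_parameters argument of transcribe(). The specific values below are illustrative starting points, not recommendations from this thread.

```python
# Illustrative VAD tuning for faster-whisper; the values are guesses to
# experiment with, not recommended settings. Pass this dict to
# model.transcribe(..., vad_filter=True, vad_parameters=vad_parameters).
vad_parameters = {
    "threshold": 0.6,                 # stricter speech-probability cutoff
    "min_silence_duration_ms": 500,   # ignore pauses shorter than this
    "speech_pad_ms": 400,             # padding kept around detected speech
}

# Usage sketch (requires faster-whisper and a loaded model):
#   segments, info = model.transcribe(
#       "audio.wav", beam_size=10,
#       vad_filter=True, vad_parameters=vad_parameters)

print(vad_parameters)
```

Raising "threshold" makes the VAD more aggressive about discarding borderline audio, which can suppress the "okay"/"thank you" hallucinations at the cost of occasionally clipping quiet speech.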
