
hallucinated words in the output #9

Open
noah-george opened this issue Jan 18, 2024 · 10 comments

@noah-george

While running the program, irrelevant output like "okay" and "thank you" appears during periods of silence.
Is there a way to fix this, or is it inherent to faster-whisper?

@noah-george noah-george changed the title irrelevant output hallucinated words in the output Jan 18, 2024
@alesaccoia
Owner

alesaccoia commented Jan 19, 2024

I have also noticed that behaviour.

I think the particular VAD model we're using, while it fits the project well conceptually, is almost useless at the moment; it could be worth experimenting with other models.

For the time being, what I do is read the language_probability. After a bit of experimenting, I've found that a 0.9 threshold prevents basically all of the false positives.

// Open the recognition socket and filter results by language confidence.
websocketRecognition = new WebSocket(recognitionWebSocketAddress);
websocketRecognition.onmessage = function(event) {
    const response = JSON.parse(event.data);

    console.log(response.language_probability);

    // Only accept transcripts the model is confident about;
    // low-probability results are usually silence or hallucinations.
    if (response.language_probability > 0.9) {
        doSomethingWith(response.text);
    } else {
        console.log("Speech not recognized. Could be just noise or hallucinations");
    }
};

@AI-General

@alesaccoia

But when I use this for English only, language_probability is always 1.

What is the solution in this case?

@alesaccoia
Owner

I see, I didn't try that. Then maybe you could take a look at the word-level probabilities to isolate the problem?
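A minimal sketch of what that word-level filtering could look like. faster-whisper's transcribe() accepts word_timestamps=True, after which each segment exposes words with a per-word probability; the filter_words helper and the 0.5 threshold below are assumptions for illustration, not part of the project.

```python
# Sketch: drop low-confidence words from faster-whisper output.
# filter_words and the 0.5 threshold are illustrative, not from the project.

def filter_words(words, threshold=0.5):
    """Keep only (word, probability) pairs at or above the threshold."""
    return [w for w, p in words if p >= threshold]

# With faster-whisper this would be used roughly like:
#   segments, info = model.transcribe(path, word_timestamps=True)
#   for segment in segments:
#       kept = filter_words(
#           [(w.word, w.probability) for w in segment.words])

if __name__ == "__main__":
    sample = [(" okay", 0.12), (" hello", 0.97),
              (" thank", 0.08), (" world", 0.91)]
    print(filter_words(sample))
```

As the follow-up comment notes, a single threshold that separates hallucinations from real speech may not exist, so this is only a starting point for experimentation.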

@AI-General

I understand. I looked at the word-level probabilities but couldn't find a workable threshold.

@AI-General

So, do you suggest I use the multilanguage option?

@AI-General

@alesaccoia
Also, when I use multilanguage mode, will I incur additional latency?

@gongouveia

Yes, the multilingual model does add latency, since it first detects the language of the audio.
Although this extra latency is very small compared to the transcription itself.

@KZyred

KZyred commented Feb 20, 2024

Please add vad_filter=True to fix the problem.
Example:
model.transcribe(fileName, beam_size=10, language="vi", vad_filter=True)

@alesaccoia
Owner

@KZyred I didn't know that argument existed. Did you test it?

@gongouveia

Indeed it helps, but it is not a miracle option.
It does decrease latency for long audios, since silent stretches are skipped.
Other types of hallucination may still appear; play with the VAD filter options.
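For reference, faster-whisper lets you tune the Silero VAD via the vad_parameters argument of transcribe(). The specific values below are illustrative starting points, not recommendations from this thread.

```python
# Illustrative VAD tuning for faster-whisper; the values are guesses to
# experiment with, not recommended settings. Pass this dict to
# model.transcribe(..., vad_filter=True, vad_parameters=vad_parameters).
vad_parameters = {
    "threshold": 0.6,                 # stricter speech-probability cutoff
    "min_silence_duration_ms": 500,   # ignore pauses shorter than this
    "speech_pad_ms": 400,             # padding kept around detected speech
}

# Usage sketch (requires faster-whisper and a loaded model):
#   segments, info = model.transcribe(
#       "audio.wav", beam_size=10,
#       vad_filter=True, vad_parameters=vad_parameters)

print(vad_parameters)
```

Raising "threshold" makes the VAD more aggressive about discarding borderline audio, which can suppress the "okay"/"thank you" hallucinations at the cost of occasionally clipping quiet speech.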
