hallucinated words in the output #9
Comments
I have also noticed that behaviour. I think the particular VAD model we're using, while conceptually a good fit for the project, is almost useless at the moment; it could be worth experimenting with other models. For the time being, what I do is read the language_probability from the response and only use the text when it is above 0.9:
websocketRecognition = new WebSocket(recognitionWebSocketAddress);
websocketRecognition.onmessage = function(event) {
    // Each message is a JSON-encoded recognition result.
    const response = JSON.parse(event.data);
    console.log(response.language_probability);
    if (response.language_probability > 0.9) {
        doSomethingWith(response.text);
    } else {
        console.log("Speech not recognized. Could be just noise or hallucinations");
    }
}; |
But when I use this for only English, language_probability is always 1. What is the solution in this case? |
I see, I didn't try that. Then maybe you could take a look at the word-level probabilities to isolate the problem? |
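As a rough illustration of that suggestion, the sketch below averages per-word probabilities and compares the mean against a cutoff. It assumes the recognition response carries a `words` array whose entries have a numeric `probability` field. That shape, the helper names `meanWordProbability` and `isLikelySpeech`, and the 0.6 cutoff are all hypothetical and not confirmed by this thread, so the cutoff would need tuning on real audio.

```javascript
// Hypothetical helper: mean of the per-word probabilities in a response.
// Assumes response.words is an array of { word, probability } objects.
function meanWordProbability(response) {
    const words = Array.isArray(response.words) ? response.words : [];
    if (words.length === 0) {
        return 0; // No words at all: treat as not recognized.
    }
    const total = words.reduce((sum, w) => sum + w.probability, 0);
    return total / words.length;
}

// Hypothetical cutoff; not taken from the project, tune it on your own data.
const MIN_MEAN_WORD_PROBABILITY = 0.6;

// Returns true when the mean word probability reaches the cutoff.
function isLikelySpeech(response, threshold = MIN_MEAN_WORD_PROBABILITY) {
    return meanWordProbability(response) >= threshold;
}
```

In the `onmessage` handler, `isLikelySpeech(response)` could stand in for the `language_probability` check when only one language is enabled.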
I understand. I looked at the word-level probabilities but couldn't find a usable threshold. |
So, do you suggest that I use the multilingual option? |
@alesaccoia |
Yes, the multilingual model does have additional latency, because it first detects the language of the audio. |
Please add "vad_filter=True" to fix the problem |
@KZyred I didn't know that argument existed. Did you test it? |
Indeed it helps; however, it is not a miracle option. |
While the program is running, irrelevant output such as "okay" and "thank you" appears during periods of silence.
Is there a way to fix this, or is it a feature of faster-whisper?