Hallucinations and VAD [BLANK_AUDIO] Generations #45

atx-barnes · 2023-07-28T22:13:13Z

Tested with both small and tiny model sizes.

I'm using the Streaming example with VAD turned on. I've tried different settings and tried using a prompt to eliminate hallucinations and sound effects, but to no avail, and I couldn't get VAD to work properly either. I might be missing something, because it treats the hallucinated sounds like words, so it struggles to trigger VAD. Examples of outputs are below:

When I'm not talking and the background noise is low, the following gets transcribed. Ideally, it would run inference in the background and only pick up incoming audio when I'm actually talking.
[BLANK_AUDIO] [BLANK_AUDIO] [BLANK_AUDIO]

Most of the time, the tiny model hallucinates sound effects from silence or low background noise.
(wind blowing), (clicking), (barking)
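
As a stopgap, annotations like these could also be stripped from the text after transcription. Below is a minimal Python sketch of that idea; it is not a setting of this project, and `strip_non_speech` is a hypothetical helper. It assumes non-speech tags are always wrapped in square brackets or parentheses, so it would also remove legitimate parenthesized speech.

```python
import re

# Matches annotations such as [BLANK_AUDIO] or (wind blowing).
# Assumption: non-speech tags are always wrapped in [] or ().
NON_SPEECH = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def strip_non_speech(text: str) -> str:
    """Remove bracketed/parenthesized annotations and tidy the leftovers."""
    cleaned = NON_SPEECH.sub(" ", text)
    # Collapse repeated whitespace, then drop stray separators at the ends.
    return " ".join(cleaned.split()).strip(" ,")


if __name__ == "__main__":
    print(repr(strip_non_speech("[BLANK_AUDIO] [BLANK_AUDIO] [BLANK_AUDIO]")))
    print(repr(strip_non_speech("hello (clicking) world")))
```

This only hides the symptom: the model still spends inference time on silent audio, which is why a working VAD is the better fix.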

Are there any settings I can try that would help eliminate hallucinations from silence or static, or get VAD working correctly?

Great project, excited for any future features or updates.

Macoron · 2023-07-29T08:49:03Z

> I'm using the Streaming example with VAD turned on. I've tried different settings and tried using a prompt to eliminate hallucinations and sound effects, but to no avail, and I couldn't get VAD to work properly either. I might be missing something, because it treats the hallucinated sounds like words, so it struggles to trigger VAD.

Right now VAD isn't really working with streaming. It should be integrated into WhisperStream to ignore the silent parts of the audio, which are what cause the hallucinations. In theory it should also fix the problem with repeating words. There was an attempt to implement this, but it wasn't merged. See #29 for more context.
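
The idea of skipping silent audio before it reaches the model can be sketched as a simple energy gate. This is a minimal Python illustration, not WhisperStream's code: `transcribe_voiced`, the `transcribe` callback, and the `threshold` value are all hypothetical, and samples are assumed to be floats in the range -1..1.

```python
import math
from typing import Callable, Iterable, List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one audio chunk."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def transcribe_voiced(
    chunks: Iterable[Sequence[float]],
    transcribe: Callable[[Sequence[float]], str],
    threshold: float = 0.01,  # hypothetical value; tune per microphone
) -> List[str]:
    """Run `transcribe` only on chunks whose level reaches `threshold`."""
    results = []
    for chunk in chunks:
        if rms(chunk) < threshold:
            continue  # treat as silence: the model never sees this chunk
        results.append(transcribe(chunk))
    return results


if __name__ == "__main__":
    silence = [0.0] * 160
    speech = [0.5, -0.5] * 80
    # Stand-in for the real model call.
    print(transcribe_voiced([silence, speech, silence], lambda c: "speech"))
```

A fixed level threshold is crude; a dedicated VAD model would be more robust against steady background noise.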

> Are there any settings I can try that would help eliminate hallucinations from silence or static, or get VAD working correctly?

You can try to:

Enable SingleSegment
Set LengthSec to something big, like 3000

This will basically make WhisperStream repeatedly transcribe the whole audio from beginning to end, over and over again. Of course, it only makes sense if you have a really powerful CPU and a relatively short audio stream to work with.
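
To see why this only suits short streams, here is a rough back-of-the-envelope sketch in Python. It assumes every update re-transcribes everything recorded so far; `step_sec` is a hypothetical update interval, not a documented WhisperStream parameter.

```python
def total_seconds_processed(stream_sec: float, step_sec: float) -> float:
    """Total audio fed to the model if each update re-transcribes
    the whole recording so far (an assumption for illustration)."""
    steps = int(stream_sec // step_sec)
    # Update i covers i * step_sec seconds of audio.
    return sum(step_sec * i for i in range(1, steps + 1))


if __name__ == "__main__":
    # 60 s stream, 1 s updates -> 1830 s of audio processed in total
    print(total_seconds_processed(60, 1))
```

Under that assumption the work grows quadratically with the stream length, so doubling the recording time roughly quadruples the total audio the model has to process.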

BaMarcy · 2023-07-30T16:19:32Z

I can recommend the Silero VAD model, which has an ONNX version. It's state of the art and open source, BTW.

https://github.com/snakers4/silero-vad/tree/master/examples/cpp

atx-barnes · 2023-07-31T16:05:28Z

> You can try to:
>
> Enable SingleSegment
> Set LengthSec to something big, like 3000
>
> This will basically make WhisperStream repeatedly transcribe the whole audio from beginning to end, over and over again. Of course, it only makes sense if you have a really powerful CPU and a relatively short audio stream to work with.

Using the microphone record scene with VAD stop enabled and the settings you provided seems to work a lot better. Another thing I noticed is that the mic has to be pretty good for VAD to work. When I used my webcam mic, VAD struggled to stop the recording, but when I used a jack mic near my face, it worked relatively well, stopping within ~1 sec. after I was done talking.

Thanks

Macoron · 2023-07-31T16:33:03Z

> Another thing I noticed is that the mic has to be pretty good for VAD to work. When I used my webcam mic, VAD struggled to stop the recording, but when I used a jack mic near my face, it worked relatively well, stopping within ~1 sec. after I was done talking.

The VAD implementation is very basic. The original author of whisper.cpp recommends using something more robust, like @BaMarcy suggested. But that would be an extra dependency, which is out of this project's scope.

Still, you can try playing with the Vad Thd and Vad Freq Thd parameters.
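
For intuition about what these two parameters might control, here is a minimal Python sketch of a simple energy-based end-of-speech check. It assumes Vad Thd is a ratio threshold and Vad Freq Thd is a high-pass cutoff in Hz; that is an assumption, so check the project's source for the actual behavior. The function names and default values are illustrative only.

```python
import math
from typing import List, Sequence


def high_pass(samples: Sequence[float], cutoff_hz: float, sample_rate: int) -> List[float]:
    """First-order high-pass filter to suppress low-frequency hum."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def speech_ended(
    samples: Sequence[float],
    sample_rate: int = 16000,
    last_ms: int = 1000,
    vad_thd: float = 0.6,     # assumed meaning: ratio threshold
    freq_thd: float = 100.0,  # assumed meaning: high-pass cutoff in Hz
) -> bool:
    """True if the last `last_ms` of audio is much quieter than the whole buffer."""
    n_last = sample_rate * last_ms // 1000
    if n_last >= len(samples):
        return False  # not enough audio to compare against
    if freq_thd > 0:
        samples = high_pass(samples, freq_thd, sample_rate)
    energy_all = sum(abs(s) for s in samples) / len(samples)
    energy_last = sum(abs(s) for s in samples[-n_last:]) / n_last
    return energy_last <= vad_thd * energy_all


if __name__ == "__main__":
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(32000)]
    print(speech_ended(tone + [0.0] * 16000))  # tone followed by 1 s of silence
```

In a check like this, a noisy microphone keeps the level of the trailing window close to the level of the whole buffer, so the ratio rarely drops below the threshold. That would be consistent with the webcam mic struggling to stop the recording.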

Macoron · 2023-08-14T18:49:06Z

@atx-barnes VAD support for streaming was recently merged (see #49). In my tests this reduced hallucinations drastically. You might want to check this out.

atx-barnes changed the title ~~Hallucinations and [BLANK_AUDIO] Outputs~~ Hallucinations and VAD [BLANK_AUDIO] Generations Jul 28, 2023

Macoron added the enhancement New feature or request label Jul 29, 2023

Macoron mentioned this issue Aug 8, 2023

Microphone streaming #14

Closed
