
Junk output when nothing is spoken or for junk sounds #14

Open
BakingBrains opened this issue Jan 19, 2023 · 6 comments

Comments

@BakingBrains

Thanks for the cool repo! But I found that when there is no audio, the model produces junk transcriptions.
Any suggestions for improving this?

@SunnyOd

SunnyOd commented Jan 19, 2023

@BakingBrains That's a known issue with a lot of AI models, called hallucination. Unfortunately, not much can be done about it at this stage, as far as I know.

@BakingBrains
Author

Yep @SunnyOd, I knew that. I thought there might be some method to suppress the junk output. Thanks anyway 👍

@mallorbc
Owner

Perhaps some analysis could be done on the WAV audio to see whether it is silence or not.
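One way that idea could be sketched (a minimal example, not part of whisper_mic; the 0.01 RMS threshold is an assumption that would need tuning per microphone):

```python
import numpy as np

def is_silence(samples: np.ndarray, threshold: float = 0.01) -> bool:
    """Return True when the clip's RMS energy falls below `threshold`.

    `samples` is float audio scaled to [-1.0, 1.0]; the default cutoff
    is an arbitrary assumption, not a value from this thread.
    """
    rms = float(np.sqrt(np.mean(np.square(samples))))
    return rms < threshold

# Gate transcription on the check: skip Whisper entirely for silent clips.
clip = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
if not is_silence(clip):
    ...  # model.transcribe(clip) would go here
```

Since the model never sees the silent clip, it has nothing to hallucinate on.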

@SunnyOd

SunnyOd commented Aug 11, 2023

Hi. Has anyone managed to suppress the junk output? I keep getting "Thank you" and a couple of other phrases popping up during inactivity/silence. I've played with energy levels and that has helped somewhat, but I think there might be more mileage in the fix below.

I've found some threads where people have managed to fix it; this one suggests passing --suppress_tokens on the command line when running Whisper. I'm not sure how to add this flag to whisper_mic since the move to pip installation of whisper_mic. Is it worth looking into?

Thanks!
S
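For reference, the --suppress_tokens flag corresponds to the suppress_tokens argument of openai-whisper's transcribe()/DecodingOptions, where "-1" selects the library's default set of non-speech tokens. A minimal sketch of settings commonly tried against silence hallucinations (the specific threshold values here are assumptions to tune, not recommendations from this thread):

```python
# Options that map the --suppress_tokens idea onto openai-whisper's
# Python API. "-1" means "suppress the default non-speech token set".
transcribe_kwargs = {
    "suppress_tokens": "-1",              # default non-speech tokens
    "no_speech_threshold": 0.6,           # raise to drop low-confidence segments
    "condition_on_previous_text": False,  # limits repeated hallucinations
}

# Usage, assuming openai-whisper is installed:
# import whisper
# model = whisper.load_model("base")
# result = model.transcribe("audio.wav", **transcribe_kwargs)
```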

@MelvinGueneau

MelvinGueneau commented Aug 11, 2023 via email

mallorbc added a commit that referenced this issue Jan 23, 2024
@mallorbc
Owner

So this is due to hallucination. There doesn't seem to be a way to fully fix it, given how the model was trained: when there is little to no audio, it makes something up.

I added a new flag that helps with this.

I'm going to leave this issue open. If anyone finds a real solution, please ping me here. However, I think that with the flags the tool currently offers, you can find a configuration that limits hallucinations.


4 participants