[Question] How to introduce VAD to solve the problem of hallucinations #54

Open
GuuuWei opened this issue Nov 22, 2023 · 2 comments
Comments

@GuuuWei
Contributor

GuuuWei commented Nov 22, 2023

Background:
I've noticed that when processing audio files containing silent or non-speech segments, Whisper tends to generate hallucinatory content. This not only affects the segments with silence or non-human voices but also seems to impact the subsequent normal speech parts in the audio.

Inquiry:
Given that this is an inherent issue with Whisper, I am curious whether it is feasible to add a voice activity detection (VAD) step, or a similar strategy, to Whisper-turbo. I am aware of projects such as WhisperX that use this kind of approach and seem to mitigate the issue effectively.

Thank you for your time and the incredible work on this project.

@FL33TW00D
Owner

I think the approach from WhisperX is very good: https://www.robots.ox.ac.uk/~vgg/publications/2023/Bain23/bain23.pdf

Unfortunately, it is a lot of work to implement.

If whisper-turbo gets more traction and enterprise customers, it may be feasible for me to implement it.
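
For context, here is a rough sketch of the merge half of that idea, as I understand it from the paper: speech regions reported by a VAD are grouped into chunks no longer than Whisper's 30-second input window, so each window handed to the model starts and ends on speech. `SpeechSegment`, `mergeSegments`, and `maxChunkSec` are hypothetical names for illustration only, not whisper-turbo or WhisperX APIs. The sketch assumes the segments are sorted and non-overlapping, and it does not split a region that is already longer than the limit.

```typescript
// Hypothetical type: a speech region in seconds, as a VAD might report it.
interface SpeechSegment {
  start: number;
  end: number;
}

// Greedily merge neighbouring speech segments into chunks of at most
// maxChunkSec seconds. Input must be sorted by start time.
function mergeSegments(
  segments: SpeechSegment[],
  maxChunkSec = 30, // Whisper processes audio in 30-second windows
): SpeechSegment[] {
  const chunks: SpeechSegment[] = [];
  for (const seg of segments) {
    const last = chunks[chunks.length - 1];
    if (last !== undefined && seg.end - last.start <= maxChunkSec) {
      // Still fits in the current chunk: extend it to cover this segment.
      last.end = seg.end;
    } else {
      // Would exceed the limit (or this is the first segment): start a new chunk.
      chunks.push({ start: seg.start, end: seg.end });
    }
  }
  return chunks;
}
```

Each resulting chunk could then be sliced out of the audio buffer and transcribed separately, with the chunk's `start` added back to any timestamps.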

@obenjiro

@from-gu-wei
FYI: you can use https://github.com/ricky0123/vad. It looks like this library is compatible with whisper-turbo.
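
To illustrate what a VAD pre-filter does in general, here is a minimal sketch that marks frames as speech when their RMS energy crosses a threshold. This is a deliberately simple energy-based stand-in, not the model-based detection that the linked library provides, and it will misfire on loud non-speech noise. `detectSpeech`, `SpeechSegment`, `frameMs`, and `rmsThreshold` are hypothetical names, and the default threshold is an assumed value that would need tuning.

```typescript
// Hypothetical type: a detected speech region in seconds.
interface SpeechSegment {
  start: number;
  end: number;
}

// Energy-based speech detection over mono PCM samples in the range [-1, 1].
function detectSpeech(
  samples: Float32Array,
  sampleRate: number,
  frameMs = 30,
  rmsThreshold = 0.01, // assumed value; tune for the recording
): SpeechSegment[] {
  const frameLen = Math.floor((sampleRate * frameMs) / 1000);
  if (frameLen <= 0) throw new Error("frame length must be at least one sample");

  const segments: SpeechSegment[] = [];
  let segStart: number | null = null; // sample index where the open segment began

  for (let offset = 0; offset + frameLen <= samples.length; offset += frameLen) {
    // Root-mean-square energy of this frame.
    let sumSq = 0;
    for (let i = offset; i < offset + frameLen; i++) sumSq += samples[i] * samples[i];
    const isSpeech = Math.sqrt(sumSq / frameLen) >= rmsThreshold;

    if (isSpeech && segStart === null) {
      segStart = offset; // silence -> speech: open a segment
    } else if (!isSpeech && segStart !== null) {
      segments.push({ start: segStart / sampleRate, end: offset / sampleRate });
      segStart = null; // speech -> silence: close the segment
    }
  }
  // Audio ended while still in speech: close the segment at the end of the buffer.
  if (segStart !== null) {
    segments.push({ start: segStart / sampleRate, end: samples.length / sampleRate });
  }
  return segments;
}
```

Only the sample ranges covered by the returned segments would be passed on for transcription, so silent stretches never reach the model.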
