Feature Request - VAD processing for Japanese transcription #54

Open
fznx922 opened this issue Mar 17, 2023 · 3 comments

Comments

@fznx922

fznx922 commented Mar 17, 2023

Hey there Konstantin,

I currently use a branch of Whisper that includes a VAD (voice activity detection), which produces great results with Japanese.

I'm really impressed with your program and with being able to use it on an AMD device, as I'm limited to running in Colab with the previous model I use. Is there any possibility of integrating a VAD to help break down the text and stop the ghosting error I get when transcribing Japanese content with your application?

Thank you so much for your efforts :) I appreciate it.

@Bubblemint864

Honestly, I agree. Whisper seems to hallucinate a lot with Japanese, and from time to time it spirals into a never-ending death loop where a phrase is repeated on every line. A VAD is pretty much necessary for it. I would also love to see it implemented in some shape or form!

@codenan42

I would second this, especially if it included all the different types of VAD:

  • silero-vad (fast and accurate) - for real-time translation

  • silero-vad-skip-gaps (useful when there are frequent pauses in speech, such as during dictation) - for real-time translation

  • silero-vad-expand-into-gaps (best for quality) - favors accuracy and takes longer

  • periodic vad (designed to be robust to noise and able to handle non-stationary acoustic environments) - favors accuracy and takes longer

It would be nice to have, since I use Whisper for Japanese as well.
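
To make the difference between the "skip gaps" and "expand into gaps" ideas in the list above more concrete, here is a minimal sketch. It is only one plausible reading of those mode names and is not taken from any specific implementation; the function names and the `(start, end)` tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds); hypothetical layout


def skip_gaps(speech: List[Span]) -> List[Span]:
    """Keep only the detected speech spans; silence between them is dropped."""
    return sorted(speech)


def expand_into_gaps(speech: List[Span], total: float) -> List[Span]:
    """Grow each speech span to the midpoint of the silence on either side,
    so the spans together cover the whole recording (0 .. total)."""
    spans = sorted(speech)
    out: List[Span] = []
    for i, (start, end) in enumerate(spans):
        # First span reaches back to 0; others meet the previous span halfway.
        new_start = 0.0 if i == 0 else (spans[i - 1][1] + start) / 2
        # Last span reaches to the end; others meet the next span halfway.
        new_end = total if i == len(spans) - 1 else (end + spans[i + 1][0]) / 2
        out.append((new_start, new_end))
    return out
```

Under this reading, skipping gaps sends less audio to the model, while expanding into gaps keeps some context around each utterance at the cost of processing more audio.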

@JRWSP

JRWSP commented Nov 16, 2023

Hi!

I find that Whisper still has a problem with repeating lines when transcribing long files in Japanese. Since this issue is still open, I assume nobody is working on it.

Here in my repo, I made a simple script that calls Silero-VAD to filter out the silent parts and generate chunks of voice-containing audio files. We can then pass those chunks to Whisper to extract subtitles and avoid the hallucination. There is also a script to re-create a complete subtitle file from them.

The scripts are very naively written and may need more polishing, so feel free to download and modify them yourself.
But they work pretty well in my testing. :)
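
The last step described above, rebuilding one subtitle file from per-chunk transcriptions, might look roughly like the sketch below. This is not the code from the repo mentioned in this comment. It assumes each chunk's start time in the original audio is known from the VAD step, and that each chunk has already been transcribed into `(start, end, text)` segments with times relative to the chunk. The names `srt_time` and `merge_chunks` are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]   # (start_s, end_s, text), relative to its chunk
Chunk = Tuple[float, List[Segment]]  # (chunk start in original audio, its segments)


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def merge_chunks(chunks: List[Chunk]) -> str:
    """Shift every segment by its chunk's offset and number the cues in order."""
    cues = []
    index = 1
    for offset, segments in sorted(chunks, key=lambda c: c[0]):
        for start, end, text in segments:
            cues.append(
                f"{index}\n"
                f"{srt_time(offset + start)} --> {srt_time(offset + end)}\n"
                f"{text.strip()}\n"
            )
            index += 1
    return "\n".join(cues)
```

Because silence is never passed to the model in this scheme, the offsets are what keep the final subtitles aligned with the original, unedited audio.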


4 participants