Feature Request - VAD processing for Japanese transcription #54

fznx922 · 2023-03-17T03:16:22Z

Hey there Konstantin,

Currently I use a branch of Whisper that uses a VAD, which produces great results with Japanese.

I'm really impressed with your program and the ability to use it on an AMD device, as I'm limited to running in Colab with the model I used previously. Is there any possibility of integrating a VAD to help break down the text and stop the ghosting error I get when transcribing Japanese content with your application?

Thank you so much for your efforts :) appreciate it

Bubblemint864 · 2023-03-17T20:09:39Z

Honestly, I agree. Whisper seems to hallucinate a lot with Japanese, and from time to time it spirals into a never-ending loop where one phrase is repeated on every line. VAD is pretty much necessary for it. Would also love to see it implemented in some shape or form!

codenan42 · 2023-03-18T23:00:39Z

I would second this, especially if it includes all the different types of VAD:

silero-vad (fast and accurate) - suited to real-time translation
silero-vad-skip-gaps (useful where there are frequent pauses in speech, such as during dictation) - suited to real-time translation
silero-vad-expand-into-gaps (best for quality) - favors accuracy and quality, but takes longer
periodic vad (designed to be robust to noise and to handle non-stationary acoustic environments) - favors accuracy and quality, but takes longer

It would be nice to have since I use whisper for Japanese language as well.

JRWSP · 2023-11-16T03:24:24Z

Hi!

I find that Whisper still has a problem with repeating lines when transcribing long files in Japanese. Since this issue is still open, I assume nobody is working on it.

In my repo, I made a simple script that calls Silero-VAD to filter out the silent parts and generate chunks of audio that contain speech. We can then pass those chunks into Whisper to extract subtitles and avoid the hallucination. There is also a script to re-create a complete subtitle file from them.

The scripts were written very naively and may need more polishing, so feel free to download and modify them yourself.
But they work pretty well in my tests. :)
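
For anyone who wants to see the shape of this approach without opening the repo, here is a minimal sketch of the two bookkeeping steps: grouping VAD speech spans into chunks, and shifting each chunk's subtitles back onto the original timeline. It is not the code from the repo above. The names `Chunk`, `merge_speech`, `srt_time` and `build_srt`, and the `max_gap` and `pad` parameters, are hypothetical. It assumes the VAD gives a list of `{"start", "end"}` spans in seconds, and that Whisper returns, per chunk, segments with `start`, `end` and `text` relative to the start of that chunk. Running the VAD, cutting the audio and calling Whisper are left out.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # chunk start in the original audio, in seconds
    end: float    # chunk end in the original audio, in seconds


def merge_speech(timestamps, max_gap=0.5, pad=0.2):
    """Merge VAD speech spans separated by short pauses into chunks.

    `timestamps` is assumed to be a list of {"start": s, "end": e} in seconds.
    `max_gap` and `pad` are illustrative defaults, not tuned values.
    """
    chunks = []
    for ts in sorted(timestamps, key=lambda t: t["start"]):
        start = max(0.0, ts["start"] - pad)
        end = ts["end"] + pad
        if chunks and start - chunks[-1].end <= max_gap:
            # Pause is short enough: extend the previous chunk.
            chunks[-1].end = max(chunks[-1].end, end)
        else:
            chunks.append(Chunk(start, end))
    return chunks


def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def build_srt(chunks, per_chunk_segments):
    """Shift chunk-relative segments onto the original timeline as SRT text."""
    entries = []
    index = 1
    for chunk, segments in zip(chunks, per_chunk_segments):
        for seg in segments:
            text = seg["text"].strip()
            if not text:
                continue  # skip empty segments
            start = chunk.start + seg["start"]
            end = chunk.start + seg["end"]
            entries.append(
                f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"
            )
            index += 1
    return "\n".join(entries)
```

Because silence never reaches Whisper, the long silent stretches where the repeated-line loop tends to start are skipped, and the offset added in `build_srt` keeps the subtitles aligned with the original file.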

VRCWizard mentioned this issue Jun 21, 2023

Issue about Whisper Model recognition Chinese VRCWizard/TTS-Voice-Wizard#36

Closed
