encoding errors on example program #23

Open
Masaiki opened this issue Feb 14, 2023 · 3 comments

Masaiki commented Feb 14, 2023

Thank you for your contribution. The DirectML version of Whisper is much faster than the pure-CPU version of whisper.cpp, but I ran into a few issues while using it. The first is an encoding problem: in the debug output of the desktop version, the text is sometimes missing a few characters. I suspect it is being converted from UTF-8 to CP_ACP (Windows-936 / GB2312-80 on my system). Similar encoding errors also happen in the CLI, where the transcribed text comes out as almost unreadable '?' characters.
The second issue is that almost every audio file I transcribe reports the error “runFullImpl: failed to generate timestamp token - skipping one second”.
The third problem is similar to #18: after recognizing for a while it always stops working and repeatedly outputs the last recognized sentence.
If you want to track the last two problems, I can open separate issues for them.
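
For what it's worth, missing characters and '?' output are exactly what appears when UTF-8 bytes are pushed through the ANSI code page (CP_ACP). Below is only a sketch of one way a Win32 console program can avoid that, by converting to UTF-16 and writing wide characters; it is not the project's actual output path, and the sample string is made up:

    #include <windows.h>
    #include <string>

    // Sketch: print UTF-8 text (e.g. from the model) on a Windows console
    // without routing it through CP_ACP such as Windows-936.
    static void printUtf8( const char* utf8 )
    {
        // UTF-8 -> UTF-16
        const int len = MultiByteToWideChar( CP_UTF8, 0, utf8, -1, nullptr, 0 );
        if( len <= 0 )
            return;
        std::wstring wide( (size_t)len, L'\0' );
        MultiByteToWideChar( CP_UTF8, 0, utf8, -1, &wide[ 0 ], len );

        // WriteConsoleW ignores the console code page entirely.
        DWORD written = 0;
        WriteConsoleW( GetStdHandle( STD_OUTPUT_HANDLE ), wide.c_str(),
                       (DWORD)( len - 1 ), &written, nullptr );
    }

    int main()
    {
        printUtf8( "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E" );  // UTF-8 bytes for a sample Japanese word
        return 0;
    }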

Masaiki commented Feb 14, 2023

Example file that reproduces the problems above: https://drive.google.com/file/d/19WnNJLL1IThoVznUog6hyMQUwTZKU2T2/view?usp=share_link
The audio file was produced with the command line "ffmpeg -i input_video -vn -ar 16000 -ac 1 -c:a pcm_s16le output.wav".
The model is ggml-medium.bin from whisper.cpp, and the audio language is Japanese.

@rsmith02ct

I've noticed the same with Japanese: I get the same line repeated again and again, using both the medium and large models.

emcodem commented Feb 18, 2023

Feeding audio with a higher sample rate than 16 kHz helped me get rid of "runFullImpl: failed to generate timestamp token - skipping one second". I only fed 16 kHz because the CPU version of whisper.cpp required it, but Const-me's version doesn't seem to have that limit.
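For example, starting from the extraction command quoted above, a higher-rate file is just a matter of changing -ar (48 kHz and the output name are only an illustration):

    ffmpeg -i input_video -vn -ar 48000 -ac 1 -c:a pcm_s16le output48k.wav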
The "repeating" sentences AKA "sequence-to-sequence architecture failure loop" however is not influenced by the source audio rate. Instead i can influence it by feeding different parts of the source audio, e.g. when feeding just 1 minute of the portion that caused the issue, the issue does not occur but when feeding 8 minutes before more, it occurs. (Of course this must be done using a binary exact portion of the source audio, e.g. ffmpeg -codec copy). The issue is also mentioned here: openai/whisper#192
These two problems don't seem to be related, in my case.

Would anyone like to confirm that a higher audio sample rate solves the "runFullImpl" error?
As for the failure-loop issue, setting condition_on_previous_text is the recommended way to handle it, but that option is not available in this project.
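
For comparison, upstream whisper.cpp exposes the closest equivalent as the no_context flag on whisper_full_params. The following is only a sketch against whisper.cpp's C API (with a hypothetical helper name), not something this project currently exposes:

    #include "whisper.h"   // upstream whisper.cpp header

    // Sketch only: in upstream whisper.cpp, disabling conditioning on the
    // previous segment's text is the usual workaround for the repeat loop.
    static int transcribeNoContext( struct whisper_context* ctx,
                                    const float* pcm, int nSamples )
    {
        whisper_full_params params = whisper_full_default_params( WHISPER_SAMPLING_GREEDY );
        params.no_context = true;   // don't feed previous text back into the decoder
        params.language   = "ja";   // Japanese, matching the sample audio above
        return whisper_full( ctx, params, pcm, nSamples );
    }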
