Whisper / SubtitleEdit srt result differences #6386

Closed
coastal45 opened this issue Nov 6, 2022 · 18 comments

Comments

@coastal45

coastal45 commented Nov 6, 2022

After running Whisper speech to text, SE returns subtitles that can be saved as an srt file (of course).
Also, Whisper itself generates three results files (srt, txt, and vtt) that are saved in the users home directory.

While I noticed this some time ago, I never really checked them. In my case these are Japanese subtitles.

One problem I had with the srt generated by SE is there are many separate lines run together in one subtitle. This requires creating new lines and splitting the text, which takes time.

Then looking at the Whisper generated srt file, the lines are split correctly. Timing still needs some correction (a general issue regarding how whisper works) but each spoken line is separated correctly.

So I wonder how the difference happens. Text samples showing the difference are attached.

srt files.zip

srt files2.zip
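One way to quantify the difference described above is to count the cues in each file and measure the longest cue text. The following is a minimal sketch, not how SE or Whisper read SRT files; the two sample strings are made up for illustration and are not taken from the attached zip files.

```python
import re

def parse_srt(text):
    """Parse SRT text into a list of (start, end, text_lines) cues."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Locate the timing line; tolerate a missing cue number.
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = (part.strip() for part in line.split("-->"))
                cues.append((start, end, lines[i + 1:]))
                break
    return cues

def summarize(text):
    """Return the cue count and the character length of the longest cue."""
    cues = parse_srt(text)
    longest = max((len("".join(cue[2])) for cue in cues), default=0)
    return {"cues": len(cues), "longest_cue_chars": longest}

# Hypothetical samples: the same speech as one merged cue and as three cues.
SAMPLE_MERGED = (
    "1\n00:00:01,000 --> 00:00:05,000\n"
    "おはよう。元気ですか。はい、元気です。\n"
)
SAMPLE_SPLIT = (
    "1\n00:00:01,000 --> 00:00:02,000\nおはよう。\n\n"
    "2\n00:00:02,500 --> 00:00:03,500\n元気ですか。\n\n"
    "3\n00:00:04,000 --> 00:00:05,000\nはい、元気です。\n"
)

if __name__ == "__main__":
    print("merged:", summarize(SAMPLE_MERGED))
    print("split: ", summarize(SAMPLE_SPLIT))
```

A file with fewer cues and a much larger longest-cue length than its counterpart is the one where lines were run together.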

@niksedk
Member

niksedk commented Nov 9, 2022

Is this any better: #6357 (comment)

@coastal45
Author

> Is this any better: #6357 (comment)

No, it's still the same. I ran another sample attached if you want to confirm it. Japanese, small model.

I guess my point is that I wonder why the whisper generated srt file is correct (properly separated lines), but the one displayed in SE is different.
To get around this, I just open the whisper generated srt file in SE and then proceed.

So it's not really a problem for me, but I wonder why the difference occurs.

srt file3.zip

@niksedk
Member

niksedk commented Nov 11, 2022

Could you attach the audio file?

@coastal45
Author

> Could you attach the audio file?

It's in the zip file.

@niksedk
Member

niksedk commented Nov 12, 2022

SE logs a lot more in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.8/SubtitleEditBeta.zip
See error_log.txt in the SE data folder.

SE now uses the result from the output file (if SE can find it).

It seems whisper has a hard time outputting Japanese as far as I can see.

File "C:\Users\nikse\AppData\Local\Programs\Python\Python310\Scripts\whisper-script.py", line 33, in <module>
  sys.exit(load_entry_point('whisper==1.0', 'console_scripts', 'whisper')())
File "C:\Users\nikse\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 298, in cli
  result = transcribe(model, audio_path, temperature=temperature, **args)
File "C:\Users\nikse\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 204, in transcribe
  add_segment(
File "C:\Users\nikse\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 165, in add_segment
  print(f"[{format_timestamp(start)} --> {format_timestamp(end)}] {text}")
File "C:\Users\nikse\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
  return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 26-33: character maps to <undefined>
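The `UnicodeEncodeError` in this log comes from `print` writing Japanese text to a stream that encodes with cp1252, which has no mapping for those characters. The sketch below reproduces that with an in-memory stream, so it behaves the same on any platform. The helper names and the sample string are hypothetical, and this is not Whisper's or SE's code.

```python
import io

SAMPLE = "こんにちは"  # made-up Japanese text standing in for a transcript line

def write_through(text, encoding, errors="strict"):
    """Write text to an in-memory stream that encodes the way a console pipe would."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()

def try_write(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent the text."""
    try:
        return write_through(text, encoding)
    except UnicodeEncodeError:
        return None

if __name__ == "__main__":
    # cp1252 cannot represent Japanese, so a strict write fails as in the log.
    print("cp1252:", try_write(SAMPLE, "cp1252"))
    # UTF-8 can represent every Unicode character, so the same write succeeds.
    print("utf-8: ", try_write(SAMPLE, "utf-8"))
    # errors="replace" avoids the crash but loses the text.
    print("cp1252 with replace:", write_through(SAMPLE, "cp1252", errors="replace"))
```

Python selects the encoding of its standard streams from the environment. Setting `PYTHONIOENCODING=utf-8` or `PYTHONUTF8=1` before launching the process is the documented way to force UTF-8, though whether that resolves this particular SE setup is an assumption that has not been verified here.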

@coastal45
Author

coastal45 commented Nov 12, 2022

Using the same audio file with beta 95, I got this error and the run never completed.

Screenshot from 2022-11-12 12-05-46

I noticed in beta 70 a few days ago it created a Whisper folder like this:

Screenshot from 2022-11-12 12-10-45

but not in beta 90. This seems to be a handy way for SE to find the Whisper output file.

Nothing to speak of in the error log files.
error_logs.zip

Japanese might require Asian fonts or locale to be installed. I haven't had any problems like that.

@niksedk
Member

niksedk commented Nov 13, 2022

You have probably enabled whisper.cpp (the faster c++ version). To use whisper.cpp you must download/compile https://github.com/ggerganov/whisper.cpp:

  1. git clone https://github.com/ggerganov/whisper.cpp.git
  2. Compile whisper.cpp using make or cmake
  3. Copy main to [SE-DataFolder]\Whisper

You can choose between whisper.cpp and the default Python version by editing the UseWhisperCpp setting in the file Settings.xml
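To check the current value of that setting without opening the file by hand, a small script can search the XML for the element. This is a sketch under assumptions: the layout of Settings.xml is not shown in this thread, so the code looks for an element named `UseWhisperCpp` anywhere in the tree, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; the real Settings.xml layout may differ.
SAMPLE_SETTINGS = """<Settings>
  <Tools>
    <UseWhisperCpp>False</UseWhisperCpp>
  </Tools>
</Settings>"""

def read_use_whisper_cpp(xml_text):
    """Return True/False for the first UseWhisperCpp element, or None if absent."""
    root = ET.fromstring(xml_text)
    # iter() walks the root and all descendants, so nesting depth does not matter.
    for element in root.iter("UseWhisperCpp"):
        return (element.text or "").strip().lower() == "true"
    return None

if __name__ == "__main__":
    print("UseWhisperCpp:", read_use_whisper_cpp(SAMPLE_SETTINGS))
```

For a real file, read its contents with `pathlib.Path(path).read_text(encoding="utf-8")` and pass the result to the function; the location of the SE data folder depends on the installation.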

@coastal45
Author

> You have probably enabled whisper.cpp (the faster C++ version). To use whisper.cpp you must download/compile https://github.com/ggerganov/whisper.cpp:
>
> 1. `git clone https://github.com/ggerganov/whisper.cpp.git`
> 2. Compile whisper.cpp using make or cmake
> 3. Copy `main` to [SE-DataFolder]\Whisper
>
> You can choose between whisper.cpp and the default Python version by editing the UseWhisperCpp setting in the file Settings.xml

Did SE change that recently? AFAIK, I didn't do anything related. I still have just the Python version installed.
I haven't changed anything on my system since Whisper was first supported in SE.
Going back to release version 3.6.8, it works correctly.

@niksedk
Member

niksedk commented Nov 13, 2022

What is your UseWhisperCpp setting?

@niksedk
Member

niksedk commented Nov 13, 2022

Some minor improvements for whisper detection in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.8/SubtitleEditBeta.zip

I was able to get both the Python and the C++ (whisper.cpp) versions of Whisper running on Windows and Ubuntu...

@coastal45
Author

> What is your UseWhisperCpp setting?

Setting is "False".

@coastal45
Author

coastal45 commented Nov 13, 2022

> Some minor improvements for whisper detection in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.8/SubtitleEditBeta.zip
>
> I was able to get both the Python and the C++ (whisper.cpp) versions of Whisper running on Windows and Ubuntu...

I appreciate your efforts, but to keep this from getting too complicated, perhaps it would be better to keep the Python and C++ versions separate, maybe with a separate selection in the "Video" menu?

As for me, I don't need the C++ version at all (my CPU is too old, has no AVX2, and won't support it), so its operation could be skipped this way.

@niksedk
Member

niksedk commented Nov 14, 2022

OK, I've updated whisper.cpp and now just compiled with "SSE2" which I assume most CPUs support.
Not really much performance difference between whisper.cpp with SSE2/AVX2.

Does this beta work for you on Windows: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.8/SubtitleEditBeta.zip ?

@coastal45
Author

So far I haven't got the SE whisper function to work on Windows, as in this issue, #6328
Same result for this beta.

As a result I'm only running whisper on Linux (Ubuntu 20.04).

@niksedk
Member

niksedk commented Nov 16, 2022

OK, still trying to make Windows version easier to use, is this easier to get working: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.8/SubtitleEditBeta.zip ?

@coastal45
Author

coastal45 commented Nov 17, 2022

I still get "No text found!". Error log is attached. Looks like it only took ~1 sec.
That was not enough time to check the F2 log. As before, running Whisper from the command line works correctly, though I can't use the exact wav file from the temp directory because SE deletes it. Could it be kept, or saved in an SE directory rather than the user temp location?
That way, the exact command SE sends to Whisper could be copied and run on a command line.

Anyway, I think some extra check points or logging are needed.

This same problem also happened in release version 3.6.8, though it ran about 10 sec. longer before failing. So I think the problem is more basic, but I can't tell why.

error_log.txt
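The request above, keeping the wav file and logging the exact command so it can be replayed by hand, can be sketched as follows. SE itself is not written in Python, so this only illustrates the idea. The function names and the keep directory are hypothetical, and the command SE actually sends is not shown in this thread; `--model` and `--language` are options of the Python `whisper` command-line tool, and the values match the "Japanese, small model" case mentioned earlier.

```python
import shlex
import shutil
import tempfile
from pathlib import Path

def build_whisper_command(wav_path, model="small", language="Japanese"):
    """Build an argument list for the Python whisper CLI (assumed invocation)."""
    return ["whisper", str(wav_path), "--model", model, "--language", language]

def keep_copy(wav_path, keep_dir):
    """Copy the temporary wav file to a directory that is not cleaned up."""
    keep_dir = Path(keep_dir)
    keep_dir.mkdir(parents=True, exist_ok=True)
    target = keep_dir / Path(wav_path).name
    shutil.copy2(wav_path, target)
    return target

def describe_run(wav_path, keep_dir):
    """Keep the wav file and return a copy-pasteable command line for it."""
    kept = keep_copy(wav_path, keep_dir)
    # shlex.join quotes for POSIX shells; Windows cmd.exe quoting differs.
    return shlex.join(build_whisper_command(kept))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wav = Path(tmp) / "audio.wav"
        wav.write_bytes(b"")  # placeholder file, not real audio
        print(describe_run(wav, Path(tmp) / "kept"))
```

Writing the returned string to the error log before starting the process would give a command that can be pasted into a terminal on Linux to reproduce the run.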

@niksedk niksedk mentioned this issue Nov 18, 2022
@hendrack

> I still get "No text found!". Error log is attached. Looks like it only took ~1 sec. Not enough time to check the F2 log. As before, command line operation works correctly, though I can't use exact wav file in temp directory as you delete it. Can it not be deleted? Or use/save it in an SE directory rather than user temp location? Reason being, then you can copy exact command SE sends to whisper on a command line.
>
> Anyway, I think some extra check points or logging are needed.
>
> This same problem also happened in release version 3.6.8, though it ran about 10 sec. longer before failing. So I think the problem is more basic, but I can't tell why.
>
> error_log.txt

Go here for Linux fix: #6433 (comment)
Windows howto: #6324 (comment)

@niksedk
Member

niksedk commented Nov 20, 2022

OK, the latest beta (probably very close to SE 3.6.9) now has a small context menu that makes it easier to switch between the Python and C++ versions and allows keeping temp files.

Beta link: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.8/SubtitleEditBeta.zip

image

I'm pretty sure not all videos or all people will/can work with whisper...

@niksedk niksedk closed this as completed Nov 22, 2022