word_timestamps parameter #12

Open
lucydjo opened this issue Sep 6, 2023 · 4 comments
Comments

lucydjo commented Sep 6, 2023

Hello,

Is it possible to generate and synchronize subtitles with Whisper's "word_timestamps" parameter?

Thank you!

EtienneAb3d (Owner) commented

@lucydjo
It's the main goal of WhisperTimeSync.
Did I misunderstand something in your question?

EtienneAb3d (Owner) commented

@lucydjo
OK, I understand now.
Some modifications have to be made to the data pre-processing.

lucydjo (Author) commented Sep 6, 2023

I used "java -Xmx2G -jar WhisperTimeSync/distrib/WhisperTimeSync.jar before_correct.srt original_text.txt fr" and it works great! But I have a problem with the output file.

See a sample of my data: https://gist.github.com/lucydjo/9ffea6ac4b60cd5a9c7b5fec7cb5126a

As you can see, there are empty lines and formatting problems. Do you know what I'm doing wrong? Thank you very much!

EtienneAb3d (Owner) commented Sep 7, 2023

@lucydjo
First of all, I do not understand why you do not get the exact, full original text in the output (especially the beginning, which is missing from the SRT). I suppose you truncated the result.
You get a blank at timestamp 5 because WhisperTimeSync treats "l'Ozone" as a single word, while your timestamps split it into two parts. It thus matches "l'Ozone" with "Ozone", leaving the "l'" part empty.
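
One possible workaround on the user's side, sketched here in Python, is to merge such split cues before running WhisperTimeSync, so that "l'" and "Ozone" arrive as a single "l'Ozone" cue. This is only a sketch, not part of WhisperTimeSync: the `Cue` class, the `merge_elided_cues` helper, and the rule "a cue ending with an apostrophe belongs to the next cue" are all assumptions made for illustration, and the timestamps are invented.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """One SRT cue (hypothetical structure for this sketch)."""
    start: str  # SRT timestamp, e.g. "00:00:01,200"
    end: str
    text: str


# Assumed rule: a cue whose text ends with an apostrophe (l', d', qu', ...)
# is the first half of a word that was split by word-level timestamps.
ELISION = re.compile(r"['\u2019]$")


def merge_elided_cues(cues):
    """Merge each cue ending in an apostrophe into the cue that follows it."""
    merged = []
    pending = None  # cue waiting to be glued onto the next one
    for cue in cues:
        if pending is not None:
            # Keep the start of the first half and the end of the second half.
            cue = Cue(pending.start, cue.end,
                      pending.text.rstrip() + cue.text.lstrip())
            pending = None
        if ELISION.search(cue.text.strip()):
            pending = cue
        else:
            merged.append(cue)
    if pending is not None:
        # Trailing elided cue with nothing after it: keep it unchanged.
        merged.append(pending)
    return merged


# Illustrative cues only; these timestamps are invented, not taken from the gist.
cues = [
    Cue("00:00:01,000", "00:00:01,200", "dans"),
    Cue("00:00:01,200", "00:00:01,300", "l'"),
    Cue("00:00:01,300", "00:00:01,900", "Ozone"),
]
for cue in merge_elided_cues(cues):
    print(cue.start, "-->", cue.end, cue.text)
```

The merged cue keeps the start time of the "l'" part and the end time of the "Ozone" part, so no time range is lost. Whether this matches how WhisperTimeSync tokenizes other languages or other apostrophe cases is untested.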
