[Feature request] Generate srt file together with wave file #1674

Closed
hengway opened this issue Jun 21, 2022 · 5 comments
Labels
feature request feature requests for making TTS better.

Comments

@hengway

hengway commented Jun 21, 2022

🚀 Feature Description
Generate srt file together with wave file

Solution
I need to generate an SRT file, and I found this solution: https://github.com/abhirooptalasila/AutoSub, which also uses Coqui AI. So I think it might be good to have an option to generate the SRT file during wave generation.

hengway added the "feature request" label (feature requests for making TTS better) on Jun 21, 2022
@p0p4k
Contributor

p0p4k commented Jun 21, 2022

Looks interesting and quick to integrate. Thanks for sharing.

@p0p4k
Contributor

p0p4k commented Jun 24, 2022

FYI, the .srt file generation is a part of STT and not TTS.

@erogol
Member

erogol commented Jun 27, 2022

Thanks for taking the time for this feature request.
It is out of scope for 🐸TTS since it requires a lot of new components to run the STT model here.

@gwpl

gwpl commented Apr 10, 2023

I had also hoped there would be an easy way to generate SRT, VTT, or similar files along with the WAV.

Otherwise, let me drop a workaround here for other readers looking for solutions (maybe worth adding to a "tips & tricks" section of the documentation):

use whisper.cpp in the following steps:

  • Step 1: ensure that .wav is 16kHz mono:

    • ffmpeg -i "$filepath" -ac 1 -ar 16000 -acodec pcm_s16le "${base}.wav"
  • Step 2: transcribe with whisper and generate extra files

    • whisper.cpp --model /usr/share/whisper.cpp-model-large/large.bin -otxt -ovtt -osrt -ocsv -ml 1 "$filepath"

    Last but not least!
    I just discovered that Ogg allows embedding "Lyrics" in LRC or SRT format in files!

    Therefore one can add the generated transcript with timestamps directly into the audio file:

    • oggenc "$filepath" --lyrics "${filepath}.srt" -o "${base}.ogg" (or use the .oga extension: .ogg is the older convention for all Ogg files; the organization now specifies .oga for audio, .ogv for video, etc.)
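For readers who want to script the three steps above, here is a minimal Python sketch that only assembles the command lines quoted in the comment and optionally runs them. The function names `build_commands` and `run_pipeline` are hypothetical. The `whisper.cpp` binary name and the model path are copied from the comment and will differ between installations. One assumption differs from the commands as posted: the sketch feeds the converted 16 kHz WAV from step 1 into steps 2 and 3, and it assumes whisper.cpp writes its SRT output next to the input as `<input>.srt`.

```python
import subprocess
from pathlib import Path

# Both values are taken from the comment above; adjust them for your system.
WHISPER_BIN = "whisper.cpp"
WHISPER_MODEL = "/usr/share/whisper.cpp-model-large/large.bin"


def build_commands(filepath):
    """Return the ffmpeg, whisper.cpp and oggenc commands for one audio file."""
    src = Path(filepath)
    wav = src.with_suffix(".wav")      # "${base}.wav" in the comment
    if wav == src:
        # ffmpeg cannot read from and write to the same path.
        raise ValueError("input is already named like the output WAV")
    srt = Path(str(wav) + ".srt")      # assumed whisper.cpp output name
    ogg = src.with_suffix(".ogg")      # "${base}.ogg" in the comment
    return [
        # Step 1: convert to 16 kHz mono, 16-bit PCM.
        ["ffmpeg", "-i", str(src), "-ac", "1", "-ar", "16000",
         "-acodec", "pcm_s16le", str(wav)],
        # Step 2: transcribe and write txt, vtt, srt and csv files.
        [WHISPER_BIN, "--model", WHISPER_MODEL,
         "-otxt", "-ovtt", "-osrt", "-ocsv", "-ml", "1", str(wav)],
        # Step 3: embed the SRT transcript as lyrics in an Ogg file.
        ["oggenc", str(wav), "--lyrics", str(srt), "-o", str(ogg)],
    ]


def run_pipeline(filepath):
    """Run the three commands in order, stopping at the first failure."""
    for cmd in build_commands(filepath):
        subprocess.run(cmd, check=True)
```

Calling `build_commands("talk.mp3")` lets you inspect the commands before running anything; `run_pipeline("talk.mp3")` requires ffmpeg, whisper.cpp and oggenc to be installed.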

@shigabeev

This library retrieves the alignments between phonemized text (tokens). So if you want to get the timings, especially at the sentence level, it's relatively straightforward to do.

But straightforward doesn't mean easy. There are so many steps involved between tokenization and the output sequence. I had to adjust each of them to get the timings right.
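Once sentence-level timings are available, writing the SRT file itself is the simple part. The sketch below assumes you already have a list of `(start_seconds, end_seconds, text)` tuples; how to extract them from the alignments is not shown here, and the helper names `format_timestamp` and `to_srt` are hypothetical, not part of 🐸TTS.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    print(to_srt([(0.0, 1.25, "Hello."), (1.25, 2.0, "World.")]))
```

The result can be written next to the WAV file with an `.srt` extension.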
