[Feature request] Generate srt file together with wave file #1674

hengway · 2022-06-21T08:35:22Z

🚀 Feature Description
Generate srt file together with wave file

Solution
I need to generate an SRT file, and I found this solution: https://github.com/abhirooptalasila/AutoSub, which also uses Coqui AI. So I think it would be good to have an option to generate an SRT file during wave generation.

p0p4k · 2022-06-21T10:16:09Z

Looks interesting and quick to integrate. Thanks for sharing.

p0p4k · 2022-06-24T01:16:49Z

FYI, the .srt file generation is a part of STT and not TTS.

erogol · 2022-06-27T08:43:12Z

Thanks for taking the time to write this feature request.
It is out of scope for 🐸TTS, since running the STT model here would require a lot of new components.

gwpl · 2023-04-10T21:08:39Z

I hoped as well that maybe it would be easier to generate SRT / VTT or other files along with WAV.

Otherwise, for other readers looking for a solution (maybe something worth adding to a documentation "tips & tricks" section):

Use whisper.cpp (https://github.com/ggerganov/whisper.cpp) with the flags -osrt, -ovtt, etc.
- e.g. https://aur.archlinux.org/packages/whisper.cpp-model-large and https://aur.archlinux.org/packages/whisper-git

in the following steps:

Step 1: ensure that .wav is 16kHz mono:
- ffmpeg -i "$filepath" -ac 1 -ar 16000 -acodec pcm_s16le "${base}.wav"
Step 2: transcribe with whisper and generate extra files
- whisper.cpp --model /usr/share/whisper.cpp-model-large/large.bin -otxt -ovtt -osrt -ocsv -ml 1 "$filepath"
Last but not least!
I just discovered that Ogg allows adding "Lyrics" in LRC or SRT format to files!

Therefore one can embed the generated transcript with timestamps directly in the audio file:
- oggenc "$filepath" --lyrics "${filepath}.srt" -o "${base}.ogg" (or use the .oga extension: .ogg is the old convention for all Ogg files, while the organization now specifies .oga for audio, .ogv for video, etc.)

shigabeev · 2023-04-17T09:47:54Z

This library retrieves the alignments for the phonemized text (tokens). So if you want to get the timings, especially at the sentence level, it's relatively straightforward to see how to do it.

But straightforward doesn't mean easy. There are so many steps involved between tokenization and the output sequence. I had to adjust each of them to get the timings right.
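
For readers who only need sentence-level timings, here is a minimal sketch of a simpler alternative to the alignment-based approach described above. It assumes each sentence was synthesized separately, the resulting waveforms were concatenated without gaps, and the number of samples per sentence is known. The function names (`format_timestamp`, `segments_from_lengths`, `to_srt`) are hypothetical and not part of 🐸TTS; only the SRT layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
from typing import Iterable, List, Sequence, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_from_lengths(
    sentences: Sequence[str], sample_counts: Sequence[int], sample_rate: int
) -> List[Segment]:
    """Derive timings from per-sentence sample counts.

    Assumption: sentence i produced sample_counts[i] samples and the
    waveforms were concatenated back to back, with no added silence.
    """
    segments = []
    cursor = 0  # running offset in samples
    for text, count in zip(sentences, sample_counts):
        start = cursor / sample_rate
        end = (cursor + count) / sample_rate
        segments.append((start, end, text))
        cursor += count
    return segments


def to_srt(segments: Iterable[Segment]) -> str:
    """Render (start, end, text) segments as the contents of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Blocks are separated by one blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Illustrative numbers only: two sentences of 1.5 s each at 22050 Hz.
    demo = segments_from_lengths(["Hello.", "World."], [33075, 33075], 22050)
    print(to_srt(demo))
```

This does not give word-level timings, and any silence trimmed or inserted between sentences would have to be added to the running offset.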

hengway added the "feature request" label (feature requests for making TTS better) Jun 21, 2022

erogol closed this as completed Jun 27, 2022

bropines mentioned this issue Jan 13, 2023

[Feature request] voicing by SRT and ASS files #2285

Closed
