Timestamping transcriptions? #234

stuartpb · 2024-06-12T03:11:47Z

Would it be possible to add an option to delimit the transcribed output as timestamp-prefixed lines, or to attach some other mark or metadata indicating when each word occurs in the source media?
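Here is a rough Python sketch of the kind of output I have in mind. It assumes the transcriber can report a start time, in seconds, for each word. The `Word` type and the function names are hypothetical and are not part of this project's API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A transcribed word and its start time in seconds (hypothetical structure)."""
    text: str
    start: float


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, truncating any fractional part."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_lines(words: list[Word], line_duration: float) -> list[str]:
    """Group words into line_duration-second lines, each prefixed with the line's start time."""
    buckets: dict[int, list[str]] = {}
    for word in words:
        # Index of the line_duration-sized window this word starts in.
        index = int(word.start // line_duration)
        buckets.setdefault(index, []).append(word.text)
    return [
        f"[{format_timestamp(index * line_duration)}] {' '.join(texts)}"
        for index, texts in sorted(buckets.items())
    ]
```

For example, with `line_duration=5` and words starting at 0.4 s, 1.2 s, and 5.3 s, the output would be `[00:00:00] hello world` followed by `[00:00:05] again`. Windows that contain no words produce no line.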

If there isn't any way to surface this from the lower-level implementation, this is how I was thinking I could hack it:

1. Split the audio into chunks of lineDuration seconds, where lineDuration is the number of seconds to elapse between lines (for example, 5 or 10).
2. Get the transcript for each of those spans.
3. To ensure no words get cut at a clip boundary, produce a transcript covering gapSpan seconds of audio on either side of each cut boundary. Here gapSpan is the amount of time within which we expect the transcription to stabilize; I'd guess something like four seconds would be fine.
   - If the middle of the seam transcript conflicts with the transcript of the two sections concatenated, replace the words at the ends of the lines with the words transcribed from the seam, splitting them between the two lines in roughly balanced proportion.
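The steps above could be sketched in Python roughly as follows. This sketch makes several assumptions. The actual transcription call is left out because it depends on the project's API. Transcripts are treated as plain lists of words. The seam merge uses `difflib.SequenceMatcher` to align the seam transcript against the concatenated lines, which is a swapped-in technique and only one possible reading of "conflicts in its middle." `plan_chunks` and `merge_seam` are hypothetical helpers, and `line_duration` and `gap_span` stand in for lineDuration and gapSpan.

```python
import difflib
import math


def plan_chunks(total: float, line_duration: float, gap_span: float):
    """Return (line_spans, seam_spans) as lists of (start, end) times in seconds.

    line_spans cover the audio in consecutive line_duration-second pieces;
    seam_spans cover gap_span seconds on either side of each cut, clamped to the audio.
    """
    n_lines = math.ceil(total / line_duration)
    cuts = [i * line_duration for i in range(1, n_lines)]
    bounds = [0.0, *cuts, total]
    line_spans = list(zip(bounds[:-1], bounds[1:]))
    seam_spans = [(max(0.0, c - gap_span), min(total, c + gap_span)) for c in cuts]
    return line_spans, seam_spans


def merge_seam(left: list[str], right: list[str], seam: list[str]) -> tuple[list[str], list[str]]:
    """Patch the words around one cut using the transcript of the seam audio.

    left and right are the words of the two lines meeting at the cut; seam is the
    transcript of the audio spanning that cut.
    """
    joined = left + right
    matcher = difflib.SequenceMatcher(a=joined, b=seam, autojunk=False)
    blocks = [m for m in matcher.get_matching_blocks() if m.size > 0]
    if not blocks:
        return left, right  # seam shares no words with the lines; leave them alone

    # Region between the first and last words on which the seam agrees with the lines.
    a_start, b_start = blocks[0].a, blocks[0].b
    a_end = blocks[-1].a + blocks[-1].size
    b_end = blocks[-1].b + blocks[-1].size
    old, new = joined[a_start:a_end], seam[b_start:b_end]
    if old == new:
        return left, right  # no conflict in the middle of the seam

    # Give each line a share of the seam's words proportional to its share of the replaced words.
    boundary = len(left)
    old_from_left = max(0, min(boundary, a_end) - a_start)
    share = round(len(new) * old_from_left / len(old))
    merged = joined[:a_start] + new + joined[a_end:]
    cut = min(boundary, a_start) + share + max(0, boundary - a_end)
    return merged[:cut], merged[cut:]
```

Because only the part of the seam between its first and last agreeing words is used, unstable words at the seam's own edges are ignored. One caveat is that a spurious one-word match, such as a repeated "the", could widen the replaced region, so a real version would probably want a minimum match length.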


stuartpb · 2024-06-12T08:51:17Z

I see now that #211 links to a fork with word-level timestamps: it looks like someone still needs to submit a pull request?
