Timestamping transcriptions? #234

Open
stuartpb opened this issue Jun 12, 2024 · 1 comment

Comments

@stuartpb

Would it be possible to add an option to emit the transcribed output as timestamp-prefixed lines, or with some other mark or metadata indicating when each word occurs in the source media?

This is the way I was thinking I could hack it, if there isn't any way to surface this from the lower-level implementation:

  • Split the audio into chunks of lineDuration seconds (where lineDuration is the number of seconds that elapse between each line, like 5 or 10).
  • Get the transcript for each of those chunks of audio.
  • To ensure no words get cut at a chunk boundary, also produce a transcript of the seam: the gapSpan seconds of audio on either side of the boundary (where gapSpan is a window within which we expect the transcription to become stable; I would guess something like four seconds would be fine).
    • If the middle of the seam transcript conflicts with the concatenation of the two chunk transcripts, replace the words at the ends of those lines (in roughly balanced proportion) with the transcribed words from the seam.
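
The chunk and seam windows described above could be computed roughly as follows. This is only a sketch of the bookkeeping, not anything this project provides: chunk_spans, seam_spans, format_timestamp, and timestamped_lines are hypothetical helper names, and transcribe is a placeholder callable standing in for whatever actually transcribes a span of audio. The seam reconciliation step itself is left out.

```python
from typing import Callable, List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def chunk_spans(total: float, line_duration: float) -> List[Span]:
    """Split [0, total) into consecutive spans of line_duration seconds."""
    if line_duration <= 0:
        raise ValueError("line_duration must be positive")
    spans: List[Span] = []
    start = 0.0
    while start < total:
        end = min(start + line_duration, total)
        spans.append((start, end))
        start = end
    return spans


def seam_spans(chunks: List[Span], gap_span: float, total: float) -> List[Span]:
    """For each interior chunk boundary, return the window covering
    gap_span seconds on either side, clamped to the audio length."""
    seams: List[Span] = []
    for _, boundary in chunks[:-1]:
        seams.append((max(0.0, boundary - gap_span),
                      min(total, boundary + gap_span)))
    return seams


def format_timestamp(seconds: float) -> str:
    """Render a time offset as HH:MM:SS, dropping fractional seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_lines(total: float, line_duration: float,
                      transcribe: Callable[[float, float], str]) -> List[str]:
    """Prefix each chunk's transcript with the chunk's start time.

    transcribe(start, end) is a hypothetical callable that returns the
    text for that span of audio; it is an assumption of this sketch.
    """
    return [f"[{format_timestamp(start)}] {transcribe(start, end)}"
            for start, end in chunk_spans(total, line_duration)]
```

With a 12-second clip and lineDuration of 5, this yields chunks at 0-5, 5-10, and 10-12 seconds, and with gapSpan of 2 the seam windows are 3-7 and 8-12 seconds. Each seam window would then be transcribed separately and compared against the ends of the two lines it straddles.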
@stuartpb
Author

I see now that #211 links to a fork with word-level timestamps: it looks like someone still needs to submit a pull request?
