Script: parsing transcript .srt files into readable text #76

Open
Jamoverjelly opened this issue Jun 27, 2018 · 1 comment

Comments

@Jamoverjelly

Hello,

I am working through an online class and trying to produce notes based on the instructional video content. Since many of the concepts covered in these videos are worth taking note of, I'm finding myself writing out nearly every line spoken by the instructor. Obviously, this process is laborious and extremely time-consuming. I am wondering if there is an easier way to extract the text from these videos using an srt tool to help parse and modify the text.

The syntax of the transcript file for each video is identical to the standard srt format. Here's an example:

1
00:00:00,710 --> 00:00:03,220
Rob just showed us how we can
make things accessible to

2
00:00:03,220 --> 00:00:05,970
anyone who can't use a mouse or
pointing device.

3
00:00:05,970 --> 00:00:09,130
Whether that's because it's any
type of physical impairment or

4
00:00:09,130 --> 00:00:11,510
a technology issue or
simply personal preference.

Does pysrt currently provide any tools for reformatting the text content into something more readable? To clarify, for the above example, I would like to remove the blank lines and the lines containing the record number and time-stamp, then join the remaining lines with spaces (so that a space follows each period), like so:

Rob just showed us how we can make things accessible to anyone who can't use a mouse or pointing device. Whether that's because it's any type of physical impairment or a technology issue or simply personal preference.

I would like to produce the output above from the example and be able to apply the same transformation to the other files in the series. I am pretty rusty with Python at the moment, though I believe this could be implemented fairly easily with common string methods.

Can anyone contributing to this project let me know how this is done or if the functionality already exists in pysrt?

Thanks!

@whoizit

whoizit commented May 4, 2019

@Jamoverjelly https://gist.github.com/whoizit/c54f916c1c6d78ad5ac88cf4735c9d7d
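
For reference, the transformation described in the question can be sketched with the standard library alone. This is an independent sketch, not the contents of the linked gist and not a pysrt API: the function name `srt_to_text` and the `TIMESTAMP` pattern are made up for illustration, and it assumes each cue follows the usual layout (number line, timing line, then text lines) with cues separated by blank lines.

```python
import re

# Matches an SRT timing line such as "00:00:00,710 --> 00:00:03,220".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Drop cue numbers, timing lines and blank lines; join the rest with spaces.

    Hypothetical helper for illustration; assumes well-formed SRT input.
    """
    # Normalize line endings and strip a leading byte-order mark if present.
    srt = srt.lstrip("\ufeff").replace("\r\n", "\n").strip()

    kept = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        # The first line of a cue is its record number.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # The next line is the timing line.
        if lines and TIMESTAMP.match(lines[0]):
            lines = lines[1:]
        kept.extend(lines)

    # Joining with single spaces also puts a space after each period
    # that ended a subtitle line.
    return " ".join(kept)
```

To apply it to a file, read the file into a string first, for example `srt_to_text(open("lesson.srt", encoding="utf-8").read())`, where `lesson.srt` is a placeholder file name. Looping over the other files in the series is then a matter of calling the same function on each one.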
