-
Notifications
You must be signed in to change notification settings - Fork 107
Video Transcripts
Videos are one of our official Academy content types and, in order to keep our resources inclusive, we aim to include a transcript with each video we publish.
Our videos are hosted on YouTube which generates video transcripts automatically. However, there isn’t an easy way to copy the transcript of a video while retaining the timestamp URLs.
We were stuck doing this manually until Adrian kindly assembled the following Python script which does this for us.
import sys
import re
def is_time_format(line):
return re.match(r"^\d+:\d{2}$", line.strip())
def time_to_seconds(time_str):
minutes, seconds = time_str.split(":")
return int(minutes) * 60 + int(seconds)
def convert_line(line, code):
if is_time_format(line): # Check if it's a time line
time_str = line.strip()
seconds = time_to_seconds(time_str)
return f'<a href="https://youtu.be/{code}?t={seconds}" target="_blank">{time_str}</a> '
else:
return line
def main():
if len(sys.argv) < 3:
print("Usage: python script_name.py <input_file> <youtube_code>")
sys.exit(1)
input_file = sys.argv[1]
code = sys.argv[2]
output_file = "output.txt"
with open(input_file, 'r') as fin, open(output_file, 'w') as fout:
lines = fin.readlines()
for i, line in enumerate(lines):
fout.write(convert_line(line, code))
if __name__ == "__main__":
main()
Copy this script to a file (in the following example, convert.py). Then, go to the Youtube video and copy the transcript to an input file (input.txt). Then, retrieve the video’s code. This is an 11-digit string at the end of a YouTube video URL. For example, the code for a YouTube video with the URL https://www.youtube.com/watch?v=rqIcDrg1XOs is rqIcDrg1XOs.
After this, run a command like the following to run the script and add hyperlinks to the plaintext transcript:
python3 convert.py input.txt rqIcDrg1XOs
This will output the updated transcript (including hyperlinks) to an output file named output.txt.
Be sure to review the final transcript for any errors before publishing. Although YouTube generally does a fair job with generating transcription data, it isn’t always perfect.
If you have any questions or suggestions regarding our Style Guide, please create an issue and we will follow up accordingly.