# **Generate timestamps for videos using AssemblyAI**

Chanin Nantasenamat, PhD

[Data Professor YouTube channel](https://youtube.com/dataprofessor)

> In a nutshell, you're building a Python workflow for generating video timestamps using AssemblyAI's LeMUR and Claude 3.5 Sonnet.

## Install prerequisites

In [None]:
! apt-get install ffmpeg

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.


In [None]:
! pip install yt-dlp assemblyai



## Load API key

In [None]:
from google.colab import userdata
import assemblyai as aai

aai.settings.api_key = userdata.get('AAI_KEY')
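
Outside Colab, `google.colab.userdata` is not available. A minimal sketch of one alternative, assuming the key is stored in an environment variable named `AAI_KEY`; the helper `read_api_key` is hypothetical and not part of the AssemblyAI SDK:

```python
import os

def read_api_key(name="AAI_KEY", env=None):
    """Return the API key stored under `name`, or raise if it is missing."""
    # Assumption: the key lives in an environment variable; `env` can be
    # swapped for a plain dict when testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

The result could then be assigned with `aai.settings.api_key = read_api_key()`.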

## Retrieving audio from a YouTube video

We'll start by downloading the audio track of the YouTube video with the `yt_dlp` Python library. Its `FFmpegExtractAudio` post-processor uses ffmpeg to convert the audio to MP3.

In [None]:
import os
import yt_dlp

# Download the best available audio stream and convert it to MP3 with ffmpeg
def download_audio(url):
    ydl_opts = {
        'format': 'bestaudio/best',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
        'outtmpl': '%(title)s.%(ext)s',
        'verbose': True,
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=True)
        # prepare_filename applies the same title sanitization yt-dlp uses on disk
        source_path = ydl.prepare_filename(info)

    # The post-processor replaces the original extension with .mp3
    return os.path.splitext(source_path)[0] + '.mp3'

URL = "https://www.youtube.com/watch?v=UF8uR6Z6KLc"
audio_file = download_audio(URL)

## Generating the timestamps

1. Transcribe the audio file
2. Split the transcript into paragraphs, then combine the paragraphs into groups
3. Assign start and end timestamps to each group
4. Generate the final timestamps

Here, we apply the `get_paragraphs()` method to the `transcript` object, which returns the transcript split into paragraphs, each with its text and its start and end times in milliseconds.



In [None]:
# Generate transcript
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_file)

# Group the paragraphs in pairs and attach start/end timestamps (in milliseconds) to each group
paragraphs = transcript.get_paragraphs()
combined_paragraphs = []
step = 2

for i in range(0, len(paragraphs), step):
    paragraph_group = paragraphs[i : i + step]
    start = paragraph_group[0].start
    end = paragraph_group[-1].end
    text = ""
    for paragraph in paragraph_group:
        text += f"{paragraph.text} "
    combined_paragraphs.append(f"Paragraph: {text} Start: {start} End: {end}")

In [None]:
combined_paragraphs

["Paragraph: This program is brought to you by Stanford University. Please visit us@stanford.edu thank you. I'm honored to be with you today for your commencement from one of the finest universities in the world.  Start: 7560 End: 32260",
 "Paragraph: Truth be told, I never graduated from college, and this is the closest I've ever gotten to a college graduation. Today, I want to tell you three stories from my life. That's it. No big deal. Just three stories. The first story is about connecting the dots.  Start: 35680 End: 58950",
 "Paragraph: I dropped out of Reed College after the first six months, but then stayed around as a drop in for another 18 months or so before I really quit. So why did I drop out? It started before I was born. My biological mother was a young, unwed graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates. So everything was all set for me to be adopted at birth by a lawyer and his wife, e


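In the loop above, we step through the paragraphs two at a time (`step = 2`), concatenate the text of each group, and label the group with the start time of its first paragraph and the end time of its last.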
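
The `Start` and `End` values in each group are in milliseconds, while the final chapters use a minutes:seconds format. A minimal sketch of that conversion, in case you want to check the LLM's timestamps against the transcript; `format_timestamp` is a hypothetical helper, and truncating to whole seconds is an assumption:

```python
def format_timestamp(ms):
    """Convert milliseconds to a minutes:seconds string, e.g. 7560 -> '0:07'."""
    total_seconds = ms // 1000  # drop the fractional second
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"

print(format_timestamp(7560))   # 0:07
print(format_timestamp(35680))  # 0:35
```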

Finally, we generate the final timestamps by sending each paragraph group to LeMUR, selecting Claude 3.5 Sonnet as the model via `aai.LemurModel.claude3_5_sonnet`.

In [None]:
# Generate the final timestamps, one LeMUR request per paragraph group
results = []

for paragraph in combined_paragraphs:
    result = aai.Lemur().task(
        prompt="Generate chapters of key topics in the audio and also provide the start timestamps in minutes:seconds format. Please put the timestamps before the topic, for example, '0:00 Introduction'. Please don't generate 'Notes', 'Based on the given transcript provided' or 'Here are the', I want only the timestamps and topics.",
        input_text=paragraph,
        final_model=aai.LemurModel.claude3_5_sonnet,
    )
    results.append(result.response)

for result in results:
    print(f"{result}\n")

0:07 Introduction and acknowledgment of Stanford University
0:26 Commencement address begins

0:35 Introduction
0:59 First story: Connecting the dots

1:00 Dropping out of Reed College
1:15 Biological mother's decision for adoption
1:30 Adoption plans and unexpected change
1:40 Adoptive parents' background

1:58 Adoption and college promise
2:28 College choice and financial strain
2:48 Questioning college value
3:03 Decision to drop out
3:15 Exploring interesting classes

3:04 College experiences
3:15 Returning coke bottles for food money
3:22 Hare Krishna temple meals
3:30 Value of curiosity and intuition
3:40 Reed College calligraphy instruction
3:55 Learning typography and design principles

4:11 Calligraphy course influence on Mac design
4:30 Typography in personal computers
4:46 Connecting the dots in hindsight
5:03 Trusting in future connections

5:16 Trusting your intuition
5:30 Finding your passion early in life
5:38 The founding and growth of Apple

5:59 Release of the Macinto
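
Because each group is sent to LeMUR separately, consecutive responses can overlap at their boundaries (for example, 3:15 appears in two responses above). A minimal sketch of one way to merge the responses into a single sorted chapter list; `merge_chapters` is a hypothetical helper, and keeping only the first topic seen for a given timestamp is an assumption rather than part of the original workflow:

```python
import re

# Matches lines such as "3:15 Exploring interesting classes"
CHAPTER_RE = re.compile(r"^(\d+):(\d{2})\s+(.+)$")

def merge_chapters(responses):
    """Parse 'm:ss Topic' lines from each response and return them sorted by time."""
    chapters = {}
    for response in responses:
        for line in response.splitlines():
            match = CHAPTER_RE.match(line.strip())
            if not match:
                continue  # skip blank lines and anything that is not a chapter line
            minutes, seconds, topic = match.groups()
            total = int(minutes) * 60 + int(seconds)
            # Keep the first topic seen for a given second (assumption)
            chapters.setdefault(total, topic.strip())
    return [f"{t // 60}:{t % 60:02d} {topic}" for t, topic in sorted(chapters.items())]

sample = [
    "2:48 Questioning college value\n3:03 Decision to drop out\n3:15 Exploring interesting classes",
    "3:04 College experiences\n3:15 Returning coke bottles for food money",
]
for chapter in merge_chapters(sample):
    print(chapter)
```

With the `results` list from the cell above, the call would be `merge_chapters(results)`.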

## References

- [LeMUR](https://www.assemblyai.com/docs/lemur) - AssemblyAI Documentation
- [Ask questions about your audio data](https://www.assemblyai.com/docs/lemur/ask-questions) - AssemblyAI Documentation
- [Processing Audio Files with LLMs using LeMUR](https://github.com/AssemblyAI/cookbook/blob/master/lemur/using-lemur.ipynb) - AssemblyAI Cookbook