In [21]:
!uv add youtube-transcript-api

Resolved 150 packages in 4ms
Audited 131 packages in 0.12ms


In [22]:
# Get the transcript
from youtube_transcript_api import YouTubeTranscriptApi

video_id = 'ph1PxZIkz1o'

ytt_api = YouTubeTranscriptApi()
transcript = ytt_api.fetch(video_id)


In [23]:
transcript

FetchedTranscript(snippets=[FetchedTranscriptSnippet(text='So hi everyone. Uh today we are going to', start=0.0, duration=5.04), FetchedTranscriptSnippet(text='talk about our upcoming course. The', start=2.96, duration=3.52), FetchedTranscriptSnippet(text='upcoming course is called machine', start=5.04, duration=5.92), FetchedTranscriptSnippet(text='learning zoom camp. And um this is', start=6.48, duration=5.92), FetchedTranscriptSnippet(text='already I put the link in the', start=10.96, duration=3.599), FetchedTranscriptSnippet(text="description. So if you're watching um", start=12.4, duration=4.719), FetchedTranscriptSnippet(text="this video in recording or you're", start=14.559, duration=4.88), FetchedTranscriptSnippet(text='watching it live, you go here in the', start=17.119, duration=4.561), FetchedTranscriptSnippet(text='description after under this video and', start=19.439, duration=5.6), FetchedTranscriptSnippet(text='then you see a link course. uh click on', start=21.68, durat

In [24]:

def format_timestamp(seconds: float) -> str:
    """Convert seconds to H:MM:SS if at least one hour, else M:SS."""
    total_seconds = int(seconds)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)

    if hours > 0:
        return f"{hours}:{minutes:02}:{secs:02}"
    else:
        return f"{minutes}:{secs:02}"

def make_subtitles(transcript) -> str:
    lines = []

    for entry in transcript:
        ts = format_timestamp(entry.start)
        text = entry.text.replace('\n', ' ')
        lines.append(ts + ' ' + text)

    return '\n'.join(lines)

subtitles = make_subtitles(transcript)
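
The two helpers can be checked without calling YouTube. The sketch below repeats them and feeds in a hypothetical `Snippet` dataclass, which is only a stand-in for the library's snippet objects and assumes nothing beyond the `text`, `start`, and `duration` attributes visible in the output above. The sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical stand-in for a fetched transcript snippet."""
    text: str
    start: float
    duration: float


def format_timestamp(seconds: float) -> str:
    # Fractional seconds are truncated, not rounded
    total_seconds = int(seconds)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours > 0:
        return f"{hours}:{minutes:02}:{secs:02}"
    return f"{minutes}:{secs:02}"


def make_subtitles(transcript) -> str:
    lines = []
    for entry in transcript:
        # Collapse line breaks so each snippet stays on one line
        text = entry.text.replace("\n", " ")
        lines.append(f"{format_timestamp(entry.start)} {text}")
    return "\n".join(lines)


sample = [
    Snippet(text="So hi everyone.", start=0.0, duration=5.04),
    Snippet(text="two\nlines", start=65.4, duration=3.0),
    Snippet(text="past the hour", start=3725.9, duration=2.0),
]
sample_subtitles = make_subtitles(sample)
print(sample_subtitles)
```

The third sample shows the switch to the H:MM:SS form once the start time reaches one hour.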


In [25]:
make_subtitles(transcript)

"0:00 So hi everyone. Uh today we are going to\n0:02 talk about our upcoming course. The\n0:05 upcoming course is called machine\n0:06 learning zoom camp. And um this is\n0:10 already I put the link in the\n0:12 description. So if you're watching um\n0:14 this video in recording or you're\n0:17 watching it live, you go here in the\n0:19 description after under this video and\n0:21 then you see a link course. uh click on\n0:25 that link and this bring you will bring\n0:27 you to\n0:29 this website this GitHub page.\n0:34 This GitHub page is the main entry point\n0:36 to our course and um yeah I think it's\n0:41 more or less self-explanatory. If you\n0:43 want to sign up this is the button you\n0:45 click and the actual course starts in on\n0:48 September 15th. it means that it's uh\n0:51 slightly less than one one month before\n0:53 the course starts and the purpose of\n0:55 today's um session is to just answer\n0:58 your questions. So you have some\n1:00 questions and uh you can ask th

In [26]:
instructions = """
Summarize the transcript and describe the main purpose of the video
and the main ideas.

Also output chapters with timestamps. Use sentence case, not Title Case, for chapter titles.

Output format:

<OUTPUT>
Summary

timestamp chapter
timestamp chapter
...
timestamp chapter
</OUTPUT>

Don't include the <OUTPUT> tags in the output.
"""


In [27]:
# Interact with the LLM
from openai import OpenAI

openai_client = OpenAI()

def interact_with_llm(user_prompt, instructions=None, model="gpt-4o-mini"):
    messages = []

    if instructions:
        messages.append({
            "role": "system",
            "content": instructions
        })

    messages.append({
        "role": "user",
        "content": user_prompt
    })

    response = openai_client.responses.create(
        model=model,
        input=messages
    )

    return response.output_text
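
The message-building part of `interact_with_llm` can be inspected on its own, without an API key or a network call. The sketch below pulls that logic into a hypothetical helper, `build_messages`, which is not part of the notebook or of the OpenAI SDK. It shows that the system message is added only when instructions are passed.

```python
def build_messages(user_prompt, instructions=None):
    """Hypothetical helper mirroring the list built inside interact_with_llm."""
    messages = []

    # The system message is optional and, when present, comes first
    if instructions:
        messages.append({"role": "system", "content": instructions})

    messages.append({"role": "user", "content": user_prompt})
    return messages


with_system = build_messages(
    "0:00 So hi everyone.",
    instructions="Summarize the transcript",
)
without_system = build_messages("0:00 So hi everyone.")

print([m["role"] for m in with_system])
print([m["role"] for m in without_system])
```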

In [28]:
llm_response = interact_with_llm(make_subtitles(transcript), instructions=instructions)

In [29]:
print(llm_response)

The video discusses an upcoming course titled "Machine Learning Zoom Camp," set to launch on September 15th. The presenter discusses course content, structure, and prerequisites while addressing common questions from potential participants. The course emphasizes practical machine learning engineering skills, focusing on deployment and engineering aspects while covering foundational topics. The Q&A section includes queries about job placement, prerequisites, recommended resources, and how to maximize learning.

**Key Ideas:**
- Overview of the "Machine Learning Zoom Camp" course and enrollment details.
- Focus on practical skills critical for machine learning engineering, especially in deployment.
- Clarification on prerequisites such as programming comfort, and some basic knowledge in linear algebra.
- Discussion around job prospects post-course, emphasizing that the course aims to equip participants with skills that enhance employability.
- Responses to questions about course content,

In [30]:
# Pydantic validates the LLM's output text and parses it into the model classes defined below
from pydantic import BaseModel

class Chapter(BaseModel):
    timestamp: str
    title: str

class YTSummaryResponse(BaseModel):
    summary: str
    chapters: list[Chapter]
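
To see what these models do independently of any LLM call, they can be fed a JSON string directly. This is a minimal sketch that assumes Pydantic v2, where `model_validate_json` is available. The JSON payloads are made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class Chapter(BaseModel):
    timestamp: str
    title: str


class YTSummaryResponse(BaseModel):
    summary: str
    chapters: list[Chapter]


# Made-up payload with the same shape as the models
raw = (
    '{"summary": "A short made-up summary.", '
    '"chapters": [{"timestamp": "0:00", "title": "Introduction"}]}'
)

parsed = YTSummaryResponse.model_validate_json(raw)
print(parsed.chapters[0].title)

# A payload missing a required field is rejected instead of silently accepted
try:
    YTSummaryResponse.model_validate_json('{"summary": "no chapters field"}')
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True

print(missing_field_rejected)
```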


In [31]:
# Here output_format is the Pydantic model class that the response is parsed into


def llm_structured(instructions, user_prompt, output_format, model="gpt-4o-mini"):
    messages = [
        {"role": "system", "content": instructions},
        {"role": "user", "content": user_prompt}
    ]

    response = openai_client.responses.parse( # note the use of responses.parse instead of responses.create
        model=model,
        input=messages,
        text_format=output_format
    )

    return response.output_parsed
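
Once a parsed response is available, the chapters can be turned back into plain `timestamp title` lines, and the timestamps can be sanity-checked, since the model is free to return them out of order. The sketch below uses a hypothetical `Chapter` dataclass as a stand-in for the Pydantic model so that it runs offline. `timestamp_to_seconds`, `chapters_in_order`, and `render_chapters` are illustrative helpers, not part of the notebook.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    """Hypothetical stand-in for the Pydantic Chapter model."""
    timestamp: str
    title: str


def timestamp_to_seconds(timestamp: str) -> int:
    # Handles both M:SS and H:MM:SS by folding the parts left to right
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def chapters_in_order(chapters) -> bool:
    times = [timestamp_to_seconds(c.timestamp) for c in chapters]
    return all(a < b for a, b in zip(times, times[1:]))


def render_chapters(chapters) -> str:
    return "\n".join(f"{c.timestamp} {c.title}" for c in chapters)


chapters = [
    Chapter(timestamp="0:00", title="Introduction to the upcoming course"),
    Chapter(timestamp="1:00", title="Course details and GitHub page overview"),
]

print(chapters_in_order(chapters))
print(render_chapters(chapters))
```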

In [32]:
structured_response = llm_structured(
    instructions=instructions,
    user_prompt=make_subtitles(transcript),
    output_format=YTSummaryResponse,
)

In [33]:
structured_response

YTSummaryResponse(summary='The video discusses the upcoming "Machine Learning Zoom Camp" course starting on September 15, addressing various topics related to the course structure, prerequisites, content updates, and overall goals. The presenter emphasizes that this is an engineering-focused course rather than a data science course, where participants will learn practical skills in machine learning and deployment. Several questions from the audience about job placements, prerequisites, module contents, and expected outcomes are answered, indicating that while formal placement support is not provided, many past participants found jobs after completing the course. The course aims to equip learners with enough skills for entry-level ML engineering roles, and encourages self-driven learning and project completion for certification.', chapters=[Chapter(timestamp='0:00', title='Introduction to the upcoming course'), Chapter(timestamp='1:00', title='Course details and GitHub page overview'), 

In [34]:
structured_response.summary

'The video discusses the upcoming "Machine Learning Zoom Camp" course starting on September 15, addressing various topics related to the course structure, prerequisites, content updates, and overall goals. The presenter emphasizes that this is an engineering-focused course rather than a data science course, where participants will learn practical skills in machine learning and deployment. Several questions from the audience about job placements, prerequisites, module contents, and expected outcomes are answered, indicating that while formal placement support is not provided, many past participants found jobs after completing the course. The course aims to equip learners with enough skills for entry-level ML engineering roles, and encourages self-driven learning and project completion for certification.'

In [35]:
structured_response.chapters

[Chapter(timestamp='0:00', title='Introduction to the upcoming course'),
 Chapter(timestamp='1:00', title='Course details and GitHub page overview'),
 Chapter(timestamp='2:00', title='Course content updates and structure'),
 Chapter(timestamp='3:00', title='Job placement and outcomes post-course'),
 Chapter(timestamp='4:00', title='Computer vision coverage in the course'),
 Chapter(timestamp='6:00', title='Prerequisites for course participants'),
 Chapter(timestamp='8:00', title='Command line use in the course'),
 Chapter(timestamp='10:00', title='Using GitHub Codespaces for the course'),
 Chapter(timestamp='11:00', title='Prior knowledge for deep learning'),
 Chapter(timestamp='12:00', title='Target audience for ML engineers vs data scientists'),
 Chapter(timestamp='13:00', title='Recommended reading and available resources'),
 Chapter(timestamp='14:00', title='Mathematics requirements for the course'),
 Chapter(timestamp='15:00', title='Technical requirements and necessary tools'),
 