Notebook for summarizing a YouTube video with a large language model (LLM).

First, we download the video's automatically generated subtitles:

In [1]:
from youtube_transcript_api import YouTubeTranscriptApi

In [2]:
def get_id(url):
    # Assumes a plain "watch?v=<ID>" link: everything after "watch?v=" is returned
    return url.split("watch?v=")[-1]
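`get_id` keeps everything after `watch?v=`, so a link copied with a timestamp (`...&t=30s`) would produce an invalid ID. A slightly more defensive sketch using only the standard library's `urllib.parse`; the name `get_video_id` is hypothetical, and only the two URL shapes shown are handled.

```python
from urllib.parse import parse_qs, urlparse


def get_video_id(url: str) -> str:
    """Extract the video ID from common YouTube URL shapes.

    Handles "watch?v=ID" links (with or without extra query parameters
    such as "&t=30s") and "youtu.be/ID" short links.
    """
    parsed = urlparse(url)
    if parsed.hostname in ("youtu.be", "www.youtu.be"):
        # Short links carry the ID as the first path segment
        return parsed.path.lstrip("/").split("/")[0]
    query = parse_qs(parsed.query)
    if "v" in query:
        # Standard links carry the ID in the "v" query parameter
        return query["v"][0]
    raise ValueError(f"Could not find a video ID in {url!r}")


print(get_video_id("https://www.youtube.com/watch?v=i3DPKC0Xa3c&t=30s"))  # → i3DPKC0Xa3c
print(get_video_id("https://youtu.be/i3DPKC0Xa3c"))  # → i3DPKC0Xa3c
```

Parsing the query string instead of splitting on a substring means parameter order and extra parameters no longer matter.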

In [3]:
yt_video_link = "https://www.youtube.com/watch?v=i3DPKC0Xa3c"

transcript = YouTubeTranscriptApi.get_transcript(get_id(yt_video_link))
transcript[:10]

[{'text': 'what exactly is antimatter', 'start': 0.92, 'duration': 5.08},
 {'text': 'what happens when matter and antimatter',
  'start': 3.84,
  'duration': 3.6},
 {'text': 'collide', 'start': 6.0, 'duration': 3.42},
 {'text': 'how did antimatter come to be discovered',
  'start': 7.44,
  'duration': 4.26},
 {'text': "and why don't we encounter antimatter in",
  'start': 9.42,
  'duration': 4.02},
 {'text': 'our daily lives', 'start': 11.7, 'duration': 3.84},
 {'text': 'all of these concerns and many more',
  'start': 13.44,
  'duration': 5.22},
 {'text': 'arise when one considers antimatter',
  'start': 15.54,
  'duration': 5.34},
 {'text': 'but first and foremost let us first',
  'start': 18.66,
  'duration': 5.1},
 {'text': 'Define and comprehend anti-matter',
  'start': 20.88,
  'duration': 6.5}]

Next, we join the individual transcript segments into a single string.

In [4]:
transcipt_full = " ".join([line["text"] for line in transcript])
transcipt_full
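Caption segments may contain line breaks or stray spaces inside a single `text` field (an assumption worth checking on your own transcripts). A sketch that joins the segments and collapses every whitespace run into a single space; `join_segments` is a hypothetical helper, and the line break in the second sample segment was added for illustration.

```python
def join_segments(segments: list[dict]) -> str:
    """Join transcript segments into one string separated by single spaces."""
    words = []
    for segment in segments:
        # str.split() with no argument splits on any whitespace run,
        # including "\n", so re-joining normalizes the spacing.
        words.extend(segment["text"].split())
    return " ".join(words)


sample = [
    {"text": "what exactly is antimatter", "start": 0.92, "duration": 5.08},
    {"text": "what happens when matter and\nantimatter", "start": 3.84, "duration": 3.6},
    {"text": "collide", "start": 6.0, "duration": 3.42},
]
print(join_segments(sample))
# → what exactly is antimatter what happens when matter and antimatter collide
```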

"what exactly is antimatter what happens when matter and antimatter collide how did antimatter come to be discovered and why don't we encounter antimatter in our daily lives all of these concerns and many more arise when one considers antimatter but first and foremost let us first Define and comprehend anti-matter hello and welcome to Z be sure to subscribe to learn more about antimatter and other thought-provoking contents antimatter is the polar opposite of matter each subatomic particle such as an electron Proton or Neutron has an antiparticle such as an anti-electron anti-proton or anti-neutron the antiparticle will have the same mass as the particle but the sign of its charge and other quantum numbers will be different the lepton number is one for the electron and each of the other five members of the lepton family and the baryon number is one-third for each of the six quarks that comprise the baryon family antiparticles are intended to interact with other antibarticles in the sam

The result contains no punctuation, so we restore it before passing the text on. This matters because LLMs (large language models) are trained mostly on punctuated text: sentence boundaries make a long transcript easier for them to interpret and usually lead to a better summary.

In [5]:
from rpunct import RestorePuncts
restore_punct = RestorePuncts()

In [None]:
restored_punct_transcript = restore_punct.punctuate(transcipt_full)

In [9]:
restored_punct_transcript

"What exactly is Antimatter? What happens when matter and antimatter collide? How did antimatter come to be discovered? and why don't we encounter antimatter in our daily lives? All of these concerns and many more arise when one considers antimatter. But first and foremost, let us first Define and comprehend anti-matter Hello, and welcome to Z. Be sure to subscribe to learn more about antimatter and other thought-provoking contents. Antimatter is the polar opposite of matter. Each subatomic particle, such as an electron, Proton or Neutron has an antiparticle, such as an anti-electron anti-proton or anti-neutron The antiparticle will have the same mass as the particle, but the sign of its charge and other quantum numbers will be different. The lepton number is one for the electron and each of the other five members of the lepton family, and the baryon number is one-third for each of the six quarks that comprise the baryon family. Antiparticles are intended to interact with other antibar

In [13]:
import openai

In [14]:
with open("openai_key.txt") as f:
    # strip() removes the trailing newline, which would otherwise make the key invalid
    openai.api_key = f.readline().strip()
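Keeping the key in a text file next to the notebook makes it easy to commit by accident. A sketch of a loader that prefers the `OPENAI_API_KEY` environment variable (which the `openai` package also reads by default) and falls back to the file, stripping surrounding whitespace either way; `load_api_key` and its `env` parameter are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_api_key(key_file: str = "openai_key.txt",
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key from the environment or a local file."""
    env = os.environ if env is None else env
    # Prefer the OPENAI_API_KEY environment variable so the key never
    # has to live next to the notebook.
    key = env.get("OPENAI_API_KEY", "").strip()
    if key:
        return key
    # Fall back to the first line of the key file; strip() drops the
    # trailing newline that readline() would otherwise keep.
    path = Path(key_file)
    if path.is_file():
        first_line = path.read_text().partition("\n")[0].strip()
        if first_line:
            return first_line
    raise RuntimeError(f"No API key in OPENAI_API_KEY or {key_file}")


# Usage with the pre-1.0 SDK used in this notebook:
# openai.api_key = load_api_key()
```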

In [17]:
prompt = f"Summarize this transcript: \ntext = '{restored_punct_transcript}'"
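The model's context window is finite (4,096 tokens, shared between prompt and reply, for the original gpt-3.5-turbo), so a long transcript may not fit in one prompt. A sketch of a simple "map" step that splits the text into word-count chunks and builds one prompt per chunk; the partial summaries could then be joined and summarized once more. The 0.75 words-per-token ratio is only OpenAI's rule of thumb, and `chunk_words`/`build_prompts` are hypothetical helpers; a tokenizer such as `tiktoken` would give exact counts.

```python
def chunk_words(text: str, max_words: int = 2000) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    max_words is a rough proxy for tokens: at about 0.75 words per token,
    2000 words is roughly 2,700 tokens, leaving room for the instruction
    and a 256-token reply in a 4,096-token window.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_prompts(transcript: str, max_words: int = 2000) -> list[str]:
    """Build one summarization prompt per chunk (the 'map' step)."""
    chunks = chunk_words(transcript, max_words)
    return [
        f"Summarize part {i} of {len(chunks)} of this transcript:\ntext = '{chunk}'"
        for i, chunk in enumerate(chunks, start=1)
    ]


# Usage: prompts = build_prompts(restored_punct_transcript)
```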

In [None]:
# ChatCompletion (openai<1.0 SDK) sends a chat-style request to the models used by ChatGPT

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user", # "system", "user" or "assistant"
            "content": prompt # user's input or assistant's reply
        }
    ],
    temperature=1, # sampling temperature: higher values give more random output
    max_tokens=256, # upper limit on the length of the generated reply
    top_p=1, # nucleus sampling: lower values restrict sampling to the most likely tokens
    frequency_penalty=0, # positive values penalize tokens in proportion to how often they have already appeared
    presence_penalty=0 # positive values penalize any token that has already appeared, nudging the model toward new topics
)
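`openai.ChatCompletion.create` belongs to the pre-1.0 `openai` package and is no longer supported from version 1.0 on, where requests go through a client object. A sketch of the same request for the 1.x SDK, assuming `openai>=1.0` is installed; `summarize` is a hypothetical wrapper and is untested here.

```python
def summarize(prompt: str, api_key: str) -> str:
    """Send a summarization prompt using the openai>=1.0 client interface."""
    # Imported inside the function so the cell can be defined even where
    # the package is missing; requires `pip install "openai>=1.0"`.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=1,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    # In the 1.x SDK the response is a typed object, accessed by attribute
    return response.choices[0].message.content
```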

In [None]:
print(response["choices"][0]["message"]["content"])
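With `max_tokens=256`, a long summary can stop mid-sentence. Each choice in the response carries a `finish_reason`, which is `"length"` when the token limit cut the reply short. A sketch of a check using the same dict-style indexing as the cell above; `extract_summary` is a hypothetical helper.

```python
def extract_summary(response) -> str:
    """Return the reply text, warning if it was cut off by max_tokens."""
    choice = response["choices"][0]
    # finish_reason is "stop" when the model ended naturally and
    # "length" when it hit the max_tokens limit.
    if choice["finish_reason"] == "length":
        print("Warning: summary truncated; consider raising max_tokens.")
    return choice["message"]["content"]


# Usage: print(extract_summary(response))
```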

Since I do not have a paid OpenAI account, the API rejects these requests, so the last two cells were not executed and no summary is shown.