# Analyze a YouTube video by asking the LLM
By [Lior Gazit](https://www.linkedin.com/in/liorgazit/)  

<a target="_blank" href="https://colab.research.google.com/github/LiorGazit/LLM_search_inside_youtube_videos/blob/main/Analyze_a_Youtube_video_by_asking_the_LLM.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

**Description of the notebook:**  
Pick a YouTube video and find out what it covers without spending the time to watch all of it.  
For instance, given an hour-long lecture on a topic you want to learn, you can check whether it touches on all the key points before committing the time to watch it.  
The intuition: if the lecture were a PDF instead of a video, you could simply search through it.  

**Requirements:**  
* Open this notebook in a free [Google Colab instance](https://colab.research.google.com/).  
* This code uses OpenAI's API as its LLM, so a paid **API key** is required.   

Install:

In [11]:
# !pip -q install youtube-transcript-api
# !pip -q install openai
# !pip -q install numpy
# !pip -q install pytube
# !pip -q install faiss-cpu
# !pip -q install tiktoken

Imports:

In [12]:
import os
from youtube_transcript_api import YouTubeTranscriptApi
import faiss
import numpy as np
import openai
import tiktoken
from urllib.parse import urlparse, parse_qs

#### Insert API Key

In [None]:
my_api_key = ""

#### Pick the YouTube Video and Insert Its URL

In [14]:
video_url = "https://www.youtube.com/watch?v=ySEx_Bqxvvo&ab_channel=AlexanderAmini"
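
The URL above is a standard watch link, and the `extract_video_id` function defined later in this notebook reads only its `v` query parameter. Share links often come in other shapes. The following is a minimal sketch of a more tolerant extractor, assuming a handful of URL forms (`youtu.be/<id>`, `/embed/<id>`, `/shorts/<id>`, `/live/<id>`); the name `extract_video_id_any` is hypothetical and not part of the notebook.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id_any(url):
    """Return the video ID from several common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in ("youtube.com", "m.youtube.com"):
        # Standard watch links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Assumed path-based forms: /embed/<id>, /shorts/<id>, /live/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
            return parts[1]
    # Unknown host or shape: let the caller decide how to react
    return None
```

Returning `None` rather than raising keeps the decision about unsupported links with the caller, which may prefer to show a friendly message.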

#### Save API Key to Environment Variable

In [15]:
os.environ["OPENAI_API_KEY"] = my_api_key
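
Pasting the key into a cell leaves it in the saved notebook. As one possible alternative, the sketch below reads the key from the environment and only prompts for it (without echoing) when it is missing. The helper name `load_api_key` and its `prompt_fn` parameter are hypothetical, introduced here for illustration.

```python
import os
from getpass import getpass

def load_api_key(env_var="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value, so the key never appears in cell output
        key = prompt_fn(f"Enter value for {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} is empty")
        os.environ[env_var] = key
    return key
```

Calling `load_api_key()` once before the first OpenAI request would then replace both the key cell and the environment-variable cell.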

#### Define functions:

In [16]:
# Extract video ID from URL
def extract_video_id(url):
    query = urlparse(url).query
    params = parse_qs(query)
    return params['v'][0]

# Fetch transcript using youtube-transcript-api
def get_transcript(video_url):
    video_id = extract_video_id(video_url)
    transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
    text = ' '.join([t['text'] for t in transcript])
    return text

# Split transcript into chunks
def split_chunks(transcript, max_tokens=500):
    encoding = tiktoken.get_encoding("cl100k_base")
    words = transcript.split()
    chunks, current_chunk = [], []

    for word in words:
        current_chunk.append(word)
        if len(encoding.encode(' '.join(current_chunk))) > max_tokens:
            current_chunk.pop()
            chunks.append(' '.join(current_chunk))
            current_chunk = [word]
    if current_chunk:
        chunks.append(' '.join(current_chunk))
    return chunks

# Get embeddings using OpenAI embeddings API
def get_embeddings(chunks, model="text-embedding-ada-002"):
    embeddings = openai.embeddings.create(
        input=chunks,
        model=model
    )
    embeddings_list = [e.embedding for e in embeddings.data]
    return np.array(embeddings_list, dtype='float32')

# Build FAISS index
def build_index(embeddings):
    dim = embeddings.shape[1]
    index = faiss.IndexFlatL2(dim)
    index.add(embeddings)
    return index

# Similarity search
def search_chunks(question, chunks, index, top_k=3, model="text-embedding-ada-002"):
    # Embed the question with the same model used for the chunks
    query_embedding = openai.embeddings.create(
        input=[question],
        model=model
    ).data[0].embedding
    query_embedding = np.array([query_embedding], dtype='float32')

    _, indices = index.search(query_embedding, top_k)
    # FAISS pads results with -1 when the index holds fewer than top_k vectors
    return [chunks[i] for i in indices[0] if i >= 0]

# Query LLM with retrieved context
def query_llm(prompt, model="gpt-3.5-turbo"):
    completion = openai.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You answer questions based on video transcripts. Drop a new line after every sentence!"},
            {"role": "user", "content": prompt}
        ],
        temperature=0.5,
        max_tokens=1000
    )
    return completion.choices[0].message.content.strip()
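
`split_chunks` above re-encodes the whole current chunk every time it adds a word, which gets slow on long transcripts. A possible alternative, sketched below under stated assumptions, counts each word's tokens once and keeps a running total. The function name `split_chunks_linear` and the `count_tokens` parameter are hypothetical, and summing per-word counts only approximates the token count of the joined text, so chunk boundaries may differ slightly from the original.

```python
def split_chunks_linear(transcript, count_tokens, max_tokens=500):
    """Greedy word-based chunking that counts each word's tokens only once.

    count_tokens is a caller-supplied function mapping a string to a token
    count. The running total is an approximation of the joined chunk's size.
    """
    chunks, current, current_tokens = [], [], 0
    for word in transcript.split():
        # Prefix a space so the count resembles a word inside running text
        n = count_tokens(" " + word)
        # Close the chunk before it would exceed the budget
        if current and current_tokens + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(word)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With the encoding already used in this notebook, `count_tokens` could be `lambda s: len(encoding.encode(s))`, where `encoding = tiktoken.get_encoding("cl100k_base")`. A single word larger than `max_tokens` ends up in a chunk of its own rather than producing an empty chunk.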

### Set Up the Retrieval Mechanism:
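
`faiss.IndexFlatL2` performs exact (brute-force) search by squared Euclidean distance. As a rough illustration of what the retrieval step computes, the pure-Python sketch below ranks toy vectors the same way. The function `nearest_chunks` and the sample vectors and labels are made up for this example and are not real embeddings.

```python
def nearest_chunks(query_vec, chunk_vecs, chunks, top_k=3):
    """Rank chunks by squared Euclidean distance to the query (exact search)."""
    def sq_l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Sort chunk positions from closest to farthest, then keep the first top_k
    order = sorted(range(len(chunk_vecs)), key=lambda i: sq_l2(query_vec, chunk_vecs[i]))
    return [chunks[i] for i in order[:top_k]]

# Toy 2-D "embeddings"; real ones have many more dimensions
toy_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_chunks = ["about RNNs", "about attention", "about both"]
print(nearest_chunks([0.1, 0.9], toy_vecs, toy_chunks, top_k=2))
# prints ['about attention', 'about both']
```

FAISS does the same ranking in optimized native code, which matters once a transcript yields many chunks.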

In [17]:
# Entire pipeline execution
def pipeline(video_url, question):
    print("--- Prompt ---\n")
    print(question)

    # Fetching transcript:
    transcript = get_transcript(video_url)

    # Splitting transcript into chunks:
    chunks = split_chunks(transcript)

    # Getting embeddings:
    embeddings = get_embeddings(chunks)

    # Building FAISS index:
    index = build_index(embeddings)

    # Searching relevant chunks:
    relevant_chunks = search_chunks(question, chunks, index)

    context = "\n\n".join(relevant_chunks)
    prompt = f"Context from video:\n\n{context}\n\nQuestion: {question}\nStart a new line after every sentence in your answer!"

    print("Querying LLM...")
    print("\n--- Answer ---\n")
    return query_llm(prompt)
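
As written, `pipeline` fetches the transcript, embeds every chunk and rebuilds the index on each call, even when only the question changes. One way to avoid the repeated work is to cache the per-video build step, sketched below with `functools.lru_cache`. The names `make_cached_builder` and `fake_build` are hypothetical; `fake_build` merely stands in for the real transcript, embedding and index steps so the example runs without network access.

```python
from functools import lru_cache

def make_cached_builder(build_fn, maxsize=8):
    """Wrap an expensive per-video build step so each URL is processed once."""
    @lru_cache(maxsize=maxsize)
    def cached(video_url):
        return build_fn(video_url)
    return cached

# Stand-in for the real build step (fetch transcript, chunk, embed, index)
calls = []
def fake_build(video_url):
    calls.append(video_url)
    return ("chunks for " + video_url, "index for " + video_url)

build = make_cached_builder(fake_build)
build("https://example.com/a")
build("https://example.com/a")  # served from the cache; fake_build is not re-run
print(len(calls))
# prints 1
```

In this notebook, the build function could return `(chunks, index)` from `get_transcript`, `split_chunks`, `get_embeddings` and `build_index`, leaving only `search_chunks` and `query_llm` to run per question.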

### Some Questions About the Content of the Video

In [18]:
question = "Do they mention transformers? In what way? Tell me in 2-3 sentences."
print(pipeline(video_url, question))


--- Prompt ---

Do they mention transformers? In what way? Tell me in 2-3 sentences.
Querying LLM...

--- Answer ---

Yes, transformers are mentioned in the video transcript. 
The speaker explains that attention is the foundational mechanism of the Transformer architecture. 
The video delves into how attention works in neural networks like Transformers, emphasizing its power and importance in processing information efficiently.


In [19]:
question = "Do they mention attention?"
print(pipeline(video_url, question))


--- Prompt ---

Do they mention attention?
Querying LLM...

--- Answer ---

Yes, attention is mentioned multiple times in the video transcript.  
The speaker highlights the importance of attention as a concept in modern deep learning and AI.  
Attention is described as the foundational mechanism of the Transformer architecture.  
The video explains how attention allows for identifying and attending to important information in a sequential stream of data.  
Self-attention, specifically, is focused on as a key concept.  
The concept of attending to the most important parts of input examples is discussed.  
Multiple powerful neural networks and deep learning models leverage the idea of self-attention.  
Attention is used in a variety of applications, from language models to computer vision.  
The self-attention mechanism is explained step by step in the lecture.  
The video emphasizes the transformative impact of attention in various fields.


In [20]:
question = "Do they mention back propagation? Please provide 2-3 sentences that tell about it."
backprop_answer_english = pipeline(video_url, question)
print(backprop_answer_english)

--- Prompt ---

Do they mention back propagation? Please provide 2-3 sentences that tell about it.
Querying LLM...

--- Answer ---

Yes, the video mentions back propagation. 
Back propagation through time is a key algorithm discussed in the video. 
It involves feeding back data, predictions, and errors in time to train the network effectively.


#### Translate the Last Response to Hindi

In [21]:
prompt = f"Please translate this answer from English to Hindi: <{backprop_answer_english}>. Make sure to translate properly with the appropriate technical terms."
print(query_llm(prompt))

हाँ, वीडियो में बैक प्रोपेगेशन का उल्लेख है।
समय के माध्यम से बैक प्रोपेगेशन एक मुख्य एल्गोरिथ्म है जो वीडियो में चर्चा किया गया है।
इसमें डेटा, पूर्वानुमान, और गलतियों को समय के माध्यम से प्रशिक्षित करने के लिए प्रभावी रूप से वापस भेजना शामिल है।


#### Translate the Last Response to Tamil

In [22]:
prompt = f"Please translate this answer from English to Tamil: <{backprop_answer_english}>. Make sure to translate properly with the appropriate technical terms."
print(query_llm(prompt))

ஆம், வீடியோவில் பின்னூட்டம் குறிப்பிடுகின்றது.
காலத்தில் பின்னூட்டம் மூல அறிவியல் கருவியில் உரைக்கப்பட்டுள்ளது.
நெட்வொர்க்கை செயலாக்க நேரத்தில் தரவு, உரைகள், வழுவானங்களை பின்னூட்டிக் கொள்ளும் முறையில் உரைக்கின்றது.
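
The two translation cells differ only in the target language, so the prompt can be built by a small helper. This is a minimal sketch; `build_translation_prompt` is a hypothetical name, and the wording simply mirrors the prompts used above.

```python
def build_translation_prompt(answer, target_language, source_language="English"):
    """Format the translation request used above for any target language."""
    return (
        f"Please translate this answer from {source_language} to {target_language}: "
        f"<{answer}>. Make sure to translate properly with the appropriate technical terms."
    )
```

For example, `print(query_llm(build_translation_prompt(backprop_answer_english, "Tamil")))` would reproduce the Tamil cell.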
