# ChatYT
This notebook lets you:

*  Summarise YouTube videos
*  Ask questions about the topics discussed in a video

Connect to a GPU runtime for faster execution, especially for the Whisper transcription step.

You will need:

*  The URL of the YouTube video
*  A Gemini API key, stored as the Colab secret `GEMINI_API_KEY`

In [None]:
!pip install yt-dlp
!pip install -q openai-whisper
!apt-get install -y ffmpeg
!pip install langchain
!pip install langchain-huggingface sentence-transformers langchain-chroma
!pip install langchain-community
!pip install langchain-openai
!pip install google-generativeai





Imports


In [None]:
import os

import google.generativeai as genai
import whisper
import yt_dlp
from google.colab import userdata
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from transformers import BartForConditionalGeneration, BartTokenizer, pipeline

In [None]:
genai.configure(api_key=userdata.get('GEMINI_API_KEY'))

In [None]:
url = "https://www.youtube.com/watch?v=1tRTWwZ5DIc"

In [None]:
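def download_audio(link, file_name='audio.mp3'):
    """Download the smallest available audio stream of `link` to `file_name`."""
    ydl_opts = {
        'format': 'worstaudio',   # lowest-bitrate audio-only stream; enough for speech
        'overwrites': True,       # replace the audio file from a previous run
        'outtmpl': file_name,     # fixed name; the stream may not really be MP3, but
                                  # ffmpeg probes the contents, so compress_audio still works
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([link])
    return file_name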
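If a real MP3 file is preferred, yt-dlp can convert the download itself through its `FFmpegExtractAudio` postprocessor. The sketch below only builds the options dictionary; the helper name `build_audio_opts` is hypothetical, and the actual download (commented out) needs network access and ffmpeg.

```python
def build_audio_opts(file_stem="audio", codec="mp3"):
    """Hypothetical helper: yt-dlp options that fetch the smallest audio stream
    and let ffmpeg convert it to `codec` after the download."""
    return {
        "format": "worstaudio/worst",        # smallest audio-only stream, else smallest overall
        "overwrites": True,
        "outtmpl": f"{file_stem}.%(ext)s",   # yt-dlp fills in the real extension
        "postprocessors": [{
            "key": "FFmpegExtractAudio",     # re-encode the downloaded audio with ffmpeg
            "preferredcodec": codec,
        }],
    }

# Usage (network access and ffmpeg required):
# with yt_dlp.YoutubeDL(build_audio_opts()) as ydl:
#     ydl.download([url])   # should leave audio.mp3 in the working directory
```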

In [None]:
def download_subtitles(link, lang="en"):
    """Download the uploaded (not auto-generated) subtitles of `link` in `lang`."""
    ydl_opts = {
        'writesubtitles': True,           # enable subtitle download
        'subtitleslangs': [lang],         # subtitle language(s) to fetch
        'skip_download': True,            # skip the video/audio itself
        'outtmpl': 'subtitles.%(ext)s',   # saved as e.g. subtitles.en.vtt
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([link])
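When a video already has subtitles, the transcript can be read from the `.vtt` file instead of running Whisper. The sketch below is one way to flatten WebVTT into plain text; the function name `vtt_to_text` is hypothetical, and it assumes the common cue layout (optional cue number, a `start --> end` timing line, then caption text).

```python
import re

_TIMESTAMP = re.compile(r"^\d{2}:\d{2}(:\d{2})?\.\d{3} --> ")  # cue timing line
_TAG = re.compile(r"<[^>]+>")                                    # inline <c>, <00:00:01.000> tags

def vtt_to_text(vtt: str) -> str:
    """Hypothetical helper: flatten a WebVTT subtitle file into transcript text."""
    lines = []
    for raw in vtt.splitlines():
        line = raw.strip()
        if (not line or line.isdigit()
                or line.startswith(("WEBVTT", "Kind:", "Language:", "NOTE"))
                or _TIMESTAMP.match(line)):
            continue  # skip blanks, header fields, cue numbers and timing lines
        line = _TAG.sub("", line).strip()
        if line and (not lines or lines[-1] != line):
            lines.append(line)  # captions often repeat the previous line
    return " ".join(lines)

# Usage: text = vtt_to_text(open("subtitles.en.vtt", encoding="utf-8").read())
```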

In [None]:
# download_subtitles(url)

In [None]:
# download_audio(url)

In [None]:
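import subprocess

def compress_audio(input_file, output_file="compressed.mp3"):
    """Re-encode `input_file` as 16 kHz mono audio, the format Whisper works with internally."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", input_file, "-ar", "16000", "-ac", "1", output_file],
        check=True,            # raise if ffmpeg fails instead of continuing silently
        capture_output=True,   # keep ffmpeg's log out of the notebook output
    )
    return output_file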
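Whisper's own audio loader already uses ffmpeg to decode input to 16 kHz mono, so this step mainly shrinks the file rather than changing what the model hears. A quick data-rate calculation shows the size of the reduction for decoded 16-bit PCM, assuming a typical 44.1 kHz stereo source (the helper name is illustrative):

```python
def pcm_bytes_per_second(sample_rate: int, channels: int, bits_per_sample: int = 16) -> int:
    """Uncompressed PCM data rate: samples/s * channels * bytes per sample."""
    return sample_rate * channels * bits_per_sample // 8

stereo_cd = pcm_bytes_per_second(44_100, 2)    # typical source: 176_400 bytes/s
whisper_in = pcm_bytes_per_second(16_000, 1)   # after -ar 16000 -ac 1: 32_000 bytes/s
print(f"{stereo_cd / whisper_in:.2f}x less decoded audio per second")  # 5.51x
```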

In [None]:
# Whisper speech-recognition model used for transcription ("medium" is a ~1.4 GB download)
summ_model = whisper.load_model("medium")


In [None]:
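def speech_to_text(audio_file):
    """Transcribe `audio_file` with Whisper; task="translate" returns English text
    whatever the spoken language."""
    result = summ_model.transcribe(audio_file, task="translate")
    return result["text"]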
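Besides the full `text`, Whisper's `transcribe` result also carries a `segments` list whose entries include `start`, `end` (in seconds) and `text`. A timestamped transcript makes it easier to jump back to the part of the video an answer came from. A minimal sketch with hypothetical helper names:

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def timestamped_transcript(segments) -> str:
    """One line per segment, prefixed with its [start - end] range."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] {seg['text'].strip()}"
        for seg in segments
    )

# Usage:
# result = summ_model.transcribe("compressed.mp3", task="translate")
# print(timestamped_transcript(result["segments"]))
```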

In [None]:
# text = speech_to_text("compressed.mp3")

In [None]:
# LangChain part
# doc = Document(page_content=text, metadata={"source": "youtube"})
# print(doc.page_content[:100])

In [None]:
# splitter = RecursiveCharacterTextSplitter(
#     chunk_size=2000,
#     chunk_overlap=500,
# )

# chunks = splitter.split_documents([doc])

# print(len(chunks))
# print(chunks[0].page_content)

In [None]:
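def generate_embeddings(text):
    """Split the transcript into overlapping chunks, embed them and store them in Chroma."""
    doc = Document(page_content=text, metadata={"source": "youtube"})
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,     # maximum characters per chunk
        chunk_overlap=200,   # characters shared by neighbouring chunks
    )

    chunks = splitter.split_documents([doc])
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    # Persisting to the same directory on every call may add these chunks to the
    # existing collection, so answers can mix in chunks from earlier videos.
    db = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
    return db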
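With `chunk_size=1000` and `chunk_overlap=200`, each chunk starts roughly 800 characters after the previous one, so a sentence cut at a boundary still appears whole in one of the two neighbours. The sketch below shows only this size/overlap arithmetic with plain fixed windows; `RecursiveCharacterTextSplitter` itself prefers to break on paragraph, line and word separators, so its real boundaries differ. The function name is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed character windows; each starts chunk_size - chunk_overlap characters
    after the previous one, so neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous window's overlap
    starts = range(0, max(len(text) - chunk_overlap, 1), step)
    return [text[i:i + chunk_size] for i in starts]

print(sliding_chunks("0123456789", chunk_size=4, chunk_overlap=2))
# ['0123', '2345', '4567', '6789']
```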

In [None]:
def closest(query, db):
    results = db.similarity_search(query, k=3)
    if len(results)==0:
        print("No matching results...")
        return
    return results
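`similarity_search(query, k=3)` embeds the question and returns the three stored chunks whose vectors lie closest to it. The sketch below shows the idea with cosine similarity over plain lists; it is an illustration, not Chroma's implementation. Chroma's default metric is L2 distance, which ranks results the same way as cosine similarity when the embeddings are unit-length (an assumption worth checking for your embedding model). The names `cosine` and `top_k` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k=3):
    """Indices of the k chunk vectors most similar to query_vec, best first."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

print(top_k([1, 0], [[1, 0], [0, 1], [1, 1], [-1, 0]], k=2))  # [0, 2]
```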

In [None]:
# embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# db = Chroma.from_documents(docs, embeddings, persist_directory="./chroma_db")

In [None]:
# PROMPT = """Answer the following questions based only on the following context:
# {context}
# ---
# Answer the question based on the above context:
# {que}
# """

In [None]:
# context_text = "\n\n---\n\n".join(doc.page_content for doc, _score in results)
# prompt_template = ChatPromptTemplate.from_template(PROMPT)
# prompt = prompt_template.format(context = context_text, que = query_text)

In [None]:
PROMPT = """Answer the question based only on the following context:
{context}
---
Answer the question based on the above context:
{que}
"""

def create_prompt(results, question):
    """Fill PROMPT with the retrieved chunks and the user's question."""
    if not results:
        return "Sorry, I couldn't find anything relevant."

    # Accept plain Documents (similarity_search) as well as
    # (Document, score) tuples (similarity_search_with_score)
    context_text = "\n\n---\n\n".join(
        doc[0].page_content if isinstance(doc, tuple) else doc.page_content
        for doc in results
    )

    prompt_template = ChatPromptTemplate.from_template(PROMPT)
    return prompt_template.format(context=context_text, que=question)
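Stripped of LangChain, the prompt assembly amounts to joining the chunk texts with a separator and substituting them into a template. The plain-string sketch below is illustrative (the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical); LangChain's `ChatPromptTemplate.format` may also put a message-role label in front of the text.

```python
PROMPT_TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n"
    "---\n"
    "Answer the question based on the above context:\n"
    "{que}"
)

def build_prompt(chunk_texts, question, sep="\n\n---\n\n"):
    """Join the retrieved chunk texts with `sep` and fill the template.
    Braces inside chunk_texts are safe: only the template is parsed by format()."""
    return PROMPT_TEMPLATE.format(context=sep.join(chunk_texts), que=question)

print(build_prompt(["first chunk", "second chunk"], "What happened?"))
```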

In [None]:
def answer__llm(question, closest_chunks):
    """Ask Gemini to answer `question` using only the retrieved chunks."""
    if not closest_chunks:
        # Nothing was retrieved: don't send an empty-context prompt to the model
        return "Sorry, I couldn't find anything relevant."

    model = genai.GenerativeModel("gemini-2.5-pro")
    prompt = create_prompt(closest_chunks, question)
    response = model.generate_content(prompt)

    # A blocked or empty response has no candidate parts to read text from
    if response.candidates and response.candidates[0].content.parts:
        return response.candidates[0].content.parts[0].text
    return "No answer generated."


In [None]:
def complete(video_url):
    """Run the whole pipeline for `video_url` and return the Chroma vector store."""
    audio_file = download_audio(video_url)
    compressed_audio = compress_audio(input_file=audio_file)
    transcript = speech_to_text(compressed_audio)
    return generate_embeddings(transcript)


In [None]:
def preparation(url):
    """Download, compress and transcribe the video at `url`; return the transcript text."""
    audio_file = download_audio(url)
    compressed_audio = compress_audio(input_file=audio_file)
    transcript = speech_to_text(compressed_audio)
    return transcript

In [None]:
# model_name = "sshleifer/distilbart-cnn-12-6"

In [None]:
def download_summariser(model_name):
    """Fetch `model_name` from the Hugging Face Hub and build a summarisation pipeline."""
    from huggingface_hub import hf_hub_download
    hf_hub_download(repo_id=model_name, filename="config.json", cache_dir="models")
    summarizer = pipeline("summarization", model=model_name, cache_dir="models")
    return summarizer



In [None]:
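from functools import lru_cache

@lru_cache(maxsize=None)
def load_model_and_tokenizer(model_name):
    """Load the BART tokenizer and model once; later calls reuse the cached pair
    instead of reloading the weights for every chunk."""
    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model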


In [None]:
# def summarize(text, maxSummarylength=1000):
#     model_name = "sshleifer/distilbart-cnn-12-6"
#     tokenizer, model = load_model_and_tokenizer(model_name)
#     inputs = tokenizer.encode("summarize: " + text,
#                               return_tensors="pt",
#                               max_length=1024,
#                               truncation=True)
#     summary_ids = model.generate(
#         inputs,
#         max_length=int(maxSummarylength),
#         min_length=int(maxSummarylength / 5),
#         length_penalty=10.0,
#         num_beams=4,
#         early_stopping=True
#     )
#     summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
#     return summary
def summarize(text, max_summary_length=200, min_summary_length=50):
    """Abstractive summary of `text` with DistilBART-CNN.

    Lengths are measured in tokens; input beyond 1024 tokens is truncated.
    """
    model_name = "sshleifer/distilbart-cnn-12-6"
    tokenizer, model = load_model_and_tokenizer(model_name)

    # BART takes the raw text; the T5-style "summarize: " prefix is not needed
    inputs = tokenizer.encode(
        text,
        return_tensors="pt",
        max_length=1024,
        truncation=True
    )

    summary_ids = model.generate(
        inputs,
        max_length=max_summary_length,
        min_length=min_summary_length,
        length_penalty=2.0,   # values > 0 favour longer summaries
        num_beams=4,
        early_stopping=True
    )

    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

In [None]:
# def recursive_summarize(text, chunk_size=800):
#     words = text.split()
#     chunks = [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

#     partial_summaries = []
#     for chunk in chunks:
#         partial_summaries.append(summarize(chunk))

#     return summarize(" ".join(partial_summaries))
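def recursive_summarize(text, chunk_size=500):
    """Summarise `chunk_size`-word chunks, then summarise the joined partial
    summaries once more if they are short enough for a single pass."""
    words = text.split()
    chunks = [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

    partial_summaries = [summarize(chunk) for chunk in chunks]
    combined = " ".join(partial_summaries)

    # A longer combined text would be heavily truncated by the 1024-token input
    # limit, so it is returned unchanged instead
    if len(combined.split()) < 900:
        return summarize(combined, max_summary_length=250)
    return combined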
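The chunk-then-combine approach above is a form of map-reduce summarisation. A variant that keeps reducing until the text fits in one call is sketched below. The name `map_reduce_summarize` and the `max_rounds` guard are hypothetical additions; the guard stops the loop if the summariser fails to shrink its input. The summariser is passed in as a function, so the notebook's `summarize` can be plugged in directly.

```python
from typing import Callable

def map_reduce_summarize(text: str, summarize_fn: Callable[[str], str],
                         chunk_words: int = 500, max_rounds: int = 5) -> str:
    """Map: summarise each chunk_words-word chunk. Reduce: join the partial
    summaries and repeat until the text fits in one chunk, then summarise it once."""
    for _ in range(max_rounds):
        words = text.split()
        if len(words) <= chunk_words:
            return summarize_fn(text)  # fits in a single call: final reduce step
        chunks = [" ".join(words[i:i + chunk_words])
                  for i in range(0, len(words), chunk_words)]
        text = " ".join(summarize_fn(chunk) for chunk in chunks)  # map step
    return text  # round limit reached: return the latest partial summaries

# Usage in this notebook: map_reduce_summarize(text, summarize)
```

Passing the summariser as an argument also makes the control flow easy to check with a stand-in, such as a function that keeps only the first few words.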

In [None]:
def adaptive_recursive_summarize(text, chunk_size=800, min_words=150, max_words=400):
    """
    Summarise `text` chunk by chunk, then adjust the result's length:
    re-summarise with a larger budget if it has fewer than `min_words` words,
    and compress it further if it has more than `3 * max_words` words.
    Note that `max_words` is also passed to summarize() as a token limit.
    """
    # Step 1: summarise each chunk_size-word chunk
    words = text.split()
    chunks = [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

    partial_summaries = [summarize(chunk, max_summary_length=max_words) for chunk in chunks]

    combined_summary = " ".join(partial_summaries)

    # Step 2: adaptive refinement
    word_count = len(combined_summary.split())

    if word_count < min_words:
        # Too short: summarise again with a larger length budget
        # (summarize() only sees the first 1024 tokens of a long text)
        return summarize(text, max_summary_length=max_words * 2)

    elif word_count > max_words * 3:
        # Too long: repeat the whole procedure on the combined summary; a single
        # summarize() call would truncate it to its first 1024 tokens
        return adaptive_recursive_summarize(combined_summary, chunk_size, min_words, max_words)

    else:
        # Acceptable length: return the combined summary
        return combined_summary

In [None]:
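def summarize_functionality(text, loaded):
    """Summarise `text`; when `loaded` is False, fetch the model weights first."""
    if not loaded:
        model_name = "sshleifer/distilbart-cnn-12-6"
        # These calls only warm the local Hugging Face cache; their return values are unused
        download_summariser(model_name)
        load_model_and_tokenizer(model_name)
    return adaptive_recursive_summarize(text)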

In [None]:
text = preparation(url)


In [None]:
# print(text)

 Well, starting tomorrow, work is the rest of your life. That's right. So my first wish, find work that gives you great joy. Jensen is here, of course the CEO of NVIDIA. This is the clear winner of every winner in the world of artificial intelligence. What an amazing year! His company powers everything from OpenAI, Google's programs. You are the stars. This is a celebration of your science, of your work and your innovations. My second wish for you is to learn to embrace failure. You know, once upon a time, NVIDIA was a failing company. Their first product was such a disaster that the customers had completely rejected their products. Microsoft had launched a lethal product that made them absolutely useless. And NVIDIA only had 30 days of cash left. At the time that we started the company in 1993, we were the only consumer 3D graphics company ever created. The year is 1996 and NVIDIA is collapsing. We want to contract with Sega to build their game console, which attracted games for our p

In [None]:
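_summariser_loaded = False

def summarize_url(text):
    """Summarise a transcript; the model weights are fetched only on the first call."""
    global _summariser_loaded
    summary = summarize_functionality(text, _summariser_loaded)
    _summariser_loaded = True
    return summary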

In [None]:
print(summarize_url(text))

Device set to use cuda:0


 Odoo is an all-in-one enterprise resource planning platform that brings together 45 easy to use applications . The company is worth $4.2 trillion, more than Apple and more than the GDP of 185 countries in the world . Odoo's user-friendly interface ensures that you can easily manage your business .  To understand Nvidia, you need to know what exactly is a chip and why does the world care so much about chips . Back in 1993 the PC revolution was just starting while the major tech companies would be racing to the pinnacle of the PC market From Microsoft software and faster processors to 3d graphics and modern gaming, Nvidia would change how we work play and live .  In 1995 Nvidia launched their first product a computer chip called Nv1 The chip that was released in November 1995 sound blaster compatible audio systems and 15 pin joystick boards To top it off, it would be compatible with the Sega Saturn console that was about to hit the market . But this is when something terrible happened .

In [None]:
db = generate_embeddings(text)

In [None]:
question = "Tell me about the rise of Nvidia"


In [None]:
def qna_functionality(question):
    """Retrieve the chunks closest to `question` from `db` and print Gemini's answer."""
    res = closest(question, db)
    answer = answer__llm(question, res)
    print(answer)


In [None]:
qna_functionality(question)

Based on the context provided, here is a summary of the rise of Nvidia:

Nvidia was on the brink of failure when a "miracle" happened: Jensen Yuan built a chip called the Riva 128 in 1997. This chip, which stands for "real-time interactive video and animation accelerator," was the world's first fully hardware-accelerated pipeline for 3D rendering. Games ran smoothly on it, and both developers and reviewers loved it, leading to a surge in orders. The Riva 128 sold so well that Nvidia shipped one million units in the first four months, more than any other chipmaker in that period. This success marked the "rebirth of Nvidia" and made it the "undisputed king of 3d graphics acceleration."

To stay ahead of the competition, Nvidia pushed harder and in 1999 launched the GeForce 256, the world's first GPU (graphics processing unit). This new chip started the "parallel processing revolution," as a GPU can handle thousands of simple tasks simultaneously. This journey was a stepping stone that tu

In [None]:
# for m in genai.list_models():
#     print(m.name, m.supported_generation_methods)