# 🎓 YouTube Learning Assistant with Gemini AI

## 🔍 Problem Statement

With the rapid growth of educational and tutorial content on YouTube, learners often struggle to extract key points, summarize long videos, or ask contextual questions. Watching a full 40-minute video can be overwhelming, especially for students, researchers, and busy professionals.

**Solution**: A YouTube Learning Assistant that:
- Extracts the video transcript automatically
- Summarizes the content intelligently using Gemini
- Answers user questions using the transcript as a knowledge base (RAG)
- Optionally outputs structured content such as a chapter breakdown or quiz-style Q&A

This assistant makes video content more accessible, efficient, and interactive using powerful GenAI capabilities.

## 💡 GenAI Capabilities Used
- 📄 Document Understanding & Summarization
- 🧠 Retrieval-Augmented Generation (RAG)
- 🧰 Structured Output with Function Calling (JSON)


In [1]:
# !pip install youtube-transcript-api


In [2]:
from youtube_transcript_api import YouTubeTranscriptApi
from urllib.parse import urlparse, parse_qs

In [3]:
def get_all_transcripts(video_id):
    """Return the language codes of every transcript available for a video."""
    try:
        # Retrieve all transcript objects for the video, regardless of language or type.
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)

        # Collect only the language codes. The transcript text itself is
        # fetched later, in get_youtube_transcript.
        return [transcript.language_code for transcript in transcript_list]
    except Exception as e:
        # Return an empty list so callers always receive a list of codes.
        print(f"Error listing transcripts: {e}")
        return []

In [4]:
def get_youtube_transcript(youtube_url, language='en'):
    # Extract the video ID from the "v" query parameter of a watch URL.
    video_id = parse_qs(urlparse(youtube_url).query).get("v")
    if not video_id:
        raise ValueError("Invalid YouTube URL. Couldn't find video ID.")
    video_id = video_id[0]

    # Fetch the transcript, preferring the requested language and falling
    # back to any other language available for the video.
    try:
        available = get_all_transcripts(video_id)
        languages = [language] + [code for code in available if code != language]
        transcript_data = YouTubeTranscriptApi.get_transcript(video_id, languages=languages)
        return " ".join(t["text"] for t in transcript_data)
    except Exception as e:
        return f"Error fetching transcript: {e}"
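
The function above reads the video ID only from the `v` query parameter, so short links and embed links raise a `ValueError`. The sketch below shows one way to cover a few more URL shapes. `extract_video_id` is a hypothetical helper, not part of `youtube-transcript-api`, and the set of hosts and path prefixes it recognizes is an assumption rather than a complete list.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(youtube_url):
    """Return the video ID from some common YouTube URL shapes, or None."""
    parsed = urlparse(youtube_url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in ("youtube.com", "m.youtube.com"):
        # Standard watch links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        # Embed and Shorts links: /embed/<id> and /shorts/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            return parts[1]

    return None
```

Returning `None` instead of raising leaves it to the caller to decide how to report an unrecognized URL.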


In [5]:
yt_url = "https://www.youtube.com/watch?v=Z6U3tVjHcUI"  # Replace with the video you want to analyze
transcript = get_youtube_transcript(yt_url)
print(transcript[:1000])  # Just show first 1000 characters


ए गंगनाम गंगनाम गंगनाम अरे लाव की अजय अतुल लाव दिस इज द साउंड ऑफ अजय अतुल ब्रिंग इट ऑन बेबी ये ब्रिंग बेबी [संगीत] पोरं जमली एसी वरती चर्चा बोरिंग झाली चल र भावड्या पार्टीला मंग पारावरती आली मार्ग रिचली क्वार्टर भावड्याला मंग बसला स्टार्टर चल र पिंट्या मिटवू आपल्या डिस्को डान्सिंग खाजेला डॉल्बी वाल्या बोलाव माझ्या डीजे ला डीजेला वाल्या बोलाव माझ्या डीजेला डीजेला डॉल्बी वाल्या बोलाव माझ्या डीजेला वाल्या बोला माझ्या [संगीत] डीजेला अरे वराडी नसून वराटी मंदी हा घुसतो नाचाया अन झिंगून झिंगून नाचला हा निस्त लागू दे वाजाया आला मिरवण येत भावड्या कधी दांडिया खेळतोय भावड्या अंडी फोडायवर गोविंदा खाली नाचून घेतोय भावड्या टांगा पलटी सुटले घोडे फाटून तुटले जोडे घिरक्या घेतो कारण सजरा घेतो सोडून लाजेला डॉल्बी वाल्या बोलाव माझ्या डीजेला डीजेला डॉल्बी वाल्या बोलाव माझ्या डीजेला डीजेला डॉल्बी वाल्या बोलाव माझ्या डीजेला ये hey dj when you play my sound when you're playing on now when you get it all [संगीत] around आला डीजे रंगात चेटून चेटून लावतो  भवड्या भरून कलास करतो कलास टॉप वॉटमा आला डीजे बी रंगात खेट

In [6]:
# ! pip install google-generativeai
# ! pip install ipywidgets

In [7]:
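import google.generativeai as genai
import os

# Read the Gemini API key from the environment instead of hardcoding it in the notebook.
# The variable name GOOGLE_API_KEY is an assumption; use whatever name your setup defines.
# In Colab, the key can be loaded through: from google.colab import userdata
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise RuntimeError("Set the GOOGLE_API_KEY environment variable before running this cell.")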

In [8]:
genai.configure(api_key=GOOGLE_API_KEY)

# Initialize the Gemini 2.0 Flash model
model = genai.GenerativeModel('models/gemini-2.0-flash')


In [9]:
# Summarize transcript
summary_prompt = f"Summarize this YouTube video transcript clearly and concisely:\n\n{transcript[:12000]}"
summary_response = model.generate_content(summary_prompt)

# Print the summary
print(summary_response.text)


The video transcript is a lively, upbeat song about partying hard and celebrating with friends, fueled by drinks and a DJ. It describes scenes of dancing, revelry, and wild behavior, encouraging the DJ to keep the music pumping and the party going all night long. It references various cultural elements like "Varadi," "Dandia," and "Govidha," highlighting a celebratory and energetic atmosphere.
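
The summary prompt above only sees the first 12,000 characters of the transcript, so anything after that point is ignored. One possible way to cover a longer video is to summarize the transcript piece by piece and then summarize the partial summaries. The sketch below assumes this two-pass approach; `split_by_chars` and `summarize_long` are hypothetical helpers, and `summarize` stands for any callable that turns a string into a shorter one.

```python
def split_by_chars(text, max_chars=12000):
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize_long(text, summarize, max_chars=12000):
    """Summarize each piece, then summarize the combined partial summaries.

    `summarize` is any callable that maps a string to a shorter string.
    """
    pieces = split_by_chars(text, max_chars)
    if len(pieces) <= 1:
        # Short transcripts need only a single call.
        return summarize(text)
    partials = [summarize(piece) for piece in pieces]
    return summarize("\n".join(partials))

# Possible usage with the notebook's model (makes one API call per piece, plus one):
# summary = summarize_long(
#     transcript,
#     lambda t: model.generate_content(f"Summarize this transcript excerpt:\n\n{t}").text,
# )
```

Splitting on a fixed character count can cut a sentence in half, and for very long videos the combined partial summaries may themselves exceed the limit, so this is a starting point rather than a complete solution.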



In [13]:
def generate_prompt(transcript, question=None, video_type="general", is_hindi = False, task="qa"):
    """
    Generates a prompt for Gemini model based on transcript type and task.

    Parameters:
    - transcript: str, the video transcript
    - question: str, optional question for Q&A (ignored for summary task)
    - video_type: str, e.g., 'educational', 'motivational', etc.
    - is_hindi: bool, whether the transcript is in Hindi (only used when task is 'summary')
    - task: str, either 'qa' or 'summary'

    Returns:
    - str: Formatted prompt for Gemini
    """
    base_prompts = {
        "educational": "You are a helpful tutor.",
        "motivational": "You are a motivational content explainer.",
        "product": "You are a product assistant helping users understand tech reviews or tutorials.",
        "news": "You are a factual news analyst.",
        "general": "You are a smart assistant that explains things clearly."
    }

    role = base_prompts.get(video_type.lower(), base_prompts["general"])

    if is_hindi and task == "summary":
        return f"""{role}
The following transcript is in Hindi. Translate it into English and summarize the key points clearly and concisely.

Transcript:
{transcript[:12000]}
"""
    elif task == "summary":
        return f"""{role}
Summarize the following transcript clearly and concisely.

Transcript:
{transcript[:12000]}
"""
    else:
        return f"""{role}
Based on the transcript below, answer the question clearly and concisely.

Transcript:
{transcript[:12000]}

Question: {question}
"""


In [14]:
def ask_question(model, transcript, question=None, video_type="general", is_hindi=False, task="qa"):
    """
    Generates a response from Gemini based on a video transcript.

    Parameters:
    - model: Gemini model object
    - transcript: full transcript from video
    - question: question to ask (optional for summary)
    - video_type: genre of video e.g., "motivational", "educational", "news", etc.
    - is_hindi: set True if the transcript is in Hindi (only affects the "summary" task)
    - task: either "qa" or "summary"

    Returns:
    - text response from Gemini
    """
    prompt = generate_prompt(transcript, question, video_type, is_hindi, task)
    response = model.generate_content(prompt)
    return response.text
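
The problem statement lists quiz-style Q&A as an optional structured output, but no cell in this notebook produces it. A minimal sketch of one way to do so follows: ask the model for JSON in the prompt, then parse the reply defensively. The helper names `build_quiz_prompt` and `parse_quiz_json` and the `question`/`answer` JSON shape are assumptions made for illustration, not part of the Gemini SDK.

```python
import json

FENCE = "`" * 3  # Markdown code fence that models often wrap JSON replies in

def build_quiz_prompt(transcript, num_questions=3):
    """Build a prompt asking for quiz items as a JSON list."""
    return (
        f"Write {num_questions} quiz questions about the transcript below. "
        'Reply with only a JSON list of objects with the keys "question" and "answer".\n\n'
        f"Transcript:\n{transcript[:12000]}"
    )

def parse_quiz_json(reply_text):
    """Parse the model reply into a list of {"question", "answer"} dicts."""
    cleaned = reply_text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a "json" tag) and the closing fence.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    items = json.loads(cleaned)
    if not isinstance(items, list):
        raise ValueError("Expected a JSON list of quiz items.")
    return [
        {"question": str(item["question"]), "answer": str(item["answer"])}
        for item in items
    ]

# Possible usage with the notebook's model:
# quiz = parse_quiz_json(model.generate_content(build_quiz_prompt(transcript)).text)
```

`json.loads` raises an error when the reply is not valid JSON, so the caller may want to catch that and retry or fall back to the plain-text answer.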


In [15]:
question = "What are the main points discussed in the video?"
video_type = "motivational"  # change to "news", "product", "educational", or "general"

answer = ask_question(model, transcript, question, video_type)
print(answer)


The video is a high-energy song celebrating music, dance, and friendship, specifically focusing on the excitement and fun surrounding a DJ and a lively party atmosphere. It describes people coming together to dance, drink, and enjoy themselves, with references to local customs and traditions.



In [16]:
question = "What are the main points discussed in the video?"
answer = ask_question(model, transcript, question, video_type="educational", is_hindi=False, task="qa")
print(answer)


The video is a song about partying, dancing, and calling for the DJ to play music. It describes a lively atmosphere with people dancing, drinking, and enjoying themselves. The lyrics also mention specific scenarios like a wedding procession, playing Dandiya, and celebrating like it's a festival.



In [17]:
answer = ask_question(model, transcript, video_type="motivational", is_hindi=True, task="summary")
print(answer)

Okay, here's the translation and summary of the provided Hindi transcript:

**Translation:**

The transcript is the lyrics to a Marathi song, likely a high-energy, celebratory track perfect for dancing. Here's a general sense of the lyrics, translated into English:

"A Gangnam Gangnam Gangnam... Hey, Ajay Atul, bring it on! This is the sound of Ajay Atul! Bring it on baby!

(Music)

The discussion is boring, let's go to the party. Then, the atmosphere rises and the road is reached, let's take a starter with quarter. Let's dance with our friends. Call my DJ!

(Chorus): Call the DJ!

(Verse 2):
He is from 'Varhati' not 'Varadi' but he is barging to dance. He's dancing with intoxication, let the music play. The procession is coming! Sometimes he is playing dandiya, and sometimes he is breaking handis and dancing as Govinda. He is taking rounds like a top without shyness.

(Chorus): Call the DJ!

Hey DJ when you play my sound when you're playing on now when you get it all around!

The DJ i

In [18]:
answer = ask_question(model, transcript, video_type="news", is_hindi=False, task="summary")
print(answer)


This is a song, seemingly in Marathi, celebrating a party atmosphere with heavy emphasis on music provided by a DJ. The lyrics describe people drinking, dancing wildly, and specifically requesting the DJ to play louder and louder. The song mentions various celebratory activities, including traditional dances and revelry. The overall tone is energetic, chaotic, and focused on enjoying the music and party.

