<a href="https://www.kaggle.com/code/darshanabalakannan/youtube-video-summarization-using-genai?scriptVersionId=235254443" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# 🎬 GenAI-Powered YouTube Summarizer — Whisper + Gemini + GPT

This notebook demonstrates an end-to-end **GenAI pipeline** that turns any YouTube video into a **smart summary**, leveraging:

- **Audio understanding** via OpenAI Whisper
- **Text summarization** with Gemini/GPT
- **Web search + grounding** when transcripts/audio fail

This addresses the common pain point of **long-form video content overload**, especially for:
- Students & Researchers
- Busy Professionals
- Knowledge Workers

## ❓ Problem Statement

> YouTube hosts 800M+ videos, but most people lack time to watch full-length content.

This tool solves:
- Lack of summaries for videos with no transcripts
- Language barriers or missing captions
- Time spent parsing through 10+ min videos

###  What this notebook does:
1. Attempts to fetch a transcript using `youtube_transcript_api`
2. If not available, transcribes the video with **Whisper** (Audio Understanding)
3. If that fails, fetches **title + description**, runs a **web search**, and summarizes
4. Generates an **AI summary** using **Gemini or GPT** (Few-shot + Controlled Output)

## 🧠 GenAI Capabilities Demonstrated

| Capability                     | Implementation                         |
|-------------------------------|----------------------------------------|
| Audio Understanding           | OpenAI Whisper for transcription       |
| Few-shot Prompting            | Gemini/GPT structured summarization    |
| Grounding                     | Google Search + YouTube Metadata       |
| Long Context Window           | Handles long transcripts with Gemini   |
| Controlled Generation         | Instruction-based summarization        |

## 📦 Import Required Libraries

We begin by importing all the necessary libraries and packages for the GenAI-powered YouTube summarizer pipeline.

These cover a wide range of capabilities including:

- **Web scraping & downloads** (`yt_dlp`, `youtube_transcript_api`)
- **Data handling & utilities** (`pandas`, `datetime`, `re`, `os`, `html`, `emoji`)
- **GenAI APIs** (`openai`, `google.generativeai`)
- **Audio transcription** (`whisper`)
- **YouTube Data API** (`googleapiclient.discovery`)
- **Search grounding** (`googlesearch`)
- **Torch for GPU/CPU support** (`torch`)
- **Jupyter formatting** (`IPython.display`)
- **Warning suppression** (`warnings`)

In [1]:
# Standard Library
import os
import re
import html
import warnings
from datetime import datetime

# Third-party Libraries
import pandas as pd
import emoji
import yt_dlp
import openai
import whisper
import torch
from youtube_transcript_api import YouTubeTranscriptApi
from googlesearch import search
from IPython.display import HTML, Markdown, display

# Google APIs
from google.generativeai import GenerativeModel, configure
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# Suppress warnings
warnings.filterwarnings("ignore")


## 🔐 API Key Configuration (Kaggle Secrets)

This section securely loads API keys stored in your **Kaggle Secrets Manager**, ensuring credentials are not exposed in the notebook.

We retrieve and configure keys for:
- **Gemini API** (for summarization using Google’s models)
- **OpenAI API** (for the GPT summarization fallback; Whisper itself runs locally through the `whisper` package)
- **YouTube Data API** (to fetch video metadata like title & description)

### Steps:
1. Use `UserSecretsClient()` to fetch secrets stored in Kaggle.
2. Set and configure each API client accordingly:
   - `configure()` for Google Gemini
   - `openai.api_key` for OpenAI
   - `build()` to initialize the YouTube Data API client

In [2]:
from kaggle_secrets import UserSecretsClient

# Read the three keys from Kaggle Secrets
user_secrets = UserSecretsClient()
GEMINI_API_KEY = user_secrets.get_secret("GEMINI_API_KEY")
OPENAI_API_KEY = user_secrets.get_secret("OPENAI_API_KEY")
YOUTUBE_API_KEY = user_secrets.get_secret("YOUTUBE_API_KEY")

# Configure each client
configure(api_key=GEMINI_API_KEY)
openai.api_key = OPENAI_API_KEY
youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)
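
Outside Kaggle the `kaggle_secrets` module is not available. The sketch below shows one way to fall back to environment variables in that case. The helper name `load_api_key` is hypothetical and not part of the notebook.

```python
import os

def load_api_key(name):
    """Return the secret called `name`, or None if it cannot be found."""
    try:
        # Importable only inside Kaggle notebooks
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(name)
    except Exception:
        # Outside Kaggle, or if the secret is missing, fall back to the environment
        return os.environ.get(name)
```

With this helper, a call such as `load_api_key("GEMINI_API_KEY")` could replace the direct `get_secret` call when the notebook is run locally.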

## 🔍 Extract YouTube Video ID

This utility function extracts the **YouTube video ID** from a given video URL. This is necessary because APIs like `youtube_transcript_api` and the YouTube Data API require the raw video ID, not the full URL.

### How it works:
- It uses a regular expression to match:
  - `v=` in standard URLs (e.g., `https://www.youtube.com/watch?v=VIDEO_ID`)
  - or shortened URLs (e.g., `https://youtu.be/VIDEO_ID`)
- Returns the video ID if found, otherwise returns `None`.

 Handles both full and short YouTube link formats.

In [3]:
def extract_video_id(url):
    # Stop at '&', '?' or '#' so trailing parameters (e.g. youtu.be/ID?t=42) are not captured
    match = re.search(r"(?:v=|youtu\.be/)([^&?#]+)", url)
    return match.group(1) if match else None
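
As an alternative to the regular expression, the URL can be parsed with the standard library. This is a minimal sketch, assuming only the `watch`, `youtu.be`, `embed`, and `shorts` URL shapes need to be handled. The name `extract_video_id_strict` is hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id_strict(url):
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the `v` query parameter
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                candidate = parsed.path[len(prefix):].split("/")[0]
                return candidate or None
    return None

print(extract_video_id_strict("https://youtu.be/l-kE11fhfaQ?t=42"))
```

Unlike the regex, this version rejects URLs from other hosts that happen to contain `v=`.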

## 🧹 Data Cleaning Utility

This function cleans and normalizes raw text data extracted from transcripts, audio, or metadata. It's particularly useful when dealing with messy content from YouTube captions or speech-to-text output.

###  Steps Performed:
- **Remove emojis** using the `emoji` package
- **Decode HTML entities** like `&amp;` → `&`
- **Strip non-ASCII characters**, which leaves plain English text but also removes accented letters and non-Latin scripts
- **Normalize whitespace** by collapsing multiple spaces into one and trimming

###  Output:
Returns a clean, plain-text string ready for use in summarization or display.

In [4]:
def datacleaning(text: str) -> str:
    text = emoji.replace_emoji(text, replace='')  # Remove emojis
    text = html.unescape(text)  # Decode HTML entities
    text = re.sub(r'[^\x00-\x7F]+', '', text)  # Remove non-ASCII characters
    text = re.sub(r'\s+', ' ', text).strip()  # Remove extra spaces
    return text
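
To see the effect of these steps on a concrete string, here is a standard-library-only approximation. It skips the `emoji` package because emojis fall outside the ASCII range and are removed by the same regular expression. The helper name `clean_without_emoji_package` and the sample string are illustrative assumptions.

```python
import html
import re

def clean_without_emoji_package(text):
    """Stdlib-only approximation of the cleaning steps (hypothetical helper)."""
    text = html.unescape(text)                  # "&amp;" -> "&"
    text = re.sub(r'[^\x00-\x7F]+', '', text)   # drops emojis, accents, non-Latin scripts
    return re.sub(r'\s+', ' ', text).strip()    # collapse runs of whitespace

sample = "Tom &amp; Jerry \U0001F600   caf\u00e9\n\nepisode"
print(clean_without_emoji_package(sample))  # Tom & Jerry caf episode
```

Note that "café" becomes "caf": the ASCII filter is lossy for any text that is not plain English.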

## 🎥 Fetch Transcript from YouTube

This function attempts to fetch the transcript of a YouTube video using the `youtube_transcript_api`. It first checks if transcripts are available in English and then attempts to retrieve and clean the text.

###  Process:
1. **Check for available transcripts**: The function starts by listing available transcripts for the given video ID.
2. **Filter for English Transcripts**: It prioritizes English transcripts, including variants like `en-US`, `en-GB`, etc.
3. **Fetch and Clean**: Once an English transcript is found, the text is cleaned using the `datacleaning` function, removing non-ASCII characters, extra spaces, etc.
4. **Fallback**: If no English transcript is found, it tries transcripts in other languages available for the video.
5. **Return Cleaned Text**: If any valid transcript is found, the function returns the cleaned text. Otherwise, it returns `None`.

###  Output:
- **Cleaned transcript text** in case of success, or `None` if no transcript is available.

In [5]:
def get_transcript(video_id):
    """Return cleaned transcript text, preferring English, or None if unavailable."""
    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        available_langs = [t.language_code for t in transcript_list]
    except Exception:
        return None

    english_langs = [code for code in available_langs if code.startswith('en')]
    # Prefer English variants; use other languages only when no English transcript exists
    candidates = ['en'] + english_langs if english_langs else available_langs

    for lang in candidates:
        try:
            transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=[lang])
            raw_text = " ".join(t['text'] for t in transcript)
            if raw_text.strip():
                return datacleaning(raw_text)
        except Exception:
            continue
    return None
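
The language prioritization described above can be isolated into a small pure function, which makes it easy to check without calling the API. This is a sketch: `transcript_language_order` is a hypothetical helper, and removing duplicate codes is an addition that `get_transcript` itself does not perform.

```python
def transcript_language_order(available_codes):
    """Return language codes in the order they would be tried (hypothetical helper)."""
    english = [code for code in available_codes if code.startswith("en")]
    if english:
        ordered = ["en"] + english
    else:
        ordered = list(available_codes)
    # Drop duplicates while keeping the first occurrence of each code
    return list(dict.fromkeys(ordered))

print(transcript_language_order(["de", "en-GB", "en"]))  # ['en', 'en-GB']
print(transcript_language_order(["de", "fr"]))           # ['de', 'fr']
```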

## 🎧 Download YouTube Audio

This function downloads the audio from a YouTube video in M4A format using the `yt-dlp` library. It does not require FFmpeg, making it a simple and efficient method for downloading YouTube audio.

###  Process:
1. **Directory Setup**: It creates a folder (`audio` by default) to store the downloaded file. If the directory already exists, it is reused.
2. **File Naming**: The audio file is named using the `video_id` and a timestamp to ensure uniqueness.
3. **Download Options**: 
    - It specifies to download the best available audio in M4A format.
    - The downloaded file's extension is preserved using the correct output path.
4. **Download Execution**: The `yt-dlp` library is used to fetch the audio.
5. **File Verification**: After the download completes, the script checks whether the file exists at the specified path and confirms success.

###  Output:
- **Audio file path**: If the download is successful, it returns the full path to the downloaded M4A file.
- **None**: If an error occurs during the download or file verification.

###  Error Handling:
If anything goes wrong (e.g., network issues, download failures), the function will print an error message and return `None`.

In [6]:
def download_audio(video_id, output_dir="audio"):
    """Download YouTube audio in native M4A format (no FFmpeg needed)"""
    try:
        os.makedirs(output_dir, exist_ok=True)
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f"{video_id}_{timestamp}.m4a"
        output_path = os.path.join(output_dir, filename)

        ydl_opts = {
            'format': 'bestaudio[ext=m4a]',
            'outtmpl': output_path.replace('.m4a', '.%(ext)s'),  # Preserve correct extension
            'quiet': True,
            'no_warnings': True,
        }

        print(f"Downloading audio for video ID: {video_id}...")
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.download([f"https://youtube.com/watch?v={video_id}"])

        # The format selector requests M4A, so the '%(ext)s' template resolves to output_path
        actual_file = output_path

        if os.path.exists(actual_file):
            print(f"Audio downloaded successfully: {actual_file}")
            return actual_file
        else:
            print("Download completed, but file not found.")
            return None

    except Exception as e:
        print(f"Download failed: {str(e)}")
        return None
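
The file verification step assumes the download ends in `.m4a`. If the format selector were ever loosened, a more defensive check could look for any file with the expected name regardless of extension. The following is a sketch under that assumption. `find_downloaded_audio` is a hypothetical helper, not part of `yt_dlp`.

```python
import glob
import os

def find_downloaded_audio(output_dir, stem):
    """Return the path of the first file named `<stem>.<any extension>`, or None."""
    # glob.escape protects against special characters in the directory or video ID
    pattern = os.path.join(glob.escape(output_dir), glob.escape(stem) + ".*")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None
```

Here `stem` would be the `f"{video_id}_{timestamp}"` part of the filename built in `download_audio`.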

## 🎙️ Whisper Audio Transcription

This function uses OpenAI's Whisper model to transcribe audio files into text. It automatically selects the appropriate device (GPU or CPU) based on the system's hardware capabilities.

###  Process:
1. **Device Selection**: The function checks whether CUDA-compatible GPU is available and selects either "cuda" (GPU) or "cpu" for transcription.
2. **Load Whisper Model**: It loads the Whisper model (using "small" size in this case, but "medium" or "large" can also be used based on accuracy requirements).
3. **Transcribe Audio**: The audio file at the given `audio_path` is passed to the model for transcription.
4. **Clean Text**: Once transcription is done, the raw text is cleaned using the `datacleaning` function to remove unwanted characters and formatting.
5. **Error Handling**: If any errors occur during transcription, the function will print an error message and return `None`.

###  Output:
- **Cleaned transcription text** if successful, or `None` if an error occurs during transcription.

In [7]:
def whisper_transcribe(audio_path):
    try:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print("Transcribing with Whisper...")
        whisper_model = whisper.load_model("small", device=device)  # "medium" or "large" are slower but more accurate
        result = whisper_model.transcribe(audio_path)
        return datacleaning(result['text'])
    except Exception as e:
        print(f"Whisper transcription failed: {e}")
        return None
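
The device selection step can be written so that it also works on machines where PyTorch is not installed. This is a minimal sketch, and `pick_device` is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so default to the CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())
```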

## 🔍 Search and Summarize

This function performs a web search based on the provided YouTube video **title** and **description**, then generates a concise summary using the search results. The goal is to provide a high-level overview of the video's subject matter by referencing external content.

###  Process:
1. **Search Query**: The function creates a search query by combining the `title` and `description` of the video.
2. **Search Results**: It performs a web search and retrieves the top 3 results related to the combined query.
3. **Content Extraction**: For each search result, it records the URL alongside a fixed placeholder sentence. The page itself is not fetched, so the model sees only the URLs.
4. **Prompt Creation**: The script constructs a prompt that includes:
    - The video title and description.
    - The search result URLs and placeholder text.
5. **Summarization**: The prompt is sent to `summarize_with_any_model`, which tries the Gemini models first and falls back to GPT-4 and then GPT-3.5.

###  Output:
- **Summary**: A 5-7 sentence summary highlighting key takeaways from the video metadata and search results.

###  Error Handling:
If an error occurs during the search process or summarization, it will print an error message and return `None`.

### Example Use Case:
If you have a YouTube video with a title like "How to Bake a Cake" and a description explaining the ingredients and process, this function will:
- Search for articles and content related to "How to Bake a Cake."
- Summarize the relevant content in 5-7 sentences, focusing on the most important details about cake baking.

In [8]:
def search_and_summarize(title, description):
    try:
        query = f"{title} {description}"
        search_results = search(query, num_results=3)

        web_content = ""
        for url in search_results:
            web_content += f"\nURL: {url}\nExtracted Content: This is a top search result related to the title and description.\n"

        prompt = f"""
You are a helpful assistant. Based on the following YouTube video metadata and search results, generate a concise summary:

Title: {title}
Description: {description}

Search Result Context:
{web_content}

Summarize the content above in 5-7 sentences focusing on the key takeaways or subject matter.
"""

        summary = summarize_with_any_model(prompt)
        return summary

    except Exception as e:
        print(f"Error during web search or summarization: {e}")
        return None
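
YouTube descriptions are often long and full of links, which can make the combined search query noisy. One possible refinement, shown here as a sketch, is to keep the title and only a short, link-free excerpt of the description. The helper `build_search_query` and its `max_description_words` parameter are assumptions, not part of the notebook.

```python
import re

def build_search_query(title, description, max_description_words=15):
    """Combine the title with a short, link-free excerpt of the description."""
    # Remove URLs, which add noise to a search query
    text = re.sub(r'https?://\S+', ' ', description or '')
    words = text.split()[:max_description_words]
    return " ".join([title.strip()] + words)

print(build_search_query(
    "How to Bake a Cake",
    "Full recipe at https://example.com/cake with flour, eggs and sugar",
    max_description_words=4,
))  # How to Bake a Cake Full recipe at with
```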

## 📡 Fetch Metadata from YouTube API

This function retrieves the **title** and **description** of a YouTube video using the **YouTube Data API v3**. It fetches the video metadata by calling the `videos().list()` method, which returns detailed information about the video.

###  Process:
1. **API Request**: The function makes an API request to YouTube's Data API, passing the video ID and requesting the `snippet` part, which includes metadata like the title and description.
2. **Extract Metadata**: After the request is executed, it extracts the `title` and `description` from the response JSON.
3. **Return Values**: The function returns the video title and description as a tuple `(title, description)`.

###  Output:
- **title**: The title of the video.
- **description**: A brief description of the video content.

###  Error Handling:
If the API request fails or encounters an error (e.g., invalid video ID, API quota issues), the function prints an error message and returns `None` for both the title and description.

### Example Use Case:
Given a YouTube video ID, this function will return the video's title and description, which can be used for further analysis or processing, such as summarization or metadata extraction.

### Example:
```python
video_id = 'dQw4w9WgXcQ'  # Sample YouTube video ID
title, description = fetch_metadata_youtube_api(video_id, youtube)
print(title)
print(description)
```

In [9]:
def fetch_metadata_youtube_api(video_id, youtube):
    try:
        request = youtube.videos().list(part="snippet", id=video_id)
        response = request.execute()
        title = response['items'][0]['snippet']['title']
        description = response['items'][0]['snippet']['description']
        return title, description
    except (HttpError, IndexError, KeyError) as e:  # API error, or no items returned for this ID
        print(f"YouTube API metadata fetch failed: {e}")
        return None, None

## 🧠 Summarize with Multiple Models

This function attempts to summarize a given **text** using multiple AI models. It first tries Gemini models (from Google's **Gemini** suite) and, if they fail, falls back to **OpenAI's GPT models** for summarization.

###  Process:
1. **Gemini Models**:
   - The function first tries several **Gemini models** from Google's suite (`gemini-1.5` and `gemini-2.0`).
   - It attempts to summarize the provided text using each model in sequence.
   - If a model fails (e.g., connection issues, model error), it moves to the next one in the list.

2. **Fallback to OpenAI GPT Models**:
   - If all Gemini models fail, the function tries to summarize the text using **OpenAI GPT models**.
   - It first attempts to use **GPT-4** for summarization.
   - If GPT-4 fails, it falls back to **GPT-3.5** for the task.

3. **Return Summarized Text**:
   - The function returns the summarized text from the first model that successfully processes the input.
   - If all models fail, it returns `None`.

###  Output:
- A **summarized text** based on the input text. 
- The function tags the response from GPT models to indicate which model generated the summary (e.g., "[GPT-4]" or "[GPT-3.5]").

###  Error Handling:
- The function prints the error details for each model that fails.
- If all models fail, it returns `None`.

### Example Use Case:
The function is ideal for summarizing large pieces of text where multiple models are used to ensure a good quality summary, even if one model encounters an issue.

### Example:
```python
text_to_summarize = "Here is a large block of text that needs to be summarized. It could be a transcript or any other long content."
summary = summarize_with_any_model(text_to_summarize)
print(summary)
```

In [10]:
def summarize_with_any_model(text):
    gemini_models = [
        "models/gemini-1.5-pro", "models/gemini-1.5-pro-001", "models/gemini-1.5-pro-002",
        "models/gemini-1.5-flash", "models/gemini-1.5-flash-latest", "models/gemini-1.5-flash-001",
        "models/gemini-2.0-flash", "models/gemini-2.0-pro-exp", "models/gemini-2.0-flash-001"
    ]

    # Try Gemini models
    for model_name in gemini_models:
        try:
            print(f"Trying Gemini model: {model_name}")
            model = GenerativeModel(model_name)
            gemini_response = model.generate_content(f"Summarize the following transcript:\n\n{text}")
            return gemini_response.text
        except Exception as e:
            print(f"Gemini model {model_name} failed: {e}")

    print("All Gemini models failed. Trying OpenAI GPT...")

    # Try OpenAI GPT-4, fallback to GPT-3.5
    # Note: openai.ChatCompletion is the interface of the `openai` package before version 1.0
    try:
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Summarize the following transcript."},
                {"role": "user", "content": text}
            ],
            max_tokens=1024,
            temperature=0.7
        )
        return "[GPT-4] " + response.choices[0].message.content.strip()
    except Exception as e:
        print(f"GPT-4 failed: {e}")
        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": "Summarize the following transcript."},
                    {"role": "user", "content": text}
                ],
                max_tokens=1024,
                temperature=0.7
            )
            return "[GPT-3.5] " + response.choices[0].message.content.strip()
        except Exception as e:
            print(f"All summarization models failed: {e}")
            return None
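
The fallback chain can also be expressed as a generic helper that walks a list of candidates and returns the first non-empty result. This is a sketch of the pattern, not the notebook's implementation. `first_successful` is a hypothetical name, and the stub callables stand in for real model calls.

```python
def first_successful(attempts):
    """Run (label, callable) pairs in order; return (label, result) of the first success."""
    for label, attempt in attempts:
        try:
            result = attempt()
        except Exception as exc:
            # Report the failure and move on to the next candidate
            print(f"{label} failed: {exc}")
            continue
        if result:
            return label, result
    return None, None

def broken_model():
    raise RuntimeError("quota exceeded")

label, summary = first_successful([
    ("model-a", broken_model),
    ("model-b", lambda: "short summary"),
])
print(label, summary)  # model-b short summary
```

Returning the label alongside the result serves the same purpose as the "[GPT-4]" prefix: the caller can tell which model produced the summary.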

## 🎬 YouTube Summarizer with Gemini + Whisper

This function is the main driver for processing YouTube video URLs. It fetches the transcript of a video (either from YouTube's transcript API or by transcribing audio using **Whisper**). If a transcript is not available, it retrieves the video’s **metadata** (title and description) and performs a **web search** to summarize relevant content. The summary is then generated using either **Gemini** models or **OpenAI GPT models**.

###  Process Overview:
1. **Input URL**:
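   - The function uses a hard-coded **YouTube URL**. The interactive `input()` prompt is commented out because Kaggle cannot request input while saving a version.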

2. **Video ID Extraction**:
   - The function extracts the **video ID** from the URL using the `extract_video_id` function.
   - If the URL is invalid or no ID is found, the process ends.

3. **Transcript Fetching**:
   - The function first tries to fetch the **YouTube transcript** for the video using `get_transcript`.
   - If the transcript is available, it’s used for summarization.
   - If the transcript is unavailable, the function attempts to **download the audio** and **transcribe it using Whisper**.

4. **Metadata & Web Search**:
   - If neither the YouTube transcript nor Whisper yields any text, the function retrieves the **video metadata** (title and description) via the YouTube API (`fetch_metadata_youtube_api`).
   - It then searches the web for related content using the `search_and_summarize` function to generate a **summary** based on the title, description, and web search results.

5. **Summarization**:
   - Once the transcript is available, the function generates a **summary** of the content using either **Gemini** or **OpenAI GPT models** (via `summarize_with_any_model`).

6. **Output**:
   - The function prints:
     - **Transcript** (if available)
     - **Web Search Summary** (if transcript isn't available)
     - **Final Summary** (from AI models)

###  Example Flow:
1. **Enter YouTube URL**
2. **Fetch Transcript**:
   - If transcript is found → Display and summarize.
   - If transcript is not found → Download audio → Use Whisper for transcription.
3. **If No Transcript**:
   - Fetch metadata → Perform web search → Generate web summary.
4. **Generate AI Summary**:
   - Use either Gemini models or OpenAI GPT models to generate a summary.
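
The flow above boils down to a three-level fallback. The sketch below captures only that ordering, with each step passed in as a function so it can be run with stubs. `summarize_video` and its parameter names are hypothetical and simplify the real `main()`, which also prints progress messages.

```python
def summarize_video(video_id, get_transcript, transcribe_audio,
                    summarize_metadata, summarize_text):
    """Return (source, summary), trying the most direct text source first."""
    transcript = get_transcript(video_id)
    source = "captions"
    if not transcript:
        # No captions: fall back to speech-to-text on the audio track
        transcript = transcribe_audio(video_id)
        source = "whisper"
    if transcript:
        return source, summarize_text(transcript)
    # Last resort: summary built from title, description and search results
    return "web", summarize_metadata(video_id)

# Stub run: captions are missing, audio transcription succeeds
print(summarize_video(
    "demo-id",
    get_transcript=lambda vid: None,
    transcribe_audio=lambda vid: "audio text",
    summarize_metadata=lambda vid: "meta summary",
    summarize_text=lambda text: "S:" + text,
))  # ('whisper', 'S:audio text')
```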

### Example:
```python
def main():
    """Main driver: extracts the video ID, fetches the transcript,
    attempts audio transcription if needed, and summarizes the content."""
```

In [11]:
def main():
    import time

    print("🎬 YouTube Summarizer with Gemini + Whisper")
    # Interactive input is disabled because Kaggle cannot prompt for input while saving a version
    # url = input("Enter YouTube URL: ").strip()

    # if not url:
    #     print("No URL entered. Exiting.")
    #     return

    url = 'https://www.youtube.com/watch?v=l-kE11fhfaQ'

    video_id = extract_video_id(url)
    if not video_id:
        print("Invalid YouTube URL.")
        return

    print("Processing...")
    time.sleep(1)

    transcript = get_transcript(video_id)

    if transcript:
        print("✅ Transcript fetched successfully!")
    else:
        print("❗Transcript not available. Trying audio transcription...")
        audio_path = download_audio(video_id)
        if audio_path:
            transcript = whisper_transcribe(audio_path)
            if transcript:
                print("✅ Audio transcribed successfully!")

    if not transcript:
        print("⚠️ No transcript available. Fetching metadata and searching the web...")
        # Make sure you've already initialized `youtube = build("youtube", "v3", developerKey=...)`
        title, description = fetch_metadata_youtube_api(video_id, youtube)

        if title and description:
            # print(f"\n📄 Title: {title}\n\n📋 Description: {description}\n")
            web_summary = search_and_summarize(title, description)
            if web_summary:
                print("\n🧠 Web Search Summary:")
                print(web_summary)
            else:
                print("❌ Failed to generate web summary.")
        else:
            print("❌ Failed to retrieve metadata.")
    else:
        print("\n📄 Transcript (preview):")
        print(transcript[:500] + "..." if len(transcript) > 500 else transcript)

        print("\n🧠 Generating summary...")
        summary = summarize_with_any_model(transcript)
        if summary:
            print("\n✅ Summary:")
            display(Markdown(summary))
        else:
            print("❌ Summarization failed.")

In [12]:
# Run the script
if __name__ == "__main__":
    main()
    #https://www.youtube.com/watch?v=l-kE11fhfaQ
    # https://www.youtube.com/watch?v=xH5EY7FCFQw
    #['XvtFppcynYM','mg1ZqahIpVw','xH5EY7FCFQw','an8SrFtJBdM','Qm79wDSCZ-w','vPd7H8EMmD0']

🎬 YouTube Summarizer with Gemini + Whisper
Processing...
✅ Transcript fetched successfully!

📄 Transcript (preview):
Hi guys! I'm sure you have all heard of ChatGPT by now. It has become a buzzword within days of its release and professionals in all fields, especially in high skilled areas like lawyers, doctors, engineers are questioning whether such AI can actually replace them and work. So in this video I want to talk about what Chat GPT is and how it even popped up, talk a bit about the organization behind GPT called "OpenAI", which has already created many other machine learning models besides Chat GPT and...

🧠 Generating summary...
Trying Gemini model: models/gemini-1.5-pro

✅ Summary:


This video discusses the impact of ChatGPT on engineering, particularly DevOps.  The presenter explores what ChatGPT is, its origins in OpenAI, and its technical underpinnings.  The video then dives into a live demo, using ChatGPT for several DevOps tasks:

* **Docker:** Generating Dockerfiles for a Node.js application, optimizing them with .dockerignore files, and implementing multi-stage builds.  The presenter highlights ChatGPT's ability to explain its output and incorporate feedback for iterative improvements.
* **Kubernetes:** Creating deployment and service manifests, adding resource quotas, and implementing production-ready and security best practices.  The presenter notes that while ChatGPT provides good boilerplate code, some knowledge is needed to validate its output.
* **Jenkins:** Generating a Jenkinsfile for a CI/CD pipeline, including building, testing, Docker image creation, and deployment to Kubernetes.  The presenter finds this task more challenging for ChatGPT, requiring significant manual adjustments and highlighting the need for existing CI/CD knowledge.  The video also briefly touches on converting the Jenkinsfile to a GitLab CI configuration, with relatively positive results.

Beyond the demo, the video discusses an open-source CLI tool called aiac, built on top of ChatGPT for generating infrastructure-as-code templates. The presenter demonstrates its usage for creating Dockerfiles and Terraform scripts, emphasizing its concise output and ability to retry generations.

Finally, the video addresses the question of whether ChatGPT will replace engineers. The presenter argues that while AI can automate certain tasks, engineers will become more productive by leveraging these tools.  The focus should shift towards higher-level thinking, problem-solving, and "prompt engineering" to effectively utilize AI. The presenter concludes that engineering jobs are not threatened but rather evolving, emphasizing the importance of continuous learning and adaptation.  He encourages viewers to share their own experiences with AI in the comments.
