# YouTube Video Summarizer with Gemini AI
## 5-day Gen AI Intensive Course Capstone Project

### Introduction

This notebook demonstrates a YouTube video summarization system built on Google's Gemini models, accessed through the Gemini API (`google-generativeai` SDK). The project helps users quickly understand the content of YouTube videos without watching them in full by generating concise, informative summaries.

### Use Case

In today's information-rich world, we often encounter interesting YouTube videos but lack the time to watch them entirely. This project addresses this problem by:
1. Extracting transcripts from YouTube videos
2. Using Gen AI to analyze and summarize the content
3. Providing structured, concise summaries that capture key information

### Gen AI Capabilities Demonstrated
1. **Document Understanding** - Processing and comprehending video transcripts
2. **Structured Output** - Generating summaries in consistent, structured formats
3. **Few-shot Prompting** - Improving summary quality with examples
4. **Long Context Window** - Handling lengthy video transcripts

## Setup
### Install Required Libraries

In [None]:
!pip install google-generativeai youtube-transcript-api pytubefix pandas numpy matplotlib

### Import Libraries

In [None]:
import google.generativeai as genai
from youtube_transcript_api import YouTubeTranscriptApi
from pytubefix import YouTube
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
import json
import os
from IPython.display import HTML, display

### Set up your API key

To run the following cell, your API key must be stored in a [Kaggle secret](https://www.kaggle.com/discussions/product-feedback/114053) named `GOOGLE_API_KEY`.

If you don't already have an API key, you can grab one from [AI Studio](https://aistudio.google.com/app/apikey). You can find [detailed instructions in the docs](https://ai.google.dev/gemini-api/docs/api-key).

To make the key available through Kaggle secrets, choose `Secrets` from the `Add-ons` menu and follow the instructions to add your key or enable it for this notebook.

In [None]:
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

genai.configure(api_key=GOOGLE_API_KEY)
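Outside Kaggle, the `kaggle_secrets` module is not importable, so the cell above fails. A minimal sketch of a fallback, assuming you export the key as an environment variable with the same name as the Kaggle secret (`GOOGLE_API_KEY`); `load_api_key` is a hypothetical helper, not part of any library:

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Hypothetical helper: read the API key from Kaggle secrets, else from an env var."""
    try:
        # Only importable inside a Kaggle notebook
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(name)
    except ImportError:
        # Local Jupyter, Colab, or plain scripts: fall back to an environment variable
        key = os.environ.get(name)
        if not key:
            raise RuntimeError(f"Set the {name} environment variable or add a Kaggle secret")
        return key
```

With this helper, configuration becomes `genai.configure(api_key=load_api_key())` regardless of where the notebook runs.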

## Data Collection: YouTube Video Processing

### Function to Extract Video Information and Transcript

In [None]:
def get_video_info(video_url):
    """Get video metadata from a YouTube URL, falling back to basic info if pytubefix fails."""
    try:
        # Extract video ID from URL
        if "youtube.com" in video_url:
            video_id = re.findall(r'(?:v=|\/)([0-9A-Za-z_-]{11}).*', video_url)[0]
        elif "youtu.be" in video_url:
            video_id = video_url.split('/')[-1][:11]
        else:
            raise ValueError("Invalid YouTube URL format")
            
        print(f"Extracted video ID: {video_id}")
        
        # Try fetching full metadata with pytubefix
        try:
            yt = YouTube(video_url)
            
            video_info = {
                'title': yt.title,
                'channel': yt.author,
                'publish_date': yt.publish_date,
                'views': yt.views,
                'description': yt.description,
                'length': yt.length,
                'thumbnail': yt.thumbnail_url,
                'video_id': video_id
            }
            
            return video_info
            
        except Exception as e:
            print(f"Pytube error: {e}")
            # Fall back to basic info
            return {
                'title': f"Video {video_id}",
                'channel': "Unknown",
                'publish_date': None,
                'views': 0,
                'description': "",
                'length': 0,
                'thumbnail': f"https://img.youtube.com/vi/{video_id}/maxresdefault.jpg",
                'video_id': video_id
            }
            
    except Exception as e:
        print(f"Error fetching video info: {e}")
        return None

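def get_transcript(video_id):
    """Get the full transcript text for a YouTube video ID, or None if unavailable."""
    try:
        try:
            # youtube-transcript-api >= 1.0: instance-based fetch() returning snippet objects
            fetched = YouTubeTranscriptApi().fetch(video_id)
            texts = [snippet.text for snippet in fetched]
        except AttributeError:
            # Older releases expose the static get_transcript(), returning a list of dicts
            transcript_list = YouTubeTranscriptApi.get_transcript(video_id)
            texts = [entry['text'] for entry in transcript_list]
        return " ".join(texts)
    except Exception as e:
        print(f"Error fetching transcript: {e}")
        # Return None so the caller's `if not transcript` check can stop processing
        return None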

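As an alternative to the regex in `get_video_info`, the URL can be parsed structurally with the standard library's `urllib.parse`, which makes the supported URL shapes explicit. A sketch; `extract_video_id` is a hypothetical helper, and the list of path prefixes (`shorts`, `embed`, `live`, `v`) is an assumption about common YouTube URL forms, not an exhaustive list:

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_video_id(video_url):
    """Return the 11-character video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()
    path_parts = [p for p in parsed.path.split("/") if p]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>?si=...
        candidate = path_parts[0] if path_parts else None
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>&t=30s
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(path_parts) >= 2 and path_parts[0] in ("shorts", "embed", "live", "v"):
            # Path-style links: /shorts/<id>, /embed/<id>, /live/<id>
            candidate = path_parts[1]

    # Validate the ID shape so stray values are rejected
    if candidate and re.fullmatch(r"[0-9A-Za-z_-]{11}", candidate):
        return candidate
    return None
```

Inside `get_video_info`, this could replace the `if`/`elif` branches, raising `ValueError` when it returns `None`.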
### Function to Display Video Information

In [None]:
def display_video_info(video_info):
    """Display video information in a nice format."""
    if not video_info:
        return
    
    html_content = f"""
    <div style="display: flex; margin-bottom: 20px;">
        <div style="margin-right: 20px; margin-top: 15px;">
            <img src="{video_info['thumbnail']}" width="320" style="border-radius: 5px;"/>
        </div>
        <div>
            <h2>{video_info['title']}</h2>
            <p><b>Channel:</b> {video_info['channel']}</p>
            <p><b>Published:</b> {video_info['publish_date']}</p>
            <p><b>Views:</b> {video_info['views']:,}</p>
            <p><b>Length:</b> {int(video_info['length'] // 60)} min {video_info['length'] % 60} sec</p>
        </div>
    </div>
    """
    display(HTML(html_content))

## Implementing Gen AI Capabilities

### 1. Document Understanding: Process YouTube Transcript

In [None]:
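def preprocess_transcript(transcript):
    """Clean the transcript and split it into chunks of at most MAX_CHUNK_SIZE characters."""
    # Collapse runs of whitespace (newlines, tabs, double spaces) into single spaces
    cleaned = re.sub(r'\s+', ' ', transcript).strip()
    
    # Split into chunks for processing if necessary
    # (only needed for very long videos)
    MAX_CHUNK_SIZE = 30000
    
    if len(cleaned) <= MAX_CHUNK_SIZE:
        return [cleaned]
    
    # Split by sentences to avoid cutting in the middle of a sentence
    sentences = re.split(r'(?<=[.!?])\s+', cleaned)
    chunks = []
    current_chunk = ""
    
    for sentence in sentences:
        # Auto-generated captions often lack punctuation, so one "sentence" can
        # exceed the limit; hard-split it on the last space before the limit
        while len(sentence) > MAX_CHUNK_SIZE:
            cut = sentence.rfind(' ', 0, MAX_CHUNK_SIZE)
            if cut <= 0:
                cut = MAX_CHUNK_SIZE
            if current_chunk.strip():
                chunks.append(current_chunk.strip())
                current_chunk = ""
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        
        if len(current_chunk) + len(sentence) + 1 <= MAX_CHUNK_SIZE:
            current_chunk += sentence + " "
        else:
            # Flush only non-empty chunks so no empty string is ever summarized
            if current_chunk.strip():
                chunks.append(current_chunk.strip())
            current_chunk = sentence + " "
    
    if current_chunk.strip():
        chunks.append(current_chunk.strip())
    
    return chunks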

### 2. Few-shot Prompting: Creating Examples for Better Summaries

In [None]:
def create_few_shot_prompt(transcript_chunk):
    """Create a prompt with examples for better summarization."""
    
    examples = """
Example 1:
Transcript: In this tutorial, I'll show you how to build a machine learning model using TensorFlow. First, we'll install the necessary libraries. Then, we'll load and preprocess our dataset. After that, we'll create a neural network with several layers. Finally, we'll train the model and evaluate its performance.
Summary: 
{
  "title": "TensorFlow ML Model Tutorial",
  "key_points": [
    "Installation of required libraries",
    "Dataset loading and preprocessing",
    "Neural network creation with multiple layers",
    "Model training and performance evaluation"
  ],
  "main_topics": ["TensorFlow", "Machine Learning", "Neural Networks"],
  "summary": "A step-by-step tutorial on building a machine learning model with TensorFlow, covering installation, data preparation, model architecture, and training."
}

Example 2:
Transcript: Today we're discussing climate change. The Earth's average temperature has increased by about 1 degree Celsius since pre-industrial times. This warming is primarily caused by human activities, especially the burning of fossil fuels which releases greenhouse gases. These gases trap heat in the atmosphere, leading to global warming. The consequences include rising sea levels, more extreme weather events, and threats to biodiversity.
Summary:
{
  "title": "Climate Change Explained",
  "key_points": [
    "Earth's temperature has risen 1°C since pre-industrial era",
    "Human activities, especially fossil fuel burning, are the primary cause",
    "Greenhouse gases trap heat in the atmosphere",
    "Effects include rising seas, extreme weather, and biodiversity loss"
  ],
  "main_topics": ["Climate Change", "Global Warming", "Environmental Science"],
  "summary": "An explanation of climate change, including its causes related to human activities and greenhouse gas emissions, plus major consequences like rising sea levels and extreme weather events."
}
"""
    
    prompt = f"""
{examples}

Now, analyze the following transcript and provide a similar structured summary:

Transcript: {transcript_chunk}

Summary:
"""
    return prompt
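Because the example summaries above are hand-written JSON inside a string, a typo in them would teach the model malformed output. A hedged variant builds the examples from Python dicts and serializes them with `json.dumps`, so the examples are guaranteed to be valid JSON. `build_few_shot_prompt` is hypothetical, and the added instruction "Respond with a single JSON object only" is an assumption, not part of the original prompt:

```python
import json


def build_few_shot_prompt(examples, transcript_chunk):
    """Hypothetical variant: assemble a few-shot prompt from (transcript, summary_dict) pairs.

    Serializing each example summary with json.dumps guarantees that the
    examples the model imitates are themselves valid JSON.
    """
    parts = []
    for i, (example_transcript, example_summary) in enumerate(examples, start=1):
        parts.append(
            f"Example {i}:\n"
            f"Transcript: {example_transcript}\n"
            f"Summary:\n{json.dumps(example_summary, indent=2, ensure_ascii=False)}\n"
        )
    # Assumed extra instruction to discourage prose around the JSON
    parts.append(
        "Now, analyze the following transcript and provide a similar structured summary. "
        "Respond with a single JSON object only.\n\n"
        f"Transcript: {transcript_chunk}\n\nSummary:\n"
    )
    return "\n".join(parts)
```

The two examples in `create_few_shot_prompt` could be moved into a list of `(transcript, dict)` tuples and passed to this function unchanged.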

### 3. Structured Output: JSON Format Summaries

In [None]:
def generate_structured_summary(transcript_chunk, model):
    """Generate a structured summary in JSON format."""
    prompt = create_few_shot_prompt(transcript_chunk)
    
    try:
        response = model.generate_content(prompt)
        
        # Extract JSON from response
        summary_text = response.text
        # Find JSON object in the response
        json_match = re.search(r'\{.*\}', summary_text, re.DOTALL)
        
        if json_match:
            try:
                summary_json = json.loads(json_match.group(0))
                return summary_json
            except json.JSONDecodeError:
                print("Failed to parse JSON from response")
                return {"error": "Failed to parse JSON", "raw_response": summary_text}
        else:
            print("No JSON found in response")
            return {"error": "No JSON found", "raw_response": summary_text}
    
    except Exception as e:
        print(f"Error generating summary: {e}")
        return {"error": str(e)}

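The pattern `r'\{.*\}'` with `re.DOTALL` is greedy: it spans from the first `{` to the last `}` in the response, so any braces in prose after the JSON make `json.loads` fail. A sketch of a more tolerant extractor using the standard library's `json.JSONDecoder.raw_decode`, which parses one complete object and ignores whatever follows; `extract_json` is a hypothetical helper:

```python
import json
import re


def extract_json(text):
    """Return the first JSON object found in a model response, or None.

    Handles replies wrapped in markdown code fences as well as JSON
    surrounded by prose.
    """
    # Prefer the contents of a fenced block (three backticks, optional "json" tag)
    fence = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    if fence:
        text = fence.group(1)

    decoder = json.JSONDecoder()
    # Try each '{' as a start position; raw_decode stops at the end of the
    # first complete value, so trailing prose is ignored
    for match in re.finditer(r"\{", text):
        try:
            obj, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None
```

The three near-identical `re.search(r'\{.*\}', ...)` blocks in this notebook could each become `result = extract_json(response.text)`, with a `None` check replacing the two error branches.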
### 4. Long Context Window: Handling Complete Video Transcripts

In [None]:
def summarize_full_video(transcript, model):
    """Process and summarize complete video transcript, handling long context."""
    chunks = preprocess_transcript(transcript)
    
    # For single chunk videos
    if len(chunks) == 1:
        return generate_structured_summary(chunks[0], model)
    
    # For multi-chunk videos
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}...")
        summary = generate_structured_summary(chunk, model)
        chunk_summaries.append(summary)
    
    # Combine chunk summaries into a comprehensive summary
    combined_prompt = f"""
Combine the following summaries from different parts of a video into one coherent summary:

{json.dumps(chunk_summaries, indent=2)}

Provide a final comprehensive summary in the same JSON format with title, key_points, main_topics, and summary.
"""
    
    try:
        response = model.generate_content(combined_prompt)
        summary_text = response.text
        json_match = re.search(r'\{.*\}', summary_text, re.DOTALL)
        
        if json_match:
            try:
                final_summary = json.loads(json_match.group(0))
                return final_summary
            except json.JSONDecodeError:
                print("Failed to parse JSON from combined response")
                return {"error": "Failed to parse combined JSON", "raw_response": summary_text}
        else:
            print("No JSON found in combined response")
            return {"error": "No JSON found in combined response", "raw_response": summary_text}
            
    except Exception as e:
        print(f"Error generating combined summary: {e}")
        return {"error": str(e)}

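`preprocess_transcript` splits at 30,000 characters. Google's Gemini documentation gives a rule of thumb of roughly 4 characters per token, so each chunk is on the order of 7,500 tokens, far below the long context window of Gemini 1.5 Pro. A sketch that decides up front whether a transcript can be summarized in a single call; `estimate_tokens`, `needs_chunking`, and the default `token_budget` and `prompt_overhead_tokens` values are hypothetical. For an exact count, the SDK's `model.count_tokens(text)` can replace the heuristic:

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate using a ~4 characters-per-token heuristic."""
    return math.ceil(len(text) / chars_per_token)


def needs_chunking(transcript, token_budget=100_000, prompt_overhead_tokens=2_000):
    """Hypothetical check: True if transcript plus prompt likely exceeds the budget.

    token_budget is an assumption; keep it well below the model's documented
    input limit to leave room for the few-shot examples and the response.
    """
    return estimate_tokens(transcript) + prompt_overhead_tokens > token_budget
```

In `summarize_full_video`, a `False` result could send the whole transcript to `generate_structured_summary` in one call, which uses the long context window directly instead of the chunk-and-combine path.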
### Display Function for the Final Summary

In [None]:
def display_summary(summary, video_info):
    """Display the structured summary in a readable format."""
    if "error" in summary:
        print(f"Error in summary: {summary['error']}")
        if "raw_response" in summary:
            print(f"Raw response: {summary['raw_response']}")
        return
        
    html_content = f"""
    <div style="padding: 20px; border: 1px solid #ddd; border-radius: 5px;">
        <h2>{summary.get('title', video_info['title'])}</h2>
        
        <h3>Key Points:</h3>
        <ul>
            {"".join([f"<li>{point}</li>" for point in summary.get('key_points', [])])}
        </ul>
        
        <h3>Main Topics:</h3>
        <div style="margin-bottom: 15px;">
            {"".join([f'<span style="background-color: #007bff; color: white; padding: 5px 10px; margin-right: 10px; border-radius: 15px;">{topic}</span>' for topic in summary.get('main_topics', [])])}
        </div>
        
        <h3>Summary:</h3>
        <p>{summary.get('summary', 'No summary available.')}</p>
        
        <div style="margin-top: 20px; font-size: 0.8em; color: #666;">
            <p>Original video: <a href="https://www.youtube.com/watch?v={video_info['video_id']}" target="_blank">{video_info['title']}</a> by {video_info['channel']}</p>
        </div>
    </div>
    """
    display(HTML(html_content))
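`display_summary` interpolates model output directly into HTML, so a key point such as `std::vector<int>` would have `<int>` parsed as a tag and hidden. A minimal sketch using the standard library's `html.escape`; `render_key_points` and `render_topic_badges` are hypothetical helpers that mirror the list comprehensions above:

```python
import html


def render_key_points(points):
    """Render a list of strings as escaped <li> items for an HTML <ul>."""
    # html.escape converts &, <, > (and quotes by default) into HTML entities
    return "".join(f"<li>{html.escape(str(point))}</li>" for point in points)


def render_topic_badges(topics):
    """Render topics as escaped inline badges, styled like display_summary."""
    style = ("background-color: #007bff; color: white; padding: 5px 10px; "
             "margin-right: 10px; border-radius: 15px;")
    return "".join(f'<span style="{style}">{html.escape(str(t))}</span>' for t in topics)
```

The same escaping applies to `video_info['title']` and the free-text `summary` field.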

## Putting It All Together

### Main Function to Process a YouTube Video

In [None]:
def process_youtube_video(video_url):
    """Process a YouTube video: extract info, transcript, and generate summary."""
    print(f"Processing video: {video_url}")
    
    # Get video information
    video_info = get_video_info(video_url)
    if not video_info:
        print("Failed to get video information. Please check the URL.")
        return None, None, None
    
    # Display video information
    display_video_info(video_info)
    
    # Get transcript
    transcript = get_transcript(video_info['video_id'])
    if not transcript:
        print("Failed to get transcript. The video might not have subtitles or they are disabled.")
        return None, video_info, None
    
    print(f"Transcript length: {len(transcript)} characters")
    
    # Initialize Gemini model for text generation.
    # Gemini 1.5 Pro is used for its long context window; model versions are
    # periodically retired, so substitute a currently available Gemini model if needed.
    model = genai.GenerativeModel('gemini-1.5-pro')
    
    # Generate summary
    print("Generating summary...")
    summary = summarize_full_video(transcript, model)
    
    # Display summary
    display_summary(summary, video_info)
    
    return summary, video_info, transcript

## Example Usage

### Try It with a YouTube Video

In [None]:
# Example usage with a YouTube video URL
video_url = "https://www.youtube.com/watch?v=8jPQjjsBbIc"  # Replace with your chosen video URL
summary_result, video_info, transcript = process_youtube_video(video_url)

### Save Results for Future Reference

In [None]:
# Save results to files
def save_results(summary, video_info, transcript, output_dir="outputs"):
    """Save the summary, video metadata, and transcript to files in output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    
    # Clean up video title for filename (keep letters, digits, spaces, hyphens)
    safe_title = re.sub(r'[^\w\s-]', '', video_info['title']).strip().replace(' ', '_')
    base_path = os.path.join(output_dir, safe_title)
    
    # Save summary (UTF-8 so non-ASCII titles and text survive on every platform)
    with open(f"{base_path}_summary.json", "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2, ensure_ascii=False)
    
    # Save video info
    with open(f"{base_path}_info.json", "w", encoding="utf-8") as f:
        # Convert datetime to string for JSON serialization
        info_copy = video_info.copy()
        if info_copy.get('publish_date'):
            info_copy['publish_date'] = str(info_copy['publish_date'])
        json.dump(info_copy, f, indent=2, ensure_ascii=False)
    
    # Save transcript (skipped if none was retrieved)
    if transcript:
        with open(f"{base_path}_transcript.txt", "w", encoding="utf-8") as f:
            f.write(transcript)
    
    print(f"Results saved to {output_dir}/ directory")

# Uncomment to save results
# save_results(summary_result, video_info, transcript)

## Batch Processing Multiple Videos

In [None]:
def process_multiple_videos(video_urls):
    """Process multiple YouTube videos and compile their summaries."""
    results = []
    
    for i, url in enumerate(video_urls):
        print(f"\n\n--- Processing video {i+1}/{len(video_urls)} ---")
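        try:
            summary, video_info, _ = process_youtube_video(url)
            if video_info is None:
                # process_youtube_video returns (None, None, None) for unresolvable URLs
                raise ValueError("Could not retrieve video information")
            results.append({
                "url": url,
                "title": video_info['title'],
                "channel": video_info['channel'],
                "summary": summary
            })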
        except Exception as e:
            print(f"Error processing {url}: {e}")
            results.append({
                "url": url,
                "error": str(e)
            })
    
    return results

# Example list of videos
example_videos = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",  # Replace with actual videos
    "https://www.youtube.com/watch?v=9bZkp7q19f0"
]

# Uncomment to process multiple videos
# batch_results = process_multiple_videos(example_videos)
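`process_multiple_videos` returns a list of dicts, some holding summaries and some holding errors. A sketch that flattens them into CSV text with the standard library's `csv` module, so a batch can be compared in a spreadsheet or loaded with `pd.read_csv(io.StringIO(csv_text))`; `batch_results_to_csv` and its column names are assumptions:

```python
import csv
import io


def batch_results_to_csv(results):
    """Flatten process_multiple_videos() output into CSV text, one row per video."""
    fieldnames = ["url", "title", "channel", "summary_title",
                  "main_topics", "num_key_points", "error"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for item in results:
        # summary may be missing or None when a video failed early
        summary = item.get("summary") or {}
        writer.writerow({
            "url": item.get("url", ""),
            "title": item.get("title", ""),
            "channel": item.get("channel", ""),
            "summary_title": summary.get("title", ""),
            # Join list fields so each video stays on a single row
            "main_topics": "; ".join(map(str, summary.get("main_topics", []))),
            "num_key_points": len(summary.get("key_points", [])),
            # Either a processing error or a summarization error
            "error": item.get("error", summary.get("error", "")),
        })
    return buffer.getvalue()
```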

## Evaluation and Analysis

### Evaluate Summary Quality

In [None]:
def evaluate_summary(summary, transcript):
    """Evaluate the quality of the generated summary."""
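    # Initialize Gemini model (substitute a currently available model if this one is retired)
    model = genai.GenerativeModel('gemini-1.5-pro')
    
    # Only the first 2,000 characters of the transcript are sent to the evaluator,
    # so the completeness score reflects that excerpt rather than the whole video.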
    evaluation_prompt = f"""
You are an expert in evaluating text summaries. Please evaluate the following summary of a video transcript.

Original Transcript (excerpt): 
{transcript[:2000]}... [truncated]

Generated Summary:
{json.dumps(summary, indent=2)}

Please evaluate this summary on the following criteria:
1. Accuracy - Does the summary correctly represent the main points in the transcript?
2. Completeness - Does the summary include all key information?
3. Conciseness - Is the summary appropriately brief while conveying necessary information?
4. Clarity - Is the summary well-structured and easy to understand?

For each criterion, give a score from 1-10 and provide a brief explanation.
Then give an overall score out of 10, and suggest one or two specific improvements.

Format your response as JSON like this:
{{
  "accuracy": {{
    "score": 0,
    "explanation": ""
  }},
  "completeness": {{
    "score": 0,
    "explanation": ""
  }},
  "conciseness": {{
    "score": 0,
    "explanation": ""
  }},
  "clarity": {{
    "score": 0,
    "explanation": ""
  }},
  "overall": {{
    "score": 0,
    "explanation": ""
  }},
  "improvements": []
}}
"""
    
    try:
        response = model.generate_content(evaluation_prompt)
        evaluation_text = response.text
        
        # Find JSON object in the response
        json_match = re.search(r'\{.*\}', evaluation_text, re.DOTALL)
        
        if json_match:
            try:
                evaluation = json.loads(json_match.group(0))
                return evaluation
            except json.JSONDecodeError:
                print("Failed to parse JSON from evaluation response")
                return {"error": "Failed to parse JSON", "raw_response": evaluation_text}
        else:
            print("No JSON found in evaluation response")
            return {"error": "No JSON found", "raw_response": evaluation_text}
            
    except Exception as e:
        print(f"Error evaluating summary: {e}")
        return {"error": str(e)}

### Display Evaluation Results

In [None]:
def display_evaluation(evaluation):
    """Display the evaluation results in a readable format."""
    if "error" in evaluation:
        print(f"Error in evaluation: {evaluation['error']}")
        if "raw_response" in evaluation:
            print(f"Raw response: {evaluation['raw_response']}")
        return
    
    scores = [
        evaluation.get('accuracy', {}).get('score', 0),
        evaluation.get('completeness', {}).get('score', 0),
        evaluation.get('conciseness', {}).get('score', 0),
        evaluation.get('clarity', {}).get('score', 0)
    ]
    
    categories = ['Accuracy', 'Completeness', 'Conciseness', 'Clarity']
    
    # Create radar chart
    plt.figure(figsize=(8, 8))
    
    # Create radar plot
    angles = np.linspace(0, 2*np.pi, len(categories), endpoint=False).tolist()
    angles += angles[:1]  # Close the loop
    
    scores = scores + scores[:1]  # Close the loop
    
    ax = plt.subplot(111, polar=True)
    ax.set_theta_offset(np.pi / 2)
    ax.set_theta_direction(-1)
    ax.set_rlabel_position(0)
    
    plt.xticks(angles[:-1], categories)
    ax.set_rlim(0, 10)
    plt.yticks([2, 4, 6, 8, 10], ['2', '4', '6', '8', '10'], color='grey', size=7)
    
    ax.plot(angles, scores, linewidth=1, linestyle='solid')
    ax.fill(angles, scores, alpha=0.1)
    
    plt.title('Summary Evaluation Scores', size=14)
    plt.tight_layout()
    
    plt.show()
    
    # Display detailed results
    overall = evaluation.get('overall', {})
    print(f"Overall Score: {overall.get('score', 'N/A')}/10")
    print(f"Overall Assessment: {overall.get('explanation', 'N/A')}")
    
    print("\nDetailed Evaluation:")
    for category in categories:
        cat_lower = category.lower()
        if cat_lower in evaluation:
            print(f"{category}: {evaluation[cat_lower].get('score', 'N/A')}/10")
            print(f"  {evaluation[cat_lower].get('explanation', 'N/A')}")
    
    print("\nSuggested Improvements:")
    for improvement in evaluation.get('improvements', []):
        print(f"- {improvement}")

### Run an Evaluation

In [None]:
# Uncomment to run evaluation
# evaluation_results = evaluate_summary(summary_result, transcript)
# display_evaluation(evaluation_results)

## Conclusion

This notebook demonstrates a practical application of Gen AI capabilities for summarizing YouTube videos. We've implemented:

1. **Document Understanding** - Processing and comprehending video transcripts
2. **Structured Output** - Generating summaries in consistent, structured JSON format
3. **Few-shot Prompting** - Improving summary quality with examples
4. **Long Context Window** - Handling lengthy video transcripts

The system saves users time by extracting the key information from a video, so they can grasp its content without watching it in full.

### Limitations and Future Work

- The quality of summaries depends on the quality of video transcripts
- Videos without transcripts cannot be processed
- The system could be expanded to include:
  - Multi-language support
  - Visual information extraction from video frames
  - Topic clustering across multiple related videos
  - User feedback to improve summary quality over time