<a href="https://colab.research.google.com/github/Vishak05/SDC-GenAI/blob/main/YouTube_Video_Summarizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [6]:
# prompt: youtube video summarizer

!pip install youtube-transcript-api transformers

from youtube_transcript_api import YouTubeTranscriptApi
from transformers import pipeline

def summarize_youtube_video(video_id, model_name="facebook/bart-large-cnn"):
    """
    Summarizes a YouTube video using a pre-trained transformer model.

    Args:
        video_id: The ID of the YouTube video.
        model_name: The name of the pre-trained summarization model.

    Returns:
        A summary of the YouTube video, or None if the transcript could not be retrieved.
    """

    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
    except Exception as e:
        print(f"Error retrieving transcript: {e}")
        return None

    text = " ".join([entry['text'] for entry in transcript])

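    summarizer = pipeline("summarization", model=model_name)
    # The default BART model accepts at most 1,024 input tokens. truncation=True clips
    # longer transcripts, so only the opening part of a long video is summarized.
    summary = summarizer(text, max_length=130, min_length=30, do_sample=False, truncation=True)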

    return summary[0]['summary_text']


# Example usage
video_id = "dQw4w9WgXcQ"  # Replace with the actual video ID
summary = summarize_youtube_video(video_id)

if summary:
    print(summary)




The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Device set to use cpu


"We've known each other for so long your heart's been aching but you're too sh to say inside" "We're no strangers to love you know the rules and so do I I full commitments while I'm thinking of you wouldn't get this from any other guy" "I just want to tell you how I'm feeling got to make you understand Never Going To Give You Up"

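`facebook/bart-large-cnn` accepts at most 1,024 input tokens, so a transcript longer than that cannot be summarized in a single call. One common workaround is to split the transcript into pieces, summarize each piece, and then summarize the joined partial summaries. The sketch below shows that flow with a pluggable `summarize_fn`. The helper names (`chunk_words`, `summarize_long`, `first_five_words`) and the 400-word chunk size are assumptions made for illustration and are not part of the notebook.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize_fn: Callable[[str], str], max_words: int = 400) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    chunks = chunk_words(text, max_words)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize_fn(chunks[0])
    partial = [summarize_fn(chunk) for chunk in chunks]
    return summarize_fn(" ".join(partial))


# Stand-in summarizer (hypothetical) so the sketch runs without downloading a model:
# it simply keeps the first five words of its input.
def first_five_words(chunk: str) -> str:
    return " ".join(chunk.split()[:5])


demo_text = " ".join(f"w{i}" for i in range(1000))
print(len(chunk_words(demo_text)))
print(summarize_long(demo_text, first_five_words))
```

With the pipeline from the cell above, `summarize_fn` could be something like `lambda c: summarizer(c, max_length=130, min_length=30, do_sample=False)[0]["summary_text"]`. For very long videos the joined partial summaries may themselves exceed the model's limit, in which case the reduce step would need to be repeated.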

In [1]:
# prompt: youtube video summarizer using langchain and rag using gemini

!pip install langchain google-generativeai

import os

from youtube_transcript_api import YouTubeTranscriptApi
# GooglePalm wraps the legacy PaLM API, not Gemini. In the recorded run this cell
# fails with the NotImplementedError shown in its output.
from langchain.llms import GooglePalm
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

def summarize_youtube_video_gemini(video_id):
    """
    Summarizes a YouTube video with LangChain's map-reduce summarize chain.

    Args:
        video_id: The ID of the YouTube video.

    Returns:
        A summary of the YouTube video, or None if the transcript could not be retrieved.
    """
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
    except Exception as e:
        print(f"Error retrieving transcript: {e}")
        return None

    text = " ".join([entry['text'] for entry in transcript])
    docs = [Document(page_content=text)]

    # Read the API key from the environment rather than hardcoding it in the notebook.
    llm = GooglePalm(google_api_key=os.environ["GOOGLE_API_KEY"])
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    summary = chain.run(docs)

    return summary

# Example usage
video_id = "dQw4w9WgXcQ"  # Replace with the actual video ID
summary = summarize_youtube_video_gemini(video_id)

if summary:
    print(summary)




NotImplementedError: Need to determine which default deprecation schedule to use. within ?? minor releases
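
API keys are safer kept out of the notebook source, since anyone who can read or fork the notebook can read a literal key. A minimal sketch of one alternative, assuming the key is stored in an environment variable named `GOOGLE_API_KEY`. The helper name `load_api_key` and its `env` parameter are hypothetical and exist only to make the example easy to run.

```python
import os
from typing import Mapping, Optional


def load_api_key(name: str = "GOOGLE_API_KEY", env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key stored under `name`, or raise a clear error if it is missing."""
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return key


# Demo with an explicit mapping so the sketch runs without touching real credentials.
print(load_api_key(env={"GOOGLE_API_KEY": "example-key"}))
```

In Colab, the same value can instead be stored in the Secrets panel and read with `userdata.get("GOOGLE_API_KEY")` from `google.colab`, which keeps it out of the saved notebook as well.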

In [6]:
# ✅ Step 1: Install Dependencies
!pip install langchain langchain-community youtube-transcript-api google-generativeai faiss-cpu langchain-google-genai

# ✅ Step 2: Import Libraries
import os
import re
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS  # vector stores now live in langchain-community
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
import google.generativeai as genai

# ✅ Step 3: Set Your Gemini API Key
# Never hardcode the key in the notebook. Read it from the environment, or prompt for it.
from getpass import getpass
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY") or getpass("Enter your Gemini API key: ")
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
genai.configure(api_key=GOOGLE_API_KEY)

# ✅ Step 4: Extract YouTube Video ID (supports youtube.com and youtu.be)
def extract_video_id(url):
    patterns = [
        # Accept v= anywhere in the query string, not only as the first parameter
        r"(?:https?://)?(?:www\.)?youtube\.com/watch\?(?:[^#]*&)?v=([^&#]+)",
        r"(?:https?://)?(?:www\.)?youtu\.be/([^?&]+)"
    ]
    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None

# ✅ Step 5: Get Transcript from YouTube
def get_transcript(youtube_url):
    try:
        video_id = extract_video_id(youtube_url)
        if not video_id:
            return None
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        full_text = " ".join([entry["text"] for entry in transcript])
        return full_text
    except (TranscriptsDisabled, NoTranscriptFound):
        return None
    except Exception as e:
        return f"❌ Error while fetching transcript: {str(e)}"

# ✅ Step 6: Split transcript into chunks
def split_text_into_docs(text):
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = splitter.create_documents([text])
    return docs

# ✅ Step 7: Build RAG pipeline
def build_rag_chain(docs):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vectorstore = FAISS.from_documents(docs, embeddings)
    retriever = vectorstore.as_retriever()

    llm = ChatGoogleGenerativeAI(model="models/gemini-1.5-pro-latest", temperature=0.2)  # Fixed model name

    rag_chain = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type="stuff",
        return_source_documents=False
    )
    return rag_chain

# ✅ Step 8: Main Summary Function
def summarize_youtube_video(youtube_url):
    transcript = get_transcript(youtube_url)
    if not transcript:
        return "❌ Transcript not available or invalid video URL."
    if transcript.startswith("❌"):
        return transcript

    docs = split_text_into_docs(transcript)
    rag_chain = build_rag_chain(docs)

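    # Note: the retriever passes only the top-matching chunks to the model,
    # so the summary reflects those chunks rather than the full transcript.
    query = "Give a clear and concise summary of this video."
    summary = rag_chain.invoke({"query": query})["result"]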
    return summary

# ✅ Step 9: Get user input and run the summarizer
video_url = input("Enter the YouTube video URL: ")  # User input for video URL
summary = summarize_youtube_video(video_url)

# Print the summary
print("\n🎯 Video Summary:\n")
print(summary)

Enter the YouTube video URL: https://www.youtube.com/watch?v=as7xe8UQEr4

🎯 Video Summary:

Summarizing non-fiction requires using your own words to retell the order of events and the important details of a text.  It’s like making a simple map, only including the essential information needed to understand the overall meaning.  Unnecessary details should be left out.
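
The regular expressions in `extract_video_id` target `youtube.com/watch?v=...` and `youtu.be/...` links. A sketch of a parser-based alternative built on the standard library's `urllib.parse`, assuming we also want to accept `/shorts/`, `/embed/`, `/live/` and `m.youtube.com` links. The function name `parse_video_id` and the set of accepted hosts and path prefixes are assumptions for illustration, not something the notebook defines.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def parse_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None if none is found."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to populate the host
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        return segments[0] if segments else None
    if host in {"youtube.com", "m.youtube.com", "music.youtube.com"}:
        if parsed.path == "/watch":
            # The ID is the v= query parameter, wherever it appears in the query
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        if len(segments) >= 2 and segments[0] in {"shorts", "embed", "live"}:
            return segments[1]
    return None


for example in (
    "https://www.youtube.com/watch?v=as7xe8UQEr4",
    "https://youtu.be/dQw4w9WgXcQ?t=10",
    "youtube.com/watch?feature=share&v=dQw4w9WgXcQ",
    "https://example.com/watch?v=not-youtube",
):
    print(example, "->", parse_video_id(example))
```

Checking the host explicitly means look-alike URLs on other domains return `None` instead of a bogus ID, which lets the caller report an invalid URL before requesting a transcript.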
