<a href="https://colab.research.google.com/github/Nandhaceg7/trail/blob/main/Youtube_Summerizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
# ✅ Install Required Packages
# !pip install youtube-transcript-api langchain langchain-community faiss-cpu sentence-transformers transformers gradio
# ✅ Imports
from youtube_transcript_api import YouTubeTranscriptApi
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from transformers import pipeline
import gradio as gr
import re

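# ✅ Get YouTube Transcript
def extract_video_id(url):
    """Return the 11-character video ID found after 'v=' or a '/', or None."""
    # Note: the pattern does not check the host, so any URL with an
    # 11-character path segment will also match.
    match = re.search(r"(?:v=|/)([0-9A-Za-z_-]{11})", url)
    return match.group(1) if match else None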

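def get_transcript(url):
    """Fetch the transcript for a YouTube URL and join it into one string."""
    video_id = extract_video_id(url)
    if not video_id:
        raise ValueError("Invalid YouTube URL")
    # get_transcript is the static call from older youtube-transcript-api releases;
    # newer 1.x releases are understood to use YouTubeTranscriptApi().fetch(video_id) instead.
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(entry["text"] for entry in transcript)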

# ✅ Summarizer pipeline (HuggingFace, free to use)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# ✅ Main Function
def summarize_youtube_video(url):
    try:
        print("📥 Fetching transcript...")
        transcript = get_transcript(url)

        print("🔗 Chunking and embedding...")
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        docs = splitter.create_documents([transcript])

        embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
        vectordb = FAISS.from_documents(docs, embeddings)

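        print("🔎 Retrieving relevant chunks...")
        # The retriever returns only the top-matching chunks (LangChain's default is k=4),
        # so the summary covers those chunks rather than the whole transcript.
        retriever = vectordb.as_retriever()
        relevant_docs = retriever.get_relevant_documents("Summarize this video in detail")

        context = " ".join(doc.page_content for doc in relevant_docs)

        print("🧠 Summarizing...")
        # truncation=True clips input that exceeds the model's token limit instead of raising an error.
        summary = summarizer(
            context, max_length=300, min_length=80, do_sample=False, truncation=True
        )[0]["summary_text"]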
        return summary
    except Exception as e:
        return f"❌ Error: {e}"

# ✅ Option 1: Terminal/Colab Input
video_url = input("Enter a YouTube video URL: ")
summary = summarize_youtube_video(video_url)
print("\n🎬 Summary:\n")
print(summary)

# ✅ Option 2: Gradio Interface
def gradio_interface(url):
    return summarize_youtube_video(url)

gr.Interface(
    fn=gradio_interface,
    inputs=gr.Textbox(label="YouTube Video URL"),
    outputs=gr.Textbox(label="Video Summary"),
    title="🎬 YouTube Video Summarizer (RAG + LangChain)",
    description="Summarize YouTube videos using HuggingFace + LangChain (No API key needed)",
).launch()




Device set to use cpu


Enter a YouTube video URL: https://youtu.be/FWyiYzgPNi4?si=b2ZfgFbvEdek-ZTV
📥 Fetching transcript...
🔗 Chunking and embedding...
🔎 Retrieving relevant chunks...
🧠 Summarizing...

🎬 Summary:

The first and most crucial habit to become a successful programmer is simple: start developing. Start small - perhaps a simple calculator, to-do app, or even a simple HTML page displaying your name. Don't be afraid to ask for help. When things get tough, it’s easy to get distracted by other interesting things. Every struggle with a concept, you're actually growing your coding muscles.

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://4e9c51c072b2d81746.gradio.live

This share link expires in 1 week. Fo
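
The `extract_video_id` helper in the cell above accepts any URL that contains an 11-character path segment, whether or not it points at YouTube. A stricter variant could check the host before reading the ID. The sketch below is one possible approach, not part of the original notebook: the name `extract_video_id_strict`, the list of accepted hosts, and the accepted path prefixes (`embed`, `shorts`, `live`, `v`) are all assumptions, and it expects the URL to include a scheme such as `https://`.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed ID format: exactly 11 characters from letters, digits, "_" and "-".
_ID_PATTERN = re.compile(r"[0-9A-Za-z_-]{11}")


def extract_video_id_strict(url):
    """Return the video ID from a YouTube URL, or None if none is found.

    Hypothetical helper: the hosts and path prefixes below are assumptions.
    """
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # Normalize common subdomain prefixes.
    if host.startswith("www."):
        host = host[4:]
    elif host.startswith("m."):
        host = host[2:]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "music.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Path-based links such as /embed/<id> or /shorts/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live", "v"):
                candidate = parts[1]

    if candidate and _ID_PATTERN.fullmatch(candidate):
        return candidate
    return None


if __name__ == "__main__":
    print(extract_video_id_strict("https://youtu.be/FWyiYzgPNi4?si=b2ZfgFbvEdek-ZTV"))
    print(extract_video_id_strict("https://example.com/abcdefghijk"))
```

Swapping this in for `extract_video_id` would make `get_transcript` raise its "Invalid YouTube URL" error for non-YouTube links instead of sending an unrelated ID to the transcript API.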

