# 🎬 YouTube Knowledge Base
### Turn hours of video into instant answers

---

**The Problem:** You follow multiple YouTube channels on AI, programming, or any topic. Each posts 2-3 videos per week. That's 50+ hours of content per month. You can't watch it all, but you need specific answers buried in those videos.

**The Solution:** This tool extracts transcripts from YouTube videos, chunks them semantically, and lets you search with natural language. Ask a question → get an answer with a timestamped link to the exact moment in the video.
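To make the idea of timestamped chunks concrete, here is a minimal sketch. It is not the tool's implementation: it groups transcript segments by a simple character budget rather than semantically, and `Chunk`, `chunk_segments`, and `timestamp_link` are hypothetical names used only for illustration. The `t=<seconds>s` URL parameter is the standard way to link to a moment in a YouTube video.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # start time of the chunk, in seconds
    text: str


def chunk_segments(segments, max_chars=200):
    """Group (start_seconds, text) segments into chunks by a character budget.

    This is a stand-in for semantic chunking: it only illustrates how each
    chunk keeps the start time of its first segment.
    """
    chunks = []
    current_start, parts, size = None, [], 0
    for start, text in segments:
        # Flush the current chunk if adding this segment would exceed the budget.
        if parts and size + len(text) > max_chars:
            chunks.append(Chunk(current_start, " ".join(parts)))
            parts, size = [], 0
        if not parts:
            current_start = start
        parts.append(text)
        size += len(text)
    if parts:
        chunks.append(Chunk(current_start, " ".join(parts)))
    return chunks


def timestamp_link(video_id, start_seconds):
    """Build a YouTube link that starts playback at the given second."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_seconds)}s"
```

Because each chunk remembers where it started, any search hit can be turned into a link that opens the video at that moment.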

---

### What you'll see in this demo:
1. **Ask questions** → Get synthesized answers from your video library
2. **Jump to source** → Timestamped links to exact video moments
3. **Organize content** → Tags, collections, and personal notes

Let's start! 👇

## 1. Setup

Just two cells to get started.

In [None]:
# Environment & imports (run once)
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

from youtube_knowledgebase_mcp import (
    process_youtube_video, search_knowledge_base, get_status,
    get_source, list_sources, add_tags, list_tags,
    add_to_collection, list_collections, set_summary, get_summary,
)

client = OpenAI()
print("✅ Ready!")

In [None]:
# Check what's in the knowledge base
status = get_status()
print(f"📚 Knowledge Base: {status.total_sources} videos, {status.total_chunks} searchable chunks")

if status.total_sources == 0:
    print("\n⚠️  Empty! Let's add a video in the next section.")
else:
    print("\n📼 Videos loaded:")
    for src in list_sources(limit=5):
        print(f"   • {src.title[:60]}{'...' if len(src.title) > 60 else ''}")

## 2. Add a Video (Optional)

Skip this section if you already have videos loaded. Processing a video typically takes 30-60 seconds.
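If you are unsure whether a video is already loaded, one option is to compare video IDs before processing. The sketch below is an assumption-laden helper, not part of `youtube_knowledgebase_mcp`: `extract_video_id` and `is_already_loaded` are hypothetical names, and only the common `watch?v=` and `youtu.be/` URL forms are handled.

```python
from urllib.parse import urlparse, parse_qs

# Hostnames treated as YouTube watch pages in this sketch (not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if it is not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS:
        # Watch links carry the ID in the query string: ?v=<id>
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def is_already_loaded(url, existing_urls):
    """True if any existing URL points at the same video ID as `url`."""
    video_id = extract_video_id(url)
    if video_id is None:
        return False
    return any(extract_video_id(u) == video_id for u in existing_urls)
```

In this notebook, `existing_urls` could be built as `[s.url for s in list_sources()]`, since each source exposes a `url` attribute.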

In [None]:
# Add a video to the knowledge base
# This video covers 6 context engineering techniques for LLM agents
VIDEO_URL = "https://www.youtube.com/watch?v=nyKvyRrpbyY"

print(f"Processing: {VIDEO_URL}")
print("(This may take 30-60 seconds...)\n")

result = await process_youtube_video(VIDEO_URL)

if result.success:
    print(f"✅ Added: {result.title}")
    print(f"   Created {result.chunk_count} searchable chunks")
else:
    print(f"❌ Error: {result.error}")

---

## 3. 🔍 Ask Questions (The Main Event)

This is where the magic happens. Ask natural language questions and get:
- **Synthesized answers** from video transcripts
- **Timestamped links** to jump directly to the relevant part
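
For intuition about the retrieval step, here is a toy sketch that ranks transcript chunks against a question using bag-of-words cosine similarity. This is only a stand-in: the notebook does not show how `search_knowledge_base` ranks chunks, and real systems typically use learned embeddings. `cosine` and `top_chunks` are hypothetical helpers.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question, chunks, limit=2):
    """Return up to `limit` chunks that share words with the question, best first."""
    query = Counter(question.lower().split())
    scored = [(cosine(query, Counter(c.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks with no overlap at all.
    return [chunk for score, chunk in scored[:limit] if score > 0]
```

The chunks returned by a step like this are what get passed to the LLM as context, which is why the answer can cite the exact moments it drew from.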

**Try it:** Change the question below and re-run the cell!

In [None]:
async def ask(question: str, num_sources: int = 5) -> None:
    """
    Ask a question and get an answer from your video knowledge base.
    
    This is the complete RAG pipeline:
    1. Search for relevant video chunks
    2. Use LLM to synthesize an answer
    3. Show timestamped source links
    """
    # Retrieve relevant chunks
    results = await search_knowledge_base(question, limit=num_sources)
    
    if results.total_results == 0:
        print("No relevant content found. Try adding more videos!")
        return
    
    # Build context from chunks
    context = "\n\n".join(
        f"[{i+1}] {r.chunk.content}" 
        for i, r in enumerate(results.results)
    )
    
    # Generate answer with LLM
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer based ONLY on the provided video transcript excerpts. Be concise but comprehensive. If the context doesn't have enough info, say so."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
        ],
        temperature=0.3,
        max_tokens=400
    )
    
    # Display answer
    print(f"💡 {response.choices[0].message.content}")
    
    # Show sources with timestamps
    print("\n📍 Sources:")
    seen = set()
    for r in results.results:
        if r.timestamp_link and r.timestamp_link not in seen:
            seen.add(r.timestamp_link)
            print(f"   {r.timestamp_link}")

print("✅ ask() function ready")

In [None]:
# 🎯 TRY IT: Change this question and re-run!

await ask("What is context quarantine and why is it useful for AI agents?")

In [None]:
# More example questions to try:

await ask("How can I implement RAG for my agent?")

In [None]:
await ask("What is context offloading and how does it help with memory?")

---

## 4. 📊 Explore Your Library

See what's in your knowledge base and search by topic.

In [None]:
# List all videos with their metadata
print("📼 Your Video Library:\n")

for source in list_sources():
    tags = f" [{', '.join(source.tags)}]" if source.tags else ""
    print(f"• {source.title}")
    print(f"  Channel: {source.channel} | Chunks: {source.chunk_count}{tags}")
    print(f"  {source.url}\n")

In [None]:
# Quick search (retrieval only, no LLM)
query = "system prompt"

results = await search_knowledge_base(query, limit=3)
print(f"🔍 Search: '{query}'\n")

for i, r in enumerate(results.results, 1):
    print(f"[{i}] {r.timestamp_link}")
    print(f"    {r.chunk.content[:150]}...\n")

---

## 5. 🏷️ Organize with Tags & Collections

As your library grows, organize videos for easy filtering.
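
Filtering only works well if tags are spelled consistently. As a small sketch, you could normalize tags before passing them to `add_tags`. The helpers `normalize_tag` and `normalize_tags` below are hypothetical and not part of the library; whether the library normalizes tags itself is not something this notebook shows.

```python
import re


def normalize_tag(tag):
    """Lowercase a tag and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", tag.strip().lower()).strip("-")


def normalize_tags(tags):
    """Normalize tags, dropping empties and duplicates while keeping order."""
    result = []
    for tag in map(normalize_tag, tags):
        if tag and tag not in result:
            result.append(tag)
    return result
```

With this, "Context Engineering" and "context-engineering" collapse into a single tag instead of splitting your library across two.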

In [None]:
# Get a video to organize
sources = list_sources(limit=1)
if sources:
    video = sources[0]
    video_id = video.source_id
    
    # Add tags
    add_tags(video_id, ["agents", "context-engineering", "langchain"])
    
    # Add to collections
    add_to_collection(video_id, "AI Engineering")
    add_to_collection(video_id, "Must Review")
    
    print(f"✅ Organized: {video.title[:50]}...")
    print(f"   Tags: {list_tags()}")
    print(f"   Collections: {list_collections()}")
else:
    print("No videos to organize. Add one first!")

In [None]:
# Filter by tag or collection
print("Videos tagged 'agents':")
for v in list_sources(tags=["agents"]):
    print(f"  • {v.title}")

print("\nVideos in 'Must Review' collection:")
for v in list_sources(collections=["Must Review"]):
    print(f"  • {v.title}")

---

## 6. 📝 Add Your Notes

Add personal summaries or key takeaways to any video.

In [None]:
# Add a summary to the video
sources = list_sources(limit=1)
if sources:
    video_id = sources[0].source_id
    
    my_notes = """
Key Takeaways:
• Context engineering = filling the context window with the RIGHT info at each step
• 6 techniques: system prompt, few-shot, RAG, tool feedback, offloading, quarantine
• Context quarantine: use sub-agents to isolate different topics
• Don't underestimate the power of a good system prompt!
"""
    
    set_summary(video_id, my_notes.strip())
    print("✅ Summary saved!\n")
    print(get_summary(video_id))

---

## 🚀 Next Steps

### Use with Claude Desktop
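This is an MCP (Model Context Protocol) server. Connect it to Claude Desktop for a conversational interface to your video library by adding the following entry to `claude_desktop_config.json` (JSON does not allow comments, so the file name is given here rather than inside the block):

```json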
{
  "mcpServers": {
    "youtube-kb": {
      "command": "uv",
      "args": ["--directory", "/path/to/youtube-knowledge-base-mcp", "run", "youtube-kb"]
    }
  }
}
```

### Build Your Library
```python
# Add more videos
videos = [
    "https://www.youtube.com/watch?v=...",
    "https://www.youtube.com/watch?v=...",
]
for url in videos:
    await process_youtube_video(url)
```
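
For larger batches, it may help to limit concurrency and keep going when one video fails. The sketch below assumes the processing function is an async callable that returns an object with `success`, `title`, and `error` attributes, as `process_youtube_video` does earlier in this notebook. `add_videos` and `max_concurrent` are hypothetical names, and whether the library tolerates concurrent processing is an assumption you should verify.

```python
import asyncio
from types import SimpleNamespace


async def add_videos(urls, process, max_concurrent=2):
    """Process URLs with bounded concurrency, returning (url, result) pairs in input order.

    `process` is any async callable taking a URL. Exceptions are converted
    into a failed result so one bad video does not stop the batch.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def process_one(url):
        async with semaphore:
            try:
                return url, await process(url)
            except Exception as exc:
                failed = SimpleNamespace(success=False, title=None, error=str(exc))
                return url, failed

    # gather() preserves the order of its inputs.
    return await asyncio.gather(*(process_one(url) for url in urls))
```

In the notebook this could be called as `results = await add_videos(videos, process_youtube_video)`, then filtered for entries where `result.success` is false to see what needs a retry.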

### Ideas
- Process all videos from your favorite AI channel
- Create collections by skill level (Beginner, Advanced)
- Tag videos by framework (LangChain, LlamaIndex, etc.)
- Use for research: find relevant talks before writing a blog post