# YouTube Transcript Knowledge Base - Demo

This notebook demonstrates the key functionality of the YouTube Transcript Knowledge Base project. It allows you to process YouTube videos, build a searchable knowledge base from their transcripts, organize videos into lists, and query the knowledge base for specific information.

## Setup

First, ensure you have the required dependencies installed and your OpenAI API key configured.

In [None]:
import os
import json
import dotenv

# Load environment variables from .env file (containing your OPENAI_API_KEY)
dotenv.load_dotenv()

# Verify OpenAI API key is available
if 'OPENAI_API_KEY' not in os.environ:
    print("⚠️ OPENAI_API_KEY not found in environment variables.")
    print("Please create a .env file with your OpenAI API key or set it manually below:")
    # Uncomment and replace with your key if needed
    # os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
else:
    print("✅ OpenAI API key found in environment variables")
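
Before running the rest of the notebook, it can help to confirm that the packages imported later are actually installed. The following is a minimal sketch using only the standard library. The module names `dotenv` and `langchain_openai` come from the imports in this notebook; `faiss` is an assumption based on the FAISS index path used below. `missing_modules` is a hypothetical helper, not part of the project.

```python
import importlib.util

def missing_modules(module_names):
    """Return the names from module_names that cannot be found on this system."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

# Module names are assumptions; adjust them to match the project's requirements
required = ["dotenv", "langchain_openai", "faiss"]
not_found = missing_modules(required)
if not_found:
    print(f"Missing modules: {', '.join(not_found)}")
else:
    print("All required modules are importable")
```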

## Initialize Data Paths

Set up the paths used for data storage. They are collected into a dictionary that is passed to the MCP tools when they are initialized.

In [None]:
# Define paths similar to main.py
# Note: BASE_PATH resolves to the parent of the current working directory
BASE_PATH = os.path.dirname(os.path.abspath("."))
DATA_FOLDER = os.path.join(BASE_PATH, "data")
DATA_PATH = os.path.join(DATA_FOLDER, "processed_data")
FAISS_INDEX_PATH = os.path.join(DATA_FOLDER, "youtube_faiss_index")
VIDEO_LISTS_PATH = os.path.join(DATA_FOLDER, "video_lists.json")
VIDEO_SUMMARIES_PATH = os.path.join(DATA_FOLDER, "video_summaries.json")
ALL_VIDEOS_METADATA_PATH = os.path.join(DATA_FOLDER, "all_videos_metadata.json")

# Create directories if they don't exist
os.makedirs(DATA_PATH, exist_ok=True)
os.makedirs(os.path.dirname(FAISS_INDEX_PATH), exist_ok=True)

# Create a dictionary of paths to pass to init_mcp_tools
data_paths = {
    'DATA_PATH': DATA_PATH,
    'FAISS_INDEX_PATH': FAISS_INDEX_PATH,
    'VIDEO_LISTS_PATH': VIDEO_LISTS_PATH,
    'VIDEO_SUMMARIES_PATH': VIDEO_SUMMARIES_PATH,
    'ALL_VIDEOS_METADATA_PATH': ALL_VIDEOS_METADATA_PATH
}

print(f"Data will be stored in: {DATA_FOLDER}")

In [None]:
print(DATA_FOLDER)
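
To see at a glance which of the configured paths already exist on disk, a small report can be useful. This is a sketch under stated assumptions: `describe_paths` is a hypothetical helper, not part of the project, and the demo runs against a temporary directory so that it works on its own.

```python
import tempfile
from pathlib import Path

def describe_paths(paths):
    """Map each label to 'directory', 'file', or 'missing' for the given path."""
    report = {}
    for label, raw_path in paths.items():
        path = Path(raw_path)
        if path.is_dir():
            report[label] = "directory"
        elif path.is_file():
            report[label] = "file"
        else:
            report[label] = "missing"
    return report

# Demo against a temporary directory instead of the real data folder
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = {
        "DATA_PATH": tmp,
        "VIDEO_LISTS_PATH": str(Path(tmp) / "video_lists.json"),
    }
    print(describe_paths(demo_paths))
```

In the notebook itself, calling `describe_paths(data_paths)` would report on the real paths defined above.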

## Import the MCP Tools

Here we import the main functionality from the YouTube Transcript Knowledge Base project.

In [None]:
try:
    # Try importing as a package
    from youtube_knowledgebase_mcp.mcp_tools import init_mcp_tools
    from youtube_knowledgebase_mcp.data_management import initialize_data_files
except ImportError:
    # If package import fails, try importing from local files
    print("Importing from local files instead of packages")

    # This requires the project files to be in the same directory as this notebook
    import importlib.util
    import sys

    def load_local_module(name, path):
        """Load a module from a file path and register it in sys.modules."""
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        spec.loader.exec_module(module)
        return module

    # Load necessary modules from files
    data_management = load_local_module("data_management", "./data_management.py")
    mcp_tools = load_local_module("mcp_tools", "./mcp_tools.py")

    # Get required functions
    init_mcp_tools = mcp_tools.init_mcp_tools
    initialize_data_files = data_management.initialize_data_files

# Initialize data files
initialize_data_files(data_paths)

# Initialize MCP tools
init_mcp_tools(data_paths)

print("✅ MCP tools initialized successfully")

## Import Tools for Direct Use

Now we'll import the specific tools we need for our demo.

In [None]:
# Import all the tools we'll use in this demo
try:
    from youtube_knowledgebase_mcp.mcp_tools import (
        process_youtube_video,
        youtube_transcript_query_tool,
        check_knowledge_base_status,
        create_video_list,
        add_video_to_list,
        get_video_lists,
        add_video_summary,
        get_video_summary,
        get_all_videos_info,
        get_video_info,
        filter_videos
    )
except ImportError:
    # If package import fails, get functions from the module loaded above
    process_youtube_video = mcp_tools.process_youtube_video
    youtube_transcript_query_tool = mcp_tools.youtube_transcript_query_tool
    check_knowledge_base_status = mcp_tools.check_knowledge_base_status
    create_video_list = mcp_tools.create_video_list
    add_video_to_list = mcp_tools.add_video_to_list
    get_video_lists = mcp_tools.get_video_lists
    add_video_summary = mcp_tools.add_video_summary
    get_video_summary = mcp_tools.get_video_summary
    get_all_videos_info = mcp_tools.get_all_videos_info
    get_video_info = mcp_tools.get_video_info
    filter_videos = mcp_tools.filter_videos

print("✅ All tools imported successfully")

## 1. Check Knowledge Base Status

First, let's check the current status of our knowledge base.

In [None]:
status = check_knowledge_base_status()
print(status)

## 2. Process a YouTube Video

Now, let's process a YouTube video and add it to our knowledge base. Replace the URL with any YouTube video you'd like to process.

In [None]:
# Choose an educational YouTube video to process (replace with any video URL)
video_url = "https://www.youtube.com/watch?v=CDjjaTALI68"  # Example: Understanding MCP From Scratch

print(f"Processing video: {video_url}\n")
result = process_youtube_video(video_url)
print(result)

## 3. Query the Knowledge Base

Now that we have a video in our knowledge base, let's query it to find specific information.

In [None]:
from langchain_openai import ChatOpenAI

# For the "Understanding MCP From Scratch" video processed above, a good query might be:
query = "Based on the video, what is MCP?"

# You can modify this query for your specific video
print(f"Querying: '{query}'\n")
results = youtube_transcript_query_tool(query)
print(results)  # raw results from FAISS retrieval

# Pass the retrieved segments to an LLM via langchain_openai to get a more structured answer

# Initialize the LLM
llm = ChatOpenAI(model="gpt-3.5-turbo")

# Format a prompt with the results to get a concise answer
prompt = f"""
Based on the transcript segments from the video, please provide a clear explanation of what MCP is.
Here are the relevant transcript segments:
{results}

Please summarize what MCP is according to this video in a concise paragraph.
"""

# Get a structured answer
structured_answer = llm.invoke(prompt)
print("\n=== Structured Answer ===")
print(structured_answer.content)
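
To make the retrieval step less opaque, here is a toy sketch of similarity search over embedding vectors, the general idea behind a vector index such as FAISS. It is not the project's implementation. The texts and three-dimensional vectors are made up rather than real transcript embeddings, cosine similarity is only one common metric (a FAISS index may use another, such as L2 distance), and `cosine_similarity` and `top_k` are hypothetical helpers written for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Made-up texts and vectors standing in for real transcript-chunk embeddings
chunks = [
    ("MCP is a protocol for connecting models to tools", [0.9, 0.1, 0.0]),
    ("Remember to like and subscribe", [0.0, 0.2, 0.9]),
    ("Servers expose tools that clients can call", [0.7, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of the query
print(top_k(query_vec, chunks, k=2))
```

The two chunks whose vectors point in roughly the same direction as the query are returned first, which is why the unrelated chunk is left out.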

## 4. Get Video Information

Let's examine the metadata for the video we just processed.

In [None]:
# We need to extract the video ID from the result
# This is a simple way to do it from the previous processing result
import re

# Extract video ID from the result or use a known ID
video_id_match = re.search(r'ID: ([\w-]+)', result)
if video_id_match:
    video_id = video_id_match.group(1)
    print(f"Found video ID: {video_id}\n")
else:
    # Fallback in case the regex didn't match: use the ID from video_url above
    video_id = "CDjjaTALI68"  # Replace with the actual video ID if you processed a different video
    print(f"Using default video ID: {video_id}\n")

# Get detailed information about the video
video_info = get_video_info(video_id)
print(video_info)
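
The regular expression above depends on the exact wording of the string returned by `process_youtube_video`. As an alternative that does not rely on that format, the ID can also be parsed from the URL itself. This is a minimal sketch that assumes only the two most common URL forms; `extract_video_id` is a hypothetical helper, not part of the project's tools.

```python
from urllib.parse import urlparse, parse_qs

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}

def extract_video_id(url):
    """Return the video ID from a common YouTube URL form, or None if not found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    # Standard watch links: https://www.youtube.com/watch?v=<id>
    if host in YOUTUBE_HOSTS:
        values = parse_qs(parsed.query).get("v")
        if values:
            return values[0]
    return None

print(extract_video_id("https://www.youtube.com/watch?v=CDjjaTALI68"))  # prints CDjjaTALI68
```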

## 5. Create a Video List and Add the Video

Let's organize our videos by creating a themed list.

In [None]:
# Create a new list for educational videos
list_name = "educational-videos"
list_description = "Videos about learning, education, and study techniques"

create_result = create_video_list(list_name, list_description)
print(create_result)

# Add our video to the list
add_result = add_video_to_list(video_id, list_name)
print(add_result)

# View all lists
lists = get_video_lists()
print("\nCurrent video lists:")
print(lists)
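
For intuition about what organizing videos into lists involves, here is a sketch of one possible in-memory structure. The layout of the project's `video_lists.json` is not shown in this notebook, so the schema below (a description plus a list of video IDs per list name) is purely an assumption, and `create_list` and `add_video` are hypothetical stand-ins rather than the project's `create_video_list` and `add_video_to_list`.

```python
import json

def create_list(lists, name, description):
    """Add an empty list entry; return False if the name is already taken."""
    if name in lists:
        return False
    lists[name] = {"description": description, "videos": []}
    return True

def add_video(lists, video_id, name):
    """Append video_id to the named list, skipping duplicates."""
    if name not in lists:
        raise KeyError(f"Unknown list: {name}")
    if video_id not in lists[name]["videos"]:
        lists[name]["videos"].append(video_id)

# Hypothetical structure; the real video_lists.json may differ
lists = {}
create_list(lists, "educational-videos", "Videos about learning, education, and study techniques")
add_video(lists, "CDjjaTALI68", "educational-videos")
print(json.dumps(lists, indent=2))
```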

## 6. Add a Custom Summary

Let's add our own summary to the video to enhance searchability.

In [None]:
# Create a summary for the video
summary = """
This video is a comprehensive introduction to MCP (Model Context Protocol) presented by Lan from LangChain. 
The 12-minute tutorial takes a hands-on approach to explaining what MCP is and how to implement it from scratch.
"""

# Add the summary to the video
summary_result = add_video_summary(video_id, summary)
print(summary_result)

# Retrieve the summary to verify
get_summary_result = get_video_summary(video_id)
print("\nRetrieved summary:")
print(get_summary_result)

## 7. Process Another Video (Optional)

To build a more useful knowledge base, let's add another video.

In [None]:
# Uncomment and run this cell to process another video

# video_url2 = "https://www.youtube.com/watch?v=D7_ipDqhtwk"  # Example: How We Build Effective Agents: Barry Zhang, Anthropic
# print(f"Processing second video: {video_url2}\n")
# result2 = process_youtube_video(video_url2)
# print(result2)

# # Extract video ID for the second video
# video_id2_match = re.search(r'ID: ([\w-]+)', result2)
# if video_id2_match:
#     video_id2 = video_id2_match.group(1)
#     print(f"\nAdding video ID: {video_id2} to educational-videos list")
#     add_video_to_list(video_id2, list_name)
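
When adding several videos, a small loop that keeps going after a failure can save time. The sketch below takes the processing function as a parameter so that it runs on its own; in the notebook you would pass `process_youtube_video`. Whether that tool raises exceptions or returns an error string on failure is not shown here, so the exception handling is an assumption, and `process_many` and `fake_process` are hypothetical names.

```python
def process_many(urls, process_fn):
    """Call process_fn on each URL, collecting results and errors separately."""
    results, errors = {}, {}
    for url in urls:
        try:
            results[url] = process_fn(url)
        except Exception as exc:  # keep going if one video fails
            errors[url] = str(exc)
    return results, errors

# Stand-in for process_youtube_video so the sketch is self-contained
def fake_process(url):
    if "bad" in url:
        raise ValueError("transcript unavailable")
    return f"processed {url}"

results, errors = process_many(
    ["https://www.youtube.com/watch?v=CDjjaTALI68", "https://example.com/bad"],
    fake_process,
)
print(results)
print(errors)
```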


## 8. Get All Videos Information

Finally, let's get comprehensive information about all videos in our knowledge base.

In [None]:
all_videos = get_all_videos_info()
print("All videos in knowledge base:")
print(all_videos)

## Conclusion

This notebook has demonstrated the main functionality of the YouTube Transcript Knowledge Base:

1. Checking the status of the knowledge base
2. Processing YouTube videos to extract transcripts
3. Querying the knowledge base for specific information
4. Getting detailed information about individual videos
5. Organizing videos into lists
6. Adding custom summaries
7. Listing all videos in the knowledge base


You can continue building your knowledge base by:
- Processing more videos on topics you're interested in
- Creating more specific lists to organize your videos
- Adding detailed summaries to improve searchability
- Running increasingly specific queries to find exactly the information you need

This system helps you retain and retrieve valuable information from videos without having to rewatch them completely.