AymanAzim/YoutubeChannelRAG

YouTube Channel RAG

A Retrieval-Augmented Generation app that lets you ask questions about any YouTube channel's content. It fetches video transcripts, indexes them using vector embeddings, and answers your questions using only the channel's actual content — with source links.

Built with Streamlit, OpenAI, ChromaDB, and yt-dlp.

Screenshots

[Screenshots: Home, Indexing, Indexed, Query Result]

How It Works

  1. Fetch — Retrieves all video IDs from a YouTube channel using yt-dlp
  2. Transcribe — Downloads transcripts for each video via youtube-transcript-api
  3. Chunk & Embed — Splits transcripts into overlapping chunks and generates embeddings with OpenAI's text-embedding-3-small
  4. Store — Stores embeddings in a ChromaDB vector database
  5. Query — Finds the most relevant chunks for your question and generates an answer with GPT-4o, citing the source videos
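The "Chunk & Embed" step above can be illustrated with a minimal sketch of overlapping chunking. It is not taken from this repository: the function name `chunk_text`, the word-based splitting, and the default sizes are all assumptions chosen for illustration, and the app may well split by characters or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that share `overlap` words.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk is a pure subset of the previous one.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    transcript = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(transcript, chunk_size=4, overlap=2):
        print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval at the cost of storing some text twice.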

Setup

Prerequisites

Installation

git clone https://github.com/AymanAzim/YoutubeChannelRAG.git
cd YoutubeChannelRAG
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Environment Variables

Create a .env file or export your key:

export OPENAI_API_KEY="sk-your-key-here"
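To check that the key is visible to Python before launching the app, a small standard-library sketch like the following may help. It is an assumption about how the key could be resolved, not the app's actual loading code: the helper name `load_openai_key` and the simple `.env` parsing are hypothetical, and the project may rely on a dedicated dotenv library instead.

```python
import os
from pathlib import Path


def load_openai_key(env_path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the environment, else from a .env file.

    Hypothetical helper for illustration; only handles simple KEY=value lines.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key

    path = Path(env_path)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            if line.startswith("export "):
                line = line[len("export "):]  # tolerate shell-style lines
            name, sep, value = line.partition("=")
            if sep and name.strip() == "OPENAI_API_KEY":
                return value.strip().strip('"').strip("'")

    raise RuntimeError("OPENAI_API_KEY is not set in the environment or in .env")


if __name__ == "__main__":
    try:
        load_openai_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as err:
        print(err)
```

An exported environment variable takes precedence over the file in this sketch, so a key set in the shell is not silently overridden.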

Run

streamlit run app.py

Usage

  1. Paste a YouTube channel URL (e.g. https://www.youtube.com/@ChannelName/videos)
  2. Click Index Channel and wait for it to finish
  3. Ask any question about the channel's content
  4. Get an answer with links to the source videos
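The way an answer ends up with source links can be pictured with a toy retrieval sketch. The app delegates similarity search to ChromaDB; the pure-Python version below only illustrates the idea. The function names, the `(embedding, video_id)` pair layout, and the use of cosine similarity are assumptions made for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_sources(query_vec: list[float],
                indexed: list[tuple[list[float], str]],
                k: int = 3) -> list[str]:
    """Return de-duplicated video URLs for the k chunks closest to the query.

    `indexed` is a hypothetical list of (chunk_embedding, video_id) pairs.
    """
    ranked = sorted(indexed,
                    key=lambda item: cosine_similarity(query_vec, item[0]),
                    reverse=True)
    urls: list[str] = []
    for _, video_id in ranked[:k]:
        url = f"https://www.youtube.com/watch?v={video_id}"
        if url not in urls:  # several chunks may come from the same video
            urls.append(url)
    return urls


if __name__ == "__main__":
    index = [([1.0, 0.0], "aaa"), ([0.0, 1.0], "bbb"), ([0.9, 0.1], "aaa")]
    print(top_sources([1.0, 0.0], index, k=3))
```

Because each stored chunk carries its video ID, the links shown with an answer can be derived from the retrieved chunks rather than generated by the model.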

Tech Stack

Component       Tool
Frontend        Streamlit
LLM             OpenAI GPT-4o
Embeddings      OpenAI text-embedding-3-small
Vector DB       ChromaDB
Transcripts     youtube-transcript-api
Video Fetching  yt-dlp

License

MIT
