A Retrieval-Augmented Generation app that lets you ask questions about any YouTube channel's content. It fetches video transcripts, indexes them using vector embeddings, and answers your questions using only the channel's actual content — with source links.
Built with Streamlit, OpenAI, ChromaDB, and yt-dlp.

How it works:

- Fetch: retrieves all video IDs from a YouTube channel using yt-dlp.
- Transcribe: downloads the transcript for each video via youtube-transcript-api.
- Chunk & Embed: splits transcripts into overlapping chunks and generates embeddings with OpenAI's text-embedding-3-small.
- Store: saves the embeddings in a ChromaDB vector database.
- Query: finds the chunks most relevant to your question and generates an answer with GPT-4o, citing the source videos.
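The README does not say how large the chunks are or how much they overlap. The sketch below is a hypothetical word-based splitter that shows what "overlapping chunks" means. The `chunk_text` name and the 200-word window with 50-word overlap are illustrative assumptions, not the app's actual settings. Each window repeats the tail of the previous one, so text near a boundary appears in two neighbouring chunks and keeps some of its surrounding context.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words, each sharing
    `overlap` words with the previous window (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

In the app, each chunk would then be embedded with text-embedding-3-small. With the official `openai` Python package that is a call such as `client.embeddings.create(model="text-embedding-3-small", input=chunks)`, though the exact code in `app.py` may differ.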

Requirements:

- Python 3.10+
- An OpenAI API key

Setup:

```bash
git clone https://github.com/AymanAzim/YoutubeChannelRAG.git
cd YoutubeChannelRAG
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Create a `.env` file containing `OPENAI_API_KEY=sk-your-key-here`, or export the key in your shell:

```bash
export OPENAI_API_KEY="sk-your-key-here"
```

Then start the app:

```bash
streamlit run app.py
```

Usage:

- Paste a YouTube channel URL (e.g. `https://www.youtube.com/@ChannelName/videos`).
- Click Index Channel and wait for indexing to finish.
- Ask any question about the channel's content.
- Get an answer with links to the source videos.
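Answers link back to the videos they drew on. Below is a minimal sketch of that last step. It assumes each retrieved chunk carries its video ID and title as metadata. The `RetrievedChunk` record, the `build_prompt` helper, and the prompt wording are all hypothetical. The sketch numbers the excerpts, tells the model to stay within them, and builds de-duplicated links in YouTube's standard `https://www.youtube.com/watch?v=<id>` form.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    # Hypothetical record: the metadata the app actually stores is not documented.
    video_id: str
    title: str
    text: str


def video_url(video_id: str) -> str:
    """Standard YouTube watch URL for a video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"


def build_prompt(question: str, chunks: list[RetrievedChunk]) -> tuple[str, list[str]]:
    """Return a prompt restricted to the retrieved excerpts, plus
    de-duplicated source links in first-seen order."""
    context = "\n\n".join(
        f"[{i}] {c.title}\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    prompt = (
        "Answer the question using only the excerpts below. "
        "Cite excerpts by their [number]. If the excerpts do not contain "
        "the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    sources: list[str] = []
    for c in chunks:
        link = f"{c.title}: {video_url(c.video_id)}"
        if link not in sources:  # several chunks may come from the same video
            sources.append(link)
    return prompt, sources
```

The prompt would then be sent to GPT-4o (for example as a user message to `client.chat.completions.create(model="gpt-4o", ...)` in the `openai` package), with the source links shown under the answer. Telling the model to answer only from the excerpts reduces, but does not eliminate, answers that go beyond the channel's content.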
| Component | Tool |
|---|---|
| Frontend | Streamlit |
| LLM | OpenAI GPT-4o |
| Embeddings | OpenAI text-embedding-3-small |
| Vector DB | ChromaDB |
| Transcripts | youtube-transcript-api |
| Video Fetching | yt-dlp |

License: MIT



