Skip to content

YouTube Video Chatbot — a Streamlit app that fetches YouTube transcripts, splits text with RecursiveCharacterTextSplitter, creates Ollama embeddings, indexes chunks in FAISS, and answers user queries via LangChain. Local-first LLM support (Ollama) enables fast, private retrieval-augmented conversation.

Notifications You must be signed in to change notification settings

codegeekyyy/Youtubechatbot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Youtubechatbot

YouTube Video Chatbot — a Streamlit app that fetches YouTube transcripts, splits text with RecursiveCharacterTextSplitter, creates Ollama embeddings, indexes chunks in FAISS, and answers user queries via LangChain. Local-first LLM support (Ollama) enables fast, private retrieval-augmented conversation.

🧠 Project Overview

The chatbot extracts a video transcript using the youtube-transcript-api, splits it into smaller chunks, and embeds each chunk using Ollama Embeddings. These embeddings are stored in a FAISS vector database, allowing efficient retrieval of the most relevant transcript sections when a user asks a question.

The Ollama LLM (e.g., gemma3:4b) is then used to generate an accurate, context-aware response based on the retrieved transcript snippets.

A simple Streamlit-based frontend lets users:

Enter a YouTube video URL

View transcript chunks

Ask questions related to the video content

Get detailed, AI-generated answers derived directly from the transcript

⚙️ Tech Stack

Python 3.10+

LangChain (core logic)

LangChain Community & Ollama

FAISS (vector storage)

YouTube Transcript API (video transcript extraction)

Streamlit (frontend UI)

Ollama Models (nomic-embed-text, gemma3:4b)

🧩 Features

✅ Extracts transcripts from YouTube videos ✅ Splits and embeds text into FAISS vector storage ✅ Uses Ollama models for embeddings and generation ✅ Implements RAG for contextual Q&A ✅ Simple and interactive Streamlit interface

🗂️ Folder Structure YouTubeChatbotusingLangChain/ │ ├── app.py # Streamlit frontend ├── yt.py # Backend RAG logic ├── .gitignore # Ignore unnecessary files ├── requirements.txt # Project dependencies ├── venv/ # Virtual environment (ignored) └── README.md # Project documentation

🧰 Installation and Setup 1️⃣ Clone the Repository git clone https://github.com/your-username/YouTubeChatbotusingLangChain.git cd YouTubeChatbotusingLangChain

2️⃣ Create a Virtual Environment python -m venv venv

3️⃣ Activate the Virtual Environment

Windows:

venv\Scripts\activate

macOS/Linux:

source venv/bin/activate

4️⃣ Install Dependencies pip install -r requirements.txt

Example dependencies:

youtube-transcript-api langchain langchain-community langchain-ollama faiss-cpu streamlit python-dotenv

🚀 Run the Application Backend (Data Processing) python yt.py

Frontend (Streamlit UI) streamlit run app.py

💡 Example Usage

Launch the Streamlit app.

Enter a YouTube video URL (e.g., https://www.youtube.com/watch?v=Gfr50f6ZBvo).

Wait for the transcript to load and process.

Ask any question about the video — e.g., “What is DeepMind?” The bot will provide an accurate, transcript-based answer.

📚 How It Works

Transcript Extraction: Fetches captions using the YouTube Transcript API.

Chunking: Splits text using RecursiveCharacterTextSplitter.

Embedding: Converts text chunks into embeddings via OllamaEmbeddings.

Vector Storage: Stores vectors using FAISS for fast similarity search.

Retrieval: Retrieves top-k chunks relevant to the question.

Generation: Feeds context + question to Ollama LLM for final answer.

🧑‍💻 Author

Harshdeep singh B.Tech Student, Pranveer Singh Institute of Technology Focusing on Artificial Intelligence and Machine Learning

📜 License

This project is licensed under the MIT License — feel free to modify and use it for your learning or projects.

🌟 Acknowledgments

LangChain

Ollama

FAISS

Streamlit

YouTube Transcript API

About

YouTube Video Chatbot — a Streamlit app that fetches YouTube transcripts, splits text with RecursiveCharacterTextSplitter, creates Ollama embeddings, indexes chunks in FAISS, and answers user queries via LangChain. Local-first LLM support (Ollama) enables fast, private retrieval-augmented conversation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages