A fast, offline-friendly proof of concept (POC) to transcribe YouTube videos and chat with the content using a local LLM via Ollama. Built with Flask, Whisper, PostgreSQL, and Mistral.
- 🧠 Transcribe YouTube audio using Whisper
- 💬 Chat with the transcript using a local LLM via Ollama (Mistral, LLaMA2, etc.)
- 📦 Local PostgreSQL-backed storage
- ⚙️ Configurable via `.env`
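At a high level, the pipeline transcribes downloaded audio with Whisper, stores the transcript in PostgreSQL, and answers questions by sending transcript context to the local Ollama chat endpoint. The snippet below is a minimal, illustrative sketch of that flow, not the project's actual code; the function names, the `audio.mp3` path, and the prompt format are assumptions.

```python
# Minimal sketch of the transcribe-then-chat flow (illustrative only).
# Assumes the YouTube audio has already been downloaded to audio.mp3.
import whisper
import requests

OLLAMA_API = "http://localhost:11434/api/chat"
OLLAMA_MODEL = "mistral:7b-instruct-fp16"

def transcribe(audio_path: str) -> str:
    """Run Whisper locally and return the transcript text."""
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)
    return result["text"]

def chat_with_transcript(transcript: str, question: str) -> str:
    """Send the transcript as context to the local Ollama chat endpoint."""
    payload = {
        "model": OLLAMA_MODEL,
        "messages": [
            {"role": "system", "content": f"Answer using this transcript:\n{transcript}"},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }
    response = requests.post(OLLAMA_API, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()["message"]["content"]

if __name__ == "__main__":
    text = transcribe("audio.mp3")
    print(chat_with_transcript(text, "Summarize the video in three bullet points."))
```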
- Python 3.10+
- Docker
- Ollama with the `mistral:7b-instruct-fp16` model
```bash
git clone https://github.com/13shivam/yt-agent.git
cd yt-agent
```
```env
# Flask
FLASK_APP=app.py
FLASK_ENV=development

# Ollama setup
OLLAMA_MODEL=mistral:7b-instruct-fp16
OLLAMA_API=http://localhost:11434/api/chat

# PostgreSQL connection
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=ytagent
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres

# Whisper model
WHISPER_MODEL=base
```
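A quick way to confirm these values are picked up correctly is to load them and open a PostgreSQL connection. The check below is a hedged example, not part of the project; it assumes `python-dotenv` and `psycopg2-binary` are installed.

```python
# Sanity check: load .env and connect to PostgreSQL (illustrative only).
import os
from dotenv import load_dotenv
import psycopg2

load_dotenv()  # reads .env from the current directory

conn = psycopg2.connect(
    host=os.getenv("POSTGRES_HOST", "localhost"),
    port=int(os.getenv("POSTGRES_PORT", "5432")),
    dbname=os.getenv("POSTGRES_DB", "ytagent"),
    user=os.getenv("POSTGRES_USER", "postgres"),
    password=os.getenv("POSTGRES_PASSWORD", ""),
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```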
```bash
brew install ollama
ollama run mistral:7b-instruct-fp16
```
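Once the model is running, you can verify that the `/api/chat` endpoint configured in `OLLAMA_API` responds. This is a small illustrative check using `requests`; the prompt text is arbitrary.

```python
# Quick check that the local Ollama chat endpoint answers (illustrative).
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "mistral:7b-instruct-fp16",
        "messages": [{"role": "user", "content": "Reply with the word 'ready'."}],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```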
```bash
docker compose build --no-cache
docker compose up -d
```
- Speaker diarization support via the open-source `pyannote.audio`
- Speaker diarization via the NVIDIA NeMo framework
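For reference, diarization with `pyannote.audio` typically looks like the sketch below. This is illustrative only and not yet wired into the project; it requires a Hugging Face access token (placeholder shown) and acceptance of the model's terms.

```python
# Rough sketch of pyannote.audio speaker diarization (not integrated here).
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hf_xxx",  # placeholder Hugging Face token
)
diarization = pipeline("audio.wav")

# Print who speaks when
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```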
This project is licensed under the MIT License - see the license file for details.
Important Notice Regarding Open Source Dependencies:
This project relies on various open-source models and libraries, each with its own licensing terms. It is the user's responsibility to understand and adhere to the specific licenses of all the open-source components they choose to use. Consult the individual licenses provided by the respective model and library providers.