Skip to content

Offline-friendly backend POC to transcribe YouTube videos and chat with video content using Whisper (no cloud required) and local LLMs via Ollama like Mistral or LLaMA2. Built with Flask and PostgreSQL, fully open source with Swagger APIs. Easily connect any frontend. ⚠️ Use Submit API to download one video at a time to avoid YouTube throttling.

License

Notifications You must be signed in to change notification settings

13shivam/yt-agent

Repository files navigation

🎙️ YouTube Transcription & Chat Agent

A fast, offline-friendly proof of concept (POC) to transcribe YouTube videos and chat with the content using a local LLM via Ollama. Built with Flask, Whisper, PostgreSQL, and Mistral.


✅ Features

  • 🧠 Transcribe YouTube audio using Whisper for Transcribe operation and Mistral Model for Chat
  • 💬 Chat with transcript using Ollama (Mistral, LLaMA2, etc.)
  • 📦 Local PostgreSQL-backed storage
  • ⚙️ Configurable via .env

🔧 Prerequisites

  • Python 3.10+
  • Docker
  • Ollama with mistral:7b-instruct-fp16 model

🚀 Getting Started

1. Clone the Project

git clone https://github.com/13shivam/yt-agent.git
cd yt-agent

2. Make sure a .env File exists

FLASK_APP=app.py
FLASK_ENV=development

# Ollama setup
OLLAMA_MODEL=mistral:7b-instruct-fp16
OLLAMA_API=http://localhost:11434/api/chat

# PostgreSQL connection
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=ytagent
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgers

# Whisper model
WHISPER_MODEL=base

3. Run Local Ollama with Mistral Model (7b-instruct-fp16 - 14gb)

brew install ollama
ollama run mistral:7b-instruct-fp16

4. Run App

docker compose build --no-cache
docker compose up -d

My Local Image


Access APIs via Swagger


Sequence Diagram all APIs

My Local Image

Swagger API Local Demo

My Local Image

Step 1: Upload API youtube URL

My Local Image

Step 2: Check JobStatus

My Local Image

Step 3: Initiate Chat from JobId

My Local Image

Misc Screenshots

  1. DB Screenshots in_progress_statu.png db_transcript_complete_status.png
  2. Chat API preview chat_interface_job_id.png
  3. Flask App Logs flask_app_logs.png

🪪 WIP to add in future release:

  • Speaker Diarization support via open source pyannote.audio
  • Speaker Diarization — NVIDIA NeMo Framework

📝 License

This project is licensed under the MIT License - see the license file for details.

Important Notice Regarding Open Source Dependencies:

This project relies on various open-source models and libraries, each with its own licensing terms. It is the user's responsibility to understand and adhere to the specific licenses of all the open-source components they choose to use. Consult the individual licenses provided by the respective model and library providers.

About

Offline-friendly backend POC to transcribe YouTube videos and chat with video content using Whisper (no cloud required) and local LLMs via Ollama like Mistral or LLaMA2. Built with Flask and PostgreSQL, fully open source with Swagger APIs. Easily connect any frontend. ⚠️ Use Submit API to download one video at a time to avoid YouTube throttling.

Topics

Resources

License

Stars

Watchers

Forks