🎙️ YouTube Transcription & Chat Agent

A fast, offline-friendly proof of concept (POC) to transcribe YouTube videos and chat with the content using a local LLM via Ollama. Built with Flask, Whisper, PostgreSQL, and Mistral.

✅ Features

🧠 Transcribe YouTube audio using Whisper for Transcribe operation and Mistral Model for Chat
💬 Chat with transcript using Ollama (Mistral, LLaMA2, etc.)
📦 Local PostgreSQL-backed storage
⚙️ Configurable via .env

🔧 Prerequisites

Python 3.10+
Docker
Ollama with mistral:7b-instruct-fp16 model

🚀 Getting Started

1. Clone the Project

git clone https://github.com/13shivam/yt-agent.git
cd yt-agent

2. Make sure a .env File exists

FLASK_APP=app.py
FLASK_ENV=development

# Ollama setup
OLLAMA_MODEL=mistral:7b-instruct-fp16
OLLAMA_API=http://localhost:11434/api/chat

# PostgreSQL connection
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=ytagent
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgers

# Whisper model
WHISPER_MODEL=base

3. Run Local Ollama with Mistral Model (7b-instruct-fp16 - 14gb)

brew install ollama
ollama run mistral:7b-instruct-fp16

4. Run App

docker compose build --no-cache
docker compose up -d

Access APIs via Swagger

http://127.0.0.1:5050/apidocs/#

Sequence Diagram all APIs

Swagger API Local Demo

Step 1: Upload API youtube URL

Step 2: Check JobStatus

Step 3: Initiate Chat from JobId

Misc Screenshots

DB Screenshots
Chat API preview
Flask App Logs

🪪 WIP to add in future release:

Speaker Diarization support via open source pyannote.audio
Speaker Diarization — NVIDIA NeMo Framework

📝 License

This project is licensed under the MIT License - see the license file for details.

Important Notice Regarding Open Source Dependencies:

This project relies on various open-source models and libraries, each with its own licensing terms. It is the user's responsibility to understand and adhere to the specific licenses of all the open-source components they choose to use. Consult the individual licenses provided by the respective model and library providers.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
resource		resource
.env		.env
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
app.py		app.py
db.py		db.py
db_init.sql		db_init.sql
docker-compose.yml		docker-compose.yml
readme.md		readme.md
requirements.txt		requirements.txt
swagger_specs.py		swagger_specs.py
whisper_utils.py		whisper_utils.py
yt_utils.py		yt_utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🎙️ YouTube Transcription & Chat Agent

✅ Features

🔧 Prerequisites

🚀 Getting Started

1. Clone the Project

2. Make sure a .env File exists

3. Run Local Ollama with Mistral Model (7b-instruct-fp16 - 14gb)

4. Run App

Access APIs via Swagger

http://127.0.0.1:5050/apidocs/#

Sequence Diagram all APIs

Swagger API Local Demo

Step 1: Upload API youtube URL

Step 2: Check JobStatus

Step 3: Initiate Chat from JobId

Misc Screenshots

🪪 WIP to add in future release:

📝 License

About

Uh oh!

Uh oh!

Languages

License

13shivam/yt-agent

Folders and files

Latest commit

History

Repository files navigation

🎙️ YouTube Transcription & Chat Agent

✅ Features

🔧 Prerequisites

🚀 Getting Started

1. Clone the Project

2. Make sure a .env File exists

3. Run Local Ollama with Mistral Model (7b-instruct-fp16 - 14gb)

4. Run App

Access APIs via Swagger

http://127.0.0.1:5050/apidocs/#

Sequence Diagram all APIs

Swagger API Local Demo

Step 1: Upload API youtube URL

Step 2: Check JobStatus

Step 3: Initiate Chat from JobId

Misc Screenshots

🪪 WIP to add in future release:

📝 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages