A Python tool that transcribes YouTube videos and extracts key information using AI. Available as both a CLI tool and a web interface with REST API.
- Dual Interface: Command-line tool and web-based UI with REST API
- Fast Local Transcription: MLX Whisper optimized for Apple Silicon (5-10x faster than standard Whisper)
- YouTube Captions: Extract auto-generated captions when available (fastest option)
- AI Extraction: Extracts key insights using Claude or GPT with hierarchical summarization
- Real-time Progress: Web interface shows live progress via Server-Sent Events
- One-Command Setup: `./start-local.sh` handles environment setup and dependencies
- Library-style API: `process_video()` can be imported and reused programmatically
transcript-pipeline/
├── start-local.sh # One-command local setup and run script
├── requirements-local.txt # Python dependencies (with MLX Whisper)
├── requirements.txt # Minimal dependencies (for Docker/cloud)
├── server.py # FastAPI web server with SSE streaming
├── .env.example # Environment variables template
├── README.md # This file
├── CLAUDE.md # Claude Code guidance
├── Dockerfile # Docker image configuration (alternative)
├── docker-compose.yml # Docker Compose setup (alternative)
├── src/
│ ├── __init__.py
│ ├── main.py # CLI entry point (thin wrapper)
│ ├── config.py # Centralized configuration and constants
│ ├── models.py # Shared data models (Segment, etc.)
│ ├── downloader.py # YouTube audio download via yt-dlp
│ ├── transcriber.py # Whisper/ElevenLabs transcription engines
│ ├── extractor.py # LLM-based content extraction
│ ├── utils.py # Helper functions and utilities
│ └── services/
│ ├── __init__.py
│ ├── pipeline_service.py # Core pipeline logic (process_video)
│ └── markdown_service.py # Markdown generation functions
├── frontend/
│ └── index.html # Web interface (standalone HTML)
├── tests/ # Test suite
│ ├── test_utils.py
│ └── test_transcriber_scribe_parsing.py
├── dev/
│ └── REFACTORING.md # Refactoring progress documentation
├── output/ # Generated files (gitignored)
│ ├── audio/ # Temporary audio files
│ ├── transcripts/ # Transcript markdown files
│ └── summaries/ # Summary markdown files
└── models/ # Whisper model cache (gitignored)
- macOS with Apple Silicon (M1/M2/M3/M4) - recommended for MLX Whisper
- Python 3.11 - 3.13 (Python 3.14+ not supported)
- ffmpeg - for audio processing (`brew install ffmpeg`)
- API key for Claude (Anthropic) or GPT (OpenAI) - for AI extraction
Note: Docker is available as an alternative but does not support MLX Whisper (local transcription). Use Docker for deployment or non-macOS environments.
# 1. Configure environment
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY (or OPENAI_API_KEY)
# 2. Start the backend (installs dependencies automatically)
./start-local.sh
# 3. In another terminal, start the frontend
cd web && npm install && npm run dev
# 4. Open http://localhost:3000

The start-local.sh script automatically:
- Creates a Python virtual environment (`.venv/`)
- Installs all dependencies including MLX Whisper
- Starts the FastAPI server on port 8000 with hot-reload
Copy the example environment file and add your API keys:
cp .env.example .env

Edit .env with your settings:
# Transcription Engine: 'auto', 'mlx-whisper', or 'captions'
# - auto: Try YouTube captions first, fall back to MLX Whisper (recommended)
# - mlx-whisper: Always use local transcription
# - captions: Only use YouTube captions
TRANSCRIPTION_ENGINE=auto
# MLX Whisper model (when using mlx-whisper)
# Options: tiny, base, small, medium, large, large-v3, large-v3-turbo, distil-large-v3
MLX_WHISPER_MODEL=large-v3-turbo
# LLM for AI extraction (required)
ANTHROPIC_API_KEY=your_anthropic_key_here # For Claude (recommended)
# OPENAI_API_KEY=your_openai_key_here # For GPT (alternative)
DEFAULT_LLM=claude

See .env.example for all available options.
# Start backend
./start-local.sh
# In another terminal, start frontend
cd web && npm run dev

The web interface at http://localhost:3000 provides:
- Real-time progress updates via Server-Sent Events
- Model selection (MLX Whisper sizes)
- Direct download of transcripts and summaries
- Activity log with detailed status
Process videos directly from the command line:
# Activate the virtual environment first
source .venv/bin/activate
# Process a video
python -m src.main https://www.youtube.com/watch?v=VIDEO_ID
# Use GPT instead of Claude
python -m src.main https://youtu.be/VIDEO_ID --llm gpt
# Only transcribe (skip AI extraction)
python -m src.main https://youtu.be/VIDEO_ID --no-extract

For deployment or non-macOS environments, use Docker (note: MLX Whisper is not available in Docker):
# Build and start API server
docker-compose build
docker-compose up transcript-api
# CLI mode
docker-compose run --rm --profile cli transcript-pipeline https://www.youtube.com/watch?v=VIDEO_ID

The API server provides REST endpoints for programmatic access:
Start Processing:
POST /api/process
Content-Type: application/json
{
"url": "https://youtube.com/watch?v=VIDEO_ID",
"llm_type": "claude", # optional
"extract": true # optional
}
# Returns: {"job_id": "abc123", "status": "pending", ...}Get Job Status:
GET /api/jobs/{job_id}
# Returns: {"job_id": "abc123", "status": "complete", "transcript_path": "...", ...}Stream Progress (Server-Sent Events):
GET /api/jobs/{job_id}/stream
# Returns SSE stream with real-time updates

Download Files:
GET /api/jobs/{job_id}/download/transcript
GET /api/jobs/{job_id}/download/summary

Get Configuration (non-sensitive):
GET /api/config
# Returns: {
# "default_llm": "claude",
# "transcription_engine": "whisper", # or "elevenlabs"
# "whisper_model": "large-v3", # if using whisper
# "has_elevenlabs_key": true,
# ...
# }

Interactive API Documentation:
Visit http://localhost:8000/docs when the server is running for Swagger UI with interactive testing.
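For scripted access, here is a minimal client sketch using the endpoints above. It assumes the server from `./start-local.sh` is running on http://localhost:8000 and uses the third-party `requests` library; the "failed" status value is an assumption, so check the job status values your server actually returns.

```python
# api_client_example.py - minimal sketch of driving the REST API with requests.
# Endpoint paths and JSON fields follow the examples above; "failed" is an
# assumed error status, not confirmed by the API docs above.
import time
import requests

BASE = "http://localhost:8000"

# 1. Submit a processing job
resp = requests.post(f"{BASE}/api/process", json={
    "url": "https://youtube.com/watch?v=VIDEO_ID",
    "llm_type": "claude",
    "extract": True,
})
resp.raise_for_status()
job_id = resp.json()["job_id"]

# 2. Poll the job status (the SSE endpoint /api/jobs/{job_id}/stream is an
#    alternative if you prefer push-style progress updates)
while True:
    job = requests.get(f"{BASE}/api/jobs/{job_id}").json()
    print("status:", job["status"])
    if job["status"] in ("complete", "failed"):
        break
    time.sleep(5)

# 3. Download the transcript markdown on success
if job["status"] == "complete":
    md = requests.get(f"{BASE}/api/jobs/{job_id}/download/transcript")
    with open("transcript.md", "wb") as f:
        f.write(md.content)
```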
The tool generates organized output files in the output/ directory:
output/
├── audio/ # Temporary audio files (cleaned up after processing)
├── transcripts/ # Transcript markdown files
│ └── {video-title}-transcript.md
└── summaries/ # Summary markdown files
└── {video-title}-summary.md
Contains:
- Video metadata (title, author, date, duration, URL)
- Video description (truncated to 500 chars)
- Full transcript with timestamps
Example:
# Introduction to Machine Learning
**Author**: Tech Channel
**Date**: 20240115
**URL**: https://www.youtube.com/watch?v=...
**Duration**: 15m 30s
## Description
This video covers the fundamentals of machine learning...
## Transcript
[00:00:00] Welcome to this introduction to machine learning...
[00:00:15] Today we'll cover the basics of supervised learning...

Contains AI-extracted information:
- Executive summary
- Key points
- Important quotes (with timestamps)
- Main topics
- Actionable insights
Example:
# Introduction to Machine Learning - Summary
**Author**: Tech Channel
**Date**: 20240115
**Processed**: 2024-01-20
---
## Executive Summary
This video provides a comprehensive introduction to machine learning...
## Key Points
- Machine learning is a subset of artificial intelligence
- Supervised learning requires labeled training data
...

python -m src.main [-h] [--llm {claude,gpt}] [--output-dir OUTPUT_DIR] [--no-extract] url
Positional arguments:
url YouTube video URL
Optional arguments:
-h, --help Show help message
--llm LLM LLM for extraction: claude or gpt (default: claude)
--output-dir DIR Output directory (default: ./output)
--no-extract Skip extraction, only transcribe
Docker is available for deployment to cloud platforms or non-macOS environments. The Docker image supports YouTube caption extraction but not MLX Whisper (local transcription).
# Build and run
docker-compose build
docker-compose up transcript-api

Services:
- transcript-api: FastAPI web server on port 8000
- transcript-pipeline: CLI mode (use `--profile cli`)
This error occurs when running in Docker. MLX Whisper requires Apple Silicon and cannot run in Docker containers. Solutions:
- Run locally (recommended): Use `./start-local.sh` instead of Docker
- Use captions: Set `TRANSCRIPTION_ENGINE=captions` in `.env` to use YouTube auto-captions
The video may be:
- Private or deleted
- Geo-restricted
- Age-restricted
Try a different video URL.
Make sure your .env file contains the correct API key:
- `ANTHROPIC_API_KEY` for Claude
- `OPENAI_API_KEY` for GPT
The first run downloads the model from Hugging Face. This is normal:
- `small` model: ~500MB
- `large-v3-turbo` model: ~1.5GB
Models are cached in ~/.cache/huggingface/ for future use.
If transcribing large videos, use a smaller model:
MLX_WHISPER_MODEL=small  # In .env

Install ffmpeg (required for audio processing):
brew install ffmpeg

Configure CORS origins via the CORS_ORIGINS environment variable:
CORS_ORIGINS=https://yourdomain.com
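On the server side this variable is typically fed into FastAPI's CORS middleware. The snippet below is only a sketch of how server.py might consume it, not the actual implementation; the fallback default origin is an assumption for the example.

```python
# Sketch only: one plausible way for a FastAPI app to read CORS_ORIGINS.
# The real wiring lives in server.py and may differ.
import os
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Comma-separated list from the environment; default is an assumption.
origins = [o.strip() for o in os.getenv("CORS_ORIGINS", "http://localhost:3000").split(",") if o.strip()]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_methods=["*"],
    allow_headers=["*"],
)
```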
Create a script to process multiple videos:

#!/bin/bash
# process-videos.sh
source .venv/bin/activate
while IFS= read -r url; do
echo "Processing: $url"
python -m src.main "$url"
done < video-urls.txt

Usage:
chmod +x process-videos.sh
./process-videos.sh

The process_video() function can be imported from the services package and used as a library:
from src.services import process_video
result = process_video(
url="https://youtube.com/watch?v=VIDEO_ID",
llm_type="claude",
output_dir="./output",
transcription_engine="whisper", # or "elevenlabs"
no_extract=False, # Set True to skip extraction
)
if result['success']:
    print(f"Transcript: {result['transcript_path']}")
    print(f"Summary: {result['summary_path']}")
    print(f"Segments: {len(result['segments'])}")
else:
    print(f"Error: {result['error']}")

You can also provide a status callback for progress updates:
def my_callback(phase, status, message):
    print(f"[{phase}] {status}: {message}")
result = process_video(
url="https://youtube.com/watch?v=VIDEO_ID",
status_callback=my_callback,
)

To customize the extraction prompt, edit src/extractor.py and modify:
- `EXTRACTION_PROMPT` - Single-pass extraction for short transcripts
- `CHUNK_SUMMARY_PROMPT` - Per-chunk summarization for long transcripts
- `FINAL_SUMMARY_PROMPT` - Final synthesis across chunk summaries
The pipeline automatically handles long videos:
- Transcription: Videos >30 minutes are chunked with 5-second overlaps
- Extraction: Transcripts >8000 characters use hierarchical summarization (chunk summaries → final summary); a rough sketch of this flow follows below
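For intuition, here is a rough sketch of that hierarchical-summarization shape. It is not the code in src/extractor.py: the splitter is deliberately naive, and the `summarize_chunk` / `synthesize_final_summary` helpers are placeholders for the underlying LLM calls.

```python
# Illustrative sketch of hierarchical summarization, not the actual extractor code.
# The 8000-character threshold comes from the note above.
def split_into_chunks(text: str, chunk_size: int = 8000) -> list[str]:
    # Naive fixed-size character windows; the real splitter may be smarter
    # (e.g. respecting segment boundaries).
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize_chunk(chunk: str) -> str:
    ...  # placeholder: one LLM call per chunk (cf. CHUNK_SUMMARY_PROMPT)

def synthesize_final_summary(summaries: list[str]) -> str:
    ...  # placeholder: one LLM call over all chunk summaries (cf. FINAL_SUMMARY_PROMPT)

def summarize_long_transcript(transcript: str) -> str:
    if len(transcript) <= 8000:
        return summarize_chunk(transcript)  # single-pass extraction
    chunk_summaries = [summarize_chunk(c) for c in split_into_chunks(transcript)]
    return synthesize_final_summary(chunk_summaries)  # chunk summaries -> final summary
```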
This project is provided as-is for educational and personal use.
Contributions welcome! Please test thoroughly before submitting pull requests.
Run the test suite:
# Activate virtual environment and run tests
source .venv/bin/activate
pytest
# Or run specific tests
pytest tests/test_utils.py -v

Tests cover:
- Utility functions (filename sanitization, timestamps, path validation)
- Transcription parsing
- Retry logic with exponential backoff
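A new test might look like the sketch below. The `sanitize_filename` helper name and its exact behavior are assumptions made for illustration, so check src/utils.py for the real function names and signatures before copying this.

```python
# tests/test_example.py - illustrative sketch only; helper name is assumed.
from src.utils import sanitize_filename  # assumed helper; see src/utils.py

def test_sanitize_filename_strips_path_separators():
    # A sanitized video title should be safe to use as a file name.
    result = sanitize_filename("My Video: Part 1/2")
    assert "/" not in result
```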
Built with:
- yt-dlp - YouTube downloader
- MLX Whisper - Fast local transcription on Apple Silicon
- Anthropic Claude - AI extraction
- OpenAI GPT - AI extraction (alternative)
- FastAPI - Web framework
- Next.js - Frontend framework