A fully local CLI tool that transcribes meetings, summarizes YouTube videos, or generates personalized tech podcasts. Runs entirely on your machine — no cloud APIs, no data leaves your device.
- Python 3.10+
- Ollama running locally with a model pulled (default: `llama3.1:8b`)
- ffmpeg
- TTS engine for podcast audio: macOS `say` (built-in, zero setup) or Piper TTS
```bash
pip install -r requirements.txt
ollama pull llama3.1:8b
```

Transcribe a local audio file:

```bash
python src/main.py meeting.mp3
```

Output: `output/meeting/transcript.md`, `summary.md`, `remarks.md`
```bash
python src/main.py --youtube "https://www.youtube.com/watch?v=abc123"
```

Audio is downloaded to `data/`, output goes to `output/<video_title>/`. Includes a Watch Recommendation Score (1-10) in `remarks.md`.
```bash
python src/main.py --podcast
```

Fetches recent articles from RSS feeds and web search, filters them by your interests, generates a podcast script, and converts it to audio.
Output: `output/podcast_<date>/podcast.wav`, `script.md`, `sources.md`

Requires `interest.md` (see below).
| Option | Description |
|---|---|
| `AUDIO_FILE` | Path to audio file (optional) |
| `--youtube`, `-yt` | YouTube URL to download and process |
| `--podcast` | Generate a podcast from your interests |
| `--kb` | Knowledge base directory for context-aware summaries |
| `--kb-rebuild` | Force re-index the knowledge base |
| `--embedding-model` | Fastembed model for KB embeddings (default: `BAAI/bge-small-en-v1.5`) |
| `--model`, `-m` | Whisper model size (default: `medium`) |
| `--output-dir`, `-o` | Output directory (default: `output/<name>/`) |
| `--llm-model` | Ollama model (default: `llama3.1:8b`) |
| `--language`, `-l` | Audio language: `auto`, `nl`, `en` (default: `auto`) |
| `--chunk-minutes` | Chunk size in minutes (default: `10`) |
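As an illustration of `--chunk-minutes`, here is a small sketch of how a recording's duration could be split into fixed-length transcription chunks. The helper is hypothetical, not the project's actual code:

```python
def chunk_bounds(duration_s: float, chunk_minutes: int = 10) -> list[tuple[float, float]]:
    """Split a total duration into consecutive (start, end) spans in seconds."""
    step = chunk_minutes * 60
    bounds = []
    start = 0.0
    while start < duration_s:
        bounds.append((start, min(start + step, duration_s)))
        start += step
    return bounds

# A 25-minute recording with the default 10-minute chunks:
print(chunk_bounds(25 * 60.0))  # [(0.0, 600.0), (600.0, 1200.0), (1200.0, 1500.0)]
```

Each chunk is transcribed separately, which keeps memory use bounded for long recordings.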
On Apple Silicon Macs, the tool automatically uses mlx-whisper for GPU-accelerated transcription via Apple's MLX framework. This is significantly faster than the CPU-based faster-whisper backend.
- Automatic: if `mlx-whisper` is installed and you're on macOS, it's used by default
- Override: set `WHISPER_BACKEND=faster-whisper` to force CPU, or `WHISPER_BACKEND=mlx` to force MLX
- Models are downloaded automatically from HuggingFace on first use
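The selection rules above can be sketched roughly as follows; this is a minimal illustration, and the project's real code may differ:

```python
import importlib.util
import os
import platform

def pick_whisper_backend() -> str:
    """Choose a transcription backend: env override first, then auto-detect."""
    override = os.environ.get("WHISPER_BACKEND")
    if override in ("mlx", "faster-whisper"):
        return override
    # Auto mode: MLX only on macOS, and only if mlx-whisper is importable.
    if platform.system() == "Darwin" and importlib.util.find_spec("mlx_whisper"):
        return "mlx"
    return "faster-whisper"

os.environ["WHISPER_BACKEND"] = "faster-whisper"
print(pick_whisper_backend())  # faster-whisper (the override always wins)
```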
| CLI Model | MLX HuggingFace Repo |
|---|---|
| `tiny` | `mlx-community/whisper-tiny` |
| `base` | `mlx-community/whisper-base` |
| `small` | `mlx-community/whisper-small` |
| `medium` | `mlx-community/whisper-medium` |
| `large-v2` | `mlx-community/whisper-large-v2` |
| `large-v3` | `mlx-community/whisper-large-v3-turbo` |
Add a `--kb` flag pointing to a directory of reference documents to make summaries and podcasts more domain-aware:
```bash
python src/main.py meeting.mp3 --kb ./my_docs/
python src/main.py --youtube "URL" --kb ./my_docs/
python src/main.py --podcast --kb ./my_docs/
```

For meetings and YouTube videos, relevant KB content is injected into the summarization prompts. For podcasts, fetched articles are discussed in the context of your knowledge base.
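One plausible shape for that prompt injection is shown below; the template and helper are illustrative assumptions, not the project's actual code:

```python
def build_summary_prompt(transcript: str, kb_chunks: list[str]) -> str:
    """Prepend retrieved knowledge-base chunks as reference context for the LLM."""
    context = "\n\n".join(f"[KB {i}] {chunk}" for i, chunk in enumerate(kb_chunks, 1))
    return (
        "Use the reference material below only where it is relevant.\n\n"
        f"Reference material:\n{context}\n\n"
        f"Transcript:\n{transcript}\n\n"
        "Write a concise summary in English."
    )

prompt = build_summary_prompt(
    "Alice: let's review the roadmap...",
    ["Q3 roadmap notes", "Team glossary"],
)
```

The retrieved chunks would come from a vector-store similarity search against the transcript (see the knowledge-base indexing described below).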
Supported formats: `.txt`, `.md`, `.pdf`, `.docx`, `.html`, `.csv`
On first run, documents are chunked, embedded, and stored in a local Qdrant vector store (`data/kb_store/`). Subsequent runs reuse the cached index. Use `--kb-rebuild` to re-index when files change:
```bash
python src/main.py meeting.mp3 --kb ./my_docs/ --kb-rebuild
```

By default the KB uses `BAAI/bge-small-en-v1.5` (~130 MB, 384 dimensions). For better retrieval quality, use a larger model:
```bash
python src/main.py meeting.mp3 --kb ./my_docs/ --embedding-model BAAI/bge-base-en-v1.5
python src/main.py meeting.mp3 --kb ./my_docs/ --embedding-model BAAI/bge-large-en-v1.5
```

Popular fastembed models (downloaded automatically on first use):
| Model | Size | Dimensions |
|---|---|---|
| `BAAI/bge-small-en-v1.5` (default) | ~130 MB | 384 |
| `BAAI/bge-base-en-v1.5` | ~440 MB | 768 |
| `BAAI/bge-large-en-v1.5` | ~1.2 GB | 1024 |
| `sentence-transformers/all-MiniLM-L6-v2` | ~90 MB | 384 |
| `nomic-ai/nomic-embed-text-v1.5` | ~560 MB | 768 |
Changing the embedding model requires re-indexing. The tool will detect the mismatch and ask you to add `--kb-rebuild`.
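A minimal sketch of how that detection can work; the metadata file name and layout here are assumptions, not the project's actual implementation:

```python
import json
import tempfile
from pathlib import Path

def kb_index_matches(store_dir: Path, embedding_model: str) -> bool:
    """True if the cached index was built with this model; records it on first run."""
    meta_file = store_dir / "kb_meta.json"  # hypothetical metadata file
    if not meta_file.exists():
        store_dir.mkdir(parents=True, exist_ok=True)
        meta_file.write_text(json.dumps({"embedding_model": embedding_model}))
        return True
    cached = json.loads(meta_file.read_text())["embedding_model"]
    return cached == embedding_model

store = Path(tempfile.mkdtemp())
print(kb_index_matches(store, "BAAI/bge-small-en-v1.5"))  # True  (first index)
print(kb_index_matches(store, "BAAI/bge-base-en-v1.5"))   # False (ask for --kb-rebuild)
```

The check matters because vectors from different models have different dimensions and geometry, so mixing them in one index would silently break retrieval.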
Personalizes YouTube watch scores and podcast content. Create in the project root:
```
I'm interested in AI/ML engineering, startup strategy, and Python tooling.
I don't care about marketing or social media growth.
```

`podcast_config.yaml` controls podcast generation. Edit it to change voice, style, or sources:
```yaml
tts:
  engine: piper                   # piper | macos_say
  voice: en_US-lessac-medium      # Piper model name (or macOS voice name)
  voice_host2: en_US-ryan-medium  # Second voice for two_host mode
  speed: 1.0

podcast:
  style: solo            # solo | two_host
  max_articles: 5
  target_length: medium  # short (~3min) | medium (~7min) | long (~15min)

sources:
  feeds:
    - https://hnrss.org/newest?points=100
    - https://feeds.arstechnica.com/arstechnica/technology-lab
    - https://arxiv.org/rss/cs.AI
  web_search: true
```

When using Piper, you need to download voice model files (`.onnx` + `.onnx.json`) and place them in the project root. Browse available voices at https://github.com/rhasspy/piper/blob/master/VOICES.md.
```bash
# Example: download the default voices
python3 -m piper.download_voices en_US-ryan-medium
python3 -m piper.download_voices en_US-lessac-medium
```

The `voice` value in `podcast_config.yaml` must match the filename without `.onnx` (e.g. `en_US-lessac-medium`).
For macOS without extra setup, use `engine: macos_say` with a system voice name like `Samantha` or `Daniel`.
Supports Dutch and English audio. Output is always in English.