Audio Summarizer & Podcast Generator

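A local CLI tool that transcribes meetings, summarizes YouTube videos, or generates personalized tech podcasts. Transcription, summarization, and speech synthesis run on your machine with no cloud APIs. The network is used only to download models, YouTube audio, and (in podcast mode) articles from RSS feeds and web search.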

Requirements

Python 3.10+
Ollama running locally with a model pulled (default: llama3.1:8b)
ffmpeg
TTS engine for podcast audio: macOS say (built-in, zero setup) or Piper TTS

Setup

pip install -r requirements.txt
ollama pull llama3.1:8b

Modes

1. Summarize a local audio file

python src/main.py meeting.mp3

Output: output/meeting/transcript.md, summary.md, remarks.md

2. Summarize a YouTube video

python src/main.py --youtube "https://www.youtube.com/watch?v=abc123"

Audio is downloaded to data/ and output goes to output/<video_title>/. For YouTube videos, remarks.md also includes a Watch Recommendation Score (1-10).

3. Generate a podcast

python src/main.py --podcast

Fetches recent articles from RSS feeds and web search, filters by your interests, generates a podcast script, and converts it to audio.

Output: output/podcast_<date>/podcast.wav, script.md, sources.md

Requires interest.md — see below.

Options

AUDIO_FILE             Path to a local audio file (omit when using --youtube or --podcast)
--youtube, -yt         YouTube URL to download and process
--podcast              Generate a podcast from your interests
--kb                   Knowledge base directory for context-aware summaries
--kb-rebuild           Force re-index the knowledge base
--embedding-model      Fastembed model for KB embeddings (default: BAAI/bge-small-en-v1.5)
--model, -m            Whisper model size (default: medium)
--output-dir, -o       Output directory (default: output/<name>/)
--llm-model            Ollama model (default: llama3.1:8b)
--language, -l         Audio language: auto, nl, en (default: auto)
--chunk-minutes        Chunk size in minutes (default: 10)
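
The --chunk-minutes option implies that long recordings are processed in fixed-size pieces. A minimal sketch of how chunk boundaries could be computed, assuming the chunks are contiguous and non-overlapping. The function name chunk_bounds is hypothetical, and the tool's actual splitting logic may differ.

```python
def chunk_bounds(duration_s: float, chunk_minutes: int = 10) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if chunk_minutes <= 0:
        raise ValueError("chunk_minutes must be positive")
    step = chunk_minutes * 60  # chunk length in seconds
    bounds = []
    start = 0.0
    while start < duration_s:
        # The last chunk is shortened so it never runs past the end.
        end = min(start + step, duration_s)
        bounds.append((start, end))
        start = end
    return bounds
```

With the default of 10 minutes, a 25-minute recording yields two full chunks and one 5-minute remainder.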

GPU acceleration on macOS (Apple Silicon)

On Apple Silicon Macs, the tool automatically uses mlx-whisper for GPU-accelerated transcription via Apple's MLX framework. This is significantly faster than the CPU-based faster-whisper backend.

Automatic: If mlx-whisper is installed and you're on macOS, it's used by default
Override: Set WHISPER_BACKEND=faster-whisper to force CPU, or WHISPER_BACKEND=mlx to force MLX
Models are downloaded automatically from HuggingFace on first use

| CLI Model | MLX HuggingFace Repo |
| --- | --- |
| `tiny` | `mlx-community/whisper-tiny` |
| `base` | `mlx-community/whisper-base` |
| `small` | `mlx-community/whisper-small` |
| `medium` | `mlx-community/whisper-medium` |
| `large-v2` | `mlx-community/whisper-large-v2` |
| `large-v3` | `mlx-community/whisper-large-v3-turbo` |
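
The selection rules above (environment override first, then automatic detection) can be sketched roughly as follows. This is an illustration under assumptions, not the tool's actual code: the function name pick_backend and its parameters are hypothetical, and the check for an installed mlx-whisper assumes the package is importable as mlx_whisper.

```python
import importlib.util
import os
import platform

# CLI model name -> MLX repo, copied from the table above.
MLX_REPOS = {
    "tiny": "mlx-community/whisper-tiny",
    "base": "mlx-community/whisper-base",
    "small": "mlx-community/whisper-small",
    "medium": "mlx-community/whisper-medium",
    "large-v2": "mlx-community/whisper-large-v2",
    "large-v3": "mlx-community/whisper-large-v3-turbo",
}


def pick_backend(env=None, system=None, mlx_installed=None) -> str:
    """Return "mlx" or "faster-whisper" (hypothetical helper)."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    if mlx_installed is None:
        # Assumption: the mlx-whisper package is importable as "mlx_whisper".
        mlx_installed = importlib.util.find_spec("mlx_whisper") is not None

    # An explicit WHISPER_BACKEND value always wins.
    override = env.get("WHISPER_BACKEND", "").strip().lower()
    if override in ("mlx", "faster-whisper"):
        return override

    # Otherwise prefer MLX on macOS when it is available.
    if system == "Darwin" and mlx_installed:
        return "mlx"
    return "faster-whisper"
```

Passing env, system, and mlx_installed explicitly makes the rules easy to check without touching the real environment.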

4. Knowledge base (RAG)

Add a --kb flag pointing to a directory of reference documents to make summaries and podcasts more domain-aware:

python src/main.py meeting.mp3 --kb ./my_docs/
python src/main.py --youtube "URL" --kb ./my_docs/
python src/main.py --podcast --kb ./my_docs/

For meetings and YouTube videos, relevant KB content is injected into the summarization prompts. For podcasts, fetched articles are discussed in the context of your knowledge base.

Supported formats: .txt, .md, .pdf, .docx, .html, .csv

On first run, documents are chunked, embedded, and stored in a local Qdrant vector store (data/kb_store/). Subsequent runs reuse the cached index. Use --kb-rebuild to re-index when files change:

python src/main.py meeting.mp3 --kb ./my_docs/ --kb-rebuild
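
The indexing step described above starts by splitting each document into chunks before embedding. A minimal sketch of fixed-size chunking with overlap follows. The chunk size, the overlap, and the function name chunk_text are assumptions for illustration, and the embedding and Qdrant storage steps are left out.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():  # skip windows that contain only whitespace
            chunks.append(piece)
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of storing some text twice.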

Embedding model

By default the KB uses BAAI/bge-small-en-v1.5 (~130 MB, 384 dimensions). For better retrieval quality, use a larger model:

python src/main.py meeting.mp3 --kb ./my_docs/ --embedding-model BAAI/bge-base-en-v1.5
python src/main.py meeting.mp3 --kb ./my_docs/ --embedding-model BAAI/bge-large-en-v1.5

Popular fastembed models (downloaded automatically on first use):

| Model | Size | Dimensions |
| --- | --- | --- |
| `BAAI/bge-small-en-v1.5` (default) | ~130 MB | 384 |
| `BAAI/bge-base-en-v1.5` | ~440 MB | 768 |
| `BAAI/bge-large-en-v1.5` | ~1.2 GB | 1024 |
| `sentence-transformers/all-MiniLM-L6-v2` | ~90 MB | 384 |
| `nomic-ai/nomic-embed-text-v1.5` | ~560 MB | 768 |

Changing the embedding model requires re-indexing. The tool will detect the mismatch and ask you to add --kb-rebuild.
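
One way such a mismatch check can work is to record the embedding model name next to the index and compare it on later runs. The sketch below is written under assumptions: the meta.json file name, its layout, and both function names are hypothetical and do not describe the tool's actual storage format.

```python
import json
from pathlib import Path

META_NAME = "meta.json"  # hypothetical file name, not the tool's real layout


def save_index_meta(store_dir: Path, model: str) -> None:
    """Record which embedding model produced the index."""
    store_dir.mkdir(parents=True, exist_ok=True)
    (store_dir / META_NAME).write_text(json.dumps({"embedding_model": model}))


def needs_rebuild(store_dir: Path, model: str) -> bool:
    """True if there is no index yet or it was built with a different model."""
    meta_path = store_dir / META_NAME
    if not meta_path.exists():
        return True
    stored = json.loads(meta_path.read_text()).get("embedding_model")
    return stored != model
```

Vectors from different models are not comparable (the table above shows they can even differ in dimension), which is why a changed model name has to invalidate the whole index.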

Configuration

`interest.md`

Personalizes YouTube watch scores and podcast content. Create it in the project root:

I'm interested in AI/ML engineering, startup strategy, and Python tooling.
I don't care about marketing or social media growth.

`podcast_config.yaml`

Controls podcast generation. Edit to change voice, style, or sources:

tts:
  engine: piper                     # piper | macos_say
  voice: en_US-lessac-medium        # Piper model name (or macOS voice name)
  voice_host2: en_US-ryan-medium    # Second voice for two_host mode
  speed: 1.0

podcast:
  style: solo                       # solo | two_host
  max_articles: 5
  target_length: medium             # short (~3min) | medium (~7min) | long (~15min)

sources:
  feeds:
    - https://hnrss.org/newest?points=100
    - https://feeds.arstechnica.com/arstechnica/technology-lab
    - https://arxiv.org/rss/cs.AI
  web_search: true
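
Several of these fields accept only a fixed set of values, so it can help to validate the config before a long run. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load). The function name validate_config is hypothetical, the allowed values are taken from the comments in the example above, and treating voice_host2 as required for two_host is an assumption.

```python
# Allowed values, taken from the comments in the example config.
ALLOWED = {
    ("tts", "engine"): {"piper", "macos_say"},
    ("podcast", "style"): {"solo", "two_host"},
    ("podcast", "target_length"): {"short", "medium", "long"},
}


def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        value = (cfg.get(section) or {}).get(key)
        if value not in allowed:
            problems.append(f"{section}.{key}: {value!r} not in {sorted(allowed)}")

    # Assumption: two_host style needs a second voice to be configured.
    style = (cfg.get("podcast") or {}).get("style")
    if style == "two_host" and not (cfg.get("tts") or {}).get("voice_host2"):
        problems.append("tts.voice_host2 is required for two_host style")
    return problems
```

Returning a list instead of raising on the first error lets all problems be reported in one pass.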

Piper TTS voice setup

When using Piper, you need to download voice model files (.onnx + .onnx.json) and place them in the project root. Browse available voices at https://github.com/rhasspy/piper/blob/master/VOICES.md.

# Example: download the two voices referenced in podcast_config.yaml
python3 -m piper.download_voices en_US-ryan-medium
python3 -m piper.download_voices en_US-lessac-medium

The voice value in podcast_config.yaml must match the filename without .onnx (e.g. en_US-lessac-medium).
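
A quick pre-flight check for the naming rule above could look like this. It is a sketch only: the function name missing_voice_files is hypothetical, and it assumes the voice files sit directly in the directory passed as root.

```python
from pathlib import Path


def missing_voice_files(voice: str, root: Path = Path(".")) -> list[Path]:
    """Return the Piper model files for `voice` that are not present in `root`."""
    expected = [root / f"{voice}.onnx", root / f"{voice}.onnx.json"]
    return [p for p in expected if not p.is_file()]
```

An empty result means both the .onnx model and its .onnx.json config were found for that voice name.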

For macOS without extra setup, use engine: macos_say with a system voice name like Samantha or Daniel.

Supports Dutch and English audio. Output is always in English.
