Voice Loop

A minimal on-device voice agent loop. Runs entirely on Mac M4 / Apple Silicon.

Need a custom voice model or production voice agent? See Trelis Voice AI Services.

Features

Smart turn detection — Silero VAD + pipecat's Smart Turn v3, so the agent waits when you pause mid-sentence
Voice interruption — speak over the agent; WebRTC AEC3 cancels echo from speakers so your voice cuts through
Editable persona — SOUL.md controls the agent's style, live-reloaded each turn
Optional long-term memory — enable with --memory; the agent learns durable facts about you in MEMORY.md and consolidates every 5 turns
Fully local — no API keys, no cloud. Everything runs on-device

Stack

Moonshine (CPU) for speech-to-text transcription
Gemma 4 E4B (MLX/Metal) for response generation
Kokoro (CPU) for TTS (streaming)
Silero VAD + Smart Turn v3 for turn detection
WebRTC AEC3 (via LiveKit APM) for voice interruption

Setup

brew install portaudio espeak-ng
git clone https://github.com/TrelisResearch/voice-loop.git
cd voice-loop
uv sync

First run downloads Gemma 4 E4B (~3 GB), Moonshine (~250 MB), and Kokoro (~300 MB).

Usage

# Recommended defaults (TTS + smart turn + voice interrupt all on)
uv run voice_loop_mac.py

# + chime on utterance + soft ticks while generating
uv run voice_loop_mac.py --chime

# + persistent memory (reads/writes MEMORY.md)
uv run voice_loop_mac.py --memory

# Text-only mode (no TTS)
uv run voice_loop_mac.py --no-tts

# Disable voice interruption (keypress only)
uv run voice_loop_mac.py --no-aec

# Different voice (see below)
uv run voice_loop_mac.py --voice bf_emma

# Use the smaller E2B model (faster, slightly lower quality)
uv run voice_loop_mac.py --model mlx-community/gemma-4-E2B-it-4bit

# Custom silence timeout
uv run voice_loop_mac.py --silence-ms 500

# Debug: record mic stream to a WAV
uv run voice_loop_mac.py --record
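
The flags above could be modelled with a parser along the following lines. This is a hypothetical sketch using Python's standard `argparse`, not the script's actual argument handling: the `build_parser` name, the `dest` names, and the `None` placeholders for `--model` and `--silence-ms` are assumptions, since the real defaults for those two are not stated here. Only the `af_heart` default voice comes from the voice table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in Usage."""
    p = argparse.ArgumentParser(prog="voice_loop_mac.py")
    p.add_argument("--chime", action="store_true",
                   help="chime on utterance, soft ticks while generating")
    p.add_argument("--memory", action="store_true",
                   help="read/write MEMORY.md")
    # TTS and voice interruption are on by default; the flags switch them off.
    p.add_argument("--no-tts", dest="tts", action="store_false",
                   help="text-only mode")
    p.add_argument("--no-aec", dest="aec", action="store_false",
                   help="disable voice interruption (keypress only)")
    p.add_argument("--voice", default="af_heart", help="Kokoro voice ID")
    # None stands in for "use the script's own default" (value not documented here).
    p.add_argument("--model", default=None, help="MLX model repo")
    p.add_argument("--silence-ms", type=int, default=None,
                   help="custom silence timeout in milliseconds")
    p.add_argument("--record", action="store_true",
                   help="debug: record mic stream to a WAV")
    return p


# Example: text-only mode, UK voice, 500 ms silence timeout.
args = build_parser().parse_args(
    ["--no-tts", "--voice", "bf_emma", "--silence-ms", "500"]
)
```

Flags can be combined freely in this sketch, for example `--memory --chime`.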

Recommended Kokoro voices

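Only the higher-quality voices are listed here. Grades and training codes (HH, H, MM, roughly from most to least training audio) are those given in Kokoro's voice listing: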

| Voice | Accent | Gender | Notes |
| --- | --- | --- | --- |
| `af_heart` | US | Female | Top pick, Grade A (default) |
| `af_bella` | US | Female | Grade A-, HH training |
| `bf_emma` | UK | Female | Grade B-, HH training |
| `am_fenrir` | US | Male | Grade C+, H training |
| `am_puck` | US | Male | Grade C+, H training |
| `am_michael` | US | Male | Grade C+, H training |
| `bm_fable` | UK | Male | Grade C, MM training |
| `bm_george` | UK | Male | Grade C, MM training |
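
The voice IDs in the table follow a visible pattern: the first letter matches the accent (`a` for US, `b` for UK) and the second the gender (`f` or `m`). A minimal sketch that decodes an ID on that basis is shown below. The `describe_voice` helper is hypothetical and only covers the prefixes that appear in this table; Kokoro ships other voices that it does not handle.

```python
ACCENTS = {"a": "US", "b": "UK"}          # prefixes seen in the table above
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> tuple[str, str, str]:
    """Split a voice ID such as 'bf_emma' into (accent, gender, name)."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    try:
        return ACCENTS[prefix[0]], GENDERS[prefix[1]], name
    except KeyError:
        # Prefix not covered by this sketch (not one of the listed voices).
        raise ValueError(f"unknown voice prefix: {prefix!r}") from None


accent, gender, name = describe_voice("bf_emma")
```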

Architecture

   Mic (16kHz) ──► Silero VAD ──► Smart Turn ──► Moonshine ──► Gemma 4 E4B ──► Kokoro ──► Speakers
                                                                    ▲                         │
                                                        SOUL.md + MEMORY.md                   │
                                                                                              ▼
   Mic during TTS ──► WebRTC AEC3 (LiveKit APM) ──► Silero VAD ──► voice interrupt ◄──────────┘

How it works

Mic capture via sounddevice (16kHz mono)
Silero VAD detects speech vs silence
Smart Turn confirms end-of-turn on silence (default on)
Moonshine transcribes your audio to text (CPU)
Gemma 4 E4B responds using SOUL.md (+ MEMORY.md if --memory) as system prompt
Kokoro synthesizes speech, streams audio
WebRTC AEC3 cleans mic during TTS playback → Silero VAD on cleaned audio → voice interrupt
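
Steps 2 and 3 above (VAD, then Smart Turn on silence) can be sketched as a small endpointing loop. This is an illustration under stated assumptions, not the code in `voice_loop_mac.py`: `find_end_of_turn` and `turn_complete` are hypothetical names, the per-frame speech flags stand in for thresholded Silero VAD output, the 32 ms frame length assumes 512-sample frames at 16 kHz, and resetting the silence counter after a rejected end-of-turn is a design guess.

```python
from typing import Callable, Iterable, Optional

FRAME_MS = 32  # assumption: 512 samples per frame at 16 kHz


def find_end_of_turn(
    speech_flags: Iterable[bool],
    silence_ms: int,
    turn_complete: Callable[[int], bool] = lambda frame_index: True,
    frame_ms: int = FRAME_MS,
) -> Optional[int]:
    """Return the frame index where the turn ends, or None if it never does.

    speech_flags  -- one bool per frame (True = VAD says speech)
    silence_ms    -- how much trailing silence triggers an end-of-turn check
    turn_complete -- stand-in for the Smart Turn model; called with the frame index
    """
    needed = max(1, -(-silence_ms // frame_ms))  # ceiling division
    heard_speech = False
    silent_frames = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_frames = 0
            continue
        if not heard_speech:
            continue  # ignore silence before the user starts talking
        silent_frames += 1
        if silent_frames >= needed:
            if turn_complete(i):
                return i
            # Probably a mid-sentence pause: keep listening.
            silent_frames = 0
    return None


# 3 silent frames, 10 speech frames, then a long pause.
flags = [False] * 3 + [True] * 10 + [False] * 40
end = find_end_of_turn(flags, silence_ms=500)
```

Passing a `turn_complete` callback that returns `False` makes the loop wait for another full silence window before asking again, which is the "agent waits when you pause mid-sentence" behaviour in miniature.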

Press any key during TTS to interrupt.
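
For the voice interrupt path, running VAD on the echo-cancelled mic signal still benefits from some debouncing so that a single noisy frame does not cut the agent off. The sketch below shows one common approach, requiring several consecutive speech frames. It is an assumption about how such a check could look, not the project's logic: `should_interrupt`, the `0.5` threshold, and the three-frame minimum are all hypothetical.

```python
from typing import Iterable


def should_interrupt(
    speech_probs: Iterable[float],
    threshold: float = 0.5,   # hypothetical VAD probability cut-off
    min_frames: int = 3,      # hypothetical number of consecutive speech frames
) -> bool:
    """Return True once `min_frames` consecutive frames exceed `threshold`.

    speech_probs -- per-frame speech probabilities computed on the
                    echo-cancelled mic signal during TTS playback
    """
    run = 0
    for prob in speech_probs:
        run = run + 1 if prob >= threshold else 0
        if run >= min_frames:
            return True
    return False


# Isolated spikes (residual echo, a click) are ignored...
blip = should_interrupt([0.1, 0.9, 0.2, 0.9, 0.9, 0.1])
# ...while sustained speech triggers the interrupt.
speech = should_interrupt([0.1, 0.9, 0.9, 0.9])
```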

Persona & Memory

SOUL.md — persona / style (always loaded, live-reloaded each turn)
MEMORY.md — long-term facts. Only read/written when --memory is passed. When enabled, the agent extracts new durable facts after each turn and consolidates every 5 turns.

Both files are re-read at the start of every turn (MEMORY.md only when --memory is passed), so edits take effect immediately.
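
The per-turn reload and the five-turn consolidation schedule can be sketched as follows. This is a minimal illustration, not the project's implementation: `build_system_prompt`, `should_consolidate`, and the "Known facts about the user" heading are hypothetical, and the real prompt layout may differ.

```python
import tempfile
from pathlib import Path

CONSOLIDATE_EVERY = 5  # from the text: memory is consolidated every 5 turns


def build_system_prompt(soul_path: Path, memory_path: Path, use_memory: bool) -> str:
    """Re-read the files on every call so edits apply on the next turn."""
    parts = [soul_path.read_text(encoding="utf-8").strip()]
    if use_memory and memory_path.exists():
        memory = memory_path.read_text(encoding="utf-8").strip()
        if memory:
            # Hypothetical heading; the real prompt format is not documented here.
            parts.append("Known facts about the user:\n" + memory)
    return "\n\n".join(parts)


def should_consolidate(turn_number: int) -> bool:
    """True on turns 5, 10, 15, ... (turns counted from 1)."""
    return turn_number > 0 and turn_number % CONSOLIDATE_EVERY == 0


# Demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    soul = Path(tmp) / "SOUL.md"
    memory = Path(tmp) / "MEMORY.md"
    soul.write_text("Be brief.", encoding="utf-8")
    first = build_system_prompt(soul, memory, use_memory=True)   # no MEMORY.md yet
    memory.write_text("- Prefers metric units\n", encoding="utf-8")
    second = build_system_prompt(soul, memory, use_memory=True)  # picked up on next call
    ignored = build_system_prompt(soul, memory, use_memory=False)
```

Because nothing is cached between calls, editing either file mid-conversation changes the very next prompt.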

Memory usage

~3.5 GB total, which fits easily on a machine with 16 GB of memory.

Credits

Built with:

Moonshine — STT
Kokoro — TTS
Silero VAD — voice activity detection
Smart Turn v3 — end-of-turn detection
LiveKit APM — WebRTC AEC3
mlx-vlm — MLX multimodal inference
Gemma 4 — LLM

License

Apache 2.0.
