Give your terminal AI a voice.
Turn any terminal-based AI agent (Codex CLI, Claude Code, OpenRouter, Ollama, or local LLM shells) into a real-time voice assistant.
talk2agent captures microphone input, transcribes speech using Whisper, and injects the text directly into the active AI agent's terminal session, with no copy-paste required.
(Add a short demo GIF here; a visual demo dramatically increases adoption.)
You speak → text appears in terminal → AI agent responds
Example use:
“Explain this Python error”
“Write a regex for email validation”
“Refactor this function”
Terminal AI agents are powerful, but typing prompts slows down interaction.
talk2agent removes the keyboard barrier and enables natural conversation with your coding agent.
Instead of:
think → type → edit → resend
you get:
think → speak → agent responds
This makes brainstorming, debugging, and prompt iteration significantly faster.
- Real-time speech-to-text transcription
- Works with any terminal AI agent
- No copy-paste required
- Local-first capable
- Whisper / Faster-Whisper backend
- Low-latency interaction
- Hands-free prompt engineering
- Codex CLI
- Claude Code
- OpenRouter CLI agents
- Ollama shell agents
- Any terminal-based LLM interface
```
git clone https://github.com/Edwu0304/talk2agent.git
cd talk2agent
python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -U pip
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
.\.venv\Scripts\python.exe -m speech_ptt --device auto --model medium
```

(Windows paths shown; on Linux/macOS, use `.venv/bin/python` in place of `.\.venv\Scripts\python.exe`.)

After starting talk2agent.ahk (double-click the script) and pressing F9, focus your agent terminal and speak; the transcribed text is sent automatically.
- Python 3.12+
- Microphone input device
- Windows / Linux / macOS
- (Optional) NVIDIA GPU for faster Whisper inference
```
Microphone
    ↓
Audio Recorder
    ↓
Whisper Transcription
    ↓
Text Stream Processor
    ↓
Terminal Injection
    ↓
AI Agent (Codex / Claude / Ollama / etc.)
```
talk2agent does not depend on a specific model or provider. It acts as a voice interface layer for any CLI-based LLM.
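The flow above can be sketched as a chain of small functions. This is purely illustrative: the capture, Whisper, and keystroke-injection steps are stubbed out, and none of these names come from the actual talk2agent codebase.

```python
# Illustrative sketch of the pipeline stages. All function names are
# hypothetical; the real audio capture, Whisper inference, and terminal
# injection are stubbed with placeholders.

def record_audio() -> bytes:
    """Stub: capture microphone frames until silence is detected."""
    return b"\x00\x01" * 1600  # placeholder PCM data

def transcribe(audio: bytes) -> str:
    """Stub: run a Whisper model over the captured audio."""
    return "explain this python error"

def clean_text(text: str) -> str:
    """Normalize the transcript before it reaches the terminal."""
    return text.strip().capitalize()

def inject_into_terminal(text: str) -> None:
    """Stub: send the text as keystrokes to the focused agent terminal."""
    print(text)

def run_once() -> str:
    """One voice turn: record -> transcribe -> clean -> inject."""
    text = clean_text(transcribe(record_audio()))
    inject_into_terminal(text)
    return text

run_once()  # prints "Explain this python error"
```

Keeping each stage as a separate function is what lets the transcription backend or the injection strategy be swapped without touching the rest of the pipeline.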
- Talk to Codex while coding
- Hands-free debugging
- Accessibility support
- Brainstorming with an LLM
- Dictating code or documentation
- Faster prompt iteration
You may modify:
- Whisper model size
- Input device selection
- Injection delay
- Silence threshold
(Details can be documented later in /docs.)
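As a sketch of how these knobs might fit together — the option names below are assumptions for illustration, not the project's actual settings — along with a simple energy-based silence check:

```python
# Hypothetical configuration object; field names are assumptions, not the
# project's real option names.
from dataclasses import dataclass
import math

@dataclass
class Talk2AgentConfig:
    model_size: str = "medium"       # Whisper model: tiny/base/small/medium/large
    input_device: str = "auto"       # microphone selection
    injection_delay_ms: int = 50     # pause before keystrokes reach the terminal
    silence_threshold: float = 0.01  # RMS level below which audio counts as silence

def is_silence(samples: list[float], threshold: float) -> bool:
    """Energy-based check: RMS of normalized samples vs. the configured threshold."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold

cfg = Talk2AgentConfig()
print(is_silence([0.0] * 100, cfg.silence_threshold))       # True  (quiet frame)
print(is_silence([0.5, -0.5] * 50, cfg.silence_threshold))  # False (loud frame)
```

A larger silence threshold ends recording sooner in noisy rooms; a smaller one avoids cutting off quiet speakers.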
talk2agent can run fully locally.
Audio is processed on your machine when using local Whisper models. No voice data is sent to external servers unless your AI agent itself forwards the prompts to a remote API.
- Push-to-talk mode
- Wake-word activation
- Streaming partial transcription
- Multi-agent routing
- VSCode extension
- More robust paste strategies for different terminals
PRs and ideas are welcome.
If you find a bug or want a feature, please open an issue.
MIT. See LICENSE.