talk2agent

Give your terminal AI a voice.

Turn any terminal-based AI agent (Codex CLI, Claude Code, OpenRouter, Ollama, or local LLM shells) into a real-time voice assistant.

talk2agent captures microphone input, transcribes speech using Whisper, and injects the text directly into the active AI agent terminal session — no copy-paste required.


Demo

(Placeholder: add a short demo GIF here; a quick visual dramatically increases adoption.)

You speak → text appears in terminal → AI agent responds

Example prompts:

“Explain this Python error”
“Write a regex for email validation”
“Refactor this function”


Why?

Terminal AI agents are powerful, but typing prompts slows down interaction.

talk2agent removes the keyboard barrier and enables natural conversation with your coding agent.

Instead of:

think → type → edit → resend

you get:

think → speak → agent responds

This makes brainstorming, debugging, and prompt iteration significantly faster.


Features

  • Real-time speech-to-text transcription
  • Works with any terminal AI agent
  • No copy-paste required
  • Local-first capable
  • Whisper / Faster-Whisper backend
  • Low-latency interaction
  • Hands-free prompt engineering

Supported Agents

  • Codex CLI
  • Claude Code
  • OpenRouter CLI agents
  • Ollama shell agents
  • Any terminal-based LLM interface

Quick Start (3 minutes)

1. Clone

git clone https://github.com/Edwu0304/talk2agent.git
cd talk2agent

2. Install dependencies

python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -U pip
.\.venv\Scripts\python.exe -m pip install -r requirements.txt

On Linux/macOS, the virtual environment's interpreter lives at .venv/bin/python instead of .\.venv\Scripts\python.exe.

3. Run

.\.venv\Scripts\python.exe -m speech_ptt --device auto --model medium

Then start talk2agent.ahk (double-click the script; the hotkey listener uses AutoHotkey, which is Windows-only), press F9, focus your agent terminal, and speak. The transcribed text is sent automatically.


Requirements

  • Python 3.12+
  • Microphone input device
  • Windows / Linux / macOS
  • (Optional) NVIDIA GPU for faster Whisper inference

How It Works

Microphone
    ↓
Audio Recorder
    ↓
Whisper Transcription
    ↓
Text Stream Processor
    ↓
Terminal Injection
    ↓
AI Agent (Codex / Claude / Ollama / etc.)

talk2agent does not depend on a specific model or provider. It acts as a voice interface layer for any CLI-based LLM.
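The middle stage, turning streamed Whisper segments into a single clean prompt line before terminal injection, can be sketched in pure Python. The function name and exact behavior here are illustrative assumptions, not talk2agent's actual code:

```python
def merge_segments(segments):
    """Merge streamed transcription segments into one prompt string.

    Whisper emits text in short segments; before injecting into the
    terminal, stray whitespace is stripped and the segments are
    collapsed into a single line so the agent receives one coherent
    prompt rather than fragmented keystrokes.
    """
    parts = [s.strip() for s in segments if s and s.strip()]
    return " ".join(parts)


# Example: three dictated segments become one prompt line.
print(merge_segments(["Explain ", "this Python ", "error"]))
# → Explain this Python error
```

Empty or whitespace-only segments (common during pauses in speech) are dropped rather than injected.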


Use Cases

  • Talk to Codex while coding
  • Hands-free debugging
  • Accessibility support
  • Brainstorming with an LLM
  • Dictating code or documentation
  • Faster prompt iteration

Configuration (Optional)

You may modify:

  • Whisper model size
  • Input device selection
  • Injection delay
  • Silence threshold

(Details can be documented later in /docs.)
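As an illustration of one of these knobs, a silence threshold is typically an RMS level below which an audio frame counts as silence, ending the current utterance. A minimal sketch of that check (the helper name, parameter, and default value are assumptions for illustration, not talk2agent's documented options):

```python
import math

def is_silence(frame, threshold=0.01):
    """Return True when a frame of float samples (-1.0..1.0) has an
    RMS level below the silence threshold.

    Hypothetical helper showing how a configurable silence threshold
    could gate the recorder; a higher threshold cuts off recording
    sooner in noisy rooms, a lower one tolerates quiet speech.
    """
    if not frame:
        return True
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms < threshold


print(is_silence([0.0] * 160))       # silent frame → True
print(is_silence([0.5, -0.5] * 80))  # loud frame → False
```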


Security & Privacy

talk2agent can run fully locally.

Audio is processed on your machine when using local Whisper models. No voice data is sent to external servers unless your AI agent itself forwards the prompts to a remote API.


Roadmap

  • Push-to-talk mode
  • Wake-word activation
  • Streaming partial transcription
  • Multi-agent routing
  • VSCode extension
  • More robust paste strategies for different terminals

Contributing

PRs and ideas are welcome.

If you find a bug or want a feature, please open an issue.


License

MIT. See LICENSE.