talk2agent

Give your terminal AI a voice.

Turn any terminal-based AI agent (Codex CLI, Claude Code, OpenRouter, Ollama, or local LLM shells) into a real-time voice assistant.

talk2agent captures microphone input, transcribes speech using Whisper, and injects the text directly into the active AI agent terminal session — no copy-paste required.


Demo

(Placeholder: add a short demo GIF here; a quick visual dramatically increases adoption.)

You speak → text appears in terminal → AI agent responds

Example prompts:

“Explain this Python error”
“Write a regex for email validation”
“Refactor this function”


Why?

Terminal AI agents are powerful, but typing prompts slows down interaction.

talk2agent removes the keyboard barrier and enables natural conversation with your coding agent.

Instead of:

think → type → edit → resend

you get:

think → speak → agent responds

This makes brainstorming, debugging, and prompt iteration significantly faster.


Features

  • Real-time speech-to-text transcription
  • Works with any terminal AI agent
  • No copy-paste required
  • Local-first capable
  • Whisper / Faster-Whisper backend
  • Low-latency interaction
  • Hands-free prompt engineering

Supported Agents

  • Codex CLI
  • Claude Code
  • OpenRouter CLI agents
  • Ollama shell agents
  • Any terminal-based LLM interface

Quick Start (3 minutes)

1. Clone

git clone https://github.com/Edwu0304/talk2agent.git
cd talk2agent

2. Install dependencies

python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -U pip
.\.venv\Scripts\python.exe -m pip install -r requirements.txt

On Linux/macOS, the virtual environment's interpreter lives at .venv/bin/python instead of .\.venv\Scripts\python.exe.

3. Run

.\.venv\Scripts\python.exe -m speech_ptt --device auto --model medium

Then start talk2agent.ahk (double-click the script; the hotkey listener uses AutoHotkey, which is Windows-only), press F9, focus your agent terminal, and speak. The transcribed text is sent automatically.


Requirements

  • Python 3.12+
  • Microphone input device
  • Windows / Linux / macOS
  • (Optional) NVIDIA GPU for faster Whisper inference

How It Works

Microphone
    ↓
Audio Recorder
    ↓
Whisper Transcription
    ↓
Text Stream Processor
    ↓
Terminal Injection
    ↓
AI Agent (Codex / Claude / Ollama / etc.)

talk2agent does not depend on a specific model or provider. It acts as a voice interface layer for any CLI-based LLM.
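The middle stage, turning streamed Whisper segments into a single clean prompt line before terminal injection, can be sketched in pure Python. The function name and exact behavior here are illustrative assumptions, not talk2agent's actual code:

```python
def merge_segments(segments):
    """Merge streamed transcription segments into one prompt string.

    Whisper emits text in short segments; before injecting into the
    terminal, stray whitespace is stripped and the segments are
    collapsed into a single line so the agent receives one coherent
    prompt rather than fragmented keystrokes.
    """
    parts = [s.strip() for s in segments if s and s.strip()]
    return " ".join(parts)


# Example: three dictated segments become one prompt line.
print(merge_segments(["Explain ", "this Python ", "error"]))
# → Explain this Python error
```

Empty or whitespace-only segments (common during pauses in speech) are dropped rather than injected.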


Use Cases

  • Talk to Codex while coding
  • Hands-free debugging
  • Accessibility support
  • Brainstorming with an LLM
  • Dictating code or documentation
  • Faster prompt iteration

Configuration (Optional)

You may modify:

  • Whisper model size
  • Input device selection
  • Injection delay
  • Silence threshold

(Details can be documented later in /docs.)
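As an illustration of one of these knobs, a silence threshold is typically an RMS level below which an audio frame counts as silence, ending the current utterance. A minimal sketch of that check (the helper name, parameter, and default value are assumptions for illustration, not talk2agent's documented options):

```python
import math

def is_silence(frame, threshold=0.01):
    """Return True when a frame of float samples (-1.0..1.0) has an
    RMS level below the silence threshold.

    Hypothetical helper showing how a configurable silence threshold
    could gate the recorder; a higher threshold cuts off recording
    sooner in noisy rooms, a lower one tolerates quiet speech.
    """
    if not frame:
        return True
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms < threshold


print(is_silence([0.0] * 160))       # silent frame → True
print(is_silence([0.5, -0.5] * 80))  # loud frame → False
```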


Security & Privacy

talk2agent can run fully locally.

Audio is processed on your machine when using local Whisper models. No voice data is sent to external servers unless your AI agent itself forwards the prompts to a remote API.


Roadmap

  • Push-to-talk mode
  • Wake-word activation
  • Streaming partial transcription
  • Multi-agent routing
  • VSCode extension
  • More robust paste strategies for different terminals

Contributing

PRs and ideas are welcome.

If you find a bug or want a feature, please open an issue.


License

MIT. See LICENSE.