The final AI voice conversational system, all running in your terminal! vtmate is a powerful terminal-based voice AI toolkit with many realistic voices, extremely low latency, and support for 28 languages. It lets you hold voice conversations with local AI models, pipe data, and save the results to files.
The program is self-contained (1.2GB): it bundles all TTS models, voices, and the files needed for speech recognition and synthesis, with no external installations required, ensuring maximum portability.
- ⬇️ Download (⭐ MacOS ⭐ Linux and ⭐ Windows supported)
- 🤠 Quicksheet (PDF) (🖨️ print ready for easy access)
- 🎥 Video Overview
(🇬🇧 English) Conversation mode demo
(🇬🇧 English) Debate mode demo
(🇬🇧 English) Reading mode demo
- 📌 Continuous voice chat (live conversation): records the user continuously, stops on silence, and submits the request to the agent
- 📌 Push-to-talk mode (PTT): keep `<SPACE>` pressed while talking and release it to stop recording
- 🚀 AI agent debate (2 agents talking to each other): give an initial input and let the agents talk to each other. You can interrupt in the middle of the debate to change the subject
- 📌 Realtime agent swap: change the agent by pressing `<ARROW_LEFT>` / `<ARROW_RIGHT>` (applies to the next response)
- 📌 Voice interrupt: the agent stops talking if you interrupt it by voice
- 📌 Recording pause / resume: press `<SPACE>` to toggle pause / resume of voice recording only
- 📌 Stop playback: press `<ESCAPE>` once to stop playback of the current response
- 📌 Interrupt: press `<ESCAPE>` twice to interrupt the current response altogether
- 📌 Voice speed change: change the agent's voice speed by pressing `<ARROW_UP>` / `<ARROW_DOWN>` (applies to the next response)
- 📌 Voice-read a txt file: `vtmate -r myfile.txt`
- 📌 Voice-read text from STDIN, phrase by phrase: `echo "Hello. How are you?" | vtmate -r -`
- 📌 Save the conversation as audio and text: `vtmate -s`
- 📌 Load a separate settings file with different agents: `vtmate -c philosophers-settings.txt`
- 📌 Integrated `whisper` speech to text
- 📌 Integrated `kokoro` TTS system
- 📌 Interface with the `OpenTTS` system
- 📌 Supports `ollama` or `llama-server`
- 📌 28 languages supported (`vtmate --list-voices`)
- 📌 Use any gguf model from huggingface.co or ollama models (small models reply faster)
- You start the program and start talking
- Once audio is detected (based on the `sound_threshold_peak` option) it starts recording
- As soon as there is a period of silence (based on the `end_silence_ms` option), it transcribes the recorded audio using the speech-to-text system (whisper). In PTT mode this option is ignored; the program waits for the `<SPACE>` key to be released before submitting the audio
- The transcribed text is sent to the AI model
- The AI model replies with text
- The text is converted to audio using the text-to-speech system
- You can interrupt the AI agent at any moment by starting to speak; the response and audio stop and you can continue talking
- In debate mode, the agents reply to each other automatically, playing the audio on each turn
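The live-mode flow above (record until `end_silence_ms` of silence, gated by `sound_threshold_peak`) can be sketched as a simple endpointing loop. This is an illustrative sketch of the technique, not vtmate's actual implementation; the chunk source and the exact silence logic are assumptions.

```python
# Sketch of live-mode endpointing: collect audio chunks until
# `end_silence_ms` of consecutive silence, where a chunk counts as
# silence when its peak amplitude is below `sound_threshold_peak`.
# Illustrative only -- not vtmate's real code.

def record_until_silence(chunks, sound_threshold_peak=0.12,
                         end_silence_ms=2500, chunk_ms=100):
    recorded = []
    silent_ms = 0
    started = False
    for chunk in chunks:            # each chunk: list of samples in [-1, 1]
        peak = max((abs(s) for s in chunk), default=0.0)
        if peak >= sound_threshold_peak:
            started = True          # speech detected -> start/continue recording
            silent_ms = 0
        elif started:
            silent_ms += chunk_ms
            if silent_ms >= end_silence_ms:
                break               # long enough silence -> stop and transcribe
        if started:
            recorded.append(chunk)
    return recorded
```

Raising `sound_threshold_peak` makes the gate less sensitive to background noise; raising `end_silence_ms` lets you pause longer mid-sentence without submitting the audio.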
- ✅ ollama (default)
- ✅ llama-server
You can run the models locally (the default) or remotely by configuring the base URLs via the CLI options.
- ✅ Kokoro (integrated)
- ✅ Supersonic 2 (integrated)
- ✅ OpenTTS (requires external docker service)
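Once the OpenTTS docker service is running, it is reachable over a small HTTP API. The sketch below builds a request URL for it; it assumes the standard OpenTTS `/api/tts` endpoint on port 5500, and the voice name used here is just a placeholder (check `vtmate --list-voices` or your OpenTTS instance for real names).

```python
# Build a request URL for a local OpenTTS service.
# Sketch only; assumes the standard OpenTTS HTTP API on port 5500
# and a hypothetical voice name -- verify against your deployment.
from urllib.parse import urlencode

def opentts_url(text, voice="larynx:some-voice",
                base="http://127.0.0.1:5500"):
    query = urlencode({"voice": voice, "text": text})
    return f"{base}/api/tts?{query}"

# To actually synthesize (requires the docker service to be running):
#   import urllib.request
#   wav_bytes = urllib.request.urlopen(opentts_url("Hello there")).read()
```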
- Download the binary for your platform from https://github.com/DavidValin/vtmate/releases
- Move the binary to a folder in your `$PATH` so you can use the `vtmate` command anywhere
Option A - ollama (the default)

- Install ollama: https://ollama.com/download
- Pull the model you want to use with vtmate, for instance: `ollama pull llama3.2:3b`

Option B - llama-server

- Install llama.cpp: https://github.com/ggml-org/llama.cpp
- Download a gguf model, e.g.: https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true
- Install Windows Terminal (which supports emojis): https://apps.microsoft.com/detail/9n0dx20hk701 (use this terminal to run vtmate)
docker pull synesthesiam/opentts:all
The first time you run vtmate, it creates a configuration file at ~/.vtmate/settings (if it doesn't already exist) with 2 agents. You can define as many agents as you want.
Example of agent definition:
[agent]
name = explainer
language = en
tts = supersonic2
voice = F1
voice_speed = 1.1
provider = ollama
baseurl = http://127.0.0.1:11434
model = llama3.2:3b
system_prompt = "You are a helpful AI assistant. Your only function is to explain things as simply as possible in no more than 150 words, or 450 words if the user asks for a longer explanation."
sound_threshold_peak = 0.12
end_silence_ms = 2500
ptt = true
whisper_model_path = ~/.whisper-models/ggml-tiny.bin
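The settings file uses a simple INI-like layout with repeated `[agent]` sections. As a rough sketch of how such a file can be read into per-agent dictionaries (illustrative only; vtmate's actual parser may differ):

```python
# Parse an INI-like settings file with repeated [agent] sections into a
# list of {key: value} dicts. Illustrative sketch, not vtmate's parser.

def parse_agents(text):
    agents = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line == "[agent]":
            current = {}                  # start a new agent section
            agents.append(current)
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip().strip('"')
    return agents
```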
- By default all agents are set to PTT mode: you have to keep `SPACE` pressed to talk. If you want to use LIVE mode, make sure you adjust your microphone levels correctly and tune the `sound_threshold_peak` and `end_silence_ms` settings to your needs.
- ⚠️ Currently you cannot mix the kokoro and supersonic TTS systems (pick one).
- Voice mixing is supported for the kokoro TTS system only: you can create a voice by mixing 2 kokoro voices by percentage. For example, to mix 50% of bm_daniel and 50% of am_puck, set the voice name to `bm_daniel.5+am_puck.5`
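The mix syntax above (`name.weight+name.weight`) can be parsed as shown below. This is a hypothetical helper, not vtmate's code; it assumes the weight is the part after the last dot, read as a decimal fraction (`.5` → 0.5).

```python
# Parse a kokoro voice-mix spec like "bm_daniel.5+am_puck.5" into
# (voice, weight) pairs. Sketch only; assumes the weight is the text
# after the last dot, interpreted as a decimal fraction.

def parse_voice_mix(spec):
    pairs = []
    for part in spec.split("+"):
        voice, _, weight = part.rpartition(".")
        pairs.append((voice, float("0." + weight)))
    return pairs
```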
To see explanation of each field:
vtmate --help
The first agent defined in ~/.vtmate/settings will always be the selected agent when running vtmate, unless -a <agent_name> is used.
Before running vtmate make sure ollama is running: ollama serve.
Optionally, if you want to use llama.cpp make sure llama-server is running.
All cli options:
-a <agent_name> set a specific initial agent
-p <prompt> initialize with a text prompt
-q quiet mode: produces a single response and exits (requires `-p` or `-i`)
-i <file.txt> initialize with a file prompt
-i - initialize with prompt from STDIN (runs in quiet mode)
-s save the conversation to text and audio file in ~/.vtmate/conversations or ~/.vtmate/read-files
--debate <AGENT1> <AGENT2> [SUBJECT] initialize a debate between 2 agents with an initial prompt
--debate <AGENT1> <AGENT2> -i <FILE> initialize a debate between 2 agents with an initial prompt from file
--debate <AGENT1> <AGENT2> -i - initialize a debate between 2 agents with an initial prompt from STDIN
-r <file.txt> read a file with voice, phrase by phrase (no llm involved)
-r - read text from STDIN with voice, phrase by phrase (no llm involved). Use - for STDIN (runs in quiet mode)
-c <settings_file> use a specific settings file
--list-voices list all voices for all languages and tts systems
--ptt <true/false> override the ptt setting for all agents for this session, regardless of their individual settings
--verbose run the program in verbose mode
--version print the vtmate installed version
--help show help
Start a conversation with the default agent and save it as audio and text (waits for user voice input and responds)
vtmate -s
Start a conversation with a specific agent (waits for user voice input and responds)
vtmate -a "main agent"
Start a conversation with an initial text prompt
vtmate -p "Are we alone in the galaxy?"
Start a conversation with an initial prompt from a file
vtmate -i myprompt.txt
Get a single response from STDIN text and exit
echo "How to fly without wings?" | vtmate -i -
- When running in LIVE mode, just talk. You can also pause/resume recording by pressing `SPACE` once
- When running in PTT mode: keep `SPACE` pressed while talking, and then release it
- You can switch agents in realtime by pressing the `ARROW_LEFT` / `ARROW_RIGHT` arrow keys (you need at least 2 agents defined in `~/.vtmate/settings`)
- You can change the voice speed by pressing `ARROW_UP` / `ARROW_DOWN`
- You can save the conversation as wav and text files by adding the `-s` option. It will save them in the `~/.vtmate/conversations` folder
Initialize a debate between two agents and participate in it by speaking at any time. To create a good debate, adjust the system prompts of each agent and give a detailed initial input.
In debate mode it is a good idea to set the --ptt <true/false> option so that the ptt value is not switched on each agent's turn.
Start a debate with an initial subject (with forced ptt mode)
vtmate --debate "God" "Devil" "How to succeed in life?" --ptt true
Start a debate with an initial prompt from file (with forced live mode)
vtmate --debate "God" "Devil" -i myprompt.txt --ptt false
Start a debate with an initial file prompt (with forced ptt mode)
echo -e "Lets discuss the permissions of these files: \n\n $(ls -la)" > prompt.txt
vtmate --debate "Unix administrator" "Security Expert" -i prompt.txt --ptt true
- When running in LIVE mode, just talk. You can also pause/resume recording by pressing `SPACE` once
- When running in PTT mode: keep `SPACE` pressed while talking, and then release it
- You can also start/stop a debate from conversation mode by pressing `Control+D` and picking the debate agents
- You can save the conversation as wav and text files by adding the `-s` option. It will save them in the `~/.vtmate/conversations` folder
- Here is an example of how to create automated audio debates from YouTube videos using vtmate in combination with other tools
This mode processes a text input, responds (with text and audio) and exits
Get a single response from prompt
vtmate -q -p "Explain me the Zettelkasten Method"
Get a single response from prompt from file
vtmate -q -i myprompt.txt
Get a single response from prompt from STDIN and exit
echo "Is $(date) a national holiday day in Spain?" | vtmate -q -i -
Get a single response and save it as audio file and text file
echo "Can you find any suspicious processes in the next list? If so, why?\n\n $(ps aux | head -20)" | vtmate -q -i - -s
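Quiet mode makes vtmate easy to drive from other programs. Below is a minimal sketch of wrapping it with Python's `subprocess`; it assumes the `vtmate` binary is on `$PATH` and simply mirrors the `-q -i -` invocation shown above.

```python
# Ask vtmate for a single quiet-mode response from a script.
# Sketch only; assumes the `vtmate` binary is on $PATH.
import subprocess

def vtmate_command(save=False):
    cmd = ["vtmate", "-q", "-i", "-"]   # quiet mode, prompt from STDIN
    if save:
        cmd.append("-s")                # also save audio + text
    return cmd

def ask_vtmate(prompt, save=False):
    result = subprocess.run(vtmate_command(save), input=prompt,
                            capture_output=True, text=True, check=True)
    return result.stdout
```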
Read a text file or STDIN text phrase by phrase using an agent's voice. Make sure the agent you choose has the correct language and voice for your text. In this mode, only the following agent settings are used: `tts`, `voice` and `language`.
Read from a txt file (and save it in ~/.vtmate/read-files)
vtmate -r myfile.txt -a reader
Read text from STDIN and exit
echo "First phrase. Second phrase" | vtmate -r -
In this mode you can:
- Move to the previous phrase by pressing `ARROW_UP`
- Move to the next phrase by pressing `ARROW_DOWN`
- Stop / resume playback by pressing `SPACE`
By default vtmate uses ~/.vtmate/settings file.
You can create different settings files for different agent groups, for example:
philosophers.txt
scientists.txt
employees.txt
And then load each as you need:
vtmate -c philosophers.txt --debate "Aristoteles" "Ptahhotep" "how to achieve harmony?"
vtmate is self-contained (no manual installation needed): it bundles espeak-ng-data, the whisper tiny & small models, the kokoro model and voices, and the supersonic2 model and voices, which are auto-extracted from the binary when running vtmate if they are not found in the following locations:
whisper models:
- `~/.whisper-models/ggml-tiny.bin`
- `~/.whisper-models/ggml-small.bin`
kokoro model files:
- `~/.cache/k/0.onnx`
- `~/.cache/k/0.bin`
espeak phonemes (used by kokoro):
- `~/.vtmate/espeak-ng-data.tar.gz`
supersonic2 files:
- `~/.vtmate/tts/supersonic2-model/onnx/duration_predictor.onnx`
- `~/.vtmate/tts/supersonic2-model/onnx/text_encoder.onnx`
- `~/.vtmate/tts/supersonic2-model/onnx/tts.json`
- `~/.vtmate/tts/supersonic2-model/onnx/unicode_indexer.json`
- `~/.vtmate/tts/supersonic2-model/onnx/vector_estimator.onnx`
- `~/.vtmate/tts/supersonic2-model/onnx/vocoder.onnx`
- `~/.vtmate/tts/supersonic2-model/voice_styles/M1.json`
- `~/.vtmate/tts/supersonic2-model/voice_styles/M2.json`
- `~/.vtmate/tts/supersonic2-model/voice_styles/M3.json`
- `~/.vtmate/tts/supersonic2-model/voice_styles/M4.json`
- `~/.vtmate/tts/supersonic2-model/voice_styles/M5.json`
- `~/.vtmate/tts/supersonic2-model/voice_styles/F1.json`
- `~/.vtmate/tts/supersonic2-model/voice_styles/F2.json`
- `~/.vtmate/tts/supersonic2-model/voice_styles/F3.json`
- `~/.vtmate/tts/supersonic2-model/voice_styles/F4.json`
- `~/.vtmate/tts/supersonic2-model/voice_styles/F5.json`
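A quick way to check which of the expected asset paths above have already been extracted on your machine (an illustrative helper, not part of vtmate):

```python
# List which expected vtmate asset paths are missing on this machine.
# Illustrative helper; pass any subset of the README paths you care about.
import os

def missing_assets(paths):
    return [p for p in paths
            if not os.path.exists(os.path.expanduser(p))]

# Example:
#   missing_assets(["~/.whisper-models/ggml-tiny.bin", "~/.cache/k/0.onnx"])
```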
- If you want to avoid sound interruptions you can use PTT mode or increase `sound_threshold_peak` to match your microphone levels.
- If you want to use OpenTTS, start the docker service first: `docker run --rm --platform=linux/amd64 -p 5500:5500 synesthesiam/opentts:all` (it will pull the image the first time). Adjust the platform as needed for your hardware.
- If you have problems starting vtmate, you can remove `~/.vtmate/settings` so it recreates the default configuration.
- By default whisper tiny is used (from `~/.whisper-models/ggml-tiny.bin`). If you need better speech recognition, download a better whisper model and update the `whisper_model_path` setting.
If you need help:
vtmate --help
| ID | Language | Support | TTS supported | Number of voices |
|---|---|---|---|---|
| en | 🇬🇧 English | 🏆 Best support | ✅ SS2 ✅ Kokoro ✅ OpenTTS | > 38 voices |
| es | 🇪🇸 Spanish | 🏆 Best support | ✅ SS2 ✅ Kokoro ✅ OpenTTS | > 14 voices |
| fr | 🇫🇷 French | 🏆 Best support | ✅ SS2 ✅ Kokoro ✅ OpenTTS | > 12 voices |
| zh | 🇨🇳 Mandarin Chinese | 🥈 Good support | ❌ SS2 ✅ Kokoro ✅ OpenTTS | > 9 voices |
| ja | 🇯🇵 Japanese | 🥈 Good support | ❌ SS2 ✅ Kokoro ✅ OpenTTS | > 6 voices |
| pt | 🇵🇹 Portuguese | 🥈 Good support | ✅ SS2 ✅ Kokoro ❌ OpenTTS | > 13 voices |
| ko | 🇰🇷 Korean | 🥈 Good support | ✅ SS2 ❌ Kokoro ✅ OpenTTS | 11 voices |
| it | 🇮🇹 Italian | 🥈 Good support | ❌ SS2 ✅ Kokoro ✅ OpenTTS | > 3 voices |
| hi | 🇮🇳 Hindi | 🥈 Good support | ❌ SS2 ✅ Kokoro ✅ OpenTTS | > 4 voices |
| ar | 🇸🇦 Arabic | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| bn | 🇧🇩 Bengali | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| ca | 🇪🇸 Catalan | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| cs | 🇨🇿 Czech | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| de | 🇩🇪 German | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| el | 🇬🇷 Greek | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| fi | 🇫🇮 Finnish | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| gu | 🇮🇳 Gujarati | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| hu | 🇭🇺 Hungarian | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| kn | 🇮🇳 Kannada | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| mr | 🇮🇳 Marathi | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| nl | 🇳🇱 Dutch | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| pa | 🇮🇳 Punjabi | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| ru | 🇷🇺 Russian | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| sv | 🇸🇪 Swedish | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| sw | 🇰🇪 Swahili | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| ta | 🇮🇳 Tamil | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| te | 🇮🇳 Telugu | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| tr | 🇹🇷 Turkish | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
Do you have a GPU? (nvidia? an Apple computer?) Great! Then vtmate runs at lightning speed =)
- To use acceleration, pick the build for your hardware from the Releases list
- For CUDA install the CUDA Toolkit. For Vulkan install the Vulkan SDK
macOS: ✅ CPU ✅ Metal
Linux (amd64): ✅ CPU ✅ CUDA ⚠️ Vulkan
Linux (arm64): ✅ CPU ⚠️ CUDA ❌ Vulkan
Windows (x86_64): ✅ CPU ⚠️ CUDA ⚠️ Vulkan
Windows (arm64): ❌ CPU ❌ CUDA ❌ Vulkan
Simplest way:
cargo install vtmate
From git repository:
git clone https://github.com/DavidValin/vtmate
cd vtmate
cargo build --release
Fully configurable builds (OS, arch and GPU acceleration), see:
build_linux.sh
build_macos.sh
build_windows.sh
Have fun o:)




