GitHub - DavidValin/vtmate: Powerful ai toolkit to interact with ai models via voice from your terminal at extremely low latency with realistic voices. Allows live voice conversations and run debates between ai agents with user intervention, stdin and text file inputs and more

vtmate

The ultimate AI voice conversation system, running entirely in your terminal! vtmate is a powerful terminal-based voice AI toolkit with many realistic voices, extremely low latency and support for 28 languages. It lets you hold voice conversations with local AI models, pipe data in and save conversations to files.

The program is self-contained (1.2GB): it bundles all TTS models, voices and the files needed to recognize speech and speak, with no external installations, ensuring maximum portability.

⬇️ Download (⭐ MacOS ⭐ Linux and ⭐ Windows supported)
🤠 Quicksheet (PDF) (🖨️ print ready for easy access)
🎥 Video Overview

Video demonstration

(🇬🇧 English) Conversation mode demo

en.-.demo.-.conversation.mp4

(🇬🇧 English) Debate mode demo

en.-.demo.-.debate.mp4

(🇬🇧 English) Reading mode demo

en.-.demo.-.reading.mode.mp4

Features

📌 Continuous Voice chat (live conversation): records user continuously and stops on silence, submitting the request to the agent
📌 Push to Talk mode (PTT): keep <SPACE> pressed while talking and release to stop recording
🚀 AI agents debate (2 agents talking to each other): give an initial input and let the agents talk to each other. You can interrupt in the middle of the debate changing the subject
📌 Realtime agent swap: change the agent by pressing <ARROW_LEFT> / <ARROW_RIGHT> (applicable to next response)
📌 Voice interrupt: the agent stops talking if you interrupt via voice
📌 Recording Pause / Resume: toggle "<SPACE>" key to pause / resume voice recording only
📌 Stop PlayBack: press "<ESCAPE>" ONCE to stop the playback for the current response
📌 Interrupt: press "<ESCAPE>" TWICE to interrupt the current response altogether
📌 Voice speed change: change the agent voice speed by pressing <ARROW_UP> / <ARROW_DOWN> (applicable to next response)
📌 Voice read a txt file: vtmate -r myfile.txt
📌 Voice read text from stdin phrase by phrase: echo "Hello. How are you?" | vtmate -r -
📌 Save conversation as audio and text: vtmate -s
📌 Load separate settings file with different agents: vtmate -c philosophers-settings.txt
📌 Integrated whisper
📌 Integrated kokoro TTS system
📌 Interface with OpenTTS system
📌 Supports ollama or llama-server
📌 28 languages supported (vtmate --list-voices)
📌 Use any gguf model from huggingface.co or ollama models (small models reply faster)

How it works

- You start the program and start talking
- Once audio is detected (based on the sound_threshold_peak setting), it starts recording
- As soon as there is a period of silence (based on the end_silence_ms setting), it transcribes the recorded audio using the speech-to-text system (whisper). In PTT mode this setting is ignored: the program waits for the SPACE key to be released before submitting the audio
- The transcribed text is sent to the AI model
- The AI model replies with text
- The text is converted to audio using the text-to-speech system
- You can interrupt the AI agent at any moment by speaking; this stops the response and its audio so you can continue talking
- In debate mode, the agents reply to each other automatically, playing the audio on each turn
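
The recording steps above can be sketched in a few lines of Python. This is only an illustration of the idea (start on a loud peak, stop after enough silence), not vtmate's actual implementation: the `Segmenter` class, its `push` method and the `frame_ms` value are hypothetical names chosen for the example.

```python
class Segmenter:
    """Groups audio frames into utterances using a peak threshold and a silence timeout."""

    def __init__(self, sound_threshold_peak=0.12, end_silence_ms=2500, frame_ms=100):
        self.sound_threshold_peak = sound_threshold_peak  # peak amplitude that counts as speech
        self.end_silence_ms = end_silence_ms              # silence needed to close an utterance
        self.frame_ms = frame_ms                          # assumed duration of one frame
        self.recording = False
        self.silence_ms = 0
        self.frames = []

    def push(self, frame):
        """Feed one frame of samples in [-1, 1]; return the finished utterance or None."""
        peak = max((abs(s) for s in frame), default=0.0)
        loud = peak >= self.sound_threshold_peak
        if not self.recording:
            if loud:  # first loud frame starts a recording
                self.recording = True
                self.silence_ms = 0
                self.frames = [frame]
            return None
        self.frames.append(frame)
        if loud:  # speech continues, reset the silence counter
            self.silence_ms = 0
            return None
        self.silence_ms += self.frame_ms
        if self.silence_ms >= self.end_silence_ms:
            utterance, self.frames = self.frames, []
            self.recording = False
            return utterance  # this is what would be handed to speech-to-text
        return None
```

A higher sound_threshold_peak makes the sketch ignore more background noise, and a longer end_silence_ms tolerates longer pauses before the utterance is submitted.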

LLM integration

✅ ollama (default)
✅ llama-server

You can run the models locally (the default) or remotely by configuring the base URL (see the baseurl agent setting).
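
To check that an ollama base URL and model respond before pointing vtmate at them, a small Python sketch like the following can help. It uses ollama's own REST endpoint `/api/generate`; whether vtmate calls this same endpoint internally is not stated here, so treat it purely as a connectivity test. The function names `build_ollama_request` and `ask` are made up for the example.

```python
import json
import urllib.request


def build_ollama_request(baseurl, model, prompt):
    """Build a non-streaming POST request for ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        baseurl.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(baseurl, model, prompt, timeout=60):
    """Send the prompt and return the generated text (needs a running ollama server)."""
    request = build_ollama_request(baseurl, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

For instance, `ask("http://127.0.0.1:11434", "llama3.2:3b", "Say hello")` should return a short text if ollama is running locally and the model has been pulled.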

TTS engine support

✅ Kokoro (integrated)
✅ Supersonic 2 (integrated)
✅ OpenTTS (requires external docker service)

Installation

📌 1. Download vtmate

https://github.com/DavidValin/vtmate/releases
Move the binary to a folder in your $PATH so you can use the vtmate command anywhere

📌 2. Install llm engine (needed for ai responses)

Option A: ollama (the default)

Install ollama: https://ollama.com/download
Pull the model you want to use with vtmate, for instance: ollama pull llama3.2:3b

Option B: llama-server

Install llama.cpp: https://github.com/ggml-org/llama.cpp
Download a gguf model, for instance: https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true

📌 3. (Windows only) Install supported terminal

Install Windows Terminal (which supports emojis): https://apps.microsoft.com/detail/9n0dx20hk701 (use this terminal to run vtmate)

📌 4. (Optional) OpenTTS support

docker pull synesthesiam/opentts:all

Configure agents

The first time you run vtmate, it creates a configuration file with 2 agents at ~/.vtmate/settings if it does not already exist. You can define as many agents as you want.

Example of agent definition:

[agent]
name = explainer
language = en
tts = supersonic2
voice = F1
voice_speed = 1.1
provider = ollama
baseurl = http://127.0.0.1:11434
model = llama3.2:3b
system_prompt = "You are a helpful AI assistant. Your only function is to explain things as simply as possible in no more than 150 words or 450 words if the user asks for a longer explanation."
sound_threshold_peak = 0.12
end_silence_ms = 2500
ptt = true
whisper_model_path = ~/.whisper-models/ggml-tiny.bin
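
If you want to inspect or generate settings files from a script, the layout shown above (repeated `[agent]` blocks of `key = value` lines) is easy to read. The Python sketch below is an assumption-based reader, not vtmate's own parser: `parse_agents` is a hypothetical helper, and treating lines starting with `#` as comments is a guess.

```python
def parse_agents(text):
    """Return one dict per [agent] block found in the settings text."""
    agents = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        if line == "[agent]":
            current = {}
            agents.append(current)
            continue
        if current is None or "=" not in line:
            continue  # ignore anything outside an agent block
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop surrounding quotes, as in system_prompt
        current[key.strip()] = value
    return agents
```

All values are kept as strings; converting fields such as voice_speed or end_silence_ms to numbers is left to the caller.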

By default all agents are set to PTT mode: you have to keep SPACE pressed to talk. If you want to use LIVE mode, make sure your microphone levels are set correctly and adjust the sound_threshold_peak and end_silence_ms settings to your needs.
⚠️ Currently you cannot mix the kokoro and supersonic2 TTS systems (pick one).
Voice mixing is supported for the kokoro TTS system only: you can create a voice by mixing 2 kokoro voices by percentage. For example, to mix 50% of bm_daniel and 50% of am_puck, set the voice name to bm_daniel.5+am_puck.5
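
To make the mixed-voice notation concrete, here is a small Python sketch that splits a name such as bm_daniel.5+am_puck.5 into voices and weights. It assumes the digits after the last dot are the decimal part of the weight (so `.5` means 0.5, i.e. 50%); that reading is inferred from the example above, and `parse_voice_mix` is a hypothetical helper, not part of vtmate.

```python
def parse_voice_mix(spec):
    """Split 'voiceA.5+voiceB.5' into [(voice, weight), ...]."""
    parts = []
    for item in spec.split("+"):
        name, sep, digits = item.rpartition(".")
        if sep and digits.isdigit():
            # assumption: '.5' is the fractional weight 0.5
            parts.append((name, float("0." + digits)))
        else:
            # a plain voice name with no weight is used at 100%
            parts.append((item, 1.0))
    return parts


print(parse_voice_mix("bm_daniel.5+am_puck.5"))
```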

To see an explanation of each field:

vtmate --help

How to use it

The first agent defined in ~/.vtmate/settings is always the selected agent when running vtmate, unless -a <agent_name> is used.

Before running vtmate make sure ollama is running: ollama serve. Optionally, if you want to use llama.cpp make sure llama-server is running.

All cli options:

  -a <agent_name>                       set a specific initial agent
  -p <prompt>                           initialize with a text prompt
  -q                                    quiet mode: produces a single response and exit (requires `-p` or `-i`)
  -i <file.txt>                         initialize with a file prompt
  -i -                                  initialize with prompt from STDIN (runs in quiet mode)
  -s                                    save the conversation to text and audio file in ~/.vtmate/conversations or ~/.vtmate/read-files
  --debate <AGENT1> <AGENT2> [SUBJECT]  initialize a debate between 2 agents with an initial prompt
  --debate <AGENT1> <AGENT2> -i <FILE>  initialize a debate between 2 agents with an initial prompt from file
  --debate <AGENT1> <AGENT2> -i -       initialize a debate between 2 agents with an initial prompt from STDIN
  -r <file.txt>                         read a file with voice, phrase by phrase (no llm involved)
  -r -                                  read text from STDIN with voice, phrase by phrase (no llm involved). Use - for STDIN (runs in quiet mode)
  -c <settings_file>                    use a specific settings file
  --list-voices                         list all voices for all languages and tts systems
  --ptt <true/false>                    override for this session the ptt setting for all agents independently of its settings
  --verbose                             run the program in verbose mode
  --version                             print the vtmate installed version
  --help                                show help

Conversation mode

Start a conversation with the default agent and save it as audio and text (waits for user voice input and responds)

vtmate -s

Start a conversation with a specific agent (waits for user voice input and responds)

vtmate -a "main agent"

Start conversation with an initial text prompt

vtmate -p "Are we alone in the galaxy?"

Start conversation with an initial prompt from file

vtmate -i myprompt.txt

Get a single response from STDIN text and exit

echo "How to fly without wings?" | vtmate -i -

When running in LIVE mode, just talk. You can also pause/resume recording by pressing SPACE once.
When running in PTT mode, keep SPACE pressed while talking, then release it.
You can switch agents in real time by pressing the ARROW_LEFT / ARROW_RIGHT keys (you need at least 2 agents defined in ~/.vtmate/settings).
You can change the voice speed by pressing ARROW_UP / ARROW_DOWN.
You can save the conversation as a wav file and a text file by adding the -s option. It is saved in the ~/.vtmate/conversations folder.

Debate mode

Initialize a debate between two agents and participate in it by speaking at any time. To create a good debate, adjust the system prompt of each agent and give a detailed initial input. In debate mode it is a good idea to set the --ptt <true/false> option so that the ptt value does not switch on each agent's turn.

Start a debate with an initial subject (with forced ptt mode)

vtmate --debate "God" "Devil" "How to succeed in life?" --ptt true

Start a debate with an initial prompt from file (with forced live mode)

vtmate --debate "God" "Devil" -i myprompt.txt  --ptt false

Start a debate with an initial file prompt (with forced ptt mode)

printf 'Let us discuss the permissions of these files:\n\n%s\n' "$(ls -la)" > prompt.txt
vtmate --debate "Unix administrator" "Security Expert" -i prompt.txt --ptt true

When running in LIVE mode, just talk. You can also pause/resume recording by pressing SPACE once.
When running in PTT mode, keep SPACE pressed while talking, then release it.
You can also start/stop a debate from conversation mode by pressing Control+D and picking the debate agents.
You can save the debate as a wav file and a text file by adding the -s option. It is saved in the ~/.vtmate/conversations folder.
Here is an example of how to create automated audio debates from YouTube videos using vtmate in combination with other tools
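
The turn-taking in debate mode can be pictured with the Python sketch below. It is a simplified model under stated assumptions, not vtmate's code: each agent is a plain function from the last message to a reply, and a user interruption is represented by an `interjections` dict mapping a turn number to the text the user said. All names here are hypothetical.

```python
def run_debate(agents, opening, turns, interjections=None):
    """Alternate between two agents, letting the user replace the subject mid-debate.

    agents: dict mapping agent name -> function(message) -> reply
    interjections: optional dict mapping turn index -> user message
    """
    interjections = interjections or {}
    names = list(agents)
    transcript = [("user", opening)]
    message = opening
    for turn in range(turns):
        if turn in interjections:  # the user spoke: their input becomes the next message
            message = interjections[turn]
            transcript.append(("user", message))
        speaker = names[turn % 2]  # agents take turns
        message = agents[speaker](message)
        transcript.append((speaker, message))
    return transcript
```

In the real program each reply would also be spoken aloud before the next agent answers.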

Quiet mode

This mode processes a text input, responds (text and audio) and exits

Get a single response from prompt

vtmate -q -p "Explain me the Zettelkasten Method"

Get a single response from prompt from file

vtmate -q -i myprompt.txt

Get a single response from prompt from STDIN and exit

echo "Is $(date) a national holiday day in Spain?" | vtmate -q -i -

Get a single response and save it as audio file and text file

printf 'Can you find any suspicious processes in the following list? If so, why?\n\n%s\n' "$(ps aux | head -20)" | vtmate -q -i - -s
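
Quiet mode is convenient to call from scripts. The Python sketch below builds the command line from the options documented above (-q, -p, -i -, -a, -s) and pipes text through STDIN. The helper names `quiet_argv` and `run_quiet` are hypothetical, and combining -a with -q is assumed to work as the options list suggests.

```python
import subprocess


def quiet_argv(prompt=None, agent=None, save=False):
    """Build a vtmate quiet-mode command line; with no prompt, read it from STDIN."""
    argv = ["vtmate", "-q"]
    if agent:
        argv += ["-a", agent]
    if prompt is None:
        argv += ["-i", "-"]
    else:
        argv += ["-p", prompt]
    if save:
        argv.append("-s")
    return argv


def run_quiet(text, agent=None, save=False):
    """Pipe text to vtmate on STDIN and wait for it to exit (requires vtmate in PATH)."""
    return subprocess.run(quiet_argv(agent=agent, save=save), input=text, text=True, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or newlines.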

Read mode (file to speech)

Read a text file or STDIN text phrase by phrase using an agent voice. Make sure the agent you choose has the correct language and voice for your text. In this mode, only the following agent settings are used: "tts", "voice" and "language".

Read from a txt file (add -s to save it in ~/.vtmate/read-files)

vtmate -r myfile.txt -a reader

Read text from STDIN aloud and exit

echo "First phrase. Second phrase" | vtmate -r -

In this mode you can:

Move to previous phrase by pressing ARROW_UP
Move to next phrase by pressing ARROW_DOWN
Stop / Resume playback by pressing SPACE
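
As a rough picture of what "phrase by phrase" with previous/next navigation means, consider the Python sketch below. How vtmate actually splits phrases is not documented here; splitting after `.`, `!` or `?` is an assumption, and `split_phrases` and `PhraseCursor` are hypothetical names.

```python
import re


def split_phrases(text):
    """Split text after sentence-ending punctuation followed by whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in pieces if p]


class PhraseCursor:
    """Tracks the current phrase, like ARROW_UP / ARROW_DOWN in read mode."""

    def __init__(self, phrases):
        self.phrases = phrases
        self.index = 0

    def current(self):
        return self.phrases[self.index]

    def next(self):
        # stay on the last phrase instead of running past the end
        self.index = min(self.index + 1, len(self.phrases) - 1)
        return self.current()

    def previous(self):
        # stay on the first phrase instead of going negative
        self.index = max(self.index - 1, 0)
        return self.current()
```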

Separate agents

By default vtmate uses the ~/.vtmate/settings file. You can create different settings files for different agent groups, for example:

philosophers.txt
scientists.txt
employees.txt

And then load each as you need:

vtmate -c philosophers.txt --debate "Aristoteles" "Ptahhotep" "how to achieve harmony?"

Model files

vtmate is self-contained (no manual installation needed): it bundles espeak-ng-data, the whisper tiny and small models, the kokoro model and voices, and the supersonic2 model and voices. These are auto-extracted from the binary when running vtmate if they are not found in the following locations:

whisper models:

- `~/.whisper-models/ggml-tiny.bin`
- `~/.whisper-models/ggml-small.bin`

kokoro model files:

~/.cache/k/0.onnx
~/.cache/k/0.bin

espeak phonemes (used by kokoro):

- `~/.vtmate/espeak-ng-data.tar.gz`

supersonic2 files:

~/.vtmate/tts/supersonic2-model/onnx/duration_predictor.onnx
~/.vtmate/tts/supersonic2-model/onnx/text_encoder.onnx
~/.vtmate/tts/supersonic2-model/onnx/tts.json
~/.vtmate/tts/supersonic2-model/onnx/unicode_indexer.json
~/.vtmate/tts/supersonic2-model/onnx/vector_estimator.onnx
~/.vtmate/tts/supersonic2-model/onnx/vocoder.onnx
~/.vtmate/tts/supersonic2-model/voice_styles/M1.json
~/.vtmate/tts/supersonic2-model/voice_styles/M2.json
~/.vtmate/tts/supersonic2-model/voice_styles/M3.json
~/.vtmate/tts/supersonic2-model/voice_styles/M4.json
~/.vtmate/tts/supersonic2-model/voice_styles/M5.json
~/.vtmate/tts/supersonic2-model/voice_styles/F1.json
~/.vtmate/tts/supersonic2-model/voice_styles/F2.json
~/.vtmate/tts/supersonic2-model/voice_styles/F3.json
~/.vtmate/tts/supersonic2-model/voice_styles/F4.json
~/.vtmate/tts/supersonic2-model/voice_styles/F5.json
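
If you want to see which of these files are already on disk (for instance before copying a setup to another machine), a short Python check like this can be used. It only lists a subset of the paths above for brevity; `EXPECTED` and `missing_files` are hypothetical names for the example.

```python
from pathlib import Path

# A subset of the locations listed above, relative to the home directory.
EXPECTED = [
    ".whisper-models/ggml-tiny.bin",
    ".whisper-models/ggml-small.bin",
    ".cache/k/0.onnx",
    ".cache/k/0.bin",
    ".vtmate/espeak-ng-data.tar.gz",
    ".vtmate/tts/supersonic2-model/onnx/tts.json",
    ".vtmate/tts/supersonic2-model/onnx/vocoder.onnx",
]


def missing_files(home=None):
    """Return the expected paths that do not exist under the given home directory."""
    base = Path(home) if home is not None else Path.home()
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    for rel in missing_files():
        print("missing:", rel)
```

Any file reported as missing should be extracted again the next time vtmate runs, as described above.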

If you want to avoid sound interruptions, you can use ptt mode or increase sound_threshold_peak to suit your microphone levels.
If you want to use OpenTTS, start the docker service first: docker run --rm --platform=linux/amd64 -p 5500:5500 synesthesiam/opentts:all (it will pull the image the first time). Adjust the platform as needed depending on your hardware.
If you have problems starting vtmate, you can remove ~/.vtmate/settings so that the default configuration is recreated.
By default whisper tiny is used (from ~/.whisper-models/ggml-tiny.bin). If you need better speech recognition, download a better whisper model and update the whisper_model_path setting.

If you need help:

vtmate --help

Language support

ID	Language	Support	TTS supported	Number of voices
en	🇬🇧 English	🏆 Best support	✅ SS2 ✅ Kokoro ✅ OpenTTS	> 38 voices
es	🇪🇸 Spanish	🏆 Best support	✅ SS2 ✅ Kokoro ✅ OpenTTS	> 14 voices
fr	🇫🇷 French	🏆 Best support	✅ SS2 ✅ Kokoro ✅ OpenTTS	> 12 voices
zh	🇨🇳 Mandarin Chinese	🥈 Good support	❌ SS2 ✅ Kokoro ✅ OpenTTS	> 9 voices
ja	🇯🇵 Japanese	🥈 Good support	❌ SS2 ✅ Kokoro ✅ OpenTTS	> 6 voices
pt	🇵🇹 Portuguese	🥈 Good support	✅ SS2 ✅ Kokoro ❌ OpenTTS	> 13 voices
ko	🇰🇷 Korean	🥈 Good support	✅ SS2 ❌ Kokoro ✅ OpenTTS	11 voices
it	🇮🇹 Italian	🥈 Good support	❌ SS2 ✅ Kokoro ✅ OpenTTS	> 3 voices
hi	🇮🇳 Hindi	🥈 Good support	❌ SS2 ✅ Kokoro ✅ OpenTTS	> 4 voices
ar	🇸🇦 Arabic	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
bn	🇧🇩 Bengali	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
ca	🇪🇸 Catalan	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
cs	🇨🇿 Czech	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
de	🇩🇪 German	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
el	🇬🇷 Greek	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
fi	🇫🇮 Finnish	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
gu	🇮🇳 Gujarati	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
hu	🇭🇺 Hungarian	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
kn	🇮🇳 Kannada	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
mr	🇮🇳 Marathi	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
nl	🇳🇱 Dutch	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
pa	🇮🇳 Punjabi	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
ru	🇷🇺 Russian	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
sv	🇸🇪 Swedish	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
sw	🇰🇪 Swahili	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
ta	🇮🇳 Tamil	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
te	🇮🇳 Telugu	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice
tr	🇹🇷 Turkish	Supported	❌ SS2 ❌ Kokoro ✅ OpenTTS	1 voice

Acceleration support

Do you have a GPU (NVIDIA, or an Apple computer)? Great! Then vtmate runs at lightning speed =)

To use acceleration, pick the build for your hardware from the Releases list.
For CUDA, install the CUDA Toolkit. For Vulkan, install the Vulkan SDK.

macOS:             ✅ CPU    ✅ Metal
Linux (amd64):     ✅ CPU    ✅ CUDA     ⚠️ Vulkan
Linux (arm64):     ✅ CPU    ⚠️ CUDA     ❌ Vulkan
Windows (x86_64):  ✅ CPU    ⚠️ CUDA     ⚠️ Vulkan
Windows (arm64):   ❌ CPU    ❌ CUDA     ❌ Vulkan

⚠️ Currently working on fully static builds for all OSes with OpenBLAS + CUDA + Vulkan support. In the meantime, pick an available release from the Releases list or build one yourself.

Build vtmate from source code

Simplest way:

cargo install vtmate

From git repository:

git clone https://github.com/DavidValin/vtmate
cd vtmate
cargo build --release

Fully configurable builds (OS, arch and GPU acceleration)

See:

build_linux.sh
build_macos.sh
build_windows.bat

Have fun o:)
