VODER - Voice Blender

VODER is a Local, Free, Offline, professional-grade voice processing and transformation tool that enables seamless conversion between speech, text, and music. Built for creators, developers, and audio professionals, VODER delivers high-quality synthesis, voice cloning, and music generation capabilities through an intuitive interface.

🚀 Ready in Colab: Open VODER in Google Colab

🤖 For AI agents and automated tools: See Bots.md

Quick Start

Run from Source

# Clone the repository
git clone https://github.com/HAKORADev/VODER.git
cd VODER

# Install dependencies
pip install -r requirements.txt

# IMPORTANT: After installing requirements, upgrade protobuf to avoid compatibility issues
pip install --upgrade protobuf==5.29.6

# Launch GUI
python src/voder.py

# Or use CLI mode
python src/voder.py cli

Run in Google Colab

Open the link, connect to a runtime, and press Run All (or run cells one by one until the last one). Once execution completes, VODER is ready to use directly in your browser — no installation required.

Installation Requirements

# Install FFmpeg (required for audio processing)
# Windows: winget install FFmpeg
# macOS: brew install ffmpeg
# Linux: sudo apt install ffmpeg

Core Capabilities

🎤 6 Processing Modes

VODER offers six distinct voice processing modes, each designed for specific audio transformation needs:

Mode	Description	Input	Output
STT+TTS	Speech-to-Text then Text-to-Speech	Audio	Audio
TTS	Text-to-Speech with Voice Design	Text	Audio
TTS+VC	Text-to-Speech + Voice Cloning	Text + Reference	Audio
STS	Speech-to-Speech (Voice Conversion)	Audio + Reference	Audio
TTM	Text-to-Music Generation	Text	Audio
TTM+VC	Text-to-Music + Voice Conversion	Text + Reference	Audio

🎭 Dialogue System

VODER features a powerful row-based dialogue editor designed for creating multi-speaker audio content such as podcasts, AI news broadcasts, audiobooks, and conversational content. This system enables script-based generation where multiple characters speak with distinct voices in a cohesive narrative flow.

GUI Dialogue Input:

Each line is a separate row with Character and Dialogue fields.
New rows are added automatically when you fill the last row.
First row has no delete button; subsequent rows can be deleted individually.
Voice prompts or audio assignments appear dynamically for every character found in the script.

Optional Background Music:

When generating dialogue (TTS or TTS+VC mode), VODER can automatically add ambient background music that matches the length of the spoken audio.
A clean dialog appears before processing, asking: “Enter music description (or press Skip):”
If a description is provided (e.g., "soft piano, cinematic strings"), VODER:
- Generates music via ACE-Step using the description as style prompt and "..." as empty lyrics.
- Automatically fits the music duration to the exact length of the dialogue.
- Mixes the music at 35% volume relative to the dialogue.
- Cleans up temporary files and saves the final result with an _m suffix (e.g., voder_tts_dialogue_..._m.wav).
If the user skips, processing proceeds normally without music.

This feature is available in both GUI and CLI modes (interactive and one‑line). It is only triggered for dialogue scripts (i.e., more than one line, or a single line containing a colon).

Example Script (conceptual):

James: Welcome to our podcast! Today we'll explore AI advances.
Sarah: Thanks James! I'm excited to discuss my latest research.
James: Let's dive in. First, tell us about neural networks.

Key Features:

Multi-character script support with real-time character extraction
Individual voice prompts for each character (TTS mode)
Reference audio assignment per character via dropdown numbers (TTS+VC mode)
Optional background music – automatically generated, duration‑fitted, volume‑controlled
Automatic audio concatenation with proper pacing
Ideal for podcasts, news segments, interviews, and storytelling

The dialogue system is available in both TTS (Voice Design) and TTS+VC (Voice Cloning) modes, allowing you to create voices either through descriptive prompts or by cloning from real audio samples.

🔧 AI Model Integration

VODER leverages state-of-the-art open-source models for professional-grade audio processing:

Speech Recognition: openai/whisper — Whisper for accurate audio transcription
Voice Synthesis: QwenLM/Qwen3-TTS — Qwen3-TTS for natural text-to-speech
Voice Conversion: Plachtaa/seed-vc — Seed-VC for speech-to-speech transformation
Music Generation: ace-step/ACE-Step-1.5 — ACE-Step for lyrics-to-music synthesis

Usage Guide

GUI Mode

Launch: python src/voder.py
Select mode from dropdown (6 available modes)
Load input files based on mode:
- STT+TTS: Load base audio (content), then load target audio (voice)
- TTS: Enter dialogue row‑by‑row in the script area, and fill the automatically generated voice prompts for each character
  Optional: Before generation, a dialog will ask if you want background music; enter a description or press Skip.
- TTS+VC: Enter dialogue rows, load voice reference audio files (each assigned a number), then assign each character an audio number via dropdown
  Optional: The same background music dialog appears before generation.
- STS: Load base audio and target voice audio
- TTM: Enter lyrics and style prompt
- TTM+VC: Enter lyrics, style prompt, and load target voice audio
Click "Generate" (TTS/TTS+VC/TTM/TTM+VC) or "Patch" (STT+TTS/STS)
Listen to output and save results

CLI Mode (Interactive)

python src/voder.py cli

The interactive CLI now supports full dialogue creation:

Enter multiple lines (empty line to finish).
Lines without a colon → single mode (one text, one voice prompt/audio).
Lines with colon (Character: text) → dialogue mode.
VODER will ask for a voice prompt (TTS) or audio file path (TTS+VC) for each character, in order.
After collecting all voice prompts/assignments, you will be asked:
Add background music? (y/N):
If you answer y or yes, you can enter a music description.
Leaving the description blank or entering empty skips the music.

One-Line Commands

One‑line commands now support dialogue mode through multiple values per parameter, as well as the optional music parameter for background music.

Single mode examples:

python src/voder.py tts script "Hello world" voice "female, cheerful"
python src/voder.py tts+vc script "Hello" target "voice.wav"
python src/voder.py sts base "input.wav" target "voice.wav"
python src/voder.py ttm lyrics "Verse 1: ..." styling "upbeat pop" 30
python src/voder.py ttm+vc lyrics "..." styling "pop" duration 30 target "voice.wav"

Dialogue mode examples (TTS):

python src/voder.py tts script "James: Welcome to the show!" "Sarah: Glad to be here." voice "James: deep male voice, authoritative" "Sarah: bright female voice, energetic"
python src/voder.py tts script "James: Welcome to the show!" "Sarah: Glad to be here." voice "James: deep male voice, authoritative" "Sarah: bright female voice, energetic" music "soft piano, cinematic"

Dialogue mode examples (TTS+VC):

python src/voder.py tts+vc script "James: Let's start with AI." "Sarah: I've been working on this for years." target "James: /path/to/james_voice.wav" "Sarah: /path/to/sarah_voice.wav"
python src/voder.py tts+vc script "James: Let's start with AI." "Sarah: I've been working on this for years." target "James: /path/to/james_voice.wav" "Sarah: /path/to/sarah_voice.wav" music "ambient electronic, chill"

Note: STT+TTS mode is not available in one-line CLI because it requires interactive text editing. If the music parameter is supplied in single‑mode (plain text without colon), it is ignored with a warning.

System Requirements

Minimum Requirements

Component	Specification
CPU	4-6 cores
RAM	12GB+ system memory
GPU (CUDA)	Optional (CPU-only operation supported)
VRAM	4GB minimum (6GB recommended, 16GB for best performance)
Storage	SSD recommended

Note: VODER runs entirely on CPU. No GPU is required for any mode. However, having a GPU with sufficient VRAM can significantly improve processing speed for certain modes.

Recommended Requirements

VODER is designed to maximize output quality rather than speed. Meeting the minimum requirements ensures reliable operation — the focus is on achieving professional-grade audio results, not processing benchmarks. More RAM allows for longer audio generation and more complex workflows.

Technical Highlights

Unified Audio Pipeline: Six processing modes in a single interface eliminates the need for multiple tools
Intelligent Dialogue Editor: Row‑based script input with automatic character tracking and per‑voice assignment
State-of-the-Art Models: Production-quality models from leading AI research organizations
Voice Cloning: Extract and replicate voice characteristics from reference audio samples
Music Generation: Lyrics-to-music synthesis with style control and voice conversion
Cross-Modal Transformation: Speech-to-speech, text-to-speech, and speech-to-text conversions
Memory Optimisation: TTM+VC pipeline now releases GPU memory between stages to reduce VRAM usage
Background Music for Dialogue: Automatically generated, duration‑fitted, volume‑controlled ambient music – a unique enhancement for narrated content

Documentation

Guide.md — Detailed usage guide, technical implementation, and creative techniques
CHANGELOG.md — Development history and version changes
Bots.md — Guidelines for AI agents and automated systems

Version Information

Note: VODER does not maintain PyPI packages or pre-built binaries. Running from source ensures access to the most recent features and improvements.

Contributing

VODER is open-source (MIT License) and welcomes contributions:

New voice processing modes
Additional model integrations
UI/UX improvements
Performance optimizations
Documentation and translations
Bug reports and feature requests

Please submit pull requests or issues via GitHub.

License

MIT License — See LICENSE for full details.

Acknowledgments

Built with appreciation for the open-source AI voice synthesis community and the amazing models that power VODER.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VODER - Voice Blender

Quick Start

Run from Source

Run in Google Colab

Installation Requirements

Core Capabilities

🎤 6 Processing Modes

🎭 Dialogue System

🔧 AI Model Integration

Usage Guide

GUI Mode

CLI Mode (Interactive)

One-Line Commands

System Requirements

Minimum Requirements

Recommended Requirements

Technical Highlights

Documentation

Version Information

Contributing

License

Acknowledgments

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 374 Commits
src		src
Bots.md		Bots.md
CHANGELOG.md		CHANGELOG.md
Guide.md		Guide.md
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

License

HAKORADev/VODER

Folders and files

Latest commit

History

Repository files navigation

VODER - Voice Blender

Quick Start

Run from Source

Run in Google Colab

Installation Requirements

Core Capabilities

🎤 6 Processing Modes

🎭 Dialogue System

🔧 AI Model Integration

Usage Guide

GUI Mode

CLI Mode (Interactive)

One-Line Commands

System Requirements

Minimum Requirements

Recommended Requirements

Technical Highlights

Documentation

Version Information

Contributing

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages