Local AI text-to-speech with voice cloning and voice design, powered by GGML. C++17 port of OmniVoice (k2-fsa/OmniVoice). 646 languages, 24 kHz mono output, runs on CPU, CUDA, ROCm, Metal, Vulkan.
- Voice cloning from a reference WAV plus its transcript
- Voice design via attribute keywords (gender, age, pitch, style, volume, emotion)
- Auto voice with consistent speaker identity across long inputs
- Long-form synthesis with punctuation-aware text chunking, voice prompt promotion, cross-fade and pydub-strict silence removal
- Bit-for-bit deterministic generation in greedy mode; seedable Philox PRNG for stochastic sampling
- Q8_0 quantisation of the 612 M-parameter Qwen3 backbone
- Two CLI tools: omnivoice-tts (text -> WAV) and omnivoice-codec (WAV <-> RVQ codes)
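Long-form synthesis splits the input on punctuation before generating audio. A minimal sketch of how such punctuation-aware chunking can work (illustrative Python only; the actual C++ splitter, its delimiter set, and its size limits are assumptions here):

```python
# Simplified punctuation-aware text chunking (illustrative; the real
# C++ implementation and its chunk-size limits may differ).
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    # Split on sentence-ending punctuation, keeping the delimiter attached.
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?;])\s+", text.strip())
                 if s.strip()]
    chunks, current = [], ""
    for sent in sentences:
        # Start a new chunk when appending would exceed the budget.
        if current and len(current) + 1 + len(sent) > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("One. Two! Three? Four.", max_chars=10))
# -> ['One. Two!', 'Three?', 'Four.']
```

Each chunk is then synthesized separately and joined with cross-fades, which is why sentence boundaries (rather than fixed character offsets) matter for seam quality.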
git clone --recurse-submodules https://github.com/ServeurpersoCom/omnivoice.cpp.git
cd omnivoice.cpp
./buildcuda.sh # NVIDIA GPU
./buildvulkan.sh # AMD/Intel GPU (Vulkan)
./buildcpu.sh # CPU only
./buildall.sh # all backends, runtime DL loading
Pre-converted GGUFs are available on Hugging Face:
https://huggingface.co/Serveurperso/OmniVoice-GGUF
Drop them in models/ and skip to the quick start. To convert from the original checkpoint:
./checkpoints.sh # hf download k2-fsa/OmniVoice -> checkpoints/
./convert.py # 2 GGUFs in BF16 -> models/
./quantize.sh # base LM Q8_0 (tokenizer stays at native dtype)
echo "Hello world." | ./build/omnivoice-tts \
--model models/omnivoice-base-Q8_0.gguf \
--codec models/omnivoice-tokenizer-F32.gguf \
--lang English -o hello.wav
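Output is 24 kHz mono WAV. A quick sanity check of a synthesized file can be done with the Python standard library; the snippet below writes a small 24 kHz mono test tone so it is runnable on its own (assuming 16-bit PCM output, which is typical but not stated above; point `wav_info` at your own hello.wav instead):

```python
# Inspect a WAV file's format with the stdlib 'wave' module.
import math
import struct
import wave

def wav_info(path: str) -> tuple[int, int, int]:
    """Return (sample_rate, channels, sample_width_bytes)."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()

# Write a 0.1 s 440 Hz test tone in the expected format (24 kHz, mono,
# 16-bit) so the check is demonstrable without running the synthesizer.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(24000)
    samples = [int(8000 * math.sin(2 * math.pi * 440 * t / 24000))
               for t in range(2400)]
    w.writeframes(struct.pack(f"<{len(samples)}h", *samples))

print(wav_info("demo.wav"))  # -> (24000, 1, 2)
```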
Voice cloning:
./build/omnivoice-tts \
--model models/omnivoice-base-Q8_0.gguf \
--codec models/omnivoice-tokenizer-F32.gguf \
--ref-wav ref.wav --ref-text ref.txt \
--lang French -o out.wav < prompt.txt
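Reference audio for cloning may come from arbitrary recordings. Whether --ref-wav requires mono input is not stated above, so the helper below is a hedged convenience sketch: it downmixes a stereo 16-bit WAV to mono by averaging channels, using only the Python standard library (the function name and the mono requirement are assumptions, not part of the CLI):

```python
# Hypothetical helper: downmix a stereo 16-bit PCM WAV to mono by
# averaging the two channels. Illustrative preprocessing only; the
# omnivoice-tts input requirements are not documented here.
import struct
import wave

def stereo_to_mono(src: str, dst: str) -> None:
    with wave.open(src, "rb") as r:
        assert r.getnchannels() == 2 and r.getsampwidth() == 2
        rate = r.getframerate()
        n = r.getnframes()
        frames = struct.unpack(f"<{n * 2}h", r.readframes(n))
    # Average interleaved left/right samples.
    mono = [(frames[i] + frames[i + 1]) // 2 for i in range(0, len(frames), 2)]
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(mono)}h", *mono))

# Self-contained demo: build a 2-frame stereo file, then downmix it.
with wave.open("stereo_demo.wav", "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(24000)
    w.writeframes(struct.pack("<4h", 100, 200, 300, 400))
stereo_to_mono("stereo_demo.wav", "mono_demo.wav")
with wave.open("mono_demo.wav", "rb") as r:
    print(r.getnchannels(), r.getnframes())  # -> 1 2
```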
See docs/ARCHITECTURE.md for the model, the GGUF layout, the inference pipeline, every CLI flag, and the validation results.
MIT. See LICENSE.
Upstream model: OmniVoice by Xiaomi / k2-fsa, Apache 2.0.
Audio codec: Higgs Audio v2 (bosonai/higgs-audio-v2-tokenizer), Apache 2.0.