Skip to content

ServeurpersoCom/omnivoice.cpp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

omnivoice.cpp

Local AI text-to-speech with voice cloning and voice design, powered by GGML. C++17 port of OmniVoice (k2-fsa/OmniVoice). 646 languages, 24 kHz mono output, runs on CPU, CUDA, ROCm, Metal, Vulkan.

Features

  • Voice cloning from a reference WAV plus its transcript
  • Voice design via attribute keywords (gender, age, pitch, style, volume, emotion)
  • Auto voice with consistent speaker identity across long inputs
  • Long-form synthesis with punctuation-aware text chunking, voice prompt promotion, cross-fade and pydub-strict silence removal
  • Bit deterministic generation in greedy mode, seedable Philox PRNG for stochastic sampling
  • Q8_0 quantisation of the 612 M parameter Qwen3 backbone
  • Two CLI tools : omnivoice-tts (text -> WAV) and omnivoice-codec (WAV <-> RVQ codes)

Build

git clone --recurse-submodules https://github.com/ServeurpersoCom/omnivoice.cpp.git
cd omnivoice.cpp
./buildcuda.sh        # NVIDIA GPU
./buildvulkan.sh      # AMD/Intel GPU (Vulkan)
./buildcpu.sh         # CPU only
./buildall.sh         # all backends, runtime DL loading

Model conversion

Pre-converted GGUFs are available on Hugging Face :

https://huggingface.co/Serveurperso/OmniVoice-GGUF

Drop them in models/ and skip to the quick start. To convert from the original checkpoint :

./checkpoints.sh      # hf download k2-fsa/OmniVoice -> checkpoints/
./convert.py          # 2 GGUFs in BF16 -> models/
./quantize.sh         # base LM Q8_0 (tokenizer stays at native dtype)

Quick start

echo "Hello world." | ./build/omnivoice-tts \
    --model models/omnivoice-base-Q8_0.gguf \
    --codec models/omnivoice-tokenizer-F32.gguf \
    --lang English -o hello.wav

Voice cloning :

./build/omnivoice-tts \
    --model models/omnivoice-base-Q8_0.gguf \
    --codec models/omnivoice-tokenizer-F32.gguf \
    --ref-wav ref.wav --ref-text ref.txt \
    --lang French -o out.wav < prompt.txt

See docs/ARCHITECTURE.md for the model, the GGUF layout, the inference pipeline, every CLI flag, and the validation results.

License

MIT. See LICENSE.

Upstream model : OmniVoice by Xiaomi / k2-fsa, Apache 2.0. Audio codec : Higgs Audio v2 (bosonai/higgs-audio-v2-tokenizer), Apache 2.0.

About

Local AI text-to-speech with voice cloning and voice design, powered by GGML. C++17 port of OmniVoice (k2-fsa/OmniVoice). 646 languages, 24 kHz mono output, runs on CPU, CUDA, ROCm, Metal, Vulkan.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages