🎙️ VoxCPM WebUI: Voice Cloning & Fine-Tuning

A WebUI for fine-tuning and running inference with VoxCPM models. The application uses LoRA (Low-Rank Adaptation) to enable high-quality voice cloning with modest training cost.

🔄 Application Workflow

Data Preparation: Upload/record audio and generate transcriptions using Faster-Whisper.
Fine-Tuning: Configure LoRA or Full Fine-Tuning parameters and train your voice adapter.
Inference: Generate speech using the base model combined with your trained weights.
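
The data preparation step can be sketched roughly as below. This is an illustration, not the WebUI's actual code: the helper names make_whisper_transcriber and build_manifest, the "small" model size, and the audio/text manifest keys are assumptions made for the example. Only WhisperModel and its transcribe method come from the Faster-Whisper library.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def make_whisper_transcriber(model_size: str = "small") -> Callable[[str], str]:
    """Return a function that transcribes one audio file with Faster-Whisper."""
    # Imported lazily so the rest of this sketch works without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)

    def transcribe(path: str) -> str:
        segments, _info = model.transcribe(path)
        # `segments` is a generator; iterating it runs the actual decoding.
        return " ".join(segment.text.strip() for segment in segments)

    return transcribe


def build_manifest(clips: Iterable[str],
                   transcribe: Callable[[str], str],
                   out_path: str) -> int:
    """Write one JSON line per clip; the `audio`/`text` keys are hypothetical."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as handle:
        for clip in clips:
            text = transcribe(clip).strip()
            if not text:
                continue  # skip clips that produced an empty transcript
            record = {"audio": str(Path(clip)), "text": text}
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

With Faster-Whisper installed, build_manifest(["clip1.wav"], make_whisper_transcriber(), "train.jsonl") would produce a manifest to review by hand before training, since transcripts must match the audio exactly.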

Persistent Torch & Triton cache: triton-windows is integrated with a custom kernel caching system in models/.cache, which makes torch.compile usable for faster inference.
Building the persistent cache for the first time may take up to 5 minutes. This is a one-time process; once the cache exists, subsequent generations are significantly faster.
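
One common way to make compile caches persistent is to point PyTorch Inductor and Triton at a fixed directory through the TORCHINDUCTOR_CACHE_DIR and TRITON_CACHE_DIR environment variables. The sketch below assumes that approach; the function name configure_compile_cache and the inductor/triton subfolder names are hypothetical, and the WebUI's own caching system may work differently.

```python
import os
from pathlib import Path


def configure_compile_cache(cache_root: str = "models/.cache") -> dict:
    """Point the Inductor and Triton caches at a persistent folder."""
    root = Path(cache_root).resolve()
    # Subfolder names are an assumption for this example.
    targets = {
        "TORCHINDUCTOR_CACHE_DIR": root / "inductor",
        "TRITON_CACHE_DIR": root / "triton",
    }
    for variable, folder in targets.items():
        folder.mkdir(parents=True, exist_ok=True)
        os.environ[variable] = str(folder)
    return {variable: str(folder) for variable, folder in targets.items()}
```

Call configure_compile_cache() before importing torch so the variables are in place when torch.compile first builds kernels; later runs can then reuse the compiled artifacts instead of rebuilding them.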

⚙️ System Requirements & Hardware

💻 Software Dependencies

Python: 3.10 – 3.11 (Recommended for stability during training).
PyTorch: 2.5.0+
CUDA: 12.0+
Format Support: .wav is recommended.

🔌 Hardware Setup (VRAM Requirements)

Model	LoRA Training
VoxCPM 1.5 (750M)	~12 GB VRAM
VoxCPM 2.0 (2B)	~20 GB VRAM
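
As a quick sanity check before training, the table above can be encoded as a lookup. This is only a sketch: the names LORA_VRAM_GB, detect_vram_gb, and trainable_models are hypothetical, and the figures are the approximate values from the table, not hard limits.

```python
# Approximate LoRA training requirements from the table above (GB of VRAM).
LORA_VRAM_GB = {"VoxCPM 1.5": 12, "VoxCPM 2.0": 20}


def detect_vram_gb(device_index: int = 0) -> float:
    """Return total memory of a CUDA device in GiB (requires PyTorch)."""
    import torch  # imported lazily so the lookup below works without it

    total_bytes = torch.cuda.get_device_properties(device_index).total_memory
    return total_bytes / (1024 ** 3)


def trainable_models(available_gb: float) -> list:
    """List the models whose approximate LoRA requirement fits in memory."""
    return [name for name, needed in LORA_VRAM_GB.items()
            if available_gb >= needed]
```

For example, trainable_models(16) returns only VoxCPM 1.5, while a 24 GB card covers both entries.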

📊 Dataset & Audio Specifications

🎯 Clip Requirements

Format: .wav is highly recommended. Other formats supported by torchaudio also work.
Duration: 3–30 seconds per clip is the "sweet spot."
- Warning: Clips < 1s produce unstable results.
- Warning: Very long clips increase VRAM usage and may be filtered by max_batch_tokens.
Sample Rate: The dataloader resamples automatically. Your config sample_rate must match the AudioVAE encoder input:
- VoxCPM 1.0: 16kHz
- VoxCPM 1.5: 44.1kHz
- VoxCPM 2.0: 16kHz (The encoder operates at 16kHz; the decoder outputs 48kHz).
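
The clip rules above can be checked before training with a small script. The sketch below uses Python's standard wave module, which only reads uncompressed PCM .wav files; the names ENCODER_SAMPLE_RATE, wav_duration_seconds, and classify_clip and the category labels are made up for this example.

```python
import wave

# Config sample_rate per model, as listed above (Hz).
ENCODER_SAMPLE_RATE = {
    "VoxCPM 1.0": 16_000,
    "VoxCPM 1.5": 44_100,
    "VoxCPM 2.0": 16_000,
}


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_clip(duration_s: float) -> str:
    """Label a clip against the 3-30 second guideline."""
    if duration_s < 1.0:
        return "too_short"   # unstable results
    if duration_s < 3.0:
        return "short"       # usable, but below the sweet spot
    if duration_s <= 30.0:
        return "ok"
    return "long"            # more VRAM, may be filtered by max_batch_tokens
```

Running classify_clip(wav_duration_seconds(path)) over a dataset folder gives a quick list of clips to re-cut or drop.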

✨ Preprocessing Tips

Trim Trailing Silence: Keep trailing silence under 0.5 seconds. Excessive trailing silence is the leading cause of "infinite generation" issues after fine-tuning.
Normalize Volume: Ensure consistent levels across all training samples.
Clean Transcripts: Text must match audio exactly. Inaccurate transcripts degrade both cloning quality and text adherence.
Remove Noise: The model is highly sensitive to background noise. Use clean, isolated voice recordings.
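
The first two tips can be illustrated with plain Python on a list of float samples in the range -1.0 to 1.0. This is a minimal sketch, not the WebUI's preprocessing: the function names, the 0.01 amplitude threshold, the 0.2 second tail, and the 0.9 target peak are all assumptions chosen for the example.

```python
def trim_trailing_silence(samples, sample_rate, threshold=0.01, keep_s=0.2):
    """Drop trailing samples quieter than `threshold`, keeping a short tail."""
    last_loud = -1
    # Walk backwards to find the last sample at or above the threshold.
    for index in range(len(samples) - 1, -1, -1):
        if abs(samples[index]) >= threshold:
            last_loud = index
            break
    end = min(len(samples), last_loud + 1 + int(keep_s * sample_rate))
    return samples[:end]


def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the loudest one reaches `target_peak`."""
    peak = max((abs(value) for value in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-silent clip, nothing to scale
    gain = target_peak / peak
    return [value * gain for value in samples]
```

Peak normalization is the simplest way to even out levels; a loudness-based method may give more consistent results across clips with different dynamics.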

💻 Hardware Requirements

Inference (Running the model):
- Minimum: 8 GB VRAM
- Recommended: 12 GB VRAM
Training (LoRA):
- Minimum: 12+ GB VRAM (VoxCPM 1.5)
- Recommended: 20+ GB VRAM (VoxCPM 2.0)

Clone the repository:

git clone https://github.com/OpenBMB/VoxCPM.git

🛠️ Installation & Execution (Windows)

This project uses uv for fast dependency management.

Setup Steps

Run Installer: Double-click install.bat.
- Installs uv via Winget (if not already present).
- Synchronizes the environment and installs all required libraries automatically.
Launch App: Double-click start.bat.
Access: Navigate to http://127.0.0.1:7860 in your web browser.

Inspired by FranckyB Voice Clone Studio

Based on VoxCPM2 by OpenBMB
