This is a text-to-speech Gradio webui for RVC models, using edge-tts.
Requirements: Tested with Python 3.10 on Windows 11. Python 3.11 is probably not supported, so please use Python 3.10.
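Before creating the virtual environment, you can confirm the interpreter matches the tested range. This is a minimal sketch; the exact supported range (3.10.x only) is an assumption based on the note above:

```python
import sys

def is_supported(version=sys.version_info):
    """Return True when the interpreter is in the tested range (3.10.x) — assumed range."""
    return (version[0], version[1]) == (3, 10)

print("Python", sys.version.split()[0], "supported:", is_supported())
```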
```shell
git clone https://github.com/Blane187/rvc-tts.git
cd rvc-tts

# Download models into the root directory
curl -L -O https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt
curl -L -O https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/rmvpe.pt

# Make a virtual environment
python -m venv venv

# Activate venv (for Windows)
venv\Scripts\activate

# Install PyTorch manually if you want to use an NVIDIA GPU (Windows)
# See https://pytorch.org/get-started/locally/ for more details
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install requirements
pip install -r requirements.txt
```
Place your RVC models in the `weights/` directory as follows:

```
weights
├── model1
│   ├── my_model1.pth
│   └── my_index_file_for_model1.index
└── model2
    ├── my_model2.pth
    └── my_index_file_for_model2.index
...
```
Each model directory should contain exactly one `.pth` file and at most one `.index` file. Directory names are used as model names. Non-ASCII characters in path names (like `weights/モデル1/index.index`) seem to cause faiss errors, so please avoid them.
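The layout rules above can be checked with a small script. This is a sketch, not part of the app: the `weights` default path, the function name, and the ASCII check are assumptions based on the notes above.

```python
from pathlib import Path

def discover_models(weights_dir="weights"):
    """Map model name -> (pth_path, index_path or None), enforcing the layout rules."""
    models = {}
    for model_dir in Path(weights_dir).iterdir():
        if not model_dir.is_dir():
            continue
        # Non-ASCII directory names reportedly break faiss, so reject them early.
        if not model_dir.name.isascii():
            raise ValueError(f"non-ASCII model directory may break faiss: {model_dir.name}")
        pth = sorted(model_dir.glob("*.pth"))
        idx = sorted(model_dir.glob("*.index"))
        if len(pth) != 1:
            raise ValueError(f"{model_dir.name}: expected exactly one .pth, found {len(pth)}")
        if len(idx) > 1:
            raise ValueError(f"{model_dir.name}: expected at most one .index, found {len(idx)}")
        models[model_dir.name] = (pth[0], idx[0] if idx else None)
    return models
```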
To launch the web UI:

```shell
# Activate venv (for Windows)
venv\Scripts\activate

python app.py
```
To update:

```shell
git pull
venv\Scripts\activate
pip install -r requirements.txt --upgrade
```
Troubleshooting: if `pip install` fails with an error like the following,

```
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for fairseq
Failed to build fairseq
ERROR: Could not build wheels for fairseq, which is required to install pyproject.toml-based projects
```

then fairseq likely needs Microsoft C++ Build Tools. Download the installer from the link in the error message and install it.
To use the CLI, run the `tts-cli.py` script with the appropriate arguments:

- `--model_name`: The name of the model to use (required; must be one of the models in the `weights` directory).
- `--speed`: Speech speed in percent (default: 0).
- `--tts_text`: Input text for TTS (required).
- `--tts_voice`: edge-tts speaker (required; must be one of the available voices).
- `--f0_up_key`: Transpose key (default: 0).
- `--f0_method`: Pitch extraction method, either "rmvpe" or "crepe" (default: "rmvpe").
- `--index_rate`: Index rate (default: 1.0).
- `--protect`: Protect value (default: 0.33).
- `--filter_radius`: Filter radius (default: 3).
- `--resample_sr`: Resample sample rate (default: 0).
- `--rms_mix_rate`: RMS mix rate (default: 0.25).
```shell
python tts-cli.py --model_name model_name_1 --speed 0 --tts_text "これは日本語テキストから音声への変換デモです。" --tts_voice "ja-JP-NanamiNeural-Female" --f0_up_key 0 --f0_method rmvpe --index_rate 1 --protect 0.33 --filter_radius 3 --resample_sr 0 --rms_mix_rate 0.25
```
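The flags above could be wired up with `argparse` roughly as follows. This is a sketch of how such a CLI might parse its arguments with the documented defaults, not the actual `tts-cli.py` source:

```python
import argparse

def build_parser():
    # Flag names and defaults mirror the documented CLI options above.
    p = argparse.ArgumentParser(description="RVC text-to-speech CLI (sketch)")
    p.add_argument("--model_name", required=True, help="model directory under weights/")
    p.add_argument("--speed", type=int, default=0, help="speech speed in percent")
    p.add_argument("--tts_text", required=True, help="input text for TTS")
    p.add_argument("--tts_voice", required=True, help="edge-tts speaker")
    p.add_argument("--f0_up_key", type=int, default=0, help="transpose key")
    p.add_argument("--f0_method", choices=["rmvpe", "crepe"], default="rmvpe")
    p.add_argument("--index_rate", type=float, default=1.0)
    p.add_argument("--protect", type=float, default=0.33)
    p.add_argument("--filter_radius", type=int, default=3)
    p.add_argument("--resample_sr", type=int, default=0)
    p.add_argument("--rms_mix_rate", type=float, default=0.25)
    return p

args = build_parser().parse_args([
    "--model_name", "model_name_1",
    "--tts_text", "Hello",
    "--tts_voice", "ja-JP-NanamiNeural-Female",
])
print(args.f0_method, args.index_rate)  # → rmvpe 1.0
```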