- Colab demo is available!
- 3/9 Corrected a significant error!
- 30/8 Started training a model of A Certain Magical Index!
- 30/8 Multi-speaker training is available!
Inspired by Rcell, I replaced the word embedding of the TextEncoder in VITS with the output of the ContentEncoder used in Soft-VC, achieving any-to-one voice conversion with non-parallel data. Of course, any-to-many voice conversion is also doable! If you are interested in the performance of Soft-VC, you may refer to this demo. I've trained an acoustic model for 3 days on about 2000 audio clips.
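To make the idea concrete, here is a minimal PyTorch sketch of the modification. It assumes 256-dimensional HuBERT-Soft units and a 192-dimensional encoder channel size; the class and parameter names are illustrative, not the actual VITS code.

```python
import torch.nn as nn

class TextEncoder(nn.Module):
    """Simplified VITS-style text encoder where the nn.Embedding over
    phoneme IDs is replaced by a linear projection of soft speech units."""

    def __init__(self, unit_dim=256, hidden_dim=192):
        super().__init__()
        # Original VITS: self.emb = nn.Embedding(n_vocab, hidden_dim)
        # Here: project continuous soft units from the ContentEncoder instead.
        self.proj = nn.Linear(unit_dim, hidden_dim)
        # ... the rest of the encoder (attention blocks, etc.) is unchanged.

    def forward(self, units):
        # units: (batch, frames, unit_dim) extracted by the content encoder
        return self.proj(units)  # (batch, frames, hidden_dim)
```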
Audio files should be mono-channel WAV with a sampling rate of 22050 Hz.
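If your source audio does not meet these requirements, a small conversion script can help. This sketch uses librosa and soundfile, which are not dependencies of this repo; the directory names are placeholders.

```python
import librosa
import soundfile as sf
from pathlib import Path

SRC = Path("raw")         # hypothetical input directory
DST = Path("wavs/train")  # target directory from the layout below

DST.mkdir(parents=True, exist_ok=True)
for path in SRC.glob("*.wav"):
    # Load as mono and resample to 22050 Hz in one step.
    audio, sr = librosa.load(path, sr=22050, mono=True)
    sf.write(DST / path.name, audio, sr)
```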
Your dataset should look like:

```
└───wavs
    ├───dev
    │   ├───LJ001-0001.wav
    │   ├───...
    │   └───LJ050-0278.wav
    └───train
        ├───LJ002-0332.wav
        ├───...
        └───LJ047-0007.wav
```
Use the content encoder to extract speech units from the audio. For more information, refer to this repo.
```
cd hubert
python3 encode.py soft path/to/wavs/directory path/to/soft/directory --extension .wav
```
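To sanity-check the extracted units, you can load one of the output files. This assumes encode.py writes one NumPy .npy array per clip, shaped (frames, 256) for HuBERT-Soft; adjust the path to wherever your output landed.

```python
import numpy as np

# Hypothetical path; mirrors the dataset layout above.
units = np.load("path/to/soft/directory/train/LJ002-0332.npy")
print(units.shape)  # expected: (frames, 256)
```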
Then you need to generate filelists for your training and validation files. It's recommended that you prepare your filelists beforehand! A helper script for generating them is sketched after the formats below.

Your filelists should look like:

Single speaker:
```
path/to/wav|path/to/unit
...
```

Multi-speaker:
```
path/to/wav|id|path/to/unit
...
```
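As an example, here is a minimal script that pairs each wav with its unit file to produce a single-speaker filelist. The directory names and the .npy extension are assumptions based on the steps above.

```python
from pathlib import Path

WAV_DIR = Path("wavs/train")   # from the dataset layout above
UNIT_DIR = Path("soft/train")  # hypothetical output of encode.py

Path("filelists").mkdir(exist_ok=True)
with open("filelists/train.txt", "w") as f:
    for wav in sorted(WAV_DIR.glob("*.wav")):
        unit = UNIT_DIR / (wav.stem + ".npy")
        f.write(f"{wav}|{unit}\n")
```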
Single speaker:
```
python train.py -c configs/config.json -m model_name
```

Multi-speaker:
```
python train_ms.py -c configs/config.json -m model_name
```
You may also refer to train.ipynb.

For inference, please refer to inference.ipynb.
- QQ: 2235306122
- BILIBILI: Francis-Komizu
Special thanks to Rcell for giving me both inspiration and advice!