CoMoSVC: Consistency Model Based Singing Voice Conversion

CoMoSVC is a singing voice conversion (SVC) system based on a consistency model. It is inspired by CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model.

This is an implementation of the paper CoMoSVC.

Improvements

The subjective evaluations are illustrated through the table below.

Environment

We have tested the code on Python 3.8, where it runs successfully. You can set up a Conda environment with the following command:

conda create -n Your_Conda_Environment_Name python=3.8

After activating the Conda environment, install the required packages in it:

pip install -r requirements.txt

Download the Checkpoints

1. m4singer_hifigan

First download m4singer_hifigan, then unzip the archive:

unzip m4singer_hifigan.zip

The vocoder checkpoints will be in the m4singer_hifigan directory.

2. ContentVec

Download the ContentVec checkpoint and put it in the Content directory. It is used to extract the content features.

3. m4singer_pe

Download the pitch_extractor checkpoint m4singer_pe, then unzip the archive:

unzip m4singer_pe.zip

Dataset Preparation

First create the folders:

mkdir dataset_raw
mkdir dataset

You can refer to different preparation methods based on your needs.

Preparation With Slicing can help you remove the silent parts and slice the audio for stable training.

0. Preparation With Slicing

Please place your original dataset in the dataset_slice directory.

The original audio files can be in any format, which you specify on the command line with -w. You can also set the slice length with -s; the unit of slice_size is milliseconds. The defaults are mp3 for wavformat and 10000 for slice_size.

python preparation_slice.py -w your_wavformat -s slice_size
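As a rough illustration of what a slice_size in milliseconds means in samples, the sketch below computes fixed-length slice boundaries for a recording. It only shows the unit conversion: the function name `slice_bounds` and the example sample rate are assumptions, and the sketch does not perform the silence removal that preparation_slice.py is described as doing.

```python
def slice_bounds(num_samples, sample_rate, slice_ms=10000):
    """Return (start, end) sample indices of consecutive fixed-length slices.

    Hypothetical helper for illustration; not taken from preparation_slice.py.
    """
    if num_samples < 0 or sample_rate <= 0 or slice_ms <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and slice_ms must be > 0")
    # Milliseconds -> samples, e.g. 10000 ms at 44100 Hz is 441000 samples.
    samples_per_slice = sample_rate * slice_ms // 1000
    if samples_per_slice == 0:
        raise ValueError("slice_ms is too short for this sample rate")
    # The last slice may be shorter than the others.
    return [
        (start, min(start + samples_per_slice, num_samples))
        for start in range(0, num_samples, samples_per_slice)
    ]


# A 25-second recording at an assumed 44100 Hz, cut with the default 10000 ms:
bounds = slice_bounds(num_samples=25 * 44100, sample_rate=44100)
```

With the default slice_size, this 25-second example yields two full 10-second slices and one 5-second remainder.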

1. Preparation Without Slicing

You can just place the dataset in the dataset_raw directory with the following file structure:

dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
│   ├───...
│   └───Lxx-0xx8.wav
└───speaker1
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
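Before preprocessing, it can help to confirm that the directory matches the layout above. The following is a minimal sketch, assuming one sub-directory per speaker with .wav files directly inside; the helper name `count_wavs_per_speaker` is hypothetical and not part of this repository.

```python
from pathlib import Path


def count_wavs_per_speaker(root="dataset_raw"):
    """Map each speaker directory under `root` to its number of .wav files."""
    root_path = Path(root)
    counts = {}
    for speaker_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        counts[speaker_dir.name] = sum(
            1
            for f in speaker_dir.iterdir()
            if f.is_file() and f.suffix.lower() == ".wav"
        )
    return counts


# Example usage (run from the repository root once dataset_raw is populated):
#   for speaker, n in count_wavs_per_speaker("dataset_raw").items():
#       print(speaker, n)
```

A speaker that reports zero files usually indicates a nested folder or a non-wav format, both of which deviate from the structure shown above.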

Preprocessing

1. Resample to 24000Hz and mono

python preprocessing1_resample.py -n num_process

num_process is the number of processes; the default is 5.
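To make the two operations in this step concrete, here is a dependency-free sketch of downmixing to mono and resampling to 24000 Hz. It uses plain linear interpolation with no anti-aliasing filter, which is an assumption chosen for brevity and is not necessarily what preprocessing1_resample.py does; the function names are hypothetical.

```python
def to_mono(frames):
    """Average the channels of each frame, e.g. [(left, right), ...] -> [mono, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=24000):
    """Resample a mono signal by linear interpolation (illustrative quality only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Stereo frames at an assumed 48000 Hz become a mono signal at 24000 Hz.
stereo = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
mono_24k = resample_linear(to_mono(stereo), src_rate=48000)
```

For real audio, a resampler with proper low-pass filtering is preferable; the sketch only shows what "24000 Hz and mono" means for the data.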

2. Split the Training and Validation Datasets, and Generate Configuration Files.

python preprocessing2_flist.py
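The idea of this step can be sketched as a seeded shuffle followed by holding out a few files for validation. The validation size, the seed, and the function name below are assumptions for illustration; the actual split rule and output files are defined by preprocessing2_flist.py.

```python
import random


def split_train_val(paths, val_count=2, seed=1234):
    """Return (train, val) lists from `paths` using a reproducible shuffle.

    `val_count` and `seed` are illustrative defaults, not the repository's values.
    """
    if val_count < 0 or val_count > len(paths):
        raise ValueError("val_count must be between 0 and len(paths)")
    shuffled = sorted(paths)                 # sort first so the result is order-independent
    random.Random(seed).shuffle(shuffled)    # local RNG, global random state untouched
    return shuffled[val_count:], shuffled[:val_count]


files = [f"dataset/speaker0/{i:03d}.wav" for i in range(10)]
train_files, val_files = split_train_val(files)
```

Fixing the seed keeps the validation set stable across runs, which makes validation losses comparable between experiments.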

3. Generate Features

python preprocessing3_feature.py -c your_config_file -n num_processes

Training

1. Train the Teacher Model

python train.py

The checkpoints will be saved in the logs/teacher directory.

2. Train the Consistency Model

If you want to adjust the configuration, copy the config file to a new file and modify the parameters there.

python train.py -t -c Your_new_configfile_path -p The_teacher_model_checkpoint_path

Inference

First, put the audio files you want to convert in the raw directory.

Inference by the Teacher Model

python inference_main.py -ts 50 -tm "logs/teacher/model_800000.pt" -tc "logs/teacher/config.yaml" -n "src.wav" -k 0 -s "target_singer"

-ts refers to the total number of iterative steps during inference for the teacher model

-tm refers to the teacher_model_path

-tc refers to the teacher_config_path

-n refers to the source audio

-k refers to the pitch shift in semitones; it can be positive or negative

-s refers to the target singer
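For intuition about the -k value: in equal temperament, a shift of k semitones multiplies a frequency by 2^(k/12). The sketch below shows that relationship only; how inference_main.py applies the shift internally is not stated here, and the helper name is hypothetical.

```python
def semitone_ratio(k):
    """Frequency ratio corresponding to a shift of k semitones (equal temperament)."""
    return 2.0 ** (k / 12.0)


# Shifting A4 (440 Hz) up one octave (12 semitones) and down one octave.
up_octave = 440.0 * semitone_ratio(12)     # 880.0 Hz
down_octave = 440.0 * semitone_ratio(-12)  # 220.0 Hz
```

As a rule of thumb, -k 12 or -k -12 moves the source by a full octave, which is a common starting point when the source and target singers have very different ranges.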

Inference by the Consistency Model

python inference_main.py -ts 1 -cm "logs/como/model_800000.pt" -cc "logs/como/config.yaml" -n "src.wav" -k 0 -s "target_singer" -t

-ts refers to the total number of iterative steps during inference for the student model

-cm refers to the como_model_path

-cc refers to the como_config_path

-t indicates that the model is not the teacher model; it is a flag and takes no value
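When converting many files, the two commands above can be assembled programmatically. This is a minimal sketch that uses only the flags documented in this README; the function name `build_inference_command` and its parameters are hypothetical.

```python
def build_inference_command(source, singer, steps, model_path, config_path,
                            key=0, consistency=False):
    """Build the argument list for inference_main.py.

    Teacher runs use -tm/-tc; consistency-model runs use -cm/-cc plus -t.
    """
    model_flag, config_flag = ("-cm", "-cc") if consistency else ("-tm", "-tc")
    cmd = [
        "python", "inference_main.py",
        "-ts", str(steps),
        model_flag, model_path,
        config_flag, config_path,
        "-n", source,
        "-k", str(key),
        "-s", singer,
    ]
    if consistency:
        cmd.append("-t")  # flag without a value: not the teacher model
    return cmd


teacher_cmd = build_inference_command(
    "src.wav", "target_singer", 50,
    "logs/teacher/model_800000.pt", "logs/teacher/config.yaml",
)
como_cmd = build_inference_command(
    "src.wav", "target_singer", 1,
    "logs/como/model_800000.pt", "logs/como/config.yaml",
    consistency=True,
)
# To execute: import subprocess; subprocess.run(teacher_cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.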

