Diff-SVC-develop-team/Diff-SVC

Usage of Refactor Branch

This is a cleaner version of DiffSinger, which provides:

  • less code: scripts that were unused or obsolete in DiffSinger have been removed;
  • better readability: many important functions are annotated (we assume the reader already knows how the neural networks work);
  • abstract classes: the base classes are collected into the "basics/" folder and are annotated; other classes inherit directly from them;
  • a re-organized project structure: the pipeline is separated into preparation, preprocessing, augmentation, training, inference and deployment;
  • main command-line entry points collected into the "scripts/" folder.

Progress since we forked into this repository

TBD

Getting Started

[ 中文教程 | Chinese Tutorial ]

Installation

Environments and dependencies

# Install PyTorch manually (1.8.2 LTS recommended)
# See instructions at https://pytorch.org/get-started/locally/
# Below is an example for CUDA 11.1
pip3 install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111

# Install other requirements
pip install -r requirements.txt
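After installing, a quick way to confirm that the core dependencies are importable is a stdlib-only check (a minimal sketch; the module list is an assumption based on the install commands above):

```python
import importlib.util

def check_env(modules=("torch", "torchvision", "torchaudio")):
    """Return which of the required packages can be imported."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

status = check_env()
print(status)
```

If any entry is False, re-run the corresponding install command before proceeding.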

Pretrained models

  • (Required) Get the pretrained vocoder from the DiffSinger Community Vocoders Project and unzip it into the checkpoints/ folder, or first train an ultra-lightweight DDSP vocoder yourself, then configure it according to the relevant instructions.
  • Get the acoustic model from releases or elsewhere and unzip it into the checkpoints/ folder.

Building your own dataset

This pipeline will guide you from installing dependencies to formatting your recordings and generating the final configuration file.

Preprocessing

The following is only an example for the Opencpop dataset.

export PYTHONPATH=.
CUDA_VISIBLE_DEVICES=0 python scripts/binarize.py --config configs/acoustic.yaml
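scripts/binarize.py reads its settings from the YAML file passed via --config. A hedged sketch of inspecting such a file before binarizing, using PyYAML (the keys shown below are hypothetical, not the actual contents of configs/acoustic.yaml):

```python
import yaml  # PyYAML

def load_config(path):
    """Load a YAML config file into a plain dict."""
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

# Hypothetical example content for illustration only;
# see configs/acoustic.yaml for the real keys.
example = yaml.safe_load("""
raw_data_dir: data/opencpop/raw
binary_data_dir: data/opencpop/binary
audio_sample_rate: 44100
""")
print(example["audio_sample_rate"])
```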

Training

The following is only an example for the Opencpop dataset.

CUDA_VISIBLE_DEVICES=0 python scripts/train.py --config configs/acoustic.yaml --exp_name $MY_DS_EXP_NAME --reset  

Inference

Infer from *.ds file

python scripts/infer.py path/to/your.ds --exp $MY_DS_EXP_NAME

See more supported arguments with python scripts/infer.py -h. See examples of *.ds files in the samples/ folder.
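A *.ds file is a JSON document describing the segments to render. A minimal sketch of inspecting one with the stdlib json module (the field names here are hypothetical illustrations; consult the files in samples/ for the real schema):

```python
import json

# Hypothetical *.ds-style content for illustration only; the real
# field names are defined by the files in the samples/ folder.
ds_text = '[{"offset": 0.0, "text": "AP la la"}, {"offset": 2.5, "text": "AP la"}]'

segments = json.loads(ds_text)
print(len(segments), "segments, first at offset", segments[0]["offset"])
```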

Deployment

Export model to ONNX format

Please see this documentation before you run the following command:

python scripts/export.py acoustic --exp $MY_DS_EXP_NAME

See more supported arguments with python scripts/export.py acoustic --help.
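Once exported, the resulting file can be structurally validated with the onnx package (a sketch under stated assumptions: the checkpoint path is hypothetical, and onnx must be installed separately):

```python
def validate_onnx(path):
    """Parse an exported model and run ONNX's structural checker.

    Requires the `onnx` package; raises if the file is malformed.
    """
    import onnx
    model = onnx.load(path)          # parse the protobuf graph
    onnx.checker.check_model(model)  # validate nodes, types, shapes
    return model

# Usage (path is hypothetical):
# validate_onnx("checkpoints/my_exp/model.onnx")
```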

Use DiffSinger via OpenUTAU editor

OpenUTAU, an open-source SVS editor with a modern GUI, has unofficial, temporary support for DiffSinger. See OpenUTAU for DiffSinger for more details.

Algorithms, principles and advanced features

See the original paper, the docs/ folder and releases for more details.

License

This forked DiffSinger is licensed under Apache 2.0 License.


Below is the README inherited from the original repository.

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

arXiv | Interactive🤗 TTS | Interactive🤗 SVS

This repository is the official PyTorch implementation of our AAAI-2022 paper, in which we propose DiffSinger (for Singing-Voice-Synthesis) and DiffSpeech (for Text-to-Speech).

[Figures: DiffSinger/DiffSpeech at training; DiffSinger/DiffSpeech at inference]

🎉 🎉 🎉 Updates:

  • Sep.11, 2022: 🔌 DiffSinger-PN. Added the plug-in PNDM (ICLR 2022, from our laboratory) to accelerate DiffSinger freely.
  • Jul.27, 2022: Updated documents for SVS. Added easy inference A & B; added Interactive SVS running on HuggingFace🤗 SVS.
  • Mar.2, 2022: MIDI-B-version.
  • Mar.1, 2022: NeuralSVB, for singing voice beautifying, has been released.
  • Feb.13, 2022: NATSpeech, the improved code framework containing the implementations of DiffSpeech and our NeurIPS-2021 work PortaSpeech, has been released.
  • Jan.29, 2022: Support for MIDI-A-version SVS.
  • Jan.13, 2022: Support for SVS; released the PopCS dataset.
  • Dec.19, 2021: Support for TTS. HuggingFace🤗 TTS

🚀 News:

  • Feb.24, 2022: Our new work NeuralSVB was accepted by ACL-2022 (arXiv). Demo page.
  • Dec.01, 2021: DiffSinger was accepted by AAAI-2022.
  • Sep.29, 2021: Our recent work PortaSpeech: Portable and High-Quality Generative Text-to-Speech was accepted by NeurIPS-2021 (arXiv).
  • May.06, 2021: We submitted DiffSinger to arXiv.

Environments

conda create -n your_env_name python=3.8
conda activate your_env_name
pip install -r requirements_2080.txt   # GPU 2080Ti, CUDA 10.2
# or
pip install -r requirements_3090.txt   # GPU 3090, CUDA 11.4

Documents

Tensorboard

tensorboard --logdir_spec exp_name

Audio Demos

Old audio samples can be found in our demo page. Audio samples generated by this repository are listed here:

TTS audio samples

Speech samples (test set of LJSpeech) can be found in demos_1213.

SVS audio samples

Singing samples (test set of PopCS) can be found in demos_0112.

Citation

@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}}

Acknowledgements

Our code is based on the following repos:

Thanks also to Keon Lee for his fast implementation of our work.
