- transformer-parallel mode, see openvpi#191; it can improve ONNX inference efficiency
- lynxnet, adapted from https://github.com/CNChTu/Diffusion-SVC/blob/v2.0_dev/diffusion/naive_v2/naive_v2_diff.py
  - Compared with Diffusion-SVC, SwiGLU has been added here; this change will be synced upstream soon
  - To use lynxnet, you can directly apply `configs/templates/config_acoustic_lynxnet.yaml`
  - Using lynxnet outside the acoustic model is not recommended
  - To keep syncing with openvpi/DiffSinger easy, parameter passing for the new components has not been made compatible; if you need to change it, edit `modules/commons/common_layers.py` and `modules/backbone/naive_v2_diff.py`
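The difference between the transformer's default "series" sublayer wiring and a "parallel" wiring can be sketched roughly as follows. This is a minimal standalone illustration with hypothetical names and shapes, not the repo's actual `EncSALayer` implementation:

```python
import torch
import torch.nn as nn

class SketchEncoderLayer(nn.Module):
    """Minimal pre-norm encoder layer illustrating 'series' vs 'parallel'
    sublayer wiring (names and structure are illustrative only)."""

    def __init__(self, dim, num_heads, mode="series"):
        super().__init__()
        self.mode = mode
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, x):  # x: (batch, time, dim)
        if self.mode == "series":
            # classic transformer: FFN runs on the attention output
            h = self.norm1(x)
            x = x + self.attn(h, h, h)[0]
            x = x + self.ffn(self.norm2(x))
        else:
            # "parallel": both sublayers read the same input; outputs are summed,
            # so the two branches can be computed concurrently (e.g. in ONNX)
            h = self.norm1(x)
            x = x + self.attn(h, h, h)[0] + self.ffn(self.norm2(x))
        return x
```

In the parallel form the attention and feed-forward branches have no data dependency on each other, which is what allows an inference runtime to fuse or overlap them.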
- Parts you may need to change:
  - `modules/commons/common_layers.py` → `class EncSALayer`: `tf_enc_mode` (set it to `series` to restore the original transformer behavior)
  - `modules/backbone/naive_v2_diff.py` → `class NaiveV2Diff`:
    - `conv_model_activation`: three options are currently provided, `ReLU`, `SiLU`, and `PReLU`. `PReLU` is stylistically closest to the WaveNet in the original repo; `ReLU` sounds thinner and `SiLU` fuller, so adjust according to the style you want.
    - `use_norm`: set it to `True` if training turns out to be extremely unstable
    - `GLU_type`: set it to `GLU` for lynxnet's original setting
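As a rough illustration of what these options mean, here is a hypothetical standalone sketch of a gated conv unit (the real code lives in `modules/backbone/naive_v2_diff.py`; class and argument names here are assumptions for illustration, not the repo's actual API):

```python
import torch
import torch.nn as nn

class GatedConvUnit(nn.Module):
    """Illustrative gated 1-D conv unit: 'GLU' gates with sigmoid
    (lynxnet's original setting), 'SwiGLU' gates with SiLU instead."""

    def __init__(self, channels, glu_type="GLU", activation="SiLU"):
        super().__init__()
        # one conv produces both the value half and the gate half
        self.conv = nn.Conv1d(channels, channels * 2, kernel_size=3, padding=1)
        self.gate = torch.sigmoid if glu_type == "GLU" else nn.functional.silu
        # post-gate activation: ReLU (thinner), SiLU (fuller),
        # PReLU (closest to the original repo's WaveNet style)
        self.act = {"ReLU": nn.ReLU(), "SiLU": nn.SiLU(), "PReLU": nn.PReLU()}[activation]

    def forward(self, x):  # x: (batch, channels, time)
        value, gate = self.conv(x).chunk(2, dim=1)
        return self.act(value * self.gate(gate))
```

Swapping `glu_type` or `activation` here only changes which element-wise nonlinearities are applied, which is why these settings mostly shift the timbre rather than the model's capacity.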
—— Original repo README below ↓↓↓
This is a refactored and enhanced version of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism based on the original paper and implementation, which provides:
- Cleaner code structure: useless and redundant files are removed and the others are re-organized.
- Better sound quality: the sampling rate of synthesized audio is adapted to 44.1 kHz instead of the original 24 kHz.
- Higher fidelity: improved acoustic models and diffusion sampling acceleration algorithms are integrated.
- More controllability: introduced variance models and parameters for prediction and control of pitch, energy, breathiness, etc.
- Production compatibility: functionalities are designed to match the requirements of production deployment and the SVS community.
| Overview | Variance Model | Acoustic Model |
| --- | --- | --- |
| *(diagram)* | *(diagram)* | *(diagram)* |
- Installation & basic usages: See Getting Started
- Dataset creation pipelines & tools: See MakeDiffSinger
- Best practices & tutorials: See Best Practices
- Editing configurations: See Configuration Schemas
- Deployment & production: OpenUTAU for DiffSinger, DiffScope (under development)
- Communication groups: QQ Group (907879266), Discord server
- Progress since we forked into this repository: See Releases
- Roadmap for future releases: See Project Board
- Thoughts, proposals & ideas: See Discussions
TBD
TBD
- Paper: DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
- Implementation: MoonInTheRiver/DiffSinger
- Denoising Diffusion Probabilistic Models (DDPM): paper, implementation
- DDIM for diffusion sampling acceleration
- PNDM for diffusion sampling acceleration
- DPM-Solver++ for diffusion sampling acceleration
- UniPC for diffusion sampling acceleration
- Rectified Flow (RF): paper, implementation
- HiFi-GAN and NSF for waveform reconstruction
- pc-ddsp for waveform reconstruction
- RMVPE and yxlllc's fork for pitch extraction
Any organization or individual is prohibited from using any functionality included in this repository to generate someone's speech without his or her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this provision, you could be in violation of copyright laws.
This forked DiffSinger repository is licensed under the Apache 2.0 License.