Unofficial implementation of Voicebox.
The core code comes from lucidrains/voicebox-pytorch. I do not use the voicebox-pytorch PyPI release; the code is copied into this repository for convenience.
I have not trained a duration model yet. It is on the TODO list.
See the demo:
LJSpeech:
Original Text: Field agents supplement those on the detail, particularly when the President is traveling.
Edited Text: Field agents supplement those on the detail, particularly when the Prime Minister is traveling.
AIShell3:
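Original Text: 夺得队史第五座 中超冠军 (English: "won the fifth Chinese Super League title in team history")
Edited Text: 夺得队史第五座 英超冠军 (English: "won the fifth Premier League title in team history")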
Note: the AIShell3 edited.wav is not good enough because the vocoder I used needs more training steps to converge.
See LJSpeech.
First, install the dependencies:
# clone the project
git clone https://github.com/chenht2010/Voicebox.git
# install dependencies (the quotes keep shells such as zsh from expanding the brackets)
pip install "lightning[extra]" torch torchaudio tgt vocos torchdiffeq torchode einops beartype naturalspeech2-pytorch audiolm-pytorch
Next, navigate to examples, read the README there, and run it.
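Before running the examples, it may help to confirm that the packages from the install command can be found. The snippet below is a minimal sketch, not part of this repository. It assumes each import name matches the pip package name with hyphens replaced by underscores; the helper `find_missing` and the list `REQUIRED_MODULES` are hypothetical names introduced only for this illustration.

```python
from importlib.util import find_spec

# Assumption: import names mirror the pip package names from the install
# command, with hyphens replaced by underscores. Adjust if one differs.
REQUIRED_MODULES = [
    "lightning",
    "torch",
    "torchaudio",
    "tgt",
    "vocos",
    "torchdiffeq",
    "torchode",
    "einops",
    "beartype",
    "naturalspeech2_pytorch",
    "audiolm_pytorch",
]


def find_missing(modules):
    """Return the top-level modules that cannot be located (nothing is imported)."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Because `find_spec` only locates top-level modules and does not import them, the check stays fast even for heavy packages such as torch. It does not verify versions or GPU support.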
- [ ] try other universal vocoders
- [ ] try other alignment tools
- [ ] train duration model